geom_col (and geom_bar)

Let’s make a stacked bar chart with labels inside each of the bars
which show the value of that bar. We’ll use the Titanic dataset as an
example. The Titanic dataset is not actually a dataframe but a different
structure called a ‘table’, but we can easily convert it to a dataframe
using data.frame(). Here we are counting the number of
people in each class who did and did not survive. We then make a column
for each class, each having two rectangles, one for
Survived == "Yes" and one for Survived == "No".
To add the labels we use geom_text(), and the content of
the labels is specified using label = count within the
aes mapping function. But if we left it at that the plot
would look crap, because each label would be drawn at its own count
value instead of being stacked like the bars! Within the geom_text
function I specify position = position_stack(vjust = 0.5). This tells the
labels that they should be stacked, and that each one should be placed
halfway up its rectangle. If we removed the vjust = 0.5 then
each label would sit on the top edge of its rectangle, since vjust
defaults to 1 in position_stack().
library(tidyverse)

data_for_plot <- Titanic %>% data.frame() %>%
  group_by(Class, Survived) %>%
  summarise(count = sum(Freq))

ggplot(data_for_plot, aes(x = Class, y = count, fill = Survived, label = count)) +
  geom_col() +
  geom_text(position = position_stack(vjust = 0.5))
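To see what the conversion and counting steps produce before anything is plotted, here is a minimal sketch. It loads only dplyr rather than the whole tidyverse, and the names titanic_df and counts are mine, not part of the original example.

```r
library(dplyr)

# Titanic is a 4-way table (Class x Sex x Age x Survived);
# data.frame() flattens it to one row per cell, with the cell count in Freq
titanic_df <- data.frame(Titanic)
str(titanic_df)

# Collapse Sex and Age away, leaving one row per Class/Survived combination
counts <- titanic_df %>%
  group_by(Class, Survived) %>%
  summarise(count = sum(Freq), .groups = "drop")
counts
```

Each row of counts becomes one rectangle (and one label) in the stacked chart.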
If you want some kind of special formatting on the labels (like percent or thousand comma separators) you can either

- create a new column from count that has your formatting and use that in the label = part, or
- wrap the variable in the formatting function directly in the label = part.

Below I’ll show both examples. The output is the same so I’ll only
show it once. I’ll load the scales package which provides the convenient
formatting functions percent() and comma().
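Before using them in a plot, it can help to try the two formatting functions on their own. This is a small sketch; the variable names are made up for illustration, and accuracy is set explicitly so the rounding does not depend on the package defaults.

```r
library(scales)

# Both functions take numbers and return character strings
pct_whole   <- percent(0.051, accuracy = 1)    # "5%"
pct_decimal <- percent(0.051, accuracy = 0.1)  # "5.1%"
big_number  <- comma(2201, accuracy = 1)       # "2,201"

# They are vectorised, which is what lets ggplot2 apply them to axis breaks
axis_labels <- percent(c(0, 0.5, 1), accuracy = 1)  # "0%" "50%" "100%"

print(c(pct_whole, pct_decimal, big_number))
print(axis_labels)
```

Because the results are strings, they are only suitable for labels; keep the numeric column for the y aesthetic.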
library(scales)

# Option A, creating a new column which is a formatted version of pct
data_for_plot <- Titanic %>% data.frame() %>%
  group_by(Class, Survived) %>%
  summarise(count = sum(Freq)) %>%
  # summarise() drops the Survived grouping, so pct is the share within each Class
  mutate(pct = count / sum(count)) %>%
  mutate(pct_label = percent(pct)) # < key part

ggplot(data_for_plot, aes(x = Class,
                          y = pct,
                          fill = Survived,
                          label = pct_label)) + # < key part
  geom_col() +
  geom_text(position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = percent)

# Option B, wrapping pct in the formatting function percent() directly
# in the 'label = ' bit.
data_for_plot <- Titanic %>% data.frame() %>%
  group_by(Class, Survived) %>%
  summarise(count = sum(Freq)) %>%
  mutate(pct = count / sum(count))

ggplot(data_for_plot, aes(x = Class,
                          y = pct,
                          fill = Survived,
                          label = percent(pct))) + # < key part
  geom_col() +
  geom_text(position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = percent)
Two notes on this.

- I’ve added scale_y_continuous(labels = percent) so that the y scale goes from 0% to 100% rather than from 0 to 1. Note that percent here refers to the percent function from the scales package, but the brackets must not be used, because we are passing the function itself rather than calling it.
- The number of decimal places is controlled with accuracy, so for example percent(0.051, accuracy=1) produces 5%, while percent(0.051, accuracy=0.1) produces 5.1%.

Let’s say we have a normal, non-stacked column chart, and want the
labels to stick out at the top of each column. We can do that by adding
a nudge_y value to geom_text(). The parameters
nudge_x and nudge_y move the labels relative
to their default location which is the top centre of each column, and
the units for these parameters are the units of the axes themselves. The
value of 40 here was found through a little bit of trial and error.
data_for_plot <- Titanic %>% data.frame() %>%
  group_by(Class) %>%
  summarise(count = sum(Freq))

ggplot(data_for_plot, aes(x = Class, y = count, label = count)) +
  geom_col() +
  geom_text(nudge_y = 40)
Another option is to use parameters hjust and vjust, but you should be
aware of some differences between nudge_x/nudge_y and hjust/vjust:

- The plot area expands to fit labels moved with nudge_x and nudge_y, but not with hjust and vjust. If a label disappears off the plot when using vjust you will probably need to adjust the axis limits.
- nudge_x and nudge_y cannot be used in combination with position = in geom_text, but hjust and vjust can.
- The units of nudge_x and nudge_y are the units of the respective axes, while the units of hjust and vjust relate to the size of the text item itself (0 and 1 line up opposite edges of the text with the anchor point).
- Positive values of nudge_x and nudge_y move text right and up, while positive values of hjust and vjust move text left and down.
- If you flip the axes with + coord_flip() then the values of nudge_x and nudge_y would remain unchanged but you would need to reconfigure hjust and vjust since they relate to the plot area and not the axes.
- hjust and vjust can take values of "top", "bottom", "left", and "right". However for the purposes of positioning text labels I don’t find "top" and "bottom" useful because there is zero offset; the text touches the column border.

What about a grouped column chart with text labels popping out the
top? In that case, we need to move the text outside the top of the bar
and offset the text to the left and right so the labels are
not sitting on top of each other. We can achieve the latter
left-right offset by using position = position_dodge(width = 0.9)
within the geom_text() function. The width
value of 0.9 seems to work well for most charts of this type with a
discrete x-axis, because it matches the default bar width of 0.9.
However, when it comes to moving the text outside the
top of the bar we cannot use nudge_y in combination with
position = as mentioned above. Instead we have to use
vjust. Note that I’ve had to adjust the y-axis limits using
ylim() because the plot area doesn’t expand for text moved
using vjust.
Titanic %>% data.frame() %>%
  group_by(Class, Sex) %>%
  summarise(count = sum(Freq)) %>%
  ggplot(aes(x = Class, y = count, fill = Sex, label = count)) +
  geom_col(position = "dodge") +
  geom_text(position = position_dodge(width = 0.9), vjust = -0.7) +
  ylim(0, 1000)
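The hard-coded upper limit of 1000 only suits this particular dataset. One possible alternative, sketched below, is to ask for proportional headroom with expansion() inside scale_y_continuous(). The 10% figure is my own guess at a reasonable margin, not a value from the original example, and expansion() needs a reasonably recent ggplot2.

```r
library(dplyr)
library(ggplot2)

p <- data.frame(Titanic) %>%
  group_by(Class, Sex) %>%
  summarise(count = sum(Freq), .groups = "drop") %>%
  ggplot(aes(x = Class, y = count, fill = Sex, label = count)) +
  geom_col(position = "dodge") +
  geom_text(position = position_dodge(width = 0.9), vjust = -0.7) +
  # no padding below zero, 10% padding above the tallest bar (assumed margin)
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))
p
```

If the labels are still clipped, increasing the second value of mult gives them more room.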
I saw some nice infographics with labels at the right-hand side of a
bar chart and wanted to recreate that. I do it here by making a new
variable called label_nudge_distance which will be equal to
the distance from the end of each bar to the position of the label.
Since each bar ends at a different point, the value of
label_nudge_distance is specific to each bar. I set it as
the maximum value of count minus the current value of
count, plus an extra value of 50 so that the text doesn’t
sit right on top of the end of the largest bar. Every label therefore
lands at the same position, max(count) + 50. The value of 50 was
found through trial and error (maybe you could standardise it by setting
it to a percent of the largest bar).

Note that label_nudge_distance is passed to nudge_y using the $
operator to extract it from data_for_plot, since nudge_y is supplied
outside the aes mapping function, where column names are not looked up
in the data. This relies on the rows of data_for_plot being in the same
order as the bars.
data_for_plot <- Titanic %>% data.frame() %>%
  group_by(Class) %>%
  summarise(count = sum(Freq)) %>%
  mutate(label_nudge_distance = 50 + max(count) - count)

ggplot(data_for_plot, aes(x = Class, y = count, label = count)) +
  geom_col() +
  coord_flip() +
  geom_text(nudge_y = data_for_plot$label_nudge_distance)
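An alternative that avoids indexing data_for_plot with $ is to compute the label position as a column and map it to y inside geom_text(). The sketch below is one way this might look; label_y is a hypothetical column name, and the 5% margin is an arbitrary pick that follows the "percent of the largest bar" idea rather than the fixed value of 50.

```r
library(dplyr)
library(ggplot2)

data_for_plot <- data.frame(Titanic) %>%
  group_by(Class) %>%
  summarise(count = sum(Freq)) %>%
  # same position for every label: 5% beyond the end of the longest bar
  mutate(label_y = max(count) * 1.05)

p <- ggplot(data_for_plot, aes(x = Class, y = count, label = count)) +
  geom_col() +
  # y is overridden for the text layer only; the bars still use count
  geom_text(aes(y = label_y)) +
  coord_flip()
p
```

Because the position is mapped through aes(), it stays attached to the right row even if the data are later filtered or reordered.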
Let’s give this plot the Rolls Royce treatment with a couple of
adjustments, making the largest category the focus of the plot. We’ll
order the bars by size using the function reorder within the aes
mapping function for ggplot, with the second argument to this function
being count, indicating how they will be ordered; after coord_flip()
the largest bar ends up at the top. I’ll set the
colour for the bar and label using the variable
bar_label_colour, which is equal to darkred
for the largest category and grey50 for all other
categories. I’m making a couple of other changes to the appearance here,
using theme_minimal to remove a lot of the aesthetic bits
like the background colour, then removing gridlines using
panel.grid.major and panel.grid.minor (both
are set to element_blank()). I’m setting the y-axis text to
size 13 and getting rid of the x-axis text altogether (maybe
controversial). I’m using labs to set the x and y axis
labels to an empty string and to give the plot an appropriate title.
data_for_plot <- Titanic %>% data.frame() %>%
  group_by(Class) %>%
  summarise(count = sum(Freq)) %>%
  mutate(label_nudge_distance = 50 + max(count) - count) %>%
  mutate(bar_label_colour = if_else(count == max(count), "darkred", "grey50"))

# count sums Freq over both values of Survived, so it is everyone on board,
# not only those who died; the title reflects that
ggplot(data_for_plot, aes(x = reorder(Class, count), y = count, label = count)) +
  geom_col(fill = data_for_plot$bar_label_colour) +
  coord_flip() +
  geom_text(nudge_y = data_for_plot$label_nudge_distance,
            fontface = "bold",
            colour = data_for_plot$bar_label_colour) +
  theme_minimal() +
  labs(y = "", x = "", title = "People on board by class") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_text(size = 13))
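Passing colours as vectors pulled out with $ works as long as the row order of the data matches the bars. A possibly more robust variation, sketched here without the theme tweaks, maps the colour column through aes() and uses the identity scales so the colour names are taken literally. The label_y column is a hypothetical helper that replaces the nudge distance.

```r
library(dplyr)
library(ggplot2)

data_for_plot <- data.frame(Titanic) %>%
  group_by(Class) %>%
  summarise(count = sum(Freq)) %>%
  mutate(bar_label_colour = if_else(count == max(count), "darkred", "grey50"),
         label_y = max(count) + 50)  # fixed label position past the longest bar

p <- ggplot(data_for_plot, aes(x = reorder(Class, count), y = count)) +
  geom_col(aes(fill = bar_label_colour)) +
  geom_text(aes(y = label_y, label = count, colour = bar_label_colour),
            fontface = "bold") +
  # identity scales use the values in the column as colours and add no legend
  scale_fill_identity() +
  scale_colour_identity() +
  coord_flip()
p
```

The theme() and labs() calls from the full example can be added to p unchanged.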