Labels in the middle of columns

Let’s make a stacked bar chart with a label inside each bar showing the value of that bar. We’ll use the Titanic dataset as an example. Titanic is not actually a data frame but a different structure called a ‘table’, which we can easily convert to a data frame using data.frame(). Here we count the number of people in each class who did and did not survive. We then draw a column for each class, each made of two rectangles, one for Survived == "Yes" and one for Survived == "No".
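If you want to see what that conversion does, here is a small sketch using only base R (Titanic ships with R in the datasets package). It is just a quick look at the structure, not part of the plotting code.

```r
# Titanic is a 4-dimensional contingency table, not a data frame
class(Titanic)
dim(Titanic)   # one dimension each for Class, Sex, Age and Survived

# data.frame() gives one row per combination of the four variables,
# with the cell counts in a column called Freq
titanic_df <- data.frame(Titanic)
names(titanic_df)
nrow(titanic_df)
head(titanic_df)
```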

To add the labels we use geom_text(), and the content of the labels is specified using label = count within the aes() mapping function. But if we left it at that the plot would look crap! Within geom_text() I specify position = position_stack(vjust = 0.5). This tells the labels that they should be stacked in the same way as the bars, and that each one should be placed halfway up its own rectangle. If we removed the vjust = 0.5 (leaving position_stack() at its default of vjust = 1) then each label would sit centred on the top edge of its rectangle.

library(tidyverse)

data_for_plot <- Titanic %>% data.frame() %>% 
  group_by(Class , Survived) %>% 
  summarise(count = sum(Freq)) 

ggplot(data_for_plot, aes(x=Class, y = count , fill = Survived , label = count)) +
  geom_col() +
  geom_text(position = position_stack(vjust = 0.5))
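To check what vjust is doing inside position_stack(), one option is to pull out the positions ggplot2 computes for the text layer with layer_data(). This is only a sketch for inspection; the object names (base_plot, p_middle, p_top) are mine, and I've added .groups = "drop" to keep summarise() quiet.

```r
library(dplyr)
library(ggplot2)

data_for_plot <- data.frame(Titanic) %>%
  group_by(Class, Survived) %>%
  summarise(count = sum(Freq), .groups = "drop")

base_plot <- ggplot(data_for_plot,
                    aes(x = Class, y = count, fill = Survived, label = count)) +
  geom_col()

# Labels halfway up each rectangle
p_middle <- base_plot + geom_text(position = position_stack(vjust = 0.5))
# Labels at the top edge of each rectangle (the default, vjust = 1)
p_top <- base_plot + geom_text(position = position_stack())

# layer_data() returns the data ggplot2 actually draws; layer 2 is geom_text()
label_y_middle <- layer_data(p_middle, 2)$y
label_y_top    <- layer_data(p_top, 2)$y

# Each "middle" label sits half a rectangle-height below its "top" position
data.frame(label  = layer_data(p_middle, 2)$label,
           middle = label_y_middle,
           top    = label_y_top)
```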

If you want some kind of special formatting on the labels (like percent or thousand comma separators) you can either

  1. make a new variable which is a character version of count that has your formatting and use that in the label = part, or
  2. wrap the numeric value in the formatting function you want directly in the label = part.

Below I’ll show both examples. The output is the same so I’ll only show it once. I’ll load the scales package which provides the convenient formatting functions percent() and comma().
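As a quick sketch of what these formatters return on their own (the input numbers are made up for illustration), both take a numeric vector and give back character strings. The accuracy argument controls rounding.

```r
library(scales)

percent(0.6246)                   # rounding chosen automatically
percent(0.6246, accuracy = 0.1)   # "62.5%", rounded to one decimal place
comma(1234567)                    # "1,234,567"
```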

library(scales)
# Option A, creating a new column which is a formatted version of pct

data_for_plot <- Titanic %>% data.frame() %>% 
  group_by(Class , Survived) %>% 
  summarise(count = sum(Freq)) %>% 
  mutate(pct = count/sum(count)) %>% 
  mutate(pct_label = percent(pct))  # < key part

ggplot(data_for_plot, aes(x=Class , 
                          y=pct , 
                          fill=Survived , 
                          label = pct_label)) + # < key part
  geom_col() +
  geom_text(position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = percent)

# Option B, wrapping pct in the formatting function percent() directly
# in the 'label = ' bit.

data_for_plot <- Titanic %>% data.frame() %>% 
  group_by(Class , Survived) %>% 
  summarise(count = sum(Freq)) %>% 
  mutate(pct = count/sum(count))

ggplot(data_for_plot, aes(x=Class , 
                          y=pct , 
                          fill=Survived , 
                          label = percent(pct))) + # < key part
  geom_col() +
  geom_text(position = position_stack(vjust = 0.5)) +
  scale_y_continuous(labels = percent)
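A quick check on what pct means in the code above: after summarise() on data grouped by Class and Survived, dplyr by default keeps the result grouped by Class, so sum(count) inside mutate() is the class total and pct is the share within each class. The sketch below spells that out with .groups = "drop_last", which as far as I know is the default being relied on here.

```r
library(dplyr)

data_for_plot <- data.frame(Titanic) %>%
  group_by(Class, Survived) %>%
  # keep the Class grouping explicitly instead of relying on the default
  summarise(count = sum(Freq), .groups = "drop_last") %>%
  mutate(pct = count / sum(count)) %>%
  ungroup()

# The shares add up to 1 within each class
data_for_plot %>%
  group_by(Class) %>%
  summarise(total_pct = sum(pct))
```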

Two notes on this.

Labels at the end of columns

Let’s say we have a normal, non-stacked column chart, and want the labels to stick out at the top of each column. We can do that by adding a nudge_y value to geom_text(). The parameters nudge_x and nudge_y move the labels relative to their default location which is the top centre of each column, and the units for these parameters are the units of the axes themselves. The value of 40 here is found through a little bit of trial and error.

data_for_plot <- Titanic %>% data.frame() %>% 
  group_by(Class ) %>% 
  summarise(count = sum(Freq))

ggplot(data_for_plot, aes(x=Class , y=count , label=count)) +
  geom_col() +
  geom_text(nudge_y = 40) 

Another option is to use parameters hjust and vjust, but you should be aware of some differences:

Differences between nudge_x/nudge_y and hjust/vjust

  • The plot area expands to accommodate movement of labels through nudge_x and nudge_y, but not with hjust and vjust. If a label disappears off the plot when using vjust you will probably need to adjust the axis limits.
  • nudge_x and nudge_y cannot be used in combination with the position = argument of geom_text(), but hjust and vjust can.
  • The units of nudge_x and nudge_y are the units of the respective axes, while the units of hjust and vjust are relative to the size of the text item itself (0 and 1 line up opposite edges of the text with the anchor point, and 0.5 centres it).
  • Positive values of nudge_x and nudge_y move text right and up, while positive values of hjust and vjust move text left and down.
  • If you flip the coordinates using + coord_flip() then the values of nudge_x and nudge_y would remain unchanged but you would need to reconfigure hjust and vjust since they relate to the plot area and not the axes.
  • hjust and vjust can take values of "top", "bottom", "left", and "right". However for the purposes of positioning text labels I don’t find "top" and "bottom" useful because there is zero offset; the text touches the column border.
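The first and third differences can be seen by comparing the two approaches side by side. This is a sketch: the object names are mine, and the 10% of extra headroom passed to expansion() is an arbitrary choice rather than a recommended value.

```r
library(dplyr)
library(ggplot2)

data_for_plot <- data.frame(Titanic) %>%
  group_by(Class) %>%
  summarise(count = sum(Freq))

base_plot <- ggplot(data_for_plot, aes(x = Class, y = count, label = count)) +
  geom_col()

# nudge_y shifts the label's y value by 40 axis units,
# and the y-axis grows to fit the shifted labels
p_nudge <- base_plot + geom_text(nudge_y = 40)

# vjust leaves the y value alone and only changes how the text is drawn
# around it, so I add headroom at the top of the axis by hand
p_vjust <- base_plot +
  geom_text(vjust = -0.5) +
  scale_y_continuous(expand = expansion(mult = c(0.05, 0.1)))

layer_data(p_nudge, 2)$y - data_for_plot$count   # 40 for every label
layer_data(p_vjust, 2)$y - data_for_plot$count   # 0 for every label
```

Using expansion() instead of fixed limits means the headroom scales with the data, which saves re-tuning the limits when the counts change.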

What about a grouped column chart with text labels popping out the top? In that case, we need to move the text outside the top of the bar and offset the labels to the left and right so they are not sitting on top of each other. We can achieve the left-right offset by using position = position_dodge() within the geom_text() function. The width value of 0.9 seems to work well for most charts of this type with a discrete x-axis (0.9 is the default column width, so the labels line up with the dodged columns). However, when it comes to moving the text outside the top of the bar we cannot use nudge_y in combination with position =, as mentioned above. Instead we have to use vjust. Note that I’ve had to adjust the y-axis limits using ylim() because the plot area doesn’t expand for text moved using vjust.

Titanic %>% data.frame() %>% 
  group_by(Class, Sex ) %>% 
  summarise(count = sum(Freq)) %>% 
  ggplot(aes(x=Class , y=count , fill = Sex, label=count)) +
  geom_col(position="dodge") +
  geom_text(position=position_dodge(width=0.9) , vjust=-0.7) +
  ylim(0,1000)

Labels at the end of the plot

I saw some nice infographics with labels at the right-hand side of a bar chart and wanted to recreate that. I do it here by making a new variable called label_nudge_distance, equal to the distance from the end of each bar to the position of the label. Since each bar ends at a different point, the value of label_nudge_distance is specific to each bar. I set it to the maximum value of count minus the current value of count, plus an extra 50 so that the text doesn’t sit right on top of the end of the largest bar. The value of 50 was found through trial and error (maybe you could standardise it by setting it to a percent of the largest bar).

Note that label_nudge_distance is pulled out of data_for_plot with the $ operator and handed to nudge_y as a plain vector, since nudge_y is passed as an argument to geom_text() rather than mapped inside aes().

data_for_plot <- Titanic %>% data.frame() %>% 
  group_by(Class ) %>% 
  summarise(count = sum(Freq)) %>% 
  mutate(label_nudge_distance = 50 + max(count)-count) 

ggplot(data_for_plot, aes(x=Class , y=count , label=count)) +
  geom_col() +
  coord_flip()+
  geom_text(nudge_y = data_for_plot$label_nudge_distance) 
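Here is one way the padding could be tied to the largest bar instead of a hard-coded 50. It's a sketch: label_padding_share is a name I've made up, and 5% is just a starting point.

```r
library(dplyr)

label_padding_share <- 0.05   # hypothetical choice: 5% of the longest bar

data_for_plot <- data.frame(Titanic) %>%
  group_by(Class) %>%
  summarise(count = sum(Freq)) %>%
  mutate(label_nudge_distance =
           label_padding_share * max(count) + max(count) - count)

# Every label ends up at the same position along the count axis
data_for_plot %>%
  mutate(label_position = count + label_nudge_distance)
```

The plotting code stays the same, with nudge_y = data_for_plot$label_nudge_distance.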

Let’s give this plot the Rolls Royce treatment with a couple of adjustments, making the largest category the focus of the plot. We’ll order the bars from largest to smallest using the function reorder within the aes mapping function for ggplot, with the second argument to this function being count indicating how they will be ordered. I’ll set the colour for the bar and label using the variable bar_label_colour, which is equal to darkred for the largest category and grey50 for all other categories. I’m making a couple of other changes to the appearance here, using theme_minimal to remove a lot of the aesthetic bits like the background colour, then removing gridlines using panel.grid.major and panel.grid.minor (both are set to element_blank()). I’m setting the y-axis text to size 13 and getting rid of the x-axis text altogether (maybe controversial). I’m using labs to set the x and y axis labels to an empty string and to give the plot an appropriate title.

data_for_plot <- Titanic %>% data.frame() %>% 
  group_by(Class ) %>% 
  summarise(count = sum(Freq)) %>% 
  mutate(label_nudge_distance = 50 + max(count)-count) %>% 
  mutate(bar_label_colour = if_else(count == max(count), "darkred","grey50"))

ggplot(data_for_plot, aes(x=reorder(Class, count) , y=count , label=count)) +
  geom_col(fill = data_for_plot$bar_label_colour) +
  coord_flip()+
  geom_text(nudge_y = data_for_plot$label_nudge_distance,
            fontface="bold", 
            colour = data_for_plot$bar_label_colour) +
  theme_minimal() + 
  # count is everyone on board (survivors included), so the title says that
  labs(y = "", x = "" , title = "People on board by class") +
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(),
        axis.text.x = element_blank(),
        axis.text.y = element_text(size=13))
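An alternative to passing data_for_plot$bar_label_colour as a vector is to map the colour column inside aes() and tell ggplot2 to use the values as they are with scale_fill_identity() and scale_colour_identity(). This keeps each colour tied to its row of data. The sketch below is a trimmed-down version of the plot above, and for brevity it places each label just past its own bar with a single nudge_y value (50 again, picked by eye).

```r
library(dplyr)
library(ggplot2)

data_for_plot <- data.frame(Titanic) %>%
  group_by(Class) %>%
  summarise(count = sum(Freq)) %>%
  mutate(bar_label_colour = if_else(count == max(count), "darkred", "grey50"))

p <- ggplot(data_for_plot,
            aes(x = reorder(Class, count), y = count, label = count)) +
  # map the colour column and use its values as-is (no legend is drawn)
  geom_col(aes(fill = bar_label_colour)) +
  geom_text(aes(colour = bar_label_colour), nudge_y = 50, fontface = "bold") +
  scale_fill_identity() +
  scale_colour_identity() +
  coord_flip() +
  theme_minimal()
p
```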