ggplot provides geom_bar and geom_col for making bar/column plots. Both produce vertical ‘columns’, so what’s the difference? geom_bar is designed for counting the number of cases in each group, which for a dataframe means the number of rows in each group. Here’s an example using the starwars dataset.

library(tidyverse)

starwars %>% 
  ggplot(aes(x=sex)) + geom_bar()
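
To see what geom_bar is counting, here is a rough sketch that builds the same table by hand with dplyr’s count() and compares it with the values ggplot2 computed for the bar layer, pulled out with layer_data(). The names sex_counts, p and bar_data are just ones I picked for illustration.

```r
library(tidyverse)

# Count the rows in each group by hand
sex_counts <- starwars %>% 
  count(sex)

# Build the plot and extract the data ggplot2 computed for the bar layer
p <- starwars %>% 
  ggplot(aes(x=sex)) + geom_bar()

bar_data <- layer_data(p)

# The bar heights and the hand-made counts, sorted so they line up
sort(bar_data$count)
sort(sex_counts$n)
```

The two sorted vectors should contain the same numbers, which is all geom_bar is doing behind the scenes.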

If you try the same thing with geom_col you’ll get an error saying that y has to be defined:

library(tidyverse)

starwars %>% 
  ggplot(aes(x=sex)) + geom_col()
## Error in `check_required_aesthetics()`:
## ! geom_col requires the following missing aesthetics: y

What you usually see is that people calculate the count (or whatever the y value is) themselves anyway. This makes sense, as you may wish to view it or use it for labels. But if you do that with geom_bar, you have to add stat = "identity" to the geom_bar call, because its stat argument defaults to "count". Like this:

library(tidyverse)

starwars %>% 
  group_by(sex) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x=sex , y=count)) + geom_bar(stat = "identity")
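
Since labels are one reason to calculate the count yourself, here is a minimal sketch of how the precomputed count column could be reused for them. geom_text and its vjust argument are standard ggplot2, but the -0.5 offset is just a value I picked to nudge the labels above the bars, and the object names are mine.

```r
library(tidyverse)

# Calculate the count once so it can be used for both heights and labels
sex_counts <- starwars %>% 
  group_by(sex) %>% 
  summarise(count = n())

labelled <- sex_counts %>% 
  ggplot(aes(x=sex , y=count)) + 
  geom_bar(stat = "identity") + 
  # Reuse the same column as the text label, placed just above each bar
  geom_text(aes(label = count), vjust = -0.5)

labelled
```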

But why not just use geom_col and save yourself the extra term? Like this:

library(tidyverse)

starwars %>% 
  group_by(sex) %>% 
  summarise(count = n()) %>% 
  ggplot(aes(x=sex , y=count)) + geom_col()
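
If you want to convince yourself that the two versions really are interchangeable, a quick check (a sketch, not a proof) is to compare the data ggplot2 computes for each one with layer_data(). The object names here are my own.

```r
library(tidyverse)

sex_counts <- starwars %>% 
  group_by(sex) %>% 
  summarise(count = n())

# Shared base plot so only the geom differs
base <- sex_counts %>% 
  ggplot(aes(x=sex , y=count))

bar_version <- layer_data(base + geom_bar(stat = "identity"))
col_version <- layer_data(base + geom_col())

# TRUE if both versions put bars of the same height at the same positions
same_bars <- isTRUE(all.equal(as.numeric(bar_version$x), as.numeric(col_version$x))) &&
  isTRUE(all.equal(as.numeric(bar_version$y), as.numeric(col_version$y)))

same_bars
```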

For some reason, geom_bar with stat = "identity" seems to appear on Stack Overflow far more often than geom_col. I don’t understand why. This is my PSA to use geom_col instead.