geom_col
ggplot2 provides geom_bar and geom_col for making bar/column plots. But
both produce vertical 'columns', so what's the difference? geom_bar is
designed for counting the number of cases in each group; for a data
frame, that means the number of rows in each group. Here's an example
using the starwars dataset.
library(tidyverse)
starwars %>%
ggplot(aes(x=sex)) + geom_bar()
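To see what geom_bar is actually counting, you can pull the computed data back out of the plot with ggplot2's layer_data() and compare it with counting the rows directly. This is a quick check, assuming a reasonably recent ggplot2 and dplyr (which supplies the starwars data); the variable names p and d are just for illustration.

```r
library(tidyverse)

# Build the same plot as above, but keep it in a variable
p <- starwars %>%
  ggplot(aes(x = sex)) + geom_bar()

# layer_data() returns the data ggplot2 computed for a layer;
# for geom_bar the 'count' column holds the number of rows per group
d <- layer_data(p)
d[, c("x", "count")]

# Count the rows per group directly, for comparison
starwars %>% count(sex)
```

The counts should match, which is the whole point: geom_bar does the counting for you.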
If you try the same thing with geom_col you’ll get an
error saying that y has to be defined:
library(tidyverse)
starwars %>%
ggplot(aes(x=sex)) + geom_col()
## Error in `check_required_aesthetics()`:
## ! geom_col requires the following missing aesthetics: y
In practice, people often calculate the count (or whatever the y value
is) themselves first. That makes sense, since you may want to inspect it
or use it for labels. But if you then plot it with geom_bar, you have to
add stat = "identity" to the geom_bar call, because its default stat is
"count". Like this:
library(tidyverse)
starwars %>%
group_by(sex) %>%
summarise(count = n()) %>%
ggplot(aes(x = sex, y = count)) + geom_bar(stat = "identity")
But why not just use geom_col and save yourself the
extra term? Like this:
library(tidyverse)
starwars %>%
group_by(sex) %>%
summarise(count = n()) %>%
ggplot(aes(x = sex, y = count)) + geom_col()
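Having the count as a column also pays off when you want labels, which is one of the reasons for computing it yourself. A minimal sketch, assuming ggplot2's geom_text for the labels; the vjust value is just one reasonable choice for nudging each label above its bar.

```r
library(tidyverse)

# Compute the counts once, then reuse them for bar heights and labels
p <- starwars %>%
  group_by(sex) %>%
  summarise(count = n()) %>%
  ggplot(aes(x = sex, y = count)) +
  geom_col() +
  # vjust = -0.5 places each label just above the top of its bar
  geom_text(aes(label = count), vjust = -0.5)

print(p)
```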
For some reason, geom_bar with stat = "identity" seems to appear on
Stack Overflow far more often than geom_col. I don't understand why.
This is my PSA: use geom_col instead.