geom_col
ggplot2 provides geom_bar and geom_col for making bar/column plots. But both result in vertical ‘columns’, so what’s the difference? geom_bar is designed for counting the number of cases in each group; for a dataframe, that means the number of rows in each group. geom_col, on the other hand, uses the y values you give it directly as the bar heights. Here’s an example of geom_bar using the starwars dataset.
library(tidyverse)
starwars %>%
ggplot(aes(x=sex)) + geom_bar()
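To check what geom_bar is actually counting, you can pull the computed layer data out of the plot and compare it with a plain dplyr count. This is a sketch that assumes ggplot2's layer_data() and dplyr's count(), neither of which appears in the original example; the `count` column is what stat_count() (geom_bar's default stat) computes for each group.

```r
library(tidyverse)

# Same plot as above, stored in a variable instead of printed
p <- starwars %>%
  ggplot(aes(x = sex)) +
  geom_bar()

# layer_data() returns the data ggplot2 computed for the first layer;
# stat_count() stores the number of rows in each group in `count`
bar_heights <- layer_data(p)$count

# The same numbers, computed directly: one row per sex, n = rows in that group
counts <- starwars %>% count(sex)

print(counts)
print(bar_heights)
```

The bar heights should match the `n` column, and together they add up to the number of rows in starwars.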
If you try the same thing with geom_col, you’ll get an error saying that y has to be defined:
library(tidyverse)
starwars %>%
ggplot(aes(x=sex)) + geom_col()
## Error in `check_required_aesthetics()`:
## ! geom_col requires the following missing aesthetics: y
What you usually see is that people calculate the count (or whatever the y value is) themselves anyway. This makes sense, as you may want to inspect it or use it for labels. But if you do that with geom_bar, you have to add stat = "identity" to the geom_bar call, because its default stat is "count". Like this:
library(tidyverse)
starwars %>%
group_by(sex) %>%
summarise(count = n()) %>%
ggplot(aes(x = sex, y = count)) + geom_bar(stat = "identity")
But why not just use geom_col and save yourself the extra argument? Like this:
library(tidyverse)
starwars %>%
group_by(sex) %>%
summarise(count = n()) %>%
ggplot(aes(x = sex, y = count)) + geom_col()
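If you want reassurance that the two versions really draw the same bars, one way (an assumption about how you might check, not something from the original post) is to compare the bar geometry ggplot2 computes for each plot with layer_data(). This sketch compares the bar centres and edges, which are what end up on screen.

```r
library(tidyverse)

# Pre-compute the counts once and reuse them for both plots
sex_counts <- starwars %>%
  group_by(sex) %>%
  summarise(count = n())

# The two equivalent versions from the post
p_bar <- ggplot(sex_counts, aes(x = sex, y = count)) + geom_bar(stat = "identity")
p_col <- ggplot(sex_counts, aes(x = sex, y = count)) + geom_col()

# Bar position and extent computed for each plot's first layer
cols <- c("x", "xmin", "xmax", "ymin", "ymax")
bar_data <- layer_data(p_bar)[, cols]
col_data <- layer_data(p_col)[, cols]

# TRUE when both layers produce the same bars
print(isTRUE(all.equal(bar_data, col_data)))
```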
For some reason, geom_bar with stat = "identity" seems to appear on Stack Overflow far more often than geom_col. I don’t understand why. This is my PSA to use geom_col instead.
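Since one reason to pre-compute the counts is to label the bars, here is a minimal sketch of geom_col combined with geom_text. The name = "count" argument to dplyr's count() and the vjust value are illustrative choices, not from the original post; vjust = -0.5 simply nudges each label a little above its bar.

```r
library(tidyverse)

# One row per sex, with the number of characters stored in `count`
sex_counts <- starwars %>%
  count(sex, name = "count")

p <- ggplot(sex_counts, aes(x = sex, y = count)) +
  geom_col() +
  # Reuse the pre-computed count as the label text, placed just above each bar
  geom_text(aes(label = count), vjust = -0.5)

print(p)
```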