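library(tidyverse)
library(extrafont)  # registers imported system fonts (e.g. Lato) for use in plots

hot_dogs <- read_csv("http://bit.ly/cs631-hotdog",
                     col_types = cols(
                       # levels = NULL: factor levels follow order of appearance
                       gender = col_factor(levels = NULL)
                     ))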
Let’s adapt Nathan Yau’s hot dog contest example.
The first thing we notice is that we don’t have data about whether each year’s winner set a new record. Since our data is nicely tidy, we can compute this with dplyr window functions:
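First, we use base R’s cummax() to create a new variable, hdb_record, that holds the running maximum of hot dogs and buns (HDB) eaten: the largest value in that year or any earlier year. Because cummax() works through the rows in whatever order they appear, the arrange(year) step here is critical.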
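To see why the row order matters, here is a minimal base R sketch on made-up numbers (not the contest data). cummax() simply scans the vector from left to right, so when the years are out of order the "record" assigned to a year can include results from later years.

```r
# Made-up toy values, deliberately listed out of year order
year <- c(1983, 1981, 1982)
num_eaten <- c(12, 10, 15)

# Running maximum in the order the rows happen to appear:
# 1983 -> 12, 1981 -> 12 (wrong: 1981 only had 10), 1982 -> 15
cummax(num_eaten)
#> [1] 12 12 15

# Sort by year first, which is the job arrange(year) does in the pipeline
ord <- order(year)
year[ord]
#> [1] 1981 1982 1983
cummax(num_eaten[ord])
#> [1] 10 15 15
```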
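Next, we want to know whether a year actually set a new record, compared to all previous years. We can use case_when() to create a logical variable, new_record, that is TRUE if the hdb_record for a given year is greater than the hdb_record from the year before (obtained with dplyr::lag()), and FALSE otherwise. For the first row, lag() returns NA; case_when() treats an NA condition as not matched, so that row falls through to TRUE ~ FALSE. Keeping 1980 while computing the records and dropping it only at the end with filter(year >= 1981) means that 1981 has a previous year to compare against.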
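Here is a small sketch of how lag() and case_when() behave on their own, using a made-up vector of running maxima rather than the contest data. It shows where the NA comes from and how the catch-all TRUE ~ FALSE turns it into FALSE.

```r
library(dplyr)

# Made-up running maxima for five consecutive years
hdb_record <- c(9, 10, 10, 12, 12)

# lag() shifts the vector down one position; the first element has no
# predecessor, so it becomes NA
lag(hdb_record)
#> [1] NA  9 10 10 12

# The raw comparison keeps that NA...
hdb_record > lag(hdb_record)
#> [1]    NA  TRUE FALSE  TRUE FALSE

# ...but case_when() treats an NA condition as "not matched",
# so the first element falls through to the catch-all TRUE ~ FALSE
case_when(
  hdb_record > lag(hdb_record) ~ TRUE,
  TRUE ~ FALSE
)
#> [1] FALSE  TRUE FALSE  TRUE FALSE
```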
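hot_dogs_records <- hot_dogs %>%
  filter(year >= 1980 & gender == 'male') %>%
  arrange(year) %>%                       # cummax() and lag() depend on row order
  mutate(hdb_record = cummax(num_eaten),  # best result so far
         new_record = case_when(
           hdb_record > lag(hdb_record) ~ TRUE,  # the record went up this year
           TRUE ~ FALSE                          # includes the NA from lag() in row 1
         )) %>%
  filter(year >= 1981)                    # 1980 was only needed for comparison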
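A quick way to convince ourselves that the extra 1980 row matters is to run the same steps on a made-up toy table, once with the filtering done at the end (as above) and once with it done first. The helper flag_records() is hypothetical, introduced only to avoid repeating the pipeline.

```r
library(dplyr)

# Made-up toy results, one row per year
toy <- tibble(year = 1980:1983,
              num_eaten = c(9, 10, 10, 12))

# Hypothetical helper wrapping the record-flagging steps from the text
flag_records <- function(df) {
  df %>%
    arrange(year) %>%
    mutate(hdb_record = cummax(num_eaten),
           new_record = case_when(
             hdb_record > lag(hdb_record) ~ TRUE,
             TRUE ~ FALSE
           ))
}

# As in the text: compute with 1980 included, then drop it
with_buffer <- toy %>% flag_records() %>% filter(year >= 1981)
with_buffer$new_record
#> [1]  TRUE FALSE  TRUE

# Filtering to 1981 first leaves 1981 with nothing to compare against
without_buffer <- toy %>% filter(year >= 1981) %>% flag_records()
without_buffer$new_record
#> [1] FALSE FALSE  TRUE
```

Only the 1981 flag differs, which is exactly the row that needs a previous year.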
We’ll also build our x-axis tick labels again, labeling every fourth year:
years_to_label <- seq(from = 1981, to = 2017, by = 4)
years_to_label
[1] 1981 1985 1989 1993 1997 2001 2005 2009 2013 2017
# One row per year; only every fourth year gets a visible label
hd_years <- hot_dogs_records %>%
  distinct(year) %>%
  mutate(year_lab = ifelse(year %in% years_to_label, year, ""))
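The labeling trick relies on ifelse() returning a single vector type. Here is a sketch on a toy run of years (not the contest data): because the "no" value is an empty string, the numeric years in the "yes" positions are coerced to character, giving a label vector that can be passed to scale_x_continuous() alongside a break for every year.

```r
# Toy run of years, labeling every fourth one as in the text
years <- 1981:1988
years_to_label <- seq(from = 1981, to = 2017, by = 4)

# Mixing numbers with "" coerces the kept years to character strings
year_lab <- ifelse(years %in% years_to_label, years, "")

# year_lab is c("1981", "", "", "", "1985", "", "", "")
typeof(year_lab)
#> [1] "character"
```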
hdb_records <- ggplot(hot_dogs_records,
                      aes(x = year, y = num_eaten)) +
  geom_col(aes(fill = new_record)) +
  labs(x = "Year", y = "Hot Dogs and Buns Consumed") +
  ggtitle("Nathan's Hot Dog Eating Contest Results, 1981-2017") +
  scale_fill_manual(values = c('#284a29', '#629d62')) +  # FALSE, TRUE
  scale_y_continuous(expand = c(0, 0),
                     breaks = seq(0, 70, 10)) +
  scale_x_continuous(expand = c(0, 0),
                     breaks = hd_years$year,        # a tick for every year
                     labels = hd_years$year_lab) +  # but a label every 4th year
  coord_cartesian(xlim = c(1980, 2018), ylim = c(0, 80)) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5),
        axis.text = element_text(size = 12),
        panel.background = element_blank(),
        # on ggplot2 >= 3.4.0, line width is set with linewidth instead of size
        axis.line.x = element_line(color = "gray92",
                                   size = 0.5),
        axis.ticks = element_line(color = "gray92",
                                  size = 0.5),
        text = element_text(family = "Lato"),
        legend.position = "bottom",
        panel.grid.minor = element_blank())
hdb_records
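The unnamed values in scale_fill_manual() are matched to the levels of new_record in order (FALSE, then TRUE). A slightly more robust variant, sketched here on made-up toy data, names the values so each colour is tied to a specific level; ggplot_build() lets us check which fill each bar actually received.

```r
library(ggplot2)

# Made-up toy data with one non-record year and one record year
toy <- data.frame(year = c(1981, 1982),
                  num_eaten = c(10, 12),
                  new_record = c(FALSE, TRUE))

# Named values map colours by level name rather than by position
p <- ggplot(toy, aes(x = year, y = num_eaten)) +
  geom_col(aes(fill = new_record)) +
  scale_fill_manual(values = c(`FALSE` = '#284a29', `TRUE` = '#629d62'),
                    name = "New record")

# Inspect the fill colour assigned to each bar
built <- ggplot_build(p)$data[[1]]
built[, c("x", "fill")]
```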
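Next, we’ll compare the results by gender. To make the differences stand out, we facet by gender and draw the male results as a faint background in every panel, following the approach described here:
https://drsimonj.svbtle.com/plotting-background-data-for-groups-with-ggplot2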
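hot_dogs_both <- hot_dogs %>%
  filter(year >= 1981)

# Male results without the gender column: because this layer lacks the
# faceting variable, facet_wrap() draws it in every panel
hot_dog_behind <- hot_dogs_both %>%
  filter(gender == "male") %>%
  select(-gender)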
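Dropping the gender column is what makes the background layer appear in both panels. The sketch below uses a made-up four-row table to show this: after ggplot_build(), the background layer has rows in every panel (its two rows are copied once per panel), while the foreground layer is split between the panels.

```r
library(dplyr)
library(ggplot2)

# Made-up toy data: two genders, two years each
toy <- data.frame(year = c(1981, 1982, 1981, 1982),
                  num_eaten = c(10, 12, 3, 5),
                  gender = factor(c("male", "male", "female", "female")))

# Background layer: the male rows with the gender column dropped
behind <- toy %>%
  filter(gender == "male") %>%
  select(-gender)

p <- ggplot(toy, aes(x = year, y = num_eaten)) +
  geom_col(data = behind, fill = "grey70") +
  geom_col(aes(fill = gender)) +
  facet_wrap(~gender)

# Layer 1 (background) lacks gender, so it is repeated in both panels;
# layer 2 is divided between them
layers <- ggplot_build(p)$data
table(layers[[1]]$PANEL)
table(layers[[2]]$PANEL)
```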
hdb_facets <- ggplot(hot_dogs_both,
                     aes(x = year, y = num_eaten)) +
  # faint male bars, repeated in every panel because the data has no gender
  geom_col(data = hot_dog_behind, fill = '#4254a7', alpha = .1) +
  geom_col(aes(fill = gender), show.legend = FALSE) +
  facet_wrap(~gender) +
  labs(x = "", y = "Hot Dogs and Buns Consumed") +
  ggtitle("Nathan's Hot Dog Eating Contest Results, 1981-2017") +
  scale_fill_manual(values = c('#4254a7', '#f4b31a')) +
  scale_y_continuous(expand = c(0, 0),
                     breaks = seq(0, 70, 10)) +
  scale_x_continuous(expand = c(0, 0),
                     breaks = seq(1981, 2017, 6)) +
  coord_cartesian(xlim = c(1980, 2018), ylim = c(0, 80)) +
  theme(axis.text = element_text(size = 10),
        panel.background = element_blank(),
        axis.line.x = element_line(color = "grey80",
                                   size = 0.5),
        axis.ticks = element_line(color = "grey80",
                                  size = 0.5),
        text = element_text(family = "Lato"),
        legend.position = "bottom",
        panel.grid.minor = element_blank(),
        panel.spacing = unit(1, "lines"))
hdb_facets