Become familiar with tools for generating publication-ready tables directly in R.
We will use data from the following paper: MacFarlane, H., Gorman, K., Ingham, R., Presmanes Hill, A., Papadakis, K., Kiss, G., & van Santen, J. (2017). Quantitative analysis of disfluency in children with autism spectrum disorder or language impairment. PLoS ONE, 12(3), e0173936.
mazes <- read_csv("http://bit.ly/mazes-gist") %>%
clean_names() #janitor package
glimpse(mazes)
Rows: 381
Columns: 12
$ study_id <chr> "CSLU-001", "CSLU-001", "CSLU-001", "CSLU-001", "CSLU-002", …
$ ca <dbl> 5.6667, 5.6667, 5.6667, 5.6667, 6.5000, 6.5000, 6.5000, 6.50…
$ viq <dbl> 124, 124, 124, 124, 124, 124, 124, 124, 108, 108, 108, 108, …
$ dx <chr> "TD", "TD", "TD", "TD", "TD", "TD", "TD", "TD", "TD", "TD", …
$ activity <chr> "Conversation", "Picture Description", "Play", "Wordless Pic…
$ content <dbl> 24, 1, 21, 8, 3, 5, 8, 2, 25, 10, 2, 5, 32, 20, 13, 21, 27, …
$ filler <dbl> 31, 2, 6, 2, 10, 3, 8, 2, 21, 13, 10, 2, 12, 9, 4, 4, 12, 6,…
$ rep <dbl> 2, 0, 3, 0, 3, 2, 3, 0, 4, 0, 0, 0, 13, 5, 5, 6, 10, 5, 5, 2…
$ rev <dbl> 5, 0, 8, 4, 0, 1, 2, 0, 4, 2, 1, 3, 8, 7, 2, 8, 5, 1, 4, 2, …
$ fs <dbl> 17, 1, 10, 4, 0, 2, 3, 2, 17, 8, 1, 2, 11, 8, 6, 7, 12, 3, 3…
$ cued <dbl> 36, 2, 6, 2, 10, 3, 9, 2, 29, 13, 11, 2, 14, 12, 4, 11, 17, …
$ not_cued <dbl> 50, 3, 27, 10, 13, 8, 15, 4, 38, 23, 11, 7, 42, 26, 17, 18, …
The workhorse for making tables in R Markdown documents is the knitr
package’s kable
function. This function is really versatile, but also free of fancy formatting options, for better or worse.
knitr::kable
kable
all tables everywhereUpdate the YAML of your document. For HTML:
---
title: "My Awesome Data Vis Lab"
output:
html_document:
df_print: kable
---
You can also define the html format in the global options.
# If you don't define format here, you'll need put `format = "html"` in every kable function.
options(knitr.table.format = "html")
# You may also wish to set this option
options(scipen = 1, digits = 2)
kable
table in a chunkFor HTML:
head(mazes) %>%
kable(format = "html")
study_id | ca | viq | dx | activity | content | filler | rep | rev | fs | cued | not_cued |
---|---|---|---|---|---|---|---|---|---|---|---|
CSLU-001 | 5.6667 | 124 | TD | Conversation | 24 | 31 | 2 | 5 | 17 | 36 | 50 |
CSLU-001 | 5.6667 | 124 | TD | Picture Description | 1 | 2 | 0 | 0 | 1 | 2 | 3 |
CSLU-001 | 5.6667 | 124 | TD | Play | 21 | 6 | 3 | 8 | 10 | 6 | 27 |
CSLU-001 | 5.6667 | 124 | TD | Wordless Picture Book | 8 | 2 | 0 | 4 | 4 | 2 | 10 |
CSLU-002 | 6.5000 | 124 | TD | Conversation | 3 | 10 | 3 | 0 | 0 | 10 | 13 |
CSLU-002 | 6.5000 | 124 | TD | Picture Description | 5 | 3 | 2 | 1 | 2 | 3 | 8 |
head(mazes) %>%
kable(format = "html", digits = 2, caption = "A table produced by kable.")
study_id | ca | viq | dx | activity | content | filler | rep | rev | fs | cued | not_cued |
---|---|---|---|---|---|---|---|---|---|---|---|
CSLU-001 | 5.67 | 124 | TD | Conversation | 24 | 31 | 2 | 5 | 17 | 36 | 50 |
CSLU-001 | 5.67 | 124 | TD | Picture Description | 1 | 2 | 0 | 0 | 1 | 2 | 3 |
CSLU-001 | 5.67 | 124 | TD | Play | 21 | 6 | 3 | 8 | 10 | 6 | 27 |
CSLU-001 | 5.67 | 124 | TD | Wordless Picture Book | 8 | 2 | 0 | 4 | 4 | 2 | 10 |
CSLU-002 | 6.50 | 124 | TD | Conversation | 3 | 10 | 3 | 0 | 0 | 10 | 13 |
CSLU-002 | 6.50 | 124 | TD | Picture Description | 5 | 3 | 2 | 1 | 2 | 3 | 8 |
my_maze_names <- c("Participant", "Age", "Verbal\nIQ", "Group", "Activity", "Content\nMaze", "Filler\nMaze", "Repetition", "Revision", "False\nStart", "Cued", "Not\nCued")
head(mazes) %>%
kable(format = "html", digits = 2, caption = "A table produced by kable.",
col.names = my_maze_names)
Participant | Age | Verbal IQ | Group | Activity | Content Maze | Filler Maze | Repetition | Revision | False Start | Cued | Not Cued |
---|---|---|---|---|---|---|---|---|---|---|---|
CSLU-001 | 5.67 | 124 | TD | Conversation | 24 | 31 | 2 | 5 | 17 | 36 | 50 |
CSLU-001 | 5.67 | 124 | TD | Picture Description | 1 | 2 | 0 | 0 | 1 | 2 | 3 |
CSLU-001 | 5.67 | 124 | TD | Play | 21 | 6 | 3 | 8 | 10 | 6 | 27 |
CSLU-001 | 5.67 | 124 | TD | Wordless Picture Book | 8 | 2 | 0 | 4 | 4 | 2 | 10 |
CSLU-002 | 6.50 | 124 | TD | Conversation | 3 | 10 | 3 | 0 | 0 | 10 | 13 |
CSLU-002 | 6.50 | 124 | TD | Picture Description | 5 | 3 | 2 | 1 | 2 | 3 | 8 |
kable
tables in a chunkSolution: apply some Bootstrap CSS styling using the kableExtra
package.
head(mazes) %>%
kable(format = "html", digits = 2, caption = "A styled kable table.",
col.names = my_maze_names) %>%
kable_styling()
Participant | Age | Verbal IQ | Group | Activity | Content Maze | Filler Maze | Repetition | Revision | False Start | Cued | Not Cued |
---|---|---|---|---|---|---|---|---|---|---|---|
CSLU-001 | 5.67 | 124 | TD | Conversation | 24 | 31 | 2 | 5 | 17 | 36 | 50 |
CSLU-001 | 5.67 | 124 | TD | Picture Description | 1 | 2 | 0 | 0 | 1 | 2 | 3 |
CSLU-001 | 5.67 | 124 | TD | Play | 21 | 6 | 3 | 8 | 10 | 6 | 27 |
CSLU-001 | 5.67 | 124 | TD | Wordless Picture Book | 8 | 2 | 0 | 4 | 4 | 2 | 10 |
CSLU-002 | 6.50 | 124 | TD | Conversation | 3 | 10 | 3 | 0 | 0 | 10 | 13 |
CSLU-002 | 6.50 | 124 | TD | Picture Description | 5 | 3 | 2 | 1 | 2 | 3 | 8 |
Lots of printing options: https://haozhu233.github.io/kableExtra/awesome_table_in_html.html
head(mazes) %>%
kable(format = "html", digits = 2, caption = "A non-full width zebra kable table.") %>%
kable_styling(bootstrap_options = "striped", full_width = F)
study_id | ca | viq | dx | activity | content | filler | rep | rev | fs | cued | not_cued |
---|---|---|---|---|---|---|---|---|---|---|---|
CSLU-001 | 5.67 | 124 | TD | Conversation | 24 | 31 | 2 | 5 | 17 | 36 | 50 |
CSLU-001 | 5.67 | 124 | TD | Picture Description | 1 | 2 | 0 | 0 | 1 | 2 | 3 |
CSLU-001 | 5.67 | 124 | TD | Play | 21 | 6 | 3 | 8 | 10 | 6 | 27 |
CSLU-001 | 5.67 | 124 | TD | Wordless Picture Book | 8 | 2 | 0 | 4 | 4 | 2 | 10 |
CSLU-002 | 6.50 | 124 | TD | Conversation | 3 | 10 | 3 | 0 | 0 | 10 | 13 |
CSLU-002 | 6.50 | 124 | TD | Picture Description | 5 | 3 | 2 | 1 | 2 | 3 | 8 |
head(mazes) %>%
kable(format = "html", digits = 2, caption = "Over here!") %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
study_id | ca | viq | dx | activity | content | filler | rep | rev | fs | cued | not_cued |
---|---|---|---|---|---|---|---|---|---|---|---|
CSLU-001 | 5.67 | 124 | TD | Conversation | 24 | 31 | 2 | 5 | 17 | 36 | 50 |
CSLU-001 | 5.67 | 124 | TD | Picture Description | 1 | 2 | 0 | 0 | 1 | 2 | 3 |
CSLU-001 | 5.67 | 124 | TD | Play | 21 | 6 | 3 | 8 | 10 | 6 | 27 |
CSLU-001 | 5.67 | 124 | TD | Wordless Picture Book | 8 | 2 | 0 | 4 | 4 | 2 | 10 |
CSLU-002 | 6.50 | 124 | TD | Conversation | 3 | 10 | 3 | 0 | 0 | 10 | 13 |
CSLU-002 | 6.50 | 124 | TD | Picture Description | 5 | 3 | 2 | 1 | 2 | 3 | 8 |
kable
+ kableExtra
+ formattable
color_tile
and color_bar
are neat extras if used wisely!
http://haozhu233.github.io/kableExtra/use_kableExtra_with_formattable.html
library(formattable)
head(mazes) %>%
mutate(ca = color_tile("transparent", "lightpink")(ca),
viq = color_bar("lightseagreen")(viq)) %>%
kable("html", escape = F, caption = 'This table is colored.') %>%
kable_styling(position = "center") %>%
column_spec(4, width = "3cm")
study_id | ca | viq | dx | activity | content | filler | rep | rev | fs | cued | not_cued |
---|---|---|---|---|---|---|---|---|---|---|---|
CSLU-001 | 5.6667 | 124 | TD | Conversation | 24 | 31 | 2 | 5 | 17 | 36 | 50 |
CSLU-001 | 5.6667 | 124 | TD | Picture Description | 1 | 2 | 0 | 0 | 1 | 2 | 3 |
CSLU-001 | 5.6667 | 124 | TD | Play | 21 | 6 | 3 | 8 | 10 | 6 | 27 |
CSLU-001 | 5.6667 | 124 | TD | Wordless Picture Book | 8 | 2 | 0 | 4 | 4 | 2 | 10 |
CSLU-002 | 6.5000 | 124 | TD | Conversation | 3 | 10 | 3 | 0 | 0 | 10 | 13 |
CSLU-002 | 6.5000 | 124 | TD | Picture Description | 5 | 3 | 2 | 1 | 2 | 3 | 8 |
tibble
+ kable
+ kableExtra
You can also use any of these tools with plain text tables using the tibble
package to create a table. Two main functions:
tribble
: enter tibble by rowstibble
: enter tibble by columnsFor example, I used tribble
to make this table in our slide decks:
math_table <- tibble::tribble(
~Operator, ~Description, ~Usage,
"\\+", "addition", "x + y",
"\\-", "subtraction", "x - y",
"\\*", "multiplication", "x * y",
"/", "division", "x / y",
"^", "raised to the power of", "x ^ y",
"abs", "absolute value", "abs(x)",
"%/%", "integer division", "x %/% y",
"%%", "remainder after division", "x %% y"
)
Then I used this chunk to print it:
```{r, results = 'asis'}
knitr::kable(math_table, format = "html", caption = "Helpful mutate functions") %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
```
knitr::kable(math_table, format = "html", caption = "Helpful mutate functions") %>%
kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
Operator | Description | Usage |
---|---|---|
+ | addition | x + y |
- | subtraction | x - y |
* | multiplication | x * y |
/ | division | x / y |
^ | raised to the power of | x ^ y |
abs | absolute value | abs(x) |
%/% | integer division | x %/% y |
%% | remainder after division | x %% y |
Sometimes you may just want to type in a table in Markdown and ignore R. Four kinds of tables may be used. The first three kinds presuppose the use of a fixed-width font, such as Courier. The fourth kind can be used with proportionally spaced fonts, as it does not require lining up columns. All of the below will render when typed outside of an R code chunk since these are based on pandoc
being used to render your markdown document. Note that these should all work whether you are knitting to either html or PDF.
This code for a simple table:
Right Left Center Default
------- ------ ---------- -------
12 12 12 12
123 123 123 123
1 1 1 1
Table: Demonstration of simple table syntax.
Produces this simple table:
Right | Left | Center | Default |
---|---|---|---|
12 | 12 | 12 | 12 |
123 | 123 | 123 | 123 |
1 | 1 | 1 | 1 |
The headers and table rows must each fit on one line. Column alignments are determined by the position of the header text relative to the dashed line below it:3
The column headers may be omitted, provided a dashed line is used to end the table.
This code for a multi-line table:
-------------------------------------------------------------
Centered Default Right Left
Header Aligned Aligned Aligned
----------- ------- --------------- -------------------------
First row 12.0 Example of a row that
spans multiple lines.
Second row 5.0 Here's another one. Note
the blank line between
rows.
-------------------------------------------------------------
Table: Here's the caption. It, too, may span
multiple lines.
Produces this multi-line table:
Centered Header | Default Aligned | Right Aligned | Left Aligned |
---|---|---|---|
First | row | 12.0 | Example of a row that spans multiple lines. |
Second | row | 5.0 | Here’s another one. Note the blank line between rows. |
This code for a grid table:
: Sample grid table.
+---------------+---------------+--------------------+
| Fruit | Price | Advantages |
+===============+===============+====================+
| Bananas | $1.34 | - built-in wrapper |
| | | - bright color |
+---------------+---------------+--------------------+
| Oranges | $2.10 | - cures scurvy |
| | | - tasty |
+---------------+---------------+--------------------+
Produces this grid table:
Fruit | Price | Advantages |
---|---|---|
Bananas | $1.34 |
|
Oranges | $2.10 |
|
Alignments are not supported, nor are cells that span multiple columns or rows.
This code for a pipe table:
| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
| 12 | 12 | 12 | 12 |
| 123 | 123 | 123 | 123 |
| 1 | 1 | 1 | 1 |
: Demonstration of pipe table syntax.
Produces this pipe table:
Right | Left | Default | Center |
---|---|---|---|
12 | 12 | 12 | 12 |
123 | 123 | 123 | 123 |
1 | 1 | 1 | 1 |
If you want to make tables that include R output (like output from functions like means, variances, or output from models), there are two steps:
This section covers (1). But, although there are some nice options for (2) within R Markdown via various packages, I am not dogmatic about doing everything in R Markdown, especially things like (2).
dplyr
We’ll use the pnwflights14
package to practice our dplyr
skills. We need to download the package from github using devtools
.
# once per machine
install.packages("devtools")
devtools::install_github("ismayc/pnwflights14")
Now, we need to load the flights
dataset from the pnwflights14
package.
# once per work session
data("flights", package = "pnwflights14")
dplyr::select
Use select to specify which columns in a dataframe you’d like to keep by name. Heretofore, this was not possible in base R! In base R, this can only be achieved using numeric variable positions. But most of the time, you keep track of your variables by name (like carrier
) rather than position (the 8th column).
# keep these 2 cols
mini_flights <- flights %>%
select(carrier, flight)
glimpse(mini_flights)
Rows: 162,049
Columns: 2
$ carrier <chr> "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA", "UA", "…
$ flight <int> 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 478, 1569, …
# keep first five cols
first_five <- flights %>%
select(year, month, day, dep_time, dep_delay)
glimpse(first_five)
Rows: 162,049
Columns: 5
$ year <int> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,…
$ month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ dep_time <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557…
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…
# alternatively, specify range
first_five <- flights %>%
select(year:dep_delay)
glimpse(first_five)
Rows: 162,049
Columns: 5
$ year <int> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,…
$ month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ dep_time <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557…
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…
We can also choose the columns we want by negation, that is, you can specify which columns to drop instead of keep. This way, all variables not listed are kept.
# we can also use negation
all_but_year <- flights %>%
select(-year)
glimpse(all_but_year)
Rows: 162,049
Columns: 15
$ month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ dep_time <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557…
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…
$ arr_time <int> 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, 911, 90…
$ arr_delay <dbl> 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -12, -16,…
$ carrier <chr> "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA", "UA",…
$ tailnum <chr> "N508AS", "N195UW", "N37422", "N547UW", "N762AS", "N806DN",…
$ flight <int> 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 478, 1569…
$ origin <chr> "PDX", "SEA", "PDX", "PDX", "SEA", "SEA", "SEA", "PDX", "SE…
$ dest <chr> "ANC", "CLT", "IAH", "CLT", "ANC", "DTW", "ORD", "IAH", "DE…
$ air_time <dbl> 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 130, 122,…
$ distance <dbl> 1542, 2279, 1825, 2282, 1448, 1927, 1721, 1825, 1024, 2402,…
$ hour <dbl> 0, 0, 0, 0, 0, 0, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
$ minute <dbl> 1, 4, 8, 28, 34, 37, 46, 26, 27, 36, 41, 49, 50, 57, 57, 58…
dplyr::select
comes with several other helper functions…
depart <- flights %>%
select(starts_with("dep_"))
glimpse(depart)
Rows: 162,049
Columns: 2
$ dep_time <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557…
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…
times <- flights %>%
select(contains("time"))
glimpse(times)
Rows: 162,049
Columns: 3
$ dep_time <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557,…
$ arr_time <int> 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, 911, 907…
$ air_time <dbl> 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 130, 122, …
# note that we are not actually saving the new dataframe here
flights %>%
select(-contains("time")) %>% head()
year | month | day | dep_delay | arr_delay | carrier | tailnum | flight | origin | dest | distance | hour | minute |
2014 | 1 | 1 | 96 | 70 | AS | N508AS | 145 | PDX | ANC | 1.54e+03 | 0 | 1 |
2014 | 1 | 1 | -6 | -23 | US | N195UW | 1830 | SEA | CLT | 2.28e+03 | 0 | 4 |
2014 | 1 | 1 | 13 | -4 | UA | N37422 | 1609 | PDX | IAH | 1.82e+03 | 0 | 8 |
2014 | 1 | 1 | -2 | -23 | US | N547UW | 466 | PDX | CLT | 2.28e+03 | 0 | 28 |
2014 | 1 | 1 | 44 | 43 | AS | N762AS | 121 | SEA | ANC | 1.45e+03 | 0 | 34 |
2014 | 1 | 1 | 82 | 88 | DL | N806DN | 1823 | SEA | DTW | 1.93e+03 | 0 | 37 |
delays <- flights %>%
select(ends_with("delay"))
glimpse(delays)
Rows: 162,049
Columns: 2
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…
$ arr_delay <dbl> 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -12, -16,…
One of my favorite select helper functions is everything()
, which allows you to use select to keep all your variables, but easily rearrange the columns without having to list all the variables to keep/drop.
new_order <- flights %>%
select(origin, dest, everything())
head(new_order)
origin | dest | year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | air_time | distance | hour | minute |
PDX | ANC | 2014 | 1 | 1 | 1 | 96 | 235 | 70 | AS | N508AS | 145 | 194 | 1.54e+03 | 0 | 1 |
SEA | CLT | 2014 | 1 | 1 | 4 | -6 | 738 | -23 | US | N195UW | 1830 | 252 | 2.28e+03 | 0 | 4 |
PDX | IAH | 2014 | 1 | 1 | 8 | 13 | 548 | -4 | UA | N37422 | 1609 | 201 | 1.82e+03 | 0 | 8 |
PDX | CLT | 2014 | 1 | 1 | 28 | -2 | 800 | -23 | US | N547UW | 466 | 251 | 2.28e+03 | 0 | 28 |
SEA | ANC | 2014 | 1 | 1 | 34 | 44 | 325 | 43 | AS | N762AS | 121 | 201 | 1.45e+03 | 0 | 34 |
SEA | DTW | 2014 | 1 | 1 | 37 | 82 | 747 | 88 | DL | N806DN | 1823 | 224 | 1.93e+03 | 0 | 37 |
# with negation
new_order2 <- flights %>%
select(origin, dest, everything(), -year)
head(new_order2)
origin | dest | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | air_time | distance | hour | minute |
PDX | ANC | 1 | 1 | 1 | 96 | 235 | 70 | AS | N508AS | 145 | 194 | 1.54e+03 | 0 | 1 |
SEA | CLT | 1 | 1 | 4 | -6 | 738 | -23 | US | N195UW | 1830 | 252 | 2.28e+03 | 0 | 4 |
PDX | IAH | 1 | 1 | 8 | 13 | 548 | -4 | UA | N37422 | 1609 | 201 | 1.82e+03 | 0 | 8 |
PDX | CLT | 1 | 1 | 28 | -2 | 800 | -23 | US | N547UW | 466 | 251 | 2.28e+03 | 0 | 28 |
SEA | ANC | 1 | 1 | 34 | 44 | 325 | 43 | AS | N762AS | 121 | 201 | 1.45e+03 | 0 | 34 |
SEA | DTW | 1 | 1 | 37 | 82 | 747 | 88 | DL | N806DN | 1823 | 224 | 1.93e+03 | 0 | 37 |
We can also rename variables within select.
flights2 <- flights %>%
select(tail_num = tailnum, everything())
head(flights2)
tail_num | year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | flight | origin | dest | air_time | distance | hour | minute |
N508AS | 2014 | 1 | 1 | 1 | 96 | 235 | 70 | AS | 145 | PDX | ANC | 194 | 1.54e+03 | 0 | 1 |
N195UW | 2014 | 1 | 1 | 4 | -6 | 738 | -23 | US | 1830 | SEA | CLT | 252 | 2.28e+03 | 0 | 4 |
N37422 | 2014 | 1 | 1 | 8 | 13 | 548 | -4 | UA | 1609 | PDX | IAH | 201 | 1.82e+03 | 0 | 8 |
N547UW | 2014 | 1 | 1 | 28 | -2 | 800 | -23 | US | 466 | PDX | CLT | 251 | 2.28e+03 | 0 | 28 |
N762AS | 2014 | 1 | 1 | 34 | 44 | 325 | 43 | AS | 121 | SEA | ANC | 201 | 1.45e+03 | 0 | 34 |
N806DN | 2014 | 1 | 1 | 37 | 82 | 747 | 88 | DL | 1823 | SEA | DTW | 224 | 1.93e+03 | 0 | 37 |
If you don’t want to move the renamed variables within your dataframe, you can use the rename
function.
flights3 <- flights %>%
rename(tail_num = tailnum)
glimpse(flights3)
Rows: 162,049
Columns: 16
$ year <int> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,…
$ month <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ dep_time <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557…
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…
$ arr_time <int> 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, 911, 90…
$ arr_delay <dbl> 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -12, -16,…
$ carrier <chr> "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA", "UA",…
$ tail_num <chr> "N508AS", "N195UW", "N37422", "N547UW", "N762AS", "N806DN",…
$ flight <int> 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 478, 1569…
$ origin <chr> "PDX", "SEA", "PDX", "PDX", "SEA", "SEA", "SEA", "PDX", "SE…
$ dest <chr> "ANC", "CLT", "IAH", "CLT", "ANC", "DTW", "ORD", "IAH", "DE…
$ air_time <dbl> 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 130, 122,…
$ distance <dbl> 1542, 2279, 1825, 2282, 1448, 1927, 1721, 1825, 1024, 2402,…
$ hour <dbl> 0, 0, 0, 0, 0, 0, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
$ minute <dbl> 1, 4, 8, 28, 34, 37, 46, 26, 27, 36, 41, 49, 50, 57, 57, 58…
dplyr::filter
# flights taking off from PDX
pdx <- flights %>%
filter(origin == "PDX")
head(pdx)
year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | origin | dest | air_time | distance | hour | minute |
2014 | 1 | 1 | 1 | 96 | 235 | 70 | AS | N508AS | 145 | PDX | ANC | 194 | 1.54e+03 | 0 | 1 |
2014 | 1 | 1 | 8 | 13 | 548 | -4 | UA | N37422 | 1609 | PDX | IAH | 201 | 1.82e+03 | 0 | 8 |
2014 | 1 | 1 | 28 | -2 | 800 | -23 | US | N547UW | 466 | PDX | CLT | 251 | 2.28e+03 | 0 | 28 |
2014 | 1 | 1 | 526 | -4 | 1148 | 15 | UA | N813UA | 229 | PDX | IAH | 217 | 1.82e+03 | 5 | 26 |
2014 | 1 | 1 | 541 | 1 | 911 | 4 | UA | N36476 | 1569 | PDX | DEN | 130 | 991 | 5 | 41 |
2014 | 1 | 1 | 549 | 24 | 907 | 12 | US | N548UW | 649 | PDX | PHX | 122 | 1.01e+03 | 5 | 49 |
# january flights from PDX
pdx_jan <- flights %>%
filter(origin == "PDX", month == 1) # the comma is an "and"
head(pdx_jan)
year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | origin | dest | air_time | distance | hour | minute |
2014 | 1 | 1 | 1 | 96 | 235 | 70 | AS | N508AS | 145 | PDX | ANC | 194 | 1.54e+03 | 0 | 1 |
2014 | 1 | 1 | 8 | 13 | 548 | -4 | UA | N37422 | 1609 | PDX | IAH | 201 | 1.82e+03 | 0 | 8 |
2014 | 1 | 1 | 28 | -2 | 800 | -23 | US | N547UW | 466 | PDX | CLT | 251 | 2.28e+03 | 0 | 28 |
2014 | 1 | 1 | 526 | -4 | 1148 | 15 | UA | N813UA | 229 | PDX | IAH | 217 | 1.82e+03 | 5 | 26 |
2014 | 1 | 1 | 541 | 1 | 911 | 4 | UA | N36476 | 1569 | PDX | DEN | 130 | 991 | 5 | 41 |
2014 | 1 | 1 | 549 | 24 | 907 | 12 | US | N548UW | 649 | PDX | PHX | 122 | 1.01e+03 | 5 | 49 |
# flights to ATL (Atlanta) or BNA (Nashville)
to_south <- flights %>%
filter(dest == "ATL" | dest == "BNA") %>% # | is "or"
select(origin, dest, everything())
head(to_south)
origin | dest | year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | air_time | distance | hour | minute |
SEA | ATL | 2014 | 1 | 1 | 624 | -6 | 1401 | -6 | DL | N617DL | 968 | 235 | 2.18e+03 | 6 | 24 |
SEA | ATL | 2014 | 1 | 1 | 802 | -3 | 1533 | -17 | AS | N532AS | 742 | 249 | 2.18e+03 | 8 | 2 |
SEA | ATL | 2014 | 1 | 1 | 824 | -1 | 1546 | -14 | DL | N633DL | 128 | 234 | 2.18e+03 | 8 | 24 |
PDX | ATL | 2014 | 1 | 1 | 944 | -6 | 1727 | -8 | AS | N548AS | 752 | 252 | 2.17e+03 | 9 | 44 |
PDX | ATL | 2014 | 1 | 1 | 1054 | 94 | 1807 | 84 | DL | N377DA | 1502 | 237 | 2.17e+03 | 10 | 54 |
SEA | ATL | 2014 | 1 | 1 | 1158 | 6 | 1915 | -14 | DL | N6712B | 1962 | 234 | 2.18e+03 | 11 | 58 |
# flights from PDX to ATL (Atlanta) or BNA (Nashville)
pdx_to_south <- flights %>%
filter(origin == "PDX", dest == "ATL" | dest == "BNA") %>% # | is "or"
select(origin, dest, everything())
head(pdx_to_south)
origin | dest | year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | air_time | distance | hour | minute |
PDX | ATL | 2014 | 1 | 1 | 944 | -6 | 1727 | -8 | AS | N548AS | 752 | 252 | 2.17e+03 | 9 | 44 |
PDX | ATL | 2014 | 1 | 1 | 1054 | 94 | 1807 | 84 | DL | N377DA | 1502 | 237 | 2.17e+03 | 10 | 54 |
PDX | ATL | 2014 | 1 | 1 | 1323 | -2 | 2038 | -15 | DL | N393DA | 773 | 235 | 2.17e+03 | 13 | 23 |
PDX | ATL | 2014 | 1 | 1 | 2253 | 8 | 611 | 4 | DL | N371DA | 503 | 240 | 2.17e+03 | 22 | 53 |
PDX | ATL | 2014 | 1 | 2 | 627 | -3 | 1350 | -7 | DL | N3746H | 1156 | 244 | 2.17e+03 | 6 | 27 |
PDX | ATL | 2014 | 1 | 2 | 918 | -2 | 1643 | -2 | DL | N3756 | 1940 | 249 | 2.17e+03 | 9 | 18 |
# alternatively, using group membership
south_dests <- c("ATL", "BNA")
pdx_to_south2 <- flights %>%
filter(origin == "PDX", dest %in% south_dests) %>%
select(origin, dest, everything())
head(pdx_to_south2)
origin | dest | year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | air_time | distance | hour | minute |
PDX | ATL | 2014 | 1 | 1 | 944 | -6 | 1727 | -8 | AS | N548AS | 752 | 252 | 2.17e+03 | 9 | 44 |
PDX | ATL | 2014 | 1 | 1 | 1054 | 94 | 1807 | 84 | DL | N377DA | 1502 | 237 | 2.17e+03 | 10 | 54 |
PDX | ATL | 2014 | 1 | 1 | 1323 | -2 | 2038 | -15 | DL | N393DA | 773 | 235 | 2.17e+03 | 13 | 23 |
PDX | ATL | 2014 | 1 | 1 | 2253 | 8 | 611 | 4 | DL | N371DA | 503 | 240 | 2.17e+03 | 22 | 53 |
PDX | ATL | 2014 | 1 | 2 | 627 | -3 | 1350 | -7 | DL | N3746H | 1156 | 244 | 2.17e+03 | 6 | 27 |
PDX | ATL | 2014 | 1 | 2 | 918 | -2 | 1643 | -2 | DL | N3756 | 1940 | 249 | 2.17e+03 | 9 | 18 |
# flights delayed by 1 hour or more
delay_1plus <- flights %>%
filter(dep_delay >= 60)
head(delay_1plus)
year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | origin | dest | air_time | distance | hour | minute |
2014 | 1 | 1 | 1 | 96 | 235 | 70 | AS | N508AS | 145 | PDX | ANC | 194 | 1.54e+03 | 0 | 1 |
2014 | 1 | 1 | 37 | 82 | 747 | 88 | DL | N806DN | 1823 | SEA | DTW | 224 | 1.93e+03 | 0 | 37 |
2014 | 1 | 1 | 346 | 227 | 936 | 219 | UA | N14219 | 1481 | SEA | ORD | 202 | 1.72e+03 | 3 | 46 |
2014 | 1 | 1 | 650 | 90 | 1037 | 91 | US | N626AW | 460 | SEA | PHX | 141 | 1.11e+03 | 6 | 50 |
2014 | 1 | 1 | 959 | 164 | 1137 | 157 | AS | N534AS | 805 | SEA | SMF | 77 | 605 | 9 | 59 |
2014 | 1 | 1 | 1008 | 68 | 1242 | 64 | AS | N788AS | 456 | SEA | LAX | 129 | 954 | 10 | 8 |
# flights delayed by 1 hour, but not more than 2 hours
delay_1hr <- flights %>%
filter(dep_delay >= 60, dep_delay < 120)
head(delay_1hr)
year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | origin | dest | air_time | distance | hour | minute |
2014 | 1 | 1 | 1 | 96 | 235 | 70 | AS | N508AS | 145 | PDX | ANC | 194 | 1.54e+03 | 0 | 1 |
2014 | 1 | 1 | 37 | 82 | 747 | 88 | DL | N806DN | 1823 | SEA | DTW | 224 | 1.93e+03 | 0 | 37 |
2014 | 1 | 1 | 650 | 90 | 1037 | 91 | US | N626AW | 460 | SEA | PHX | 141 | 1.11e+03 | 6 | 50 |
2014 | 1 | 1 | 1008 | 68 | 1242 | 64 | AS | N788AS | 456 | SEA | LAX | 129 | 954 | 10 | 8 |
2014 | 1 | 1 | 1014 | 75 | 1613 | 81 | UA | N37408 | 1444 | SEA | ORD | 201 | 1.72e+03 | 10 | 14 |
2014 | 1 | 1 | 1036 | 81 | 1408 | 63 | OO | N218AG | 3466 | PDX | TUS | 134 | 1.12e+03 | 10 | 36 |
range(delay_1hr$dep_delay, na.rm = TRUE)
[1] 60 119
# even more efficient using between (always inclusive)
delay_bwn <- flights %>%
filter(between(dep_delay, 60, 119))
head(delay_bwn)
year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | origin | dest | air_time | distance | hour | minute |
2014 | 1 | 1 | 1 | 96 | 235 | 70 | AS | N508AS | 145 | PDX | ANC | 194 | 1.54e+03 | 0 | 1 |
2014 | 1 | 1 | 37 | 82 | 747 | 88 | DL | N806DN | 1823 | SEA | DTW | 224 | 1.93e+03 | 0 | 37 |
2014 | 1 | 1 | 650 | 90 | 1037 | 91 | US | N626AW | 460 | SEA | PHX | 141 | 1.11e+03 | 6 | 50 |
2014 | 1 | 1 | 1008 | 68 | 1242 | 64 | AS | N788AS | 456 | SEA | LAX | 129 | 954 | 10 | 8 |
2014 | 1 | 1 | 1014 | 75 | 1613 | 81 | UA | N37408 | 1444 | SEA | ORD | 201 | 1.72e+03 | 10 | 14 |
2014 | 1 | 1 | 1036 | 81 | 1408 | 63 | OO | N218AG | 3466 | PDX | TUS | 134 | 1.12e+03 | 10 | 36 |
range(delay_bwn$dep_delay, na.rm = TRUE)
[1] 60 119
dplyr::arrange
# default is ascending order
flights %>%
arrange(year, month, day) %>% head(n=20)
year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | origin | dest | air_time | distance | hour | minute |
2014 | 1 | 1 | 1 | 96 | 235 | 70 | AS | N508AS | 145 | PDX | ANC | 194 | 1.54e+03 | 0 | 1 |
2014 | 1 | 1 | 4 | -6 | 738 | -23 | US | N195UW | 1830 | SEA | CLT | 252 | 2.28e+03 | 0 | 4 |
2014 | 1 | 1 | 8 | 13 | 548 | -4 | UA | N37422 | 1609 | PDX | IAH | 201 | 1.82e+03 | 0 | 8 |
2014 | 1 | 1 | 28 | -2 | 800 | -23 | US | N547UW | 466 | PDX | CLT | 251 | 2.28e+03 | 0 | 28 |
2014 | 1 | 1 | 34 | 44 | 325 | 43 | AS | N762AS | 121 | SEA | ANC | 201 | 1.45e+03 | 0 | 34 |
2014 | 1 | 1 | 37 | 82 | 747 | 88 | DL | N806DN | 1823 | SEA | DTW | 224 | 1.93e+03 | 0 | 37 |
2014 | 1 | 1 | 346 | 227 | 936 | 219 | UA | N14219 | 1481 | SEA | ORD | 202 | 1.72e+03 | 3 | 46 |
2014 | 1 | 1 | 526 | -4 | 1148 | 15 | UA | N813UA | 229 | PDX | IAH | 217 | 1.82e+03 | 5 | 26 |
2014 | 1 | 1 | 527 | 7 | 917 | 24 | UA | N75433 | 1576 | SEA | DEN | 136 | 1.02e+03 | 5 | 27 |
2014 | 1 | 1 | 536 | 1 | 1334 | -6 | UA | N574UA | 478 | SEA | EWR | 268 | 2.4e+03 | 5 | 36 |
2014 | 1 | 1 | 541 | 1 | 911 | 4 | UA | N36476 | 1569 | PDX | DEN | 130 | 991 | 5 | 41 |
2014 | 1 | 1 | 549 | 24 | 907 | 12 | US | N548UW | 649 | PDX | PHX | 122 | 1.01e+03 | 5 | 49 |
2014 | 1 | 1 | 550 | 0 | 837 | -12 | DL | N660DL | 1634 | SEA | SLC | 82 | 689 | 5 | 50 |
2014 | 1 | 1 | 557 | -3 | 1134 | -16 | AA | N3JLAA | 1094 | SEA | DFW | 184 | 1.66e+03 | 5 | 57 |
2014 | 1 | 1 | 557 | -3 | 825 | -25 | AS | N562AS | 81 | SEA | ANC | 188 | 1.45e+03 | 5 | 57 |
2014 | 1 | 1 | 558 | -2 | 801 | -2 | AS | N402AS | 200 | SEA | SJC | 100 | 697 | 5 | 58 |
2014 | 1 | 1 | 559 | -1 | 916 | -9 | F9 | N210FR | 796 | PDX | DEN | 125 | 991 | 5 | 59 |
2014 | 1 | 1 | 600 | 0 | 1151 | -19 | AA | N3JFAA | 2240 | SEA | ORD | 206 | 1.72e+03 | 6 | 0 |
2014 | 1 | 1 | 600 | -10 | 842 | -8 | AS | N786AS | 426 | SEA | LAX | 125 | 954 | 6 | 0 |
2014 | 1 | 1 | 602 | -3 | 943 | 5 | F9 | N201FR | 144 | SEA | DEN | 129 | 1.02e+03 | 6 | 2 |
# descending order
flights %>%
arrange(desc(year), desc(month), desc(day)) %>% head(n=20)
year | month | day | dep_time | dep_delay | arr_time | arr_delay | carrier | tailnum | flight | origin | dest | air_time | distance | hour | minute |
2014 | 12 | 31 | 2 | 12 | 601 | 31 | AA | N3JKAA | 1230 | SEA | DFW | 210 | 1.66e+03 | 0 | 2 |
2014 | 12 | 31 | 27 | -3 | 623 | 3 | AA | N3EWAA | 1431 | SEA | ORD | 211 | 1.72e+03 | 0 | 27 |
2014 | 12 | 31 | 39 | 14 | 324 | 4 | AS | N762AS | 135 | SEA | ANC | 207 | 1.45e+03 | 0 | 39 |
2014 | 12 | 31 | 40 | 0 | 549 | 0 | DL | N757AT | 2440 | SEA | MSP | 174 | 1.4e+03 | 0 | 40 |
2014 | 12 | 31 | 52 | -8 | 917 | -21 | AA | N3JFAA | 371 | SEA | MIA | 306 | 2.72e+03 | 0 | 52 |
2014 | 12 | 31 | 54 | 4 | 621 | 17 | DL | N128DL | 1670 | PDX | MSP | 174 | 1.43e+03 | 0 | 54 |
2014 | 12 | 31 | 56 | 61 | 848 | 80 | DL | N655DL | 929 | SEA | ATL | 249 | 2.18e+03 | 0 | 56 |
2014 | 12 | 31 | 512 | -3 | 904 | 4 | US | N653AW | 480 | SEA | PHX | 154 | 1.11e+03 | 5 | 12 |
2014 | 12 | 31 | 515 | -5 | 855 | 5 | US | N580UW | 425 | PDX | PHX | 144 | 1.01e+03 | 5 | 15 |
2014 | 12 | 31 | 534 | 4 | 859 | 7 | UA | N34460 | 1075 | PDX | DEN | 125 | 991 | 5 | 34 |
2014 | 12 | 31 | 546 | 1 | 916 | -4 | WN | N8323C | 757 | PDX | DEN | 126 | 991 | 5 | 46 |
2014 | 12 | 31 | 548 | -2 | 1351 | -13 | UA | N461UA | 665 | PDX | EWR | 287 | 2.43e+03 | 5 | 48 |
2014 | 12 | 31 | 549 | 4 | 1208 | 12 | UA | N68807 | 1457 | SEA | IAH | 223 | 1.87e+03 | 5 | 49 |
2014 | 12 | 31 | 550 | 0 | 922 | 2 | WN | N797MX | 2121 | PDX | PHX | 138 | 1.01e+03 | 5 | 50 |
2014 | 12 | 31 | 551 | -4 | 1202 | 12 | AA | N3HXAA | 1094 | SEA | DFW | 211 | 1.66e+03 | 5 | 51 |
2014 | 12 | 31 | 551 | -9 | 744 | -15 | AS | N570AS | 342 | SEA | OAK | 91 | 671 | 5 | 51 |
2014 | 12 | 31 | 555 | -10 | 824 | -1 | AS | N548AS | 602 | SEA | LAS | 124 | 867 | 5 | 55 |
2014 | 12 | 31 | 558 | -2 | 849 | 0 | DL | N668DN | 1831 | PDX | SLC | 85 | 630 | 5 | 58 |
2014 | 12 | 31 | 558 | -2 | 1149 | 4 | AA | N436AA | 1534 | PDX | DFW | 211 | 1.62e+03 | 5 | 58 |
2014 | 12 | 31 | 558 | -2 | 738 | -4 | AS | N585AS | 406 | PDX | SJC | 80 | 569 | 5 | 58 |
dplyr::distinct
# unique origin-dest combinations
flights %>%
select(origin, dest) %>%
distinct %>% head(n=50)
origin | dest |
PDX | ANC |
SEA | CLT |
PDX | IAH |
PDX | CLT |
SEA | ANC |
SEA | DTW |
SEA | ORD |
SEA | DEN |
SEA | EWR |
PDX | DEN |
PDX | PHX |
SEA | SLC |
SEA | DFW |
SEA | SJC |
SEA | LAX |
PDX | ORD |
SEA | OAK |
SEA | SFO |
PDX | SJC |
SEA | SNA |
SEA | SAN |
PDX | DFW |
PDX | EWR |
SEA | IAH |
SEA | ATL |
PDX | BUR |
SEA | MDW |
PDX | LAX |
PDX | SAN |
PDX | IAD |
SEA | PSP |
PDX | MDW |
SEA | MSP |
PDX | SNA |
SEA | PHX |
PDX | HNL |
SEA | JFK |
PDX | MSP |
PDX | SFO |
SEA | LAS |
SEA | LGB |
PDX | ONT |
PDX | LGB |
PDX | JFK |
PDX | KOA |
PDX | LAS |
SEA | ONT |
SEA | MKE |
SEA | BUR |
SEA | KTN |
# all unique destinations from PDX (there are 49, so we'll just list the first few)
from_pdx <- flights %>%
filter(origin == "PDX") %>%
select(origin, dest) %>%
distinct(dest)
head(from_pdx)
dest |
ANC |
IAH |
CLT |
DEN |
PHX |
ORD |
dplyr::mutate
# add total delay variable
flights %>%
mutate(tot_delay = dep_delay + arr_delay) %>%
select(origin, dest, ends_with("delay"), everything()) %>%
head(n=100)
origin | dest | dep_delay | arr_delay | tot_delay | year | month | day | dep_time | arr_time | carrier | tailnum | flight | air_time | distance | hour | minute |
PDX | ANC | 96 | 70 | 166 | 2014 | 1 | 1 | 1 | 235 | AS | N508AS | 145 | 194 | 1.54e+03 | 0 | 1 |
SEA | CLT | -6 | -23 | -29 | 2014 | 1 | 1 | 4 | 738 | US | N195UW | 1830 | 252 | 2.28e+03 | 0 | 4 |
PDX | IAH | 13 | -4 | 9 | 2014 | 1 | 1 | 8 | 548 | UA | N37422 | 1609 | 201 | 1.82e+03 | 0 | 8 |
PDX | CLT | -2 | -23 | -25 | 2014 | 1 | 1 | 28 | 800 | US | N547UW | 466 | 251 | 2.28e+03 | 0 | 28 |
SEA | ANC | 44 | 43 | 87 | 2014 | 1 | 1 | 34 | 325 | AS | N762AS | 121 | 201 | 1.45e+03 | 0 | 34 |
SEA | DTW | 82 | 88 | 170 | 2014 | 1 | 1 | 37 | 747 | DL | N806DN | 1823 | 224 | 1.93e+03 | 0 | 37 |
SEA | ORD | 227 | 219 | 446 | 2014 | 1 | 1 | 346 | 936 | UA | N14219 | 1481 | 202 | 1.72e+03 | 3 | 46 |
PDX | IAH | -4 | 15 | 11 | 2014 | 1 | 1 | 526 | 1148 | UA | N813UA | 229 | 217 | 1.82e+03 | 5 | 26 |
SEA | DEN | 7 | 24 | 31 | 2014 | 1 | 1 | 527 | 917 | UA | N75433 | 1576 | 136 | 1.02e+03 | 5 | 27 |
SEA | EWR | 1 | -6 | -5 | 2014 | 1 | 1 | 536 | 1334 | UA | N574UA | 478 | 268 | 2.4e+03 | 5 | 36 |
PDX | DEN | 1 | 4 | 5 | 2014 | 1 | 1 | 541 | 911 | UA | N36476 | 1569 | 130 | 991 | 5 | 41 |
PDX | PHX | 24 | 12 | 36 | 2014 | 1 | 1 | 549 | 907 | US | N548UW | 649 | 122 | 1.01e+03 | 5 | 49 |
SEA | SLC | 0 | -12 | -12 | 2014 | 1 | 1 | 550 | 837 | DL | N660DL | 1634 | 82 | 689 | 5 | 50 |
SEA | DFW | -3 | -16 | -19 | 2014 | 1 | 1 | 557 | 1134 | AA | N3JLAA | 1094 | 184 | 1.66e+03 | 5 | 57 |
SEA | ANC | -3 | -25 | -28 | 2014 | 1 | 1 | 557 | 825 | AS | N562AS | 81 | 188 | 1.45e+03 | 5 | 57 |
SEA | SJC | -2 | -2 | -4 | 2014 | 1 | 1 | 558 | 801 | AS | N402AS | 200 | 100 | 697 | 5 | 58 |
PDX | DEN | -1 | -9 | -10 | 2014 | 1 | 1 | 559 | 916 | F9 | N210FR | 796 | 125 | 991 | 5 | 59 |
SEA | ORD | 0 | -19 | -19 | 2014 | 1 | 1 | 600 | 1151 | AA | N3JFAA | 2240 | 206 | 1.72e+03 | 6 | 0 |
SEA | LAX | -10 | -8 | -18 | 2014 | 1 | 1 | 600 | 842 | AS | N786AS | 426 | 125 | 954 | 6 | 0 |
SEA | DEN | -3 | 5 | 2 | 2014 | 1 | 1 | 602 | 943 | F9 | N201FR | 144 | 129 | 1.02e+03 | 6 | 2 |
PDX | ORD | -3 | 7 | 4 | 2014 | 1 | 1 | 602 | 1204 | UA | N77295 | 1573 | 203 | 1.74e+03 | 6 | 2 |
SEA | OAK | -2 | -17 | -19 | 2014 | 1 | 1 | 603 | 755 | AS | N577AS | 344 | 94 | 672 | 6 | 3 |
SEA | ORD | -3 | -2 | -5 | 2014 | 1 | 1 | 603 | 1202 | UA | N37290 | 1136 | 203 | 1.72e+03 | 6 | 3 |
SEA | SFO | -4 | -19 | -23 | 2014 | 1 | 1 | 606 | 806 | AS | N525AS | 308 | 98 | 679 | 6 | 6 |
PDX | SJC | 6 | 3 | 9 | 2014 | 1 | 1 | 606 | 746 | AS | N796AS | 406 | 87 | 569 | 6 | 6 |
SEA | SNA | -1 | -2 | -3 | 2014 | 1 | 1 | 614 | 850 | AS | N607AS | 524 | 135 | 978 | 6 | 14 |
SEA | SAN | 2 | -12 | -10 | 2014 | 1 | 1 | 617 | 850 | AS | N579AS | 494 | 131 | 1.05e+03 | 6 | 17 |
PDX | DFW | -2 | -30 | -32 | 2014 | 1 | 1 | 618 | 1135 | AA | N4XMAA | 1650 | 184 | 1.62e+03 | 6 | 18 |
SEA | SFO | -6 | -7 | -13 | 2014 | 1 | 1 | 619 | 822 | VX | N841VA | 751 | 101 | 679 | 6 | 19 |
SEA | LAX | -2 | 0 | -2 | 2014 | 1 | 1 | 620 | 905 | OO | N804SK | 4693 | 125 | 954 | 6 | 20 |
PDX | EWR | 2 | -19 | -17 | 2014 | 1 | 1 | 622 | 1412 | UA | N36472 | 1021 | 270 | 2.43e+03 | 6 | 22 |
SEA | IAH | 13 | -4 | 9 | 2014 | 1 | 1 | 623 | 1218 | UA | N72405 | 1101 | 213 | 1.87e+03 | 6 | 23 |
SEA | ATL | -6 | -6 | -12 | 2014 | 1 | 1 | 624 | 1401 | DL | N617DL | 968 | 235 | 2.18e+03 | 6 | 24 |
SEA | DEN | -9 | -1 | -10 | 2014 | 1 | 1 | 629 | 1014 | UA | N566UA | 883 | 132 | 1.02e+03 | 6 | 29 |
PDX | BUR | -10 | -14 | -24 | 2014 | 1 | 1 | 630 | 834 | OO | N219AG | 3456 | 105 | 817 | 6 | 30 |
SEA | MDW | -3 | 5 | 2 | 2014 | 1 | 1 | 632 | 1235 | WN | N8302F | 3694 | 204 | 1.73e+03 | 6 | 32 |
PDX | LAX | -8 | -7 | -15 | 2014 | 1 | 1 | 637 | 858 | AS | N408AS | 560 | 108 | 834 | 6 | 37 |
PDX | SAN | -3 | -6 | -9 | 2014 | 1 | 1 | 637 | 854 | AS | N588AS | 572 | 120 | 933 | 6 | 37 |
PDX | PHX | -2 | -5 | -7 | 2014 | 1 | 1 | 638 | 1003 | AS | N508AS | 646 | 125 | 1.01e+03 | 6 | 38 |
PDX | IAD | 10 | -4 | 6 | 2014 | 1 | 1 | 638 | 1408 | UA | N526UA | 250 | 253 | 2.33e+03 | 6 | 38 |
SEA | PSP | -1 | 3 | 2 | 2014 | 1 | 1 | 639 | 918 | AS | N794AS | 556 | 130 | 987 | 6 | 39 |
SEA | DFW | -1 | -10 | -11 | 2014 | 1 | 1 | 639 | 1216 | AS | N442AS | 668 | 189 | 1.66e+03 | 6 | 39 |
PDX | MDW | 4 | -11 | -7 | 2014 | 1 | 1 | 639 | 1219 | WN | N8308K | 401 | 207 | 1.75e+03 | 6 | 39 |
SEA | LAX | -3 | -2 | -5 | 2014 | 1 | 1 | 647 | 923 | AS | N315AS | 470 | 124 | 954 | 6 | 47 |
SEA | DEN | -3 | 19 | 16 | 2014 | 1 | 1 | 647 | 1038 | AS | N553AS | 682 | 136 | 1.02e+03 | 6 | 47 |
SEA | MSP | -7 | -9 | -16 | 2014 | 1 | 1 | 648 | 1203 | AS | N419AS | 36 | 164 | 1.4e+03 | 6 | 48 |
PDX | SNA | -12 | -11 | -23 | 2014 | 1 | 1 | 648 | 910 | AS | N611AS | 582 | 122 | 859 | 6 | 48 |
SEA | PHX | 90 | 91 | 181 | 2014 | 1 | 1 | 650 | 1037 | US | N626AW | 460 | 141 | 1.11e+03 | 6 | 50 |
PDX | HNL | -6 | -33 | -39 | 2014 | 1 | 1 | 654 | 1047 | AS | N586AS | 833 | 327 | 2.6e+03 | 6 | 54 |
SEA | JFK | -6 | -10 | -16 | 2014 | 1 | 1 | 654 | 1455 | DL | N686DA | 418 | 273 | 2.42e+03 | 6 | 54 |
PDX | MSP | -5 | -7 | -12 | 2014 | 1 | 1 | 655 | 1210 | DL | N963DN | 2090 | 168 | 1.43e+03 | 6 | 55 |
PDX | ORD | -4 | -28 | -32 | 2014 | 1 | 1 | 656 | 1242 | AA | N3KMAA | 2218 | 200 | 1.74e+03 | 6 | 56 |
PDX | SFO | -4 | 8 | 4 | 2014 | 1 | 1 | 656 | 853 | VX | N523VA | 805 | 82 | 550 | 6 | 56 |
PDX | SFO | 0 | -1 | -1 | 2014 | 1 | 1 | 700 | 844 | UA | N479UA | 519 | 80 | 550 | 7 | 0 |
SEA | LAS | -4 | -6 | -10 | 2014 | 1 | 1 | 701 | 918 | AS | N618AS | 606 | 115 | 867 | 7 | 1 |
SEA | PHX | 1 | -9 | -8 | 2014 | 1 | 1 | 701 | 1036 | WN | N8318F | 669 | 133 | 1.11e+03 | 7 | 1 |
SEA | LAX | -3 | -8 | -11 | 2014 | 1 | 1 | 702 | 932 | VX | N840VA | 1780 | 127 | 954 | 7 | 2 |
SEA | LGB | -8 | -2 | -10 | 2014 | 1 | 1 | 702 | 940 | OO | N217AG | 3472 | 130 | 965 | 7 | 2 |
SEA | DFW | -6 | -20 | -26 | 2014 | 1 | 1 | 704 | 1245 | AA | N3KNAA | 1308 | 188 | 1.66e+03 | 7 | 4 |
PDX | ONT | -1 | -9 | -10 | 2014 | 1 | 1 | 704 | 910 | OO | N223AG | 3460 | 106 | 838 | 7 | 4 |
PDX | LGB | 0 | 9 | 9 | 2014 | 1 | 1 | 705 | 931 | B6 | N627JB | 1121 | 113 | 846 | 7 | 5 |
SEA | SLC | 6 | -13 | -7 | 2014 | 1 | 1 | 706 | 952 | AS | N306AS | 726 | 86 | 689 | 7 | 6 |
SEA | JFK | -7 | -19 | -26 | 2014 | 1 | 1 | 708 | 1510 | AA | N3DNAA | 236 | 281 | 2.42e+03 | 7 | 8 |
PDX | JFK | -2 | -20 | -22 | 2014 | 1 | 1 | 708 | 1453 | DL | N3772H | 2258 | 267 | 2.45e+03 | 7 | 8 |
SEA | SJC | -1 | -5 | -6 | 2014 | 1 | 1 | 709 | 915 | AS | N558AS | 332 | 103 | 697 | 7 | 9 |
SEA | MSP | 9 | -4 | 5 | 2014 | 1 | 1 | 709 | 1209 | DL | N374NW | 2314 | 160 | 1.4e+03 | 7 | 9 |
PDX | SFO | -5 | -21 | -26 | 2014 | 1 | 1 | 710 | 906 | AS | N309AS | 244 | 82 | 550 | 7 | 10 |
SEA | SLC | -3 | -1 | -4 | 2014 | 1 | 1 | 712 | 1013 | DL | N355NW | 2397 | 85 | 689 | 7 | 12 |
SEA | IAH | 20 | 10 | 30 | 2014 | 1 | 1 | 713 | 1317 | UA | N854UA | 461 | 218 | 1.87e+03 | 7 | 13 |
SEA | LAX | 0 | -10 | -10 | 2014 | 1 | 1 | 715 | 944 | AS | N617AS | 478 | 127 | 954 | 7 | 15 |
PDX | KOA | -5 | -47 | -52 | 2014 | 1 | 1 | 715 | 1055 | AS | N536AS | 855 | 326 | 2.61e+03 | 7 | 15 |
SEA | PHX | -3 | -11 | -14 | 2014 | 1 | 1 | 717 | 1057 | AS | N769AS | 642 | 134 | 1.11e+03 | 7 | 17 |
PDX | LAS | -3 | 3 | 0 | 2014 | 1 | 1 | 722 | 933 | WN | N407WN | 947 | 101 | 763 | 7 | 22 |
SEA | DEN | 1 | 17 | 18 | 2014 | 1 | 1 | 726 | 1117 | WN | N603SW | 2181 | 133 | 1.02e+03 | 7 | 26 |
SEA | ONT | -2 | -7 | -9 | 2014 | 1 | 1 | 728 | 951 | AS | N320AS | 530 | 124 | 956 | 7 | 28 |
SEA | SFO | 5 | -9 | -4 | 2014 | 1 | 1 | 735 | 940 | UA | N16732 | 1114 | 104 | 679 | 7 | 35 |
SEA | SAN | 14 | -2 | 12 | 2014 | 1 | 1 | 739 | 1003 | AS | N518AS | 240 | 132 | 1.05e+03 | 7 | 39 |
SEA | SFO | 31 | 19 | 50 | 2014 | 1 | 1 | 741 | 947 | AS | N431AS | 222 | 96 | 679 | 7 | 41 |
SEA | MKE | 5 | 7 | 12 | 2014 | 1 | 1 | 745 | 1327 | WN | N924WN | 3872 | 200 | 1.69e+03 | 7 | 45 |
SEA | BUR | -6 | -3 | -9 | 2014 | 1 | 1 | 749 | 1013 | AS | N627AS | 522 | 122 | 937 | 7 | 49 |
PDX | LAX | -6 | -14 | -20 | 2014 | 1 | 1 | 754 | 1011 | AS | N706AS | 568 | 111 | 834 | 7 | 54 |
SEA | KTN | 52 | 45 | 97 | 2014 | 1 | 1 | 757 | 855 | AS | N705AS | 65 | 100 | 680 | 7 | 57 |
PDX | OAK | -4 | -18 | -22 | 2014 | 1 | 1 | 801 | 932 | WN | N447WN | 48 | 78 | 543 | 8 | 1 |
SEA | ATL | -3 | -17 | -20 | 2014 | 1 | 1 | 802 | 1533 | AS | N532AS | 742 | 249 | 2.18e+03 | 8 | 2 |
SEA | LAS | -1 | -8 | -9 | 2014 | 1 | 1 | 809 | 1021 | AS | N557AS | 600 | 115 | 867 | 8 | 9 |
PDX | MSP | -6 | -5 | -11 | 2014 | 1 | 1 | 809 | 1326 | DL | N951DN | 1588 | 170 | 1.43e+03 | 8 | 9 |
SEA | ORD | -10 | -11 | -21 | 2014 | 1 | 1 | 810 | 1419 | AA | N3DSAA | 1236 | 202 | 1.72e+03 | 8 | 10 |
SEA | DCA | 0 | -2 | -2 | 2014 | 1 | 1 | 810 | 1604 | AS | N514AS | 4 | 259 | 2.33e+03 | 8 | 10 |
SEA | ANC | 17 | -4 | 13 | 2014 | 1 | 1 | 817 | 1045 | AS | N529AS | 83 | 184 | 1.45e+03 | 8 | 17 |
SEA | ABQ | -3 | -7 | -10 | 2014 | 1 | 1 | 817 | 1158 | WN | N379SW | 253 | 138 | 1.18e+03 | 8 | 17 |
PDX | DFW | 14 | -7 | 7 | 2014 | 1 | 1 | 819 | 1343 | AA | N4XPAA | 1220 | 178 | 1.62e+03 | 8 | 19 |
SEA | IAD | 10 | -8 | 2 | 2014 | 1 | 1 | 821 | 1552 | UA | N514UA | 419 | 250 | 2.31e+03 | 8 | 21 |
SEA | ATL | -1 | -14 | -15 | 2014 | 1 | 1 | 824 | 1546 | DL | N633DL | 128 | 234 | 2.18e+03 | 8 | 24 |
SEA | DFW | -5 | -3 | -8 | 2014 | 1 | 1 | 825 | 1412 | AA | N3BDAA | 312 | 183 | 1.66e+03 | 8 | 25 |
SEA | MCO | 25 | 30 | 55 | 2014 | 1 | 1 | 825 | 1650 | AS | N551AS | 16 | 301 | 2.55e+03 | 8 | 25 |
SEA | ORD | -5 | 15 | 10 | 2014 | 1 | 1 | 825 | 1436 | AS | N305AS | 28 | 202 | 1.72e+03 | 8 | 25 |
SEA | MSP | 0 | 10 | 10 | 2014 | 1 | 1 | 825 | 1352 | DL | N584NW | 588 | 162 | 1.4e+03 | 8 | 25 |
PDX | IAH | 2 | -16 | -14 | 2014 | 1 | 1 | 827 | 1414 | UA | N38446 | 1510 | 210 | 1.82e+03 | 8 | 27 |
SEA | OGG | -1 | -22 | -23 | 2014 | 1 | 1 | 829 | 1228 | AS | N563AS | 861 | 338 | 2.64e+03 | 8 | 29 |
SEA | PHX | 0 | -12 | -12 | 2014 | 1 | 1 | 830 | 1206 | US | N667AW | 440 | 132 | 1.11e+03 | 8 | 30 |
# flights that were delayed at departure had on time or early arrivals?
arrivals <- flights %>%
mutate(arr_ok = ifelse(dep_delay > 0 & arr_delay <= 0, 1, 0)) %>%
select(origin, dest, ends_with("delay"), carrier, arr_ok)
# peek at it
arrivals %>%
filter(arr_ok == 1) %>%
head
origin | dest | dep_delay | arr_delay | carrier | arr_ok |
PDX | IAH | 13 | -4 | UA | 1 |
SEA | EWR | 1 | -6 | UA | 1 |
SEA | SAN | 2 | -12 | AS | 1 |
PDX | EWR | 2 | -19 | UA | 1 |
SEA | IAH | 13 | -4 | UA | 1 |
PDX | IAD | 10 | -4 | UA | 1 |
dplyr::summarise
(or dplyr::summarize
)flights %>%
summarise(mean(dep_delay, na.rm = TRUE))
mean(dep_delay, na.rm = TRUE) |
6.13 |
# we can also name that variable, and summarise multiple variables
flights %>%
summarise(mean_delay = mean(dep_delay, na.rm = TRUE),
sd_delay = sd(dep_delay, na.rm = TRUE),
median_delay = median(dep_delay, na.rm = TRUE))
mean_delay | sd_delay | median_delay |
6.13 | 29.1 | -2 |
But this can get tedious with multiple summaries…
flights %>%
filter(!is.na(dep_delay)) %>%
select(dep_delay) %>%
summarise_each(lst(mean, sd, median))
mean | sd | median |
6.13 | 29.1 | -2 |
# same thing
flights %>%
filter(!is.na(dep_delay)) %>%
summarise_each(lst(mean, sd, median), dep_delay)
mean | sd | median |
6.13 | 29.1 | -2 |
# combine with gather, change names too
flights %>%
filter(!is.na(dep_delay)) %>%
summarise_each(lst(mean, stdev = sd, median), dep_delay) %>%
gather(delay_stat, value)
delay_stat | value |
mean | 6.13 |
stdev | 29.1 |
median | -2 |
Using aggregating functions in summarise
# how many unique destinations?
summary_table <- flights %>%
summarise(tot_flights = n(),
tot_planes = n_distinct(tailnum),
tot_carriers = n_distinct(carrier),
tot_dests = n_distinct(dest),
tot_origins = n_distinct(origin))
summary_table
tot_flights | tot_planes | tot_carriers | tot_dests | tot_origins |
162049 | 3023 | 11 | 71 | 2 |
# chain with tidyr functions
summary_table %>%
gather(key, value) %>%
separate(key, into = c("tot", "entity")) %>%
select(-tot, total = value)
entity | total |
flights | 162049 |
planes | 3023 |
carriers | 11 |
dests | 71 |
origins | 2 |
tidyr
We’ll work with a made up dataframe:
df <- data.frame(
id = 1:10,
date = as.Date('2015-01-01') + 0:9,
q1_m1_w1 = rnorm(10, 0, 1),
q1_m1_w2 = rnorm(10, 0, 1),
q1_m2_w3 = rnorm(10, 0, 1),
q2_m1_w1 = rnorm(10, 0, 1),
q2_m2_w1 = rnorm(10, 0, 1),
q2_m2_w2 = rnorm(10, 0, 1)
)
# HLO
head(df)
id | date | q1_m1_w1 | q1_m1_w2 | q1_m2_w3 | q2_m1_w1 | q2_m2_w1 | q2_m2_w2 |
1 | 2015-01-01 | 0.908 | -1.06 | 0.259 | -0.0417 | 0.312 | 0.96 |
2 | 2015-01-02 | -0.8 | 0.378 | -0.74 | -0.888 | -0.474 | 1.13 |
3 | 2015-01-03 | 0.373 | 0.12 | -0.954 | -0.767 | -0.789 | -0.0994 |
4 | 2015-01-04 | -0.0835 | -1.18 | 0.336 | -1.02 | -0.808 | -0.201 |
5 | 2015-01-05 | -0.261 | -0.47 | 1.12 | 1.24 | 0.423 | -0.781 |
6 | 2015-01-06 | 0.16 | 0.264 | 0.939 | -1.08 | 0.17 | 0.0755 |
glimpse(df)
Rows: 10
Columns: 8
$ id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ date <date> 2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05,…
$ q1_m1_w1 <dbl> 0.90833895, -0.80014146, 0.37278036, -0.08348309, -0.2609710…
$ q1_m1_w2 <dbl> -1.0636021, 0.3780610, 0.1200822, -1.1750064, -0.4698987, 0.…
$ q1_m2_w3 <dbl> 0.2585625, -0.7404383, -0.9540600, 0.3363360, 1.1159808, 0.9…
$ q2_m1_w1 <dbl> -0.04166279, -0.88810449, -0.76684055, -1.02118865, 1.243960…
$ q2_m2_w1 <dbl> 0.3118560, -0.4739947, -0.7892439, -0.8077828, 0.4226551, 0.…
$ q2_m2_w2 <dbl> 0.96043415, 1.13194728, -0.09936716, -0.20098062, -0.7809419…
tidyr::gather
First, let’s gather…
df_tidy <- df %>%
gather(key, value, q1_m1_w1:q2_m2_w2)
head(df_tidy)
id | date | key | value |
1 | 2015-01-01 | q1_m1_w1 | 0.908 |
2 | 2015-01-02 | q1_m1_w1 | -0.8 |
3 | 2015-01-03 | q1_m1_w1 | 0.373 |
4 | 2015-01-04 | q1_m1_w1 | -0.0835 |
5 | 2015-01-05 | q1_m1_w1 | -0.261 |
6 | 2015-01-06 | q1_m1_w1 | 0.16 |
Now let’s gather using subtraction…
df_tidy <- df %>%
gather(key, value, -id, -date)
head(df_tidy)
id | date | key | value |
1 | 2015-01-01 | q1_m1_w1 | 0.908 |
2 | 2015-01-02 | q1_m1_w1 | -0.8 |
3 | 2015-01-03 | q1_m1_w1 | 0.373 |
4 | 2015-01-04 | q1_m1_w1 | -0.0835 |
5 | 2015-01-05 | q1_m1_w1 | -0.261 |
6 | 2015-01-06 | q1_m1_w1 | 0.16 |
tidyr::separate
# separate 1 col into 3 cols
df_sep <- df_tidy %>%
separate(key, into = c("quarter", "month", "week"))
head(df_sep)
id | date | quarter | month | week | value |
1 | 2015-01-01 | q1 | m1 | w1 | 0.908 |
2 | 2015-01-02 | q1 | m1 | w1 | -0.8 |
3 | 2015-01-03 | q1 | m1 | w1 | 0.373 |
4 | 2015-01-04 | q1 | m1 | w1 | -0.0835 |
5 | 2015-01-05 | q1 | m1 | w1 | -0.261 |
6 | 2015-01-06 | q1 | m1 | w1 | 0.16 |
# separate 1 col into 2 cols
df_sep2 <- df_tidy %>%
separate(key, into = c("quarter", "period"), extra = "merge")
head(df_sep2)
id | date | quarter | period | value |
1 | 2015-01-01 | q1 | m1_w1 | 0.908 |
2 | 2015-01-02 | q1 | m1_w1 | -0.8 |
3 | 2015-01-03 | q1 | m1_w1 | 0.373 |
4 | 2015-01-04 | q1 | m1_w1 | -0.0835 |
5 | 2015-01-05 | q1 | m1_w1 | -0.261 |
6 | 2015-01-06 | q1 | m1_w1 | 0.16 |
tidyr::extract
Extract
is essentially the same as separate
, let’s see how…
# extract
df_ext <- df_sep2 %>%
extract(period, into = "month")
head(df_ext)
id | date | quarter | month | value |
1 | 2015-01-01 | q1 | m1 | 0.908 |
2 | 2015-01-02 | q1 | m1 | -0.8 |
3 | 2015-01-03 | q1 | m1 | 0.373 |
4 | 2015-01-04 | q1 | m1 | -0.0835 |
5 | 2015-01-05 | q1 | m1 | -0.261 |
6 | 2015-01-06 | q1 | m1 | 0.16 |
# this gives us same output as separate
df_ext <- df_sep2 %>%
extract(period, into = c("month", "week"),
regex = "([[:alnum:]]+)_([[:alnum:]]+)")
head(df_ext)
id | date | quarter | month | week | value |
1 | 2015-01-01 | q1 | m1 | w1 | 0.908 |
2 | 2015-01-02 | q1 | m1 | w1 | -0.8 |
3 | 2015-01-03 | q1 | m1 | w1 | 0.373 |
4 | 2015-01-04 | q1 | m1 | w1 | -0.0835 |
5 | 2015-01-05 | q1 | m1 | w1 | -0.261 |
6 | 2015-01-06 | q1 | m1 | w1 | 0.16 |
tidyr::unite
# let's say we want to combine quarter and month with an underscore
df_uni <- df_sep %>%
unite(period, quarter:month) # sep = "_" is the default arg
head(df_uni)
id | date | period | week | value |
1 | 2015-01-01 | q1_m1 | w1 | 0.908 |
2 | 2015-01-02 | q1_m1 | w1 | -0.8 |
3 | 2015-01-03 | q1_m1 | w1 | 0.373 |
4 | 2015-01-04 | q1_m1 | w1 | -0.0835 |
5 | 2015-01-05 | q1_m1 | w1 | -0.261 |
6 | 2015-01-06 | q1_m1 | w1 | 0.16 |
# let's say we want to combine quarter and month with nothing
df_uni <- df_sep %>%
unite(period, quarter:month, sep = "")
head(df_uni)
id | date | period | week | value |
1 | 2015-01-01 | q1m1 | w1 | 0.908 |
2 | 2015-01-02 | q1m1 | w1 | -0.8 |
3 | 2015-01-03 | q1m1 | w1 | 0.373 |
4 | 2015-01-04 | q1m1 | w1 | -0.0835 |
5 | 2015-01-05 | q1m1 | w1 | -0.261 |
6 | 2015-01-06 | q1m1 | w1 | 0.16 |
tidyr::spread
# finally let's spread
df_spread <- df_uni %>%
spread(week, value) # fill = NA is default arg
head(df_spread)
id | date | period | w1 | w2 | w3 |
1 | 2015-01-01 | q1m1 | 0.908 | -1.06 | |
1 | 2015-01-01 | q1m2 | 0.259 | ||
1 | 2015-01-01 | q2m1 | -0.0417 | ||
1 | 2015-01-01 | q2m2 | 0.312 | 0.96 | |
2 | 2015-01-02 | q1m1 | -0.8 | 0.378 | |
2 | 2015-01-02 | q1m2 | -0.74 |
gather() %>% separate() %>% spread()
)Gather multiple sets of columns
All in one, if we had wanted to essentially “gather” three sets of columns (here, one for each week)…
df_tidiest <- df %>%
gather(key, value, -id, -date) %>%
separate(key, into = c("quarter", "month", "week")) %>%
spread(week, value)
head(df_tidiest)
id | date | quarter | month | w1 | w2 | w3 |
1 | 2015-01-01 | q1 | m1 | 0.908 | -1.06 | |
1 | 2015-01-01 | q1 | m2 | 0.259 | ||
1 | 2015-01-01 | q2 | m1 | -0.0417 | ||
1 | 2015-01-01 | q2 | m2 | 0.312 | 0.96 | |
2 | 2015-01-02 | q1 | m1 | -0.8 | 0.378 | |
2 | 2015-01-02 | q1 | m2 | -0.74 |
broom
“The broom package takes the messy output of built-in functions in R, such as lm
, nls
, or t.test
, and turns them into tidy data frames.” So, broom tidies output from other R functions that are un-tidy.
See here for list of functions: https://github.com/dgrtwo/broom
Vignette: ftp://cran.r-project.org/pub/R/web/packages/broom/vignettes/broom.html
fit <- lm(mpg ~ qsec + factor(am) + wt + factor(gear),
data = mtcars)
Un-tidy output from lm
summary(fit)
Call:
lm(formula = mpg ~ qsec + factor(am) + wt + factor(gear), data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.5064 -1.5220 -0.7517 1.3841 4.6345
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.3650 8.3730 1.118 0.27359
qsec 1.2449 0.3828 3.252 0.00317 **
factor(am)1 3.1505 1.9405 1.624 0.11654
wt -3.9263 0.7428 -5.286 1.58e-05 ***
factor(gear)4 -0.2682 1.6555 -0.162 0.87257
factor(gear)5 -0.2697 2.0632 -0.131 0.89698
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.55 on 26 degrees of freedom
Multiple R-squared: 0.8498, Adjusted R-squared: 0.8209
F-statistic: 29.43 on 5 and 26 DF, p-value: 6.379e-10
Tidy output from broom
tidy(fit)
term | estimate | std.error | statistic | p.value |
(Intercept) | 9.37 | 8.37 | 1.12 | 0.274 |
qsec | 1.24 | 0.383 | 3.25 | 0.00317 |
factor(am)1 | 3.15 | 1.94 | 1.62 | 0.117 |
wt | -3.93 | 0.743 | -5.29 | 1.58e-05 |
factor(gear)4 | -0.268 | 1.66 | -0.162 | 0.873 |
factor(gear)5 | -0.27 | 2.06 | -0.131 | 0.897 |
gt
gt
is a relatively new package from RStudio designed for publication-ready figures.
library(gt)
Let’s set up a tiny little data table…
prepped_flight_data <- flights %>% filter(dest %in% c("DEN", "DCA", "JFK", "SFO")) %>%
group_by(dest) %>%
mutate_at(vars(dep_delay), replace_na, replace=0.0) %>%
summarise(mean_delay=mean(dep_delay, na.rm=TRUE), median_delay=median(dep_delay, na.rm=TRUE))
glimpse(prepped_flight_data)
Rows: 4
Columns: 3
$ dest <chr> "DCA", "DEN", "JFK", "SFO"
$ mean_delay <dbl> 1.475138, 8.753835, 9.235996, 12.996799
$ median_delay <dbl> -3, -1, -1, -1
… and now make a table out of it:
gt_formatted <- prepped_flight_data %>% gt() %>%
tab_header(title="PDX Departure Delays, DCA/DEN/JFK/SFO", subtitle="(Delay in minutes)") %>%
fmt_number(columns=vars(mean_delay, median_delay), decimals=2) %>%
cols_label(dest="Destination", mean_delay="Mean", median_delay="Median")
gt_formatted
PDX Departure Delays, DCA/DEN/JFK/SFO | ||
---|---|---|
(Delay in minutes) | ||
Destination | Mean | Median |
DCA | 1.48 | −3.00 |
DEN | 8.75 | −1.00 |
JFK | 9.24 | −1.00 |
SFO | 13.00 | −1.00 |
We can adjust things like text alignment after-the-fact:
gt_formatted <- gt_formatted %>%
cols_align(align="center", columns=vars(dest)) %>%
tab_style(
cell_text(align = "right"),
locations = cells_title(groups = c("subtitle"))
) %>%
tab_style(
cell_text(style = "italic"),
locations = cells_title(groups=c("title"))
)
gt_formatted
PDX Departure Delays, DCA/DEN/JFK/SFO | ||
---|---|---|
(Delay in minutes) | ||
Destination | Mean | Median |
DCA | 1.48 | −3.00 |
DEN | 8.75 | −1.00 |
JFK | 9.24 | −1.00 |
SFO | 13.00 | −1.00 |
Tip: In addition to cells_title()
, there are helpers to select any of the other cell types (e.g. cells_column_labels()
, etc.).
Now we can turn it into whatever we need! As Latex:
gt_formatted %>% gtsave("my_table.tex")
As RTF, to put in Word:
gt_formatted %>% gtsave("my_table.rtf")
tableone
tableone
is for generating “Table 1” in your paper. You know, the “Subject Characteristics” table- lots of boring summary statistics.
Vignette: https://cran.r-project.org/web/packages/tableone/vignettes/introduction.html
library(tableone)
CreateTableOne(data = mazes)
Overall
n 381
study_id (%)
CSLU-001 4 ( 1.0)
CSLU-002 4 ( 1.0)
CSLU-007 4 ( 1.0)
CSLU-010 4 ( 1.0)
CSLU-020 4 ( 1.0)
CSLU-024 4 ( 1.0)
CSLU-027 4 ( 1.0)
CSLU-031 4 ( 1.0)
CSLU-036 3 ( 0.8)
CSLU-046 4 ( 1.0)
CSLU-053 4 ( 1.0)
CSLU-054 4 ( 1.0)
CSLU-059 4 ( 1.0)
CSLU-062 4 ( 1.0)
CSLU-066 4 ( 1.0)
CSLU-073 4 ( 1.0)
CSLU-077 4 ( 1.0)
CSLU-080 4 ( 1.0)
CSLU-082 3 ( 0.8)
CSLU-084 4 ( 1.0)
CSLU-089 4 ( 1.0)
CSLU-095 3 ( 0.8)
CSLU-096 4 ( 1.0)
CSLU-101 4 ( 1.0)
CSLU-104 4 ( 1.0)
CSLU-112 4 ( 1.0)
CSLU-117 4 ( 1.0)
CSLU-119 4 ( 1.0)
CSLU-122 4 ( 1.0)
CSLU-124 3 ( 0.8)
CSLU-142 4 ( 1.0)
CSLU-144 4 ( 1.0)
CSLU-146 4 ( 1.0)
CSLU-154 4 ( 1.0)
CSLU-156 4 ( 1.0)
CSLU-161 4 ( 1.0)
CSLU-163 4 ( 1.0)
CSLU-165 4 ( 1.0)
CSLU-167 4 ( 1.0)
CSLU-180 4 ( 1.0)
CSLU-191 4 ( 1.0)
CSLU-203 4 ( 1.0)
CSLU-204 4 ( 1.0)
CSLU-213 4 ( 1.0)
CSLU-216 4 ( 1.0)
CSLU-220 4 ( 1.0)
CSLU-226 4 ( 1.0)
CSLU-233 4 ( 1.0)
CSLU-238 4 ( 1.0)
CSLU-245 4 ( 1.0)
CSLU-258 4 ( 1.0)
CSLU-259 4 ( 1.0)
CSLU-263 4 ( 1.0)
CSLU-269 4 ( 1.0)
CSLU-274 4 ( 1.0)
CSLU-275 4 ( 1.0)
CSLU-277 4 ( 1.0)
CSLU-284 4 ( 1.0)
CSLU-290 4 ( 1.0)
CSLU-303 4 ( 1.0)
CSLU-306 4 ( 1.0)
CSLU-312 4 ( 1.0)
CSLU-315 4 ( 1.0)
CSLU-316 4 ( 1.0)
CSLU-320 4 ( 1.0)
CSLU-324 4 ( 1.0)
CSLU-332 4 ( 1.0)
CSLU-335 4 ( 1.0)
CSLU-339 4 ( 1.0)
CSLU-348 4 ( 1.0)
CSLU-349 4 ( 1.0)
CSLU-355 4 ( 1.0)
CSLU-359 4 ( 1.0)
CSLU-372 4 ( 1.0)
CSLU-373 4 ( 1.0)
CSLU-375 4 ( 1.0)
CSLU-379 4 ( 1.0)
CSLU-388 2 ( 0.5)
CSLU-389 4 ( 1.0)
CSLU-393 4 ( 1.0)
CSLU-395 4 ( 1.0)
CSLU-417 4 ( 1.0)
CSLU-419 4 ( 1.0)
CSLU-427 4 ( 1.0)
CSLU-432 3 ( 0.8)
CSLU-435 4 ( 1.0)
CSLU-441 4 ( 1.0)
CSLU-442 4 ( 1.0)
CSLU-447 4 ( 1.0)
CSLU-454 4 ( 1.0)
CSLU-460 4 ( 1.0)
CSLU-470 4 ( 1.0)
CSLU-472 4 ( 1.0)
CSLU-477 4 ( 1.0)
CSLU-482 4 ( 1.0)
CSLU-486 4 ( 1.0)
CSLU-499 4 ( 1.0)
ca (mean (SD)) 6.83 (1.06)
viq (mean (SD)) 100.82 (18.74)
dx (%)
ASD 183 (48.0)
SLI 71 (18.6)
TD 127 (33.3)
activity (%)
Conversation 94 (24.7)
Picture Description 94 (24.7)
Play 96 (25.2)
Wordless Picture Book 97 (25.5)
content (mean (SD)) 18.73 (24.84)
filler (mean (SD)) 11.20 (17.59)
rep (mean (SD)) 6.24 (9.45)
rev (mean (SD)) 3.79 (4.31)
fs (mean (SD)) 8.70 (12.76)
cued (mean (SD)) 14.36 (24.22)
not_cued (mean (SD)) 26.77 (31.73)
my_maze_names <- c("Participant", "Age", "Verbal\nIQ", "Group", "Activity", "Content\nMaze", "Filler\nMaze", "Repetition", "Revision", "False\nStart", "Cued", "Not\nCued")
## Vector of variables to summarize
my_num_vars <- c("ca", "viq", "content", "filler", "rep", "rev", "fs", "cued", "not_cued")
## Vector of categorical variables that need transformation
my_cat_vars <- c("dx", "activity")
## Create a TableOne object
tab2 <- CreateTableOne(vars = my_num_vars, data = mazes, factorVars = my_cat_vars)
print(tab2, showAllLevels = TRUE)
level Overall
n 381
ca (mean (SD)) 6.83 (1.06)
viq (mean (SD)) 100.82 (18.74)
content (mean (SD)) 18.73 (24.84)
filler (mean (SD)) 11.20 (17.59)
rep (mean (SD)) 6.24 (9.45)
rev (mean (SD)) 3.79 (4.31)
fs (mean (SD)) 8.70 (12.76)
cued (mean (SD)) 14.36 (24.22)
not_cued (mean (SD)) 26.77 (31.73)
tab3 <- CreateTableOne(vars = my_num_vars, strata = "dx" , data = mazes)
tab3
Stratified by dx
ASD SLI TD p test
n 183 71 127
ca (mean (SD)) 6.74 (1.11) 7.15 (1.00) 6.76 (0.97) 0.015
viq (mean (SD)) 95.29 (17.62) 86.24 (5.95) 116.94 (12.82) <0.001
content (mean (SD)) 20.46 (29.73) 17.34 (24.35) 17.00 (15.67) 0.422
filler (mean (SD)) 7.86 (13.54) 10.56 (16.35) 16.38 (21.84) <0.001
rep (mean (SD)) 7.25 (11.82) 5.45 (6.86) 5.23 (6.21) 0.134
rev (mean (SD)) 3.87 (4.85) 3.25 (3.55) 3.98 (3.85) 0.498
fs (mean (SD)) 9.35 (14.60) 8.63 (15.00) 7.80 (7.55) 0.574
cued (mean (SD)) 10.66 (21.94) 13.21 (22.54) 20.35 (27.10) 0.002
not_cued (mean (SD)) 25.52 (33.49) 25.25 (31.84) 29.41 (29.04) 0.517
DT
packageAn excellent tutorial on DT is available at https://rstudio.github.io/DT/.
datatable(mazes)
xtable
(best for html)The xtable is a solution that delivers both HTML and LaTeX. The syntax is very similar to kable:
output <-
matrix(sprintf("Content %s", LETTERS[1:4]),
ncol=2, byrow=TRUE)
colnames(output) <-
c("1st header", "2nd header")
rownames(output) <-
c("1st row", "2nd row")
print(xtable(output,
caption="A test table",
align = c("l", "c", "r")),
type="html")
<!-- html table generated in R 3.6.3 by xtable 1.8-4 package -->
<!-- Mon May 4 16:21:07 2020 -->
<table border=1>
<caption align="bottom"> A test table </caption>
<tr> <th> </th> <th> 1st header </th> <th> 2nd header </th> </tr>
<tr> <td> 1st row </td> <td align="center"> Content A </td> <td align="right"> Content B </td> </tr>
<tr> <td> 2nd row </td> <td align="center"> Content C </td> <td align="right"> Content D </td> </tr>
</table>
Note that to make it knit
, you need to specify a chunk option: results = 'asis'
print(xtable(output,
caption="A test table",
align = c("l", "c", "r")),
type="html")
1st header | 2nd header | |
---|---|---|
1st row | Content A | Content B |
2nd row | Content C | Content D |
print(xtable(head(iris)), type = 'html', html.table.attributes = '')
Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species | |
---|---|---|---|---|---|
1 | 5.10 | 3.50 | 1.40 | 0.20 | setosa |
2 | 4.90 | 3.00 | 1.40 | 0.20 | setosa |
3 | 4.70 | 3.20 | 1.30 | 0.20 | setosa |
4 | 4.60 | 3.10 | 1.50 | 0.20 | setosa |
5 | 5.00 | 3.60 | 1.40 | 0.20 | setosa |
6 | 5.40 | 3.90 | 1.70 | 0.40 | setosa |
pixiedust
(best for PDF)Remember that broom
package we used earlier? We can make this table better…
tidy(fit)
term | estimate | std.error | statistic | p.value |
(Intercept) | 9.37 | 8.37 | 1.12 | 0.274 |
qsec | 1.24 | 0.383 | 3.25 | 0.00317 |
factor(am)1 | 3.15 | 1.94 | 1.62 | 0.117 |
wt | -3.93 | 0.743 | -5.29 | 1.58e-05 |
factor(gear)4 | -0.268 | 1.66 | -0.162 | 0.873 |
factor(gear)5 | -0.27 | 2.06 | -0.131 | 0.897 |
https://cran.r-project.org/web/packages/pixiedust/vignettes/pixiedust.html
http://www.suchanutter.net/pixiedust/index.html
dust(fit) %>%
sprinkle(cols = "term",
replace = c("Intercept", "Quarter Mile Time", "Automatic vs. Manual",
"Weight", "Gears: 4 vs. 3", "Gears: 5 vs 3")) %>%
sprinkle(cols = c("estimate", "std.error", "statistic"),
round = 3) %>%
sprinkle(cols = "p.value", fn = quote(pvalString(value))) %>%
sprinkle_colnames("Term", "Coefficient", "SE", "T-statistic", "P-value")
Term | Coefficient | SE | T-statistic | P-value |
---|---|---|---|---|
Intercept | 9.365 | 8.373 | 1.118 | 0.27 |
Quarter Mile Time | 1.245 | 0.383 | 3.252 | 0.003 |
Automatic vs. Manual | 3.151 | 1.941 | 1.624 | 0.12 |
Weight | -3.926 | 0.743 | -5.286 | < 0.001 |
Gears: 4 vs. 3 | -0.268 | 1.655 | -0.162 | 0.87 |
Gears: 5 vs 3 | -0.27 | 2.063 | -0.131 | 0.9 |
extrafont
The extrafont
package is the best place to start with fonts in R and ggplot.
library(extrafont)
Note: Follow all installation instructions from github
!
You can access fonts on your local system in ggplot
by using the theme()
function to set the relevant properties of your figure.
demo_plt <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color=Species)) +
geom_point(size=3) +
scale_color_discrete(labels=c("Setosa", "Versicolor", "Virginica")) +
labs(x="Sepal Length (cm)", y="Sepal Width (cm)", title="Fisher's Irises")
demo_plt
Let’s try changing that font a bit:
demo_plt + theme(text=element_text(family="Optima"))
demo_plt + theme(text=element_text(family="Palatino"))
demo_plt + theme(text=element_text(family="Zapfino"))
Note that the specific name that you must use to refer to the font is not exactly obvious. It is not the font name, but rather the font’s “family name”. Because of some under-the-hood details about the way that fonts work, this is a bit different from what you might see in e.g. a font selection menu in Word. You can see a list of all the fonts that R knows about by using the fonttable()
command:
fonttable() %>% glimpse()
Rows: 122
Columns: 10
$ package <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ afmfile <chr> "Keyboard.afm.gz", "SFNSDisplay.afm.gz", "SFNSText.afm.gz"…
$ fontfile <chr> "/System/Library/Fonts/Keyboard.ttf", "/System/Library/Fon…
$ FullName <chr> ".Keyboard", "System Font", "System Font", "System Font It…
$ FamilyName <chr> ".Keyboard", "System Font", "System Font", "System Font", …
$ FontName <chr> "-Keyboard", "-SFNSDisplay", "-SFNSText", "-SFNSText-Itali…
$ Bold <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ Italic <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FAL…
$ Symbol <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ afmsymfile <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
fonttable() %>% head()
package | afmfile | fontfile | FullName | FamilyName | FontName | Bold | Italic | Symbol | afmsymfile |
Keyboard.afm.gz | /System/Library/Fonts/Keyboard.ttf | .Keyboard | .Keyboard | -Keyboard | FALSE | FALSE | FALSE | ||
SFNSDisplay.afm.gz | /System/Library/Fonts/SFNSDisplay.ttf | System Font | System Font | -SFNSDisplay | FALSE | FALSE | FALSE | ||
SFNSText.afm.gz | /System/Library/Fonts/SFNSText.ttf | System Font | System Font | -SFNSText | FALSE | FALSE | FALSE | ||
SFNSTextItalic.afm.gz | /System/Library/Fonts/SFNSTextItalic.ttf | System Font Italic | System Font | -SFNSText-Italic | FALSE | TRUE | FALSE | ||
Andale Mono.afm.gz | /Library/Fonts/Andale Mono.ttf | Andale Mono | Andale Mono | AndaleMono | FALSE | FALSE | FALSE | ||
Apple Braille.afm.gz | /System/Library/Fonts/Apple Braille.ttf | Apple Braille | Apple Braille | AppleBraille | FALSE | FALSE | FALSE |
fonttable() %>% select(FullName, FamilyName, FontName) %>% head(20)
FullName | FamilyName | FontName |
.Keyboard | .Keyboard | -Keyboard |
System Font | System Font | -SFNSDisplay |
System Font | System Font | -SFNSText |
System Font Italic | System Font | -SFNSText-Italic |
Andale Mono | Andale Mono | AndaleMono |
Apple Braille | Apple Braille | AppleBraille |
Apple Braille Outline 6 Dot | Apple Braille | AppleBraille-Outline6Dot |
Apple Braille Outline 8 Dot | Apple Braille | AppleBraille-Outline8Dot |
Apple Braille Pinpoint 6 Dot | Apple Braille | AppleBraille-Pinpoint6Dot |
Apple Braille Pinpoint 8 Dot | Apple Braille | AppleBraille-Pinpoint8Dot |
AppleMyungjo Regular | AppleMyungjo | AppleMyungjo |
Arial Black | Arial Black | Arial-Black |
Arial Bold Italic | Arial | Arial-BoldItalicMT |
Arial Bold | Arial | Arial-BoldMT |
Arial Italic | Arial | Arial-ItalicMT |
Arial | Arial | ArialMT |
Arial Narrow | Arial Narrow | ArialNarrow |
Arial Narrow Bold | Arial Narrow | ArialNarrow-Bold |
Arial Narrow Bold Italic | Arial Narrow | ArialNarrow-BoldItalic |
Arial Narrow Italic | Arial Narrow | ArialNarrow-Italic |
fonttable() %>% select(FamilyName) %>% distinct()
FamilyName |
.Keyboard |
System Font |
Andale Mono |
Apple Braille |
AppleMyungjo |
Arial Black |
Arial |
Arial Narrow |
Arial Rounded MT Bold |
Arial Unicode MS |
Bitter |
Bodoni Ornaments |
Bodoni 72 Smallcaps |
Brush Script MT |
Calibri |
Calibri Light |
Comic Sans MS |
Consolas |
Courier New |
DIN Alternate |
DIN Condensed |
Fondamento |
Georgia |
Gorni |
Impact |
Khmer Sangam MN |
Lao Sangam MN |
Lato Black |
Lato |
Lato Hairline |
Lato Light |
Luminari |
Microsoft Sans Serif |
Noto Sans |
Noto Serif |
Open Sans |
Open Sans Extrabold |
Open Sans Light |
Open Sans Semibold |
SILDoulos IPA93 |
SILManuscript IPA93 |
SILSophia IPA93 |
Tahoma |
Tangerine |
Times New Roman |
Trattatello |
Trebuchet MS |
Verdana |
Webdings |
WeePeople |
Wingdings |
Wingdings 2 |
Wingdings 3 |
You’ll want to use the value in FamilyName
.
This part is a bit beyond the scope of this class, but the upshot is that after you install new fonts in your computer, you should tell extrafont
about it using the font_import()
command.
extrafont
only knows about TrueType fonts (ones whose file ends in .ttf
). If you have an OpenType font you want to use with R, your best bet probably involves the showtext
package.