1 Goals for Lab 05

Become familiar with tools for generating publication-ready tables directly in R.

We will use data from the following paper: MacFarlane, H., Gorman, K., Ingham, R., Presmanes Hill, A., Papadakis, K., Kiss, G., & van Santen, J. (2017). Quantitative analysis of disfluency in children with autism spectrum disorder or language impairment. PLoS ONE, 12(3), e0173936.

mazes <- read_csv("http://bit.ly/mazes-gist") %>% 
  clean_names()  #janitor package

glimpse(mazes)
Rows: 381
Columns: 12
$ study_id <chr> "CSLU-001", "CSLU-001", "CSLU-001", "CSLU-001", "CSLU-002", …
$ ca       <dbl> 5.6667, 5.6667, 5.6667, 5.6667, 6.5000, 6.5000, 6.5000, 6.50…
$ viq      <dbl> 124, 124, 124, 124, 124, 124, 124, 124, 108, 108, 108, 108, …
$ dx       <chr> "TD", "TD", "TD", "TD", "TD", "TD", "TD", "TD", "TD", "TD", …
$ activity <chr> "Conversation", "Picture Description", "Play", "Wordless Pic…
$ content  <dbl> 24, 1, 21, 8, 3, 5, 8, 2, 25, 10, 2, 5, 32, 20, 13, 21, 27, …
$ filler   <dbl> 31, 2, 6, 2, 10, 3, 8, 2, 21, 13, 10, 2, 12, 9, 4, 4, 12, 6,…
$ rep      <dbl> 2, 0, 3, 0, 3, 2, 3, 0, 4, 0, 0, 0, 13, 5, 5, 6, 10, 5, 5, 2…
$ rev      <dbl> 5, 0, 8, 4, 0, 1, 2, 0, 4, 2, 1, 3, 8, 7, 2, 8, 5, 1, 4, 2, …
$ fs       <dbl> 17, 1, 10, 4, 0, 2, 3, 2, 17, 8, 1, 2, 11, 8, 6, 7, 12, 3, 3…
$ cued     <dbl> 36, 2, 6, 2, 10, 3, 9, 2, 29, 13, 11, 2, 14, 12, 4, 11, 17, …
$ not_cued <dbl> 50, 3, 27, 10, 13, 8, 15, 4, 38, 23, 11, 7, 42, 26, 17, 18, …

2 TL;DR

The workhorse for making tables in R Markdown documents is the knitr package’s kable function. This function is really versatile, but also free of fancy formatting options, for better or worse.

3 knitr::kable

3.1 kable all tables everywhere

Update the YAML of your document. For HTML:

---
title: "My Awesome Data Vis Lab"
output:
  html_document:
    df_print: kable
---

You can also define the html format in the global options.

# If you don't define format here, you'll need put `format = "html"` in every kable function.
options(knitr.table.format = "html") 
# You may also wish to set this option
options(scipen = 1, digits = 2)

3.2 kable table in a chunk

For HTML:

head(mazes) %>% 
  kable(format = "html")
study_id ca viq dx activity content filler rep rev fs cued not_cued
CSLU-001 5.6667 124 TD Conversation 24 31 2 5 17 36 50
CSLU-001 5.6667 124 TD Picture Description 1 2 0 0 1 2 3
CSLU-001 5.6667 124 TD Play 21 6 3 8 10 6 27
CSLU-001 5.6667 124 TD Wordless Picture Book 8 2 0 4 4 2 10
CSLU-002 6.5000 124 TD Conversation 3 10 3 0 0 10 13
CSLU-002 6.5000 124 TD Picture Description 5 3 2 1 2 3 8
head(mazes) %>% 
  kable(format = "html", digits = 2, caption = "A table produced by kable.")
A table produced by kable.
study_id ca viq dx activity content filler rep rev fs cued not_cued
CSLU-001 5.67 124 TD Conversation 24 31 2 5 17 36 50
CSLU-001 5.67 124 TD Picture Description 1 2 0 0 1 2 3
CSLU-001 5.67 124 TD Play 21 6 3 8 10 6 27
CSLU-001 5.67 124 TD Wordless Picture Book 8 2 0 4 4 2 10
CSLU-002 6.50 124 TD Conversation 3 10 3 0 0 10 13
CSLU-002 6.50 124 TD Picture Description 5 3 2 1 2 3 8
my_maze_names <- c("Participant", "Age", "Verbal\nIQ", "Group", "Activity", "Content\nMaze", "Filler\nMaze", "Repetition", "Revision", "False\nStart", "Cued", "Not\nCued")
head(mazes) %>% 
  kable(format = "html", digits = 2, caption = "A table produced by kable.",
        col.names = my_maze_names)
A table produced by kable.
Participant Age Verbal IQ Group Activity Content Maze Filler Maze Repetition Revision False Start Cued Not Cued
CSLU-001 5.67 124 TD Conversation 24 31 2 5 17 36 50
CSLU-001 5.67 124 TD Picture Description 1 2 0 0 1 2 3
CSLU-001 5.67 124 TD Play 21 6 3 8 10 6 27
CSLU-001 5.67 124 TD Wordless Picture Book 8 2 0 4 4 2 10
CSLU-002 6.50 124 TD Conversation 3 10 3 0 0 10 13
CSLU-002 6.50 124 TD Picture Description 5 3 2 1 2 3 8

3.3 Styled kable tables in a chunk

Solution: apply some Bootstrap CSS styling using the kableExtra package.

head(mazes) %>% 
  kable(format = "html", digits = 2, caption = "A styled kable table.",
        col.names = my_maze_names) %>% 
  kable_styling()
A styled kable table.
Participant Age Verbal IQ Group Activity Content Maze Filler Maze Repetition Revision False Start Cued Not Cued
CSLU-001 5.67 124 TD Conversation 24 31 2 5 17 36 50
CSLU-001 5.67 124 TD Picture Description 1 2 0 0 1 2 3
CSLU-001 5.67 124 TD Play 21 6 3 8 10 6 27
CSLU-001 5.67 124 TD Wordless Picture Book 8 2 0 4 4 2 10
CSLU-002 6.50 124 TD Conversation 3 10 3 0 0 10 13
CSLU-002 6.50 124 TD Picture Description 5 3 2 1 2 3 8

Lots of printing options: https://haozhu233.github.io/kableExtra/awesome_table_in_html.html

head(mazes) %>% 
  kable(format = "html", digits = 2, caption = "A non-full width zebra kable table.") %>% 
  kable_styling(bootstrap_options = "striped", full_width = F)
A non-full width zebra kable table.
study_id ca viq dx activity content filler rep rev fs cued not_cued
CSLU-001 5.67 124 TD Conversation 24 31 2 5 17 36 50
CSLU-001 5.67 124 TD Picture Description 1 2 0 0 1 2 3
CSLU-001 5.67 124 TD Play 21 6 3 8 10 6 27
CSLU-001 5.67 124 TD Wordless Picture Book 8 2 0 4 4 2 10
CSLU-002 6.50 124 TD Conversation 3 10 3 0 0 10 13
CSLU-002 6.50 124 TD Picture Description 5 3 2 1 2 3 8
head(mazes) %>% 
  kable(format = "html", digits = 2, caption = "Over here!") %>% 
  kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
Over here!
study_id ca viq dx activity content filler rep rev fs cued not_cued
CSLU-001 5.67 124 TD Conversation 24 31 2 5 17 36 50
CSLU-001 5.67 124 TD Picture Description 1 2 0 0 1 2 3
CSLU-001 5.67 124 TD Play 21 6 3 8 10 6 27
CSLU-001 5.67 124 TD Wordless Picture Book 8 2 0 4 4 2 10
CSLU-002 6.50 124 TD Conversation 3 10 3 0 0 10 13
CSLU-002 6.50 124 TD Picture Description 5 3 2 1 2 3 8

3.4 kable + kableExtra + formattable

color_tile and color_bar are neat extras if used wisely!

http://haozhu233.github.io/kableExtra/use_kableExtra_with_formattable.html

library(formattable)
head(mazes) %>% 
  mutate(ca = color_tile("transparent", "lightpink")(ca),
         viq = color_bar("lightseagreen")(viq)) %>%
  kable("html", escape = F, caption = 'This table is colored.') %>%
  kable_styling(position = "center") %>% 
  column_spec(4, width = "3cm") 
This table is colored.
study_id ca viq dx activity content filler rep rev fs cued not_cued
CSLU-001 5.6667 124 TD Conversation 24 31 2 5 17 36 50
CSLU-001 5.6667 124 TD Picture Description 1 2 0 0 1 2 3
CSLU-001 5.6667 124 TD Play 21 6 3 8 10 6 27
CSLU-001 5.6667 124 TD Wordless Picture Book 8 2 0 4 4 2 10
CSLU-002 6.5000 124 TD Conversation 3 10 3 0 0 10 13
CSLU-002 6.5000 124 TD Picture Description 5 3 2 1 2 3 8

3.5 tibble + kable + kableExtra

You can also use any of these tools with plain text tables using the tibble package to create a table. Two main functions:

  • tribble: enter tibble by rows
  • tibble: enter tibble by columns

For example, I used tribble to make this table in our slide decks:

math_table <- tibble::tribble(
  ~Operator, ~Description, ~Usage,
  "\\+", "addition", "x + y",
  "\\-", "subtraction", "x - y",
  "\\*", "multiplication", "x * y",
  "/", "division", "x / y",
  "^", "raised to the power of", "x ^ y",
  "abs", "absolute value", "abs(x)",
  "%/%", "integer division", "x %/% y",
  "%%", "remainder after division", "x %% y"
)

Then I used this chunk to print it:

```{r, results = 'asis'}
knitr::kable(math_table, format = "html", caption = "Helpful mutate functions") %>% 
  kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
```
knitr::kable(math_table, format = "html", caption = "Helpful mutate functions") %>% 
  kable_styling(bootstrap_options = "striped", full_width = F, position = "left")
Helpful mutate functions
Operator Description Usage
+ addition x + y
- subtraction x - y
* multiplication x * y
/ division x / y
^ raised to the power of x ^ y
abs absolute value abs(x)
%/% integer division x %/% y
%% remainder after division x %% y

4 Markdown Tables

Sometimes you may just want to type in a table in Markdown and ignore R. Four kinds of tables may be used. The first three kinds presuppose the use of a fixed-width font, such as Courier. The fourth kind can be used with proportionally spaced fonts, as it does not require lining up columns. All of the below will render when typed outside of an R code chunk since these are based on pandoc being used to render your markdown document. Note that these should all work whether you are knitting to either html or PDF.

4.1 Simple table

This code for a simple table:

  Right     Left     Center     Default
-------     ------ ----------   -------
     12     12        12            12
    123     123       123          123
      1     1          1             1

Table:  Demonstration of simple table syntax.

Produces this simple table:

Demonstration of simple table syntax.
Right Left Center Default
12 12 12 12
123 123 123 123
1 1 1 1

The headers and table rows must each fit on one line. Column alignments are determined by the position of the header text relative to the dashed line below it:3

  • If the dashed line is flush with the header text on the right side but extends beyond it on the left, the column is right-aligned.
  • If the dashed line is flush with the header text on the left side but extends beyond it on the right, the column is left-aligned.
  • If the dashed line extends beyond the header text on both sides, the column is centered.
  • If the dashed line is flush with the header text on both sides, the default alignment is used (in most cases, this will be left).
  • The table must end with a blank line, or a line of dashes followed by a blank line.

The column headers may be omitted, provided a dashed line is used to end the table.

4.2 Multi-line tables

This code for a multi-line table:

-------------------------------------------------------------
 Centered   Default           Right Left
  Header    Aligned         Aligned Aligned
----------- ------- --------------- -------------------------
   First    row                12.0 Example of a row that
                                    spans multiple lines.

  Second    row                 5.0 Here's another one. Note
                                    the blank line between
                                    rows.
-------------------------------------------------------------

Table: Here's the caption. It, too, may span
multiple lines.

Produces this multi-line table:

Here’s the caption. It, too, may span multiple lines.
Centered Header Default Aligned Right Aligned Left Aligned
First row 12.0 Example of a row that spans multiple lines.
Second row 5.0 Here’s another one. Note the blank line between rows.

4.3 Grid tables

This code for a grid table:

: Sample grid table.

+---------------+---------------+--------------------+
| Fruit         | Price         | Advantages         |
+===============+===============+====================+
| Bananas       | $1.34         | - built-in wrapper |
|               |               | - bright color     |
+---------------+---------------+--------------------+
| Oranges       | $2.10         | - cures scurvy     |
|               |               | - tasty            |
+---------------+---------------+--------------------+

Produces this grid table:

Sample grid table.
Fruit Price Advantages
Bananas $1.34
  • built-in wrapper
  • bright color
Oranges $2.10
  • cures scurvy
  • tasty

Alignments are not supported, nor are cells that span multiple columns or rows.

4.4 Pipe tables

This code for a pipe table:

| Right | Left | Default | Center |
|------:|:-----|---------|:------:|
|   12  |  12  |    12   |    12  |
|  123  |  123 |   123   |   123  |
|    1  |    1 |     1   |     1  |

  : Demonstration of pipe table syntax.

Produces this pipe table:

Demonstration of pipe table syntax.
Right Left Default Center
12 12 12 12
123 123 123 123
1 1 1 1

5 Making tables in R

If you want to make tables that include R output (like output from functions like means, variances, or output from models), there are two steps:

  1. Get the numbers you need in tabular format; then
  2. Render that information in an aesthetically-pleasing way.

This section covers (1). But, although there are some nice options for (2) within R Markdown via various packages, I am not dogmatic about doing everything in R Markdown, especially things like (2).

5.1 dplyr

We’ll use the pnwflights14 package to practice our dplyr skills. We need to download the package from github using devtools.

# once per machine
install.packages("devtools")
devtools::install_github("ismayc/pnwflights14")

Now, we need to load the flights dataset from the pnwflights14 package.

# once per work session
data("flights", package = "pnwflights14")

5.1.1 dplyr::select

Use select to specify which columns in a dataframe you’d like to keep by name. Heretofore, this was not possible in base R! In base R, this can only be achieved using numeric variable positions. But most of the time, you keep track of your variables by name (like carrier) rather than position (the 8th column).

# keep these 2 cols
mini_flights <- flights %>% 
  select(carrier, flight)
glimpse(mini_flights)
Rows: 162,049
Columns: 2
$ carrier <chr> "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA", "UA", "…
$ flight  <int> 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 478, 1569, …
# keep first five cols
first_five <- flights %>% 
  select(year, month, day, dep_time, dep_delay)
glimpse(first_five)
Rows: 162,049
Columns: 5
$ year      <int> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,…
$ month     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ dep_time  <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557…
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…
# alternatively, specify range
first_five <- flights %>% 
  select(year:dep_delay)
glimpse(first_five)
Rows: 162,049
Columns: 5
$ year      <int> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,…
$ month     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ dep_time  <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557…
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…

We can also choose the columns we want by negation, that is, you can specify which columns to drop instead of keep. This way, all variables not listed are kept.

# we can also use negation
all_but_year <- flights %>% 
  select(-year)
glimpse(all_but_year)
Rows: 162,049
Columns: 15
$ month     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ dep_time  <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557…
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…
$ arr_time  <int> 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, 911, 90…
$ arr_delay <dbl> 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -12, -16,…
$ carrier   <chr> "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA", "UA",…
$ tailnum   <chr> "N508AS", "N195UW", "N37422", "N547UW", "N762AS", "N806DN",…
$ flight    <int> 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 478, 1569…
$ origin    <chr> "PDX", "SEA", "PDX", "PDX", "SEA", "SEA", "SEA", "PDX", "SE…
$ dest      <chr> "ANC", "CLT", "IAH", "CLT", "ANC", "DTW", "ORD", "IAH", "DE…
$ air_time  <dbl> 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 130, 122,…
$ distance  <dbl> 1542, 2279, 1825, 2282, 1448, 1927, 1721, 1825, 1024, 2402,…
$ hour      <dbl> 0, 0, 0, 0, 0, 0, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
$ minute    <dbl> 1, 4, 8, 28, 34, 37, 46, 26, 27, 36, 41, 49, 50, 57, 57, 58…

dplyr::select comes with several other helper functions…

depart <- flights %>%
  select(starts_with("dep_"))
glimpse(depart)
Rows: 162,049
Columns: 2
$ dep_time  <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557…
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…
times <- flights %>%
  select(contains("time"))
glimpse(times)
Rows: 162,049
Columns: 3
$ dep_time <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557,…
$ arr_time <int> 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, 911, 907…
$ air_time <dbl> 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 130, 122, …
# note that we are not actually saving the new dataframe here
flights %>%
  select(-contains("time")) %>% head()
year month day dep_delay arr_delay carrier tailnum flight origin dest distance hour minute
2014 1 1 96 70 AS N508AS 145 PDX ANC 1.54e+03 0 1
2014 1 1 -6 -23 US N195UW 1830 SEA CLT 2.28e+03 0 4
2014 1 1 13 -4 UA N37422 1609 PDX IAH 1.82e+03 0 8
2014 1 1 -2 -23 US N547UW 466 PDX CLT 2.28e+03 0 28
2014 1 1 44 43 AS N762AS 121 SEA ANC 1.45e+03 0 34
2014 1 1 82 88 DL N806DN 1823 SEA DTW 1.93e+03 0 37
delays <- flights %>%
  select(ends_with("delay"))
glimpse(delays)
Rows: 162,049
Columns: 2
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…
$ arr_delay <dbl> 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -12, -16,…

One of my favorite select helper functions is everything(), which allows you to use select to keep all your variables, but easily rearrange the columns without having to list all the variables to keep/drop.

new_order <- flights %>% 
  select(origin, dest, everything())
head(new_order)
origin dest year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight air_time distance hour minute
PDX ANC 2014 1 1 1 96 235 70 AS N508AS 145 194 1.54e+03 0 1
SEA CLT 2014 1 1 4 -6 738 -23 US N195UW 1830 252 2.28e+03 0 4
PDX IAH 2014 1 1 8 13 548 -4 UA N37422 1609 201 1.82e+03 0 8
PDX CLT 2014 1 1 28 -2 800 -23 US N547UW 466 251 2.28e+03 0 28
SEA ANC 2014 1 1 34 44 325 43 AS N762AS 121 201 1.45e+03 0 34
SEA DTW 2014 1 1 37 82 747 88 DL N806DN 1823 224 1.93e+03 0 37
# with negation
new_order2 <- flights %>% 
  select(origin, dest, everything(), -year)
head(new_order2)
origin dest month day dep_time dep_delay arr_time arr_delay carrier tailnum flight air_time distance hour minute
PDX ANC 1 1 1 96 235 70 AS N508AS 145 194 1.54e+03 0 1
SEA CLT 1 1 4 -6 738 -23 US N195UW 1830 252 2.28e+03 0 4
PDX IAH 1 1 8 13 548 -4 UA N37422 1609 201 1.82e+03 0 8
PDX CLT 1 1 28 -2 800 -23 US N547UW 466 251 2.28e+03 0 28
SEA ANC 1 1 34 44 325 43 AS N762AS 121 201 1.45e+03 0 34
SEA DTW 1 1 37 82 747 88 DL N806DN 1823 224 1.93e+03 0 37

We can also rename variables within select.

flights2 <- flights %>%
  select(tail_num = tailnum, everything())
head(flights2)
tail_num year month day dep_time dep_delay arr_time arr_delay carrier flight origin dest air_time distance hour minute
N508AS 2014 1 1 1 96 235 70 AS 145 PDX ANC 194 1.54e+03 0 1
N195UW 2014 1 1 4 -6 738 -23 US 1830 SEA CLT 252 2.28e+03 0 4
N37422 2014 1 1 8 13 548 -4 UA 1609 PDX IAH 201 1.82e+03 0 8
N547UW 2014 1 1 28 -2 800 -23 US 466 PDX CLT 251 2.28e+03 0 28
N762AS 2014 1 1 34 44 325 43 AS 121 SEA ANC 201 1.45e+03 0 34
N806DN 2014 1 1 37 82 747 88 DL 1823 SEA DTW 224 1.93e+03 0 37

If you don’t want to move the renamed variables within your dataframe, you can use the rename function.

flights3 <- flights %>%
  rename(tail_num = tailnum)
glimpse(flights3)
Rows: 162,049
Columns: 16
$ year      <int> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014,…
$ month     <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ day       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
$ dep_time  <int> 1, 4, 8, 28, 34, 37, 346, 526, 527, 536, 541, 549, 550, 557…
$ dep_delay <dbl> 96, -6, 13, -2, 44, 82, 227, -4, 7, 1, 1, 24, 0, -3, -3, -2…
$ arr_time  <int> 235, 738, 548, 800, 325, 747, 936, 1148, 917, 1334, 911, 90…
$ arr_delay <dbl> 70, -23, -4, -23, 43, 88, 219, 15, 24, -6, 4, 12, -12, -16,…
$ carrier   <chr> "AS", "US", "UA", "US", "AS", "DL", "UA", "UA", "UA", "UA",…
$ tail_num  <chr> "N508AS", "N195UW", "N37422", "N547UW", "N762AS", "N806DN",…
$ flight    <int> 145, 1830, 1609, 466, 121, 1823, 1481, 229, 1576, 478, 1569…
$ origin    <chr> "PDX", "SEA", "PDX", "PDX", "SEA", "SEA", "SEA", "PDX", "SE…
$ dest      <chr> "ANC", "CLT", "IAH", "CLT", "ANC", "DTW", "ORD", "IAH", "DE…
$ air_time  <dbl> 194, 252, 201, 251, 201, 224, 202, 217, 136, 268, 130, 122,…
$ distance  <dbl> 1542, 2279, 1825, 2282, 1448, 1927, 1721, 1825, 1024, 2402,…
$ hour      <dbl> 0, 0, 0, 0, 0, 0, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6,…
$ minute    <dbl> 1, 4, 8, 28, 34, 37, 46, 26, 27, 36, 41, 49, 50, 57, 57, 58…

5.1.2 dplyr::filter

# flights taking off from PDX
pdx <- flights %>% 
  filter(origin == "PDX")
head(pdx)
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
2014 1 1 1 96 235 70 AS N508AS 145 PDX ANC 194 1.54e+03 0 1
2014 1 1 8 13 548 -4 UA N37422 1609 PDX IAH 201 1.82e+03 0 8
2014 1 1 28 -2 800 -23 US N547UW 466 PDX CLT 251 2.28e+03 0 28
2014 1 1 526 -4 1148 15 UA N813UA 229 PDX IAH 217 1.82e+03 5 26
2014 1 1 541 1 911 4 UA N36476 1569 PDX DEN 130 991        5 41
2014 1 1 549 24 907 12 US N548UW 649 PDX PHX 122 1.01e+03 5 49
# january flights from PDX
pdx_jan <- flights %>% 
  filter(origin == "PDX", month == 1) # the comma is an "and"
head(pdx_jan)
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
2014 1 1 1 96 235 70 AS N508AS 145 PDX ANC 194 1.54e+03 0 1
2014 1 1 8 13 548 -4 UA N37422 1609 PDX IAH 201 1.82e+03 0 8
2014 1 1 28 -2 800 -23 US N547UW 466 PDX CLT 251 2.28e+03 0 28
2014 1 1 526 -4 1148 15 UA N813UA 229 PDX IAH 217 1.82e+03 5 26
2014 1 1 541 1 911 4 UA N36476 1569 PDX DEN 130 991        5 41
2014 1 1 549 24 907 12 US N548UW 649 PDX PHX 122 1.01e+03 5 49
# flights to ATL (Atlanta) or BNA (Nashville)
to_south <- flights %>% 
  filter(dest == "ATL" | dest == "BNA") %>% # | is "or"
  select(origin, dest, everything())
head(to_south)
origin dest year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight air_time distance hour minute
SEA ATL 2014 1 1 624 -6 1401 -6 DL N617DL 968 235 2.18e+03 6 24
SEA ATL 2014 1 1 802 -3 1533 -17 AS N532AS 742 249 2.18e+03 8 2
SEA ATL 2014 1 1 824 -1 1546 -14 DL N633DL 128 234 2.18e+03 8 24
PDX ATL 2014 1 1 944 -6 1727 -8 AS N548AS 752 252 2.17e+03 9 44
PDX ATL 2014 1 1 1054 94 1807 84 DL N377DA 1502 237 2.17e+03 10 54
SEA ATL 2014 1 1 1158 6 1915 -14 DL N6712B 1962 234 2.18e+03 11 58
# flights from PDX to ATL (Atlanta) or BNA (Nashville)
pdx_to_south <- flights %>% 
  filter(origin == "PDX", dest == "ATL" | dest == "BNA") %>% # | is "or"
  select(origin, dest, everything())
head(pdx_to_south)
origin dest year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight air_time distance hour minute
PDX ATL 2014 1 1 944 -6 1727 -8 AS N548AS 752 252 2.17e+03 9 44
PDX ATL 2014 1 1 1054 94 1807 84 DL N377DA 1502 237 2.17e+03 10 54
PDX ATL 2014 1 1 1323 -2 2038 -15 DL N393DA 773 235 2.17e+03 13 23
PDX ATL 2014 1 1 2253 8 611 4 DL N371DA 503 240 2.17e+03 22 53
PDX ATL 2014 1 2 627 -3 1350 -7 DL N3746H 1156 244 2.17e+03 6 27
PDX ATL 2014 1 2 918 -2 1643 -2 DL N3756 1940 249 2.17e+03 9 18
# alternatively, using group membership
south_dests <- c("ATL", "BNA")
pdx_to_south2 <- flights %>% 
  filter(origin == "PDX", dest %in% south_dests) %>% 
  select(origin, dest, everything())
head(pdx_to_south2)
origin dest year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight air_time distance hour minute
PDX ATL 2014 1 1 944 -6 1727 -8 AS N548AS 752 252 2.17e+03 9 44
PDX ATL 2014 1 1 1054 94 1807 84 DL N377DA 1502 237 2.17e+03 10 54
PDX ATL 2014 1 1 1323 -2 2038 -15 DL N393DA 773 235 2.17e+03 13 23
PDX ATL 2014 1 1 2253 8 611 4 DL N371DA 503 240 2.17e+03 22 53
PDX ATL 2014 1 2 627 -3 1350 -7 DL N3746H 1156 244 2.17e+03 6 27
PDX ATL 2014 1 2 918 -2 1643 -2 DL N3756 1940 249 2.17e+03 9 18
# flights delayed by 1 hour or more
delay_1plus <- flights %>%
  filter(dep_delay >= 60)
head(delay_1plus)
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
2014 1 1 1 96 235 70 AS N508AS 145 PDX ANC 194 1.54e+03 0 1
2014 1 1 37 82 747 88 DL N806DN 1823 SEA DTW 224 1.93e+03 0 37
2014 1 1 346 227 936 219 UA N14219 1481 SEA ORD 202 1.72e+03 3 46
2014 1 1 650 90 1037 91 US N626AW 460 SEA PHX 141 1.11e+03 6 50
2014 1 1 959 164 1137 157 AS N534AS 805 SEA SMF 77 605        9 59
2014 1 1 1008 68 1242 64 AS N788AS 456 SEA LAX 129 954        10 8
# flights delayed by 1 hour, but not more than 2 hours
delay_1hr <- flights %>%
  filter(dep_delay >= 60, dep_delay < 120)
head(delay_1hr)
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
2014 1 1 1 96 235 70 AS N508AS 145 PDX ANC 194 1.54e+03 0 1
2014 1 1 37 82 747 88 DL N806DN 1823 SEA DTW 224 1.93e+03 0 37
2014 1 1 650 90 1037 91 US N626AW 460 SEA PHX 141 1.11e+03 6 50
2014 1 1 1008 68 1242 64 AS N788AS 456 SEA LAX 129 954        10 8
2014 1 1 1014 75 1613 81 UA N37408 1444 SEA ORD 201 1.72e+03 10 14
2014 1 1 1036 81 1408 63 OO N218AG 3466 PDX TUS 134 1.12e+03 10 36
range(delay_1hr$dep_delay, na.rm = TRUE)
[1]  60 119
# even more efficient using between (always inclusive)
delay_bwn <- flights %>%
  filter(between(dep_delay, 60, 119))
head(delay_bwn)
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
2014 1 1 1 96 235 70 AS N508AS 145 PDX ANC 194 1.54e+03 0 1
2014 1 1 37 82 747 88 DL N806DN 1823 SEA DTW 224 1.93e+03 0 37
2014 1 1 650 90 1037 91 US N626AW 460 SEA PHX 141 1.11e+03 6 50
2014 1 1 1008 68 1242 64 AS N788AS 456 SEA LAX 129 954        10 8
2014 1 1 1014 75 1613 81 UA N37408 1444 SEA ORD 201 1.72e+03 10 14
2014 1 1 1036 81 1408 63 OO N218AG 3466 PDX TUS 134 1.12e+03 10 36
range(delay_bwn$dep_delay, na.rm = TRUE)
[1]  60 119

5.1.3 dplyr::arrange

# default is ascending order
flights %>% 
  arrange(year, month, day) %>% head(n=20)
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
2014 1 1 1 96 235 70 AS N508AS 145 PDX ANC 194 1.54e+03 0 1
2014 1 1 4 -6 738 -23 US N195UW 1830 SEA CLT 252 2.28e+03 0 4
2014 1 1 8 13 548 -4 UA N37422 1609 PDX IAH 201 1.82e+03 0 8
2014 1 1 28 -2 800 -23 US N547UW 466 PDX CLT 251 2.28e+03 0 28
2014 1 1 34 44 325 43 AS N762AS 121 SEA ANC 201 1.45e+03 0 34
2014 1 1 37 82 747 88 DL N806DN 1823 SEA DTW 224 1.93e+03 0 37
2014 1 1 346 227 936 219 UA N14219 1481 SEA ORD 202 1.72e+03 3 46
2014 1 1 526 -4 1148 15 UA N813UA 229 PDX IAH 217 1.82e+03 5 26
2014 1 1 527 7 917 24 UA N75433 1576 SEA DEN 136 1.02e+03 5 27
2014 1 1 536 1 1334 -6 UA N574UA 478 SEA EWR 268 2.4e+03  5 36
2014 1 1 541 1 911 4 UA N36476 1569 PDX DEN 130 991        5 41
2014 1 1 549 24 907 12 US N548UW 649 PDX PHX 122 1.01e+03 5 49
2014 1 1 550 0 837 -12 DL N660DL 1634 SEA SLC 82 689        5 50
2014 1 1 557 -3 1134 -16 AA N3JLAA 1094 SEA DFW 184 1.66e+03 5 57
2014 1 1 557 -3 825 -25 AS N562AS 81 SEA ANC 188 1.45e+03 5 57
2014 1 1 558 -2 801 -2 AS N402AS 200 SEA SJC 100 697        5 58
2014 1 1 559 -1 916 -9 F9 N210FR 796 PDX DEN 125 991        5 59
2014 1 1 600 0 1151 -19 AA N3JFAA 2240 SEA ORD 206 1.72e+03 6 0
2014 1 1 600 -10 842 -8 AS N786AS 426 SEA LAX 125 954        6 0
2014 1 1 602 -3 943 5 F9 N201FR 144 SEA DEN 129 1.02e+03 6 2
# descending order
flights %>% 
  arrange(desc(year), desc(month), desc(day)) %>% head(n=20)
year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
2014 12 31 2 12 601 31 AA N3JKAA 1230 SEA DFW 210 1.66e+03 0 2
2014 12 31 27 -3 623 3 AA N3EWAA 1431 SEA ORD 211 1.72e+03 0 27
2014 12 31 39 14 324 4 AS N762AS 135 SEA ANC 207 1.45e+03 0 39
2014 12 31 40 0 549 0 DL N757AT 2440 SEA MSP 174 1.4e+03  0 40
2014 12 31 52 -8 917 -21 AA N3JFAA 371 SEA MIA 306 2.72e+03 0 52
2014 12 31 54 4 621 17 DL N128DL 1670 PDX MSP 174 1.43e+03 0 54
2014 12 31 56 61 848 80 DL N655DL 929 SEA ATL 249 2.18e+03 0 56
2014 12 31 512 -3 904 4 US N653AW 480 SEA PHX 154 1.11e+03 5 12
2014 12 31 515 -5 855 5 US N580UW 425 PDX PHX 144 1.01e+03 5 15
2014 12 31 534 4 859 7 UA N34460 1075 PDX DEN 125 991        5 34
2014 12 31 546 1 916 -4 WN N8323C 757 PDX DEN 126 991        5 46
2014 12 31 548 -2 1351 -13 UA N461UA 665 PDX EWR 287 2.43e+03 5 48
2014 12 31 549 4 1208 12 UA N68807 1457 SEA IAH 223 1.87e+03 5 49
2014 12 31 550 0 922 2 WN N797MX 2121 PDX PHX 138 1.01e+03 5 50
2014 12 31 551 -4 1202 12 AA N3HXAA 1094 SEA DFW 211 1.66e+03 5 51
2014 12 31 551 -9 744 -15 AS N570AS 342 SEA OAK 91 671        5 51
2014 12 31 555 -10 824 -1 AS N548AS 602 SEA LAS 124 867        5 55
2014 12 31 558 -2 849 0 DL N668DN 1831 PDX SLC 85 630        5 58
2014 12 31 558 -2 1149 4 AA N436AA 1534 PDX DFW 211 1.62e+03 5 58
2014 12 31 558 -2 738 -4 AS N585AS 406 PDX SJC 80 569        5 58

5.1.4 dplyr::distinct

# unique origin-dest combinations
flights %>% 
  select(origin, dest) %>% 
  distinct %>% head(n=50)
origin dest
PDX ANC
SEA CLT
PDX IAH
PDX CLT
SEA ANC
SEA DTW
SEA ORD
SEA DEN
SEA EWR
PDX DEN
PDX PHX
SEA SLC
SEA DFW
SEA SJC
SEA LAX
PDX ORD
SEA OAK
SEA SFO
PDX SJC
SEA SNA
SEA SAN
PDX DFW
PDX EWR
SEA IAH
SEA ATL
PDX BUR
SEA MDW
PDX LAX
PDX SAN
PDX IAD
SEA PSP
PDX MDW
SEA MSP
PDX SNA
SEA PHX
PDX HNL
SEA JFK
PDX MSP
PDX SFO
SEA LAS
SEA LGB
PDX ONT
PDX LGB
PDX JFK
PDX KOA
PDX LAS
SEA ONT
SEA MKE
SEA BUR
SEA KTN
# all unique destinations from PDX (there are 49, so we'll just list the first few)
from_pdx <- flights %>% 
  filter(origin == "PDX") %>% 
  select(origin, dest) %>%
  distinct(dest)
head(from_pdx)
dest
ANC
IAH
CLT
DEN
PHX
ORD

5.1.5 dplyr::mutate

# add total delay variable
flights %>%
  mutate(tot_delay = dep_delay + arr_delay) %>%
  select(origin, dest, ends_with("delay"), everything()) %>% 
  head(n=100)
origin dest dep_delay arr_delay tot_delay year month day dep_time arr_time carrier tailnum flight air_time distance hour minute
PDX ANC 96 70 166 2014 1 1 1 235 AS N508AS 145 194 1.54e+03 0 1
SEA CLT -6 -23 -29 2014 1 1 4 738 US N195UW 1830 252 2.28e+03 0 4
PDX IAH 13 -4 9 2014 1 1 8 548 UA N37422 1609 201 1.82e+03 0 8
PDX CLT -2 -23 -25 2014 1 1 28 800 US N547UW 466 251 2.28e+03 0 28
SEA ANC 44 43 87 2014 1 1 34 325 AS N762AS 121 201 1.45e+03 0 34
SEA DTW 82 88 170 2014 1 1 37 747 DL N806DN 1823 224 1.93e+03 0 37
SEA ORD 227 219 446 2014 1 1 346 936 UA N14219 1481 202 1.72e+03 3 46
PDX IAH -4 15 11 2014 1 1 526 1148 UA N813UA 229 217 1.82e+03 5 26
SEA DEN 7 24 31 2014 1 1 527 917 UA N75433 1576 136 1.02e+03 5 27
SEA EWR 1 -6 -5 2014 1 1 536 1334 UA N574UA 478 268 2.4e+03  5 36
PDX DEN 1 4 5 2014 1 1 541 911 UA N36476 1569 130 991        5 41
PDX PHX 24 12 36 2014 1 1 549 907 US N548UW 649 122 1.01e+03 5 49
SEA SLC 0 -12 -12 2014 1 1 550 837 DL N660DL 1634 82 689        5 50
SEA DFW -3 -16 -19 2014 1 1 557 1134 AA N3JLAA 1094 184 1.66e+03 5 57
SEA ANC -3 -25 -28 2014 1 1 557 825 AS N562AS 81 188 1.45e+03 5 57
SEA SJC -2 -2 -4 2014 1 1 558 801 AS N402AS 200 100 697        5 58
PDX DEN -1 -9 -10 2014 1 1 559 916 F9 N210FR 796 125 991        5 59
SEA ORD 0 -19 -19 2014 1 1 600 1151 AA N3JFAA 2240 206 1.72e+03 6 0
SEA LAX -10 -8 -18 2014 1 1 600 842 AS N786AS 426 125 954        6 0
SEA DEN -3 5 2 2014 1 1 602 943 F9 N201FR 144 129 1.02e+03 6 2
PDX ORD -3 7 4 2014 1 1 602 1204 UA N77295 1573 203 1.74e+03 6 2
SEA OAK -2 -17 -19 2014 1 1 603 755 AS N577AS 344 94 672        6 3
SEA ORD -3 -2 -5 2014 1 1 603 1202 UA N37290 1136 203 1.72e+03 6 3
SEA SFO -4 -19 -23 2014 1 1 606 806 AS N525AS 308 98 679        6 6
PDX SJC 6 3 9 2014 1 1 606 746 AS N796AS 406 87 569        6 6
SEA SNA -1 -2 -3 2014 1 1 614 850 AS N607AS 524 135 978        6 14
SEA SAN 2 -12 -10 2014 1 1 617 850 AS N579AS 494 131 1.05e+03 6 17
PDX DFW -2 -30 -32 2014 1 1 618 1135 AA N4XMAA 1650 184 1.62e+03 6 18
SEA SFO -6 -7 -13 2014 1 1 619 822 VX N841VA 751 101 679        6 19
SEA LAX -2 0 -2 2014 1 1 620 905 OO N804SK 4693 125 954        6 20
PDX EWR 2 -19 -17 2014 1 1 622 1412 UA N36472 1021 270 2.43e+03 6 22
SEA IAH 13 -4 9 2014 1 1 623 1218 UA N72405 1101 213 1.87e+03 6 23
SEA ATL -6 -6 -12 2014 1 1 624 1401 DL N617DL 968 235 2.18e+03 6 24
SEA DEN -9 -1 -10 2014 1 1 629 1014 UA N566UA 883 132 1.02e+03 6 29
PDX BUR -10 -14 -24 2014 1 1 630 834 OO N219AG 3456 105 817        6 30
SEA MDW -3 5 2 2014 1 1 632 1235 WN N8302F 3694 204 1.73e+03 6 32
PDX LAX -8 -7 -15 2014 1 1 637 858 AS N408AS 560 108 834        6 37
PDX SAN -3 -6 -9 2014 1 1 637 854 AS N588AS 572 120 933        6 37
PDX PHX -2 -5 -7 2014 1 1 638 1003 AS N508AS 646 125 1.01e+03 6 38
PDX IAD 10 -4 6 2014 1 1 638 1408 UA N526UA 250 253 2.33e+03 6 38
SEA PSP -1 3 2 2014 1 1 639 918 AS N794AS 556 130 987        6 39
SEA DFW -1 -10 -11 2014 1 1 639 1216 AS N442AS 668 189 1.66e+03 6 39
PDX MDW 4 -11 -7 2014 1 1 639 1219 WN N8308K 401 207 1.75e+03 6 39
SEA LAX -3 -2 -5 2014 1 1 647 923 AS N315AS 470 124 954        6 47
SEA DEN -3 19 16 2014 1 1 647 1038 AS N553AS 682 136 1.02e+03 6 47
SEA MSP -7 -9 -16 2014 1 1 648 1203 AS N419AS 36 164 1.4e+03  6 48
PDX SNA -12 -11 -23 2014 1 1 648 910 AS N611AS 582 122 859        6 48
SEA PHX 90 91 181 2014 1 1 650 1037 US N626AW 460 141 1.11e+03 6 50
PDX HNL -6 -33 -39 2014 1 1 654 1047 AS N586AS 833 327 2.6e+03  6 54
SEA JFK -6 -10 -16 2014 1 1 654 1455 DL N686DA 418 273 2.42e+03 6 54
PDX MSP -5 -7 -12 2014 1 1 655 1210 DL N963DN 2090 168 1.43e+03 6 55
PDX ORD -4 -28 -32 2014 1 1 656 1242 AA N3KMAA 2218 200 1.74e+03 6 56
PDX SFO -4 8 4 2014 1 1 656 853 VX N523VA 805 82 550        6 56
PDX SFO 0 -1 -1 2014 1 1 700 844 UA N479UA 519 80 550        7 0
SEA LAS -4 -6 -10 2014 1 1 701 918 AS N618AS 606 115 867        7 1
SEA PHX 1 -9 -8 2014 1 1 701 1036 WN N8318F 669 133 1.11e+03 7 1
SEA LAX -3 -8 -11 2014 1 1 702 932 VX N840VA 1780 127 954        7 2
SEA LGB -8 -2 -10 2014 1 1 702 940 OO N217AG 3472 130 965        7 2
SEA DFW -6 -20 -26 2014 1 1 704 1245 AA N3KNAA 1308 188 1.66e+03 7 4
PDX ONT -1 -9 -10 2014 1 1 704 910 OO N223AG 3460 106 838        7 4
PDX LGB 0 9 9 2014 1 1 705 931 B6 N627JB 1121 113 846        7 5
SEA SLC 6 -13 -7 2014 1 1 706 952 AS N306AS 726 86 689        7 6
SEA JFK -7 -19 -26 2014 1 1 708 1510 AA N3DNAA 236 281 2.42e+03 7 8
PDX JFK -2 -20 -22 2014 1 1 708 1453 DL N3772H 2258 267 2.45e+03 7 8
SEA SJC -1 -5 -6 2014 1 1 709 915 AS N558AS 332 103 697        7 9
SEA MSP 9 -4 5 2014 1 1 709 1209 DL N374NW 2314 160 1.4e+03  7 9
PDX SFO -5 -21 -26 2014 1 1 710 906 AS N309AS 244 82 550        7 10
SEA SLC -3 -1 -4 2014 1 1 712 1013 DL N355NW 2397 85 689        7 12
SEA IAH 20 10 30 2014 1 1 713 1317 UA N854UA 461 218 1.87e+03 7 13
SEA LAX 0 -10 -10 2014 1 1 715 944 AS N617AS 478 127 954        7 15
PDX KOA -5 -47 -52 2014 1 1 715 1055 AS N536AS 855 326 2.61e+03 7 15
SEA PHX -3 -11 -14 2014 1 1 717 1057 AS N769AS 642 134 1.11e+03 7 17
PDX LAS -3 3 0 2014 1 1 722 933 WN N407WN 947 101 763        7 22
SEA DEN 1 17 18 2014 1 1 726 1117 WN N603SW 2181 133 1.02e+03 7 26
SEA ONT -2 -7 -9 2014 1 1 728 951 AS N320AS 530 124 956        7 28
SEA SFO 5 -9 -4 2014 1 1 735 940 UA N16732 1114 104 679        7 35
SEA SAN 14 -2 12 2014 1 1 739 1003 AS N518AS 240 132 1.05e+03 7 39
SEA SFO 31 19 50 2014 1 1 741 947 AS N431AS 222 96 679        7 41
SEA MKE 5 7 12 2014 1 1 745 1327 WN N924WN 3872 200 1.69e+03 7 45
SEA BUR -6 -3 -9 2014 1 1 749 1013 AS N627AS 522 122 937        7 49
PDX LAX -6 -14 -20 2014 1 1 754 1011 AS N706AS 568 111 834        7 54
SEA KTN 52 45 97 2014 1 1 757 855 AS N705AS 65 100 680        7 57
PDX OAK -4 -18 -22 2014 1 1 801 932 WN N447WN 48 78 543        8 1
SEA ATL -3 -17 -20 2014 1 1 802 1533 AS N532AS 742 249 2.18e+03 8 2
SEA LAS -1 -8 -9 2014 1 1 809 1021 AS N557AS 600 115 867        8 9
PDX MSP -6 -5 -11 2014 1 1 809 1326 DL N951DN 1588 170 1.43e+03 8 9
SEA ORD -10 -11 -21 2014 1 1 810 1419 AA N3DSAA 1236 202 1.72e+03 8 10
SEA DCA 0 -2 -2 2014 1 1 810 1604 AS N514AS 4 259 2.33e+03 8 10
SEA ANC 17 -4 13 2014 1 1 817 1045 AS N529AS 83 184 1.45e+03 8 17
SEA ABQ -3 -7 -10 2014 1 1 817 1158 WN N379SW 253 138 1.18e+03 8 17
PDX DFW 14 -7 7 2014 1 1 819 1343 AA N4XPAA 1220 178 1.62e+03 8 19
SEA IAD 10 -8 2 2014 1 1 821 1552 UA N514UA 419 250 2.31e+03 8 21
SEA ATL -1 -14 -15 2014 1 1 824 1546 DL N633DL 128 234 2.18e+03 8 24
SEA DFW -5 -3 -8 2014 1 1 825 1412 AA N3BDAA 312 183 1.66e+03 8 25
SEA MCO 25 30 55 2014 1 1 825 1650 AS N551AS 16 301 2.55e+03 8 25
SEA ORD -5 15 10 2014 1 1 825 1436 AS N305AS 28 202 1.72e+03 8 25
SEA MSP 0 10 10 2014 1 1 825 1352 DL N584NW 588 162 1.4e+03  8 25
PDX IAH 2 -16 -14 2014 1 1 827 1414 UA N38446 1510 210 1.82e+03 8 27
SEA OGG -1 -22 -23 2014 1 1 829 1228 AS N563AS 861 338 2.64e+03 8 29
SEA PHX 0 -12 -12 2014 1 1 830 1206 US N667AW 440 132 1.11e+03 8 30
# flights that were delayed at departure had on time or early arrivals?
arrivals <- flights %>%
  mutate(arr_ok = ifelse(dep_delay > 0 & arr_delay <= 0, 1, 0)) %>% 
  select(origin, dest, ends_with("delay"), carrier, arr_ok)

# peek at it
arrivals %>%
  filter(arr_ok == 1) %>%
  head
origin dest dep_delay arr_delay carrier arr_ok
PDX IAH 13 -4 UA 1
SEA EWR 1 -6 UA 1
SEA SAN 2 -12 AS 1
PDX EWR 2 -19 UA 1
SEA IAH 13 -4 UA 1
PDX IAD 10 -4 UA 1

5.1.6 dplyr::summarise (or dplyr::summarize)

flights %>%
  summarise(mean(dep_delay, na.rm = TRUE))
mean(dep_delay, na.rm = TRUE)
6.13
# we can also name that variable, and summarise multiple variables
flights %>%
  summarise(mean_delay = mean(dep_delay, na.rm = TRUE),
            sd_delay = sd(dep_delay, na.rm = TRUE),
            median_delay = median(dep_delay, na.rm = TRUE))
mean_delay sd_delay median_delay
6.13 29.1 -2

But this can get tedious with multiple summaries…

flights %>%
  filter(!is.na(dep_delay)) %>%
  select(dep_delay) %>%
  summarise_each(lst(mean, sd, median))
mean sd median
6.13 29.1 -2
# same thing
flights %>%
  filter(!is.na(dep_delay)) %>%
  summarise_each(lst(mean, sd, median), dep_delay)
mean sd median
6.13 29.1 -2
# combine with gather, change names too
flights %>%
  filter(!is.na(dep_delay)) %>%
  summarise_each(lst(mean, stdev = sd, median), dep_delay) %>%
  gather(delay_stat, value)
delay_stat value
mean 6.13
stdev 29.1 
median -2   

Using aggregating functions in summarise

# how many unique destinations?
summary_table <- flights %>% 
  summarise(tot_flights = n(),
            tot_planes = n_distinct(tailnum),
            tot_carriers = n_distinct(carrier),
            tot_dests = n_distinct(dest),
            tot_origins = n_distinct(origin))

summary_table
tot_flights tot_planes tot_carriers tot_dests tot_origins
162049 3023 11 71 2
# chain with tidyr functions
summary_table %>% 
  gather(key, value) %>% 
  separate(key, into = c("tot", "entity")) %>% 
  select(-tot, total = value)
entity total
flights 162049
planes 3023
carriers 11
dests 71
origins 2

5.2 tidyr

We’ll work with a made up dataframe:

df <- data.frame(
  id = 1:10,
  date = as.Date('2015-01-01') + 0:9,
  q1_m1_w1 = rnorm(10, 0, 1),
  q1_m1_w2 = rnorm(10, 0, 1),
  q1_m2_w3 = rnorm(10, 0, 1),
  q2_m1_w1 = rnorm(10, 0, 1),
  q2_m2_w1 = rnorm(10, 0, 1),
  q2_m2_w2 = rnorm(10, 0, 1)
)
# HLO
head(df)
id date q1_m1_w1 q1_m1_w2 q1_m2_w3 q2_m1_w1 q2_m2_w1 q2_m2_w2
1 2015-01-01 0.908  -1.06  0.259 -0.0417 0.312 0.96  
2 2015-01-02 -0.8    0.378 -0.74  -0.888  -0.474 1.13  
3 2015-01-03 0.373  0.12  -0.954 -0.767  -0.789 -0.0994
4 2015-01-04 -0.0835 -1.18  0.336 -1.02   -0.808 -0.201 
5 2015-01-05 -0.261  -0.47  1.12  1.24   0.423 -0.781 
6 2015-01-06 0.16   0.264 0.939 -1.08   0.17  0.0755
glimpse(df)
Rows: 10
Columns: 8
$ id       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ date     <date> 2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05,…
$ q1_m1_w1 <dbl> 0.90833895, -0.80014146, 0.37278036, -0.08348309, -0.2609710…
$ q1_m1_w2 <dbl> -1.0636021, 0.3780610, 0.1200822, -1.1750064, -0.4698987, 0.…
$ q1_m2_w3 <dbl> 0.2585625, -0.7404383, -0.9540600, 0.3363360, 1.1159808, 0.9…
$ q2_m1_w1 <dbl> -0.04166279, -0.88810449, -0.76684055, -1.02118865, 1.243960…
$ q2_m2_w1 <dbl> 0.3118560, -0.4739947, -0.7892439, -0.8077828, 0.4226551, 0.…
$ q2_m2_w2 <dbl> 0.96043415, 1.13194728, -0.09936716, -0.20098062, -0.7809419…

5.2.1 tidyr::gather

First, let’s gather…

df_tidy <- df %>%
  gather(key, value, q1_m1_w1:q2_m2_w2)
head(df_tidy)
id date key value
1 2015-01-01 q1_m1_w1 0.908 
2 2015-01-02 q1_m1_w1 -0.8   
3 2015-01-03 q1_m1_w1 0.373 
4 2015-01-04 q1_m1_w1 -0.0835
5 2015-01-05 q1_m1_w1 -0.261 
6 2015-01-06 q1_m1_w1 0.16  

Now let’s gather using subtraction…

df_tidy <- df %>%
  gather(key, value, -id, -date)
head(df_tidy)
id date key value
1 2015-01-01 q1_m1_w1 0.908 
2 2015-01-02 q1_m1_w1 -0.8   
3 2015-01-03 q1_m1_w1 0.373 
4 2015-01-04 q1_m1_w1 -0.0835
5 2015-01-05 q1_m1_w1 -0.261 
6 2015-01-06 q1_m1_w1 0.16  

5.2.2 tidyr::separate

# separate 1 col into 3 cols
df_sep <- df_tidy %>%
  separate(key, into = c("quarter", "month", "week"))
head(df_sep)
id date quarter month week value
1 2015-01-01 q1 m1 w1 0.908 
2 2015-01-02 q1 m1 w1 -0.8   
3 2015-01-03 q1 m1 w1 0.373 
4 2015-01-04 q1 m1 w1 -0.0835
5 2015-01-05 q1 m1 w1 -0.261 
6 2015-01-06 q1 m1 w1 0.16  
# separate 1 col into 2 cols
df_sep2 <- df_tidy %>%
  separate(key, into = c("quarter", "period"), extra = "merge")
head(df_sep2)

id date quarter period value
1 2015-01-01 q1 m1_w1 0.908 
2 2015-01-02 q1 m1_w1 -0.8   
3 2015-01-03 q1 m1_w1 0.373 
4 2015-01-04 q1 m1_w1 -0.0835
5 2015-01-05 q1 m1_w1 -0.261 
6 2015-01-06 q1 m1_w1 0.16  
stringr vs. tidyr separate by regular expression

5.2.3 tidyr::extract

Extract is essentially the same as separate, let’s see how…

# extract
df_ext <- df_sep2 %>%
  extract(period, into = "month")
head(df_ext)
id date quarter month value
1 2015-01-01 q1 m1 0.908 
2 2015-01-02 q1 m1 -0.8   
3 2015-01-03 q1 m1 0.373 
4 2015-01-04 q1 m1 -0.0835
5 2015-01-05 q1 m1 -0.261 
6 2015-01-06 q1 m1 0.16  
# this gives us same output as separate
df_ext <- df_sep2 %>%
  extract(period, into = c("month", "week"), 
          regex = "([[:alnum:]]+)_([[:alnum:]]+)")
head(df_ext)
id date quarter month week value
1 2015-01-01 q1 m1 w1 0.908 
2 2015-01-02 q1 m1 w1 -0.8   
3 2015-01-03 q1 m1 w1 0.373 
4 2015-01-04 q1 m1 w1 -0.0835
5 2015-01-05 q1 m1 w1 -0.261 
6 2015-01-06 q1 m1 w1 0.16  

5.2.4 tidyr::unite

# let's say we want to combine quarter and month with an underscore
df_uni <- df_sep %>%
  unite(period, quarter:month) # sep = "_" is the default arg
head(df_uni)
id date period week value
1 2015-01-01 q1_m1 w1 0.908 
2 2015-01-02 q1_m1 w1 -0.8   
3 2015-01-03 q1_m1 w1 0.373 
4 2015-01-04 q1_m1 w1 -0.0835
5 2015-01-05 q1_m1 w1 -0.261 
6 2015-01-06 q1_m1 w1 0.16  
# let's say we want to combine quarter and month with nothing
df_uni <- df_sep %>%
  unite(period, quarter:month, sep = "")
head(df_uni)
id date period week value
1 2015-01-01 q1m1 w1 0.908 
2 2015-01-02 q1m1 w1 -0.8   
3 2015-01-03 q1m1 w1 0.373 
4 2015-01-04 q1m1 w1 -0.0835
5 2015-01-05 q1m1 w1 -0.261 
6 2015-01-06 q1m1 w1 0.16  

5.2.5 tidyr::spread

# finally let's spread
df_spread <- df_uni %>%
  spread(week, value) # fill = NA is default arg
head(df_spread)
id date period w1 w2 w3
1 2015-01-01 q1m1 0.908  -1.06      
1 2015-01-01 q1m2            0.259
1 2015-01-01 q2m1 -0.0417          
1 2015-01-01 q2m2 0.312  0.96      
2 2015-01-02 q1m1 -0.8    0.378     
2 2015-01-02 q1m2            -0.74 

5.2.6 Gather multiple sets of columns (gather() %>% separate() %>% spread())

Gather multiple sets of columns

All in one, if we had wanted to essentially “gather” three sets of columns (here, one for each week)…

df_tidiest <- df %>%
  gather(key, value, -id, -date) %>%
  separate(key, into = c("quarter", "month", "week")) %>%
  spread(week, value)
head(df_tidiest)
id date quarter month w1 w2 w3
1 2015-01-01 q1 m1 0.908  -1.06      
1 2015-01-01 q1 m2            0.259
1 2015-01-01 q2 m1 -0.0417          
1 2015-01-01 q2 m2 0.312  0.96      
2 2015-01-02 q1 m1 -0.8    0.378     
2 2015-01-02 q1 m2            -0.74 

5.3 broom

“The broom package takes the messy output of built-in functions in R, such as lm, nls, or t.test, and turns them into tidy data frames.” So, broom tidies output from other R functions that are un-tidy.

See here for list of functions: https://github.com/dgrtwo/broom

Vignette: ftp://cran.r-project.org/pub/R/web/packages/broom/vignettes/broom.html

fit <- lm(mpg ~ qsec + factor(am) + wt + factor(gear), 
          data = mtcars)

Un-tidy output from lm

summary(fit)

Call:
lm(formula = mpg ~ qsec + factor(am) + wt + factor(gear), data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.5064 -1.5220 -0.7517  1.3841  4.6345 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)     9.3650     8.3730   1.118  0.27359    
qsec            1.2449     0.3828   3.252  0.00317 ** 
factor(am)1     3.1505     1.9405   1.624  0.11654    
wt             -3.9263     0.7428  -5.286 1.58e-05 ***
factor(gear)4  -0.2682     1.6555  -0.162  0.87257    
factor(gear)5  -0.2697     2.0632  -0.131  0.89698    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.55 on 26 degrees of freedom
Multiple R-squared:  0.8498,    Adjusted R-squared:  0.8209 
F-statistic: 29.43 on 5 and 26 DF,  p-value: 6.379e-10

Tidy output from broom

tidy(fit)
term estimate std.error statistic p.value
(Intercept) 9.37  8.37  1.12  0.274   
qsec 1.24  0.383 3.25  0.00317 
factor(am)1 3.15  1.94  1.62  0.117   
wt -3.93  0.743 -5.29  1.58e-05
factor(gear)4 -0.268 1.66  -0.162 0.873   
factor(gear)5 -0.27  2.06  -0.131 0.897   

6 Specialized Packages

6.1 gt

gt is a relatively new package from RStudio designed for publication-ready figures.

library(gt)

Let’s set up a tiny little data table…

prepped_flight_data <- flights %>% filter(dest %in% c("DEN", "DCA", "JFK", "SFO")) %>% 
  group_by(dest) %>% 
  mutate_at(vars(dep_delay), replace_na, replace=0.0) %>%  
  summarise(mean_delay=mean(dep_delay, na.rm=TRUE), median_delay=median(dep_delay, na.rm=TRUE))
glimpse(prepped_flight_data)
Rows: 4
Columns: 3
$ dest         <chr> "DCA", "DEN", "JFK", "SFO"
$ mean_delay   <dbl> 1.475138, 8.753835, 9.235996, 12.996799
$ median_delay <dbl> -3, -1, -1, -1

… and now make a table out of it:

gt_formatted <- prepped_flight_data %>%  gt() %>% 
  tab_header(title="PDX Departure Delays, DCA/DEN/JFK/SFO", subtitle="(Delay in minutes)") %>% 
  fmt_number(columns=vars(mean_delay, median_delay), decimals=2) %>% 
  cols_label(dest="Destination", mean_delay="Mean", median_delay="Median") 
gt_formatted 
PDX Departure Delays, DCA/DEN/JFK/SFO
(Delay in minutes)
Destination Mean Median
DCA 1.48 −3.00
DEN 8.75 −1.00
JFK 9.24 −1.00
SFO 13.00 −1.00

We can adjust things like text alignment after-the-fact:

gt_formatted <- gt_formatted %>% 
  cols_align(align="center", columns=vars(dest)) %>%
  tab_style(
    cell_text(align = "right"), 
    locations = cells_title(groups = c("subtitle"))
  ) %>% 
  tab_style(
    cell_text(style = "italic"),
    locations = cells_title(groups=c("title"))
  )
gt_formatted
PDX Departure Delays, DCA/DEN/JFK/SFO
(Delay in minutes)
Destination Mean Median
DCA 1.48 −3.00
DEN 8.75 −1.00
JFK 9.24 −1.00
SFO 13.00 −1.00

Tip: In addition to cells_title(), there are helpers to select any of the other cell types (e.g. cells_column_labels(), etc.).

Now we can turn it into whatever we need! As Latex:

gt_formatted %>% gtsave("my_table.tex")

As RTF, to put in Word:

gt_formatted %>% gtsave("my_table.rtf")

6.2 tableone

tableone is for generating “Table 1” in your paper. You know, the “Subject Characteristics” table- lots of boring summary statistics.

Vignette: https://cran.r-project.org/web/packages/tableone/vignettes/introduction.html

library(tableone)
CreateTableOne(data = mazes)
                          
                           Overall       
  n                           381        
  study_id (%)                           
     CSLU-001                   4 ( 1.0) 
     CSLU-002                   4 ( 1.0) 
     CSLU-007                   4 ( 1.0) 
     CSLU-010                   4 ( 1.0) 
     CSLU-020                   4 ( 1.0) 
     CSLU-024                   4 ( 1.0) 
     CSLU-027                   4 ( 1.0) 
     CSLU-031                   4 ( 1.0) 
     CSLU-036                   3 ( 0.8) 
     CSLU-046                   4 ( 1.0) 
     CSLU-053                   4 ( 1.0) 
     CSLU-054                   4 ( 1.0) 
     CSLU-059                   4 ( 1.0) 
     CSLU-062                   4 ( 1.0) 
     CSLU-066                   4 ( 1.0) 
     CSLU-073                   4 ( 1.0) 
     CSLU-077                   4 ( 1.0) 
     CSLU-080                   4 ( 1.0) 
     CSLU-082                   3 ( 0.8) 
     CSLU-084                   4 ( 1.0) 
     CSLU-089                   4 ( 1.0) 
     CSLU-095                   3 ( 0.8) 
     CSLU-096                   4 ( 1.0) 
     CSLU-101                   4 ( 1.0) 
     CSLU-104                   4 ( 1.0) 
     CSLU-112                   4 ( 1.0) 
     CSLU-117                   4 ( 1.0) 
     CSLU-119                   4 ( 1.0) 
     CSLU-122                   4 ( 1.0) 
     CSLU-124                   3 ( 0.8) 
     CSLU-142                   4 ( 1.0) 
     CSLU-144                   4 ( 1.0) 
     CSLU-146                   4 ( 1.0) 
     CSLU-154                   4 ( 1.0) 
     CSLU-156                   4 ( 1.0) 
     CSLU-161                   4 ( 1.0) 
     CSLU-163                   4 ( 1.0) 
     CSLU-165                   4 ( 1.0) 
     CSLU-167                   4 ( 1.0) 
     CSLU-180                   4 ( 1.0) 
     CSLU-191                   4 ( 1.0) 
     CSLU-203                   4 ( 1.0) 
     CSLU-204                   4 ( 1.0) 
     CSLU-213                   4 ( 1.0) 
     CSLU-216                   4 ( 1.0) 
     CSLU-220                   4 ( 1.0) 
     CSLU-226                   4 ( 1.0) 
     CSLU-233                   4 ( 1.0) 
     CSLU-238                   4 ( 1.0) 
     CSLU-245                   4 ( 1.0) 
     CSLU-258                   4 ( 1.0) 
     CSLU-259                   4 ( 1.0) 
     CSLU-263                   4 ( 1.0) 
     CSLU-269                   4 ( 1.0) 
     CSLU-274                   4 ( 1.0) 
     CSLU-275                   4 ( 1.0) 
     CSLU-277                   4 ( 1.0) 
     CSLU-284                   4 ( 1.0) 
     CSLU-290                   4 ( 1.0) 
     CSLU-303                   4 ( 1.0) 
     CSLU-306                   4 ( 1.0) 
     CSLU-312                   4 ( 1.0) 
     CSLU-315                   4 ( 1.0) 
     CSLU-316                   4 ( 1.0) 
     CSLU-320                   4 ( 1.0) 
     CSLU-324                   4 ( 1.0) 
     CSLU-332                   4 ( 1.0) 
     CSLU-335                   4 ( 1.0) 
     CSLU-339                   4 ( 1.0) 
     CSLU-348                   4 ( 1.0) 
     CSLU-349                   4 ( 1.0) 
     CSLU-355                   4 ( 1.0) 
     CSLU-359                   4 ( 1.0) 
     CSLU-372                   4 ( 1.0) 
     CSLU-373                   4 ( 1.0) 
     CSLU-375                   4 ( 1.0) 
     CSLU-379                   4 ( 1.0) 
     CSLU-388                   2 ( 0.5) 
     CSLU-389                   4 ( 1.0) 
     CSLU-393                   4 ( 1.0) 
     CSLU-395                   4 ( 1.0) 
     CSLU-417                   4 ( 1.0) 
     CSLU-419                   4 ( 1.0) 
     CSLU-427                   4 ( 1.0) 
     CSLU-432                   3 ( 0.8) 
     CSLU-435                   4 ( 1.0) 
     CSLU-441                   4 ( 1.0) 
     CSLU-442                   4 ( 1.0) 
     CSLU-447                   4 ( 1.0) 
     CSLU-454                   4 ( 1.0) 
     CSLU-460                   4 ( 1.0) 
     CSLU-470                   4 ( 1.0) 
     CSLU-472                   4 ( 1.0) 
     CSLU-477                   4 ( 1.0) 
     CSLU-482                   4 ( 1.0) 
     CSLU-486                   4 ( 1.0) 
     CSLU-499                   4 ( 1.0) 
  ca (mean (SD))             6.83 (1.06) 
  viq (mean (SD))          100.82 (18.74)
  dx (%)                                 
     ASD                      183 (48.0) 
     SLI                       71 (18.6) 
     TD                       127 (33.3) 
  activity (%)                           
     Conversation              94 (24.7) 
     Picture Description       94 (24.7) 
     Play                      96 (25.2) 
     Wordless Picture Book     97 (25.5) 
  content (mean (SD))       18.73 (24.84)
  filler (mean (SD))        11.20 (17.59)
  rep (mean (SD))            6.24 (9.45) 
  rev (mean (SD))            3.79 (4.31) 
  fs (mean (SD))             8.70 (12.76)
  cued (mean (SD))          14.36 (24.22)
  not_cued (mean (SD))      26.77 (31.73)
my_maze_names <- c("Participant", "Age", "Verbal\nIQ", "Group", "Activity", "Content\nMaze", "Filler\nMaze", "Repetition", "Revision", "False\nStart", "Cued", "Not\nCued")
## Vector of variables to summarize
my_num_vars <- c("ca", "viq", "content", "filler", "rep", "rev", "fs", "cued", "not_cued")
## Vector of categorical variables that need transformation
my_cat_vars <- c("dx", "activity")
## Create a TableOne object
tab2 <- CreateTableOne(vars = my_num_vars, data = mazes, factorVars = my_cat_vars)
print(tab2, showAllLevels = TRUE)
                      
                       level Overall       
  n                             381        
  ca (mean (SD))               6.83 (1.06) 
  viq (mean (SD))            100.82 (18.74)
  content (mean (SD))         18.73 (24.84)
  filler (mean (SD))          11.20 (17.59)
  rep (mean (SD))              6.24 (9.45) 
  rev (mean (SD))              3.79 (4.31) 
  fs (mean (SD))               8.70 (12.76)
  cued (mean (SD))            14.36 (24.22)
  not_cued (mean (SD))        26.77 (31.73)
tab3 <- CreateTableOne(vars = my_num_vars, strata = "dx" , data = mazes)
tab3
                      Stratified by dx
                       ASD           SLI           TD             p      test
  n                      183            71            127                    
  ca (mean (SD))        6.74 (1.11)   7.15 (1.00)    6.76 (0.97)   0.015     
  viq (mean (SD))      95.29 (17.62) 86.24 (5.95)  116.94 (12.82) <0.001     
  content (mean (SD))  20.46 (29.73) 17.34 (24.35)  17.00 (15.67)  0.422     
  filler (mean (SD))    7.86 (13.54) 10.56 (16.35)  16.38 (21.84) <0.001     
  rep (mean (SD))       7.25 (11.82)  5.45 (6.86)    5.23 (6.21)   0.134     
  rev (mean (SD))       3.87 (4.85)   3.25 (3.55)    3.98 (3.85)   0.498     
  fs (mean (SD))        9.35 (14.60)  8.63 (15.00)   7.80 (7.55)   0.574     
  cued (mean (SD))     10.66 (21.94) 13.21 (22.54)  20.35 (27.10)  0.002     
  not_cued (mean (SD)) 25.52 (33.49) 25.25 (31.84)  29.41 (29.04)  0.517     

6.3 The DT package

An excellent tutorial on DT is available at https://rstudio.github.io/DT/.

datatable(mazes)

6.4 xtable (best for html)

The xtable is a solution that delivers both HTML and LaTeX. The syntax is very similar to kable:

output <- 
  matrix(sprintf("Content %s", LETTERS[1:4]),
         ncol=2, byrow=TRUE)
colnames(output) <- 
  c("1st header", "2nd header")
rownames(output) <- 
  c("1st row", "2nd row")

print(xtable(output, 
             caption="A test table", 
             align = c("l", "c", "r")), 
      type="html")
<!-- html table generated in R 3.6.3 by xtable 1.8-4 package -->
<!-- Mon May  4 16:21:07 2020 -->
<table border=1>
<caption align="bottom"> A test table </caption>
<tr> <th>  </th> <th> 1st header </th> <th> 2nd header </th>  </tr>
  <tr> <td> 1st row </td> <td align="center"> Content A </td> <td align="right"> Content B </td> </tr>
  <tr> <td> 2nd row </td> <td align="center"> Content C </td> <td align="right"> Content D </td> </tr>
   </table>

Note that to make it knit, you need to specify a chunk option: results = 'asis'

print(xtable(output, 
             caption="A test table", 
             align = c("l", "c", "r")), 
      type="html")
A test table
1st header 2nd header
1st row Content A Content B
2nd row Content C Content D
print(xtable(head(iris)), type = 'html', html.table.attributes = '')
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.10 3.50 1.40 0.20 setosa
2 4.90 3.00 1.40 0.20 setosa
3 4.70 3.20 1.30 0.20 setosa
4 4.60 3.10 1.50 0.20 setosa
5 5.00 3.60 1.40 0.20 setosa
6 5.40 3.90 1.70 0.40 setosa

6.5 pixiedust (best for PDF)

Remember that broom package we used earlier? We can make this table better…

tidy(fit)
term estimate std.error statistic p.value
(Intercept) 9.37  8.37  1.12  0.274   
qsec 1.24  0.383 3.25  0.00317 
factor(am)1 3.15  1.94  1.62  0.117   
wt -3.93  0.743 -5.29  1.58e-05
factor(gear)4 -0.268 1.66  -0.162 0.873   
factor(gear)5 -0.27  2.06  -0.131 0.897   

https://cran.r-project.org/web/packages/pixiedust/vignettes/pixiedust.html

http://www.suchanutter.net/pixiedust/index.html

dust(fit) %>% 
  sprinkle(cols = "term", 
           replace = c("Intercept", "Quarter Mile Time", "Automatic vs. Manual",
                       "Weight", "Gears: 4 vs. 3", "Gears: 5 vs 3")) %>%
  sprinkle(cols = c("estimate", "std.error", "statistic"),
           round = 3) %>% 
  sprinkle(cols = "p.value", fn = quote(pvalString(value))) %>% 
  sprinkle_colnames("Term", "Coefficient", "SE", "T-statistic", "P-value")
Term Coefficient SE T-statistic P-value
Intercept 9.365 8.373 1.118 0.27
Quarter Mile Time 1.245 0.383 3.252 0.003
Automatic vs. Manual 3.151 1.941 1.624 0.12
Weight -3.926 0.743 -5.286 < 0.001
Gears: 4 vs. 3 -0.268 1.655 -0.162 0.87
Gears: 5 vs 3 -0.27 2.063 -0.131 0.9


7 Finally, fonts!

7.1 Your friend, extrafont

The extrafont package is the best place to start with fonts in R and ggplot.

library(extrafont)

Note: Follow all installation instructions from github!

You can access fonts on your local system in ggplot by using the theme() function to set the relevant properties of your figure.

demo_plt <- ggplot(iris, aes(Sepal.Length, Sepal.Width, color=Species)) + 
  geom_point(size=3) +
  scale_color_discrete(labels=c("Setosa", "Versicolor", "Virginica")) + 
  labs(x="Sepal Length (cm)", y="Sepal Width (cm)", title="Fisher's Irises") 
demo_plt

Let’s try changing that font a bit:

demo_plt + theme(text=element_text(family="Optima"))

demo_plt + theme(text=element_text(family="Palatino"))

demo_plt + theme(text=element_text(family="Zapfino"))

Note that the specific name that you must use to refer to the font is not exactly obvious. It is not the font name, but rather the font’s “family name”. Because of some under-the-hood details about the way that fonts work, this is a bit different from what you might see in e.g. a font selection menu in Word. You can see a list of all the fonts that R knows about by using the fonttable() command:

fonttable() %>% glimpse()
Rows: 122
Columns: 10
$ package    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ afmfile    <chr> "Keyboard.afm.gz", "SFNSDisplay.afm.gz", "SFNSText.afm.gz"…
$ fontfile   <chr> "/System/Library/Fonts/Keyboard.ttf", "/System/Library/Fon…
$ FullName   <chr> ".Keyboard", "System Font", "System Font", "System Font It…
$ FamilyName <chr> ".Keyboard", "System Font", "System Font", "System Font", …
$ FontName   <chr> "-Keyboard", "-SFNSDisplay", "-SFNSText", "-SFNSText-Itali…
$ Bold       <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ Italic     <lgl> FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FAL…
$ Symbol     <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
$ afmsymfile <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
fonttable() %>% head()
package afmfile fontfile FullName FamilyName FontName Bold Italic Symbol afmsymfile
Keyboard.afm.gz /System/Library/Fonts/Keyboard.ttf .Keyboard .Keyboard -Keyboard FALSE FALSE FALSE
SFNSDisplay.afm.gz /System/Library/Fonts/SFNSDisplay.ttf System Font System Font -SFNSDisplay FALSE FALSE FALSE
SFNSText.afm.gz /System/Library/Fonts/SFNSText.ttf System Font System Font -SFNSText FALSE FALSE FALSE
SFNSTextItalic.afm.gz /System/Library/Fonts/SFNSTextItalic.ttf System Font Italic System Font -SFNSText-Italic FALSE TRUE FALSE
Andale Mono.afm.gz /Library/Fonts/Andale Mono.ttf Andale Mono Andale Mono AndaleMono FALSE FALSE FALSE
Apple Braille.afm.gz /System/Library/Fonts/Apple Braille.ttf Apple Braille Apple Braille AppleBraille FALSE FALSE FALSE
fonttable() %>% select(FullName, FamilyName, FontName) %>% head(20)
FullName FamilyName FontName
.Keyboard .Keyboard -Keyboard
System Font System Font -SFNSDisplay
System Font System Font -SFNSText
System Font Italic System Font -SFNSText-Italic
Andale Mono Andale Mono AndaleMono
Apple Braille Apple Braille AppleBraille
Apple Braille Outline 6 Dot Apple Braille AppleBraille-Outline6Dot
Apple Braille Outline 8 Dot Apple Braille AppleBraille-Outline8Dot
Apple Braille Pinpoint 6 Dot Apple Braille AppleBraille-Pinpoint6Dot
Apple Braille Pinpoint 8 Dot Apple Braille AppleBraille-Pinpoint8Dot
AppleMyungjo Regular AppleMyungjo AppleMyungjo
Arial Black Arial Black Arial-Black
Arial Bold Italic Arial Arial-BoldItalicMT
Arial Bold Arial Arial-BoldMT
Arial Italic Arial Arial-ItalicMT
Arial Arial ArialMT
Arial Narrow Arial Narrow ArialNarrow
Arial Narrow Bold Arial Narrow ArialNarrow-Bold
Arial Narrow Bold Italic Arial Narrow ArialNarrow-BoldItalic
Arial Narrow Italic Arial Narrow ArialNarrow-Italic
fonttable() %>% select(FamilyName) %>% distinct()
FamilyName
.Keyboard
System Font
Andale Mono
Apple Braille
AppleMyungjo
Arial Black
Arial
Arial Narrow
Arial Rounded MT Bold
Arial Unicode MS
Bitter
Bodoni Ornaments
Bodoni 72 Smallcaps
Brush Script MT
Calibri
Calibri Light
Comic Sans MS
Consolas
Courier New
DIN Alternate
DIN Condensed
Fondamento
Georgia
Gorni
Impact
Khmer Sangam MN
Lao Sangam MN
Lato Black
Lato
Lato Hairline
Lato Light
Luminari
Microsoft Sans Serif
Noto Sans
Noto Serif
Open Sans
Open Sans Extrabold
Open Sans Light
Open Sans Semibold
SILDoulos IPA93
SILManuscript IPA93
SILSophia IPA93
Tahoma
Tangerine
Times New Roman
Trattatello
Trebuchet MS
Verdana
Webdings
WeePeople
Wingdings
Wingdings 2
Wingdings 3

You’ll want to use the value in FamilyName.

7.2 Installing new fonts

This part is a bit beyond the scope of this class, but the upshot is that after you install new fonts in your computer, you should tell extrafont about it using the font_import() command.

7.3 Caveats

extrafont only knows about TrueType fonts (ones whose file ends in .ttf). If you have an OpenType font you want to use with R, your best bet probably involves the showtext package.

Creative Commons License