class: center, middle, inverse, title-slide

# Lab 01: BMI 5/625
## Working in R
### Alison Hill, updates by Steven Bedrick

---

## R Basics

--

* R is an interpreter (`>`)

--

* Name objects in R (`i_like_snake_case <- `)

--

* Know your object types (`typeof()`)

--

* Case matters (`my_names != My_names`)

--

* Use comments! (`# use the hashtag symbol`)

--

* Functions (`fun`!)

--

* Use packages (*"install once per machine, load once per R session"*)

--

* Use the `%>%` (*"dataframe first, dataframe once"*)

---

## R is an interpreter `>`

--

You enter commands line-by-line (as opposed to compiled languages).

--

* The `>` means R is ready for a command

--

* The `+` means your last command isn't complete

--

    - If you get stuck with a `+` use your escape key!

---

class: center, middle, inverse

# 🐍
## Name Objects in R
## `i_like_snake_case <-`

--

RStudio Keyboard Shortcuts:

OSX: `Option` + `-`

Else: `Alt` + `-`

*(the + means and, not the + key)*

---

## Name your own objects

```r
us <- c("Steven", "Jackie", "Alison") # combine strings
us
```

```
[1] "Steven" "Jackie" "Alison"
```

--

```r
num_labs <- c(1:10) # combine numbers
num_labs
```

```
 [1]  1  2  3  4  5  6  7  8  9 10
```

--

```r
mood <- rep("yippee", length(num_labs)) # replicate 10 times
mood
```

```
 [1] "yippee" "yippee" "yippee" "yippee" "yippee" "yippee" "yippee" "yippee"
 [9] "yippee" "yippee"
```

---

## Re-name others' objects

```r
my_alpha <- letters # built-in, no package needed
my_alpha
```

```
 [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
```

--

```r
my_names <- babynames # from the babynames package
my_names
```

```
# A tibble: 1,924,665 × 5
    year sex   name          n   prop
   <dbl> <chr> <chr>     <int>  <dbl>
 1  1880 F     Mary       7065 0.0724
 2  1880 F     Anna       2604 0.0267
 3  1880 F     Emma       2003 0.0205
 4  1880 F     Elizabeth  1939 0.0199
 5  1880 F     Minnie     1746 0.0179
 6  1880 F     Margaret   1578 0.0162
 7  1880 F     Ida        1472 0.0151
 8  1880 F     Alice      1414 0.0145
 9  1880 F     Bertha     1320 0.0135
10  1880 F     Sarah      1288 0.0132
# … with 1,924,655 more rows
```

---

## What to name objects?

--

.pull-left[

Object names cannot:

- Start with a number
- Contain a space
- Contain ["reserved" words](http://stat.ethz.ch/R-manual/R-devel/library/base/html/Reserved.html)

]

--

.pull-right[

Object names must:

- Start with a letter
- Contain only letters, numbers, `_` and `.`

]

--

```r
?make.names
```

--

```r
some_things <- c("Kilroy was here", "2 legit to quit")
make.names(some_things)
```

--

```
[1] "Kilroy.was.here"  "X2.legit.to.quit"
```

---

## Adopt a consistent naming style

```r
i_use_snake_case # recommended
otherPeopleUseCamelCase
some.people.use.periods
And_aFew.People_RENOUNCEconvention
```

From: http://r4ds.had.co.nz/workflow-basics.html#whats-in-a-name

Read more: http://style.tidyverse.org/syntax.html#object-names

---

class: center, middle, inverse

# 🔦
## Know Your Data Types
## `typeof()`

---

## Know your data types

* Numeric (2 subtypes)
    - Integers (`1L, 50L`)
    - Double (`1.5, 50.25`, `?double`)
* Character (`"hello"`)
* Factor (`grade = "A" | grade = "B"`)
* Logical (`TRUE | FALSE`)

--

```r
typeof(num_labs) # numeric
```

```
[1] "integer"
```

--

```r
typeof(mood) # "yippee" is a character
```

```
[1] "character"
```

--

```r
typeof(mood == "yippee") # is mood equal to "yippee"- T or F?
```

```
[1] "logical"
```

---

## Characters can be deceiving

```r
my_things <- c(num_labs, mood)
my_things
```

```
 [1] "1"      "2"      "3"      "4"      "5"      "6"      "7"      "8"
 [9] "9"      "10"     "yippee" "yippee" "yippee" "yippee" "yippee" "yippee"
[17] "yippee" "yippee" "yippee" "yippee"
```

--

```r
typeof(my_things)
```

```
[1] "character"
```

---

## `NA` is special

```r
num_labs <- c(num_labs, NA)
num_labs
```

```
 [1]  1  2  3  4  5  6  7  8  9 10 NA
```

--

```r
typeof(num_labs)
```

```
[1] "integer"
```

--

```r
num_labs*3
```

```
 [1]  3  6  9 12 15 18 21 24 27 30 NA
```

```r
max(num_labs)
```

```
[1] NA
```

```r
max(num_labs, na.rm = TRUE)
```

```
[1] 10
```

---

class: center, middle, inverse

## Case matters
## `my_names != My_names`

---

## Case matters

This works:

```r
glimpse(babynames)
```

```
Rows: 1,924,665
Columns: 5
$ year <dbl> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880,…
$ sex  <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", …
$ name <chr> "Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret", "Ida",…
$ n    <int> 7065, 2604, 2003, 1939, 1746, 1578, 1472, 1414, 1320, 1288, 1258,…
$ prop <dbl> 0.07238359, 0.02667896, 0.02052149, 0.01986579, 0.01788843, 0.016…
```

--

These do not:

```r
Glimpse(babynames) # no function
```

```
Error in Glimpse(babynames): could not find function "Glimpse"
```

```r
glimpse(Babynames) # no data
```

```
Error in glimpse(Babynames): object 'Babynames' not found
```

---

class: center, middle, inverse

# 📢
## Comments
## `# go here`

---

## Text behind a `#` is a comment

```r
num_labs + 2 # add 2 here
```

```
 [1]  3  4  5  6  7  8  9 10 11 12 NA
```

```r
num_weeks <- num_labs + 2 # save as new object
```

--

```r
# I can say anything I want here...
num_weeks
```

```
 [1]  3  4  5  6  7  8  9 10 11 12 NA
```

--

```r
but not here
```

```
Error: <text>:1:5: unexpected symbol
1: but not
       ^
```

---

class: center, middle, inverse

# 🍰
## Functions

---

background-image: url("../images/Slide06.png")
background-size: cover

---

background-image: url("../images/Slide07.png")
background-size: cover

---

background-image: url("../images/Slide08.png")
background-size: cover

---

background-image: url("../images/Slide10.png")
background-size: cover

---

## Functions

Sometimes abbreviated `funs` in documentation, which is a little ironic 😉.

Functions can come from:

- base R (these functions are "built in")
- packages
- you

---

class: middle

# Base R Functions

```r
seq(1, 12, 1) # base R
```

```
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
```

---

class: middle

# Functions from Packages

```r
babynames %>%
  count(sex) # count is from dplyr
```

```
# A tibble: 2 × 2
  sex         n
  <chr>   <int>
1 F     1138293
2 M      786372
```

---

class: middle

# Roll Your Own Functions

```r
greet <- function(name) {
  glue::glue("Welcome to BMI 5/625, {name}!")
}
greet("Sloane")
```

```
Welcome to BMI 5/625, Sloane!
```

---

# Function help

```r
?seq
?count
```

Pay attention to:

--

* Usage *(recipe)*

--

* Arguments *(ingredients)*

--

* Examples

--

Read more:

- http://r4ds.had.co.nz/workflow-basics.html#calling-functions
- http://socviz.co/appendix.html#a-little-more-about-r
- http://stat545.com/block011_write-your-own-function-01.html
- http://stat545.com/block011_write-your-own-function-02.html
- http://stat545.com/block011_write-your-own-function-03.html

---

class: center, middle, inverse

# 📦
## Packages

*"install once per machine, load once per R session"*

---

## Packages!

*Install once* per machine

```r
install.packages("dplyr")
```

--

*Load once* per R work session

```r
library(dplyr)
```

--

*also: quotes matter, sorry (`install.packages()` needs them, `library()` does not)*

---

class: inverse, middle, center

<img src="https://www.tidyverse.org/images/hex-tidyverse.png" width="50%" style="display: block; margin: auto;" />

## The `tidyverse` package ecosystem

https://www.tidyverse.org

---

class: inverse, middle, center

<img src="https://www.tidyverse.org/images/hex-tidyverse.png" width="25%" style="display: block; margin: auto;" />

*"The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures."*

```r
install.packages("tidyverse")
library(tidyverse)
```

See packages included here: https://www.tidyverse.org/packages/

---

class: center, middle, inverse

# `%>%`
## The pipe

*"dataframe first, dataframe once"*

--

```r
library(dplyr)
```

--

RStudio Keyboard Shortcuts:

OSX: `CMD` + `SHIFT` + `M`

Else: `CTRL` + `SHIFT` + `M`

---

class: middle

*Nesting* a dataframe inside a function is hard to read.

```r
slice(babynames, 1)
```

```
# A tibble: 1 × 5
   year sex   name      n   prop
  <dbl> <chr> <chr> <int>  <dbl>
1  1880 F     Mary   7065 0.0724
```

--

Here, the "sentence" starts with a <font color="#ED1941">verb</font>.

--

<hr>

*Piping* a dataframe into a function lets you read L to R

```r
babynames %>% slice(1)
```

```
# A tibble: 1 × 5
   year sex   name      n   prop
  <dbl> <chr> <chr> <int>  <dbl>
1  1880 F     Mary   7065 0.0724
```

--

Now, the "sentence" starts with a <font color="#ED1941">noun</font>.

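---

class: middle

The nested and piped forms can be checked against each other. This is a minimal sketch, assuming the `dplyr` package is installed. The built-in `mtcars` data and the `cyl == 4` condition are stand-ins chosen for illustration (they are not part of this lab) so the code runs without `babynames`.

```r
library(dplyr)  # provides %>%, filter(), and slice()

# Nested: read from the inside out (filter first, then slice)
nested <- slice(filter(mtcars, cyl == 4), 1)

# Piped: read left to right; mtcars is named first, and only once
piped <- mtcars %>%
  filter(cyl == 4) %>%
  slice(1)

identical(nested, piped)  # compares the two results
```

Both forms make the same function calls, so choosing between them is a matter of readability rather than results.
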
---

class: middle

Sequences of functions make you read *inside out*

```r
slice(filter(babynames, sex == "M"), 1)
```

```
# A tibble: 1 × 5
   year sex   name      n   prop
  <dbl> <chr> <chr> <int>  <dbl>
1  1880 M     John   9655 0.0815
```

--

<hr>

Chaining functions together lets you read *L to R*

```r
babynames %>%
  filter(sex == "M") %>%
  slice(1)
```

```
# A tibble: 1 × 5
   year sex   name      n   prop
  <dbl> <chr> <chr> <int>  <dbl>
1  1880 M     John   9655 0.0815
```

---

class: inverse, middle, center

.pull-left[

<img src="https://magrittr.tidyverse.org/logo.png" width="50%" style="display: block; margin: auto;" />

]

--

.pull-right[

<img src="https://upload.wikimedia.org/wikipedia/en/b/b9/MagrittePipe.jpg" width="50%" style="display: block; margin: auto;" />

]

--

## "dataframe first, dataframe once"

---

class: middle

```r
babynames %>%
  filter(sex == "M") %>%
  slice(1)
```

```
# A tibble: 1 × 5
   year sex   name      n   prop
  <dbl> <chr> <chr> <int>  <dbl>
1  1880 M     John   9655 0.0815
```

--

<hr>

This does the same thing:

```r
babynames %>%
  filter(.data = ., sex == "M") %>%
  slice(.data = ., 1)
```

```
# A tibble: 1 × 5
   year sex   name      n   prop
  <dbl> <chr> <chr> <int>  <dbl>
1  1880 M     John   9655 0.0815
```

--

<hr>

So does this:

```r
babynames %>%
  filter(., sex == "M") %>%
  slice(., 1)
```

```
# A tibble: 1 × 5
   year sex   name      n   prop
  <dbl> <chr> <chr> <int>  <dbl>
1  1880 M     John   9655 0.0815
```

---

class: inverse, middle, center

## I know...

![](https://media.giphy.com/media/uEhLWy2eu87Ly/source.gif)

---

class: inverse, middle, center

## I promise, it gets better.

![](https://media.giphy.com/media/l0MYRzcWP7cjfNQ2I/giphy.gif)

---

class: inverse, middle, center

# 🏃🏽
## Resources for Working in R:

http://r4ds.had.co.nz/workflow-basics.html

http://moderndive.com/2-getting-started.html

https://bookdown.org/chesterismay/rbasics/3-rstudiobasics.html

https://github.com/rstudio/cheatsheets/blob/master/rstudio-ide.pdf