Both challenges are due via Sakai by the end of the day on Wednesday, April 29th. For the first challenge, which focuses on data tidying, you’ll want to refer back to our slides. For the second challenge, you’ll want to refer to the reference lab.
Tidy the data/gapminder_broadband_per_100.xlsx file. (Tip: use the
readxl package’s read_excel() function to import from Excel, and use
janitor::clean_names() immediately after import to make life easier.)
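As a starting point, the import-and-tidy step might look like the sketch below. It assumes the spreadsheet is "wide", with one row per country, the country name in the first column, and one column per year; check the real file before relying on this. The fallback tibble, its country names, and the column name broadband_per_100 are made up for illustration so the sketch runs even without the file.

```r
library(readxl)
library(janitor)
library(dplyr)
library(tidyr)

path <- "data/gapminder_broadband_per_100.xlsx"

if (file.exists(path)) {
  # Import, then clean the column names right away (e.g. "1998" becomes "x1998")
  broadband_wide <- read_excel(path) %>% clean_names()
} else {
  # Made-up stand-in with the ASSUMED wide layout, not real data
  broadband_wide <- tibble(
    `Broadband per 100` = c("Atlantis", "Erewhon"),
    `1998` = c(NA, 0.1),
    `1999` = c(0.2, 0.5)
  ) %>% clean_names()
}

broadband_tidy <- broadband_wide %>%
  rename(country = 1) %>%                       # first column holds the country (assumption)
  pivot_longer(
    cols = -country,                            # every other column is a year
    names_to = "year",
    names_prefix = "x",                         # drop the "x" added by clean_names()
    names_transform = list(year = as.integer),
    values_to = "broadband_per_100",            # hypothetical name for the measurement
    values_drop_na = TRUE                       # keep only country-years with a value
  )

broadband_tidy
```

The result has one row per country and year, which is the shape dplyr and ggplot2 expect.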
Install and load the gapminder data package (already
installed on Posit Cloud).
install.packages("gapminder")
library(gapminder)
?gapminder
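If you are working outside Posit Cloud, a guarded install avoids reinstalling the package every time the document is knit. The sketch below also takes a quick first look at the data; which checks are worth running is a judgement call, and these are only suggestions.

```r
# Install only if the package is missing (it is already present on Posit Cloud)
if (!requireNamespace("gapminder", quietly = TRUE)) {
  install.packages("gapminder")
}
library(gapminder)

dim(gapminder)                 # number of rows and columns
names(gapminder)               # variable names
sort(unique(gapminder$year))   # which years are covered
```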
Pick at least two of the tasks below from the task menu and approach each with both a table (containing the appropriately-wrangled data) and a companion figure.
dplyr should be your main data manipulation tool.
ggplot2 should be your main visualization tool.
Make observations about what your tables/figures show and about the process.
If you want to do something comparable but different, e.g. swap one quantitative variable for another, go for it!
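To give a sense of the expected scale, here is a minimal sketch of the table half of one task. The task itself (median life expectancy per continent per year) is a hypothetical example, not necessarily an item on the task menu, and the column names median_life_exp and n_countries are invented here.

```r
library(gapminder)
library(dplyr)

life_exp_by_continent <- gapminder %>%
  group_by(continent, year) %>%
  summarise(
    median_life_exp = median(lifeExp),     # typical life expectancy in that continent-year
    n_countries     = n_distinct(country), # how many countries the median is based on
    .groups = "drop"
  )

# Simply printing the dplyr output is fine for the report
life_exp_by_continent
```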
You do not have to use tidyr or otherwise worry about
reshaping your tables. Many of your tables may not be formatted
perfectly in the report. Simply printing dplyr tabular
output is fine. For all things, graphical and tabular, if you’re
dissatisfied with a result, discuss the problem, what you tried to do to
fix it, and move on.
Note: The dataset is chronological, in that it contains data over multiple years for the same country and quantity. Make sure that your analysis takes this into account! In other words, make sure not to blindly treat e.g. data points about one country’s GDP from 1952 as being comparable to another country’s GDP in 2007.
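One way to respect the time dimension is to compare countries only within the same year, either by filtering to a single year or by grouping by year. The sketch below contrasts a pooled ranking with a per-year ranking of GDP per capita; the choice of variable and the name richest_by_year are illustrative assumptions.

```r
library(gapminder)
library(dplyr)

# Pitfall: this pools all years, so rows from different decades compete directly
gapminder %>%
  slice_max(gdpPercap, n = 5)

# Safer: rank within each year, so every comparison is like-for-like
richest_by_year <- gapminder %>%
  group_by(year) %>%
  slice_max(gdpPercap, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(year, country, gdpPercap)

richest_by_year
```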
For each table, make sure to include a relevant figure. One tip for
starting is to draw out on paper what you want your x- and y-axis to be
first and what your geom is; that is, start by drawing the
plot you want ggplot to give you. Your figure does not have
to depict every single number present in the table. Use your judgement.
It just needs to complement the table, add context, and allow for some
sanity checking.
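The paper sketch translates almost directly into ggplot2: the x- and y-axes become aes() mappings and the geom becomes a geom_*() layer. The example below assumes a hypothetical summary table of median life expectancy by continent and year; the table and its column names are illustrative, not part of the assignment.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Hypothetical summary table to plot
life_exp_by_continent <- gapminder %>%
  group_by(continent, year) %>%
  summarise(median_life_exp = median(lifeExp), .groups = "drop")

# Paper sketch: x = year, y = median life expectancy, geom = line, one line per continent
p <- ggplot(life_exp_by_continent,
            aes(x = year, y = median_life_exp, colour = continent)) +
  geom_line() +
  geom_point() +
  labs(x = "Year", y = "Median life expectancy (years)", colour = "Continent")

p
```

A plot like this makes trends and outliers in the table easy to spot, which is the kind of sanity check the figure is meant to provide.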
Notice which figures are easy/hard to make, and whether each visualization adds clarity to the tabular display, detracts from it, or is completely redundant with it (and therefore probably unnecessary).
You’re encouraged to reflect on what was hard/easy, problems you solved, helpful tutorials you read, etc.
Gapminder EDA ideas from Jenny Bryan, creator of the gapminder package.