Tables
Using the data set from the reference lab of PNW flights, choose
three analytical questions from the menu below, and design a table for
each one that answers it in a publication-ready format (using gt),
including appropriate column names, titles,
captions, etc. Unless otherwise specified, all questions should be
explored for flights departing from Portland. Alternatively, please feel
free to come up with your own analyses!
- Which airlines had the best and worst track
records of on-time departures in each month? Is it different between PDX
and SEA?
- Which airlines improved the most in terms of on-time
departures over time, and on which routes? Which airlines got
worse?
- What cities have the most service from Portland (defined however you
like, but do make sure to define it clearly!), and which have the
worst?
- By month, what new routes were added or removed?
  - Hint: dplyr’s lead and lag commands could be helpful here
- Finding busy aircraft (identified by tailnum), with “busyness”
defined as:
  - Which specific aircraft (tailnum) are seen most often,
  for whom do they fly, and on what routes?
  - Which specific aircraft accumulate the most flight time?
  - Which specific aircraft log the most distance?
- Descriptive statistics comparing several characteristics of
long-haul vs. short-haul routes (defined however you like, but you must
be clear about your definition)
  - The air_time column will be useful here
- Time of Day: are some destinations from PDX “morning” destinations
vs “evening” ones?
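For the route-change question, the lead/lag hint could be sketched roughly as follows. This is only an illustration on invented toy data, not the intended solution: the `route_months` data frame, its values, and the `added`/`removed` column names are assumptions, and in practice you would first reduce the flights data to one row per destination and month.

```r
library(dplyr)

# Toy data (invented): one row per destination and month with any service
route_months <- data.frame(
  dest  = c("SFO", "SFO", "SFO", "ANC", "ANC", "HNL"),
  month = c(1, 2, 3, 2, 3, 1)
)

route_changes <- route_months %>%
  group_by(dest) %>%
  arrange(month, .by_group = TRUE) %>%
  mutate(
    # Added: no service in the immediately preceding month
    added   = is.na(lag(month))  | lag(month)  != month - 1,
    # Removed: no service in the immediately following month
    removed = is.na(lead(month)) | lead(month) != month + 1
  ) %>%
  ungroup()

print(route_changes)
```

Note that this sketch flags every route as "added" in the first month of the data and as "removed" in the last month, so those boundary months need separate handling.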
Note: For some of these questions, you may need to make
editorial/analytical choices about what data to include, how to define
metrics, etc. For example, improvement or decline in timeliness
may be tricky to measure, as naïve approaches may be
easily skewed by outliers or by variation in the data. You may choose to
exclude certain low-volume carriers, or only include routes that are
present throughout the entire dataset, or something else altogether.
Make sure your table includes sufficient information to guide the
interpretation and comprehension of your analysis.
Make sure to keep in mind the design principles that we discussed on
Monday regarding spacing, use of rules, row-vs-column orientation,
alignment, etc. In addition to the table itself, provide a short
description of your design and its motivation.
Things to consider
- For your analytical questions of choice, what measure of central
tendency is appropriate? Does mean or median make more sense? What
should you look at to try and answer that question?
- Pay attention to formatting in your tables: numbers should have
commas in appropriate places, and column headers should be human-readable
rather than just whatever the dataframe column was named (“Total
on-time flights” as opposed to “total_on_time_flights”), etc.
- The flight delays dataset has month as a numerical variable; does
this make sense for display in your table?
- The pnwflights14 dataset has several ancillary
dataframes besides flights that might be useful for
polishing your table.
  - For example, the airports dataframe includes both the
  FAA airport code (“PDX”) as well as the actual display name of each
  airport.
  - There are several others to explore…
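The formatting points above (month names, thousands separators, readable headers, airport display names) can be illustrated with a small base-R sketch. All values below are invented, and `airport_names` is only a stand-in that assumes the airports dataframe has `faa` and `name` columns; gt also has its own formatting helpers that you may prefer inside a table pipeline.

```r
# Toy summary table (values invented for illustration)
summary_tbl <- data.frame(
  month = c(1, 2, 12),
  dest = c("SEA", "SFO", "ANC"),
  total_on_time_flights = c(1234, 98765, 432)
)

# Stand-in for the airports lookup (assumed columns: faa, name)
airport_names <- data.frame(
  faa  = c("ANC", "SEA", "SFO"),
  name = c("Ted Stevens Anchorage Intl", "Seattle Tacoma Intl", "San Francisco Intl")
)

display_tbl <- data.frame(
  # Numeric month -> month name, using the built-in month.name constant
  Month = month.name[summary_tbl$month],
  # FAA code -> display name; match() keeps the original row order
  Destination = airport_names$name[match(summary_tbl$dest, airport_names$faa)],
  # Thousands separators; trim = TRUE avoids padding to a common width
  `Total on-time flights` = format(summary_tbl$total_on_time_flights,
                                   big.mark = ",", trim = TRUE),
  check.names = FALSE
)

print(display_tbl)
```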
Fonts
- Orient yourself to the built-in font library in RStudio.cloud. Using
the fonttable() function (along with dplyr or
your data-wrangling method of choice), answer the following:
  - How many distinct font families (not fonts!) are installed?
  - What proportion of the installed font families include
  bold and italic faces?
  - Some font families include many fonts, others include only one.
    - Generate a plot of your choice illustrating this distribution.
  - Compute a table with descriptive statistics about the built-in font
  library (e.g. mean/median number of fonts per family, etc.)
- Spend some time on Google Fonts (or a different font repository) and
pick out a serif, sans-serif, and display font that “speaks to you”.
  - Write a sentence or two about each one, including what sort of
  scenario you think it would work well for.
  - Install them into your R project as shown in the reference lab, and
  if appropriate use them in a figure.
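As a starting point for the font-library questions, here is a minimal sketch on a toy data frame. It assumes the output of fonttable() has one row per font with `FamilyName`, `Bold`, and `Italic` columns (check this against your own output), and it takes "includes bold and italic faces" to mean at least one bold font and at least one italic font in the family, which is only one possible definition. The family names and values are invented.

```r
# Toy stand-in for fonttable() output (invented rows; assumed column names)
fonts <- data.frame(
  FamilyName = c("Alpha Sans", "Alpha Sans", "Alpha Sans",
                 "Beta Serif", "Beta Serif", "Gamma Mono"),
  Bold   = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  Italic = c(FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
)

# Number of distinct families (not fonts)
n_families <- length(unique(fonts$FamilyName))

# Per-family flags: does the family contain any bold / any italic font?
has_bold   <- tapply(fonts$Bold, fonts$FamilyName, any)
has_italic <- tapply(fonts$Italic, fonts$FamilyName, any)
prop_both  <- mean(has_bold & has_italic)

# Fonts per family, for plotting and descriptive statistics
fonts_per_family <- as.vector(tapply(fonts$Bold, fonts$FamilyName, length))
mean_fonts   <- mean(fonts_per_family)
median_fonts <- median(fonts_per_family)
```

The same steps translate directly to dplyr (grouping by family and summarising) if you prefer that style.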
Deliverable
Your knitted .Rmd file (i.e., the HTML output).