Tables
Using the data set from the reference lab of PNW flights, choose
three analytical questions from the menu below, and design a table for
each one that answers it in a publication-ready format (using
gt
), including appropriate column names, titles,
captions, etc. Unless otherwise specified, all questions should be
explored for flights departing from Portland. Alternatively, please feel
free to come up with your own analyses!
- Which airlines had the best and worst track
records of on-time departures in each month? Is it different between PDX
and SEA?
- Which airlines improved the most in terms of on-time
departures over time, and on which routes? Which airlines got
worse?
- What cities have the most service from Portland (defined however you
like, but do make sure to define it clearly!), and which have the
worst?
- By month, what new routes were added or removed?
- Hint: dplyr’s
lead
and lag
commands could be helpful here
- Finding busy aircraft (identified by
tailnum
), with
“business” defined as:
- Which specific aircraft (
tailnum
) are seen most often,
for whom do they fly, and on what routes?
- Which specific aircraft accumulate the most flight
time?
- Which specific aircraft log the most distance?
- Descriptive statistics comparing several characteristics of
long-haul vs. short-haul routes (defined however you like, but you must
be clear about your definition)
- the
air_time
column will be useful here
- Time of Day: are some destinations from PDX “morning” destinations
vs “evening” ones?
Note: For some of these questions, you may need to make
editorial/analytical choices about what data to include, how to define
metrics, etc. For example, it may be the case that improvement/decline
in timeliness may be tricky to measure, as naïve approachces may be
easily skewed by outliers or by variation in the data. You may choose to
exclude certain low-volume carriers, or only include routes that are
present throughout the entire dataset, or something else altogether.
Make sure your table includes sufficient information to guide the
interpretation and comprehension of your analysis.
Make sure to keep in mind the design principles that we discussed on
Monday regarding spacing, use of rules, row-vs-column orientation,
alignment, etc. In addition to the table itself, provide a short
description of your design and its motivation.
Fonts
- Orient yourself to the built-in font library in RStudio.cloud. Using
the
fonttable()
function (along with dplyr
or
your data-wrangling method of choice), answer the following:
- How many distinct font families (not fonts!) are
installed?
- What proportion of the installed font families include
bold and italic faces?
- Some font families include many fonts, others include only one.
- Generate a plot of your choice illustrating this distribution.
- Compute a table with descriptive statistics about the built-in font
library (e.g. mean/median number of fonts per family, etc.)
- Spend some time on Google
Fonts (or a different font repository) and pick out a serif,
sanserif, and display font that “speaks to you”.
- Write a sentence or two about each one, including what sort of
scenario you think it would work well for.
- Install them into your R project as shown in the reference lab, and
if appropriate use them in a figure.
Deliverable
Your knitted .Rmd file (i.e., the HTML output).