Data Wrangling Examples in R

Xiaorui (Jeremy) Zhu

01/25/2026

Aircraft data

The aircraft.xlsx dataset contains columns for MD (United States Air Force (USAF) ), FH (Flight Hours), FY (Fiscal Year), Cost, and Gallons.

Import

aircraft.xlsx

  1. What spreadsheets are contained in the aircraft.xlsx file?
  2. Read in the Trainers worksheet data without the header information.

Q1 Solution

  1. What spreadsheets are contained in the aircraft.xlsx file?
library(readxl)
excel_sheets("data/aircraft.xlsx")
## [1] "Bombers"              "Fighters"             "Trainers"            
## [4] "UAV_Drones"           "Tankers_Transporters"

Q2 Solution

  1. Read in the Trainers worksheet data without the header information.
aircraft <- read_excel("data/aircraft.xlsx", sheet = "Trainers", skip = 3)
head(aircraft)
## # A tibble: 6 × 6
##   Type    MD       FY    FH  Gallons     Cost
##   <chr>   <chr> <dbl> <dbl>    <dbl>    <dbl>
## 1 Trainer AT-38  1996 12517  6681614  5641569
## 2 Trainer AT-38  1997 11656  7707001  6506680
## 3 Trainer AT-38  1998 12619  9749881  9526089
## 4 Trainer AT-38  1999 13132 10534024  9343636
## 5 Trainer AT-38  2000 14400 10769237  7242603
## 6 Trainer AT-38  2001 12674  9680191 10533477

Numeric Understanding of data: Question 3 to 7

  1. Look at the summary information for this data. Anything seem odd?

  2. Which aircraft MDs are represented?

  3. Are there any missing years between 1996-2014 in this data?

  4. 90% of flying hours fall under what value?

  5. What is the spread of the range of costs?

Q3 Solution

  1. Look at the summary information for this data. Anything seem odd??

Q4 Solution

  1. Which aircraft MDs are represented?

Q5 Solution

  1. Are there any missing years between 1996-2014 in this data?

Q6 Solution

  1. 90% of flying hours fall under what value?

Q7 Solution

  1. What is the spread of the range of costs?

go to top

Visual Understanding of data: Question 8 to 10

  1. How would you describe the distribution of flying hours?

  2. If we wanted to focus on only the trainers with the largest variance in flying hours, which MDs would we select?

  3. Are all FYs equally represented?

Q8 Solution

  1. How would you describe the distribution of flying hours?

Q9 Solution

  1. If we wanted to focus on only the trainers with the largest variance in flying hours, which MDs would we select?

Q10 Solution

  1. Are all FYs equally represented?

go to top

go back to course website