Week 4 (Nov 8th)

This week builds on our data wrangling skills by focusing on transforming and joining data. Common transformation procedures include filtering observations by their values, reordering the rows, selecting variables, creating new variables from functions of existing variables, and collapsing values down to single summary statistics (e.g., mean, max, variance).
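As a rough illustration of the five transformations listed above, the sketch below maps each one to a dplyr verb: filter(), arrange(), select(), mutate(), and summarise(). It is a minimal sketch, not the course's own example. It assumes dplyr is installed and uses the mtcars data set bundled with base R purely for demonstration; the new column name `hp_per_wt` is a hypothetical choice.

```r
library(dplyr)

# mtcars is bundled with base R: 32 cars measured on 11 numeric variables.
four_cyl <- mtcars %>%
  filter(cyl == 4) %>%            # filter(): keep rows whose values meet a condition
  arrange(desc(mpg)) %>%          # arrange(): reorder rows, highest mpg first
  select(mpg, hp, wt) %>%         # select(): keep only the named variables
  mutate(hp_per_wt = hp / wt)     # mutate(): new variable from existing ones
                                  # (horsepower per 1,000 lb, since wt is in 1,000 lb)

# summarise(): collapse a whole column down to single summary values
mpg_summary <- mtcars %>%
  summarise(mean_mpg = mean(mpg), max_mpg = max(mpg), var_mpg = var(mpg))

print(four_cyl)
print(mpg_summary)
```

Each verb takes a data frame and returns a data frame, which is why they can be chained with the pipe (`%>%`) in the order the analysis needs.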

Furthermore, it’s rare that a data analysis involves only a single table of data. Typically you have many tables of data, and you must combine them to answer the questions that you’re interested in. Collectively, multiple tables of data are called relational data because it is the relations, not just the individual datasets, that are important.
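To make the idea of combining relational tables concrete, here is a minimal sketch of two dplyr joins. It assumes dplyr is installed and uses `band_members` and `band_instruments`, two small example tables that ship with dplyr and share the key column `name`; they stand in for whatever tables you join in class.

```r
library(dplyr)

# Two small example tables shipped with dplyr, linked by the key column `name`
print(band_members)
print(band_instruments)

# left_join(): keep every row of the left table and add matching columns
# from the right table; rows with no match get NA
members_left <- left_join(band_members, band_instruments, by = "name")

# inner_join(): keep only rows whose key appears in both tables
members_inner <- inner_join(band_members, band_instruments, by = "name")

print(members_left)
print(members_inner)
```

The choice of join decides what happens to unmatched rows: `left_join()` preserves them (with NA), while `inner_join()` drops them, so the same two tables can yield different row counts.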

Data scientists often work with many different data types, and some of them can be tricky to handle. Thankfully, the Tidyverse includes powerful (and easy-to-use!) packages that make wrangling these data types much easier.

This module covers these basic capabilities by teaching you how to use the dplyr package and other Tidyverse packages to perform common data transformation and joining tasks.


Class

Please download the materials for Monday’s class:

Title                                            Handouts
Lecture 04-A                                     Slides
Lecture 04-B                                     Slides
Cincy Crimes Data                                Data
Midterm project discussion (in-class exercises)  pdf
Homework 3 (ACS data)                            pdf
ACS data dictionary                              docx
ACS 2015 data                                    Data

See you in class on Monday!


Assignments

  • Complete Homework #3 located in this week’s folder.
  • One person from each group should submit the group’s .R script and Word document via Canvas.
  • This homework assignment is due by 6PM, Nov 15th, 2021.

Readings



Mid-term Project Due!

Refer to the grading rubric so you understand what is expected for your mid-term project. Create an R Markdown document that renders to HTML, title it “Project Proposal”, and include your name in the YAML header.