This week builds on our data wrangling skills by focusing on transforming and joining data. Common transformation procedures include filtering observations by their values, reordering the rows, selecting variables, creating new variables with functions of existing variables, and collapsing values down to single summary statistics (e.g. mean, max, variance).
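To make these procedures concrete, here is a minimal sketch of how they might look with dplyr. The `incident_counts` table and its column names are made up for illustration (they are not the class data), so treat this as one possible way to chain the verbs rather than the approach used in the lecture.

```r
library(dplyr)

# Hypothetical toy data, invented for this example only
incident_counts <- tibble(
  district   = c("A", "A", "B", "B", "C"),
  incidents  = c(12, 30, 7, 18, 25),
  population = c(1000, 1500, 700, 1200, 1000)
)

# Filter rows, create a new variable, select columns, then reorder rows
high_rate <- incident_counts %>%
  filter(incidents >= 10) %>%                               # keep observations by value
  mutate(rate_per_1000 = 1000 * incidents / population) %>% # new variable from existing ones
  select(district, incidents, rate_per_1000) %>%            # keep only these variables
  arrange(desc(rate_per_1000))                              # sort rows, highest rate first

# Collapse each district down to single summary statistics
district_summary <- incident_counts %>%
  group_by(district) %>%
  summarise(
    mean_incidents = mean(incidents),
    max_incidents  = max(incidents)
  )
```

Each verb takes a data frame and returns a data frame, which is why the steps can be chained with the pipe.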
Furthermore, it’s rare that a data analysis involves only a single table of data. Typically you have many tables of data, and you must combine them to answer the questions that you’re interested in. Collectively, multiple tables of data are called relational data because it is the relations, not just the individual datasets, that are important.
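As a rough illustration of combining related tables, the sketch below joins two small tables on a shared key using dplyr's join functions. The `students` and `scores` tables and the `student_id` key are hypothetical and exist only for this example.

```r
library(dplyr)

# Two hypothetical tables related through the student_id key
students <- tibble(
  student_id = c(1, 2, 3),
  name       = c("Ana", "Ben", "Cy")
)
scores <- tibble(
  student_id = c(1, 2, 4),
  score      = c(90, 85, 70)
)

# Keep only students that have a matching score
matched <- inner_join(students, scores, by = "student_id")

# Keep every student; score is NA where no match exists
all_students <- left_join(students, scores, by = "student_id")

# Students with no matching row in scores
unmatched <- anti_join(students, scores, by = "student_id")
```

Which join to use depends on the question: `inner_join()` drops unmatched rows, `left_join()` keeps every row of the first table, and `anti_join()` is handy for checking which keys failed to match.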
Data scientists often work with several different data types, and some of them can be difficult to handle. Thankfully, the Tidyverse has powerful (and easy to use!) packages that make wrangling these data types much easier.
This module covers these basic capabilities by teaching you how to use the dplyr package and other Tidyverse packages to perform common data transformation and joining tasks.
Class
Please download the materials for Monday’s class:
Title | Handouts |
---|---|
Lecture 04-A | Slides |
Lecture 04-B | Slides |
Cincy Crimes Data | Data |
Midterm project discussion (in-class exercises) | |
Homework 3 (ACS data) | |
ACS data dictionary | docx |
ACS 2015 data | Data |
See you in class on Monday!
Assignments
- Complete Homework #3 located in this week’s folder.
- One person from each group will submit via Canvas the group’s .R script and Word document.
- This homework assignment is due by 6PM, Nov 15th, 2021.
Readings
- BEFORE next session’s class on Nov 15th, read Chapter 3, all sections, of R for Data Science.
- As you read, check your answers for the guided reading with this solutions manual.
Mid-term Project Due!
Be sure to refer to the grading rubric so you understand what is expected for your mid-term project. Create an HTML R Markdown document titled “Project Proposal” and be sure to include your name in the YAML header.