Last week we discussed general guidelines for first interacting with a new data set. This week we build on those activities by learning how to clean and tidy our data, and then we begin drawing insights from it through data manipulation.
Specifically, this week you are going to learn:
- How to make your data “tidy”.
- How to manipulate data to create insights.
Consequently, this week will give you a strong foundation for managing and cleaning your data. This will prepare you for the second challenge in your course project: cleaning, tidying, and preparing your data for exploratory data analysis!
The sections below outline the tutorials you need to review, and the assignments you need to complete, prior to Monday’s class. The skills and functions introduced in these tutorials will be necessary for Monday’s in-class activities.
Assignments
- A precise grading rubric is in the folder to download for today’s class (link below). Be sure to refer to the midterm page so you understand what is expected.
- Send me the URL to your project through Slack (in a private message!). If you are working with another person, only one of you needs to send the link.
- Please include your name(s) in the YAML.
Readings
- BEFORE next session’s class on Nov 8, read Chapter 5 sections 5.5 through 5.7 of R for Data Science.
- As you read, check your answers for the guided reading with this solutions manual.
Class
Please download the materials for Monday’s class:
Title | Handouts |
---|---|
Lecture 03-A | Slides |
Lecture 03-B | Slides |
Midterm project instruction | |
Data | Data |
MBTA exercise | |
In addition, be sure you have identified which data you are going to use for your final project, and that you have access to it, because you will work on it during class. Furthermore, identify at least 10 specific questions you want to ask of your project data. Using what you learned this week, what types of data transformations do you need to make to help answer these questions? Be ready to use dplyr to answer these questions in class.
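As a rough illustration of turning a question into a transformation, here is a minimal sketch. It assumes a made-up data frame (`ridership_wide`, with hypothetical station names and counts, not the class data) stored with one column per year. It first tidies the data with `tidyr::pivot_longer()` and then uses dplyr verbs to answer "which year had the highest average ridership?". Your own project data and questions will call for different columns and verbs.

```r
library(dplyr)
library(tidyr)

# Hypothetical, made-up data: one row per station, one column per year (untidy).
ridership_wide <- tibble(
  station = c("A", "B", "C"),
  `2019`  = c(100, 250, 80),
  `2020`  = c(60, 120, 90)
)

# Tidy it: one row per station-year observation.
ridership_tidy <- ridership_wide %>%
  pivot_longer(cols = -station, names_to = "year", values_to = "riders") %>%
  mutate(year = as.integer(year))

# Question: which year had the highest average ridership per station?
avg_by_year <- ridership_tidy %>%
  group_by(year) %>%
  summarise(mean_riders = mean(riders), .groups = "drop") %>%
  arrange(desc(mean_riders))

print(avg_by_year)
```

Writing each of your questions as a short pipeline like this (reshape if needed, then group, summarise, and arrange) is one way to check ahead of class whether your data is already in a shape that can answer it.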
See you in class!