Welcome to Data Wrangling with R! This course provides an intensive, hands-on introduction to data wrangling with the R programming language. You will learn the fundamental skills required to acquire, munge, transform, manipulate, and visualize data in a computing environment that fosters reproducibility.
- Instructor: Xiaorui Zhu
- Day(s): Mondays
- Time: 6:00PM-9:50PM
- Location: Lindner Hall 2120
- Office Hours: 5:00PM-6:00PM Lindner Hall 3329.2
- Webpage: https://xiaorui.site/data-wrangling
Upon successfully completing this course, you will be able to:
- Perform your data analysis in a literate programming environment
- Manage different types of data
- Manage different data structures
- Import, scrape, and export data
- Index, subset, reshape, and transform your data
- Compute descriptive statistics
- Visualize data
- Perform iterative functions
- Write your own functions
- (If time allows) Develop your own Shiny app
…all with R!
Each week you will read and work through selected tutorials on specific data wrangling activities in R. In this class, I draw on external interactive reading modules from R for Data Science, and you will read the assigned chapters before each class session. At the start of each class, I'll review the week's data wrangling activity and answer any burning questions. You will then break into assigned small groups and work together to complete a data wrangling problem before the end of class. Thus, most of the class time will be spent practicing and applying what you learned outside of the classroom.
The purpose of this course structure is multi-dimensional:
- It will teach you to work through R programming tutorials and learn new techniques on your own.
- The out-of-class modules will require you to come to each class prepared and will also help you get ready for your final project.
- The in-class small group work will teach you to work on a coding task collaboratively within a limited amount of time, and to assess other people's code.
All required classroom material will be provided in class or online. Any recommended yet optional material will also be provided in the classroom notes.
| Week | Topic | Description |
|------|-------|-------------|
| 1 | | Intro to data wrangling, R, and course outline |
| 2 | Reproducible Documents and Importing Data | Managing your workflow and reproducibility |
| | | Importing data and understanding the basics of it |
| 3 | Tidy Data and Data Manipulation | Tidying & preparing data for analysis |
| 4 | Relational Data and More Tidyverse Packages | Leveraging the Tidyverse to simplify data wrangling |
| 6 | Creating Efficient Code in R | Control statements & iteration |
| 7 | Introduction to Applied Modeling | Correlation & regression |
| | | Introduction to machine learning |
Course grades will consist of:
- 20% Homework assignments (4 equally weighted assignments)
- 20% Mid-term Project Evaluation
- 50% Final Project
- 10% Engagement
- There will be no in-class final exam
Final grades will be distributed according to the following cutoffs:
- A 94 – 100%
- A- 90 – 93%
- B+ 87 – 89%
- B 83 – 86%
- B- 80 – 82%
- C+ 77 – 79%
- C 73 – 76%
- C- 70 – 72%
- D & F Hopefully None!
We will use the following software during the course. Plan on bringing a computer to each class meeting.
- R and RStudio will be used to perform all programming activities, assignments, and the final project. You can find details on how to download these here.
Academic Integrity: As with all Lindner College of Business efforts, this course will uphold the highest ethical standards, which are critical to building character. LCB instructors are required to report ANY incident of academic misconduct (e.g., cheating, plagiarism) to the college review process, which could result in severe consequences, including potential dismissal from the college. For further information on academic misconduct or related university policies and procedures, please see the UC Student Code of Conduct.
All academic programs at the Lindner College of Business apply a "Two Strikes Policy" regarding academic integrity. The "Two Strikes Policy" supplements the UC Student Code of Conduct and is now in effect. All cases of academic misconduct (e.g., cheating, plagiarism, falsification) will be formally reported by faculty, and students will be afforded due process for allegations, as outlined in the policy. Any student who has been found responsible for two cases of academic misconduct may be dismissed from the Lindner College of Business.
Disability: Students with disabilities who need academic accommodations or other specialized services while attending the University of Cincinnati will receive reasonable accommodations to meet their individual needs, as well as advocacy assistance on disability-related issues. Students requiring special accommodations must register with the Disability Services Office.
Attendance: Your attendance is expected at every meeting. If you must be absent, please notify me before the class meeting.
Grade appeals: If you think the grade for your work (homework, peer reviews, participation) was miscalculated, you have the right to appeal. Appeals must be submitted by email within 7 calendar days of the grade being posted. After that, your grade is final and will not be changed.
Acknowledgments: I have drawn ideas or readings from the following syllabi and books:
- Justin Jodrey, http://uc-r.github.io/data_wrangling
- Garrett Grolemund & Hadley Wickham, R for Data Science
- Jenny Bryan, STAT 545: Data wrangling, exploration, and analysis with R
- Lincoln Mullen, HIST 688: Data and visualization in digital history
- Kieran Healy, Soc 880: Data Visualization
- Garrett Grolemund, Hands-On Programming with R