Week 2 (October 25)

“What we have is a data glut.” - Vernor Vinge

Data are being generated by everything around us at all times. Every digital process and social media exchange produces them. Systems, sensors, and mobile devices transmit them. Countless databases collect them. Data are arriving from multiple sources at an alarming rate, and analysts and organizations are seeking ways to leverage these new sources of information. Consequently, analysts need to understand how to retrieve data from these sources.

Welcome to week 2! This week we will focus on:

  1. Creating reproducible documents with R Markdown.
  2. Understanding the basics of project workflow with R Projects, R Markdown, and R Notebooks.
  3. Importing flat files (quickly!) and understanding the difference between base R and tidyverse functions for importing data.
  4. Advanced importing capabilities, such as importing data straight from relational databases (e.g. SQL), web scraping, and importing data files from other statistical software (e.g. SPSS, SAS, Stata).
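
To make the base R versus tidyverse distinction in item 3 concrete, here is a minimal sketch, not the code used in class. The file contents and column names are invented for illustration, and the second half assumes the readr package is installed.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The column names and values below are made up for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("tail_number,seats,engine",
    "N101,150,turbofan",
    "N102,9,piston"),
  csv_path
)

# Base R: read.csv() returns a plain data.frame.
base_df <- read.csv(csv_path, stringsAsFactors = FALSE)
str(base_df)

# Tidyverse: readr::read_csv() returns a tibble.
# This branch only runs if readr is available (an assumption).
if (requireNamespace("readr", quietly = TRUE)) {
  tidy_df <- readr::read_csv(csv_path, show_col_types = FALSE)
  print(tidy_df)
}
```

Both calls read the same file; the main visible differences are the class of the returned object and how column types are guessed and reported.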

Please download the materials for Monday’s class:

Title                          Handouts
Lecture 02-A                   Slides
Lecture 02-B                   Slides
Coding exercises               pdf
Aircraft data                  Data
RMarkdown Demonstration Text   Doc
RMarkdown Demonstration Code   Code
Homework 2                     Instruction
Justin_Jodrey bio              Bio

Consequently, this week will give you a strong foundation in the different ways to get your data into R and in understanding the basics of your data set. This will prepare you for the first challenge in completing your course project: acquiring your data!

The readings you need to review and the assignments you need to complete after Monday’s class are outlined below. The skills and functions they introduce will be necessary for the next in-class activities.


Assignments

  • Complete Homework #2 located in this week’s folder.
  • One person from each group will submit the group’s Word document via Slack.
  • This homework assignment is due by 9 AM on Nov 1, 2021.

Readings

  • BEFORE next week’s class, read Chapter 12, sections 12.1 through 12.5, of R for Data Science.
  • BEFORE next week’s class, read Chapter 5, sections 5.1 through 5.4, of R for Data Science.
  • As you read, check your answers for the guided reading with this solutions manual.
  • On the course website, read the pages for the midterm and final project.

See you in class on Monday!