Exploratory Data Analysis (EDA) is usually the first step when you analyze data, because you want to know what information the dataset carries. In this lab, we introudce basic R functions for EDA in both quantitative and graphical approaches. In the end, we will also learn some useful functions for data manipulation, which is often necessary in data analysis.

1 Exploratory Data Analysis

1.1 Basic summary Statistics

Before start, always do

  • set the working directory!
  • create a new R script (unless you are continuing last project)
  • Save the R script.

Let’s first load the Iris dataset. This is a very famous dataset in almost all data mining, machine learning courses, and it has been an R build-in dataset. The dataset consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginicaand Iris versicolor). Four features(variables) were measured from each sample, they are the length and the width of sepal and petal, in centimeters. It is introduced by Sir Ronald Fisher in 1936.

  • 3 Species