Let's first load the Iris dataset. It is a famous dataset used in many data mining and machine learning courses, and it ships with R as a built-in dataset. The dataset consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica, and Iris versicolor). Four features (variables) were measured on each sample: the length and the width of the sepal and of the petal, in centimeters. It was introduced by Sir Ronald Fisher in 1936.
A histogram is one of the simplest ways to show how a numerical variable is distributed.
data(iris)
hist(iris$Sepal.Length, col="green", breaks=20)
You may change "breaks=" and "col=" to get a different appearance.
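As a rough sketch of how "breaks=" changes the picture, the snippet below draws the same variable with two bin settings. A single number passed to breaks is only a suggestion that R rounds to "pretty" cut points, while a vector of cut points is used exactly. The names h_coarse and h_fine are illustrative and not part of the example above.

```r
data(iris)

# A single number is treated as a suggestion for the number of bins
h_coarse <- hist(iris$Sepal.Length, breaks = 5, col = "green",
                 main = "About 5 bins", xlab = "Sepal Length")

# A vector of cut points is used exactly as given (here: bins of width 0.25)
h_fine <- hist(iris$Sepal.Length, breaks = seq(4, 8, by = 0.25), col = "green",
               main = "Bins of width 0.25", xlab = "Sepal Length")

# hist() invisibly returns the bin boundaries and the counts per bin
h_fine$breaks
h_fine$counts
```

Inspecting the returned object is a quick way to check which bins were actually used.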
A density plot shows a nonparametric (kernel) estimate of the distribution.
plot(density(iris$Sepal.Length))
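The smoothness of the curve depends on the bandwidth. The sketch below, with illustrative object names, overlays three estimates using the adjust argument of density(), which multiplies the default bandwidth.

```r
data(iris)
x <- iris$Sepal.Length

# Default bandwidth
d_default <- density(x)
# adjust < 1 gives a wigglier curve, adjust > 1 a smoother one
d_narrow <- density(x, adjust = 0.5)
d_wide <- density(x, adjust = 2)

plot(d_default, main = "Effect of bandwidth on the density estimate",
     xlab = "Sepal Length", ylim = range(0, d_narrow$y, d_default$y))
lines(d_narrow, col = "red", lty = 2)
lines(d_wide, col = "blue", lty = 3)
legend("topright", legend = c("default", "adjust = 0.5", "adjust = 2"),
       col = c("black", "red", "blue"), lty = c(1, 2, 3))
```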
You can make the plot more elegant with different options, for example by adding a title, adjusting the axis range, or renaming the axis labels.
You can also add curves or straight lines on top of an existing plot by using the lines() or abline() function.
hist(iris$Sepal.Length, prob=TRUE, col="green", breaks=20, main="Histogram and Density of Sepal Length", xlim=c(3,9), xlab="Sepal Length")
lines(density(iris$Sepal.Length), col="red", lwd=2)
# Add a vertical line that indicates the average of Sepal Length
abline(v=mean(iris$Sepal.Length), col="blue", lty=2, lwd=1.5)
A bar chart is produced from a vector of single data points, which is often a vector of summary statistics. Therefore, you need to preprocess your data and compute the summary statistics before drawing the bar chart.
# bar chart for average of the 4 quantitative variables
aveg<- apply(iris[,1:4], 2, mean)
barplot(aveg, ylab = "Average")
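The same preprocessing step works for group summaries. As a minimal sketch (species_means is an illustrative name), tapply() computes the mean of one variable within each species, and the resulting named vector goes straight into barplot().

```r
data(iris)

# Mean Sepal.Length within each species: a named vector of length 3
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# The names of the vector become the bar labels
barplot(species_means, col = "lightblue", ylab = "Average Sepal Length",
        main = "Average Sepal Length by Species")
```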
Use ?barplot or a Google search to learn how to customize the bar chart.
A pie chart is commonly used to visualize the proportions of different categories. Like a bar chart, it is drawn from a vector of single data points, here the number of samples per species.
pie(table(iris$Species), col=rainbow(3))
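Since a pie chart is about proportions, it can help to print them on the slices. The following sketch builds percentage labels with prop.table(); the names counts, pct, and slice_labels are illustrative.

```r
data(iris)

# Number of samples per species
counts <- table(iris$Species)

# Convert counts to percentages, rounded to one decimal place
pct <- round(100 * as.numeric(prop.table(counts)), 1)

# Combine species name and percentage into one label per slice
slice_labels <- paste0(names(counts), " (", pct, "%)")

pie(counts, labels = slice_labels, col = rainbow(3),
    main = "Share of samples by species")
```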
A box plot can only be drawn for continuous variables.
# box plot of Sepal.Length
boxplot(iris$Sepal.Length)
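# box plots of all four quantitative variables, side by side
boxplot(iris[,1:4], notch=TRUE, col=c("red", "blue", "yellow", "grey"))
# box plot of Sepal.Length (column 1) grouped by Species (column 5)
boxplot(iris[,1]~iris[,5], notch=TRUE, ylab="Sepal Length", col="blue")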
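The grouped box plot can also be written with column names and the data argument, which tends to be easier to read than column indices. This is a sketch of that variant; it also keeps the object returned by boxplot() (here called b, an illustrative name) to look at the numbers behind the boxes.

```r
data(iris)

# Formula interface: response ~ grouping variable, looked up in 'data'
b <- boxplot(Sepal.Length ~ Species, data = iris, col = "blue",
             xlab = "Species", ylab = "Sepal Length")

# One column per group; rows are lower whisker, lower hinge,
# median, upper hinge, upper whisker
b$stats
b$names
```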
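# scatter plot of sepal width against sepal length
plot(iris$Sepal.Length, iris$Sepal.Width, xlab = "Length", ylab = "Width", main = "Sepal")
# scatter plot matrix of the four quantitative variables
pairs(iris[,1:4])
# parallel coordinates plot, colored by species (parcoord() is in MASS)
library(MASS)
parcoord(iris[,1:4],col=iris$Species)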
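Coloring by species is also useful in a scatter plot, and a legend makes the colors readable. The sketch below assumes an arbitrary choice of three colors; species_cols and point_cols are illustrative names.

```r
data(iris)

# One color per species, in the order of levels(iris$Species)
species_cols <- c("red", "darkgreen", "blue")

# Map each observation to the color of its species
point_cols <- species_cols[as.integer(iris$Species)]

plot(iris$Sepal.Length, iris$Sepal.Width, col = point_cols, pch = 19,
     xlab = "Length", ylab = "Width", main = "Sepal")
legend("topright", legend = levels(iris$Species), col = species_cols, pch = 19)
```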
You may display multiple plots in one window (one figure).
# set arrangement of multiple plots
par(mfrow=c(2,2))
# set margins
par(mar=c(4.5, 4.2, 3, 1.5))
hist(iris$Sepal.Length, xlab = "Sepal Length", cex.lab=1.5)
hist(iris$Sepal.Width, xlab = "Sepal Width", col = "red")
plot(iris$Sepal.Length, iris$Sepal.Width, xlab = "Length", ylab = "Width", main= "Sepal", pch=17)
boxplot(iris[,1:4], notch=TRUE, col=c("red", "blue", "yellow", "grey"))
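Settings made with par() stay active for later plots on the same device. One common pattern, sketched here with the illustrative name old_par, is to save the previous settings when changing them and restore them afterwards.

```r
data(iris)

# par() returns the previous values of the settings it changes
old_par <- par(mfrow = c(1, 2), mar = c(4.5, 4.2, 3, 1.5))

hist(iris$Sepal.Length, xlab = "Sepal Length", main = "Sepal Length")
hist(iris$Sepal.Width, xlab = "Sepal Width", main = "Sepal Width", col = "red")

# Restore the layout and margins that were in effect before
par(old_par)
```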
There are many more options that can make your plots look nice. You can learn about them here or ask your best friend, Google.
Details about figure margins can be found here.