- 1 Introduction to R and R Studio
- 2 Preparations
- 3 Fundamentals
- 4 Summary
- 5 Don’t you want to improve your life? Let RMarkdown do the trick!

Without a doubt, R is gaining ground on the other programming languages popular among data scientists and statisticians. Today, many statisticians use R to analyze data, fit models, and do research. Many practitioners also favor R because it is free, open source, and backed by an active community full of great researchers. Researchers and software engineers alike help the R community grow organically, and this cooperation is helping R become one of the greatest statistical programming languages in the world. RStudio, REvolution, ggplot2, and Shiny are a few of the great tools from this community that play a big role in making the language great.

For people who study frontier statistical methodologies, R lets them focus on implementing statistical methods. One can easily implement a new estimation method or algorithm for simulation or comparison purposes. One can also easily develop R packages for the proposed methods or models, which practitioners can then use, perhaps right after the method is published. This was not possible when statisticians knew little about software development and engineers cared little about statistical theory. Now, by learning a little about R package development, researchers can package their new methods so that people in industry can quickly apply the research results to accelerate their business or improve their decision making, as simply as clicking a button in SPSS. On the other side, practitioners can develop packages for their own projects, which makes their intellectual property reusable and reproducible (they don’t necessarily need to publish their packages). Although open source software may face issues such as infrequent maintenance or security problems, it makes the flow of knowledge seamless.

Shiny is as easy to learn as any other R package. Here is the tutorial: https://shiny.rstudio.com/tutorial/

RStudio has a very active community that enriches what R can do. There are several great features I want to emphasize and recommend.

- R Markdown or R Notebook

Markdown is an elegant syntax for writers, whether they are novelists or work in the social or natural sciences, where more sophisticated notation and graphical or numerical results are needed. Its simple syntax lets writers focus on the essential part of knowledge sharing: the content and the ideas themselves. Although LaTeX is a great tool for generating beautiful documents, researchers, unlike publishers, don’t really need to know much about tedious things such as formatting, typesetting, referencing, and perfectly positioning statistical tables or figures. Because a smooth writing process leads to fluent communication with readers, the advantage of Markdown is obvious.

Here is Markdown Basics: http://rmarkdown.rstudio.com/authoring_basics.html Many other tutorials are available on the same website.

Newer versions of RStudio provide an R Notebook feature that is similar to R Markdown. Both provide an easy environment for testing and iterating when writing an article that contains code. You can run programs inside the article and display whichever results you want to include.

- Shiny Web App

Shiny web apps provide a very important channel for statisticians and data scientists to interact with their audience or customers. With this simple and user-friendly tool, people who work with data can directly present what they have found to those who are interested. This process used to be complicated and required professionals in software or web development. Shiny, however, simplifies the process and provides an intuitive way to build interactive applications on top of statistical results. It is therefore a tool well worth learning, so that you can smoothly turn your ideas into real products.

**R**

- Download at https://cran.r-project.org/
- Windows users: click base and then Download R 3.4.1 (or a later version) for Windows. http://cran.r-project.org/bin/windows/base/
- Mac users: click R-3.4.1.pkg (or a later version). http://cran.case.edu/bin/macosx/R-latest.pkg
- Follow the instructions to install R.

**R Studio**

- Download at https://www.rstudio.com/products/rstudio/download/
- Follow the instructions to install RStudio.

**Shiny App**

**R Markdown**

The download and installation should be straightforward. In case you encounter problems, you can check the following video tutorials.

*Install R:* http://www.youtube.com/watch?v=SJ9sVyqWJn8&hd=1

*Install R Studio:* http://www.youtube.com/watch?v=6aTRbo7kdGk&hd=1

RStudio runs on top of R. It is an IDE (Integrated Development Environment) with many advanced features. These lab notes were created with R Markdown, a very nice and useful tool from RStudio.

After you open RStudio, it should look like this:

There are three panels showing. However, you need the fourth one, which is the editor window. Click the green-plus icon in the top-left corner and select R Script. You write all your code in this editor window, and remember to save it!

Other Panels

- Console: It shows the commands you have run and the corresponding output.
- Environment: It shows what you currently have: data you have loaded, functions that have been defined, and other R objects.
- File/Plot…: Files in the current working directory, the latest plot you have generated…

R is open source software. That means everyone can contribute to it by writing R packages and sharing them with the community. A package usually consists of several R functions and datasets designed for specific tasks. There are over 10,000 packages on CRAN.

Before using functions from a particular package, you need to download the package first and then load it into your working environment. We will see this later.
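The download-then-load routine can be sketched as follows. The package name "readxl" is used only as an illustration, and the install line is commented out because it needs an internet connection; treat this as a rough sketch rather than the only way to manage packages.

```r
# Hypothetical choice of package: "readxl" is only used as an illustration.
pkg <- "readxl"

# install.packages() downloads from CRAN and is needed only once per machine.
# It is commented out here because it requires an internet connection.
# install.packages(pkg)

# requireNamespace() reports whether the package is already installed.
pkg_installed <- requireNamespace(pkg, quietly = TRUE)

if (pkg_installed) {
  # library() attaches the package for the current session.
  library(pkg, character.only = TRUE)
}
pkg_installed
```

In everyday use you would simply type `library(readxl)` at the top of your script once the package has been installed.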

You may call yourself a software developer if you can write R packages. If you are interested in writing packages, here is a good book to read: http://r-pkgs.had.co.nz/.

- Google: simply search “how to … with R”.
- Stack Overflow: a searchable Q&A site oriented toward programming issues.
- Cross Validated: a searchable Q&A site oriented toward statistical analysis.
- R-bloggers: a central hub of content from over 500 bloggers who provide news and tutorials about R.
- Use *help()* or the question mark in the R console: This is the most convenient way to learn R functions. More than 80% of my programming time was spent looking at the help documents in R.
  - Please try **?lm** (type it in your console), or **help(lm)**

Always set the working directory before you start coding. The working directory is where you read external data, write data, and save your code.

- **Look at current working directory**: type **getwd()** in the console.
- **Set working directory**: use **setwd("the path")**,

Or

Click **Session -> Set Working Directory -> Choose Directory**, then choose the folder in which you wish to save your work. This will be the default directory. Create an “R Script”, name it, and save it. Then you can start writing code in the editor window.
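A minimal sketch of checking and changing the working directory from code. Here `tempdir()` stands in for "the path" and the file name `wd_demo.txt` is made up; with your own project you would pass your folder to `setwd()` instead.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# tempdir() stands in for "the path"; replace it with your own project folder.
setwd(tempdir())
new_wd <- getwd()
new_wd

# Files written with a relative name now land in the working directory.
writeLines("hello", "wd_demo.txt")
found <- file.exists(file.path(new_wd, "wd_demo.txt"))
found

# Clean up and go back to where we started.
unlink("wd_demo.txt")
setwd(old_wd)
```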

Your objects (loaded datasets, variables, functions, etc.) are contained in your “current workspace”, which can be saved at any time. In RStudio: Session -> Load Workspace/Save Workspace As….

**Remember:** Keep it tidy! Keep separate projects (code, data files) in separate workspaces/directories.

You can assign numbers and lists of numbers (vectors) to a variable. Assignment is specified with the “<-” or “=” symbol; both assign the value on the right-hand side (RHS) to the object on the left-hand side (LHS). There are some subtle differences between them, but most of the time they are equivalent. I highly suggest using “<-” for assignment and “=” for the arguments of a function (this may be explained later).
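One of the subtle differences shows up inside a function call. The sketch below is only an illustration; the variable names `w`, `m1`, and `m2` are made up for it.

```r
# Names that exist before the calls below.
before <- ls()

# "=" inside a call only names the argument x of mean(); it creates no variable.
m1 <- mean(x = c(1, 2, 3))
created_by_equals <- "x" %in% setdiff(ls(), before)

# "<-" inside a call first creates the variable w (a hypothetical name),
# then passes its value to mean().
m2 <- mean(w <- c(1, 2, 3))
created_by_arrow <- "w" %in% ls()

c(m1 = m1, m2 = m2)
c(equals = created_by_equals, arrow = created_by_arrow)
```

Both calls return the same mean, but only the second one leaves a new object behind, which is one reason to keep “=” for arguments and “<-” for assignment.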

Here we define two variables \(x = 10\) and \(y = 5\), then we calculate \(x+y\). Type the following code in the editor and run it line by line. To run a line of code, move the cursor to that line and press **Ctrl+Enter** (**Command+Enter** on Mac). To run multiple lines of code, highlight those lines and use the same shortcut. (Note that you can put a # at the start of a line to write a comment in the code.)

```
x <- 10
y = 5
x+y
```

`## [1] 15`

After you run the code, what do you find in the Global Environment (Workspace) window?

In RStudio, you can view every variable you have defined, along with other objects such as imported datasets, in the Global Environment (*Workspace*) window. You can use R as an over-qualified calculator. Try the following commands. You already have \(x, y\) defined, so you can calculate \(\log(x)\):

`log(x)`

`## [1] 2.302585`

\(\exp(y)\):

`exp(y)`

`## [1] 148.4132`

\(\cos(x)\):

`cos(x)`

`## [1] -0.8390715`

The log, exp, and cos operators are *functions* in R. They take inputs (also called *arguments*) in parentheses and return outputs.
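Functions can take more than one argument. As a small illustration (the variable names here are made up), `log()` accepts an optional `base` argument and `round()` accepts `digits`:

```r
# log() uses base e by default; the optional argument base changes that.
log_e  <- log(100)
log_10 <- log(100, base = 10)
log_2  <- log(8, base = 2)

# round() takes a second argument giving the number of decimal places.
rounded <- round(log_e, digits = 3)

c(log_e = log_e, log_10 = log_10, log_2 = log_2, rounded = rounded)
```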

You can also run logical operations, such as \(x == y, x > y\):

`x == y`

`## [1] FALSE`

`x > y`

`## [1] TRUE`

Exercise: Economic Order Quantity Model: \(Q= \sqrt{2DK/h}\)

- D=5000: annual demand quantity
- K=$4: fixed cost per order
- h=$0.5: holding cost per unit

Q=?

There are four types of data structures in R: vector, matrix, data frame, and list.

**Vector** is a list of numbers (or strings), such as the \(z\) defined below, which is a vector with elements \([3,5,7,9]\). Some basic calculations can be applied to a vector of numbers.

To assign a list of numbers (a vector) to a variable, put the numbers inside the c() function, separated by commas. As an example, we can create a new variable called “z” that contains the numbers 3, 5, 7, and 9.

```
# Define numerical vector z
z<- c(3,5,7,9)
# Define character vector zz, where numerical operations cannot be directly applied.
zz<- c("cup", "plate", "pen", "paper")
```

```
#Average
mean(z)
```

`## [1] 6`

```
#Standard deviation
sd(z)
```

`## [1] 2.581989`

```
#Median
median(z)
```

`## [1] 6`

```
#Max
max(z)
```

`## [1] 9`

```
#Min
min(z)
```

`## [1] 3`

```
#Summary Stats
summary(z)
```

```
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     3.0     4.5     6.0     6.0     7.5     9.0
```

Elementwise operations for single vector or vectors:

`z`

`## [1] 3 5 7 9`

`z+2`

`## [1] 5 7 9 11`

`z/10`

`## [1] 0.3 0.5 0.7 0.9`

```
# define vector z1
z1 <- c(2,4,6,8)
# Elementwise operations (the vectors should have the same length; otherwise R recycles the shorter one)
z+z1
```

`## [1] 5 9 13 17`

`z*z1`

`## [1] 6 20 42 72`

Combining multiple vectors with c() still gives a vector, such as z2:

```
# define vector z2
z2 <- c(z, z1)
z2
```

`## [1] 3 5 7 9 2 4 6 8`

How to extract the second entry of vector z2=(3,5,7,9,2,4,6,8)?

`z2[2]`

`## [1] 5`

How to extract all elements greater than 3 from vector z2?

`z2[z2>3]`

`## [1] 5 7 9 4 6 8`

How to extract all elements greater than 3 and smaller than 6 from vector z2?

`z2[z2>3 & z2<6]`

`## [1] 5 4`

How to order the vector z2 from smallest to largest?

`z2[order(z2)]`

`## [1] 2 3 4 5 6 7 8 9`
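To see why these subsetting commands work, it can help to look at the intermediate pieces. The sketch below rebuilds z2 so that it runs on its own; the name `mask` is made up for the illustration.

```r
# Same vector as in the text.
z2 <- c(3, 5, 7, 9, 2, 4, 6, 8)

# The comparison returns one TRUE/FALSE per element.
mask <- z2 > 3 & z2 < 6
mask

# Indexing with the mask keeps the TRUE positions; which() gives their indices.
z2[mask]
which(mask)

# order() returns the positions that sort the vector; sort() returns the sorted values.
order(z2)
identical(z2[order(z2)], sort(z2))
```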

Exercise:

- What is the dot product (inner product) of z and z1?
- Find the elements of z2 that are smaller than 3 or greater than 7.

**Matrix** is a table of numbers (or strings). \(A\) is a matrix with 2 rows and 3 columns.

```
z = c(3,5,7,9)
A = matrix(data = c(1,2,3,4,5,6), nrow = 2)
```

*matrix()* is a function that creates a matrix from a given vector. Some of the arguments of a function can be optional. For example, you can also add the *ncol* argument, which is unnecessary in this situation.

`A <- matrix(data = c(1,2,3,4,5,6), nrow = 2, ncol = 3)`

Another way to write the function is to ignore the argument names and just put arguments in the right order, but this may cause confusion for readers.

`A <- matrix(c(1,2,3,4,5,6), 2, 3)`

Question: What would happen if you specified ncol=2 or ncol=4?

By default, the numbers of a vector are placed into the matrix by column (from top to bottom), but you can fill by row instead using the additional argument **byrow=TRUE**.

`A_byrow <- matrix(data = z2, nrow = 4, ncol = 2, byrow = TRUE)`
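A side-by-side sketch of the two filling orders, using made-up names `v`, `m_col`, and `m_row` so that nothing defined earlier is overwritten:

```r
v <- c(1, 2, 3, 4, 5, 6)

# Default: fill column by column.
m_col <- matrix(v, nrow = 2)
# byrow = TRUE: fill row by row.
m_row <- matrix(v, nrow = 2, byrow = TRUE)

m_col
m_row

# First row of each matrix.
m_col[1, ]
m_row[1, ]
```

With the default, the first row is 1 3 5; with byrow = TRUE it is 1 2 3.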

Elementwise operations for matrices:

`A`

```
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
```

`A+2`

```
##      [,1] [,2] [,3]
## [1,]    3    5    7
## [2,]    4    6    8
```

Dimension

```
# Dimensions of A
dim(A)
```

`## [1] 2 3`

Transpose and Multiplication

```
# Transpose
t(A)
```

```
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
```

```
# The product X %*% Y is defined only if the number of columns of X equals the number of rows of Y
t(A) %*% A
```

```
##      [,1] [,2] [,3]
## [1,]    5   11   17
## [2,]   11   25   39
## [3,]   17   39   61
```

```
# New matrix with the same dimension as A (2*3)
A2 <- A * 2
# Matrix calculation should satisfy the rules of matrix algebra
A + A2
```

```
##      [,1] [,2] [,3]
## [1,]    3    9   15
## [2,]    6   12   18
```

Question: What would happen if you ran A %*% A2?

How to extract the second entry of the second row from matrix A?

`A[2,2]`

`## [1] 4`

How to extract the first row from matrix A?

`A[1, ]`

`## [1] 1 3 5`

How to extract the first two columns from matrix A?

`A[,1:2]`

```
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```

Exercise:

- What are the diagonal elements of t(A) %*% A?

**Data frames** in R are the “datasets”, that is, tables of data with each row as an observation and each column representing a variable. Data frames have column names (variable names) and row names.

- Convert a matrix to data frame

You can use *data.frame()* to transform a matrix into a data frame. Most of the time you will import a text file as a data frame or use one of the example datasets that come with R.

```
mydf <- data.frame(A)
class(mydf)
```

`## [1] "data.frame"`

- Read external data file (.txt and .csv files)

Use the *read.table* or *read.csv* function to import comma/space/tab delimited text files. You can also use the Import Dataset wizard in RStudio. The package “readxl” allows you to read xls/xlsx files. First, download the storks datasets (storks.csv and storks.txt files) and save them in your working directory.

```
mydata_csv <- read.csv("storks.csv", header=TRUE)
mydata_txt <- read.table("storks.txt", header=TRUE, sep = "\t")
```
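If you don't have the storks files at hand, the same reading functions can be tried on a small made-up table. Everything in this sketch (the data frame `toy`, its column names, and the temporary file paths) is hypothetical and only there so the code runs on its own.

```r
# Made-up data standing in for a real file; the column names are hypothetical.
toy <- data.frame(id = 1:3, value = c(2.5, 3.0, 4.5))

# Write a comma-separated and a tab-separated copy to temporary files.
csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
write.csv(toy, csv_path, row.names = FALSE)
write.table(toy, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read them back the same way as in the text.
toy_csv <- read.csv(csv_path, header = TRUE)
toy_txt <- read.table(txt_path, header = TRUE, sep = "\t")

str(toy_csv)
all.equal(toy_csv, toy_txt)
```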

- Load built-in dataset

```
#Load cars dataset that comes with R (50 obs, 2 variables)
data(cars)
```

- Summary of a dataset

```
#Dimension
dim(cars)
#Preview the first few rows
head(cars)
#Variable names
names(cars)
#Summary
summary(cars)
#Structure
str(cars)
```

Subsetting elements from data frames is similar to subsetting from matrices. In addition, since data frames have variable names (a label for each column), you can also refer to the variables of a data frame in the following two ways:

- `df$var` will select var from df
- `df[, c("var1", "var2")]` will select var1 and var2 from df

In RStudio, hitting tab after `df$` allows you to select/autocomplete variable names in df.
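Here is a short sketch of both forms on the built-in cars data, plus a row filter in the matrix style. The object names `speed_vec`, `sub_df`, and `fast` are made up for the illustration.

```r
data(cars)

# A single column via $ is a plain vector.
speed_vec <- cars$speed
# Several columns by name give a data frame.
sub_df <- cars[, c("speed", "dist")]
# Rows can be filtered with a logical condition, as with vectors.
fast <- cars[cars$speed > 20, ]

class(speed_vec)
class(sub_df)
nrow(fast)
```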

- Adding and dropping variables

Add a new variable to the data

```
#First 2 obs of the variable dist in cars
cars$dist[1:2]
```

`## [1] 2 10`

```
cars1<- cars
cars1$time<- cars$dist/cars$speed
```

Drop variable *time*

```
# since "time" is the third column, we can do
cars2<- cars1[,-3]
# we can also drop "time" by keeping the other two variables
cars3<- cars1[c("speed", "dist")]
```

**List** is a container. You can put different types of objects into a list, collecting everything you have on hand in one place.

`mylist<- list(myvector=z, mymatrix=A, mydata=cars)`

The output of most R functions is a list that contains several objects.

```
# Load car dataset that comes with R
data(cars)
#fit a simple linear regression between braking distance and speed
lm(dist~speed, data=cars)
```

```
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Coefficients:
## (Intercept)        speed
##     -17.579        3.932
```

There are three ways to get an element from a list:

- use *listname[[i]]* to get the ith element of the list;
- use *listname[["elementname"]]*;
- use *listname$elementname*.

Note that you use double square brackets for indexing a list.

```
reg <- lm(dist~speed, data = cars)
reg[[1]]
reg[["coefficients"]]
reg$coefficients
```

If you have done object-oriented programming before, the list “reg” is actually an object that belongs to the class “lm”. The element names such as “coefficients” are fields of the “lm” class.
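A short sketch to check these points yourself. The regression is refit so the block runs on its own; the plain list `small` is a made-up example showing why double brackets are needed.

```r
data(cars)
reg <- lm(dist ~ speed, data = cars)

# The class of the fitted object and the names of its elements.
class(reg)
names(reg)

# All three ways of indexing return the same element.
by_position <- reg[[1]]
by_name     <- reg[["coefficients"]]
by_dollar   <- reg$coefficients
identical(by_position, by_name)
identical(by_name, by_dollar)

# On a plain list (hypothetical example), single brackets return a shorter list,
# while double brackets return the element itself.
small <- list(a = 1:3, b = "text")
class(small["a"])
class(small[["a"]])
```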

Define a vector with values (5, 2, 11, 19, 3, -9, 8, 20, 1). Calculate the sum, mean, and standard deviation.

Re-order the vector from largest to smallest, and make it a new vector.

Convert the vector to a 3*3 matrix ordered by column. What is the sum of the first column? What is the number in column 2, row 3? What are the column sums?

Download the CustomerData to your working directory. Load it into R.

- How many rows and columns are there?
- Extract all variable names.
- What is the average “Debt to Income Ratio”?
- What is the proportion of “Married” customers?

A Simple Scatter Plot

`plot(cars)`

Types of distributions: norm, binom, beta, cauchy, chisq, exp, f, gamma, geom, hyper, lnorm, logis, nbinom, t, unif, weibull, wilcox

Four prefixes:

‘d’ for density (PDF)

‘p’ for distribution (CDF)

‘q’ for quantile (percentiles)

‘r’ for random generation (simulation)

`dbinom(x=4,size=10,prob=0.5)`

`## [1] 0.2050781`

`pnorm(1.86)`

`## [1] 0.9685572`

`qnorm(0.975)`

`## [1] 1.959964`

`rnorm(10)`

```
## [1] 1.68251119 1.16199649 -0.18213457 -0.48326558 0.07830935 0.51926933
## [7] -1.04622944 0.67249993 0.12307184 1.01232022
```

`rnorm(n=10,mean=100,sd=20)`

```
## [1] 107.96094 112.76725 68.70794 104.23919 75.90895 98.85581 102.52294
## [8] 103.70692 81.56627 93.78238
```
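The four prefixes are related to each other. A small sketch of those relationships follows; the seed value 123 and the variable names are arbitrary choices for the illustration.

```r
# q* inverts p*: the 97.5% quantile maps back to probability 0.975.
p_back <- pnorm(qnorm(0.975))

# p* accumulates d*: P(X <= 4) for Binomial(10, 0.5) is the sum of the
# probabilities at 0, 1, ..., 4.
cdf_direct <- pbinom(4, size = 10, prob = 0.5)
cdf_summed <- sum(dbinom(0:4, size = 10, prob = 0.5))

# set.seed() makes r* draws repeatable; the seed value 123 is arbitrary.
set.seed(123)
draw1 <- rnorm(3)
set.seed(123)
draw2 <- rnorm(3)

c(p_back = p_back, cdf_direct = cdf_direct, cdf_summed = cdf_summed)
identical(draw1, draw2)
```

Without set.seed(), the random numbers change on every run, which is why your rnorm() output will differ from the values printed above.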

- R programming is essentially applying and writing functions. Most of R consists of functions.
- An R function may require multiple inputs, which we call arguments. The arguments can be supplied either in the right order or by argument name. In RStudio, pressing tab after a function name gives help about its arguments.
- Use “?” followed by the function name to learn how to use that function.
- We introduce how to write simple functions here. In the following example, the function *abs_val* returns the absolute value of a number.

```
abs_val = function(x){
  if(x >= 0){
    return(x)
  } else{
    return(-x)
  }
}
abs_val(-5)
```

`## [1] 5`
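Note that if() looks at a single condition, so abs_val is meant for one number at a time. A possible variant for whole vectors, sketched here under the made-up name `abs_val_vec`, uses ifelse(), which checks the condition element by element:

```r
abs_val_vec <- function(x) {
  # ifelse() checks the condition element by element.
  ifelse(x >= 0, x, -x)
}

res <- abs_val_vec(c(-5, 2, 0, -1.5))
res

# Compare with the built-in abs().
all.equal(res, abs(c(-5, 2, 0, -1.5)))
```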

A function for vector truncation

```
mytruncation<- function(v, lower, upper){
  v[which(v<lower)]<- lower
  v[which(v>upper)]<- upper
  return(v)
}
```

You just defined a global function for truncation. Now let’s apply it to the vector z2, truncating at lower=3 and upper=7.

`mytruncation(v = z2, lower = 3, upper = 7)`

`## [1] 3 5 7 7 3 4 6 7`

There are two ways to write a loop: the while loop and the for loop. Loops are very useful for iterative and repetitive computing.

For example: calculate \(1+1/2+1/3+...+1/100\).

```
i<- 1
x<- 1
while(i<100){
  i<- i+1
  x<- x+1/i
}
x
```

`## [1] 5.187378`

```
x<- 1
for(i in 2:100){
  x<- x+1/i
}
x
```

`## [1] 5.187378`
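In R, many loops of this kind can also be written without a loop, because arithmetic works on whole vectors. A minimal sketch of the same sum follows; the names `x_loop`, `x_vec`, and `partial` are made up so that x is not overwritten.

```r
# Loop version, as in the text.
x_loop <- 0
for (i in 1:100) {
  x_loop <- x_loop + 1 / i
}

# Vectorized version: build all the terms at once and add them up.
x_vec <- sum(1 / (1:100))

# cumsum() keeps every partial sum, which is handy for watching how a series grows.
partial <- cumsum(1 / (1:100))

c(loop = x_loop, vectorized = x_vec, last_partial = partial[100])
```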

Exercise:

- Do you think \(1+1/2^2+1/3^2+...+1/n^2\) converges or diverges as \(n\rightarrow \infty\)? Use R to verify your answer.
- Fibonacci sequence: 1, 1, 2, 3, 5, 8, 13,… What is the next number? What is the 50th number? Create a vector of the first 30 Fibonacci numbers.
- Write a function that can either calculate the summation of the series in Question 1 or generate and print the Fibonacci sequence in Question 2.

That is great! Go R!

You may have trouble in the rest of the semester…, so please try to get used to it!

- Set working directory.
- How to create a vector?
- How to load an external data file?
- How to subset/index a vector, a matrix, and a data frame?
- Basic calculations and syntax.

You may ask:

- How can I save time when I need to rerun my programs to update the results?
- How can I efficiently rewrite my report/homework if I need to change the data but keep all the analyses?
- How can I avoid dealing with the format of my report and focus on the content itself?

If any of these questions has ever bothered you, then you need to learn “**Markdown**/**RMarkdown**”.

The most recent version of RStudio includes the “R Markdown” file type by default. You can create an “R Markdown” file.