# Data Mining in R

These learning materials for the undergraduate and graduate data mining classes are currently maintained by Xiaorui Zhu. Many of the materials come from Dr. Yan Yu’s previous class notes. Thanks to the former Ph.D. students in the Lindner College of Business for their contributions, and to Dr. Brittany Green for recording the videos.

## Lecture and Lab Notes

### Introduction to Data Mining and R

| Lab Notes | Video | Exercise |
| --- | --- | --- |
| 1.A Introduction to Data Mining |  |  |
| 1.B Introduction to R |  | 1.B |
| 1.C Advanced techniques: function and loop |  | 1.C |
| 1.D Introduction to RMarkdown (optional) |  |  |

### Exploratory Data Analysis

| Lab Notes | Video | Exercise |
| --- | --- | --- |
| 2.A Explore and describe dataset |  | 2.A |
| 2.B Exploratory data analysis by visualization |  | 2.B |
| 2.C tidyverse: R packages for EDA (optional) |  |  |

### Linear Regression, Prediction and Variable Selection

| Lab Notes | Video | Exercise |
| --- | --- | --- |
| 3.A Linear regression and prediction |  | 3.A |
| 3.B Subset variable selection |  | 3.B |
| 3.C LASSO variable selection |  | 3.C |
| 3.D Monte Carlo simulation |  |  |

### Logistic Regression

| Lab Notes | Video | Exercise |
| --- | --- | --- |
| 4.A Logistic regression and prediction |  | 4.A |
| 4.B Logistic regression and variable selection |  | 4.B |
| 4.C Logistic Regression for binary classification |  | 4.C |
| 4.D Logistic regression and ROC |  | 4.D |

### Cross Validation

| Lab Notes | Video | Exercise |
| --- | --- | --- |
| 5.A Cross validation |  | 5.A |
| 5.B Cross validation (Logit model) |  | 5.B |

### Tree Models

| Lab Notes | Video | Exercise |
| --- | --- | --- |
| 6.A Regression Trees |  | 6.A |
| 6.B Classification Trees |  | 6.B |

### Advanced Tree Models: Bagging, Random Forests, and Boosting Trees

| Lab Notes | Video | Exercise |
| --- | --- | --- |
| 7.A Bagging trees |  |  |
| 7.B Random forests |  | 7.B |
| 7.C Boosting trees |  | 7.C |

### Nonlinearity, Generalized Additive Models (GAM), and Nonparametric Smoothing

| Lab Notes | Video | Exercise |
| --- | --- | --- |
| 8.A Univariate Nonparametric Smoothing |  |  |
| 8.B Generalized additive model (GAM) |  | 8.B |

### Neural Network, LDA, and SVM

| Lab Notes | Video | Exercise |
| --- | --- | --- |
| 9.A Neural network models |  | 9.A |
| 9.B (Optional) Discriminant analysis |  |  |
| 9.C (Optional) Support vector machine (SVM) |  |  |

### Unsupervised Learning: Clustering

| Lab Notes | Video | Exercise |
| --- | --- | --- |
| 10 Clustering |  | 10 |

### Unsupervised Learning: Association Rules

| Lab Notes | Video | Exercise |
| --- | --- | --- |
| 11 Association Rules |  | 11 |

### Other Topics 1: Basic Text Mining

Basic Text Mining

### Contributors

- Tracy Zhou Wu, Ph.D. (2008). Executive Director/VP, JPMorgan Chase, Dallas, TX.
- Shaonan Tian, Ph.D. (2012). Tenure-track Assistant Professor at San Jose State University, CA.
- Chaojiang Wu, Ph.D. (2013). Tenure-track Assistant Professor at Drexel University, PA (now at Kent State University, OH).
- Feng Mai, Ph.D. Assistant Professor of Information Systems in the School of Business at Stevens Institute of Technology.
- Shaobo Li, Ph.D. (2018). Tenure-track Assistant Professor at University of Kansas.
- Yuankun Zhang, Ph.D. (2018). VP, Bank of New York Mellon, Pittsburgh, PA.
- Brittany Green, Ph.D. (2020). Tenure-track Assistant Professor at University of Louisville.
- Xiaorui Zhu, Tianhai Zu, Saidat Sanni, Zewei Lin, current Ph.D. students.
- Zhiyuan Dong, Ph.D. Principal, Media Center of Excellence at IRI, Chicago.
- Wei Xiong, Jingyin Gene, ChongQing, China.