I'm Xiaorui (Jeremy) Zhu, an Assistant Professor in the Department of Business Analytics and Technology Management at Towson University. I obtained my Ph.D. in Business Analytics from the University of Cincinnati, Lindner College of Business. Before that, I received a master's degree in Finance from Penn State University and a master's degree in Applied Statistics from Beijing University of Technology.
My research interests include high-dimensional statistics, categorical data analysis, machine learning, default prediction and sentiment analysis in finance and information systems, and creativity. Specifically, I study variable selection methods and statistical inference in high-dimensional statistics. One of my papers proposes sparsified simultaneous confidence intervals (SSCI) for the high-dimensional linear regression model. In finance, I work on bankruptcy, delisting problems, and stock return prediction using sentiment analysis. For the bankruptcy problem, I am interested in predicting bankruptcy and in how bankruptcy risk is associated with equity returns. One of my projects focuses on bankruptcy and delistings due to other failure reasons, two closely related yet sharply different distress events. We identify two different models, one for bankruptcy risk and one for other-failure risk. In another published work, I proposed an adaptive method to estimate the coefficients of the GARCH model with heavy-tailed innovations. I'm also interested in how creativity can stimulate the development of machine learning algorithms and artificial intelligence.
I believe staying Hungry, Foolish, and Creative is the secret to success. I dream of being a statistician engineering a machine intelligence that is the "epitome" of the descendants of human intelligence.
I'm interested in making research results more intuitive and understandable. Therefore, I use Shiny apps to tell interesting data stories interactively. Here are some of the latest Shiny apps and R packages I have created. Everyone interested in my research or software is welcome to contact me.
An R package for the Surrogate \(R^2\) measure for categorical data analysis. It can generate a point or interval measure of the surrogate \(R^2\), and a ranking measure of each variable's contribution.
This course emphasizes hands-on data analysis experience using the most recent advances in data mining and machine learning for business analytics with the statistical software R. Topics include modern data wrangling techniques, data visualization, linear regression, logistic regression, variable selection, model evaluation, K-nearest neighbors, classification and regression trees (CART), etc.
Syllabus
This course focuses on using standard business analytic models to summarize and analyze data, build models, and drive impact through quantitative decision-making.
Syllabus
Data Wrangling with R! This course provides an intensive, hands-on introduction to Data Wrangling with the R programming language. You will learn the fundamental skills required to acquire, munge, transform, manipulate, and visualize data in a computing environment that fosters reproducibility.
Syllabus Lab Notes
This course covers time series analysis, emphasizing the appropriate models for estimation, testing, and forecasting. Topics include univariate Box-Jenkins methods for fitting and forecasting time series; ARIMA models; stationarity and nonstationarity; diagnosing time series models; point and interval forecasts; seasonal time series models; and modeling volatility with ARCH, GARCH, and other methods. R Shiny app development is also covered to help students build prototypes of their models and ideas.
Undergraduate Graduate
This course develops fundamental knowledge and skills for applying statistics to business decision-making. Topics include descriptive statistics, probability distributions, sampling, confidence intervals, hypothesis testing, and computer software for statistical applications. (2018 Spring & Fall)
Syllabus
The statistical methods in these two courses include Linear Regression, Generalized Linear Models (e.g., Logistic Regression), Variable Selection, Cross-Validation, K-Nearest Neighbors, Classification and Regression Trees (CART), Bagging, Boosting, Random Forests, Generalized Additive Models (GAM, for nonlinearity), Nonparametric Smoothing, Neural Networks, Support Vector Machines, Clustering (K-means), Principal Component Analysis, Association Rules, and Text Mining.
Syllabus Lab Notes