Xiaorui Zhu

About Me

I'm an Assistant Professor of Business Analytics at Towson University, where I explore how data, machine learning, and statistical reasoning can illuminate complex problems in finance, risk management, and information systems. My research focuses on high-dimensional statistics, machine learning, and categorical data analysis, with applications in bankruptcy prediction, sentiment analysis, and asset pricing. I earned my Ph.D. in Business Analytics from the Lindner College of Business at the University of Cincinnati, advised by Dr. Yan Yu. Before that, I received a master's degree in Finance from Penn State University and a master's degree in Applied Statistics from Beijing University of Technology.

I'm especially interested in building interpretable models that balance statistical rigor with practical insight. I've developed several R packages, including SurrogateRsq, PAsso, and SSCI. My work has been published in journals such as Journal of Business & Economic Statistics and Decision Support Systems.

I believe staying Hungry, Foolish, and Creative is the secret to success. My long-term goal is to contribute to a form of machine intelligence that is the "epitome" of a good human being and human intelligence.

Google Curriculum Vitae Github

Research

Publications

Sparsified Simultaneous Confidence Intervals for High-Dimensional Linear Models,
Zhu, X., Qin, Y., and Wang, P. (2024), Metrika, 1-25.
(Relevant package: "SPSP")
Impact of the COVID-19 Pandemic on the Stock Market and Investor Online Word of Mouth,
Zhu, X., Li, S., Srinivasan, K., Lash, M. T. (2024), Decision Support Systems, 176, 114074.
Corporate Probability of Default: A Single-Index Hazard Model Approach,
Li, S., Yu, Y., Tian, S., Zhu, X., Lian, H. (2022),
Journal of Business & Economic Statistics, DOI: 10.1080/07350015.2022.2120484
SurrogateRsq: an R package for categorical data goodness-of-fit analysis using the surrogate \(R^2\),
Zhu, X., Lin, Z., Liu, D., and Greenwell, B. (2023), The New England Journal of Statistics in Data Science, 1-12.
The effects of NASDAQ delisting on firm performance,
Li, M., Liu, K., Zhu, X. (2023), Research in International Business and Finance, 2024.
A New Goodness-of-fit Measure for Probit Models: Surrogate \(R^2\),
Liu, D., Zhu, X., Greenwell, B., Lin, Z. (2023),
British Journal of Mathematical and Statistical Psychology, 76, 192–210.
PAsso: an R Package for Assessing Partial Association between Ordinal Variables,
Li, S., Zhu, X., Chen, Y., Liu, D. (2021), The R Journal. 13, no. 2: 135.
Exploring the Diversity of Creative Prototyping in a Global Online Learning Environment,
Jablokow, K. W., Zhu, X., and Matson, J. V. (2020).
International Journal of Design Creativity and Innovation, 1-23.
Adaptive Quasi-maximum Likelihood Estimation of GARCH Models with Student's t Likelihood,
Zhu, X., and Li Xie. (2016).
Communications in Statistics-Theory and Methods 45, no. 20 (2016): 6102-6111.
Stimulating Creativity in Online Learning Environments through Intelligent Fast Failure (IFF),
Jablokow, K. W., Zhu, X., Matson, J. V., and Akshay Kakde. (2016).
ASEE 2016 Annual Conference & Exposition, Conference Proceedings (Vol. 2016)

Working Paper

Zooming in on Distress Risk: Bankruptcy vs. Other Failures
Xing, Y., Yu, Y., Zhu, X. (2025, alphabetical authorship, authors contributed equally) Under revision after peer review.
The Impact of Signals from Firm-Generated Content (FGC) on User-Generated Content (UGC) and the Dynamics of Stock Market
Zhu, X., Li, S., Lash, M. T. (2025), Manuscript in final preparation for submission.
The Echoes of Geopolitical Risk and Sentiment: Interactions Between Geopolitical Uncertainty, Social Media, and Stock Returns
Zhu, X., Lash, M. T., and Li, K. (2025), In preparation for submission.
Using Surrogate \(R^2\) to Assess Logistic Models in Bankruptcy Analysis: Explainability, Predictability, and Comparability,
with Liu, D., on-going.

Research In Progress

Corporate Bankruptcy Prediction: A Penalized Semiparametric Index Hazard Model Approach,
with Yu, Y., and Tian, S., on-going.
Additive Logistic Model with Macroeconomic Covariates for Corporate Bankruptcy Prediction,
with Yu, Y., and Tian, S., on-going.
COVID-19 Pandemic Impact on Industry-level Credit Risk,
with Peterburgsky, S., Yu, Y., on-going.

Presentations

"The Impact of Signals from Firm-Generated Content (FGC) on User-Generated Content (UGC) and the Dynamics of Stock Market"
Session Chair, INFORMS Annual Meeting, Atlanta, GA, Oct 2025.
"The Echoes of Geopolitical Risk and Sentiment: Interactions Between Geopolitical Uncertainty, Social Media, and Stock Returns"
Invited talk, Beijing University of Technology, Beijing, June 2025.
"The Echoes of Geopolitical Risk and Sentiment: Interactions Between Geopolitical Uncertainty, Social Media, and Stock Returns"
CBE Research Conference 2025, Towson University, May 2025.
"Geopolitical Risks Effects on Social Media Sentiments and Stock Return"
NEDSI 2025, Hershey, PA, Mar 2025.
"The impact of social media on firm stock performance: firm-generated content vs. user-generated content"
INFORMS 8th Workshop on Data Science 2024, Seattle, Oct 2024.
"Corporate Probability of Default: A Single-Index Hazard Model Approach"
Invited talk, ASA 2024 The Joint Statistical Meetings (JSM), Oregon, August 2024.
"Assessing Partial Association between Ordinal Variables Using PAsso R Package for American National Election and U.S. Bankruptcy",
Invited talk, NESS 37th New England Statistics Symposium, Boston, MA, May 2023.
"Using surrogate R-squared to assess logistic models in bankruptcy analysis: explainability, predictability, and comparability",
2023 Symposium on Data Science and Statistics, St. Louis, Missouri, May 2023.
"A new goodness-of-fit measure for categorical data goodness-of-fit analysis: surrogate \(R^2\)",
BFF8, Cincinnati, OH, May 2023.
"Measuring goodness-of-fit for logit model and its application to U.S. and Polish bankruptcy prediction",
ICSA Applied Statistics Symposium, Gainesville, FL, June 2022.
"Zooming in on Distress Anomaly: Bankruptcy vs. Other Failures",
INFORMS Annual Meeting, Anaheim, CA, Oct 2021.
"Surrogate \(R^2\) for Probit Models",
INFORMS Annual Meeting, Virtual, Nov 2020.
"Demystifying Differences between Bankruptcy and Other Failures in Terms of Major Drivers and Risk-Return Relationship",
Session Chair, INFORMS Annual Meeting, Seattle, WA, Oct 2019.
"Conducting Inference of Coefficients and Model for High-Dimensional Sparse Linear Model",
INFORMS 3rd Workshop on Data Science (Peer-Reviewed), Seattle, WA, Oct 2019.
"Simultaneous Confidence Intervals Using Entire Solution Paths", Poster
The Fourth Workshop on Higher-Order Asymptotics and Post-Selection Inference (WHOA-PSI), St. Louis, MO, August 2019.
"Simultaneous Confidence Intervals Using Entire Solution Paths", Slides
ASA 2019 Joint Statistical Meeting, Denver, CO, July, 2019.
"What Drives Bankruptcy and Other Failures? An Analysis with Variable Selection",
INFORMS Annual Meeting, Data Science Workshop (Peer-Reviewed), Phoenix, AZ, November 2018.
"Additive Logistic Model with Macroeconomic Covariates for Corporate Bankruptcy Prediction",
ASA 2017 Joint Statistical Meeting, Baltimore, MD, August, 2017.
"Amplifying Creativity in Big Data Era: Relationship between creativity and failure",
The 7th International Forum on Statistics, Beijing, China, 2016

Software

I'm interested in making research results more intuitive and understandable, so I build Shiny apps that tell data stories interactively. Below are some of the Shiny apps and R packages I have created. Anyone interested in my research or software is welcome to contact me.

SurrogateRsq (R Package)

An R package for the Surrogate \(R^2\) measure for categorical data analysis. It can generate a point or interval measure of the surrogate \(R^2\), and a ranking measure of each variable's contribution.

PAsso (R Package)

An R package implementing the unified framework for assessing Partial Association between ordinal variables after adjusting for a set of covariates.

SSCI (R Package)

An R Package for constructing the sparsified simultaneous confidence intervals (SSCI).

SPSP (R Package)

An R Package for Selecting the relevant predictors by Partitioning the Solution Paths of the Penalized Likelihood Approach.

Trading Strategies (App)

It is an online platform to display interactive trading strategies.

Stock Display (App)

It is a Shiny app that displays time-series data and forecasts for stocks.

Bankruptcy Prediction (App)

It is a live app showing the probability of bankruptcy of public companies in the U.S.

Teaching

Data Mining for Business Analytics (Towson)
Instructor

This course emphasizes hands-on data analysis experience using recent advances in data mining and machine learning for business analytics, with the statistical software R. Topics include modern data wrangling techniques, data visualization, linear regression, logistic regression, variable selection, model evaluation, K-nearest neighbors, classification and regression trees (CART), etc.

Syllabus

Business Analytics (Towson)
Instructor

This course focuses on using standard business analytic models to summarize and analyze data, build models, and drive impact through quantitative decision-making.

Syllabus

Data Wrangling with R (UC)
Instructor

Data Wrangling with R! This course provides an intensive, hands-on introduction to Data Wrangling with the R programming language. You will learn the fundamental skills required to acquire, munge, transform, manipulate, and visualize data in a computing environment that fosters reproducibility.

Syllabus Lab Notes

Forecasting and Time Series Methods (UC)
Instructor

This course covers time series analysis, emphasizing the appropriate models for estimation, testing, and forecasting. Topics include univariate Box-Jenkins methods for fitting and forecasting time series; ARIMA models; stationarity and nonstationarity; diagnosing time series models; point and interval forecasts; seasonal time series models; and modeling volatility with ARCH, GARCH, and other methods. R Shiny app development is also covered to help students gain skills in prototyping their models and ideas.

Undergraduate Graduate

Business Analytics (UC)
Instructor

This course develops fundamental knowledge and skills for applying statistics to business decision-making. Topics include descriptive statistics, probability distributions, sampling, confidence intervals, hypothesis testing, and computer software for statistical applications. (2018 Spring & Fall)

Syllabus

Data Mining I & II (UC)
Instructor

The statistical methods in these two courses include Linear Regression, Generalized Linear Models (e.g., Logistic Regression), Variable Selection, Cross Validation, K-nearest Neighbors, Classification and Regression Trees (CART), Bagging, Boosting, Random Forests, Generalized Additive Models (GAM, for nonlinearity), Nonparametric Smoothing, Neural Networks, K-means Clustering, Support Vector Machines, Principal Component Analysis, Association Rules, and Text Mining.

Syllabus Lab Notes
