DACHENG XIU

Professor of Econometrics and Statistics, Booth School of Business, University of Chicago
Affiliated Faculty, Department of Statistics, University of Chicago
Research Associate, National Bureau of Economic Research

Assistant: Roy, Ruhi
5807 South Woodlawn Avenue, Chicago, IL 60637
773.834.7191
dacheng.xiu@chicagobooth.edu
Google Scholar Profile
Curriculum Vitae

Dacheng Xiu develops statistical methodologies and applies them to financial data to study their economic implications. His earlier research focused on risk measurement and portfolio management with high-frequency data, and on econometric modeling of derivatives. He currently focuses on developing machine learning solutions for big-data problems in empirical asset pricing. His research has appeared in Econometrica, Journal of Political Economy, Journal of Finance, Review of Financial Studies, Journal of the American Statistical Association, and Annals of Statistics. For a more accessible introduction to his work, see the curated list of articles in the Chicago Booth Review.
Xiu is a Research Associate at the National Bureau of Economic Research. He holds or has held several editorial positions, including Co-Editor of Journal of Business & Economic Statistics and Journal of Financial Econometrics, and Associate Editor of journals such as Journal of Finance, Review of Financial Studies, Journal of the American Statistical Association, Management Science, and Journal of Econometrics. His research honors include being named a Fellow of the Society for Financial Econometrics and a Fellow of the Journal of Econometrics, the Swiss Finance Institute Outstanding Paper Award, the AQR Insight Award, and best paper prizes at various conferences. He has been named one of Poets & Quants’ Best 40-under-40 Business School Professors.

Xiu earned his PhD and MA in applied mathematics from Princeton University, where he was also a student at the Bendheim Center for Finance. Prior to his graduate studies, he obtained a BS in mathematics from the University of Science and Technology of China.
PUBLICATIONS
"Test Assets and Weak Factors", with Stefano Giglio and Dake Zhang, forthcoming in the Journal of Finance.
Materials: BFI FINDING, Slides, Matlab Codes.zip. Keywords: Fama-MacBeth Estimator, Factor Model, Supervised PCA, Cross-Section of Expected Returns, Machine Learning, Big Data, Marginal Screening, Weak Factors.
"Business News and Business Cycles", with Leland Bybee, Bryan Kelly, and Asaf Manela, Journal of Finance, Vol. 79, Issue 5, (2024), 2905-3677.
Materials: The Structure of Economic News (website). Keywords: Text Mining, Machine Learning, Big Data, Topic Model.
"Non-Standard Errors", crowdsourcing project with 342 coauthors, Journal of Finance, Vol. 79, Issue 3, (2024), 2339-2390.
"(Re-)Imag(in)ing Price Trends", with Jingwen Jiang and Bryan Kelly, Journal of Finance, Vol. 78, Issue 6, (2023), 3193-3249.
Materials: Chicago Booth Review, CBR Video, IPython Imaging Example.html, Image Data.zip. Keywords: Machine Learning, Cross-Section of Expected Returns, Convolutional Neural Networks, Transfer Learning, Big Data, Return Predictability.
"When Moving-Average Models Meet High-Frequency Data: Uniform Inference on Volatility", with Rui Da, Econometrica, Vol. 89, No. 6, (2021), 2787-2825.
Materials: Chicago Booth Review, Supplemental Material. Keywords: QMLE, Volatility Estimation, Microstructure Noise, Kalman Filtering, Model Selection.
"Asset Pricing with Omitted Factors", with Stefano Giglio, Journal of Political Economy, Vol. 129, No. 7, (2021), 1947-1990 (lead article). Winner of the Best Conference Paper Prize at the 44th EFA.
Materials: Chicago Booth Review, Supplemental Material, Slides, Matlab Codes.zip. Keywords: Fama-MacBeth Estimator, Factor Model, PCA, Cross-Section of Expected Returns, Big Data.
"Thousands of Alpha Tests", with Stefano Giglio and Yuan Liao, Review of Financial Studies, Vol. 34, Issue 7, (2021), 3456-3496.
Materials: Supplemental Material, Python Codes.zip. Keywords: Data Snooping, Multiple Testing, Hedge Funds, Factor Model, PCA, Matrix Completion, Cross-Section of Expected Returns, Machine Learning, Big Data, Missing Data, Marginal Screening.
"Autoencoder Asset Pricing Models", with Shihao Gu and Bryan Kelly, Journal of Econometrics 222 (2021), 429-450.
Keywords: Machine Learning, Time-Series and Cross-Section of Expected Returns, Nonlinear Factor Model, Neural Networks, Big Data, Return Predictability, Deep Learning.
"Empirical Asset Pricing via Machine Learning", with Shihao Gu and Bryan Kelly, Review of Financial Studies, Vol. 33, Issue 5, (2020), 2223-2273. Winner of the 2018 Swiss Finance Institute Outstanding Paper Award.
Materials: Chicago Booth Review, GitHub, Empirical Data (updated June 2021), SAS/Python Codes for Data, Supplemental Material. Keywords: Machine Learning, Time-Series and Cross-Section of Expected Returns, Neural Networks, Big Data, Return Predictability, Deep Learning.
"Taming the Factor Zoo: A Test of New Factors", with Guanhao Feng and Stefano Giglio, Journal of Finance, Vol. 75, Issue 3, (2020), 1327-1370. First Prize Winner of the 2018 AQR Insight Award.
Materials: Chicago Booth Review, Supplemental Material, SAS/Python Codes for Data, Slides. Keywords: Machine Learning, Variable Selection, Cross-Section of Expected Returns, Big Data.
"High-Frequency Factor Models and Regressions", with Yacine Aït-Sahalia and Ilze Kalnina, Journal of Econometrics 216 (2020), 86-105.
"A Hausman Test for the Presence of Market Microstructure Noise in High Frequency Data", with Yacine Aït-Sahalia, Journal of Econometrics 211 (2019), 176-205.
Materials: Matlab Codes.zip. Keywords: Microstructure Noise, QMLE, Volatility Estimation.
"Efficient Estimation of Integrated Volatility Functionals via Multiscale Jackknife", with Jia Li and Yunxiao Liu, Annals of Statistics Vol. 47, No. 1, (2019), 156-176.
Keywords: Spot (Co)Variance, Jackknife, Bootstrap.
"Principal Component Analysis of High Frequency Data", with Yacine Aït-Sahalia, Journal of the American Statistical Association 114 (2019), 287-303.
Materials: Matlab Codes.zip, Biplots.mp4. Keywords: PCA, Spot (Co)Variance.
"Resolution of Policy Uncertainty and Sudden Declines in Volatility", with Dante Amengual, Journal of Econometrics 203 (2018), 297-315.
Materials: Chicago Booth Review, Supplemental Material. Keywords: Variance Swaps Pricing, Non-affine Models, Downward Volatility Jumps, VIX.
"Nonparametric Estimation of the Leverage Effect: A Trade-off between Robustness and Efficiency", with Ilze Kalnina, Journal of the American Statistical Association 112 (2017), 384-396.
Keywords: Covariance Estimation, VIX, Spot (Co)Variance.
"Using Principal Component Analysis to Estimate a High Dimensional Factor Model with High-Frequency Data", with Yacine Aït-Sahalia, Journal of Econometrics 201 (2017), 384-399.
Keywords: High Frequency Fama-French Factors, Big Data, Factor Model, PCA, Portfolio Allocation, Covariance Estimation.
"Increased Correlation Among Asset Classes: Are Volatility or Jumps to Blame, or Both?", with Yacine Aït-Sahalia, Journal of Econometrics 194 (2016), 205-219.
Keywords: Co-Jumps, Covariance Estimation, Microstructure Noise.
"Incorporating Global Industrial Classification Standard into Portfolio Allocation: A Simple Factor-Based Large Covariance Matrix Estimator with High Frequency Data", with Jianqing Fan and Alex Furger, Journal of Business & Economic Statistics 34 (2016), 489-503, (Special Issue on Big Data).
Materials: Chicago Booth Review, Matlab Codes.zip. Keywords: High Frequency Fama-French Factors, Portfolio Allocation, Big Data, Covariance Estimation.
"A Tale of Two Option Markets: Pricing Kernels and Volatility Risk", with Zhaogang Song, Journal of Econometrics 190 (2016), 176-196. Honorable mention of the 2017 Dennis J. Aigner Award.
Keywords: Nonparametric Option Pricing, Closed-form Pricing of SPX and VIX Options, Particle Filtering.
"Quasi-Maximum Likelihood Estimation of GARCH Models with Heavy-Tailed Likelihoods", with Jianqing Fan and Lei Qi, Journal of Business & Economic Statistics 32 (2014), 178-191. Invited Paper with Discussion.
Materials: Rejoinder.pdf, Matlab Codes.zip. Keywords: Volatility Estimation.
"Hermite Polynomial Based Expansion of European Option Prices", Journal of Econometrics 179 (2014), 158-177.
Keywords: Option Pricing, Non-affine Models.
"High-Frequency Covariance Estimates with Noisy and Asynchronous Data", with Yacine Aït-Sahalia and Jianqing Fan, Journal of the American Statistical Association 105 (2010), 1504-1517.
Materials: Matlab Codes.zip. Keywords: QMLE, Covariance Estimation, Microstructure Noise.
WORKING PAPERS
"Predicting Returns with Text Data", with Tracy Ke and Bryan Kelly, Mar. 2022. Winner of the 2019 CICF Best Paper Award.
Materials: Chicago Booth Review. Keywords: Text Mining, Machine Learning, Cross-Section of Expected Returns, Big Data, Return Predictability, Sentiment Analysis, Topic Model, Marginal Screening.
"Continuous-Time Fama-MacBeth Regressions", with Yacine Aït-Sahalia and Jean Jacod, Mar. 2023. Revision Requested.
Materials: SoFiE Seminar Video. Keywords: Fama-MacBeth Estimator, Factor Model, Cross-Section of Expected Returns.
"Disentangling Autocorrelated Intraday Returns", with Rui Da, Jun. 2021. Revision Requested.
Keywords: QMLE, Autocovariance Estimation, Microstructure Noise, Kalman Filtering, Model Selection.
"The Statistical Limit of Arbitrage", with Rui Da and Stefan Nagel, Jun. 2022.
Materials: Chicago Booth Review. Keywords: Machine Learning, Factor Model, Cross-Section of Expected Returns, MSCI Barra Model, Multiple Testing, Variable Selection, Portfolio Allocation, Weak Signals, Bayes Risk.
"Prediction When Factors are Weak", with Stefano Giglio and Dake Zhang, Mar. 2023. Submitted.
Materials: Supplement. Keywords: Factor Model, Supervised PCA, Machine Learning, Big Data, Marginal Screening, Weak Factors.
"Expected Returns and Large Language Models", with Yifei Chen and Bryan Kelly, Jul. 2023. Winner of the 2023 GSU-RFS FinTech Conference Best Paper Award.
Keywords: Text Mining, Machine Learning, Cross-Section of Expected Returns, Big Data, Return Predictability, Sentiment Analysis.
"Can Machines Learn Weak Signals?", with Zhouyu Shen, Mar. 2024. Submitted. Winner of the 2024 Bates-White Prize for the Best Paper at SoFiE Annual Conference.
Keywords: Weak Signals, Machine Learning, Big Data, Bayes Risk.
BOOK CHAPTERS, SURVEYS AND COMMENTS
"Financial Machine Learning", with Bryan Kelly, Foundations and Trends in Finance, Vol. 13, No. 3-4, (2023), 205-363.
"Factor Models, Machine Learning, and Asset Pricing", with Stefano Giglio and Bryan Kelly, Annual Review of Financial Economics, Vol. 14, (2022), 337-368.
"Comment on: Limit of Random Measures Associated with the Increments of a Brownian Semimartingale", with Jia Li, Journal of Financial Econometrics 16(4) (2018), 570-582.
"Likelihood-Based Volatility Estimators in the Presence of Market Microstructure Noise: A Review", with Yacine Aït-Sahalia, Handbook of Volatility Models and their Applications, Chapter 14, Wiley 2012.
The 2008 meltdown of the financial system led to tremendous interest in understanding and controlling the systemic risk of the financial market, which in turn hinges on assessing the risks of individual assets and their interdependencies. The increasing availability of transaction-level data on a growing cross-section of tradable assets presents both a unique opportunity and substantial challenges for estimating these quantities. The overarching theme of the lab is to design methodologies that exploit the information embedded in big data to better measure and manage risk.
Realized Volatility V 1.0
Electronic trades in GLOBEX hours
Agricultural
BO: Soybean Oil Futures
CC: Cocoa Futures
C: Corn Futures
CT: Cotton No.2 Futures
FC: Feeder Cattle Futures
KC: Coffee C Futures
LB: Lumber Futures
LC: Live Cattle Futures
LH: Lean Hogs Futures
OJ: Orange Juice Futures
O: Oats Futures
SB: Sugar #11 Futures
SM: Soybean Meal Futures
S: Soybean Futures
W: Wheat Futures CBOT
Energy
CL: Light Crude Oil Futures ...
HO: Heating Oil #2 Futures N...
NG: Natural Gas Futures NYMEX
Equities/Equity Index
DM: S&P 400 MidCap E-Mini Fu...
DX: Dollar Index Futures ICE
ES: SP 500 E-Mini Futures
NK: Nikkei 225 Futures CME
NQ: NASDAQ 100 E-Mini Futures
SP: SP 500 Futures
SXF: S&P Canada 60 Futures
YM: Dow Jones ($5) E-mini Fu...
FX
AD: Australian Dollar Futures
BP: British Pound Futures
CD: Canadian Dollar Futures
JY: Japanese Yen Futures
JYNM: Japanese Yen E-Mini Futures
NE: New Zealand Dollar Futures
SF: Swiss Franc Futures
URO: Euro FX Futures
UROM: Euro FX E-mini Futures
Interest Rates
CGB: Canadian 10-Year Futures
ED: Eurodollar Futures CME
FV: US 5-Year T-Note Futures
TU: US 2-Year T-Note Futures
TY: US 10-Year T-Note Futures
US: US 30-Year T-Bond Futures
Metals
GC: Gold Futures COMEX
HG: Copper High Grade Future...
PA: Palladium Futures NYMEX
PL: Platinum Futures NYMEX
SI: Silver Futures COMEX
Cryptocurrency
BTC: CME Bitcoin Futures
Trades in PIT market hours
Energy
CL: Light Crude Oil Futures ...
ETFs | Ticker | 10/03/2024 | 10/02/2024 | 10/01/2024 | 09/27/2024 | 09/26/2024
Market (S&P 500) | SPY | 11.86% ±0.4% | 9.01% ±0.3% | 14.35% ±0.3% | 7.30% ±0.1% | 7.98% ±0.2%
Material | XLB | 14.39% ±1.2% | 11.92% ±0.8% | 15.66% ±1.2% | 11.69% ±0.9% | 12.36% ±0.7%
Industrial | XLI | 14.45% ±1.1% | 11.92% ±0.9% | 17.73% ±1.0% | 10.90% ±0.7% | 11.65% ±0.6%
Consumer Discretionary | XLY | 15.32% ±1.3% | 12.38% ±0.8% | 17.67% ±1.3% | 10.37% ±0.6% | 11.21% ±0.8%
Consumer Staples | XLP | 10.55% ±0.8% | 9.04% ±1.1% | 11.09% ±0.4% | 8.15% ±0.7% | 8.77% ±0.7%
Health Care | XLV | 11.49% ±0.6% | 11.01% ±0.9% | 12.67% ±0.6% | 9.62% ±0.5% | 9.45% ±0.6%
Financial | XLF | 13.51% ±0.8% | 11.50% ±0.7% | 14.62% ±0.8% | 10.04% ±0.5% | 11.66% ±0.7%
Technology | XLK | 19.05% ±1.4% | 15.13% ±1.5% | 20.82% ±1.2% | 13.17% ±0.8% | 15.33% ±1.5%
Utilities | XLU | 15.21% ±0.6% | 11.39% ±1.0% | 12.69% ±0.8% | 9.92% ±0.6% | 10.85% ±0.7%
Energy | XLE | 20.59% ±1.2% | 19.83% ±0.9% | 24.12% ±1.2% | 15.67% ±0.7% | 21.20% ±1.1%
Objective
We provide up-to-date daily annualized realized volatilities for individual stocks, ETFs, and futures contracts, estimated from high-frequency data. We are in the process of incorporating equities from global markets.
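As a minimal sketch of what an annualized daily figure involves, the Python below computes a subsampled realized variance (averaging the sparse-grid RV over every starting offset, one common way to subsample) and converts it into an annualized volatility in percent. It assumes log prices already sit on an evenly spaced 1-second grid, so that `step=300` and `step=900` give 5-minute and 15-minute returns, and it assumes a 252-trading-day year; neither convention is stated on this page, and the function names are hypothetical.

```python
import math


def subsampled_rv(log_prices, step):
    """Average the realized variance of every sparse grid offset by `step` points.

    log_prices: log prices on an evenly spaced grid (a 1-second grid is assumed).
    step: sampling interval in grid points (300 -> 5-minute, 900 -> 15-minute).
    """
    rvs = []
    for offset in range(step):
        sparse = log_prices[offset::step]  # one sparsely sampled price path
        if len(sparse) < 2:
            continue  # this offset yields no complete return
        rvs.append(sum((b - a) ** 2 for a, b in zip(sparse, sparse[1:])))
    if not rvs:
        raise ValueError("price grid is too short for the requested step")
    return sum(rvs) / len(rvs)


def annualized_vol_pct(daily_variance, trading_days=252):
    """Annualized volatility in percent; 252 trading days is an assumed convention."""
    return 100.0 * math.sqrt(trading_days * daily_variance)


if __name__ == "__main__":
    log_p = [0.0, 0.001, -0.0005, 0.0015, 0.001]  # toy log prices, not real data
    print(f"{annualized_vol_pct(subsampled_rv(log_p, 2)):.2f}%")
```

Averaging over all offsets uses every observation rather than a single sparse grid, which reduces the sampling variability of the 5-minute and 15-minute benchmarks.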

Data
We collect trades at the highest frequency available (up to every millisecond for US equities after 2007) and clean them using the prevailing national best bid and offer (NBBO), which is available at up to one-second resolution. Mid-quotes are calculated from the NBBO, so their highest sampling frequency is also up to one observation per second.
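The page does not state the exact cleaning rule. Purely as an illustration, the Python sketch below assumes one common filter: a trade is kept only if its price lies inside the NBBO prevailing at its timestamp (the latest quote at or before the trade), and mid-quotes are the average of bid and ask, with crossed quotes (bid above ask) skipped. The `Quote` and `Trade` record types, the seconds-since-midnight timestamps, and the function names are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class Quote:
    time: float  # seconds since midnight (assumed timestamp unit)
    bid: float
    ask: float


@dataclass
class Trade:
    time: float
    price: float


def clean_trades(trades, quotes):
    """Keep trades priced within the prevailing NBBO (hypothetical filter rule).

    `quotes` must be sorted by time; the prevailing quote is the latest one
    whose timestamp is at or before the trade's timestamp.
    """
    quote_times = [q.time for q in quotes]
    kept = []
    for trade in trades:
        i = bisect_right(quote_times, trade.time)
        if i == 0:
            continue  # no quote has arrived yet, so the trade cannot be checked
        q = quotes[i - 1]
        if q.bid <= q.ask and q.bid <= trade.price <= q.ask:
            kept.append(trade)
    return kept


def mid_quotes(quotes):
    """Mid-quote series as (time, (bid + ask) / 2), skipping crossed quotes."""
    return [(q.time, 0.5 * (q.bid + q.ask)) for q in quotes if q.bid <= q.ask]
```

Since the NBBO is available at up to one-second resolution while trades can be stamped to the millisecond, many trades share one prevailing quote; a binary search over the sorted quote times finds it without rescanning the quote list for every trade.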

Methodology
We provide quasi-maximum likelihood estimates (QMLE) of volatility based on moving-average MA(q) models of returns, using non-zero returns of transaction prices (or mid-quotes, if available) sampled at the highest frequency available, for days with at least 12 observations. We select the order q using the Akaike Information Criterion (AIC). For comparison, we also report realized volatility (RV) estimates based on 5-minute and 15-minute subsampled returns.
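The following Python sketch illustrates the MA(q) quasi-likelihood approach with AIC order selection; it is not the authors' implementation. It assumes that the exact Gaussian likelihood is evaluated through a banded Cholesky factorization of the MA(q) autocovariance matrix (SciPy's `cholesky_banded`), that σ² is profiled out and the MA coefficients are found by Nelder-Mead, that q is searched up to a hypothetical `max_q=3`, and that the day's integrated variance is the number of returns times the long-run variance σ²(1 + θ_1 + ... + θ_q)² of the fitted model. For an efficient price observed with i.i.d. noise (the MA(1) case), that long-run variance equals the per-interval variance of the efficient price, because the noise terms cancel. Function names are hypothetical, and NumPy and SciPy are required.

```python
import numpy as np
from scipy.linalg import cho_solve_banded, cholesky_banded
from scipy.optimize import minimize


def ma_autocov_band(theta, n):
    """Lower banded form of the n x n MA(q) autocovariance matrix when sigma^2 = 1."""
    psi = np.concatenate(([1.0], theta))  # (1, theta_1, ..., theta_q)
    q = len(theta)
    band = np.zeros((q + 1, n))
    for h in range(q + 1):
        # gamma(h) / sigma^2 = sum_j psi_j * psi_{j+h}, stored on the h-th subdiagonal
        band[h, : n - h] = psi[: q + 1 - h] @ psi[h:]
    return band


def profile_loglik(theta, r):
    """Exact Gaussian MA(q) log-likelihood with sigma^2 profiled out."""
    n = len(r)
    chol = cholesky_banded(ma_autocov_band(theta, n), lower=True)
    sigma2 = r @ cho_solve_banded((chol, True), r) / n  # r' Omega^{-1} r / n
    logdet = 2.0 * np.sum(np.log(chol[0]))  # log det Omega from the Cholesky diagonal
    ll = -0.5 * (n * np.log(2.0 * np.pi) + n * np.log(sigma2) + logdet + n)
    return ll, sigma2


def fit_ma(r, q):
    """Return (loglik, sigma2, theta) of the MA(q) Gaussian QMLE."""
    if q == 0:
        sigma2 = r @ r / len(r)
        ll = -0.5 * len(r) * (np.log(2.0 * np.pi) + np.log(sigma2) + 1.0)
        return ll, sigma2, np.zeros(0)

    def neg_ll(theta):
        try:
            return -profile_loglik(theta, r)[0]
        except np.linalg.LinAlgError:
            return np.inf  # autocovariance matrix not positive definite

    theta = minimize(neg_ll, x0=np.zeros(q), method="Nelder-Mead").x
    ll, sigma2 = profile_loglik(theta, r)
    return ll, sigma2, theta


def ma_qmle_integrated_variance(returns, max_q=3, min_obs=12):
    """Select q by AIC and return (q, integrated variance estimate) for one day."""
    r = np.asarray(returns, dtype=float)
    r = r[r != 0.0]  # use non-zero returns only
    if len(r) < min_obs:
        raise ValueError("too few non-zero returns for this day")
    best = None
    for q in range(max_q + 1):
        ll, sigma2, theta = fit_ma(r, q)
        aic = 2.0 * (q + 1) - 2.0 * ll  # q MA coefficients plus sigma^2
        if best is None or aic < best[0]:
            best = (aic, q, sigma2, theta)
    _, q, sigma2, theta = best
    # Number of returns times the long-run variance of the fitted MA(q).
    return q, len(r) * sigma2 * (1.0 + theta.sum()) ** 2
```

Observationally equivalent MA representations (for example θ and 1/θ in the MA(1) case) share the same autocovariances, so they give the same likelihood and the same long-run variance; the sketch therefore does not need to enforce invertibility.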

References
1. “When Moving-Average Models Meet High-Frequency Data: Uniform Inference on Volatility”, by Rui Da and Dacheng Xiu. Econometrica, Vol. 89, No. 6, (2021), 2787-2825.
2. “Quasi-Maximum Likelihood Estimation of Volatility with High Frequency Data”, by Dacheng Xiu. Journal of Econometrics, 159 (2010), 235-250.
3. “How Often to Sample a Continuous-Time Process in the Presence of Market Microstructure Noise”, by Yacine Aït-Sahalia, Per Mykland, and Lan Zhang. Review of Financial Studies, 18 (2005), 351-416.
4. “The Distribution of Exchange Rate Volatility”, by Torben Andersen, Tim Bollerslev, Francis X. Diebold, and Paul Labys. Journal of the American Statistical Association, 96 (2001), 42-55.
5. “Econometric Analysis of Realized Volatility and Its Use in Estimating Stochastic Volatility Models”, by Ole E. Barndorff-Nielsen and Neil Shephard. Journal of the Royal Statistical Society: Series B, 64 (2002), 253-280.
B41210 Financial Analytics (MiF)
Financial Analytics is an in-depth course on the analysis, exploration, and simplification of large and complex datasets. It equips students with the essential skills to model and derive insights from data and to build robust predictive and classification models. The curriculum covers a variety of methods, including linear and logistic regression, model selection, multinomial and binary regression, clustering, factor models, decision trees, random forests, and deep learning. The course emphasizes practical computational skills and the fundamental concepts underpinning these methods. Students will work with real financial datasets, applying their knowledge to develop methodologies tailored to specific applications. Prerequisites include a solid foundation in statistics and linear algebra, as well as proficiency in Python programming.
B32810 Artificial Intelligence (EMBA)
This course introduces students to the cutting-edge field of Artificial Intelligence (AI). Over five detailed lectures, participants will gain insights into the core principles of AI and machine learning, explore the intricacies of natural language processing, discover the potential of vision recognition and image generation technologies, examine the burgeoning field of generative AI and the AI-generated content industry, and understand the vital significance of bias assessment and the concepts of responsible AI. The course is designed for those interested in the inner workings of state-of-the-art AI technologies. It emphasizes the philosophy and intuition behind these technologies, as well as their promises and perils, rather than their technical details.
B41813 Decoding FinTech (EMBA/MBA)
This course provides a high-level introduction to two rapidly developing technologies: artificial intelligence and blockchain. Artificial intelligence, in particular machine learning and natural language processing algorithms, has been adopted by a variety of real-world FinTech companies whose businesses are built around credit scoring, fraud detection, real-estate valuation, portfolio management, and quantitative trading. Blockchain technology is the cornerstone of cryptocurrencies, smart contracts, and decentralized finance, a rising industry with great potential to disrupt the future of finance. The course is designed for those interested in the inner workings of these technologies and their applications in the FinTech industry. It emphasizes the philosophy and intuition behind the technologies, as well as their promises and perils, rather than their technical details. Students are expected to have completed the MBA core courses before taking this course. The coding component is optional, and prior coding experience is not required.
B41100 Applied Regression Analysis (MBA)
This course is about regression, a powerful and widely used data analysis technique for understanding how different random quantities relate to one another. Students will learn how to use regression to analyze a variety of complex real-world problems, with the aim of understanding data and predicting future events. The focus is on understanding fundamental concepts and developing the skills necessary for robust application of regression techniques. Examples are used throughout to illustrate how the tools are applied.
B41902 Statistical Inference (PhD)
The focus of this course will be methods for drawing inferences in econometric models. We will cover linear regression models, GMM, nonlinear models, and time series models. Most of the discussion will cover frequentist methods that use approximations to finite-sample sampling distributions as the basis for inference. We will cover methods appropriate for independent data as well as dependent data. In addition to deriving some of the relevant theoretical properties, we will discuss the intuition for how and when to use the econometric tools developed in class.
B20800 Big Data (Undergraduate)
Big Data is a course about data mining: the analysis, exploration, and simplification of large high-dimensional datasets. Students will learn how to model and interpret complicated "Big Data" and become adept at building powerful models for prediction and classification. Techniques covered include an advanced overview of linear and logistic regression, model choice and false discovery rates, multinomial and binary regression, classification, decision trees, factor models, clustering, the bootstrap, and cross-validation. We learn both basic underlying concepts and practical computational skills, including techniques for analyzing distributed data. Heavy emphasis is placed on the analysis of real datasets and on the development of application-specific methodology. Among other examples, we will consider investment, consumer database mining, internet and social media tracking, network analysis, and text mining.
2018 SoFiE Summer School
Machine Learning and Finance: The New Empirical Asset Pricing.
2021 SoFiE Summer School
Machine Learning in Finance.
2024 SoFiE Summer School
Financial Machine Learning.
This website and the information presented on it are for research purposes only and should not be used for investment or other commercial purposes. You may use this website and the information presented on it solely for research purposes and not for any commercial purpose. The University of Chicago (the “University”) does not endorse this website and hereby disclaims all representations and warranties about it and the information presented on it, whether express or implied, including any implied warranties of merchantability, fitness for a particular purpose, and non-infringement. The University specifically disclaims any warranties as to the accuracy, usefulness, truthfulness, and availability of the information presented on this website. By using this website, you agree that you will not make any claim against the University or any of its trustees, officers, employees, agents, or other representatives relating to it or the information presented thereon. Neither the University nor any of its trustees, officers, employees, agents, or other representatives will have any liability to you or anyone else in connection with your use of this website or any information you receive through your use of it.
©2017, Dacheng Xiu at the University of Chicago.