We conduct a comparative analysis of methods in the machine learning repertoire, including penalized linear models, generalized linear models, boosted regression trees, random forests, and neural networks, that investors can deploy to forecast the cross-section of stock returns.

Machine learning algorithms are gaining more widespread use in economics. They have demonstrated the ability to reveal complex, nonlinear patterns that are difficult or largely impossible to detect with conventional statistical methods, and they are often more robust to the effects of multi-collinearity among predictors. We provide new evidence that machine learning techniques can improve the economic value of cross-sectional return forecasts.

The implications of machine learning for quantitative finance are becoming both increasingly apparent and controversial. There is a growing discussion over whether machine learning tools can and should be applied to predict stock returns with greater precision. Broadly speaking, models that can be used to explain the returns of individual stocks draw on stock and firm characteristics, such as the market price of financial instruments and companies' accounting data. These characteristics can also be used to predict expected returns out-of-sample.

Excerpt

1 INTRODUCTION

1.1 WHAT IS MACHINE LEARNING?

1.2 WHAT MACHINE LEARNING CAN(NOT) DO

2 LITERATURE

3 DATA AND METHODOLOGY

3.1 DATA

3.2 METHODOLOGY

4 MODELS

4.1 BENCHMARK

4.2 PENALIZED LINEAR

4.3 TREE-BASED MODELS

4.4 NEURAL NETWORKS

5 RESULTS

5.1 PREDICTIVE SLOPE

5.2 PORTFOLIOS

6 ROBUSTNESS CHECKS

7 CONCLUSION

Research Objective and Key Themes

This thesis investigates whether advanced machine learning approaches can effectively synthesize and dissect high-dimensional sets of stock and firm characteristics to yield more precise out-of-sample stock return predictions than traditional econometric approaches.

• Comparative analysis of machine learning methods (penalized linear models, tree-based models, neural networks) in quantitative finance.
• Evaluation of nonlinear predictor interactions and their contribution to return forecast accuracy.
• Methodological rigor in handling multi-collinearity and overfitting via hyper-parameter tuning and validation set approaches.
• Economic assessment of machine learning-based return forecasts via portfolio construction and performance metrics.

Excerpt from the Book

1.1 What Is Machine Learning?

Given the broad potential applications of machine learning, there has been an understandable eagerness to utilize these techniques in a variety of applied settings. But before exploring the stock selection applications of this emerging field in depth, it is necessary to first briefly review the basics of machine learning. In the broadest sense, machine learning is the science of getting computers to act without being explicitly programmed (Samuel, 1959). In a typical scenario, we deal with either a quantitative (e.g., stock price) or categorical (e.g., up/down movement) outcome measurement (or response variable) that we wish to predict according to a set of features (e.g., size, value, momentum). Both the outcome measurement and the set of features can be observed in a training set of data. Using this data, we develop a predictive model that allows us to forecast the outcome of new, unseen measurements in a test set of data. This exercise is referred to as a supervised learning problem, so called because the outcome measurement is present to guide the learning process.
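The supervised learning setup described above can be sketched in a few lines. This is a minimal illustration, not the thesis's procedure: the data are synthetic, the single feature and the 150/50 split are assumptions, and the helper names (`fit_ols`, `predict`, `mean_squared_error`) are hypothetical.

```python
import random


def fit_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, xs):
    """Apply a fitted (intercept, slope) pair to new feature values."""
    intercept, slope = model
    return [intercept + slope * x for x in xs]


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic data (assumption): outcome = 0.5 * feature + small noise.
rng = random.Random(0)
features = [rng.gauss(0.0, 1.0) for _ in range(200)]
outcomes = [0.5 * x + rng.gauss(0.0, 0.1) for x in features]

# Training set: outcomes are observed and guide the fit.
train_x, train_y = features[:150], outcomes[:150]
# Test set: new, unseen observations used only to evaluate the forecasts.
test_x, test_y = features[150:], outcomes[150:]

model = fit_ols(train_x, train_y)
test_mse = mean_squared_error(test_y, predict(model, test_x))
print(f"estimated slope = {model[1]:.3f}, test MSE = {test_mse:.4f}")
```

The model never sees the test outcomes during fitting, which is what makes the test error an estimate of out-of-sample performance.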

Machine learning techniques have a long history of development and application. This raises the question of why these techniques are only now gaining popularity. The main reasons behind the rise of machine learning can be traced back to the following developments:

• Digitalization has increased the amount of readily available data.

• Greater computing power and a significant drop in storage costs have made it possible to exploit this data more fully (e.g., the ability to train large neural networks on either a CPU or a GPU).

• Significant algorithmic innovations have been made in recent years (e.g., one of the major breakthroughs in neural networks has been the switch from a sigmoid activation function to a rectified linear unit activation function).
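To make the last point concrete, the two activation functions can be compared directly. The excerpt does not say why the switch mattered; one commonly cited reason is that the sigmoid's derivative never exceeds 0.25 and approaches zero for large inputs, whereas the rectified linear unit (ReLU) has a derivative of exactly 1 for any positive input. The sketch below only evaluates the two functions and their derivatives; the function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s), at most 0.25 (reached at z = 0)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def relu(z):
    """Rectified linear unit: passes positive inputs through, zeroes out the rest."""
    return max(0.0, z)


def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


# Compare the derivatives as the input grows.
for z in (0.0, 2.0, 5.0, 10.0):
    print(f"z = {z:4.1f}  sigmoid' = {sigmoid_grad(z):.6f}  relu' = {relu_grad(z):.1f}")
```

Because backpropagation multiplies such derivatives across layers, small sigmoid derivatives can shrink the gradient signal in deep networks, while ReLU leaves it unchanged along active units.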

Summary of Chapters

1 INTRODUCTION: This chapter introduces the motivation for applying machine learning to quantitative finance, discusses existing research challenges, and outlines the thesis structure.

2 LITERATURE: This section reviews the academic history of asset pricing models, starting from CAPM through multi-factor models, and provides context for the machine learning application in finance.

3 DATA AND METHODOLOGY: This chapter defines the data sample of stock characteristics and details the statistical framework for return prediction, including backtesting and validation set procedures.

4 MODELS: This chapter describes the specific models implemented, ranging from OLS benchmarks to penalized linear models, tree-based algorithms, and neural networks.

5 RESULTS: This section presents the main empirical results, analyzing predictive slopes and the economic performance of portfolios constructed from the model forecasts.

6 ROBUSTNESS CHECKS: This chapter demonstrates the reliability of the findings by testing the methodology against alternative sample sets and different hyper-parameter configurations.

7 CONCLUSION: This chapter summarizes the findings, confirming that nonlinear machine learning methods outperform traditional models due to their ability to capture complex factor interactions.

Keywords

Machine Learning, Deep Learning, Big Data, Stock Returns, Characteristic-based Anomalies, Ridge Regression, Lasso, Elastic-net, Random Forest, Gradient Boosting, Neural Networks, Supervised Learning, Asset Pricing, Portfolio Management, Overfitting

Frequently Asked Questions

What is the core focus of this research?

The work focuses on using machine learning techniques to predict the cross-section of expected stock returns more accurately than traditional linear econometric models.

What are the central thematic areas?

The core themes include model comparison, non-linear predictive modeling, handling high-dimensional financial data, mitigating overfitting, and evaluating the economic value of forecasts through portfolio construction.

What is the primary research goal?

The goal is to determine if machine learning approaches can synthesize existing stock characteristics into return prediction models that surpass traditional forecasting methods in out-of-sample performance.

Which scientific methods are employed?

The study utilizes a variety of supervised learning methods, including OLS (benchmark), penalized linear models (Lasso, Ridge, Elastic-net), tree-based models (Random Forest, Gradient Boosting), and Neural Networks.

What is covered in the main body of the work?

The main body covers data description, methodology (including model-specific loss functions and hyper-parameter tuning), model implementation, result analysis of predictive power, and robustness tests.

What are the key characteristics of this work?

The work is characterized by its comparative empirical approach, strict out-of-sample validation procedures, and its focus on nonlinear interactions between stock-level predictors.

How does this study handle the risk of overfitting?

The study uses regularization techniques and a rigorous validation set approach to ensure that the models extract universal patterns rather than parameterizing noise.
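A minimal sketch of how regularization and a validation set can work together is given below. It is not the study's implementation: the data are simulated, the model is a one-feature ridge regression without intercept, and the penalty grid, sample sizes, and helper names (`fit_ridge_slope`, `mse`, `simulate`) are all assumptions made for illustration.

```python
import random


def fit_ridge_slope(xs, ys, penalty):
    """Closed-form ridge estimate for y = b * x: b = sum(x*y) / (sum(x^2) + penalty)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


def mse(xs, ys, slope):
    """Mean squared prediction error of the fitted slope on a given sample."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(42)


def simulate(n):
    """Simulated data (assumption): a weak signal buried in noise."""
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [0.05 * x + rng.gauss(0.0, 1.0) for x in xs]
    return xs, ys


train_x, train_y = simulate(30)   # used to estimate the slope
valid_x, valid_y = simulate(30)   # used only to choose the penalty
test_x, test_y = simulate(200)    # held out for the final evaluation

# Hypothetical grid of penalty strengths; 0.0 corresponds to plain least squares.
penalty_grid = [0.0, 1.0, 10.0, 100.0, 1000.0]

validation_errors = {}
for penalty in penalty_grid:
    slope = fit_ridge_slope(train_x, train_y, penalty)
    validation_errors[penalty] = mse(valid_x, valid_y, slope)

# Keep the penalty with the lowest validation error, then evaluate once on the test set.
best_penalty = min(validation_errors, key=validation_errors.get)
final_slope = fit_ridge_slope(train_x, train_y, best_penalty)
print(f"best penalty = {best_penalty}, slope = {final_slope:.4f}, "
      f"test MSE = {mse(test_x, test_y, final_slope):.4f}")
```

The penalty shrinks the estimated slope toward zero, and the validation set decides how much shrinkage to apply without touching the test data.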

What findings does the study report regarding Neural Networks?

The study finds that neural networks, along with tree-based methods, are among the most powerful models, attributing their success to the effective identification of complex, nonlinear interactions.

End of excerpt from 97 pages

Details

Title: Dissecting Characteristics via Machine Learning for Stock Selection
Author: David Dümig
Year of publication: 2019
Pages: 97
Catalog number: V502999
ISBN (eBook): 9783346106551
Language: English
Keywords: dissecting characteristics machine learning stock selection
Product safety: GRIN Publishing GmbH

Cite this work: David Dümig (Author), 2019, Dissecting Characteristics via Machine Learning for Stock Selection, München, GRIN Verlag, https://www.grin.com/document/502999
