Model development often starts with missing data treatment. For regulatory internal rating based (IRB) models, missing data raise data quality concerns about systems and processes, whereas the randomness of the missingness is sometimes overlooked, resulting in an inappropriate choice of imputation method. More importantly, the chosen imputation method can lead to issues that violate modelling assumptions later in the process. With ML and AI methods being introduced to regulatory modelling, the impact of missing data will be more thoroughly investigated and challenged.
This paper starts with issues that arise from imputation processes in practice, then briefly reviews common approaches to missing data treatment. A candidate Bayesian approach is then proposed as an alternative. In conclusion, results imputed with the proposed approach improve the explanatory power of historical observations while accommodating multiple convergence conditions such as train-test accuracy, likelihood of the value distribution, cross-validation and challenger model performance. With ML and AI algorithms about to enter regulatory IRB modelling, these properties are highly desirable in this area.
Table of Contents
- Motivation
- Short review of imputation approaches
- Logistic Regression and randomness of missing in IRB
  - Logistic Regression
  - Randomness of missing for IRB modelling
- Bayesian imputation
- Numerical example
- Conclusion
Objectives and Key Themes
This paper examines the importance of missing data treatment for regulatory internal rating based (IRB) models. It highlights the potential issues and challenges associated with imputation methods, particularly in the context of machine learning (ML) and artificial intelligence (AI) integration in regulatory modeling. The paper emphasizes the need for appropriate imputation strategies that align with modeling assumptions and address the randomness of missing data.
- The impact of missing data on IRB model development and data quality
- A review of common imputation approaches and their limitations
- The challenges of applying traditional imputation methods to Logistic Regression models
- The significance of randomness in missing data and its implications for IRB modeling
- The exploration of a Bayesian imputation approach as an alternative solution
Chapter Summaries
- Motivation: This chapter introduces the significance of addressing missing data in IRB models, emphasizing the potential impact on model assumptions, data quality, and regulatory requirements. It discusses the challenges associated with imputation methods and the need for appropriate strategies.
- Short review of imputation approaches: This chapter provides an overview of commonly used imputation approaches, including record removal, constant value imputation, data-driven imputation, and model-based imputation. It discusses the strengths and weaknesses of each approach, highlighting potential limitations and challenges.
- Logistic Regression and randomness of missing in IRB: This chapter delves into the specific challenges of missing data treatment in the context of Logistic Regression models, a widely used approach in credit risk rating. It addresses the correlation issues associated with imputation methods and the importance of considering the randomness of missing data.
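The four families of imputation named in the review chapter can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the toy data, and the use of a simple least-squares fit as a stand-in for model-based imputation. It is not the paper's Bayesian approach, and the data are simulated as missing completely at random, which real IRB data may not satisfy.

```python
import random
import statistics


def drop_missing(values):
    """Record removal: keep only the observed values."""
    return [v for v in values if v is not None]


def impute_constant(values, fill):
    """Constant value imputation: replace every gap with one fixed value."""
    return [fill if v is None else v for v in values]


def impute_mean(values):
    """Data-driven imputation: fill gaps with the mean of the observed values."""
    return impute_constant(values, statistics.mean(drop_missing(values)))


def impute_regression(x, y):
    """Model-based imputation: predict a missing y from x using a
    least-squares line fitted on the complete (x, y) pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    mx = statistics.mean(a for a, _ in pairs)
    my = statistics.mean(b for _, b in pairs)
    slope = (sum((a - mx) * (b - my) for a, b in pairs)
             / sum((a - mx) ** 2 for a, _ in pairs))
    intercept = my - slope * mx
    return [intercept + slope * a if b is None else b for a, b in zip(x, y)]


# Hypothetical toy data: y depends linearly on x, and about 30% of y is
# deleted completely at random (an assumption of this sketch).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(200)]
y_full = [2 * a + rng.gauss(0, 0.5) for a in x]
y = [None if rng.random() < 0.3 else b for b in y_full]

observed = drop_missing(y)
zero_filled = impute_constant(y, 0.0)
mean_filled = impute_mean(y)
reg_filled = impute_regression(x, y)

# Mean imputation shrinks the variance by the share of observed records.
print("variance, observed only:", statistics.pvariance(observed))
print("variance, mean-imputed :", statistics.pvariance(mean_filled))
print("variance, model-imputed:", statistics.pvariance(reg_filled))
```

Filling every gap with a single value leaves the sample mean unchanged but compresses the spread of the variable, which is one way an imputation choice can distort the relationships that a downstream model such as Logistic Regression relies on.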
Keywords
This paper focuses on the key themes of missing data treatment, imputation methodologies, Bayesian imputation, and cross-validation. The research explores the challenges and potential solutions for handling missing data in regulatory IRB models, particularly in the context of ML and AI applications. The paper emphasizes the need for robust and theoretically sound imputation strategies that address the complexities of missing data and ensure the integrity of model development and analysis.
- Cite this paper
- Yang Liu (Author), 2021, On Missing Data Imputation for IRB Models, Munich, GRIN Verlag, https://www.grin.com/document/1159447