On Missing Data Imputation for IRB Models

Technical Report, 2021, 14 Pages, Grade: 1

Author: Yang Liu

Business economics - Banking, Stock Exchanges, Insurance, Accounting

Model development often starts with missing data treatment. For regulatory internal ratings-based (IRB) models, missing data raise data quality concerns around systems and processes, while the random nature of the missingness is sometimes overlooked. This can result in an inappropriate choice of imputation method; more importantly, the chosen method can introduce issues that violate modelling assumptions later in the process. As ML and AI methods are introduced to regulatory modelling, the impact of missing data will be investigated and challenged more thoroughly.

This paper starts with the issues that arise from imputation processes in practice, then briefly reviews common approaches to missing data treatment. A candidate Bayesian approach is then proposed as an alternative. In conclusion, results imputed with the proposed approach improve the explanatory power of historical observations while satisfying multiple convergence conditions, such as train-test accuracy, likelihood of the value distribution, cross-validation and challenger model performance. As ML and AI algorithms begin to enter regulatory IRB modelling, these properties are highly desirable.

Excerpt


Table of Contents

1. Motivation

2. Short review of imputation approaches

3. Logistic Regression and randomness of missing in IRB

4. Bayesian imputation

5. Numerical example

6. Conclusion

Objectives and Topics

This paper addresses the challenges associated with missing data in internal ratings-based (IRB) models, specifically highlighting how traditional imputation methods can violate essential modelling assumptions. The study introduces a Bayesian approach to data imputation designed to improve the explanatory power of historical observations while better managing the uncertainty inherent in missing information.

  • Theoretical limitations of common imputation methods in IRB modelling.
  • Challenges related to data randomness (MCAR, MAR, and MNAR) in financial institutions.
  • A detailed Bayesian likelihood-based methodology for individual record imputation.
  • Empirical evaluation using the Iris dataset compared against benchmarks like KNN and MICE.
  • Assessment of model predictability, cross-validation performance, and feature integrity.
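
To make the three missingness mechanisms in the list concrete, the following simulation masks the same feature under MCAR, MAR and MNAR and compares the mean of the values that remain observed with the true mean. The data, missingness rates and seed are hypothetical and not taken from the paper.

```python
import random
import statistics

# Hypothetical toy data (not from the paper): feature x is correlated with
# a second feature z that is always observed.
random.seed(42)
N = 20_000
z = [random.gauss(0.0, 1.0) for _ in range(N)]
x = [0.8 * zi + 0.6 * random.gauss(0.0, 1.0) for zi in z]

def mcar(i):
    """MCAR: the chance of x_i being missing ignores all data (i is unused)."""
    return random.random() < 0.3

def mar(i):
    """MAR: the chance of x_i being missing depends only on the observed z_i."""
    return random.random() < (0.6 if z[i] > 0 else 0.1)

def mnar(i):
    """MNAR: the chance of x_i being missing depends on the unobserved x_i itself."""
    return random.random() < (0.6 if x[i] > 0 else 0.1)

def observed_mean_bias(is_missing):
    """Mean of the x values that remain observed, minus the mean of all x values."""
    observed = [xi for i, xi in enumerate(x) if not is_missing(i)]
    return statistics.fmean(observed) - statistics.fmean(x)

biases = {name: observed_mean_bias(fn)
          for name, fn in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]}
for name, bias in biases.items():
    print(f"{name}: bias of the complete-case mean = {bias:+.3f}")
```

Only the MCAR case leaves the complete-case mean essentially unbiased. Under MAR the bias can in principle be corrected by conditioning on z, whereas under MNAR the information needed for the correction is exactly what is missing, which is why MNAR is singled out as a concern for financial institutions.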

Excerpt from the Book

4. Bayesian imputation

The Bayesian likelihood-based approach eases the dependence on the availability of complete data while focusing on the predictability of each individual record's outcome. Denote the target feature with missing data by $x_k$, $k \in \{1, \ldots, K\}$, and the remaining features by $x_{-k}$, i.e. the features indexed by $\{1, \ldots, k-1, k+1, \ldots, K\}$; the index ranges are omitted below. The Bayesian imputation procedure can then be broken down into the following general steps:

Step 1: Make an initial guess by imputing all missing values in the dataset with a constant, e.g. 0.
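
A minimal sketch of Step 1 in plain Python, assuming missing entries are stored as `None` in a list-of-rows matrix (a hypothetical representation; a real pipeline would more likely hold NaN values in a data frame). The mask of originally missing cells is kept because, once the constant is written in, imputed zeros can no longer be told apart from genuine zeros, and the later steps need to know which records to revisit.

```python
# Hypothetical toy feature matrix; None marks a missing value.
X = [
    [5.1, None, 1.4],
    [4.9, 3.0, None],
    [None, 3.2, 1.3],
]

def impute_constant(rows, fill=0.0):
    """Step 1: replace every missing entry by the same constant initial guess."""
    return [[fill if v is None else v for v in row] for row in rows]

X0 = impute_constant(X)
# Remember which cells were imputed so they can be revisited later.
missing_mask = [[v is None for v in row] for row in X]
print(X0)  # [[5.1, 0.0, 1.4], [4.9, 3.0, 0.0], [0.0, 3.2, 1.3]]
```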

Step 2: Fit a model using the initially imputed values. The model can be any specification of the general form $Y \leftarrow F(x_1, \ldots, x_K)$, where the features with missing data enter with the initial guesses from Step 1. In the numerical example, a Random Forest model is used for imputation; in parallel, a Logistic Regression model is fitted to explore how the coefficient parameterisation evolves in regression form.
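
The sketch below illustrates Step 2 without third-party dependencies. It swaps in a small Logistic Regression fitted by gradient ascent as a stand-in for the Random Forest the paper uses for imputation; any classifier that returns predicted probabilities fits the general form $Y \leftarrow F(x_1, \ldots, x_K)$, for example scikit-learn's `RandomForestClassifier` via `predict_proba`. The data, learning rate and epoch count are hypothetical.

```python
import math

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit p(Y=1 | x) = sigmoid(b0 + b . x) by batch gradient ascent on the log-likelihood."""
    n, d = len(X), len(X[0])
    b0, b = 0.0, [0.0] * d
    for _ in range(epochs):
        g0, g = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            r = yi - sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))  # residual
            g0 += r
            for j in range(d):
                g[j] += r * xi[j]
        b0 += lr * g0 / n
        b = [bj + lr * gj / n for bj, gj in zip(b, g)]
    return b0, b

def predict_proba(model, x):
    """Model-predicted p(Y=1 | x) for one record."""
    b0, b = model
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))

# Hypothetical Step 1 output: feature 1 was missing in rows 1 and 4 and set to 0.0.
X0 = [[1.0, 2.0], [1.5, 0.0], [3.0, 3.5], [3.5, 4.0], [0.5, 0.0], [2.8, 3.0]]
y = [0, 0, 1, 1, 0, 1]
model = fit_logistic(X0, y)
probs = [predict_proba(model, x) for x in X0]
print("intercept, coefficients:", model)
```

Because the fit returns explicit coefficients, refitting it after each round of imputation would let you track how the coefficients evolve, which is the role the paper gives its Logistic Regression.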

Step 3: For record $i$ where feature $k$ is missing, denote the observed target value by $Y^i$ and the Step 2 model-predicted probability of $Y^i$ by $p(Y^i \mid x_k, x_{-k})$, where $x_k$ holds its current imputed value. Let $\mathcal{X}_k^i$ be the set of all values $x$ for which the model-predicted probability exceeds the current one, i.e. $p(Y^i \mid x, x_{-k}) > p(Y^i \mid x_k, x_{-k})$.
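
Step 3 can be sketched as a filter over candidate values. This assumes a finite grid of candidates (for a continuous feature, for instance the distinct observed values of feature $k$); the excerpt does not say how the candidate set is enumerated. The coefficients `B0`, `B_K` and `B_REST` are made up and stand in for the model fitted in Step 2, and a single remaining feature plays the role of $x_{-k}$.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical stand-in for the Step 2 model: made-up logistic coefficients.
B0, B_K, B_REST = -4.0, 1.5, 0.8

def p_target(y_obs, x_k, x_rest):
    """Model-predicted probability of the observed target y_obs for one record."""
    p_default = sigmoid(B0 + B_K * x_k + B_REST * x_rest)
    return p_default if y_obs == 1 else 1.0 - p_default

def candidate_set(y_obs, x_k_current, x_rest, grid):
    """Step 3: grid values x with p(Y^i | x, x_-k) > p(Y^i | x_k, x_-k)."""
    baseline = p_target(y_obs, x_k_current, x_rest)
    return [x for x in grid if p_target(y_obs, x, x_rest) > baseline]

# Record i: observed default (Y^i = 1), x_k imputed as 0.0 in Step 1, x_-k = 2.0.
grid = [0.5 * j for j in range(9)]  # hypothetical candidates 0.0, 0.5, ..., 4.0
print(candidate_set(1, 0.0, 2.0, grid))  # [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
```

Because this toy model's default probability increases with $x_k$, a defaulted record keeps every grid value above the initial guess. For a non-defaulted record ($Y^i = 0$) the same call returns an empty list.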

Step 4: For each $x \in \mathcal{X}_k^i$, use the fitted model to estimate $p(Y^i \mid x, x_{-k})$. Then calculate the Bayesian likelihood using Bayes' theorem:

$$p(x_k = x \mid Y^i)\, p(Y^i) = p(Y^i \mid x_k = x)\, p(x_k = x)$$
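
The Bayes step can be sketched as follows. Three choices here are assumptions rather than details from the excerpt: the prior $p(x_k = x)$ is taken as the empirical frequency of $x$ among the observed values of feature $k$; $p(Y^i)$ is obtained by normalising over the candidate set; and the candidate with the highest posterior (the MAP value) is picked, although the excerpt ends before saying how the final value is chosen. The model coefficients and data are hypothetical.

```python
import math
from collections import Counter

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# Hypothetical stand-in for the Step 2 model: made-up logistic coefficients.
B0, B_K, B_REST = -4.0, 1.5, 0.8

def likelihood(y_obs, x, x_rest):
    """p(Y^i | x_k = x, x_-k) from the fitted model."""
    p_default = sigmoid(B0 + B_K * x + B_REST * x_rest)
    return p_default if y_obs == 1 else 1.0 - p_default

def posterior_over_candidates(y_obs, x_rest, candidates, observed_k):
    """Step 4: p(x_k = x | Y^i) = p(Y^i | x_k = x) p(x_k = x) / p(Y^i) over the candidates."""
    counts = Counter(observed_k)
    prior = {x: counts[x] / len(observed_k) for x in candidates}  # empirical p(x_k = x)
    joint = {x: likelihood(y_obs, x, x_rest) * prior[x] for x in candidates}
    evidence = sum(joint.values())  # plays the role of p(Y^i), restricted to the candidates
    if evidence == 0.0:
        raise ValueError("no candidate has positive prior mass")
    return {x: v / evidence for x, v in joint.items()}

# Hypothetical observed (non-missing) values of feature k across the portfolio.
observed_k = [1.0, 1.0, 1.0, 1.0, 2.0, 2.0, 3.0]
candidates = [1.0, 2.0, 3.0]  # e.g. the Step 3 set for record i
post = posterior_over_candidates(1, 2.0, candidates, observed_k)
map_value = max(post, key=post.get)  # assumed selection rule (MAP)
print(post, map_value)
```

On this toy record the likelihood alone would favour 3.0, but weighting by the prior, under which 3.0 is the rarest observed value, moves the maximum to 2.0 (posterior roughly 0.39, against 0.35 for 1.0 and 0.27 for 3.0).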

Chapter Summaries

1. Motivation: Discusses the risks of using punitive imputation for missing data in IRB models and highlights the need for methods that maintain theoretical independence assumptions.

2. Short review of imputation approaches: Provides an overview of standard missing data techniques, including record removal, constant values, data-driven methods, and model-based approaches like MICE.

3. Logistic Regression and randomness of missing in IRB: Explains the standard logistic regression model and classifies the types of data missingness, specifically emphasizing why MNAR is a critical concern for financial institutions.

4. Bayesian imputation: Details the proposed step-by-step Bayesian methodology to estimate missing values based on individual record predictability rather than generic statistical assumptions.

5. Numerical example: Demonstrates the practical application of the proposed Bayesian method using the Iris dataset, comparing its performance against benchmark imputation techniques.

6. Conclusion: Argues that the proposed Bayesian approach offers a more reliable way to manage missing data, particularly for IRB models facing the evolving landscape of AI and machine learning.

Keywords

Missing data treatment, Imputation methodology, Bayesian imputation, Cross-validation, IRB models, Logistic Regression, Data randomness, MCAR, MAR, MNAR, Predictive power, Machine learning, Credit risk, Data quality, Statistical inference.

Frequently Asked Questions

What is the primary objective of this work?

The paper aims to develop a more robust imputation approach for IRB models that avoids the pitfalls of punitive or generic data replacement by leveraging a Bayesian likelihood-based framework.

What are the central themes discussed?

The central themes include the impact of missing data on regulatory model quality, the theoretical limitations of standard imputation methods, and the application of Bayesian probability to address uncertainty.

Which scientific methodology is utilized?

The author uses a Bayesian likelihood-based approach, which iterates between fitting a predictive model (e.g., Random Forest) and calculating the posterior distribution of potential missing values.

What is the focus of the main body?

The main body examines current industry practices, contrasts them with regression requirements, details the proposed Bayesian algorithm, and provides an empirical demonstration using a standardized dataset.

Why is standard imputation problematic for IRB models?

Standard methods can introduce artificial correlations, violate independence assumptions of logistic regression, and fail to account for the specific nature of MNAR (Missing Not at Random) data in banking.
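
One way the artificial-correlation problem arises can be shown with a small simulation. In this hypothetical setup the true feature is independent of the default flag, but defaulted records lose the feature far more often; filling the gaps with the constant 0 then manufactures a strong association between the imputed feature and the target. All data, rates and the seed are made up for illustration.

```python
import math
import random
import statistics

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

random.seed(7)
N = 5_000
# Hypothetical portfolio: the true feature x is independent of the default flag y.
x = [random.uniform(1.0, 2.0) for _ in range(N)]
y = [1 if random.random() < 0.2 else 0 for _ in range(N)]
# Missingness driven by the outcome: defaulted records lose x far more often.
missing = [random.random() < (0.5 if yi == 1 else 0.05) for yi in y]
x_const = [0.0 if m else xi for xi, m in zip(x, missing)]  # constant (0) imputation

print("corr(true x, y)    =", round(pearson(x, y), 3))
print("corr(imputed x, y) =", round(pearson(x_const, y), 3))
```

The first correlation is close to zero and the second is clearly negative, even though nothing about the true feature changed; a logistic regression fitted on the imputed column would attribute predictive power to an artefact of the imputation.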

What is the main takeaway for practitioners?

Practitioners are advised to invest time in understanding the nature of missing data before applying imputation and to consider record-level Bayesian approaches to improve model stability and explainability.

How does the Bayesian approach differ from MICE?

Unlike MICE, which focuses on chained equations across variables, the proposed Bayesian method emphasizes individual record-level predictability and likelihood optimization for better consistency with observed target outcomes.

What role does the Iris dataset play in this study?

It serves as a benchmark numerical example to validate the methodology, providing a controlled environment to test imputation accuracy against other common techniques like KNN and iterative methods.

Details

Title: On Missing Data Imputation for IRB Models
Grade: 1
Author: Yang Liu
Publication Year: 2021
Pages: 14
Catalog Number: V1159447
ISBN (PDF): 9783346564863
Language: English
Tags: missing data imputation models
Product Safety: GRIN Publishing GmbH
Cite this paper
Yang Liu (Author), 2021, On Missing Data Imputation for IRB Models, Munich, GRIN Verlag, https://www.grin.com/document/1159447