Building a Predictor for Credit Default Risk Evaluation

Machine Learning · Random Forest · XGBoost · Financial Data · Risk Modeling · Incremental Learning

Motivation

Every person who applies for a credit loan carries some probability of default (credit risk). I train machine learning models to estimate that probability and use it to classify whether a loan application should be approved.

Exploratory Data Analysis

The initial data analysis surfaced a few main issues to address.

Addressing Missing Values

I first identified the nature of the missing values (MCAR, MAR, or MNAR) with the help of the missingno Python library.
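As a minimal sketch of the kind of check involved, the snippet below uses plain pandas on hypothetical toy data (the column names `income` and `age` are illustrative and not taken from the project). It measures how much of each column is missing and whether missingness in one column depends on the observed values of another, which would point toward MAR rather than MCAR.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are illustrative only.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "income": rng.normal(50_000, 15_000, n),
    "age": rng.integers(21, 70, n).astype(float),
})

# Make "income" missing more often for younger applicants (a MAR-like pattern).
young = df["age"] < 30
mask = young & (rng.random(n) < 0.5)
df.loc[mask, "income"] = np.nan

# Share of missing values per column.
missing_share = df.isna().mean()

# If missingness in "income" were completely at random, the mean age should
# look similar in rows with and without a missing income.
income_missing = df["income"].isna()
mean_age_when_missing = df.loc[income_missing, "age"].mean()
mean_age_when_observed = df.loc[~income_missing, "age"].mean()
```

A large gap between the two means suggests the missingness depends on observed data. missingno's `matrix` and `heatmap` plots offer a visual counterpart to these summaries.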

Addressing Imbalanced Classes

I compared the performance of the SMOTE and ADASYN (Adaptive Synthetic Sampling) algorithms for balancing the target variable.

Feature Engineering

I applied different methods to select the important features, which reduces computation time, the risk of overfitting, and the complexity of interpretation.
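The specific selection methods are not listed here. As one illustrative possibility, the sketch below keeps the features whose random forest importance is at or above the median, using scikit-learn's `SelectFromModel` on synthetic data. The threshold and the model settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data: 20 features, only 5 of which carry signal.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, n_redundant=0, random_state=0
)

# Keep features whose impurity-based importance is at least the median importance.
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    threshold="median",
)
X_selected = selector.fit_transform(X, y)

# Column indices of the features that survived the selection.
kept = selector.get_support(indices=True)
```

To avoid leaking information into evaluation, the selector should be fitted on the training split only.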

Hyperparameter Tuning

I applied Optuna, which uses a Bayesian optimization algorithm to sample hyperparameters.

Results

Evolution of the precision-recall AUC score during the training of an XGBoost model.
Precision-recall curve for the best XGBoost model.

Challenges

Impact

The model achieves a ROC AUC score of 0.66.