How Banks Decide on Your Loan
[Default Probability Modeling]

15 minute read


If you have ever applied for a bank loan, you know it can be quite a challenging process. It usually begins with an overwhelming amount of paperwork: forms to fill in, guarantees to provide, and numerous signatures. But have you ever wondered how banks sift through all this information to decide who gets a loan and who doesn’t?

Banks essentially grapple with a fundamental question: Is the applicant capable of repaying the loan punctually, including interest and principal? A positive answer leads to loan approval, while a negative one results in rejection. The decision-making process has evolved significantly over the years. No longer relying solely on human judgment, banks have been embracing advanced classification algorithms for more than a decade now to enhance this process.

The task at hand is a classic example of a binary classification problem, one that today’s machine learning (ML) algorithms handle with remarkable efficiency. In the rest of this blog post, I’ll delve into the intriguing details of how this process works.

To begin with, banks must collect and compile a substantial dataset to effectively train their classification models. Here is a good example of such a training dataset:

loan_id default account_amount_added_12_24m account_days_in_dc_12_24m account_days_in_rem_12_24m account_days_in_term_12_24m account_incoming_debt_vs_paid_0_24m account_status account_worst_status_0_3m account_worst_status_12_24m account_worst_status_3_6m account_worst_status_6_12m age avg_payment_span_0_12m avg_payment_span_0_3m merchant_category merchant_group has_paid max_paid_inv_0_12m max_paid_inv_0_24m name_in_email num_active_div_by_paid_inv_0_12m num_active_inv num_arch_dc_0_12m num_arch_dc_12_24m num_arch_ok_0_12m num_arch_ok_12_24m num_arch_rem_0_12m num_arch_written_off_0_12m num_arch_written_off_12_24m num_unpaid_bills status_last_archived_0_24m status_2nd_last_archived_0_24m status_3rd_last_archived_0_24m status_max_archived_0_6_months status_max_archived_0_12_months status_max_archived_0_24_months recovery_debt sum_capital_paid_account_0_12m sum_capital_paid_account_12_24m sum_paid_inv_0_12m time_hours worst_status_active_inv
39161 0.00000 0 0.00000 0.00000 0.00000 0.00000 1.00000 1.00000 NaN 1.00000 NaN 20 12.69231 8.33333 15 7 1 31638.00000 31638.00000 7 0.15385 2 0 0 13 14 0 0.00000 0.00000 2 1 1 1 1 1 1 0 0 0 178839 9.65333 1.00000
5668 0.00000 0 0.00000 0.00000 0.00000 NaN 1.00000 1.00000 1.00000 1.00000 1.00000 50 25.83333 25.00000 4 4 1 13749.00000 13749.00000 1 0.00000 0 0 0 9 19 3 0.00000 0.00000 0 1 1 1 1 2 2 0 0 0 49014 13.18139 NaN
84760 0.00000 0 0.00000 0.00000 0.00000 NaN NaN NaN NaN NaN NaN 22 20.00000 18.00000 22 4 1 29890.00000 29890.00000 5 0.07143 1 0 0 11 0 3 0.00000 0.00000 1 1 1 1 1 2 2 0 0 0 124839 11.56194 1.00000
224 0.00000 0 NaN NaN NaN NaN NaN NaN NaN NaN NaN 36 4.68750 4.88889 22 4 1 40040.00000 40040.00000 2 0.03125 1 0 0 31 21 0 0.00000 0.00000 1 1 1 1 1 1 1 0 0 0 324676 15.75111 1.00000
78571 0.00000 0 0.00000 0.00000 0.00000 NaN NaN NaN NaN NaN NaN 25 13.00000 13.00000 25 3 1 7100.00000 7100.00000 1 0.00000 0 0 0 1 0 0 0.00000 0.00000 0 1 0 0 1 1 1 0 0 0 7100 12.69861 NaN
4705 0.00000 0 0.00000 0.00000 0.00000 NaN NaN NaN NaN NaN NaN 18 NaN NaN 15 7 0 0.00000 0.00000 5 NaN 0 0 0 0 0 0 NaN NaN 0 0 0 0 0 0 0 0 0 0 0 18.32833 NaN
16396 0.00000 0 0.00000 142.00000 0.00000 0.00000 1.00000 2.00000 2.00000 1.00000 3.00000 49 3.00000 3.00000 10 11 1 2373.00000 2373.00000 5 0.00000 0 0 0 1 0 0 0.00000 0.00000 0 1 0 0 1 1 1 0 18760 8337 2373 10.24444 NaN
34357 0.00000 57229 0.00000 0.00000 0.00000 0.23224 1.00000 1.00000 1.00000 1.00000 1.00000 34 26.93023 25.86667 22 4 1 8655.00000 9645.00000 0 0.08333 20 0 0 215 257 0 0.00000 0.00000 37 1 1 1 1 1 1 0 42206 35336 457257 12.19278 1.00000
19701 0.00000 148922 0.00000 47.00000 0.00000 0.96906 1.00000 2.00000 2.00000 2.00000 2.00000 40 33.72727 37.57143 22 4 1 6075.00000 9090.00000 6 0.81818 9 0 0 3 2 3 0.00000 0.00000 23 1 2 2 2 2 2 0 104643 32381 24390 21.41111 1.00000
83080 0.00000 0 0.00000 0.00000 0.00000 NaN NaN NaN NaN NaN NaN 47 21.00000 21.25000 22 4 1 36985.00000 36985.00000 6 0.00000 0 0 0 5 10 0 0.00000 0.00000 0 1 1 1 1 1 1 0 0 0 78620 13.34083 NaN

The table presented above showcases an actual dataset from a fintech firm that operates as a consumer bank. For every loan given out, the bank records account balances, previous loan history, and the financial background and demographic data of the applicant at the time of application. The dataset is extensive, but not all the information it contains is necessarily relevant. It is a commonly accepted best practice to remove features that fail to provide extra value in the decision-making process. A time-tested method for identifying such redundant features is to compute a correlation matrix of all features: pairs that exhibit a high correlation coefficient indicate potential redundancy, and one feature of each pair can usually be dropped.

[Figure: correlation matrix of the dataset features]
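
For readers who want to try this screen themselves, here is a minimal pandas sketch. It assumes the raw data has already been loaded into a DataFrame called df (the file name in the comment and the 0.9 cutoff are purely illustrative):

import pandas as pd

# Assumes the raw loan data has already been loaded, e.g.:
# df = pd.read_csv("loan_dataset.csv")   # hypothetical file name

# Correlation matrix of the numeric features
corr = df.select_dtypes(include="number").corr()

# Flag feature pairs whose absolute correlation exceeds an (illustrative) 0.9 cutoff
threshold = 0.9
redundant_pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
for a, b, value in redundant_pairs:
    print(f"{a} <-> {b}: correlation {value:.2f}")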

The most critical piece of information in this dataset is the “default” column, where each entry is either 0 or 1, indicating whether the customer defaulted on their loan. This label is what the algorithm is trained on, and records lacking it cannot be used by the model. Another notable characteristic of this dataset is its sparsity, due to the frequent occurrence of missing values in loan applications. Prior to feeding this dataset to a classification model, it must undergo a specific encoding process, one that handles missing values and categorical features while minimizing the loss of information. The financial sector commonly employs Weight-of-Evidence encoding because it deals with both challenges effectively.

Weight-of-Evidence Encoding

Weight of Evidence (WoE) encoding is a statistical method utilized in data preprocessing, particularly in machine learning for credit scoring and risk assessment. The technique focuses on converting categorical variables into a continuous scale. It does this by calculating the logarithm of odds associated with the target variable. Specifically, the WoE for each category is derived by taking the natural logarithm of the ratio of the proportion of positive outcomes to that of negative outcomes within that category. Such a transformation is key in highlighting the predictive strength of a categorical variable in relation to the target variable. This enables algorithms to more effectively process and learn from the data.

Additionally, WoE encoding treats missing values as a distinct category. This avoids the pitfalls of discarding records or replacing missing values with substitute figures, thus preserving the information in the dataset. Primarily designed for categorical data, WoE can also be applied to numerical features with many missing values by first grouping the numerical values into bins and then encoding the bins. Below is an example of how an arbitrary categorical feature can be encoded with WoE. The numbers of events and non-events correspond to the counts of ones and zeros in the response variable (in our example, the “default” column).

Example of Weight-of-Evidence encoding

Feature-X (values) Number of events Number of non-events Percentage events Percentage non-events WoE IV
A 90 2400 90/490=0.184 2400/9510=0.25 ln(0.184/0.25)=-0.3065 0.02
B 130 1300 130/490=0.265 1300/9510=0.137 ln(0.265/0.137)=0.659 0.084
C 80 3500 80/490=0.16 3500/9510=0.37 ln(0.16/0.37)=-0.838 0.176
D 100 1210 100/490=0.2 1210/9510=0.127 ln(0.2/0.127)=0.454 0.033
E 90 1100 90/490=0.184 1100/9510=0.12 ln(0.184/0.12)=0.427 0.027
Sum 490 9510 - - - 0.340

The WoE column indicates the predictive capability of each category of a feature with respect to the target variable. When a category or bin within a feature displays a higher ratio of events relative to non-events, it yields a higher WoE value, suggesting that this particular category or bin discriminates effectively between events and non-events. The formula for the Weight of Evidence of a category i of a feature is:

WoE_i = ln( (percentage of events in category i) / (percentage of non-events in category i) )

Lastly, it’s important to mention that a feature should typically meet a minimum threshold of Information Value (IV) to be included in the model; otherwise it may simply add noise. The table’s final column displays the per-category IV, and the feature-level IV is obtained by summing these values across all categories of the feature (the “Sum” row). Generally, any feature with an aggregated IV below 0.02 is deemed non-significant. The aggregated IV statistic is calculated as follows:

IV = Σ_i (percentage of events in category i − percentage of non-events in category i) × WoE_i
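
To make the two formulas concrete, here is a minimal pandas sketch that computes the per-category WoE and IV of a single categorical feature. The DataFrame df, the feature name "merchant_group" and the target column "default" are assumptions, and missing values are kept as their own category:

import numpy as np
import pandas as pd

def woe_iv(df, feature, target="default"):
    # Treat missing values as a category of their own
    categories = df[feature].fillna("MISSING")
    # Events are defaults (target = 1), non-events are non-defaults (target = 0)
    events = df[target].groupby(categories).sum()
    non_events = df[target].groupby(categories).count() - events
    # Share of all events / non-events that falls into each category
    pct_events = events / events.sum()
    pct_non_events = non_events / non_events.sum()
    # Weight of Evidence and Information Value per category
    # (in practice a small constant is added to avoid division by zero
    #  when a category contains only events or only non-events)
    woe = np.log(pct_events / pct_non_events)
    iv = (pct_events - pct_non_events) * woe
    return pd.DataFrame({"woe": woe, "iv": iv}), iv.sum()

# Example usage (feature and target names are assumptions):
# woe_table, feature_iv = woe_iv(df, "merchant_group")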


After applying the WOE encoding, our dataset looks like this:

loan_id account_amount_added_12_24m_woe account_days_in_dc_12_24m_woe account_days_in_rem_12_24m_woe account_days_in_term_12_24m_woe account_incoming_debt_vs_paid_0_24m_woe account_status_woe account_worst_status_0_3m_woe account_worst_status_12_24m_woe account_worst_status_3_6m_woe account_worst_status_6_12m_woe age_woe avg_payment_span_0_12m_woe avg_payment_span_0_3m_woe merchant_category_woe merchant_group_woe has_paid_woe max_paid_inv_0_12m_woe max_paid_inv_0_24m_woe name_in_email_woe num_active_div_by_paid_inv_0_12m_woe num_active_inv_woe num_arch_dc_0_12m_woe num_arch_dc_12_24m_woe num_arch_ok_0_12m_woe num_arch_ok_12_24m_woe num_arch_rem_0_12m_woe num_arch_written_off_0_12m_woe num_arch_written_off_12_24m_woe num_unpaid_bills_woe status_last_archived_0_24m_woe status_2nd_last_archived_0_24m_woe status_3rd_last_archived_0_24m_woe status_max_archived_0_6_months_woe status_max_archived_0_12_months_woe status_max_archived_0_24_months_woe recovery_debt_woe sum_capital_paid_account_0_12m_woe sum_capital_paid_account_12_24m_woe sum_paid_inv_0_12m_woe time_hours_woe worst_status_active_inv_woe
39161 -0.20919 -0.93784 -0.86552 -0.92614 0.14921 -0.19706 -0.06701 -0.08080 0.02000 0.02000 1.11861 0.87745 1.90707 2.13021 1.57060 -0.85294 1.90707 2.13021 0.47199 2.13021 1.43707 -0.37531 -0.39551 1.31928 1.57060 -0.18479 -0.81423 -0.81423 0.87745 -0.70300 -0.64237 -0.59437 -0.45378 -0.45378 -0.24469 -0.39551 -0.17237 -0.22116 0.00000 1.43707 0.25841
5668 -0.20919 -0.93784 -0.86552 -0.92614 0.06652 -0.19706 -0.06701 0.42547 0.02000 0.11531 1.57060 1.57060 2.41790 1.21392 -0.37531 -0.85294 0.47199 0.29763 0.62614 -0.57784 -0.09441 -0.37531 -0.39551 -0.18479 1.57060 1.57060 -0.81423 -0.81423 0.11531 -0.70300 -0.64237 -0.59437 -0.45378 0.52078 0.05077 -0.39551 -0.17237 -0.22116 1.57060 1.57060 -0.09441
84760 -0.20919 -0.93784 -0.86552 -0.92614 0.06652 0.11531 0.11531 -0.08080 0.08252 0.02000 1.43707 0.68329 1.43707 -0.17237 -0.37531 -0.85294 2.13021 1.90707 0.42547 -0.57784 0.95156 -0.37531 -0.39551 1.31928 -0.23300 1.57060 -0.81423 -0.81423 0.80846 -0.70300 -0.64237 -0.59437 -0.45378 0.52078 0.05077 -0.39551 -0.17237 -0.22116 1.90707 1.57060 0.25841
224 -0.20919 1.50161 1.50161 1.50161 0.06652 0.11531 0.11531 -0.08080 0.08252 0.02000 1.57060 1.72475 1.90707 -0.17237 -0.37531 -0.85294 2.41790 2.41790 1.72475 -0.57784 0.95156 -0.37531 -0.39551 1.43707 1.57060 -0.18479 -0.81423 -0.81423 0.80846 -0.70300 -0.64237 -0.59437 -0.45378 -0.45378 -0.24469 -0.39551 -0.17237 -0.22116 0.00000 1.43707 0.25841
78571 -0.20919 -0.93784 -0.86552 -0.92614 0.06652 0.11531 0.11531 -0.08080 0.08252 0.02000 2.13021 0.74392 1.43707 2.41790 2.13021 -0.85294 0.09878 0.27783 0.62614 -0.57784 -0.09441 -0.37531 -0.39551 -0.18479 -0.23300 -0.18479 -0.81423 -0.81423 0.11531 -0.70300 0.77567 0.65431 -0.45378 -0.45378 -0.24469 -0.39551 -0.17237 -0.22116 0.14921 1.31928 -0.09441
4705 -0.20919 -0.93784 -0.86552 -0.92614 0.06652 0.11531 0.11531 -0.08080 0.08252 0.02000 1.11861 0.91382 0.31784 2.13021 1.57060 1.11861 0.09878 0.27783 0.42547 0.91382 -0.09441 -0.37531 -0.39551 -0.18479 -0.23300 -0.18479 0.99078 0.99078 0.11531 0.99078 0.77567 0.65431 0.52078 0.91382 0.99078 -0.39551 -0.17237 -0.22116 0.14921 1.43707 -0.09441
16396 -0.20919 -0.93784 0.00000 -0.92614 0.14921 -0.19706 1.72475 1.72475 0.02000 0.00000 1.57060 1.72475 1.72475 1.72475 1.03160 -0.85294 0.09878 0.27783 0.42547 -0.57784 -0.09441 -0.37531 -0.39551 -0.18479 -0.23300 -0.18479 -0.81423 -0.81423 0.11531 -0.70300 0.77567 0.65431 -0.45378 -0.45378 -0.24469 -0.39551 1.57060 2.41790 0.14921 1.43707 -0.09441
34357 0.00000 -0.93784 -0.86552 -0.92614 0.14921 -0.19706 -0.06701 0.42547 0.02000 0.11531 1.57060 1.57060 2.41790 -0.17237 -0.37531 -0.85294 0.47199 0.29763 1.21392 -0.57784 0.00000 -0.37531 -0.39551 0.00000 0.00000 -0.18479 -0.81423 -0.81423 0.00000 -0.70300 -0.64237 -0.59437 -0.45378 -0.45378 -0.24469 -0.39551 2.13021 2.41790 0.00000 1.31928 0.25841
19701 2.41790 -0.93784 0.00000 -0.92614 0.00000 -0.19706 1.72475 1.72475 1.72475 1.90707 1.72475 2.41790 2.41790 -0.17237 -0.37531 -0.85294 0.09878 0.29763 1.31928 0.00000 2.41790 -0.37531 -0.39551 -0.18479 -0.23300 1.57060 -0.81423 -0.81423 0.00000 -0.70300 1.72475 1.72475 1.43707 0.52078 0.05077 -0.39551 0.00000 0.00000 0.74392 1.90707 0.25841
83080 -0.20919 -0.93784 -0.86552 -0.92614 0.06652 0.11531 0.11531 -0.08080 0.08252 0.02000 1.57060 1.72475 1.90707 -0.17237 -0.37531 -0.85294 1.90707 2.13021 1.31928 -0.57784 -0.09441 -0.37531 -0.39551 -0.18479 -0.23300 -0.18479 -0.81423 -0.81423 0.11531 -0.70300 -0.64237 -0.59437 -0.45378 -0.45378 -0.24469 -0.39551 -0.17237 -0.22116 1.72475 1.57060 -0.09441

The dataset is now prepared and ready for modeling. For brevity, I haven’t covered the essential step of dividing the dataset into training, validation, and test subsets. However, it’s crucial to remember that the WoE encoding should be fitted exclusively on the training subset. The validation and test sets are then encoded by mapping their original values to the WoE values derived from the training set; they are never encoded alongside the training data, which prevents data leakage.
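
As a rough illustration of that mapping step, suppose we have built a dictionary woe_maps of per-feature category-to-WoE lookups on the training set (all names here are illustrative, not taken from the actual implementation):

import pandas as pd

# woe_maps[feature] is a dict {category_value: woe_value} fitted on the training set only
def apply_woe(df, woe_maps, unseen_value=0.0):
    encoded = pd.DataFrame(index=df.index)
    for feature, mapping in woe_maps.items():
        # Categories not seen during training fall back to a neutral WoE of 0
        encoded[feature + "_woe"] = (
            df[feature].fillna("MISSING").map(mapping).fillna(unseen_value)
        )
    return encoded

# X_train_woe = apply_woe(train_df, woe_maps)
# X_val_woe   = apply_woe(val_df, woe_maps)   # encoded with training-set WoE values only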

The Model

In binary classification tasks such as this, the finance sector frequently opts for the Logistic Regression (Logit) classifier. This choice is standard in the industry for three reasons. First, the logit model produces a probability output, indicating the likelihood of an event being a “success” (in this context, a loan default). This output is straightforward to interpret and allows for an easily customized decision threshold, which may differ from the usual 0.5. Second, the logit model provides a built-in measure of feature importance for every independent variable through its coefficients. And finally, the logit model is one of the simplest and most interpretable classifiers available.
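
As a hedged sketch of this approach with scikit-learn (variable names such as X_train_woe, X_val_woe and y_train are assumptions carried over from the encoding step), fitting a logit classifier and applying a custom threshold could look like this:

from sklearn.linear_model import LogisticRegression

# X_train_woe / X_val_woe: WoE-encoded feature matrices, y_train: the "default" column
logit = LogisticRegression(max_iter=1000)
logit.fit(X_train_woe, y_train)

# Probability of default for each validation-set application
p_default = logit.predict_proba(X_val_woe)[:, 1]

# A business-specific cutoff can replace the usual 0.5
threshold = 0.3
predicted_default = p_default >= threshold

# The fitted coefficients double as a simple measure of feature importance
coefficients = dict(zip(X_train_woe.columns, logit.coef_[0]))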

Recently, more advanced models have gained popularity among loan providers, offering an alternative to the traditional logit model. While the logit model’s simplicity is unmatched, certain more complex models have been delivering consistently more accurate results, making them the preferred choice for some lenders. One such model is XGBoost, an efficient and scalable member of the gradient-boosting family. As an ensemble learning method, XGBoost constructs multiple models (specifically decision trees) and combines them for improved accuracy and robustness. Trees are added sequentially during training, each one correcting the errors of its predecessors, until additional trees no longer improve the fit or a preset number of boosting rounds is reached. Like the logit model, XGBoost predicts the probability of “success”. However, it is more intricate to train because of the many hyperparameters that need to be calibrated during the training phase.

Hyperparameter tuning, or calibration, is a key aspect of this model. The process involves creating a grid of all sensible values of the hyperparameters. For each unique combination, the model is trained, and the set of hyperparameters that produces the best result (be it in terms of accuracy, F1 score, recall, or any other metric chosen by the model developer) is identified and retained. The model trained with this set of hyperparameters is often referred to as the “champion model.” This model is then applied to test data, which in our context means predicting the likelihood of loan defaults for new loan applications. In the following, we will go through some of the common hyperparameters of the XGBoost model:

Example of a hyperparameter grid

param_grid = {
    'learning_rate': [0.01, 0.1],
    'max_depth': [8, 9, 10],
    'subsample': [0.8, 0.7, 0.6],
    'colsample_bytree': [0.9, 0.8],
    'scale_pos_weight': [70],
    'reg_alpha': [0.1, 0.5]
}

learning_rate: determines the impact of each tree on the final outcome. A smaller learning rate requires more boosting rounds to achieve the same reduction in residual error as a larger learning rate. Common values are in the range of 0.01 to 0.3.

max_depth: defines the maximum depth of a tree. Deeper trees can model more complex patterns, but they are also more likely to overfit. The values in the grid (8, 9, 10) suggest moderately deep trees.

subsample: sets the fraction of the training data to be randomly sampled for each tree. Values below 1 make the algorithm more conservative and prevent overfitting but too small values might lead to underfitting.

colsample_bytree: The fraction of features to be randomly sampled for each tree. Values less than 1 help in making the model more robust by reducing overfitting and can be particularly useful for datasets with a large number of features.

scale_pos_weight: Used in imbalanced classification to scale the gradient for the minority class. A common starting point is the ratio of negative to positive examples in the training data; a value of 70 therefore suggests that the problem is quite imbalanced, with the model increasing the importance of the minority class by a factor of 70.

reg_alpha: The L1 regularization term on the weights (also known as alpha). Regularization helps to prevent overfitting by penalizing more complex models.

The full list of potential hyperparameters is extensive, and it is available here. To use this model effectively, it is important to understand its mechanics and the impact of each parameter.
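
To make the tuning step concrete, here is a sketch of how the grid above could be searched with scikit-learn’s GridSearchCV and the xgboost package. The scoring metric, the number of cross-validation folds, and the variable names are assumptions, not the exact setup used for the results below:

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# scale_pos_weight is typically derived from the training-set class balance,
# roughly (number of non-defaults) / (number of defaults)
xgb = XGBClassifier(n_estimators=300, eval_metric="logloss")

search = GridSearchCV(
    estimator=xgb,
    param_grid=param_grid,        # the grid defined above
    scoring="average_precision",  # AUC-PR; a reasonable choice for imbalanced data
    cv=3,
    n_jobs=-1,
)
search.fit(X_train_woe, y_train)

champion_model = search.best_estimator_
print("Best hyperparameters:", search.best_params_)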

After training the champion model, the next step is to test it on the validation dataset to evaluate its performance. If the results meet our expectations and criteria on the validation set, we can then move forward to deploy the model in a real-world setting. This means the model will be used to determine whether new customers are eligible for loan approval or not. This approach is reflective of how modern banking systems process loan approvals.

We’re now set to run our champion model on the WoE-encoded validation data. It’s important to note that our model was trained using the fairly light parameter grid outlined earlier. Training the XGBoost model with an extensive grid on a personal computer can be an extremely time-intensive task. Consequently, our parameter tuning and model results could likely be improved if we had access to greater computational resources. Our model yields the following results on the validation dataset:


AUC-ROC: 0.8868571946417096
Recall: 0.7372262773722628
Precision: 0.08054226475279107
AUC-PR: 0.12131912757189459
Confusion Matrix:
[[7707 1153]
 [  36  101]]


This dataset exhibits a significant imbalance, making the accurate identification of defaults more critical than the occasional mislabeling of non-defaults. In this context, the relatively high recall is a positive sign, as it indicates a good rate of correctly identifying actual defaults. However, the precision is notably low, meaning that only a small fraction of the loans predicted to default actually do so. This suggests the model is overly cautious, potentially rejecting financially healthy customers, which is not beneficial for the business. Another interesting observation is the stark contrast between the AUC-ROC and AUC-PR values, which again underlines how important it is to consider AUC-PR instead of AUC-ROC when working with an imbalanced dataset.
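
For reference, here is how these metrics can be computed with scikit-learn, assuming y_val holds the true validation labels and p_default the champion model’s predicted default probabilities (both names are illustrative):

from sklearn.metrics import (roc_auc_score, average_precision_score,
                             recall_score, precision_score, confusion_matrix)

# Hard predictions obtained with the default 0.5 cutoff
y_pred = (p_default >= 0.5).astype(int)

print("AUC-ROC:  ", roc_auc_score(y_val, p_default))
print("Recall:   ", recall_score(y_val, y_pred))
print("Precision:", precision_score(y_val, y_pred))
print("AUC-PR:   ", average_precision_score(y_val, p_default))
print("Confusion Matrix:\n", confusion_matrix(y_val, y_pred))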

Executing the champion model on a batch of fresh loan applications would produce the following probabilities of default:

loan_id default_probability
349 0.23701356
44 0.19725785
8411 0.2826372
693 0.4592986
676 0.6802527
70 0.33273283
598 0.2290507
66790 0.28260818
1470 0.47338352
79385 0.27572608
2894 0.40206018

As anticipated, the default probabilities vary between 0 and 1, with just a single loan application showing a default probability above 0.5. In a practical scenario, this would mean that the loan with ID 676 is likely to be denied approval.
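
A minimal sketch of that final decision step, assuming the new applications have already been WoE-encoded with the training-set mappings (all names here are illustrative):

import pandas as pd

# X_new_woe: fresh applications encoded with the training-set WoE mappings,
# new_df: the raw application data containing the loan_id column
p_new = champion_model.predict_proba(X_new_woe)[:, 1]

decisions = pd.DataFrame({
    "loan_id": new_df["loan_id"],
    "default_probability": p_new,
    "approve": p_new < 0.5,   # the 0.5 cutoff is a business decision, not a fixed rule
})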

Dear reader, we have reached the end of this post. I hope you’ve found it enlightening and that it has given you a deeper understanding of the mechanics behind loan applications. Instead of a bank official reviewing your documents, it’s quite probable that a sophisticated algorithm in the bank’s infrastructure is making the decision about your loan approval. For those interested in delving deeper into this subject, I encourage you to visit my git project. There you can explore the entire dataset, examine the code implementation in detail, or even download the model as a Docker application.

Thank you for reading and I wish you a 0% default probability!