Credit Scoring, Artificial Intelligence and Quantum Machine Learning
COURSE OBJECTIVE
This rigorous course is designed to impart the skills necessary for creating and calibrating credit scoring models, including estimating probabilities of default (PD) and validating these models. Participants will explore a range of machine learning approaches, from traditional methods to quantum and probabilistic techniques, and learn how to leverage artificial intelligence to automate these processes.
Participants will gain proficiency in both conventional and cutting-edge models for credit scoring during the stages of credit admission and monitoring. This includes handling vast datasets to construct comprehensive credit and behavior scoring systems.
The course also delves into advanced data analytics, covering topics like sampling, exploratory analysis, feature engineering, segmentation, and outlier detection.
A variety of machine learning techniques will be discussed, ranging from supervised and unsupervised learning to reinforcement learning, and applied specifically to developing tools for credit scoring. Well-established methods such as logistic regression will be explored alongside machine learning techniques such as decision trees, naive Bayes, k-nearest neighbors, LASSO logistic regression, random forests, neural networks, Bayesian networks, support vector machines, and gradient boosting trees.
The application of deep learning in building robust credit scoring models suitable for banking applications will be covered extensively. This includes the use of various neural network architectures such as feedforward, convolutional, recurrent, and generative adversarial networks, alongside Fermac Risk’s proprietary methodology for managing and interpreting deep learning models to avoid the pitfalls of black-box models.
Instruction on tuning hyperparameters, which are crucial for controlling the learning process and optimizing model performance, will be provided along with techniques like grid search, random search, and Bayesian optimization.
The course provides over 20 distinct credit scoring models using different methodologies across multiple programming environments such as R, Python, JupyterLab, TensorFlow, and SAS. This spans models for various credit aspects including origination, behavior, recovery, income, and churn.
Advanced techniques for calibrating risk parameters for IRB and IFRS 9 PD are included, ranging from central tendency adjustment to deep learning models for lifetime PD calibration under IFRS 9.
The curriculum introduces automated machine learning (AutoML), enhancing the ability of risk analysts to develop, scale, and validate high-quality machine learning models efficiently.
Participants will also explore probabilistic machine learning techniques, like Bayesian neural networks, to construct credit scoring models, alongside best practices for model validation, particularly focusing on AI-driven financial tools as per European regulatory standards.
Finally, the course highlights the emerging field of Quantum Machine Learning, discussing its potential to revolutionize financial services through enhanced computational speeds and capabilities using quantum algorithms.
This comprehensive program aims to equip participants with the skills to use advanced computing technologies, including quantum computing and tensor networks, for machine learning calculations, preparing them for significant advancements in the financial sector.
WHO SHOULD ATTEND?
The course is aimed at professionals from financial institutions interested in developing powerful credit scoring models and calibrating their output, as well as model managers in credit risk and data science departments.
For a better understanding of the topics, participants should have a background in statistics and mathematics. Knowledge of quantum physics is not required to benefit from quantum computing technologies.
AGENDA
CREDIT SCORING
Module 0: Quantum Computing and Algorithms

Future of quantum computing in banking

Is it necessary to know quantum mechanics?

QIS Hardware and Apps

Quantum operations

Qubit representation

Measurement

Superposition

Matrix multiplication

Qubit operations

Multiple Quantum Circuits

Entanglement

Deutsch Algorithm

Quantum Fourier transform and search algorithms

Hybrid quantum-classical algorithms

Quantum annealing, simulation and optimization of algorithms

Quantum machine learning algorithms

Exercise 1: Quantum operations
Module 1: Artificial Intelligence for Credit Scoring

Big Data Definition

Big Data in financial institutions and fintech

Big data in Big Tech

Data typology

Structured data

Semi-structured data

Unstructured data


Big data: Volume, Velocity, Variety, Veracity and Value

Big Data Size

Big data sources

Transactional data

Social media data

Credit bureau data

Origin of data sources

Website data

Text data

Sensor data

RFID and NFC data

Data from telecom operators

Smart grid data


Banking digitization

Financial inclusion

Regulation in Europe, USA and Latin America

Artificial intelligence in banking

Artificial intelligence in the credit cycle
Module 2: AI in Credit Scoring

AI in Credit Scoring for Banking and Fintech

Offline and online credit scoring

Design and Construction of Credit Scoring Models

Advantages and disadvantages

Models to face new financial crises

Machine Learning to develop and validate credit scoring

Importance of the Bureau Score

Credit Scorecard Management

Probability of Default (PD) Estimation
Module 3: Machine Learning

Definition of Machine Learning

Machine Learning Methodology

Data Storage

Abstraction

Generalization

Assessment


Supervised Learning

Unsupervised Learning

Reinforcement Learning

Deep Learning

Typology of Machine Learning Algorithms

Steps to Implement an Algorithm

Data collection

Exploratory Analysis

Model Training

Model Evaluation

Model improvements

Machine Learning in Credit Scoring Models

Quantum Machine Learning

Exploratory Data Analysis (EDA) and Feature Engineering
Module 4: Exploratory Data Analysis

Data typology

transactional data

Unstructured data embedded in text documents

Social Media Data

Data sources

Data review

Target definition

Time horizon of the target variable

Sampling

Random Sampling

Stratified Sampling

Rebalanced Sampling


Exploratory Analysis:

Histograms

Q-Q plot

Moment analysis

Box plot


Treatment of Missing values

Multivariate Imputation Model

Advanced Outlier detection and treatment techniques

Univariate techniques: winsorizing and trimming

Multivariate Technique: Mahalanobis Distance

Module 5: Feature Engineering

Feature Engineering

Data Standardization

Variable categorization

Equal Interval Binning

Equal Frequency Binning

Chi-square test


Binary coding

WOE Coding

WOE Definition

Univariate Analysis with Target Variable

Variable Selection

Treatment of Continuous Variables

Treatment of Categorical Variables

Using Gini

Information Value

Optimization of continuous variables

Optimization of categorical variables


Exercise 1: Exploratory Analysis in R

Exercise 2: Advanced outlier detection and treatment

Exercise 3: Stratified and Random Sampling in R

Exercise 4: Multivariate imputation model

Exercise 5: Univariate analysis by percentiles in R

Exercise 6: Optimal univariate analysis of continuous variables in Excel

Exercise 7: Estimation of the KS, Gini, and IV of each variable in Excel

Exercise 8: Word Cloud analysis of variables in R
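
As a minimal sketch of the WOE coding and Information Value topics in Module 5, the following Python function computes both from per-bin counts of good and bad accounts. It assumes one common convention, WOE = ln(share of goods / share of bads); the function name `woe_iv` and the sample counts are illustrative assumptions, and bins with zero goods or zero bads are not handled.

```python
import math

def woe_iv(bins):
    """Return per-bin WOE values and the total Information Value.

    bins: list of (good_count, bad_count) tuples, one per bin.
    Assumes every bin has at least one good and one bad account.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    woes = []
    iv = 0.0
    for good, bad in bins:
        pct_good = good / total_good  # share of all goods falling in this bin
        pct_bad = bad / total_bad     # share of all bads falling in this bin
        woe = math.log(pct_good / pct_bad)
        woes.append(woe)
        iv += (pct_good - pct_bad) * woe
    return woes, iv

# Hypothetical counts for a variable split into three bins.
example_bins = [(400, 20), (350, 30), (250, 50)]
woes, iv = woe_iv(example_bins)
```

Under this convention a positive WOE marks a bin with proportionally more goods than bads, and the Information Value sums the per-bin contributions into a single measure of the variable's predictive strength.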
MACHINE LEARNING
Unsupervised Learning
Module 6: Unsupervised models

Hierarchical Clusters

K-means

Standard algorithm

Euclidean distance

Principal Component Analysis (PCA)

Advanced PCA Visualization

Eigenvectors and Eigenvalues

Exercise 14: Principal components in R and SAS

Exercise 15: Data segmentation with K-means in R
Supervised Learning
Module 7: Logistic Regression and LASSO Regression

Econometric Models

Logit regression

Probit regression

Piecewise regression

Survival models


Machine Learning Models

Lasso Regression

Ridge Regression


Model Risk in Logistic Regression

Exercise 16: Credit Scoring Logistic Regression in SAS and R

Exercise 17: Credit Scoring Lasso Logistic Regression in R

Exercise 18: Model Risk Using Confidence Intervals of Logistic Regression Coefficients
Module 8: Trees, KNN and Naive Bayes

Decision Trees

modeling

Advantages and disadvantages

Recursion and Partitioning Processes

Recursive partitioning tree

Pruning Decision tree

Conditional inference tree

Tree visualization

Measuring decision tree predictive performance

CHAID model

C5.0 model


K-Nearest Neighbors (KNN)

modeling

Advantages and disadvantages

Euclidean distance

Manhattan distance

K value selection


Probabilistic Model: Naive Bayes

Naive Bayes

Bayes' theorem

Laplace estimator

Classification with Naive Bayes

Advantages and disadvantages


Exercise 19: Credit Scoring Decision Tree in SAS and R

Exercise 20: Credit Scoring KNN in R and SAS

Exercise 21: Credit Scoring Naive Bayes in R
Module 9: Support Vector Machine (SVM)

SVM with dummy variables

SVM

Optimal hyperplane

Support vectors

Adding costs

Advantages and disadvantages

SVM visualization

Tuning SVM

Kernel trick

Exercise 22: Credit Scoring Support Vector Machine in R (data set 1)

Exercise 23: Credit Scoring Support Vector Machine in Python (data set 2)
Module 10: Ensemble Learning

Ensemble models

Bagging

Bagging trees

Random Forest

Boosting

AdaBoost

Gradient Boosting Trees

Advantages and disadvantages

Exercise 24: Credit Scoring Boosting in R

Exercise 25: Credit Scoring Bagging in R

Exercise 26: Credit Scoring Random Forest in R and Python (data sets 1 and 2)

Exercise 27: Credit Scoring Gradient Boosting Trees
MODEL VALIDATION
Module 11: Validation of traditional and Machine Learning models

Model validation

Validation of machine learning models

Regulatory validation of machine learning models in Europe

Out-of-sample and out-of-time validation

Checking p-values in regressions

R-squared, MSE, MAD

Residual diagnostics

Goodness-of-fit test

Multicollinearity

Binary case confusion matrix

Multinomial case confusion matrix

Main discriminant power tests

Confidence intervals

Jackknifing with discriminant power test

Bootstrapping with discriminant power test

Kappa statistic

K-Fold Cross Validation

Exercise 28: Logistic Regression Goodness-of-Fit Test

Exercise 29: Cross validation in SAS

Exercise 30: Gini Estimation, Information Value, Brier Score, Lift Curve, CAP, ROC, Divergence in SAS and Excel

Exercise 31: Bootstrapping of parameters in SAS

Exercise 32: Jackknifing in SAS

Exercise 33: Gini/ROC Bootstrapping in SAS

Exercise 34: Kappa estimation

Exercise 35: K-Fold Cross Validation in R

Exercise 36: Out-of-time traffic light validation (6-year horizon) of logistic regression and machine learning models
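
To illustrate two of the discriminant power tests covered in Module 11, here is a minimal Python sketch that computes the Gini coefficient (as 2·AUC − 1) and the KS statistic from a list of scores and default flags. It assumes that a higher score means lower risk; the function name `gini_and_ks` and the sample data are hypothetical, and the pairwise loop is written for clarity rather than speed.

```python
def gini_and_ks(scores, is_bad):
    """Return (Gini, KS) for a score where higher values mean lower risk."""
    goods = [s for s, b in zip(scores, is_bad) if not b]
    bads = [s for s, b in zip(scores, is_bad) if b]

    # AUC: probability that a random good outscores a random bad (ties count half).
    wins = 0.0
    for g in goods:
        for b in bads:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    auc = wins / (len(goods) * len(bads))
    gini = 2.0 * auc - 1.0

    # KS: largest gap between the cumulative score distributions of bads and goods.
    ks = 0.0
    for threshold in sorted(set(scores)):
        cdf_bad = sum(1 for b in bads if b <= threshold) / len(bads)
        cdf_good = sum(1 for g in goods if g <= threshold) / len(goods)
        ks = max(ks, abs(cdf_bad - cdf_good))
    return gini, ks

# Hypothetical sample: eight accounts, three of which defaulted.
scores = [300, 420, 510, 580, 640, 700, 720, 790]
is_bad = [1, 1, 0, 1, 0, 0, 0, 0]
gini, ks = gini_and_ks(scores, is_bad)
```

Confidence intervals for either statistic can then be approximated by recomputing it on bootstrap or jackknife resamples of the same data.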
Module 12: Stability Testing

Model stability index

Factor stability index

Chi-square test

KS test

Exercise 37: Stability tests of models and factors
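
The stability indices in Module 12 can be sketched as follows, assuming they refer to the widely used population stability index (PSI), computed here from binned counts of a development sample and a recent sample. The function name, the bin counts, and the use of identical bins for both samples are illustrative assumptions; empty bins are not handled.

```python
import math

def population_stability_index(expected_counts, actual_counts):
    """PSI between two binned distributions defined over the same bins.

    Assumes every bin has a non-zero count in both samples.
    """
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        expected_pct = expected / expected_total
        actual_pct = actual / actual_total
        psi += (actual_pct - expected_pct) * math.log(actual_pct / expected_pct)
    return psi

# Hypothetical score-band counts at development time and in a recent period.
development = [100, 200, 400, 200, 100]
recent = [80, 180, 390, 230, 120]
psi = population_stability_index(development, recent)
```

The same function applies to a single factor by binning that factor instead of the score. A frequently quoted rule of thumb treats values below 0.1 as stable and values above 0.25 as a significant shift, although institutions set their own thresholds.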
DEEP LEARNING
Module 14: Introduction to Deep Learning

Definition and concept of deep learning

Why use deep learning now?

Artificial neural networks

Neural network architectures

Activation function

Sigmoid

Rectified linear unit (ReLU)

Hyperbolic tangent (tanh)

Softmax


Feedforward network

Multilayer Perceptron

Using TensorFlow

Using TensorBoard

Deep learning in R

Deep learning in Python

Convolutional Neural Networks

Use of deep learning in image classification

Cost function

Gradient descent optimization

Using deep learning for credit scoring

How many hidden layers?

How many neurons, 100, 1000?

How many epochs, and what batch size?

What is the best activation function?


Deep Learning Software: Caffe, H2O, Keras, Microsoft, MATLAB, etc.

Deployment software: NVIDIA and CUDA

Hardware, CPU, GPU and cloud environments

Advantages and disadvantages of deep learning
Module 15: Deep Learning Feed Forward Neural Networks

Single Layer Perceptron

Multiple Layer Perceptron

Neural network architectures

Activation function

Sigmoid

Rectified linear unit (ReLU)

ELU

SELU

Hyperbolic tangent (tanh)

Softmax

Other


Backpropagation

Directional derivatives

Gradients

Jacobians

Chain rule

Optimization and local and global minima


Exercise 38: Credit Scoring using Deep Learning Feed Forward
Module 16: Deep Learning Convolutional Neural Networks (CNN)

CNN for images

Design and architectures

Convolution operation

Gradient descent

Filters

Stride

Padding

Subsampling

Pooling

Fully connected layers

Credit Scoring using CNN

Recent CNN studies applied to credit risk and scoring

Exercise 39: Credit scoring using deep learning CNN
Module 17: Deep Learning Recurrent Neural Networks (RNN)

Natural Language Processing

Natural Language Processing (NLP) text classification

Long Short-Term Memory (LSTM)

Hopfield networks

Bidirectional associative memory

Gradient descent

Global optimization methods

RNN and LSTM for credit scoring

Unidirectional and bidirectional models

Deep Bidirectional Transformers for Language Understanding (BERT)

Exercise 40: Credit Scoring using Deep Learning LSTM
Module 18: Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs)

Fundamental components of the GANs

GAN architectures

Bidirectional GAN

Training generative models

Synthetic Data

Credit Scoring using GANs

Exercise 41: Credit Scoring using GANs
Module 19: Calibrating Machine Learning and Deep Learning

Hyperparameter tuning

Grid search

Random search

Bayesian Optimization

Train-test split ratio

Learning rate in optimization algorithms (e.g. gradient descent)

Selection of optimization algorithm (e.g., gradient descent, stochastic gradient descent, or Adam optimizer)

Activation function selection in a neural network (NN) layer (e.g., sigmoid, ReLU, tanh)

Selection of loss, cost, and custom functions

Number of hidden layers in an NN

Number of activation units in each layer

Dropout rate in an NN (dropout probability)

Number of iterations (epochs) in training an NN

Number of clusters in a clustering task

Kernel or filter size in convolutional layers

Pooling size

Batch size

Exercise 42: Optimized Credit Scoring with XGBoost, Random Forest and SVM

Exercise 43: Optimized Credit Scoring Deep Learning
Module 20: Traditional Scorecard Construction

Score assignment

Scorecard Classification

Scorecard WOE

Binary Scorecard

Continuous Scorecard


Scorecard Rescaling

Factor and Offset Analysis

Scorecard WOE

Binary Scorecard


Reject Inference Techniques

Cutoff

Parceling

Fuzzy Augmentation

Machine Learning


Advanced Cutoff Techniques

Cutoff optimization using ROC curves


Exercise 44: Building Scorecard in Excel, R and Python

Exercise 45: Optimum cutoff point estimation in Excel and model risk by cutoff point selection

Exercise 46: Confusion matrix to verify Type I and Type II errors in Excel with and without variables
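
The factor and offset analysis listed under Scorecard Rescaling can be sketched with the conventional "points to double the odds" (PDO) scaling, in which score = offset + factor · ln(odds). This is a minimal Python illustration under that assumption; the function names and the anchor values (600 points at good:bad odds of 50:1, with 20 points doubling the odds) are hypothetical choices, not values taken from the course.

```python
import math

def scaling_parameters(base_score, base_odds, pdo):
    """Return (factor, offset) such that base_odds maps to base_score
    and doubling the odds adds pdo points."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

def score_from_odds(odds, factor, offset):
    """Map good:bad odds to a scaled score."""
    return offset + factor * math.log(odds)

# Hypothetical anchor: 600 points at odds of 50:1, 20 points to double the odds.
factor, offset = scaling_parameters(base_score=600, base_odds=50, pdo=20)
score_at_base = score_from_odds(50, factor, offset)
score_at_double = score_from_odds(100, factor, offset)
```

For a WOE scorecard, the same factor and offset are typically distributed across the attributes so that the points of all characteristics sum to the scaled score.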
QUANTUM MACHINE LEARNING
Module 21: Quantum Credit Scoring

What is quantum machine learning?

Qubit and Quantum States

Quantum machine learning algorithms

Quantum circuits

Quantum k-means

Support Vector Machine

Quantum Support Vector Machine

Variational quantum classifier

Training quantum machine learning models

Quantum Neural Networks

Quantum GAN

Quantum Boltzmann machines

Quantum machine learning in Credit Risk

Quantum machine learning in credit scoring

Quantum software

Exercise 47: Quantum k-means

Exercise 48: Quantum Support Vector Machine to develop credit scoring model

Exercise 49: Quantum feed forward Neural Networks to develop a credit scoring model

Exercise 50: Quantum Convolutional Neural Networks to develop a credit scoring model
Module 22: Tensor Networks for Quantum Machine Learning

What are tensor networks?

Quantum Entanglement

Tensor networks in machine learning

Tensor networks in unsupervised models

Tensor networks in SVM

Tensor networks in NN

Tensorizing neural networks

Application of tensor networks in credit scoring models

Exercise 51: Construction of credit scoring using tensor networks
PROBABILISTIC MACHINE LEARNING
Module 23: Probabilistic Machine Learning

Introduction to probabilistic machine learning

Gaussian models

Bayesian Statistics

Bayesian logistic regression

Kernel family

Gaussian processes

Gaussian processes for regression


Hidden Markov Model

Markov chain Monte Carlo (MCMC)

Metropolis-Hastings algorithm


Probabilistic machine learning models

Bayesian Boosting

Bayesian Neural Networks

Exercise 52: Gaussian process for regression

Exercise 53: Credit scoring model using Bayesian Neural Networks
MODEL RISK
Module 24: Model Risk in Credit Scoring

Model Risk

Model risk in deep learning

Model risk in credit scoring

Black boxes

Cutoff decision

Lack of data

Model risk from failing to update or recalibrate

Ethical concepts of credit scoring

Exercise 54: Model risk in credit scoring due to not recalibrating on time
CREDIT SCORING MODELS
Module 25: Credit Scoring Models by Product

Origination Credit Scoring

Credit Card Score

Mortgage Score

Consumer Loan Score

Car Score


Behavior Score (BS)

Time horizon

Panel data information

Panel data regression

Cox regression

Behavior Score with macroeconomic variables

transition matrices

Behavior Score with transition matrices

Transaction Score

Machine Learning Models

Behavior score on credit cards


Exercise 55: Behavior Score Logistic Regression in Python (data set 2)

Exercise 56: Behavior Score Support Vector Machines in Python

Exercise 57: Behavior Score Random Forest in Python

Exercise 58: Behavior Score Gradient Boosting Trees in Python

Exercise 59: Behavior Score Deep Learning LSTM in Python
Module 26: Typology of Scoring models

Response Score

Income score

Churn Score

Origination Fraud Score

Behavior Fraud Score

Collection Score

Recovery Score

Big Data Scoring

Exercise 60: Fraud Score with neural networks

Exercise 61: Income Score

Exercise 62: Collection Score

Exercise 63: Recovery Score

Exercise 64: Churn Score
CALIBRATION OF PD MODELS
Module 27: Calibration of the IRB Probability of Default (PD)

PD estimation

Econometric models

Machine Learning Models

Data requirements

Risk drivers and credit scoring criteria

Rating philosophy

Pool Treatment


PD Calibration

Default Definition

Long run average for PD

Technical defaults and technical default filters

Data requirements

One-Year Default Rate Calculation

Long-Term Default Rate Calculation


PD Model Risk

Margin of Conservatism


PD Calibration Techniques

Anchor Point Estimate

Mapping from Score to PD

PD Adjustment to the Economic Cycle

Rating Philosophy


Through-the-Cycle PD (PD TTC) models

Point-in-Time PD (PD PIT) models

PD Calibration of Models Using Machine and Deep Learning

Exercise 65: PD Calibration in Machine Learning Models
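
One possible reading of the central tendency adjustment among the PD calibration techniques is sketched below: every model PD is shifted by the same constant on the log-odds scale until the portfolio average equals a target long-run default rate. This is an illustrative assumption rather than the course's own method; the function name, the bisection bounds, and the sample PDs are hypothetical, and all PDs must lie strictly between 0 and 1.

```python
import math

def _logit(p):
    return math.log(p / (1.0 - p))

def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def calibrate_to_central_tendency(pds, target_rate, tol=1e-10):
    """Shift all PDs by one log-odds constant so their mean equals target_rate."""
    logits = [_logit(p) for p in pds]
    low, high = -30.0, 30.0  # assumed search bounds for the shift
    # The mean calibrated PD increases with the shift, so bisection converges.
    while high - low > tol:
        mid = (low + high) / 2.0
        mean_pd = sum(_sigmoid(z + mid) for z in logits) / len(logits)
        if mean_pd < target_rate:
            low = mid
        else:
            high = mid
    shift = (low + high) / 2.0
    return [_sigmoid(z + shift) for z in logits]

# Hypothetical raw model PDs and a long-run average default rate of 4%.
raw_pds = [0.01, 0.02, 0.05, 0.10, 0.20]
calibrated = calibrate_to_central_tendency(raw_pds, target_rate=0.04)
```

Because the shift is common to all accounts, the rank ordering of the original model is preserved while the overall level is anchored to the long-run average.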
Module 28: Machine Learning models to estimate Lifetime PD under IFRS 9

Credit scoring models to estimate Lifetime PD

Lifetime PD under IFRS 9

Impact of COVID-19 on models

Climate Risk Impact

Inflation impact

Impact of rising prices

Regression Models

Logistic regression

Multinomial Logistic Regression

Ordinal Probit Regression


VAR and VEC models

Machine Learning Model

SVM: Kernel Function Definition

Neural Network: definition of hyperparameters and activation function

deep learning

LSTM


PD Calibration of Models Using Machine and Deep Learning

Exercise 66: Lifetime PD using logistic regression

Exercise 67: Lifetime PD using multinomial regression in R

Exercise 68: Lifetime PD using SVM in Python

Exercise 69: Lifetime PD using Deep Learning in Python

Exercise 70: Lifetime PD using Deep Learning LSTM in Python
VALIDATION OF PD MODELS
Module 29: Validation of PD models

Definition of PD Backtesting

PD Calibration Validation

Normal test

Binomial Test

Traffic Light Approach


Traffic Light Analysis and PD Dashboard

PS Stability Test

Forecast PD vs. actual PD over time

When should we recalibrate or reestimate a credit scoring model?

Redevelopment

Reestimation

Model Risk in PD

Machine Learning to validate PD models

Artificial Intelligence to recalibrate and rebuild models autonomously

Exercise 71: Backtesting PD in Excel

Exercise 72: Forecast PD vs. actual PD in Excel
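
The binomial test and traffic light approach listed under PD Calibration Validation can be sketched in Python as follows. The one-sided p-value is the probability of observing at least the realized number of defaults if the estimated PD were correct, assuming independent defaults. The function names and the 5% and 1% zone thresholds are illustrative assumptions, not regulatory values.

```python
import math

def binomial_test_pvalue(n_obligors, n_defaults, pd_estimate):
    """P(X >= n_defaults) for X ~ Binomial(n_obligors, pd_estimate).

    Requires 0 < pd_estimate < 1. Log-gamma terms avoid overflow for large grades.
    """
    def log_pmf(k):
        return (math.lgamma(n_obligors + 1) - math.lgamma(k + 1)
                - math.lgamma(n_obligors - k + 1)
                + k * math.log(pd_estimate)
                + (n_obligors - k) * math.log(1.0 - pd_estimate))

    prob_below = sum(math.exp(log_pmf(k)) for k in range(n_defaults))
    return max(0.0, 1.0 - prob_below)

def traffic_light(p_value, yellow=0.05, red=0.01):
    """Map a p-value to a zone; the thresholds are assumed, not prescribed."""
    if p_value < red:
        return "red"
    if p_value < yellow:
        return "yellow"
    return "green"

# Hypothetical rating grade: 1,000 obligors, estimated PD of 2%, 30 observed defaults.
p_value = binomial_test_pvalue(n_obligors=1000, n_defaults=30, pd_estimate=0.02)
zone = traffic_light(p_value)
```

A small p-value suggests the PD is underestimated for that grade. In practice the independence assumption is often relaxed, since correlated defaults make this test overly strict.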
AUTOMATION OF CREDIT SCORING AND PD WITH AI
Module 30: Automation of Credit Scoring and PD Modeling

What is modeling automation?

What is automated?

Automation of machine learning processes

Optimizers and Evaluators

Modeling Automation Workflow Components

Summary

Data processing

Feature engineering

Model generation

Assessment


Hyperparameter optimization

Reconstruction or recalibration of credit scoring

Credit Scoring Modeling

Main milestones

Evaluation and optimization

Possible Issues


PD calibration modeling

Evaluation and optimization

Backtesting

Discriminating Power

Stability Tests


Global evaluation of modeling automation

Implementation of modeling automation in banking

Technological requirements

Available tools

Benefits and possible ROI estimation

Main Issues

Model Risk

Genetic algorithms

Exercise 73: Automation of the modeling, hyperparameter optimization, and validation of credit scoring

Exercise 74: Automation of PD modeling and validation