Credit Scoring, Artificial Intelligence and Quantum Machine Learning
COURSE OBJECTIVE
Intensive course on developing credit scoring tools, calibrating the probability of default (PD) and validating models. Traditional, probabilistic and quantum machine learning methodologies are explained. The course also explains how to automate the construction and calibration of PD models using artificial intelligence.
The participant will learn to develop traditional and advanced credit scoring models for the credit admission and monitoring stages. In other words, the course explains the construction of credit and behavior scoring models using large volumes of data.
Regarding data analytics, a module on advanced data processing covers, among other topics, sampling, exploratory analysis, segmentation and outlier detection.
The main machine learning techniques (supervised, unsupervised and reinforcement learning) are presented as applied to the creation of credit scoring tools.
Traditional methodologies such as logistic regression are presented alongside innovative machine learning methodologies such as decision trees, naive Bayes, KNN, LASSO logistic regression, random forest, neural networks, Bayesian networks, support vector machines, gradient boosting trees, etc.
The course explains the use of deep learning neural networks to develop powerful credit scoring models that banks can implement as challenger models or as working tools in the admission and monitoring process. Feed-forward, convolutional and recurrent neural networks and generative adversarial networks are presented. A proprietary methodology by Fermac Risk is explained for controlling deep learning models and making them interpretable, which helps avoid unacceptable black boxes.
Hyperparameters are parameters whose values control the learning process and determine the values of the model parameters that a learning algorithm ends up learning. The prefix hyper suggests that they are 'higher level' parameters that control the learning process and the model parameters that result from it.
Techniques for tuning hyperparameters are shown, such as grid search, random search and Bayesian optimization.
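As a rough illustration of the difference between grid search and random search, the sketch below tunes two hyperparameters (learning rate and number of epochs) of a tiny one-variable logistic regression. The toy data, the ranges and all function names are assumptions made for this example and are not course material; Bayesian optimization is not shown.

```python
import itertools
import math
import random

# Toy data (illustrative assumption): one risk driver x, label 1 = default.
TRAIN = [(0.2, 0), (0.5, 0), (0.9, 0), (1.4, 1), (1.8, 0), (2.3, 1), (2.7, 1), (3.1, 1)]
VALID = [(0.4, 0), (1.0, 0), (2.0, 1), (2.9, 1)]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit(lr, epochs):
    """Learn the model parameters (w, b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(TRAIN)
    for _ in range(epochs):
        grad_w = sum((sigmoid(w * x + b) - t) * x for x, t in TRAIN) / n
        grad_b = sum(sigmoid(w * x + b) - t for x, t in TRAIN) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def validation_loss(lr, epochs):
    """Log loss on the validation set for one hyperparameter setting."""
    w, b = fit(lr, epochs)
    eps = 1e-12
    total = 0.0
    for x, t in VALID:
        p = min(max(sigmoid(w * x + b), eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(VALID)

# Grid search: evaluate every combination on a fixed grid.
grid = list(itertools.product([0.01, 0.1, 0.5], [50, 200, 500]))
grid_results = {(lr, ep): validation_loss(lr, ep) for lr, ep in grid}
grid_best = min(grid_results, key=grid_results.get)

# Random search: sample the same ranges at random (log-uniform learning rate).
rng = random.Random(0)
candidates = [(10 ** rng.uniform(-2, math.log10(0.5)), rng.randint(50, 500))
              for _ in range(9)]
random_results = {c: validation_loss(*c) for c in candidates}
random_best = min(random_results, key=random_results.get)

print("grid best:", grid_best, round(grid_results[grid_best], 4))
print("random best:", random_best, round(random_results[random_best], 4))
```

Both searches spend the same budget of nine model fits. The grid only ever tries three distinct learning rates, while random search tries nine, which is the usual argument for random search when only a few hyperparameters matter.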
More than 20 credit scoring models are delivered, built with different methodologies in various languages and tools such as R, Python, JupyterLab, TensorFlow and SAS. Credit scoring models for admission, follow-up, recovery, income and dropout are delivered.
Advanced methodologies for calibrating the IRB PD risk parameter are taught. Topics include calibration by adjustment to the central tendency, the PD PIT and PD TTC rating philosophies, and the calibration of machine learning models so that they produce probabilities of default. In addition, a module has been included on developing and calibrating the IFRS 9 Lifetime PD using deep learning models.
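One common way to adjust model PDs to a central tendency is a prior-shift correction based on Bayes' rule, sketched below. This is only one of several possible techniques and is not necessarily the one taught in the course; the function name and the example figures are assumptions for illustration.

```python
def calibrate_to_central_tendency(pd_raw, sample_default_rate, central_tendency):
    """Shift a model PD from the development-sample default rate
    to a target central tendency (prior-shift correction)."""
    bad_ratio = central_tendency / sample_default_rate
    good_ratio = (1.0 - central_tendency) / (1.0 - sample_default_rate)
    numerator = pd_raw * bad_ratio
    return numerator / (numerator + (1.0 - pd_raw) * good_ratio)

# Hypothetical example: the sample was built with a 10% default rate,
# while the long-run average default rate of the portfolio is 3%.
raw_pds = [0.01, 0.05, 0.20]
calibrated = [calibrate_to_central_tendency(p, 0.10, 0.03) for p in raw_pds]
for raw, cal in zip(raw_pds, calibrated):
    print(f"raw PD {raw:.2%} -> calibrated PD {cal:.2%}")
```

The correction preserves the rank ordering of the scores and leaves the PDs unchanged when the sample default rate already equals the central tendency.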
Automated machine learning, also called automated ML or AutoML, is the process of automating the iterative tasks of machine learning model development. It allows risk analysts to build machine learning models with high scalability, efficiency and productivity while maintaining model quality, and it can help not only with building models but also with validating credit scoring models.
Probabilistic machine learning techniques, such as Bayesian neural networks, are shown for building credit scoring models.
Automated machine learning methodologies using genetic algorithms, among other advanced techniques, are explained.
Best practices for validating the credit scoring models of financial institutions that use artificial intelligence are indicated, along with the regulatory requirements in Europe for using this type of model.
Quantum machine learning is the integration of quantum algorithms within machine learning programs. Machine learning algorithms are used to process vast amounts of data; quantum machine learning uses qubits and quantum operations, or specialized quantum systems, to improve the speed of computation and the data storage performed by the algorithms in a program. In addition, some mathematical and numerical techniques from quantum physics are applicable to classical deep learning. A quantum neural network has computational capabilities that can reduce the number of steps, the qubits used and the computation time.
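To make the notions of qubit and quantum operation concrete, here is a minimal classical simulation of a single qubit as a two-element vector of complex amplitudes, with gates applied by matrix multiplication. It is a sketch for illustration only; the helper names are assumptions and no quantum library is used.

```python
import math

def apply_gate(gate, state):
    """Apply a gate (square matrix) to a state vector by matrix multiplication."""
    return [sum(gate[i][j] * state[j] for j in range(len(state)))
            for i in range(len(gate))]

def probabilities(state):
    """Measurement probabilities are the squared magnitudes of the amplitudes."""
    return [abs(amplitude) ** 2 for amplitude in state]

S = 1.0 / math.sqrt(2.0)
HADAMARD = [[S, S], [S, -S]]   # creates an equal superposition from |0>
PAULI_X = [[0, 1], [1, 0]]     # quantum NOT gate

zero = [1 + 0j, 0j]                      # the basis state |0>
plus = apply_gate(HADAMARD, zero)        # (|0> + |1>) / sqrt(2)
back = apply_gate(HADAMARD, plus)        # Hadamard is its own inverse
one = apply_gate(PAULI_X, zero)          # the basis state |1>

print("P(0), P(1) after Hadamard:", probabilities(plus))
print("P(0), P(1) after two Hadamards:", probabilities(back))
print("P(0), P(1) after X:", probabilities(one))
```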
A further objective of the course is to show the use of quantum computing and tensor networks for computing machine learning algorithms.
We believe that quantum computing will begin to transform the financial services landscape in the coming years. Banks that adopt quantum algorithms will have competitive advantages, including the potential to outpace competitors to become undisputed market leaders.
WHO SHOULD ATTEND?
The course is aimed at professionals from financial institutions interested in developing powerful credit scoring models and calibrating their output, as well as model managers in credit risk and data science departments.
To follow the topics, participants need prior knowledge of statistics and mathematics.

Schedules:

Europe: Mon-Fri, CEST 16-19 h

America: Mon-Fri, CDT 18-21 h

Asia: Mon-Fri, IST 18-21 h
Price: 8.900 €
Level: Advanced
Duration: 40 h
Material:

Presentations in PDF

Exercises in Excel, R, SAS, Python, JupyterLab and TensorFlow
AGENDA
CREDIT SCORING
Module 0: Quantum Computing and Algorithms (Optional)

Future of quantum computing in banking

Is it necessary to know quantum mechanics?

QIS Hardware and Apps

Quantum operations

Qubit representation

Measurement

Superposition

Matrix multiplication

Qubit operations

Multiple Quantum Circuits

Entanglement

Deutsch Algorithm

Quantum Fourier transform and search algorithms

Hybrid quantum-classical algorithms

Quantum annealing, simulation and optimization of algorithms

Quantum machine learning algorithms

Exercise 1: Quantum operations
Module 1: Artificial Intelligence for Credit Scoring

Big Data Definition

Big Data in financial institutions and fintech

Big data in Bigtech

Data typology

Structured data

Semi-structured data

Unstructured data


Big data: Volume, Velocity, Variety, Veracity and Value

Big Data Size

Big data sources

Transactional data

Social media data

Credit bureau data

Origin of data sources

Website data

Text Data

Sensor data

RFID and NFC data

Data from telecom operators

Smart grid data


Banking digitization

Financial inclusion

Regulation in Europe, USA and Latin America

Artificial intelligence in banking

Artificial intelligence in the credit cycle
Module 2: AI in Credit Scoring

AI in Credit Scoring for Banking and Fintech

Offline and online credit scoring

Design and Construction of Credit Scoring Models

Advantages and disadvantages

Models to face new financial crises

Machine Learning to develop and validate credit scoring

Importance of the Bureau Score

Credit Scorecard Management

Estimation of the Probability of Default (PD)
Module 3: Machine Learning

Definition of Machine Learning

Machine Learning Methodology

Data Storage

Abstraction

Generalization

Assessment


Supervised Learning

Unsupervised Learning

Reinforcement Learning

Deep Learning

Typology of Machine Learning algorithms

Steps to Implement an Algorithm

Data collection

Exploratory Analysis

Model Training

Model Evaluation

Model improvements

Machine Learning in credit scoring models

Quantum Machine Learning

MODELING
Module 4: Exploratory Analysis

Data typology

Transactional data

Unstructured data embedded in text documents

Social Media Data

Data sources

Data review

Target definition

Time horizon of the target variable

Sampling

Random Sampling

Stratified Sampling

Rebalanced Sampling


Exploratory Analysis:

Histograms

Q-Q plot

Moment analysis

Boxplot


Treatment of Missing values

Multivariate Imputation Model

Advanced Outlier detection and treatment techniques

Univariate techniques: winsorizing and trimming

Multivariate Technique: Mahalanobis Distance

Module 5: Univariate Analysis

Data Standardization

Variable categorization

Equal Interval Binning

Equal Frequency Binning

Chi-Square Test


Binary coding

WOE Coding

WOE Definition

Univariate Analysis with Target variable

Variable Selection

Treatment of Continuous Variables

Treatment of Categorical Variables

Gini

Information Value

Optimization of continuous variables

Optimization of categorical variables


Exercise 1: Exploratory Analysis in R

Exercise 2: Advanced detection and treatment of outliers

Exercise 3: Stratified and Random Sampling in R

Exercise 4: Multivariate imputation model

Exercise 5: Univariate analysis by percentiles in R

Exercise 6: Optimal univariate analysis of continuous variables in Excel

Exercise 7: Estimation of the KS, Gini and IV of each variable in Excel

Exercise 8: Word Cloud analysis of variables in R
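
As a small worked example of the WOE coding and Information Value topics in this module, the sketch below computes both from counts of good and bad customers per bin. The bin counts are invented for illustration, the sign convention ln(% goods / % bads) is one of two in common use, and bins with zero goods or zero bads are assumed not to occur.

```python
import math

def woe_iv(bins):
    """Return (WOE per bin, Information Value).

    bins: list of (goods, bads) counts, one pair per bin.
    WOE_i = ln(share of goods in bin i / share of bads in bin i)
    IV    = sum over bins of (share of goods - share of bads) * WOE_i
    """
    total_good = sum(goods for goods, _ in bins)
    total_bad = sum(bads for _, bads in bins)
    woes, iv = [], 0.0
    for goods, bads in bins:
        dist_good = goods / total_good
        dist_bad = bads / total_bad
        woe = math.log(dist_good / dist_bad)
        woes.append(woe)
        iv += (dist_good - dist_bad) * woe
    return woes, iv

# Hypothetical variable split into three bins: (goods, bads).
example_bins = [(100, 30), (200, 20), (300, 10)]
woes, iv = woe_iv(example_bins)
print("WOE per bin:", [round(w, 4) for w in woes])
print("Information Value:", round(iv, 4))
```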
MACHINE LEARNING
Unsupervised Learning
Module 6: Unsupervised models

Hierarchical Clusters

K-Means

Standard algorithm

Euclidean distance

Principal Component Analysis (PCA)

Advanced PCA Visualization

Eigenvectors and Eigenvalues

Exercise 14: Principal components in R and SAS

Exercise 15: Segmentation of the data with K-Means in R
Supervised Learning
Module 7: Logistic Regression and LASSO Regression

Econometric Models

Logit regression

Probit regression

Piecewise Regression

Survival models


Machine Learning Models

Lasso Regression

Ridge Regression


Model Risk in Logistic Regression

Exercise 16: Credit Scoring Logistic Regression in SAS and R

Exercise 17: Credit Scoring Lasso Logistic Regression in R

Exercise 18: Model Risk Using Confidence Intervals of Logistic Regression Coefficients
Module 8: Trees, KNN and Naive Bayes

Decision Trees

Modeling

Advantages and disadvantages

Recursion and Partitioning Processes

Recursive partitioning tree

Pruning Decision tree

Conditional inference tree

Tree visualization

Measurement of decision tree prediction

CHAID model

C5.0 model


K-Nearest Neighbors (KNN)

Modeling

Advantages and disadvantages

Euclidean distance

Manhattan distance

K value selection


Probabilistic Model: Naive Bayes

Naive Bayes

Bayes' theorem

Laplace estimator

Classification with Naive Bayes

Advantages and disadvantages


Exercise 19: Credit Scoring Decision Tree in SAS and R

Exercise 20: Credit Scoring KNN in R and SAS

Exercise 21: Credit Scoring Naive Bayes in R
Module 9: Support Vector Machine (SVM)

SVM with dummy variables

SVM

Optimal hyperplane

Support Vectors

Adding costs

Advantages and disadvantages

SVM visualization

Tuning SVM

Kernel trick

Exercise 22: Credit Scoring Support Vector Machine in R data 1

Exercise 23: Credit Scoring Support Vector Machine in Python data 2
Module 10: Ensemble Learning

Ensemble models

Bagging

Bagging trees

Random Forest

Boosting

AdaBoost

Gradient Boosting Trees

Advantages and disadvantages

Exercise 24: Credit Scoring Boosting in R

Exercise 25: Credit Scoring Bagging in R

Exercise 26: Credit Scoring Random Forest, R and Python, data 1 and 2

Exercise 27: Credit Scoring Gradient Boosting Trees
MODEL VALIDATION
Module 11: Validation of traditional and Machine Learning models

Model validation

Validation of machine learning models

Regulatory validation of machine learning models in Europe

Out of Sample and Out of time validation

Checking p-values in regressions

R squared, MSE, MAD

Residual diagnostics

Goodness of Fit Test

Multicollinearity

Binary case confusion matrix

Multinomial case confusion matrix

Main discriminant power tests

Confidence intervals

Jackknifing with discriminant power test

Bootstrapping with discriminant power test

Kappa statistic

K-Fold Cross Validation

Exercise 28: Logistic Regression Goodness-of-Fit Test

Exercise 29: Cross validation in SAS

Exercise 30: Gini Estimation, Information Value, Brier Score, Lift Curve, CAP, ROC, Divergence in SAS and Excel

Exercise 31: Bootstrapping of parameters in SAS

Exercise 32: Jackknifing in SAS

Exercise 33: Gini/ROC Bootstrapping in SAS

Exercise 34: Kappa estimation

Exercise 35: K-Fold Cross Validation in R

Exercise 36: Traffic light out-of-time validation (horizon 6 years) of Logistic Regression and Machine Learning models
Module 12: Stability Testing

Model stability index

Factor stability index

Chi-square test

KS test

Exercise 37: Stability tests of models and factors
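
As an illustration of a model stability index, the sketch below computes the Population Stability Index (PSI) between the score distribution of the development sample and that of a recent sample. The distributions are invented for the example, every bin is assumed to have a non-zero share in both samples, and the 0.10 and 0.25 thresholds mentioned in the comment are a widely used rule of thumb rather than a regulatory requirement.

```python
import math

def population_stability_index(expected, actual):
    """PSI = sum over bins of (actual_i - expected_i) * ln(actual_i / expected_i).

    expected, actual: share of the population in each score bin (each sums to 1).
    """
    if len(expected) != len(actual):
        raise ValueError("both distributions need the same number of bins")
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))

# Hypothetical share of customers per score band.
development = [0.25, 0.25, 0.25, 0.25]
recent = [0.20, 0.25, 0.25, 0.30]

psi = population_stability_index(development, recent)
# Rule of thumb: below 0.10 stable, 0.10 to 0.25 monitor, above 0.25 investigate.
print("PSI:", round(psi, 4))
```

The same function can be applied to the bins of a single explanatory variable to obtain a factor stability index.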
DEEP LEARNING
Module 14: Introduction to Deep Learning

Definition and concept of deep learning

Why now the use of deep learning?

Artificial neural networks

Neural network architectures

Activation functions

Sigmoid

Rectified linear unit

Hyperbolic tangent

Softmax


Feedforward network

Multilayer Perceptron

Using Tensorflow

Using Tensorboard

R deep learning

Python deep learning

Convolutional Neural Networks

Use of deep learning in image classification

Cost function

Gradient descent optimization

Using deep learning for credit scoring

How many hidden layers?

How many neurons, 100, 1000?

How many epochs and what batch size?

What is the best activation function?


Deep Learning Software: Caffe, H2O, Keras, Microsoft, Matlab, etc.

Deployment software: NVIDIA and CUDA

Hardware, CPU, GPU and cloud environments

Advantages and disadvantages of deep learning
Module 15: Deep Learning Feed Forward Neural Networks

Single Layer Perceptron

Multiple Layer Perceptron

Neural network architectures

Activation functions

Sigmoid

Rectified linear unit (ReLU)

ELU

SELU

Hyperbolic tangent

Softmax

Others


Backpropagation

Directional derivatives

Gradients

Jacobians

Chain rule

Optimization and local and global minima


Exercise 38: Credit Scoring using Deep Learning Feed Forward
Module 16: Deep Learning Convolutional Neural Networks CNN

CNN for images

Design and architectures

Convolution operation

Gradient descent

Filters

Stride

Padding

Subsampling

Pooling

Fully connected layers

Credit Scoring using CNN

Recent CNN studies applied to credit risk and scoring

Exercise 39: Credit scoring using deep learning CNN
Module 17: Deep Learning Recurrent Neural Networks RNN

Natural Language Processing

Natural Language Processing (NLP) text classification

Long Short-Term Memory (LSTM)

Hopfield networks

Bidirectional associative memory

Gradient descent

Global optimization methods

RNN and LSTM for credit scoring

Unidirectional and bidirectional models

Deep Bidirectional Transformers for Language Understanding

Exercise 40: Credit Scoring using Deep Learning LSTM
Module 18: Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs)

Fundamental components of the GANs

GAN architectures

Bidirectional GAN

Training generative models

Credit Scoring using GANs

Exercise 41: Credit Scoring using GANs
Module 19: Calibrating Machine Learning and Deep Learning

Hyperparameter tuning

Grid search

Random search

Bayesian Optimization

Train-test split ratio

Learning rate in optimization algorithms (e.g. gradient descent)

Selection of optimization algorithm (e.g., gradient descent, stochastic gradient descent, or Adam optimizer)

Activation function selection in a neural network (NN) layer (e.g. Sigmoid, ReLU, Tanh)

Selection of loss, cost and custom functions

Number of hidden layers in an NN

Number of activation units in each layer

Dropout rate in an NN (dropout probability)

Number of iterations (epochs) in training an NN

Number of clusters in a clustering task

Kernel or filter size in convolutional layers

Pooling size

Batch size

Exercise 42: Optimized Credit Scoring with XGBoost, Random Forest and SVM

Exercise 43: Optimized Credit Scoring Deep Learning
Module 20: Traditional Scorecard Construction

Score assignment

Scorecard Classification

Scorecard WOE

Binary Scorecard

Continuous Scorecard


Scorecard Rescaling

Factor and Offset Analysis

Scorecard WOE

Binary Scorecard


Reject Inference Techniques

Cutoff

Parceling

Fuzzy Augmentation

Machine Learning


Advanced Cut Point Techniques

Cutoff optimization using ROC curves


Exercise 44: Building Scorecard in Excel, R and Python

Exercise 45: Optimum cutoff point estimation in Excel and model risk by cutoff point selection

Exercise 46: Confusion matrix to verify Type 1 and Type 2 Error in Excel with and without variables
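
To illustrate scorecard rescaling with a factor and an offset, the sketch below uses the widely used "points to double the odds" (PDO) convention: Factor = PDO / ln(2) and Offset = base score - Factor x ln(base odds). The base score of 600 at good:bad odds of 50:1 and the PDO of 20 are example values chosen for illustration, and the function names are assumptions.

```python
import math

def scaling_constants(base_score, base_odds, pdo):
    """Factor and offset so that base_odds maps to base_score
    and the score rises by pdo points each time the odds double."""
    factor = pdo / math.log(2.0)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

def score_from_pd(pd, factor, offset):
    """Convert a probability of default into scorecard points."""
    odds = (1.0 - pd) / pd  # good:bad odds
    return offset + factor * math.log(odds)

factor, offset = scaling_constants(base_score=600, base_odds=50, pdo=20)
for pd in (0.10, 1.0 / 51.0, 1.0 / 101.0):
    print(f"PD {pd:.4f} -> score {score_from_pd(pd, factor, offset):.1f}")
```

With these constants a customer at odds of 50:1 scores 600 and a customer at odds of 100:1 scores 620.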
QUANTUM MACHINE LEARNING
Module 21: Quantum Credit Scoring

What is quantum machine learning?

Qubit and Quantum States

Quantum machine learning algorithms

Quantum circuits

Quantum K-Means

Support Vector Machine

Quantum Support Vector Machine

Variational quantum classifier

Training quantum machine learning models

Quantum Neural Networks

Quantum GAN

Quantum Boltzmann machines

Quantum machine learning in Credit Risk

Quantum machine learning in credit scoring

Quantum software

Exercise 47: Quantum K-Means

Exercise 48: Quantum Support Vector Machine to develop credit scoring model

Exercise 49: Quantum feed forward Neural Networks to develop a credit scoring model

Exercise 50: Quantum Convolutional Neural Networks to develop a credit scoring model
Module 22: Tensor Networks for Quantum Machine Learning

What are tensor networks?

Quantum Entanglement

Tensor networks in machine learning

Tensor networks in unsupervised models

Tensor networks in SVM

Tensor networks in NN

Tensorization of NN

Application of tensor networks in credit scoring models

Exercise 51: Construction of credit scoring using tensor networks
PROBABILISTIC MACHINE LEARNING
Module 23: Probabilistic Machine Learning

Introduction to probabilistic machine learning

Gaussian models

Bayesian Statistics

Bayesian logistic regression

Kernel family

Gaussian processes

Gaussian processes for regression


Hidden Markov Model

Markov chain Monte Carlo (MCMC)

Metropolis-Hastings algorithm


Probabilistic Machine Learning Models

Bayesian Boosting

Bayesian Neural Networks

Exercise 52: Gaussian process for regression

Exercise 53: Credit scoring model using Bayesian Neural Networks
MODEL RISK
Module 24: Model Risk in Credit Scoring

Model Risk

Model risk in deep learning

Model risk in credit scoring

Black boxes

Cutoff decision

Absence of data

Model risk from not updating or recalibrating

Ethical concepts of credit scoring

Exercise 54: Model risk in credit scoring due to not recalibrating on time
CREDIT SCORING MODELS
Module 25: Credit Scoring Models by Product

Admission Credit Scoring

Credit Card Score

Mortgage Score

Consumer Loan Score

Auto Loan Score


Behavior Score (BS)

Time horizon

Panel data information

Panel data regression

Cox regression

Behavior Score with macroeconomic variables

Transition matrices

Behavior Score with transition matrices

Transaction Score

Machine Learning Models

Behavior Score on Credit Cards


Exercise 55: Behavior Score Logistic Regression in Python data 2

Exercise 56: Behavior Score Support Vector Machines in Python

Exercise 57: Behavior Score Random Forest in Python

Exercise 58: Behavior Score Gradient Boosting Trees in Python

Exercise 59: Behavior Score Deep Learning LSTM in Python
Module 26: Typology of Scores

Response Score

Income score

Dropout Score

Admission Fraud Score

Follow-up Fraud Score

Collection Score

Recovery Score

Big Data Scoring

Exercise 60: Fraud Score with neural networks

Exercise 61: Income Score

Exercise 62: Collection Score

Exercise 63: Recovery Score

Exercise 64: Dropout Score
CALIBRATION OF PD MODELS
Module 27: Calibration of the Probability of Default PD IRB

PD estimation

Econometric models

Machine Learning Models

Data requirement

Risk drivers and credit scoring criteria

Rating philosophy

Pool Treatment


PD Calibration

Default Definition

Long run average for PD

Technical defaults and technical default filters

Data requirement

One Year Default Rate Calculation

Long-Term Default Rate Calculation


PD Model Risk

Conservatism Margin


PD Calibration Techniques

Anchor Point Estimate

Mapping from Score to PD

Adjustment to the PD Economic Cycle

Rating Philosophy


PD Through the Cycle (PD TTC) models

PD Point in Time (PD PIT) models

PD Calibration of Models Using Machine and Deep Learning

Exercise 65: PD Calibration in Machine Learning Models
Module 28: Machine Learning models to estimate Lifetime PD under IFRS 9

Credit scoring models to estimate Lifetime PD

PD Lifetime in IFRS 9

Impact of COVID-19 on models

Climate Risk Impact

Inflation impact

Impact of rising prices

Regression Models

Logistic regression

Multinomial Logistic Regression

Ordinal Probit Regression


VAR and VEC models

Machine Learning Model

SVM: Kernel Function Definition

Neural Network: definition of hyperparameters and activation function

Deep Learning

LSTM


PD Calibration of Models Using Machine and Deep Learning

Exercise 66: PD Lifetime using logistic regression

Exercise 67: PD Lifetime using multinomial regression in R

Exercise 68: PD Lifetime using SVM in Python

Exercise 69: PD Lifetime using Deep Learning in Python

Exercise 70: PD Lifetime using Deep Learning LSTM in Python
VALIDATION OF PD MODELS
Module 29: Validation of PD models

Definition of PD Backtesting

PD Calibration Validation

Normal test

Binomial Test

Traffic Light Approach


Traffic Light Analysis and PD Dashboard

PS Stability Test

Forecast PD vs. actual PD over time

When to recalibrate or reestimate a credit scoring model?

Redevelopment

Re-estimation

Model Risk in PD

Machine Learning to validate PD models

Artificial Intelligence to recalibrate and rebuild models autonomously

Exercise 71: Backtesting PD in Excel

Exercise 72: Forecasting PD and actual PD in Excel
AUTOMATION OF CREDIT SCORING AND PD WITH AI
Module 30: Automation of Credit Scoring and PD Modeling

What is modeling automation?

What is automated?

Automation of machine learning processes

Optimizers and Evaluators

Modeling Automation Workflow Components

Summary

Preprocessing

Feature engineering

Model generation

Assessment


Hyperparameter optimization

Reconstruction or recalibration of credit scoring

Credit Scoring Modeling

Main milestones

Evaluation and optimization

Possible Issues


PD calibration modeling

Evaluation and optimization

Backtesting

Discriminating Power

Stability Tests


Global evaluation of modeling automation

Implementation of modeling automation in banking

Technological requirements

Available tools

Benefits and possible ROI estimation

Main Issues

Model Risk

Genetic algorithms

Exercise 73: Automation of the modeling, hyperparameter optimization and validation of credit scoring

Exercise 74: Automation of PD modeling and validation