Applied Finance in Python

Enhance your Python financial skills and learn how to manipulate data and make better data-driven decisions. You’ll begin this track by discovering how to evaluate portfolios, mitigate risk exposure, and use the Monte Carlo simulation to model probability. Next, you’ll learn how to rebalance a portfolio using neural networks. Through interactive coding exercises, you’ll use powerful libraries, including SciPy, statsmodels, scikit-learn, TensorFlow, Keras, and XGBoost, to examine and manage risk. You’ll then apply what you’ve learned to answer questions commonly faced by financial firms, such as whether or not to approve a loan or a credit card request, using machine learning and financial techniques. Along the way, you’ll also create GARCH models and get hands-on with real datasets that feature Microsoft stocks, historical foreign exchange rates, and cryptocurrency data. Start this track to advance your Python financial skills. https://ebooks-tech.sellfy.store/p/applied-finance-in-python/

Uploaded by

jcmayac
Copyright
© All Rights Reserved

Financial returns

INTRODUCTION TO PORTFOLIO RISK MANAGEMENT IN PYTHON

Dakota Wixom
Quantitative Analyst | QuantCourse.com
Course overview
Learn how to analyze investment return distributions, build
portfolios and reduce risk, and identify key factors which are
driving portfolio returns.

Univariate Investment Risk

Portfolio Investing

Factor Investing

Forecasting and Reducing Risk



Investment risk
What is Risk?

Risk in financial markets is a measure of uncertainty

Dispersion or variance of financial returns

How do you typically measure risk?

Standard deviation or variance of daily returns

Kurtosis of the daily returns distribution

Skewness of the daily returns distribution

Historical drawdown



Financial risk
[Figure: probability distribution of returns (axes: Returns, Probability)]



A tale of two returns
Returns are derived from stock prices

Discrete returns (simple returns) are the most commonly used, and represent periodic (e.g. daily, weekly, monthly) price movements

Log returns are often used in academic research and financial modeling. They assume continuous compounding.



Calculating stock returns
Discrete returns are
calculated as the change in
price as a percentage of the
previous period's price



Calculating log returns
Log returns are calculated as the difference between the log of two prices:

$R_l = \ln\left(\frac{P_{t_2}}{P_{t_1}}\right)$

or equivalently

$R_l = \ln(P_{t_2}) - \ln(P_{t_1})$

Log returns aggregate across time, while discrete returns aggregate across assets
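
The two return definitions can be compared side by side. The following is a minimal sketch, assuming a small hypothetical price series (the values are made up for illustration); it computes discrete and log returns and checks that log returns sum across time to the log of the total price ratio.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only
prices = pd.Series([100.0, 102.0, 99.0, 105.0])

# Discrete (simple) returns: price change as a fraction of the previous price
simple_returns = prices.pct_change().dropna()

# Log returns: difference between the logs of consecutive prices
log_returns = np.log(prices).diff().dropna()

# Log returns add up across time to the log of the total price ratio
total_log_return = log_returns.sum()

# Discrete returns compound (multiply) across time instead
total_simple_return = (1 + simple_returns).prod() - 1

print(total_log_return, np.log(1 + total_simple_return))
```

Both printed values should agree, since the total log return is the log of one plus the total discrete return.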



Calculating stock returns in Python
Step 1:
Load in stock prices data and store it as a pandas DataFrame
organized by date:

import pandas as pd
StockPrices = pd.read_csv('StockData.csv', parse_dates=['Date'])
StockPrices = StockPrices.sort_values(by='Date')
StockPrices.set_index('Date', inplace=True)



Calculating stock Returns in Python
Step 2:
Calculate daily returns of the adjusted close prices and append
the returns as a new column in the DataFrame.

StockPrices["Returns"] = StockPrices["Adj Close"].pct_change()


StockPrices["Returns"].head()



Visualizing return distributions
import matplotlib.pyplot as plt
plt.hist(StockPrices["Returns"].dropna(), bins=75, density=False)
plt.show()



Let's practice!
Mean, variance, and normal distribution
Moments of distributions
Probability distributions have the following moments:

1) Mean ($\mu$)

2) Variance ($\sigma^2$)
3) Skewness

4) Kurtosis



There are many types of distributions. Some are normal and some are non-normal. A random variable with a Gaussian distribution is said to be normally distributed.

Normal distributions have the following properties:

Mean = $\mu$
Variance = $\sigma^2$
Skewness = 0
Kurtosis = 3



The standard normal distribution
The Standard Normal is a special case of the Normal
Distribution when:

σ=1
μ=0



Comparing against a normal distribution
Normal distributions have a skewness of 0 and a kurtosis of 3; a sample drawn from one will show values near these.

Financial returns tend not to be normally distributed

Financial returns can have high kurtosis





Calculating mean returns in Python
To calculate the average daily return, use the np.mean() function:

import numpy as np
np.mean(StockPrices["Returns"])

0.0003

To calculate the average annualized return assuming 252 trading days in a year:

import numpy as np
((1+np.mean(StockPrices["Returns"]))**252)-1

0.0785



Standard deviation and variance
Standard Deviation (Volatility)

Variance = $\sigma^2$

Often represented in mathematical notation as $\sigma$, or referred to as volatility

An investment with higher $\sigma$ is viewed as a higher risk investment

Measures the dispersion of returns



Standard deviation and variance in Python
Assume you have pre-loaded stock returns data in the StockPrices object. To calculate the periodic standard deviation of returns:

import numpy as np
np.std(StockPrices["Returns"])

0.0256

To calculate variance, simply square the standard deviation:

np.std(StockPrices["Returns"])**2

0.000655



Scaling volatility
Volatility scales with the square root of time

You can normally assume 252 trading days in a given year, and 21 trading days in a given month
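
As a rough illustration of the square-root-of-time rule, the sketch below scales a hypothetical daily volatility (the 1% figure is an assumption chosen for the example) to monthly and annual horizons. The rule itself assumes returns are independent from one day to the next, which is only an approximation for real markets.

```python
import numpy as np

# Hypothetical daily volatility, chosen only for illustration
daily_vol = 0.01

# Square-root-of-time rule: multiply by the square root of the number of periods
monthly_vol = daily_vol * np.sqrt(21)   # 21 trading days per month
annual_vol = daily_vol * np.sqrt(252)   # 252 trading days per year

# Variance, by contrast, scales linearly with time
annual_var = daily_vol**2 * 252

print(monthly_vol, annual_vol)
```

Because 252 = 12 x 21, the annual figure is the monthly figure multiplied by the square root of 12.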



Scaling volatility in Python
Assume you have pre-loaded stock returns data in the StockPrices object. To calculate the annualized volatility of returns:

import numpy as np
np.std(StockPrices["Returns"]) * np.sqrt(252)

0.3071



Let's practice!
Skewness and kurtosis
Skewness is the third moment of a distribution.

Negative Skew: The mass of the distribution is concentrated on the right. Usually a right-leaning curve

Positive Skew: The mass of the distribution is concentrated on the left. Usually a left-leaning curve

In finance, you would tend to want positive skewness



Skewness in Python
Assume you have pre-loaded stock returns data in the
StockData object.

To calculate the skewness of returns:

from scipy.stats import skew


skew(StockData["Returns"].dropna())

0.225

Note that the skewness is higher than 0 in this example, suggesting non-normality.



Kurtosis is a measure of the thickness of the tails of a distribution

Most financial returns are leptokurtic

Leptokurtic: When a distribution has positive excess kurtosis (kurtosis greater than 3)

Excess Kurtosis: Subtract 3 from the sample kurtosis to calculate "Excess Kurtosis"
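
The relationship between kurtosis and excess kurtosis can be checked directly with SciPy. This is a minimal sketch on synthetic fat-tailed data (a Student's t sample, used here only as a stand-in for real returns); `scipy.stats.kurtosis` returns excess kurtosis by default and the raw value when `fisher=False`.

```python
import numpy as np
from scipy.stats import kurtosis

# Synthetic fat-tailed sample standing in for real returns (an assumption)
rng = np.random.default_rng(0)
sample = rng.standard_t(df=5, size=10_000)

# Pearson definition: a normal distribution has kurtosis 3
raw_kurt = kurtosis(sample, fisher=False)

# Fisher definition (the default): a normal distribution has excess kurtosis 0
excess_kurt = kurtosis(sample)

print(raw_kurt, excess_kurt)
```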



Excess kurtosis in Python
Assume you have pre-loaded stock returns data in the
StockData object. To calculate the excess kurtosis of returns:

from scipy.stats import kurtosis


kurtosis(StockData["Returns"].dropna())

2.44

Note the excess kurtosis greater than 0 in this example, suggesting non-normality.



Testing for normality in Python
How do you perform a statistical test for normality?

The null hypothesis of the Shapiro-Wilk test is that the data are
normally distributed.

# Run the Shapiro-Wilk normality test in Python
from scipy import stats

p_value = stats.shapiro(StockData["Returns"].dropna())[1]
if p_value <= 0.05:
    print("Null hypothesis of normality is rejected.")
else:
    print("Null hypothesis of normality is not rejected.")

The p-value is the second value returned by stats.shapiro(). If the p-value is less than or equal to 0.05, the null hypothesis is rejected because the data are most likely non-normal.



Let's practice!
Portfolio composition and backtesting
Calculating portfolio returns
Portfolio Return Formula:
$R_p = R_{a_1} w_{a_1} + R_{a_2} w_{a_2} + \dots + R_{a_n} w_{a_n}$

$R_p$: Portfolio return
$R_{a_n}$: Return for asset n
$w_{a_n}$: Weight for asset n



Assuming StockReturns is a pandas DataFrame of stock
returns, you can calculate the portfolio return for a set of
portfolio weights as follows:

import numpy as np
portfolio_weights = np.array([0.25, 0.35, 0.10, 0.20, 0.10])
port_ret = StockReturns.mul(portfolio_weights, axis=1).sum(axis=1)
port_ret

Date
2017-01-03 0.008082
2017-01-04 0.000161
2017-01-05 0.003448
...

StockReturns["Portfolio"] = port_ret



Equally weighted portfolios in Python
Assuming StockReturns is a pandas DataFrame of stock
returns, you can calculate the portfolio return for an equally
weighted portfolio as follows:

import numpy as np
numstocks = 5
portfolio_weights_ew = np.repeat(1/numstocks, numstocks)
StockReturns.iloc[:,0:numstocks].mul(portfolio_weights_ew, axis=1).sum(axis=1)

Date
2017-01-03 0.008082
2017-01-04 0.000161
2017-01-05 0.003448
...



Plotting portfolio returns in Python
To plot the daily returns in Python:

StockPrices["Returns"] = StockPrices["Adj Close"].pct_change()


StockReturns = StockPrices["Returns"]
StockReturns.plot()



Plotting portfolio cumulative returns
In order to plot the cumulative returns of multiple portfolios:

import matplotlib.pyplot as plt


CumulativeReturns = ((1 + StockReturns).cumprod() - 1)
CumulativeReturns[["Portfolio","Portfolio_EW"]].plot()





Market capitalization
Market capitalization: The value of a company's publicly traded
shares.

Also referred to as Market cap.



Market-cap weighted portfolios
In order to calculate the market cap weight of a given stock n:

$w_{mcap_n} = \frac{mcap_n}{\sum_{i=1}^{n} mcap_i}$



Market-Cap weights in Python
To calculate market cap weights in Python, assuming you have data on the market caps of each company:

import numpy as np
market_capitalizations = np.array([100, 200, 100, 100])
mcap_weights = market_capitalizations/sum(market_capitalizations)
mcap_weights

array([0.2, 0.4, 0.2, 0.2])



Let's practice!
Correlation and co-variance
Pearson correlation
Examples of different correlations between two random variables:



Pearson correlation
A heatmap of a correlation matrix:



Correlation matrix in Python
Assuming StockReturns is a pandas DataFrame of stock
returns, you can calculate the correlation matrix as follows:

correlation_matrix = StockReturns.corr()
print(correlation_matrix)



Portfolio standard deviation
Portfolio standard deviation for a two asset portfolio:

$\sigma_p = \sqrt{w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \rho_{1,2} \sigma_1 \sigma_2}$

$\sigma_p$: Portfolio standard deviation
$w$: Asset weight
$\sigma$: Asset volatility
$\rho_{1,2}$: Correlation between assets 1 and 2
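
The two-asset formula can be checked numerically. The sketch below uses hypothetical weights, volatilities, and correlation (all assumed values) and confirms that the formula gives the same result as the matrix form built from the equivalent 2x2 covariance matrix.

```python
import numpy as np

# Hypothetical inputs for a two-asset portfolio
w = np.array([0.6, 0.4])         # asset weights
sigma = np.array([0.20, 0.10])   # asset volatilities
rho = 0.3                        # correlation between the two assets

# Two-asset formula for portfolio standard deviation
port_vol_formula = np.sqrt(
    w[0]**2 * sigma[0]**2
    + w[1]**2 * sigma[1]**2
    + 2 * w[0] * w[1] * rho * sigma[0] * sigma[1]
)

# Same quantity from the covariance matrix
cov = np.array([
    [sigma[0]**2, rho * sigma[0] * sigma[1]],
    [rho * sigma[0] * sigma[1], sigma[1]**2],
])
port_vol_matrix = np.sqrt(w @ cov @ w)

print(port_vol_formula, port_vol_matrix)
```

With a correlation below 1, the portfolio volatility comes out lower than the weighted average of the two asset volatilities, which is the diversification effect.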



The Co-variance matrix
To calculate the co-variance matrix (Σ) of returns X:



The Co-variance matrix in Python
Assuming StockReturns is a pandas DataFrame of stock
returns, you can calculate the covariance matrix as follows:

cov_mat = StockReturns.cov()
cov_mat



Annualizing the covariance matrix
To annualize the covariance matrix:

cov_mat_annual = cov_mat * 252



Portfolio standard deviation using covariance
The formula for portfolio volatility is:

$\sigma_{Portfolio} = \sqrt{w^T \cdot \Sigma \cdot w}$

$\sigma_{Portfolio}$: Portfolio volatility
$\Sigma$: Covariance matrix of returns
$w$: Portfolio weights ($w^T$ is transposed portfolio weights)
$\cdot$: The dot-multiplication operator



Matrix transpose
Examples of matrix transpose operations:



Dot product
The dot product operation of two vectors a and b:
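
A small NumPy sketch of both operations, using made-up vectors and a made-up matrix, may help here: the dot product multiplies matching elements and sums them, and the transpose swaps rows and columns.

```python
import numpy as np

# Illustrative vectors (values are arbitrary)
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Dot product: 1*4 + 2*5 + 3*6
dot_ab = np.dot(a, b)

# Transpose: a 2x3 matrix becomes a 3x2 matrix
m = np.array([[1, 2, 3],
              [4, 5, 6]])
m_t = m.T

print(dot_ab, m_t.shape)
```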



Portfolio standard deviation using Python
To calculate portfolio volatility assume a weights array and a
covariance matrix:

import numpy as np
port_vol = np.sqrt(np.dot(weights.T, np.dot(cov_mat, weights)))
port_vol

0.035



Let's practice!
Markowitz portfolios
100,000 randomly generated portfolios



Sharpe ratio
The Sharpe ratio is a measure of risk-adjusted return.

To calculate the 1966 version of the Sharpe ratio:

$S = \frac{R_a - r_f}{\sigma_a}$

$S$: Sharpe Ratio
$R_a$: Asset return
$r_f$: Risk-free rate of return
$\sigma_a$: Asset volatility



The efficient frontier



The Markowitz portfolios
Any point on the efficient frontier is an optimum portfolio.

These two common points are called Markowitz Portfolios:

MSR: Max Sharpe Ratio portfolio

GMV: Global Minimum Volatility portfolio



Choosing a portfolio
How do you choose the best portfolio?

Try to pick a portfolio on the bounding edge of the efficient frontier

Higher return is available if you can stomach higher risk
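
One way to produce a table of random portfolios like the one behind the efficient frontier plot is sketched below. This is not necessarily how the course data was generated: the expected returns, volatilities, and the flat 0.3 correlation are hypothetical, the Dirichlet draw is simply a convenient way to get long-only weights that sum to 1, and only 1,000 portfolios are generated to keep the example quick.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
numstocks = 5
n_portfolios = 1000  # kept small for speed; an assumption of this sketch

# Hypothetical annualized expected returns and covariance matrix
mean_returns = np.array([0.08, 0.10, 0.06, 0.12, 0.07])
vols = np.array([0.15, 0.20, 0.10, 0.25, 0.12])
corr = np.full((numstocks, numstocks), 0.3)
np.fill_diagonal(corr, 1.0)
cov_mat_annual = np.outer(vols, vols) * corr

# Random long-only weights; each row sums to 1
weights = rng.dirichlet(np.ones(numstocks), size=n_portfolios)

# Expected return and volatility (sqrt of w^T * Sigma * w) per portfolio
port_returns = weights @ mean_returns
port_vols = np.sqrt(np.einsum("ij,jk,ik->i", weights, cov_mat_annual, weights))

# Weight columns first, then Returns and Volatility
df = pd.DataFrame(weights, columns=[f"w{i}" for i in range(numstocks)])
df["Returns"] = port_returns
df["Volatility"] = port_vols

print(df.head())
```

Placing the weight columns first means the first `numstocks` columns of any row can be read back as that portfolio's weights.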



Selecting the MSR in Python
Assuming a DataFrame df of random portfolios with
Volatility and Returns columns:

import numpy as np
numstocks = 5
risk_free = 0
df["Sharpe"] = (df["Returns"] - risk_free) / df["Volatility"]
MSR = df.sort_values(by=['Sharpe'], ascending=False)
MSR_weights = MSR.iloc[0, 0:numstocks]
np.array(MSR_weights)

array([0.15, 0.35, 0.10, 0.15, 0.25])



Past performance is not a guarantee of future returns
Even though a Max Sharpe Ratio portfolio might sound nice, in practice, returns are extremely difficult to predict.



Selecting the GMV in Python
Assuming a DataFrame df of random portfolios with
Volatility and Returns columns:

import numpy as np
numstocks = 5
GMV = df.sort_values(by=['Volatility'], ascending=True)
GMV_weights = GMV.iloc[0, 0:numstocks]
np.array(GMV_weights)

array([0.25, 0.15, 0.35, 0.15, 0.10])



Let's practice!
The Capital Asset Pricing Model
The founding father of asset pricing models
CAPM
The Capital Asset Pricing Model is the fundamental building block for many other asset pricing models and factor models in finance.



Excess returns
To calculate excess returns, simply subtract the risk free rate of
return from your total return:

Excess Return = Return − Risk Free Return

Example:

Investing in Brazil:

10% Portfolio Return - 15% Risk Free Rate = -5% Excess Return

Investing in the US:

10% Portfolio Return - 3% Risk Free Rate = 7% Excess Return



The Capital Asset Pricing Model
$E(R_P) - R_F = \beta_P (E(R_M) - R_F)$

$E(R_P) - R_F$: The excess expected return of a stock or portfolio P
$E(R_M) - R_F$: The excess expected return of the broad market portfolio M
$R_F$: The regional risk-free rate
$\beta_P$: Portfolio beta, or exposure, to the broad market portfolio M



Calculating Beta using co-variance
To calculate historical beta using co-variance:

$\beta_P = \frac{Cov(R_P, R_B)}{Var(R_B)}$

$\beta_P$: Portfolio beta
$Cov(R_P, R_B)$: The co-variance between the portfolio (P) and the benchmark market index (B)
$Var(R_B)$: The variance of the benchmark market index



Calculating Beta using co-variance in Python
Assuming you already have excess portfolio and market returns
in the object Data :

covariance_matrix = Data[["Port_Excess","Mkt_Excess"]].cov()
covariance_coefficient = covariance_matrix.iloc[0, 1]
benchmark_variance = Data["Mkt_Excess"].var()
portfolio_beta = covariance_coefficient / benchmark_variance
portfolio_beta

0.93



Linear regressions
Example of a linear regression: Regression formula in matrix
notation:



Calculating Beta using linear regression
Assuming you already have excess portfolio and market returns
in the object Data :

import statsmodels.formula.api as smf


model = smf.ols(formula='Port_Excess ~ Mkt_Excess', data=Data)
fit = model.fit()
beta = fit.params["Mkt_Excess"]
beta

0.93
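
The co-variance approach and the regression approach give the same beta because the slope of a one-variable least-squares regression equals the covariance divided by the variance of the regressor. The sketch below checks this on synthetic excess returns (the data and the 0.9 "true" beta are assumptions), using `np.polyfit` as a lightweight stand-in for the statsmodels regression.

```python
import numpy as np

# Synthetic excess returns, for illustration only
rng = np.random.default_rng(3)
mkt_excess = rng.normal(0.0004, 0.01, 500)
port_excess = 0.9 * mkt_excess + rng.normal(0, 0.005, 500)

# Beta from co-variance: Cov(P, B) / Var(B), both with the sample (ddof=1) estimator
beta_cov = np.cov(port_excess, mkt_excess)[0, 1] / np.var(mkt_excess, ddof=1)

# Beta as the slope of a degree-1 least-squares fit
beta_ols = np.polyfit(mkt_excess, port_excess, 1)[0]

print(beta_cov, beta_ols)
```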



R-Squared vs Adjusted R-Squared
To extract the adjusted r-squared and r-squared values:

import statsmodels.formula.api as smf


model = smf.ols(formula='Port_Excess ~ Mkt_Excess', data=Data)
fit = model.fit()
r_squared = fit.rsquared
r_squared

0.70

adjusted_r_squared = fit.rsquared_adj

0.65



Let's practice!
Alpha and multi-factor models
The Fama-French 3 factor Model
$R_P = R_F + \beta_M (R_M - R_F) + b_{SMB} \cdot SMB + b_{HML} \cdot HML + \alpha$

SMB: The small minus big factor
$b_{SMB}$: Exposure to the SMB factor
HML: The high minus low factor
$b_{HML}$: Exposure to the HML factor
$\alpha$: Performance which is unexplained by any other factors
$\beta_M$: Beta to the broad market portfolio M





The Fama-French 3 factor model in Python
Assuming you already have excess portfolio and market returns
in the object Data :

import statsmodels.formula.api as smf


model = smf.ols(formula='Port_Excess ~ Mkt_Excess + SMB + HML',
data=Data)
fit = model.fit()
adjusted_r_squared = fit.rsquared_adj
adjusted_r_squared

0.90



P-values and statistical significance
To extract the HML p-value, assuming you have a fitted regression model object in your workspace as fit :

fit.pvalues["HML"]

0.0063

To test if it is statistically significant, simply examine whether or not it is less than a given threshold, normally 0.05:

fit.pvalues["HML"] < 0.05

True



Extracting coefficients
To extract the HML coefficient, assuming you have a fitted regression model object in your workspace as fit :

fit.params["HML"]

0.502

fit.params["SMB"]

-0.243



Alpha and the efficient market hypothesis
Assuming you already have a fitted regression analysis in the object fit :

portfolio_alpha = fit.params["Intercept"]
portfolio_alpha_annualized = ((1 + portfolio_alpha) ** 252) - 1
portfolio_alpha_annualized

0.045



Let's practice!
Expanding the 3-factor model
Fama French 1993
The original paper that started it all:



Cliff Asness on Momentum
A paper published later by Cliff Asness from AQR:



The Fama-French 5 factor model
In 2015, Fama and French extended their previous 3-factor model, adding two additional factors:

RMW: Profitability

CMA: Investment

The RMW factor represents the returns of companies with high operating profitability versus those with low operating profitability.

The CMA factor represents the returns of companies with aggressive investments versus those who are more conservative.





The Fama-French 5 factor model in Python
Assuming you already have excess portfolio and market returns
in the object Data :

import statsmodels.formula.api as smf


model = smf.ols(formula='Port_Excess ~ Mkt_Excess + SMB + HML + RMW + CMA',
data=Data)
fit = model.fit()
adjusted_r_squared = fit.rsquared_adj
adjusted_r_squared

0.92



Let's practice!
Estimating tail risk
Tail risk is the risk of extreme investment outcomes, most
notably on the negative side of a distribution.

Historical Drawdown

Value at Risk

Conditional Value at Risk

Monte-Carlo Simulation



Historical drawdown
Drawdown is the percentage loss from the highest cumulative historical point.

$Drawdown = \frac{r_t}{RM} - 1$

$r_t$: Cumulative return at time t
$RM$: Running maximum

[Figure: Historical Drawdown of the USO Oil ETF]



Historical drawdown in Python
Assuming cum_rets is an np.array of cumulative returns over
time

running_max = np.maximum.accumulate(cum_rets)
running_max[running_max < 1] = 1
drawdown = (cum_rets) / running_max - 1
drawdown

Date Return
2007-01-03 -0.042636
2007-01-04 -0.081589
2007-01-05 -0.073062
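
For a self-contained version of the drawdown calculation, the sketch below builds `cum_rets` from a short hypothetical series of daily returns. It assumes `cum_rets` is expressed as the growth of one unit invested (a wealth index), which is consistent with the running maximum being floored at 1, and then reads off the maximum drawdown.

```python
import numpy as np

# Hypothetical daily returns, for illustration only
returns = np.array([0.02, -0.05, 0.01, -0.03, 0.04, 0.06])

# Cumulative returns as the growth of 1 unit invested
cum_rets = np.cumprod(1 + returns)

# Highest value reached so far, floored at the starting value of 1
running_max = np.maximum.accumulate(cum_rets)
running_max[running_max < 1] = 1

# Percentage loss from the running maximum at each point in time
drawdown = cum_rets / running_max - 1

# The worst (most negative) drawdown over the period
max_drawdown = drawdown.min()
print(max_drawdown)
```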



Historical Value at Risk
Value at Risk, or VaR, is a threshold with a given confidence level that losses will not (or more accurately, will not historically) exceed a certain level.

VaR is commonly quoted with quantiles such as 95, 99, and 99.9.

Example: VaR(95) = -2.3%

95% certain that losses will not exceed -2.3% in a given day based on historical values.



Historical Value at Risk in Python
var_level = 95
var_95 = np.percentile(StockReturns, 100 - var_level)
var_95

-0.023



Historical expected shortfall
Conditional Value at Risk, or CVaR, is an estimate of expected losses sustained in the worst 1 - x% of scenarios.

CVaR is commonly quoted with quantiles such as 95, 99, and 99.9.

Example: CVaR(95) = -2.5%

In the worst 5% of cases, losses averaged -2.5% historically.



Historical expected shortfall in Python
Assuming you have an object StockReturns which is a time
series of stock returns.

To calculate historical CVaR(95):

var_level = 95
var_95 = np.percentile(StockReturns, 100 - var_level)
cvar_95 = StockReturns[StockReturns <= var_95].mean()
cvar_95

-0.025



Let's practice!
VaR extensions
VaR quantiles



Empirical assumptions
Empirical historical values are those that have actually
occurred.

How do you simulate the probability of a value that has never occurred historically before?

Sample from a probability distribution



Parametric VaR in Python
Assuming you have an object StockReturns which is a time
series of stock returns.

To calculate parametric VaR(95):

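import numpy as np
from scipy.stats import norm

mu = np.mean(StockReturns)
std = np.std(StockReturns)
# 0.05 is the left-tail probability that corresponds to VaR(95)
confidence_level = 0.05
VaR = norm.ppf(confidence_level, mu, std)
VaR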

-0.0235



Scaling risk



Scaling risk in Python
Assuming you have a one-day estimate of VaR(95) var_95 .

To estimate 5-day VaR(95):

forecast_days = 5
forecast_var95_5day = var_95*np.sqrt(forecast_days)
forecast_var95_5day

-0.0525



Let's practice!
Random walks
Most often, random walks in finance are rather simple compared to physics:



Random walks in Python
Assuming you have an object StockReturns which is a time
series of stock returns.

To simulate a random walk:

mu = np.mean(StockReturns)
std = np.std(StockReturns)
T = 252
S0 = 10
rand_rets = np.random.normal(mu, std, T) + 1
forecasted_values = S0 * (rand_rets.cumprod())
forecasted_values

array([ 9.71274884, 9.72536923, 10.03605425 ... ])



Monte Carlo simulations
A series of Monte Carlo simulations of a single asset starting at
stock price $10 at T0. Forecasted for 1 year (252 trading days
along the x-axis):
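
A set of simulated paths like the ones described can be generated in a vectorized way. The sketch below is one possible approach under assumed parameters (the daily mean, daily volatility, number of simulations, and seed are all hypothetical): each row is one random walk of 252 daily steps starting from a price of 10.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical daily mean return and volatility
mu, vol = 0.0005, 0.01
T, S0, n_sims = 252, 10, 100

# Each row holds one simulated path of daily gross returns (1 + r)
rand_rets = rng.normal(mu, vol, size=(n_sims, T)) + 1

# Cumulative product along each row turns returns into price paths
paths = S0 * rand_rets.cumprod(axis=1)

# Simulated prices at the end of the year, one per path
final_prices = paths[:, -1]
print(final_prices.mean())
```

Plotting `paths.T` with matplotlib would draw one line per simulation, with trading days along the x-axis.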



Monte Carlo VaR in Python
To calculate the VaR(95) of 100 Monte Carlo simulations:

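import numpy as np

mu = 0.0005
vol = 0.001
T = 252
sim_returns = []
for i in range(100):
    # Simulate one year of daily returns for this run
    rand_rets = np.random.normal(mu, vol, T)
    sim_returns.append(rand_rets)
# 5th percentile of all simulated daily returns
var_95 = np.percentile(sim_returns, 5)
var_95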

-0.028



Let's practice!
Understanding risk
Summary
Moments and Distributions

Portfolio Composition

Correlation and Co-Variance

Markowitz Optimization

Beta & CAPM

Fama-French Factor Modeling

Alpha

Value at Risk

Monte Carlo Simulations



Good luck!
Welcome!
QUANTITATIVE RISK MANAGEMENT IN PYTHON

Dr. Jamsheed Shorish


Computational Economist
About Me
Computational Economist
Specializing in:
asset pricing

financial technologies ("FinTech")

computer applications to economics and finance

Co-instructor, "Economic Analysis of the Digital Economy" at the ANU

Shorish Research (Belgium): computational business applications



What is Quantitative Risk Management?
Quantitative Risk Management: Study of quantifiable uncertainty

Uncertainty:
Future outcomes are unknown
Outcomes impact planning decisions

Risk management: mitigate (reduce effects of) adverse outcomes

Quantifiable uncertainty: identify factors to measure risk


Example: Fire insurance. What factors make fire more likely?

This course: focus upon risk associated with a financial portfolio



Risk management and the Global Financial Crisis
Great Recession (2007 - 2010)
Global growth loss more than $2 trillion

United States: nearly $10 trillion lost in household wealth

U.S. stock markets lost c. $8 trillion in value

Global Financial Crisis (2007-2009)


Large-scale changes in fundamental asset values

Massive uncertainty about future returns

High asset returns volatility

Risk management critical to success or failure



Quick recap: financial portfolios
Financial portfolio
Collection of assets with uncertain future returns

Stocks

Bonds

Foreign exchange holdings ('forex')

Stock options

Challenge: quantify risk to manage uncertainty


Make optimal investment decisions

Maximize portfolio return, conditional on risk appetite



Quantifying return
Portfolio return: weighted sum of individual asset returns
Pandas data analysis library

DataFrame prices

.pct_change() method

.dot() method of returns

import pandas

prices = pandas.read_csv("portfolio.csv")
returns = prices.pct_change()
weights = (weight_1, weight_2, ...)
portfolio_returns = returns.dot(weights)



Quantifying risk
Portfolio return volatility = risk

Calculate volatility via covariance matrix

Use .cov() DataFrame method of returns and annualize

Diagonal of covariance is individual asset variances

Off-diagonals of covariance are covariances between assets

covariance = returns.cov()*252
print(covariance)



Portfolio risk
Depends upon asset weights in portfolio

Portfolio variance $\sigma_p^2$ is

$\sigma_p^2 := w^T \cdot \mathrm{Cov}_p \cdot w$

Matrix multiplication can be computed using @ operator in Python

Standard deviation is usually used instead of variance

import numpy as np

weights = np.array([0.25, 0.25, 0.25, 0.25])  # Assumes four assets in portfolio

portfolio_variance = np.transpose(weights) @ covariance @ weights
portfolio_volatility = np.sqrt(portfolio_variance)
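
To see what the matrix product computes, the following sketch evaluates the portfolio variance for two assets both in matrix form and written out term by term. The covariance values and weights are made up for illustration.

```python
import numpy as np

# Hypothetical annualized covariance matrix for two assets.
covariance = np.array([[0.04, 0.01],
                       [0.01, 0.09]])
weights = np.array([0.5, 0.5])

# Matrix form: w^T . Cov . w
portfolio_variance = weights @ covariance @ weights

# The same quantity expanded for two assets:
# w1^2 * var1 + w2^2 * var2 + 2 * w1 * w2 * cov12
expanded = (weights[0] ** 2 * covariance[0, 0]
            + weights[1] ** 2 * covariance[1, 1]
            + 2 * weights[0] * weights[1] * covariance[0, 1])

portfolio_volatility = np.sqrt(portfolio_variance)
print(portfolio_variance, expanded, portfolio_volatility)
```

The off-diagonal term shows why correlation between assets raises or lowers portfolio risk for fixed weights.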



Volatility time series
Can also calculate portfolio volatility over time

Use a 'window' to compute volatility over a fixed time period (e.g. week, 30-day 'month')

Series.rolling() creates a window

Observe volatility trend and possible extreme events

windowed = portfolio_returns.rolling(30)
volatility = windowed.std()*np.sqrt(252)
volatility.plot().set_ylabel("Standard Deviation...")
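
A rolling window can be tried without market data. The sketch below assumes a synthetic return series that is calm for 150 business days and turbulent afterwards; the dates, seed and standard deviations are all invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dates = pd.bdate_range("2008-01-01", periods=300)

# Hypothetical daily returns: low volatility first, high volatility later.
daily_std = np.where(np.arange(300) < 150, 0.01, 0.03)
portfolio_returns = pd.Series(rng.normal(0.0, daily_std), index=dates)

# 30-day rolling standard deviation, annualized with sqrt(252).
volatility = portfolio_returns.rolling(30).std() * np.sqrt(252)

calm = volatility.iloc[100]       # window lies entirely in the calm period
turbulent = volatility.iloc[250]  # window lies entirely in the turbulent period
print(calm, turbulent)
```

The first 29 values are NaN because a full 30-observation window is not yet available.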



Let's practice!
Risk factors and the financial crisis

Jamsheed Shorish
Computational Economist
Risk factors
Volatility: measure of dispersion of returns around expected value

Time series: expected value = sample average

What drives expectation and dispersion?

Risk factors: variables or events driving portfolio return and volatility



Risk exposure
Risk exposure: measure of possible portfolio loss
Risk factors determine risk exposure

Example: Flood Insurance


Deductible: out-of-pocket payment regardless of loss
100% coverage still leaves deductible to be paid

So deductible is risk exposure

Frequent flooding => more volatile flood outcome

Frequent flooding => higher risk exposure



Systematic risk
Systematic risk: risk factor(s) affecting volatility of all portfolio assets
Market risk: systematic risk from general financial market movements
Airplane engine failure: systematic risk!

Examples of financial systematic risk factors:
Price level changes, i.e. inflation

Interest rate changes

Economic climate changes



Idiosyncratic risk
Idiosyncratic risk: risk specific to a particular asset/asset class.

Turbulence and the unfastened seatbelt: idiosyncratic risk!

Examples of idiosyncratic risk:

Bond portfolio: issuer risk of default

Firm/sector characteristics
Firm size (market capitalization)

Book-to-market ratio

Sector shocks



Factor models
Factor model: assessment of risk factors affecting portfolio return
Statistical regression, e.g. Ordinary Least Squares (OLS):
dependent variable: returns (or volatility)

independent variable(s): systematic and/or idiosyncratic risk factors

Fama-French factor model: combination of market risk and idiosyncratic risk (firm size, firm value)



Crisis risk factor: mortgage-backed securities
Investment banks: borrowed heavily just before the crisis

Collateral: mortgage-backed securities (MBS)

MBS: supposed to diversify risk by holding many mortgages of different characteristics
Flaw: mortgage default risk in fact was highly correlated

Avalanche of delinquencies/default destroyed collateral value
90-day mortgage delinquency: risk factor



Crisis factor model
Factor model regression: portfolio returns vs. mortgage delinquency
Import statsmodels.api library for regression tools

Fit regression using .OLS() object and its .fit() method

Display results using regression's .summary() method

import statsmodels.api as sm
regression = sm.OLS(returns, delinquencies).fit()
print(regression.summary())
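
Note that sm.OLS does not add an intercept on its own; with statsmodels one would typically pass sm.add_constant(delinquencies) as the regressor. The sketch below illustrates the same least-squares fit with an explicit intercept column, using plain NumPy instead of statsmodels so that it runs with fewer dependencies. The data are synthetic, and the "true" intercept 0.02 and slope -0.5 are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical delinquency rates and returns that fall as delinquency rises.
delinquencies = rng.uniform(0.01, 0.10, size=200)
returns = 0.02 - 0.5 * delinquencies + rng.normal(0.0, 0.001, size=200)

# Design matrix with a column of ones so the regression has an intercept.
X = np.column_stack([np.ones_like(delinquencies), delinquencies])

# Ordinary least squares: minimize the sum of squared residuals.
coef, *_ = np.linalg.lstsq(X, returns, rcond=None)
intercept, slope = coef
print(intercept, slope)
```

A negative slope here corresponds to returns falling when the risk factor (delinquency) rises.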



Regression .summary() results



Let's practice!
Modern portfolio theory

Jamsheed Shorish
Computational Economist
The risk-return trade-off
Risk factors: sources of uncertainty affecting return
Intuitively: greater uncertainty (more risk) compensated by greater return

Cannot guarantee return: need some measure of expected return


average (mean) historical return: proxy for expected future return



Investor risk appetite
Investor survey: minimum return required for given level of risk?
Survey response creates (risk, return) risk profile "data point"

Vary risk level => set of (risk, return) points

Investor risk appetite: defines one quantified relationship between risk and return



Choosing portfolio weights
Vary portfolio weights of given portfolio => creates set of (risk, return) pairs
Changing weights = beginning risk management!

Goal: change weights to maximize expected return, given risk level


Equivalently: minimize risk, given expected return level

Changing weights = adjusting investor's risk exposure



Modern portfolio theory
Efficient portfolio: portfolio with weights generating highest expected return for given level of risk

Modern Portfolio Theory (MPT), 1952

H. M. Markowitz (Nobel Laureate 1990)

Efficient portfolio weight vector w ⋆ solves:
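
The formula from the slide did not survive extraction. One standard way to write the problem, consistent with the definition of an efficient portfolio given here, is sketched below. The symbols $\mu$ (vector of expected asset returns) and $\bar{\sigma}$ (the given risk level) are notation assumed for this sketch, and the budget constraint that weights sum to one is likewise an assumption.

```latex
w^\star = \arg\max_{w} \; w^T \mu
\quad \text{subject to} \quad
w^T \cdot \mathrm{Cov}_p \cdot w \le \bar{\sigma}^2,
\qquad \sum_i w_i = 1
```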



The efficient frontier
Compute many efficient portfolios for different levels of risk
Efficient frontier: locus of (risk, return) pairs created by efficient portfolios

PyPortfolioOpt library: optimized tools for MPT


EfficientFrontier class: generates one optimal portfolio at a time

Critical Line Algorithm ( CLA ) class: generates the entire efficient frontier
Requires covariance matrix of returns

Requires proxy for expected future returns: mean historical returns



Investment bank portfolio 2005 - 2010
Expected returns: historical data
Covariance matrix: Covariance Shrinkage improves efficiency of estimate

Critical Line Algorithm object CLA

Minimum variance portfolio: cla.min_volatility()

Efficient frontier: cla.efficient_frontier()

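from pypfopt.expected_returns import mean_historical_return
from pypfopt.risk_models import CovarianceShrinkage
from pypfopt.cla import CLA

expected_returns = mean_historical_return(prices)
efficient_cov = CovarianceShrinkage(prices).ledoit_wolf()
cla = CLA(expected_returns, efficient_cov)
minimum_variance = cla.min_volatility()
(ret, vol, weights) = cla.efficient_frontier()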
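
For intuition about the minimum variance portfolio, the sketch below uses the textbook closed-form solution for the case where the only constraint is that weights sum to one. This is not the Critical Line Algorithm and it ignores any bounds on individual weights, so it can differ from what cla.min_volatility() returns. The covariance matrix is made up.

```python
import numpy as np

# Hypothetical annualized covariance matrix for two assets.
cov = np.array([[0.04, 0.01],
                [0.01, 0.09]])
ones = np.ones(len(cov))

# Closed form: w = Cov^{-1} 1 / (1^T Cov^{-1} 1)
raw = np.linalg.solve(cov, ones)
min_var_weights = raw / raw.sum()

min_var_variance = min_var_weights @ cov @ min_var_weights
min_var_volatility = np.sqrt(min_var_variance)

# Compare against an equally weighted portfolio of the same assets.
equal = np.array([0.5, 0.5])
equal_variance = equal @ cov @ equal
print(min_var_weights, min_var_volatility, np.sqrt(equal_variance))
```

The less volatile asset receives the larger weight, and the resulting variance is below that of the equally weighted portfolio.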



Visualizing the efficient frontier
Scatter plot of (vol, ret) pairs
Minimum variance portfolio: smallest volatility of all possible efficient portfolios

Increasing risk appetite: move along the frontier



Let's practice!
Measuring Risk

Jamsheed Shorish
CEO, Shorish Research
The Loss Distribution
Forex Example:
Portfolio value in U.S. dollars is USD 100

Risk factor = EUR / USD exchange rate

Portfolio value in EURO if EUR 1 = USD 1:

USD 100 x EUR 1 / USD 1 = EUR 100.

Portfolio value in EURO if EUR r = USD 1:

USD 100 x EUR r / USD 1 = EUR 100 x r

Loss = EUR 100 - EUR 100 x r = EUR 100 x (1 - r)

Loss distribution: Random realizations of r => distribution of portfolio losses in the future



Maximum loss
What is the maximum loss of a portfolio?

Losses cannot be bounded with 100% certainty

Confidence Level: replace 100% certainty with likelihood of upper bound

Can express questions like "What is the maximum loss that would take place 95% of the time?"
Here the confidence level is 95%.



Value at Risk (VaR)
VaR: statistic measuring maximum portfolio loss at a particular confidence level

Typical confidence levels: 95%, 99%, and 99.5% (usually represented as decimals)

Forex Example: If 95% of the time EUR / USD exchange rate is at least 0.40, then:
portfolio value is at least USD 100 x 0.40 EUR / USD = EUR 40,

portfolio loss is at most EUR 100 - EUR 40 = EUR 60,
so the 95% VaR is EUR 60.



Conditional Value at Risk (CVaR)
CVaR: measures expected loss given a minimum loss equal to the VaR

Equals expected value of the tail of the loss distribution:

$\mathrm{CVaR}(\alpha) := \frac{1}{1-\alpha} \int_{\mathrm{VaR}(\alpha)}^{\bar{x}} x f(x)\,dx$

f (⋅) = loss distribution pdf

x̄ = upper bound of the loss (can be infinity)

VaR(α) = VaR at the α confidence level.

Forex Example: 95% CVaR = expected loss for 5% of cases when portfolio value smaller than EUR 40



Deriving the VaR
1. Specify confidence level, e.g. 95% (0.95)
2. Create Series of loss observations

3. Compute loss.quantile() at specified confidence level

4. VaR = computed .quantile() at desired confidence level

5. scipy.stats loss distribution: percent point function .ppf() can also be used

import pandas as pd

loss = pd.Series(observations)
VaR_95 = loss.quantile(0.95)
print("VaR_95 = ", VaR_95)

VaR_95 = 1.6192834157254088
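
Both routes to the VaR, the empirical quantile and the percent point function, can be compared on simulated data. The sketch assumes standard Normal losses and a fixed random seed, neither of which comes from the course data.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

rng = np.random.default_rng(42)

# Hypothetical loss observations drawn from a standard Normal distribution.
loss = pd.Series(rng.normal(loc=0.0, scale=1.0, size=100_000))

# Steps 3-4: empirical quantile of the observed losses.
VaR_95_empirical = loss.quantile(0.95)

# Step 5: percent point function of the assumed loss distribution.
VaR_95_parametric = norm.ppf(0.95)

print(VaR_95_empirical, VaR_95_parametric)
```

With a large sample from the assumed distribution, the two estimates should be close; on real data a gap between them hints that the distributional assumption is poor.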



Deriving the CVaR
1. Specify confidence level, e.g. 95% (0.95)
2. Create or use sample from loss distribution

3. Compute VaR at a specified confidence level, e.g. 0.95.

4. Compute CVaR as expected loss (Normal distribution: scipy.stats.norm.expect() does this).

import pandas as pd
import scipy.stats

losses = pd.Series(scipy.stats.norm.rvs(size=1000))
VaR_95 = scipy.stats.norm.ppf(0.95)
CVaR_95 = (1/(1 - 0.95))*scipy.stats.norm.expect(lambda x: x, lb = VaR_95)
print("CVaR_95 = ", CVaR_95)

CVaR_95 = 2.153595332530393
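
As a cross-check of the tail-expectation definition, the sketch below computes the CVaR of a standard Normal loss three ways: by numerical integration with norm.expect(), by the known closed form pdf(VaR) / (1 - alpha) for the standard Normal, and as the average of simulated losses beyond the empirical VaR. The sample size and seed are assumptions.

```python
import numpy as np
from scipy.stats import norm

alpha = 0.95
VaR = norm.ppf(alpha)

# Numerical integration of x * f(x) over the tail, scaled by 1 / (1 - alpha).
CVaR_integral = norm.expect(lambda x: x, lb=VaR) / (1 - alpha)

# Closed form, valid for the standard Normal distribution only.
CVaR_closed = norm.pdf(VaR) / (1 - alpha)

# Sample version: mean of the simulated losses at or beyond the empirical VaR.
rng = np.random.default_rng(0)
losses = rng.standard_normal(200_000)
CVaR_sample = losses[losses >= np.quantile(losses, alpha)].mean()

print(CVaR_integral, CVaR_closed, CVaR_sample)
```

By construction the CVaR is never smaller than the VaR at the same confidence level.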



Visualizing the VaR
Loss distribution histogram for 1000 draws from N(1,3)
VaR95 = 5.72, i.e. VaR at 95% confidence

VaR99 = 7.81, i.e. VaR at 99% confidence

VaR99.5 = 8.78, i.e. VaR at 99.5% confidence

The VaR measure increases as the confidence level rises



Let's practice!
Risk exposure and loss

Jamsheed Shorish
Computational Economist
A vacation analogy
Hotel reservations for vacation
Pay in advance, before stay
Low room rate

Non-refundable: cancellation fee = 100% of room rate

Pay after arrival

High room rate

Partially refundable: cancellation fee of 20% of room rate



Deciding between options
What determines your decision?
1. Chance of negative shock: illness, travel disruption, weather
Probability of loss
2. Loss associated with shock: amount or conditional amount
e.g. VaR, CVaR

3. Desire to avoid shock: personal feeling

Risk tolerance



Risk exposure and VaR
Risk exposure: probability of loss x loss measure
Loss measure: e.g. VaR

10% chance of canceling vacation: P(Illness) = 0.10

Non-refundable:
Total non-refundable hotel cost: € 500

VaR at 90% confidence level: € 500

Partially refundable:
Refundable hotel cost: € 550

VaR at 90% confidence level: 20% cancellation fee x € 550 = € 110



Calculating risk exposure
Non-refundable exposure ("nr"):
P(illness) x $\mathrm{VaR}^{nr}_{0.90}$ = 0.10 x € 500 = € 50.
Partially refundable exposure ("pr"):
P(illness) x $\mathrm{VaR}^{pr}_{0.90}$ = 0.10 x € 110 = € 11.

Difference in risk exposure: € 50 - € 11 = € 39.

Total price difference between offers: € 550 - € 500 = € 50.

Risk tolerance: is paying € 50 more worth avoiding € 39 of additional exposure?
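
The arithmetic of the vacation example can be reproduced in a few lines. All figures come from the example; the variable names are chosen here for illustration.

```python
# Probability of the negative shock (cancelling due to illness).
p_illness = 0.10

# Loss measure (VaR at 90% confidence) for each booking option, in euros.
var_nr = 500.0            # non-refundable: the full room cost is lost
var_pr = 0.20 * 550.0     # partially refundable: 20% cancellation fee

# Risk exposure = probability of loss x loss measure.
exposure_nr = p_illness * var_nr
exposure_pr = p_illness * var_pr

exposure_difference = exposure_nr - exposure_pr
price_difference = 550.0 - 500.0
print(exposure_nr, exposure_pr, exposure_difference, price_difference)
```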



Risk tolerance and risk appetite
Risk-neutral: only expected values matter
€ 39 < € 50 ⇒ prefer non-refundable option

Risk-averse: uncertainty itself carries a cost


€ 39 < € 50 ⇒ prefer partially refundable option

Enterprise/institutional risk management: preferences as risk appetite

Individual investors: preferences as risk tolerance



Loss distribution - discrete
Risk exposure depends upon loss distribution (probability of loss)

Vacation example: 2 outcomes from random risk factor



Loss distribution - continuous
Risk exposure depends upon loss distribution (probability of loss)

Vacation example: 2 outcomes from random risk factor

More generally: continuous loss distribution

Normal distribution: good for large samples

Student's t-distribution: good for smaller samples



Primer: Student's t-distribution
Also referred to as T distribution
Has "fatter" tails than Normal for small samples
Similar to portfolio returns/losses

As sample size grows, T converges to Normal distribution



T distribution in Python
Example: compute 95% VaR from T distribution
Import t distribution from scipy.stats

Fit portfolio_losses data using t.fit()

Compute percent point function with .ppf() to find VaR

from scipy.stats import t

params = t.fit(portfolio_losses)
VaR_95 = t.ppf(0.95, *params)
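
The fit-then-quantile pattern can be tried end to end on simulated data. In this sketch portfolio_losses is a made-up sample drawn from a t distribution with 4 degrees of freedom; a Normal fit is included only for comparison. Far in the tail (e.g. at 99%) the fitted t quantile will usually exceed the Normal one, though this depends on the sample.

```python
from scipy.stats import norm, t

# Hypothetical fat-tailed losses (t distribution with 4 degrees of freedom).
portfolio_losses = t.rvs(df=4, size=5000, random_state=0)

# t.fit returns the fitted (df, loc, scale) parameters.
params = t.fit(portfolio_losses)

# VaR from the fitted T distribution at two confidence levels.
VaR_95_t = t.ppf(0.95, *params)
VaR_99_t = t.ppf(0.99, *params)

# The same VaR from a fitted Normal distribution, for comparison.
VaR_99_norm = norm.ppf(0.99, *norm.fit(portfolio_losses))

print(params, VaR_95_t, VaR_99_t, VaR_99_norm)
```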



Degrees of freedom
Degrees of freedom (df): number of independent observations

Small df: "fat tailed" T distribution

Large df: Normal distribution

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 100)
plt.plot(x, t.pdf(x, df = 2))
plt.plot(x, t.pdf(x, df = 5))
plt.plot(x, t.pdf(x, df = 30))
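
The effect of the degrees of freedom on the tails can also be read off numerically. The sketch compares the probability of exceeding a threshold of 3 (an arbitrary choice) under T distributions with the same df values as the plot, and under the standard Normal.

```python
from scipy.stats import norm, t

threshold = 3.0

# Survival function = P(X > threshold) for each degrees-of-freedom value.
tail_probs = {df: t.sf(threshold, df) for df in (2, 5, 30)}

# The same tail probability under the standard Normal distribution.
normal_tail = norm.sf(threshold)

for df, prob in tail_probs.items():
    print(f"df = {df:2d}: P(X > {threshold}) = {prob:.5f}")
print(f"Normal : P(X > {threshold}) = {normal_tail:.5f}")
```

The tail probability shrinks toward the Normal value as df grows, which is the "fat tails" statement in numbers.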



Let's practice!
Risk management using VaR & CVaR

Jamsheed Shorish
Computational Economist
Risk management via modern portfolio theory
Efficient Portfolio
Portfolio weights maximize return given risk level

Efficient Frontier: locus of (risk, return) points generated by different efficient portfolios
Each point = portfolio weight optimization

Creation of efficient portfolio/frontier: Modern Portfolio Theory



Incorporating Value at Risk into MPT
Modern Portfolio Theory (MPT): "mean-variance" optimization
Highest expected return

Risk level (volatility) is given

Objective function: expected return


VaR/CVaR: measure risk over distribution of loss

Adapt MPT to optimize over loss distribution vs. expected return



A new objective: minimize CVaR
Change objective of portfolio optimization
mean-variance objective: maximize expected mean return
CVaR objective: minimize expected conditional loss at a given confidence level
Example: Loss distribution
VaR: maximum loss with 95% confidence

Optimization: portfolio weights minimizing CVaR

CVaR: expected loss given at least VaR loss (worst 5% of cases)

Find lowest expected loss in worst 100% - 95% = 5% of possible outcomes



The risk management problem
Select optimal portfolio weights w ⋆ as solution to

Recall: f (x) = probability density function of portfolio loss

PyPortfolioOpt: select minimization of CVaR as new objective



CVaR minimization using PyPortfolioOpt
Create an EfficientCVaR object with asset returns returns

Compute optimal portfolio weights using .min_cvar() method

import pypfopt

ec = pypfopt.efficient_frontier.EfficientCVaR(None, returns)
optimal_weights = ec.min_cvar()
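
To make the objective concrete, the sketch below only evaluates the historical CVaR for two fixed weight vectors; it does not perform the optimization, which is what .min_cvar() searches for. The helper historical_cvar and the two synthetic assets are hypothetical, introduced for illustration.

```python
import numpy as np

def historical_cvar(returns, weights, alpha=0.95):
    """Mean of the portfolio losses at or beyond the alpha-quantile (the VaR)."""
    losses = -(returns @ weights)
    var = np.quantile(losses, alpha)
    return losses[losses >= var].mean()

rng = np.random.default_rng(3)

# Two hypothetical assets: one calm, one volatile (daily returns).
returns = np.column_stack([rng.normal(0.0005, 0.01, 2000),
                           rng.normal(0.0005, 0.03, 2000)])

# CVaR of a portfolio tilted toward the calm asset vs. the volatile asset.
cvar_calm = historical_cvar(returns, np.array([0.9, 0.1]))
cvar_risky = historical_cvar(returns, np.array([0.1, 0.9]))
print(cvar_calm, cvar_risky)
```

A CVaR-minimizing optimizer would move weight toward combinations like the first one, where the average loss in the worst 5% of cases is smaller.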



Mean-variance vs. CVaR risk management
Mean-variance minimum volatility portfolio, 2005-2010 investment bank assets

ef = EfficientFrontier(None, e_cov)
min_vol_weights = ef.min_volatility()
print(min_vol_weights)

{'Citibank': 0.0,
'Morgan Stanley': 5.0784330940519306e-18,
'Goldman Sachs': 0.6280157234640608,
'J.P. Morgan': 0.3719842765359393}



Mean-variance vs. CVaR risk management
CVaR-minimizing portfolio, 2005-2010 investment bank assets

ec = pypfopt.efficient_frontier.EfficientCVaR(None, returns)
min_cvar_weights = ec.min_cvar()
print(min_cvar_weights)

{'Citibank': 0.0,
'Morgan Stanley': 0.0,
'Goldman Sachs': 0.669324359403484,
'J.P. Morgan': 0.3306756405965026}



Let's practice!
Portfolio hedging: offsetting risk

Jamsheed Shorish
Computational Economist
Portfolio stability
VaR/CVaR: potential portfolio loss for given confidence level

Portfolio optimization: 'best' portfolio weights


But volatility is still present!
Institutional investors: stability of portfolio against volatile changes
Pension funds: c. USD 20 trillion



Rainy days, sunny days
Investment portfolio: sunglasses company
Risk factor: weather (rain)

More rain => lower company value

Lower company value => lower stock price

Lower stock price => lower portfolio value

Second opportunity: umbrella company

More rain => more value!

Portfolio: sunglasses & umbrellas, more stable
Volatility of rain is offset



Hedging
Hedging: offset volatility with another asset
Crucial for institutional investor risk management

Additional return stream moving opposite to portfolio

Used in pension funds, ForEx, futures, derivatives...


2019: hedge fund market c. USD 3.6 trillion



Hedge instruments: options
Derivative: hedge instrument
European option: very popular derivative
European call option: right (not obligation) to purchase stock at fixed price X on date M

European put option: right (not obligation) to sell stock at fixed price X on date M

Stock = "underlying" of the option


Current market price S = spot price

X = strike price
M = maturity date



Black-Scholes option pricing
Option value changes when price of underlying changes => can be used to hedge risk
Need to value option: requires assumptions about market, underlying, interest rate, etc.

Black-Scholes option pricing formula: Fischer Black & Nobel Laureate Myron Scholes (1973)
Requires for each time t:
spot price S

strike price X

time to maturity T := M − t
risk-free interest rate r

volatility of underlying returns σ (standard deviation)

1 Black, F. and M. Scholes (1973). "The Pricing of Options and Corporate Liabilities", Journal of Political Economy, vol. 81, no. 3, pp. 637–654.



Black-Scholes formula assumptions
Market structure
Efficient markets

No transactions costs

Risk-free interest rate

Underlying stock
No dividends

Normally distributed returns

Online calculator: https://www.math.drexel.edu/~pg/fin/VanillaCalculator.html

Python function black_scholes() : source code link available in the exercises



Computing the Black-Scholes option value
Black-Scholes option pricing formula black_scholes()
Required parameters: S , X , T (in fractions of a year), r , σ

Use the desired option_type ('call' or 'put')

S = 70; X = 80; T = 0.5; r = 0.02; sigma = 0.2


option_value = black_scholes(S, X, T, r, sigma, option_type = "put")
print(option_value)

10.31222171237868
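
The course's black_scholes() source is not reproduced in these slides. The sketch below implements the standard textbook Black-Scholes formula for a European option on a non-dividend-paying stock under a hypothetical name, black_scholes_sketch, so it should be read as an illustration rather than the course's own function.

```python
import numpy as np
from scipy.stats import norm

def black_scholes_sketch(S, X, T, r, sigma, option_type="call"):
    """Textbook Black-Scholes value of a European option (no dividends)."""
    d1 = (np.log(S / X) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    if option_type == "call":
        return S * norm.cdf(d1) - X * np.exp(-r * T) * norm.cdf(d2)
    if option_type == "put":
        return X * np.exp(-r * T) * norm.cdf(-d2) - S * norm.cdf(-d1)
    raise ValueError("option_type must be 'call' or 'put'")

# Same inputs as the example above.
S = 70; X = 80; T = 0.5; r = 0.02; sigma = 0.2
put_value = black_scholes_sketch(S, X, T, r, sigma, option_type="put")
call_value = black_scholes_sketch(S, X, T, r, sigma, option_type="call")
print(put_value, call_value)
```

A useful sanity check is put-call parity: the call value minus the put value should equal S minus the discounted strike price.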



Hedging a stock position with an option
Hedge stock with European put option: underlying is same as stock in portfolio

Spot price S falls (ΔS < 0) => option value V rises (ΔV > 0)
Delta of an option: $\Delta := \frac{\partial V}{\partial S}$
Hedge one share with $\frac{1}{\Delta}$ options
Delta neutral: $\Delta S + \frac{\Delta V}{\Delta} = 0$; stock is hedged!
Python function bs_delta() : computes the option delta
Link to source available in the exercises
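
The definition of delta as a partial derivative can be checked numerically. The sketch below uses the standard Black-Scholes put value and the textbook put delta, N(d1) - 1, and compares it with a finite-difference approximation of ∂V/∂S. The function names put_value and put_delta are hypothetical and are not the course's bs_delta(), whose sign convention may differ.

```python
import numpy as np
from scipy.stats import norm

def put_value(S, X, T, r, sigma):
    """Textbook Black-Scholes value of a European put (no dividends)."""
    d1 = (np.log(S / X) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return X * np.exp(-r * T) * norm.cdf(-d2) - S * norm.cdf(-d1)

def put_delta(S, X, T, r, sigma):
    """Textbook Black-Scholes delta of a European put: N(d1) - 1."""
    d1 = (np.log(S / X) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    return norm.cdf(d1) - 1.0

S, X, T, r, sigma = 70.0, 80.0, 0.5, 0.02, 0.2
delta = put_delta(S, X, T, r, sigma)

# Central finite difference: change in option value per unit change in S.
h = 0.01
finite_difference = (put_value(S + h, X, T, r, sigma)
                     - put_value(S - h, X, T, r, sigma)) / (2 * h)
print(delta, finite_difference)
```

The negative sign reflects the first line of the slide: when the spot price falls, the put gains value.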



Let's practice!
Parametric Estimation

Jamsheed Shorish
Computational Economist
A class of distributions
Loss distribution: not known with certainty
Class of possible distributions?
Suppose class of distributions f (x; θ)

x is loss (random variable)


θ is vector of unknown parameters
Example: Normal distribution
Parameters: θ = (μ, σ), mean μ and standard deviation σ

Parametric estimation: find 'best' θ ⋆ given data

Loss distribution: f (x; θ ⋆ )



Fitting a distribution
Fit distribution according to error-minimizing criteria
Example: scipy.stats.norm.fit() , fitting Normal distribution to data
Result: optimally fitted mean and standard deviation

Advantages:
Can visualize difference between data and estimate using histogram

Can provide goodness-of-fit tests
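
A minimal sketch of fitting a Normal distribution follows. The data are simulated with mean 1 and standard deviation 3 (assumed values); for the Normal distribution, norm.fit() returns the maximum likelihood estimates, which are the sample mean and the standard deviation computed with ddof=0.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

# Hypothetical losses drawn from a Normal distribution with mean 1, std 3.
losses = rng.normal(loc=1.0, scale=3.0, size=1000)

# Parametric estimation: find the 'best' (mu, sigma) given the data.
mu_hat, sigma_hat = norm.fit(losses)

# The fitted distribution can then be used like any other, e.g. for a VaR.
VaR_95 = norm.ppf(0.95, loc=mu_hat, scale=sigma_hat)
print(mu_hat, sigma_hat, VaR_95)
```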



Goodness of fit
How well does an estimated distribution fit the data?

Visualize: plot histogram of portfolio losses

Example:
Normal distribution with norm.fit()

Student's t-distribution with t.fit()

Asymmetrical histogram?



Anderson-Darling test
Statistical test of goodness of fit
Test null hypothesis: data are Normally distributed

Test statistic rejects Normal distribution if larger than critical_values

Import scipy.stats.anderson

Compute test result using loss data

from scipy.stats import anderson


anderson(loss)

AndersonResult(statistic=11.048641503898523,
critical_values=array([0.57 , 0.649, 0.779, 0.909, 1.081]),
significance_level=array([15. , 10. , 5. , 2.5, 1. ]))
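
Reading the result means comparing the statistic with each critical value. The sketch below does this programmatically on a simulated fat-tailed sample; the sample (a t distribution with 2 degrees of freedom) is an assumption chosen so that Normality is clearly rejected.

```python
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(11)

# Hypothetical fat-tailed losses, which should not look Normal.
loss = rng.standard_t(df=2, size=2000)

# The default distribution tested by anderson() is the Normal.
result = anderson(loss)

# Reject Normality at a significance level if the statistic exceeds
# the corresponding critical value.
rejected = {
    float(sig): bool(result.statistic > crit)
    for crit, sig in zip(result.critical_values, result.significance_level)
}
print(result.statistic, rejected)
```

In the output shown above, the statistic of about 11.05 exceeds every critical value, so Normality is rejected at all listed significance levels.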



Skewness
Skewness: degree to which data is non-symmetrically distributed
Normal distribution: symmetric

Student's t-distribution: symmetric

Skewed Normal distribution: asymmetric
Contains Normal as special case

Useful for portfolio data, where e.g. losses more frequent than gains

Available in scipy.stats as skewnorm



Testing for skewness
Test how far data is from symmetric distribution: scipy.stats.skewtest
Null hypothesis: no skewness
Import skewtest from scipy.stats

Compute test result on loss data


Statistically significant => use distribution class with skewness

from scipy.stats import skewtest


skewtest(loss)

SkewtestResult(statistic=-7.786120875514511,
pvalue=6.90978472959861e-15)
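
The test can be tried on samples whose skewness is known in advance. The sketch draws one sample from a skewed Normal distribution (shape parameter a=5, an arbitrary choice) and one from a symmetric Normal; the skewed sample should produce a very small p-value, while the symmetric one usually will not.

```python
from scipy.stats import norm, skewnorm, skewtest

# Hypothetical samples: one skewed, one symmetric.
skewed = skewnorm.rvs(a=5, size=2000, random_state=0)
symmetric = norm.rvs(size=2000, random_state=0)

# Null hypothesis in both cases: no skewness.
p_skewed = skewtest(skewed).pvalue
p_symmetric = skewtest(symmetric).pvalue
print(p_skewed, p_symmetric)

# If the skewness is significant, fit a distribution class that allows it.
if p_skewed < 0.05:
    a_hat, loc_hat, scale_hat = skewnorm.fit(skewed)
    print(a_hat, loc_hat, scale_hat)
```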



Let's practice!
Historical and Monte Carlo Simulation

Jamsheed Shorish
Computational Economist
Historical simulation
No appropriate class of distributions?
Historical simulation: use past to predict future
No distributional assumption required

Data about previous losses become simulated losses for tomorrow



Historical simulation in Python
VaR: start with returns in asset_returns
Compute portfolio_returns using portfolio weights

Convert portfolio_returns into losses

VaR: compute np.quantile() for losses at e.g. 95% confidence level

Assumes future distribution of losses is exactly the same as past

import numpy as np

weights = [0.25, 0.25, 0.25, 0.25]

portfolio_returns = asset_returns.dot(weights)
losses = - portfolio_returns
VaR_95 = np.quantile(losses, 0.95)
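
The steps above can be run end to end on made-up data. In this sketch asset_returns is a synthetic DataFrame of four assets with hypothetical names; the last line checks that roughly 5% of the historical losses exceed the estimated VaR, which is what the 95% quantile means.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)

# Hypothetical daily returns for four assets over 1000 days.
asset_returns = pd.DataFrame(
    rng.normal(0.0, 0.02, size=(1000, 4)),
    columns=["asset_1", "asset_2", "asset_3", "asset_4"],
)
weights = [0.25, 0.25, 0.25, 0.25]

# Portfolio returns, converted to losses (a loss is a negative return).
portfolio_returns = asset_returns.dot(weights)
losses = -portfolio_returns

# Historical-simulation VaR: the 95% quantile of past losses.
VaR_95 = np.quantile(losses, 0.95)

# Share of past days on which the loss exceeded the VaR.
exceed_share = (losses > VaR_95).mean()
print(VaR_95, exceed_share)
```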



Monte Carlo simulation
Monte Carlo simulation: powerful combination of parametric estimation and simulation
Assumes distribution(s) for portfolio loss and/or risk factors

Relies upon random draws from distribution(s) to create random path, called a run

Repeat random draws ⇒ creates set of simulation runs

Compute simulated portfolio loss over each run up to desired time

Find VaR estimate as quantile of simulated losses



Monte Carlo simulation in Python
Step One:
Import Normal distribution norm from scipy.stats

Define total_steps (1 day = 1440 minutes)

Define number of runs N

Compute mean mu and standard deviation sigma of portfolio_losses data

from scipy.stats import norm


total_steps = 1440
N = 10000
mu = portfolio_losses.mean()
sigma = portfolio_losses.std()



Monte Carlo simulation in Python
Step Two:
Initialize daily_loss vector for N runs

Loop over N runs


Compute Monte Carlo simulated loss vector
Uses norm.rvs() to draw repeatedly from standard Normal distribution

Draws match data using mu scaled by 1/ total_steps and sigma scaled by the square root of 1/ total_steps

import numpy as np

daily_loss = np.zeros(N)
for n in range(N):
    loss = ( mu * (1/total_steps) +
             norm.rvs(size=total_steps) * sigma * np.sqrt(1/total_steps) )



Monte Carlo simulation in Python
Step Three:
Generate cumulative daily_loss , for each run n

Use np.quantile() to find the VaR at e.g. 95% confidence level, over daily_loss

daily_loss = np.zeros(N)
for n in range(N):
    loss = ( mu * (1/total_steps) +
             norm.rvs(size=total_steps) * sigma * np.sqrt(1/total_steps) )
    daily_loss[n] = sum(loss)
VaR_95 = np.quantile(daily_loss, 0.95)
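
The three steps can also be written without an explicit loop by drawing all runs at once. This is a sketch of the same procedure under assumed inputs: mu and sigma are made-up daily loss parameters rather than estimates from portfolio_losses, and N is reduced to 2000 runs to keep memory use small.

```python
import numpy as np
from scipy.stats import norm

total_steps = 1440          # minutes in one day
N = 2000                    # number of simulation runs (reduced for the sketch)
mu, sigma = 0.0, 0.02       # hypothetical daily loss mean and standard deviation

# One row per run, one column per minute.
draws = norm.rvs(size=(N, total_steps), random_state=0)

# Scale the mean by 1/total_steps and the standard deviation by its square root,
# so that the per-minute losses add up to a daily loss with mean mu and std sigma.
minute_losses = mu / total_steps + draws * sigma * np.sqrt(1 / total_steps)

# Cumulative loss over the day for each run, then the 95% VaR across runs.
daily_loss = minute_losses.sum(axis=1)
VaR_95 = np.quantile(daily_loss, 0.95)
print(daily_loss.std(), VaR_95)
```

The square-root scaling is what makes the simulated daily standard deviation match sigma: variances, not standard deviations, add up across independent steps.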



Simulating asset returns
Refinement: generate random sample paths of asset returns in portfolio
Allows more realism: asset returns can be individually simulated

Asset returns can be correlated


Recall: efficient covariance matrix e_cov

Used in Step 2 to compute asset returns

Exercises: Monte Carlo simulation with asset return simulation
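
A sketch of the refinement: draw correlated asset returns from a multivariate Normal distribution and aggregate them into portfolio losses. The daily covariance matrix, zero mean returns and equal weights are assumptions for illustration; here e_cov is a made-up matrix, not the efficient covariance estimate from the earlier lesson.

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical daily covariance matrix for two correlated assets.
e_cov = np.array([[0.04, 0.018],
                  [0.018, 0.09]]) / 252
mu = np.zeros(2)                    # assumed mean daily returns
weights = np.array([0.5, 0.5])
N = 20000                           # number of simulated days

# Each row is one simulated day of correlated asset returns.
asset_returns = rng.multivariate_normal(mu, e_cov, size=N)

# Portfolio losses and the Monte Carlo VaR estimate.
losses = -(asset_returns @ weights)
VaR_95 = np.quantile(losses, 0.95)

# The simulated returns should reproduce the covariance that was fed in.
sample_cov = np.cov(asset_returns, rowvar=False)
print(VaR_95)
print(sample_cov)
```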



Let's practice!
Structural breaks

Jamsheed Shorish
Computational Economist
Risk and distribution
Risk management toolkit
Risk mitigation: MPT

Risk measurement: VaR, CVaR

Risk: dispersion, volatility


Variance (standard deviation) as risk definition

Connection between risk and distribution of risk factors as random variables



Stationarity
Assumption: distribution is same over time
Unchanging distribution = stationary

Global financial crisis period efficient frontier


Not stationary
Estimation techniques require stationarity
Historical: unknown stationary distribution from past data

Parametric: assumed stationary distribution class

Monte Carlo: assumed stationary distribution for random draws



Structural breaks
Non-stationary => perhaps distribution changes over time
Assume specific points in time for change
Break up data into sub-periods

Within each sub-period, assume stationarity

Structural break(s): point(s) of change


Change in 'trend' of average and/or volatility of data



Example: China's population growth
Examine period 1950 - 2019
Trend is roughly linear...

...but seems to slow down from around 1990

Possible structural break near 1990.

Implies distribution of net population (births - deaths) changed

Possible reasons: government policy, standard of living, etc.



The Chow test
Previous example: visual evidence for structural break
Quantification: statistical measure

Chow Test:
Test for existence of structural break given linear model

Null hypothesis: no break


Requires three OLS regressions
Regression for entire period

Two regressions, before and after break

Collect sum-of-squared residuals

Test statistic is distributed according to "F" distribution



The Chow test in Python
Hypothesis: structural break in 1990 for China population
Assume linear "factor model":
$\log(\mathrm{Population}_t) = \alpha + \beta \cdot \mathrm{Year}_t + u_t$

OLS regression using statsmodels 's OLS object over full period 1950 - 2019
Retrieve sum-of-squared residual res.ssr

import statsmodels.api as sm
res = sm.OLS(log_pop, year).fit()
print('SSR 1950-2019: ', res.ssr)

SSR 1950-2019: 0.29240576138055463



The Chow test in Python
Break 1950 - 2019 into 1950 - 1989 and 1990 - 2019 sub-periods
Perform OLS regressions on each sub-period
Retrieve res_before.ssr and res_after.ssr

pop_before = log_pop.loc['1950':'1989']; year_before = year.loc['1950':'1989']
pop_after = log_pop.loc['1990':'2019']; year_after = year.loc['1990':'2019']
res_before = sm.OLS(pop_before, year_before).fit()
res_after = sm.OLS(pop_after, year_after).fit()
print('SSR 1950-1989: ', res_before.ssr)
print('SSR 1990-2019: ', res_after.ssr)

SSR 1950-1989: 0.011741113017411783


SSR 1990-2019: 0.0013717593339608077



The Chow test in Python
Compute the F-distributed Chow test statistic
Compute the numerator
k = 2 degrees of freedom = 2 OLS coefficients α, β

Compute the denominator


66 degrees of freedom = total number of data points (70) - 2*k

ssr_total = res.ssr
ssr_before = res_before.ssr
ssr_after = res_after.ssr

numerator = (ssr_total - (ssr_before + ssr_after)) / 2
denominator = (ssr_before + ssr_after) / 66
chow_test = numerator / denominator
print("Chow test statistic: ", chow_test, "; Critical value, 99.9%: ", 7.7)

Chow test statistic: 702.8715822890057; Critical value, 99.9%: 7.7
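
The whole procedure fits in one short script. The sketch below uses synthetic data with a deliberate change of slope at observation 40, fits the three regressions with NumPy least squares instead of statsmodels, and looks up the critical value from the F distribution rather than hard-coding it. The helper ssr, the data and the break location are assumptions for illustration.

```python
import numpy as np
from scipy.stats import f

def ssr(x, y):
    """Sum of squared residuals of an OLS fit of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(2)

# Hypothetical series of 70 observations whose slope changes at observation 40.
x = np.arange(70, dtype=float)
y = np.where(x < 40, 0.02 * x, 0.8 + 0.005 * (x - 40)) + rng.normal(0, 0.01, 70)

k = 2                                   # coefficients per regression
ssr_total = ssr(x, y)                   # full period
ssr_before = ssr(x[:40], y[:40])        # before the proposed break
ssr_after = ssr(x[40:], y[40:])         # after the proposed break

numerator = (ssr_total - (ssr_before + ssr_after)) / k
denominator = (ssr_before + ssr_after) / (len(x) - 2 * k)
chow_test = numerator / denominator

# Critical value of the F(k, n - 2k) distribution at 99.9% confidence.
critical_value = f.ppf(0.999, k, len(x) - 2 * k)
print(chow_test, critical_value)
```

With 70 observations and k = 2 the degrees of freedom are (2, 66), matching the slide, and a statistic above the critical value rejects the null hypothesis of no break.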



Let's practice!
Volatility and extreme values

Jamsheed Shorish
Computational Economist
Chow test assumptions
Chow test: identify statistical significance of possible structural break
Requires: pre-specified point of structural break
Requires: linear relation (e.g. factor model)
$\log(\mathrm{Population}_t) = \alpha + \beta \cdot \mathrm{Year}_t + u_t$

Structural break indications
Visualization of trend may not indicate break point
Alternative: examine volatility rather than trend
Structural change often accompanied by greater uncertainty => volatility
Allows richer models to be considered (e.g. stochastic volatility models)

Rolling window volatility
Rolling window: compute volatility over time and detect changes

Recall: 30-day rolling window


Create rolling window from ".rolling()" method

Compute the volatility of the rolling window (drop unavailable dates)

Compute summary statistic of interest, e.g. .mean() , .min() , etc.

rolling = portfolio_returns.rolling(30)
volatility = rolling.std().dropna()
vol_mean = volatility.resample("M").mean()
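
To see how rolling volatility reveals a change, here is a small sketch on simulated returns. The series, the dates, and the jump in volatility are all made up for illustration; only the `.rolling(30).std().dropna()` pattern comes from the slide.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns (hypothetical): volatility triples halfway through
rng = np.random.default_rng(1)
dates = pd.date_range("2020-01-01", periods=500, freq="D")
scale = np.where(np.arange(500) < 250, 0.01, 0.03)
portfolio_returns = pd.Series(rng.normal(0, scale), index=dates)

# 30-day rolling standard deviation, dropping the first 29 incomplete windows
volatility = portfolio_returns.rolling(30).std().dropna()

# Compare average volatility in the calm and turbulent halves
vol_first = volatility.loc[:dates[249]].mean()
vol_second = volatility.loc[dates[280]:].mean()
print("Average volatility, first half: ", vol_first)
print("Average volatility, second half:", vol_second)
```

The second window starts 30 days after the change so that no rolling window mixes the two regimes.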

Rolling window volatility
Visualize resulting volatility (variance or standard deviation)

import matplotlib.pyplot as plt
vol_mean.plot(
    title="Monthly average volatility"
).set_ylabel("Standard deviation")
plt.show()

Rolling window volatility
Visualize resulting volatility (variance or standard deviation)

# Raw strings keep the backslash in \Delta from being read as an escape
vol_mean.pct_change().plot(
    title=r"$\Delta$ average volatility"
).set_ylabel(r"% $\Delta$ stdev")
plt.show()

Large changes in volatility => possible structural break point(s)
Use proposed break points in linear model of volatility
Variant of Chow Test
Guidance for applying e.g. ARCH, stochastic volatility models

Extreme values
VaR, CVaR: maximum loss, expected shortfall at particular confidence level
Visualize changes in maximum loss by plotting VaR?
Useful for large datasets

Small datasets: not enough information

Alternative: find losses exceeding some threshold

Example: VaR95 is maximum loss 95% of the time


So 5% of the time, losses can be expected to exceed VaR95

Backtesting: use previous data ex-post to see how risk estimate performs
Used extensively in enterprise risk management

Backtesting
Suppose VaR95 = 0.03
Losses exceeding 3% are then extreme values
Backtesting: around 5% (100% - 95%) of previous losses should exceed 3%
More than 5%: distribution with wider ("fatter") tails
Less than 5%: distribution with narrower tails
CVaR for backtesting: accounts for tail better than VaR
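
The backtesting check described above amounts to counting exceedances. A minimal sketch, assuming simulated fat-tailed losses (the data and the half-and-half split are illustrative choices, not from the course):

```python
import numpy as np

# Synthetic daily losses (hypothetical): fat-tailed Student-t with 3 degrees of freedom
rng = np.random.default_rng(42)
losses = 0.01 * rng.standard_t(df=3, size=5000)

# Estimate VaR95 on the first half of the data only
estimation, backtest = losses[:2500], losses[2500:]
VaR_95 = np.quantile(estimation, 0.95)

# Backtest: share of later losses that exceed the estimate
exceedance_rate = np.mean(backtest > VaR_95)
print("VaR_95:", VaR_95)
print("Exceedance rate in backtest sample:", exceedance_rate)
```

An exceedance rate well above 5% would suggest the tail is fatter than the estimate assumed; a rate well below 5% suggests the opposite.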

Extreme value theory

Extreme values
Portfolio losses: extreme values
Extreme values: from tail of distribution
Tail losses: losses exceeding some value
Model tail losses => better risk management

Extreme value theory
Extreme value theory: statistical distribution of extreme values
Block maxima:
Break period into sub-periods
Form blocks from each sub-period
Set of block maxima = dataset
Peak over threshold (POT):
Find all losses over given level
Set of such losses = dataset
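
The two ways of building an extreme-value dataset can be compared side by side. This is a sketch on simulated losses; the 95% quantile used as the POT threshold is an arbitrary illustrative choice.

```python
import numpy as np
import pandas as pd

# Synthetic daily losses (hypothetical) over roughly three years
rng = np.random.default_rng(7)
dates = pd.date_range("2007-01-01", "2009-12-31", freq="D")
losses = pd.Series(rng.normal(0, 0.02, len(dates)), index=dates)

# Block maxima: one observation per weekly block
block_maxima = losses.resample("W").max()

# Peak over threshold: every loss above a chosen level (here the 95% quantile)
threshold = losses.quantile(0.95)
exceedances = losses[losses > threshold]

print("Number of block maxima:", len(block_maxima))
print("Number of threshold exceedances:", len(exceedances))
```

Block maxima keep exactly one value per period, while POT keeps every large loss, so several exceedances may fall in the same week.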

Generalized Extreme Value Distribution
Example: Block maxima for 2007 - 2009
Resample losses with desired period (e.g. weekly)

maxima = losses.resample("W").max()

Generalized Extreme Value Distribution (GEV)
Distribution of maxima of data
Example: parametric estimation using scipy.stats.genextreme

from scipy.stats import genextreme
params = genextreme.fit(maxima)

VaR and CVaR from GEV distribution
99% VaR from GEV distribution
Use .ppf() percent point function to find 99% VaR
Requires params from fitted GEV distribution
Finds maximum loss over one week period at 99% confidence
99% CVaR from GEV distribution
CVaR is conditional expectation of loss given VaR as minimum loss
Use .expect() method to find expected value

VaR_99 = genextreme.ppf(0.99, *params)
# params = (shape, loc, scale); .expect() takes the shape via args and loc/scale as keywords
CVaR_99 = (1 / (1 - 0.99)) * genextreme.expect(lambda x: x, args=(params[0],),
                                               loc=params[1], scale=params[2], lb=VaR_99)
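
For a version that runs end to end, the following sketch fits a GEV distribution to simulated weekly maxima and then computes both risk measures. The simulated data and its parameters are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import genextreme

# Synthetic weekly maxima (hypothetical), drawn from a known GEV distribution
maxima = genextreme.rvs(-0.1, loc=0.02, scale=0.01, size=500, random_state=3)

# Fit returns (shape, loc, scale)
shape, loc, scale = genextreme.fit(maxima)

# 99% VaR: the 99th percentile of the fitted distribution
VaR_99 = genextreme.ppf(0.99, shape, loc=loc, scale=scale)

# 99% CVaR: expected loss in the worst 1% of cases
tail_integral = genextreme.expect(lambda x: x, args=(shape,), loc=loc, scale=scale, lb=VaR_99)
CVaR_99 = tail_integral / (1 - 0.99)

print("VaR_99: ", VaR_99)
print("CVaR_99:", CVaR_99)
```

Because CVaR averages only the losses beyond VaR, it should always come out at least as large as VaR.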

Covering losses
Risk management: covering losses
Regulatory requirement (banks, insurance)
Reserves must be available to cover losses
For a specified period (e.g. one week)
At a specified confidence level (e.g. 99%)
VaR from GEV distribution: estimates maximum loss for a given period and a given confidence level

Covering losses
Example: Initial portfolio value = $1,000,000

One week reserve requirement at 99% confidence


VaR99 from GEV distribution: maximum loss over one week at 99% confidence
Reserve requirement: Portfolio value x VaR99
Suppose VaR99 = 0.10, i.e. 10% maximum loss

Reserve requirement = $100,000

Portfolio value changes => reserve requirement changes

Regulation sets frequency of reserve requirement updating
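
The reserve calculation in this example is a single multiplication. A short sketch using the slide's numbers, plus one hypothetical updated portfolio value to show how the requirement moves:

```python
# Values taken from the example above
portfolio_value = 1_000_000
VaR_99 = 0.10  # 10% maximum weekly loss at 99% confidence

reserve_requirement = portfolio_value * VaR_99
print("Reserve requirement:", reserve_requirement)

# If the portfolio value changes, the requirement scales with it
# (1,200,000 is a hypothetical new value)
new_reserve = 1_200_000 * VaR_99
print("Updated reserve requirement:", new_reserve)
```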

Kernel density estimation

The histogram revisited
Risk factor distributions
Assumed (e.g. Normal, T, etc.)
Fitted (parametric estimation, Monte Carlo simulation)
Ignored (historical simulation)
Actual data: histogram
How to represent histogram by probability distribution?
Smooth data using filtering
Non-parametric estimation

Data smoothing
Filter: smooth out 'bumps' of histogram
Observations accumulate over time
Pick particular portfolio loss
Examine nearby losses
Form "weighted average" of losses
Kernel: filter choice; determines "window"
Move window to another loss
Kernel density estimate: probability density
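
The moving-window idea above can be written out by hand. This is a sketch of a Gaussian kernel density estimate in plain NumPy; the function name, the simulated losses, and the fixed bandwidth are assumptions for illustration (scipy's `gaussian_kde` chooses a bandwidth automatically).

```python
import numpy as np

def gaussian_kernel_density(data, grid, bandwidth):
    """Average of Gaussian bumps centered on each observation, evaluated on grid."""
    # Distance of every grid point from every observation, in bandwidth units
    z = (grid[:, None] - data[None, :]) / bandwidth
    weights = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return weights.mean(axis=1) / bandwidth

# Synthetic losses (hypothetical) and an arbitrary bandwidth chosen for illustration
rng = np.random.default_rng(5)
losses = rng.normal(0.0, 0.02, 300)
grid = np.linspace(-0.15, 0.15, 601)
density = gaussian_kernel_density(losses, grid, bandwidth=0.005)

# A probability density should integrate to roughly one
area = np.sum(density) * (grid[1] - grid[0])
print("Area under the estimated density:", area)
```

A larger bandwidth widens the window and gives a smoother curve; a smaller one follows the histogram's bumps more closely.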

The Gaussian kernel
Continuous kernel
Weights all observations by distance from center
Generally: many different kernels are available
Used in time series analysis
Used in signal processing

KDE in Python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

kde = gaussian_kde(losses)
loss_range = np.linspace(np.min(losses), np.max(losses), 1000)
plt.plot(loss_range, kde.pdf(loss_range))

Visualization: probability density function from KDE fit

Finding VaR using KDE
VaR: use gaussian_kde .resample() method
Find quantile of resulting sample
CVaR: expected value as previously encountered, but
gaussian_kde has no .expect() method => compute integral manually
special .expect() method written for exercise

sample = kde.resample(size = 1000)
VaR_99 = np.quantile(sample, 0.99)
print("VaR_99 from KDE: ", VaR_99)

VaR_99 from KDE: 0.08796423698448601
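
Since gaussian_kde has no .expect() method, one simple alternative to integrating by hand is a Monte Carlo estimate from the resampled losses. The sketch below uses simulated losses and is not the course's special .expect() helper; the sample size and seed are arbitrary.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic losses (hypothetical) standing in for historical portfolio losses
rng = np.random.default_rng(11)
losses = rng.normal(0.0, 0.03, 1000)

kde = gaussian_kde(losses)

# Draw a large sample from the fitted density; resample returns shape (1, size)
sample = kde.resample(size=100_000, seed=11).ravel()

VaR_99 = np.quantile(sample, 0.99)
# Monte Carlo estimate of CVaR: average of simulated losses at or beyond VaR
CVaR_99 = sample[sample >= VaR_99].mean()

print("VaR_99 from KDE: ", VaR_99)
print("CVaR_99 from KDE:", CVaR_99)
```

A larger resample reduces the simulation noise in both estimates.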

Neural network risk management

Real-time portfolio updating
Risk management
Defined risk measures (VaR, CVaR)
Estimated risk measures (parametric, historical, Monte Carlo)
Optimized portfolio (e.g. Modern Portfolio Theory)
New market information => update portfolio weights
Problem: portfolio optimization costly
Solution: weights = f(prices)
Evaluate f in real-time
Update f only occasionally

Neural networks
Neural Network: output = f(input)
Neuron: interconnected processing node in function
Initially developed 1940s-1950s
Early 2000s: application of neural networks to "big data"
Image recognition, processing
Financial data
Search engine data
Deep Learning: neural networks as part of Machine Learning
2015: Google releases open-source TensorFlow deep learning library for Python

Neural network structure
Layers: connected processing neurons
Input layer
Hidden layer
Output layer
Training: learn relationship between input and output
Asset prices => Input layer
Input + hidden layer processing
Hidden + output layer processing
Output => portfolio weights

Using neural networks for portfolio optimization
Training
Compare output and pre-existing "best" portfolio weights
Goal: minimize "error" between output and weights

Small error => network is trained

Usage
Input: new, unseen asset prices

Output: predicted "best" portfolio weights for new asset prices

Best weights = risk management

Creating neural networks in Python
Keras: high-level Python library for neural networks/deep learning
Further info: Introduction to Deep Learning with Keras

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# 4 asset prices in, one hidden layer of 10 sigmoid neurons, 4 portfolio weights out
model = Sequential()
model.add(Dense(10, input_dim=4, activation='sigmoid'))
model.add(Dense(4))
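
To make "output = f(input)" concrete, here is what a network of this shape computes, written in plain NumPy rather than Keras. The weights are random and untrained, and the input values are made up; this only illustrates the arithmetic of the two layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Randomly initialized parameters (hypothetical; training would adjust these)
rng = np.random.default_rng(0)
W_hidden, b_hidden = rng.normal(size=(4, 10)), np.zeros(10)
W_output, b_output = rng.normal(size=(10, 4)), np.zeros(4)

def forward(prices):
    """Map a batch of 4 asset prices to 4 raw output values."""
    hidden = sigmoid(prices @ W_hidden + b_hidden)   # hidden layer, 10 neurons
    return hidden @ W_output + b_output              # linear output layer

# One observation of four normalized asset prices (made-up numbers)
new_asset_prices = np.array([[1.01, 0.55, 0.33, 2.51]])
print(forward(new_asset_prices))
```

Note that the raw outputs of a linear final layer are not guaranteed to be non-negative or to sum to one.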

Training the network in Python
Historical asset prices: training_input matrix
Historical portfolio weights: training_output vector
Compile model with:
given error minimization ('loss')
given optimization algorithm ('optimizer')
Fit model to training data
epochs: number of training loops to update internal parameters

model.compile(loss='mean_squared_error', optimizer='rmsprop')
model.fit(training_input, training_output, epochs=100)

Risk management in Python
Usage: provide new (e.g. real-time) asset pricing data
New vector new_asset_prices given to input layer

Evaluate network using model.predict() on new prices


Result: predicted portfolio weights

Accumulate enough data over time => re-train network


Test network on previous data => backtesting

# new asset prices are in the vector new_asset_prices
predicted = model.predict(new_asset_prices)

Wrap-up and Future Steps

Congratulations!

Tools in your toolkit

Future steps and reference
Upcoming DataCamp courses
Credit Risk Modeling in Python
Financial Forecasting in Python
Machine Learning for Finance in Python
GARCH Models for Finance in Python

Quantitative Risk Management: Concepts, Techniques and Tools, McNeil, Frey & Embrechts, Princeton UP, 2015.

Best of luck on your data science journey!

Understanding credit risk
CREDIT RISK MODELING IN PYTHON

Michael Crabtree
Data Scientist, Ford Motor Company
What is credit risk?
The possibility that someone who has borrowed money will not repay it all
Calculated risk difference between lending someone money and a government bond
When someone fails to repay a loan, it is said to be in default
The likelihood that someone will default on a loan is the probability of default (PD)

Payment    Payment Date    Loan Status
$100       Jun 15          Non-Default
$100       Jul 15          Non-Default
$0         Aug 15          Default

Expected loss
The dollar amount the firm loses as a result of loan default
Three primary components:
Probability of Default (PD)
Exposure at Default (EAD)
Loss Given Default (LGD)
Formula for expected loss:

expected_loss = PD * EAD * LGD
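
Applied per loan and summed, the formula gives a portfolio-level figure. A minimal sketch with three hypothetical loans (all numbers invented for illustration):

```python
import pandas as pd

# Hypothetical loans: PD as a probability, EAD in dollars, LGD as a fraction of EAD
loans = pd.DataFrame({
    "PD": [0.02, 0.10, 0.35],
    "EAD": [10_000, 5_000, 2_000],
    "LGD": [0.40, 0.60, 1.00],
})

# Expected loss per loan, then for the whole portfolio
loans["expected_loss"] = loans["PD"] * loans["EAD"] * loans["LGD"]
total_expected_loss = loans["expected_loss"].sum()

print(loans)
print("Total expected loss:", total_expected_loss)
```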

Types of data used
Two primary types of data used:
Application data
Behavioral data

Application      Behavioral
Interest Rate    Employment Length
Grade            Historical Default
Amount           Income

Data columns
Mix of behavioral and application
Contain columns simulating credit bureau data

Column               Column
Income               Loan grade
Age                  Loan amount
Home ownership       Interest rate
Employment length    Loan status
Loan intent          Historical default
Percent Income       Credit history length

Exploring with cross tables
pd.crosstab(cr_loan['person_home_ownership'], cr_loan['loan_status'],
values=cr_loan['loan_int_rate'], aggfunc='mean').round(2)

Exploring with visuals
plt.scatter(cr_loan['person_income'], cr_loan['loan_int_rate'],c='blue', alpha=0.5)
plt.xlabel("Personal Income")
plt.ylabel("Loan Interest Rate")
plt.show()

Outliers in Credit Data

Data processing
Prepared data allows models to train faster
Often positively impacts model performance

Outliers and performance
Possible causes of outliers:
Problems with data entry systems (human error)
Issues with data ingestion tools

Feature              Coefficient With Outliers    Coefficient Without Outliers
Interest Rate        0.2                          0.01
Employment Length    0.5                          0.6
Income               0.6                          0.75

Detecting outliers with cross tables
Use cross tables with aggregate functions

pd.crosstab(cr_loan['person_home_ownership'], cr_loan['loan_status'],
values=cr_loan['loan_int_rate'], aggfunc='mean').round(2)

Detecting outliers visually
Histograms
Scatter plots

Removing outliers
Use the .drop() method within Pandas

# Find the rows with an implausible employment length (60 years or more)
indices = cr_loan[cr_loan['person_emp_length'] >= 60].index
cr_loan.drop(indices, inplace=True)

Risk with missing data in loan data

What is missing data?
NULLs in a row instead of an actual value

An empty string ''

Not an entirely empty row

Can occur in any column in the data

Similarities with outliers
Negatively affect machine learning model performance
May bias models in unanticipated ways
May cause errors for some machine learning models

Missing Data Type         Possible Result
NULL in numeric column    Error
NULL in string column     Error

How to handle missing data
Generally three ways to handle missing data
Replace values where the data is missing
Remove the rows containing missing data
Leave the rows with missing data unchanged
Understanding the data determines the course of action

Missing Data           Interpretation                   Action
NULL in loan_status    Loan recently approved           Remove from prediction data
NULL in person_age     Age not recorded or disclosed    Replace with median

Finding missing data
Null values are easily found by using the isnull() function

Null records can easily be counted with the sum() function

.any() method checks all columns

null_columns = cr_loan.columns[cr_loan.isnull().any()]
cr_loan[null_columns].isnull().sum()

# Total number of null values per column


person_home_ownership 25
person_emp_length 895
loan_intent 25
loan_int_rate 3140
cb_person_default_on_file 15

Replacing Missing data
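Replace the missing data using methods like .fillna() with aggregate functions and methods

# Assign the result back; fillna(inplace=True) on a selected column is deprecated in recent pandas
cr_loan['loan_int_rate'] = cr_loan['loan_int_rate'].fillna(cr_loan['loan_int_rate'].mean())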

Dropping missing data
Uses indices to identify records, the same as with outliers

Remove the records entirely using the .drop() method

indices = cr_loan[cr_loan['person_emp_length'].isnull()].index
cr_loan.drop(indices, inplace=True)
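
The find, replace, and drop steps can be tried together on a toy frame. The four rows below are invented; only the column names follow the slides.

```python
import numpy as np
import pandas as pd

# Small made-up frame using column names from the slides
cr_loan = pd.DataFrame({
    "person_emp_length": [1.0, np.nan, 5.0, 12.0],
    "loan_int_rate": [11.0, 13.5, np.nan, 7.5],
    "loan_status": [0, 1, 0, 0],
})

# Count missing values in the columns that contain any
null_columns = cr_loan.columns[cr_loan.isnull().any()]
print(cr_loan[null_columns].isnull().sum())

# Replace missing interest rates with the column mean
cr_loan["loan_int_rate"] = cr_loan["loan_int_rate"].fillna(cr_loan["loan_int_rate"].mean())

# Drop rows where employment length is missing
indices = cr_loan[cr_loan["person_emp_length"].isnull()].index
cr_loan = cr_loan.drop(indices)
print(cr_loan)
```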

Logistic regression for probability of default

Probability of default
The likelihood that someone will default on a loan is the probability of default
A probability value between 0 and 1, like 0.86
loan_status of 1 is a default or 0 for non-default

Probability of Default    Interpretation              Predicted loan status
0.4                       Unlikely to default         0
0.90                      Very likely to default      1
0.1                       Very unlikely to default    0

Predicting probabilities
Probabilities of default as an outcome from machine learning
Learn from data in columns (features)
Classification models (default, non-default)
Two most common models:
Logistic regression
Decision tree

Logistic regression
Similar to the linear regression, but only produces values between 0 and 1

Training a logistic regression
Logistic regression available within the scikit-learn package

from sklearn.linear_model import LogisticRegression

Called as a function with or without parameters

clf_logistic = LogisticRegression(solver='lbfgs')

Uses the method .fit() to train

clf_logistic.fit(training_columns, np.ravel(training_labels))

Training Columns: all of the columns in our data except loan_status

Labels: loan_status (0,1)

Training and testing
Entire data set is usually split into two parts

Data Subset    Usage                                          Portion
Train          Learn from the data to generate predictions    60%
Test           Test learning on new unseen data               40%

Creating the training and test sets
Separate the data into training columns and labels

X = cr_loan.drop('loan_status', axis = 1)
y = cr_loan[['loan_status']]

Use train_test_split() function already within scikit-learn

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.4, random_state=123)

test_size : proportion of data for test set
random_state : a random seed value for reproducibility

Predicting the probability of default

Logistic regression coefficients
# Model Intercept
array([-3.30582292e-10])
# Coefficients for ['loan_int_rate','person_emp_length','person_income']
array([[ 1.28517496e-09, -2.27622202e-09, -2.17211991e-05]])

# Calculating probability of default
int_coef_sum = (-3.3e-10 +
    (1.29e-09 * loan_int_rate) + (-2.28e-09 * person_emp_length) + (-2.17e-05 * person_income))
prob_default = 1 / (1 + np.exp(-int_coef_sum))
prob_nondefault = 1 - (1 / (1 + np.exp(-int_coef_sum)))

Interpreting coefficients
# Intercept
intercept = -1.02
# Coefficient for employment length
person_emp_length_coef = -0.056

For every 1 year increase in person_emp_length , the person is less likely to default

intercept    person_emp_length    value * coef    probability of default
-1.02        10                   (10 * -0.06)    .17
-1.02        11                   (11 * -0.06)    .16
-1.02        12                   (12 * -0.06)    .15
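
The probabilities in this table follow from passing intercept + value * coef through the logistic function. A short sketch, using the coefficient rounded to -0.06 as in the table:

```python
import numpy as np

intercept = -1.02
coef = -0.06  # employment-length coefficient, rounded as in the table above

probs = []
for emp_length in (10, 11, 12):
    log_odds = intercept + emp_length * coef
    prob_default = 1 / (1 + np.exp(-log_odds))
    probs.append(prob_default)
    print(emp_length, round(prob_default, 2))
```

Each additional year lowers the log-odds by a constant 0.06, while the change in probability depends on where on the curve the borrower sits.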

Using non-numeric columns
Numeric: loan_int_rate , person_emp_length , person_income

Non-numeric:

cr_loan_clean['loan_intent']

EDUCATION
MEDICAL
VENTURE
PERSONAL
DEBTCONSOLIDATION
HOMEIMPROVEMENT

Will cause errors with machine learning models in Python unless processed

One-hot encoding
Represent a string with a number

0 or 1 in a new column column_VALUE

Get dummies
Use the get_dummies() function within pandas

# Separate the numeric columns
cred_num = cr_loan.select_dtypes(exclude=['object'])
# Separate non-numeric columns
cred_cat = cr_loan.select_dtypes(include=['object'])
# One-hot encode the non-numeric columns only
cred_cat_onehot = pd.get_dummies(cred_cat)
# Union the numeric columns with the one-hot encoded columns
cr_loan = pd.concat([cred_num, cred_cat_onehot], axis=1)
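
A tiny example shows what the encoded frame looks like. The three rows are made up, the result is stored under the illustrative name `cr_loan_prep`, and the string column is created with an explicit object dtype so that the `select_dtypes` calls behave the same across pandas versions.

```python
import pandas as pd

# Made-up rows using two column names from the slides
cr_loan = pd.DataFrame({
    "person_income": [52000, 38000, 91000],
    "loan_intent": pd.Series(["EDUCATION", "MEDICAL", "EDUCATION"], dtype="object"),
})

cred_num = cr_loan.select_dtypes(exclude=["object"])
cred_cat = cr_loan.select_dtypes(include=["object"])
cred_cat_onehot = pd.get_dummies(cred_cat)
cr_loan_prep = pd.concat([cred_num, cred_cat_onehot], axis=1)

print(list(cr_loan_prep.columns))
```

Each original category becomes its own column named column_VALUE, and exactly one of those columns is set for each row.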

Predicting the future, probably
Use the .predict_proba() method within scikit-learn

# Train the model
clf_logistic.fit(X_train, np.ravel(y_train))
# Predict using the model
clf_logistic.predict_proba(X_test)

Creates array of probabilities of default

# Probabilities: [[non-default, default]]
array([[0.55, 0.45]])

Credit model performance

Model accuracy scoring
Calculate accuracy

Use the .score() method from scikit-learn

# Check the accuracy against the test data
clf_logistic.score(X_test,y_test)

0.81

81% of values for loan_status predicted correctly

ROC curve charts
Receiver Operating Characteristic curve
Plots true positive rate (sensitivity) against false positive rate (fall-out)

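from sklearn.metrics import roc_curve
# prob_default: predicted probabilities of default for the test set
fallout, sensitivity, thresholds = roc_curve(y_test, prob_default)
plt.plot(fallout, sensitivity, color = 'darkorange')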

Analyzing ROC charts
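Area Under Curve (AUC): area under the ROC curve (0.5 for random prediction); the larger the area between the curve and the random-prediction diagonal, the better the model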

Default thresholds
Threshold: the probability above which a loan is labeled a default

Setting the threshold
Relabel loans based on our threshold of 0.5

preds = clf_logistic.predict_proba(X_test)
preds_df = pd.DataFrame(preds[:,1], columns = ['prob_default'])
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.5 else 0)
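
The effect of the threshold is easy to see on a handful of probabilities. In this sketch the four probabilities are invented, and 0.4 and 0.6 are illustrative alternatives to the 0.5 used above.

```python
import pandas as pd

# Made-up predicted probabilities of default
preds_df = pd.DataFrame({"prob_default": [0.10, 0.45, 0.55, 0.90]})

# Count predicted defaults at several thresholds
counts = {}
for threshold in (0.4, 0.5, 0.6):
    loan_status = (preds_df["prob_default"] > threshold).astype(int)
    counts[threshold] = int(loan_status.sum())
    print("threshold", threshold, "-> predicted defaults:", counts[threshold])
```

Lowering the threshold labels more loans as defaults, which tends to raise default recall at the cost of more false alarms.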

Credit classification reports
classification_report() within scikit-learn

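from sklearn.metrics import classification_report
# target_names labels the two classes in the report (these labels are an assumed example)
target_names = ['Non-Default', 'Default']
classification_report(y_test, preds_df['loan_status'], target_names=target_names)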

Selecting classification metrics
Select and store specific components from the classification_report()
Use the precision_recall_fscore_support() function from scikit-learn

from sklearn.metrics import precision_recall_fscore_support
# The first [1] selects recall; the second [1] selects the default class
precision_recall_fscore_support(y_test,preds_df['loan_status'])[1][1]

Model discrimination and impact

Confusion matrices
Shows the number of correct and incorrect predictions for each loan_status

Default recall for loan status
Default recall (or sensitivity) is the proportion of true defaults that are correctly predicted

Recall portfolio impact
Classification report - Underperforming Logistic Regression model
Number of true defaults: 50,000

Loan Amount    Defaults Predicted / Not Predicted    Estimated Loss on Defaults
$50            .04 / .96                             (50000 x .96) x 50 = $2,400,000

Recall, precision, and accuracy
Difficult to maximize all of them because there is a trade-off
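
The three metrics can be computed directly from confusion-matrix counts. The counts below are hypothetical, chosen so that 50,000 true defaults give a default recall of 0.04; they show how accuracy can look good while recall is poor.

```python
# Hypothetical confusion-matrix counts, with default as the positive class
true_pos, false_neg = 2_000, 48_000      # defaults caught / missed
false_pos, true_neg = 1_000, 449_000     # non-defaults flagged / cleared

recall = true_pos / (true_pos + false_neg)
precision = true_pos / (true_pos + false_pos)
accuracy = (true_pos + true_neg) / (true_pos + false_neg + false_pos + true_neg)

print("Default recall:   ", recall)
print("Default precision:", precision)
print("Accuracy:         ", accuracy)
```

When defaults are a small share of all loans, accuracy is dominated by the non-default class, which is why default recall is reported separately.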

Gradient boosted trees with XGBoost

Decision trees
Creates predictions similar to logistic regression

Not structured like a regression

Decision trees for loan status
Simple decision tree for predicting loan_status probability of default

Decision tree impact

Loan    True loan status    Pred. Loan Status    Loan payoff value    Selling Value    Gain/Loss
1       0                   1                    $1,500               $250             -$1,250
2       0                   1                    $1,200               $250             -$950

A forest of trees
XGBoost uses many simplistic trees (ensemble)
Each tree will be slightly better than a coin toss

Creating and training trees
Part of the xgboost Python package, called xgb here

Trains with .fit() just like the logistic regression model

import xgboost as xgb
from sklearn.linear_model import LogisticRegression

# Create a logistic regression model
clf_logistic = LogisticRegression()
# Train the logistic regression
clf_logistic.fit(X_train, np.ravel(y_train))

# Create a gradient boosted tree model
clf_gbt = xgb.XGBClassifier()
# Train the gradient boosted tree
clf_gbt.fit(X_train,np.ravel(y_train))

Default predictions with XGBoost
Predicts with both .predict() and .predict_proba()
.predict_proba() produces a value between 0 and 1

.predict() produces a 1 or 0 for loan_status

# Predict probabilities of default
gbt_preds_prob = clf_gbt.predict_proba(X_test)
# Predict loan_status as a 1 or 0
gbt_preds = clf_gbt.predict(X_test)

# gbt_preds_prob
array([[0.059, 0.940], [0.121, 0.989]])
# gbt_preds
array([1, 1, 0...])

Hyperparameters of gradient boosted trees
Hyperparameters: model parameters (settings) that cannot be learned from data
Some common hyperparameters for gradient boosted trees
learning_rate : smaller values make each step more conservative
max_depth : sets how deep each tree can go, larger means more complex

xgb.XGBClassifier(learning_rate = 0.2,
                  max_depth = 4)

Column selection for credit risk

Choosing specific columns
We've been using all columns for predictions

# Selects a few specific columns
X_multi = cr_loan_prep[['loan_int_rate','person_emp_length']]

# Selects all data except loan_status
X = cr_loan_prep.drop('loan_status', axis = 1)

How can you tell how important each column is?
Logistic Regression: column coefficients
Gradient Boosted Trees: ?

Column importances
Use the .get_booster() and .get_score() methods
Weight: the number of times the column appears in all trees

# Train the model


clf_gbt.fit(X_train,np.ravel(y_train))
# Print the feature importances
clf_gbt.get_booster().get_score(importance_type = 'weight')

{'person_home_ownership_RENT': 1, 'person_home_ownership_OWN': 2}

Column importance interpretation
# Column importances from importance_type = 'weight'
{'person_home_ownership_RENT': 1, 'person_home_ownership_OWN': 2}

Plotting column importances
Use the plot_importance() function

xgb.plot_importance(clf_gbt, importance_type = 'weight')


{'person_income': 315, 'loan_int_rate': 195, 'loan_percent_income': 146}

Choosing training columns
Column importance is sometimes used to decide which columns to use for training
Different sets affect the performance of the models

Columns                                                  Importances    Model Accuracy    Model Default Recall
loan_int_rate, person_emp_length                         (100, 100)     0.81              0.67
loan_int_rate, person_emp_length, loan_percent_income    (98, 70, 5)    0.84              0.52

CREDIT RISK MODELING IN PYTHON


F1 scoring for models
Thinking about accuracy and recall for different column groups is time consuming

F1 score is a single metric that combines precision and recall (their harmonic mean)

Shows up as a part of the classification_report()
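
As a rough illustration of how the F1 score is built, the sketch below computes it from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the course data.

```python
def f1_score_from_counts(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive (default) class."""
    precision = tp / (tp + fp)  # share of predicted defaults that were real defaults
    recall = tp / (tp + fn)     # share of real defaults that were caught
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 60 true positives, 20 false positives, 40 false negatives
f1 = f1_score_from_counts(tp=60, fp=20, fn=40)
print(round(f1, 3))
```

Because it is a harmonic mean, the score stays low when either precision or recall is low, which makes it convenient for comparing column groups with one number.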

CREDIT RISK MODELING IN PYTHON


Let's practice!
CREDIT RISK MODELING IN PYTHON
Cross validation for
credit models
CREDIT RISK MODELING IN PYTHON

Michael Crabtree
Data Scientist, Ford Motor Company
Cross validation basics
Used to train and test the model in a way that simulates using the model on new data

Segments training data into different pieces to estimate future performance

Uses DMatrix , an internal structure optimized for XGBoost

Early stopping tells cross validation to stop after a scoring metric has not improved after a
number of iterations

CREDIT RISK MODELING IN PYTHON


How cross validation works
Processes parts of the training data (called folds) and tests against the unused part

Final testing against the actual test set
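
The fold mechanics can be sketched with NumPy alone. This is a minimal illustration, not how XGBoost or scikit-learn implement it; the helper name `k_fold_indices` and the row count are assumptions made for the example.

```python
import numpy as np

def k_fold_indices(n_rows, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is held out exactly once."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)
    folds = np.array_split(shuffled, n_folds)
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, test_idx

# Hypothetical training set of 10 rows split into 5 folds
splits = list(k_fold_indices(n_rows=10, n_folds=5))
for train_idx, test_idx in splits:
    print(len(train_idx), "training rows,", len(test_idx), "held-out rows")
```

Every row is used for testing once and for training in the remaining rounds, which is what lets the averaged score estimate performance on new data.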

1 https://scikit-learn.org/stable/modules/cross_validation.html

CREDIT RISK MODELING IN PYTHON


Setting up cross validation within XGBoost
# Set the number of folds
n_folds = 2
# Set early stopping number
early_stop = 5
# Set any specific parameters for cross validation
params = {'objective': 'binary:logistic',
'seed': 99, 'eval_metric':'auc'}

'objective': 'binary:logistic' is used to specify classification for loan_status

'eval_metric':'auc' tells XGBoost to score the model's performance on AUC

CREDIT RISK MODELING IN PYTHON


Using cross validation within XGBoost
# Restructure the train data for xgboost
DTrain = xgb.DMatrix(X_train, label = y_train)
# Perform cross validation
xgb.cv(params, DTrain, num_boost_round = 5, nfold=n_folds,
early_stopping_rounds=early_stop)

DMatrix() creates a special object for xgboost optimized for training

CREDIT RISK MODELING IN PYTHON


The results of cross validation
Creates a data frame of the values from the cross validation

CREDIT RISK MODELING IN PYTHON


Cross validation scoring
Uses cross validation and scoring metrics with cross_val_score() function in scikit-learn

# Import the module


from sklearn.model_selection import cross_val_score
# Create a gbt model
gbt = xgb.XGBClassifier(learning_rate = 0.4, max_depth = 10)
# Use cross validation and accuracy scores 5 consecutive times
cross_val_score(gbt, X_train, y_train, cv = 5)

array([0.92748092, 0.92575308, 0.93975392, 0.93378608, 0.93336163])

CREDIT RISK MODELING IN PYTHON


Let's practice!
CREDIT RISK MODELING IN PYTHON
Class imbalance in
loan data
CREDIT RISK MODELING IN PYTHON

Michael Crabtree
Data Scientist, Ford Motor Company
Not enough defaults in the data
The values of loan_status are the classes
Non-default: 0

Default: 1

y_train['loan_status'].value_counts()

loan_status | Training Data Count | Percentage of Total
0 | 13,798 | 78%
1 | 3,877 | 22%

CREDIT RISK MODELING IN PYTHON


Model loss function
Gradient Boosted Trees in xgboost use a loss function of log-loss
The goal is to minimize this value

True loan status | Predicted probability | Log Loss
1 | 0.1 | 2.3
0 | 0.9 | 2.3
An inaccurately predicted default has more negative financial impact
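
The two log-loss values in the table can be reproduced with the standard binary log-loss formula for a single observation. The function name below is an assumption made for illustration.

```python
import math

def log_loss_single(y_true: int, p_default: float) -> float:
    """Binary log loss for one loan, given the predicted probability of default."""
    return -(y_true * math.log(p_default) + (1 - y_true) * math.log(1 - p_default))

missed_default = log_loss_single(1, 0.1)  # actual default predicted at 10%
false_alarm = log_loss_single(0, 0.9)     # actual non-default predicted at 90%
print(round(missed_default, 1), round(false_alarm, 1))
```

Both errors receive the same penalty, which is why the loss function alone does not reflect that a missed default costs the lender more.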

CREDIT RISK MODELING IN PYTHON


The cost of imbalance
A false negative (default predicted as non-default) is much more costly

Person | Loan Amount | Potential Profit | Predicted Status | Actual Status | Losses
A | $1,000 | $10 | Default | Non-Default | -$10
B | $1,000 | $10 | Non-Default | Default | -$1,000
Log-loss for the model is the same for both, but our actual losses are not

CREDIT RISK MODELING IN PYTHON


Causes of imbalance
Data problems
Credit data was not sampled correctly

Data storage problems

Business processes:
Measures already in place to not accept probable defaults

Probable defaults are quickly sold to other firms

Behavioral factors:
Normally, people do not default on their loans
The less often they default, the higher their credit rating

CREDIT RISK MODELING IN PYTHON


Dealing with class imbalance
Several ways to deal with class imbalance in data

Method | Pros | Cons
Gather more data | Increases number of defaults | Percentage of defaults may not change
Penalize models | Increases recall for defaults | Model requires more tuning and maintenance
Sample data differently | Least technical adjustment | Fewer defaults in data

CREDIT RISK MODELING IN PYTHON


Undersampling strategy
Combine smaller random sample of non-defaults with defaults

CREDIT RISK MODELING IN PYTHON


Combining the split data sets
The training columns (X_train) and training labels (y_train) must be put back together

Create two new sets based on actual loan_status

# Concat the training sets


X_y_train = pd.concat([X_train.reset_index(drop = True),
y_train.reset_index(drop = True)], axis = 1)
# Get the counts of defaults and non-defaults
count_nondefault, count_default = X_y_train['loan_status'].value_counts()
# Separate nondefaults and defaults
nondefaults = X_y_train[X_y_train['loan_status'] == 0]
defaults = X_y_train[X_y_train['loan_status'] == 1]

CREDIT RISK MODELING IN PYTHON


Undersampling the non-defaults
Randomly sample data set of non-defaults

Concatenate with data set of defaults

# Undersample the non-defaults using sample() in pandas


nondefaults_under = nondefaults.sample(count_default)
# Concat the undersampled non-defaults with the defaults
X_y_train_under = pd.concat([nondefaults_under.reset_index(drop = True),
defaults.reset_index(drop = True)], axis=0)

CREDIT RISK MODELING IN PYTHON


Let's practice!
CREDIT RISK MODELING IN PYTHON
Model evaluation
and implementation
CREDIT RISK MODELING IN PYTHON

Michael Crabtree
Data Scientist, Ford Motor Company
Comparing classification reports
Create the reports with classification_report() and compare

CREDIT RISK MODELING IN PYTHON


ROC and AUC analysis
Models with better performance will have more lift

More lift means the AUC score is higher

CREDIT RISK MODELING IN PYTHON


Model calibration
We want our probabilities of default to accurately represent the model's confidence level
The probability of default has a degree of uncertainty in its predictions

A sample of loans and their predicted probabilities of default should be close to the
percentage of defaults in that sample

Sample of loans | Average predicted PD | Sample percentage of actual defaults | Calibrated?
10 | 0.12 | 0.12 | Yes
10 | 0.25 | 0.65 | No

http://datascienceassn.org/sites/default/files/Predicting%20good%20probabilities%20with%20supervised%20lea

CREDIT RISK MODELING IN PYTHON


Calculating calibration
Shows percentage of true defaults for each predicted probability

Essentially a line plot of the results of calibration_curve()

from sklearn.calibration import calibration_curve


calibration_curve(y_test, probabilities_of_default, n_bins = 5)

# Fraction of positives
(array([0.09602649, 0.19521012, 0.62035996, 0.67361111]),
# Average probability
array([0.09543535, 0.29196742, 0.46898465, 0.65512207]))
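
To make the two returned arrays concrete, here is a simplified NumPy sketch of equal-width binning. It is an illustration of the idea, not scikit-learn's implementation; the helper name and the toy labels and probabilities are assumptions.

```python
import numpy as np

def calibration_bins(y_true, y_prob, n_bins=5):
    """Per probability bin: fraction of actual defaults and mean predicted probability."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each prediction to an equal-width bin using the interior edges
    bin_ids = np.clip(np.digitize(y_prob, edges[1:-1]), 0, n_bins - 1)
    frac_pos, mean_pred = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():  # empty bins are skipped
            frac_pos.append(y_true[mask].mean())
            mean_pred.append(y_prob[mask].mean())
    return np.array(frac_pos), np.array(mean_pred)

# Toy data: no prediction falls between 0.4 and 0.6, so one bin is empty
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.30, 0.35, 0.70, 0.75, 0.90]
frac_pos, mean_pred = calibration_bins(y_true, y_prob, n_bins=5)
print(frac_pos, mean_pred)
```

Skipping empty bins is one plausible reason an output can contain fewer entries than `n_bins`, as in the four-value arrays shown above.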

CREDIT RISK MODELING IN PYTHON


Plotting calibration curves
plt.plot(mean_predicted_value, fraction_of_positives, label="%s" % "Example Model")

CREDIT RISK MODELING IN PYTHON


Checking calibration curves
As an example, two events selected (above and below perfect line)

CREDIT RISK MODELING IN PYTHON


Calibration curve interpretation

CREDIT RISK MODELING IN PYTHON




Let's practice!
CREDIT RISK MODELING IN PYTHON
Credit acceptance
rates
CREDIT RISK MODELING IN PYTHON

Michael Crabtree
Data Scientist, Ford Motor Company
Thresholds and loan status
Previously we set a threshold for a range of prob_default values
This was used to change the predicted loan_status of the loan

preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.4 else 0)

Loan prob_default threshold loan_status


1 0.25 0.4 0
2 0.42 0.4 1
3 0.75 0.4 1

CREDIT RISK MODELING IN PYTHON


Thresholds and acceptance rate
Use model predictions to set better thresholds
Can also be used to approve or deny new loans

For all new loans, we want to deny probable defaults


Use the test data as an example of new loans

Acceptance rate: what percentage of new loans are accepted to keep the number of
defaults in a portfolio low
Accepted loans which are defaults have an impact similar to false negatives

CREDIT RISK MODELING IN PYTHON


Understanding acceptance rate
Example: Accept 85% of loans with the lowest prob_default

CREDIT RISK MODELING IN PYTHON


Calculating the threshold
Calculate the threshold value for an 85% acceptance rate

import numpy as np
# Compute the threshold for 85% acceptance rate
threshold = np.quantile(prob_default, 0.85)

0.804

Loan | prob_default | Threshold | Predicted loan_status | Accept or Reject
1 | 0.65 | 0.804 | 0 | Accept
2 | 0.85 | 0.804 | 1 | Reject
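
A quick way to check that a quantile-based threshold delivers the intended acceptance rate is to apply it back to the same probabilities. The probabilities below are simulated from a beta distribution purely for illustration and do not come from the course model.

```python
import numpy as np

# Simulated probabilities of default (hypothetical stand-in for model output)
rng = np.random.default_rng(42)
prob_default = rng.beta(2, 5, size=10_000)

accept_rate = 0.85
threshold = np.quantile(prob_default, accept_rate)

# Loans at or below the threshold are accepted, the rest are rejected
accepted = prob_default <= threshold
print(round(threshold, 3), accepted.mean())
```

The share of accepted loans matches the requested rate because the threshold is, by construction, the value below which 85% of the predicted probabilities fall.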

CREDIT RISK MODELING IN PYTHON


Implementing the calculated threshold
Reassign loan_status values using the new threshold

# Apply the calculated threshold to the probabilities of default


preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.804 else 0)

CREDIT RISK MODELING IN PYTHON


Bad Rate
Even with a calculated threshold, some of the accepted loans will be defaults

These are loans with prob_default values around where our model is not well calibrated

CREDIT RISK MODELING IN PYTHON


Bad rate calculation

#Calculate the bad rate


np.sum(accepted_loans['true_loan_status']) / accepted_loans['true_loan_status'].count()

If non-default is 0 , and default is 1 then the sum() is the count of defaults

The .count() of a single column is the same as the row count for the data frame

CREDIT RISK MODELING IN PYTHON


Let's practice!
CREDIT RISK MODELING IN PYTHON
Credit strategy and
minimum expected
loss
CREDIT RISK MODELING IN PYTHON

Michael Crabtree
Data Scientist, Ford Motor Company
Selecting acceptance rates
First acceptance rate was set to 85%, but other rates might be selected as well

Two options to test different rates:


Calculate the threshold, bad rate, and losses manually

Automatically create a table of these values and select an acceptance rate

The table of all the possible values is called a strategy table

CREDIT RISK MODELING IN PYTHON


Setting up the strategy table
Set up arrays or lists to store each value

# Set all the acceptance rates to test


accept_rates = [1.0, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.65, 0.6, 0.55,
0.5, 0.45, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
# Create lists to store thresholds and bad rates
thresholds = []
bad_rates = []

CREDIT RISK MODELING IN PYTHON


Calculating the table values
Calculate the threshold and bad rate for all acceptance rates

for rate in accept_rates:
    # Calculate the threshold for this acceptance rate
    thresh = np.quantile(test_pred_df['prob_default'], rate).round(3)
    # Store threshold value in a list
    thresholds.append(thresh)
    # Apply the threshold to reassign loan_status
    test_pred_df['pred_loan_status'] = \
        test_pred_df['prob_default'].apply(lambda x: 1 if x > thresh else 0)
    # Create accepted loans set of predicted non-defaults
    accepted_loans = test_pred_df[test_pred_df['pred_loan_status'] == 0]
    # Calculate and store bad rate
    bad_rates.append((np.sum(accepted_loans['true_loan_status'])
                      / accepted_loans['true_loan_status'].count()).round(3))

CREDIT RISK MODELING IN PYTHON


Strategy table interpretation
strat_df = pd.DataFrame(zip(accept_rates, thresholds, bad_rates),
columns = ['Acceptance Rate','Threshold','Bad Rate'])

CREDIT RISK MODELING IN PYTHON


Adding accepted loans
The number of loans accepted for each acceptance rate
Can use len() or .count()

CREDIT RISK MODELING IN PYTHON


Adding average loan amount
Average loan_amnt from the test set data

CREDIT RISK MODELING IN PYTHON


Estimating portfolio value
Average value of accepted loan non-defaults minus average value of accepted defaults

Assumes each default is a loss of the loan_amnt

CREDIT RISK MODELING IN PYTHON


Total expected loss
How much we expect to lose on the defaults in our portfolio

# Probability of default (PD)


test_pred_df['prob_default']
# Exposure at default = loan amount (EAD)
test_pred_df['loan_amnt']
# Loss given default = 1.0 for total loss (LGD)
test_pred_df['loss_given_default']
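
The three columns above combine into the usual expected-loss product, PD x EAD x LGD, summed over all loans. The sketch below uses a three-row hypothetical data frame with the same column names; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical test predictions; column names follow the slide above
test_pred_df = pd.DataFrame({
    'prob_default': [0.05, 0.20, 0.60],       # PD
    'loan_amnt': [10_000, 5_000, 2_000],      # EAD
    'loss_given_default': [1.0, 1.0, 1.0],    # LGD (1.0 = total loss)
})

# Expected loss per loan, then the portfolio total
expected_loss = (test_pred_df['prob_default']
                 * test_pred_df['loan_amnt']
                 * test_pred_df['loss_given_default'])
total_expected_loss = expected_loss.sum()
print(total_expected_loss)
```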

CREDIT RISK MODELING IN PYTHON


Let's practice!
CREDIT RISK MODELING IN PYTHON
Course wrap up
CREDIT RISK MODELING IN PYTHON

Michael Crabtree
Data Scientist, Ford Motor Company
Your journey...so far
Prepare credit data for machine learning models
Important to understand the data

Improving the data allows for high performing simple models

Develop, score, and understand logistic regressions and gradient boosted trees

Analyze the performance of models by changing the data

Understand the financial impact of results

Implement the model with an understanding of strategy

CREDIT RISK MODELING IN PYTHON


Risk modeling techniques
The models and framework in this course:
Discrete-time hazard model (point in time): the probability of default is a point-in-time
event

Structural model framework: the model explains the default event based on other factors

Other techniques
Through-the-cycle model (continuous time): macro-economic conditions and other effects
are used, but the risk is seen as an independent event

Reduced-form model framework: a statistical approach estimating probability of default
as an independent Poisson-based event

CREDIT RISK MODELING IN PYTHON


Choosing models
Many machine learning models available, but logistic regression and tree models were used
These models are simple and explainable

Their performance on probabilities is acceptable

Many financial sectors prefer model interpretability


Complex or "black-box" models are a risk because the business cannot explain their
decisions fully

Deep neural networks are often too complex

CREDIT RISK MODELING IN PYTHON


Tips from me to you
Focus on the data
Gather as much data as possible

Use many different techniques to prepare and enhance the data

Learn about the business

Increase value through data

Model complexity can be a two-edged sword


Really complex models may perform well, but are seen as a "black-box"

In many cases, business users will not accept a model they cannot understand

Complex models can be very large and difficult to put into production

CREDIT RISK MODELING IN PYTHON


Thank you!
CREDIT RISK MODELING IN PYTHON
Why do we need
GARCH models
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
Course overview
GARCH: Generalized AutoRegressive Conditional Heteroskedasticity

Chapter 1: GARCH Model Fundamentals

Chapter 2: GARCH Model Configuration

Chapter 3: Model Performance Evaluation

Chapter 4: GARCH in Action

GARCH MODELS IN PYTHON


What is volatility
Describes the dispersion of financial asset returns over time

Often computed as the standard deviation or variance of price returns

The higher the volatility, the riskier a financial asset

GARCH MODELS IN PYTHON


How to compute volatility
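Step 1: Calculate returns as percentage of price changes
$$\text{return} = \frac{P_1 - P_0}{P_0}$$
Step 2: Calculate the sample mean return
$$\text{mean} = \frac{\sum_{i=1}^{n} \text{return}_i}{n}$$
Step 3: Calculate the sample standard deviation
$$\text{volatility} = \sqrt{\frac{\sum_{i=1}^{n} (\text{return}_i - \text{mean})^2}{n - 1}} = \sqrt{\text{variance}}$$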

GARCH MODELS IN PYTHON


Compute volatility in Python
Use pandas pct_change() method:

return_data = price_data.pct_change()

Use pandas std() method:

volatility = return_data.std()

GARCH MODELS IN PYTHON


Volatility conversion
Convert to monthly volatility from daily:

(assume 21 trading days in a month)

$\sigma_{monthly} = \sqrt{21} \cdot \sigma_{daily}$

Convert to annual volatility from daily:

(assume 252 trading days in a year)

$\sigma_{annual} = \sqrt{252} \cdot \sigma_{daily}$
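
Putting the computation and the conversion together, a minimal sketch might look like the following. The price series is a short hypothetical example; with real data `price_data` would be a column of daily closing prices.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closing prices
price_data = pd.Series([100.0, 101.0, 99.5, 100.5, 102.0, 101.0])

# Daily returns; the first value is NaN because there is no prior price
return_data = price_data.pct_change().dropna()

# pandas std() uses the sample formula (n - 1 in the denominator)
daily_vol = return_data.std()

# Square-root-of-time scaling, assuming 21 and 252 trading days
monthly_vol = np.sqrt(21) * daily_vol
annual_vol = np.sqrt(252) * daily_vol
print(daily_vol, monthly_vol, annual_vol)
```

The square-root scaling relies on returns being uncorrelated across days, which is a simplifying assumption rather than a property of every series.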

GARCH MODELS IN PYTHON


The challenge of volatility modeling
Heteroskedasticity:

In ancient Greek: "different" (hetero) + "dispersion" (skedasis)

A time series demonstrates varying volatility systematically over time

GARCH MODELS IN PYTHON


Detect heteroskedasticity
Homoskedasticity vs Heteroskedasticity

GARCH MODELS IN PYTHON


Volatility clustering
VIX historical prices:

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
What are ARCH and
GARCH
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
First came the ARCH
Auto Regressive Conditional Heteroskedasticity

Developed by Robert F. Engle (Nobel prize laureate 2003)

GARCH MODELS IN PYTHON


Then came the GARCH
"Generalized" ARCH

Developed by Tim Bollerslev (Robert F. Engle's student)

GARCH MODELS IN PYTHON


Related statistical terms
White noise (z): Uncorrelated random variables with a zero mean and a finite variance

Residual = observed value - predicted value

GARCH MODELS IN PYTHON


Model notations
Expected return:
$\mu_t = E[r_t \mid I(t-1)]$

Expected volatility:
$\sigma_t^2 = E[(r_t - \mu_t)^2 \mid I(t-1)]$

Residual (prediction error):
$r_t = \mu_t + \epsilon_t$

Volatility is related to the residuals:
$\epsilon_t = \sigma_t \cdot \zeta$, where $\zeta$ is white noise

GARCH MODELS IN PYTHON


Model equations: ARCH

GARCH MODELS IN PYTHON


Model equations: GARCH
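
The equation images for the ARCH and GARCH slides did not survive extraction. For reference, the standard textbook forms are given below, using the same $\omega$, $\alpha$, $\beta$ notation as the GARCH(1,1) parameter-constraint slide. The lag orders $p$ and $q$ follow the usual convention and are an assumption here, not something recovered from the slides.

```latex
% ARCH(p): variance depends on past squared residuals
\sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i \epsilon_{t-i}^2

% GARCH(p, q): adds past variances
\sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i \epsilon_{t-i}^2
           + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2

% GARCH(1,1): the special case used throughout the course
\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2
```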

GARCH MODELS IN PYTHON


Model intuition
Autoregressive: predict future behavior based on past behavior

Volatility as a weighted average of past information

GARCH MODELS IN PYTHON


GARCH(1,1) parameter constraints
To make the GARCH(1,1) process realistic, it requires:

All parameters are non-negative, so the variance cannot be negative.

ω, α, β >= 0

Model estimations are "mean-reverting" to the long-run variance.

α+β <1

long-run variance:
ω/(1 − α − β)

GARCH MODELS IN PYTHON


GARCH(1,1) parameter dynamics
The larger the α, the bigger the immediate impact of the shock
The larger the β , the longer the duration of the impact
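
To see how the parameters interact, the GARCH(1,1) recursion can be simulated directly with NumPy. This is a sketch with illustrative parameter values and normally distributed shocks, both of which are assumptions; it does not use the `arch` package.

```python
import numpy as np

def simulate_garch11(omega, alpha, beta, n_obs, seed=0):
    """Simulate residuals and conditional variances from a GARCH(1,1) process."""
    rng = np.random.default_rng(seed)
    long_run_var = omega / (1 - alpha - beta)
    variance = np.empty(n_obs)
    resid = np.empty(n_obs)
    # Start the recursion at the long-run variance
    variance[0] = long_run_var
    resid[0] = np.sqrt(variance[0]) * rng.standard_normal()
    for t in range(1, n_obs):
        variance[t] = omega + alpha * resid[t - 1] ** 2 + beta * variance[t - 1]
        resid[t] = np.sqrt(variance[t]) * rng.standard_normal()
    return resid, variance

# Illustrative parameters satisfying omega, alpha, beta >= 0 and alpha + beta < 1
resid, variance = simulate_garch11(omega=0.04, alpha=0.17, beta=0.79, n_obs=2000)
long_run_var = 0.04 / (1 - 0.17 - 0.79)
print(long_run_var, variance.mean())
```

Raising `alpha` makes the variance jump more after a large residual, while raising `beta` makes that jump fade more slowly.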

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
How to implement
GARCH models in
Python
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
Python "arch" package
from arch import arch_model

1Kevin Sheppard. (2019, March 28). bashtage/arch: Release 4.8.1 (Version 4.8.1). Zenodo.
http://doi.org/10.5281/zenodo.2613877

GARCH MODELS IN PYTHON


Workflow
Develop a GARCH model in three steps:

1. Specify the model

2. Fit the model

3. Make a forecast

GARCH MODELS IN PYTHON


Model specification
Model assumptions:

Distribution: "normal" (default), "t" , "skewt"

Mean model: "constant" (default), "zero" , "AR"

Volatility model: "GARCH" (default), "ARCH" , "EGARCH"

basic_gm = arch_model(sp_data['Return'], p = 1, q = 1,
mean = 'constant', vol = 'GARCH', dist = 'normal')

GARCH MODELS IN PYTHON


Model fitting
Display model fitting output after every n iterations:

gm_result = basic_gm.fit(update_freq = 4)

Turn off the display:

gm_result = basic_gm.fit(disp = 'off')

GARCH MODELS IN PYTHON


Fitted results: parameters
Estimated by "maximum likelihood method"

print(gm_result.params)

mu 0.077239
omega 0.039587
alpha[1] 0.167963
beta[1] 0.786467
Name: params, dtype: float64

GARCH MODELS IN PYTHON


Fitted results: summary
print(gm_result.summary())

GARCH MODELS IN PYTHON


Fitted results: plots
gm_result.plot()

GARCH MODELS IN PYTHON


Model forecasting
# Make 5-period ahead forecast
gm_forecast = gm_result.forecast(horizon = 5)

# Print out the last row of variance forecast


print(gm_forecast.variance[-1:])

h.1 h.2 h.3 h.4 h.5


Date
2019-10-10 0.994079 0.988366 0.982913 0.977708 0.972741

h.1 in row "2019-10-10": 1-step ahead forecast made using data up to and including that date

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
Distribution
assumptions
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
Why make assumptions
Volatility is not directly observable

GARCH models use residuals as volatility shocks

$r_t = \mu_t + \epsilon_t$

Volatility is related to the residuals:

$\epsilon_t = \sigma_t \cdot \zeta$, where $\zeta$ is white noise

GARCH MODELS IN PYTHON


Standardized residuals
Residual = observed return - mean return
$\text{residual} = \epsilon_t = r_t - \mu_t$

Standardized residual = residual / return volatility

$\text{std resid} = \frac{\epsilon_t}{\sigma_t}$

GARCH MODELS IN PYTHON


Residuals in GARCH
gm_std_resid = gm_result.resid / gm_result.conditional_volatility

plt.hist(gm_std_resid, facecolor = 'orange',label = 'standardized residuals')

GARCH MODELS IN PYTHON


Fat tails
Higher probability to observe large (positive or negative) returns than under a normal
distribution

GARCH MODELS IN PYTHON


Skewness
Measure of asymmetry of a probability distribution

GARCH MODELS IN PYTHON


Student's t-distribution

ν parameter of a Student's t-distribution indicates its shape

GARCH MODELS IN PYTHON


GARCH with t-distribution
arch_model(my_data, p = 1, q = 1,
mean = 'constant', vol = 'GARCH',
dist = 't')

Distribution
========================================================================
coef std err t P>|t| 95.0% Conf. Int.
.-----------------------------------------------------------------------
nu 4.9249 0.507 9.709 2.768e-22 [ 3.931, 5.919]
========================================================================

GARCH MODELS IN PYTHON


GARCH with skewed t-distribution
arch_model(my_data, p = 1, q = 1,
mean = 'constant', vol = 'GARCH',
dist = 'skewt')

Distribution
===========================================================================
coef std err t P>|t| 95.0% Conf. Int.
.--------------------------------------------------------------------------
nu 5.2437 0.575 9.118 7.681e-20 [ 4.117, 6.371]
lambda -0.0822 2.541e-02 -3.235 1.216e-03 [ -0.132,-3.241e-02]
===========================================================================

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
Mean model
specifications
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
Constant mean by default
constant mean: generally works well with most financial return data

arch_model(my_data, p = 1, q = 1,
mean = 'constant', vol = 'GARCH')

GARCH MODELS IN PYTHON


Zero mean assumption
zero mean: use when the mean has been modeled separately

arch_model(my_data, p = 1, q = 1,
mean = 'zero', vol = 'GARCH')

GARCH MODELS IN PYTHON


Autoregressive mean
AR mean: model the mean as an autoregressive (AR) process

arch_model(my_data, p = 1, q = 1,
mean = 'AR', lags = 1, vol = 'GARCH')

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
Volatility models for
asymmetric shocks
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
Asymmetric shocks in financial data
News impact curve:

GARCH MODELS IN PYTHON


Leverage effect
Debt-equity Ratio = Debt / Equity
Stock price goes down, debt-equity ratio goes up

Riskier!

GARCH MODELS IN PYTHON


GJR-GARCH

GARCH MODELS IN PYTHON


GJR-GARCH in Python
arch_model(my_data, p = 1, q = 1, o = 1,
mean = 'constant', vol = 'GARCH')

GARCH MODELS IN PYTHON


EGARCH
A popular option to model asymmetric shocks

Exponential GARCH

Add a conditional component to model the asymmetry in shocks similar to the GJR-GARCH

No non-negative constraints on alpha, beta so it runs faster

GARCH MODELS IN PYTHON


EGARCH in Python
arch_model(my_data, p = 1, q = 1, o = 1,
mean = 'constant', vol = 'EGARCH')

GARCH MODELS IN PYTHON


Which model to use
GJR-GARCH or EGARCH?

Which model is better depends on the data

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
GARCH rolling
window forecast
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
Rolling window for out-of-sample forecast
An exciting part of financial modeling: predict the unknown

Rolling window forecast: repeatedly perform model fitting and forecast as time rolls forward

GARCH MODELS IN PYTHON


Expanding window forecast
Continuously add new data points to the sample

GARCH MODELS IN PYTHON


Motivations of rolling window forecast
Avoid lookback bias

Less subject to overfitting

Adapt forecast to new observations

GARCH MODELS IN PYTHON


Implement expanding window forecast
Expanding window forecast:

for i in range(120):
    # Keep the start fixed so the sample grows with each step
    gm_result = basic_gm.fit(first_obs = start_loc,
                             last_obs = i + end_loc, disp = 'off')
    temp_result = gm_result.forecast(horizon = 1).variance

GARCH MODELS IN PYTHON


Fixed rolling window forecast
New data points are added while old ones are dropped from the sample

GARCH MODELS IN PYTHON


Implement fixed rolling window forecast
Fixed rolling window forecast:

for i in range(120):
    # Specify rolling window range for model fitting
    gm_result = basic_gm.fit(first_obs = i + start_loc,
                             last_obs = i + end_loc, disp = 'off')
    temp_result = gm_result.forecast(horizon = 1).variance
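
The difference between the two loops is only in how `first_obs` moves. The sketch below prints the window bounds without fitting any model; the helper name and the start and end locations are hypothetical.

```python
def window_bounds(start_loc, end_loc, n_steps, fixed):
    """Return (first_obs, last_obs) for each step of a rolling forecast."""
    bounds = []
    for i in range(n_steps):
        # Fixed window: the start moves forward; expanding window: it stays put
        first_obs = i + start_loc if fixed else start_loc
        last_obs = i + end_loc
        bounds.append((first_obs, last_obs))
    return bounds

expanding = window_bounds(start_loc=0, end_loc=100, n_steps=3, fixed=False)
rolling = window_bounds(start_loc=0, end_loc=100, n_steps=3, fixed=True)
print(expanding)
print(rolling)
```

The expanding window grows by one observation per step, while the fixed window keeps a constant length by dropping the oldest observation.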

GARCH MODELS IN PYTHON


How to determine window size
Usually determined on a case-by-case basis

Too wide window size: include obsolete data that may lead to higher variance

Too narrow window size: exclude relevant data that may lead to higher bias

The optimal window size: trade-off to balance bias and variance

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
Significance testing
of model parameters
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
Do I need this parameter?
Is it relevant

KISS: keep it simple stupid

Always prefer a parsimonious model

GARCH MODELS IN PYTHON


Hypothesis test
Null hypothesis (H0): a claim to be verified

H0: parameter value = 0

If H0 cannot be rejected, leave out the parameter

GARCH MODELS IN PYTHON


Statistical significance
Quantifies how likely it is that the observed results arose by chance

Common threshold: 5%

GARCH MODELS IN PYTHON


P-value
The probability of seeing results at least as extreme as those observed if the null hypothesis were true

The lower the p-value, the more ridiculous the null hypothesis looks

Reject the null hypothesis if p-value < significance level

GARCH MODELS IN PYTHON


P-value example
print(gm_result.summary())
print(gm_result.pvalues)

mu 9.031206e-08
omega 1.619415e-05
alpha[1] 4.283526e-10
beta[1] 1.302531e-183
Name: pvalues, dtype: float64

GARCH MODELS IN PYTHON


T-statistic
T-statistic = estimated parameter / standard error

The absolute value of the t-statistic is a distance measure

If |t-statistic| > 2: keep the parameter in the GARCH model

GARCH MODELS IN PYTHON


T-statistic example
print(gm_result.summary())
print(gm_result.tvalues)

mu 5.345210
omega 4.311785
alpha[1] 6.243330
beta[1] 28.896991
Name: tvalues, dtype: float64

# Manual calculation
t = gm_result.params/gm_result.std_err
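
The t-statistic and the p-value carry the same information. Assuming a two-sided test against a standard normal distribution (an assumption about how the reported p-values were computed), the link can be sketched with the standard library alone.

```python
import math

def two_sided_p_value(t_stat: float) -> float:
    """Two-sided p-value of a t-statistic under a standard normal distribution."""
    # 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2))
    return math.erfc(abs(t_stat) / math.sqrt(2))

p_mu = two_sided_p_value(5.345210)   # t-statistic reported for mu above
p_at_two = two_sided_p_value(2.0)    # the rule-of-thumb cutoff
print(p_mu, p_at_two)
```

Under this assumption, the rule of thumb |t-statistic| > 2 corresponds to a p-value just under 5%.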

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
Validation of GARCH
model assumptions
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
Visual check

GARCH MODELS IN PYTHON


Autocorrelation
Describe the correlation of a variable with itself given a time lag

Existence of autocorrelation in the standardized residuals indicates the model may not be
sound

To detect autocorrelation:

ACF plot

Ljung-Box

GARCH MODELS IN PYTHON


ACF plot
ACF: AutoCorrelation Function

ACF Plot: visual representation of the autocorrelation by lags

Red area in the plot indicates the confidence interval (alpha = 5%)

GARCH MODELS IN PYTHON


ACF plot in Python
from statsmodels.graphics.tsaplots import plot_acf

plot_acf(my_data, alpha = 0.05)

GARCH MODELS IN PYTHON


Ljung-Box test
Test whether any of a group of autocorrelations of a time series are different from zero

H0: the data is independently distributed

P-value < 5%: the model is not sound

GARCH MODELS IN PYTHON


Ljung-Box test Python
# Import the Python module
from statsmodels.stats.diagnostic import acorr_ljungbox

# Perform the Ljung-Box test


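lb_test = acorr_ljungbox(std_resid, lags = 10)

# Check p-values (recent statsmodels versions return a DataFrame;
# older versions returned a tuple, where lb_test[1] held the p-values)
print('P-values are: ', lb_test['lb_pvalue'])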

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
Goodness of fit
measures
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
Goodness of fit
Can the model do a good job of explaining the data?

1. Maximum likelihood

2. Information criteria

GARCH MODELS IN PYTHON


Maximum likelihood
Maximize the probability of getting the data observed under the assumed model

Prefer models with larger likelihood values

GARCH MODELS IN PYTHON


Log-likelihood in Python
Typically used in log form: log-likelihood

print(gm_result.loglikelihood)

GARCH MODELS IN PYTHON


Overfitting
Fit in-sample data well, but perform poorly on out-of-sample predictions

Usually because the model is overly complex

GARCH MODELS IN PYTHON


Information criteria
Measure the trade-off between goodness of fit and model complexity

Likelihood + penalty for model complexity

AIC: Akaike's Information Criterion

BIC: Bayesian Information Criterion

Prefer models with the lower information criterion score

GARCH MODELS IN PYTHON


AIC vs. BIC
Generally they agree with each other

BIC penalizes model complexity more severely
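
The two criteria differ only in the penalty term. The sketch below uses the standard definitions, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L); the log-likelihood, parameter count, and sample size are hypothetical numbers chosen for illustration.

```python
import math

def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion."""
    return 2 * n_params - 2 * loglik

def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion."""
    return n_params * math.log(n_obs) - 2 * loglik

# Hypothetical fit: 4 parameters (mu, omega, alpha, beta) on 2000 observations
aic_val = aic(loglik=-2771.96, n_params=4)
bic_val = bic(loglik=-2771.96, n_params=4, n_obs=2000)
print(aic_val, bic_val)
```

Each extra parameter costs 2 under AIC but ln(n) under BIC, so BIC is the stricter of the two once the sample has more than about 7 observations.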

GARCH MODELS IN PYTHON


AIC/BIC in Python

print(gm_result.aic)
print(gm_result.bic)

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
GARCH model
backtesting
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
Backtesting
An approach to evaluate model forecasting capability

Compare the model predictions with the actual historical data

GARCH MODELS IN PYTHON


In-sample vs. out-of-sample
In-sample: model fitting

Out-of-sample: backtesting

GARCH MODELS IN PYTHON


MAE
Mean Absolute Error

GARCH MODELS IN PYTHON


MSE
Mean Squared Error

GARCH MODELS IN PYTHON


Calculate MAE, MSE in Python
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Call function to calculate MAE


mae = mean_absolute_error(observation, forecast)

# Call function to calculate MSE


mse = mean_squared_error(observation, forecast)
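
For reference, both metrics are simple averages of the forecast errors and can be written out by hand with NumPy. The observation and forecast arrays below are hypothetical.

```python
import numpy as np

# Hypothetical observed and forecast values
observation = np.array([1.0, 2.0, 3.0, 4.0])
forecast = np.array([1.5, 1.5, 3.0, 5.0])

errors = observation - forecast

# MAE: average absolute error; MSE: average squared error
mae = np.mean(np.abs(errors))
mse = np.mean(errors ** 2)
print(mae, mse)
```

MSE squares each error, so it penalizes the single large miss in the last observation more heavily than MAE does.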

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
VaR in financial risk
management
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
Risk management mindset
Rule No.1: Never lose money

Rule No.2: Never forget Rule No.1

-- Warren Buffett

GARCH MODELS IN PYTHON


What is VaR
VaR stands for Value at Risk

Three ingredients:
1. portfolio

2. time horizon

3. probability

GARCH MODELS IN PYTHON


VaR examples
1-day 5% VaR of $1 million

5% probability the portfolio will fall in value by 1 million dollars or more over a 1-day period

10-day 1% VaR of $9 million

1% probability the portfolio will fall in value by 9 million dollars or more over a 10-day period

GARCH MODELS IN PYTHON


VaR in risk management
Set risk limits

VaR exceedance: portfolio loss exceeds the VaR

GARCH MODELS IN PYTHON


Dynamic VaR with GARCH
More realistic VaR estimation with GARCH

VaR = mean + (GARCH vol) * quantile

VaR = mean_forecast.values + np.sqrt(variance_forecast).values * quantile
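
As a numeric illustration of the formula, the sketch below plugs hypothetical mean and variance forecasts into it. The 5% quantile is taken from a standard normal distribution purely for simplicity, which is an assumption; the course itself obtains the quantile from the fitted distribution or from the empirical standardized residuals.

```python
from statistics import NormalDist

import numpy as np

# Hypothetical one-step-ahead forecasts for three days (returns in percent)
mean_forecast = np.array([0.05, 0.04, 0.06])
variance_forecast = np.array([4.0, 9.0, 1.0])

# 5% quantile of a standard normal distribution (assumed for illustration)
quantile = NormalDist().inv_cdf(0.05)

# VaR = mean + volatility * quantile, computed for each day
var_5pct = mean_forecast + np.sqrt(variance_forecast) * quantile
print(quantile, var_5pct)
```

Because the quantile is negative, days with higher forecast volatility produce a more negative VaR, which is what makes the estimate dynamic.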

GARCH MODELS IN PYTHON


Dynamic VaR calculation
Step 1: Use GARCH model to make variance forecast

# Specify and fit a GARCH model


basic_gm = arch_model(bitcoin_data['Return'], p = 1, q = 1,
mean = 'constant', vol = 'GARCH', dist = 't')
gm_result = basic_gm.fit()

# Make variance forecast


gm_forecast = gm_result.forecast(start = '2019-01-01')

GARCH MODELS IN PYTHON


Dynamic VaR calculation (cont.)
Step 2: Use GARCH model to obtain forward-looking mean and volatility

mean_forecast = gm_forecast.mean['2019-01-01':]
variance_forecast = gm_forecast.variance['2019-01-01':]

Step 3: Obtain the quantile according to a confidence level


1. Parametric VaR

2. Empirical VaR

GARCH MODELS IN PYTHON


Parametric VaR
Estimate quantiles based on GARCH assumed distribution of the standardized residuals

# Assume a Student's t-distribution
# nu: degrees-of-freedom parameter estimated by the fitted model
nu = gm_result.params['nu']
# ppf(): Percent point function
q_parametric = basic_gm.distribution.ppf(0.05, nu)

GARCH MODELS IN PYTHON


Empirical VaR
Estimate quantiles based on the observed distribution of the GARCH standardized residuals

q_empirical = std_resid.quantile(0.05)

GARCH MODELS IN PYTHON


Let's practice!
GARCH MODELS IN PYTHON
Dynamic covariance
in portfolio
optimization
GARCH MODELS IN PYTHON

Chelsea Yang
Data Science Instructor
What is covariance
Describe the relationship between movement of two variables

Positive covariance: move together

Negative covariance: move in opposite directions

GARCH MODELS IN PYTHON


Dynamic covariance with GARCH
If two asset returns have correlation ρ and time-varying volatility of σ1 and σ2 :

Covariance = ρ ⋅ σ1 ⋅ σ2

covariance = correlation * garch_vol1 * garch_vol2

GARCH MODELS IN PYTHON


Calculate GARCH covariance in Python
Step 1: Fit GARCH models and obtain volatility for each return series

# gm_eur, gm_cad are fitted GARCH models


vol_eur = gm_eur.conditional_volatility
vol_cad = gm_cad.conditional_volatility

Step 2: Compute standardized residuals from the fitted GARCH models

resid_eur = gm_eur.resid/vol_eur
resid_cad = gm_cad.resid/vol_cad

Calculate GARCH covariance in Python (cont.)
Step 3: Compute ρ as simple correlation of standardized residuals

corr = np.corrcoef(resid_eur, resid_cad)[0,1]

Step 4: Compute GARCH covariance by multiplying the correlation and volatility.

covariance = corr * vol_eur * vol_cad
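
The four steps can be run end to end on synthetic inputs. In this sketch the volatility paths and standardized residuals are simulated stand-ins for the output of fitted GARCH models, with a true correlation of 0.6 chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
t = np.arange(n)

# Hypothetical conditional volatility paths (stand-ins for Step 1)
vol_eur = 0.5 + 0.1 * np.abs(np.sin(t / 25))
vol_cad = 0.4 + 0.1 * np.abs(np.cos(t / 25))

# Simulated standardized residuals with true correlation 0.6 (Step 2)
resid_eur = rng.standard_normal(n)
resid_cad = 0.6 * resid_eur + np.sqrt(1 - 0.6 ** 2) * rng.standard_normal(n)

# Step 3: a single correlation estimate for the whole sample
corr = np.corrcoef(resid_eur, resid_cad)[0, 1]

# Step 4: one covariance value per time step
covariance = corr * vol_eur * vol_cad
print(round(corr, 3), covariance[:3])
```

The correlation is a single constant here. All of the time variation in the covariance comes from the two volatility series.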

Modern portfolio theory (MPT)
Pioneered by Harry Markowitz in his paper "Portfolio Selection" (1952)

Take advantage of the diversification effect

The optimal portfolio yields the maximum expected return for a given level of risk

MPT intuition
Variance of a simple two-asset portfolio:

W1² ⋅ Variance1 + W2² ⋅ Variance2 + 2 ⋅ W1 ⋅ W2 ⋅ Covariance

Diversification effect:

Risk can be reduced in a portfolio by pairing assets that have a negative covariance
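
A short worked example of the diversification effect. The weights, variances, and covariances below are hypothetical, and `portfolio_variance` is an illustrative helper rather than a library function.

```python
def portfolio_variance(w1, var1, var2, cov):
    """Variance of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    return w1 ** 2 * var1 + w2 ** 2 * var2 + 2 * w1 * w2 * cov

# Same assets and weights; only the sign of the covariance differs
var_pos = portfolio_variance(0.5, 0.04, 0.09, cov=0.03)
var_neg = portfolio_variance(0.5, 0.04, 0.09, cov=-0.03)

print(var_pos, var_neg)
```

With a negative covariance the cross term is subtracted, so the portfolio variance drops from about 0.0475 to about 0.0175.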

Let's practice!
Dynamic Beta in portfolio management

Chelsea Yang
Data Science Instructor
What is Beta
Stock Beta:

a measure of stock volatility in relation to the general market

Systematic risk:

the portion of the risk that cannot be diversified away

Beta in portfolio management
Gauge investment risk

Market Beta = 1: used as benchmark

Beta > 1: the stock bears more risk than the general market

Beta < 1: the stock bears less risk than the general market

Beta in CAPM
Estimate risk premium of a stock

CAPM: Capital Asset Pricing Model

E(Rs) = Rf + β ⋅ (E(Rm) − Rf)

E(Rs): required rate of return on the stock
Rf: risk-free rate (e.g. Treasuries)
E(Rm): expected market return (e.g. S&P 500)
E(Rm) − Rf: market risk premium
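
The CAPM formula as a small function. The rates and Beta below are hypothetical, and `capm_required_return` is an illustrative name.

```python
def capm_required_return(risk_free, beta, market_return):
    """Required return on a stock under CAPM."""
    return risk_free + beta * (market_return - risk_free)

# Hypothetical inputs: 2% risk-free rate, 8% expected market return, Beta 1.2
required = capm_required_return(risk_free=0.02, beta=1.2, market_return=0.08)
print(required)
```

With a market premium of 6% and a Beta of 1.2, the stock's risk premium is 7.2%, giving a required return of about 9.2%.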

Dynamic Beta with GARCH
Beta = ρ ⋅ σ_stock / σ_market

Calculate dynamic Beta in Python
1) Compute the correlation between the S&P 500 and the stock

import numpy as np

# Standardized residuals from the fitted GARCH models
resid_stock = stock_gm.resid / stock_gm.conditional_volatility
resid_sp500 = sp500_gm.resid / sp500_gm.conditional_volatility

correlation = np.corrcoef(resid_stock, resid_sp500)[0, 1]

2) Compute the dynamic Beta for the stock

stock_beta = correlation * (stock_gm.conditional_volatility /
                            sp500_gm.conditional_volatility)
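
A quick numeric check of the dynamic Beta calculation. The correlation and volatility values are hypothetical stand-ins for the quantities computed from the fitted models.

```python
import numpy as np

# Hypothetical correlation of standardized residuals
correlation = 0.7

# Hypothetical conditional volatilities for three days
vol_stock = np.array([2.0, 2.5, 3.0])
vol_market = np.array([1.0, 1.0, 1.5])

# Beta varies over time with the ratio of the two volatilities
stock_beta = correlation * (vol_stock / vol_market)
print(stock_beta)
```

On the second day the stock's volatility rises while the market's stays flat, so Beta rises. On the third day both rise and Beta returns to its earlier level.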

Let's practice!
Congratulations!

Chelsea Yang
Data Science Instructor
You did it
Fit GARCH models

Make volatility forecasts

Evaluate model performance

GARCH in action: VaR, covariance, Beta

Going forward
Time series analysis

ARIMA (AutoRegressive Integrated Moving Average) models

CAPM (Capital Asset Pricing Model)

Portfolio optimization

Have fun and keep improving!