Applied Finance in Python
Dakota Wixom
Quantitative Analyst | QuantCourse.com
Course overview
Learn how to analyze investment return distributions, build portfolios and reduce risk,
and identify the key factors that drive portfolio returns.
Portfolio Investing
Factor Investing
Historical drawdown
import pandas as pd

# Load the price data, parse the dates, sort, and index by Date
StockPrices = pd.read_csv('StockData.csv', parse_dates=['Date'])
StockPrices = StockPrices.sort_values(by='Date')
StockPrices.set_index('Date', inplace=True)
Moments of distributions
Probability distributions have the following moments:
1) Mean (μ)
2) Variance (σ²)
3) Skewness
4) Kurtosis
For the normal distribution: Mean = μ, Variance = σ², Skewness = 0, Kurtosis = 3.
The standard normal distribution has μ = 0 and σ = 1.
import numpy as np

# Average daily return
np.mean(StockPrices["Returns"])
0.0003

# Annualized return, assuming 252 trading days per year
((1+np.mean(StockPrices["Returns"]))**252)-1
0.0785
Variance = σ²
The square root of variance, σ (the standard deviation), is often referred to as volatility.
import numpy as np

# Daily standard deviation (volatility) of returns
np.std(StockPrices["Returns"])
0.0256

# Variance is the square of the standard deviation
np.std(StockPrices["Returns"])**2
0.000655

# Annualized volatility: scale by the square root of 252 trading days
np.std(StockPrices["Returns"]) * np.sqrt(252)
0.3071
Skewness is the third moment of a distribution; positive skew means a longer right tail.
Sample skewness of the returns: 0.225
Leptokurtic: when a distribution has positive excess kurtosis (kurtosis greater than 3)
Sample excess kurtosis of the returns: 2.44
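A minimal sketch of computing both moments with scipy; note that scipy's kurtosis() reports excess kurtosis (kurtosis − 3) by default:

from scipy.stats import skew, kurtosis

# Third moment: asymmetry of the return distribution
print(skew(StockPrices["Returns"].dropna()))
# Fourth moment: kurtosis() returns excess kurtosis by default
print(kurtosis(StockPrices["Returns"].dropna()))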
The null hypothesis of the Shapiro-Wilk test is that the data are
normally distributed.
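A minimal sketch of running the test with scipy (a p-value below 0.05 rejects normality):

from scipy.stats import shapiro

# Returns the test statistic and the p-value
stat, p_value = shapiro(StockPrices["Returns"].dropna())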
Calculating portfolio returns
Portfolio Return Formula:
Rp = Ra1·wa1 + Ra2·wa2 + ... + Ran·wan
Rp : Portfolio return
Ran : Return for asset n
wan : Weight for asset n
import numpy as np
portfolio_weights = np.array([0.25, 0.35, 0.10, 0.20, 0.10])
port_ret = StockReturns.mul(portfolio_weights, axis=1).sum(axis=1)
port_ret
Date
2017-01-03 0.008082
2017-01-04 0.000161
2017-01-05 0.003448
...
StockReturns["Portfolio"] = port_ret
import numpy as np
numstocks = 5
portfolio_weights_ew = np.repeat(1/numstocks, numstocks)
StockReturns.iloc[:,0:numstocks].mul(portfolio_weights_ew, axis=1).sum(axis=1)
Date
2017-01-03 0.008082
2017-01-04 0.000161
2017-01-05 0.003448
...
wmcap,n = mcapn / ∑i=1..n mcapi
import numpy as np
market_capitalizations = np.array([100, 200, 100, 100])
mcap_weights = market_capitalizations/sum(market_capitalizations)
mcap_weights
Pearson correlation
Examples of different correlations between two random variables:
correlation_matrix = StockReturns.corr()
print(correlation_matrix)
Portfolio variance (two assets): σp² = w1²σ1² + w2²σ2² + 2·w1·w2·ρ1,2·σ1·σ2
σ : Asset volatility
ρ1,2 : Correlation between assets 1 and 2
cov_mat = StockReturns.cov()
cov_mat
import numpy as np

# Portfolio volatility: sqrt(wᵀ · Σ · w), with Σ the covariance matrix
port_vol = np.sqrt(np.dot(weights.T, np.dot(cov_mat, weights)))
port_vol
0.035
100,000 randomly generated portfolios
S = (Ra − rf) / σa
S : Sharpe Ratio
Ra : Asset return
rf : Risk-free rate of return
σa : Asset volatility
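The df used below is assumed to hold one row per random portfolio (weight columns followed by annualized Returns and Volatility); a hedged sketch of generating it:

import numpy as np
import pandas as pd

numstocks = 5
rows = []
for _ in range(100000):  # reduce the count for speed
    # Random weights, normalized to sum to 1
    weights = np.random.random(numstocks)
    weights /= weights.sum()
    rets = StockReturns.iloc[:, 0:numstocks].mul(weights, axis=1).sum(axis=1)
    ann_ret = ((1 + rets.mean()) ** 252) - 1
    ann_vol = rets.std() * np.sqrt(252)
    rows.append(list(weights) + [ann_ret, ann_vol])
cols = list(StockReturns.columns[0:numstocks]) + ["Returns", "Volatility"]
df = pd.DataFrame(rows, columns=cols)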
risk_free = 0

# Sharpe ratio for each random portfolio
df["Sharpe"] = (df["Returns"] - risk_free) / df["Volatility"]

# Max Sharpe Ratio (MSR) portfolio: the highest-Sharpe row
MSR = df.sort_values(by=['Sharpe'], ascending=False)
MSR_weights = MSR.iloc[0, 0:numstocks]
np.array(MSR_weights)

# Global Minimum Volatility (GMV) portfolio: the lowest-volatility row
GMV = df.sort_values(by=['Volatility'], ascending=True)
GMV_weights = GMV.iloc[0, 0:numstocks]
np.array(GMV_weights)
The founding father of asset pricing models
CAPM
The Capital Asset Pricing Model is the fundamental building block for many other asset
pricing models and factor models in finance.
Example:
Investing in Brazil:
10% Portfolio Return - 15% Risk Free Rate = -5% Excess Return
βP = Cov(RP, RB) / Var(RB)
βP : Portfolio beta
Cov(RP, RB) : The covariance between the portfolio (P) and the benchmark market index (B)
Var(RB) : The variance of the benchmark market index
covariance_matrix = Data[["Port_Excess","Mkt_Excess"]].cov()
covariance_coefficient = covariance_matrix.iloc[0, 1]
benchmark_variance = Data["Mkt_Excess"].var()
portfolio_beta = covariance_coefficient / benchmark_variance
portfolio_beta
0.93
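Beta can equivalently be estimated by regressing portfolio excess returns on market excess returns; a hedged statsmodels sketch that also defines the fit object used below:

import statsmodels.formula.api as smf

# Regress portfolio excess returns on market excess returns
fit = smf.ols(formula='Port_Excess ~ Mkt_Excess', data=Data).fit()
print(fit.params['Mkt_Excess'])  # regression beta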
0.93
fit.rsquared
0.70
adjusted_r_squared = fit.rsquared_adj
0.65
The Fama-French 3 factor Model
RP = α + βm·MKT + βs·SMB + βv·HML
Adjusted R² of the 3-factor fit: 0.90
fit.pvalues["HML"]
0.0063
fit.pvalues["HML"] < 0.05
True
fit.params["HML"]
0.502
fit.params["SMB"]
-0.243
portfolio_alpha = fit.params["Intercept"]
# Annualize the daily alpha, assuming 252 trading days
portfolio_alpha_annualized = ((1 + portfolio_alpha) ** 252) - 1
portfolio_alpha_annualized
0.045
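A hedged sketch of the regression behind the fit object used above, via statsmodels' formula API; the FamaFrenchData frame holding the portfolio excess returns and factor columns is an assumed name:

import statsmodels.formula.api as smf

fit = smf.ols(formula='Port_Excess ~ Mkt_Excess + SMB + HML',
              data=FamaFrenchData).fit()
print(fit.summary())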
Fama French 1993
The original paper that started it all: Fama, E. F. and French, K. R. (1993). "Common Risk Factors in the Returns on Stocks and Bonds", Journal of Financial Economics, vol. 33, no. 1, pp. 3-56.
CMA: Investment factor (Conservative Minus Aggressive), from the 5-factor extension
Adjusted R² with the additional factors: 0.92
Estimating tail risk
Tail risk is the risk of extreme investment outcomes, most
notably on the negative side of a distribution.
Historical Drawdown
Value at Risk
Monte-Carlo Simulation
Drawdown = rt / RM − 1
rt : Cumulative return at time t
RM : Running maximum
running_max = np.maximum.accumulate(cum_rets)
running_max[running_max < 1] = 1
drawdown = (cum_rets) / running_max - 1
drawdown
Date Return
2007-01-03 -0.042636
2007-01-04 -0.081589
2007-01-05 -0.073062
var_level = 95
var_95 = np.percentile(StockReturns, 100 - var_level)
var_95
-0.023
cvar_95 = StockReturns[StockReturns <= var_95].mean()
cvar_95
-0.025
VaR quantiles
from scipy.stats import norm

mu = np.mean(StockReturns)
std = np.std(StockReturns)
confidence_level = 0.05

# Parametric VaR from the Normal percent point function
VaR = norm.ppf(confidence_level, mu, std)
VaR
-0.0235
forecast_days = 5

# Scale 1-day VaR to a multi-day horizon by the square root of time
forecast_var95_5day = var_95 * np.sqrt(forecast_days)
forecast_var95_5day
-0.0525
Random walks
Most often, random walks in finance are rather simple compared to physics:
mu = np.mean(StockReturns)
std = np.std(StockReturns)
T = 252    # forecast horizon in trading days
S0 = 10    # starting price

# Compound S0 forward through T random daily returns
rand_rets = np.random.normal(mu, std, T) + 1
forecasted_values = S0 * (rand_rets.cumprod())
forecasted_values
mu = 0.0005
vol = 0.001
T = 252
sim_returns = []
for i in range(100):
    rand_rets = np.random.normal(mu, vol, T)
    sim_returns.append(rand_rets)
var_95 = np.percentile(sim_returns, 5)
var_95
-0.028
Summary
Moments and Distributions
Portfolio Composition
Markowitz Optimization
Alpha
Value at Risk
Uncertainty:
Future outcomes are unknown
Outcomes impact planning decisions
Stocks
Bonds
Stock options
DataFrame prices
.pct_change() method
import pandas as pd

prices = pd.read_csv("portfolio.csv")
returns = prices.pct_change()
weights = (weight_1, weight_2, ...)
portfolio_returns = returns.dot(weights)
# Annualize the daily covariance matrix with 252 trading days
covariance = returns.cov()*252
print(covariance)
windowed = portfolio_returns.rolling(30)
volatility = windowed.std()*np.sqrt(252)
ax = volatility.plot()
ax.set_ylabel("Standard Deviation...")
Jamsheed Shorish
Computational Economist
Risk factors
Volatility: measure of dispersion of returns
around expected value
Firm/sector characteristics
Firm size (market capitalization)
Book-to-market ratio
Sector shocks
Avalanche of delinquencies/default
destroyed collateral value
90-day mortgage delinquency: risk factor
import statsmodels.api as sm

# Regress portfolio returns on the delinquency risk factor
regression = sm.OLS(returns, delinquencies).fit()
print(regression.summary())
The risk-return trade-off
Risk factors: sources of uncertainty affecting return
Intuitively: greater uncertainty (more risk) compensated by greater return
Investor risk appetite: defines one quantified relationship between risk and return
Constrained Line Algorithm ( CLA ) class: generates the entire efficient frontier
Requires covariance matrix of returns
from pypfopt.cla import CLA
from pypfopt.expected_returns import mean_historical_return
from pypfopt.risk_models import CovarianceShrinkage

expected_returns = mean_historical_return(prices)
efficient_cov = CovarianceShrinkage(prices).ledoit_wolf()
cla = CLA(expected_returns, efficient_cov)

# Minimum-variance portfolio, and the full efficient frontier
minimum_variance = cla.min_volatility()
(ret, vol, weights) = cla.efficient_frontier()
The Loss Distribution
Forex example: random realizations of the exchange rate r give the loss distribution
Portfolio value in U.S. dollars is USD 100 => a distribution of portfolio losses in dollars
Can express questions like "What is the maximum loss that would take place 95% of the time?"
Here the confidence level is 95%.
The percent point function .ppf() of a scipy.stats loss distribution can also be used
import pandas as pd

loss = pd.Series(observations)
VaR_95 = loss.quantile(0.95)
print("VaR_95 = ", VaR_95)
VaR_95 = 1.6192834157254088
import pandas as pd
import scipy.stats

losses = pd.Series(scipy.stats.norm.rvs(size=1000))
VaR_95 = scipy.stats.norm.ppf(0.95)
# CVaR: expected loss, given that the loss exceeds VaR
CVaR_95 = (1/(1 - 0.95))*scipy.stats.norm.expect(lambda x: x, lb = VaR_95)
print("CVaR_95 = ", CVaR_95)
CVaR_95 = 2.153595332530393
A vacation analogy
Hotel reservations for vacation
Pay in advance, before stay
Low room rate
Non-refundable:
Total non-refundable hotel cost: € 500
Partially refundable:
Refundable hotel cost: € 550
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# The T distribution approaches the Normal as degrees of freedom grow
x = np.linspace(-3, 3, 100)
plt.plot(x, t.pdf(x, df = 2))
plt.plot(x, t.pdf(x, df = 5))
plt.plot(x, t.pdf(x, df = 30))
Risk management via modern portfolio theory
Efficient Portfolio
Portfolio weights maximize return given
risk level
import pypfopt
from pypfopt.efficient_frontier import EfficientFrontier

# Minimum-CVaR portfolio weights
ec = pypfopt.efficient_frontier.EfficientCVaR(None, returns)
optimal_weights = ec.min_cvar()

# Minimum-volatility portfolio weights
ef = EfficientFrontier(None, e_cov)
min_vol_weights = ef.min_volatility()
print(min_vol_weights)
{'Citibank': 0.0,
'Morgan Stanley': 5.0784330940519306e-18,
'Goldman Sachs': 0.6280157234640608,
'J.P. Morgan': 0.3719842765359393}
ec = pypfopt.efficient_frontier.EfficientCVaR(None, returns)
min_cvar_weights = ec.min_cvar()
print(min_cvar_weights)
{'Citibank': 0.0,
'Morgan Stanley': 0.0,
'Goldman Sachs': 0.669324359403484,
'J.P. Morgan': 0.3306756405965026}
Portfolio stability
VaR/CVaR: potential portfolio loss for given confidence level
European put option: right (not obligation) to sell stock at fixed price X on date M
X = strike price
M = maturity date
Black-Scholes option pricing formula: Fischer Black & Nobel Laureate Myron Scholes (1973)
Requires for each time t:
spot price S
strike price X
time to maturity T := M − t
risk-free interest rate r
1 Black, F. and M. Scholes (1973). "The Pricing of Options and Corporate Liabilities", Journal of Political Economy, vol. 81, no. 3, pp. 637-654.
No transaction costs
Underlying stock pays no dividends
Example output of the course's option pricing function: 10.31222171237868
Spot price S falls (ΔS < 0) => put option value V rises (ΔV > 0)
Delta of an option: Δ := ∂V/∂S
Hedge one share with 1/Δ options
Delta neutral: ΔS + ΔV/Δ = 0; the stock is hedged!
Python function bs_delta() : computes the option delta
Link to source available in the exercises
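The course's bs() and bs_delta() source is linked in the exercises; as a stand-in, a minimal textbook sketch of a European put price and its delta under the assumptions above:

import numpy as np
from scipy.stats import norm

def black_scholes_put(S, X, T, r, sigma):
    # Black-Scholes price of a European put (no dividends)
    d1 = (np.log(S / X) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return X * np.exp(-r * T) * norm.cdf(-d2) - S * norm.cdf(-d1)

def bs_put_delta(S, X, T, r, sigma):
    # Delta of a European put: N(d1) - 1, always negative
    d1 = (np.log(S / X) + (r + 0.5 * sigma ** 2) * T) / (sigma * np.sqrt(T))
    return norm.cdf(d1) - 1.0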
A class of distributions
Loss distribution: not known with certainty
Class of possible distributions?
Suppose a parametric class of distributions f(x; θ)
Advantages:
Can visualize difference between data and estimate using histogram
Example:
Normal distribution with norm.fit()
Asymmetrical histogram? Test the fit with the Anderson-Darling test, from scipy.stats.anderson
AndersonResult(statistic=11.048641503898523,
critical_values=array([0.57 , 0.649, 0.779, 0.909, 1.081]),
significance_level=array([15. , 10. , 5. , 2.5, 1. ]))
SkewtestResult(statistic=-7.786120875514511,
pvalue=6.90978472959861e-15)
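A minimal sketch of how results like the two above are produced, assuming losses holds the loss data:

from scipy.stats import anderson, skewtest

# Anderson-Darling: compares the data to the Normal by default
print(anderson(losses))
# Skewtest: null hypothesis is zero skewness (symmetry)
print(skewtest(losses))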
Historical simulation
No appropriate class of distributions?
Historical simulation: use the past to predict the future
No distributional assumption required
Monte Carlo simulation: relies upon random draws from fitted distribution(s) to create a random path, called a run
daily_loss = np.zeros(N)
for n in range(N):
    # One day of losses, simulated in total_steps increments
    loss = ( mu * (1/total_steps) +
             norm.rvs(size=total_steps) * sigma * np.sqrt(1/total_steps) )
    daily_loss[n] = sum(loss)

# Use np.quantile() to find the VaR at e.g. the 95% confidence level
VaR_95 = np.quantile(daily_loss, 0.95)
Risk and distribution
Risk management toolkit
Risk mitigation: MPT
Chow Test:
Test for the existence of a structural break, given a linear model
OLS regression using statsmodels' OLS object over the full period 1950 - 2019
Retrieve the sum of squared residuals, res.ssr
import statsmodels.api as sm
res = sm.OLS(log_pop, year).fit()
print('SSR 1950-2019: ', res.ssr)
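The sub-period regressions and the Chow F-statistic can then be assembled as below; a hedged sketch, assuming the sample is split at the candidate break into _before and _after pieces and the model has k = 2 parameters:

# Sub-period regressions around the candidate structural break
res_before = sm.OLS(log_pop_before, year_before).fit()
res_after = sm.OLS(log_pop_after, year_after).fit()

# Chow F-statistic from the sums of squared residuals
k = 2                                  # parameters per regression
n = len(log_pop)                       # total observations
ssr_sub = res_before.ssr + res_after.ssr
chow = ((res.ssr - ssr_sub) / k) / (ssr_sub / (n - 2 * k))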
Chow test assumptions
Chow test: identify statistical significance
of possible structural break
rolling = portfolio_returns.rolling(30)
volatility = rolling.std().dropna()
# Average the rolling volatility by calendar month
vol_mean = volatility.resample("M").mean()
Backtesting: use previous data ex-post to see how risk estimate performs
Used extensively in enterprise risk management
Extreme values
Portfolio losses: extreme values come from the tail of the distribution
Tail losses: losses exceeding some value
Block maxima: break the period into sub-periods and take the maximum loss in each block
# Weekly block maxima of daily losses
maxima = losses.resample("W").max()
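A hedged sketch of putting the block maxima to use: fit a Generalized Extreme Value distribution with scipy and read off an extreme quantile:

from scipy.stats import genextreme

# Fit the GEV distribution to the weekly block maxima
params = genextreme.fit(maxima.dropna())

# e.g. the 99% quantile of weekly maximum losses
VaR_99 = genextreme.ppf(0.99, *params)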
The histogram revisited
Risk factor distributions
Assumed (e.g. Normal, T, etc.)
Fitted (parametric estimation, Monte
Carlo simulation)
Real-time portfolio updating
Risk management
Defined risk measures (VaR, CVaR)
Financial data
[Figure: feed-forward neural network with an input layer, several hidden layers, and an output layer]
Usage
Input: new, unseen asset prices
model.compile(loss='mean_squared_error', optimizer='rmsprop')
model.fit(training_input, training_output, epochs=100)
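The compile and fit calls above assume an existing model; a minimal Keras sketch of one possible Sequential architecture (layer sizes and n_features are illustrative, not from the course):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# n_features: number of input asset-price features (assumed)
model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(n_features,)))  # hidden layer
model.add(Dense(8, activation='relu'))                              # hidden layer
model.add(Dense(1))                                                 # output layer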
Congratulations!
Quantitative Risk Management: Concepts, Techniques and Tools, McNeil, Frey & Embrechts,
Princeton UP, 2015.
Michael Crabtree
Data Scientist, Ford Motor Company
What is credit risk?
The possibility that someone who has borrowed money will not repay it all
Calculated as the risk difference between lending someone money and buying a government bond
The likelihood that someone will default on a loan is the probability of default (PD)
Application data
Behavioral data
Application      Behavioral
Interest Rate    Employment Length
Grade            Historical Default
Amount           Income
Data processing
Prepared data allows models to train faster

# Average interest rate by home ownership and loan status
pd.crosstab(cr_loan['person_home_ownership'], cr_loan['loan_status'],
            values=cr_loan['loan_int_rate'], aggfunc='mean').round(2)

Histograms
Scatter plots
What is missing data?
NULLs in a row instead of an actual value
# Columns that contain at least one null value
null_columns = cr_loan.columns[cr_loan.isnull().any()]
cr_loan[null_columns].isnull().sum()

# Drop the rows with missing employment length
indices = cr_loan[cr_loan['person_emp_length'].isnull()].index
cr_loan.drop(indices, inplace=True)
Probability of default
The likelihood that someone will default on a loan is the probability of default
Decision tree
from sklearn.linear_model import LogisticRegression

clf_logistic = LogisticRegression(solver='lbfgs')
clf_logistic.fit(training_columns, np.ravel(training_labels))

# Separate the features X from the loan_status labels y
X = cr_loan.drop('loan_status', axis = 1)
y = cr_loan[['loan_status']]
Logistic regression coefficients
# Model Intercept
array([-3.30582292e-10])
# Coefficients for ['loan_int_rate','person_emp_length','person_income']
array([[ 1.28517496e-09, -2.27622202e-09, -2.17211991e-05]])
For every 1 year increase in person_emp_length , the person is less likely to default
Non-numeric:
cr_loan_clean['loan_intent']
EDUCATION
MEDICAL
VENTURE
PERSONAL
DEBTCONSOLIDATION
HOMEIMPROVEMENT
Will cause errors with machine learning models in Python unless processed
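One standard remedy, sketched here: one-hot encode the string columns with pandas and rejoin them with the numeric columns (selecting columns by dtype is an assumption about this data):

# Separate numeric and string columns
cred_num = cr_loan.select_dtypes(exclude=['object'])
cred_str = cr_loan.select_dtypes(include=['object'])

# One-hot encode the categorical columns and recombine
cred_str_onehot = pd.get_dummies(cred_str)
cr_loan_prep = pd.concat([cred_num, cred_str_onehot], axis=1)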
Model accuracy scoring
Calculate accuracy with the model's .score() method:
0.81
# Predicted probabilities; column 1 is the probability of default
preds = clf_logistic.predict_proba(X_test)
preds_df = pd.DataFrame(preds[:,1], columns = ['prob_default'])

# Apply a 0.5 threshold to get the predicted loan status
preds_df['loan_status'] = preds_df['prob_default'].apply(lambda x: 1 if x > 0.5 else 0)
Confusion matrices
Shows the number of correct and incorrect predictions for each loan_status
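A minimal sketch with scikit-learn, assuming y_test and the thresholded preds_df from the previous section:

from sklearn.metrics import confusion_matrix

# Rows: actual loan_status; columns: predicted loan_status
print(confusion_matrix(y_test, preds_df['loan_status']))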
Decision trees
Creates predictions similar to logistic regression
Loan  True loan status  Pred. loan status  Loan payoff value  Selling value  Gain/Loss
1     0                 1                  $1,500             $250           -$1,250
2     0                 1                  $1,200             $250           -$950
# gbt_preds_prob
array([[0.059, 0.940], [0.121, 0.989]])
# gbt_preds
array([1, 1, 0...])
max_depth : sets how deep each tree can go; larger means more complex
import xgboost as xgb
xgb.XGBClassifier(learning_rate = 0.2,
                  max_depth = 4)
Choosing specific columns
We've been using all columns for predictions
{'person_home_ownership_RENT': 1, 'person_home_ownership_OWN': 2}
Cross validation basics
Used to train and test the model in a way that simulates using the model on new data
Early stopping tells cross validation to stop after a scoring metric has not improved for a number of iterations
1 https://scikit-learn.org/stable/modules/cross_validation.html
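A hedged sketch of cross-validation with early stopping in XGBoost; the DMatrix construction and parameter values are illustrative:

import xgboost as xgb

DTrain = xgb.DMatrix(X_train, label=y_train)
params = {'objective': 'binary:logistic', 'seed': 123, 'eval_metric': 'auc'}

# Stop if the metric has not improved for 10 rounds
cv_results = xgb.cv(params, DTrain, num_boost_round=200, nfold=5,
                    early_stopping_rounds=10)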
Not enough defaults in the data
The values of loan_status are the classes
Non-default: 0
Default: 1
y_train['loan_status'].value_counts()
Person  Loan Amount  Potential Profit  Predicted Status  Actual Status  Losses
A       $1,000       $10               Default           Non-Default    -$10
B       $1,000       $10               Non-Default       Default        -$1,000
Log-loss for the model is the same for both, but our actual losses are not
Business processes:
Measures already in place to not accept probable defaults
Behavioral factors:
Normally, people do not default on their loans
The less often they default, the higher their credit rating
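When the imbalance must be addressed directly, one common approach is undersampling the non-defaults; a hedged sketch, assuming X_train/y_train from the earlier split:

# Recombine the training features and labels
train = pd.concat([X_train.reset_index(drop=True),
                   y_train.reset_index(drop=True)], axis=1)
defaults = train[train['loan_status'] == 1]
non_defaults = train[train['loan_status'] == 0]

# Randomly sample non-defaults to match the number of defaults
non_defaults_under = non_defaults.sample(len(defaults), random_state=42)
train_under = pd.concat([defaults, non_defaults_under], axis=0)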
Comparing classification reports
Create the reports with classification_report() and compare
A sample of loans and their predicted probabilities of default should be close to the
percentage of defaults in that sample
http://datascienceassn.org/sites/default/files/Predicting%20good%20probabilities%20with%20supervised%20lea
# Fraction of positives
(array([0.09602649, 0.19521012, 0.62035996, 0.67361111]),
# Average probability
array([0.09543535, 0.29196742, 0.46898465, 0.65512207]))
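These fraction-of-positives and average-probability arrays are the kind of output scikit-learn's calibration_curve produces; a minimal sketch under assumed names:

from sklearn.calibration import calibration_curve

frac_of_pos, mean_pred_val = calibration_curve(y_test,
                                               preds_df['prob_default'],
                                               n_bins=10)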
Thresholds and loan status
Previously we set a threshold for a range of prob_default values
This was used to change the predicted loan_status of the loan
Acceptance rate: what percentage of new loans are accepted to keep the number of
defaults in a portfolio low
Accepted loans which are defaults have an impact similar to false negatives
import numpy as np
# Compute the threshold for 85% acceptance rate
threshold = np.quantile(prob_default, 0.85)
0.804
These are loans with prob_default values around where our model is not well calibrated
The .count() of a single column is the same as the row count for the data frame
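A hedged sketch of applying the computed threshold and measuring the bad rate among accepted loans; true_loan_status is an assumed name for the column of actual outcomes:

# Re-assign predicted status using the acceptance-rate threshold
preds_df['pred_loan_status'] = \
    preds_df['prob_default'].apply(lambda x: 1 if x > threshold else 0)

# Bad rate: share of accepted loans that actually defaulted
accepted = preds_df[preds_df['pred_loan_status'] == 0]
bad_rate = accepted['true_loan_status'].sum() / accepted['true_loan_status'].count()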
Selecting acceptance rates
First acceptance rate was set to 85%, but other rates might be selected as well
Your journey...so far
Prepare credit data for machine learning models
Important to understand the data
Develop, score, and understand logistic regressions and gradient boosted trees
Structural model framework: the model explains the default event based on other factors
Other techniques
Through-the-cycle model (continuous time): macro-economic conditions and other effects
are used, but the risk is seen as an independent event
In many cases, business users will not accept a model they cannot understand
Complex models can be very large and difficult to put into production
Chelsea Yang
Data Science Instructor
Course overview
GARCH: Generalized AutoRegressive Conditional Heteroskedasticity
volatility = √( ∑i=1..n (returni − mean)² / (n − 1) ) = √variance
return_data = price_data.pct_change()
volatility = return_data.std()
σmonthly = √21 ∗ σdaily
σannual = √252 ∗ σdaily
First came the ARCH
Auto Regressive Conditional Heteroskedasticity
GARCH(1,1): σ²t = ω + α ⋅ ε²t−1 + β ⋅ σ²t−1
Constraints: ω, α, β >= 0; α + β < 1
Long-run variance: ω / (1 − α − β)
Python "arch" package
from arch import arch_model
1 Kevin Sheppard. (2019, March 28). bashtage/arch: Release 4.8.1 (Version 4.8.1). Zenodo. http://doi.org/10.5281/zenodo.2613877
1. Specify the model
2. Fit the model
3. Make a forecast
basic_gm = arch_model(sp_data['Return'], p = 1, q = 1,
                      mean = 'constant', vol = 'GARCH', dist = 'normal')
gm_result = basic_gm.fit(update_freq = 4)
print(gm_result.params)
mu 0.077239
omega 0.039587
alpha[1] 0.167963
beta[1] 0.786467
Name: params, dtype: float64
h.1 in row "2019-10-10": 1-step ahead forecast made using data up to and including that date
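A minimal sketch of producing such a forecast from the fitted result (the 5-step horizon is illustrative):

# Variance forecast; column h.1 is the 1-step-ahead forecast
gm_forecast = gm_result.forecast(horizon = 5)
print(gm_forecast.variance[-1:])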
Why make assumptions?
Volatility is not directly observable
Distribution estimate with dist = 't' (standardized Student's t):
========================================================================
coef std err t P>|t| 95.0% Conf. Int.
------------------------------------------------------------------------
nu 4.9249 0.507 9.709 2.768e-22 [ 3.931, 5.919]
========================================================================
Distribution estimate with dist = 'skewt' (skewed Student's t):
===========================================================================
coef std err t P>|t| 95.0% Conf. Int.
---------------------------------------------------------------------------
nu 5.2437 0.575 9.118 7.681e-20 [ 4.117, 6.371]
lambda -0.0822 2.541e-02 -3.235 1.216e-03 [ -0.132,-3.241e-02]
===========================================================================
Constant mean by default
constant mean: generally works well with most financial return data
arch_model(my_data, p = 1, q = 1,
mean = 'constant', vol = 'GARCH')
arch_model(my_data, p = 1, q = 1,
mean = 'zero', vol = 'GARCH')
arch_model(my_data, p = 1, q = 1,
mean = 'AR', lags = 1, vol = 'GARCH')
Asymmetric shocks in financial data
News impact curve: negative shocks (bad news) raise volatility more than positive shocks of the same size. Riskier!
Exponential GARCH
EGARCH adds a conditional component to model the asymmetry in shocks, similar to GJR-GARCH
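Hedged sketches of both asymmetric specifications with the arch package; the o parameter adds the leverage/asymmetry order:

from arch import arch_model

# GJR-GARCH: standard GARCH volatility process with asymmetry order o = 1
gjr_gm = arch_model(my_data, p = 1, q = 1, o = 1, vol = 'GARCH', dist = 't')

# EGARCH: exponential GARCH with an asymmetric component
egarch_gm = arch_model(my_data, p = 1, q = 1, o = 1, vol = 'EGARCH', dist = 't')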
Rolling window for out-of-sample forecast
An exciting part of financial modeling: predict the unknown
Rolling window forecast: repeatedly perform model fitting and forecasting as time rolls forward
# Fixed start point (expanding window): first_obs stays put
for i in range(120):
    gm_result = basic_gm.fit(first_obs = start_loc,
                             last_obs = i + end_loc, disp = 'off')
    temp_result = gm_result.forecast(horizon = 1).variance

# Rolling window: the fitting range moves forward with i
for i in range(120):
    gm_result = basic_gm.fit(first_obs = i + start_loc,
                             last_obs = i + end_loc, disp = 'off')
    temp_result = gm_result.forecast(horizon = 1).variance
Too wide window size: include obsolete data that may lead to higher variance
Too narrow window size: exclude relevant data that may lead to higher bias
Do I need this parameter?
Is the parameter relevant?
Common threshold: 5%
The lower the p-value, the more ridiculous the null hypothesis looks
mu 9.031206e-08
omega 1.619415e-05
alpha[1] 4.283526e-10
beta[1] 1.302531e-183
Name: pvalues, dtype: float64
mu 5.345210
omega 4.311785
alpha[1] 6.243330
beta[1] 28.896991
Name: tvalues, dtype: float64
# Manual calculation
t = gm_result.params/gm_result.std_err
Visual check
Existence of autocorrelation in the standardized residuals indicates the model may not be
sound
To detect autocorrelation:
ACF plot
Ljung-Box
Red area in the plot indicates the confidence level (alpha = 5%)
from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box test on the standardized residuals
lb_test = acorr_ljungbox(std_resid, lags = 10)
# Check p-values
print('P-values are: ', lb_test[1])
Goodness of fit
Can model do a good job explaining the data?
1. Maximum likelihood
2. Information criteria
# Larger log-likelihood => better fit
print(gm_result.loglikelihood)
# Lower AIC/BIC => better model, penalizing extra parameters
print(gm_result.aic)
print(gm_result.bic)
Backtesting
An approach to evaluate model forecasting capability
Out-of-sample: backtesting
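A minimal sketch of one backtesting metric, mean absolute error between forecast volatility and realized absolute returns; actual_returns and forecast_volatility are assumed names:

import numpy as np

def evaluate(observation, forecast):
    # Mean absolute error between realized and forecast values
    mae = np.mean(np.abs(forecast - observation))
    return mae

# e.g. compare forecast volatility with absolute returns out of sample
mae = evaluate(np.abs(actual_returns), forecast_volatility)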
Risk management mindset
Rule No.1: Never lose money.
-- Warren Buffett
Three ingredients:
1. portfolio
2. time horizon
3. probability
5% probability the portfolio will fall in value by 1 million dollars or more over a 1-day period
1% probability the portfolio will fall in value by 9 million dollars or more over a 10-day period
# Forecast mean and variance from 2019 onward
mean_forecast = gm_forecast.mean['2019-01-01':]
variance_forecast = gm_forecast.variance['2019-01-01':]

# Empirical VaR quantile from the standardized residuals
q_empirical = std_resid.quantile(0.05)
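A hedged sketch of assembling VaR from these pieces: mean forecast plus volatility forecast times a quantile, either parametric (from the fitted t distribution) or the empirical q_empirical above:

import numpy as np
from scipy.stats import t

# Parametric quantile from the fitted Student's t degrees of freedom
nu = gm_result.params['nu']
q_parametric = t.ppf(0.05, nu)

# VaR = mean forecast + volatility forecast * quantile
VaR_parametric = mean_forecast.values + np.sqrt(variance_forecast).values * q_parametric
VaR_empirical = mean_forecast.values + np.sqrt(variance_forecast).values * q_empirical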
What is covariance
Describes the relationship between the movements of two variables
Covariance = ρ ⋅ σ1 ⋅ σ2
# Standardized residuals from each fitted GARCH model
resid_eur = gm_eur.resid/vol_eur
resid_cad = gm_cad.resid/vol_cad
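A hedged sketch that combines the pieces: the correlation of the standardized residuals times the two GARCH volatility estimates gives a dynamic covariance:

import numpy as np

# Correlation of the standardized residuals
corr = np.corrcoef(resid_eur, resid_cad)[0, 1]

# Covariance = ρ ⋅ σ1 ⋅ σ2, using the GARCH volatility estimates
covariance = corr * vol_eur * vol_cad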
The optimal portfolio can yield the maximum return with the minimum risk
Risk can be reduced in a portfolio by pairing assets that have a negative covariance
What is Beta
Stock Beta: a measure of the stock's systematic risk relative to the market
Beta > 1: the stock bears more risk than the general market
Beta < 1: the stock bears less risk than the general market
E(Rs) = Rf + β ⋅ (E(Rm) − Rf)
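A hedged sketch of a dynamic stock beta from GARCH output, using β = ρ ⋅ σstock / σmarket (which follows from β = Cov(Rs, Rm) / Var(Rm)); residual and volatility names are assumed:

import numpy as np

# Correlation of the standardized residuals of stock and market
corr = np.corrcoef(resid_stock, resid_market)[0, 1]

# β = ρ ⋅ σ_stock / σ_market
stock_beta = corr * (vol_stock / vol_market)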
You did it
Fit GARCH models
Portfolio optimization