Machine Learning in Trading (MLT-03)
Deepak Kaushal
Outline
● Limitations of linear models
● Advanced Models
● Neural Networks - Introduction
● Hands-on Lab I
● Neural Networks - Properties
● Hyperparameter Tuning
● Hands-on Lab II
Limitations of linear models
● Both linear and logistic regression
inherently create a linear model
● Some machine learning tasks are
not linearly separable
● Advanced models can capture
nonlinear relationships
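As a hypothetical illustration of this limitation, the XOR labels below are not linearly separable, so logistic regression cannot fit them, while a small neural network can; the dataset and model settings here are made up for the sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])  # XOR: no straight line separates the two classes

linear = LogisticRegression().fit(X, y)
nonlinear = MLPClassifier(hidden_layer_sizes=(8,), activation='tanh',
                          solver='lbfgs', max_iter=1000,
                          random_state=0).fit(X, y)

print(linear.score(X, y))     # at most 0.75 for any linear boundary
print(nonlinear.score(X, y))  # typically perfect on these four points
```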
K Nearest Neighbors
● Find k number of training samples
‘closest’ to the test sample
● Doesn’t build an explicit model; it
simply stores the training samples
● Euclidean distance most common
choice to define ‘closeness’
● Hyperparameters - k, distance,
weights
Source: [Link]
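A minimal k-nearest-neighbors sketch with scikit-learn, showing the three hyperparameters named above (k, the distance metric, the weighting scheme); the dataset and hyperparameter values are illustrative choices.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, non-linearly-separable toy data
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Hyperparameters: k, distance metric, neighbor weighting
knn = KNeighborsClassifier(n_neighbors=5, metric='euclidean',
                           weights='uniform')
knn.fit(X_train, y_train)  # "fitting" only stores the training samples
print(knn.score(X_test, y_test))
```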
Neural Networks
● One of the most popular machine
learning models of the last decade
● Has provided state-of-the-art
results for image and speech
recognition
● Loosely derives its structure from
how the human brain functions
● Good at capturing complex
relationships, prone to overfitting
Source: [Link]
Logistic Regression as building blocks
Linear combination → Sigmoid function → Loss function (log loss)
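The building block named on this slide (linear combination, then sigmoid, then log loss) can be written out in plain numpy; the feature values and weights below are made-up numbers for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y_true, y_prob):
    return -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

x = np.array([0.5, -1.2])  # input features
w = np.array([0.8, 0.3])   # weights
b = 0.1                    # bias

z = w @ x + b              # linear combination
p = sigmoid(z)             # activation: a probability in (0, 1)
loss = log_loss(1.0, p)    # log loss against the true label 1
print(p, loss)
```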
Neural Networks - Learning process
1. Initialize random weights
2. Forward pass - Apply sequence of linear
combinations + activations
3. Use log loss to compute errors
4. Use gradient descent algorithm to
identify appropriate adjustments
5. Backpropagation - Propagate the weight
updates from layer n -> n-1 -> … back to
the first layer
Source: [Link]
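The five steps above can be sketched in numpy for a one-hidden-layer network on the XOR problem; the hidden-layer size, learning rate, and step count are illustrative choices, not prescribed values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 1. Initialize random weights
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)
lr, losses = 1.0, []

for step in range(2000):
    # 2. Forward pass: linear combinations + activations
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # 3. Log loss over the batch
    losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))
    # 4./5. Gradient descent with backpropagation, output layer first
    dz2 = (p - y) / len(X)             # error at the output pre-activation
    dW2 = h.T @ dz2; db2 = dz2.sum(0)
    dz1 = (dz2 @ W2.T) * h * (1 - h)   # propagate error back through sigmoid
    dW1 = X.T @ dz1; db1 = dz1.sum(0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(losses[0], losses[-1])  # loss should decrease over training
```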
QUIZ
Neural Network - Properties
● Input and Output layer
● Hidden layers
○ Depth - No. of hidden layers
○ Width - No. of Neurons per hidden layer
○ E.g. (3, 3), (5, 3, 2), etc.
○ Defines the number of parameters (and thus complexity)
● Activation function - e.g. sigmoid, tanh, ReLU
● Loss function - e.g. log loss, mean squared
error
Source: [Link]/figure/The-structure-of-an-artificial-neural-network-m-ij-denotes-the-weight-between-the-ith_fig1_329367972
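These properties map directly onto scikit-learn's `MLPClassifier`; the layer sizes and dataset below are arbitrary choices for illustration (the classifier uses log loss internally for classification).

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Depth = 3 hidden layers; widths 5, 3 and 2; tanh activation
clf = MLPClassifier(hidden_layer_sizes=(5, 3, 2), activation='tanh',
                    max_iter=2000, random_state=0).fit(X, y)

# n_layers_ counts input + hidden + output layers
print(clf.n_layers_)  # 1 input + 3 hidden + 1 output = 5
```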
Neural Networks - Activation Functions
● Responsible for introducing
non-linearity in the model
● Without activation, linear
combination of various layers
is still linear
● Designed to be suitable for
backpropagation
Source: [Link]
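The three activations named earlier on this deck, written as plain numpy functions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)        # zero for negatives, identity otherwise

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```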
Regularization
● Techniques to avoid
overfitting
● Minimize(Loss + Model
Complexity)
● L2 Regularization - simplify the
model
● Neural network specific
○ Early stopping
○ Dropout
Source: [Link]
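A sketch of how these techniques map onto scikit-learn's `MLPClassifier`: `alpha` is the L2 penalty and `early_stopping` holds out a validation set. Dropout is not available in scikit-learn; it is common in deep-learning libraries such as Keras or PyTorch. The dataset and parameter values are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, random_state=0)

clf = MLPClassifier(
    hidden_layer_sizes=(10, 5),
    alpha=0.01,               # L2 regularization strength (lambda)
    early_stopping=True,      # hold out validation data, stop when it plateaus
    validation_fraction=0.1,
    n_iter_no_change=10,
    max_iter=1000,
    random_state=0,
).fit(X, y)

print(clf.n_iter_)  # typically stops well before max_iter
```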
L2 Regularization - Simplify the model
● A feature weight with a high
absolute value makes the model
more complex
● Complexity is defined as the
sum of squares of all feature
weights
● λ (lambda) is the
regularization rate
● The higher the lambda, the stronger the
regularization
● This is tuned based on the
problem at hand
Source: [Link]
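Written out, the regularized objective is: minimize loss + λ · Σ wᵢ². A small sketch of the penalty term, with made-up weights:

```python
import numpy as np

def l2_penalty(weights, lam):
    # complexity = sum of squared feature weights, scaled by lambda
    return lam * np.sum(np.square(weights))

w = np.array([0.2, -0.5, 3.0])  # the 3.0 weight dominates the penalty
print(l2_penalty(w, lam=0.1))   # 0.1 * (0.04 + 0.25 + 9.0) ≈ 0.929
```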
Neural Network - Choosing (Hyper)Parameters
● Hidden layers, activation function,
regularization, learning rate are called
hyperparameters of the model
● Different from weights (often called
parameters) as they are not ‘learned’
● These are often arrived at by trial and
iteration
Source: [Link]
Grid Search - Hyperparameter tuning
● Use the scikit-learn functionality
sklearn.model_selection.GridSearchCV
● Provide the parameter space
{'hidden_layer_sizes': [(5,), (5, 3, 2), (10, 5, 2)],
'activation': ['logistic', 'tanh'],
'alpha': [1, 0.1, 0.001]} # Neural Network
{'n_neighbors': [3, 5, 10],
'metric': ['euclidean', 'manhattan'],
'weights': ['uniform', 'distance']} # kNN
● Cross validation - Number of Folds
● Score function - accuracy, precision
Source: [Link]
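A runnable version of the grid search sketched above, with valid scikit-learn parameter names; the grids, dataset, and fold count are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Neural network: search over architecture, activation and L2 strength
nn_grid = {'hidden_layer_sizes': [(5,), (5, 3, 2)],
           'activation': ['logistic', 'tanh'],
           'alpha': [1, 0.1, 0.001]}
nn_search = GridSearchCV(MLPClassifier(max_iter=500, random_state=0),
                         nn_grid, cv=3, scoring='accuracy')
nn_search.fit(X, y)
print(nn_search.best_params_)

# kNN: search over k, distance metric and neighbor weighting
knn_grid = {'n_neighbors': [3, 5, 10],
            'metric': ['euclidean', 'manhattan'],
            'weights': ['uniform', 'distance']}
knn_search = GridSearchCV(KNeighborsClassifier(), knn_grid, cv=3,
                          scoring='accuracy').fit(X, y)
print(knn_search.best_score_)
```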
QUIZ