
Cheat Sheet - Machine Learning - Data Science Interview PDF

The document provides cheat sheets on various machine learning topics that are commonly asked about during interviews. It includes summaries on bias-variance tradeoff, imbalanced classification data, principal component analysis, and preparing for coding and behavioral interviews. The cheat sheets concisely explain key concepts in 1-2 paragraphs and include relevant diagrams.

Uploaded by

Paulo


Machine Learning/Data Science Interview

Cheat sheets
Aqeel Anwar
Version: 0.1.0.2

This document contains cheat sheets on various topics asked during a Machine Learning/Data Science interview. This document is constantly updated to include more topics.


Table of Contents
Machine Learning Topics
1. Bias-Variance Trade-off
2. Imbalanced Data in Classification
3. Principal Component Analysis
4. Bayes’ Theorem and Classifier
5. Regression Analysis
6. Regularization in ML
7. Convolutional Neural Network
8. Famous CNNs
9. Ensemble Methods in Machine Learning
Interview Preparation
1. Preparing for Coding Interviews
2. How to prepare for a behavioral interview?
3. How to answer a behavioral question?

Cheat Sheet – Bias-Variance Tradeoff
What is Bias?
• Error between average model prediction and ground truth
• The bias of the estimated function tells us the capacity of the underlying model to
predict the values
What is Variance?
• Average variability in the model prediction for the given dataset
• The variance of the estimated function tells you how much the function can adjust
to the change in the dataset
High Bias
• Overly simplified model
• Under-fitting
• High error on both train and test data
High Variance
• Overly complex model
• Over-fitting
• Low error on train data and high error on test data
• Starts modelling the noise in the input
Bias-Variance Trade-off

• Increasing bias reduces variance and vice versa
• $\text{Error} = \text{bias}^2 + \text{variance} + \text{irreducible error}$
• The best model is the one where the total error is minimized
• Compromise between bias and variance
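The decomposition can be checked numerically with the minimal NumPy sketch below. It is our own illustration, not taken from the cheat sheet. It assumes a true function sin(2πx), Gaussian noise with standard deviation 0.3, and polynomial models of increasing degree. Each model is fitted to many independently sampled training sets, and bias² and variance are then estimated on a fixed grid of test points. The function name `bias_variance` and all parameter values are hypothetical choices.

```python
import numpy as np


def bias_variance(degree, n_datasets=200, n_points=30, noise=0.3, seed=0):
    """Estimate bias^2 and variance of a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.0, 1.0, 50)
    f_test = np.sin(2 * np.pi * x_test)              # ground truth at the test points
    preds = np.empty((n_datasets, x_test.size))
    for i in range(n_datasets):
        # Draw a fresh noisy training set and fit the model to it
        x = rng.uniform(0.0, 1.0, n_points)
        y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise, n_points)
        coeffs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coeffs, x_test)
    mean_pred = preds.mean(axis=0)                   # average model prediction
    bias_sq = np.mean((mean_pred - f_test) ** 2)     # (average prediction - truth)^2
    variance = np.mean(preds.var(axis=0))            # spread of predictions across datasets
    return bias_sq, variance


for degree in (1, 3, 9):
    b, v = bias_variance(degree)
    print(f"degree {degree}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

A low degree (1) should show high bias² and low variance (under-fitting). A high degree (9) should show the reverse (over-fitting). The noise variance 0.3² = 0.09 is the irreducible error, which no choice of degree can remove.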
Source: https://www.cheatsheets.aqeel-anwar.com

Cheat Sheet – Imbalanced Data in Classification
Blue: Label 1, Green: Label 0
$\text{Accuracy} = \frac{\text{Correct Predictions}}{\text{Total Predictions}}$
If 90% of the samples are blue, a classifier that always predicts label blue yields a prediction accuracy of 90%.
Accuracy doesn’t always give the correct insight about your trained model
• Accuracy (% of correct predictions): correct predictions over total predictions. One value for the entire network.
• Precision (exactness of the model): of the detected cats, how many were actually cats. Each class/label has a value.
• Recall (completeness of the model): correctly detected cats over total cats. Each class/label has a value.
• F1 Score (combines precision and recall): harmonic mean of precision and recall. Each class/label has a value.

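Performance metrics associated with Class 1
Confusion matrix (columns: actual labels, rows: predicted labels). Each cell name answers (Is your prediction correct?) (What did you predict?):
• Predicted 1, Actual 1: True Positive (TP)
• Predicted 1, Actual 0: False Positive (FP)
• Predicted 0, Actual 1: False Negative (FN)
• Predicted 0, Actual 0: True Negative (TN)
$\text{Precision} = \frac{TP}{TP + FP}$
$\text{Recall (Sensitivity, True +ve rate)} = \frac{TP}{TP + FN}$
$\text{Specificity} = \frac{TN}{TN + FP}$
$\text{False +ve rate} = \frac{FP}{TN + FP}$
$\text{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN}$
$\text{F1 score} = 2 \times \frac{\text{Prec} \times \text{Rec}}{\text{Prec} + \text{Rec}}$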
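The formulas above can be checked with a small pure-Python sketch. The helper `binary_metrics` is hypothetical; scikit-learn's `sklearn.metrics` module offers equivalents. It counts TP, FP, FN and TN for a chosen positive class and applies the definitions directly, returning 0.0 when a denominator is zero. The example reuses the 90%-blue scenario: 90 samples of label 1, 10 samples of label 0, and a classifier that always predicts 1.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts and derived metrics for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def ratio(num, den):
        # Guard against empty denominators (e.g. a class that is never predicted)
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "false_positive_rate": ratio(fp, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# 90 blue samples (label 1), 10 green samples (label 0); the classifier always predicts blue
y_true = [1] * 90 + [0] * 10
y_pred = [1] * 100
print(binary_metrics(y_true, y_pred, positive=1))
print(binary_metrics(y_true, y_pred, positive=0))
```

For class 1 this gives accuracy 0.9, precision 0.9 and recall 1.0. Treating class 0 as the positive class instead gives recall 0.0: the model never finds a single green sample, which accuracy alone hides.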

Possible solutions
1. Data Replication: Replicate the available data of the smaller class until the number of samples in the two classes is comparable.
2. Synthetic Data: For images, rotate, dilate, crop, or add noise to existing input images to create new data.
3. Modified Loss: Modify the loss to reflect a greater error when misclassifying the smaller sample set: $loss = a \cdot loss_{green} + b \cdot loss_{blue}$ with $a > b$ (green, label 0, is the smaller class).
4. Change the algorithm: Increase the model/algorithm complexity so that the two classes are perfectly separable (Con: overfitting). In the figure, no straight line through the origin ($y = ax$) can perfectly separate the data, so the best solution is the line $y = 0$, which predicts all labels as blue. After increasing the model complexity, a straight line $y = ax + b$ can perfectly separate the data, and the green class is no longer predicted as blue.
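Solutions 1 and 3 can each be sketched in a few lines of NumPy. The functions `oversample` and `weighted_bce` are hypothetical helpers written for this illustration:

- `oversample` replicates minority-class rows, sampling with replacement, until every class matches the largest one.
- `weighted_bce` is a binary cross-entropy in which the green (label 0) term is weighted by `a` and the blue (label 1) term by `b`, mirroring the modified loss above.

Using cross-entropy as the per-class loss is our assumption.

```python
import numpy as np


def oversample(X, y, seed=0):
    """Replicate rows of the smaller classes (with replacement) until all classes are equal in size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        # Draw the missing rows at random from this class (nothing to add for the largest class)
        extra = rng.choice(idx, size=target - n, replace=True) if target > n else idx[:0]
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


def weighted_bce(y, p, a, b, eps=1e-12):
    """loss = a * loss_green + b * loss_blue, with green = label 0 and blue = label 1."""
    p = np.clip(p, eps, 1 - eps)               # avoid log(0)
    loss_blue = -np.log(p[y == 1]).sum()       # blue samples should get p close to 1
    loss_green = -np.log(1 - p[y == 0]).sum()  # green samples should get p close to 0
    return (a * loss_green + b * loss_blue) / len(y)


X = np.arange(100, dtype=float).reshape(-1, 1)
y = np.array([1] * 90 + [0] * 10)
X_bal, y_bal = oversample(X, y)
print(np.bincount(y_bal))                      # both classes now have 90 samples

p = np.full(100, 0.9)                          # a model that leans towards "blue" everywhere
print(weighted_bce(y, p, a=1, b=1), weighted_bce(y, p, a=9, b=1))
```

With `a > b`, the same mistakes on the green samples cost more, so training pushes the model away from predicting blue everywhere.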

Cheat Sheet – PCA Dimensionality Reduction
What is PCA?
• Based on the dataset, find a new set of orthogonal feature vectors such that the data spread is maximum in the direction of each feature vector (or dimension)
• Ranks the feature vectors in decreasing order of data spread (or variance)
• The datapoints have maximum variance along the first feature vector and minimum variance along the last feature vector
• The variance of the datapoints in the direction of a feature vector can be considered a measure of the information in that direction.
Steps
1. Standardize the datapoints
2. Find the covariance matrix from the given datapoints
3. Carry out the eigenvalue decomposition of the covariance matrix
4. Sort the eigenvalues and eigenvectors in decreasing order of eigenvalue

Dimensionality Reduction with PCA

• Keep the first m out of n feature vectors ranked by PCA. These m vectors are the best m vectors for preserving the maximum information that could be preserved with m vectors on the given dataset
Steps:
1. Carry out steps 1-4 from above
2. Keep the first m feature vectors from the sorted eigenvector matrix
3. Transform the data onto the new basis (feature vectors)
4. The importance of each feature vector is proportional to the magnitude of its eigenvalue
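Both step lists above map directly onto NumPy. In this sketch, the function name `pca_reduce` and the toy data are our own. It uses `np.linalg.eigh` because a covariance matrix is symmetric. `eigh` returns eigenvalues in ascending order, so they are re-sorted in decreasing order before the first `m` eigenvectors are kept. The explained-variance ratio (each eigenvalue divided by their sum) expresses step 4, the importance of each feature vector.

```python
import numpy as np


def pca_reduce(X, m):
    """Project the rows of X (samples x features) onto the top-m principal components."""
    # Step 1: standardize each feature to zero mean and unit variance
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: covariance matrix of the features (features x features)
    cov = np.cov(Xs, rowvar=False)
    # Step 3: eigendecomposition (eigh is meant for symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Step 4: sort eigenpairs in decreasing order of eigenvalue
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep the first m eigenvectors and transform the data onto that basis
    W = eigvecs[:, :m]
    Z = Xs @ W
    explained_ratio = eigvals / eigvals.sum()
    return Z, eigvals, explained_ratio


rng = np.random.default_rng(0)
f1 = rng.normal(size=500)
f2 = f1 + 0.1 * rng.normal(size=500)       # feature 2 is almost a copy of feature 1
X = np.column_stack([f1, f2])
Z, eigvals, ratio = pca_reduce(X, m=1)
print(Z.shape, ratio)                      # one component keeps almost all the variance
```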

Figure 1: Datapoints with the feature vectors (Feature #1, Feature #2) as the x- and y-axis.
Figure 2: The Cartesian coordinate system is rotated to maximize the standard deviation along any one axis (new feature #2).
Figure 3: Remove the feature vector with the minimum standard deviation of datapoints (new feature #1) and project the data onto new feature #2.

Cheat Sheet – Bayes Theorem and Classifier
What is Bayes’ Theorem?
• Describes the probability of an event, based on prior knowledge of conditions that might be
related to the event.

• How the probability of an event changes when we have knowledge of another event. The posterior probability P(A|B) is usually a better estimate than P(A).
Bayes’ Theorem
$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$
where P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the prior probability and P(B) is the evidence.
Example
• Probability of fire P(F) = 1%
• Probability of smoke P(S) = 10%
• Probability of smoke given there is a fire P(S|F) = 90%
• What is the probability that there is a fire given we see smoke, P(F|S)?
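Plugging the example into the theorem gives P(F|S) = P(S|F) P(F) / P(S) = 0.90 × 0.01 / 0.10 = 0.09, i.e. a 9% chance of fire when smoke is seen. Here is the same arithmetic as a tiny Python sketch; the helper name `posterior` is our own.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Fire/smoke example: P(S|F) = 0.90, P(F) = 0.01, P(S) = 0.10
p_fire_given_smoke = posterior(likelihood=0.90, prior=0.01, evidence=0.10)
print(f"P(F|S) = {p_fire_given_smoke:.0%}")  # P(F|S) = 9%
```

Even though smoke usually accompanies fire, the posterior stays low. Fire is rare (prior 1%), while smoke is ten times more common.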

Maximum A Posteriori Probability (MAP) Estimation

The MAP estimate of the random variable y, given that we have observed iid (x1, x2, x3, … ), is the y that maximizes the product of the prior and the likelihood. We try to accommodate our prior knowledge when estimating:
$\hat{y}_{MAP} = \arg\max_{y} P(y) \prod_{i} P(x_i \mid y)$

Maximum Likelihood Estimation (MLE)

The MLE of the random variable y, given that we have observed iid (x1, x2, x3, … ), is the y that maximizes only the likelihood. We assume we don’t have any prior knowledge of the quantity being estimated:
$\hat{y}_{MLE} = \arg\max_{y} \prod_{i} P(x_i \mid y)$
MLE is a special case of MAP where our prior is uniform (all values are equally likely).
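As a concrete illustration (our own choice, not from the cheat sheet), consider estimating the probability θ that a coin lands heads after observing `heads` successes in `n` iid tosses.

- With a Bernoulli likelihood, the MLE is heads / n.
- With a Beta(a, b) prior, the MAP estimate is the mode of the Beta posterior, (heads + a − 1) / (n + a + b − 2), valid for a, b ≥ 1.

Setting a = b = 1 makes the prior uniform, and MAP collapses to MLE, exactly as stated above.

```python
def bernoulli_mle(heads, n):
    """theta that maximizes the likelihood theta**heads * (1 - theta)**(n - heads)."""
    return heads / n


def bernoulli_map(heads, n, a, b):
    """Mode of the Beta(heads + a, n - heads + b) posterior (assumes a, b >= 1)."""
    return (heads + a - 1) / (n + a + b - 2)


heads, n = 7, 10
print(bernoulli_mle(heads, n))            # 0.7
print(bernoulli_map(heads, n, a=1, b=1))  # 0.7: uniform prior, same as MLE
print(bernoulli_map(heads, n, a=5, b=5))  # pulled towards 0.5 by the prior
```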

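Naïve Bayes’ Classifier (Instantiation of MAP as a classifier)

Suppose we have two classes, y = y1 and y = y2, and more than one evidence/feature (x1, x2, x3, … ). Using Bayes’ theorem:
$P(y \mid x_1, x_2, x_3, \ldots) = \frac{P(x_1, x_2, x_3, \ldots \mid y)\, P(y)}{P(x_1, x_2, x_3, \ldots)}$
Naïve Bayes assumes the features (x1, x2, … ) are conditionally independent given the class, i.e.
$P(x_1, x_2, x_3, \ldots \mid y) = \prod_{i} P(x_i \mid y)$
so the predicted class is $\hat{y} = \arg\max_{y \in \{y_1, y_2\}} P(y) \prod_{i} P(x_i \mid y)$.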
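The following is a minimal Gaussian Naïve Bayes sketch in NumPy. Modelling each P(x_i | y) as a one-dimensional Gaussian is our assumption; categorical or multinomial likelihoods are also common. The class name `GaussianNaiveBayes` is a hypothetical name for this example, though scikit-learn ships a similar estimator. Prediction works in log space by summing log P(y) and the per-feature log-likelihoods. This computes the same product as above while avoiding numerical underflow.

```python
import numpy as np


class GaussianNaiveBayes:
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Prior P(y): fraction of training samples in each class
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        # Per-class, per-feature Gaussian parameters for P(x_i | y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        # log P(x_i | y) for every sample, class and feature: shape (n, k, d)
        diff = X[:, None, :] - self.mean_[None, :, :]
        log_lik = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff ** 2 / self.var_[None])
        # Conditional independence: sum the per-feature log-likelihoods, add the log prior
        log_post = self.log_prior_[None, :] + log_lik.sum(axis=2)
        return self.classes_[np.argmax(log_post, axis=1)]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = GaussianNaiveBayes().fit(X, y)
print(model.predict(np.array([[0.0, 0.0], [5.0, 5.0]])))
```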

Cheat Sheet – Regression Analysis
What is Regression Analysis?
Fitting a function f(.) to datapoints yi = f(xi) under some error function. Based on the estimated function and the error, we have the following types of regression:
1. Linear Regression:
Fits a line by minimizing the sum of squared errors over all datapoints.
2. Polynomial Regression:
Fits a polynomial of order k (k+1 unknowns) by minimizing the sum of squared errors over all datapoints.
3. Bayesian Regression:
For each datapoint, fits a Gaussian distribution by minimizing the mean-squared error. As the number of datapoints xi increases, it converges to point estimates, i.e. the variance of the fitted Gaussian shrinks toward zero.
4. Ridge Regression:
Can fit either a line or a polynomial, minimizing the sum of squared errors over all datapoints plus the weighted L2 norm of the function parameters β.
5. LASSO Regression:
Can fit either a line or a polynomial, minimizing the sum of squared errors over all datapoints plus the weighted L1 norm of the function parameters β.
6. Logistic Regression:
Can fit either a line or a polynomial with sigmoid activation, minimizing the binary cross-entropy loss over all datapoints. The labels y are binary class labels.
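Linear, polynomial and ridge regression all reduce to least squares on a design matrix. The sketch below uses helper names of our own. It builds the polynomial design matrix with `np.vander` and solves ordinary least squares with `np.linalg.lstsq`. Ridge is solved in closed form, β = (XᵀX + λI)⁻¹ Xᵀy. For simplicity the intercept is penalized along with the other coefficients, which practical implementations usually avoid.

```python
import numpy as np


def design_matrix(x, k):
    """Columns 1, x, x**2, ..., x**k: a polynomial of order k has k + 1 unknowns."""
    return np.vander(x, k + 1, increasing=True)


def least_squares(X, y):
    """Linear/polynomial regression: minimize the sum of squared errors ||y - X beta||^2."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def ridge(X, y, lam):
    """Ridge: minimize ||y - X beta||^2 + lam * ||beta||_2^2 (closed form)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


x = np.linspace(0.0, 1.0, 20)
y_line = 2.0 + 3.0 * x
y_quad = 1.0 - x + 2.0 * x ** 2
print(least_squares(design_matrix(x, 1), y_line))   # close to [2, 3]
print(least_squares(design_matrix(x, 2), y_quad))   # close to [1, -1, 2]
print(ridge(design_matrix(x, 1), y_line, lam=1.0))  # overall coefficient norm shrinks
```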
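Visual Representation (figure): Linear Regression, Polynomial Regression, Bayesian Linear Regression and Logistic Regression fits, each plotted as y against x; the logistic panel separates Label 0 from Label 1.

Summary (what each method fits, the estimated function, and the error function):
• Linear: a line in n dimensions, $\hat{y} = \beta_0 + \beta^T x$; error $\sum_i (y_i - \hat{y}_i)^2$
• Polynomial: a polynomial of order k, $\hat{y} = \sum_{j=0}^{k} \beta_j x^j$; error $\sum_i (y_i - \hat{y}_i)^2$
• Bayesian Linear: a Gaussian distribution for each point, $y_i \sim \mathcal{N}(\beta^T x_i, \sigma^2)$; error $\sum_i (y_i - \hat{y}_i)^2$
• Ridge: linear/polynomial; error $\sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_j \beta_j^2$
• LASSO: linear/polynomial; error $\sum_i (y_i - \hat{y}_i)^2 + \lambda \sum_j |\beta_j|$
• Logistic: linear/polynomial with sigmoid, $\hat{y} = \sigma(\beta^T x)$; error $-\sum_i \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$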
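The practical difference between the Ridge and LASSO rows is sparsity. The sketch below compares the two on synthetic data where only the first two of ten features matter; all names are hypothetical.

- LASSO is solved with ISTA, a proximal-gradient method that alternates a gradient step on the squared error with soft-thresholding.
- Ridge is solved in closed form.

The LASSO objective uses the common ½‖y − Xβ‖² scaling, so its λ corresponds to 2λ in the unscaled form of the summary. The data and λ = 10 are arbitrary choices.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink towards 0 and clip small values to exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_ista(X, y, lam, n_iter=5000):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 with ISTA (proximal gradient descent)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y)
        b = soft_threshold(b - step * grad, step * lam)
    return b


def ridge(X, y, lam):
    """Minimize ||y - X b||^2 + lam * ||b||_2^2 (closed form)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_b = np.array([3.0, -2.0] + [0.0] * 8)    # only two informative features
y = X @ true_b + 0.1 * rng.normal(size=100)

b_lasso = lasso_ista(X, y, lam=10.0)
b_ridge = ridge(X, y, lam=10.0)
print("LASSO zeros:", np.sum(b_lasso == 0.0))  # irrelevant weights driven exactly to 0
print("Ridge zeros:", np.sum(b_ridge == 0.0))  # weights shrink but stay non-zero
```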

Cheat Sheet – Regularization in ML
What is Regularization in ML?
• Regularization is an approach to address over-fitting in ML.
• An overfitted model fails to generalize its estimations to test data
• When the underlying model to be learned is low bias/high variance, or when we have a small amount of data, the estimated model is prone to over-fitting.
• Regularization reduces the variance of the model
Figure 1. Overfitting
Types of Regularization:
1. Modify the loss function:
• L2 Regularization: Prevents the weights from getting too large (defined by the L2 norm). The larger the weights, the more complex the model, and the greater the chance of overfitting.

• L1 Regularization: Prevents the weights from getting too large (defined by the L1 norm). The larger the weights, the more complex the model, and the greater the chance of overfitting. L1 regularization introduces sparsity in the weights: it forces more weights to be exactly zero, rather than reducing the average magnitude of all weights.

• Entropy: Used for models that output probabilities. Forces the probability distribution towards the uniform distribution.

2. Modify data sampling:

• Data augmentation: Create more data from the available data by randomly cropping, dilating, rotating, adding a small amount of noise, etc.
• K-fold Cross-validation: Divide the data into k groups. Train on (k-1) groups and test on the remaining group. Repeat for all k possible combinations.

3. Change training approach:

• Injecting noise: Add random noise to the weights while they are being learned. This pushes the model to be relatively insensitive to small variations in the weights, hence the regularization effect.
• Dropout: Generally used for neural networks. Connections between consecutive layers are randomly dropped based on a dropout ratio, and the remaining network is trained in the current iteration. In the next iteration, another random set of connections is dropped.
Figure 2. K-fold CV: 5-fold cross-validation, in which each of the five folds serves once as the test set while the other four are used for training.
Figure 3. Drop-out: the original network has 16 connections; with a dropout ratio of 30%, 11 connections (about 70%) remain active in each iteration.
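Both the sampling-based and the training-based techniques are short in NumPy. In this sketch the function names are hypothetical:

- `k_fold_indices` shuffles the sample indices once and yields k train/test splits in which every sample is tested exactly once.
- `drop_connections` masks a weight matrix so that each connection is dropped with probability `ratio`, matching the connection-dropping description above.

Two assumptions should be flagged. First, the surviving weights are rescaled by 1/(1 − ratio) so that the expected layer output is unchanged; this is a common convention. Second, the dropout found in most deep learning libraries drops whole units (activations) rather than individual weights; the weight-level variant shown here is often called DropConnect.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) for each of the k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx


def drop_connections(W, ratio, rng):
    """Randomly drop each connection (weight) with probability `ratio` for one training iteration."""
    mask = rng.random(W.shape) >= ratio      # True = connection stays active
    return W * mask / (1.0 - ratio), mask    # rescale survivors (assumes ratio < 1)


for train_idx, test_idx in k_fold_indices(10, k=5):
    print("train:", sorted(train_idx.tolist()), "test:", sorted(test_idx.tolist()))

rng = np.random.default_rng(0)
W = np.ones((4, 4))                          # 16 connections, as in Figure 3
W_dropped, mask = drop_connections(W, ratio=0.3, rng=rng)
print("active connections:", int(mask.sum()), "of", mask.size)
```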

Cheat Sheet – Famous CNNs
AlexNet – 2012
Why: AlexNet was born out of the need to improve the results of
the ImageNet challenge.
What: The network consists of 5 Convolutional (CONV) layers and 3
Fully Connected (FC) layers. The activation used is the Rectified
Linear Unit (ReLU).
How: Data augmentation is carried out to reduce over-fitting. Uses
Local Response Normalization (LRN).

VGGNet – 2014
Why: VGGNet was born out of the need to reduce the number of
parameters in the CONV layers and improve training time.
What: There are multiple variants of VGGNet (VGG16, VGG19, etc.)
How: The important point to note here is that all the conv kernels are
of size 3x3 and maxpool kernels are of size 2x2 with a stride of two.
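The parameter saving behind 3x3 kernels can be checked with simple arithmetic. The cheat sheet does not spell out this reasoning; it is the standard argument for stacking small kernels. n stacked stride-1 3x3 convolutions cover the same (2n + 1) x (2n + 1) receptive field as one large kernel, while using fewer weights. The channel count C = 256 is an arbitrary example, and biases are ignored.

```python
def conv_weights(kernel, channels):
    """Weights of one kernel x kernel conv layer mapping `channels` -> `channels` (biases ignored)."""
    return kernel * kernel * channels * channels


def receptive_field(n_layers, kernel=3):
    """Receptive field of n stacked stride-1 convolutions with the same kernel size."""
    return n_layers * (kernel - 1) + 1


C = 256
for n_layers, big_kernel in [(2, 5), (3, 7)]:
    stacked = n_layers * conv_weights(3, C)
    single = conv_weights(big_kernel, C)
    print(f"{n_layers} x 3x3 (field {receptive_field(n_layers)}): {stacked:,} weights "
          f"vs one {big_kernel}x{big_kernel}: {single:,} weights")
```

Two 3x3 layers need 18C² weights versus 25C² for one 5x5 layer (28% fewer). Three 3x3 layers need 27C² versus 49C² for one 7x7 layer (about 45% fewer). The stack also adds extra non-linearities between the layers.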

ResNet – 2015
Why: Neural Networks are notorious for not being able to find a
simpler mapping when it exists. ResNet solves that.
What: There are multiple versions of ResNetXX architectures where
‘XX’ denotes the number of layers. The most used ones are ResNet50
and ResNet101. Since the vanishing gradient problem was taken care of
(more about it in the How part), CNNs started to get deeper and deeper.
How: ResNet architecture makes use of shortcut connections to solve
the vanishing gradient problem. The basic building block of ResNet is
a Residual block that is repeated throughout the network.
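A residual block can be sketched in NumPy to show why a simpler mapping becomes easy to learn. The block outputs f(x) + x, so if its weight layers learn zero weights, the block reduces to the identity. This is a simplified illustration under our own assumptions: dense (matrix) layers stand in for convolutions, and there is no batch normalization. The ReLU that the original ResNet applies after the addition is also left out, to match the f(x) + x output shown in Figure 1.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def residual_block(x, W1, b1, W2, b2):
    """Two weight layers compute f(x); the shortcut connection adds the input back."""
    f = W2 @ relu(W1 @ x + b1) + b2   # f(x): weight layer -> ReLU -> weight layer
    return f + x                      # f(x) + x


d = 4
x = np.array([1.0, -2.0, 3.0, 0.5])
zeros_W, zeros_b = np.zeros((d, d)), np.zeros(d)
# With all weights at zero, f(x) = 0 and the block passes x through unchanged
print(residual_block(x, zeros_W, zeros_b, zeros_W, zeros_b))
```

Gradients also flow straight through the `+ x` path during backpropagation. This is the usual intuition for why shortcut connections ease the vanishing gradient problem.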
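Figure 1 ResNet Block: the input x passes through two weight layers to produce f(x); a shortcut connection adds x back, giving f(x) + x.
Figure 2 Inception Block: the previous layer feeds four parallel branches (a 1x1 conv; a 1x1 conv followed by a 3x3 conv; a 1x1 conv followed by a 5x5 conv; a 3x3 maxpool followed by a 1x1 conv), whose outputs are joined by filter concatenation.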

Inception – 2014
Why: Larger kernels are preferred for more global features; on the other
hand, smaller kernels provide good results in detecting area-specific
features. For effective recognition of such variable-sized features, we
need kernels of different sizes. That is what Inception does.
What: The Inception network architecture consists of several inception
modules of the following structure. Each inception module consists of
four operations in parallel: a 1x1 conv layer, a 3x3 conv layer, a 5x5 conv
layer, and max pooling.
How: Inception increases the network space from which the best
network is to be chosen via training. Each inception module can
capture salient features at different levels.

Cheat Sheet – Convolutional Neural Network
Convolutional Neural Network:
The data gets into the CNN through the input layer and passes
through various hidden layers before getting to the output layer.
The output of the network is compared to the actual labels in
terms of loss or error. The partial derivatives of this loss w.r.t. the
trainable weights are calculated using backpropagation, and the weights
are updated through one of the various optimization methods.

CNN Template:
Most of the commonly used hidden layers (not all) follow a
pattern:
1. Layer function: Basic transforming function such as a
convolutional or fully connected layer.
a. Fully Connected: Linear functions between the input and the
output.
b. Convolutional Layers: These layers are applied to 2D (3D) input feature maps. The trainable weights are a 2D (3D)
kernel/filter that moves across the input feature map, generating dot products with the overlapping region of the input
feature map.
c. Transposed Convolutional (DeConvolutional) Layer: Usually used to increase the size of the output feature map
(upsampling). The idea behind the transposed convolutional layer is to undo (not exactly) the convolutional layer.
Figure: Fully Connected Layer (input nodes x1, x2, x3 connected to output node y1, with $y_1 = w_{11}x_1 + w_{21}x_2 + w_{31}x_3 + b_1$) and Convolutional Layer (input map, kernel, output map).

2. Pooling: Non-trainable layer to change the size of the feature map

a. Max/Average Pooling: Decrease the spatial size of the input layer by
selecting the maximum/average value in the receptive field defined by the kernel.
b. UnPooling: A non-trainable layer used to increase the spatial size of the input
layer by placing the input pixel at a certain index in the receptive field
of the output defined by the kernel.
3. Normalization: Usually used just before the activation functions to keep the
unbounded activations from pushing the output layer values too high.
a. Local Response Normalization (LRN): A non-trainable layer that square-normalizes the pixel values in a feature map
within a local neighborhood.
b. Batch Normalization: A trainable approach to normalizing the data by learning scale and shift variables during training.
4. Activation: Introduces non-linearity so the CNN can efficiently learn complex non-linear mappings.
a. Non-parametric/Static functions: Linear, ReLU
b. Parametric functions: ELU, tanh, sigmoid, Leaky ReLU
c. Bounded functions: tanh, sigmoid
5. Loss function: Quantifies how far off the CNN prediction is from the actual labels.
a. Regression Loss Functions: MAE, MSE, Huber loss
b. Classification Loss Functions: Cross entropy, Hinge loss
Loss functions (figure: a plot of each loss; x denotes the label and x̂ the prediction):
• MSE Loss: $\text{mse} = (x - \hat{x})^2$
• MAE Loss: $\text{mae} = |x - \hat{x}|$
• Huber Loss (plotted with δ = 1.9): $\text{huber} = \begin{cases} \frac{1}{2}(x - \hat{x})^2 & |x - \hat{x}| < \delta \\ \delta\,|x - \hat{x}| - \frac{1}{2}\delta^2 & \text{otherwise} \end{cases}$
• Hinge Loss: $\text{hinge} = \begin{cases} \max(0, 1 - \hat{x}) & x = 1 \\ \max(0, 1 + \hat{x}) & x = -1 \end{cases}$
• Cross Entropy Loss: $\text{CE} = -y\log(p) - (1 - y)\log(1 - p)$
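The building blocks of the template can be sketched with plain NumPy loops; all function names are our own.

- `conv2d` computes a "valid" 2D cross-correlation, which is what CNN libraries call convolution. The kernel slides across the input map with stride 1 and no padding, taking a dot product with each overlapping region.
- `max_pool` keeps the maximum of each non-overlapping 2x2 window.
- The losses follow the formulas above, averaged over samples. The hinge loss uses the compact form max(0, 1 − x·x̂), which equals the two cases above for x ∈ {−1, +1}.

```python
import numpy as np


def conv2d(x, k):
    """'Valid' 2D cross-correlation: stride 1, no padding."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)  # dot product with the overlap
    return out


def max_pool(x, size=2, stride=2):
    """Keep the maximum value of each size x size window."""
    oh = (x.shape[0] - size) // stride + 1
    ow = (x.shape[1] - size) // stride + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = x[i * stride:i * stride + size, j * stride:j * stride + size].max()
    return out


def mse(x, x_hat):
    return np.mean((x - x_hat) ** 2)


def mae(x, x_hat):
    return np.mean(np.abs(x - x_hat))


def huber(x, x_hat, delta=1.0):
    e = np.abs(x - x_hat)
    return np.mean(np.where(e < delta, 0.5 * e ** 2, delta * e - 0.5 * delta ** 2))


def hinge(x, x_hat):
    """x holds labels in {-1, +1}; x_hat holds raw scores."""
    return np.mean(np.maximum(0.0, 1.0 - x * x_hat))


def cross_entropy(y, p, eps=1e-12):
    """y holds labels in {0, 1}; p holds predicted probabilities of label 1."""
    p = np.clip(p, eps, 1 - eps)
    return np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))


feature_map = np.arange(16.0).reshape(4, 4)
print(conv2d(feature_map, np.ones((2, 2))))  # 3x3 output map
print(max_pool(feature_map))                  # maxima 5, 7, 13, 15
print(huber(np.array([0.0]), np.array([3.0]), delta=1.9))
```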

Cheat Sheet – Ensemble Learning in ML
What is Ensemble Learning? Wisdom of the crowd
Combine multiple weak models/learners into one predictive model to reduce bias, variance and/or improve accuracy.

Types of Ensemble Learning (N weak learners):

1. Bagging: Trains N different weak models (usually of the same type, i.e. homogeneous) in parallel, each on a different random subset of the input dataset (the subsets are usually drawn with replacement, i.e. bootstrap samples). In the test phase, each model is evaluated, and the label with the greatest number of predictions is selected as the final prediction. Bagging methods reduce the variance of the prediction.

2. Boosting: Trains N different weak models (usually of the same type, i.e. homogeneous) on the complete dataset in sequential order. The datapoints wrongly classified by the previous weak model are given more weight so that the next weak learner can classify them properly. In the test phase, each model is evaluated, and each weak model's prediction is weighted according to its error for voting. Boosting methods decrease the bias of the prediction.

3. Stacking: Trains N different weak models (usually of different types, i.e. heterogeneous) in parallel on one of two subsets of the dataset. Once the weak learners are trained, their predictions on the other subset are used to train a meta learner that combines them and carries out the final prediction. In the test phase, each model predicts its label, and this set of labels is fed to the meta learner, which generates the final prediction.
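Here is a minimal bagging sketch in NumPy. The weak learner is a nearest-centroid classifier, chosen only because it is short. Each of the N copies is trained on a bootstrap sample (indices drawn with replacement), and the final label is decided by a simple majority vote. All function names, the synthetic two-cluster data and N = 25 are assumptions for illustration.

```python
import numpy as np


def fit_centroids(X, y):
    """Weak model: remember the mean (centroid) of each class."""
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])


def predict_centroids(model, X):
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]


def bagging_fit(X, y, n_models=25, seed=0):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample (with replacement)
        models.append(fit_centroids(X[idx], y[idx]))
    return models


def bagging_predict(models, X):
    votes = np.stack([predict_centroids(m, X) for m in models])  # (n_models, n_samples)
    final = []
    for column in votes.T:                            # majority vote per sample
        labels, counts = np.unique(column, return_counts=True)
        final.append(labels[np.argmax(counts)])
    return np.array(final)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(4.0, 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
models = bagging_fit(X, y)
print("training accuracy:", np.mean(bagging_predict(models, X) == y))
```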

The block diagrams and the comparison table for these three methods are given below.
Ensemble Method – Boosting
• Step #1: Assign equal weights to all the datapoints in the dataset (uniform weights).
• Step #2a: Train a weak model (Weak Model #1) with equal weights on all the datapoints.
• Step #2b: Based on the final error of the trained weak model, calculate a scalar alpha (alpha1). Use alpha to increase the weights of wrongly classified points and decrease the weights of correctly classified points.
• Step #3a: Train a weak model (Weak Model #2) with the adjusted weights on all the datapoints in the dataset.
• Step #3b: Based on the final error of the trained weak model, calculate a scalar alpha (alpha2) and adjust the weights in the same way.
• Repeat for the remaining weak models; in Step #(n+1)a, train weak model #n with the adjusted weights on all the datapoints in the dataset.
• Step #(n+2): In the test phase, predict from each weak model and vote their predictions, weighted by the corresponding alpha, to get the final prediction.
Ensemble Method – Bagging
• Step #1: Create N subsets from the original dataset, one for each weak model.
• Step #2: Train each weak model on an independent subset, in parallel.
• Step #3: In the test phase, predict from each weak model and vote their predictions to get the final prediction.
Ensemble Method – Stacking
• Step #1: Create 2 subsets from the original dataset, one for training the weak models (Subset #1, Weak Learners) and one for the meta-model (Subset #2, Meta Learner).
• Step #2: Train each weak model on the weak learner dataset.
• Step #3: Train a meta-learner whose input is the outputs of the weak models on the meta learner dataset.
• Step #4: In the test phase, feed the input to the weak models, collect their outputs and feed them to the meta model. The output of the meta model is the final prediction.
Comparison of the three methods:
• Focuses on: reducing variance (Bagging), reducing bias (Boosting), improving accuracy (Stacking)
• Nature of weak learners: homogeneous (Bagging), homogeneous (Boosting), heterogeneous (Stacking)
• Weak learners are aggregated by: simple voting (Bagging), weighted voting (Boosting), learned voting via a meta-learner (Stacking)
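The boosting steps above correspond to AdaBoost, which we assume here as the concrete algorithm; the cheat sheet does not name one. The NumPy sketch below makes these choices:

- The weak models are decision stumps (a single threshold on a single feature).
- Labels are in {−1, +1}.
- alpha = ½ ln((1 − err)/err), where err is the weighted error.
- Weights are multiplied by exp(−alpha·y·prediction), which raises the weights of misclassified points and lowers the rest.

All function names and the toy data are hypothetical.

```python
import numpy as np


def fit_stump(X, y, w):
    """Weak model: best single-feature threshold under sample weights w (labels in {-1, +1})."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= t, sign, -sign)
                err = w[pred != y].sum()          # weighted error
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best


def stump_predict(stump, X):
    _, j, t, sign = stump
    return np.where(X[:, j] <= t, sign, -sign)


def adaboost_fit(X, y, n_rounds):
    w = np.full(len(y), 1.0 / len(y))            # Step 1: equal weights
    ensemble = []
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w)               # Step a: train a weak model with current weights
        err = np.clip(stump[0], 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # Step b: scalar alpha from the error
        w = w * np.exp(-alpha * y * stump_predict(stump, X))  # up-weight mistakes
        w = w / w.sum()
        ensemble.append((alpha, stump))
    return ensemble


def adaboost_predict(ensemble, X):
    """Final step: vote the weak predictions, weighted by their alpha."""
    score = sum(alpha * stump_predict(stump, X) for alpha, stump in ensemble)
    return np.where(score >= 0, 1, -1)


X = np.arange(6.0).reshape(-1, 1)
y = np.array([1, 1, -1, -1, 1, 1])
ensemble = adaboost_fit(X, y, n_rounds=3)
print(adaboost_predict(ensemble, X))
```

On this toy set, no single stump is correct everywhere; the best one misclassifies two points. Three boosting rounds are already enough to classify all six points correctly.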

Cheat Sheet – Preparing for Coding Interviews
Part 1 – How to prepare for coding interviews?*
• The timeline: Fig. 1 places "Start preparing", "Start applying for jobs" and "Graduation" on a timeline, with 1-month and 3-month intervals marked between the milestones.
Fig. 1 – Preparation Timeline for Coding Interviews
• Review Data structures and Complexities:
The following 7 data structures are necessary for the interview, along with their time/space complexities:
• List/Arrays, Linked List, Hash Table/dictionary, Tree, Graph, Heap, Queue
• Practice coding questions:
• There are multiple online resources, such as LeetCode.com, InterviewBit.com, HackerRank.com etc.
• Pick one online resource and aim for easy and medium coding questions (approx. 100-150).
• Beginners should start preparing 2-3 months before the interview, and intermediates about 1 month before.
• Note:
• From my personal experience, a paid subscription to LeetCode.com was worth it.
• The Facebook, Uber, Google and Microsoft tagged questions on LeetCode covered almost 90% of the
questions asked.
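For Python users, the seven data structures map onto built-ins and standard-library modules roughly as follows. This quick-reference sketch is our own; Python is an assumption, since the cheat sheet is language-agnostic. Typical time complexities are given in the comments. `ListNode` and `TreeNode` are hypothetical minimal classes of the kind that interview questions usually define.

```python
import heapq
from collections import deque

# List/Array: O(1) indexing and append, O(n) insert/delete in the middle
arr = [3, 1, 4]
arr.append(1)

# Hash table/dictionary: O(1) average lookup, insert and delete
counts = {}
for v in arr:
    counts[v] = counts.get(v, 0) + 1

# Queue: deque gives O(1) append and popleft (list.pop(0) would be O(n))
queue = deque([1, 2])
queue.append(3)
first = queue.popleft()

# Heap: heapq is a min-heap; O(n) heapify, O(log n) push/pop
heap = [5, 2, 8]
heapq.heapify(heap)
smallest = heapq.heappop(heap)

# Graph: adjacency list, O(V + E) space
graph = {"A": ["B", "C"], "B": ["C"], "C": []}


# Linked list: O(1) insert after a known node, O(n) search
class ListNode:
    def __init__(self, val, next=None):
        self.val, self.next = val, next


# Tree: O(log n) search in a balanced binary search tree, O(n) in the worst case
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val, self.left, self.right = val, left, right


head = ListNode(1, ListNode(2, ListNode(3)))
root = TreeNode(2, TreeNode(1), TreeNode(3))
print(counts, first, smallest)
```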
Part 2 – How to answer a coding question?*
• Listen to the question
The interviewer will explain the question with an example. Note down the important points.
• Talk about your understanding of the question
Repeat the question and confirm your understanding. Ask clarifying questions such as
1. Input/Output data type limitations
2. Input size/length limitations
3. Special/Corner cases
• Discuss your approach
Walk through how you would approach the problem and ask the interviewer if they agree with it.
Talk about the data structure you prefer and why. Discuss the solution with the bigger picture in mind.
• Start coding
Ask the interviewer if you can start coding. Define useful functions and explain as you write.
Think out loud so the interviewer can evaluate your thought process.
• Discuss the time and space complexity
Discuss the time and space complexity of your coded approach in terms of Big O.
• Optimize the approach
If your approach is not the most optimized one, the interviewer will usually hint at a few
improvements. Pay attention to the hints and try to optimize your code.
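To make the last three steps concrete, here is a hypothetical example question, not taken from the cheat sheet: "return the indices of two numbers that add up to a target", often called two-sum. A first working answer checks every pair, in O(n²) time and O(1) extra space. The optimized answer trades space for time by storing each value's index in a hash table, giving O(n) time and O(n) space.

```python
def two_sum_brute_force(nums, target):
    """First answer: check every pair. O(n^2) time, O(1) extra space."""
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] + nums[j] == target:
                return i, j
    return None


def two_sum_hash_table(nums, target):
    """Optimized answer: one pass with a hash table. O(n) time, O(n) extra space."""
    seen = {}                       # value -> index of values visited so far
    for j, value in enumerate(nums):
        complement = target - value
        if complement in seen:
            return seen[complement], j
        seen[value] = j
    return None


print(two_sum_brute_force([2, 7, 11, 15], 9))  # (0, 1)
print(two_sum_hash_table([2, 7, 11, 15], 9))   # (0, 1)
```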

Fig. 2 – How to answer a coding question? (Listen & Repeat → Ask Clarifying Questions → Walk through your approach → Start Coding → Discuss Time & Space complexity → Optimize)

*Disclaimer: The recommendations are based on personal experiences of the author. The mentioned approach and resources might work great for some, but not so much for others.

1/4 How to prepare for a behavioral interview?
Collect stories, assign keywords, practice the STAR format

Keywords: List important keywords that will be populated with your personal stories. The most common keywords are given in the table below.

Conflict Resolution | Negotiation | Compromise to achieve goal | Creativity | Flexibility | Convincing
Handling Crisis | Challenging Situation | Working with difficult people | Another team's priorities not aligned | Adjust to a colleague's style | Take Stand
Handling -ve feedback | Coworker's view of you | Working with a deadline | Your strength | Your weakness | Influence Others
Handling failure | Handling unexpected situation | Converting challenge to opportunity | Decision without enough data | Mentorship/Leadership

Stories
1. List all the organizations you have been a part of. For example:
a. Academia: BSc, MSc, PhD
b. Industry: Jobs, Internship
c. Societies: Cultural, Technical, Sports
2. Think of stories from step 1 that fall into one of the keyword categories. The more stories, the better. You should have at least 10-15 stories.
3. Create a summary table by assigning multiple keywords to each story. This will help you filter the stories when a question is asked in the interview. An example can be seen below:
Story 1: [Convincing] [Take Stand] [Influence Others]
Story 2: [Mentorship] [Leadership]
Story 3: [Conflict Resolution] [Negotiation]
Story 4: [Decision without enough data]

STAR Format
Write down the stories in the STAR format as explained in part 2/4 of this cheat sheet. This will help you practice organizing each story in a meaningful way.

Icon Source: www.flaticon.com

2/4 How to prepare for a behavioral interview?
Direct, meaningful, personalized, logical

Example: “Tell us about a time when you had to convince senior executives”

S: Situation. Explain the situation and provide the necessary context for your story.
“I worked as an intern at XYZ company in the summer of 2019. The project details provided to me were elaborate. After some initial brainstorming and research, I realized that the project approach could be modified to make it more efficient in terms of the underlying KPIs. I decided to talk to my manager about it.”

T: Task. Explain the task and your responsibility in the situation.
“I had an hour-long call with my manager and explained the proposed approach to him in detail, along with how it could improve the KPIs. I was able to convince him. He asked me if I would be able to present my proposed approach for approval in front of the higher executives. I agreed to it. I was working out of the ABC (city) office, and the executives needed to fly in from the XYZ (city) office.”

A: Action. Walk through the steps and actions you took to address the issue.
“I did a quick background check on the executives to learn more about their areas of expertise so that I could convince them accordingly. I prepared a detailed 15-slide presentation, starting with an explanation of their approach, moving on to my proposed approach, and finally comparing the two on preliminary results.”

R: Result. State the outcome of your actions.
“After some active discussion, we were able to establish that the proposed approach was better than the initial one. The executives proposed a few small changes to my approach and really appreciated my stand. At the end of my internship, I was one of the 3 out of 68 interns selected to meet the senior vice president of the company over lunch.”

3/4 How to answer a behavioral question?
Understand, Extract, Map, Select and Apply
Example: “Tell us about a time when you had to convince senior executives”

Understand the question
Example: A story where I was able to convince my seniors. Maybe they had something in mind, and I had a better approach and tried to convince them.

Extract keywords and tags
Extract useful keywords that encapsulate the gist of the question.
Example: [Convincing], [Creative], [Leadership]

Map the keywords to your stories
Shortlist all the stories that fall under the keywords extracted in the previous step.
Example: Story1, Story2, Story3, Story4, … , Story N

Select the best story
From the shortlisted stories, pick the one that best fits the question and has not been used so far in the interview.
Example: Story3

Apply the STAR method
Apply the STAR method to the selected story to answer the question.
Example: See cheat sheet 2/4 for details

4/4 Behavioral Interview Cheat Sheet
Summarizing the behavioral interview

How to prepare for the interview
1. Gather important topics as keywords: Understand and collect all the important topics commonly asked in the interview.
2. Collect your stories: Based on all the organizations you have been a part of, think of all the stories that fall under the keywords above.
3. Practice stories in STAR format: Practice each story using the STAR format. You will have to answer the question following this format.
4. Assign keywords to stories: Assign one or more keywords to each of your stories. This will help you recall them quickly.
5. Create a summary table: Create a summary table mapping stories to their associated keywords. This will be used during the behavioral interview.

How to answer a question during the interview
U. Understand the question: Understand the question and clarify any confusion that you have.
E. Extract the keywords: Try to extract one or more of the keywords from the question.
M. Map the keywords to stories: Based on the keywords extracted, find the stories using the summary table created during preparation (Step 5).
S. Select a story: Since each keyword may be assigned to multiple stories, select the one that is most relevant and has not been used.
A. Apply the STAR format: Once the story has been shortlisted, apply the STAR format to the story to answer the question.
Follow the Author:
Follow the author for more machine learning/data science content at:

• Medium: https://aqeel-anwar.medium.com
• LinkedIn: https://www.linkedin.com/in/aqeelanwarmalik/

Version History
• Version 0.1.0.2 - May 19, 2021
Added more content

• Version 0.1.0.1 - Apr 05, 2021

Fixed minor typos in the Bayes’ Theorem and Classifier, Regression Analysis, and PCA dimensionality reduction cheat sheets.

• Version 0.1.0.0 - Mar 30, 2021

Initial draft with nine ML basics cheat sheets and two behavioral interview cheat sheets.
