Machine Learning Interview Questions
Online Learning
Confusion Matrix
Precision vs Recall
When you plot a learning curve (training error and cross-validation error as a function of the number of
training examples), if both the training error and the cross-validation error are high, is it high bias or high
variance? High Bias
If you train a model and it performs well on the training set but fails to generalize to the validation
set, is it high bias or high variance? High Variance
Does adding new features fix high bias or high variance? High Bias
In the case of a neural network, does adding more layers reduce bias or variance? Bias
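As a quick illustration of the learning-curve diagnosis above, here is a minimal sketch using scikit-learn's learning_curve (the estimator and synthetic dataset below are placeholders, not part of the original questions):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

# Placeholder data and model for illustration only
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=42)

sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 10),
    cv=5, scoring="neg_mean_squared_error")

train_err = -train_scores.mean(axis=1)   # convert back to positive MSE
val_err = -val_scores.mean(axis=1)

plt.plot(sizes, train_err, label="training error")
plt.plot(sizes, val_err, label="cross-validation error")
plt.xlabel("number of training examples")
plt.ylabel("MSE")
plt.legend()
plt.show()
# Both curves high and close together -> high bias.
# Low training error with a large gap to validation error -> high variance.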
PageRank algorithm and its applications (Word Sense Disambiguation: determining which specific
meaning of a word is conveyed in a given sentence wherever it appears; automatic text
summarization (TextRank))
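A minimal sketch of the core PageRank computation, power iteration on a toy link graph (the graph and damping factor below are illustrative assumptions; TextRank applies the same iteration to a graph of sentences or words):

import numpy as np

# links[i] = list of pages that page i links to (toy example)
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = len(links)
d = 0.85  # damping factor

# Build a column-stochastic transition matrix
M = np.zeros((n, n))
for i, outs in links.items():
    for j in outs:
        M[j, i] = 1.0 / len(outs)

rank = np.full(n, 1.0 / n)
for _ in range(100):  # iterate until (approximately) converged
    rank = (1 - d) / n + d * M @ rank

print(rank)  # higher score = more "important" node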
How would you build a model to distinguish between Apple the company and apple the fruit?
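One possible sketch, assuming a small labeled corpus and classifying on the surrounding context words with TF-IDF + logistic regression (the tiny dataset below is made up for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "apple released a new iphone at its keynote",
    "apple stock rose after strong quarterly earnings",
    "she baked an apple pie with cinnamon",
    "an apple a day keeps the doctor away",
]
labels = ["company", "company", "fruit", "fruit"]

# Context words around "apple" carry the disambiguating signal
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["apple announced new macbooks"]))  # likely ['company']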
Min-max scaling squashes all values into a fixed range such as [0, 1], so a single extreme value
compresses the rest of the data. When the data contains outliers that are important and whose
impact we don't want to lose, we go with Z-score normalization instead.
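A small sketch of the difference on data with an outlier (the toy array is illustrative):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 100 is an outlier

min_max = (x - x.min()) / (x.max() - x.min())   # squashed into [0, 1]
z_score = (x - x.mean()) / x.std()              # mean 0, std 1

print(min_max)  # non-outlier points crowd near 0; their spread is lost
print(z_score)  # relative spread preserved; the outlier stays visible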
If your data contains more than 30% missing values, what would you do?
-> Treat (impute) them or drop them from the analysis
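A minimal pandas sketch of that decision (the DataFrame, the 30% threshold, and the median imputation are illustrative assumptions):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, np.nan],   # 60% missing -> drop
    "b": [1.0, 2.0, np.nan, 4.0, 5.0],         # 20% missing -> impute
})

missing_frac = df.isna().mean()                                # fraction missing per column
df = df.drop(columns=missing_frac[missing_frac > 0.30].index)  # drop heavily missing columns
df = df.fillna(df.median())                                    # treat (impute) the rest
print(df)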
The bias value allows the activation function to be shifted to the left or right, to better fit the data.
Changes to the weights alter the steepness of the sigmoid curve, while the bias offsets it,
shifting the entire curve so it fits better. Note also that the bias only shifts the output values; it
doesn't interact with the actual input data.
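A tiny sketch of this effect, assuming a standard logistic sigmoid (the weight/bias values are illustrative):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# sigmoid(w*x + b) crosses 0.5 where w*x + b = 0, i.e. at x = -b/w:
# changing the bias b moves this crossing point (shifts the curve),
# while the weight w controls how steeply it rises around it.
for w, b in [(2.0, 0.0), (2.0, 3.0), (5.0, 0.0)]:
    x_mid = -b / w
    print(f"w={w}, b={b}: crosses 0.5 at x={x_mid}, sigmoid={sigmoid(w * x_mid + b):.1f}")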
New Questions
Underfitting vs. Overfitting
4. Using regularization
8. What is a validation set used for?
These two facts have a great consequence: Gradient Descent is guaranteed to approach arbitrarily
close to the global minimum (if you wait long enough and if the learning rate is not too high).
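A minimal Batch Gradient Descent sketch on linear regression's convex MSE cost (the synthetic data and hyperparameters are illustrative):

import numpy as np

np.random.seed(42)
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]   # add bias (x0 = 1) column

eta = 0.1          # learning rate: too high and it diverges instead
theta = np.random.randn(2, 1)
for _ in range(1000):
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)  # gradient over the full batch
    theta -= eta * gradients

print(theta)  # approaches the global minimum, close to [[4], [3]]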
11. Does Stochastic Gradient Descent always return optimal parameter values? If not, why not?
Due to its stochastic (i.e., random) nature, this algorithm is much less regular than Batch Gradient
Descent: instead of gently decreasing until it reaches the minimum, the cost function will bounce
up and down, decreasing only on average.
Over time it will end up very close to the minimum, but once it gets there it will continue to bounce
around, never settling down. So once the algorithm stops, the final parameter values are good, but
not optimal.
Randomness is good to escape from local optima, but bad because it means that the algorithm can
never settle at the minimum. One solution to this dilemma is to gradually reduce the learning rate.
The steps start out large (which helps make quick progress and escape local minima), then get
smaller and smaller, allowing the algorithm to settle at the global minimum. This process is called
simulated annealing, because it resembles the process of annealing in metallurgy where molten
metal is slowly cooled down.
The function that determines the learning rate at each iteration is called the learning schedule. If
the learning rate is reduced too quickly, you may get stuck in a local minimum, or even end up
frozen halfway to the minimum. If the learning rate is reduced too slowly, you may jump around the
minimum for a long time and end up with a suboptimal solution if you halt training too early.
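A minimal SGD sketch with such a learning schedule, so steps start large and shrink over time (synthetic data; the schedule hyperparameters t0 and t1 are illustrative assumptions):

import numpy as np

np.random.seed(42)
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)
X_b = np.c_[np.ones((m, 1)), X]

t0, t1 = 5, 50
def learning_schedule(t):
    return t0 / (t + t1)   # learning rate decays as training progresses

theta = np.random.randn(2, 1)
for epoch in range(50):
    for i in range(m):
        idx = np.random.randint(m)              # pick one random instance
        xi, yi = X_b[idx:idx + 1], y[idx:idx + 1]
        gradients = 2 * xi.T @ (xi @ theta - yi)
        eta = learning_schedule(epoch * m + i)
        theta -= eta * gradients

print(theta)  # good parameter values, close to but not exactly the optimum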
They are:
1. Linear Algebra
2. Singular Value Decomposition
3. Introductory level Pattern Recognition
4. Principal Component Analysis
5. Linear Discriminant Analysis
6. Fourier Transform
7. Wavelets
8. Probability, Bayes rule, Maximum Likelihood, MAP
9. Mixtures and Expectation-Maximization Algorithm
10. Introductory level Statistical Learning
11. Support Vector Machines
12. Genetic Algorithms
13. Hidden Markov Models
14. Bayesian Networks
15. Kalman filtering
6. How many channels are there in a grayscale image and an RGB image? A grayscale image has 1 channel; an RGB image has 3.
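A quick NumPy sketch of the answer in terms of array shapes:

import numpy as np

gray = np.zeros((224, 224))       # grayscale: H x W, a single channel
rgb = np.zeros((224, 224, 3))     # RGB: H x W x 3 channels
print(gray.ndim, rgb.shape[-1])   # 2 (one implicit channel), 3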
LINEAR ALGEBRA