Machine Learning Regression Interview
Questions with Answers
Basic Questions with Answers
1. What is regression in machine learning? Why is it important?
• Answer: Regression is a supervised learning technique used to
predict continuous values. For example, predicting house prices
or stock market trends. It is important because many real-world
problems require forecasting or determining numerical outcomes.
2. What is Linear Regression? Explain with a simple example.
• Answer: Linear Regression is a method to model the relationship
between a dependent variable (output) and one independent vari-
able (input) using a straight line. For example, predicting house
price based on its area.
3. What is the formula for Linear Regression?
• Answer: The formula is:
y = mx + c
where y is the predicted output, x is the input, m is the slope,
and c is the y-intercept.
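The formula above can be sketched as a tiny Python function; the slope and intercept values here are illustrative, not fitted:

```python
# Simple linear model y = m*x + c.
# m (slope) and c (intercept) are illustrative values, not learned ones.
m, c = 2.5, 10.0

def predict(x):
    return m * x + c

print(predict(4))  # 2.5*4 + 10 = 20.0
```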
4. What is Multiple Linear Regression, and how is it different
from Linear Regression?
• Answer: Multiple Linear Regression involves more than one independent
variable. The formula is:
y = m1x1 + m2x2 + ... + c
It is used when multiple factors affect the outcome.
5. How do we evaluate the performance of Linear and Multiple
Regression?
• Answer: Common metrics include Mean Squared Error (MSE),
Mean Absolute Error (MAE), and R-squared (R^2) to measure the
goodness of fit.
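These three metrics can be computed by hand in a few lines; the toy predictions below are made up for illustration:

```python
# MSE, MAE, and R^2 computed directly from their definitions.
y_true = [3.0, 5.0, 7.0, 9.0]   # illustrative targets
y_pred = [2.5, 5.5, 6.5, 9.5]   # illustrative predictions

n = len(y_true)
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

mean_y = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot  # 1.0 is a perfect fit

print(mse, mae, r2)  # 0.25 0.5 0.95
```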
6. What is Polynomial Regression? How is it different?
• Answer: Polynomial Regression models non-linear relationships
by introducing polynomial terms, such as x^2, x^3, etc., into the
equation. It captures curved trends in data.
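A minimal sketch of fitting a curved trend, using NumPy's `polyfit` on data generated from y = x^2 (the data is synthetic, chosen so the recovered coefficients are easy to check):

```python
import numpy as np

# Fit a degree-2 polynomial to data that actually follows y = x^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x ** 2

coeffs = np.polyfit(x, y, deg=2)  # returns [a2, a1, a0], highest power first
print(np.round(coeffs, 6))        # approximately [1, 0, 0]
```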
7. How do we decide whether to use Linear, Multiple, or Poly-
nomial Regression?
• Answer: If the data has a simple linear relationship, use Linear
Regression. If multiple factors influence the output, use Multiple
Linear Regression. For non-linear relationships, use Polynomial
Regression.
8. What are the limitations of these regression techniques?
• Answer: Linear and Multiple Regression assume a linear rela-
tionship, which may not always hold. Polynomial Regression can
overfit data if the degree of the polynomial is too high.
9. What is overfitting, and how can we prevent it?
• Answer: Overfitting occurs when the model performs well on
training data but poorly on unseen data. Prevention methods
include cross-validation, using simpler models, or applying regu-
larization techniques.
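Cross-validation, one of the prevention methods mentioned, can be sketched with scikit-learn; the dataset here is synthetic and the high held-out score is a property of this toy setup:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data with a genuinely linear signal plus small noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(60, 1))
y = 4.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=60)

# Score the model on 5 held-out folds; a large train/test gap would
# signal overfitting, while consistently high fold scores suggest the
# model generalizes.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())
```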
10. Explain regularization in regression.
• Answer: Regularization adds a penalty to the model’s coeffi-
cients to prevent overfitting. Ridge Regression penalizes large co-
efficients, while Lasso Regression can shrink some coefficients to
zero.
11. How would you implement Linear Regression from scratch?
• Answer: Steps include:
(a) Calculate the slope (m) and intercept (c) using the closed-form
least-squares formulas for the line of best fit, or
(b) Optimize m and c iteratively using Gradient Descent.
(c) Use the equation y = mx + c for predictions.
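The Gradient Descent route can be sketched from scratch; the data, learning rate, and iteration count below are illustrative choices:

```python
# Fit y = m*x + c by gradient descent on toy data whose true line is y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

m, c = 0.0, 0.0
lr = 0.05          # learning rate (illustrative)
n = len(xs)

for _ in range(5000):
    # Partial derivatives of the MSE loss with respect to m and c.
    grad_m = (-2.0 / n) * sum(x * (y - (m * x + c)) for x, y in zip(xs, ys))
    grad_c = (-2.0 / n) * sum(y - (m * x + c) for x, y in zip(xs, ys))
    # Step in the direction of the negative gradient.
    m -= lr * grad_m
    c -= lr * grad_c

print(round(m, 3), round(c, 3))  # converges toward 2.0 and 1.0
```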
12. Can you explain how Gradient Descent works in Linear Re-
gression?
• Answer: Gradient Descent minimizes the error by iteratively ad-
justing the parameters m and c in the direction of the negative
gradient of the loss function.
13. How would you explain the assumptions of Linear Regression?
• Answer: Key assumptions include:
(a) Linearity: The relationship between input and output is lin-
ear.
(b) Homoscedasticity: Residuals have constant variance.
(c) Independence: Observations are independent.
(d) Normality: Residuals are normally distributed.
14. What are Ridge and Lasso Regression?
• Answer: Ridge Regression penalizes the squared magnitude of the
coefficients (L2 penalty), while Lasso Regression penalizes their
absolute value (L1 penalty); the L1 penalty can drive some coefficients
exactly to zero, producing sparse models.
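The difference can be seen on toy data where one feature is irrelevant; this sketch uses scikit-learn's `Ridge` and `Lasso`, with illustrative `alpha` values:

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Toy data: y depends only on the first feature; the second is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.5).fit(X, y)

print(ridge.coef_)  # both coefficients shrunk, but non-zero
print(lasso.coef_)  # the irrelevant coefficient is driven to (or near) zero
```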
15. How do you handle categorical variables in regression?
• Answer: Use One-Hot Encoding to convert categorical variables
into numerical format. For example, a category with three levels
becomes three binary variables.
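The three-level example can be sketched with pandas `get_dummies`; the column name and values are hypothetical:

```python
import pandas as pd

# One-hot encode a categorical column with three levels: A, B, C.
df = pd.DataFrame({"neighborhood": ["A", "B", "C", "A"]})
encoded = pd.get_dummies(df, columns=["neighborhood"])

# The single column becomes three binary indicator columns.
print(encoded.columns.tolist())
# ['neighborhood_A', 'neighborhood_B', 'neighborhood_C']
```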
Scenario-Based Question with Answer
Question: You are designing a machine learning model to predict house
prices. You have the following data:
• Number of bedrooms (continuous).
• Area in square feet (continuous).
• Neighborhood (categorical: A, B, C).
• Year of construction (continuous).
• Proximity to schools (categorical: Near, Far).
The relationship between features and price is not strictly linear; it includes
some non-linear effects (e.g., price increases faster for larger houses in neigh-
borhoods A and B).
1. Which regression technique(s) would you use to create the best predic-
tive model?
2. Explain your steps, including data preparation, model choice, evalua-
tion, and how you’d address overfitting.
Answer:
1. Data Preparation:
• Apply One-Hot Encoding for categorical variables like Neighbor-
hood and Proximity.
• Normalize continuous variables like Area and Year of Construc-
tion.
2. Initial Model:
• Start with Multiple Linear Regression to analyze linear relation-
ships.
3. Handling Non-Linearity:
• Introduce Polynomial Regression for non-linear relationships (e.g.,
Area).
4. Prevent Overfitting:
• Use Ridge or Lasso Regression to regularize the model.
5. Model Evaluation:
• Evaluate performance using Mean Squared Error (MSE), Mean
Absolute Error (MAE), and R-squared.
• Perform Cross-Validation to ensure robustness.
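The steps above can be combined into one scikit-learn pipeline. The dataset here is synthetic (column names mirror the scenario, and the price formula is invented so the example is self-contained); the degree and alpha are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

# Hypothetical toy dataset mirroring the scenario's features.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "bedrooms": rng.integers(1, 6, n),
    "area": rng.uniform(500, 3500, n),
    "neighborhood": rng.choice(["A", "B", "C"], n),
    "year": rng.integers(1950, 2020, n),
    "school": rng.choice(["Near", "Far"], n),
})
# Synthetic price with a non-linear (quadratic) area effect plus noise.
price = 50 * df["area"] + 0.01 * df["area"] ** 2 + rng.normal(0, 1e4, n)

numeric = ["bedrooms", "area", "year"]
categorical = ["neighborhood", "school"]

# Steps 1-3: one-hot encode categoricals, add polynomial terms,
# then normalize the numeric features.
pre = ColumnTransformer([
    ("num", Pipeline([
        ("poly", PolynomialFeatures(degree=2, include_bias=False)),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(), categorical),
])

# Step 4: Ridge regularization on top of the expanded feature set.
model = Pipeline([("pre", pre), ("ridge", Ridge(alpha=1.0))])

# Step 5: cross-validated R^2 as the evaluation.
scores = cross_val_score(model, df, price, cv=5, scoring="r2")
print(scores.mean())
```

Because the synthetic signal dominates the noise, the cross-validated R^2 comes out high; on real housing data the score, degree, and alpha would all need tuning.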