1. Differentiate between regression and classification with suitable real-world examples.
2. Explain the workflow of a supervised learning model. What are the main components?
3. Describe the working of simple linear regression. Derive the formula for the regression line
using least squares.
4. Explain multiple linear regression. How is it different from simple linear regression?
5. What is polynomial regression? How does it handle non-linear data? Give an example.
6. Explain the concept of regularization in regression. Why is it needed?
7. Differentiate between Ridge and Lasso regression. When would you prefer one over the
other?
8. Explain the bias-variance tradeoff with the help of diagrams and examples.
9. Describe the working of Support Vector Regression (SVR). How is it different from
traditional linear regression?
10. Explain logistic regression. Derive the sigmoid function and describe its significance.
11. Differentiate between binary and multi-class classification in logistic regression. How is
multi-class handled?
12. Discuss the K-Nearest Neighbors algorithm. What are its advantages and limitations?
13. How does the choice of 'k' affect the performance of the KNN algorithm?
14. What is a hyperplane in SVM? Explain the role of support vectors in classification.
15. Describe the use of kernel tricks in SVM. Compare linear, polynomial, and RBF kernels.
16. How does SVM handle linear and non-linear classification problems? Illustrate with
examples.
17. Explain the process of constructing a decision tree. How is information gain used?
18. What is pruning in decision trees? Why is it important?
19. Describe the ensemble method of Bagging with an example. How does it improve model
performance?
20. Explain Random Forests. How do they address overfitting in decision trees?
1. Difference between Regression and Classification with Examples
Regression predicts continuous numerical values, e.g., predicting house prices or
temperature. Classification predicts discrete categories or classes, e.g., identifying spam
emails or detecting cancer. Regression outputs values on a continuous scale, while
classification outputs class labels like "spam" or "not spam"[1][2][3].
2. Workflow of a Supervised Learning Model and Main Components
The workflow includes:
Data collection and preprocessing
Splitting data into training and testing sets
Choosing a model (e.g., regression or classification)
Training the model on labeled data
Evaluating model performance using metrics
Making predictions on new data
Main components: input features, labeled output, model, loss function, and evaluation
metric[4].
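A minimal sketch of this workflow, assuming scikit-learn and a built-in toy dataset purely for illustration (the model and split choices are arbitrary):

# Illustrative supervised-learning workflow with scikit-learn (assumed toolkit).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                          # data collection (toy data)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                           random_state=0)  # train/test split
model = LogisticRegression(max_iter=200)                    # choose a model
model.fit(X_tr, y_tr)                                       # train on labeled data
print(accuracy_score(y_te, model.predict(X_te)))            # evaluate with a metric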
3. Working of Simple Linear Regression and Derivation of Regression Line
Simple linear regression models the relationship between one independent variable x and
dependent variable y with a straight line y = mx + c.
Using least squares, minimize the sum of squared errors:

S = Σ (y_i − (m x_i + c))²

Differentiating S with respect to m and c, setting the derivatives to zero, and solving gives:

m = (n Σ x_i y_i − Σ x_i Σ y_i) / (n Σ x_i² − (Σ x_i)²),  c = (Σ y_i − m Σ x_i) / n

This line best fits the data by minimizing the squared error[4].
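A minimal sketch computing m and c directly from these formulas with NumPy; the data points are made up for illustration:

# Closed-form least-squares slope and intercept (illustrative data).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.4, 10.1])
n = len(x)

m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
c = (np.sum(y) - m * np.sum(x)) / n
print(m, c)  # slope and intercept of the best-fit line y = m*x + c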
4. Multiple Linear Regression and Difference from Simple Linear Regression
Multiple linear regression predicts y using multiple independent variables:
y = β0 + β1 x1 + β2 x2 + … + βn xn + ε
Unlike simple linear regression with one predictor, multiple regression handles several predictors
simultaneously to capture more complex relationships[2][4].
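A minimal sketch of fitting two predictors at once, assuming scikit-learn and synthetic numbers chosen only for illustration:

# Multiple linear regression: y = b0 + b1*x1 + b2*x2 (illustrative data).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]])  # two predictors x1, x2
y = np.array([5.0, 4.5, 10.2, 9.8, 14.9])
reg = LinearRegression().fit(X, y)
print(reg.intercept_, reg.coef_)  # beta_0 and [beta_1, beta_2]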
5. Polynomial Regression and Handling Non-linear Data with Example
Polynomial regression fits a curve by modeling y as a polynomial of degree d:

y = β0 + β1 x + β2 x² + … + βd x^d + ε
It captures non-linear relationships by adding powers of x . Example: modeling growth rate of
plants over time where growth accelerates non-linearly[2].
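A minimal sketch, assuming scikit-learn: the polynomial terms are generated as extra features and fitted with ordinary linear regression (degree and data are illustrative):

# Degree-2 polynomial regression via a feature transform (synthetic data).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20).reshape(-1, 1)
y = 1.5 * x.ravel()**2 + rng.normal(0, 2.0, 20)   # accelerating (non-linear) growth plus noise

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict([[12.0]]))   # prediction from the fitted curve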
6. Concept and Need for Regularization in Regression
Regularization adds a penalty term to the loss function to prevent overfitting by shrinking
coefficients. It controls model complexity, improving generalization on unseen data. Without
regularization, models may fit noise in training data[4].
7. Difference between Ridge and Lasso Regression and Preference
Ridge adds an L2 penalty (Σ βj²), shrinking coefficients but not zeroing them.
Lasso adds an L1 penalty (Σ |βj|), which can shrink some coefficients to zero, performing
feature selection.
Prefer Lasso when you want sparse models; Ridge when all features are useful but need
shrinkage[4].
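A minimal sketch contrasting the two penalties on the same synthetic data, assuming scikit-learn; the alpha values are arbitrary illustrative choices:

# Ridge (L2) vs Lasso (L1) on data where only two of five features matter.
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # all coefficients shrunk, none exactly zero
print(Lasso(alpha=0.1).fit(X, y).coef_)  # uninformative coefficients typically driven to zero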
8. Bias-Variance Tradeoff with Diagrams and Examples
Bias is error from overly simple or wrong assumptions (underfitting); variance is error from
sensitivity to fluctuations in the training data (overfitting).
High bias: simple model, poor training and test accuracy
High variance: complex model, good training but poor test accuracy
Tradeoff balances these to minimize total error[4].
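A minimal sketch of the tradeoff, assuming scikit-learn: a very low and a very high polynomial degree are compared on held-out data (degrees and data are illustrative):

# Underfitting (high bias) vs overfitting (high variance) on noisy data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 40)).reshape(-1, 1)
y = np.sin(2 * np.pi * x.ravel()) + rng.normal(0, 0.2, 40)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.5, random_state=1)

for degree in (1, 15):   # too simple vs too flexible
    m = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(x_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, m.predict(x_tr)),   # training error
          mean_squared_error(y_te, m.predict(x_te)))   # test error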
9. Support Vector Regression (SVR) and Difference from Linear Regression
SVR fits a function within a margin ϵ , ignoring errors within this margin and penalizing
errors outside it. It uses support vectors to define the margin. Unlike linear regression
minimizing squared errors, SVR focuses on fitting within a tube, robust to outliers[4].
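A minimal sketch contrasting SVR with ordinary least squares on data containing one outlier; scikit-learn, the kernel, ε, and C are all illustrative assumptions:

# Linear SVR vs ordinary linear regression in the presence of an outlier.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2.0 * x.ravel() + rng.normal(0, 0.5, 50)
y[5] += 20.0   # a single large outlier

svr = SVR(kernel="linear", epsilon=0.5, C=1.0).fit(x, y)
ols = LinearRegression().fit(x, y)
print(svr.predict([[10.0]]), ols.predict([[10.0]]))  # the ε-insensitive loss limits the outlier's pull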
10. Logistic Regression, Sigmoid Function Derivation and Significance
Logistic regression models probability p of class 1 as:
p = 1 / (1 + e^(−z)),  z = β0 + β1 x
Sigmoid function maps any real number to (0,1), enabling probability interpretation. Derived from
odds ratio and logit transform, it is key for binary classification[4].
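A minimal sketch of the sigmoid and a thresholded prediction; the β values and inputs are made up for illustration:

# Sigmoid maps any real score z to a probability in (0, 1).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

beta0, beta1 = -4.0, 1.2            # illustrative coefficients
x = np.array([1.0, 3.0, 5.0])
p = sigmoid(beta0 + beta1 * x)      # P(class = 1 | x)
print(p, (p >= 0.5).astype(int))    # probabilities and hard class labels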
11. Binary vs Multi-class Classification in Logistic Regression and Handling Multi-class
Binary logistic regression predicts two classes. Multi-class classification extends this using:
One-vs-Rest (OvR): train one classifier per class vs others
Softmax regression: generalizes sigmoid to multiple classes, outputs class probabilities [4].
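A minimal sketch of the softmax step, with made-up per-class scores standing in for βk·x:

# Softmax turns K real-valued class scores into K probabilities that sum to 1.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, -0.5])  # one score per class (illustrative)
p = softmax(z)
print(p, p.argmax())            # class probabilities and the predicted class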
12. K-Nearest Neighbors (KNN) Algorithm, Advantages and Limitations
KNN predicts label based on majority class among k nearest neighbors in feature space.
Advantages: simple, no training phase, effective with well-separated classes.
Limitations: computationally expensive at prediction, sensitive to irrelevant features and
choice of k [4].
13. Effect of Choice of 'k' in KNN Performance
Small k : sensitive to noise, high variance, overfitting
Large k : smoother decision boundary, high bias, underfitting
Optimal k balances bias and variance, often chosen via cross-validation[4].
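A minimal sketch of choosing k by cross-validation, assuming scikit-learn and a built-in toy dataset; the candidate k values are arbitrary:

# Comparing a few k values by 5-fold cross-validation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 5, 15):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(k, score)  # very small k tends to overfit, very large k to underfit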
14. Hyperplane in SVM and Role of Support Vectors
A hyperplane is a decision boundary separating classes in feature space. Support vectors are
data points closest to the hyperplane that define its position and margin. They are critical for
maximizing margin and model robustness[4].
15. Kernel Tricks in SVM and Comparison of Linear, Polynomial, RBF Kernels
The kernel trick lets SVM operate in a higher-dimensional feature space without explicitly computing the mapping, by evaluating a kernel function on pairs of points.
Linear kernel: for linearly separable data
Polynomial kernel: captures polynomial relations, flexible curves
RBF (Radial Basis Function): maps to infinite dimensions, handles complex non-linear
boundaries
Choice depends on data complexity[4].
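A minimal sketch comparing the three kernels on one non-linear dataset, assuming scikit-learn; the dataset and default hyperparameters are illustrative choices:

# Linear vs polynomial vs RBF kernels on a curved ("two moons") boundary.
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
for kernel in ("linear", "poly", "rbf"):
    score = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(kernel, score)  # the RBF kernel usually fits this curved boundary best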
16. SVM Handling Linear and Non-linear Classification with Examples
Linear SVM finds a straight hyperplane for separable data
Non-linear SVM uses kernels (e.g., RBF) to separate data in transformed space
Example: linearly separable emails vs complex image classification[4].
17. Constructing a Decision Tree and Use of Information Gain
Decision tree splits data based on features to maximize purity. Information gain measures
reduction in entropy after a split. The feature with highest information gain is chosen to split
nodes, recursively building the tree[4].
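A minimal sketch of entropy and information gain for one candidate split; the class counts are made-up illustrations:

# Information gain = parent entropy - weighted child entropy.
import numpy as np

def entropy(counts):
    p = np.array(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -(p * np.log2(p)).sum()

parent = [8, 8]                # 8 positive, 8 negative examples at the node
left, right = [7, 1], [1, 7]   # class counts in the two child nodes after a split

n = sum(parent)
weighted_children = (sum(left) / n) * entropy(left) + (sum(right) / n) * entropy(right)
print(entropy(parent) - weighted_children)  # information gain of this split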
18. Pruning in Decision Trees and Its Importance
Pruning removes branches with little predictive power to reduce overfitting. It simplifies the
tree, improves generalization, and reduces model complexity[4].
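A minimal sketch of post-pruning via cost-complexity pruning, assuming scikit-learn; the ccp_alpha value and dataset are illustrative:

# An unpruned tree vs a cost-complexity-pruned tree.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_tr, y_tr)
print(full.get_n_leaves(), full.score(X_te, y_te))      # large tree
print(pruned.get_n_leaves(), pruned.score(X_te, y_te))  # smaller tree, often better on test data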
19. Bagging Ensemble Method with Example and Performance Improvement
Bagging builds multiple models on bootstrapped samples and aggregates predictions (e.g.,
majority vote). Example: Random Forest uses bagging of decision trees. It reduces variance
and improves stability and accuracy[4].
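A minimal sketch of bagging decision trees, assuming scikit-learn; the number of estimators and dataset are illustrative:

# A single decision tree vs a bagged ensemble of trees.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
single = DecisionTreeClassifier(random_state=0)
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
print(cross_val_score(single, X, y, cv=5).mean())
print(cross_val_score(bagged, X, y, cv=5).mean())  # aggregating bootstrapped trees usually lowers variance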
20. Random Forests and Addressing Overfitting in Decision Trees
Random Forests combine many decision trees trained on random feature subsets and
samples. This randomness decorrelates trees, reducing overfitting common in single trees
and improving generalization[4].
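A minimal sketch, assuming scikit-learn: per-split feature subsampling (max_features) plus bootstrapping is what decorrelates the trees; all values are illustrative:

# Random Forest: many decorrelated trees, each seeing random samples and feature subsets.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
print(cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean())  # single tree
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())  # forest usually generalizes better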
1. [Link]
2. [Link]
3. [Link]
4. education.machine_learning