3. Explain the role of entropy as an information measure in probability theory.
Entropy quantifies uncertainty in a random variable. Its formula is:
H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)
Low Entropy: Less uncertainty (e.g., a biased coin has lower entropy than a fair one).
Entropy helps measure unpredictability in information systems, aiding in decision-making and efficient data encoding.
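
A minimal sketch of this formula in plain Python (base-2 logarithm assumed, so entropy is in bits; the probability lists are illustrative):

    import math

    def entropy(probs):
        # H(X) = -sum p(x) * log2 p(x); terms with p = 0 contribute 0
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))   # fair coin   -> 1.0 bit
    print(entropy([0.9, 0.1]))   # biased coin -> ~0.47 bits (lower entropy)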

14. Discuss the concept of feature engineering and its significance in machine learning models.
Feature engineering is the process of selecting, modifying, or creating new features (variables or attributes) from raw data to improve the performance of machine learning models.
Key Concepts in Feature Engineering:
1. Feature Creation: Creating new features based on existing ones. This could involve mathematical transformations, combining multiple features, or extracting meaningful information.
2. Feature Selection: Identifying and retaining only the most relevant features for the model. Irrelevant or redundant features can increase complexity, cause overfitting, and reduce the model's generalizability.
3. Feature Transformation: Changing the scale, distribution, or representation of data.
4. Handling Missing Data: Dealing with missing values by using imputation techniques, such as replacing missing values with the mean, median, or mode, or using more advanced techniques like KNN or regression imputation.
5. Dealing with Categorical Data: Categorical features need to be converted into numerical representations for most machine learning models.
Significance of Feature Engineering:
1. Improved Model Accuracy
2. Dimensionality Reduction
3. Better Generalization
4. Data Representation
5. Domain Knowledge Integration
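
A minimal pandas sketch of concepts 3-5 above; the DataFrame, its "income" and "city" columns, and the imputation/encoding choices are all illustrative assumptions:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "income": [30000, None, 52000, 47000],      # one missing value
        "city": ["Pune", "Delhi", "Pune", "Mumbai"] # categorical feature
    })

    # 4. Handling missing data: median imputation
    df["income"] = df["income"].fillna(df["income"].median())

    # 3. Feature transformation: log scaling to reduce skew
    df["log_income"] = np.log1p(df["income"])

    # 5. Categorical data: one-hot encoding
    df = pd.get_dummies(df, columns=["city"])
    print(df)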

20. Discuss Non-Parametric Testing. Provide examples where it is preferable to parametric methods.
Definition: Non-parametric tests do not assume specific distributions for the data.
Examples:
1. Mann-Whitney U Test: Compares medians of two groups when normality cannot be assumed.
Example: Comparing customer satisfaction ratings (ordinal data) between two branches.
2. Kruskal-Wallis Test: Non-parametric alternative to ANOVA for comparing more than two groups.
Example: Analyzing exam scores across schools with non-normal data.
Advantages: Robust to outliers. Applicable to small sample sizes.
Preferable When: Data is ordinal, non-normal, or has unequal variances.
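
Both tests are available in scipy.stats; a minimal sketch with made-up ordinal ratings (the data values are illustrative only):

    from scipy import stats

    branch_a = [4, 5, 3, 4, 5, 2]
    branch_b = [2, 3, 2, 4, 1, 3]

    # Mann-Whitney U: two independent groups, no normality assumption
    u_stat, p = stats.mannwhitneyu(branch_a, branch_b)
    print(f"Mann-Whitney U = {u_stat}, p = {p:.3f}")

    # Kruskal-Wallis: non-parametric comparison of three or more groups
    branch_c = [5, 4, 4, 5, 3, 4]
    h_stat, p = stats.kruskal(branch_a, branch_b, branch_c)
    print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p:.3f}")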

5. Explain the differences between discrete and continuous random variables. Provide examples of each.
Discrete Random Variable
1. Definition: A random variable that can take on a countable number of distinct values.
2. Values: Usually integers or specific values (e.g., 0, 1, 2, 3...).
3. Examples: Number of students in a class. Rolling a die (values: 1, 2, 3, 4, 5, 6).
4. Probability Distribution: Represented using a probability mass function (PMF), which assigns probabilities to individual outcomes.
Continuous Random Variable
1. Definition: A random variable that can take on any value within a given range (uncountable).
2. Values: Real numbers, often measured rather than counted (e.g., height, weight, time).
3. Examples: The height of people in a group. The time it takes to complete a task (e.g., 2.1, 2.11, 2.111 seconds).
4. Probability Distribution: Represented using a probability density function (PDF). Probabilities are calculated for intervals, not specific values (e.g., P(a \le X \le b)).
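
A minimal scipy.stats sketch contrasting a PMF with a PDF; the fair die and standard normal distributions are chosen as illustrative examples:

    from scipy import stats

    # Discrete: fair die; the PMF assigns probability to individual outcomes
    die = stats.randint(1, 7)          # integers 1..6
    print(die.pmf(3))                  # P(X = 3) = 1/6

    # Continuous: standard normal; probability lives on intervals
    z = stats.norm(0, 1)
    print(z.pdf(0))                    # density at 0, NOT a probability
    print(z.cdf(1) - z.cdf(-1))        # P(-1 <= X <= 1) ~ 0.683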

15. Explain regression methods (Linear and Logistic) and their applications in predictive modeling.
Linear Regression:
Definition: A statistical method for predicting a continuous dependent variable based on one or more independent variables.
Formula: Y = \beta_0 + \beta_1 X + \epsilon
where X: independent variable (input); \beta_0, \beta_1: coefficients; \epsilon: error term.
Application: Predicting house prices based on features like area, location, and number of bedrooms.
Logistic Regression:
Definition: A classification algorithm used to predict probabilities of binary outcomes.
Equation: P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
Application: Predicting whether a patient has a disease (Yes/No) based on medical parameters.
Comparison: Linear regression is used for continuous outputs. Logistic regression is used for classification tasks.
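
A minimal scikit-learn sketch of both models; the area/price and marker/disease numbers are made up for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    # Linear regression: continuous target (price vs. area)
    area = np.array([[50], [80], [120], [150]])   # square metres
    price = np.array([100, 160, 240, 300])        # in thousands
    lin = LinearRegression().fit(area, price)
    print(lin.predict([[100]]))                   # predicted price

    # Logistic regression: binary target (disease yes/no vs. a test marker)
    marker = np.array([[1.0], [2.0], [3.0], [4.0]])
    disease = np.array([0, 0, 1, 1])
    log = LogisticRegression().fit(marker, disease)
    print(log.predict_proba([[2.5]]))             # [P(class 0), P(class 1)]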

16. What is the Curse of Dimensionality? How does it affect data models?
Definition: As the number of features (dimensions) increases, the volume of data required to build a reliable model grows exponentially.
Effect:
Sparse Data: Data becomes sparse in high-dimensional spaces.
Overfitting: Models may memorize noise instead of learning patterns.
Increased Complexity: Computation and storage requirements grow.
Example: In a 2D space, 100 data points may suffice to form clusters. In a 100D space, exponentially more points are needed for similar density.
Solution: Dimensionality reduction techniques like PCA, or feature selection to retain only relevant features.
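
A minimal numpy sketch of one symptom, distance concentration: as the dimension grows, the nearest and farthest of a set of random points end up almost equally far from a query point (the point counts and dimensions are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 10, 100, 1000):
        points = rng.random((100, d))    # 100 random points in [0, 1]^d
        dists = np.linalg.norm(points[1:] - points[0], axis=1)
        # A ratio near 1 means "near" and "far" neighbours barely differ
        print(d, round(dists.min() / dists.max(), 3))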

8. Define covariance and eigenfeatures. Explain their role in data science.
Covariance: Measures the relationship between two variables:
Cov(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}
Eigenfeatures: Principal components derived from covariance matrices, used in dimensionality reduction techniques like PCA.
Role: Covariance identifies relationships, and eigenfeatures simplify complex datasets while preserving variability.
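
A minimal numpy sketch: compute the covariance matrix of two illustrative features, then its eigenvectors (the eigenfeatures / principal directions used by PCA):

    import numpy as np

    x = np.array([2.5, 0.5, 2.2, 1.9, 3.1])
    y = np.array([2.4, 0.7, 2.9, 2.2, 3.0])

    cov = np.cov(x, y)                       # 2x2 matrix, n-1 denominator
    print(cov)

    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh suits symmetric matrices
    # The eigenvector with the largest eigenvalue is the first principal component
    print(eigvecs[:, np.argmax(eigvals)])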

9. What is the Maximum Likelihood Estimation (MLE) principle, and how is it used in Bayesian classification?
MLE estimates parameters by maximizing the likelihood of observed data.
Example: For a Gaussian distribution, MLE estimates the mean (\mu) and variance (\sigma^2):
L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}
In Bayesian Classification: MLE computes probabilities of classes, aiding in optimal classification.
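
For the Gaussian case this likelihood has a closed-form maximum (the sample mean, and the variance with a 1/n denominator); a minimal numpy sketch with made-up observations:

    import numpy as np

    data = np.array([4.2, 3.9, 5.1, 4.7, 4.4])

    mu_hat = data.mean()                      # MLE of the mean
    var_hat = ((data - mu_hat) ** 2).mean()   # MLE of the variance (1/n, not 1/(n-1))
    print(mu_hat, var_hat)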

10. Explain the Central Limit Theorem and its implications in statistical inference.
The CLT states that, for a sufficiently large sample size, the sample mean's distribution approaches a normal distribution, regardless of the population's distribution.
Implications:
Enables use of the normal distribution for confidence intervals and hypothesis testing.
Simplifies analysis when population data is complex.
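
A minimal simulation sketch of the theorem: means of repeated samples drawn from a strongly skewed (exponential) population cluster symmetrically around the population mean; the sample and population sizes are arbitrary:

    import numpy as np

    rng = np.random.default_rng(1)
    population = rng.exponential(scale=2.0, size=100_000)   # heavily skewed

    means = [rng.choice(population, size=50).mean() for _ in range(5_000)]
    # The distribution of these means is approximately normal around ~2.0
    print(np.mean(means), np.std(means))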

17. Discuss ANOVA and Chi-Square tests. When are they used in data analysis?
ANOVA (Analysis of Variance):
Purpose: Compares means of three or more groups to determine if at least one group mean differs significantly.
Assumptions: Data follows a normal distribution. Variances are equal across groups.
Example: Comparing test scores of students across three different teaching methods.
Chi-Square Test:
Purpose: Tests the association between categorical variables.
Formula: \chi^2 = \sum \frac{(O - E)^2}{E}, where O: observed frequency, E: expected frequency.
Example: Analyzing the relationship between gender and choice of a product.
Difference: ANOVA deals with continuous data (means). Chi-square focuses on categorical data (frequencies).
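
Both tests are in scipy.stats; a minimal sketch with made-up scores and a made-up 2x2 contingency table of counts:

    from scipy import stats

    # One-way ANOVA: test scores under three teaching methods
    method_a = [78, 82, 75, 88, 80]
    method_b = [85, 90, 87, 91, 86]
    method_c = [70, 72, 68, 75, 71]
    f_stat, p = stats.f_oneway(method_a, method_b, method_c)
    print(f"ANOVA: F = {f_stat:.2f}, p = {p:.4f}")

    # Chi-square: gender vs. product choice (observed frequencies O;
    # chi2_contingency derives the expected frequencies E internally)
    table = [[30, 10],
             [20, 40]]
    chi2, p, dof, expected = stats.chi2_contingency(table)
    print(f"Chi-square = {chi2:.2f}, p = {p:.4f}")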

11. Define hypothesis testing. Explain the Neyman-Pearson framework with an example.
Hypothesis Testing: A method to decide whether to reject a null hypothesis (H_0) based on evidence.
Neyman-Pearson Framework: Balances Type I (\alpha) and Type II (\beta) errors for optimal decision-making.
Example: Testing a new drug's effectiveness:
H_0: No improvement.
H_1: Significant improvement.
The decision depends on statistical thresholds for the errors.
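
As a sketch of the decision rule only (not the full Neyman-Pearson likelihood-ratio machinery), a two-sample t-test with \alpha fixed in advance; the treatment/control numbers are invented:

    from scipy import stats

    control   = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]
    treatment = [5.6, 5.9, 5.4, 6.0, 5.7, 5.8]

    alpha = 0.05    # Type I error rate, chosen before seeing the data
    t_stat, p = stats.ttest_ind(treatment, control)

    # Reject H0 (no improvement) only if p falls below alpha
    print("reject H0" if p < alpha else "fail to reject H0", f"(p = {p:.4f})")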

18. Explain the concept of bias and variance in data models. How do they impact model performance?
Bias:
Definition: Error introduced by approximating a real-world problem using a simplified model.
Effect: High bias leads to underfitting.
Example: Assuming a linear relationship when data follows a polynomial trend.
Variance:
Definition: Sensitivity of a model to small changes in the training dataset.
Effect: High variance leads to overfitting.
Example: A decision tree capturing noise instead of patterns.
Trade-off: A good model balances bias and variance to minimize total error.
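
A minimal numpy sketch of the trade-off: fitting polynomials of increasing degree to a noisy quadratic trend. Degree 1 underfits (high bias), degree 9 chases the noise (high variance); the data and degrees are illustrative:

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0, 1, 10)
    y = 3 * x**2 + rng.normal(0, 0.1, size=x.size)   # quadratic trend + noise

    for degree in (1, 2, 9):
        coeffs = np.polyfit(x, y, degree)
        mse = np.mean((y - np.polyval(coeffs, x)) ** 2)
        # Training error shrinks with degree, but the degree-9 fit
        # would change wildly if the noise were redrawn
        print(degree, round(mse, 5))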

12. What are Z-scores, and how are they used in detecting outliers? Illustrate with an example.
Z-scores measure how far a data point is from the mean in terms of standard deviations:
Z = \frac{X - \mu}{\sigma}
Outliers: Points with |Z| > 3 are considered outliers.
Example:
Data: [10, 12, 15, 50].
Mean = 21.75, \sigma \approx 16.41.
Z for 50 \approx 1.72 (not an outlier).
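
A minimal numpy sketch reproducing the example (population standard deviation, |Z| > 3 rule):

    import numpy as np

    data = np.array([10, 12, 15, 50])
    z = (data - data.mean()) / data.std()   # std() defaults to the population formula
    print(z.round(2))                       # z for 50 is ~1.72
    print(data[np.abs(z) > 3])              # empty: no point exceeds |Z| = 3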

13. Explain the difference between simple linear regression and logistic regression.
Simple Linear Regression: Predicts continuous outcomes:
Y = aX + b
Logistic Regression: Predicts probabilities for binary outcomes:
P = \frac{1}{1+e^{-(aX+b)}}
Example: Linear regression predicts sales; logistic regression classifies email as spam or not.

19. What is Nearest Neighbor (K-NN) classification, and how does it work?
Definition: A non-parametric method that classifies a data point based on the majority class of its K nearest neighbors.
Algorithm:
1. Choose the number of neighbors (K).
2. Compute distances between the query point and all points in the training set.
3. Select the K closest neighbors.
4. Assign the most frequent class among the neighbors to the query point.
Example: Dataset: [(1,1,'A'), (2,2,'A'), (5,5,'B')]. Query Point: (1.5, 1.5). Nearest Neighbor: (1,1). Predicted Class: 'A'.
Application: Used in recommendation systems and pattern recognition.
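
A minimal from-scratch sketch of the algorithm above, reproducing the worked example (Euclidean distance, K = 1):

    import math
    from collections import Counter

    def knn_predict(train, query, k=1):
        # Step 2: distance from the query to every training point
        dists = sorted((math.dist(point, query), label) for *point, label in train)
        # Steps 3-4: take the k closest and vote on the majority class
        votes = Counter(label for _, label in dists[:k])
        return votes.most_common(1)[0][0]

    train = [(1, 1, "A"), (2, 2, "A"), (5, 5, "B")]
    print(knn_predict(train, (1.5, 1.5), k=1))   # -> 'A'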