3. Explain the role of entropy as an information measure in probability theory.
Entropy quantifies uncertainty in a random variable. Its formula is:
H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i)
Low Entropy: Less uncertainty (e.g., a biased coin has lower entropy than a fair one).
Entropy helps measure unpredictability in information systems, aiding in decision-making and efficient data encoding.
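
A minimal sketch of this formula in plain Python (base-2 logarithm assumed, so entropy is in bits; the probability lists are illustrative):

    import math

    def entropy(probs):
        # H(X) = -sum p(x) * log2 p(x); terms with p = 0 contribute 0
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))   # fair coin   -> 1.0 bit
    print(entropy([0.9, 0.1]))   # biased coin -> ~0.47 bits (lower entropy)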

14. Discuss the concept of feature engineering and its significance in machine learning models.
Feature engineering is the process of selecting, modifying, or creating new features (variables or attributes) from raw data to improve the performance of machine learning models.
Key Concepts in Feature Engineering:
1. Feature Creation: Creating new features based on existing ones. This could involve mathematical transformations, combining multiple features, or extracting meaningful information.
2. Feature Selection: Identifying and retaining only the most relevant features for the model. Irrelevant or redundant features can increase complexity, cause overfitting, and reduce the model's generalizability.
3. Feature Transformation: Changing the scale, distribution, or representation of data.
4. Handling Missing Data: Dealing with missing values by using imputation techniques, such as replacing missing values with the mean, median, or mode, or using more advanced techniques like KNN or regression imputation.
5. Dealing with Categorical Data: Categorical features need to be converted into numerical representations for most machine learning models.
Significance of Feature Engineering:
1. Improved Model Accuracy
2. Dimensionality Reduction
3. Better Generalization
4. Data Representation
5. Domain Knowledge Integration
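
A minimal pandas sketch of concepts 3-5 above; the DataFrame, its "income" and "city" columns, and the imputation/encoding choices are all illustrative assumptions:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "income": [30000, None, 52000, 47000],      # one missing value
        "city": ["Pune", "Delhi", "Pune", "Mumbai"] # categorical feature
    })

    # 4. Handling missing data: median imputation
    df["income"] = df["income"].fillna(df["income"].median())

    # 3. Feature transformation: log scaling to reduce skew
    df["log_income"] = np.log1p(df["income"])

    # 5. Categorical data: one-hot encoding
    df = pd.get_dummies(df, columns=["city"])
    print(df)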

20. Discuss Non-Parametric Testing. Provide examples where it is preferable to parametric methods.
Definition: Non-parametric tests do not assume specific distributions for the data.
Examples:
1. Mann-Whitney U Test: Compares medians of two groups when normality cannot be assumed.
Example: Comparing customer satisfaction ratings (ordinal data) between two branches.
2. Kruskal-Wallis Test: Non-parametric alternative to ANOVA for comparing more than two groups.
Example: Analyzing exam scores across schools with non-normal data.
Advantages: Robust to outliers. Applicable to small sample sizes.
Preferable When: Data is ordinal, non-normal, or has unequal variances.
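
Both tests are available in scipy.stats; a minimal sketch with made-up ordinal ratings (the data values are illustrative only):

    from scipy import stats

    branch_a = [4, 5, 3, 4, 5, 2]
    branch_b = [2, 3, 2, 4, 1, 3]

    # Mann-Whitney U: two independent groups, no normality assumption
    u_stat, p = stats.mannwhitneyu(branch_a, branch_b)
    print(f"Mann-Whitney U = {u_stat}, p = {p:.3f}")

    # Kruskal-Wallis: non-parametric comparison of three or more groups
    branch_c = [5, 4, 4, 5, 3, 4]
    h_stat, p = stats.kruskal(branch_a, branch_b, branch_c)
    print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p:.3f}")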

5. Explain the differences between discrete and continuous random variables. Provide examples of each.
Discrete Random Variable
1. Definition: A random variable that can take on a countable number of distinct values.
2. Values: Usually integers or specific values (e.g., 0, 1, 2, 3...).
3. Examples: Number of students in a class. Rolling a die (values: 1, 2, 3, 4, 5, 6).
4. Probability Distribution: Represented using a probability mass function (PMF), which assigns probabilities to individual outcomes.
Continuous Random Variable
1. Definition: A random variable that can take on any value within a given range (uncountable).
2. Values: Real numbers, often measured rather than counted (e.g., height, weight, time).
3. Examples: The height of people in a group. The time it takes to complete a task (e.g., 2.1, 2.11, 2.111 seconds).
4. Probability Distribution: Represented using a probability density function (PDF). Probabilities are calculated for intervals, not specific values (e.g., P(a \le X \le b)).
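
A minimal scipy.stats sketch contrasting a PMF with a PDF; the fair die and standard normal distributions are chosen as illustrative examples:

    from scipy import stats

    # Discrete: fair die; the PMF assigns probability to individual outcomes
    die = stats.randint(1, 7)          # integers 1..6
    print(die.pmf(3))                  # P(X = 3) = 1/6

    # Continuous: standard normal; probability lives on intervals
    z = stats.norm(0, 1)
    print(z.pdf(0))                    # density at 0, NOT a probability
    print(z.cdf(1) - z.cdf(-1))        # P(-1 <= X <= 1) ~ 0.683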

15. Explain regression methods (Linear and Logistic) and their applications in predictive modeling.
Linear Regression:
Definition: A statistical method for predicting a continuous dependent variable based on one or more independent variables.
Formula: Y = \beta_0 + \beta_1 X + \epsilon
where X: independent variable (input); \beta_0, \beta_1: coefficients; \epsilon: error term.
Application: Predicting house prices based on features like area, location, and number of bedrooms.
Logistic Regression:
Definition: A classification algorithm used to predict probabilities of binary outcomes.
Equation: P(Y = 1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}
Application: Predicting whether a patient has a disease (Yes/No) based on medical parameters.
Comparison: Linear regression is used for continuous outputs. Logistic regression is used for classification tasks.
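
A minimal scikit-learn sketch of both models; the area/price and marker/disease numbers are made up for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    # Linear regression: continuous target (price vs. area)
    area = np.array([[50], [80], [120], [150]])   # square metres
    price = np.array([100, 160, 240, 300])        # in thousands
    lin = LinearRegression().fit(area, price)
    print(lin.predict([[100]]))                   # predicted price

    # Logistic regression: binary target (disease yes/no vs. a test marker)
    marker = np.array([[1.0], [2.0], [3.0], [4.0]])
    disease = np.array([0, 0, 1, 1])
    log = LogisticRegression().fit(marker, disease)
    print(log.predict_proba([[2.5]]))             # [P(class 0), P(class 1)]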

16. What is the Curse of Dimensionality? How does it affect data models?
Definition: As the number of features (dimensions) increases, the volume of data required to build a reliable model grows exponentially.
Effect:
Sparse Data: Data becomes sparse in high-dimensional spaces.
Overfitting: Models may memorize noise instead of learning patterns.
Increased Complexity: Computation and storage requirements grow.
Example: In a 2D space, 100 data points may suffice to form clusters. In a 100D space, exponentially more points are needed for similar density.
Solution: Dimensionality reduction techniques like PCA, or feature selection to retain only relevant features.
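
A minimal numpy sketch of one symptom, distance concentration: as the dimension grows, the nearest and farthest of a set of random points end up almost equally far from a query point (the point counts and dimensions are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 10, 100, 1000):
        points = rng.random((100, d))    # 100 random points in [0, 1]^d
        dists = np.linalg.norm(points[1:] - points[0], axis=1)
        # A ratio near 1 means "near" and "far" neighbours barely differ
        print(d, round(dists.min() / dists.max(), 3))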

8. Define covariance and eigenfeatures. Explain their role in data science.
Covariance: Measures the relationship between two variables:
Cov(X, Y) = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}
Eigenfeatures: Principal components derived from covariance matrices, used in dimensionality reduction techniques like PCA.
Role: Covariance identifies relationships, and eigenfeatures simplify complex datasets while preserving variability.
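
A minimal numpy sketch: compute the covariance matrix of two illustrative features, then its eigenvectors (the eigenfeatures / principal directions used by PCA):

    import numpy as np

    x = np.array([2.5, 0.5, 2.2, 1.9, 3.1])
    y = np.array([2.4, 0.7, 2.9, 2.2, 3.0])

    cov = np.cov(x, y)                       # 2x2 matrix, n-1 denominator
    print(cov)

    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh suits symmetric matrices
    # The eigenvector with the largest eigenvalue is the first principal component
    print(eigvecs[:, np.argmax(eigvals)])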

9. What is the Maximum Likelihood Estimation (MLE) principle, and how is it used in Bayesian classification?
MLE estimates parameters by maximizing the likelihood of observed data.
Example: For a Gaussian distribution, MLE estimates the mean (\mu) and variance (\sigma^2):
L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}}
In Bayesian Classification: MLE computes probabilities of classes, aiding in optimal classification.
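
For the Gaussian case this likelihood has a closed-form maximum (the sample mean, and the variance with a 1/n denominator); a minimal numpy sketch with made-up observations:

    import numpy as np

    data = np.array([4.2, 3.9, 5.1, 4.7, 4.4])

    mu_hat = data.mean()                      # MLE of the mean
    var_hat = ((data - mu_hat) ** 2).mean()   # MLE of the variance (1/n, not 1/(n-1))
    print(mu_hat, var_hat)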

10. Explain the Central Limit Theorem and its implications in statistical inference.
The CLT states that, for a sufficiently large sample size, the sample mean's distribution approaches a normal distribution, regardless of the population's distribution.
Implications:
Enables use of the normal distribution for confidence intervals and hypothesis testing.
Simplifies analysis when population data is complex.
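
A minimal simulation sketch of the theorem: means of repeated samples drawn from a strongly skewed (exponential) population cluster symmetrically around the population mean; the sample and population sizes are arbitrary:

    import numpy as np

    rng = np.random.default_rng(1)
    population = rng.exponential(scale=2.0, size=100_000)   # heavily skewed

    means = [rng.choice(population, size=50).mean() for _ in range(5_000)]
    # The distribution of these means is approximately normal around ~2.0
    print(np.mean(means), np.std(means))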

17. Discuss ANOVA and Chi-Square tests. When are they used in data analysis?
ANOVA (Analysis of Variance):
Purpose: Compares means of three or more groups to determine if at least one group mean differs significantly.
Assumptions: Data follows a normal distribution. Variances are equal across groups.
Example: Comparing test scores of students across three different teaching methods.
Chi-Square Test:
Purpose: Tests the association between categorical variables.
Formula: \chi^2 = \sum \frac{(O - E)^2}{E}, where O: observed frequency, E: expected frequency.
Example: Analyzing the relationship between gender and choice of a product.
Difference: ANOVA deals with continuous data (means). Chi-square focuses on categorical data (frequencies).
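
Both tests are in scipy.stats; a minimal sketch with made-up scores and a made-up 2x2 contingency table of counts:

    from scipy import stats

    # One-way ANOVA: test scores under three teaching methods
    method_a = [78, 82, 75, 88, 80]
    method_b = [85, 90, 87, 91, 86]
    method_c = [70, 72, 68, 75, 71]
    f_stat, p = stats.f_oneway(method_a, method_b, method_c)
    print(f"ANOVA: F = {f_stat:.2f}, p = {p:.4f}")

    # Chi-square: gender vs. product choice (observed frequencies O;
    # chi2_contingency derives the expected frequencies E internally)
    table = [[30, 10],
             [20, 40]]
    chi2, p, dof, expected = stats.chi2_contingency(table)
    print(f"Chi-square = {chi2:.2f}, p = {p:.4f}")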

11. Define hypothesis testing. Explain the Neyman-Pearson framework with an example.
Hypothesis Testing: A method to decide whether to reject a null hypothesis (H_0) based on evidence.
Neyman-Pearson Framework: Balances Type I (\alpha) and Type II (\beta) errors for optimal decision-making.
Example: Testing a new drug's effectiveness:
H_0: No improvement.
H_1: Significant improvement.
The decision depends on statistical thresholds for the errors.
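
As a sketch of the decision rule only (not the full Neyman-Pearson likelihood-ratio machinery), a two-sample t-test with \alpha fixed in advance; the treatment/control numbers are invented:

    from scipy import stats

    control   = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]
    treatment = [5.6, 5.9, 5.4, 6.0, 5.7, 5.8]

    alpha = 0.05    # Type I error rate, chosen before seeing the data
    t_stat, p = stats.ttest_ind(treatment, control)

    # Reject H0 (no improvement) only if p falls below alpha
    print("reject H0" if p < alpha else "fail to reject H0", f"(p = {p:.4f})")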

18. Explain the concept of bias and variance in data models. How do they impact model performance?
Bias:
Definition: Error introduced by approximating a real-world problem using a simplified model.
Effect: High bias leads to underfitting.
Example: Assuming a linear relationship when data follows a polynomial trend.
Variance:
Definition: Sensitivity of a model to small changes in the training dataset.
Effect: High variance leads to overfitting.
Example: A decision tree capturing noise instead of patterns.
Trade-off: A good model balances bias and variance to minimize total error.
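
A minimal numpy sketch of the trade-off: fitting polynomials of increasing degree to a noisy quadratic trend. Degree 1 underfits (high bias), degree 9 chases the noise (high variance); the data and degrees are illustrative:

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0, 1, 10)
    y = 3 * x**2 + rng.normal(0, 0.1, size=x.size)   # quadratic trend + noise

    for degree in (1, 2, 9):
        coeffs = np.polyfit(x, y, degree)
        mse = np.mean((y - np.polyval(coeffs, x)) ** 2)
        # Training error shrinks with degree, but the degree-9 fit
        # would change wildly if the noise were redrawn
        print(degree, round(mse, 5))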

12. What are Z-scores, and how are they used in detecting outliers? Illustrate with an example.
Z-scores measure how far a data point is from the mean in terms of standard deviations:
Z = \frac{X - \mu}{\sigma}
Outliers: Points with |Z| > 3 are considered outliers.
Example:
Data: [10, 12, 15, 50].
Mean = 21.75, \sigma \approx 16.41.
Z for 50 \approx 1.72 (not an outlier).
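
A minimal numpy sketch reproducing the example (population standard deviation, |Z| > 3 rule):

    import numpy as np

    data = np.array([10, 12, 15, 50])
    z = (data - data.mean()) / data.std()   # std() defaults to the population formula
    print(z.round(2))                       # z for 50 is ~1.72
    print(data[np.abs(z) > 3])              # empty: no point exceeds |Z| = 3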

13. Explain the difference between simple linear regression and logistic regression.
Simple Linear Regression: Predicts continuous outcomes:
Y = aX + b
Logistic Regression: Predicts probabilities for binary outcomes:
P = \frac{1}{1+e^{-(aX+b)}}
Example: Linear regression predicts sales; logistic regression classifies email as spam or not.

19. What is Nearest Neighbor (K-NN) classification, and how does it work?
Definition: A non-parametric method that classifies a data point based on the majority class of its K nearest neighbors.
Algorithm:
1. Choose the number of neighbors (K).
2. Compute distances between the query point and all points in the training set.
3. Select the K closest neighbors.
4. Assign the most frequent class among the neighbors to the query point.
Example: Dataset: [(1,1,'A'), (2,2,'A'), (5,5,'B')]. Query Point: (1.5, 1.5). Nearest Neighbor: (1,1). Predicted Class: 'A'.
Application: Used in recommendation systems and pattern recognition.
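
A minimal from-scratch sketch of the algorithm above, reproducing the worked example (Euclidean distance, K = 1):

    import math
    from collections import Counter

    def knn_predict(train, query, k=1):
        # Step 2: distance from the query to every training point
        dists = sorted((math.dist(point, query), label) for *point, label in train)
        # Steps 3-4: take the k closest and vote on the majority class
        votes = Counter(label for _, label in dists[:k])
        return votes.most_common(1)[0][0]

    train = [(1, 1, "A"), (2, 2, "A"), (5, 5, "B")]
    print(knn_predict(train, (1.5, 1.5), k=1))   # -> 'A'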