1. Install and Set Up Python and Essential Libraries (NumPy, pandas)
Q1: Why is Python a preferred language for data science and machine
learning?
Answer:
Python is preferred because:
✔ Easy to Learn: Simple syntax, readable code.
✔ Rich Ecosystem: Libraries like NumPy, pandas, scikit-learn, TensorFlow.
✔ Community Support: Large open-source community.
✔ Versatility: Used in web development, automation, and AI.
✔ Integration: Works well with databases, big data tools (Spark), and visualization
libraries (Matplotlib, Seaborn).
Q2: What is the purpose of pip in Python, and how do you install a library
using it?
Answer:
pip is Python’s package installer.
To install a library:
pip install numpy pandas
To upgrade:
pip install --upgrade pandas
Q3: Explain the difference between NumPy and pandas.
Answer:
| NumPy | pandas |
| --- | --- |
| Used for numerical computations (arrays, matrices). | Used for structured data (DataFrames, Series). |
| Optimized for math operations (linear algebra). | Optimized for data manipulation (filtering, grouping). |
| ndarray is the core data structure. | DataFrame and Series are the core structures. |
Q4: How do you check the installed version of a Python library like
pandas?
Answer:
import pandas as pd
print(pd.__version__)
or via terminal:
pip show pandas
Q5: What is a virtual environment, and why is it useful?
Answer:
A virtual environment isolates Python dependencies for different projects.
Why use it?
✔ Avoids version conflicts between projects.
✔ Ensures reproducibility.
How to create one?
python -m venv myenv
source myenv/bin/activate # Linux/Mac
myenv\Scripts\activate # Windows
2. Introduction to Scikit-Learn
Q1: What is scikit-learn, and what are its key features?
Answer:
Scikit-learn is a Python ML library for:
✔ Classification (k-NN, SVM, Decision Trees).
✔ Regression (Linear, Ridge, Lasso).
✔ Clustering (K-Means, DBSCAN).
✔ Preprocessing (Scaling, Encoding).
Features:
✔ Open-source, easy-to-use API.
✔ Built on NumPy, SciPy, and Matplotlib.
Q2: Name some common ML algorithms in scikit-learn.
Answer:
Supervised Learning:
o LinearRegression, LogisticRegression
o DecisionTreeClassifier, RandomForestClassifier
o SVC (SVM), KNeighborsClassifier (k-NN)
Unsupervised Learning:
o KMeans, DBSCAN
o PCA (Dimensionality Reduction)
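All of these share scikit-learn's uniform fit/predict interface. A minimal sketch, using the bundled Iris dataset purely for illustration:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised: fit on features and labels, then predict.
clf = LogisticRegression(max_iter=200)
clf.fit(X, y)
print(clf.predict(X[:3]))

# Unsupervised: fit on features only; cluster labels come from the model.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
km.fit(X)
print(km.labels_[:3])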
Q3: How does scikit-learn differ from TensorFlow/PyTorch?
Answer:
| Scikit-Learn | TensorFlow/PyTorch |
| --- | --- |
| Traditional ML (small datasets). | Deep learning (neural networks). |
| Simpler API. | Complex, GPU-accelerated. |
| No GPU support. | Optimized for GPU/TPU. |
Q4: What are the main modules in scikit-learn?
Answer:
sklearn.model_selection (train-test split, cross-validation).
sklearn.preprocessing (scaling, encoding).
sklearn.metrics (accuracy, MSE, confusion matrix).
sklearn.linear_model (LinearRegression, LogisticRegression).
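A short sketch of these modules working together (the Iris dataset is used here just as a stand-in):
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # fit scaling statistics on training data only
X_test = scaler.transform(X_test)        # reuse those statistics on test data

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))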
Q5: Can scikit-learn be used for deep learning?
Answer:
❌ No, scikit-learn is for traditional ML. For deep learning, use:
✔ TensorFlow / Keras
✔ PyTorch
3. Install and Set Up Scikit-Learn
Q1: How to install scikit-learn?
Answer:
pip install scikit-learn
or with conda:
conda install scikit-learn
Q2: What are the dependencies of scikit-learn?
Answer:
NumPy
SciPy
Joblib (for model persistence)
Matplotlib (optional, for plotting utilities)
Q3: How to verify scikit-learn installation?
Answer:
import sklearn
print(sklearn.__version__)
Q4: What is Jupyter Notebook, and why use it?
Answer:
Interactive Python environment for data analysis.
Why use it?
✔ Combines code, visualizations, and text.
✔ Great for exploratory data analysis (EDA).
Q5: Alternative tools alongside scikit-learn?
Answer:
XGBoost (gradient boosting).
statsmodels (statistical modeling).
LightGBM (fast, lightweight gradient boosting).
4. Load and Explore CSV/Excel Data with Pandas
Q1: How to read a CSV file in pandas?
Answer:
import pandas as pd
data = pd.read_csv("data.csv")
Q2: Common pandas functions for exploration?
Answer:
data.head() → First 5 rows.
data.describe() → Statistics (mean, std, min, max).
data.info() → Data types and missing values.
Q3: How to check for missing values?
Answer:
data.isnull().sum()
Q4: Difference between loc and iloc?
Answer:
loc → Label-based indexing (data.loc[0, 'column']).
iloc → Integer-position indexing (data.iloc[0, 1]).
Q5: How to filter rows in a DataFrame?
Answer:
filtered_data = data[data['column'] > 50]
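A minimal sketch tying this section together; the column names and values below are invented for illustration:
import pandas as pd

data = pd.DataFrame({"age": [25, 32, None, 45], "salary": [40, 55, 61, 80]})

print(data.head())                  # first rows
data.info()                         # dtypes and non-null counts (prints directly)
print(data.isnull().sum())          # missing values per column
print(data[data["salary"] > 50])    # boolean filtering
print(data.loc[0, "age"], data.iloc[0, 1])  # label vs. integer-position indexing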
5. Data Visualization with Matplotlib/Seaborn
Q1: Difference between Matplotlib and Seaborn?
Answer:
Matplotlib: Basic, customizable plots.
Seaborn: Built on Matplotlib, prettier statistical plots.
Q2: How to create a scatter plot?
Answer:
import matplotlib.pyplot as plt
plt.scatter(x=data['X'], y=data['Y'])
plt.show()
Q3: Plots for distributions?
Answer:
Histogram (plt.hist() or sns.histplot()).
Boxplot (plt.boxplot() or sns.boxplot()).
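A small sketch of both plot types; the normally distributed values here are synthetic, generated just for illustration:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

y = np.random.normal(loc=50, scale=10, size=200)  # synthetic data

plt.hist(y, bins=20)   # histogram: shape of the distribution
plt.show()

sns.boxplot(y=y)       # boxplot: median, quartiles, outliers
plt.show()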
Q4: How to customize plot labels?
Answer:
[Link]("X-axis")
[Link]("Y-axis")
[Link]("Scatter Plot")
Q5: Bar Chart vs. Pie Chart?
Answer:
Bar Chart: Compare categories (e.g., sales per month).
Pie Chart: Show proportions (e.g., market share %).
6. Handling Missing Data, Encoding, and Feature Scaling
Q1: Strategies to handle missing data?
Answer:
✔ Deletion: dropna() (when only a few values are missing).
✔ Imputation (sketched below):
o Mean/Median (SimpleImputer).
o Mode (for categorical data).
o Advanced: KNN imputation (KNNImputer).
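A minimal imputation sketch; the tiny DataFrame and its values are made up for illustration:
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, np.nan, 40], "city": ["Pune", "Pune", np.nan]})

num = SimpleImputer(strategy="median")          # numeric column: fill with median
df[["age"]] = num.fit_transform(df[["age"]])

cat = SimpleImputer(strategy="most_frequent")   # categorical column: fill with mode
df[["city"]] = cat.fit_transform(df[["city"]])

print(df)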
Q2: What is one-hot encoding?
Answer:
Converts categorical variables into binary columns.
Example:
pd.get_dummies(data['category'])
When to use? When categories are not ordinal (e.g., colors: Red, Green, Blue).
Q3: Why is feature scaling important?
Answer:
Ensures all features contribute equally to distance-based algorithms (k-NN, SVM).
Prevents dominance of high-magnitude features.
Q4: StandardScaler vs. MinMaxScaler?
Answer:
| StandardScaler | MinMaxScaler |
| --- | --- |
| Scales to mean = 0, std = 1. | Scales to range [0, 1]. |
| Good for Gaussian data. | Good for bounded data. |
Q5: How to encode categorical variables?
Answer:
Label Encoding (Ordinal):
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data['category'] = le.fit_transform(data['category'])
One-Hot Encoding (Nominal):
pd.get_dummies(data)
7. k-Nearest Neighbors (k-NN) Classifier
Q1: Basic principle of k-NN?
Answer:
Classifies a data point based on majority vote of its k nearest neighbors.
Steps:
1. Compute distances (Euclidean/Manhattan).
2. Find k nearest points.
3. Assign majority class.
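A minimal k-NN sketch on the Iris dataset; k = 5 is an arbitrary choice here:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)  # k = 5 nearest neighbors
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))           # mean accuracy on the test set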
Q2: How does k affect performance?
Answer:
Small k:
✔ High variance (overfitting).
✔ Sensitive to noise.
Large k:
✔ High bias (underfitting).
✔ Smoother decision boundaries.
Q3: Distance metrics in k-NN?
Answer:
Euclidean: Default (sqrt(∑(x_i - y_i)²)).
Manhattan: Sum of absolute differences (∑|x_i - y_i|).
Minkowski: Generalization of both.
Q4: Train-test split in scikit-learn?
Answer:
from sklearn.model_selection import train_test_split
# 80/20 split; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
Q5: Evaluation metrics for classification?
Answer:
Accuracy: (TP + TN) / Total.
Precision: TP / (TP + FP).
Recall: TP / (TP + FN).
F1-Score: Harmonic mean of precision/recall.
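A short sketch computing these metrics; y_true and y_pred below are made-up labels:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]

print(accuracy_score(y_true, y_pred))   # (TP + TN) / total
print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # harmonic mean of precision and recall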
8. Linear Regression Model
Q1: Mathematical formulation?
Answer:
y = β₀ + β₁x₁ + β₂x₂ + ... + βₙ xₙ + ε
y: Target variable.
β₀: Intercept.
β₁...βₙ : Coefficients.
ε: Error term.
Q2: How to interpret coefficients?
Answer:
β₁: For 1-unit increase in x₁, y changes by β₁ (holding other features constant).
Example: If β₁ = 2.5, y increases by 2.5 for every 1-unit increase in x₁.
Q3: fit() vs. predict()?
Answer:
fit(): Trains the model on data (model.fit(X_train, y_train)).
predict(): Generates predictions (y_pred = model.predict(X_test)).
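A minimal end-to-end sketch; the tiny arrays below are synthetic (roughly y = 2x):
import numpy as np
from sklearn.linear_model import LinearRegression

X_train = np.array([[1], [2], [3], [4]])
y_train = np.array([2.1, 4.0, 6.2, 7.9])  # approximately y = 2x

model = LinearRegression()
model.fit(X_train, y_train)               # learn the intercept and coefficient
print(model.intercept_, model.coef_)      # fitted beta_0 and beta_1
print(model.predict(np.array([[5]])))     # prediction for x = 5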
Q4: What is Mean Squared Error (MSE)?
Answer:
Average squared difference between predicted and actual values.
Formula: MSE = (1/n) ∑(yᵢ − ŷᵢ)²
In scikit-learn:
from sklearn.metrics import mean_squared_error
mse = mean_squared_error(y_test, y_pred)
Q5: How to check for overfitting?
Answer:
High training accuracy, low test accuracy → Overfitting.
Solutions:
✔ Regularization (Ridge/Lasso).
✔ Cross-validation.
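A sketch of both remedies on synthetic data: Ridge adds an L2 penalty, and cross_val_score averages performance over folds.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 2.0]) + rng.normal(scale=0.1, size=100)

model = Ridge(alpha=1.0)                     # regularization strength
scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation (R² by default)
print(scores.mean())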
9. Decision Tree Classifier
Q1: How does a decision tree make splits?
Answer:
Splits based on:
o Gini Impurity: Measures misclassification probability.
o Entropy: Measures disorder; the best split maximizes information gain (entropy reduction).
Q2: What is Gini impurity?
Answer:
Probability of incorrect classification for a randomly chosen element.
Formula: Gini = 1 - ∑(p_i)² (where p_i is class proportion).
Q3: How to visualize a decision tree?
Answer:
from sklearn.tree import plot_tree
plot_tree(model, feature_names=X.columns, filled=True)  # X.columns assumes X is a DataFrame
Q4: Pros and cons of decision trees?
Answer:
| Pros | Cons |
| --- | --- |
| Easy to interpret. | Prone to overfitting. |
| No need for scaling. | Unstable (small changes → different tree). |
Q5: What is pruning?
Answer:
Reduces tree size by removing unnecessary splits.
Methods:
o Pre-pruning: limit growth up front (max_depth, min_samples_split).
o Post-pruning: cost-complexity pruning (ccp_alpha), as sketched below.
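A sketch of both pruning styles on the Iris dataset; the hyperparameter values are arbitrary:
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: stop growth early via depth / split-size limits.
pre = DecisionTreeClassifier(max_depth=3, min_samples_split=10).fit(X, y)

# Post-pruning: grow the tree, then prune via cost-complexity alpha.
post = DecisionTreeClassifier(ccp_alpha=0.01).fit(X, y)

print(pre.get_depth(), post.get_depth())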
10. K-Means Clustering
Q1: Objective of K-Means?
Answer:
Partition data into k clusters where each point belongs to the nearest centroid.
Goal: Minimize within-cluster variance.
Q2: How to choose optimal k?
Answer:
Elbow Method: Plot Inertia (sum of squared distances) vs. k.
Silhouette Score: Measures cluster separation.
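A sketch of both methods on synthetic blob data; the range of k values is arbitrary:
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Elbow: inertia drops sharply up to the "right" k, then flattens.
    # Silhouette: higher values mean better-separated clusters.
    print(k, km.inertia_, silhouette_score(X, km.labels_))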
Q3: Role of centroid?
Answer:
Mean position of all points in a cluster.
Updated iteratively to minimize distances.
Q4: Initialization sensitivity?
Answer:
Random centroids can lead to suboptimal clusters.
Solution: Use k-means++ (smarter initialization; the default in scikit-learn's KMeans).
Q5: Limitations of K-Means?
Answer:
❌ Assumes spherical clusters.
❌ Sensitive to outliers.
❌ Requires predefined k.
Summary Table: Key Scikit-Learn Methods
| Task | Function |
| --- | --- |
| Train-test split | train_test_split() |
| Feature scaling | StandardScaler(), MinMaxScaler() |
| k-NN classifier | KNeighborsClassifier() |
| Linear regression | LinearRegression() |
| Decision tree | DecisionTreeClassifier() |
| K-Means | KMeans() |