Machine Learning Unit 1

The document discusses various machine learning concepts including supervised, unsupervised, and reinforcement learning, as well as linear regression and its evaluation metrics. It covers topics like gradient descent, multicollinearity, bias-variance tradeoff, and the significance of support vectors in classification. Additionally, it provides Python code examples for training models to predict rainfall and employee salaries, along with explanations of overfitting and underfitting.

Version 2

1. Supervised vs. Unsupervised vs. Reinforcement

● Supervised learning uses labeled data with known outputs to train models (e.g., predicting marks from study hours).
● Unsupervised learning works with unlabeled data to find hidden patterns or clusters (e.g., customer segmentation).
● Reinforcement learning is based on agents learning by interacting with an environment and improving via rewards/penalties.

2. Linear Regression

Linear regression is a statistical method to predict the dependent variable (y) based on one or more independent variables (x).
It assumes a linear relationship and fits a straight line, generally expressed as y = mx + c, where m is the slope and c is the intercept.

3. R² for given data

R² measures how well the predicted values fit the actual data.
For the given dataset, Mean(y) = 79, SST = 685, SSR = 660.
R² = SSR/SST = 660/685 ≈ 0.964 → the model explains about 96.4% of the variance.
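The same ratio can be verified in a couple of lines of Python (a minimal check using only the summary statistics quoted above):

sst = 685   # total sum of squares, sum((y - mean_y)**2)
ssr = 660   # regression (explained) sum of squares
r2 = ssr / sst
print(f"R² = {r2:.3f}")   # 0.964 -> the model explains ~96.4% of the variance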

4. Role of Learning Rate in Gradient Descent

The learning rate decides the step size during parameter updates.
If it is too high, the model may overshoot the minimum; if too low, convergence will be
very slow.
An optimal learning rate ensures fast and stable convergence.
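A minimal sketch (using a hypothetical one-parameter cost J(w) = w², not from the source) that demonstrates all three regimes:

# Gradient descent on J(w) = w**2, whose minimum is at w = 0.
# Gradient: dJ/dw = 2w; update: w <- w - lr * 2w
def step(w, lr):
    return w - lr * (2 * w)

for lr in (0.01, 0.1, 1.1):   # too small, reasonable, too large
    w = 5.0
    for _ in range(20):
        w = step(w, lr)
    print(f"lr={lr}: w after 20 steps = {w:.4f}")

# lr=0.01 -> w ≈ 3.34  (converging, but very slowly)
# lr=0.1  -> w ≈ 0.06  (fast, stable convergence)
# lr=1.1  -> w ≈ 191.6 (overshoots the minimum and diverges)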

5. Gradient Descent in Linear Regression

Gradient descent minimizes the cost function by updating weights iteratively.


It calculates the gradient (slope of error function) and moves in the opposite direction to
reduce error.
This process continues until the error reaches its minimum value.
6. Multicollinearity

Multicollinearity occurs when two or more independent variables in regression are highly
correlated.
This makes it difficult to determine the effect of each predictor accurately.
It can lead to unstable coefficient estimates and poor model interpretability.
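Multicollinearity is commonly diagnosed with the Variance Inflation Factor (VIF). Below is a sketch assuming statsmodels is available; the data are synthetic, with x2 built as a near-copy of x1:

import numpy as np
import pandas as pd
from statsmodels.tools.tools import add_constant
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.05, size=100)  # almost a linear function of x1
x3 = rng.normal(size=100)
X = add_constant(pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3}))

# A VIF above ~10 is a common rule of thumb for problematic collinearity;
# x1 and x2 should show very large VIFs here, while x3 stays near 1.
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))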

7. Bias-Variance Tradeoff

Bias is the error due to oversimplifying the model (underfitting), while variance is the
error due to high sensitivity to training data (overfitting).
A high-bias model ignores important patterns, and a high-variance model fails to
generalize.
The tradeoff balances both for optimal performance.

8. Outliers

Outliers are extreme values that differ significantly from other observations.
They can distort regression lines, increase error, and reduce prediction accuracy.
Handling outliers is important for building robust models.
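A common rule-of-thumb detector is the 1.5 × IQR rule; a minimal sketch with made-up values:

import numpy as np

values = np.array([10, 12, 11, 13, 12, 95, 11, 10])   # 95 is an obvious outlier
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1

# Points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] are flagged as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
print(values[(values < lower) | (values > upper)])     # -> [95]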
QA507 Define support vectors and explain their significance.
Support vectors are the data points that are nearest to the hyperplane and affect the position and orientation of the hyperplane. We have to select the hyperplane for which the margin, i.e., the distance between the support vectors and the hyperplane, is maximum. Even a small change in the position of these support vectors can change the hyperplane.
QA508 What is the intuition of a large margin classifier?
A classifier with a large margin makes no low-certainty classification decisions. This gives a classification safety margin: a slight error in measurement or a slight variation in a document will not cause a misclassification.
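In scikit-learn, the support vectors of a fitted SVM can be inspected directly; a minimal sketch on toy, linearly separable data:

import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two well-separated classes
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1.0).fit(X, y)
print(clf.support_vectors_)   # the points nearest the hyperplane

# Only these points determine the hyperplane: moving any non-support
# point (without crossing the margin) leaves the decision boundary unchanged.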

PART B (5 x 13 = 65 marks)

QB101 (a) Explain gradient descent algorithm for linear regression, emphasizing its optimization process.
Answer:
Gradient Descent is an optimization algorithm for finding a local minimum of a differentiable function. In machine learning, it is used to find the values of a function's parameters (coefficients) that minimize a cost function as far as possible.
In linear regression, the goal is to find the best-fitting line (or hyperplane in higher dimensions) that minimizes the sum of squared differences between the observed and predicted values. Gradient descent is used to update the coefficients of the linear regression model to minimize this error, and it finds the best-fit line for a given training dataset in a small number of iterations.
When the MSE is plotted against the slope and intercept, the resulting cost surface has a bowl (convex) shape.

Gradient descent is used to minimize a cost function J(W) parameterized by a model parameter
W.

The gradient (or derivative) tells us the incline or slope of the cost function. Hence, to
minimize the cost function, we move in the direction opposite to the gradient.

1. Initialize the weights W randomly.
2. Calculate the gradient G of the cost function with respect to the parameters, using partial differentiation: G = ∂J(W)/∂W. The value of G depends on the inputs, the current values of the model parameters, and the cost function.
3. Update the weights by an amount proportional to G, i.e., W = W − ηG.
4. Repeat until the cost J(W) stops reducing, or some other pre-defined termination criterion is met.
Defining the Algorithm:
The gradient descent update is repeated until convergence, for each parameter θj:

θj := θj − ⍺ · ∂J(θ)/∂θj

where:
⍺ = the learning rate, a value that controls how big a step is taken. It is usually a small positive number between 0 and 1; a larger ⍺ corresponds to a bigger step, and vice versa.
∂J(θ)/∂θj = the derivative of the cost function, which determines the direction to take at each step.

[Figure: gradient descent intuition]

Choosing The Learning Rate:

The choice of learning rate ⍺ has a huge impact on the efficiency of the overall algorithm. If the learning rate is too small, many more steps will be taken than required, so the total running time of the algorithm grows dramatically. On the other hand, if the learning rate is too large, the steps can miss the true minimum of the cost function, and the algorithm can actually diverge away from it.
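As an illustration of the algorithm above, here is a minimal NumPy sketch of batch gradient descent for one-variable linear regression (the data and hyperparameters are illustrative, not from the source):

import numpy as np

# Illustrative data, roughly following y = 2x + 1
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

m, c = 0.0, 0.0   # Step 1: initialize slope and intercept
lr = 0.01         # learning rate (alpha)
n = len(X)

for _ in range(2000):
    y_pred = m * X + c
    # Step 2: gradients of the MSE cost J = (1/n) * sum((y_pred - y)**2)
    dm = (2 / n) * np.sum((y_pred - y) * X)
    dc = (2 / n) * np.sum(y_pred - y)
    # Step 3: move opposite to the gradient
    m -= lr * dm
    c -= lr * dc

print(f"slope = {m:.3f}, intercept = {c:.3f}")   # approximately 2 and 1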

(Or)
QB101 (b)
Write a python program using appropriate libraries to train a model to predict the rainfall (in
mm) using precipitation and humidity as explanatory factors.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Sample dataset creation (replace this with your actual dataset)
data = {
    'Precipitation': [10.2, 12.3, 7.8, 9.5, 8.4, 13.2, 11.5, 14.2, 9.0, 10.8],
    'Humidity': [80, 85, 78, 82, 75, 90, 88, 92, 84, 79],
    'Rainfall': [15.2, 18.3, 10.1, 13.5, 11.9, 19.5, 16.7, 20.2, 13.0, 14.8]
}

# Convert the data into a pandas DataFrame
df = pd.DataFrame(data)

# Define the explanatory variables (X) and the target variable (y)
X = df[['Precipitation', 'Humidity']]
y = df['Rainfall']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the linear regression model
model = LinearRegression()

# Train the model on the training data
model.fit(X_train, y_train)

# Predict the rainfall on the testing data
y_pred = model.predict(X_test)

# Evaluate the model's performance
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Print the results
print("Mean Squared Error:", mse)
print("R-squared:", r2)

# Print model coefficients
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Predicting rainfall for a new set of data (one row in the same feature order)
new_data = np.array([[10.5, 82]])
predicted_rainfall = model.predict(new_data)
print(f"Predicted Rainfall for Precipitation=10.5 and Humidity=82: {predicted_rainfall[0]:.2f} mm")
Short explanation/algorithm about the code should be given.

QB102 (a) Explain about overfitting, underfitting and best fit in detail with examples.
Overfitting
Overfitting occurs when a machine learning model tries to cover all the data points, or more data points than required, in the given dataset.

Because of this, the model starts capturing the noise and inaccurate values present in the dataset, and these factors reduce the efficiency and accuracy of the model.

The overfitted model has low bias and high variance.

The chances of overfitting increase the longer we train the model: the more we train on the same data, the more likely we are to end up with an overfitted model.

Overfitting is the main problem that occurs in supervised learning.
How to avoid overfitting in a model:
Both overfitting and underfitting degrade the performance of a machine learning model, but overfitting is the more common problem, and there are several ways to reduce its occurrence:
● Cross-validation
● Training with more data
● Removing features
● Regularization
● Reducing model complexity
● Early stopping during the training phase (monitor the loss over the training period and stop training as soon as the validation loss begins to increase)

Underfitting:
Underfitting occurs when a machine learning model is not able to capture the underlying trend of the data.
It can arise, for example, when training is stopped at an early stage to avoid overfitting: the model may not learn enough from the training data and so fails to capture the dominant trend.
In the case of underfitting, the model cannot learn enough from the training data, which reduces accuracy and produces unreliable predictions.
An underfitted model has high bias and low variance.

Techniques to reduce underfitting:
● Increase model complexity
● Increase the number of features by performing feature engineering
● Remove noise from the data
● Increase the number of epochs or the duration of training

Best Fit:
● The goal of model fitting is to strike a balance between overfitting and underfitting to achieve the best generalization performance (the sketch after this list demonstrates all three regimes).
● The best-fit model generalizes well to unseen data, capturing the underlying patterns in the data without being overly complex or too simplistic.
● Achieving the best fit involves selecting a model that balances bias and variance appropriately.
● This can be achieved through techniques such as cross-validation, regularization, feature selection, and model complexity control.
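A minimal scikit-learn sketch of the three regimes, varying polynomial degree on synthetic data (the dataset and degrees are illustrative, not from the source):

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Noisy quadratic data
rng = np.random.default_rng(42)
X = np.linspace(0, 1, 40).reshape(-1, 1)
y = 1 + 2 * X.ravel() - 3 * X.ravel() ** 2 + rng.normal(scale=0.1, size=40)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 2, 15):   # underfit, best fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(X_tr))
    test_mse = mean_squared_error(y_te, model.predict(X_te))
    # Underfit: both errors high; overfit: tiny train error, high test error
    print(f"degree={degree}: train MSE={train_mse:.4f}, test MSE={test_mse:.4f}")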

Explanation – 13 Marks

(Or)
*QB102 (b) Write a python program using appropriate libraries to train a model to predict employees'
monthly salaries based on years of experience and education level using Linear Regression.

# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import mean_squared_error

# Step 1: Load the dataset (assuming you have a file named 'employee_data.csv')
df = pd.read_csv('employee_data.csv')

# Step 2: Data Preprocessing (encode 'education_level' categorical feature)
label_encoder = LabelEncoder()
df['education_level'] = label_encoder.fit_transform(df['education_level'])

# Step 3: Define features (X) and target variable (y)
X = df[['years_of_experience', 'education_level']]  # Features
y = df['salary']  # Target variable

# Step 4: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 5: Initialize and train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Step 6: Make predictions on the test set
y_pred = model.predict(X_test)

# Step 7: Evaluate the model using Mean Squared Error
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Step 8: Print the model coefficients
print(f"Model Coefficients: {model.coef_}")
print(f"Intercept: {model.intercept_}")

# Step 9: Making a prediction for a new employee
new_employee = pd.DataFrame({'years_of_experience': [6],
                             'education_level': [label_encoder.transform(['Masters'])[0]]})
predicted_salary = model.predict(new_employee)
print(f"Predicted salary for new employee: {predicted_salary[0]}")
Explanation:
●​ Load the dataset: Load the data from a CSV file.
●​ Preprocess the data: Encode the categorical feature (education_level).
●​ Train the model: Split the data into training and testing sets, then train the linear
regression model.
●​ Evaluate the model: Check model accuracy using Mean Squared Error (MSE).
●​ Make predictions: Predict the salary for new employee data.

1.​ Loading Data:

o​ The data is loaded from a CSV file using pd.read_csv(). The dataset must
include columns like years_of_experience, education_level, and salary.

2.​ Label Encoding:

o​ education_level is a categorical feature, so we convert it to numerical values


using LabelEncoder from scikit-learn.

3.​ Model Training:

o We split the dataset into training and test sets using train_test_split(), then train
the Linear Regression model using model.fit().

4.​ Evaluation:

o We evaluate the model's performance by calculating the Mean Squared Error
(MSE) using mean_squared_error() from sklearn.metrics.

5.​ Prediction:

We make predictions for new data, where the education_level is encoded using the
LabelEncoder.

*QB103 (a) (i) What is Linear Regression? Describe the steps involved in the linear regression model
representation for a single variable. (6)
Linear Regression is a statistical method used to model the relationship between a dependent
variable (target) and one or more independent variables (features) by fitting a linear equation to
observed data. The goal is to find the best-fit line that predicts the target variable based on the
input feature(s).
In the case of single-variable linear regression, there is only one independent variable
(feature) and one dependent variable (target). The model assumes a linear relationship between
them.
Linear Regression Model Representation for a Single Variable:
For single-variable linear regression, the relationship between the input variable X (independent variable) and the output Y (dependent variable) can be represented by the equation:

Y = β₀ + β₁X + ε

where β₀ is the intercept (the value of Y when X = 0), β₁ is the slope (the change in Y per unit change in X), and ε is the error term. The model representation involves: (1) assuming the linear form above, (2) estimating β₀ and β₁ from the training data by minimizing the squared error (e.g., via least squares or gradient descent), and (3) using the fitted line Ŷ = β₀ + β₁X to predict the target for new inputs.
(ii) How can the performance of a linear regression model be evaluated, and what are the key
performance measures used for this evaluation? (7)
● To measure the performance of a regression model, several statistical metrics are used:
○ Mean Squared Error (MSE)
○ Mean Absolute Error (MAE)
○ Root Mean Square Error (RMSE)
○ Coefficient of determination (R²)
○ Adjusted R²
● A good regression model is one where the difference between the actual (observed) values and the predicted values is small and unbiased for the train, validation, and test data sets.
● The performance of a regression model can be understood from the error rate of the predictions made by the model.
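All of these metrics except Adjusted R² are available directly in scikit-learn; a minimal sketch with placeholder arrays:

import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # placeholder actual values
y_pred = np.array([2.8, 5.3, 6.9, 9.4])   # placeholder predictions

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

# Adjusted R² penalizes R² for the number of predictors k
n, k = len(y_true), 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(mse, mae, rmse, r2, adj_r2)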

(Or)
QB103 (b) (i) Write a python program to fit a straight line for the given data using the least squares method,
and also plot the graph.
X = [1,2,3,4,5,6,7,8,9,10]
Y = [0,3,5,7,9,11,13,2,4,8] (6)
import numpy as np
import matplotlib.pyplot as plt

# Preprocessing input data
X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
Y = np.array([0, 3, 5, 7, 9, 11, 13, 2, 4, 8])

# Means
X_mean = np.mean(X)
Y_mean = np.mean(Y)

num = 0    # numerator of the slope
denom = 0  # denominator of the slope

# Sum of (xi - x_mean)*(yi - y_mean) and (xi - x_mean)^2
for i in range(len(X)):
    num += (X[i] - X_mean) * (Y[i] - Y_mean)
    denom += (X[i] - X_mean) ** 2

# Calculate slope
m = num / denom

# Calculate intercept
b = Y_mean - m * X_mean
print(m, b)

# Line equation
y_predicted = m * X + b
print(y_predicted)

# Plot the data points and the fitted line
plt.scatter(X, Y)
plt.plot(X, y_predicted, color='red')
plt.show()

Code – 10 Marks​
Explanation - 3 marks

(ii) Find the MSE for the following set of values: (43,41), (44,45), (45,49), (46,47), (47,44).
(7)
Step 1: Find the regression line.

The regression line is y = 9.2 + 0.8x.
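Step 2: Compute the fitted values and squared errors using ŷ = 9.2 + 0.8x.

x	y	ŷ = 9.2 + 0.8x	y − ŷ	(y − ŷ)²
43	41	43.6	−2.6	6.76
44	45	44.4	0.6	0.36
45	49	45.2	3.8	14.44
46	47	46.0	1.0	1.00
47	44	46.8	−2.8	7.84

Step 3: Sum the squared errors and divide by n.

MSE = Σ(y − ŷ)²/n = 30.4/5 = 6.08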

QB104 (a) Write a python program using appropriate libraries to train a model to predict the price of a
house and the number of occupants for the California housing dataset, and also find its
performance measures.
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import SGDRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler

# Load the California Housing dataset
data = fetch_california_housing()

# Use the first 3 features as inputs
X = data.data[:, :3]  # Features: 'MedInc', 'HouseAge', 'AveRooms'

# Use 'MedHouseVal' and 'AveOccup' as output variables
Y = np.column_stack((data.target, data.data[:, 5]))  # Targets: 'MedHouseVal', 'AveOccup' (column index 5)

# Split the data into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

# Scale the features and target variables
scaler_X = StandardScaler()
scaler_Y = StandardScaler()

X_train = scaler_X.fit_transform(X_train)
X_test = scaler_X.transform(X_test)
Y_train = scaler_Y.fit_transform(Y_train)
Y_test = scaler_Y.transform(Y_test)

# Initialize the SGDRegressor
sgd = SGDRegressor(max_iter=1000, tol=1e-3)

# Use MultiOutputRegressor to handle multiple output variables
multi_output_sgd = MultiOutputRegressor(sgd)

# Train the model
multi_output_sgd.fit(X_train, Y_train)

# Predict on the test data
Y_pred = multi_output_sgd.predict(X_test)

# Inverse transform the predictions to get them back to the original scale
Y_pred = scaler_Y.inverse_transform(Y_pred)
Y_test = scaler_Y.inverse_transform(Y_test)

# Evaluate the model using Mean Squared Error
mse = mean_squared_error(Y_test, Y_pred)
print("Mean Squared Error:", mse)

# Optionally, print some predictions
print("\nPredictions:\n", Y_pred[:5])  # Print first 5 predictions

Short explanation/algorithm about the code should be given.


(Or)
QB104 (b)
Number of man-hours and the corresponding productivity (in units) are furnished below.
Determine the line of best fit using the least squares method. Also predict what will be the
productivity if the man-hours is 8.8. Also calculate MSE and MAE.

Mean X = 6.9
Mean Y = 13.44
Sum of squares = 40.48
Sum of products = 62.21

Regression Equation, ŷ = bX + a

b = 62.21/40.48 = 1.537

a = MY − bMX = 13.44 − (1.537 × 6.9) = 2.835

ŷ = 1.537X + 2.835

For X = 8.8,

ŷ = 1.537(8.8) + 2.835
ŷ = 16.36

MSE = 6.42
MAE = 1.94

Step-by-Step calculations to be done. Use tabulation.
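A quick Python check of the line using only the furnished summary statistics (the raw (X, Y) pairs, which this check cannot recover, are needed for the MSE and MAE):

# Check the least-squares line from the furnished summary statistics
mean_x, mean_y = 6.9, 13.44
sxx, sxy = 40.48, 62.21        # sum of squares of X, sum of products of X and Y

b = sxy / sxx                  # slope ≈ 1.537
a = mean_y - b * mean_x        # intercept ≈ 2.835
print(b, a)
print(b * 8.8 + a)             # predicted productivity at X = 8.8 ≈ 16.36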

*QB201 (a) Write a Python program to implement logistic regression for classifying customer reviews as
positive (1) or negative (0) based on extracted features such as review length, number of
positive/negative words, and star rating.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Sample list of positive and negative words
positive_words = set(["good", "great", "excellent", "awesome", "perfect", "love", "best"])
negative_words = set(["bad", "terrible", "poor", "horrible", "worst", "disappointing", "hate"])

# Function to extract features from a review
def extract_features(review, star_rating):
    review_length = len(review.split())  # Length of the review (in words)

    # Count the number of positive and negative words in the review
    pos_count = sum(1 for word in review.lower().split() if word in positive_words)
    neg_count = sum(1 for word in review.lower().split() if word in negative_words)

    # Return a feature vector (review_length, positive_count, negative_count, star_rating)
    return [review_length, pos_count, neg_count, star_rating]

# Load the dataset (replace 'customer_reviews.csv' with your file path)
df = pd.read_csv('customer_reviews.csv')

# Extract features and labels
X = np.array([extract_features(review, rating)
              for review, rating in zip(df['review'], df['rating'])])
y = df['label'].values

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Scale the features, then train and evaluate the classifier
# (these remaining steps follow from the imports above)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
