Machine Learning Lab

Uploaded by

aids2aitstpt

Machine Learning Lab

List of Tasks
1. Exercises to solve real-world problems using the following machine learning methods:
a. Linear Regression
b. Logistic Regression
2. Write a program to implement Support Vector Machines.
3. Exploratory Data Analysis for Classification using Pandas and Matplotlib.
4. Implement a program for Bias, Variance, and Cross Validation.
5. Write a program to simulate a perceptron network for pattern classification and function approximation.
6. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
7. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
8. Write a program to implement the naïve Bayesian classifier for the Iris data set. Compute the accuracy of the classifier, considering a few test data sets.
9. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.
10. Apply the EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
11. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
12. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
13. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
14. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
15. Solve optimal relay coordination as a linear programming problem using Genetic Algorithm.
To run the machine learning programs, you need to install the modules and packages below.

Before we can start using Jupyter Notebook, we need to install it on our Windows machine. Here are the steps to follow:

1. First, you need to have Python installed on your machine. You can download the latest version of Python from the official website (https://www.python.org/downloads/). Make sure to select the option to add Python to your system PATH during the installation process.
2. Once Python is installed, open the command prompt by pressing the Windows key + R and typing cmd in the Run dialog box.
3. In the command prompt, type the following command to install Jupyter Notebook:
   pip install jupyter
   This will download and install Jupyter Notebook and its dependencies.
4. After the installation is complete, type the following command to start Jupyter Notebook:
   jupyter notebook
   This will open a new tab in your web browser with the Jupyter Notebook interface.

Once you’ve created a new notebook, you can start writing code in the cells. To
run a cell, press Shift + Enter or click the Run button in the toolbar. You can
also add text, equations, and visualizations to your notebook using Markdown
syntax.
Jupyter Notebook also supports a variety of keyboard shortcuts to make your work more efficient. Here are some of the most useful ones. The single-letter shortcuts (A, B, D, M, Y) work only in command mode:

• Shift + Enter: Run the current cell and move to the next one.
• Ctrl + Enter: Run the current cell and stay on it.
• Esc: Enter command mode.
• Enter: Enter edit mode.
• A: Insert a new cell above the current cell.
• B: Insert a new cell below the current cell.
• D, D (press D twice): Delete the current cell.
• M: Change the current cell type to Markdown.
• Y: Change the current cell type to code.

Install Below Libraries:

In Cmd:

1) To upgrade Pip : python.exe -m pip install --upgrade pip

2) For Numpy : pip install numpy
3) For Pandas : pip install pandas
4) For Matplotlib : pip install matplotlib
5) For sklearn : pip install scikit-learn
1A) Aim: Exercise to solve the real-world problems using LINEAR
REGRESSION.
Description: Linear regression is a popular machine learning algorithm used for
predicting a continuous target variable based on one or more independent
variables.
The general equation for linear regression with a single independent variable
can be written as:
y = b0 + b1 * x
Where:
• y is the target variable
• x is the independent variable
• b0 is the intercept (the value of y when x is 0)
• b1 is the coefficient (the change in y for a unit change in x)
The goal of linear regression is to find the best values for b0 and b1 that
minimize the difference between the predicted values and the actual
values in the training data.
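As a minimal sketch of what "finding the best values for b0 and b1" means, the least-squares estimates for a single independent variable can be computed directly with NumPy. The data points below are made up for illustration and are not taken from Salary_Data.csv, and the function name fit_simple_linear_regression is a hypothetical helper.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (b0, b1) that minimize the sum of squared errors for y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: forces the fitted line through the point (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Illustrative data (assumed): points lying exactly on y = 2 + 3x
x = [1, 2, 3, 4, 5]
y = [5, 8, 11, 14, 17]
b0, b1 = fit_simple_linear_regression(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 2.00, b1 = 3.00
```

For a single feature, scikit-learn's LinearRegression should arrive at the same intercept and coefficient, since it also solves the least-squares problem.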
Source code: (CSV File: Salary_Data.csv)

#Import the required modules

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataset = pd.read_csv('Salary_Data.csv')
dataset.head()

#Data Preprocessing
X = dataset.iloc[:,:-1].values #independent variable array
y = dataset.iloc[:,1].values #dependent variable vector

# Splitting the dataset

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test=train_test_split(X,y,test_size=1/3,random_state=0)
#Fitting linear regression model into the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train,y_train) #actually produces the linear eqn for the data
# Predicting the test set results
y_pred = regressor.predict(X_test)
y_pred
y_test

#Visualizing the results

# plot for the Training
plt.scatter(X_train, y_train, color='red') # plot the observed data points
plt.plot(X_train, regressor.predict(X_train), color='blue') # plot the regression line
plt.title("Salary vs Experience (Training set)") # stating the title of the graph
plt.xlabel("Years of experience") # adding the name of x-axis
plt.ylabel("Salaries") # adding the name of y-axis
plt.show() # specifies end of graph

# plot for the Testing

plt.scatter(X_test, y_test, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue') # plot the regression line
plt.title("Salary vs Experience (Testing set)")
plt.xlabel("Years of experience")
plt.ylabel("Salaries")
plt.show()
Output:
1B) Aim: Exercise to solve the real-world problems using LOGISTIC
REGRESSION.
Description: Logistic regression is a supervised machine learning algorithm
used for binary classification tasks. It models the relationship between the
independent variables and the probability of a binary outcome.
In logistic regression, a linear combination of the independent variables is passed through a
sigmoid function, which constrains the output to a probability between 0 and 1. If the probability
is above a certain threshold (e.g., 0.5), the sample is classified as belonging to one class;
otherwise, it is classified as belonging to the other class.
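The sigmoid and threshold steps can be sketched in a few lines. This is only an illustration: the coefficient and intercept values are assumed, not fitted, and predict_class is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_class(x, coef, intercept, threshold=0.5):
    """Return (labels, probabilities) for a one-feature logistic model."""
    p = sigmoid(coef * x + intercept)
    return (p >= threshold).astype(int), p

# Assumed model parameters, chosen only for illustration
x = np.array([-2.0, 0.0, 2.0])
labels, probs = predict_class(x, coef=1.5, intercept=0.0)
print(probs)   # probabilities between 0 and 1
print(labels)  # [0 1 1]
```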
Source code:

# Import the required modules

from sklearn.datasets import make_classification
from matplotlib import pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import pandas as pd
# Generate the dataset
x, y = make_classification(
n_samples=100,
n_features=1,
n_classes=2,
n_clusters_per_class=1,
flip_y=0.03,
n_informative=1,
n_redundant=0,
n_repeated=0
)
# Visualize the Data
plt.scatter(x, y, c=y, cmap='rainbow')
plt.title('Scatter Plot of Logistic Regression')
plt.show()
# Split the Dataset
# Split the dataset into training and test dataset
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)
# Create a Logistic Regression object and fit it to the training data
log_reg = LogisticRegression()
log_reg.fit(x_train, y_train)
# In a notebook, fit() echoes the fitted estimator and its parameters.
# That echo is output, not code, so it is not repeated here.
print(log_reg.coef_)
print(log_reg.intercept_)
# Perform prediction using the test dataset
y_pred = log_reg.predict(x_test)
confusion_matrix(y_test, y_pred)

Output:

Note: Because make_classification is called without a random_state, every run generates a different dataset and therefore produces different plots and values.

2) Aim: Write a program to implement support-vector machines.
Description:
Support Vector Machines (SVM) is a supervised machine learning algorithm
used for both classification and regression tasks. It is particularly effective in
handling high-dimensional data and finding non-linear decision boundaries.
In SVM, the algorithm constructs hyperplanes in a high-dimensional space that
separate different classes of data points. The main objective is to find the
hyperplane that maximizes the margin between the classes, allowing for better
generalization and robustness to new data points.
In this example, we import the necessary libraries, load the Iris dataset, and define the input
features X (the first two features only, so that the result can be plotted) and target labels y.
We split the data into training and testing sets using train_test_split.
Feature scaling with StandardScaler, which standardizes the input features to zero mean and unit
variance, is commonly applied before training an SVM; the program in this exercise omits that step.
We create an SVM classifier with a linear kernel using SVC(kernel='linear') and fit the classifier
to the training data using the fit method. Then, we use the trained classifier to predict the
classes of the testing data using the predict method. Finally, we calculate the accuracy of the
model using accuracy_score.
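As a hedged sketch of the scaling step described above, StandardScaler and SVC can be chained with scikit-learn's make_pipeline so that the scaler is fitted on the training data only. Using all four Iris features and the variable names scaled_clf and scaled_acc are choices made for this illustration.

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# The pipeline standardizes the features, then trains a linear-kernel SVM
scaled_clf = make_pipeline(StandardScaler(), SVC(kernel='linear'))
scaled_clf.fit(X_train, y_train)

scaled_acc = accuracy_score(y_test, scaled_clf.predict(X_test))
print(f"Accuracy with scaling: {scaled_acc:.2f}")
```

Putting the scaler inside the pipeline means the test set is transformed with the mean and variance learned from the training set, which avoids leaking test information into training.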
Source code:

# Import necessary libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load the iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features for visualization
y = iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Create an SVM classifier with a linear kernel
clf = SVC(kernel='linear')
# Train the classifier on the training data
clf.fit(X_train, y_train)
# Make predictions on the testing data
y_pred = clf.predict(X_test)
# Calculate the accuracy of the classifier
acc = accuracy_score(y_test, y_pred)
# Plot the decision boundary
xmin, xmax = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
ymin, ymax = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(xmin, xmax, 0.02), np.arange(ymin, ymax, 0.02))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
# Plot the training and testing data points
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.Paired,
edgecolors='k')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=plt.cm.Paired, marker='x',
edgecolors='k')
# Add labels and a title to the plot
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('SVM decision boundary (accuracy={:.2f})'.format(acc))
# Display the plot
plt.show()
Output:
3) Aim: Exploratory Data Analysis for classification using Pandas and
Matplotlib.
Description: Pandas and Matplotlib are two popular libraries in Python
commonly used for data manipulation, analysis, and visualization. Here's a brief
overview of each library:
Pandas: Pandas is a powerful library for data manipulation and analysis. It
provides data structures and functions to efficiently handle structured data, such
as tabular data, time series, and more. The primary data structures in Pandas are
Series (a one-dimensional labelled array) and DataFrame (a two-dimensional
labelled data structure).
Pandas offers a wide range of functionalities, including data loading and saving,
data cleaning and preprocessing, data exploration and manipulation, merging and
joining datasets, and handling missing values. It also provides powerful data
aggregation and grouping capabilities, as well as support for time series analysis.
Matplotlib: Matplotlib is a popular library for creating static, animated, and
interactive visualizations in Python. It provides a wide range of plotting functions
and customization options to create various types of plots, including line plots,
scatter plots, bar plots, histograms, and more.
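The grouping and aggregation capabilities mentioned above are particularly useful when exploring data for classification, because they summarize each feature per class. The following is a minimal sketch on a small random table; the column names, the seed, and the sizes are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed so the table is reproducible
scores = pd.DataFrame({
    'Math_Score': rng.integers(40, 100, 20),
    'Pass_Fail': rng.choice(['Pass', 'Fail'], size=20),
})

# Summarize the numeric feature separately for each class label
summary = scores.groupby('Pass_Fail')['Math_Score'].agg(['count', 'mean', 'min', 'max'])
print(summary)
```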
Source code:

# Import necessary libraries

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create a sample dataframe
data = {
'Student_ID': range(1, 101),
'Math_Score': np.random.randint(40, 100, 100),
'Science_Score': np.random.randint(40, 100, 100),
'Pass_Fail': np.random.choice(['Pass', 'Fail'], size=100)
}
# Convert the dictionary to a DataFrame
df = pd.DataFrame(data)
# Display the first few rows of the dataframe
print(df.head())
# Summary statistics of the dataset
print(df.describe())
# Check for missing values
print(df.isnull().sum())
# Data visualization
# Example: Histogram of a numeric variable (Math_Score)
plt.figure(figsize=(10, 6))
plt.hist(df['Math_Score'], bins=30, edgecolor='k', alpha=0.7)
plt.title('Distribution of Math Scores')
plt.xlabel('Math Score')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
# Example: Count plot for a categorical variable (Pass_Fail)
pass_fail_counts = df['Pass_Fail'].value_counts()
plt.figure(figsize=(8, 6))
plt.bar(pass_fail_counts.index, pass_fail_counts.values, color=['blue', 'red'])
plt.title('Count of Pass/Fail')
plt.xlabel('Pass/Fail')
plt.ylabel('Count')
plt.grid(axis='y')
plt.show()
Output:
4) Aim: Implement a program for Bias, Variance and cross validation.
Description:
Machine learning is a branch of Artificial Intelligence that allows machines to analyze data and
make predictions. The predictions of a model are never perfectly accurate, and the resulting
prediction error can be decomposed into bias, variance, and irreducible noise. Some error will
always be present, because there is always a difference between the model's predictions and the
actual values. The main aim of ML/data science analysts is to reduce these errors in order to get
more accurate results.
Bias is the error introduced by the simplifying assumptions a model makes about the form of the
target function.
Low Bias: A low bias model makes fewer assumptions about the form of the target function.
High Bias: A model with high bias makes more assumptions and becomes unable to capture the
important features of the dataset. A high bias model also cannot perform well on new data.

Variance measures how much the model's predictions change when it is trained on different
training data sets.
Low variance means there is a small variation in the prediction of the target function with
changes in the training data set. High variance means there is a large variation in the
prediction of the target function with changes in the training data set.
Cross-validation estimates how well a model performs on unseen data by repeatedly splitting the
data into training and validation folds and averaging the validation error.
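To make the effect of high bias concrete, the sketch below compares a straight-line model with a quadratic model on quadratic data using 5-fold cross-validation. It is an illustration under assumptions: the synthetic data, the seed, and the names X_demo, y_demo, and cv_mse are not part of the exercise.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(100, 1))
y_demo = 3 * (X_demo[:, 0] - 5) ** 2 + rng.normal(0, 2, size=100)  # quadratic + noise

cv_mse = {}
for degree in (1, 2):
    # degree 1 is a straight line (high bias here); degree 2 matches the data
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X_demo, y_demo, cv=5,
                             scoring='neg_mean_squared_error')
    cv_mse[degree] = -scores.mean()  # scores are negated MSE values
    print(f"degree {degree}: 5-fold CV MSE = {cv_mse[degree]:.2f}")
```

The straight line cannot follow the curve, so its cross-validated error stays far above that of the quadratic model regardless of how the folds are drawn.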
Source code:

# Import necessary libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# 1. Generate synthetic data
np.random.seed(0)
X = np.random.rand(100, 1) * 10 # 100 data points in the range [0, 10)
# Quadratic relationship with noise
y = 3 * (X - 5) ** 2 + np.random.randn(100, 1) * 2
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Bias and Variance Decomposition
def calculate_bias_variance(model, X, y, num_runs=100):
    y_preds = np.zeros((len(X), num_runs))
    for i in range(num_runs):
        # Generate bootstrapped samples
        indices = np.random.choice(len(X), size=len(X), replace=True)
        X_sampled = X[indices]
        y_sampled = y[indices]
        model.fit(X_sampled, y_sampled)
        y_preds[:, i] = model.predict(X).flatten()
    # Calculate bias and variance
    bias = np.mean((np.mean(y_preds, axis=1) - y.flatten()) ** 2)
    variance = np.mean(np.var(y_preds, axis=1))
    return bias, variance

# Create a simple linear regression model
model = LinearRegression()
bias, variance = calculate_bias_variance(model, X_train, y_train)
print(f"Bias: {bias:.2f}")
print(f"Variance: {variance:.2f}")

# 3. Cross-validation
k_fold = KFold(n_splits=5, shuffle=True, random_state=42)
mse_scores = []
for train_idx, val_idx in k_fold.split(X_train):
    X_train_fold, X_val_fold = X_train[train_idx], X_train[val_idx]
    y_train_fold, y_val_fold = y_train[train_idx], y_train[val_idx]
    model.fit(X_train_fold, y_train_fold)
    y_val_pred = model.predict(X_val_fold)
    mse = mean_squared_error(y_val_fold, y_val_pred)
    mse_scores.append(mse)
average_mse = np.mean(mse_scores)
print(f"Average MSE using 5-fold CV: {average_mse:.2f}")

# Plotting
plt.scatter(X_train, y_train, c='b', label='Training Data')
plt.scatter(X_test, y_test, c='r', label='Test Data', alpha=0.5)
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Synthetic Data')
plt.show()

Output:
Bias: 540.17
Variance: 18.08
Average MSE using 5-fold CV: 566.52
5) Aim: Write a program to simulate a perceptron network for pattern
classification and function approximation.
Description:

The perceptron, as a foundational neural network model, is used in this simulation

to understand its capabilities in both pattern classification and function
approximation tasks.

Pattern Classification:

A synthetic dataset is generated where points are randomly distributed in a 2D

space. The perceptron is trained to classify these points based on whether they lie
inside or outside a predefined region.

Function Approximation:

Another synthetic dataset is generated where data points represent values of a

function with some added noise. The perceptron attempts to approximate this
function by learning the underlying pattern from the data points.

The program provides visual insights by plotting the generated data, the decision
boundary for classification, and the approximated function using the perceptron.
Through this simulation, one can observe the perceptron's ability to make binary
classifications and its approach to approximating simple functions.
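For reference, the classic perceptron learning rule updates the weights only when the thresholded output is wrong. The sketch below applies it to the logical AND function, which is linearly separable, so the rule is guaranteed to converge. The function name train_perceptron, the learning rate, and the epoch count are assumptions for this illustration.

```python
import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=20):
    """Classic perceptron rule: w += lr * (target - output) * x."""
    Xb = np.c_[X, np.ones(len(X))]  # append a constant 1 as the bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        for xi, target in zip(Xb, y):
            output = 1 if xi @ w > 0 else 0   # step activation
            w += lr * (target - output) * xi  # no change when output is correct
    return w

# Logical AND: only the input (1, 1) belongs to class 1
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([0, 0, 0, 1])

w = train_perceptron(X_and, y_and)
preds = [1 if np.append(x, 1) @ w > 0 else 0 for x in X_and]
print(preds)  # [0, 0, 0, 1]
```

A single perceptron can only draw a straight decision boundary, so it cannot perfectly separate data such as points inside and outside a circle.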

Source code:

# Import necessary libraries

import numpy as np
import matplotlib.pyplot as plt

# 1. Generate synthetic data for pattern classification
X = np.random.rand(100, 2) * 10 - 5
y = np.array([1 if x[0]**2 + x[1]**2 < 9 else 0 for x in X])
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r')
plt.show() # Plot the data

# 2. Perceptron Learning Algorithm (PLA) for classification
# lr is kept small so that lr * ||x||^2 stays below 2 for inputs in [-5, 5],
# which keeps every weight update stable.
class Perceptron:
    def __init__(self):
        self.weights = np.zeros(3)

    def train(self, X, y, lr=0.01, epochs=1000):
        X = np.c_[X, np.ones(X.shape[0])] # append the bias input
        for _ in range(epochs):
            for i in range(X.shape[0]):
                prediction = np.dot(X[i], self.weights)
                self.weights += lr * (y[i] - prediction) * X[i]

    def predict(self, x):
        return 1 if np.dot(np.append(x, 1), self.weights) > 0 else 0

# Train and plot decision boundary
perceptron = Perceptron()
perceptron.train(X, y)
x1 = np.linspace(-5, 5, 100)
x2 = -(perceptron.weights[0] * x1 + perceptron.weights[2]) / perceptron.weights[1]
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r')
plt.plot(x1, x2, color='k')
plt.show()

# 3. Generate synthetic data for function approximation
X_func = np.random.rand(100) * 10 - 5
y_func = np.sin(X_func) + np.random.randn(100) * 0.5

# Perceptron for function approximation (linear output, no threshold)
class SimplePerceptron:
    def __init__(self):
        self.weights = np.zeros(2)

    def train(self, X, y, lr=0.01, epochs=1000):
        X = np.c_[X, np.ones(X.shape[0])] # append the bias input
        for _ in range(epochs):
            for i in range(X.shape[0]):
                prediction = np.dot(X[i], self.weights)
                self.weights += lr * (y[i] - prediction) * X[i]

    def predict(self, x):
        return np.dot(np.append(x, 1), self.weights)

# Train and plot function approximation
func_perceptron = SimplePerceptron()
func_perceptron.train(X_func, y_func)
x_vals = np.linspace(-5, 5, 100)
y_vals = [func_perceptron.predict(x) for x in x_vals]
plt.scatter(X_func, y_func, color='b')
plt.plot(x_vals, y_vals, color='r')
plt.show()
Output:

Note: At every run we get different types of plots.

6) Aim: Write a program to demonstrate the working of the decision tree
based ID3 algorithm. Use an appropriate data set for building the decision
tree and apply this knowledge to classify a new sample.

Description:

The ID3 (Iterative Dichotomiser 3) algorithm is a foundational decision tree

algorithm primarily used for classification tasks. The core principle behind ID3
is to construct a decision tree by iteratively selecting the best attribute that
maximizes the information gain to partition the dataset.

Working Mechanism:

Entropy Calculation:

The ID3 algorithm starts by calculating the entropy of the dataset with respect to the class
label. Entropy measures the randomness or impurity of a set of examples: it is 0 when all
examples belong to the same class and highest when the classes are evenly mixed.

Information Gain:

Post entropy calculation, the algorithm computes the information gain for each
attribute. Information gain quantifies the reduction in entropy achieved by
splitting the dataset based on a particular attribute. The attribute with the highest
information gain is selected as the decision node.

Tree Construction:

Using the attribute with the maximum information gain, the dataset is split into
subsets, one for each value of that attribute. The splitting process is recursive: each subset
becomes a node in the decision tree, and the process continues until a stopping criterion is met,
such as all examples in a subset belonging to the same class or no attributes remaining.

Decision Making:

Once the decision tree is constructed, it can be utilized for classification. To

classify a new sample, one simply traverses the tree from the root node down to
a leaf node, making decisions based on the attribute values of the sample.
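The entropy and information gain calculations described above can be sketched as follows. The two columns are the Outlook and PlayTennis values of the classic 14-row play-tennis table; whether tennis.csv holds exactly these rows is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(values, labels):
    """Reduction in entropy obtained by splitting on one attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Assumed data: the classic play-tennis table (9 Yes, 5 No)
outlook = ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast',
           'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

print(f"Entropy of the dataset: {entropy(play):.3f}")                     # 0.940
print(f"Information gain of Outlook: {information_gain(outlook, play):.3f}")  # 0.247
```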

Source code: (tennis.csv)

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
import copy

dataset = pd.read_csv('tennis.csv')
# Skip the first column; the remaining columns are the four attributes and the label
X = dataset.iloc[:, 1:].values
attribute = ['outlook', 'temp', 'humidity', 'wind']

class Node(object):
    def __init__(self):
        self.value = None     # attribute name, or 'Yes'/'No' at a leaf
        self.decision = None  # attribute value that leads to this node
        self.childs = None

def findEntropy(data, rows):
    # Returns the entropy of the given rows and, if they are pure, their class
    yes = 0
    no = 0
    ans = -1
    idx = len(data[0]) - 1
    entropy = 0
    for i in rows:
        if data[i][idx] == 'Yes':
            yes = yes + 1
        else:
            no = no + 1
    x = yes/(yes+no)
    y = no/(yes+no)
    if x != 0 and y != 0:
        entropy = -1 * (x*math.log2(x) + y*math.log2(y))
    if x == 1:
        ans = 1
    if y == 1:
        ans = 0
    return entropy, ans

def findMaxGain(data, rows, columns):
    # Returns the highest information gain and the column that achieves it
    maxGain = 0
    retidx = -1
    entropy, ans = findEntropy(data, rows)
    if entropy == 0:
        return maxGain, retidx, ans
    for j in columns:
        # Count how often each value of attribute j occurs in these rows
        mydict = {}
        idx = j
        for i in rows:
            key = data[i][idx]
            if key not in mydict:
                mydict[key] = 1
            else:
                mydict[key] = mydict[key] + 1
        gain = entropy
        for key in mydict:
            yes = 0
            no = 0
            for k in rows:
                if data[k][j] == key:
                    if data[k][-1] == 'Yes':
                        yes = yes + 1
                    else:
                        no = no + 1
            x = yes/(yes+no)
            y = no/(yes+no)
            if x != 0 and y != 0:
                # Weight each subset by its share of the current rows
                gain += (mydict[key] * (x*math.log2(x) + y*math.log2(y)))/len(rows)
        if gain > maxGain:
            maxGain = gain
            retidx = j
    return maxGain, retidx, ans

def buildTree(data, rows, columns):
    maxGain, idx, ans = findMaxGain(data, rows, columns)
    root = Node()
    root.childs = []
    if maxGain == 0:
        if ans == 1:
            root.value = 'Yes'
        else:
            root.value = 'No'
        return root
    root.value = attribute[idx]
    mydict = {}
    for i in rows:
        key = data[i][idx]
        if key not in mydict:
            mydict[key] = 1
        else:
            mydict[key] += 1
    newcolumns = copy.deepcopy(columns)
    newcolumns.remove(idx)
    for key in mydict:
        newrows = []
        for i in rows:
            if data[i][idx] == key:
                newrows.append(i)
        temp = buildTree(data, newrows, newcolumns)
        temp.decision = key
        root.childs.append(temp)
    return root

def traverse(root):
    print(root.decision)
    print(root.value)
    n = len(root.childs)
    if n > 0:
        for i in range(0, n):
            traverse(root.childs[i])

def calculate():
    rows = [i for i in range(0, len(X))]
    columns = [i for i in range(0, len(attribute))]
    root = buildTree(X, rows, columns)
    root.decision = 'Start'
    traverse(root)

calculate()
Output:
8) Aim: Write a program to implement the naïve Bayesian classifier for
Iris data set. Compute the accuracy of the classifier, considering few test
data sets

Description:
The implementation of the Naïve Bayes classifier for the Iris dataset offers an
insightful exploration into the application of probabilistic classification
techniques on a well-known dataset in the machine learning community.

Implementation Steps:

Data Loading:

Begin by importing the Iris dataset, commonly available in machine learning

libraries like scikit-learn. This dataset is partitioned into feature vectors
representing the four attributes and corresponding class labels.

Data Preprocessing:

Split the dataset into a training set and a test set. Typically, a significant portion
(e.g., 70-80%) is used for training, and the remaining portion is used for testing.

Model Training:

Prior Probabilities: Compute the prior probabilities for each class based on the
frequency of occurrences in the training set.

Likelihood Estimation: Estimate the likelihood probabilities of features given

each class. Due to the Naïve Bayes assumption of feature independence, calculate
individual probabilities for each feature.

Classification:

For each test instance, compute the posterior probabilities of all classes using
Bayes' theorem. Assign the class label to the instance based on the highest
posterior probability.

Performance Evaluation:

After classifying all test instances, compare the predicted class labels with the
actual labels to compute the classifier's accuracy. Accuracy is a fundamental
metric that provides insights into the model's predictive performance on unseen
data.
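The training and classification steps above can be sketched from scratch with NumPy. This is a minimal illustration that assumes each feature follows a Gaussian distribution within each class (the same assumption GaussianNB makes); the variable names and the fixed random_state are choices made for this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

classes = np.unique(y_tr)
# Prior probabilities: class frequencies in the training set
priors = np.array([np.mean(y_tr == c) for c in classes])
# Likelihood parameters: per-class mean and variance of every feature
means = np.array([X_tr[y_tr == c].mean(axis=0) for c in classes])
variances = np.array([X_tr[y_tr == c].var(axis=0) for c in classes]) + 1e-9

def predict(X_new):
    """Pick the class with the highest log posterior for each row of X_new."""
    diff = X_new[:, None, :] - means[None, :, :]
    # Sum of per-feature Gaussian log densities (the naive independence assumption)
    log_lik = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_post = np.log(priors)[None, :] + log_lik
    return classes[np.argmax(log_post, axis=1)]

y_hat = predict(X_te)
manual_accuracy = np.mean(y_hat == y_te)
print('Accuracy:', manual_accuracy)
```

Working with log probabilities turns the product of per-feature likelihoods into a sum, which avoids numerical underflow.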
Source code:

#Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train the Naive Bayes classifier
model = GaussianNB()
model.fit(X_train, y_train)
# Evaluate the model on the test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
#Make predictions on new samples
sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(sample)
print('Prediction:', prediction)

Output:
11) Aim: Write a program to implement k-Nearest Neighbor algorithm to
classify the iris data set. Print both correct and wrong predictions.
Description:
K-Nearest Neighbour (K-NN) is one of the simplest machine learning algorithms based
on the supervised learning technique. K-NN assumes that similar cases lie close together:
it compares a new case with the available cases and puts it into the category of the
cases it is most similar to.
The working of K-NN can be explained with the following algorithm:
Step-1: Select the number K of neighbours.

Step-2: Calculate the Euclidean distance from the new data point to every training data point.

Step-3: Take the K nearest neighbours as per the calculated Euclidean distance.
Among these K neighbours, count the number of data points in each category.

Step-4: Assign the new data point to the category with the largest number of
neighbours.

Step-5: Our model is ready.
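The steps above can be sketched from scratch on a toy data set. The points, labels, and the helper name knn_predict are made up for illustration.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Count the categories among the k neighbours and take the most common one
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data (assumed): two well-separated groups
X_toy = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                  [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_toy = np.array(['A', 'A', 'A', 'B', 'B', 'B'])

print(knn_predict(X_toy, y_toy, np.array([1.1, 1.0])))  # A
print(knn_predict(X_toy, y_toy, np.array([5.1, 5.0])))  # B
```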

Source code:

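from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the iris data set and split it into training and test sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# Train a k-Nearest Neighbour classifier (k = 3 is an illustrative choice)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Print every prediction and mark it as correct or wrong
for sample, actual, predicted in zip(X_test, y_test, y_pred):
    result = 'Correct' if actual == predicted else 'Wrong'
    print(f"{sample} actual={iris.target_names[actual]} "
          f"predicted={iris.target_names[predicted]} -> {result}")
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")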
Output:
12) Aim: Implement the non-parametric Locally Weighted Regression
algorithm in order to fit data points. Select appropriate data set for your
experiment and draw graphs.
Description:

Locally Weighted Regression (LWR):

Locally Weighted Regression is a non-parametric algorithm used for regression

tasks, especially beneficial when dealing with complex and nonlinear
relationships between variables. Instead of fitting a single model to the entire
dataset, LWR fits a separate regression model to each point in the dataset, giving
more weight to points closer to the one being predicted.

Implementation Steps:

Synthetic Data Generation:

A synthetic dataset is created using numpy, comprising 100 data points generated
along a sine curve with some added random noise.

Locally Weighted Regression Function:

The core of the algorithm lies in the locally_weighted_regression function. For a

given test point:

It computes weights for each training point based on the Gaussian kernel,
emphasizing points closer to the test point.

It fits a regression model considering these weights, resulting in a locally

weighted prediction.

Prediction:

Using the LWR function, predictions are generated for each point in the dataset.
This means that instead of a single line of best fit, the algorithm calculates a curve
that best fits the data point-by-point.

Visualization:

The matplotlib library is utilized to visualize the synthetic data points as blue dots
and the LWR fit as a red curve. This visualization provides a clear understanding
of how LWR fits a curve that closely follows the underlying trend of the data,
especially in regions of high data density.
Source code:

import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(0)
X = np.linspace(0, 10, 100)
y = np.sin(X) + np.random.normal(0, 0.2, 100)

# Locally Weighted Regression algorithm
def locally_weighted_regression(test_point, X, y, tau):
    # Design matrix with a bias column so that each local fit has an intercept
    X_b = np.c_[np.ones(X.shape[0]), X]
    x_query = np.array([1.0, test_point])
    # Gaussian kernel: training points near test_point get larger weights
    weights = np.exp(-((X - test_point) ** 2) / (2 * tau ** 2))
    W = np.diag(weights)
    # Weighted least squares: theta = (X^T W X)^-1 X^T W y
    theta = np.linalg.pinv(X_b.T @ W @ X_b) @ X_b.T @ W @ y
    return x_query @ theta

# Predictions using LWR
predictions = [locally_weighted_regression(test_point, X, y, tau=0.5) for test_point in X]
# Plotting the results
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, predictions, color='red', label='LWR Fit')
plt.title('Locally Weighted Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.show()
Output:
13) Aim: For a given set of training data examples stored in a .CSV file,
implement and demonstrate the Candidate-Elimination algorithm to output
a description of the set of all hypotheses consistent with the training
examples.
Description:

The Candidate-Elimination algorithm is a concept learning method that maintains two boundaries of the version space: a specific boundary and a general boundary. It refines both iteratively from the observed training examples so that every hypothesis lying between them is consistent with the data.

Initialization: Start with the most specific hypothesis as the specific boundary and the most general hypothesis as the general boundary.

Iterative Refinement:

For positive instances, minimally generalize the specific hypothesis so that it covers the example, and remove from the general boundary any hypothesis that does not cover it.

For negative instances, minimally specialize the general hypotheses so that they exclude the example, and remove from the specific boundary any hypothesis that covers it.

Output: After all training examples have been processed, the specific and general boundaries delimit the set of all hypotheses consistent with the training examples.
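
What "consistent" means for a single hypothesis can be sketched as follows. This is a minimal illustration, assuming hypotheses are tuples in which '?' matches any attribute value; the helper names `covers` and `is_consistent` and the toy examples are hypothetical and are not taken from training_data.csv.

```python
def covers(hypothesis, example):
    """True if every attribute constraint is '?' or equals the example's value."""
    return all(h == '?' or h == x for h, x in zip(hypothesis, example))

def is_consistent(hypothesis, examples):
    """Consistent means: covers every positive example and no negative one."""
    return all(covers(hypothesis, x) == label for x, label in examples)

# Hypothetical toy data: (attribute values, is_positive)
examples = [
    (('Sunny', 'Warm', 'Strong'), True),
    (('Sunny', 'Warm', 'Weak'), True),
    (('Rainy', 'Cold', 'Strong'), False),
]

specific = ('Sunny', 'Warm', '?')                    # a specific-boundary hypothesis
general = [('Sunny', '?', '?'), ('?', 'Warm', '?')]  # general-boundary hypotheses
too_general = ('?', '?', '?')                        # also covers the negative example

print(is_consistent(specific, examples))
print([is_consistent(g, examples) for g in general])
print(is_consistent(too_general, examples))
```

Every hypothesis between the two boundaries passes this check, while the fully general hypothesis fails because it also covers the negative example.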

Source code: (training_data.csv)

import pandas as pd

# Load dataset (the last column is assumed to hold the class label: 1 = positive)
def load_data(filename):
    return pd.read_csv(filename)

# Simplified Candidate-Elimination: tracks one specific hypothesis and
# one general hypothesis per attribute
def candidate_elimination(examples):
    n_attributes = len(examples.columns) - 1
    specific_h = [None] * n_attributes
    general_h = [['?' for _ in range(n_attributes)] for _ in range(n_attributes)]

    for _, row in examples.iterrows():
        values = list(row.iloc[:-1])
        if row.iloc[-1] == 1:  # Positive example: generalize specific_h
            for j in range(n_attributes):
                if specific_h[j] is None:
                    specific_h[j] = values[j]
                elif specific_h[j] != values[j]:
                    specific_h[j] = '?'
                    general_h[j][j] = '?'
        else:  # Negative example: specialize general_h
            for j in range(n_attributes):
                # Skip attributes for which no positive example has been seen yet
                if specific_h[j] is not None and specific_h[j] != values[j]:
                    general_h[j][j] = specific_h[j]

    # Remove hypotheses that stayed fully general (all '?') from general_h
    general_h = [h for h in general_h if any(val != '?' for val in h)]
    return specific_h, general_h

# Load the dataset from CSV
data = load_data('training_data.csv')

# Run the Candidate-Elimination algorithm
specific, general = candidate_elimination(data)

# Display results
print("\nSpecific hypothesis:")
print(specific)
print("\nGeneral hypothesis:")
for hypothesis in general:
    print(hypothesis)

Output:
14) Aim: Implement and demonstrate the FIND-S algorithm for finding the
most specific hypothesis based on a given set of training data samples. Read
the training data from a .CSV file.

Description:
The Find-S algorithm is a concept learning algorithm in machine learning. It identifies the most specific hypothesis that is consistent with all of the positive training examples.
Find-S considers only positive examples and ignores negative ones.
It starts with the most specific hypothesis and, whenever that hypothesis fails to cover an observed positive training example, generalizes it just enough to cover the example.

Representations:
• The most specific hypothesis is represented using ϕ.
• The most general hypothesis is represented using ?.
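
Since the aim asks for the training data to be read from a .CSV file, the following is a minimal sketch of Find-S driven by CSV input. It assumes the last column holds the label and that '+' marks a positive example. The in-memory text stands in for a file on disk, the column names in the header row are assumed for illustration, and the function name `find_s` is hypothetical.

```python
import csv
import io

# Stand-in for a CSV file on disk (header names are assumed); last column is the label
CSV_TEXT = """Sky,AirTemp,Humidity,Wind,Water,Forecast,EnjoySport
Sunny,Warm,Normal,Strong,Warm,Same,+
Sunny,Warm,High,Strong,Warm,Same,+
Rainy,Cold,High,Strong,Warm,Change,-
Sunny,Warm,High,Strong,Cool,Change,+
"""

def find_s(rows, positive_label='+'):
    hypothesis = None  # None stands for the most specific hypothesis (all ϕ)
    for *attributes, label in rows:
        if label != positive_label:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example is copied as-is
        else:
            # Keep matching values, replace mismatches with the wildcard '?'
            hypothesis = [h if h == a else '?' for h, a in zip(hypothesis, attributes)]
    return hypothesis

reader = csv.reader(io.StringIO(CSV_TEXT))
header = next(reader)  # skip the header row
hypothesis = find_s(reader)
print("Most specific hypothesis:", hypothesis)
```

To read an actual file, the `io.StringIO(CSV_TEXT)` object can be replaced with a file opened via `open(filename, newline='')`.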

Source code:

import numpy as np

# Define the training data
X = np.array([
    ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'],
    ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change']
])
y = np.array(['+', '+', '-', '+'])

# Initialize the most specific hypothesis ('0' stands for ϕ)
hypothesis = ['0', '0', '0', '0', '0', '0']

# Find the most specific hypothesis
for i in range(len(y)):
    if y[i] == '+':
        for j in range(len(X[i])):
            if hypothesis[j] == '0':
                hypothesis[j] = X[i][j]
            elif hypothesis[j] != X[i][j]:
                hypothesis[j] = '?'

print("Most specific hypothesis:", hypothesis)

Output:
15) Aim: Solve optimal relay coordination as a linear programming problem
using Genetic Algorithm.

Description:

Optimal relay coordination is a method used in power systems to ensure that protection relays respond in a coordinated manner to faults. The coordination ensures that the relay closest to the fault operates before other relays further away.

Here a Genetic Algorithm (GA) is used to search for the relay time settings. The program is a simplified demonstration and does not capture all the complexities of real-world scenarios.

The problem is set up as follows:

Objective: Minimize the sum of the relay time settings.

Constraints: Ensure that the time settings for each relay are within their minimum and maximum bounds. In this simplified example the bounds are the only constraints enforced; no explicit coordination margin between relays is modeled.

For simplicity, assume 3 relays, each with a minimum and a maximum time setting.
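
One way a coordination requirement could be folded into the GA objective is a penalty term. The sketch below is an assumption-laden illustration, not part of the lab program: the coordination time interval `CTI`, the weight `PENALTY`, the function name `penalized_fitness`, and the idea that each relay must operate at least `CTI` later than the previous one are all hypothetical choices.

```python
# Assumed values for illustration only; not taken from the lab program
RELAY_BOUNDS = [(0.1, 0.3), (0.2, 0.5), (0.4, 0.8)]  # (min_time, max_time) per relay
CTI = 0.2        # assumed coordination time interval between consecutive relays
PENALTY = 100.0  # assumed weight applied to each unit of constraint violation

def penalized_fitness(times):
    """Sum of relay times plus penalties for bound and coordination violations."""
    violation = 0.0
    # Bound constraints: each time must lie inside its [low, high] range
    for t, (low, high) in zip(times, RELAY_BOUNDS):
        violation += max(0.0, low - t) + max(0.0, t - high)
    # Coordination: each relay should operate at least CTI after the previous one
    for primary, backup in zip(times, times[1:]):
        violation += max(0.0, CTI - (backup - primary))
    return sum(times) + PENALTY * violation

feasible = [0.1, 0.3, 0.5]
uncoordinated = [0.1, 0.2, 0.4]  # relay 2 is only 0.1 behind relay 1

print(penalized_fitness(feasible))
print(penalized_fitness(uncoordinated))
```

A function of this shape could take the place of a plain sum as the fitness, so that settings violating the margin are ranked behind feasible ones during selection.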

Source Code:

import random

# Define relay settings [min_time, max_time]
relay_settings = [
    [0.1, 0.3],  # Relay 1
    [0.2, 0.5],  # Relay 2
    [0.4, 0.8]   # Relay 3
]

# GA parameters
population_size = 100
num_generations = 100
mutation_rate = 0.1

# Fitness is the sum of the relay times; lower is better
def compute_fitness(individual):
    return sum(individual)

# Single-point crossover; the cut point is chosen so that both parents contribute
def crossover(parent1, parent2):
    crossover_point = random.randint(1, len(parent1) - 1)
    child1 = parent1[:crossover_point] + parent2[crossover_point:]
    child2 = parent2[:crossover_point] + parent1[crossover_point:]
    return child1, child2

# Resample each gene within its relay's bounds with probability mutation_rate
def mutate(individual):
    for i in range(len(individual)):
        if random.random() < mutation_rate:
            individual[i] = random.uniform(relay_settings[i][0], relay_settings[i][1])
    return individual

# Initialize population
population = [[random.uniform(min_time, max_time) for min_time, max_time in relay_settings]
              for _ in range(population_size)]

# Main GA loop
for generation in range(num_generations):
    # Evaluate fitness and sort from best (lowest) to worst
    sorted_population = sorted(population, key=compute_fitness)

    # Selection: Top 50% of the population becomes parents
    parents = sorted_population[:population_size // 2]

    # Crossover and Mutation
    offspring = []
    while len(offspring) < population_size:
        parent1, parent2 = random.sample(parents, 2)
        child1, child2 = crossover(parent1, parent2)
        offspring.append(mutate(child1))
        offspring.append(mutate(child2))

    # Update population
    population = offspring

# Get the best individual (relay settings)
best_individual = min(population, key=compute_fitness)
print("Optimal relay settings:", best_individual)

Output:

Note: At every run we get different values.
