Machine Learning Lab

List of Tasks
1. Exercises to solve real-world problems using the following machine learning methods:

a. Linear Regression

b. Logistic Regression

2. Write a program to implement Support Vector Machines.

3. Exploratory Data Analysis for classification using Pandas and Matplotlib.

4. Implement a program for Bias, Variance, and Cross-Validation.

5. Write a program to simulate a perceptron network for pattern classification and function approximation.

6. Write a program to demonstrate the working of the decision-tree-based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

7. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

8. Write a program to implement the naïve Bayesian classifier for the Iris data set. Compute the accuracy of the classifier, considering a few test data sets.

9. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

10. Apply the EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

11. Write a program to implement the k-Nearest Neighbour algorithm to classify the Iris data set. Print both correct and wrong predictions.

12. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

13. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

14. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

15. Solve optimal relay coordination as a linear programming problem using a Genetic Algorithm.

To run the machine learning programs, you need to install the following modules and packages:

Before we can start using Jupyter Notebook, we need to install it on our Windows machine. Here are the steps to follow:

1. First, you need to have Python installed on your machine. You can
download the latest version of Python from the official website
(https://www.python.org/downloads/). Make sure to select the option to
add Python to your system PATH during the installation process.
2. Once Python is installed, open the command prompt by pressing the
Windows key + R and typing cmd in the Run dialog box.
3. In the command prompt, type the following command to install Jupyter Notebook:

   pip install jupyter

   This will download and install Jupyter Notebook and its dependencies.
4. After the installation is complete, type the following command to start Jupyter Notebook:

   jupyter notebook

   This will open a new tab in your web browser with the Jupyter Notebook interface.

Once you’ve created a new notebook, you can start writing code in the cells. To
run a cell, press Shift + Enter or click the Run button in the toolbar. You can
also add text, equations, and visualizations to your notebook using Markdown
syntax.
Jupyter Notebook also supports a variety of keyboard shortcuts to make your
work more efficient. Here are some of the most useful shortcuts:

• Shift + Enter: Run the current cell and move to the next one.
• Ctrl + Enter: Run the current cell.
• Esc: Enter command mode.
• Enter: Enter edit mode.
• A: Insert a new cell above the current cell.
• B: Insert a new cell below the current cell.
• D + D: Delete the current cell.
• M: Change the current cell type to Markdown.
• Y: Change the current cell type to code.

Install the following libraries (in cmd):

1) To upgrade pip: python.exe -m pip install --upgrade pip
2) For NumPy: pip install numpy
3) For Pandas: pip install pandas
4) For Matplotlib: pip install matplotlib
5) For scikit-learn: pip install scikit-learn
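
To confirm that everything installed correctly, you can run a quick check from Python. This is a minimal sketch that simply imports each package and prints its version:

import numpy
import pandas
import matplotlib
import sklearn

print("NumPy:", numpy.__version__)
print("Pandas:", pandas.__version__)
print("Matplotlib:", matplotlib.__version__)
print("scikit-learn:", sklearn.__version__)

If all four versions print without an ImportError, the environment is ready for the programs below.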
1A) Aim: Exercise to solve real-world problems using LINEAR REGRESSION.
Description: Linear regression is a popular machine learning algorithm used for
predicting a continuous target variable based on one or more independent
variables.
The general equation for linear regression with a single independent variable
can be written as:
y = b0 + b1 * x
Where:
• y is the target variable
• x is the independent variable
• b0 is the intercept (the value of y when x is 0)
• b1 is the coefficient (the change in y for a unit change in x)
The goal of linear regression is to find the best values for b0 and b1 that
minimize the difference between the predicted values and the actual
values in the training data.
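For intuition, b0 and b1 can be computed in closed form with ordinary least squares: b1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)², and b0 = ȳ − b1 * x̄. Here is a minimal sketch on a toy dataset (the experience/salary values are made up for illustration):

import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)       # years of experience (toy values)
y = np.array([30, 35, 42, 48, 55], dtype=float)  # salary in thousands (toy values)

# Closed-form least-squares estimates of the slope and intercept
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(f"y = {b0:.2f} + {b1:.2f} * x")  # the fitted regression line

The program below does the same thing with scikit-learn's LinearRegression on a real CSV file.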
Source code: (CSV File: Salary_Data.csv)

# Import the required modules
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dataset = pd.read_csv('Salary_Data.csv')
dataset.head()

# Data preprocessing
X = dataset.iloc[:, :-1].values  # independent variable array
y = dataset.iloc[:, 1].values    # dependent variable vector

# Splitting the dataset
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

# Fitting the linear regression model to the training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)  # learns the linear equation for the data

# Predicting the test set results
y_pred = regressor.predict(X_test)
y_pred
y_test

# Visualizing the results
# Plot for the training set
plt.scatter(X_train, y_train, color='red')  # plot the observations
plt.plot(X_train, regressor.predict(X_train), color='blue')  # plot the regression line
plt.title("Salary vs Experience (Training set)")
plt.xlabel("Years of experience")
plt.ylabel("Salaries")
plt.show()

# Plot for the test set
plt.scatter(X_test, y_test, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')  # same regression line
plt.title("Salary vs Experience (Testing set)")
plt.xlabel("Years of experience")
plt.ylabel("Salaries")
plt.show()
Output:
1B) Aim: Exercise to solve real-world problems using LOGISTIC REGRESSION.
Description: Logistic regression is a supervised machine learning algorithm
used for binary classification tasks. It models the relationship between the
independent variables and the probability of a binary outcome.
In logistic regression, the output is a probability value that is transformed using a
sigmoid function to constrain it between 0 and 1. If the probability is above a
certain threshold (e.g., 0.5), the sample is classified as belonging to one class, and
if it is below the threshold, it is classified as belonging to the other class.
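
The sigmoid transformation and the thresholding step can be written out directly. Here is a minimal sketch; the coefficient values b0 and b1 are arbitrary placeholders, not values learned from data:

import numpy as np

def sigmoid(z):
    # maps any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = -1.0, 2.0  # arbitrary illustrative coefficients
x = 0.8             # a single feature value
probability = sigmoid(b0 + b1 * x)          # about 0.65 here
predicted_class = 1 if probability >= 0.5 else 0
print(probability, predicted_class)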
Source code:

# Import the required modules
from sklearn.datasets import make_classification
from matplotlib import pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# Generate the dataset
x, y = make_classification(
    n_samples=100,
    n_features=1,
    n_classes=2,
    n_clusters_per_class=1,
    flip_y=0.03,
    n_informative=1,
    n_redundant=0,
    n_repeated=0
)

# Visualize the data
plt.scatter(x, y, c=y, cmap='rainbow')
plt.title('Scatter Plot of Logistic Regression')
plt.show()

# Split the dataset into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1)

# Create a logistic regression object and fit it to the training data
log_reg = LogisticRegression()
log_reg.fit(x_train, y_train)

# Inspect the learned coefficient and intercept
print(log_reg.coef_)
print(log_reg.intercept_)

# Perform prediction using the test dataset
y_pred = log_reg.predict(x_test)
confusion_matrix(y_test, y_pred)

Output:

Note: Every run produces different plots and values, because the dataset is randomly generated without a fixed random seed.

2) Aim: Write a program to implement Support Vector Machines.
Description:
Support Vector Machines (SVM) is a supervised machine learning algorithm
used for both classification and regression tasks. It is particularly effective in
handling high-dimensional data and finding non-linear decision boundaries.
In SVM, the algorithm constructs hyperplanes in a high-dimensional space that
separate different classes of data points. The main objective is to find the
hyperplane that maximizes the margin between the classes, allowing for better
generalization and robustness to new data points.
In this example, we load the Iris dataset and keep only its first two features so that the decision boundary can be visualized in two dimensions. We split the data into training and testing sets using train_test_split, create an SVM classifier with a linear kernel using SVC(kernel='linear'), and fit it to the training data with the fit method. We then predict the classes of the testing data with the predict method and calculate the accuracy of the model using accuracy_score. (Feature scaling with StandardScaler is often recommended for SVMs, but it is omitted here so that the decision regions can be plotted in the original feature space.)
Source code:

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
# Load the iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # we only take the first two features for visualization
y = iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Create an SVM classifier with a linear kernel
clf = SVC(kernel='linear')
# Train the classifier on the training data
clf.fit(X_train, y_train)
# Make predictions on the testing data
y_pred = clf.predict(X_test)
# Calculate the accuracy of the classifier
acc = accuracy_score(y_test, y_pred)
# Plot the decision boundary
xmin, xmax = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
ymin, ymax = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(xmin, xmax, 0.02), np.arange(ymin, ymax, 0.02))
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
# Plot the training and testing data points
plt.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=plt.cm.Paired,
edgecolors='k')
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=plt.cm.Paired, marker='x',
edgecolors='k')
# Add labels and a title to the plot
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('SVM decision boundary (accuracy={:.2f})'.format(acc))
# Display the plot
plt.show()
Output:
3) Aim: Exploratory Data Analysis for classification using Pandas and
Matplotlib.
Description: Pandas and Matplotlib are two popular libraries in Python
commonly used for data manipulation, analysis, and visualization. Here's a brief
overview of each library:
Pandas: Pandas is a powerful library for data manipulation and analysis. It provides data structures and functions to efficiently handle structured data, such as tabular data, time series, and more. The primary data structures in Pandas are the Series (a one-dimensional labelled array) and the DataFrame (a two-dimensional labelled data structure).
Pandas offers a wide range of functionalities, including data loading and saving,
data cleaning and preprocessing, data exploration and manipulation, merging and
joining datasets, and handling missing values. It also provides powerful data
aggregation and grouping capabilities, as well as support for time series analysis.
Matplotlib: Matplotlib is a popular library for creating static, animated, and
interactive visualizations in Python. It provides a wide range of plotting functions
and customization options to create various types of plots, including line plots,
scatter plots, bar plots, histograms, and more.
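
As a small taste of the aggregation and grouping capabilities mentioned above, class-wise summaries are a common first step in EDA for classification. A minimal sketch with a made-up DataFrame (the program below builds a larger one of the same shape):

import pandas as pd

df = pd.DataFrame({
    'Math_Score': [55, 90, 72, 40],
    'Pass_Fail': ['Fail', 'Pass', 'Pass', 'Fail']
})
# Mean score per class: a quick check of how the feature separates the classes
print(df.groupby('Pass_Fail')['Math_Score'].mean())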
Source code:

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create a sample dataframe
data = {
    'Student_ID': range(1, 101),
    'Math_Score': np.random.randint(40, 100, 100),
    'Science_Score': np.random.randint(40, 100, 100),
    'Pass_Fail': np.random.choice(['Pass', 'Fail'], size=100)
}

# Convert the dictionary to a DataFrame
df = pd.DataFrame(data)

# Display the first few rows of the dataframe
print(df.head())

# Summary statistics of the dataset
print(df.describe())

# Check for missing values
print(df.isnull().sum())

# Data visualization
# Example: Histogram of a numeric variable (Math_Score)
plt.figure(figsize=(10, 6))
plt.hist(df['Math_Score'], bins=30, edgecolor='k', alpha=0.7)
plt.title('Distribution of Math Scores')
plt.xlabel('Math Score')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()

# Example: Count plot for a categorical variable (Pass_Fail)
pass_fail_counts = df['Pass_Fail'].value_counts()
plt.figure(figsize=(8, 6))
plt.bar(pass_fail_counts.index, pass_fail_counts.values, color=['blue', 'red'])
plt.title('Count of Pass/Fail')
plt.xlabel('Pass/Fail')
plt.ylabel('Count')
plt.grid(axis='y')
plt.show()
Output:
4) Aim: Implement a program for Bias, Variance, and Cross-Validation.
Description:
Machine learning is a branch of Artificial Intelligence that allows machines to perform data analysis and make predictions. If a machine learning model is not accurate, it makes prediction errors, and these prediction errors are usually known as Bias and Variance. In machine learning, these errors will always be present, as there is always a slight difference between the model's predictions and the actual values. The main aim of ML/data science analysts is to reduce these errors in order to get more accurate results.
Low Bias: A low-bias model makes fewer assumptions about the form of the target function. High Bias: A high-bias model makes more assumptions and becomes unable to capture the important features of the dataset; such a model also performs poorly on new data.

Variance measures how much a random variable differs from its expected value. Low variance means there is only a small variation in the prediction of the target function with changes in the training data set, while high variance shows a large variation in the prediction of the target function with changes in the training dataset.
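These two quantities combine in the classical bias-variance decomposition of the expected prediction error at a point:

Expected error = Bias^2 + Variance + Irreducible error

where the irreducible error comes from the noise in the data itself. The program below estimates the (squared) bias and the variance empirically by refitting the model on bootstrapped resamples of the training data; note that the printed "Bias" is computed against the noisy targets, so it also absorbs the noise term.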
Source code:

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# 1. Generate synthetic data
np.random.seed(0)
X = np.random.rand(100, 1) * 10  # 100 data points in the range [0, 10)
# Quadratic relationship with noise
y = 3 * (X - 5) ** 2 + np.random.randn(100, 1) * 2

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 2. Bias and variance decomposition via bootstrapping
def calculate_bias_variance(model, X, y, num_runs=100):
    y_preds = np.zeros((len(X), num_runs))
    for i in range(num_runs):
        # Generate bootstrapped samples
        indices = np.random.choice(len(X), size=len(X), replace=True)
        X_sampled = X[indices]
        y_sampled = y[indices]
        model.fit(X_sampled, y_sampled)
        y_preds[:, i] = model.predict(X).flatten()
    # Calculate bias and variance
    bias = np.mean((np.mean(y_preds, axis=1) - y.flatten()) ** 2)
    variance = np.mean(np.var(y_preds, axis=1))
    return bias, variance

# Create a simple linear regression model
model = LinearRegression()
bias, variance = calculate_bias_variance(model, X_train, y_train)
print(f"Bias: {bias:.2f}")
print(f"Variance: {variance:.2f}")

# 3. Cross-validation
k_fold = KFold(n_splits=5, shuffle=True, random_state=42)
mse_scores = []
for train_idx, val_idx in k_fold.split(X_train):
    X_train_fold, X_val_fold = X_train[train_idx], X_train[val_idx]
    y_train_fold, y_val_fold = y_train[train_idx], y_train[val_idx]
    model.fit(X_train_fold, y_train_fold)
    y_val_pred = model.predict(X_val_fold)
    mse = mean_squared_error(y_val_fold, y_val_pred)
    mse_scores.append(mse)
average_mse = np.mean(mse_scores)
print(f"Average MSE using 5-fold CV: {average_mse:.2f}")

# Plotting
plt.scatter(X_train, y_train, c='b', label='Training Data')
plt.scatter(X_test, y_test, c='r', label='Test Data', alpha=0.5)
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.title('Synthetic Data')
plt.show()

Output:
Bias: 540.17
Variance: 18.08
Average MSE using 5-fold CV: 566.52
5) Aim: Write a program to simulate a perceptron network for pattern classification and function approximation.
Description:

The perceptron, as a foundational neural network model, is used in this simulation to understand its capabilities in both pattern classification and function approximation tasks.

Pattern Classification:

A synthetic dataset is generated where points are randomly distributed in a 2D space. The perceptron is trained to classify these points based on whether they lie inside or outside a predefined region.

Function Approximation:

Another synthetic dataset is generated where data points represent values of a function with some added noise. The perceptron attempts to approximate this function by learning the underlying pattern from the data points.

The program provides visual insights by plotting the generated data, the decision boundary for classification, and the approximated function using the perceptron. Through this simulation, one can observe the perceptron's ability to make binary classifications and its approach to approximating simple functions.
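
In both tasks below, training uses the same delta-rule weight update after each sample:

w = w + lr * (target − prediction) * x

where lr is the learning rate, target is the desired output, and prediction is the network's current output for input x (with a constant 1 appended to x to act as the bias term).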

Source code:

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt

# 1. Generate synthetic data for pattern classification
X = np.random.rand(100, 2) * 10 - 5
y = np.array([1 if x[0]**2 + x[1]**2 < 9 else 0 for x in X])
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r')
plt.show()  # Plot the data

# 2. Perceptron (delta-rule updates) for classification
class Perceptron:
    def __init__(self):
        self.weights = np.zeros(3)

    def train(self, X, y, lr=0.1, epochs=1000):
        X = np.c_[X, np.ones(X.shape[0])]  # append a constant bias input
        for _ in range(epochs):
            for i in range(X.shape[0]):
                prediction = np.dot(X[i], self.weights)
                self.weights += lr * (y[i] - prediction) * X[i]

    def predict(self, x):
        return 1 if np.dot(np.append(x, 1), self.weights) > 0 else 0

# Train and plot the decision boundary
perceptron = Perceptron()
perceptron.train(X, y)
x1 = np.linspace(-5, 5, 100)
x2 = -(perceptron.weights[0] * x1 + perceptron.weights[2]) / perceptron.weights[1]
plt.scatter(X[y == 0][:, 0], X[y == 0][:, 1], color='b')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='r')
plt.plot(x1, x2, color='k')
plt.show()

# 3. Generate synthetic data for function approximation
X_func = np.random.rand(100) * 10 - 5
y_func = np.sin(X_func) + np.random.randn(100) * 0.5

# Perceptron for function approximation (linear output, no threshold)
class SimplePerceptron:
    def __init__(self):
        self.weights = np.zeros(2)

    def train(self, X, y, lr=0.1, epochs=1000):
        X = np.c_[X, np.ones(X.shape[0])]
        for _ in range(epochs):
            for i in range(X.shape[0]):
                prediction = np.dot(X[i], self.weights)
                self.weights += lr * (y[i] - prediction) * X[i]

    def predict(self, x):
        return np.dot(np.append(x, 1), self.weights)

# Train and plot the function approximation
func_perceptron = SimplePerceptron()
func_perceptron.train(X_func, y_func)
x_vals = np.linspace(-5, 5, 100)
y_vals = [func_perceptron.predict(x) for x in x_vals]
plt.scatter(X_func, y_func, color='b')
plt.plot(x_vals, y_vals, color='r')
plt.show()
Output:

Note: Every run produces different plots, because the data are randomly generated without a fixed random seed.

6) Aim: Write a program to demonstrate the working of the decision tree
based ID3 algorithm. Use an appropriate data set for building the decision
tree and apply this knowledge to classify a new sample.

Description:

The ID3 (Iterative Dichotomiser 3) algorithm is a foundational decision tree algorithm primarily used for classification tasks. The core principle behind ID3 is to construct a decision tree by iteratively selecting the attribute that maximizes the information gain when used to partition the dataset.

Working Mechanism:

Entropy Calculation:
The ID3 algorithm starts by calculating the entropy of the dataset and of the subsets produced by splitting on each attribute. Entropy measures the randomness or impurity in the data; lower entropy values indicate purer subsets and therefore more effective splits for classification.

Information Gain:
After the entropy calculation, the algorithm computes the information gain for each attribute. Information gain quantifies the reduction in entropy achieved by splitting the dataset on a particular attribute. The attribute with the highest information gain is selected as the decision node.

Tree Construction:
Using the attribute with the maximum information gain, the dataset is split into subsets. This splitting process is recursive: each subset becomes a node in the decision tree, and the process continues until a stopping criterion is met, such as reaching a minimum node size or depth.

Decision Making:
Once the decision tree is constructed, it can be used for classification. To classify a new sample, one simply traverses the tree from the root node down to a leaf node, making decisions based on the attribute values of the sample.
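The two quantities can be stated precisely. For a set S with a fraction p+ of positive and p− of negative examples:

Entropy(S) = −p+ log2(p+) − p− log2(p−)

Gain(S, A) = Entropy(S) − Σv (|Sv| / |S|) * Entropy(Sv)

where the sum runs over the values v of attribute A, and Sv is the subset of S with A = v. Assuming tennis.csv holds the classic 14-example play-tennis data (9 'Yes', 5 'No'), Entropy(S) = −(9/14)log2(9/14) − (5/14)log2(5/14) ≈ 0.940, and Gain(S, Outlook) ≈ 0.246, the highest of the four attributes, which is why outlook becomes the root of the tree.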

Source code: (tennis.csv)


import pandas as pd
import math
import copy

dataset = pd.read_csv('tennis.csv')
X = dataset.iloc[:, 1:].values
attribute = ['outlook', 'temp', 'humidity', 'wind']

class Node(object):
    def __init__(self):
        self.value = None     # attribute name at an inner node, or 'Yes'/'No' at a leaf
        self.decision = None  # attribute value on the edge leading to this node
        self.childs = None

def findEntropy(data, rows):
    # Entropy of the class label over the given subset of rows
    yes = 0
    no = 0
    ans = -1
    idx = len(data[0]) - 1
    entropy = 0
    for i in rows:
        if data[i][idx] == 'Yes':
            yes = yes + 1
        else:
            no = no + 1
    x = yes / (yes + no)
    y = no / (yes + no)
    if x != 0 and y != 0:
        entropy = -1 * (x * math.log2(x) + y * math.log2(y))
    if x == 1:
        ans = 1
    if y == 1:
        ans = 0
    return entropy, ans

def findMaxGain(data, rows, columns):
    # Return the column index of the attribute with the highest information gain
    maxGain = 0
    retidx = -1
    entropy, ans = findEntropy(data, rows)
    if entropy == 0:
        return maxGain, retidx, ans  # pure subset: nothing left to split
    for j in columns:
        mydict = {}
        for i in rows:
            key = data[i][j]
            if key not in mydict:
                mydict[key] = 1
            else:
                mydict[key] = mydict[key] + 1
        gain = entropy
        for key in mydict:
            yes = 0
            no = 0
            for k in rows:
                if data[k][j] == key:
                    if data[k][-1] == 'Yes':
                        yes = yes + 1
                    else:
                        no = no + 1
            x = yes / (yes + no)
            y = no / (yes + no)
            if x != 0 and y != 0:
                # 14 = total number of training examples in tennis.csv
                gain += (mydict[key] * (x * math.log2(x) + y * math.log2(y))) / 14
        if gain > maxGain:
            maxGain = gain
            retidx = j
    return maxGain, retidx, ans

def buildTree(data, rows, columns):
    maxGain, idx, ans = findMaxGain(X, rows, columns)
    root = Node()
    root.childs = []
    if maxGain == 0:  # pure subset: create a leaf node
        if ans == 1:
            root.value = 'Yes'
        else:
            root.value = 'No'
        return root
    root.value = attribute[idx]
    mydict = {}
    for i in rows:
        key = data[i][idx]
        if key not in mydict:
            mydict[key] = 1
        else:
            mydict[key] += 1
    newcolumns = copy.deepcopy(columns)
    newcolumns.remove(idx)
    for key in mydict:
        newrows = []
        for i in rows:
            if data[i][idx] == key:
                newrows.append(i)
        temp = buildTree(data, newrows, newcolumns)
        temp.decision = key
        root.childs.append(temp)
    return root

def traverse(root):
    # Print the tree in preorder: edge label, then node value
    print(root.decision)
    print(root.value)
    n = len(root.childs)
    if n > 0:
        for i in range(0, n):
            traverse(root.childs[i])

def calculate():
    rows = [i for i in range(0, 14)]
    columns = [i for i in range(0, 4)]
    root = buildTree(X, rows, columns)
    root.decision = 'Start'
    traverse(root)

calculate()
Output:
8) Aim: Write a program to implement the naïve Bayesian classifier for the Iris data set. Compute the accuracy of the classifier, considering a few test data sets.

Description:
The implementation of the Naïve Bayes classifier for the Iris dataset offers an insightful exploration into the application of probabilistic classification techniques on a well-known dataset in the machine learning community.

Implementation Steps:

Data Loading:
Begin by importing the Iris dataset, commonly available in machine learning libraries like scikit-learn. This dataset is partitioned into feature vectors representing the four attributes and corresponding class labels.

Data Preprocessing:
Split the dataset into a training set and a test set. Typically, a significant portion (e.g., 70-80%) is used for training, and the remaining portion is used for testing.

Model Training:
Prior Probabilities: Compute the prior probabilities for each class based on the frequency of occurrences in the training set.
Likelihood Estimation: Estimate the likelihood probabilities of the features given each class. Due to the Naïve Bayes assumption of feature independence, individual probabilities are calculated for each feature.

Classification:
For each test instance, compute the posterior probabilities of all classes using Bayes' theorem. Assign the class label to the instance based on the highest posterior probability.

Performance Evaluation:
After classifying all test instances, compare the predicted class labels with the actual labels to compute the classifier's accuracy. Accuracy is a fundamental metric that provides insight into the model's predictive performance on unseen data.
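Concretely, for a sample with feature vector x = (x1, ..., xn) and a candidate class c, Bayes' theorem with the naïve independence assumption gives:

P(c | x) ∝ P(c) * P(x1 | c) * P(x2 | c) * ... * P(xn | c)

The classifier picks the class c with the largest value of this product. The GaussianNB model used below additionally assumes each P(xi | c) follows a normal distribution whose mean and variance are estimated from the training set.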
Source code:

#Import libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train the Naive Bayes classifier
model = GaussianNB()
model.fit(X_train, y_train)
# Evaluate the model on the test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)
# Make a prediction on a new sample
# (features are sepal length, sepal width, petal length, petal width, in cm)
sample = [[5.1, 3.5, 1.4, 0.2]]
prediction = model.predict(sample)
print('Prediction:', prediction)

Output:
11) Aim: Write a program to implement the k-Nearest Neighbour algorithm to classify the Iris data set. Print both correct and wrong predictions.
Description:
K-Nearest Neighbour (K-NN) is one of the simplest machine learning algorithms, based on the supervised learning technique. The K-NN algorithm assumes similarity between the new case/data and the available cases, and puts the new case into the category that is most similar to the available categories.
The working of K-NN can be explained with the following algorithm:
Step-1: Select the number K of neighbours.

Step-2: Calculate the Euclidean distance from the new data point to the training points (see the formula below the steps).

Step-3: Take the K nearest neighbours as per the calculated Euclidean distances. Among these K neighbours, count the number of data points in each category.

Step-4: Assign the new data point to the category for which the number of neighbours is maximum.

Step-5: Our model is ready.
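
The Euclidean distance used in Steps 2-3, between two points p = (p1, ..., pn) and q = (q1, ..., qn), is:

d(p, q) = sqrt((p1 − q1)^2 + (p2 − q2)^2 + ... + (pn − qn)^2)

For example, two iris samples with feature vectors (5.1, 3.5, 1.4, 0.2) and (4.9, 3.0, 1.4, 0.2) are at distance sqrt(0.2^2 + 0.5^2 + 0 + 0) ≈ 0.54.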

Source code:

# Import the required modules
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset and split it into training and test sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Train a k-NN classifier with K=3 neighbours
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Classify the test set and print both correct and wrong predictions
y_pred = knn.predict(X_test)
for i in range(len(y_test)):
    result = "Correct" if y_pred[i] == y_test[i] else "Wrong"
    print(f"Sample {i}: predicted={iris.target_names[y_pred[i]]}, "
          f"actual={iris.target_names[y_test[i]]} -> {result}")
print("Accuracy:", accuracy_score(y_test, y_pred))
Output:
12) Aim: Implement the non-parametric Locally Weighted Regression
algorithm in order to fit data points. Select appropriate data set for your
experiment and draw graphs.
Description:

Locally Weighted Regression (LWR):
Locally Weighted Regression is a non-parametric algorithm used for regression tasks, especially beneficial when dealing with complex and nonlinear relationships between variables. Instead of fitting a single model to the entire dataset, LWR fits a separate regression model for each query point, giving more weight to training points closer to the point being predicted.

Implementation Steps:

Synthetic Data Generation:
A synthetic dataset is created using numpy, comprising 100 data points generated along a sine curve with some added random noise.

Locally Weighted Regression Function:
The core of the algorithm lies in the locally_weighted_regression function. For a given test point:
It computes weights for each training point based on the Gaussian kernel, emphasizing points closer to the test point.
It fits a regression model considering these weights, resulting in a locally weighted prediction.

Prediction:
Using the LWR function, predictions are generated for each point in the dataset. This means that instead of a single line of best fit, the algorithm calculates a curve that best fits the data point-by-point.

Visualization:
The matplotlib library is utilized to visualize the synthetic data points as blue dots and the LWR fit as a red curve. This visualization provides a clear understanding of how LWR fits a curve that closely follows the underlying trend of the data, especially in regions of high data density.
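
In symbols, for a query point x, the weight given to training point xi is

wi = exp(−(xi − x)^2 / (2 * tau^2))

and, matching the single-feature, no-intercept setup in the code below, the local parameter is the weighted least-squares solution

theta = (X^T W X)^(−1) X^T W y

where W is the diagonal matrix of the weights wi. The bandwidth tau controls how quickly the influence of neighbouring points falls off.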
Source code:

import numpy as np
import matplotlib.pyplot as plt

# Generate synthetic data
np.random.seed(0)
X = np.linspace(0, 10, 100)
y = np.sin(X) + np.random.normal(0, 0.2, 100)

# Locally Weighted Regression algorithm
def locally_weighted_regression(test_point, X, y, tau):
    # Gaussian kernel weights: points near the test point get weights close to 1
    weights = np.exp(-((X - test_point) ** 2) / (2 * tau ** 2))
    W = np.diag(weights)
    # Weighted least squares (single feature, no intercept term)
    theta = np.linalg.inv(X[:, None].T @ W @ X[:, None]) @ X[:, None].T @ W @ y
    prediction = test_point * theta
    return prediction[0]

# Predictions using LWR
predictions = [locally_weighted_regression(test_point, X, y, tau=0.5) for test_point in X]

# Plotting the results
plt.figure(figsize=(10, 6))
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, predictions, color='red', label='LWR Fit')
plt.title('Locally Weighted Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.grid(True)
plt.show()
Output:
13) Aim: For a given set of training data examples stored in a .CSV file,
implement and demonstrate the Candidate-Elimination algorithm to output
a description of the set of all hypotheses consistent with the training
examples.
Description:

The Candidate-Elimination algorithm is a concept learning method that iteratively refines a general hypothesis boundary and a specific hypothesis boundary, based on the observed training examples, until it finds a consistent representation of the target concept.

Initialization: Start with the most general and the most specific hypotheses.

Iterative Refinement:

For positive instances, generalize the specific hypothesis and discard general hypotheses that are inconsistent with the instance.

For negative instances, specialize the general hypothesis and discard specific hypotheses that are inconsistent with the instance.

Output: The algorithm refines the hypotheses until it converges to a consistent representation bounded by the most general and most specific hypotheses.
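
For instance, with four attributes the two boundaries are initialized as:

S0 = <ϕ, ϕ, ϕ, ϕ> (most specific: matches nothing)
G0 = <?, ?, ?, ?> (most general: matches everything)

Each training example then moves S upward or specializes G until the two boundaries enclose exactly the consistent hypotheses. (In the simplified implementation below, None plays the role of ϕ for the specific boundary, and the general boundary is kept as one hypothesis per attribute.)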

Source code: (training_data.csv)

import pandas as pd

# Load dataset
def load_data(filename):
    return pd.read_csv(filename)

# Implement the Candidate-Elimination algorithm
def candidate_elimination(examples):
    num_attrs = len(examples.columns) - 1
    specific_h = [None] * num_attrs
    general_h = [['?' for _ in range(num_attrs)] for _ in range(num_attrs)]
    for i, row in examples.iterrows():
        # Assumes the last column holds 1 for positive and 0 for negative examples
        if row.iloc[-1] == 1:  # Positive example
            for j in range(num_attrs):
                if specific_h[j] is None:
                    specific_h[j] = row.iloc[j]
                elif specific_h[j] != row.iloc[j]:
                    specific_h[j] = '?'
                    general_h[j][j] = '?'
        else:  # Negative example
            for j in range(num_attrs):
                if specific_h[j] != row.iloc[j]:
                    general_h[j][j] = specific_h[j]
    # Remove fully general (uninformative) hypotheses from general_h
    general_h = [h for h in general_h if not all(val == '?' for val in h)]
    return specific_h, general_h

# Load the dataset from CSV
data = load_data('training_data.csv')

# Run the Candidate-Elimination algorithm
specific, general = candidate_elimination(data)

# Display results
print("\nSpecific hypothesis:")
print(specific)
print("\nGeneral hypothesis:")
for hypothesis in general:
    print(hypothesis)

Output:
14) Aim: Implement and demonstrate the FIND-S algorithm for finding the
most specific hypothesis based on a given set of training data samples. Read
the training data from a .CSV file.

Description:
The FIND-S algorithm is a concept learning algorithm in machine learning. FIND-S identifies the most specific hypothesis that matches all of the positive examples; it considers only positive examples.
The algorithm starts with the most specific hypothesis and generalizes it whenever the current hypothesis fails to cover an observed positive training example.

Representations:
• The most specific hypothesis is represented using ϕ.
• The most general hypothesis is represented using ?.
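
As a worked trace on the enjoy-sport data hard-coded in the program below, the hypothesis evolves as follows (the third example is negative and is therefore ignored):

After example 1: <Sunny, Warm, Normal, Strong, Warm, Same>
After example 2: <Sunny, Warm, ?, Strong, Warm, Same>
After example 4: <Sunny, Warm, ?, Strong, ?, ?>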

Source code:

import numpy as np

# Define the training data
# (For brevity, the data is defined inline here; it could equally be read
# from a .CSV file with pandas.read_csv, as the task statement asks.)
X = np.array([
    ['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same'],
    ['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change'],
    ['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change']
])
y = np.array(['+', '+', '-', '+'])

# Initialize the most specific hypothesis ('0' stands for ϕ)
hypothesis = ['0', '0', '0', '0', '0', '0']

# Find the most specific hypothesis
for i in range(len(y)):
    if y[i] == '+':  # only positive examples are considered
        for j in range(len(X[i])):
            if hypothesis[j] == '0':
                hypothesis[j] = X[i][j]
            elif hypothesis[j] != X[i][j]:
                hypothesis[j] = '?'

print("Most specific hypothesis:", hypothesis)

Output:
15) Aim: Solve optimal relay coordination as a linear programming problem
using Genetic Algorithm.

Description:

Optimal relay coordination is a method used in power systems to ensure that protection relays respond to faults in a coordinated manner. The coordination ensures that the relay closest to the fault operates before relays further away.

Using a Genetic Algorithm (GA) to solve this as a linear programming problem is an interesting approach. Here is a simplified demonstration to get you started; note that this is a basic example and may not capture all the complexities of real-world scenarios.

Let's break down the problem:

Objective: Minimize the sum of the relay time settings while ensuring coordination.

Constraints: Ensure that the time settings for each relay are within their minimum and maximum bounds.

For simplicity, let's assume we have 3 relays with given minimum and maximum time settings.
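
With three relays and time settings t1, t2, t3, and the bounds used in the code below, the toy problem reduces to:

minimize t1 + t2 + t3
subject to 0.1 ≤ t1 ≤ 0.3, 0.2 ≤ t2 ≤ 0.5, 0.4 ≤ t3 ≤ 0.8

The true optimum is each relay at its lower bound (sum = 0.7), so the GA is expected to converge toward this value, which gives an easy sanity check on the implementation.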

Source Code:

import random

# Define relay settings [min_time, max_time]
relay_settings = [
    [0.1, 0.3],  # Relay 1
    [0.2, 0.5],  # Relay 2
    [0.4, 0.8],  # Relay 3
]

# GA parameters
population_size = 100
num_generations = 100
mutation_rate = 0.1

def compute_fitness(individual):
    # Objective: minimize the sum of the relay time settings
    return sum(individual)

def crossover(parent1, parent2):
    crossover_point = random.randint(0, len(parent1) - 1)
    child1 = parent1[:crossover_point] + parent2[crossover_point:]
    child2 = parent2[:crossover_point] + parent1[crossover_point:]
    return child1, child2

def mutate(individual):
    for i in range(len(individual)):
        if random.random() < mutation_rate:
            individual[i] = random.uniform(relay_settings[i][0], relay_settings[i][1])
    return individual

# Initialize the population with random settings inside each relay's bounds
population = [[random.uniform(min_time, max_time) for min_time, max_time in relay_settings]
              for _ in range(population_size)]

# Main GA loop
for generation in range(num_generations):
    # Evaluate fitness (lower is better)
    fitness_scores = [compute_fitness(individual) for individual in population]
    sorted_population = [x for _, x in sorted(zip(fitness_scores, population),
                                              key=lambda pair: pair[0])]
    # Selection: top 50% of the population becomes parents
    parents = sorted_population[:population_size // 2]
    # Crossover and mutation
    offspring = []
    while len(offspring) < population_size:
        parent1, parent2 = random.sample(parents, 2)
        child1, child2 = crossover(parent1, parent2)
        offspring.append(mutate(child1))
        offspring.append(mutate(child2))
    # Update population
    population = offspring

# Get the best individual (relay settings)
best_individual = sorted(population, key=compute_fitness)[0]
print("Optimal relay settings:", best_individual)

Output:

Note: Every run produces different values, because the Genetic Algorithm is stochastic.
