ML Using Python (IT)
Regulations: R20
Year & Sem: III-II
VISION
To evolve as a Premier Engineering Institution in the Country through its continuous
striving for excellence in Education, Research and Technological Services.
MISSION
To generate a community of highly learned students with a greater acquirement of
knowledge and to apply it professionally with due consideration for ecological,
economic and ethical issues.
VISION
To explore innovative approaches to enhance and expand learning opportunities
through the integration of various technologies.
To build a strong research and teaching environment that responds to the real-time
challenges of the industry.
MISSION
To inculcate a blend of competence, aptitude for knowledge and investigative flair
by devising an ambient environment for sustainable learning.
To transform attitudes, values and priorities by changing mindsets and instilling a positive
outlook for socially conscious intellectual development.
To incubate, apply and spread innovative ideas to evolve the department into a center
of excellence in thrust areas.
DEPARTMENT OF INFORMATION TECHNOLOGY
PO2. Problem Analysis: Identify, formulate, review research literature, and analyze
complex engineering problems reaching substantiated conclusions using first
principles of mathematics, natural sciences, and engineering sciences.
PO5. Modern Tool Usage: Create, select, and apply appropriate techniques, resources,
and modern engineering and IT tools including prediction and modelling to
complex engineering activities with an understanding of the limitations.
PO6. The Engineer and Society: Apply reasoning informed by the contextual
knowledge to assess societal, health, safety, legal and cultural issues and the
consequent responsibilities relevant to the professional engineering practice.
PO8. Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.
PO9. Individual and Team Work: Function effectively as an individual, and as a
member or leader in diverse teams, and in multidisciplinary settings.
PO12. Life-long Learning: Recognize the need for, and have the
preparation and ability to engage in independent and lifelong
learning in the broadest context of technological change.
PSO1. Ability to apply their skills in the field of algorithms, networking, web
design, cloud computing and databases.
PSO2. Ability to develop and deploy software solutions for real world problems.
PSO3. Gain knowledge in diverse areas of Computer Science and experience an
environment conducive in cultivating skills for successful career,
entrepreneurship, research and higher studies.
Course Objectives:
This course enables students to learn and understand different data sets and to use
them in implementing machine learning algorithms.
Course Outcomes:
At the end of the course, the student will be able to:
Implement procedures for the machine learning algorithms
Design and Develop Python programs for various Learning algorithms
Apply appropriate data sets to the Machine Learning algorithms
Develop Machine Learning algorithms to solve real world problems
SYLLABUS
Experiment-1: Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
Experiment-2: For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
Experiment-3: Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
Experiment-4: Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier.
Experiment-5: Develop a program for Bias, Variance, Remove Duplicates and Cross-Validation.
Experiment-6: Write a program to implement Categorical Encoding and One-Hot Encoding.
Experiment-7: Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
Experiment-8: Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
Experiment-9: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
Experiment-10: Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.
Experiment-11: Apply the EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
Experiment-12: Exploratory Data Analysis for Classification using Pandas or Matplotlib.
Experiment-13: Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
Experiment-14: Write a program to implement Support Vector Machines and Principal Component Analysis.
Experiment-15: Write a program to implement Principal Component Analysis.
S.No. | Sub Topic Name | Teaching Aid | No. of Classes Required
1 | Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file. | Hands on | 3
2 | For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples. | Hands on | 3
3 | Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample. | Hands on | 3
4 | Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier | Hands on | 3
5 | Develop a program for Bias, Variance, Remove Duplicates and Cross-Validation | Hands on | 3
6 | Write a program to implement Categorical Encoding, One-Hot Encoding | Hands on | 3
7 | Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets. | Hands on | 3
8 | Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. | Hands on | 3
9 | Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs. | Hands on | 3
10 | Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set. | Hands on | 3
11 | Apply the EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program. | Hands on | 3
12 | Exploratory Data Analysis for Classification using Pandas or Matplotlib | Hands on | 3
13 | Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set | Hands on | 3
14 | Write a program to implement Support Vector Machines and Principal Component Analysis | Hands on | 3
15 | Write a program to implement Principal Component Analysis | Hands on | 3
Total No. of hours: 45
Requirements: Develop the following programs using Anaconda / Jupyter / Spyder and evaluate ML models.
Text Books:
1. Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow,
2nd Edition, O'Reilly Media, 2019.
Reference Books:
1. Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
INDEX
Exercise | Description
1 | Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
2 | For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
3 | Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
4 | Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier
5 | Develop a program for Bias, Variance, Remove Duplicates and Cross-Validation
6 | Write a program to implement Categorical Encoding, One-Hot Encoding
7 | Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
8 | Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
9 | Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
10 | Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.
11 | Apply the EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
12 | Exploratory Data Analysis for Classification using Pandas or Matplotlib
13 | Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set
14 | Write a program to implement Support Vector Machines and Principal Component Analysis
15 | Write a program to implement Principal Component Analysis
FIND-S Algorithm
Exp. No: 1
AIM:-Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a .CSV
file.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
import csv

with open('tennis.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)

# start with the most specific hypothesis: every attribute constrained to '0'
h = [['0', '0', '0', '0', '0', '0']]

for i in your_list:
    print(i)
    if i[-1] == "True":  # consider only positive training examples
        j = 0
        for x in i:
            if x != "True":
                # generalize the hypothesis just enough to cover this example
                if x != h[0][j] and h[0][j] == '0':
                    h[0][j] = x
                elif x != h[0][j] and h[0][j] != '0':
                    h[0][j] = '?'
                else:
                    pass
                j = j + 1

print("Most specific hypothesis is")
print(h)
Output:-
['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'True']
['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'True']
['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'False']
['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'True']
Most specific hypothesis is
[['Sunny', 'Warm', '?', 'Strong', '?', '?']]
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Candidate Elimination Algorithm
Exp. No: 2
AIM:- For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
class Holder:
    factors = {}      # Initialize an empty dictionary
    attributes = ()   # declaration of dictionary parameters with an arbitrary length

    def __init__(self, attr):
        '''Constructor of class Holder holding two parameters;
        self refers to the instance of the class'''
        self.attributes = attr
        for i in attr:
            self.factors[i] = []

    def add_values(self, factor, values):
        self.factors[factor] = values

class CandidateElimination:
    Positive = {}  # Initialize positive empty dictionary
    Negative = {}  # Initialize negative empty dictionary

    def __init__(self, data, fact):
        self.num_factors = len(data[0][0])
        self.factors = fact.factors
        self.attr = fact.attributes
        self.dataset = data

    def run_algorithm(self):
        '''Initialize the specific and general boundaries, and loop
        the dataset against the algorithm'''
        G = self.initializeG()
        S = self.initializeS()
        for trial_set in self.dataset:
            if self.is_positive(trial_set):
                # remove hypotheses inconsistent with the positive example from G
                G = self.remove_inconsistent_G(G, trial_set[0])
                S_new = S[:]
                for s in S:
                    if not self.consistent(s, trial_set[0]):
                        S_new.remove(s)
                        generalization = self.generalize_inconsistent_S(s, trial_set[0])
                        generalization = self.get_general(generalization, G)
                        if generalization:
                            S_new.append(generalization)
                S = S_new[:]
                S = self.remove_more_general(S)
            else:
                # remove hypotheses inconsistent with the negative example from S
                S = self.remove_inconsistent_S(S, trial_set[0])
                G_new = G[:]
                for g in G:
                    if self.consistent(g, trial_set[0]):
                        G_new.remove(g)
                        specializations = self.specialize_inconsistent_G(g, trial_set[0])
                        specializations = self.get_specific(specializations, S)
                        if specializations != []:
                            G_new += specializations
                G = G_new[:]
                G = self.remove_more_specific(G)
            print(S)
            print(G)

    def initializeS(self):
        ''' Initialize the specific boundary '''
        S = tuple(['-' for factor in range(self.num_factors)])  # 6 constraints in the vector
        return [S]

    def initializeG(self):
        ''' Initialize the general boundary '''
        G = tuple(['?' for factor in range(self.num_factors)])  # 6 constraints in the vector
        return [G]

    def is_positive(self, trial_set):
        ''' Check if a given training trial_set is positive '''
        if trial_set[1] == 'Y':
            return True
        elif trial_set[1] == 'N':
            return False
        else:
            raise TypeError("invalid target value")

    def match_factor(self, value1, value2):
        ''' Check whether the factor values match;
        necessary while checking the consistency of a
        training trial_set with the hypothesis '''
        if value1 == '?' or value2 == '?':
            return True
        elif value1 == value2:
            return True
        return False

    def consistent(self, hypothesis, instance):
        ''' Check whether the instance is part of the hypothesis '''
        for i, factor in enumerate(hypothesis):
            if not self.match_factor(factor, instance[i]):
                return False
        return True

    def remove_inconsistent_G(self, hypotheses, instance):
        ''' For a positive trial_set, the hypotheses in G
        inconsistent with it should be removed '''
        G_new = hypotheses[:]
        for g in hypotheses:
            if not self.consistent(g, instance):
                G_new.remove(g)
        return G_new

    def remove_inconsistent_S(self, hypotheses, instance):
        ''' For a negative trial_set, the hypotheses in S
        inconsistent with it should be removed '''
        S_new = hypotheses[:]
        for s in hypotheses:
            if self.consistent(s, instance):
                S_new.remove(s)
        return S_new

    def remove_more_general(self, hypotheses):
        ''' After generalizing S for a positive trial_set, the hypotheses in S
        more general than others in S should be removed '''
        S_new = hypotheses[:]
        for old in hypotheses:
            for new in S_new[:]:
                if old != new and self.more_general(new, old):
                    S_new.remove(new)
        return S_new

    def remove_more_specific(self, hypotheses):
        ''' After specializing G for a negative trial_set, the hypotheses in G
        more specific than others in G should be removed '''
        G_new = hypotheses[:]
        for old in hypotheses:
            for new in G_new[:]:
                if old != new and self.more_specific(new, old):
                    G_new.remove(new)
        return G_new

    def generalize_inconsistent_S(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a positive trial_set is seen in the
        specific boundary S, it should be generalized to be consistent with the
        trial_set ... we will get one hypothesis '''
        hypo = list(hypothesis)  # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '-':
                hypo[i] = instance[i]
            elif not self.match_factor(factor, instance[i]):
                hypo[i] = '?'
        generalization = tuple(hypo)  # convert list back to tuple for immutability
        return generalization

    def specialize_inconsistent_G(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a negative trial_set is seen in the
        general boundary G, it should be specialized to be consistent with the
        trial_set ... we will get a set of hypotheses '''
        specializations = []
        hypo = list(hypothesis)  # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '?':
                values = self.factors[self.attr[i]]
                for j in values:
                    if instance[i] != j:
                        hyp = hypo[:]
                        hyp[i] = j
                        hyp = tuple(hyp)  # convert list back to tuple for immutability
                        specializations.append(hyp)
        return specializations

    def get_general(self, generalization, G):
        ''' Checks if there is a more general hypothesis in G for a generalization
        of an inconsistent hypothesis in S in case of a positive trial_set,
        and returns the valid generalization '''
        for g in G:
            if self.more_general(g, generalization):
                return generalization
        return None

    def get_specific(self, specializations, S):
        ''' Checks if there is a more specific hypothesis in S
        for each hypothesis in the specializations of an
        inconsistent hypothesis in G in case of a negative trial_set,
        and returns the valid specializations '''
        valid_specializations = []
        for hypo in specializations:
            for s in S:
                if self.more_specific(s, hypo) or s == self.initializeS()[0]:
                    valid_specializations.append(hypo)
        return valid_specializations

    def exists_general(self, hypothesis, G):
        '''Used to check whether there exists a more general hypothesis in
        the general boundary of the version space'''
        for g in G:
            if self.more_general(g, hypothesis):
                return True
        return False

    def exists_specific(self, hypothesis, S):
        '''Used to check whether there exists a more specific hypothesis in
        the specific boundary of the version space'''
        for s in S:
            if self.more_specific(s, hypothesis):
                return True
        return False

    def more_general(self, hyp1, hyp2):
        ''' Check whether hyp1 is more general than hyp2 '''
        hyp = zip(hyp1, hyp2)
        for i, j in hyp:
            if i == '?':
                continue
            elif j == '?':
                if i != '?':
                    return False
            elif i != j:
                return False
            else:
                continue
        return True

    def more_specific(self, hyp1, hyp2):
        ''' hyp1 being more specific than hyp2 is
        equivalent to hyp2 being more general than hyp1 '''
        return self.more_general(hyp2, hyp1)

dataset = [(('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), 'Y'),
           (('sunny', 'warm', 'high', 'strong', 'warm', 'same'), 'Y'),
           (('rainy', 'cold', 'high', 'strong', 'warm', 'change'), 'N'),
           (('sunny', 'warm', 'high', 'strong', 'cool', 'change'), 'Y')]
attributes = ('Sky', 'Temp', 'Humidity', 'Wind', 'Water', 'Forecast')
f = Holder(attributes)
f.add_values('Sky', ('sunny', 'rainy', 'cloudy'))    # sky can be sunny, rainy or cloudy
f.add_values('Temp', ('cold', 'warm'))               # temp can be cold or warm
f.add_values('Humidity', ('normal', 'high'))         # humidity can be normal or high
f.add_values('Wind', ('weak', 'strong'))             # wind can be weak or strong
f.add_values('Water', ('warm', 'cold'))              # water can be warm or cold
f.add_values('Forecast', ('same', 'change'))         # forecast can be same or change
a = CandidateElimination(dataset, f)  # pass the dataset to the algorithm class
a.run_algorithm()                     # and call the run_algorithm method
Output
[('sunny', 'warm', 'normal', 'strong', 'warm', 'same')]
[('sunny', 'warm', 'normal', 'strong', 'warm','same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('?', '?', '?', '?', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Decision Tree (ID3) Algorithm
Exp. No: 3
AIM:-Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to classify
a new sample.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
import numpy as np
import math
from data_loader import read_data

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute

def subtables(data, col, delete):
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1
    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)
    return items, dict

def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0
    counts = np.zeros((items.shape[0], 1))
    sums = 0
    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)
    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))
    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv

def create_node(data, metadata):
    # if all examples share one label, return a leaf node with that answer
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    # otherwise split on the attribute with the highest gain ratio
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dict = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))
    return node

def empty(size):
    s = ""
    for x in range(size):
        s += "   "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)

metadata, traindata = read_data("tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)
Data_loader.py
import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)
Tennis.csv
outlook,temperature,humidity,wind,answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no
Output
outlook
  overcast
    b'yes'
  rain
    wind
      b'strong'
        b'no'
      b'weak'
        b'yes'
  sunny
    humidity
      b'high'
        b'no'
      b'normal'
        b'yes'
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Exp. No: 4
AIM:- Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier.
Example problem statements:
a) Linear Regression:
1. Predicting the price of a house based on its square footage, number of bedrooms,
number of bathrooms, and location.
2. Forecasting the amount of rainfall in a region based on historical data and weather patterns.
3. Estimating the number of bike rentals in a city based on temperature, humidity, and day
of the week.
b) Logistic Regression:
1. Identifying the likelihood of a customer defaulting on a loan based on their credit score,
income, and employment status.
2. Predicting the probability of a patient developing a particular disease based on their age,
sex, and medical history.
3. Determining the chance of a student passing an exam based on their study habits, test
scores, and attendance record.
c) Binary Classifier:
1. Identifying the likelihood of a customer defaulting on a loan based on their credit score,
income, and employment status.
2. Predicting the probability of a patient developing a particular disease based on their age,
sex, and medical history.
3. Determining the chance of a student passing an exam based on their study habits, test
scores, and attendance record.
Note that these are just a few examples; many real-world problems can be solved using
these machine learning methods. Note also that success depends on the quality of the data
and on the feature engineering, so model performance will vary from dataset to dataset.
A minimal sketch of the three methods follows.
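The sketch below uses scikit-learn's bundled data sets (the choice of data sets, parameters and metrics is an illustrative assumption, not part of the prescribed syllabus):

import sklearn
from sklearn.datasets import load_diabetes, load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import r2_score, accuracy_score

# a) Linear Regression: predict a continuous target (disease progression)
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(X_train, y_train)
print("Linear Regression R^2:", r2_score(y_test, reg.predict(X_test)))

# b) and c) Logistic Regression used as a binary classifier: predict a 0/1 target
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("Logistic Regression accuracy:", accuracy_score(y_test, clf.predict(X_test)))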
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Exp. No: 5
AIM:- Develop a program for Bias, Variance, Remove Duplicates and Cross-Validation.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
1.Bias and Variance:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Read the data set
df = pd.read_csv("data.csv")
# Split the data into training and testing sets
X = df.drop(columns=['label'])
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train the linear regression model
reg = LinearRegression().fit(X_train, y_train)
# Compute the bias and variance
y_train_pred = reg.predict(X_train)
bias = mean_squared_error(y_train, y_train_pred)
y_test_pred = reg.predict(X_test)
variance = mean_squared_error(y_test, y_test_pred) - bias
print("Bias:", bias)
print("Variance:", variance)
This example uses the scikit-learn library to train a linear regression model and estimate
its bias and variance. The train_test_split function splits the data into training and testing
sets, and the LinearRegression class trains the model. The mean_squared_error function
compares the predicted values with the actual values on both sets: the training-set error
serves as a rough proxy for bias, and the gap between test and training error as a rough
proxy for variance (a full bias-variance decomposition would require repeated resampling).
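The experiment also calls for removing duplicates and for cross-validation. A minimal sketch continuing from the same df (the column name 'label' and the five-fold setting are assumptions):

from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# 2. Remove duplicate rows before modelling
df = df.drop_duplicates()
X = df.drop(columns=['label'])
y = df['label']

# 3. 5-fold cross-validation of the same linear regression model
scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring='neg_mean_squared_error')
print("Cross-validation MSE per fold:", -scores)
print("Mean MSE:", -scores.mean())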
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Exp. No: 6
AIM:- Write a program to implement Categorical Encoding and One-Hot Encoding.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Categorical encoding is the process of converting categorical variables into numerical
variables in order to use them as input for machine learning algorithms. Two common types of
categorical encoding are:
1. Label Encoding:
from sklearn.preprocessing import LabelEncoder
# df is an existing pandas DataFrame with a column named 'categorical_column'
# Create an instance of LabelEncoder
le = LabelEncoder()
# Fit and transform the categorical column
df['categorical_column'] = le.fit_transform(df['categorical_column'])
This example uses the LabelEncoder class from the Scikit-learn library to convert a
categorical column in a DataFrame into numerical values. The fit_transform method is used to fit
the encoder to the data and transform the column. The encoded values are integers and can be
used as input for machine learning algorithms.
2.One-Hot Encoding:
This example uses the OneHotEncoder class from the Scikit-learn library to convert a
categorical column in a DataFrame into a one-hot encoded array. The fit_transform method is
used to fit the encoder to the data and transform the column. The result is an array with a column
for each category, and the values in the columns are binary indicating the presence or absence of
the category in the original column.
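A minimal sketch of One-Hot Encoding (assuming, as above, a DataFrame df with a column named 'categorical_column'):

from sklearn.preprocessing import OneHotEncoder

# Create an instance of OneHotEncoder
ohe = OneHotEncoder(sparse_output=False)  # on older scikit-learn versions use sparse=False
# Fit and transform the categorical column (a 2-D input is expected)
encoded = ohe.fit_transform(df[['categorical_column']])
print(ohe.categories_)  # the category corresponding to each output column
print(encoded)          # one binary column per category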
Note that while Label Encoding can be useful in some cases, the model may interpret the
encoded values as ordinal when they are in fact nominal. One-Hot Encoding is less prone
to this issue, but it increases the number of features. Consider the problem and the data
before choosing an encoding technique.
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Backpropagation Algorithm
Exp. No: 7
AIM:- Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # maximum of X array longitudinally
y = y / 100

# Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of Sigmoid Function
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000  # Setting training iterations
lr = 0.1      # Setting learning rate
inputlayer_neurons = 2   # number of features in data set
hiddenlayer_neurons = 3  # number of hidden layer neurons
output_neurons = 1       # number of neurons at output layer

# weight and bias initialization
# draws a random range of numbers uniformly of dim x*y
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output                        # error at the output layer
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)              # error at the hidden layer
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad

    wout += hlayer_act.T.dot(d_output) * lr  # dot product of next-layer error and current-layer output
    # bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    # bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
Input:
[[ 0.66666667  1.        ]
 [ 0.33333333  0.55555556]
 [ 1.          0.66666667]]
Actual Output:
[[ 0.92]
 [ 0.86]
 [ 0.89]]
Predicted Output:
[[ 0.89559591]
 [ 0.88142069]
 [ 0.8928407 ]]
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Exp. No: 8
AIM:-Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
import csv
import random
import math
import operator

def loadDataset(filename, split, trainingSet=[], testSet=[]):
    with open(filename, 'r') as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for x in range(len(dataset) - 1):
            for y in range(4):
                dataset[x][y] = float(dataset[x][y])
            # randomly assign each instance to the training or test set
            if random.random() < split:
                trainingSet.append(dataset[x])
            else:
                testSet.append(dataset[x])

def euclideanDistance(instance1, instance2, length):
    distance = 0
    for x in range(length):
        distance += pow((instance1[x] - instance2[x]), 2)
    return math.sqrt(distance)

def getNeighbors(trainingSet, testInstance, k):
    distances = []
    length = len(testInstance) - 1
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors

def getResponse(neighbors):
    # majority vote among the k nearest neighbours
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]

def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

def main():
    # prepare data
    trainingSet = []
    testSet = []
    split = 0.67
    loadDataset('iris.csv', split, trainingSet, testSet)
    print('Train set: ' + repr(len(trainingSet)))
    print('Test set: ' + repr(len(testSet)))
    # generate predictions, printing both correct and wrong ones
    predictions = []
    k = 3
    for x in range(len(testSet)):
        neighbors = getNeighbors(trainingSet, testSet[x], k)
        result = getResponse(neighbors)
        predictions.append(result)
        print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1]))
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: ' + repr(accuracy) + '%')

main()
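The confusion matrix shown under OUTPUT below comes from a library-based run rather than the from-scratch listing above. A minimal scikit-learn sketch that produces such metrics (this listing is an assumption, not part of the original manual):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, classification_report

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
y_pred = knn.predict(X_test)

print('Confusion matrix is as follows')
print(confusion_matrix(y_test, y_pred))
print('Accuracy metrics')
print(classification_report(y_test, y_pred))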
OUTPUT
Confusion matrix is as follows
[[11 0 0]
 [ 0 9 1]
 [ 0 1 8]]
Accuracy metrics
(classification report truncated in this copy)
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Exp. No: 9
AIM:-Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np1
import numpy.linalg as np

def kernel(point, xmat, k):
    # weight each training point by its distance to the query point
    m, n = np1.shape(xmat)
    weights = np1.mat(np1.eye(m))
    for j in range(m):
        diff = point - X[j]
        weights[j, j] = np1.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (X.T * (wei * X)).I * (X.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np1.shape(xmat)
    ypred = np1.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# load data points (assumes a tips.csv with total_bill and tip columns)
data = pd.read_csv('tips.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)

# prepare the design matrix: a column of ones followed by the bill amounts
mbill = np1.mat(bill)
mtip = np1.mat(tip)
m = np1.shape(mbill)[1]
one = np1.mat(np1.ones(m))
X = np1.hstack((one.T, mbill.T))

# set k here
ypred = localWeightRegression(X, mtip, 2)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
Output:- (a scatter plot of the data points with the fitted locally weighted regression curve drawn through them)
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Exp. No: 10
AIM:-Assuming a set of documents that need to be classified, use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to write the
program. Calculate the accuracy, precision, and recall for your data set.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
import pandas as pd
msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

# splitting the dataset into train and test data
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)

# output of count vectoriser is a sparse matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print(count_vect.get_feature_names())  # use get_feature_names_out() on newer scikit-learn
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names())
print(df)          # tabular representation
print(xtrain_dtm)  # sparse matrix representation

# Training Naive Bayes (NB) classifier on training data
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

# printing the accuracy, precision and recall asked for in the AIM
from sklearn import metrics
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall:', metrics.recall_score(ytest, predicted))
print('Precision:', metrics.precision_score(ytest, predicted))

Contents of naivetext1.csv:
I love this sandwich,pos
This is an amazing place,pos
I feel very good about these beers,pos
This is my best work,pos
What an awesome view,pos
I do not like this restaurant,neg
I am tired of this stuff,neg
I can't deal with this,neg
He is my sworn enemy,neg
My boss is horrible,neg
This is an awesome place,pos
I do not like the taste of this juice,neg
I love to dance,pos
I am sick and tired of this place,neg
What a great holiday,pos
That is a bad locality to stay,neg
We will have good fun tomorrow,pos
I went to my enemy's house today,neg
OUTPUT
['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'boss', 'can', 'deal',
'do', 'enemy', 'feel', 'fun', 'good', 'have', 'horrible', 'house', 'is', 'like', 'love', 'my',
'not', 'of', 'place', 'restaurant', 'sandwich', 'sick', 'stuff', 'these', 'this', 'tired', 'to',
'today', 'tomorrow', 'very', 'view', 'we', 'went', 'what', 'will', 'with', 'work']
(A tabular document-term matrix follows in the original output: one row per training document
and one binary column per vocabulary word; the columns are misaligned in this copy, so the
table is omitted here.)
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Exp. No: 11
AIM:- Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in the
program.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
import numpy as np
import matplotlib.pyplot as plt
# samples_generator was removed in newer scikit-learn; import make_blobs directly
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from matplotlib.patches import Ellipse

X, y_true = make_blobs(n_samples=100, centers=4, cluster_std=0.60, random_state=0)
X = X[:, ::-1]  # flip axes for better plotting

# fit a Gaussian Mixture Model (the EM algorithm) with 4 components
gmm = GaussianMixture(n_components=4).fit(X)
labels = gmm.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis')

probs = gmm.predict_proba(X)
print(probs[:5].round(3))

size = 50 * probs.max(1) ** 2  # square emphasizes differences
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=size)

def draw_ellipse(position, covariance, ax=None, **kwargs):
    """Draw an ellipse with a given position and covariance"""
    ax = ax or plt.gca()
    # Convert covariance to principal axes
    if covariance.shape == (2, 2):
        U, s, Vt = np.linalg.svd(covariance)
        angle = np.degrees(np.arctan2(U[1, 0], U[0, 0]))
        width, height = 2 * np.sqrt(s)
    else:
        angle = 0
        width, height = 2 * np.sqrt(covariance)
    # Draw the ellipse at 1, 2 and 3 standard deviations
    for nsig in range(1, 4):
        ax.add_patch(Ellipse(position, nsig * width, nsig * height,
                             angle=angle, **kwargs))

plt.show()
Output
[[1 ,0, 0, 0]
[0 ,0, 1, 0]
[1 ,0, 0, 0]
[1 ,0, 0, 0]
[1 ,0, 0, 0]]
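The experiment also asks for k-Means clustering on the same data so the two algorithms can be compared. A minimal sketch reusing X from the program above (the plotting choices are assumptions):

from sklearn.cluster import KMeans

# fit k-Means with the same number of clusters as the GMM
kmeans = KMeans(n_clusters=4, random_state=0).fit(X)
k_labels = kmeans.predict(X)

# hard cluster assignments, with the learned centroids marked
plt.scatter(X[:, 0], X[:, 1], c=k_labels, s=40, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c='red', s=100, marker='x')
plt.show()

Unlike the GMM, which assigns each point a probability of belonging to every cluster, k-Means makes hard assignments; comparing the two cluster label sets is the basis for commenting on the quality of clustering.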
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Exploratory Data Analysis for Classification
Exp. No: 12
AIM:-Exploratory Data Analysis for Classification using Pandas or Matplotlib.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer(as_frame=True).frame
# Print the first 5 rows of the data
print(data.head())
# Print the shape of the data
print(data.shape)
# Print the summary statistics of the data
print(data.describe())
# Print the count of each class
print(data.target.value_counts())
# Plot a bar chart of the target variable
data.target.value_counts().plot(kind='bar')
plt.title('Target Variable Distribution')
plt.xlabel('Class')
plt.ylabel('Count')
plt.show()
# Plot a histogram of the mean radius feature
data['mean radius'].plot(kind='hist', bins=30)
plt.title('Distribution of Mean Radius')
plt.xlabel('Mean Radius')
plt.ylabel('Count')
plt.show()
# Plot a scatter matrix of the data
pd.plotting.scatter_matrix(data, c=data.target, figsize=(15,15), marker='o', hist_kwds={'bins': 20},
alpha=0.5)
plt.show()
Output:-
   mean radius  mean texture  mean perimeter  mean area  ...  worst symmetry  worst fractal dimension  target
0        17.99         10.38          122.80     1001.0  ...          0.4601                  0.11890       0
1        20.57         17.77          132.90     1326.0  ...          0.2750                  0.08902       0
2        19.69         21.25          130.00     1203.0  ...          0.3613                  0.08758       0
3        11.42         20.38           77.58      386.1  ...          0.2597                      ...     ...
(output truncated)
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Exp. No: 13
AIM:- Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
# observed values
age_obs = 60
gender_obs = 1
bp_obs = 130
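The program listing in this copy stops at the observed values above. A minimal sketch of one way to complete it, assuming the pgmpy library and hypothetical column names (age, gender, bp, heartdisease) in the Heart Disease .CSV file; adjust both to your environment:

import pandas as pd
from pgmpy.models import BayesianNetwork          # BayesianModel on older pgmpy versions
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

data = pd.read_csv('heart.csv')  # hypothetical file name
# hypothetical structure: age, gender and blood pressure influence heartdisease
model = BayesianNetwork([('age', 'heartdisease'),
                         ('gender', 'heartdisease'),
                         ('bp', 'heartdisease')])
model.fit(data, estimator=MaximumLikelihoodEstimator)

infer = VariableElimination(model)
# the evidence values must match states that actually occur in the data
q = infer.query(variables=['heartdisease'],
                evidence={'age': age_obs, 'gender': gender_obs, 'bp': bp_obs})
print("Probability of heart disease: %.2f%%" % (q.values[1] * 100))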
Output:
Probability of heart disease: 60.25%
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Exp. No: 14
AIM:- Write a program to implement Support Vector Machines and Principal Component
Analysis
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
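The program listing is missing from this copy of the manual. A minimal sketch consistent with the style of the output shown below, using scikit-learn's iris data set (the data set and parameters are assumptions; the exact numbers depend on the train/test split):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# reduce to two principal components, then train an SVM on them
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

clf = SVC(kernel='rbf', random_state=1)
clf.fit(X_train_pca, y_train)
y_pred = clf.predict(X_test_pca)

print("Accuracy: %.4f" % accuracy_score(y_test, y_pred))
print("Misclassifications:", (y_test != y_pred).sum())
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))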
Output:
Accuracy: 0.9778
Misclassifications: 1
Confusion Matrix:
[[16 0 0]
[ 0 17 1]
[ 0 0 11]]
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Exp. No: 15
AIM:-Write a program to implement Principal Component Analysis
Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
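The program listing is missing from this copy of the manual. A minimal sketch using scikit-learn's iris data set (an assumption):

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

X, y = load_iris(return_X_y=True)

# project the 4-dimensional data onto its first two principal components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)

plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA of the iris data set')
plt.show()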
Result:
The program was executed successfully and the output verified.
References:-
Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.