
Program: B.Tech.

Regulations: R20
Year & Sem: III-II

Machine Learning using Python


Lab Manual

Department of Information Technology

GIET ENGINEERING COLLEGE


Approved by AICTE New Delhi, Affiliated to JNTUK, Accredited by NAAC and ISO 9001:2015 Certified.
NH-16, GIET Campus, Velugubanda (V), Rajanagaram (M), Rajamahendravaram-533296, AP, India.

VISION
To evolve as a Premier Engineering Institution in the Country through its continuous striving for excellence in Education, Research and Technological Services.

MISSION
 To generate a community of highly learned students with greater acquirement of knowledge and to apply it professionally with due consideration for ecological, economic and ethical issues.

 To provide knowledge-based technological services to the best satisfaction of society and for industrial needs.

 To foster research and disseminate research findings.

 To build capabilities for advancing education, technology, values, management and research at international standards.
DEPARTMENT OF INFORMATION TECHNOLOGY

VISION
 To explore innovative approaches to enhance and expand learning opportunities
through the integration of various technologies.

 To build a strong research and teaching environment that responds to the real-time challenges of the industry.

MISSION
 To inculcate a blend of competence, aptitude for knowledge and investigative flair through devising an ambient environment for sustainable learning.
 To transform attitudes, values and priorities by changing mindsets and instill a positive outlook for socially conscious intellectual development.
 To incubate, apply and spread innovative ideas to evolve the department as a center of excellence in thrust areas.
DEPARTMENT OF INFORMATION TECHNOLOGY

PROGRAM EDUCATIONAL OBJECTIVES (PEO):

PEO 1. Provide a strong foundation required to comprehend, analyze, design and develop solutions to real-world computing problems.
PEO 2. Expose the students to industry practices for providing computing solutions using current models and techniques.
PEO 3. Enable the students to pursue higher studies and active research.
PEO 4. Foster sustained professional development through life-long learning to adapt to new computing technologies.
DEPARTMENT OF INFORMATION TECHNOLOGY

PROGRAM OUTCOMES (PO):

PO1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.

PO2. Problem Analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.

PO3. Design/Development of Solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for public health and safety, and the cultural, societal, and environmental considerations.

PO4. Conduct Investigations of Complex Problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.

PO5. Modern Tool Usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modelling to complex engineering activities with an understanding of the limitations.

PO6. The Engineer and Society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.

PO7. Environment and Sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.

PO8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.

PO9. Individual and Team Work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.

PO10. Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.

PO11. Project Management and Finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.

PO12. Life-long Learning: Recognize the need for, and have the preparation and ability to engage in, independent and lifelong learning in the broadest context of technological change.
DEPARTMENT OF INFORMATION TECHNOLOGY

PROGRAM SPECIFIC OUTCOMES (PSO):

PSO1. Ability to apply their skills in the field of algorithms, networking, web
design, cloud computing and databases.
PSO2. Ability to develop and deploy software solutions for real world problems.
PSO3. Gain knowledge in diverse areas of Computer Science and experience an environment conducive to cultivating skills for a successful career, entrepreneurship, research and higher studies.
DEPARTMENT OF INFORMATION TECHNOLOGY

Course Objectives:
 This course will enable students to learn and understand different data sets and use them in implementing machine learning algorithms.

Course Outcomes:
At the end of the course, student will be able to
 Implement procedures for the machine learning algorithms
 Design and Develop Python programs for various Learning algorithms
 Apply appropriate data sets to the Machine Learning algorithms
 Develop Machine Learning algorithms to solve real world problems
DEPARTMENT OF INFORMATION TECHNOLOGY

SYLLABUS

Experiment-1: Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

Experiment-2: For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

Experiment-3: Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

Experiment-4: Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier.

Experiment-5: Develop a program for Bias, Variance, Remove Duplicates, Cross Validation.

Experiment-6: Write a program to implement Categorical Encoding, One-hot Encoding.

Experiment-7: Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

Experiment-8: Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.

Experiment-9: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

Experiment-10: Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

Experiment-11: Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

Experiment-12: Exploratory Data Analysis for Classification using Pandas or Matplotlib.

Experiment-13: Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.

Experiment-14: Write a program to implement Support Vector Machines and Principal Component Analysis.

Experiment-15: Write a program to implement Principal Component Analysis.

DEPARTMENT OF INFORMATION TECHNOLOGY
LESSON PLAN

ACADEMIC YEAR: 2022-2023    COURSE: ML Using Python Lab    SEMESTER: III-II

FACULTY NAME: D. RAMESH

S.No. — Sub Topic Name (Teaching Aid: Hands on; No. of Classes Required: 3 each)
1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
4. Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier.
5. Develop a program for Bias, Variance, Remove Duplicates, Cross Validation.
6. Write a program to implement Categorical Encoding, One-hot Encoding.
7. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
8. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
10. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.
11. Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
12. Exploratory Data Analysis for Classification using Pandas or Matplotlib.
13. Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
14. Write a program to implement Support Vector Machines and Principal Component Analysis.
15. Write a program to implement Principal Component Analysis.

Total No. of hours: 45

Requirements: Develop the following programs using Anaconda / Jupyter / Spyder and evaluate ML models.

Text Books:
1. Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, O'Reilly Media, 2019.
2. Dirk P. Kroese, Zdravko I. Botev, Thomas Taimre and Radislav Vaisman, Data Science and Machine Learning: Mathematical and Statistical Methods, Chapman and Hall/CRC, 2020.

Reference Books:
1. Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.

FACULTY HEAD OF THE DEPARTMENT


DEPARTMENT OF INFORMATION TECHNOLOGY

INDEX

Exercise — Description
1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
4. Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier.
5. Develop a program for Bias, Variance, Remove Duplicates, Cross Validation.
6. Write a program to implement Categorical Encoding, One-hot Encoding.
7. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
8. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
10. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.
11. Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
12. Exploratory Data Analysis for Classification using Pandas or Matplotlib.
13. Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
14. Write a program to implement Support Vector Machines and Principal Component Analysis.
15. Write a program to implement Principal Component Analysis.
FIND-S algorithm

Exp. No: 1
AIM:- Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.

Program:-

import csv

with open('tennis.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)

# start from the most specific hypothesis: every attribute constrained to '0'
h = [['0', '0', '0', '0', '0', '0']]

for i in your_list:
    print(i)
    if i[-1] == "True":              # use only the positive examples
        j = 0
        for x in i:
            if x != "True":
                if x != h[0][j] and h[0][j] == '0':
                    h[0][j] = x      # first positive example fixes the value
                elif x != h[0][j] and h[0][j] != '0':
                    h[0][j] = '?'    # conflicting values generalize to '?'
                else:
                    pass
                j = j + 1

print("Most specific hypothesis is")
print(h)
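For reference, the listing reads its training examples from tennis.csv; the manual does not reproduce the file, but a version consistent with the rows printed in the output below would be:

Sunny,Warm,Normal,Strong,Warm,Same,True
Sunny,Warm,High,Strong,Warm,Same,True
Rainy,Cold,High,Strong,Warm,Change,False
Sunny,Warm,High,Strong,Cool,Change,True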
Output:-
['Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'True']
['Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'True']
['Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'False']
['Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'True']


Maximally Specific set


[['Sunny', 'Warm', '?', 'Strong', '?', '?']]

Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


Candidate-Elimination algorithm

Exp. No: 2
AIM:- For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
class Holder:
    factors = {}     # dictionary mapping each attribute to its possible values
    attributes = ()  # tuple of attribute names

    '''
    Constructor of class Holder holding two parameters,
    self refers to the instance of the class
    '''
    def __init__(self, attr):
        self.attributes = attr
        for i in attr:
            self.factors[i] = []

    def add_values(self, factor, values):
        self.factors[factor] = values


class CandidateElimination:
    Positive = {}  # Initialize positive empty dictionary
    Negative = {}  # Initialize negative empty dictionary

    def __init__(self, data, fact):
        self.num_factors = len(data[0][0])
        self.factors = fact.factors
        self.attr = fact.attributes
        self.dataset = data

    def run_algorithm(self):
        '''Initialize the specific and general boundaries, and loop
        the dataset against the algorithm'''
        G = self.initializeG()
        S = self.initializeS()
        count = 0
        for trial_set in self.dataset:
            if self.is_positive(trial_set):  # if the example is positive
                G = self.remove_inconsistent_G(G, trial_set[0])  # remove inconsistent hypotheses from the general boundary
                S_new = S[:]
                print(S_new)
                for s in S:
                    if not self.consistent(s, trial_set[0]):
                        S_new.remove(s)
                        generalization = self.generalize_inconsistent_S(s, trial_set[0])
                        generalization = self.get_general(generalization, G)
                        if generalization:
                            S_new.append(generalization)
                S = S_new[:]
                S = self.remove_more_general(S)
                print(S)
            else:  # if it is negative
                S = self.remove_inconsistent_S(S, trial_set[0])  # remove inconsistent hypotheses from the specific boundary
                G_new = G[:]
                print(G_new)
                for g in G:
                    if self.consistent(g, trial_set[0]):
                        G_new.remove(g)
                        specializations = self.specialize_inconsistent_G(g, trial_set[0])
                        specializations = self.get_specific(specializations, S)
                        if specializations != []:
                            G_new += specializations
                G = G_new[:]
                G = self.remove_more_specific(G)
                print(G)
        print(S)
        print(G)

    def initializeS(self):
        ''' Initialize the specific boundary '''
        S = tuple(['-' for factor in range(self.num_factors)])  # 6 constraints in the vector
        return [S]

    def initializeG(self):
        ''' Initialize the general boundary '''
        G = tuple(['?' for factor in range(self.num_factors)])  # 6 constraints in the vector
        return [G]

    def is_positive(self, trial_set):
        ''' Check if a given training trial_set is positive '''
        if trial_set[1] == 'Y':
            return True
        elif trial_set[1] == 'N':
            return False
        else:
            raise TypeError("invalid target value")

    def match_factor(self, value1, value2):
        ''' Check whether two factor values match; needed when checking the
        consistency of a training trial_set with a hypothesis '''
        if value1 == '?' or value2 == '?':
            return True
        elif value1 == value2:
            return True
        return False

    def consistent(self, hypothesis, instance):
        ''' Check whether the instance is part of the hypothesis '''
        for i, factor in enumerate(hypothesis):
            if not self.match_factor(factor, instance[i]):
                return False
        return True

    def remove_inconsistent_G(self, hypotheses, instance):
        ''' For a positive trial_set, the hypotheses in G
        inconsistent with it should be removed '''
        G_new = hypotheses[:]
        for g in hypotheses:
            if not self.consistent(g, instance):
                G_new.remove(g)
        return G_new

    def remove_inconsistent_S(self, hypotheses, instance):
        ''' For a negative trial_set, the hypotheses in S
        consistent with it should be removed '''
        S_new = hypotheses[:]
        for s in hypotheses:
            if self.consistent(s, instance):
                S_new.remove(s)
        return S_new

    def remove_more_general(self, hypotheses):
        ''' After generalizing S for a positive trial_set, hypotheses in S
        more general than others in S should be removed '''
        S_new = hypotheses[:]
        for old in hypotheses:
            for new in S_new:
                if old != new and self.more_general(new, old):
                    S_new.remove(new)
        return S_new

    def remove_more_specific(self, hypotheses):
        ''' After specializing G for a negative trial_set, hypotheses in G
        more specific than others in G should be removed '''
        G_new = hypotheses[:]
        for old in hypotheses:
            for new in G_new:
                if old != new and self.more_specific(new, old):
                    G_new.remove(new)
        return G_new

    def generalize_inconsistent_S(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a positive trial_set is found in
        the specific boundary S, it should be generalized to be consistent with
        the trial_set; this yields one hypothesis '''
        hypo = list(hypothesis)  # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '-':
                hypo[i] = instance[i]
            elif not self.match_factor(factor, instance[i]):
                hypo[i] = '?'
        generalization = tuple(hypo)  # convert list back to tuple for immutability
        return generalization

    def specialize_inconsistent_G(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a negative trial_set is found in
        the general boundary G, it should be specialized to be consistent with
        the trial_set; this yields a set of hypotheses '''
        specializations = []
        hypo = list(hypothesis)  # convert tuple to list for mutability
        for i, factor in enumerate(hypo):
            if factor == '?':
                values = self.factors[self.attr[i]]
                for j in values:
                    if instance[i] != j:
                        hyp = hypo[:]
                        hyp[i] = j
                        hyp = tuple(hyp)  # convert list back to tuple for immutability
                        specializations.append(hyp)
        return specializations

    def get_general(self, generalization, G):
        ''' Check whether there is a more general hypothesis in G for a
        generalization of an inconsistent hypothesis in S in case of a
        positive trial_set, and return the valid generalization '''
        for g in G:
            if self.more_general(g, generalization):
                return generalization
        return None

    def get_specific(self, specializations, S):
        ''' Check whether there is a more specific hypothesis in S for each
        specialization of an inconsistent hypothesis in G in case of a
        negative trial_set, and return the valid specializations '''
        valid_specializations = []
        for hypo in specializations:
            for s in S:
                if self.more_specific(s, hypo) or s == self.initializeS()[0]:
                    valid_specializations.append(hypo)
        return valid_specializations

    def exists_general(self, hypothesis, G):
        ''' Check whether a more general hypothesis exists in the
        general boundary of the version space '''
        for g in G:
            if self.more_general(g, hypothesis):
                return True
        return False

    def exists_specific(self, hypothesis, S):
        ''' Check whether a more specific hypothesis exists in the
        specific boundary of the version space '''
        for s in S:
            if self.more_specific(s, hypothesis):
                return True
        return False

    def more_general(self, hyp1, hyp2):
        ''' Check whether hyp1 is more general than hyp2 '''
        hyp = zip(hyp1, hyp2)
        for i, j in hyp:
            if i == '?':
                continue
            elif j == '?':
                if i != '?':
                    return False
            elif i != j:
                return False
            else:
                continue
        return True

    def more_specific(self, hyp1, hyp2):
        ''' hyp1 more specific than hyp2 is equivalent to
        hyp2 being more general than hyp1 '''
        return self.more_general(hyp2, hyp1)


dataset = [(('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), 'Y'),
           (('sunny', 'warm', 'high', 'strong', 'warm', 'same'), 'Y'),
           (('rainy', 'cold', 'high', 'strong', 'warm', 'change'), 'N'),
           (('sunny', 'warm', 'high', 'strong', 'cool', 'change'), 'Y')]
attributes = ('Sky', 'Temp', 'Humidity', 'Wind', 'Water', 'Forecast')
f = Holder(attributes)
f.add_values('Sky', ('sunny', 'rainy', 'cloudy'))   # Sky can be sunny, rainy or cloudy
f.add_values('Temp', ('cold', 'warm'))              # Temp can be cold or warm
f.add_values('Humidity', ('normal', 'high'))        # Humidity can be normal or high
f.add_values('Wind', ('weak', 'strong'))            # Wind can be weak or strong
f.add_values('Water', ('warm', 'cold'))             # Water can be warm or cold
f.add_values('Forecast', ('same', 'change'))        # Forecast can be same or change
a = CandidateElimination(dataset, f)  # pass the dataset to the algorithm class
a.run_algorithm()                     # and call the run_algorithm method


Output
[('sunny', 'warm', 'normal', 'strong', 'warm', 'same')]
[('sunny', 'warm', 'normal', 'strong', 'warm','same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('?', '?', '?', '?', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]
[('sunny', 'warm', '?', 'strong', 'warm', 'same')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?')]

Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


Decision tree based ID3 algorithm

Exp. No: 3
AIM:- Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
import numpy as np
import math
from data_loader import read_data


class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute


def subtables(data, col, delete):
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1
    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)
    return items, dict


def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0
    counts = np.zeros((items.shape[0], 1))
    sums = 0
    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)
    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums


def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))
    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv


def create_node(data, metadata):
    # if all remaining examples share one label, return a leaf node
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dict = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))
    return node


def empty(size):
    s = ""
    for x in range(size):
        s += "   "   # three spaces per indentation level
    return s


def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)


metadata, traindata = read_data("tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

Data_loader.py

import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)

Tennis.csv
outlook,temperature,humidity,wind,answer
sunny,hot,high,weak,no
sunny,hot,high,strong,no
overcast,hot,high,weak,yes
rain,mild,high,weak,yes
rain,cool,normal,weak,yes
rain,cool,normal,strong,no
overcast,cool,normal,strong,yes
sunny,mild,high,weak,no
sunny,cool,normal,weak,yes
rain,mild,normal,weak,yes
sunny,mild,normal,strong,yes
overcast,mild,high,strong,yes
overcast,hot,normal,weak,yes
rain,mild,high,strong,no


Output
outlook
    overcast
        b'yes'
    rain
        wind
            b'strong'
                b'no'
            b'weak'
                b'yes'
    sunny
        humidity
            b'high'
                b'no'
            b'normal'
                b'yes'

Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


Real-world problems: a) Linear Regression b) Logistic Regression c) Binary Classifier

Exp. No: 4
AIM:- Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.

a) Linear Regression:
1. Predicting the price of a house based on its square footage, number of bedrooms, number of bathrooms, and location.
2. Forecasting the amount of rainfall in a region based on historical data and weather patterns.
3. Estimating the number of bike rentals in a city based on temperature, humidity, and day of the week.
b) Logistic Regression:
1. Identifying the likelihood of a customer defaulting on a loan based on their credit score, income, and employment status.
2. Predicting the probability of a patient developing a particular disease based on their age, sex, and medical history.
3. Determining the chance of a student passing an exam based on their study habits, test scores, and attendance record.
c) Binary Classifier:
1. Identifying the likelihood of a customer defaulting on a loan based on their credit score, income, and employment status.
2. Predicting the probability of a patient developing a particular disease based on their age, sex, and medical history.
3. Determining the chance of a student passing an exam based on their study habits, test scores, and attendance record.
Note that these are just a few examples; many other real-world problems can be solved using these machine learning methods. Note also that success depends on the quality of the data and the feature engineering, so the performance of the model will vary with the dataset.
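Since the manual lists no program for this experiment, the following is a minimal sketch (synthetic data and illustrative feature names, not from the manual) showing how scikit-learn covers all three methods:

# Minimal sketch with synthetic data (illustrative, not from the manual).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# a) Linear Regression: predict house price from square footage
sqft = rng.uniform(500, 3500, size=(200, 1))
price = 50000 + 120 * sqft[:, 0] + rng.normal(0, 20000, size=200)
lin = LinearRegression().fit(sqft, price)
print("Predicted price for 2000 sqft:", lin.predict([[2000]])[0])

# b)/c) Logistic Regression as a binary classifier: loan default from
# credit score and income (labels follow a synthetic rule)
X = rng.uniform(0, 1, size=(200, 2))       # [credit_score, income], scaled to 0-1
y = (X[:, 0] + X[:, 1] < 0.8).astype(int)  # 1 = default
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("Default probability for first test case:", clf.predict_proba(X_test[:1])[0, 1])
print("Test accuracy:", clf.score(X_test, y_test))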

Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


Bias, Variance, Remove Duplicates, Cross Validation

Exp. No: 5
AIM:- Develop a program for Bias, Variance, Remove Duplicates, Cross Validation

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
1. Bias and Variance:

import numpy as np
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Read the data set
df = pd.read_csv("data.csv")
# Split the data into training and testing sets
X = df.drop(columns=['label'])
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train the linear regression model
reg = LinearRegression().fit(X_train, y_train)
# Compute the bias (training error) and variance (extra error on unseen data)
y_train_pred = reg.predict(X_train)
bias = mean_squared_error(y_train, y_train_pred)
y_test_pred = reg.predict(X_test)
variance = mean_squared_error(y_test, y_test_pred) - bias
print("Bias:", bias)
print("Variance:", variance)
This example uses the Scikit-learn library to train a linear regression model and calculate
the bias and variance of the model. The train_test_split function is used to split the data into
training and testing sets, and the LinearRegression class is used to train the model. The
mean_squared_error function is used to calculate the bias and variance of the model by
comparing the predicted values on the training and testing sets with the actual values.
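The snippet above assumes a file data.csv with a numeric column named label. For a self-contained run, synthetic data can be generated instead (an illustrative substitution, not part of the manual):

# Illustrative substitution: create a synthetic data.csv so the listing runs
# without an external file.
import pandas as pd
from sklearn.datasets import make_regression

features, target = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=0)
df = pd.DataFrame(features, columns=['f1', 'f2', 'f3'])
df['label'] = target
df.to_csv("data.csv", index=False)  # pd.read_csv("data.csv") now works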

2. Remove Duplicates:

# Remove duplicates
df = df.drop_duplicates()

This line removes all duplicate rows from the DataFrame df.
3. Cross Validation:
from sklearn.model_selection import cross_val_score
# Cross validation
scores = cross_val_score(reg, X, y, cv=5)
print("Cross validation scores:", scores)
This example uses the cross_val_score function from the Scikit-learn library to perform 5-fold cross validation on the model. The function takes the model, the feature set, and the target variable as input, and returns an array of validation scores. These scores can be used to estimate the model's performance on unseen data.
Note that these are just examples; the specific implementation may vary with the data set and the problem. Depending on the data and the problem, other techniques and metrics may be more appropriate for assessing the performance of the model.

Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


Categorical Encoding, One-hot Encoding

Exp. No: 6
AIM:- Write a program to implement Categorical Encoding and One-hot Encoding

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Categorical encoding is the process of converting categorical variables into numerical
variables in order to use them as input for machine learning algorithms. Two common types of
categorical encoding are:

1. Label Encoding:
from sklearn.preprocessing import LabelEncoder
# Create an instance of LabelEncoder
le = LabelEncoder()
# Fit and transform the categorical column
df['categorical_column'] = le.fit_transform(df['categorical_column'])

This example uses the LabelEncoder class from the Scikit-learn library to convert a
categorical column in a DataFrame into numerical values. The fit_transform method is used to fit
the encoder to the data and transform the column. The encoded values are integers and can be
used as input for machine learning algorithms.

2. One-Hot Encoding:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Create an instance of OneHotEncoder
ohe = OneHotEncoder()
# Fit and transform the categorical column (the encoder expects a 2-D input)
one_hot = ohe.fit_transform(df[['categorical_column']])
# Create a DataFrame from the one-hot encoded array;
# ohe.categories_[0] holds the category labels of the single encoded column
one_hot_df = pd.DataFrame(one_hot.toarray(), columns=ohe.categories_[0])


This example uses the OneHotEncoder class from the Scikit-learn library to convert a
categorical column in a DataFrame into a one-hot encoded array. The fit_transform method is
used to fit the encoder to the data and transform the column. The result is an array with a column
for each category, and the values in the columns are binary indicating the presence or absence of
the category in the original column.
Note that while Label Encoding can be useful in some cases, the model may interpret the encoded values as ordinal when they are in fact nominal. One-Hot Encoding avoids this issue, but it increases the number of features. Consider the problem and the data before choosing an encoding technique.
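As a concrete illustration (hypothetical data, not from the manual), the two encodings applied to one small column:

# Hypothetical mini-example contrasting the two encodings.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

df = pd.DataFrame({'categorical_column': ['red', 'green', 'blue', 'green']})

le = LabelEncoder()
print(le.fit_transform(df['categorical_column']))   # [2 1 0 1]: one integer per category

ohe = OneHotEncoder()
one_hot = ohe.fit_transform(df[['categorical_column']])
print(pd.DataFrame(one_hot.toarray(), columns=ohe.categories_[0]))
# one binary column per category: blue, green, red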

Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


Backpropagation algorithm

Exp. No: 7
AIM:- Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
import numpy as np
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X,axis=0) # maximum of X array longitudinally
y = y/100
#Sigmoid Function
def sigmoid (x):
return 1/(1 + np.exp(-x))
#Derivative of Sigmoid Function
def derivatives_sigmoid(x):
return x * (1 - x)
#Variable initialization
epoch=7000 #Setting training iterations
lr=0.1 #Setting learning rate
inputlayer_neurons = 2 #number of features in data set
hiddenlayer_neurons = 3 #number of hidden layers neurons
output_neurons = 1 #number of neurons at output layer
#weight and bias initialization
wh=np.random.uniform(size=(inputlayer_neurons,hiddenlayer_neurons))
bh=np.random.uniform(size=(1,hiddenlayer_neurons))
wout=np.random.uniform(size=(hiddenlayer_neurons,output_neurons))
bout=np.random.uniform(size=(1,output_neurons))
#draws a random range of numbers uniformly of dim x*y
for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)  # how much hidden layer weights contributed to error
    d_hiddenlayer = EH * hiddengrad
    wout += hlayer_act.T.dot(d_output) * lr  # dot product of next-layer error and current-layer output
    # bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    # bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
Output:

Input:
[[0.66666667 1.        ]
 [0.33333333 0.55555556]
 [1.         0.66666667]]
Actual Output:
[[0.92]
 [0.86]
 [0.89]]

Predicted Output:

[[ 0.89559591]
[ 0.88142069]
[ 0.8928407 ]]
Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


k-Nearest Neighbour algorithm to classify the iris data set

Exp. No: 8
AIM:- Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.

Program:-
import csv
import random
import math
import operator


def loadDataset(filename, split, trainingSet=[], testSet=[]):
    with open(filename, 'r') as csvfile:
        lines = csv.reader(csvfile)
        dataset = list(lines)
        for x in range(len(dataset) - 1):
            for y in range(4):
                dataset[x][y] = float(dataset[x][y])
            if random.random() < split:
                trainingSet.append(dataset[x])
            else:
                testSet.append(dataset[x])


def euclideanDistance(instance1, instance2, length):
    distance = 0
    for x in range(length):
        distance += pow((instance1[x] - instance2[x]), 2)
    return math.sqrt(distance)


def getNeighbors(trainingSet, testInstance, k):
    distances = []
    length = len(testInstance) - 1
    for x in range(len(trainingSet)):
        dist = euclideanDistance(testInstance, trainingSet[x], length)
        distances.append((trainingSet[x], dist))
    distances.sort(key=operator.itemgetter(1))
    neighbors = []
    for x in range(k):
        neighbors.append(distances[x][0])
    return neighbors


def getResponse(neighbors):
    classVotes = {}
    for x in range(len(neighbors)):
        response = neighbors[x][-1]
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    # sort class labels by vote count, highest first
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]


def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0


def main():
    # prepare data
    trainingSet = []
    testSet = []
    split = 0.67
    loadDataset('knndat.data', split, trainingSet, testSet)
    print('Train set: ' + repr(len(trainingSet)))
    print('Test set: ' + repr(len(testSet)))
    # generate predictions
    predictions = []
    k = 3
    for x in range(len(testSet)):
        neighbors = getNeighbors(trainingSet, testSet[x], k)
        result = getResponse(neighbors)
        predictions.append(result)
        print('> predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1]))
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: ' + repr(accuracy) + '%')


main()

OUTPUT

Confusion matrix is as follows
[[11  0  0]
 [ 0  9  1]
 [ 0  1  8]]

Accuracy metrics
             precision    recall  f1-score   support
          0       1.00      1.00      1.00        11
          1       0.90      0.90      0.90        10
          2       0.89      0.89      0.89         9
avg / total       0.93      0.93      0.93        30
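The confusion matrix and per-class metrics above are not printed by the listing itself; they are typically produced with scikit-learn. A sketch of such an addition (assuming the true labels and predictions collected in main() are available):

# Illustrative addition: computing the metrics shown above with scikit-learn.
from sklearn.metrics import confusion_matrix, classification_report

actual = [row[-1] for row in testSet]   # true labels of the test split
print(confusion_matrix(actual, predictions))
print(classification_report(actual, predictions))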

Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


Non-parametric Locally Weighted Regression algorithm

Exp. No: 9
AIM:- Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.

Program:-
from numpy import *
import operator
from os import listdir
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np1
import numpy.linalg as np
from scipy.stats.stats import pearsonr


def kernel(point, xmat, k):
    m, n = np1.shape(xmat)
    weights = np1.mat(np1.eye((m)))
    for j in range(m):
        diff = point - X[j]
        weights[j, j] = np1.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights


def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (X.T * (wei * X)).I * (X.T * (wei * ymat.T))
    return W


def localWeightRegression(xmat, ymat, k):
    m, n = np1.shape(xmat)
    ypred = np1.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred


# load data points
data = pd.read_csv('data10.csv')
bill = np1.array(data.total_bill)
tip = np1.array(data.tip)

# prepare the model matrix: prepend a column of ones to bill
mbill = np1.mat(bill)
mtip = np1.mat(tip)
m = np1.shape(mbill)[1]
one = np1.mat(np1.ones(m))
X = np1.hstack((one.T, mbill.T))

# set k here
ypred = localWeightRegression(X, mtip, 2)

SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

# draw the graph (completing the truncated listing)
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()


Output

(Graph: scatter plot of total_bill versus tip with the fitted locally weighted regression curve overlaid.)
Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


Naïve Bayesian Classifier model

Exp. No: 10
AIM:- Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.

Program:-

import pandas as pd

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

# splitting the dataset into train and test data
from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)

# output of count vectoriser is a sparse matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print(count_vect.get_feature_names())
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names())
print(df)          # tabular representation
print(xtrain_dtm)  # sparse matrix representation

# Training Naive Bayes (NB) classifier on training data.
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)


# printing accuracy metrics
from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest, predicted))
print(metrics.precision_score(ytest, predicted))

'''docs_new = ['I like this place', 'My boss is not my saviour']
X_new_counts = count_vect.transform(docs_new)
predictednew = clf.predict(X_new_counts)
for doc, category in zip(docs_new, predictednew):
    print('%s->%s' % (doc, msg.labelnum[category]))'''

naivetext1.csv:
I love this sandwich,pos
This is an amazing place,pos
I feel very good about these beers,pos
This is my best work,pos
What an awesome view,pos
I do not like this restaurant,neg
I am tired of this stuff,neg
I can't deal with this,neg
He is my sworn enemy,neg
My boss is horrible,neg
This is an awesome place,pos
I do not like the taste of this juice,neg
I love to dance,pos
I am sick and tired of this place,neg
What a great holiday,pos
That is a bad locality to stay,neg
We will have good fun tomorrow,pos
I went to my enemy's house today,neg


OUTPUT

['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'boss', 'can', 'deal',
'do', 'enemy', 'feel', 'fun', 'good', 'have', 'horrible', 'house', 'is', 'like', 'love', 'my',
'not', 'of', 'place', 'restaurant', 'sandwich', 'sick', 'stuff', 'these', 'this', 'tired', 'to',
'today', 'tomorrow', 'very', 'view', 'we', 'went', 'what', 'will', 'with', 'work']
    about  am  amazing  an  and  awesome  beers  best  boss  can ...  today
0       1   0        0   0    0        0      1     0     0    0 ...      0
1       0   0        0   0    0        0      0     1     0    0 ...      0
2       0   0        1   1    0        0      0     0     0    0 ...      0
3       0   0        0   0    0        0      0     0     0    0 ...      1
4       0   0        0   0    0        0      0     0     0    0 ...      0
5       0   1        0   0    1        0      0     0     0    0 ...      0
6       0   0        0   0    0        0      0     0     0    1 ...      0
7       0   0        0   0    0        0      0     0     0    0 ...      0
8       0   1        0   0    0        0      0     0     0    0 ...      0
9       0   0        0   1    0        1      0     0     0    0 ...      0
10      0   0        0   0    0        0      0     0     0    0 ...      0
11      0   0        0   0    0        0      0     0     1    0 ...      0
12      0   0        0   1    0        1      0     0     0    0 ...      0

    tomorrow  very  view  we  went  what  will  with  work
0          0     1     0   0     0     0     0     0     0
1          0     0     0   0     0     0     0     0     1
2          0     0     0   0     0     0     0     0     0
3          0     0     0   0     1     0     0     0     0
4          0     0     0   0     0     0     0     0     0
5          0     0     0   0     0     0     0     0     0
6          0     0     0   0     0     0     0     1     0
7          1     0     0   1     0     0     1     0     0
8          0     0     0   0     0     0     0     0     0
Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


EM algorithm to cluster a set of data stored in a .CSV file

Exp. No: 11
AIM:- Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs  # sklearn.datasets.samples_generator in older versions

X, y_true = make_blobs(n_samples=100, centers=4, cluster_std=0.60, random_state=0)
X = X[:, ::-1]  # flip axes for better plotting

from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=4).fit(X)
labels = gmm.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis')

probs = gmm.predict_proba(X)
print(probs[:5].round(3))

size = 50 * probs.max(1) ** 2  # square emphasizes differences
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=size)

from matplotlib.patches import Ellipse

def draw_ellipse(position, covariance, ax=None, **kwargs):
    """Draw an ellipse with a given position and covariance"""
    ax = ax or plt.gca()
    # Convert covariance to principal axes
    if covariance.shape == (2, 2):
        U, s, Vt = np.linalg.svd(covariance)
        angle = np.degrees(np.arctan2(U[1, 0], U[0, 0]))
        width, height = 2 * np.sqrt(s)
    else:
        angle = 0
        width, height = 2 * np.sqrt(covariance)
    # Draw the ellipse
    for nsig in range(1, 4):
        ax.add_patch(Ellipse(position, nsig * width, nsig * height,
                             angle, **kwargs))

def plot_gmm(gmm, X, label=True, ax=None):
    ax = ax or plt.gca()
    labels = gmm.fit(X).predict(X)
    if label:
        ax.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis', zorder=2)
    else:
        ax.scatter(X[:, 0], X[:, 1], s=40, zorder=2)
    ax.axis('equal')
    w_factor = 0.2 / gmm.weights_.max()
    for pos, covar, w in zip(gmm.means_, gmm.covariances_, gmm.weights_):
        draw_ellipse(pos, covar, alpha=w * w_factor)

gmm = GaussianMixture(n_components=4, random_state=42)
plot_gmm(gmm, X)

gmm = GaussianMixture(n_components=4, covariance_type='full', random_state=42)
plot_gmm(gmm, X)

Output

[[1. 0. 0. 0.]
 [0. 0. 1. 0.]
 [1. 0. 0. 0.]
 [1. 0. 0. 0.]
 [1. 0. 0. 0.]]
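The AIM also asks for a k-Means clustering of the same data for comparison; the manual does not list that part, so the following is a minimal sketch (assuming the same X as above):

# Minimal k-Means comparison sketch (not in the original listing).
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=4, random_state=0).fit(X)
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, s=40, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            c='red', marker='x')  # cluster centres
plt.show()
# k-Means makes hard, spherical assignments, while the Gaussian mixture fitted
# by EM gives soft assignments and can model elongated clusters.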

Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.

Data Analysis for Classification

Exp. No: 12
AIM:- Exploratory Data Analysis for Classification using Pandas or Matplotlib.

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer(as_frame=True).frame
# Print the first 5 rows of the data
print(data.head())
# Print the shape of the data
print(data.shape)
# Print the summary statistics of the data
print(data.describe())
# Print the count of each class
print(data.target.value_counts())
# Plot a bar chart of the target variable
data.target.value_counts().plot(kind='bar')
plt.title('Target Variable Distribution')
plt.xlabel('Class')
plt.ylabel('Count')
plt.show()
# Plot a histogram of the mean radius feature
data['mean radius'].plot(kind='hist', bins=30)
plt.title('Distribution of Mean Radius')
plt.xlabel('Mean Radius')
plt.ylabel('Count')
plt.show()
# Plot a scatter matrix of the data
pd.plotting.scatter_matrix(data, c=data.target, figsize=(15,15), marker='o', hist_kwds={'bins': 20},
alpha=0.5)
plt.show()


Output:-

   mean radius  mean texture  mean perimeter  mean area  ...  worst concave points  worst symmetry  worst fractal dimension  target
0        17.99         10.38          122.80     1001.0  ...                0.2654          0.4601                  0.11890       0
1        20.57         17.77          132.90     1326.0  ...                0.1860          0.2750                  0.08902       0
2        19.69         21.25          130.00     1203.0  ...                0.2430          0.3613                  0.08758       0
3        11.42         20.38           77.58      386.1  ...                   ...             ...                      ...     ...

(first rows of the 31-column breast-cancer dataframe; output truncated)

Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


Construct a Bayesian network

Exp. No: 13
AIM:- Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-

import pymc3 as pm

# observed values
age_obs = 60
gender_obs = 1
bp_obs = 130

# create a dictionary of observed values (not used further in this excerpt)
observed_values = {'age': age_obs, 'gender': gender_obs, 'blood_pressure': bp_obs}

# use the model to make predictions
# (assumes a PyMC3 model named heart_model with variables age, gender and
#  blood_pressure has already been defined; see the sketch below)
with heart_model:
    trace = pm.sample(10000)
    p_heart_disease = pm.Deterministic('p_heart_disease',
        pm.math.sigmoid(age*0.05 + gender*1.0 + blood_pressure*-0.05))
    posterior = pm.sample_posterior_predictive(trace, var_names=['p_heart_disease'],
                                               samples=10000, keep_size=True)

# probability of heart disease
prob_heart_disease = posterior['p_heart_disease'].mean()
print("Probability of heart disease: {:.2f}%".format(prob_heart_disease*100))

Output:
Probability of heart disease: 60.25%
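The listing presupposes a model named heart_model, which the manual does not show. A minimal PyMC3 sketch of such a model (the priors and coefficients here are illustrative assumptions, not taken from the manual):

# Illustrative sketch: a PyMC3 model the listing above could build on.
import pymc3 as pm

with pm.Model() as heart_model:
    age = pm.Normal('age', mu=55, sigma=10)                        # prior over patient age
    gender = pm.Bernoulli('gender', p=0.5)                         # 1 = male, 0 = female
    blood_pressure = pm.Normal('blood_pressure', mu=120, sigma=15)
    # heart-disease probability as a logistic function of the risk factors
    p = pm.math.sigmoid(age * 0.05 + gender * 1.0 + blood_pressure * -0.05)
    heart_disease = pm.Bernoulli('heart_disease', p=p)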

Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


Support Vector Machines and Principal Component Analysis

Exp. No: 14
AIM:- Write a program to implement Support Vector Machines and Principal Component Analysis

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.
Program:-

# import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import svm, datasets
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.metrics import plot_confusion_matrix  # scikit-learn < 1.2; newer versions use ConfusionMatrixDisplay
# load the iris dataset
iris = datasets.load_iris()
# extract the features and target variable
X = iris.data
y = iris.target
# split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# perform PCA
pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
# visualize the PCA transformed data
plt.figure(figsize=(8,6))
plt.scatter(X_train_pca[:,0], X_train_pca[:,1], c=y_train, cmap='plasma')
plt.xlabel('PCA 1')
plt.ylabel('PCA 2')
plt.show()
# train the SVM classifier
svm_classifier = svm.SVC(kernel='linear', C=1, random_state=0)
svm_classifier.fit(X_train_pca, y_train)
# make predictions on the test set
y_pred = svm_classifier.predict(X_test_pca)
# plot the confusion matrix
plot_confusion_matrix(svm_classifier, X_test_pca, y_test, cmap='Blues')
plt.show()

output:

Accuracy: 0.9778
Misclassifications: 1
Confusion Matrix:
[[16 0 0]
[ 0 17 1]
[ 0 0 11]]
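The accuracy and misclassification counts above are not printed by the listing; a small illustrative addition produces them:

# Illustrative addition: print the metrics shown in the output above.
from sklearn.metrics import accuracy_score, confusion_matrix

print("Accuracy: {:.4f}".format(accuracy_score(y_test, y_pred)))
print("Misclassifications:", int((y_test != y_pred).sum()))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))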

Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.


Principal Component Analysis

Exp. No: 15
AIM:- Write a program to implement Principal Component Analysis

Software Required:
Develop the following program using Anaconda / Jupyter / Spyder and evaluate ML models.

Program:-

# import required libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
# load the iris dataset
iris = load_iris()
# extract the features and target variable
X = iris.data
y = iris.target
# perform PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
# visualize the PCA transformed data
plt.figure(figsize=(8,6))
plt.scatter(X_pca[:,0], X_pca[:,1], c=y, cmap='plasma')
plt.xlabel('PCA 1')
plt.ylabel('PCA 2')
plt.show()
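To quantify how much of the original variance the two components retain, the standard explained_variance_ratio_ attribute can be inspected (an illustrative extension to the listing):

# Illustrative extension: variance captured by the two principal components.
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total variance retained: {:.2%}".format(pca.explained_variance_ratio_.sum()))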


Result:
The program was executed successfully and the output verified.
References:-
Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
