AI and ML Lab Manual
Step 1: Import necessary libraries and load the Data from dataset to a dataframe
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('spam.csv', encoding='latin-1')
data.head(5)
OUTPUT:
Step 2: Analyze the Data
PROGRAM:
data.info()
OUTPUT:
Step 5: Split the Data into two parts train and test
PROGRAM:
from sklearn.model_selection import train_test_split
y=data["v1"]
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(X_train.shape)
print(X_test.shape)
OUTPUT:
(4179, 8404)
(1393, 8404)
Step 6: Create and Train the Model with Training Data using Multinomial Naive
Bayes Algorithm
PROGRAM:
from sklearn.naive_bayes import MultinomialNB
mnb=MultinomialNB() #Created a model
mnb.fit(X_train,y_train) #Train the model
OUTPUT:
MultinomialNB()
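The training step above relies on X already holding word counts produced in the earlier vectorization steps. The flow from raw text to a prediction for a new message can be sketched as follows. This is a minimal sketch, not the manual's program: the four-message corpus is made up to stand in for spam.csv, and CountVectorizer is assumed as the vectorizer.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus standing in for the text and label columns of spam.csv
messages = [
    "win a free prize now",
    "free cash claim your prize",
    "are we meeting for lunch today",
    "see you at the lab tomorrow",
]
labels = ["spam", "spam", "ham", "ham"]

# Turn each message into a vector of word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(messages)

# Train Multinomial Naive Bayes on the count vectors
toy_mnb = MultinomialNB()
toy_mnb.fit(X_counts, labels)

# New messages must be transformed with the same fitted vectorizer
new_messages = ["claim your free prize", "lunch at the lab today"]
predictions = toy_mnb.predict(vectorizer.transform(new_messages))
print(predictions)
```

Note that the new messages go through transform, not fit_transform, so that they are mapped onto the vocabulary learned from the training data.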
Result:
Created a Machine Learning model using the Multinomial Naïve Bayes
Algorithm, tested the accuracy of the algorithm and predicted the output for new
unclassified data.
Exercise 4: Write a program to construct a Gaussian Naïve Bayes Model
considering medical data. Use this model to demonstrate the diagnosis of
heart patients using standard Heart Disease Data Set.
Data Set: Heart Disease Dataset
Link to Download the Data set: https://www.kaggle.com/johnsmith88/heart-disease-dataset
Aim:
To construct a Gaussian Naïve Bayes Model considering medical data
and use this model to demonstrate the diagnosis of heart patients using
standard Heart Disease Data Set
Procedural Steps:
1. Import necessary libraries and load the Data from dataset to a dataframe
2. Analyze the dataset
3. Split the Data into two parts train and test
4. Create and Train the Model with Training Data using Gaussian Naive Bayes
Algorithm and Find the Accuracy of Model
5. Confusion Matrix
6. Plot Confusion Matrix
7. Classification Report
Understanding the Dataset
age (#)
sex : 1 = Male, 0 = Female (Binary)
(cp) chest pain [type (4 values, Ordinal)]: 1: typical angina, 2: atypical angina,
3: non-anginal pain, 4: asymptomatic
(trestbps) resting blood pressure (#)
(chol) serum cholesterol in mg/dl (#)
(fbs) fasting blood sugar > 120 mg/dl (Binary) [1 = true; 0 = false]
(restecg) resting electrocardiographic results [values 0,1,2]
(thalach) maximum heart rate achieved (#)
(exang) exercise induced angina (Binary) [1 = yes; 0 = no]
(oldpeak) ST depression induced by exercise relative to rest (#)
(slope) of the peak exercise ST segment (Ordinal) [1: upsloping, 2: flat, 3:
downsloping]
(ca) number of major vessels (0-3, Ordinal) colored by fluoroscopy
(thal) defect type (Ordinal) [3 = normal; 6 = fixed defect; 7 = reversible
defect]
Step 1: Import necessary libraries and load the Data from dataset to a dataframe
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
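# Note: plot_confusion_matrix was removed in scikit-learn 1.2; newer versions provide ConfusionMatrixDisplay instead
from sklearn.metrics import confusion_matrix,accuracy_score,classification_report,plot_confusion_matrix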
from sklearn.naive_bayes import GaussianNB
data = pd.read_csv('heart.csv')
data.head()
OUTPUT:
Step 4: Create and Train the Model with Training Data using Gaussian Naive Bayes
Algorithm and Find the Accuracy of Model
PROGRAM:
nb = GaussianNB()
nb.fit(X_train,y_train)
nbpred = nb.predict(X_test)
nb_acc_score = accuracy_score(y_test, nbpred)
print("Accuracy of Naive Bayes model:",nb_acc_score*100,'\n')
OUTPUT:
Accuracy of Naive Bayes model: 80.97560975609757
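The procedural steps list a confusion matrix after the accuracy check. How the accuracy relates to the confusion matrix can be sketched as follows. This is a minimal sketch under stated assumptions: the data is synthetic (two Gaussian blobs generated with NumPy) and stands in for heart.csv, and all variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data (an assumption; it stands in for heart.csv)
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(0.0, 1.0, (100, 3)), rng.normal(2.0, 1.0, (100, 3))])
y_demo = np.array([0] * 100 + [1] * 100)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

demo_nb = GaussianNB().fit(Xd_train, yd_train)
demo_pred = demo_nb.predict(Xd_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(yd_test, demo_pred)
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of predictions that fall on the diagonal
acc_from_cm = (tn + tp) / cm.sum()
print(cm)
print("Accuracy from confusion matrix:", acc_from_cm)
print("accuracy_score:", accuracy_score(yd_test, demo_pred))
```

For a diagnosis task the off-diagonal cells matter separately: fn counts patients with the disease that the model missed, while fp counts healthy patients flagged as diseased.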
Result:
Created a Machine Learning model using Gaussian Naïve Bayes Algorithm,
and tested the accuracy of the algorithm.
Exercise 5: Write a program to implement k-Nearest Neighbor algorithm to
classify the data set
Data Set: Pima Indians Diabetes Database
Link to Download the Data set: https://www.kaggle.com/uciml/pima-indians-diabetes-database
Aim:
To write a program to implement k-Nearest Neighbor algorithm to
classify the data set
Procedural Steps:
1. Import necessary libraries and load the Data from dataset to a dataframe
2. Analyze the dataset
3. Split the Data into two parts train and test
4. Calculate Error rates for different k values using k-Nearest Neighbor
algorithm
5. Plot the error rates to find optimum k value
6. Create and Train the Model with Training Data using k-Nearest
Neighbor algorithm with optimal k value and find the accuracy
7. Print the Classification Report
Understanding the Dataset
Pregnancies - Number of times pregnant
Glucose - Plasma glucose concentration at 2 hours in an oral glucose tolerance
test
BloodPressure - Diastolic blood pressure (mm Hg)
SkinThickness - Triceps skin fold thickness (mm)
Insulin - 2-Hour serum insulin (mu U/ml)
BMI - Body mass index (weight in kg/(height in m)^2)
DiabetesPedigreeFunction - Diabetes pedigree function
Age - Age (years)
Outcome - Class variable (0 or 1); 268 of 768 are 1, the others are 0
Step 1: Import necessary libraries and load the Data from dataset to a dataframe
PROGRAM:
#Load the necessary python libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix,classification_report
from sklearn.neighbors import KNeighborsClassifier
#Load the dataset
df = pd.read_csv('diabetes.csv')
#Print the first 5 rows of the dataframe.
df.head()
OUTPUT:
Step 4: Calculate Error rates for different k values using k-Nearest Neighbor
algorithm
PROGRAM:
#Finding Error Rates while number of neighbors varies from 1 to 39
error_rate = []
for i in range(1,40):
    knn = KNeighborsClassifier(n_neighbors=i)
    knn.fit(X_train,y_train)
    pred_i = knn.predict(X_test)
    error_rate.append(np.mean(pred_i != y_test))
print(error_rate)
OUTPUT:
[0.265625, 0.234375, 0.234375, 0.22395833333333334,
0.22395833333333334, 0.21875, 0.21875, 0.21354166666666666,
0.22395833333333334, 0.19270833333333334, 0.20833333333333334,
0.20833333333333334, 0.21354166666666666, 0.19791666666666666, 0.203125,
0.19270833333333334, 0.203125, 0.20833333333333334, 0.19791666666666666,
0.21354166666666666, 0.21875, 0.22395833333333334, 0.22395833333333334,
0.23958333333333334, 0.23958333333333334, 0.23958333333333334,
0.23958333333333334, 0.2552083333333333, 0.234375, 0.22395833333333334,
0.234375, 0.22395833333333334, 0.234375, 0.22916666666666666, 0.234375,
0.22916666666666666, 0.25, 0.22916666666666666, 0.24479166666666666]
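Given a list of error rates like the one above, the optimum k is the value with the lowest error. One way to pick it programmatically can be sketched as follows. This is a minimal sketch under stated assumptions: a synthetic dataset from make_classification stands in for diabetes.csv, and the variable names are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the diabetes data (an assumption for illustration)
X_syn, y_syn = make_classification(n_samples=300, n_features=8, random_state=0)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.25, random_state=0
)

k_values = list(range(1, 40))
demo_error_rate = []
for k in k_values:
    knn_k = KNeighborsClassifier(n_neighbors=k)
    knn_k.fit(Xs_train, ys_train)
    demo_error_rate.append(np.mean(knn_k.predict(Xs_test) != ys_test))

# np.argmin returns the position of the first minimum; map it back to k
best_k = k_values[int(np.argmin(demo_error_rate))]
print("Lowest error rate:", min(demo_error_rate), "at k =", best_k)
```

Choosing k on the test set tends to give an optimistic accuracy estimate; a separate validation split or cross-validation is a common alternative.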
print(classification_report(y_test,pred))
OUTPUT:
Result:
Created a Machine Learning model using k-Nearest Neighbor Algorithm,
and tested the accuracy of the algorithm.
Exercise 6: Implement Support Vector Classification to classify the Iris
Species
Data Set: Iris Species
Link to Download the Data set: https://www.kaggle.com/uciml/iris
Aim:
Implement Support Vector Classification to classify the Iris Species
Procedural Steps:
1. Import necessary libraries and load the Data from dataset to a dataframe
2. Analyze the dataset
3. Visualize the data using pairplot
4. Split the Data into two parts train and test
5. Create and Train the Model with Training Data using Support Vector
Classification algorithm and find the accuracy
6. Print the Confusion Matrix
7. Print the Classification Report
Understanding the Dataset
The dataset is a CSV file which contains a set of 150 records under 5 attributes - Petal
Length, Petal Width, Sepal Length, Sepal Width and Class (Species)
Step 1: Import necessary libraries and load the Data from dataset to a dataframe
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix
iris=pd.read_csv('Iris.csv')
iris.head()
OUTPUT:
Step 4: Split the Data into two parts train and test
PROGRAM:
# Separating the independent variables from dependent variables
X=iris.iloc[:,1:5]
y=iris.iloc[:,5]
X_train,X_test, y_train, y_test=train_test_split(X,y,test_size=0.30)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
OUTPUT:
(105, 4) (45, 4) (105,) (45,)
Step 5: Create and Train the Model with Training Data using Support Vector
Classification algorithm and find the accuracy.
PROGRAM:
model=SVC()
model.fit(X_train, y_train)
pred=model.predict(X_test)
print(round(model.score(X_test,y_test)*100,2),"%")
OUTPUT:
95.56 %
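The procedural steps also call for a confusion matrix and a classification report for the trained classifier. These can be sketched as follows. This is a minimal sketch under stated assumptions: it loads the data with load_iris from scikit-learn instead of reading Iris.csv, so no download is needed, and the fixed random_state is added only to make the split repeatable.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# load_iris is used here instead of Iris.csv (an assumption for illustration)
iris_data = load_iris()
Xi_train, Xi_test, yi_train, yi_test = train_test_split(
    iris_data.data, iris_data.target,
    test_size=0.30, random_state=0, stratify=iris_data.target,
)

svc_demo = SVC()
svc_demo.fit(Xi_train, yi_train)
svc_pred = svc_demo.predict(Xi_test)

# 3x3 matrix: rows are true species, columns are predicted species
svc_cm = confusion_matrix(yi_test, svc_pred)
print(svc_cm)

# Per-species precision, recall and F1-score
print(classification_report(yi_test, svc_pred, target_names=iris_data.target_names))
```

The per-species rows of the report show which species are confused with each other, which a single accuracy figure hides.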
Result:
Created a Support Vector Classification model and tested the accuracy of the
algorithm.
Exercise 7: Apply EM algorithm to cluster a set of data stored in a .CSV file.
The E-step computes, for each data point, the probability that it belongs to each
cluster under the current parameters. The M-step then updates the cluster
parameters from those probabilities: the mean, covariance, and density (mixing
weight) of each cluster are recalculated from the data points as weighted in the
E-step. The two steps are repeated, with the values updated each time, until
convergence is reached.
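The alternation between the two steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the manual's program: it uses synthetic one-dimensional data with two clusters in place of the CSV file, so the covariance reduces to a variance, and the helper gaussian_pdf and all variable names are illustrative.

```python
import numpy as np

# Synthetic 1-D data from two Gaussians (an assumption; stands in for the CSV)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(6.0, 1.0, 200)])

# Initial guesses for the two clusters
means = np.array([x.min(), x.max()])
variances = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

def gaussian_pdf(values, mean, variance):
    """Density of a 1-D Gaussian evaluated at each value."""
    return np.exp(-((values - mean) ** 2) / (2 * variance)) / np.sqrt(2 * np.pi * variance)

prev_log_likelihood = -np.inf
for iteration in range(200):
    # E-step: responsibility of each cluster for each point
    weighted = np.stack(
        [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    )
    total = weighted.sum(axis=0)
    resp = weighted / total

    # M-step: re-estimate weight (density), mean and variance per cluster
    n_k = resp.sum(axis=1)
    weights = n_k / len(x)
    means = (resp * x).sum(axis=1) / n_k
    variances = (resp * (x - means[:, None]) ** 2).sum(axis=1) / n_k

    # Stop once the log-likelihood no longer improves noticeably
    log_likelihood = np.log(total).sum()
    if log_likelihood - prev_log_likelihood < 1e-6:
        break
    prev_log_likelihood = log_likelihood

# Assign each point to the cluster with the highest responsibility
em_labels = resp.argmax(axis=0)
print("means:", means, "weights:", weights)
```

In practice, scikit-learn's GaussianMixture class runs the same kind of iteration for multi-dimensional data with full covariance matrices.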
PROGRAM:
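import matplotlib.pyplot as plt
import pandas as pd
from sklearn.mixture import GaussianMixture
data = pd.read_csv('Clustering_gmm.csv')
# Assumed missing step: fit a 4-component mixture (one per colour) by EM
labels = GaussianMixture(n_components=4).fit_predict(data)
frame = data.copy()
frame['cluster'] = labels
color=['blue','green','cyan', 'black']
# Assumes the first two columns are the features to plot
plt.scatter(frame.iloc[:, 0], frame.iloc[:, 1], c=[color[k] for k in labels])
plt.show()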
OUTPUT:
Exercise 8: Apply the technique of pruning and derive the decision tree from
this data. Analyze the results by comparing the structure of pruned and
unpruned tree.
Procedure :
1. Download the above data set
2. Download and install the software graphviz for Windows 10 using the link
https://graphviz.org/download/
3. Install the libraries pydotplus and graphviz
4. Import required libraries
5. Train the data with DecisionTreeClassifier
6. Visualize Decision Tree using graphviz
7. Maximum depth of the tree can be used as a control variable for pre-pruning
8. The classification rate is increased due to pre-pruning
PROGRAM:
# Load libraries
import pandas as pd
from sklearn.tree import DecisionTreeClassifier # Import Decision Tree Classifier
from sklearn.model_selection import train_test_split # Import train_test_split function
from sklearn import metrics #Import scikit-learn metrics module for accuracy calculation
pima = pd.read_csv("diabetes.csv")
pima.head()
OUTPUT:
Accuracy: 67.53
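# Imports needed for exporting and displaying the tree
from io import StringIO
from IPython.display import Image
from sklearn.tree import export_graphviz
import pydotplus
dot_data = StringIO()
export_graphviz(clf, out_file=dot_data,
                filled=True, rounded=True,
                feature_names = feature_cols,class_names=['0','1'])
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('diabetes.png')
Image(graph.create_png())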
OUTPUT:
5. In scikit-learn, a decision tree classifier can be optimized by
pre-pruning (newer versions also support cost-complexity post-pruning
through the ccp_alpha parameter). The maximum depth of the tree can be
used as a control variable for pre-pruning. In the following example, you
can plot a decision tree on the same data with max_depth=3.
# Create Decision Tree classifier object
clf = DecisionTreeClassifier(max_depth=3)
OUTPUT:
Accuracy: 75.76
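The exercise asks for a comparison of the structure of the pruned and unpruned trees, beyond their accuracy. One way to compare them numerically can be sketched as follows. This is a minimal sketch under stated assumptions: a synthetic dataset from make_classification stands in for diabetes.csv, so the printed numbers will differ from the outputs above, and the variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for diabetes.csv (an assumption for illustration)
X_tree, y_tree = make_classification(
    n_samples=500, n_features=8, n_informative=4, flip_y=0.1, random_state=1
)
Xt_train, Xt_test, yt_train, yt_test = train_test_split(
    X_tree, y_tree, test_size=0.3, random_state=1
)

# Same data and seed; only the depth limit differs
unpruned = DecisionTreeClassifier(random_state=1).fit(Xt_train, yt_train)
pruned = DecisionTreeClassifier(max_depth=3, random_state=1).fit(Xt_train, yt_train)

for name, tree in [("unpruned", unpruned), ("pre-pruned", pruned)]:
    print(
        name,
        "depth:", tree.get_depth(),
        "nodes:", tree.tree_.node_count,
        "leaves:", tree.get_n_leaves(),
        "test accuracy:", round(tree.score(Xt_test, yt_test) * 100, 2),
    )
```

A tree limited to depth 3 has at most 15 nodes, which is what makes the pruned graphviz plot readable compared with the unpruned one.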