
BONAM VENKATA CHALAMAYYA ENGINEERING COLLEGE

(AUTONOMOUS)
APPROVED BY AICTE, NEW DELHI, AFFILIATED TO JNTUK KAKINADA
ODALAREVU, ALLAVARAL MANDAL, DR.B.R AMBEDKAR KONASEEMA DISTRICT, ANDHRA PRADESH – 533210.

DEEP LEARNING WITH TENSORFLOW LAB


Laboratory Manual
Prepared By:

B. Satish

K. Sai Ram

T. Vihar Ram

B. Navaneeth Krishna

S. Durga Prasad

III B.Tech II Semester


(BR20)

DEPARTMENT OF CSE-AI&ML
III YEAR II SEM    Code: 20AD6L04    L-T-P-C: 0-0-3-1.5
DEEP LEARNING WITH TENSORFLOW LAB
Course Outcomes:
On completion of this course, the student will be able to:
• Implement deep neural networks to solve real-world problems
• Choose an appropriate pre-trained model to solve a real-time problem
• Interpret the results of two different deep learning models
List of Experiments:
1. Implement multilayer perceptron algorithm for MNIST Handwritten Digit Classification.
2. Design a neural network for classifying movie reviews (Binary Classification) using IMDB dataset.
3. Design a neural network for classifying newswires (Multi-class Classification) using Reuters dataset.
4. Design a neural network for predicting house prices using Boston Housing Price dataset.
5. Build a Convolutional Neural Network for MNIST Handwritten Digit Classification.
6. Build a Convolutional Neural Network for simple image (Dogs and Cats) Classification.
7. Use a pre-trained Convolutional Neural Network (VGG16) for image classification.
8. Implement one-hot encoding of words or characters.
9. Implement word embeddings for IMDB dataset.
10. Implement a Recurrent Neural Network for IMDB movie review classification problem.
Text Books:
1. Reza Zadeh and Bharath Ramsundar, "TensorFlow for Deep Learning", O'Reilly Publishers, 2018
References:
1. https://github.com/fchollet/deep-learning-with-python-notebooks
TABLE OF CONTENTS

Sl.No  Name of Experiment
1.  Implement multilayer perceptron algorithm for MNIST Handwritten Digit Classification.
2.  Design a neural network for classifying movie reviews (Binary Classification) using IMDB dataset.
3.  Design a neural network for classifying newswires (Multi-class Classification) using Reuters dataset.
4.  Design a neural network for predicting house prices using Boston Housing Price dataset.
5.  Build a Convolutional Neural Network for MNIST Handwritten Digit Classification.
6.  Build a Convolutional Neural Network for simple image (Dogs and Cats) Classification.
7.  Use a pre-trained Convolutional Neural Network (VGG16) for image classification.
8.  Implement one-hot encoding of words or characters.
9.  Implement word embeddings for IMDB dataset.
10. Implement a Recurrent Neural Network for IMDB movie review classification problem.
EXPERIMENT:1

IMPLEMENT MULTILAYER PERCEPTRON ALGORITHM FOR MNIST HANDWRITTEN DIGIT CLASSIFICATION.

Aim: Implement multilayer perceptron algorithm for MNIST Handwritten Digit Classification.
Description:
• Handwritten digit recognition using the MNIST dataset is a significant project built with the help of neural networks. It is designed to detect scanned images of handwritten digits.
• We have taken this concept a step further by enhancing our handwritten digit recognition system to not only identify scanned images but also allow users to write digits directly on the screen using an integrated GUI for real-time recognition.
• The MNIST dataset (Modified National Institute of Standards and Technology) is a comprehensive collection of handwritten digits (0-9) widely used for training and testing machine learning models, especially in image classification and deep learning.
Key Features of MNIST:
• Contains 60,000 training images and 10,000 test images.
• Each image is 28×28 pixels in grayscale (values ranging from 0 to 255).
• Labels range from 0 to 9, representing the corresponding digit in the image.
• Commonly used for benchmarking neural network architectures like MLPs, CNNs, and RNNs.
PROCESS:
1. Importing libraries
The necessary libraries such as TensorFlow, Keras, NumPy, and Matplotlib are imported. These libraries help in defining the model, handling data, and visualizing results.

2. Loading and preprocessing the dataset
A dataset (e.g., MNIST, IMDB sentiment analysis, or Reuters news classification) is loaded. Data is split into training and testing sets. Features are normalized (scaling pixel values or encoding text data). Labels are one-hot encoded if it is a classification task.

3. Defining the MLP model
A sequential model is created using keras.Sequential(). Layers are added:
Input Layer: Specifies the input shape.
Hidden Layers: Fully connected (dense) layers with activation functions (e.g., ReLU).
Output Layer: Uses an activation like Softmax (for multi-class classification) or Sigmoid (for binary classification).

4. Compiling the model
The model is compiled with a loss function (e.g., categorical_crossentropy for classification), an optimizer (e.g., Adam, SGD), and metrics like accuracy to monitor performance.

5. Training the model
The model is trained using the fit() function. Training happens for a specified number of epochs with a batch size. Validation data is used to monitor performance.

6. Evaluating the model
The trained model is tested on unseen data using evaluate(). Performance metrics such as accuracy, loss, precision, and recall are analyzed.

7. Making predictions
New samples are passed through the model using predict(). The predictions are compared with actual labels.

8. Visualizing results
Accuracy and loss curves are plotted using Matplotlib. A confusion matrix and classification report are generated.

9. Saving and loading the model
The trained model is saved (model.save()) for future use. It can be loaded later using keras.models.load_model().
In [1]:

import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Activation
import matplotlib.pyplot as plt

In [2]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz

11490434/11490434 ━━━━━━━━━━━━━━━━━━━━ 8s 1us/step

In [3]:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

In [4]:
gray_scale = 255
x_train /= gray_scale
x_test /= gray_scale

In [5]:
print("Feature matrix:", x_train.shape)
print("Target matrix:", x_test.shape)
print("Feature matrix:", y_train.shape)
print("Target matrix:", y_test.shape)

Feature matrix: (60000, 28, 28)


Target matrix: (10000, 28, 28)
Feature matrix: (60000,)
Target matrix: (10000,)

In [6]:
fig, ax = plt.subplots(10, 10)
k = 0
for i in range(10):
    for j in range(10):
        ax[i][j].imshow(x_train[k].reshape(28, 28), aspect='auto')
        k += 1
plt.show()
In [7]:
model = Sequential([
Flatten(input_shape=(28, 28)),
Dense(256, activation='sigmoid'),
Dense(128, activation='sigmoid'),
Dense(10, activation='sigmoid'),
])

C:\Users\reddy\anaconda3\Lib\site-packages\keras\src\layers\reshaping\flatten.py:37: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(**kwargs)

In [8]:
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])

In [9]:
model.fit(x_train, y_train, epochs=10,
batch_size=2000,
validation_split=0.2)

Epoch 1/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 2s 39ms/step - accuracy: 0.2255 - loss: 2.2490 - val_accuracy: 0.6957 - val_loss: 1.7442
Epoch 2/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 24ms/step - accuracy: 0.6951 - loss: 1.5704 - val_accuracy: 0.7732 - val_loss: 1.0563
Epoch 3/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.7818 - val_accuracy: 0.8565 - val_loss: 0.6860
Epoch 4/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 23ms/step - accuracy: 0.8551 - val_accuracy: 0.8866 - val_loss: 0.5025
Epoch 5/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.8828 - val_accuracy: 0.9003 - val_loss: 0.4083
Epoch 6/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 19ms/step - accuracy: 0.8973 - val_accuracy: 0.9079 - val_loss: 0.3549
Epoch 7/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.9047 - val_accuracy: 0.9147 - val_loss: 0.3211
Epoch 8/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 0s 18ms/step - accuracy: 0.9108 - val_accuracy: 0.9187 - val_loss: 0.2974
Epoch 9/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 26ms/step - accuracy: 0.9181 - val_accuracy: 0.9242 - val_loss: 0.2780
Epoch 10/10
24/24 ━━━━━━━━━━━━━━━━━━━━ 1s 27ms/step - accuracy: 0.9222 - val_accuracy: 0.9267 - val_loss: 0.2628
Out[10]:
<keras.src.callbacks.history.History at 0x24bb73fcec0>

In [24]:
results = model.evaluate(x_test, y_test, verbose=0)
print('test loss, test acc:', results)

test loss, test acc: [0.2701019048690796, 0.9233999848365784]

In [ ]:
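
Steps 7-9 of the process above (making predictions, visualizing results, and saving the model) are not shown in the notebook. A minimal sketch of those steps, assuming the model, x_test, and y_test objects defined above (the file name mnist_mlp.keras is a hypothetical choice):

import numpy as np
import matplotlib.pyplot as plt

# Step 7: predict class probabilities for the test set and take the argmax as the label
probs = model.predict(x_test)
pred_labels = np.argmax(probs, axis=1)
print("First 10 predicted labels:", pred_labels[:10])
print("First 10 actual labels:   ", y_test[:10])

# Step 8: visualize one test image together with its predicted and actual label
plt.imshow(x_test[0].reshape(28, 28), cmap='gray')
plt.title(f"Predicted: {pred_labels[0]}, Actual: {y_test[0]}")
plt.show()

# Step 9: save the trained model and load it back later
model.save("mnist_mlp.keras")
reloaded = tf.keras.models.load_model("mnist_mlp.keras")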
EXPERIMENT:2

DESIGN A NEURAL NETWORK FOR CLASSIFYING MOVIE REVIEWS (BINARY CLASSIFICATION) USING IMDB DATASET.

AIM: Design a neural network for classifying movie reviews (Binary Classification) using IMDB dataset.

Description:
• The IMDB dataset is a popular benchmark dataset for binary sentiment classification, where movie reviews are categorized as either positive (1) or negative (0). It is widely utilized in natural language processing (NLP) tasks, particularly for sentiment analysis.
• Dataset Overview:
  o Total Size: 50,000 movie reviews
  o Training Set: 25,000 reviews
  o Test Set: 25,000 reviews
  Labels:
  o 1 → Positive review
  o 0 → Negative review
This dataset serves as a fundamental resource for training and evaluating machine learning models in sentiment classification.
Process:
STEP 1: Loading the dataset
STEP 2: Decoding the reviews
STEP 3: Padding the examples
STEP 4: Creating and training the model
STEP 5: Predictions and evaluation

We will be using:
• TensorFlow
• Keras
• IMDB dataset

TensorFlow: An open-source machine learning framework developed by Google for deep learning and numerical computation.
Keras: A high-level neural networks API, built on top of TensorFlow, that simplifies deep learning model development.
IMDB Dataset: A dataset of 50,000 movie reviews labeled as positive or negative, commonly used for sentiment analysis.

[1]: from keras.datasets import imdb

[2]: # Load the data, keeping only 10,000 of the most frequently occurring words
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz
17464789/17464789 ━━━━━━━━━━━━━━━━━━━━ 60s 3us/step

[3]:
# Since we restricted ourselves to the top 10000 frequent words, no word index should exceed 10000
# we'll verify this below

# Here is a list of maximum indexes in every review --- we search the maximum index in this list of max indexes
print(type([max(sequence) for sequence in train_data]))

# Find the maximum of all max indexes
max([max(sequence) for sequence in train_data])

<class 'list'>
[3]: 9999
[4]: # Let's quickly decode a review

# step 1: load the dictionary mappings from word to integer index
word_index = imdb.get_word_index()

# step 2: reverse word index to map integer indexes to their respective words
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

# step 3: decode the review, mapping integer indices to words
#
# indices are off by 3 because 0, 1, and 2 are reserved indices for "padding", "start of sequence" and "unknown"
decoded_review = ' '.join([reverse_word_index.get(i - 3, '?') for i in train_data[0]])
STEP 3:

STEP 4:

STEP 5:

STEP 6:
PROGRAM:
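
The original program and output for this experiment were included as screenshots that are not reproduced here. A minimal sketch of the binary classifier described above, assuming the train_data/test_data loaded earlier with a 10,000-word vocabulary (the layer sizes and 4-epoch schedule are illustrative choices, not the manual's exact settings):

import numpy as np
from tensorflow.keras import models, layers

# Multi-hot encode each review into a 10,000-dimensional 0/1 vector
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')

# Two hidden ReLU layers and a sigmoid output for binary classification
model = models.Sequential([
    layers.Dense(16, activation='relu'),
    layers.Dense(16, activation='relu'),
    layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# Train with a validation split, then evaluate on the held-out test set
model.fit(x_train, y_train, epochs=4, batch_size=512, validation_split=0.2)
print(model.evaluate(x_test, y_test))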

OUTPUT:

EXPERIMENT:3

Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate
data set for building the decision tree and apply this knowledge to classify a new sample.

AIM:

Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate
data set for building the decision tree and apply this knowledge to classify a new sample.

Description:

The ID3 (Iterative Dichotomiser 3) algorithm builds a decision tree top-down: at each node it selects the attribute with the highest information gain (the largest reduction in entropy), splits the data on that attribute, and recurses until the examples in a node all share the same class. The resulting tree can then classify a new sample by following the branches that match its attribute values.
Training Dataset:

Test Dataset:

Program:
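
The training dataset, test dataset, program, and output for this experiment were provided as screenshots that are not reproduced here. A minimal sketch using a hypothetical PlayTennis-style dataset and scikit-learn's DecisionTreeClassifier with the entropy criterion (information gain) as a stand-in for a hand-written ID3:

import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Classic PlayTennis-style dataset (illustrative values only)
data = pd.DataFrame({
    'Outlook': ['Sunny','Sunny','Overcast','Rain','Rain','Rain','Overcast','Sunny','Sunny','Rain','Sunny','Overcast','Overcast','Rain'],
    'Temperature': ['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild'],
    'Humidity': ['High','High','High','High','Normal','Normal','Normal','High','Normal','Normal','Normal','High','Normal','High'],
    'Wind': ['Weak','Strong','Weak','Weak','Weak','Strong','Strong','Weak','Weak','Weak','Strong','Strong','Weak','Strong'],
    'PlayTennis': ['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No'],
})

# Encode categorical attributes as integers so scikit-learn can handle them
encoders = {col: LabelEncoder().fit(data[col]) for col in data.columns}
X = pd.DataFrame({col: encoders[col].transform(data[col]) for col in data.columns[:-1]})
y = encoders['PlayTennis'].transform(data['PlayTennis'])

# criterion='entropy' makes the splits information-gain based, as in ID3
tree = DecisionTreeClassifier(criterion='entropy')
tree.fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

# Classify a new sample: Outlook=Sunny, Temperature=Cool, Humidity=High, Wind=Strong
sample = pd.DataFrame([{col: encoders[col].transform([val])[0]
                        for col, val in zip(X.columns, ['Sunny', 'Cool', 'High', 'Strong'])}])
print(encoders['PlayTennis'].inverse_transform(tree.predict(sample)))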

OUTPUT:

EXPERIMENT:4

Exercises to solve the real-world problems using the following machine learning methods: a) Linear
Regression b) Logistic Regression c) Binary Classifier

AIM:

Exercises to solve the real-world problems using the following machine learning methods: a) Linear
Regression b) Logistic Regression c) Binary Classifier

Description:
Linear Regression:

Simple Linear Regression is a type of Regression algorithms that models the relationship
between a dependent variable and a single independent variable. The relationship shown by
a Simple Linear Regression model is linear or a sloped straight line, hence it is called Simple
Linear Regression.
The key point in Simple Linear Regression is that the dependent variable must be a
continuous/real value. However, the independent variable can be measured on continuous
or categorical values.

Here we are taking a dataset that has two variables: salary (dependent variable) and experience (independent variable). The goals of this problem are:

• We want to find out if there is any correlation between these two variables.
• We will find the best-fit line for the dataset.
• We will see how the dependent variable changes as the independent variable changes.

Procedure:

Step 1: Data Pre-processing
Step 2: Fitting the Simple Linear Regression to the Training Set
Step 3: Prediction of test set results
Step 4: Visualizing the Training set results
Step 5: Visualizing the Test set results

Program:

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

import os

os.getcwd()

os.chdir("/content/drive/MyDrive/Datasets")

data_set= pd.read_csv('Salary_Data.csv')

data_set

x= data_set.iloc[:, :-1].values

y= data_set.iloc[:, 1].values

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 1/3, random_state=0)

x_test

y_test

x_train

y_train

from sklearn.linear_model import LinearRegression

regressor= LinearRegression()

regressor.fit(x_train, y_train)

y_pred= regressor.predict(x_test)

x_pred= regressor.predict(x_train)

plt.scatter(x_train, y_train, color="green")

plt.plot(x_train, x_pred, color="red")

plt.title("Salary vs Experience (Training Dataset)")

plt.xlabel("Years of Experience")

plt.ylabel("Salary(In Rupees)")

plt.show()

OUTPUT:

plt.scatter(x_test, y_test, color="blue")

plt.plot(x_train, x_pred, color="red")
plt.title("Salary vs Experience (Test Dataset)")

plt.xlabel("Years of Experience")

plt.ylabel("Salary(In Rupees)")

plt.show()

OUTPUT:

Logistic Regression:

• Logistic regression is one of the most popular Machine Learning algorithms, which comes under the Supervised Learning technique. It is used for predicting the categorical dependent variable using a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact value as 0 and 1, it gives probabilistic values which lie between 0 and 1.
• Logistic Regression is similar to Linear Regression except in how the two are used. Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
• In Logistic Regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).
Logistic Function (Sigmoid Function):

f(x) = 1 / (1 + e^(-x))

The sigmoid maps any real value into the range (0, 1); values above a chosen threshold (e.g., 0.5) are classified as 1, and values below it as 0.
Logistic Regression Equation:
o We know the equation of a straight line can be written as:

    y = b0 + b1*x1 + b2*x2 + ... + bn*xn

o In Logistic Regression y can be between 0 and 1 only, so we divide the above equation by (1 - y):

    y / (1 - y)    (0 for y = 0, infinity for y = 1)

o But we need a range between -infinity and +infinity, so taking the logarithm of the equation gives:

    log[y / (1 - y)] = b0 + b1*x1 + b2*x2 + ... + bn*xn

The above equation is the final equation for Logistic Regression.

Example: There is a dataset that contains information about various users obtained from social networking sites. A car-making company has recently launched a new SUV, and the company wants to check how many users from the dataset want to purchase the car.

For this problem, we will build a Machine Learning model using the Logistic Regression algorithm. The dataset is shown in the image below. In this problem, we will predict the purchased variable (dependent variable) using age and salary (independent variables).

NOTE: In logistic regression, we perform feature scaling because we want accurate prediction results. Here we only scale the independent variables, because the dependent variable has only 0 and 1 values. Below is the code for it:

PROGRAM:

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd

import os

os.getcwd()

os.chdir("/content/drive/MyDrive/Datasets")

data_set= pd.read_csv('car_data.csv')

data_set

x= data_set.iloc[:, [2,3]].values

y= data_set.iloc[:, 4].values

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state=0)

x_test

y_test

x_train
y_train

from sklearn.preprocessing import StandardScaler

st_x= StandardScaler()

x_train= st_x.fit_transform(x_train)

x_test= st_x.transform(x_test)

from sklearn.linear_model import LogisticRegression

classifier= LogisticRegression(random_state=0)

classifier.fit(x_train, y_train)

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,

intercept_scaling=1, l1_ratio=None, max_iter=100,

multi_class='warn', n_jobs=None, penalty='l2',

random_state=0, solver='warn', tol=0.0001, verbose=0,

warm_start=False)

y_pred= classifier.predict(x_test)

from sklearn.metrics import confusion_matrix

cm= confusion_matrix(y_test,y_pred)

from matplotlib.colors import ListedColormap

x_set, y_set = x_train, y_train

x1, x2 = np.meshgrid(np.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step =0.01),

np.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))

plt.contourf(x1, x2, classifier.predict(np.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),

alpha = 0.75, cmap = ListedColormap(('purple','green' )))

plt.xlim(x1.min(), x1.max())

plt.ylim(x2.min(), x2.max())

for i, j in enumerate(np.unique(y_set)):

    plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c = ListedColormap(('purple', 'green'))(i), label = j)

plt.title('Logistic Regression (Training set)')

plt.xlabel('Age')

plt.ylabel('Estimated Salary')

plt.legend()

plt.show()

from matplotlib.colors import ListedColormap

x_set, y_set = x_train, y_train

x1, x2 = np.meshgrid(np.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step =0.01),

np.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))

plt.contourf(x1, x2, classifier.predict(np.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),

alpha = 0.75, cmap = ListedColormap(('purple','green' )))

plt.xlim(x1.min(), x1.max())

plt.ylim(x2.min(), x2.max())

for i, j in enumerate(np.unique(y_set)):

    plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c = ListedColormap(('purple', 'green'))(i), label = j)

plt.title('Logistic Regression (Training set)')

plt.xlabel('Age')

plt.ylabel('Estimated Salary')

plt.legend()

plt.show()

OUTPUT

from matplotlib.colors import ListedColormap

x_set, y_set = x_test, y_test

x1, x2 = np.meshgrid(np.arange(start = x_set[:, 0].min() - 1, stop = x_set[:, 0].max() + 1, step =0.01),

np.arange(start = x_set[:, 1].min() - 1, stop = x_set[:, 1].max() + 1, step = 0.01))

plt.contourf(x1, x2, classifier.predict(np.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),

alpha = 0.75, cmap = ListedColormap(('purple','green' )))

plt.xlim(x1.min(), x1.max())

plt.ylim(x2.min(), x2.max())

for i, j in enumerate(np.unique(y_set)):

    plt.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                c = ListedColormap(('purple', 'green'))(i), label = j)

plt.title('Logistic Regression (Test set)')


plt.xlabel('Age')

plt.ylabel('Estimated Salary')

plt.legend()

plt.show()

Binary Classifier:

• A Classifier in Machine Learning is an algorithm that determines the class to which the input data belongs, based on a set of features.
• A Binary Classifier is an instance of Supervised Learning. In Supervised Learning we have a set of input data and a set of labels, and our task is to map each data point to a label. A Binary Classifier classifies elements into two groups, either Zero or One.

Types of Classification
Classification is of two types:
1. Binary Classification: When we have to categorize given data into 2 distinct classes. Example – on the basis of the given health conditions of a person, we have to determine whether the person has a certain disease or not.
2. Multiclass Classification: The number of classes is more than 2. For example – on the basis of data about different species of flowers, we have to determine which species our observation belongs to.

Examples of Binary Classification:

• Email spam detection (spam or not).
• Churn prediction (churn or not).
• Conversion prediction (buy or not).

Popular algorithms that can be used for binary classification include:

• Logistic Regression
• k-Nearest Neighbors
• Decision Trees
• Support Vector Machine
• Naive Bayes

Multi-Class Classification
Multi-class classification refers to those classification tasks that have more than two class labels. Examples include:

• Face classification.
• Plant species classification.
• Optical character recognition.

Popular algorithms that can be used for multi-class classification include:

• k-Nearest Neighbors.
• Decision Trees.
• Naive Bayes.
• Random Forest.
• Gradient Boosting.
PROGRAM:

from numpy import where
from collections import Counter
from sklearn.datasets import make_blobs
from matplotlib import pyplot

# define dataset
X, y = make_blobs(n_samples=1000, centers=2, random_state=1)

# summarize dataset shape
print(X.shape, y.shape)

# summarize observations by class label
counter = Counter(y)
print(counter)

# summarize first few examples
for i in range(10):
    print(X[i], y[i])

# plot the dataset and color by class label
for label, _ in counter.items():
    row_ix = where(y == label)[0]
    pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))
pyplot.legend()
pyplot.show()

OUTPUT:

EXPERIMENT:5

Develop a program for Bias, Variance, Remove duplicates, Cross Validation

AIM:

Develop a program for Bias, Variance, Remove duplicates, Cross Validation

Description:

Bias:

In machine learning, bias refers to the difference between the predictions made by a learning
algorithm and the true values of the target variable. It measures the systematic error or the
tendency of a model to consistently underfit or overfit the data.

Variance:

In machine learning, variance refers to the variability or instability of a model's predictions


when trained on different subsets of the training data. It measures the sensitivity of the
model to the randomness in the training data.

Cross Validation:

Cross-validation is a technique used in machine learning to assess the performance and


generalization ability of a model. It involves partitioning the available data into multiple
subsets, called folds, and iteratively training and evaluating the model on different
combinations of these folds.

PROGRAM:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate sample data


np.random.seed(42)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = 3 * X + np.random.randn(100).reshape(-1, 1)

# Add duplicate samples
X = np.vstack((X, X[:10]))
y = np.vstack((y, y[:10]))

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Calculate the training and testing errors (bias and variance)
y_train_pred = model.predict(X_train)
train_error = mean_squared_error(y_train, y_train_pred)
y_test_pred = model.predict(X_test)
test_error = mean_squared_error(y_test, y_test_pred)

print("Training error (bias):", train_error)
print("Testing error (variance):", test_error)

# Remove duplicate samples
X_unique, indices = np.unique(X, axis=0, return_index=True)
y_unique = y[indices]

# Perform cross-validation
cross_val_errors = []
for i in range(5):
    X_train, X_val, y_train, y_val = train_test_split(X_unique, y_unique, test_size=0.2, random_state=i)
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_val_pred = model.predict(X_val)
    val_error = mean_squared_error(y_val, y_val_pred)
    cross_val_errors.append(val_error)

print("Cross-validation errors:", cross_val_errors)


print("Average cross-validation error:", np.mean(cross_val_errors))

OUTPUT:

Training error (bias): 0.9087396281299075


Testing error (variance): 0.4189781106088322
Cross-validation errors: [0.8943963399542353, 0.711838267559308,
0.9664698336127481, 0.9156270854451775, 1.1754593805313762]
Average cross-validation error: 0.932758181420569

EXPERIMENT:6

Write a program to implement Categorical Encoding, One-hot Encoding

AIM:

Write a program to implement Categorical Encoding, One-hot Encoding

Description:

Categorical encoding:
Categorical encoding is a process of converting categorical variables (features) into numerical
representations that machine learning algorithms can understand. Categorical variables are
variables that represent discrete categories or groups, such as color, country, or product type.

There are several common methods for categorical encoding:

1. Label Encoding:
• Assigns a unique numerical label to each category in the variable.
• Useful for ordinal variables where the categories have an inherent order.
• Implemented using the LabelEncoder class in scikit-learn.

2. One-Hot Encoding:
• Creates binary columns for each category and represents the presence or absence of a category using 1s and 0s.
• Suitable for nominal variables where there is no inherent order.
• Implemented using the OneHotEncoder class in scikit-learn.

3. Ordinal Encoding:
• Assigns a numerical value to each category based on a predefined order or mapping.
• Useful when the categories have an order, but the numerical difference between them may not be meaningful.
• Can be implemented using mapping dictionaries or custom encoding functions.

4. Frequency Encoding:
• Replaces each category with its frequency or occurrence in the dataset.
• Useful when the frequency of a category may be informative for the model.
• Can be implemented using pandas' value_counts function or custom encoding functions.

5. Target Encoding:
• Replaces each category with the mean or median of the target variable for that category.
• Useful when the relationship between the category and the target variable is important.
• Requires careful handling to avoid leakage and overfitting.

6. Binary Encoding:
• Represents each category with binary codes.
• Useful for high-cardinality categorical variables (variables with many unique categories).
• Implemented using libraries like category_encoders or custom encoding functions.
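
The program below demonstrates label encoding and one-hot encoding only. A minimal sketch of ordinal and frequency encoding (methods 3 and 4 above), using a hypothetical 'Size' column, is given here for completeness:

import pandas as pd

df = pd.DataFrame({'Size': ['Small', 'Large', 'Medium', 'Small', 'Large', 'Large']})

# Ordinal encoding: map categories to integers using a predefined order
size_order = {'Small': 0, 'Medium': 1, 'Large': 2}
df['Size_Ordinal'] = df['Size'].map(size_order)

# Frequency encoding: replace each category with its occurrence count in the column
freq = df['Size'].value_counts()
df['Size_Frequency'] = df['Size'].map(freq)

print(df)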

PROGRAM:
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Sample dataset
data = {'Color': ['Red', 'Blue', 'Green', 'Red', 'Blue']}
df = pd.DataFrame(data)

# Categorical Encoding
label_encoder = LabelEncoder()
df['Color_Encoded'] = label_encoder.fit_transform(df['Color'])

# One-Hot Encoding
onehot_encoder = OneHotEncoder(sparse=False)
onehot_encoded = onehot_encoder.fit_transform(df[['Color_Encoded']])
onehot_df = pd.DataFrame(onehot_encoded, columns=label_encoder.classes_)

# Print the original and encoded dataframes


print("Original DataFrame:")
print(df)
print("\nCategorical Encoded DataFrame:")
print(df[['Color', 'Color_Encoded']])
print("\nOne-Hot Encoded DataFrame:")
print(onehot_df)

OUTPUT:

EXPERIMENT:7

Build an Artificial Neural Network by implementing the Back propagation algorithm and test the same using
appropriate data sets.

AIM:

Build an Artificial Neural Network by implementing the Back propagation algorithm and test the same using
appropriate data sets.
Description:
The backpropagation algorithm is a widely used method for training artificial neural
networks (ANNs). It allows the network to learn from labeled training data by
iteratively adjusting the weights and biases of the network's connections to minimize
the error between predicted and actual outputs. Here is a step-by-step explanation of
the backpropagation algorithm:
1. Initialize the network:
• Define the network architecture, including the number of layers, neurons per layer, and activation functions.
• Randomly initialize the weights and biases for each connection in the network.
2. Forward propagation:
• Input an instance of training data to the network.
• Calculate the weighted sum of inputs and biases for each neuron in each layer.
• Apply the activation function to obtain the output of each neuron.
• Pass the outputs forward to the next layer until reaching the output layer.
• Compare the network's output with the actual output and calculate the error.
3. Backward propagation:
• Calculate the gradient of the error with respect to the weights and biases of the output layer.
• Update the weights and biases of the output layer using the gradient and a learning rate.
• Calculate the gradients for the previous layers using the chain rule.
• Update the weights and biases of the previous layers using the gradients and the learning rate.
• Repeat the above steps for all instances in the training dataset.
4. Repeat the steps above:
• Repeat steps 2 and 3 for a specified number of epochs or until the network reaches a satisfactory level of accuracy.
• Adjust the learning rate and other hyperparameters if necessary.
• Monitor the training progress by observing the decrease in the error over epochs.
5. Evaluate the trained network:
• Once training is complete, evaluate the performance of the trained network using validation or test data.
• Input new instances of data to the network and observe the predicted outputs.
• Calculate metrics such as accuracy, precision, recall, or others depending on the problem.

PROGRAM:

import tensorflow as tf
import numpy as np

# Define the dataset


X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

# Define the architecture of the neural network


n_input = 2
n_hidden = 2
n_output = 1

# Define the weights and biases as TensorFlow variables


weights = {
'hidden': tf.Variable(tf.random.normal([n_input, n_hidden])),
'output': tf.Variable(tf.random.normal([n_hidden, n_output]))
}

biases = {
'hidden': tf.Variable(tf.random.normal([n_hidden])),
'output': tf.Variable(tf.random.normal([n_output]))
}

# Define the forward pass


def forward_propagation(x):
    hidden_layer = tf.sigmoid(tf.add(tf.matmul(x, weights['hidden']), biases['hidden']))
    output_layer = tf.sigmoid(tf.add(tf.matmul(hidden_layer, weights['output']), biases['output']))
    return output_layer

# Define the backpropagation algorithm


def backpropagation(x, y):
    with tf.GradientTape() as tape:
        output_layer = forward_propagation(x)
        loss = tf.reduce_mean(0.5 * (y - output_layer) ** 2)

    gradients = tape.gradient(loss, [weights['hidden'], weights['output'], biases['hidden'], biases['output']])
    optimizer.apply_gradients(zip(gradients, [weights['hidden'], weights['output'], biases['hidden'], biases['output']]))

# Define the training loop


optimizer = tf.optimizers.SGD(learning_rate=0.1)
epochs = 10000

for epoch in range(epochs):
    backpropagation(X, y)
    if epoch % 1000 == 0:
        output = forward_propagation(X)
        loss = tf.reduce_mean(0.5 * (y - output) ** 2)
        print(f"Epoch: {epoch}, Loss: {loss}")
# Test the trained model

predictions = forward_propagation(X)
print("Predictions:")
print(predictions.numpy().round())

OUTPUT:

Epoch: 0, Loss: 0.13507995009422302


Epoch: 1000, Loss: 0.1251978725194931
Epoch: 2000, Loss: 0.12484432756900787
Epoch: 3000, Loss: 0.12444967031478882
Epoch: 4000, Loss: 0.12391756474971771
Epoch: 5000, Loss: 0.12311005592346191
Epoch: 6000, Loss: 0.12179925292730331
Epoch: 7000, Loss: 0.11967145651578903
Epoch: 8000, Loss: 0.11657140403985977
Epoch: 9000, Loss: 0.11282829940319061
Predictions:
[[0.]
[1.]
[0.]
[0.]]

EXPERIMENT:8

Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set. Print both correct
and wrong predictions.

AIM:

Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set. Print both correct
and wrong predictions.
Description:

The k-Nearest Neighbor (k-NN) algorithm is a popular supervised machine learning algorithm
used for both classification and regression tasks. It is a non-parametric method that makes
predictions based on the similarity between input data points.

Process:

1. Data Preparation: Gather a labeled training dataset consisting of input


feature vectors and their corresponding class labels or target values.
2. Choose a Distance Metric: Select an appropriate distance metric to measure
the similarity or dissimilarity between data points. Commonly used distance
metrics include Euclidean distance, Manhattan distance, or cosine similarity.
3. Choose the Value of k: Determine the value of k, which represents the
number of nearest neighbors to consider during prediction. The optimal value
of k can be determined using cross-validation or other model evaluation
techniques.
4. Compute Distances: Calculate the distance between the new input data
point and all the training data points in the feature space, using the chosen
distance metric.
5. Find k Nearest Neighbors: Select the k data points with the shortest
distances to the new input data point.
6. Make Predictions: For classification problems, assign the class label that
appears most frequently among the k nearest neighbors as the predicted class
for the new data point. For regression problems, calculate the average or
weighted average of the target values of the k nearest neighbors as the
predicted value for the new data point.
7. Evaluate Performance: Assess the performance of the k-NN model using
appropriate evaluation metrics such as accuracy, precision, recall, or mean
squared error, depending on the problem type.
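
Step 3 above mentions choosing k by cross-validation; the program below simply fixes k = 5. A minimal sketch of selecting k with scikit-learn's cross_val_score on the Iris data (not part of the original manual):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate candidate values of k with 5-fold cross-validation and keep the best
scores = {}
for k in range(1, 16):
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print("Best k:", best_k, "with mean CV accuracy:", round(scores[best_k], 3))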

PROGRAM:

import numpy as np
import pandas as pd

from sklearn.neighbors import KNeighborsClassifier


from sklearn.model_selection import train_test_split
from sklearn import metrics

import os
os.getcwd()

os.chdir("/content/drive/MyDrive/Datasets")

# Read dataset to pandas dataframe


names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
dataset = pd.read_csv("iris_data.csv", names=names)

X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]

X.head()
y.head()

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)

classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)


ypred = classifier.predict(Xtest)

i=0
print ("\n ")
print ('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label',
'Correct/Wrong'))
print (" ")
for label in ytest:
    print ('%-25s %-25s' % (label, ypred[i]), end="")
    if (label == ypred[i]):
        print (' %-25s' % ('Correct'))
    else:
        print (' %-25s' % ('Wrong'))
    i = i + 1
print (" ")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print (" ")
print("\nClassification Report:\n",metrics.classification_report(ytest, ypred))
print (" ")
print('Accuracy of the classifer is %0.2f' % metrics.accuracy_score(ytest,ypred))
print (" ")

OUTPUT:
-------------------------------------------------------------------------
Original Label Predicted Label Correct/Wrong
-------------------------------------------------------------------------
Iris-versicolor           Iris-versicolor           Correct
Iris-setosa               Iris-setosa               Correct
Iris-versicolor           Iris-versicolor           Correct
Iris-virginica            Iris-virginica            Correct
Iris-versicolor           Iris-versicolor           Correct
Iris-virginica            Iris-virginica            Correct
Iris-versicolor           Iris-versicolor           Correct
Iris-versicolor           Iris-versicolor           Correct
Iris-virginica            Iris-virginica            Correct
Iris-setosa               Iris-setosa               Correct
Iris-virginica            Iris-virginica            Correct
Iris-versicolor           Iris-versicolor           Correct
Iris-versicolor           Iris-versicolor           Correct
Iris-virginica            Iris-virginica            Correct
Iris-setosa               Iris-setosa               Correct
-------------------------------------------------------------------------

Confusion Matrix:
[[3 0 0]
[0 7 0]
[0 0 5]]
...
-------------------------------------------------------------------------
Accuracy of the classifer is 1.00

EXPERIMENT:9

Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select
appropriate data set for your experiment and draw graphs.

AIM:

Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select
appropriate data set for your experiment and draw graphs.

Description:

The Locally Weighted Regression (LWR) algorithm is a non-parametric regression method


that aims to model the relationship between the input features and the target variable.
Unlike traditional regression algorithms, LWR assigns weights to the training data points
based on their proximity to the query point during prediction.

PROCESS:

1. Data Preparation: Gather a labeled training dataset consisting of input


feature vectors and their corresponding target values.
2. Choose a Kernel Function: Select a kernel function that assigns weights to
the training data points based on their proximity to the query point. Commonly
used kernel functions include Gaussian kernel, Epanechnikov kernel, and
triangular kernel.
3. Choose the Value of the Bandwidth Parameter: Determine the value of
the bandwidth parameter, which controls the width of the kernel and thus the
influence of nearby data points on the prediction. Smaller bandwidth values
give more weight to closer points, while larger bandwidth values consider
points farther away.
4. Compute Weights: For each query point, calculate the weights for all
training data points based on their distances to the query point, using the
selected kernel function and bandwidth parameter.
5. Fit Local Models: For each query point, fit a local regression model using the
weighted data points. This can be done by minimizing a weighted least squares
cost function, such as ordinary least squares or ridge regression.
6. Make Predictions: Once the local models are fitted, use them to predict the
target variable for new query points by applying the learned local regression
functions.
7. Evaluate Performance: Assess the performance of the LWR model using
appropriate evaluation metrics, such as mean squared error or R-squared,
depending on the problem type.

PROGRAM:

import numpy as np
import matplotlib.pyplot as plt

def gaussian_kernel(x, xi, tau):
    return np.exp((x - xi)**2 / (-2 * tau**2))

def locally_weighted_regression(X_train, y_train, x_query, tau):
    m = len(X_train)
    X = np.column_stack((np.ones(m), X_train))
    W = np.diag([gaussian_kernel(x_query, xi, tau) for xi in X_train])
    theta = np.linalg.inv(X.T @ W @ X) @ X.T @ W @ y_train
    return theta

# Prepare the dataset


X_train = np.array([1, 2, 3, 4, 5, 6])
y_train = np.array([1, 3, 2, 5, 4, 6])

# Choose query points for prediction


x_query = np.linspace(0, 7, 100)

# Choose tau
tau = 0.5

# Perform locally weighted regression for each query point


y_pred = []
for x in x_query:
    theta = locally_weighted_regression(X_train, y_train, x, tau)
    y_pred.append(theta[0] + theta[1] * x)

# Plot the original data points and the fitted curve


plt.scatter(X_train, y_train, color='blue', label='Original Data')
plt.plot(x_query, y_pred, color='red', label='Locally Weighted Regression')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Locally Weighted Regression')
plt.legend()
plt.grid(True)
plt.show()

OUTPUT:

EXPERIMENT:10

Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform
this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and
recall for your data set.

AIM:

Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform
this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and
recall for your data set.
Description:

1. Data Preparation: Prepare your dataset of labeled documents. Each


document should be represented as a feature vector, and each vector should
be associated with a class label.
2. Data Preprocessing: Perform any necessary preprocessing steps such as
tokenization, removing stop words, and stemming to clean and normalize
the text data.
3. Feature Extraction: Convert the preprocessed text data into numerical
features that can be used by the Naive Bayes Classifier. One common approach
is to use the bag-of-words model, where each document is represented by a
vector indicating the presence or absence of each word in a predefined
vocabulary.
4. Train the Naive Bayes Classifier: Use the training set to estimate the
class prior probabilities and likelihood probabilities based on the feature
vectors. You can use Java's built-in classes or external libraries such as
Apache OpenNLP or Weka for this task.
5. Test the Classifier: Apply the trained Naive Bayes Classifier to classify the
documents in the test set. Compare the predicted class labels with the ground
truth labels to evaluate the accuracy, precision, and recall.
6. Calculate Evaluation Metrics: Compute the evaluation metrics using the
predicted labels and the ground truth labels. Accuracy measures the overall
correctness of the classifier's predictions. Precision measures the proportion
of true positives out of all predicted positives, while recall measures the
proportion of true positives out of all actual positives.

PROGRAM:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.evaluation.Evaluation;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
public class NaiveBayesClassifierExample {
public static void main(String[] args) throws Exception {
// Step 1: Read the dataset file
BufferedReader reader = new BufferedReader(new FileReader("dataset.arff"));

Instances dataset = new Instances(reader);
reader.close();

// Step 2: Set the class attribute


dataset.setClassIndex(dataset.numAttributes() - 1);

// Step 3: Train the Naive Bayes Classifier


NaiveBayes classifier = new NaiveBayes();
classifier.buildClassifier(dataset);

// Step 4: Evaluate the Classifier using cross-validation


Evaluation evaluation = new Evaluation(dataset);
evaluation.crossValidateModel(classifier, dataset, 10, new java.util.Random(1));

// Step 5: Calculate evaluation metrics


double accuracy = evaluation.pctCorrect();
double precision = evaluation.weightedPrecision();
double recall = evaluation.weightedRecall();

// Step 6: Print evaluation metrics


System.out.println("Accuracy: " + accuracy);
System.out.println("Precision: " + precision);
System.out.println("Recall: " + recall);
}
}
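
The description above walks through preprocessing, bag-of-words feature extraction, and evaluation, which the Weka program does not show explicitly. A minimal Python sketch of the same pipeline with scikit-learn, using a tiny set of hypothetical labeled documents (this is an alternative illustration, not the manual's Java/Weka program):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labeled documents: 1 = positive, 0 = negative
docs = ["I loved this movie", "great acting and story", "what a wonderful film",
        "terrible plot and acting", "I hated this movie", "boring and very poor film"]
labels = [1, 1, 1, 0, 0, 0]

# Bag-of-words feature extraction
vectorizer = CountVectorizer(stop_words='english')
X = vectorizer.fit_transform(docs)

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.33, random_state=1)

# Train the Naive Bayes classifier and compute accuracy, precision, and recall
clf = MultinomialNB()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, zero_division=0))
print("Recall:", recall_score(y_test, y_pred, zero_division=0))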

EXPERIMENT:11

Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using k-Means
algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can
add Java/Python ML library classes/API in the program.

AIM:

Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using k-Means
algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can
add Java/Python ML library classes/API in the program.

Description:

The Expectation-Maximization (EM) algorithm is an iterative method used to estimate


the parameters of probabilistic models when dealing with missing or incomplete data. It is
widely used in various fields, including statistics, machine learning, and data clustering. The
EM algorithm seeks to find the maximum likelihood estimates of the model parameters by
iteratively updating the estimates based on the expected values of the missing data.

The k-Means algorithm is an iterative clustering algorithm used to partition a dataset


into k distinct clusters. It is one of the most popular and widely used clustering
algorithms due to its simplicity and efficiency. The goal of the k-Means algorithm is to
minimize the within-cluster variance by iteratively assigning data points to the nearest
cluster centroid and updating the centroid positions.
PROGRAM:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import cluster
import os

os.chdir("/content/drive/MyDrive/Bhavani")

df=pd.read_csv("heart.csv")

df.head()

X=df.drop(columns='target')

#Elbow Method
Sum_of_squared_distances = []
K = range(1,10)
for num_clusters in K:
    kmeans_model = cluster.KMeans(n_clusters=num_clusters)
    kmeans_model.fit(X[["trestbps","chol"]])
    Sum_of_squared_distances.append(kmeans_model.inertia_)
plt.plot(K,Sum_of_squared_distances,'bx-')
plt.xlabel('Values of K')
plt.ylabel('Sum of squared distances/Inertia')
plt.title('Elbow Method For Optimal k')
plt.show()

kmeans=cluster.KMeans(n_clusters=2,init='k-means++')

#Considering Two attributes trestbps and chol


kmeans=kmeans.fit(X[["trestbps","chol"]])

X['Clusters']=kmeans.labels_

#Centroids
centers = np.array(kmeans.cluster_centers_)
centers

#Red Square Indicates Centroids


sns.scatterplot(x="trestbps",y="chol",hue="Clusters",data=X)
plt.scatter(centers[:,0], centers[:,1], marker="s", color='r')
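
The program above covers only the k-Means part of the aim. A minimal sketch of EM-based clustering on the same two attributes, using scikit-learn's GaussianMixture (an assumption; the original manual does not show this step), so the two sets of cluster labels can be compared:

from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

# EM clustering with a 2-component Gaussian mixture on the same two attributes
gmm = GaussianMixture(n_components=2, random_state=0)
em_labels = gmm.fit_predict(X[["trestbps", "chol"]])

# Compare the EM clusters against the k-Means clusters found above
print("Agreement between EM and k-Means (adjusted Rand index):",
      adjusted_rand_score(X['Clusters'], em_labels))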

OUTPUT:

EXPERIMENT:12

Exploratory Data Analysis for Classification using Pandas or Matplotlib.

AIM:

Exploratory Data Analysis for Classification using Pandas or Matplotlib.

Description:
Exploratory Data Analysis (EDA) is an important step in the data analysis process. It
involves examining and visualizing the data to gain insights, identify patterns, and
understand the characteristics of the dataset. EDA helps in discovering relationships between
variables, detecting outliers, and preparing the data for further analysis or modeling. Here's a
general framework for performing EDA:

PROCESS:

1. Load the Data: Start by loading your dataset into a suitable data structure,
such as a Pandas DataFrame in Python.
2. Data Summary: Get an overview of the dataset by examining the dimensions, data types, and basic statistics of the variables. Some useful functions for this include shape, head, info, describe, and dtypes.
3. Missing Values: Check for missing values in the dataset and decide on an appropriate strategy for handling them. Use functions like isnull, isna, fillna, or dropna to handle missing data.
4. Data Visualization: Create visualizations to explore the distribution,
relationships, and patterns in the data. Some common plots include
histograms, box plots, scatter plots, bar plots, and correlation matrices.
Matplotlib, Seaborn, and Plotly are popular libraries for data visualization in
Python.
5. Univariate Analysis: Analyze each variable individually to understand its
distribution and characteristics. This can involve examining frequency counts,
summary statistics, histograms, or box plots for numerical variables, and bar
plots or pie charts for categorical variables.
6. Bivariate Analysis: Explore relationships between pairs of variables to identify
correlations, dependencies, or associations. Scatter plots, line plots, or
heatmaps can be used to visualize these relationships. Statistical tests such as
correlation coefficients or t-tests can also provide insights into the relationships.
7. Multivariate Analysis: Analyze the relationships among multiple variables
simultaneously. This can involve visualizations like pair plots, parallel
coordinates plots, or heatmaps to observe patterns and interactions.
8. Outlier Detection: Identify potential outliers in the dataset that deviate
significantly from the rest of the data. Box plots, scatter plots, or statistical
methods like z-scores or the IQR (interquartile range) can be used for outlier
detection.
9. Feature Engineering: Explore opportunities for creating new features or
transforming existing ones to improve the predictive power of your data.
This can involve scaling, normalization, encoding categorical variables, or
creating interaction variables.
10. Data Preprocessing: Clean and preprocess the data as needed for further
analysis or modeling. This can include handling missing values, dealing with
outliers, standardizing or normalizing variables, or performing feature
selection.

PROGRAM:

import pandas as pd
import matplotlib.pyplot as plt
import os
os.chdir("C://Users//INDIAN//AppData//Roaming//Microsoft//Windows//Start Menu//Programs//Python 3.7")
data = pd.read_csv("WASDE-DATA.csv")
# Check the dimensions of the dataset
print("Number of rows:", data.shape[0])
print("Number of columns:", data.shape[1])

# View the first few records

print(data.head())

# Check the data types of each column
print(data.dtypes)
print(data.describe())
class_counts = data['region'].value_counts()
print(class_counts)
data.hist(figsize=(10, 10))
plt.show()
data.boxplot(by='region', figsize=(10, 10))
plt.show()
categorical_columns = ['commodity', 'item']
for column in categorical_columns:
    data[column].value_counts().plot(kind='bar')
    plt.title(column)
    plt.show()
plt.scatter(data['commodity'], data['item'])
plt.xlabel('commodity')
plt.ylabel('item')
plt.show()
correlation_matrix = data.corr()
plt.figure(figsize=(10, 10))
plt.imshow(correlation_matrix, cmap='coolwarm', interpolation='nearest')
plt.colorbar()
plt.xticks(range(len(correlation_matrix.columns)), correlation_matrix.columns, rotation=90)
plt.yticks(range(len(correlation_matrix.columns)), correlation_matrix.columns)
plt.title('Correlation Matrix')
plt.show()

OUTPUT:

Number of rows: 10000


Number of columns: 10
code report_month region commodity item \
0 WHEAT_WORLD_19 2023-01 World Less China Wheat Production
1 WHEAT_WORLD_19 2023-01 World Less China Wheat Production
2 WHEAT_WORLD_19 2023-01 World Less China Wheat Imports
3 WHEAT_WORLD_19 2023-01 World Less China Wheat Imports
4 WHEAT_WORLD_19 2023-01 World Less China Wheat Exports

year period value min_value max_value


0 2022/23 Proj. Jan 643.59 NaN NaN
1 2022/23 Proj. Dec 642.59 NaN NaN
2 2022/23 Proj. Jan 195.55 NaN NaN
3 2022/23 Proj. Dec 194.80 NaN NaN
4 2022/23 Proj. Jan 210.72 NaN NaN
code object
report_month object
region object
commodity object
item object
year object

period object
value float64
min_value float64
max_value float64
dtype: object
value min_value max_value
count 10000.000000 0.0 0.0
mean 73.100683 NaN NaN
std 149.321209 NaN NaN
min 0.000000 NaN NaN
25% 2.000000 NaN NaN
50% 11.570000 NaN NaN
75% 53.990000 NaN NaN
max 794.440000 NaN NaN
World Less China 455
World 3/ 455
United States 445
Argentina 441
Australia 441
Bangladesh 441
Brazil 441
Canada 441
China 441
India 441
Japan 441
Kazakhstan 441
Major Exporters 4/ 441
Major Importers 6/ 441
N. Africa 7/ 441
Nigeria 441
Russia 441
Sel. Mideast 8/ 441
Southeast Asia 9/ 441
Total Foreign 441
Ukraine 441
European Union 5/ 385
United Kingdom 266
EU-27+UK 5/ 56
Name: region, dtype: int64

EXPERIMENT:13

Write a Python program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set

AIM:

Write a Python program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set

Description:

A Bayesian network, also known as a Bayesian belief network or probabilistic graphical


model, is a graphical representation of probabilistic relationships among variables. It is based
on the principles of Bayesian probability and provides a compact and intuitive way to model
and reason about uncertain or probabilistic domains.

1. Conditional Probability Distributions (CPDs): CPDs specify the probability


distribution of a node given the values of its parent nodes. They quantify the
conditional dependencies in the network and are used to perform probabilistic
inference.
2. Inference: Inference in a Bayesian network involves computing the probability
distribution of one or more variables given evidence or observed values of other
variables. Various algorithms, such as variable elimination, enumeration, or
sampling methods, can be used for inference.
3. Learning: Bayesian networks can be learned from data using techniques such as maximum likelihood estimation or Bayesian parameter estimation. Learning the network structure and CPDs from data allows us to discover the underlying dependencies and make predictions or perform reasoning.

PROGRAM:

import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination
import os
os.chdir("C://Users//INDIAN//AppData//Roaming//Microsoft//Windows//Start Menu//Programs//Python 3.7")

heartDisease = pd.read_csv('new dataset.csv')


heartDisease = heartDisease.replace('?',np.nan)

print('Sample instances from the dataset are given below')


print(heartDisease.head())

print('\n Attributes and datatypes')


print(heartDisease.dtypes)

model = BayesianModel([('age','heartdisease'),('sex','heartdisease'),('exang','heartdisease'),('cp','heartdisease'),('heartdisease','restecg'),('heartdisease','chol')])
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease,estimator=MaximumLikelihoodEstimator)

print('\n Inferencing with Bayesian Network:')


HeartDiseasetest_infer = VariableElimination(model)

print('\n 1. Probability of HeartDisease given evidence= restecg')


q1=HeartDiseasetest_infer.query(variables=['heartdisease'],evidence={'restecg':1})
print(q1)

print('\n 2. Probability of HeartDisease given evidence= cp ')


q2=HeartDiseasetest_infer.query(variables=['heartdisease'],evidence={'cp':2})
print(q2)

OUTPUT:

Sample instances from the dataset are given below


age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \
0 63 1 1 145 233 1 2 150 0 2.3 3
1 67 1 4 160 286 0 2 108 1 1.5 2
2 67 1 4 120 229 0 2 129 1 2.6 2
3 37 1 3 130 250 0 0 187 0 3.5 3
4 41 0 2 130 204 0 2 172 0 1.4 1

   ca thal  heartdisease
0   0    6             0
1   3    3             2
2   2    7             1
3   0    3             0
4   0    3             0

Attributes and datatypes


age int64
sex int64
cp int64
trestbps int64
chol int64
fbs int64
restecg int64
thalach int64
exang int64
oldpeak float64
slope int64
ca object
thal object
heartdisease int64
dtype: object

Learning CPD using Maximum likelihood estimators

Inferencing with Bayesian Network:

1. Probability of HeartDisease given evidence= restecg


+-----------------+---------------------+
| heartdisease    |   phi(heartdisease) |
+=================+=====================+
| heartdisease(0) |              0.1012 |
+-----------------+---------------------+
| heartdisease(1) |              0.0000 |
+-----------------+---------------------+
| heartdisease(2) |              0.2392 |
+-----------------+---------------------+
| heartdisease(3) |              0.2015 |
+-----------------+---------------------+
| heartdisease(4) |              0.4581 |
+-----------------+---------------------+

2. Probability of HeartDisease given evidence= cp


+-----------------+---------------------+
| heartdisease    |   phi(heartdisease) |
+=================+=====================+
| heartdisease(0) |              0.3610 |
+-----------------+---------------------+
| heartdisease(1) |              0.2159 |
+-----------------+---------------------+
| heartdisease(2) |              0.1373 |
+-----------------+---------------------+
| heartdisease(3) |              0.1537 |
+-----------------+---------------------+
| heartdisease(4) |              0.1321 |
+-----------------+---------------------+
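As a hedged follow-up (not part of the required program), the conditional probability tables that MaximumLikelihoodEstimator learned can be inspected directly after model.fit(...), which helps verify the tables behind the query results above:

# Run after model.fit(...) in the program above
for cpd in model.get_cpds():
    print("CPD learned for node:", cpd.variable)
print(model.get_cpds('heartdisease'))   # full conditional probability table of the target node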

EXPERIMENT:14

Write a program to Implement Support Vector Machines and Principal Component Analysis

AIM:
Write a program to Implement Support Vector Machines and Principal Component Analysis

Description:
Support Vector Machines (SVM) is a powerful supervised machine learning algorithm used
for classification and regression tasks. SVMs are particularly effective in cases where the data
has complex patterns and a clear margin of separation between classes.

Program:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
import os
os.chdir("C://Users//INDIAN//AppData//Roaming//Microsoft//Windows//Start Menu//Programs//Python 3.7")

# Load the dataset


data = pd.read_csv("dataset.csv")

# Split the dataset into features (X) and target (y)
X = data.drop('target', axis=1)
y = data['target']

# Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Perform PCA for dimensionality reduction


pca = PCA(n_components=2)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

# Train the SVM classifier


svm = SVC()
svm.fit(X_train_pca, y_train)

# Make predictions on the test set


y_pred = svm.predict(X_test_pca)

# Calculate the accuracy of the SVM classifier


accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)

OUTPUT:
Accuracy: 0.7377049180327869
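A common refinement is to chain the scaling, PCA, and SVM steps in a single scikit-learn Pipeline, so that both the scaler and the PCA are fit only on the training data. The sketch below is illustrative, not part of the required program, and assumes the same dataset.csv and X_train/X_test split produced above:

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Scale -> reduce to 2 components -> classify, all fit on the training set only
pipe = Pipeline([
    ('scale', StandardScaler()),   # PCA is sensitive to feature scale
    ('pca', PCA(n_components=2)),
    ('svm', SVC(kernel='rbf'))
])
pipe.fit(X_train, y_train)
print("Pipeline accuracy:", pipe.score(X_test, y_test))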

EXPERIMENT:15

Write a program to Implement Principal Component Analysis

AIM:

Write a program to Implement Principal Component Analysis

Description:
Principal Component Analysis (PCA) is a popular dimensionality reduction technique used in
data analysis and machine learning. It aims to transform a high-dimensional dataset into a
lower-dimensional space while retaining as much information as possible.

USES:

1. Dimensionality Reduction: By selecting a subset of the top-k principal components, where k is lower than the original number of features, PCA reduces the dimensionality of the dataset. This can be helpful when dealing with high-dimensional data, as it simplifies subsequent analysis and modeling tasks.
2. Data Visualization: PCA can be used to visualize high-dimensional data in a lower-dimensional space. By projecting the data onto the principal components, it is possible to create two- or three-dimensional scatter plots that provide insights into the structure and relationships within the data.
3. Feature Extraction: The principal components themselves can be interpreted as new features that capture the most important patterns in the data. These new features can be used in subsequent analysis or modeling tasks, potentially leading to improved performance.
PROCESS:

1. Data Preprocessing: Prepare the dataset by handling missing values, normalizing or standardizing features, and ensuring that the data is in a suitable format.
2. Compute Covariance Matrix: Calculate the covariance matrix of the data, which measures the pairwise relationships between the features.
3. Compute Eigenvectors and Eigenvalues: Determine the eigenvectors and corresponding eigenvalues of the covariance matrix. The eigenvectors represent the principal components, while the eigenvalues indicate the amount of variance captured by each component.
4. Select Principal Components: Select the top-k eigenvectors with the largest eigenvalues to retain the most important components. Typically, you choose a value of k that retains a significant portion of the variance, such as 95% or 99%.
5. Project Data: Project the original data onto the selected principal components to obtain the lower-dimensional representation of the dataset (a NumPy sketch of these steps is given after this list).
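The following is a minimal NumPy sketch of steps 2-5 above. It uses a small random matrix as a stand-in for the real dataset; scikit-learn's PCA in the program below performs the same computation internally (possibly with different sign conventions):

import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 5))                 # stand-in data: 100 samples, 5 features

X_centered = X_toy - X_toy.mean(axis=0)           # step 1: center the data
cov = np.cov(X_centered, rowvar=False)            # step 2: covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)            # step 3: eigenvalues and eigenvectors
order = np.argsort(eigvals)[::-1]                 # step 4: sort components by variance
components = eigvecs[:, order[:2]]                # keep the top-2 principal components
X_projected = X_centered @ components             # step 5: project the data
print(X_projected.shape)                          # (100, 2)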

PROGRAM:

import pandas as pd
from sklearn.decomposition import PCA
import os
os.chdir("C://Users//INDIAN//AppData//Roaming//Microsoft//Windows//Start Menu//Programs//Python 3.7")

# Load the dataset


data = pd.read_csv("dataset.csv")

# Separate features and target variable


X = data.drop('target', axis=1)
y = data['target']

# Perform PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

# Explained variance ratio


explained_variance_ratio = pca.explained_variance_ratio_
print("Explained Variance Ratio:", explained_variance_ratio)

# Access principal components


principal_components = pca.components_
print("Principal Components:\n", principal_components)

OUTPUT:

Explained Variance Ratio: [0.7475642 0.15037022]


Principal Components:
[[ 3.94611190e-02 -1.78278639e-03 -1.53716667e-03  4.75880705e-02
   9.98053283e-01  1.16389852e-04 -1.55243101e-03 -7.35838010e-03
   6.31483108e-04  1.32988432e-03 -9.99857233e-05  1.46773705e-03
   1.18215354e-03]
 [ 1.82186255e-01  7.93727347e-04 -1.25419057e-02  1.03810033e-01
  -1.94250905e-02  4.61971663e-04 -1.20213285e-03 -9.77188942e-01
   7.54817512e-03  1.79407185e-02 -1.04271838e-02  1.01095919e-02
   2.59241726e-03]]
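A hedged usage note: to choose the number of components k that retains, say, 95% of the variance (step 4 of the process above), PCA can be fit with all components and the cumulative explained-variance ratio inspected. This sketch assumes the same X as in the program above:

import numpy as np
from sklearn.decomposition import PCA

pca_full = PCA().fit(X)                             # keep all components
cum = np.cumsum(pca_full.explained_variance_ratio_)
k = int(np.argmax(cum >= 0.95)) + 1                 # first k reaching 95% cumulative variance
print("Components needed for 95% variance:", k)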

Additional Experiments
EXPERIMENT 1 :

Exercises to solve real-world problems using machine learning methods such as multi-class classification

AIM:

Exercises to solve real-world problems using machine learning methods such as multi-class classification

Description:

Types of Classification
Classification is of two types:
1. Binary Classification: When we have to categorize the given data into 2 distinct classes. Example – on the basis of the given health conditions of a person, we have to determine whether the person has a certain disease or not.
2. Multiclass Classification: The number of classes is more than 2. Example – on the basis of data about different species of flowers, we have to determine which species our observation belongs to.

Examples of Binary Classification:

 Email spam detection (spam or not).


 Churn prediction (churn or not).
 Conversion prediction (buy or not).
Popular algorithms that can be used for binary classification include:

 Logistic Regression
 k-Nearest Neighbors
 Decision Trees
 Support Vector Machine
 Naive Bayes
Multi-Class Classification
Multi-class classification refers to those classification tasks that have more than two class
labels. Examples include:

 Face classification.
 Plant species classification.
 Optical character recognition.
Popular algorithms that can be used for multi-class classification include (a one-vs-rest sketch follows this list):

 k-Nearest Neighbors
 Decision Trees
 Naive Bayes
 Random Forest
 Gradient Boosting
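Inherently binary algorithms such as support vector machines can also be applied to multi-class problems by training one binary classifier per class. The following is a minimal sketch (using the built-in Iris dataset purely for illustration):

from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                  # 3 flower species, 4 features
ovr = OneVsRestClassifier(SVC()).fit(X, y)         # one binary SVM per class
print("Training accuracy:", ovr.score(X, y))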

PROGRAM:

from numpy import where

from collections import Counter

from sklearn.datasets import make_blobs

from matplotlib import pyplot

# define dataset

X, y = make_blobs(n_samples=1000, centers=3, random_state=1)

# summarize dataset shape
print(X.shape, y.shape)

# summarize observations by class label

counter = Counter(y)

print(counter)

# summarize first few examples

for i in range(10):

print(X[i], y[i])

# plot the dataset and color the points by class label

for label, _ in counter.items():

row_ix = where(y == label)[0]

pyplot.scatter(X[row_ix, 0], X[row_ix, 1], label=str(label))

pyplot.legend()

pyplot.show()

OUTPUT:

(The script prints the dataset shape, the per-class counts, and the first ten examples, and then displays a scatter plot of the three generated clusters, colored by class label.)
