Deep Learning lab with Tensorflow (2)
Deep Learning lab with Tensorflow (2)
(AUTONOMOUS)
APPROVED BY AICTE, NEW DELHI, AFFILIATED TO JNTUK KAKINADA
ODALAREVU, ALLAVARAL MANDAL, DR.B.R AMBEDKAR KONASEEMA DISTRICT, ANDHRA PRADESH – 533210.
B.Satish
k.Sai ram
T.Vihar Ram
B. Navaneeth Krishna
S.Durga Prasad
DEPARTMENT OF CSE-AI&ML
III YEAR II SEM Code: 20AD6L04 L T P C
0 0 3 1.5
DEEP LEARNING WITH TENSORFLOW LAB
Course Outcomes:
On completion of this course, the student will be able to
Implement deep neural networks to solve real world problems
Choose appropriate pre-trained model to solve real time problem
Interpret the results of two different deep learning models
List of Experiments:
1. Implement multilayer perceptron algorithm for MNIST Hand written Digit Classification.
2. Design a neural network for classifying movie reviews (Binary Classification) using IMDB
dataset.
3. Design a neural Network for classifying news wires (Multi class classification) using
Reuters dataset.
4. Design a neural network for predicting house prices using Boston Housing Price dataset.
5. Build a Convolution Neural Network for MNIST Hand written Digit Classification.
6. Build a Convolution Neural Network for simple image (dogs and Cats) Classification
7. Use a pre-trained Convolution Neural Network (VGG16) for image classification.0
8. Implement one hot encoding of words or characters.
9. Implement word embeddings for IMDB dataset.
10. Implement a Recurrent Neural Network for IMDB movie review classification problem.
Text Books:
1. Reza Zadeh and BharathRamsundar, ―Tensorflow for Deep Learning‖, O‘Reilly publishers,
2018
References:
1. https://github.com/fchollet/deep-learning-with-python-notebooks
TABLE OF CONTENTS
2 Design a neural network for classifying movie reviews (Binary Classification) using
IMDB dataset.
.
3 Design a neural Network for classifying news wires (Multi class classification)
using Reuters dataset
4 Design a neural network for predicting house prices using Boston Housing Price
dataset.
5 Build a Convolution Neural Network for MNIST Hand written Digit Classification.
6 Build a Convolution Neural Network for simple image (dogs and Cats) Classification
I MPL E ME N T MU L T I L A Y E R PE R C E PT R O N A L G O R I T H M F O R MN I S
T HAND WR I T T E N D I G I T C L A S S I F I C A T I O N .
Aim: Implement multilayer perceptron algorithm for MNIST Hand written Digit Classification.
Description:
Handwritten digit recognition using the MNIST dataset is a significant project built with the help
of neural
networks. It is designed to detect scanned images of handwritten digits.
We have taken this concept a step further by enhancing our handwritten digit recognition system
to not only
identify scanned images but also allow users to write digits directly on the screen using an
integrated GUI for
real-time recognition.
The MNIST dataset (Modified National Institute of Standards and Technology) is a comprehensive
collection
of handwritten digits (0-9) widely used for training and testing machine learning models,
especially in
image classification and deep learning.
Key Features of MNIST:
Contains 60,000 training images and 10,000 test images.
Each image is 28×28 pixels in grayscale (values ranging from 0 to 255).
Labels range from 0 to 9, representing the corresponding digit in the image.
Commonly used for benchmarking neural network architectures like MLPs, CNNs, and RNNs.
PROCESS:
1.Importing libraries
The necessary libraries such as TensorFlow, Keras, NumPy, and Matplotlib are imported.
These libraries help in defining the model, handling data, and visualizing results.
7.Making prediction
New samples are passed through the model using predict(). The predictions are compared with
actual labels.
8.Visualizing result
Accuracy and loss curves are plotted using Matplotlib. Confusion matrix and classification
reports are generated.
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential from
tensorflow.keras.layers import Flatten from
tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Activation
import matplotlib.pyplot as plt
In [2]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
In [3]:
x_train = x_train.astype('float32') x_test =
x_test.astype('float32')
In [4]:
gray_scale = 255
x_train /= gray_scale x_test
/= gray_scale
In [5]:
print("Feature matrix:", x_train.shape)
print("Target matrix:", x_test.shape)
print("Feature matrix:", y_train.shape)
print("Target matrix:", y_test.shape)
In [6]:
fig, ax = plt.subplots(10, 10) k =
0
for i in range(10):
for j in range(10):
ax[i][j].imshow(x_train[k].reshape(28, 28),
aspect='auto')
k += 1
plt.show()
In [7]:
model = Sequential([
Flatten(input_shape=(28, 28)),
Dense(256, activation='sigmoid'),
Dense(128, activation='sigmoid'),
Dense(10, activation='sigmoid'),
])
In [8]:
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
In [9]:
model.fit(x_train, y_train, epochs=10,
batch_size=2000,
validation_split=0.2)
Epoch 1/10
D E S I G N A N E U R A L N E T WO R K F O R C L A S S I F Y I N G MO V I E R E V I E WS ( B I N A R
Y
C L A S S I F I C A T I O N ) U S I N G I MD B D A T A S E T .
AIM:Designa neural network for classifying moviereviews (Binary Classification) using IMDB dataset.
Description:
The IMDB dataset is a popular benchmark dataset for binary sentiment
classification, where movie reviews are categorized as either positive (1) or
negative (0). It is widely utilized in natural language processing (NLP) tasks,
particularly for sentiment analysis.
Dataset Overview:
o Total Size: 50,000 movie reviews
o Training Set: 25,000 reviews
o Test Set: 25,000 reviews
Labels:
o 1 → Positive review
o 0 → Negative review
This dataset serves as a fundamental resource for training and evaluating machine learning models in
sentiment classification.
Process:
STEP 1: Loading the dataset
STEP 2: Decoding the review
STEP 3: Dadding the example
STEP 4: Creating and training
STEP 5: predictions and evaluation.
we will be using
Tensorflow
keras
IMDB dataset
TensorFlow: An open-source machine learning framework developed by Google for deep
learning
and numerical computation.
Keras: A high-level neural networks API, built on top of TensorFlow, that simplifies deep
learning
model development.
IMDB Dataset: A dataset of 50,000 movie reviews labeled as positive or negative, commonly used
[2]: # Load the data, keeping only 10,000 of the most frequently occuring words
(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words␣
𝗌= 10000)
60s
3us/s
2
[3]:
# Since we restricted ourselves to the top 10000 frequent words, no word index␣
𝗌should exceed 10000
# Here is a list of maximum indexes in every review --- we search the maximum␣
𝗌index in this list of max indexes
<class 'list'>
[3]: 9999
[4]: #Let's quickly decode a review
# step 2: reverse word index to map integer indexes to their respective words
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
3
STEP 3:
STEP 4:
4
STEP 5:
STEP 6:
5
PROGRAM:
6
OUTPUT:
7
EXPERIMENT:3
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate
data set for building the decision tree and apply this knowledge to classify a new sample.
AIM:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate
data set for building the decision tree and apply this knowledge to classify a new sample.
Description:
8
Training Dataset:
9
Test Dataset:
Program:
1
0
1
1
OUTPUT:
EXPERIMENT:4
Exercises to solve the real-world problems using the following machine learning methods: a) Linear
Regression b) Logistic Regression c) Binary Classifier
AIM:
Exercises to solve the real-world problems using the following machine learning methods: a) Linear
Regression b) Logistic Regression c) Binary Classifier
Description:
Linear Regression:
Simple Linear Regression is a type of Regression algorithms that models the relationship
between a dependent variable and a single independent variable. The relationship shown by
a Simple Linear Regression model is linear or a sloped straight line, hence it is called Simple
Linear Regression.
1
2
The key point in Simple Linear Regression is that the dependent variable must be a
continuous/real value. However, the independent variable can be measured on continuous
or categorical values.
1
3
Here we are taking a dataset that has two variables: salary (dependent variable) and
experience (Independent variable). The goals of this problem is:
We want to find out if there is any correlation between these two variables
We will find the best fit line for the dataset.
How the dependent variable is changing by changing the independent variable.
Procedure:
import numpy as np
import pandas as pd
import os
os.getcwd()
os.chdir("/content/drive/MyDrive/Datasets")
data_set= pd.read_csv('Salary_Data.csv')
data_set
x= data_set.iloc[:, :-1].values
y= data_set.iloc[:, 1].values
x_test
y_test
1
4
x_train
y_train
regressor= LinearRegression()
regressor.fit(x_train, y_train)
y_pred= regressor.predict(x_test)
x_pred= regressor.predict(x_train)
plt.xlabel("Years of Experience")
plt.ylabel("Salary(In Rupees)")
plt.show()
OUTPUT:
plt.xlabel("Years of Experience")
plt.ylabel("Salary(In Rupees)")
plt.show()
OUTPUT:
Logistic Regression:
1
7
Logistic Regression Equation:
o We know the equation of the straight line can be written as:
o In Logistic Regression y can be between 0 and 1 only, so for this let's divide
the above equation by (1-y):
o But we need range between -[infinity] to +[infinity], then take logarithm of the
equation it will become:
Example: There is a dataset given which contains the information of various users
obtained from the social networking sites. There is a car making company that has
recently launched a new SUV car. So the company wanted to check how many users
from the dataset, wants to purchase the car.
For this problem, we will build a Machine Learning model using the Logistic regression
algorithm. The dataset is shown in the below image. In this problem, we will predict
the purchased variable (Dependent Variable) by using age and salary
(Independent variables).
1
8
PROGRAM:
import numpy as np
import pandas as pd
import os
os.getcwd()
os.chdir("/content/drive/MyDrive/Datasets")
data_set= pd.read_csv('car_data.csv')
data_set
x= data_set.iloc[:, [2,3]].values
y= data_set.iloc[:, 4].values
x_test
y_test
x_train 1
9
y_train
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
classifier= LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)
warm_start=False)
y_pred= classifier.predict(x_test)
cm= confusion_matrix(y_test,y_pred)
plt.xlim(x1.min(), x1.max())
plt.ylim(x2.min(), x2.max())
for i, j in enumerate(np.unique(y_set)):
2
1
plt.title('Logistic Regression (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
plt.xlim(x1.min(), x1.max())
plt.ylim(x2.min(), x2.max())
for i, j in enumerate(np.unique(y_set)):
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
2
2
OUTPUT
plt.xlim(x1.min(), x1.max())
plt.ylim(x2.min(), x2.max())
for i, j in enumerate(np.unique(y_set)):
2
4
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
Binary Classifier:
Types of Classification
Classification is of two types:
1. Binary Classification: When we have to categorize given data into 2 distinct
classes. Example – On the basis of given health conditions of a person, we have to
determine whether the person has a certain disease or not.
2. Multiclass Classification: The number of classes is more than 2. For Example – On the
basis of data about different species of flowers, we have to determine which specie
our observation belongs.
2
5
Examples of Binary Classification:
Logistic Regression
k-Nearest Neighbors
Decision Trees
Support Vector Machine
Naive Bayes
Multi-Class Classification
Multi-class classification refers to those classification tasks that have more than two class
labels. Examples include:
Face classification.
Plant species classification.
Optical character recognition.
Popular algorithms that can be used for multi-class classification include:
k-Nearest Neighbors. 2
6
Decision Trees.
Naive Bayes.
2
7
Random Forest.
Gradient Boosting.
PROGRAM:
# define dataset
print(X.shape, y.shape)
Counter(y)
print(counter)
in range(10):
print(X[i], y[i])
label, _ in counter.items():
pyplot.legend()
pyplot.show()
2
8
OUTPUT:
EXPERIMENT:5
AIM:
Description:
Bias:
In machine learning, bias refers to the difference between the predictions made by a learning
algorithm and the true values of the target variable. It measures the systematic error or the
tendency of a model to consistently underfit or overfit the data.
Variance:
Cross Validation:
PROGRAM:
import numpy as np
from sklearn.model_selection import
train_test_split from sklearn.linear_model 2
9
import LinearRegression
3
0
from sklearn.metrics import mean_squared_error
# Add duplicate
samples X =
np.vstack((X, X[:10]))
y = np.vstack((y, y[:10]))
# Perform cross-validation
cross_val_errors = []
for i in range(5):
X_train, X_val, y_train, y_val = train_test_split(X_unique, y_unique, test_size=0.2, random_state=i)
model = LinearRegression()
model.fit(X_train, y_train)
y_val_pred = model.predict(X_val)
val_error = mean_squared_error(y_val, y_val_pred)
cross_val_errors.append(val_error)
3
1
OUTPUT:
EXPERIMENT:6
AIM:
Description:
Categorical encoding:
Categorical encoding is a process of converting categorical variables (features) into numerical
representations that machine learning algorithms can understand. Categorical variables are
variables that represent discrete categories or groups, such as color, country, or product type.
1. Label Encoding:
Assigns a unique numerical label to each category in the variable.
Useful for ordinal variables where the categories have an inherent order.
Implemented using LabelEncoder class in scikit-learn.
the
2. One-Hot
Encoding:
Creates binary columns for each category and represents the presence or
absence of a category using 1s and 0s.
Suitable for nominal variables where there is no inherent order.
3
3
Requires careful handling to avoid leakage and overfitting.
6. Binary Encoding:
Represents each category with binary codes.
Useful for high-cardinality categorical variables (variables with
many unique categories).
Implemented using libraries category_encoders or custom encoding functions.
like
PROGRAM:
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# Sample dataset
data = {'Color': ['Red', 'Blue', 'Green', 'Red', 'Blue']}
df = pd.DataFrame(data)
# Categorical Encoding
label_encoder = LabelEncoder()
df['Color_Encoded'] = label_encoder.fit_transform(df['Color'])
# One-Hot Encoding
onehot_encoder = OneHotEncoder(sparse=False)
onehot_encoded = onehot_encoder.fit_transform(df[['Color_Encoded']])
onehot_df = pd.DataFrame(onehot_encoded, columns=label_encoder.classes_)
OUTPUT:
3
4
EXPERIMENT:7
Build an Artificial Neural Network by implementing the Back propagation algorithm and test the same using
appropriate data sets.
AIM:
Build an Artificial Neural Network by implementing the Back propagation algorithm and test the same using
appropriate data sets.
Description:
The backpropagation algorithm is a widely used method for training artificial neural
networks (ANNs). It allows the network to learn from labeled training data by
iteratively adjusting the weights and biases of the network's connections to minimize
the error between predicted and actual outputs. Here is a step-by-step explanation of
the backpropagation algorithm:
1. Initialize the network:
Define the network architecture, including the number of layers,
neurons per layer, and activation functions.
Randomly initialize the weights and biases for each connection in the
network.
2. Forward propagation:
Input an instance of training data to the network.
Calculate the weighted sum of inputs and biases for each neuron in each
layer.
Apply the activation function to obtain the output of each neuron.
Pass the outputs forward to the next layer until reaching the output layer.
Compare the network's output with the actual output and calculate the
error.
3. Backward propagation:
Calculate the gradient of the error with respect to the weights and
biases of the output layer.
Update the weights and biases of the output layer using the gradient and
a learning rate.
Calculate the gradients for the previous layers using the chain rule.
Update the weights and biases of the previous layers using the
gradients and the learning rate.
Repeat the above steps for all instances in the training dataset.
3
6
PROGRAM:
import tensorflow as tf
import numpy as np
biases = {
'hidden': tf.Variable(tf.random.normal([n_hidden])),
'output': tf.Variable(tf.random.normal([n_output]))
}
3
8
predictions = forward_propagation(X)
print("Predictions:")
print(predictions.numpy().round())
OUTPUT:
EXPERIMENT:8
Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set. Print both correct
and wrong predictions.
AIM:
Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set. Print both correct
and wrong predictions.
Description:
The k-Nearest Neighbor (k-NN) algorithm is a popular supervised machine learning algorithm
used for both classification and regression tasks. It is a non-parametric method that makes
predictions based on the similarity between input data points.
Process:
PROGRAM:
import numpy as np
import pandas as pd
import os
os.getcwd()
os.chdir("/content/drive/MyDrive/Datasets")
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
X.head()
y.head()
i=0
print ("\n ")
4
0
print ('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label',
'Correct/Wrong'))
print (" ")
for label in ytest:
print ('%-25s %-25s' % (label, ypred[i]), end="")
if (label == ypred[i]):
print (' %-25s' % ('Correct'))
else:
print (' %-25s' % ('Wrong'))
i=i+1
print (" ")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print (" ")
print("\nClassification Report:\n",metrics.classification_report(ytest, ypred))
print (" ")
print('Accuracy of the classifer is %0.2f' % metrics.accuracy_score(ytest,ypred))
print (" ")
OUTPUT:
-------------------------------------------------------------------------
Original Label Predicted Label Correct/Wrong
-------------------------------------------------------------------------
Iris-versicolor Iris-versicolor
Correct Iris-setosa Iris-
setosa Correct Iris-versicolor Iris-
versicolor Correct Iris-virginica Iris-
virginica Correct Iris-versicolor Iris-
versicolor Correct Iris-virginica Iris-
virginica Correct Iris-versicolor Iris-
versicolor Correct Iris-versicolor Iris-
versicolor Correct Iris-virginica Iris-
virginica Correct
Iris-setosa Iris-setosa Correct
Iris-virginica Iris-virginica
Correct Iris-versicolor Iris-
versicolor Correct Iris-versicolor Iris-
versicolor Correct Iris-virginica Iris-
virginica Correct
Iris-setosa Iris-setosa Correct
-------------------------------------------------------------------------
Confusion Matrix:
[[3 0 0]
[0 7 0]
[0 0 5]]
...
4
1
-------------------------------------------------------------------------
Accuracy of the classifer is 1.00
4
2
EXPERIMENT:9
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select
appropriate data set for your experiment and draw graphs.
AIM:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select
appropriate data set for your experiment and draw graphs.
Description:
PROCESS:
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt
# Choose tau
tau = 0.5
OUTPUT:
4
4
EXPERIMENT:10
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform
this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and
recall for your data set.
AIM:
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform
this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and
recall for your data set.
Description:
PROGRAM:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.evaluation.Evaluation;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
4
5
public class NaiveBayesClassifierExample {
public static void main(String[] args) throws Exception {
// Step 1: Read the dataset file
BufferedReader reader = new BufferedReader(new FileReader("dataset.arff"));
4
6
Instances dataset = new Instances(reader);
reader.close();
EXPERIMENT:11
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using k-Means
algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can
add Java/Python ML library classes/API in the program.
AIM:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using k-Means
algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can
add Java/Python ML library classes/API in the program.
Description:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import cluster
import os
os.chdir("/content/drive/MyDrive/Bhavani")
df=pd.read_csv("heart.csv")
df.head()
X=df.drop(columns='target')
#Elbow Method
Sum_of_squared_distances = []
K = range(1,10)
for num_clusters in K :
kmeans_model = cluster.KMeans(n_clusters=num_clusters)
kmeans_model.fit(X[["trestbps","chol"]])
Sum_of_squared_distances.append(kmeans_model.inertia_)
plt.plot(K,Sum_of_squared_distances,'bx-')
plt.xlabel('Values of K')
plt.ylabel('Sum of squared distances/Inertia')
plt.title('Elbow Method For Optimal k')
plt.show()
kmeans=cluster.KMeans(n_clusters=2,init='k-means++')
X['Clusters']=kmeans.labels_
#Centroids
centers = np.array(kmeans.cluster_centers_)
centers
4
8
OUTPUT:
EXPERIMENT:12
AIM:
Description:
Exploratory Data Analysis (EDA) is an important step in the data analysis process. It
involves examining and visualizing the data to gain insights, identify patterns, and
understand the characteristics of the dataset. EDA helps in discovering relationships between
variables, detecting outliers, and preparing the data for further analysis or modeling. Here's a
general framework for performing EDA:
4
9
PROCESS:
1. Load the Data: Start by loading your dataset into a suitable data structure,
such as a Pandas DataFrame in Python.
2. Data Summary: Get an overview of the dataset by examining the
dimensions, data types, and basic statistics of the variables. Some useful
functions for this include shape, head, info,
. describe , dtypes
3. and
: Check for missing values in the dataset and decide on an appropriate
Missing
strategy for handling them. Use functions like isnull, isna, to handle missing
fillna
dropna, or data.
4. Data Visualization: Create visualizations to explore the distribution,
relationships, and patterns in the data. Some common plots include
histograms, box plots, scatter plots, bar plots, and correlation matrices.
Matplotlib, Seaborn, and Plotly are popular libraries for data visualization in
Python.
5. Univariate Analysis: Analyze each variable individually to understand its
distribution and characteristics. This can involve examining frequency counts,
summary statistics, histograms, or box plots for numerical variables, and bar
plots or pie charts for categorical variables.
6. Bivariate Analysis: Explore relationships between pairs of variables to identify
correlations, dependencies, or associations. Scatter plots, line plots, or
heatmaps can be used to visualize these relationships. Statistical tests such as
correlation coefficients or t-tests can also provide insights into the relationships.
7. Multivariate Analysis: Analyze the relationships among multiple variables
simultaneously. This can involve visualizations like pair plots, parallel
coordinates plots, or heatmaps to observe patterns and interactions.
8. Outlier Detection: Identify potential outliers in the dataset that deviate
significantly from the rest of the data. Box plots, scatter plots, or statistical
methods like z-scores or the IQR (interquartile range) can be used for outlier
detection.
9. Feature Engineering: Explore opportunities for creating new features or
transforming existing ones to improve the predictive power of your data.
This can involve scaling, normalization, encoding categorical variables, or
creating interaction variables.
10. Data Preprocessing: Clean and preprocess the data as needed for further
analysis or modeling. This can include handling missing values, dealing with
outliers, standardizing or normalizing variables, or performing feature
selection.
PROGRAM:
import pandas as pd
import matplotlib.pyplot as plt
import os
os.chdir("C://Users//INDIAN//AppData//Roaming//Microsoft//Windows//Start Menu//Programs//Python
3.7")
data = pd.read_csv("WASDE-DATA.csv")
# Check the dimensions of the dataset
5
0
print("Number of rows:", data.shape[0])
print("Number of columns:", data.shape[1])
5
1
print(data.head())
OUTPUT:
5
3
period object
value float64
min_value float64
max_value float64
dtype: object
value min_value max_value
count 10000.000000 0.0 0.0
mean 73.100683 NaN NaN
std 149.321209 NaN NaN
min 0.000000 NaN NaN
25% 2.000000 NaN NaN
50% 11.570000 NaN NaN
75% 53.990000 NaN NaN
max 794.440000 NaN NaN
World Less China 455
World 3/ 455
United States 445
Argentina 441
Australia 441
Bangladesh 441
Brazil 441
Canada 441
China 441
India 441
Japan 441
Kazakhstan 441
Major Exporters 4/ 441
Major Importers 6/ 441
N. Africa 7/ 441
Nigeria 441
Russia 441
Sel. Mideast 8/ 441
Southeast Asia 9/ 441
Total Foreign 441
Ukraine 441
European Union 5/ 385
United Kingdom 266
EU-27+UK 5/ 56
Name: region, dtype: int64
5
4
5
5
5
6
5
7
5
8
EXPERIMENT:13
Write a Python program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set
AIM:
Write a Python program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set
Description:
PROGRAM:
import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination
import os
os.chdir("C://Users//INDIAN//AppData//Roaming//Microsoft//Windows//Start Menu//Programs//Python
3.7")
model= BayesianModel([('age','heartdisease'),('sex','heartdisease'),('exang','heartdisease'),('cp','heartdisease'),
('heartdis ease','restecg'),('heartdisease','chol')])
print('\nLearning CPD using Maximum likelihood estimators')
model.fit(heartDisease,estimator=MaximumLikelihoodEstimator)
OUTPUT:
ca thal heartdisease
0 0 6 0 6
0
1 3 3 2
6
1
2 2 7 1
3 0 3 0
4 0 3 0
6
2
EXPERIMENT:14
Write a program to Implement Support Vector Machines and Principle Component Analysis
AIM:
Write a program to Implement Support Vector Machines and Principle Component Analysis
Description:
Support Vector Machines (SVM) is a powerful supervised machine learning algorithm used
for classification and regression tasks. SVMs are particularly effective in cases where the data
has complex patterns and a clear margin of separation between classes.
Program:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.metrics import accuracy_score
import os
os.chdir("C://Users//INDIAN//AppData//Roaming//Microsoft//Windows//Start Menu//Programs//Python
3.7")
6
4
# Split the dataset into features (X) and target (y)
X = data.drop('target', axis=1)
y = data['target']
OUTPUT:
Accuracy: 0.7377049180327869
EXPERIMENT:15
AIM:
Description:
Principal Component Analysis (PCA) is a popular dimensionality reduction technique used in
data analysis and machine learning. It aims to transform a high-dimensional dataset into a
lower-dimensional space while retaining as much information as possible.
USES:
6
5
1. Dimensionality Reduction: By selecting a subset of the top-k principal
components, where k is lower than the original number of features, PCA
reduces the dimensionality of the dataset. This can be helpful when dealing
with high-dimensional data, as it simplifies subsequent analysis and modeling
tasks.
2. Data Visualization: PCA can be used to visualize high-dimensional data in a
lower- dimensional space. By projecting the data onto the principal
components, it is possible to create two or three-dimensional scatter plots
that provide insights into the structure and relationships within the data.
3. Feature Extraction: The principal components themselves can be
interpreted as new features that capture the most important patterns in the
data. These new features can be
6
6
used in subsequent analysis or modeling tasks, potentially leading to improved
performance.
PROCESS:
PROGRAM:
import pandas as pd
from sklearn.decomposition import PCA
import os
os.chdir("C://Users//INDIAN//AppData//Roaming//Microsoft//Windows//Start Menu//Programs//Python
3.7")
# Perform PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
OUTPUT:
6
8
6.31483108e-04 1.32988432e-03 -9.99857233e-05 1.46773705e-03
1.18215354e-03]
[ 1.82186255e-01 7.93727347e-04 -1.25419057e-02 1.03810033e-01
-1.94250905e-02 4.61971663e-04 -1.20213285e-03 -9.77188942e-01
7.54817512e-03 1.79407185e-02 -1.04271838e-02 1.01095919e-02
2.59241726e-03]]
Additional Experiments
EXPERIMENT 1 :
Exercises to solve the real-world problems using the following machine learning methods like multi class
classification
AIM:
Exercises to solve the real-world problems using the following machine learning methods like multi class
classification
Description:
Types of Classification
Classification is of two types:
3. Binary Classification: When we have to categorize given data into 2 distinct
classes. Example – On the basis of given health conditions of a person, we have to
determine whether the person has a certain disease or not.
4. Multiclass Classification: The number of classes is more than 2. For Example – On the
basis of data about different species of flowers, we have to determine which specie
our observation belongs.
Logistic Regression
k-Nearest Neighbors
Decision Trees
Support Vector Machine
Naive Bayes
Multi-Class Classification
Multi-class classification refers to those classification tasks that have more than two class
labels. Examples include:
Face classification.
Plant species classification.
Optical character recognition.
Popular algorithms that can be used for multi-class classification include:
k-Nearest Neighbors.
Decision Trees.
Naive Bayes.
Random Forest.
Gradient Boosting.
PROGRAM:
# define dataset
7
1
print(X.shape, y.shape)
counter = Counter(y)
print(counter)
for i in range(10):
print(X[i], y[i])
pyplot.legend()
pyplot.show()
OUTPUT:
7
2
7
3