Practical File
1. Implement data pre-processing
2. Deploy Simple Linear Regression
3. Simulate Multiple Linear Regression
4. Implement Decision Tree
5. Deploy Random Forest classification
6. Simulate Naïve Bayes algorithm
7. Implement K-Nearest Neighbors (K-NN)
8. Deploy Support Vector Machine (SVM)
9. Simulate Artificial Neural Network
10. Implement the Genetic Algorithm code
Practical No. 1
Aim: Implement data pre-processing
• Pre-processing refers to the transformations applied to the data before feeding it to the algorithm.
• Data pre-processing is a technique used to convert raw data into a clean data set. In other words, whenever data is gathered from different sources, it is collected in a raw format that is not feasible for analysis.
Need of Data Preprocessing
• To achieve better results from the applied model in Machine Learning projects, the data must be in a proper format. Some Machine Learning models need information in a specified format; for example, the Random Forest algorithm does not support null values, so to execute it, null values have to be managed in the original raw data set.
• Another aspect is that the data set should be formatted in such a way that more than one Machine Learning or Deep Learning algorithm can be executed on the same data set, and the best of them is chosen.
Steps:
1. Getting the dataset
2. Importing libraries
3. Importing datasets
4. Finding Missing Data
5. Encoding Categorical Data
6. Splitting dataset into training and test set
7. Feature scaling
Python Code:
Figure 1: Dataset uploaded
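The snippet below references a ColumnTransformer ct and a data_set that are not defined in the extract; a minimal sketch of the assumed setup is given first (the file name Data.csv is an assumption):

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, LabelEncoder

# Load the raw dataset (file name assumed)
data_set = pd.read_csv('Data.csv')

# One-hot encode the first (categorical) column, pass the rest through
ct = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')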
# Extracting independent (x) and dependent (y) variables
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 3].values

# Encoding categorical data (np.float is deprecated, so plain float is used)
x = np.array(ct.fit_transform(x), dtype=float)
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
Output: Dividing Dataset into training and testing datasets
Figure 5: Feature scaling
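The feature-scaling code behind Figure 5 did not survive extraction; a minimal sketch, assuming scikit-learn's StandardScaler fitted on the training set only (so no information leaks from the test set):

from sklearn.preprocessing import StandardScaler

# Standardize features: fit on the training set, reuse the same parameters on the test set
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)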
Practical No. 2
Aim: Deploy Simple Linear Regression
Regression searches for relationships among variables.
For example, you can observe several employees of some company and try to understand how
their salaries depend on the features, such as experience, level of education, role, city they
work in, and so on.
This is a regression problem where data related to each employee represent one observation.
The presumption is that the experience, education, role, and city are the independent features,
while the salary depends on them.
Similarly, you can try to establish a mathematical dependence of the prices of houses on
their areas, numbers of bedrooms, distances to the city center, and so on.
In regression analysis, you typically consider some phenomenon of interest and have
a number of observations. Each observation has two or more features. Following the
assumption that (at least) one of the features depends on the others, you try to establish a
relation among them.
In other words, you need to find a function that maps some features or variables to others
sufficiently well.
The dependent features are called the dependent variables, outputs, or responses.
The independent features are called the independent variables, inputs, or predictors.
Regression problems usually have one continuous and unbounded dependent variable. The
inputs, however, can be continuous, discrete, or even categorical data such as gender,
nationality, brand, and so on.
For Simple Linear Regression, the following 5 steps have to be performed:
1. Importing the dataset.
2. Splitting the dataset into a training set and a testing set (two dimensions of X and y per set). Normally, the testing set should be 5% to 30% of the dataset.
3. Visualizing the training set and testing set to double-check (you can bypass this step if you want).
4. Initializing the regression model and fitting it using the training set (both X and y).
5. Let's predict.
# Importing the dataset
data_set = pd.read_csv('Salary_Data.csv')
Figure 1: Output of the uploaded dataset
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 1].values
Step 4: Fitting the Simple Linear Regression model to the Training Set
This will create a prediction vector y_pred and a vector x_pred, which contain the predictions for the test dataset and the training set respectively.
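The fitting and prediction code itself was lost in extraction; a minimal sketch, assuming scikit-learn's LinearRegression and a train/test split as in Practical 1:

from sklearn.linear_model import LinearRegression

# Fit the simple linear regression model on the training set
regressor = LinearRegression()
regressor.fit(x_train, y_train)

# y_pred: predictions on the test set; x_pred: predictions on the training set
y_pred = regressor.predict(x_test)
x_pred = regressor.predict(x_train)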
Figure 6: Prediction variables
Practical Outcomes:
• Model the relationship between two variables, such as the relationship between income and expenditure, or experience and salary.
• Forecast new observations, such as weather forecasting according to temperature, or the revenue of a company according to its investments in a year.
Practical No. 3
Aim: Simulate Multiple Linear Regression
• For MLR, the dependent or target variable (Y) must be continuous/real, while the predictor or independent variables may be continuous or categorical.
• Each feature variable must model a linear relationship with the dependent variable.
• MLR tries to fit a regression line through a multidimensional space of data points.
“Multiple Linear Regression is one of the important regression algorithms which models the
linear relationship between a single dependent continuous variable and more than one
independent variable.”
Python Code:
# Extracting Independent and dependent Variables
# (nm is numpy, and ct is a ColumnTransformer, set up as in Practical 1)
x = data_set.iloc[:, :-1].values
y = data_set.iloc[:, 4].values

# Encoding categorical data (nm.float is deprecated, so float is used)
x = nm.array(ct.fit_transform(x), dtype=float)
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
Figure 3: Conversion of categorical data to numerical
Step 4: Splitting training data and testing data

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
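Step 5 (fitting the regressor to the training set) is missing from the extract; a minimal sketch, assuming the same LinearRegression estimator as in Practical 2:

from sklearn.linear_model import LinearRegression

# Step 5: Fit the multiple linear regression model on the training set
regressor = LinearRegression()
regressor.fit(x_train, y_train)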
Step 6: Prediction of Test set results

# Predicting the Test set result
y_pred = regressor.predict(x_test)

We can also check the score for the training dataset and the test dataset. Below is the code for it:

print('Train Score: ', regressor.score(x_train, y_train))
print('Test Score: ', regressor.score(x_test, y_test))
Practical Outcomes:
• A linear relationship should exist between the Target and predictor variables.
• The regression residuals must be normally distributed.
• MLR assumes little or no multicollinearity (correlation between the independent variables) in the data.
Applications of MLR
Practical No. 4
Aim: Implement Decision Tree
Algorithm:
• Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
• Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
• Step-3: Divide S into subsets that contain possible values for the best attribute.
• Step-4: Generate the decision tree node that contains the best attribute.
• Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; the final nodes are called leaf nodes.
Python Code:
import pandas as pd
dataset = pd.read_csv('Social_Network_Ads.csv')  # the read_csv line was truncated; this file name is assumed
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature scaling (reconstructed; the fitting lines were lost in extraction)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
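The classifier training and the plotting grid are absent from the extract; a minimal sketch, assuming the entropy criterion and the decision-boundary plotting pattern used in the later practicals:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.tree import DecisionTreeClassifier

# Train the decision tree on the scaled training set
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)

# Predict the test set results
y_pred = classifier.predict(X_test)

# Dense grid over the feature space for the decision-boundary plot below
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))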
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Decision Tree Classification (Training set)')
plt.show()
Output:
Practical Outcomes:
• Decision Trees usually mimic human thinking ability while making a decision, so it is
easy to understand.
• The logic behind the decision tree can be easily understood because it shows a tree-like structure.
Practical No. 5
Aim: Deploy Random Forest classification
Step-1: Select K random data points from the training set.
Step-2: Build the decision trees associated with the selected data points (subsets).
Step-3: Choose the number N of decision trees that you want to build.
Step-4: Repeat Steps 1 and 2.
Step-5: For new data points, find the predictions of each decision tree, and assign the new data points to the category that wins the majority of votes.
Python Code:
# Random Forest Classification

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Training the Random Forest Classification model on the Training set
# (the constructor call was lost in extraction; 10 trees with the entropy
# criterion is assumed here)
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)
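The grid construction for the decision-boundary plot was lost in extraction; a minimal sketch, matching the pattern used in the other practicals (X_set, y_set are assumed to be the training split, as the plot title suggests):

from matplotlib.colors import ListedColormap

# Build a dense grid over the feature space for the boundary plot
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))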
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Random Forest Classification (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
Output:
Practical Outcomes:
• There should be some actual values in the feature variables of the dataset so that the classifier can predict accurate results rather than guessed ones.
• The predictions from each tree must have very low correlations.
• It takes less training time compared to other algorithms.
• It predicts output with high accuracy, and it runs efficiently even for large datasets.
• It can also maintain accuracy when a large proportion of data is missing.
There are four main sectors where Random Forest is mostly used:
1. Banking: The banking sector mostly uses this algorithm for the identification of loan risk.
2. Medicine: With the help of this algorithm, disease trends and risks of the disease
can be identified.
3. Land Use: We can identify the areas of similar land use by this algorithm.
4. Marketing: Marketing trends can be identified using this algorithm.
Practical No. 6
Aim: Simulate Naïve Bayes algorithm
Suppose we have a dataset of weather conditions and a corresponding target variable "Play". Using this dataset, we need to decide whether we should play or not on a particular day according to the weather conditions. To solve this problem, we need to follow the steps below:
Python Code:
# Naive Bayes

# Importing the libraries and the dataset
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
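The training step and the start of the visualisation code fall on a page missing from the extract; a minimal sketch, assuming scikit-learn's GaussianNB and the grid setup used in the other practicals:

from sklearn.naive_bayes import GaussianNB
from matplotlib.colors import ListedColormap

# Fit the Gaussian Naive Bayes classifier on the training set
classifier = GaussianNB()
classifier.fit(X_train, y_train)

# Predict the test set results
y_pred = classifier.predict(X_test)

# Visualising the Training set results
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))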
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('Naive Bayes (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
Output:
Practical Outcomes:
• The random forest algorithm is a machine learning algorithm that is easy to use and flexible. It uses ensemble learning, which enables organizations to solve regression and classification problems.
• It is an ideal algorithm for developers because it solves the problem of overfitting of datasets. It is a very resourceful tool for making the accurate predictions needed in strategic decision making in organizations.
Practical No. 7
Aim: Implement K-Nearest Neighbors (K-NN)
The K-NN working can be explained on the basis of the below algorithm:
Python Code:
# K-Nearest Neighbors (K-NN)

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Training the K-NN model on the Training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Visualising the Training set results (X_set, y_set are assumed to be the
# training split, matching the plot title)
from matplotlib.colors import ListedColormap
X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('K-NN (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
Output:
Practical Outcomes:
• Since the KNN algorithm requires no training before making predictions, new data can be added seamlessly without impacting the accuracy of the algorithm.
• KNN is very easy to implement. There are only two parameters required to implement KNN: the value of K and the distance function (e.g. Euclidean, Manhattan, etc.).
Applications:
• KNN is widely used in almost all industries, such as healthcare, financial
services, eCommerce, political campaigns, etc.
• Healthcare companies use the KNN algorithm to determine if a patient is
susceptible to certain diseases and conditions.
• Financial institutions predict credit card ratings or qualify loan applications and
the likelihood of default with the help of the KNN algorithm.
• Political analysts classify potential voters into separate classes based on whom
they are likely to vote for.
Practical No. 8
Aim: Deploy Support Vector Machine (SVM)
Python Code:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('Social_Network_Ads.csv')
X = dataset.iloc[:, [2, 3]].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
# Feature scaling (the fitting lines were lost; a StandardScaler is assumed)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
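# The SVM training step was lost in extraction; a minimal sketch is assumed
# here, using scikit-learn's SVC with a linear kernel:
from sklearn.svm import SVC
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(X_train, y_train)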
# Predicting the Test set results
y_pred = classifier.predict(X_test)
# Visualising the Test set results
from matplotlib.colors import ListedColormap
X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start=X_set[:, 0].min() - 1, stop=X_set[:, 0].max() + 1, step=0.01),
                     np.arange(start=X_set[:, 1].min() - 1, stop=X_set[:, 1].max() + 1, step=0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha=0.75, cmap=ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c=ListedColormap(('red', 'green'))(i), label=j)
plt.title('SVM (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

Output:
Practical Outcomes:
• It works really well with a clear margin of separation
• It is effective in high dimensional spaces.
• It is effective in cases where the number of dimensions is greater than the number
of samples.
• It uses a subset of training points in the decision function (called support vectors), so it
is also memory efficient.
Applications:
• Face detection – SVMs classify parts of the image as face and non-face and create a square boundary around the face.
• Text and hypertext categorization – SVMs allow text and hypertext categorization for both inductive and transductive models. They use training data to classify documents into different categories, categorizing on the basis of the score generated and then comparing it with the threshold value.
• Classification of images – Use of SVMs provides better search accuracy for image
classification. It provides better accuracy in comparison to the traditional query-
based searching techniques.
• Bioinformatics – It includes protein classification and cancer classification. We use
SVM for identifying the classification of genes, patients on the basis of genes and
other biological problems.
• Protein fold and remote homology detection – Apply SVM algorithms for
protein remote homology detection.
• Handwriting recognition – SVMs are widely used to recognize handwritten characters.
• Generalized predictive control(GPC) – Use SVM based GPC to control
chaotic dynamics with useful parameters.
Practical No. 9
Aim: Simulate Artificial Neural Network
Python Code:
from joblib.numpy_pickle_utils import xrange
from numpy import *

class NeuralNet(object):
    def __init__(self):
        # Seed the random number generator so every run is reproducible
        random.seed(1)
    # (the rest of the class body, including the "thinks" step, was lost in extraction)

neural_network = NeuralNet()  # the variable name was lost; neural_network is assumed
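Most of the network body was lost in extraction; below is a minimal single-neuron sketch in the same spirit (sigmoid activation, weights adjusted by the error-weighted gradient). The class name, training data, and iteration count are illustrative assumptions, not the original code:

import numpy as np

class SimpleNeuralNet(object):
    def __init__(self):
        np.random.seed(1)
        # One neuron with 3 inputs: random weights in [-1, 1)
        self.weights = 2 * np.random.random((3, 1)) - 1

    def _sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def _sigmoid_derivative(self, s):
        # Derivative expressed in terms of the sigmoid output s
        return s * (1 - s)

    def think(self, inputs):
        # Forward pass: weighted sum pushed through the sigmoid
        return self._sigmoid(np.dot(inputs, self.weights))

    def train(self, inputs, outputs, iterations):
        for _ in range(iterations):
            prediction = self.think(inputs)
            error = outputs - prediction
            # Adjust weights by the error scaled by the sigmoid gradient
            self.weights += np.dot(inputs.T, error * self._sigmoid_derivative(prediction))

# Illustrative training set: the output equals the first input column
training_inputs = np.array([[0, 0, 1], [1, 1, 1], [1, 0, 1], [0, 1, 1]])
training_outputs = np.array([[0, 1, 1, 0]]).T

net = SimpleNeuralNet()
net.train(training_inputs, training_outputs, 10000)
print(net.think(np.array([1, 0, 0])))  # expected: close to 1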
Output:
Practical Outcomes:
• Know the main provisions of neuromathematics;
• Know the main types of neural networks;
• Know and apply the methods of training neural networks;
• Know the applications of artificial neural networks;
• Be able to formalize a problem and solve it by using a neural network.
Applications of ANN
a. Classification of data:
Based on a set of data, our trained neural network predicts whether it is a dog or a cat.
b. Anomaly detection:
Given the details about the transactions of a person, it can say whether a transaction is fraudulent or not.
c. Speech recognition:
We can train our neural network to recognize speech patterns. Example: Siri, Alexa, Google Assistant.
d. Audio generation:
Given audio files as input, it can generate new music based on various factors like genre, singer, and others.
e. Time series analysis:
f. Spell checking:
We can train a neural network that detects misspelled words and can also suggest similar correct words. Example: Grammarly.
g. Character recognition:
h. Machine translation:
We can develop a neural network that translates one language into another.
i. Image processing:
We can train a neural network to process an image and extract pieces of information from it.
Practical No. 10
Aim: Implement the Genetic Algorithm code
One of the advanced algorithms in the field of computer science is the Genetic Algorithm, inspired by the human genetic process of passing genes from one generation to the next. It is generally used for optimization purposes, is heuristic in nature, and can be applied in many places, e.g. solving NP problems, game theory, and code-breaking.
Here are quick steps for how the genetic algorithm works:
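(The original list of steps was lost in extraction; the steps below are reconstructed from the code that follows.)
1. Initialize a population of candidate solutions (chromosomes).
2. Compute a fitness score for each chromosome.
3. Select the fittest chromosomes as parents.
4. Apply crossover between parents to produce children.
5. Randomly mutate the children.
6. Repeat steps 2-5 for the desired number of generations.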
# Assumed imports and a pre-split dataset (X_train, X_test, y_train, y_test)
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Training a logistic regression model
logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)
print("Accuracy = " + str(accuracy_score(y_test, predictions)))

# Defining the steps required for the genetic algorithm
def initilization_of_population(size, n_feat):
    population = []
    for i in range(size):
        # np.bool is deprecated, so the builtin bool is used
        chromosome = np.ones(n_feat, dtype=bool)
        chromosome[:int(0.3 * n_feat)] = False
        np.random.shuffle(chromosome)
        population.append(chromosome)
    return population

def fitness_score(population):
    scores = []
    for chromosome in population:
        logmodel.fit(X_train.iloc[:, chromosome], y_train)
        predictions = logmodel.predict(X_test.iloc[:, chromosome])
        scores.append(accuracy_score(y_test, predictions))
    scores, population = np.array(scores), np.array(population)
    inds = np.argsort(scores)
    return list(scores[inds][::-1]), list(population[inds, :][::-1])

def selection(pop_after_fit, n_parents):
    population_nextgen = []
    for i in range(n_parents):
        population_nextgen.append(pop_after_fit[i])
    return population_nextgen
def crossover(pop_after_sel):
    population_nextgen = pop_after_sel
    for i in range(len(pop_after_sel)):
        child = pop_after_sel[i]
        child[3:7] = pop_after_sel[(i + 1) % len(pop_after_sel)][3:7]
        population_nextgen.append(child)
    return population_nextgen
def mutation(pop_after_cross, mutation_rate):
    # (the loop body was truncated in extraction; the usual implementation
    # flips each gene with probability mutation_rate)
    population_nextgen = []
    for i in range(0, len(pop_after_cross)):
        chromosome = pop_after_cross[i]
        flip = np.random.rand(len(chromosome)) < mutation_rate
        chromosome[flip] = ~chromosome[flip]
        population_nextgen.append(chromosome)
    return population_nextgen
def generations(size, n_feat, n_parents, mutation_rate, n_gen, X_train, X_test, y_train, y_test):
    best_chromo = []
    best_score = []
    population_nextgen = initilization_of_population(size, n_feat)
    for i in range(n_gen):
        scores, pop_after_fit = fitness_score(population_nextgen)
        print(scores[:2])
        pop_after_sel = selection(pop_after_fit, n_parents)
        pop_after_cross = crossover(pop_after_sel)
        population_nextgen = mutation(pop_after_cross, mutation_rate)
        best_chromo.append(pop_after_fit[0])
        best_score.append(scores[0])
    return best_chromo, best_score
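A usage sketch for the driver function above (all parameter values here are illustrative assumptions, and X_train/X_test are assumed to be pandas DataFrames, as the .iloc indexing in fitness_score requires):

# Run the GA for feature selection, then refit on the best chromosome found
chromo, score = generations(size=200, n_feat=X_train.shape[1], n_parents=100,
                            mutation_rate=0.10, n_gen=38,
                            X_train=X_train, X_test=X_test,
                            y_train=y_train, y_test=y_test)
logmodel.fit(X_train.iloc[:, chromo[-1]], y_train)
predictions = logmodel.predict(X_test.iloc[:, chromo[-1]])
print("Accuracy after GA feature selection = " + str(accuracy_score(y_test, predictions)))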
Output:
Practical Outcomes:
Applications:
• Robotics
The use of genetic algorithms in the field of robotics is quite extensive. Genetic algorithms are being used to create learning robots which behave like humans and do tasks like cooking a meal, doing the laundry, etc.
• Traffic and Shipment Routing (Travelling Salesman Problem)
This is a famous problem that has been efficiently adopted by many sales-based companies, as it is time-saving and economical. It is also solved using genetic algorithms.
• Engineering Design
Engineering design has relied heavily on computer modeling and simulation to make the design cycle fast and economical. Genetic algorithms have been used to optimize designs and provide robust solutions.