Tushar ML
PRACTICAL-1
Aim:- Study descriptive statistics and data preprocessing concepts using well-known datasets like
the Titanic or Boston house price dataset.
Solution:-
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Load the Titanic dataset from seaborn
titanic = sns.load_dataset('titanic')
# Display the first few rows of the dataset
print("Dataset Preview:")
print(titanic.head())
# Compute descriptive statistics
print("\nDescriptive Statistics:")
print(titanic.describe())
# Handling missing data
print("\nHandling Missing Data:")
print("Number of missing values in each column:")
print(titanic.isnull().sum())
# Drop columns with too many missing values (e.g., 'deck' in this case)
titanic = titanic.drop(columns=['deck'])
# Fill missing values in the 'age' column with the median
titanic['age'].fillna(titanic['age'].median(), inplace=True)
# Fill missing values in the 'embarked' column with the most common value
titanic['embarked'].fillna(titanic['embarked'].mode()[0], inplace=True)
# Handling categorical data
print("\nHandling Categorical Data:")
# Convert 'sex' and 'embarked' to numerical values using one-hot encoding
titanic = pd.get_dummies(titanic, columns=['sex', 'embarked'], drop_first=True)
# Display the modified dataset
print(titanic.head())
# Partitioning dataset into training and test datasets
print("\nPartitioning into Training and Test Datasets:")
X = titanic.drop('survived', axis=1)
y = titanic['survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Bringing features onto the same scale (Standardization)
print("\nBringing Features onto the Same Scale:")# Select only numeric columns for
scaling
numeric_columns = X_train.select_dtypes(include=['float64', 'int64']).columnsscaler =
StandardScaler()
X_train_scaled = X_train.copy()X_test_scaled = X_test.copy()
X_train_scaled[numeric_columns] = scaler.fit_transform(X_train[numeric_columns])
X_test_scaled[numeric_columns] = scaler.transform(X_test[numeric_columns])
# Display the first few rows of the scaled training set print("Scaled Training Set
Preview:") print(X_train_scaled.head())
OUTPUT:-
PRACTICAL-2
Aim:- Implement a linear regression with one variable algorithm from scratch using Python. Given a dataset of X and Y values, create a linear regression model that predicts Y based on X without using any machine learning libraries like sklearn.
Solution:-
import numpy as np
import matplotlib.pyplot as plt
def linear_regression(X, Y, learning_rate, epochs):
    # Initialize parameters
    m = 0
    b = 0
    n = len(X)
    # Gradient descent
    for _ in range(epochs):
        Y_pred = m * X + b
        dm = (-2 / n) * np.sum(X * (Y - Y_pred))
        db = (-2 / n) * np.sum(Y - Y_pred)
        m -= learning_rate * dm
        b -= learning_rate * db
    return m, b
def plot_regression_line(X, Y, m, b):
    plt.scatter(X, Y, color='darkorange', label='Actual data')
    plt.plot(X, m * X + b, color='royalblue', label='Regression line')
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.title('Linear Regression with One Variable')
    plt.legend()
    plt.show()

def main():
    # Generate synthetic data
    np.random.seed(42)
    X = 2 * np.random.rand(100, 1)
    Y = 4 + 3 * X + np.random.randn(100, 1)
    # Flatten X and Y to ensure they are 1-D arrays for ease of calculations
    X = X.flatten()
    Y = Y.flatten()
    # Train model
    learning_rate = 0.01
    epochs = 1000
    m, b = linear_regression(X, Y, learning_rate, epochs)
    # Output results
    print(f"Slope (m): {m:.4f}, Intercept (b): {b:.4f}")
    # Plot
    plot_regression_line(X, Y, m, b)

if __name__ == "__main__":
    main()
OUTPUT:-
PRACTICAL-3
Aim:- Implement the gradient descent algorithm for linear regression with one variable from scratch in vectorized form. Train a linear regression model using gradient descent to find the optimal coefficients (slope and intercept) for a given dataset.
Solution:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def normalize_feature(X):
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    normalized_X = (X - mean) / std
    return normalized_X, mean, std

def add_intercept(X):
    return np.c_[np.ones(X.shape[0]), X]

def compute_cost(X, y, theta):
    m = len(y)
    error = X @ theta - y
    cost = (1 / (2 * m)) * np.sum(error**2)
    return cost

def gradient_descent(X, y, theta, learning_rate, iterations):
    m = len(y)
    cost_history = np.zeros(iterations)
    for i in range(iterations):
        error = X @ theta - y
        gradient = (1 / m) * (X.T @ error)
        theta = theta - learning_rate * gradient
        cost_history[i] = compute_cost(X, y, theta)
    return theta, cost_history
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]  # Use only one feature for simplicity
y = diabetes.target
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Normalize features and add intercept term
X_train_normalized, mean, std = normalize_feature(X_train)
X_train_normalized = add_intercept(X_train_normalized)
# Initialize parameters
theta_initial = np.zeros(X_train_normalized.shape[1])
# Set hyperparameters
learning_rate = 0.01
iterations = 1500
# Run gradient descent
theta_optimal, cost_history = gradient_descent(X_train_normalized, y_train, theta_initial, learning_rate, iterations)
# Display the optimized coefficients
print("Optimal Coefficients (Intercept, Slope):", theta_optimal)# Plot the cost history to
visualize convergence
plt.plot(np.arange(1, iterations + 1), cost_history, label='Cost History')
plt.xlabel('Iterations')
plt.ylabel('Cost')
plt.title('Gradient Descent Convergence')
plt.legend()
plt.show()
OUTPUT:-
PRACTICAL-4
Aim:- Use the sklearn library to create a linear regression with multiple variables. Load a well-known dataset, split it into training and testing sets, and then train the model to predict a target variable based on one or more features. You can use the following data set:
https://raw.githubusercontent.com/sachinmotwani20/NPTELML_Datasets/main/ScoresPrediction.csv
Solution:-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
# Load the dataset from the provided URL
url = "https://raw.githubusercontent.com/sachinmotwani20/NPTELML_Datasets/main/ScoresPrediction.csv"
df = pd.read_csv(url)
# Display the column names in the dataset
print("\nColumn Names:", df.columns)
# Extract features and target variable
X = df.drop("FinalYrScore", axis=1)
# Modify "FinalYrScore" based on the actual column name
y = df["FinalYrScore"]
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create a linear regression model
model = LinearRegression()
# Train the model on the training set
model.fit(X_train, y_train)
# Make predictions on the testing set
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
print("\nMean Squared Error on Test Set:", mse)# Visualize predictions vs. actual values
plt.scatter(y_test, y_pred)
plt.xlabel("Actual Scores") plt.ylabel("Predicted Scores") plt.title("Actual Scores vs.
Predicted Score
OUTPUT:-
PRACTICAL-5
Aim:- Use the sklearn library to create a logistic regression model for binary
classification. Load a dataset with two classes, pre-process the data, split it into training
and testing sets, and then train the model to predict the class labels. You can use any data
set.
Solution:-
# Import necessary libraries
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, accuracy_score
# Step 1: Load the wine dataset
wine = load_wine()
X = wine.data
y = wine.target
binary_filter = (y == 0) | (y == 1)
X_binary = X[binary_filter]
y_binary = y[binary_filter]
# Step 2: Pre-process the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_binary)
# Step 3: Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_binary, test_size=0.3,
random_state=42)
# Step 4: Train the logistic regression model
model = LogisticRegression(max_iter=10000)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred)}") print("Classification Report:")
print(classification_report(y_test, y_pred))
OUTPUT:-
PRACTICAL-6
Aim:- Implement logistic regression from scratch for binary classification. Given a
dataset with two classes, create a logistic regression model that predicts the probability
of an example belonging to one class. Implement the logistic function and gradient
descent for optimization.
Solution:-
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
cancer = load_breast_cancer()
X = cancer.data
y = cancer.target
X_bias = np.c_[np.ones(X.shape[0]), X]
X_train, X_test, y_train, y_test = train_test_split(X_bias, y, test_size=0.2, random_state=42)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient_descent(X, y, theta, learning_rate, iterations):
    m = len(y)
    cost_history = np.zeros(iterations)
    for i in range(iterations):
        z = np.dot(X, theta)
        h = sigmoid(z)
        gradient = np.dot(X.T, (h - y)) / m
        theta -= learning_rate * gradient
        cost_history[i] = -1 / m * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
    return theta, cost_history

theta_initial = np.zeros(X_train.shape[1])
learning_rate = 0.01
iterations = 500
theta_optimal, cost_history = gradient_descent(X_train, y_train, theta_initial, learning_rate, iterations)
y_pred_prob = sigmoid(np.dot(X_test, theta_optimal))
y_pred = (y_pred_prob >= 0.5).astype(int)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
OUTPUT:-
PRACTICAL-7
Aim:- Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
Solution:-
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name='target')
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_matrix)
class_report = classification_report(y_test, y_pred, target_names=iris.target_names)
print("Classification Report:")
OUTPUT:-
PRACTICAL-8
Aim:- Implement k-fold cross-validation (e.g., 5-fold) for the Naïve Bayesian classifier on a given dataset. Calculate the average accuracy of the classifier over the k folds and report the results.
Solution:-
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = pd.Series(iris.target, name='target')
nb_classifier = GaussianNB()
num_folds = 5
stratified_kfold = StratifiedKFold(n_splits=num_folds, shuffle=True, random_state=42)
accuracy_scores = []
for train_index, test_index in stratified_kfold.split(X, y):
    X_train, X_test = X.iloc[train_index], X.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]
    nb_classifier.fit(X_train, y_train)
    y_pred = nb_classifier.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    accuracy_scores.append(accuracy)
average_accuracy = sum(accuracy_scores) / num_folds
print(f"Average Accuracy over {num_folds}-fold Cross-Validation: {average_accuracy:.4f}")
OUTPUT:-
PRACTICAL-9
Solution:-
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def euclidean_distance(x1, x2):
    return np.sqrt(np.sum((x1 - x2)**2))

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k
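    # NOTE: the original listing breaks off after the constructor; the fit/predict methods and
    # the driver below are a minimal sketch (assumed), not part of the original listing.
    def fit(self, X, y):
        # Lazy learner: simply store the training data
        self.X_train = np.asarray(X)
        self.y_train = np.asarray(y)

    def predict(self, X):
        return np.array([self._predict_one(x) for x in np.asarray(X)])

    def _predict_one(self, x):
        # Distances from x to every stored training sample
        distances = [euclidean_distance(x, x_train) for x_train in self.X_train]
        # Majority vote over the labels of the k nearest neighbours
        k_indices = np.argsort(distances)[:self.k]
        k_labels = self.y_train[k_indices]
        values, counts = np.unique(k_labels, return_counts=True)
        return values[np.argmax(counts)]

# Illustrative driver on the Iris dataset (assumed)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3, random_state=42)
knn = KNNClassifier(k=3)
knn.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, knn.predict(X_test)))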
OUTPUT:-
PRACTICAL-10
Solution:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
# Load the Breast Cancer dataset
cancer = datasets.load_breast_cancer()
X = cancer.data
y = cancer.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Function to train and evaluate SVM classifier
def train_and_evaluate_svm(kernel):
    svm_classifier = SVC(kernel=kernel, random_state=42)
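    # NOTE: the original listing breaks off here; the rest of this function and the kernel
    # loop below are a minimal sketch (assumed), not the original author's code.
    svm_classifier.fit(X_train, y_train)
    y_pred = svm_classifier.predict(X_test)
    print(f"\nKernel: {kernel}")
    print("Accuracy :", accuracy_score(y_test, y_pred))
    print("Precision:", precision_score(y_test, y_pred))
    print("Recall   :", recall_score(y_test, y_pred))
    print("F1-score :", f1_score(y_test, y_pred))
    print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))

# Evaluate a few common kernels (assumed choice of kernels)
for kernel in ['linear', 'rbf', 'poly']:
    train_and_evaluate_svm(kernel)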
OUTPUT:-
PRACTICAL-11
Aim:-Take a binary classification dataset and implement both the K-Nearest Neighbors
(KNN) and Support Vector Machine (SVM) classifiers using Python. Compare the
performance of these two algorithms on metrics such as accuracy, precision, recall, and
F1-score. Visualize the decision boundaries for both algorithms and discuss the strengths
and weaknesses of each approach.
2. No Training Period
KNN is a lazy learner, meaning it doesn't require a training phase. The model directly
learns from the training data during prediction.
Weaknesses
1. Computationally Intensive
As the size of the dataset grows, the computational cost of KNN increases since it needs
to calculate distances for each prediction.
2. Sensitive to Outliers
KNN is sensitive to outliers and noisy data, as they can significantly impact the distance
calculations.
3. Versatility
SVM can be applied to various types of data, including linear and non-linear classification and regression tasks.
Weaknesses
1. Computational Complexity
Training an SVM can be computationally intensive, especially for large datasets.
2. Choice of Kernel and Parameters
The performance of SVM is sensitive to the choice of the kernel and its parameters.
Selecting the right combination requires experimentation.
3. Limited Interpretability
SVM models may be less interpretable compared to simpler models like linear
regression.
1. Performance
In terms of performance, the choice between KNN and SVM depends on the specific characteristics of the dataset. KNN may work well for simpler datasets with local patterns, while SVM may excel in tasks with more complex decision boundaries.
2. Computational Cost:
KNN is computationally expensive, especially for large datasets, as it requires calculating
distances for each prediction. SVM, while also computationally intensive, can be more
efficient for large datasets.
3. Robustness
KNN is sensitive to outliers and noisy data, while SVM is more robust due to its emphasis
on the margin.
4. Interpretability
KNN is more interpretable, as its decision is based on the majority class among the k-nearest neighbors. SVM, being a more complex model, may be less interpretable.
5. Dataset Characteristics
Consider the nature of the dataset. If the decision boundary is expected to be locally smooth, KNN might be more appropriate. If the dataset has high dimensionality or requires a non-linear decision boundary, SVM might be a better choice.
In summary, the choice between KNN and SVM depends on factors such as the dataset
size, complexity, and characteristics. Both algorithms have their strengths and
weaknesses, and the selection should be made based on the specific requirements of the
task at hand.
Solution:-
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
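Only the imports survive in the listing above. A minimal sketch of the comparison itself is given below; the 80/20 split, k=5 for KNN, the RBF kernel for SVM, and the grouped bar chart are assumptions rather than the original code.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Load the data and split it (assumed 80/20 split)
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {"KNN": KNeighborsClassifier(n_neighbors=5), "SVM": SVC(kernel='rbf', random_state=42)}
results = {}
for name, model in models.items():
    # Train each classifier and collect the four comparison metrics
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = [accuracy_score(y_test, y_pred), precision_score(y_test, y_pred),
                     recall_score(y_test, y_pred), f1_score(y_test, y_pred)]
    print(name, results[name])

# Simple grouped bar chart comparing the two classifiers on the four metrics
metrics = ["Accuracy", "Precision", "Recall", "F1-score"]
x = np.arange(len(metrics))
plt.bar(x - 0.2, results["KNN"], width=0.4, label="KNN")
plt.bar(x + 0.2, results["SVM"], width=0.4, label="SVM")
plt.xticks(x, metrics)
plt.legend()
plt.title("KNN vs SVM on the breast-cancer test set")
plt.show()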
OUTPUT:-
PRACTICAL-12
Aim:- Given a dataset containing features and labels, implement a Random Forest
classification model using Python and a library like scikit-learn. Split the dataset into
training and testing sets, train the model, and evaluate its performance using metrics like
accuracy, precision, and recall.
Solution:-
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, confusion_matrix
# Load the Iris dataset
iris = load_iris()
X = iris.data  # Features
y = iris.target  # Labels
# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the Random Forest classifier
random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model
random_forest_classifier.fit(X_train, y_train)
# Make predictions on the testing set
y_pred = random_forest_classifier.predict(X_test)
# Evaluate and print performance metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:")
print(conf_matrix)
OUTPUT:-
PRACTICAL-13
Aim: Extend the previous exercise by performing hyperparameter tuning for the Random Forest
classifier. Experiment with different values for hyperparameters like the number of trees, maximum
depth of trees, and minimum samples per leaf. Determine the best combination of hyperparameters
that maximizes the classification accuracy on the test dataset.
Solution
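No listing is included for this practical. The sketch below is a minimal illustration using GridSearchCV on the Iris Random Forest from the previous practical; the parameter grid values and the use of 5-fold cross-validation are assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Load and split the data as in the previous practical
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate hyperparameter values (assumed ranges)
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 3, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}

# 5-fold cross-validated grid search over all combinations
grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)

print("Best hyperparameters:", grid_search.best_params_)
print("Best cross-validated accuracy:", grid_search.best_score_)

# Evaluate the best combination on the held-out test set
best_model = grid_search.best_estimator_
print("Test accuracy:", accuracy_score(y_test, best_model.predict(X_test)))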
PRACTICAL-14
Aim: Using a dataset with a large number of features, apply a Random Forest classifier to perform
feature importance analysis. Identify and rank the most important features in the dataset. Visualize
the feature importances to gain insights into which features are most influential for classification.
Solution
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.feature_selection import SelectFromModel
# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the Random Forest classifier
random_forest_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model
random_forest_classifier.fit(X_train, y_train)
# Make predictions on the testing set
y_pred = random_forest_classifier.predict(X_test)
# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
# Feature Importance Analysis
feature_importances = random_forest_classifier.feature_importances_
feature_names = data.feature_names
# Create a DataFrame for better visualization
feature_importance_df = pd.DataFrame({'Feature': feature_names, 'Importance': feature_importances})
feature_importance_df = feature_importance_df.sort_values(by='Importance', ascending=False)
# Print and visualize the top features
print("\nTop 10 Features:")
print(feature_importance_df.head(10))
# Visualize Feature Importances
plt.figure(figsize=(10, 6))
plt.barh(feature_importance_df['Feature'], feature_importance_df['Importance'], color='skyblue')
plt.xlabel('Importance')
plt.title('Feature Importances')
plt.show()
OUTPUT
PRACTICAL-15
Aim: Build a feedforward artificial neural network (ANN) from scratch using Python. Implement the
Backpropagation algorithm to train the network. Use a simple binary classification dataset (e.g., the
XOR problem) to test and validate the network's training process and accuracy.
Solution
import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size, learning_rate):
        # Initialize weights and biases
        self.weights_input_hidden = np.random.rand(input_size, hidden_size)
        self.bias_hidden = np.zeros((1, hidden_size))
        self.weights_hidden_output = np.random.rand(hidden_size, output_size)
        self.bias_output = np.zeros((1, output_size))
        # Learning rate
        self.learning_rate = learning_rate

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        return x * (1 - x)

    def forward(self, X):
        # Forward pass
        self.hidden_layer_input = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_layer_output = self.sigmoid(self.hidden_layer_input)
        self.output_layer_input = np.dot(self.hidden_layer_output, self.weights_hidden_output) + self.bias_output
        self.predicted_output = self.sigmoid(self.output_layer_input)
        return self.predicted_output
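    # NOTE: the original listing omits the backpropagation step and the XOR data; the methods
    # and lines below are a minimal sketch (assumed), matching the forward pass and
    # sigmoid_derivative defined above.
    def backward(self, X, y):
        # Output-layer error and delta
        output_error = y - self.predicted_output
        output_delta = output_error * self.sigmoid_derivative(self.predicted_output)
        # Hidden-layer error and delta
        hidden_error = np.dot(output_delta, self.weights_hidden_output.T)
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_layer_output)
        # Weight and bias updates (gradient descent on the squared error)
        self.weights_hidden_output += self.learning_rate * np.dot(self.hidden_layer_output.T, output_delta)
        self.bias_output += self.learning_rate * np.sum(output_delta, axis=0, keepdims=True)
        self.weights_input_hidden += self.learning_rate * np.dot(X.T, hidden_delta)
        self.bias_hidden += self.learning_rate * np.sum(hidden_delta, axis=0, keepdims=True)

    def train(self, X, y, epochs):
        for _ in range(epochs):
            self.forward(X)
            self.backward(X, y)

# XOR dataset and network sizes (assumed; the original lines are missing)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])
input_size = 2
hidden_size = 4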
output_size = 1
learning_rate = 0.1
epochs = 10000
neural_network = NeuralNetwork(input_size, hidden_size, output_size, learning_rate)  # construction line missing from the original listing
neural_network.train(X, y, epochs)
# Test the trained network
predictions = neural_network.forward(X)
print("\nPredictions:")
print(predictions)
PRACTICAL -16
Aim: Create a multilayer perceptron (MLP) neural network using a deep learning framework like
TensorFlow or PyTorch. Train the MLP on a dataset of handwritten digit images (e.g., MNIST or
Fashion MNIST) using the Backpropagation algorithm. Evaluate the model's accuracy on a separate
test dataset and visualize the results.
Solution
import tensorflow as tf
from tensorflow.keras import layers, models
x_train, x_test = x_train / 255.0, x_test / 255.0 # Normalize pixel values to between 0 and 1
# Build the MLP model
model = models.Sequential([
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
# Evaluate the model on the test set
cmap='gray')
plt.title(f"Predicted:
{np.argmax(predictions[i])}\nActual: {y_test[i]}")
plt.axis('off')
plt.show()
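The listing above survives only in fragments (the Sequential layers, the compile call, the dataset loading, and the evaluation are partly missing). A complete minimal sketch of the same exercise is given below; the network architecture (one hidden Dense layer of 128 units), the Adam optimizer, and the use of the MNIST dataset are assumptions.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import layers, models

# Load MNIST and normalize pixel values to [0, 1]
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A simple MLP: flatten the 28x28 images, one hidden layer, softmax output
model = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)

# Evaluate on the held-out test set
test_loss, test_acc = model.evaluate(x_test, y_test)
print("Test accuracy:", test_acc)

# Visualize a few predictions
predictions = model.predict(x_test)
for i in range(5):
    plt.subplot(1, 5, i + 1)
    plt.imshow(x_test[i], cmap='gray')
    plt.title(f"Predicted: {np.argmax(predictions[i])}\nActual: {y_test[i]}")
    plt.axis('off')
plt.show()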
Output
PRACTICAL-17
Aim: Apply k-Means algorithm and Hierarchical Clustering algorithm to cluster a set of data stored
in a .CSV file. Use the same data set for clustering. Compare the results of these two algorithms and
comment on the quality of clustering.
Solution
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import dendrogram, linkage
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
file_path = 'Iris.csv'  # Replace with the actual path to your CSV file
data = pd.read_csv(file_path)
# Assuming the last column contains non-numeric labels
label_column = data.columns[-1]
# Encode non-numeric labels to numeric values
label_encoder = LabelEncoder()
data[label_column] = label_encoder.fit_transform(data[label_column])
# Exclude non-numeric columns
X = data.iloc[:, :-1].values
# Number of clusters for k-Means
k_clusters = 3
# Apply k-Means algorithm
kmeans = KMeans(n_clusters=k_clusters)
kmeans_labels = kmeans.fit_predict(X)
kmeans_centroids = kmeans.cluster_centers_
# Apply Hierarchical Clustering
linkage_matrix = linkage(X, method='ward')
dendrogram(linkage_matrix)
plt.title('Hierarchical Clustering Dendrogram')
plt.show()
# Plot k-Means results
plt.scatter(X[:, 0], X[:, 1], c=kmeans_labels, cmap='viridis', edgecolors='k')
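# NOTE: the original listing ends at the scatter plot above; the lines below are a minimal
# sketch (assumed) that finishes the figure and cuts the dendrogram into flat clusters so the
# two algorithms can be compared.
plt.scatter(kmeans_centroids[:, 0], kmeans_centroids[:, 1], c='red', marker='x', s=100, label='Centroids')
plt.title('k-Means Clustering (first two features)')
plt.legend()
plt.show()

# Derive the same number of clusters from the hierarchy and plot them for comparison
from scipy.cluster.hierarchy import fcluster
hierarchical_labels = fcluster(linkage_matrix, t=k_clusters, criterion='maxclust')
plt.scatter(X[:, 0], X[:, 1], c=hierarchical_labels, cmap='viridis', edgecolors='k')
plt.title('Hierarchical Clustering (first two features)')
plt.show()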
PRACTICAL-18
Aim Load a dataset from a "data.csv" file and apply the k-Means clustering algorithm to cluster the data into
'k' clusters. Experiment with different values of 'k' and visualize the results. Discuss the optimal number of
clusters based on the clustering quality metrics using PCA.
Solution
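Only the tail of the listing survives for this practical. The preamble below is a minimal sketch (assumed): it loads the CSV, standardizes the features, and loops over candidate values of k, computing the silhouette score that the surviving lines append and plot. The file name "data.csv" and the range of k values are assumptions.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# Load the dataset and keep only the numeric feature columns
data = pd.read_csv("data.csv")
X = data.select_dtypes(include='number').values

# Standardize the features before clustering
X_scaled = StandardScaler().fit_transform(X)

k_values = range(2, 11)
silhouette_scores = []
for k in k_values:
    # Fit k-Means and score the clustering for this value of k
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans_labels = kmeans.fit_predict(X_scaled)
    silhouette_avg = silhouette_score(X_scaled, kmeans_labels)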
    silhouette_scores.append(silhouette_avg)
    # Visualize the results using PCA for dimensionality reduction
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(X_scaled)
    plt.scatter(X_pca[:, 0], X_pca[:, 1], c=kmeans_labels, cmap='viridis', edgecolors='k')
    plt.title(f'k-Means Clustering (k={k}), Silhouette Score: {silhouette_avg:.2f}')
    plt.show()
# Plot silhouette scores for different values of 'k'
plt.plot(k_values, silhouette_scores, marker='o')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Silhouette Score')
plt.title('Silhouette Score vs Number of Clusters')
plt.show()
PRACTICAL -19
Aim: Load a dataset from "data.csv" and apply the k-Means algorithm with a chosen 'k'
value. Use appropriate clustering quality metrics (e.g., silhouette score, inertia, Davies-
Bouldin index) to evaluate the quality of clustering. Analyze how the choice of 'k' affects
the clustering quality and discuss your findings.
Solution :
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score
import matplotlib.pyplot as plt
# Load the dataset
dataset_path = "Iris.csv"
data = pd.read_csv(dataset_path)
# Exclude the last column (assuming it contains labels)
X = data.iloc[:, :-1].values
# Choose a range of 'k' values to experiment with
k_values = range(2, 11)
# Lists to store clustering quality metrics
silhouette_scores = []
inertia_values = []
davies_bouldin_scores = []
# Apply k-Means for different 'k' values
for k in k_values:
    # Fit the k-Means model
    kmeans = KMeans(n_clusters=k, random_state=42)
    kmeans.fit(X)
    # Predict clusters
    labels = kmeans.labels_
    # Calculate clustering quality metrics
    silhouette_scores.append(silhouette_score(X, labels))
    inertia_values.append(kmeans.inertia_)
    davies_bouldin_scores.append(davies_bouldin_score(X, labels))
# Plotting the results
plt.figure(figsize=(12, 4))
# Plot Silhouette Score
plt.subplot(1, 3, 1)
plt.plot(k_values, silhouette_scores, marker='o')
plt.title('Silhouette Score')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Score')
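# NOTE: the original listing stops after the first subplot; the panels below are a minimal
# sketch (assumed) completing the figure with the inertia and Davies-Bouldin curves that the
# loop above already collects.
# Plot Inertia (elbow curve)
plt.subplot(1, 3, 2)
plt.plot(k_values, inertia_values, marker='o')
plt.title('Inertia')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Inertia')
# Plot Davies-Bouldin Index (lower is better)
plt.subplot(1, 3, 3)
plt.plot(k_values, davies_bouldin_scores, marker='o')
plt.title('Davies-Bouldin Index')
plt.xlabel('Number of Clusters (k)')
plt.ylabel('Index')
plt.tight_layout()
plt.show()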
Output
PRACTICAL -20
Aim: Write a Python program to implement a Self-Organizing Map (SOM) and train it on a given dataset, such as a collection of 2D points. Allow the user to specify parameters like the map size, learning rate, and number of training iterations. Visualize the map before and after training to observe how it adapts to the data.
Solution
import numpy as np
from minisom import MiniSom
import matplotlib.pyplot as plt

def visualize_som(som, data, title):
    plt.figure(figsize=(8, 8))
    plt.pcolor(som.distance_map().T, cmap='bone_r')  # plot the distance map as background
    plt.colorbar()
    # plot data points on the map
    for i, (x, _) in enumerate(data):
        w = som.winner(x)
        plt.plot(w[0] + 0.5, w[1] + 0.5, 'o', markerfacecolor='None', markersize=10,
                 markeredgecolor='r', markeredgewidth=2)
        plt.text(w[0] + 0.5, w[1] + 0.5, str(i + 1), color='k', fontweight='bold', ha='center',
                 va='center')
    plt.title(title)
    plt.show()
# Generate synthetic 2D data
np.random.seed(42)
data = np.random.rand(100, 2)  # replace this with your own dataset
# User-defined parameters
map_size = (10, 10)      # SOM map size
learning_rate = 0.5      # initial learning rate
num_iterations = 1000    # number of training iterations
# Create and train the SOM
som = MiniSom(*map_size, 2, sigma=1.0, learning_rate=learning_rate)
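# NOTE: the original listing ends at the constructor above; the training and visualization
# calls below are a minimal sketch (assumed), using the same MiniSom calls as the next practical.
som.random_weights_init(data)
visualize_som(som, data, title="SOM before training")
print("Training SOM...")
som.train_random(data, num_iterations)
print("Training complete.")
visualize_som(som, data, title="SOM after training")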
PRACTICAL -21
Aim: Implement a Self-Organizing Map (SOM) algorithm and use it to cluster a set of
images based on their visual content. You can represent each image as a feature
vector (e.g., using color histograms or deep features) and train the SOM to group similar
images together. Visualize the resulting clusters and evaluate the quality of the clustering.
Solution
import numpy as np
def extract_features(images):
    features = []
    for img in images:
        # Use color histograms as feature vectors
        hist_r = np.histogram(img[:, :, 0], bins=256, range=(0, 256))[0]
        hist_g = np.histogram(img[:, :, 1], bins=256, range=(0, 256))[0]
        hist_b = np.histogram(img[:, :, 2], bins=256, range=(0, 256))[0]
        feature_vector = np.concatenate((hist_r, hist_g, hist_b))
        features.append(feature_vector)
    return np.array(features)
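# NOTE: the original listing omits the imports, the image data, and the cluster-visualization
# helper; the block below is a minimal sketch (assumed). Synthetic random images stand in for a
# real image collection, and visualize_som_with_labels is a simple plot of winning nodes.
from minisom import MiniSom
import matplotlib.pyplot as plt

def visualize_som_with_labels(som, data, labels, title):
    plt.figure(figsize=(8, 8))
    plt.pcolor(som.distance_map().T, cmap='bone_r')  # distance map as background
    plt.colorbar()
    for w in labels:
        # mark the winning node of each image on the map
        plt.plot(w[0] + 0.5, w[1] + 0.5, 'o', markerfacecolor='None',
                 markeredgecolor='r', markersize=8)
    plt.title(title)
    plt.show()

# Synthetic stand-in for an image collection: 100 random 32x32 RGB images
images = (np.random.rand(100, 32, 32, 3) * 255).astype(np.uint8)
feature_vectors = extract_features(images).astype(float)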
# User-defined parameters
map_size = (10, 10)      # SOM map size
learning_rate = 0.5      # initial learning rate
num_iterations = 1000    # number of training iterations
# Create and train the SOM
som = MiniSom(*map_size, feature_vectors.shape[1], sigma=1.0, learning_rate=learning_rate)
som.random_weights_init(feature_vectors)
print("Training SOM...")
som.train_random(feature_vectors, num_iterations)
print("Training complete.")
# Get cluster assignments for each image
cluster_labels = [som.winner(x) for x in feature_vectors]
visualize_som_with_labels(som, feature_vectors, cluster_labels, title="SOM Clustering")
OUTPUT:
PRACTICAL -22
Aim: Given a dataset of customer churn, implement a program that compares the
performance of three different supervised learning algorithms (e.g., Logistic Regression,
Random Forest, and Support Vector Machine) for binary classification. Split the dataset
into training and testing sets, train each algorithm on the training set, and evaluate their
performance using metrics like accuracy, precision, recall, and F1-score. Present the results
in a clear and informative way, such as through a bar chart or a table.
Solution
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt
# Load your customer churn dataset (replace 'your_dataset.csv' with your actual dataset)
dataset_path = 'telecom_churn.csv'
data = pd.read_csv(dataset_path)
# Assuming your dataset has a 'Churn' column indicating binary labels (1 for churn, 0 for non-churn)
X = data.drop('Churn', axis=1)
y = data['Churn']
# Split the dataset into training and testing sets
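# NOTE: the original listing ends at the comment above; the remainder below is a minimal
# sketch (assumed): it splits the data, trains the three classifiers, and compares their
# scores in a grouped bar chart. It assumes the feature columns are already numeric.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'SVM': SVC(random_state=42),
}

results = {}
for name, model in models.items():
    # Train each classifier and collect accuracy, precision, recall, and F1-score
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = [accuracy_score(y_test, y_pred), precision_score(y_test, y_pred),
                     recall_score(y_test, y_pred), f1_score(y_test, y_pred)]
    print(name, results[name])

# Grouped bar chart of the four metrics for the three models
import numpy as np
metrics = ['Accuracy', 'Precision', 'Recall', 'F1-score']
x = np.arange(len(metrics))
width = 0.25
for i, (name, scores) in enumerate(results.items()):
    plt.bar(x + (i - 1) * width, scores, width, label=name)
plt.xticks(x, metrics)
plt.ylabel('Score')
plt.title('Model comparison on the churn test set')
plt.legend()
plt.show()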