
ML Lab Programs (1)

The document outlines a Machine Learning Lab course with objectives to learn installation of Python packages, regression methods, clustering, and classification algorithms. It includes a list of experiments covering various machine learning techniques such as linear regression, logistic regression, decision trees, random forests, K-means, SVM, and neural networks. Additionally, it provides sample code and datasets for practical implementation of these algorithms.

Machine Learning Lab


Course Code: 20PCC32XX

Prerequisites: Computer Programming


Course objectives:

1. Learn to install Anaconda Python and its useful packages.
2. Study various regression methods for problem solving.
3. Understand clustering algorithms and apply them to appropriate problems.
4. Understand classification algorithms and apply them to appropriate problems.

Course Outcomes:
LIST OF EXPERIMENTS

1. Install Python/Anaconda Python and useful packages for machine learning; load a sample dataset, understand it, and visualize the data.
2. Implement simple linear regression.
3. Implement multivariate linear regression.
4. Implement simple logistic regression and multivariate logistic regression.
5. Implement decision trees.
6. Implement any 3 classification algorithms.
7. Implement the random forest algorithm.
8. Implement K-means and KNN algorithms.
9. Implement SVM on any applicable dataset.
10. Implement neural networks.
11. Implement correlation analysis (CA).
12. Implement anomaly detection and recommendation.


2. Implement Simple Linear Regression

Example: predicting a person's weight based on their height.

# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Sample dataset: height (inches) vs weight (pounds)
X = np.array([58, 59, 60, 61, 62, 63, 64, 65, 66, 67]).reshape(-1, 1)  # Heights
y = np.array([115, 117, 120, 123, 126, 129, 132, 136, 140, 144])  # Weights

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model (Mean Squared Error)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Visualize the data points and the fitted line
plt.scatter(X, y, color='blue', label='Data Points')
plt.plot(X, model.predict(X), color='red', label='Regression Line')
plt.xlabel('Height (inches)')
plt.ylabel('Weight (pounds)')
plt.legend()
plt.show()

# Print the slope (coefficient) and intercept of the line
print(f"Slope (Coefficient): {model.coef_[0]}")
print(f"Intercept: {model.intercept_}")
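To make the fitted line less of a black box, the slope and intercept can also be computed by hand with the ordinary least squares formulas. The sketch below is an illustration in plain Python and is not part of the original program. It fits on all ten points rather than the 80% training split, so its numbers will differ slightly from the scikit-learn output. The function name `fit_line` is an assumption made for this example.

```python
# Ordinary least squares for a single feature, in plain Python.
# slope     = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
# intercept = mean_y - slope * mean_x

def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Same height/weight data as the program above
heights = [58, 59, 60, 61, 62, 63, 64, 65, 66, 67]
weights = [115, 117, 120, 123, 126, 129, 132, 136, 140, 144]

slope, intercept = fit_line(heights, weights)
print(f"Slope: {slope:.4f}")
print(f"Intercept: {intercept:.4f}")
print(f"Predicted weight at 70 inches: {slope * 70 + intercept:.2f}")
```

The slope is the expected change in weight (pounds) for each additional inch of height.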

3. Implement Multivariate Linear Regression

We will use a dataset with multiple features (independent variables) such as size of the house
(square footage), number of bedrooms, and age of the house to predict the house price.

# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Sample dataset: size (sq ft), bedrooms, age (years) and price (in thousands of dollars)
data = {
    'Size': [1500, 1600, 1700, 1875, 2000, 2100, 2300, 2400, 2550, 2700],
    'Bedrooms': [3, 3, 3, 4, 4, 3, 4, 4, 5, 5],
    'Age': [10, 12, 13, 14, 15, 16, 17, 18, 19, 20],
    'Price': [300, 320, 340, 360, 400, 410, 430, 450, 480, 500]
}

# Create a DataFrame
df = pd.DataFrame(data)

# Features (Size, Bedrooms, Age)
X = df[['Size', 'Bedrooms', 'Age']]

# Target (Price)
y = df['Price']

# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the model
model = LinearRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model (Mean Squared Error)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")

# Print the coefficients and intercept
print(f"Coefficients (Weights): {model.coef_}")
print(f"Intercept (Bias): {model.intercept_}")

# Example prediction: price of a 2200 sq ft, 4-bedroom, 15-year-old house.
# A DataFrame with the training column names avoids scikit-learn's feature-name warning.
example = pd.DataFrame([[2200, 4, 15]], columns=['Size', 'Bedrooms', 'Age'])
predicted_price = model.predict(example)
print(f"Predicted price for a 2200 sq ft, 4 bedroom, 15 year old house: {predicted_price[0]} thousand dollars")

4. Implement Simple Logistic Regression and Multivariate Logistic Regression

Simple Logistic Regression (One feature):


We will classify whether a person will buy a car based on their age (a single feature).

# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample data: Age vs Buy (0 = No, 1 = Yes)
data = {
    'Age': [22, 25, 47, 52, 46, 56, 55, 60, 62, 61],
    'Buy': [0, 0, 1, 1, 1, 1, 1, 1, 1, 0]
}
df = pd.DataFrame(data)

# Features (Age)
X = df[['Age']]

# Target (Buy)
y = df['Buy']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the Logistic Regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Example prediction: will a 50-year-old buy a car?
# A DataFrame with the training column name avoids scikit-learn's feature-name warning.
example = pd.DataFrame({'Age': [50]})
pred = model.predict(example)
print(f"Prediction for age 50: {'Yes' if pred[0] == 1 else 'No'}")

Multivariate Logistic Regression (Multiple features):


Here, we will classify whether a student passes based on their study time and the number of
practice tests taken.

# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample data: Study Time, Practice Tests, and Pass (0 = Fail, 1 = Pass)
data = {
    'StudyTime': [10, 20, 30, 40, 25, 35, 50, 60, 70, 80],
    'PracticeTests': [1, 3, 2, 4, 3, 5, 6, 5, 7, 8],
    'Pass': [0, 0, 1, 1, 0, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)

# Features (StudyTime, PracticeTests)
X = df[['StudyTime', 'PracticeTests']]

# Target (Pass)
y = df['Pass']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the Logistic Regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Example prediction: will a student who studies 55 hours and takes 6 practice tests pass?
example = pd.DataFrame([[55, 6]], columns=['StudyTime', 'PracticeTests'])
pred = model.predict(example)
print(f"Prediction for 55 hours study and 6 practice tests: {'Pass' if pred[0] == 1 else 'Fail'}")
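Logistic regression turns a linear score into a probability with the sigmoid function and predicts class 1 when that probability is above 0.5. The sketch below shows that step in plain Python. The weight and bias values are hypothetical, chosen only for illustration; they are not the coefficients learned by the programs above, and the helper names are assumptions of this example.

```python
import math

def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of class 1 for one sample under a logistic model."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Hypothetical parameters for the features (StudyTime, PracticeTests);
# a trained model would learn these values from the data.
weights = [0.1, 0.5]
bias = -5.0

for sample in ([20, 3], [55, 6]):
    p = predict_proba(sample, weights, bias)
    label = 'Pass' if p > 0.5 else 'Fail'
    print(f"{sample}: P(pass) = {p:.3f} -> {label}")
```

A score of zero corresponds to a probability of exactly 0.5, which is why the decision boundary of logistic regression is the set of points where the linear score is zero.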

5. Implement Decision Trees


Decision Trees classify data points by splitting the dataset into branches based on feature
values. The splits are made using metrics like Gini impurity or entropy.

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn import tree

# Sample data: Features (Hours Studied, Practice Tests), Target (Pass = 1, Fail = 0)
data = {
    'StudyTime': [10, 20, 30, 40, 25, 35, 50, 60, 70, 80],
    'PracticeTests': [1, 3, 2, 4, 3, 5, 6, 5, 7, 8],
    'Pass': [0, 0, 1, 1, 0, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)

# Features (StudyTime, PracticeTests)
X = df[['StudyTime', 'PracticeTests']]

# Target (Pass)
y = df['Pass']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the Decision Tree Classifier
model = DecisionTreeClassifier()

# Train the model
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")

# Visualize the decision tree, labelling each split with its feature name
tree.plot_tree(model, feature_names=list(X.columns))
plt.show()
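The Gini impurity and entropy that guide the splits can be computed by hand. The sketch below, in plain Python, evaluates both measures on the Pass labels of the sample data before and after one candidate split. The threshold of 27.5 hours and the helper names are assumptions made for this illustration; the tree trained above picks its own thresholds from the training split.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    proportions = [count / n for count in Counter(labels).values()]
    # 0.0 - x (rather than -x) avoids returning -0.0 for a pure node.
    return 0.0 - sum(p * math.log2(p) for p in proportions)

def weighted_impurity(left, right, measure):
    """Impurity of a split, weighting each side by its share of the samples."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)

# StudyTime and Pass columns from the sample data
study_time = [10, 20, 30, 40, 25, 35, 50, 60, 70, 80]
passed = [0, 0, 1, 1, 0, 1, 1, 1, 1, 1]

# Candidate split: StudyTime <= 27.5 (threshold chosen for illustration)
left = [p for t, p in zip(study_time, passed) if t <= 27.5]
right = [p for t, p in zip(study_time, passed) if t > 27.5]

print(f"Gini before split: {gini(passed):.3f}")
print(f"Entropy before split: {entropy(passed):.3f}")
print(f"Gini after split: {weighted_impurity(left, right, gini):.3f}")
print(f"Entropy after split: {weighted_impurity(left, right, entropy):.3f}")
```

A split is attractive when the weighted impurity of the two sides is lower than the impurity of the parent node; a value of zero means both sides contain a single class.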

6. Implement Any 3 Classification Algorithms

We will implement Logistic Regression, k-Nearest Neighbors (k-NN), and Support Vector
Machine (SVM) on the same dataset.

# This program reuses X_train, X_test, y_train and y_test from the
# Decision Trees program (Experiment 5); run that program first.
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Logistic Regression
logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)
logistic_pred = logistic_model.predict(X_test)
logistic_acc = accuracy_score(y_test, logistic_pred)
print(f"Logistic Regression Accuracy: {logistic_acc}")

# k-Nearest Neighbors (k-NN)
knn_model = KNeighborsClassifier(n_neighbors=3)
knn_model.fit(X_train, y_train)
knn_pred = knn_model.predict(X_test)
knn_acc = accuracy_score(y_test, knn_pred)
print(f"k-NN Accuracy: {knn_acc}")

# Support Vector Machine (SVM)
svm_model = SVC()
svm_model.fit(X_train, y_train)
svm_pred = svm_model.predict(X_test)
svm_acc = accuracy_score(y_test, svm_pred)
print(f"SVM Accuracy: {svm_acc}")

7. Implement Random Forest Algorithm

Random Forest is an ensemble learning method that combines multiple decision trees, each
trained on a random sample of the data, to improve accuracy and reduce overfitting.

# This program reuses X_train, X_test, y_train and y_test from the
# Decision Trees program (Experiment 5); run that program first.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Create the Random Forest model
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)

# Train the model
rf_model.fit(X_train, y_train)

# Make predictions
y_pred_rf = rf_model.predict(X_test)

# Evaluate the model
rf_accuracy = accuracy_score(y_test, y_pred_rf)
print(f"Random Forest Accuracy: {rf_accuracy}")
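The way an ensemble combines its trees can be pictured as a vote: each tree predicts a class for a sample and the most common prediction wins. The sketch below shows only that combining step, in plain Python, with hypothetical per-tree predictions. It is a simplification: scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities rather than counting hard votes, which often gives the same result. The function name `majority_vote` is an assumption of this example.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Combine per-tree predictions into one prediction per sample.

    tree_predictions has one entry per tree; each entry is the list of
    class labels that tree predicted for the samples.
    """
    combined = []
    for votes in zip(*tree_predictions):  # votes for one sample across all trees
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three trees for four samples
tree_predictions = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]

forest_prediction = majority_vote(tree_predictions)
print("Forest prediction:", forest_prediction)
```

Because each tree sees a different random sample of the data, their individual mistakes tend to differ, and the vote smooths them out.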

8. Implement K-means and KNN Algorithms

K-means Clustering:
K-means is an unsupervised algorithm that divides data points into (k) clusters, where each
data point belongs to the cluster with the nearest mean.

# Import necessary libraries
from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

# Sample dataset: 2D data points
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

# Create the KMeans model
# n_init is set explicitly because its default differs between scikit-learn versions
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)

# Train the model
kmeans.fit(X)

# Predict the clusters
clusters = kmeans.predict(X)

# Visualize the results (red X markers are the cluster centers)
plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
            s=300, c='red', marker='X')
plt.title("K-means Clustering")
plt.show()

# Print cluster centers
print("Cluster centers:", kmeans.cluster_centers_)
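The "nearest mean" rule in the K-means description can be sketched as a loop that alternates two steps: assign each point to its closest centroid, then move each centroid to the mean of its points. The plain-Python version below runs on the same six points. Starting from the first and fourth points as initial centroids is an assumption made to keep the example deterministic; scikit-learn chooses its starting centroids differently (k-means++ by default).

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update steps."""
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[k])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0)]

# Assumed initial centroids: the first and fourth data points
labels, centers = kmeans(points, centroids=[points[0], points[3]])
print("Labels:", labels)
print("Centers:", centers)
```

Different starting centroids can lead to different final clusters, which is why library implementations usually run several initializations and keep the best one.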

K-Nearest Neighbors (KNN) Classification:


KNN is a supervised algorithm that classifies data points based on the classes of their
nearest neighbors.

# Import necessary libraries
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Sample dataset: Features (Hours Studied, Practice Tests), Target (Pass = 1, Fail = 0)
data = {
    'StudyTime': [10, 20, 30, 40, 25, 35, 50, 60, 70, 80],
    'PracticeTests': [1, 3, 2, 4, 3, 5, 6, 5, 7, 8],
    'Pass': [0, 0, 1, 1, 0, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)

# Features (StudyTime, PracticeTests)
X = df[['StudyTime', 'PracticeTests']]

# Target (Pass)
y = df['Pass']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the KNN model
knn = KNeighborsClassifier(n_neighbors=3)

# Train the model
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"KNN Accuracy: {accuracy}")
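What "the classes of their nearest neighbors" means can be shown without a library: measure the distance from a query point to every training point, keep the k closest, and take a majority vote of their labels. The sketch below does this in plain Python on all ten sample rows (no train/test split). The query points and the function name `knn_predict` are assumptions made for this illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Sort training samples by Euclidean distance to the query.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Sample data from the program above
study_time = [10, 20, 30, 40, 25, 35, 50, 60, 70, 80]
practice_tests = [1, 3, 2, 4, 3, 5, 6, 5, 7, 8]
passed = [0, 0, 1, 1, 0, 1, 1, 1, 1, 1]
train_points = list(zip(study_time, practice_tests))

# Hypothetical students to classify: (StudyTime, PracticeTests)
for query in [(12, 1), (45, 4)]:
    result = knn_predict(train_points, passed, query, k=3)
    print(f"{query}: {'Pass' if result == 1 else 'Fail'}")
```

Because distances are computed on raw values, StudyTime dominates PracticeTests here; scaling the features first is a common refinement.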

9. Implement SVM on Any Applicable Dataset

Support Vector Machine (SVM) is a supervised learning algorithm that finds the hyperplane
that best separates data into classes.

# Import necessary libraries
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the SVM model
svm_model = SVC()

# Train the model
svm_model.fit(X_train, y_train)

# Make predictions
y_pred = svm_model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"SVM Accuracy on Iris dataset: {accuracy}")
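For a linear SVM with two classes, the separating hyperplane is the set of points where w·x + b = 0. A sample is classified by the sign of w·x + b, and |w·x + b| / ||w|| is its distance from the hyperplane. The sketch below illustrates this in plain Python with hypothetical values of w and b for two features; they are not learned from the Iris data. Note also that `SVC()` as used above defaults to an RBF kernel, so its boundary is not a flat hyperplane in the original feature space.

```python
import math

def decision_value(x, w, b):
    """Signed score w . x + b; its sign gives the predicted side."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def distance_to_hyperplane(x, w, b):
    """Geometric distance from x to the hyperplane w . x + b = 0."""
    return abs(decision_value(x, w, b)) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane for two features: 3*x1 + 4*x2 - 10 = 0
w = [3.0, 4.0]
b = -10.0

for x in ([1.0, 1.0], [2.0, 3.0]):
    score = decision_value(x, w, b)
    side = 1 if score >= 0 else -1
    print(f"{x}: score={score:.1f}, class={side:+d}, "
          f"distance={distance_to_hyperplane(x, w, b):.2f}")
```

Training an SVM amounts to choosing w and b so that the closest training points on each side are as far from the hyperplane as possible.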

10. Implement Neural Networks

Neural networks consist of layers of interconnected nodes, and they are used for various
types of classification or regression tasks.
We'll implement a basic Multi-Layer Perceptron (MLP) using the `MLPClassifier` from the
`sklearn` library.

# Import necessary libraries
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Load digits dataset
digits = load_digits()
X = digits.data
y = digits.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create the MLP model (Neural Network) with one hidden layer of 100 units
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=300, random_state=42)

# Train the model
mlp.fit(X_train, y_train)

# Make predictions
y_pred = mlp.predict(X_test)

# Evaluate the model
print("Neural Network Classification Report:")
print(classification_report(y_test, y_pred))
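The "layers of interconnected nodes" can be made concrete with a forward pass through a tiny network: each layer computes weighted sums of its inputs plus a bias, the hidden layer applies ReLU, and the output layer applies softmax to produce class probabilities (the same activation choices `MLPClassifier` uses by default for multiclass problems). The sketch below is plain Python; all weights, sizes, and helper names are hypothetical and chosen only for illustration, not taken from the trained model above.

```python
import math

def relu(values):
    """ReLU activation: negative values become zero."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Forward pass: input -> hidden layer (ReLU) -> output layer (softmax)."""
    hidden = relu(dense(inputs, hidden_w, hidden_b))
    return softmax(dense(hidden, output_w, output_b))

# Hypothetical weights: 2 inputs -> 3 hidden units -> 2 output classes
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
output_w = [[1.0, -1.0, 0.5], [-0.5, 0.7, 1.2]]
output_b = [0.0, 0.0]

probabilities = forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b)
predicted_class = probabilities.index(max(probabilities))
print("Class probabilities:", [round(p, 3) for p in probabilities])
print("Predicted class:", predicted_class)
```

Training adjusts the weights and biases so that these probabilities match the known labels; the forward pass itself stays the same.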

11. Implement Correlation Analysis (CA)

Correlation analysis is used to study the strength and direction of the linear relationship
between two continuous variables.

# Import necessary libraries
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Sample dataset: Hours studied, Grades, Hours of sleep
data = {
    'Hours_Studied': [5, 10, 15, 20, 25],
    'Grades': [50, 60, 70, 80, 90],
    'Hours_Sleep': [8, 7, 6, 5, 4]
}
df = pd.DataFrame(data)

# Calculate the correlation matrix (Pearson correlation by default)
corr_matrix = df.corr()

# Visualize the correlation matrix
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Analysis')
plt.show()

# Print the correlation values
print("Correlation matrix:")
print(corr_matrix)
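Each entry of the correlation matrix is a Pearson correlation coefficient, which can be computed by hand as the covariance of two variables divided by the product of their standard deviations. The sketch below does this in plain Python for the sample columns; the function name `pearson` is an assumption of this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Columns from the sample dataset above
hours_studied = [5, 10, 15, 20, 25]
grades = [50, 60, 70, 80, 90]
hours_sleep = [8, 7, 6, 5, 4]

print(f"Hours_Studied vs Grades: {pearson(hours_studied, grades):.3f}")
print(f"Hours_Studied vs Hours_Sleep: {pearson(hours_studied, hours_sleep):.3f}")
```

The sample columns are exact linear functions of one another, so the coefficients should come out as 1 and -1 up to floating-point rounding; real data rarely produces such extreme values.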

12. Implement Anomaly Detection and Recommendation

Anomaly Detection:
Anomaly detection identifies outliers or rare items that differ significantly from the majority
of the data.

# Import necessary libraries
import numpy as np
from sklearn.ensemble import IsolationForest

# Sample data: 2D points
X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0], [50, 50], [100, 100]])

# Create the IsolationForest model
# contamination is the expected share of anomalies; random_state makes runs repeatable
iso_forest = IsolationForest(contamination=0.1, random_state=42)

# Fit the model
iso_forest.fit(X)

# Predict anomalies (-1: anomaly, 1: normal)
predictions = iso_forest.predict(X)

# Output the anomaly predictions
print("Anomaly detection predictions:", predictions)
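The idea of "differing significantly from the majority" can be illustrated with a much simpler rule than Isolation Forest: measure how far each point lies from the coordinate-wise median and flag points whose distance is several times the typical distance. The sketch below is plain Python and is a different, simpler technique, shown only to make the concept concrete. The factor of 3 and the function name `flag_outliers` are assumptions of this example.

```python
import math
import statistics

def flag_outliers(points, factor=3.0):
    """Return 1 for normal points and -1 for outliers.

    A point is flagged when its distance from the coordinate-wise median
    exceeds factor times the median of all such distances.
    """
    center = [statistics.median(coord) for coord in zip(*points)]
    distances = [math.dist(p, center) for p in points]
    threshold = factor * statistics.median(distances)
    return [-1 if d > threshold else 1 for d in distances]

# Same 2D points as the program above
points = [(1, 2), (1, 4), (1, 0), (10, 2), (10, 4), (10, 0), (50, 50), (100, 100)]

flags = flag_outliers(points)
print("Outlier flags:", flags)
```

The output uses the same convention as the program above (-1 for an anomaly, 1 for a normal point), so the two results can be compared side by side.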

Recommendation Systems (Using Nearest Neighbors):


Recommendation systems predict user preferences and suggest relevant items based on
past data.

# Import necessary libraries
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Sample user-item data (rows: users, columns: items)
data = np.array([[5, 3, 0, 1],
                 [4, 0, 0, 1],
                 [1, 1, 0, 5],
                 [0, 0, 5, 4],
                 [0, 3, 4, 5]])

# Create the model
model = NearestNeighbors(metric='cosine', algorithm='brute')

# Fit the model
model.fit(data)

# Find the nearest neighbors for the first user.
# The query user is part of the fitted data, so the closest neighbor
# returned is the user itself (row index 0).
distances, indices = model.kneighbors(data[0].reshape(1, -1), n_neighbors=3)

# Output the nearest neighbors for user 1 (row indices are 0-based)
print("Nearest neighbors for user 1:")
print(indices)
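The neighbor search above ranks users by cosine distance, which is 1 minus the cosine similarity of their rating rows. The sketch below computes the cosine similarity by hand in plain Python and ranks the other users against the first user. The helper name `cosine_similarity` and the variable names are assumptions of this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same user-item ratings as the program above (rows: users, columns: items)
ratings = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
    [0, 3, 4, 5],
]

target = 0  # row index of the first user
similarities = [
    (other, cosine_similarity(ratings[target], ratings[other]))
    for other in range(len(ratings)) if other != target
]
similarities.sort(key=lambda pair: pair[1], reverse=True)

for other, sim in similarities:
    print(f"User {other + 1}: similarity {sim:.3f}")

# Row indices of the other users, most similar first
ranking = [other for other, _ in similarities]
```

A simple recommender would then suggest items that the most similar users rated highly and that the target user has not rated yet.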
