MSc. CS – SEM II
MACHINE LEARNING – JOURNAL
PRN: MU0027920240005027
Practical No. 1
Implement Linear Regression (Diabetes Dataset)
Aim: To implement Linear Regression using the Diabetes dataset and evaluate its
performance.
Solution:
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
# Load the diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2] # Use only one feature (BMI)
y = diabetes.target
# Split into training and testing sets
X_train, X_test = X[:-20], X[-20:]
y_train, y_test = y[:-20], y[-20:]
# Create and train the linear regression model
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
# Make predictions
y_pred = regr.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error: {mse:.2f}")
print(f"R2 Score: {r2:.2f}")
# Plot the results
plt.scatter(X_test, y_test, color="black")
plt.plot(X_test, y_pred, color="blue", linewidth=3)
plt.xlabel("BMI")
plt.ylabel("Disease Progression")
plt.title("Linear Regression on Diabetes Dataset")
plt.show()
Observations and Results:
The model achieved an MSE of 2548.07 and an R² score of 0.47.
The plot shows an approximately linear, positive relationship between BMI and disease progression.
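As a cross-check (not part of the submitted code), the same metrics can be recomputed directly from their definitions; the sketch below assumes the y_test and y_pred arrays produced above are still in scope.
# Hedged sketch: recompute MSE and R² from their definitions (assumes y_test, y_pred exist)
import numpy as np
mse_manual = np.mean((y_test - y_pred) ** 2)              # mean of squared residuals
ss_res = np.sum((y_test - y_pred) ** 2)                   # residual sum of squares
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)          # total sum of squares
r2_manual = 1 - ss_res / ss_tot                           # coefficient of determination
print(f"Manual MSE: {mse_manual:.2f}, Manual R2: {r2_manual:.2f}")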
Output:
Practical No. 2
Implement Logistic Regression (Iris Dataset)
Aim: To classify Iris flowers into two classes using Logistic Regression.
Solution:
Code:
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2] # Using only first two features (sepal length & width)
y = (iris.target != 0).astype(int) # Binary classification (Setosa vs Non-Setosa)
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the logistic regression model
clf = LogisticRegression()
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:")
print(conf_matrix)
Observations and Results:
The model achieved an accuracy of 1.00 (100%) on the test set.
The confusion matrix confirms perfect classification for this binary task.
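As an optional extension (not in the journal code), the fitted classifier can also report class probabilities for a new measurement; the sepal values below are purely illustrative.
# Hedged sketch: probability estimates for a hypothetical flower (assumes clf from above)
sample = [[5.0, 3.5]]              # illustrative sepal length and width in cm
proba = clf.predict_proba(sample)  # columns follow clf.classes_ = [0, 1], i.e. [P(Setosa), P(Non-Setosa)]
print("P(Setosa), P(Non-Setosa):", proba[0])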
Output:
Practical No. 3
Implement Multinomial Logistic Regression (Iris Dataset)
Aim: To classify Iris flowers into all three classes using Multinomial Logistic
Regression.
Solution:
Code:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
# Load the Iris dataset
iris = load_iris()
# Use all 4 features for multiclass classification
X = iris.data
y = iris.target
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the multinomial logistic regression model
# Removed multi_class="multinomial" as it's the default for 'lbfgs' in newer versions
clf = LogisticRegression(solver="lbfgs", max_iter=1000)
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate the model
print("--- Classification Report ---")
print(classification_report(y_test, y_pred))
print("\nModel training and evaluation complete.")
Observations and Results:
The model achieved 100% accuracy on the test set.
The classification report shows perfect precision, recall, and F1-score for all three
classes.
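Because a single 80/20 split can flatter the model, an optional 5-fold cross-validation check is a reasonable addition; the sketch below reuses X, y and the same model settings from the code above.
# Hedged sketch: 5-fold cross-validation as a sanity check on the perfect test score
from sklearn.model_selection import cross_val_score
cv_scores = cross_val_score(LogisticRegression(solver="lbfgs", max_iter=1000), X, y, cv=5)
print("Fold accuracies:", cv_scores)
print(f"Mean CV accuracy: {cv_scores.mean():.3f}")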
Output:
Practical No. 4
Implement SVM Classifier (Iris Dataset)
Aim: To classify Iris flowers using a Support Vector Machine (SVM).
Solution:
Code:
from sklearn.datasets import load_iris # Import the iris dataset
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report
# Load the Iris dataset
iris = load_iris()
# Use all 4 features for classification
X = iris.data
y = iris.target
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train an SVM classifier
# Using a linear kernel for simplicity; it typically performs well on Iris
svm_clf = SVC(kernel="linear", random_state=42) # Added random_state for reproducibility
svm_clf.fit(X_train, y_train)
# Make predictions
y_pred = svm_clf.predict(X_test)
# Evaluate the model
print("--- Classification Report for SVM ---")
print(classification_report(y_test, y_pred))
print("\nSVM model training and evaluation complete.")
Observations and Results:
The SVM achieved 100% accuracy on the test set.
The linear kernel worked well for this dataset.
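To support the claim that the linear kernel is adequate here, an optional cross-validated comparison against the RBF kernel could be run as sketched below (assumes X_train and y_train from the split above; exact scores may vary).
# Hedged sketch: compare linear and RBF kernels with 5-fold cross-validation
from sklearn.model_selection import cross_val_score
for kernel in ("linear", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X_train, y_train, cv=5)
    print(f"{kernel} kernel: mean CV accuracy = {scores.mean():.3f}")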
Output:
Practical No. 5
Train and Fine-tune a Decision Tree for the Moons Dataset
Aim: To train a Decision Tree classifier on a synthetic Moons dataset and fine-tune its
hyperparameters using GridSearchCV.
Solution:
Code:
import matplotlib.pyplot as plt # For plotting the dataset
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split # Added train_test_split
from sklearn.metrics import classification_report # Added classification_report
import numpy as np # For creating a meshgrid for plotting decision boundary
# Generate the Moons dataset
X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)
# Optional: Plot the generated dataset to visualize it
plt.figure(figsize=(8, 6))
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='red', marker='o', label='Class 0')
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='blue', marker='x', label='Class 1')
plt.title("Synthetic Moons Dataset")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.grid(True)
plt.show()
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Fine-tune hyperparameters using GridSearchCV
params = {
"max_depth": [3, 5, 10, None], # None means unlimited depth
"min_samples_split": [2, 5, 10]
}
# Initialize DecisionTreeClassifier with a random_state for reproducibility
dt_clf = DecisionTreeClassifier(random_state=42)
grid_search = GridSearchCV(dt_clf, params, cv=5, scoring='accuracy', n_jobs=-1)  # n_jobs=-1 uses all CPU cores
grid_search.fit(X_train, y_train)
# Best model
best_tree = grid_search.best_estimator_
y_pred = best_tree.predict(X_test)
print("--- Decision Tree Hyperparameter Tuning Results ---")
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Cross-validation Accuracy: {grid_search.best_score_:.4f}")
print("\n--- Classification Report on Test Set ---")
print(classification_report(y_test, y_pred))
# Optional: Plot the decision boundary of the best model
plt.figure(figsize=(10, 8))
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), np.linspace(y_min, y_max, 100))
Z = best_tree.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.RdBu)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, s=40, edgecolor='k', cmap=plt.cm.RdBu)
plt.title(f"Decision Boundary of Best Decision Tree (Max Depth: {best_tree.max_depth}, Min
Samples Split: {best_tree.min_samples_split})")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
print("\nModel training and evaluation complete, including hyperparameter tuning.")
Observations and Results:
The best parameters were max_depth=3 and min_samples_split=2.
The model achieved 89.12% accuracy.
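For a closer look at what the tuned model learned, the fitted tree can optionally be drawn with scikit-learn's plot_tree; the sketch below assumes best_tree from the grid search above and uses generic feature and class names.
# Hedged sketch: visualise the tuned tree (assumes best_tree and plt from above)
from sklearn.tree import plot_tree
plt.figure(figsize=(12, 6))
plot_tree(best_tree, feature_names=["Feature 1", "Feature 2"],
          class_names=["Class 0", "Class 1"], filled=True)
plt.title("Tuned Decision Tree on the Moons Dataset")
plt.show()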
Output:
Practical No. 6
Train an SVM Regressor on the California Housing Dataset
Aim: To predict housing prices using an SVM regressor.
Solution:
Code:
from sklearn.datasets import fetch_california_housing
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
# Load the dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target
print(f"Dataset shape: X={X.shape}, y={y.shape}")
# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)
# Train SVM regressor
# 'rbf' kernel is common for non-linear relationships
# C: Regularization parameter. Higher C means less regularization.
# epsilon: Epsilon-SVR loss function parameter. Defines the 'tube' where no penalty is incurred.
print("\nTraining SVR regressor (this might take a moment)...")
svm_reg = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svm_reg.fit(X_train, y_train)
print("SVR training complete.")
# Evaluate
y_pred = svm_reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"\nMean Squared Error: {mse:.4f}") # Increased precision for better comparison
# Plot actual vs. predicted values for a subset of the test data
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.3)
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'r--', lw=2) # Perfect prediction line
plt.xlabel("Actual Housing Prices ($100k)")
plt.ylabel("Predicted Housing Prices ($100k)")
plt.title("SVR: Actual vs. Predicted Housing Prices")
plt.grid(True)
plt.show()
print("\nSVM Regressor analysis complete.")
Observations and Results:
The SVM regressor achieved an MSE of 0.3552.
RBF kernel performed better than linear for this dataset.
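The RBF-versus-linear comparison can be reproduced with an optional baseline like the one sketched below; LinearSVR is used here as a faster stand-in for SVR(kernel="linear") on roughly 16,000 training rows, so its MSE only approximates the linear-kernel result.
# Hedged sketch: linear baseline for comparison with the RBF SVR above
# (assumes the scaled X_train/X_test/y_train/y_test split from this practical)
from sklearn.svm import LinearSVR
lin_reg = LinearSVR(C=1.0, epsilon=0.1, max_iter=10000, random_state=42)
lin_reg.fit(X_train, y_train)
lin_mse = mean_squared_error(y_test, lin_reg.predict(X_test))
print(f"LinearSVR MSE: {lin_mse:.4f} (compare with the RBF SVR MSE above)")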
Output: