
MSc. CS – SEM II
MACHINE LEARNING – JOURNAL
PRN: MU0027920240005027

Practical No. 1
Implement Linear Regression (Diabetes Dataset)

Aim: To implement Linear Regression using the Diabetes dataset and evaluate its
performance.

Solution:
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset
diabetes = datasets.load_diabetes()
X = diabetes.data[:, np.newaxis, 2]  # Use only one feature (BMI)
y = diabetes.target

# Split into training and testing sets
X_train, X_test = X[:-20], X[-20:]
y_train, y_test = y[:-20], y[-20:]

# Create and train the linear regression model
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)

# Make predictions
y_pred = regr.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R2 Score: {r2:.2f}")

# Plot the results
plt.scatter(X_test, y_test, color="black")
plt.plot(X_test, y_pred, color="blue", linewidth=3)
plt.xlabel("BMI")
plt.ylabel("Disease Progression")
plt.title("Linear Regression on Diabetes Dataset")
plt.show()

Observations and Results:
• The model achieved an MSE of 2548.07 and an R² score of 0.47 on the 20 held-out samples; a manual cross-check of these metrics is sketched below.
• The plot shows an approximately linear relationship between BMI and disease progression.
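The metrics can be verified directly from their definitions. Below is a minimal sketch, assuming the y_test and y_pred arrays from the code above are still in scope; the printed values should match the sklearn output.

import numpy as np

# Recompute the metrics from their definitions (uses y_test and y_pred from above)
mse_manual = np.mean((y_test - y_pred) ** 2)              # mean of squared residuals
ss_res = np.sum((y_test - y_pred) ** 2)                   # residual sum of squares
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)          # total sum of squares
r2_manual = 1 - ss_res / ss_tot                           # coefficient of determination

print(f"Manual MSE: {mse_manual:.2f}, Manual R2: {r2_manual:.2f}")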


Output:


Practical No. 2
Implement Logistic Regression (Iris Dataset)

Aim: To classify Iris flowers into two classes using Logistic Regression.

Solution:
Code:
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data[:, :2]  # Use only the first two features (sepal length & width)
y = (iris.target != 0).astype(int)  # Binary classification (Setosa vs Non-Setosa)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the logistic regression model
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print("Confusion Matrix:")
print(conf_matrix)

Observations and Results:
• The model achieved an accuracy of 1.00 (100%) on the test set.
• The confusion matrix confirms that every test sample was classified correctly for this binary (Setosa vs. non-Setosa) task; a cross-check that derives accuracy from the confusion matrix is sketched below.
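Accuracy can also be read off the confusion matrix itself, since correct predictions lie on its diagonal. Below is a minimal sketch, assuming the conf_matrix computed above is still in scope.

import numpy as np

# Accuracy = correctly classified samples / all samples = trace / total of the confusion matrix
acc_from_cm = np.trace(conf_matrix) / conf_matrix.sum()
print(f"Accuracy derived from confusion matrix: {acc_from_cm:.2f}")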


Output:


Practical No. 3
Implement Multinomial Logistic Regression (Iris Dataset)

Aim: To classify Iris flowers into all three classes using Multinomial Logistic
Regression.

Solution:
Code:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

# Load the Iris dataset
iris = load_iris()

# Use all 4 features for multiclass classification
X = iris.data
y = iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the multinomial logistic regression model
# (multi_class="multinomial" is omitted: it is the default behaviour for the 'lbfgs' solver in recent scikit-learn versions)
clf = LogisticRegression(solver="lbfgs", max_iter=1000)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Evaluate the model
print("--- Classification Report ---")
print(classification_report(y_test, y_pred))

print("\nModel training and evaluation complete.")

Observations and Results:
• The model achieved 100% accuracy on the test set.
• The classification report shows perfect precision, recall, and F1-score for all three classes; a sketch for inspecting the predicted class probabilities is given below.
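Besides the hard class labels, LogisticRegression exposes per-class probabilities via predict_proba. Below is a small illustrative sketch, assuming the fitted clf and the test split from the code above are still in scope; the exact values depend on the split.

import numpy as np

# Inspect predicted class probabilities for the first 5 test samples
proba = clf.predict_proba(X_test[:5])
for probs, true_label in zip(proba, y_test[:5]):
    print(f"true={iris.target_names[true_label]}, probabilities={np.round(probs, 3)}")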


Output:


Practical No. 4
Implement SVM Classifier (Iris Dataset)

Aim: To classify Iris flowers using a Support Vector Machine (SVM).

Solution:
Code:
from sklearn.datasets import load_iris # Import the iris dataset
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Load the Iris dataset
iris = load_iris()

# Use all 4 features for classification
X = iris.data
y = iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train an SVM classifier
# A linear kernel is used for simplicity; it typically performs well on Iris.
svm_clf = SVC(kernel="linear", random_state=42)  # random_state added for reproducibility
svm_clf.fit(X_train, y_train)

# Make predictions
y_pred = svm_clf.predict(X_test)

# Evaluate the model
print("--- Classification Report for SVM ---")
print(classification_report(y_test, y_pred))

print("\nSVM model training and evaluation complete.")

Observations and Results:
• The SVM achieved 100% accuracy on the test set.
• The linear kernel worked well for this dataset; a sketch comparing it against an RBF kernel is given below.
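To check whether a non-linear kernel changes the result, the following minimal sketch trains an RBF-kernel SVC on the same split and compares test accuracies. This is only illustrative: on Iris both kernels are typically near-perfect, and it assumes svm_clf, X_train, y_train, X_test and y_test from the code above are still in scope.

from sklearn.metrics import accuracy_score

# Train an RBF-kernel SVM on the same split for comparison
svm_rbf = SVC(kernel="rbf", gamma="scale", random_state=42)
svm_rbf.fit(X_train, y_train)

acc_linear = accuracy_score(y_test, svm_clf.predict(X_test))
acc_rbf = accuracy_score(y_test, svm_rbf.predict(X_test))
print(f"Linear kernel accuracy: {acc_linear:.2f}")
print(f"RBF kernel accuracy:    {acc_rbf:.2f}")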


Output:


Practical No. 5
Train and Fine-tune a Decision Tree for the Moons Dataset

Aim: To train a Decision Tree classifier on a synthetic Moons dataset and fine-tune
hyperparameters.

Solution:
Code:
import matplotlib.pyplot as plt # For plotting the dataset
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, train_test_split # Added train_test_split
from sklearn.metrics import classification_report # Added classification_report
import numpy as np # For creating a meshgrid for plotting decision boundary

# Generate the Moons dataset
X, y = make_moons(n_samples=1000, noise=0.3, random_state=42)

# Optional: plot the generated dataset to visualize it
plt.figure(figsize=(8, 6))
plt.scatter(X[y == 0, 0], X[y == 0, 1], c='red', marker='o', label='Class 0')
plt.scatter(X[y == 1, 0], X[y == 1, 1], c='blue', marker='x', label='Class 1')
plt.title("Synthetic Moons Dataset")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.grid(True)
plt.show()

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fine-tune hyperparameters using GridSearchCV
params = {
    "max_depth": [3, 5, 10, None],   # None means unlimited depth
    "min_samples_split": [2, 5, 10]
}
# Initialize DecisionTreeClassifier with a random_state for reproducibility
dt_clf = DecisionTreeClassifier(random_state=42)
grid_search = GridSearchCV(dt_clf, params, cv=5, scoring='accuracy', n_jobs=-1)  # n_jobs=-1 uses all CPU cores
grid_search.fit(X_train, y_train)

# Best model
best_tree = grid_search.best_estimator_
y_pred = best_tree.predict(X_test)

print("--- Decision Tree Hyperparameter Tuning Results ---")
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Cross-validation Accuracy: {grid_search.best_score_:.4f}")
print("\n--- Classification Report on Test Set ---")
print(classification_report(y_test, y_pred))

# Optional: Plot the decision boundary of the best model

plt.figure(figsize=(10, 8))
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100),
                     np.linspace(y_min, y_max, 100))
Z = best_tree.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.8, cmap=plt.cm.RdBu)
plt.scatter(X_test[:, 0], X_test[:, 1], c=y_test, s=40, edgecolor='k', cmap=plt.cm.RdBu)
plt.title(f"Decision Boundary of Best Decision Tree (Max Depth: {best_tree.max_depth}, Min
Samples Split: {best_tree.min_samples_split})")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

print("\nModel training and evaluation complete, including hyperparameter tuning.")

Observations and Results:
• The best parameters found by GridSearchCV were max_depth=3 and min_samples_split=2.
• The tuned model achieved 89.12% accuracy; a sketch for inspecting the full cross-validation results is given below.
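GridSearchCV stores the score of every parameter combination in its cv_results_ attribute, which is useful for judging how sensitive the tree is to the chosen hyperparameters. Below is a minimal sketch, assuming the fitted grid_search object from the code above is still in scope.

# List the mean cross-validation accuracy for every parameter combination tried
results = grid_search.cv_results_
for params_tried, mean_score in zip(results["params"], results["mean_test_score"]):
    print(f"{params_tried} -> CV accuracy: {mean_score:.4f}")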

Output:



Practical No. 6
Train an SVM Regressor on the California Housing Dataset

Aim: To predict housing prices using an SVM regressor.

Solution:
Code:
from sklearn.datasets import fetch_california_housing
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
# Load the dataset
housing = fetch_california_housing()
X, y = housing.data, housing.target

print(f"Dataset shape: X={X.shape}, y={y.shape}")

# Scale features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42)

# Train SVM regressor
# 'rbf' kernel is common for non-linear relationships
# C: regularization parameter; a higher C means less regularization.
# epsilon: epsilon-SVR loss parameter; it defines the 'tube' around the target within which no penalty is incurred.
print("\nTraining SVR regressor (this might take a moment)...")
svm_reg = SVR(kernel="rbf", C=1.0, epsilon=0.1)
svm_reg.fit(X_train, y_train)
print("SVR training complete.")

# Evaluate
y_pred = svm_reg.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"\nMean Squared Error: {mse:.4f}") # Increased precision for better comparison

# Plot actual vs. predicted values for a subset of the test data
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.3)
plt.plot([y.min(), y.max()], [y.min(), y.max()], 'r--', lw=2) # Perfect prediction line
plt.xlabel("Actual Housing Prices ($100k)")
plt.ylabel("Predicted Housing Prices ($100k)")
plt.title("SVR: Actual vs. Predicted Housing Prices")
plt.grid(True)
plt.show()

print("\nSVM Regressor analysis complete.")


Observations and Results:
• The SVM regressor achieved an MSE of 0.3552 on the test set.
• The RBF kernel performed better than a linear kernel for this dataset; a sketch of that comparison is given below.
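The kernel comparison can be reproduced with a quick check. Below is a minimal sketch that fits LinearSVR, the fast linear SVM regressor (a plain SVR with kernel="linear" would be much slower on the roughly 16,500 training rows), on the same scaled split and reports its test MSE; it assumes X_train, y_train, X_test, y_test and mse from the code above are still in scope, and a higher linear MSE than the RBF value would support the observation above.

from sklearn.svm import LinearSVR

# Fit a linear SVM regressor on the same scaled features for comparison
lin_svr = LinearSVR(C=1.0, epsilon=0.1, max_iter=10000, random_state=42)
lin_svr.fit(X_train, y_train)

lin_mse = mean_squared_error(y_test, lin_svr.predict(X_test))
print(f"Linear SVR MSE: {lin_mse:.4f} (RBF SVR MSE: {mse:.4f})")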

Output:
