ML Lab Programs (1)
Course Outcomes:
LIST OF EXPERIMENTS
1. Install Python (or the Anaconda Python distribution) and useful packages for machine
learning; load a sample dataset, then understand and visualize the data.
2. Implement simple linear regression
3. Implement multivariate linear regression.
4. Implement simple logistic regression and multivariate logistic regression.
5. Implement decision trees.
6. Implement any 3 classification algorithms.
7. Implement random forests algorithm
8. Implement K-means, KNN algorithms
9. Implement SVM on any applicable datasets.
10. Implement neural networks
11. Implement CA (correlation analysis).
12. Implement anomaly detection and recommendation
We will use a dataset with multiple features (independent variables), such as the size of the
house (square footage), the number of bedrooms, and the age of the house, to predict the house price.
'Age': [10, 12, 13, 14, 15, 16, 17, 18, 19, 20],
'Price': [300, 320, 340, 360, 400, 410, 430, 450, 480, 500]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Features (Size, Bedrooms, Age)
X = df[['Size', 'Bedrooms', 'Age']]
# Target (Price)
y = df['Price']
# Split data into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create the model
model = LinearRegression()
# Train the model
model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model (Mean Squared Error)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean Squared Error: {mse}")
# Print the coefficients and intercept
print(f"Coefficients (Weights): {model.coef_}")
print(f"Intercept (Bias): {model.intercept_}")
# Example prediction: price of a 2200 sq ft, 4-bedroom, 15-year-old house
# (passed as a DataFrame with the training column names to avoid a feature-name warning)
example = pd.DataFrame([[2200, 4, 15]], columns=['Size', 'Bedrooms', 'Age'])
predicted_price = model.predict(example)
print(f"Predicted price for a 2200 sq ft, 4 bedroom, 15 year old house: {predicted_price[0]} thousand dollars")
y = df['Buy']
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create the Logistic Regression model
model = LogisticRegression()
# Train the model
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
# Example prediction: will a 50-year-old buy a car?
example = np.array([[50]])
pred = model.predict(example)
print(f"Prediction for age 50: {'Yes' if pred[0] == 1 else 'No'}")
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample data: Study Time, Practice Tests, and Pass (0 = Fail, 1 = Pass)
data = {
'StudyTime': [10, 20, 30, 40, 25, 35, 50, 60, 70, 80],
'PracticeTests': [1, 3, 2, 4, 3, 5, 6, 5, 7, 8],
'Pass': [0, 0, 1, 1, 0, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
# Features (StudyTime, PracticeTests)
X = df[['StudyTime', 'PracticeTests']]
# Target (Pass)
y = df['Pass']
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create the Logistic Regression model
model = LogisticRegression()
# Train the model
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
# Example prediction: will a student who studies 55 hours and takes 6 practice tests pass?
example = pd.DataFrame([[55, 6]], columns=['StudyTime', 'PracticeTests'])
pred = model.predict(example)
print(f"Prediction for 55 hours study and 6 practice tests: {'Pass' if pred[0] == 1 else 'Fail'}")
Decision Trees classify data points by splitting the dataset into branches based on feature
values. The splits are made using metrics like Gini impurity or entropy.
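The splitting idea can be sketched with scikit-learn's `DecisionTreeClassifier`. This is a minimal example that assumes the built-in Iris dataset, a 70/30 split, and `max_depth=3` (illustrative choices, not taken from this lab's own program). `criterion="gini"` uses Gini impurity; passing `"entropy"` would use information gain instead.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Load the Iris dataset (assumed here as the sample dataset)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Gini impurity decides each split; max_depth limits how deep the tree grows
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
tree.fit(X_train, y_train)

accuracy = accuracy_score(y_test, tree.predict(X_test))
print(f"Decision Tree Accuracy: {accuracy}")

# Print the learned splits as nested threshold rules
print(export_text(tree, feature_names=list(iris.feature_names)))
```

`export_text` prints the learned thresholds, which makes it easy to see that each branch is a simple comparison on one feature value.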
We will implement Logistic Regression, k-Nearest Neighbors (k-NN), and Support Vector
Machine (SVM) on the same dataset.
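A minimal sketch of the three-classifier comparison, assuming the Iris dataset as the shared dataset (it is not named here) and library defaults apart from `max_iter=200` for logistic regression and `n_neighbors=5` for k-NN. Training all three on the same split keeps the accuracy figures comparable.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Three classifiers trained and evaluated on the same train/test split
models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
    "SVM (RBF kernel)": SVC(kernel="rbf"),
}

scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, clf.predict(X_test))
    print(f"{name} Accuracy: {scores[name]:.3f}")
```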
Random Forest is an ensemble learning method that combines multiple decision trees to
improve accuracy and prevent overfitting.
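A sketch using scikit-learn's `RandomForestClassifier`, again assuming the Iris dataset and `n_estimators=100` as illustrative settings. Note that scikit-learn's forest combines its trees by averaging their predicted class probabilities rather than by a strict majority vote.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# 100 trees, each grown on a bootstrap sample with a random subset of
# features considered at each split; predictions average the trees' probabilities
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, rf.predict(X_test))
print(f"Random Forest Accuracy: {accuracy}")

# Impurity-based importance of each feature, averaged over all trees
for name, importance in zip(iris.feature_names, rf.feature_importances_):
    print(f"{name}: {importance:.3f}")
```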
K-means Clustering:
K-means is an unsupervised algorithm that divides data points into k clusters, where each
data point belongs to the cluster with the nearest mean (centroid).
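The nearest-mean rule can be checked directly on a fitted model. This sketch assumes synthetic 2-D data from `make_blobs` with three groups, so `n_clusters=3` is known in advance; `n_init=10` and the random seeds are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with 3 groups; the true labels are discarded (unsupervised)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
centers = kmeans.cluster_centers_
print("Cluster centers (means):")
print(centers)

# Check the defining rule: each point goes to the cluster whose mean is nearest
distances = np.linalg.norm(X[:, np.newaxis, :] - centers[np.newaxis, :, :], axis=2)
nearest = np.argmin(distances, axis=1)
print("Every point assigned to its nearest mean:", np.array_equal(nearest, kmeans.predict(X)))
```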
Support Vector Machine (SVM) is a supervised learning algorithm that finds the hyperplane
that best separates data into classes.
svm_model.fit(X_train, y_train)
# Make predictions
y_pred = svm_model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
print(f"SVM Accuracy on Iris dataset: {accuracy}")
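For a linear kernel the separating hyperplane can be read off the fitted model as w · x + b = 0. The sketch below assumes two Iris classes (setosa and versicolor) and two features (petal length and petal width) so the hyperplane is a line in 2-D; `kernel="linear"` and `C=1.0` are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

iris = load_iris()
# Keep two classes (setosa = 0, versicolor = 1) and two features (petal length, petal width)
mask = iris.target < 2
X = iris.data[mask][:, 2:4]
y = iris.target[mask]

svm = SVC(kernel="linear", C=1.0)
svm.fit(X, y)

# For a linear kernel the hyperplane is w . x + b = 0
w = svm.coef_[0]
b = svm.intercept_[0]
print(f"Hyperplane: {w[0]:.3f} * petal_length + {w[1]:.3f} * petal_width + {b:.3f} = 0")
print(f"Number of support vectors: {len(svm.support_vectors_)}")

# The sign of w . x + b gives the predicted class (positive -> class 1)
scores = X @ w + b
print("Matches decision_function:", np.allclose(scores, svm.decision_function(X)))
print(f"Training accuracy: {svm.score(X, y)}")
```

Only the support vectors (the points closest to the line) determine w and b; the other training points could be removed without changing the hyperplane.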
Neural networks consist of layers of interconnected nodes (neurons) and are used for both
classification and regression tasks.
We'll implement a basic Multi-Layer Perceptron (MLP) using the `MLPClassifier` from the
`sklearn` library.
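A minimal MLP sketch, assuming the Iris dataset, two hidden layers of 10 ReLU units, and standardized inputs (these settings are illustrative, not the lab's own). Scaling matters because the network is trained by gradient-based optimization, which tends to converge poorly when features have very different ranges.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Standardize features, then feed them to an MLP with two hidden layers of 10 neurons
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(10, 10), activation="relu",
                  max_iter=1000, random_state=42),
)
mlp.fit(X_train, y_train)

accuracy = accuracy_score(y_test, mlp.predict(X_test))
print(f"MLP Accuracy on Iris dataset: {accuracy}")
```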
Correlation analysis is used to study the strength and direction of the linear relationship
between two continuous variables.
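The Pearson correlation coefficient r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²) ranges from −1 to +1: its sign gives the direction and its magnitude the strength of the linear relationship. The sketch below reuses the `StudyTime` and `PracticeTests` columns from the logistic regression example and computes r both by hand and with pandas' `Series.corr` (Pearson by default) to show they agree.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'StudyTime': [10, 20, 30, 40, 25, 35, 50, 60, 70, 80],
    'PracticeTests': [1, 3, 2, 4, 3, 5, 6, 5, 7, 8],
})

# Manual Pearson r: covariance of deviations over the product of their spreads
x = df['StudyTime'].to_numpy(dtype=float)
y = df['PracticeTests'].to_numpy(dtype=float)
dx = x - x.mean()
dy = y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas computes the same coefficient
r_pandas = df['StudyTime'].corr(df['PracticeTests'])

print(f"Pearson r (manual): {r_manual:.3f}")
print(f"Pearson r (pandas): {r_pandas:.3f}")
print(df.corr())  # full correlation matrix
```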
Anomaly Detection:
Anomaly detection identifies outliers or rare items that differ significantly from the majority
of the data.
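One common way to implement this is scikit-learn's `IsolationForest`, which flags points that random splits isolate unusually quickly. This sketch assumes synthetic data: 100 normally distributed points plus two hand-placed outliers, with `contamination=0.02` as an illustrative guess at the fraction of anomalies.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# 100 "normal" 2-D points around the origin plus two far-away outliers
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(100, 2))
outliers = np.array([[8.0, 8.0], [-7.0, 9.0]])
X = np.vstack([normal, outliers])

# contamination sets the expected share of anomalies and hence the score threshold
iso = IsolationForest(contamination=0.02, random_state=42)
labels = iso.fit_predict(X)  # 1 = inlier, -1 = anomaly

print(f"Number of points flagged: {(labels == -1).sum()}")
print("Points flagged as anomalies:")
print(X[labels == -1])
```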