EXPERIMENT – 2
AIM: Use PCA on a high-dimensional dataset to reduce its dimensionality while retaining most of the variance, and visualize the data.
CODE:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
df = pd.read_csv('USA_Housing.csv')
print(df.isnull().sum())  # no null values
df.drop(['Address'], axis=1, inplace=True)
# Putting feature variables into X
X = df[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
        'Avg. Area Number of Bedrooms', 'Area Population']]
# Output variable
y = df['Price']
# Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Apply PCA
pca = PCA(n_components=2)  # Reduce to 2 dimensions for visualization
X_pca = pca.fit_transform(X_scaled)
# Explained variance ratio
explained_variance = pca.explained_variance_ratio_
# Visualize the PCA results
plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis')
plt.title('PCA of USA Housing dataset')
plt.xlabel('Principal Component 1 (Explained Variance: {:.2f}%)'.format(explained_variance[0]*100))
plt.ylabel('Principal Component 2 (Explained Variance: {:.2f}%)'.format(explained_variance[1]*100))
plt.show()
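Two components are enough for a 2-D plot, but it is worth confirming how much variance they retain. This is an optional sketch that fits a full PCA on the same X_scaled purely to print the cumulative explained variance:
# Optional check: cumulative explained variance over all components
pca_full = PCA().fit(X_scaled)
cum_var = np.cumsum(pca_full.explained_variance_ratio_)
for i, v in enumerate(cum_var, start=1):
    print('First {} component(s) explain {:.2f}% of the variance'.format(i, v*100))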
OUTPUT:
[Scatter plots: before PCA and after PCA]
RESULT: Hence, we have reduced the dimensionality of the dataset.
EXPERIMENT – 3
AIM: Perform a linear regression analysis on a dataset to predict a continuous target variable based on one or more predictor variables. Evaluate the model's performance using metrics like RMSE and R-squared.
CODE:
# Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
df = pd.read_csv('USA_Housing.csv')
rint("Checking null values:\n",[Link]().sum(),"\n\n") # no null
p
values
[Link](['Address'],axis=1,inplace=True)
# Putting feature variables into X
X = df[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
        'Avg. Area Number of Bedrooms', 'Area Population']]
# Output variable
y = df['Price']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train, y_train)
y_pred = lm.predict(X_test)
# Calculate RMSE
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
# Calculate R-squared
r_squared = r2_score(y_test, y_pred)
rint("Linear regression model performance:\nRoot Mean Squared Error
p
(RMSE):", rmse)
print("R-squared:", r_squared)
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.5, label='Predicted', color='cyan')
plt.scatter(y_test, y_test, alpha=0.5, label='Actual', color='blue')
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Actual vs Predicted Values')
plt.legend()
plt.show()
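As a sanity check, RMSE and R-squared can also be computed directly from their definitions; this minimal sketch reuses y_test and y_pred from above and should agree with the sklearn values:
# RMSE: square root of the mean squared residual
residuals = y_test - y_pred
rmse_manual = np.sqrt((residuals**2).mean())
# R-squared: 1 - residual sum of squares / total sum of squares
ss_res = (residuals**2).sum()
ss_tot = ((y_test - y_test.mean())**2).sum()
r2_manual = 1 - ss_res/ss_tot
print("Manual RMSE:", rmse_manual, "\nManual R-squared:", r2_manual)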
OUTPUT:
[Scatter plot: predicted vs actual values]
RESULT: Hence, we have trained and evaluated the model.
Evaluation Results are:
● Root Mean Squared Error (RMSE): 100444.06055558745
● R-squared: 0.9179971706834289
EXPERIMENT – 4
AIM: Compare the performance of various classification algorithms (e.g., Logistic Regression, Decision Trees, Random Forest, SVM and Naïve Bayes) on a common dataset using accuracy, precision, recall and F1-score.
CODE:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
df = pd.read_csv('gender_classification_v7.csv')
df['gender'] = df['gender'].apply(lambda x: 0 if x == 'Male' else 1)
plt.figure(figsize=(2, 4))
plt.title('Count of Gender', size=10)
sns.countplot(data=df, x="gender")
plt.ylabel('Count', size=12)
plt.xlabel('Gender', size=12)
sns.despine(top=True, right=True, left=False, bottom=False)
plt.show()
X = df.drop(columns=['gender'])
y = df['gender']
scaler = StandardScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42, stratify=y)
Name, Accuracy, Precision, Recall, F1_Score = [], [], [], [], []
# Logistic Regression
regression = LogisticRegression()
regression.fit(X_train, y_train)
y_pred = regression.predict(X_test)
Accuracy.append(float("{:.2f}".format(accuracy_score(y_test, y_pred)*100)))
Precision.append(precision_score(y_test, y_pred))
Recall.append(recall_score(y_test, y_pred))
F1_Score.append(f1_score(y_test, y_pred))
Name.append('Logistic Regression')
# Decision Tree
tree = DecisionTreeClassifier(criterion="gini", random_state=100, max_depth=3, min_samples_leaf=5)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)
Accuracy.append(float("{:.2f}".format(accuracy_score(y_test, y_pred)*100)))
Precision.append(precision_score(y_test, y_pred))
Recall.append(recall_score(y_test, y_pred))
F1_Score.append(f1_score(y_test, y_pred))
Name.append('Decision Tree')
# Random Forest
forest = RandomForestClassifier(n_estimators=100)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)
Accuracy.append(float("{:.2f}".format(accuracy_score(y_test, y_pred)*100)))
Precision.append(precision_score(y_test, y_pred))
Recall.append(recall_score(y_test, y_pred))
F1_Score.append(f1_score(y_test, y_pred))
Name.append('Random Forest')
# SVM
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)
Accuracy.append(float("{:.2f}".format(accuracy_score(y_test, y_pred)*100)))
Precision.append(precision_score(y_test, y_pred))
Recall.append(recall_score(y_test, y_pred))
F1_Score.append(f1_score(y_test, y_pred))
Name.append('SVM')
# Naive Bayes
naiveBayes = GaussianNB()
naiveBayes.fit(X_train, y_train)
y_pred = naiveBayes.predict(X_test)
Accuracy.append(float("{:.2f}".format(accuracy_score(y_test, y_pred)*100)))
Precision.append(precision_score(y_test, y_pred))
Recall.append(recall_score(y_test, y_pred))
F1_Score.append(f1_score(y_test, y_pred))
Name.append('Naive Bayes')
# Combining all models' performance
models = {'Model': Name, 'Accuracy': Accuracy, 'Precision': Precision, 'Recall': Recall, 'F1_Score': F1_Score}
model_df = pd.DataFrame(models)
model_df
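Since the five per-model blocks above repeat the same fit/score steps, they could equivalently be folded into one loop. This is an optional sketch of that refactor; it uses fresh result lists so it does not double-count into the lists already filled above:
def evaluate(name, model, results):
    # Fit on the training split, score on the test split, record metrics
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results['Model'].append(name)
    results['Accuracy'].append(round(accuracy_score(y_test, pred)*100, 2))
    results['Precision'].append(precision_score(y_test, pred))
    results['Recall'].append(recall_score(y_test, pred))
    results['F1_Score'].append(f1_score(y_test, pred))

results = {'Model': [], 'Accuracy': [], 'Precision': [], 'Recall': [], 'F1_Score': []}
for name, clf in [('Logistic Regression', LogisticRegression()),
                  ('Decision Tree', DecisionTreeClassifier(criterion="gini", random_state=100,
                                                           max_depth=3, min_samples_leaf=5)),
                  ('Random Forest', RandomForestClassifier(n_estimators=100)),
                  ('SVM', SVC(kernel='linear')),
                  ('Naive Bayes', GaussianNB())]:
    evaluate(name, clf, results)
pd.DataFrame(results)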
OUTPUT:
[Dataset preview and gender count plot]
RESULT: All models are trained and evaluated. Naïve Bayes performs best for the given dataset.
EXPERIMENT – 5
AIM: Implement ensemble methods such as Bagging (e.g., Random Forest) and Boosting (e.g., AdaBoost) on a classification task and compare their performance to individual models.
CODE:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
df = pd.read_csv('gender_classification_v7.csv')
df['gender'] = df['gender'].apply(lambda x: 0 if x == 'Male' else 1)
plt.figure(figsize=(2, 4))
plt.title('Count of Gender', size=10)
sns.countplot(data=df, x="gender")
plt.ylabel('Count', size=12)
plt.xlabel('Gender', size=12)
sns.despine(top=True, right=True, left=False, bottom=False)
plt.show()
X = df.drop(columns=['gender'])
y = df['gender']
scaler = StandardScaler()
X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.2, random_state=42, stratify=y)
Name, Accuracy, Precision, Recall, F1_Score = [], [], [], [], []
# Ensemble Methods
# Bagging - Random Forest
forest = RandomForestClassifier(n_estimators=100)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)
Accuracy.append(float("{:.2f}".format(accuracy_score(y_test, y_pred)*100)))
Precision.append(precision_score(y_test, y_pred))
Recall.append(recall_score(y_test, y_pred))
F1_Score.append(f1_score(y_test, y_pred))
Name.append('Random Forest')
# Boosting - AdaBoost Classifier
adaBoost = AdaBoostClassifier()
adaBoost.fit(X_train, y_train)
y_pred = adaBoost.predict(X_test)
Accuracy.append(float("{:.2f}".format(accuracy_score(y_test, y_pred)*100)))
Precision.append(precision_score(y_test, y_pred))
Recall.append(recall_score(y_test, y_pred))
F1_Score.append(f1_score(y_test, y_pred))
Name.append('AdaBoost')
# Combining all models' performance
models = {'Model': Name, 'Accuracy': Accuracy, 'Precision': Precision, 'Recall': Recall, 'F1_Score': F1_Score}
model_df = pd.DataFrame(models)
model_df
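The aim also asks for a comparison against individual models. A minimal sketch of such a baseline, assuming the same split as above, is a single decision tree; run it and then rebuild model_df to see all three rows:
from sklearn.tree import DecisionTreeClassifier

# Individual (non-ensemble) baseline for comparison
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)
y_pred = tree.predict(X_test)
Accuracy.append(float("{:.2f}".format(accuracy_score(y_test, y_pred)*100)))
Precision.append(precision_score(y_test, y_pred))
Recall.append(recall_score(y_test, y_pred))
F1_Score.append(f1_score(y_test, y_pred))
Name.append('Decision Tree (baseline)')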
OUTPUT:
[Dataset preview and gender count plot]
RESULT: Both models are trained and evaluated. The AdaBoost classifier performs best for the given dataset.
EXPERIMENT – 6
AIM: Write a code for feature selection techniques to reduce the number of features in a dataset while maintaining or improving the model's performance.
CODE:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import math
# Assuming 'df' is your DataFrame containing the dataset
df = pd.read_csv('USA_Housing.csv')
# Selecting features and output variable
X = df[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
        'Avg. Area Number of Bedrooms', 'Area Population']]
y = df['Price']
# Perform univariate feature selection using ANOVA F-test
selector = SelectKBest(score_func=f_regression, k=4)  # Select top 4 features based on F-test
X_selected = selector.fit_transform(X, y)
# Get selected feature indices
selected_indices = selector.get_support(indices=True)
# Get the names of selected features
selected_features = X.columns[selected_indices]
# Splitting the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_selected, y, test_size=0.2, random_state=42)
# Initialize and fit a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Predict on the test set
y_pred = model.predict(X_test)
# Calculate performance metrics
rmse = math.sqrt(mean_squared_error(y_test, y_pred))
r_squared = r2_score(y_test, y_pred)
print(f'Root Mean Squared Error (RMSE): {rmse:.4f}')
print(f'R-squared: {r_squared:.4f}')
print('Original Features: ', list(X.columns))
print(f'Selected Features: {list(selected_features)}\n')
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.5, label='Predicted', color='cyan')
plt.scatter(y_test, y_test, alpha=0.5, label='Actual', color='blue')
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Actual vs Predicted Values')
plt.legend()
plt.show()
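To see why these four features are kept, the per-feature F-scores computed by the fitted selector can be printed; this is an optional sketch reusing selector and X from above:
# Inspect the ANOVA F-score for every candidate feature
for feature, score in zip(X.columns, selector.scores_):
    print(f'{feature}: F = {score:.2f}')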
OUTPUT:
RESULT: After feature reduction using the ANOVA F-test, model performance is maintained. Results are shown below:
● Root Mean Squared Error (RMSE): 100367.9313
● R-squared: 0.9181
● Original Features: ['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms', 'Avg. Area Number of Bedrooms', 'Area Population']
● Selected Features: ['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms', 'Area Population']
EXPERIMENT – 7
AIM: Write a code to apply the Apriori algorithm to discover association rules in a retail transaction dataset and identify frequently co-occurring items in customer purchases.
CODE:
!pip install apyori
import pandas as pd
from mlxtend.frequent_patterns import association_rules
import matplotlib.pyplot as plt
import plotly.express as px
from apyori import apriori
# Create a dataframe and assign data from the CSV file
data = pd.read_csv('/content/Groceries_dataset.csv')
# One-hot encoding the products:
dummy = pd.get_dummies(data['itemDescription'])
products = list(dummy.columns)  # product names, used below when grouping transactions
data.drop(['itemDescription'], inplace=True, axis=1)
data = data.join(dummy)
data.head()
# Transaction: if a customer bought multiple products in one day, it will be considered as 1 transaction:
data1 = data.groupby(['Member_number', 'Date'])[products].sum()
data1 = data1.reset_index()[products]
rint("New Dimension", [Link])
p
[Link]()
# Replacing all non-zero values with the name of the product:
def product_names(x):
    for product in products:
        if x[product] > 0:
            x[product] = product
    return x

data1 = data1.apply(product_names, axis=1)
data1.head()
# Removing zeros, extracting the list of items bought per customer
x = data1.values
x = [sub[~(sub == 0)].tolist() for sub in x if sub[sub != 0].tolist()]
transactions = x
transactions[0:10]
rules = apriori(transactions, min_support=0.00030, min_lift=3, max_length=2, target="rules")
association_results = list(rules)
print(association_results[0])
for item in association_results:
    pair = item[0]
    items = [x for x in pair]
    print("Rule : ", items[0], " -> " + items[1])
    print("Support : ", str(item[1]))
    print("Confidence : ", str(item[2][0][2]))
    print("Lift : ", str(item[2][0][3]))
    print("=============================")
OUTPUT:
RESULT: Eight association rules are identified in the retail (grocery) transaction dataset for frequently co-occurring items in customer purchases.
EXPERIMENT – 8
AIM: Implement k-fold cross-validation on a classification task to assess the model's performance, addressing the issue of overfitting.
CODE:
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold, cross_val_score
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from numpy import mean, std
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['species'] = data.target
df
df['species'].value_counts()
X = df.drop(['species'], axis='columns')
Y = df.species
model = LogisticRegression(max_iter=1000)  # classifier to evaluate across the folds
for i in range(2, 16):
    kf = KFold(n_splits=i, random_state=1, shuffle=True)
    scores = cross_val_score(model, X, Y, scoring='accuracy', cv=kf, n_jobs=-1)
    print('n-split:', i)
    print('Accuracy: %.3f (%.3f)' % (mean(scores), std(scores)))
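To make the repeated train/test process concrete, this optional sketch reproduces what cross_val_score does internally for a single choice of k=5, reusing model, X and Y from above:
from sklearn.metrics import accuracy_score

# Manual 5-fold loop: each fold serves once as the held-out test set
kf = KFold(n_splits=5, random_state=1, shuffle=True)
fold_scores = []
for train_idx, test_idx in kf.split(X):
    model.fit(X.iloc[train_idx], Y.iloc[train_idx])
    pred = model.predict(X.iloc[test_idx])
    fold_scores.append(accuracy_score(Y.iloc[test_idx], pred))
print('Per-fold accuracy:', fold_scores)
print('Mean accuracy:', mean(fold_scores))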
OUTPUT:
RESULT: K-fold cross-validation helps in obtaining a more reliable estimate of the model's performance by repeating the training and testing process k times with different subsets. It helps to identify models that generalize well to unseen data and reduces the risk of overfitting to specific patterns in the training data, leading to a more robust evaluation of model performance.
EXPERIMENT – 9
AIM: To implement a simple classification model to predict the species of iris flowers in the Iris dataset using basic algorithms like logistic regression or k-nearest neighbors.
CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['species'] = data.target
df.head()
X = df.drop(['species'], axis='columns')
Y = df.species
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# KNN Classifier
knn5 = KNeighborsClassifier(n_neighbors=5)
knn1 = KNeighborsClassifier(n_neighbors=1)
knn5.fit(X_train, y_train)
knn1.fit(X_train, y_train)
y_pred_knn5 = knn5.predict(X_test)
y_pred_knn1 = knn1.predict(X_test)
print("Accuracy with KNN at k=5", accuracy_score(y_test,
y_pred_knn5)*100)
print("Accuracy with KNN at k=1", accuracy_score(y_test,
y_pred_knn1)*100)
log_regr = LogisticRegression(solver='lbfgs', max_iter=1000)
log_regr.fit(X_train, y_train)
# Predict labels of unseen (test) data
y_pred_lr=log_regr.predict(X_test)
score=accuracy_score(y_test,y_pred_lr)
# accuracy_score returns the fraction of correctly classified test samples
print("Accuracy of logistic regression ", score*100)
OUTPUT:
RESULT: Simple classification models (k-nearest neighbors and logistic regression) are trained and evaluated.
EXPERIMENT – 10
AIM: Predict the quality of wine based on features like acidity, alcohol content, and pH by using either linear regression or decision trees.
CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
import matplotlib.pyplot as plt
# Load the dataset
data = pd.read_csv('/content/winequality-red.csv')  # wine quality CSV; filename assumed, adjust to your file
df = pd.DataFrame(data)
df
df.isnull().sum()
df.update(df.fillna(df.mean()))  # fill any missing values with column means
X = df[['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar', 'chlorides',
        'free sulfur dioxide', 'total sulfur dioxide', 'density', 'pH', 'sulphates',
        'alcohol']].values
Y = df['quality'].values
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
regressor = LinearRegression()
regressor.fit(X_train, y_train)
coeff_df = pd.DataFrame(regressor.coef_,
                        ['fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
                         'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
                         'pH', 'sulphates', 'alcohol'],
                        columns=['Coefficient'])
coeff_df
print(regressor.intercept_)
y_pred = regressor.predict(X_test)
# Calculate RMSE
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
# Calculate R-squared
r_squared = r2_score(y_test, y_pred)
rint("Linear regression model performance:\nRoot Mean Squared Error
p
(RMSE):", rmse)
print("R-squared:", r_squared)
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred, alpha=0.5, label='Predicted', color='cyan')
plt.scatter(y_test, y_test, alpha=0.5, label='Actual', color='blue')
plt.xlabel('Actual Values')
plt.ylabel('Predicted Values')
plt.title('Actual vs Predicted Values')
plt.legend()
plt.show()
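The aim allows either linear regression or decision trees; for comparison, a decision tree regressor can be fitted on the same split. This is an optional sketch with its import added here; max_depth=5 is an arbitrary choice:
from sklearn.tree import DecisionTreeRegressor

# Alternative model: a depth-limited regression tree on the same split
tree_reg = DecisionTreeRegressor(max_depth=5, random_state=0)
tree_reg.fit(X_train, y_train)
y_pred_tree = tree_reg.predict(X_test)
print("Decision tree RMSE:", np.sqrt(mean_squared_error(y_test, y_pred_tree)))
print("Decision tree R-squared:", r2_score(y_test, y_pred_tree))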
OUTPUT:
RESULT: Hence, we have trained and evaluated the model.
Evaluation Results are:
● Root Mean Squared Error (RMSE): 0.7302836974721729
● R-squared: 0.3001119515373122