Machine Learning Lab Dlihebca6sem
Bangalore University
LIST OF PROGRAMS
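1. Install and set up Python and essential libraries like NumPy and pandas.
2. Introduce scikit-learn as a machine learning library.
3. Install and set up scikit-learn and other necessary tools.
4. Write a program to load and explore datasets from .csv and Excel files using pandas.
5. Write a program to visualize the dataset to gain insights using Matplotlib or Seaborn by plotting scatter plots and bar charts.
6. Write a program to handle missing data, encode categorical variables, and perform feature scaling.
7. Write a program to implement a k-Nearest Neighbours (k-NN) classifier using scikit-learn, train the classifier on the dataset, and evaluate its performance.
8. Write a program to implement a linear regression model for regression tasks.
9. Write a program to implement a decision tree classifier using scikit-learn and visualize the resulting tree.
10. Write a program to implement K-Means clustering and visualize the clusters.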
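1. Install and set up Python and essential libraries like NumPy and pandas.
Install Python: If you haven't already installed Python, download and install the latest version of Python 3 from the official Python website.
To verify the installation, run the following command in your terminal or command prompt:
python --version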
Install pip: pip is a package manager for Python that allows you to easily install and manage
libraries. Most recent versions of Python come with pip pre-installed. You can verify if pip is
installed by running the following command in your terminal or command prompt:
pip --version
Install NumPy and pandas: Once you have Python and pip installed, you can use pip to install
NumPy and pandas by running the following commands in your terminal or command prompt:
#In terminal
pip install numpy
pip install pandas
This will download and install NumPy and pandas along with any dependencies they require.
Verify installation: After installing NumPy and pandas, you can verify that they were installed
correctly by running the following commands in Python's interactive mode or a Python script:
import numpy
import pandas
print(numpy.__version__)
print(pandas.__version__)
These commands should print the versions of NumPy and pandas that were installed.
Output:
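The manual check above can also be automated. The following is a minimal sketch, assuming only the standard-library function `importlib.util.find_spec`; the helper name `missing_packages` is hypothetical and not part of any library.

```python
from importlib.util import find_spec

def missing_packages(names):
    """Return the top-level package names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]

# Packages this lab expects; extend the list as needed
required = ["numpy", "pandas"]
missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Any package reported as missing can then be installed with pip as described above.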
2. Introduce scikit-learn as a machine learning library.
Scikit-learn (imported as sklearn) is a widely used, robust library for machine learning in Python. It
provides a selection of efficient tools for machine learning and statistical modeling, including
classification, regression, clustering and dimensionality reduction, through a consistent interface
in Python. This library, which is largely written in Python, is built upon NumPy,
SciPy and Matplotlib.
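To illustrate what a consistent interface means in practice, here is a minimal sketch in which two unrelated model families are trained and queried through the same `fit`, `predict` and `score` calls. The choice of models and the bundled Iris data are assumptions made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Two different model families, driven through identical method calls
models = [LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)]
for model in models:
    model.fit(X, y)                      # learn from the data
    predictions = model.predict(X[:5])   # predict labels for the first five samples
    # score() here is training-set accuracy, shown only to illustrate the call
    print(type(model).__name__, predictions, model.score(X, y))
```

Because every estimator follows this pattern, swapping one algorithm for another usually means changing a single line.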
Installation
If you have already installed NumPy and SciPy, the easiest way to install
scikit-learn is with pip:
Using pip
pip install -U scikit-learn
Features
Rather than focusing on loading, manipulating and summarising data, the scikit-learn library is
focused on modeling the data. Some of the most popular groups of models provided by scikit-learn
are as follows:
Supervised Learning algorithms: Almost all the popular supervised learning algorithms,
such as Linear Regression, Support Vector Machine (SVM) and Decision Tree, are part of
scikit-learn.
Unsupervised Learning algorithms: It also includes popular
unsupervised learning algorithms, ranging from clustering, factor analysis and PCA (Principal Component
Analysis) to unsupervised neural networks.
Cross Validation: It is used to estimate how accurately a supervised model will perform on unseen data.
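As an illustration of cross-validation, the sketch below scores a classifier with 5-fold cross-validation using `cross_val_score`. The classifier, the number of folds and the bundled Iris data are assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
model = KNeighborsClassifier(n_neighbors=3)

# 5-fold cross-validation: train on four folds, score on the fifth, then rotate
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Each fold is used exactly once for testing, so the mean gives a less optimistic estimate than accuracy measured on the training data.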
3. Install and set up scikit-learn and other necessary tools.
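scikit-learn is a Python library for machine learning. Follow these steps to set it up:
Install Python: If you haven't already installed Python, download and install the latest version
of Python 3 from the official Python website.
Install scikit-learn and the other libraries using pip: Open your terminal or command prompt and run the following
command:
pip install scikit-learn numpy pandas matplotlib
Verify installation: After installation, run the following commands in Python's interactive mode or a Python script: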
import sklearn
import numpy
import pandas
import matplotlib
print(sklearn.__version__)
print(numpy.__version__)
print(pandas.__version__)
print(matplotlib.__version__)
These commands should print the versions of scikit-learn and other libraries that were
installed.
Output:
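4. Write a program to load and explore datasets from .csv and Excel files using
pandas.
import pandas as pd

def explore_dataset(file_path):
    # Pick the reader that matches the file extension
    if file_path.endswith('.csv'):
        df = pd.read_csv(file_path)
    elif file_path.endswith('.xlsx'):
        df = pd.read_excel(file_path)
    else:
        print("Unsupported file format. Please provide a .csv or .xlsx file.")
        return
    print("Dataset information:")
    df.info()  # info() prints its report directly and returns None
    print("\nFirst 5 rows:")
    print(df.head())
    # Display summary statistics for numerical columns
    print("\nSummary statistics:")
    print(df.describe())
    # Display unique values for non-numerical (categorical) columns
    print("\nUnique values for categorical columns:")
    for column in df.select_dtypes(exclude='number').columns:
        print(f"{column}: {df[column].unique()}")

# Example usage
file_path = 'IRIS.csv'
explore_dataset(file_path)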
Output:
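If no dataset file is at hand, the same exploration calls can be tried on a small in-memory CSV. This is a minimal sketch; the sample rows are hypothetical and only stand in for a real file.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for a real CSV file
csv_text = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,,setosa
6.3,3.3,virginica
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                      # (rows, columns)
print(df.dtypes)                     # type pandas inferred for each column
print(df.isna().sum())               # number of missing values per column
print(df['species'].value_counts())  # frequency of each category
```

The empty field in the second row is read as a missing value, which is why `isna().sum()` is a useful check before any modeling step.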
5. Write a program to visualize the dataset to gain insights using Matplotlib or
Seaborn by plotting scatter plots and bar charts.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

def visualize_dataset(file_path):
    # Load the dataset from the CSV file
    df = pd.read_csv(file_path)
    # Scatter plots for every pair of numerical columns
    sns.pairplot(df)
    plt.suptitle("Pairplot of the Dataset", y=1.02)
    plt.show()
    # Plot bar chart for categorical column (assuming the first column is categorical)
    plt.figure(figsize=(8, 6))
    sns.countplot(x=df.columns[0], data=df)
    plt.title("Bar Chart of " + df.columns[0])
    plt.show()

# Example usage
file_path = 'IRIS.csv'  # Change this to the path of your CSV file
visualize_dataset(file_path)
Output:
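The scatter plot and bar chart named in the task can also be drawn with Matplotlib alone. The sketch below assumes the Iris data bundled with scikit-learn rather than a CSV file, and the choice of the two plotted features is illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

fig, (ax_scatter, ax_bar) = plt.subplots(1, 2, figsize=(12, 5))

# Scatter plot: sepal length against petal length, coloured by species
ax_scatter.scatter(X[:, 0], X[:, 2], c=y, cmap='viridis')
ax_scatter.set_xlabel(iris.feature_names[0])
ax_scatter.set_ylabel(iris.feature_names[2])
ax_scatter.set_title('Sepal length vs. petal length')

# Bar chart: number of samples in each species
counts = np.bincount(y)
ax_bar.bar(list(iris.target_names), counts)
ax_bar.set_ylabel('Number of samples')
ax_bar.set_title('Samples per species')

fig.tight_layout()
```

Call `plt.show()` afterwards to display the figure in an interactive session.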
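6. Write a program to handle missing data, encode categorical variables, and
perform feature scaling.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset into a DataFrame
iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df['target'] = iris.target

def preprocess_dataset(df):
    feature_columns = df.columns[:-1]  # every column except the target
    # Handle missing data (the Iris dataset has no missing values, so we simulate some)
    df.iloc[::10, 0] = float('NaN')
    imputer = SimpleImputer(strategy='mean')
    df[feature_columns] = imputer.fit_transform(df[feature_columns])
    # Encode categorical variables (none here: every Iris column is already numeric)
    # Perform feature scaling to zero mean and unit variance
    scaler = StandardScaler()
    df[feature_columns] = scaler.fit_transform(df[feature_columns])
    return df

preprocessed_df = preprocess_dataset(iris_df)
print("Preprocessed dataset:")
print(preprocessed_df.head())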
Output:
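Because the Iris features are all numeric, the encoding step is not exercised by that dataset. The following minimal sketch shows one common approach on a small hypothetical table: one-hot encoding for a nominal feature with `pd.get_dummies`, and `LabelEncoder` for the class label. The column names and values are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: one numeric column, one nominal column, one class label
df = pd.DataFrame({
    'size_cm': [10.0, 12.5, 9.0, 11.0],
    'colour': ['red', 'green', 'red', 'blue'],
    'label': ['small', 'large', 'small', 'large'],
})

# One-hot encode the nominal feature: one 0/1 column per category
encoded = pd.get_dummies(df, columns=['colour'], dtype=int)

# Label-encode the target: classes are numbered in sorted order
label_encoder = LabelEncoder()
encoded['label'] = label_encoder.fit_transform(encoded['label'])

print(encoded)
print("Classes:", list(label_encoder.classes_))
```

One-hot encoding avoids implying an order between categories, which is why it is usually preferred for features, while integer labels are sufficient for the target.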
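7. Write a program to implement a k-Nearest Neighbours (k-NN) classifier using
scikit-learn, train the classifier on the dataset, and evaluate its performance.
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
k = 3  # Number of neighbors
knn_classifier = KNeighborsClassifier(n_neighbors=k)
knn_classifier.fit(X_train, y_train)
y_pred = knn_classifier.predict(X_test)
# Evaluate the predictions on the held-out test set
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))
Output: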
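The value k = 3 is one possible choice. As a rough sketch of how k might be selected, the code below compares a few candidate values using 5-fold cross-validation; the candidate list and the number of folds are assumptions, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Candidate values of k (an illustrative range)
candidate_ks = [1, 3, 5, 7, 9]
mean_scores = {}
for k in candidate_ks:
    model = KNeighborsClassifier(n_neighbors=k)
    # Mean accuracy over 5 folds for this value of k
    mean_scores[k] = cross_val_score(model, X, y, cv=5).mean()

best_k = max(mean_scores, key=mean_scores.get)
for k, score in mean_scores.items():
    print(f"k={k}: mean accuracy {score:.3f}")
print("Best k:", best_k)
```

Odd values of k are often used in practice because they reduce the chance of tied votes between classes.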
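8. Write a program to implement a linear regression model for regression tasks.
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# load_boston was removed in scikit-learn 1.2; the bundled diabetes dataset is used instead
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target
diabetes_df = pd.DataFrame(X, columns=diabetes.feature_names)
diabetes_df['target'] = y
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
linear_regression = LinearRegression()
linear_regression.fit(X_train, y_train)
y_pred = linear_regression.predict(X_test)
r2 = r2_score(y_test, y_pred)
print("R-squared Score:", r2)
Output: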
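To make the R-squared score less of a black box, the sketch below computes it by hand as 1 minus the ratio of the residual sum of squares to the total sum of squares, and compares the result with `r2_score`. It assumes the diabetes dataset bundled with scikit-learn, because `load_boston` is not available in scikit-learn 1.2 and later.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum((y_test - y_pred) ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print("Manual R^2:", r2_manual)
print("r2_score:  ", r2_score(y_test, y_pred))
```

A score of 1 means perfect predictions, while a score of 0 means the model does no better than always predicting the mean of the test targets.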
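9. Write a program to implement a decision tree classifier using scikit-learn and
visualize the resulting tree.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
decision_tree = DecisionTreeClassifier(random_state=42)
decision_tree.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, decision_tree.predict(X_test)))
# Visualize the decision tree
plt.figure(figsize=(12, 8))
plot_tree(decision_tree, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()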
Output:
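Besides the plot, the learned splits can be inspected as text. The following sketch uses `export_text` on a deliberately shallow tree; the `max_depth=2` setting is an assumption chosen only to keep the printout short.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree (max_depth=2 is an illustrative choice) keeps the output readable
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each indented line is a threshold test on one feature; leaves show the predicted class
rules = export_text(tree, feature_names=list(iris.feature_names))
print(rules)
print("Tree depth:", tree.get_depth())
print("Number of leaves:", tree.get_n_leaves())
```

Reading the rules from top to bottom follows the same path a sample takes through the tree during prediction.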
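10. Write a program to implement K-Means clustering and visualize the clusters.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate synthetic 2-D data with 4 clusters (parameter values are illustrative)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)
# Fit K-Means and get the cluster label assigned to each point
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
# Plot the points coloured by cluster, with the centroids in red
plt.figure(figsize=(8, 6))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=100,
            c='red', label='Centroids')
plt.title('K-Means Clustering')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.show()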
Output:
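K-Means needs the number of clusters in advance. One common heuristic is the elbow method, sketched below under the assumption of synthetic blob data: fit the model for several values of k and look for the point where the inertia stops dropping sharply. All parameter values here are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 4 blobs (all parameter values here are illustrative)
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=42)

# Inertia = sum of squared distances from each point to its nearest centroid
inertias = {}
for k in range(1, 9):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = kmeans.inertia_

for k, inertia in inertias.items():
    print(f"k={k}: inertia={inertia:.1f}")
```

Inertia always tends to decrease as k grows, so the value of interest is the k after which further increases bring only small improvements.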