Program 4: Public

The document outlines several Python programs that demonstrate data handling and analysis using pandas and scikit-learn. It includes steps for creating CSV and Excel files, loading and exploring datasets, visualizing data with Matplotlib, handling missing data, encoding categorical variables, and implementing machine learning models like k-NN, linear regression, decision trees, and K-Means clustering. Each program is accompanied by code snippets and explanations for performing specific tasks related to data science.

Uploaded by

Nilesh


Program 4

Write a program to load and explore datasets from CSV and Excel files using pandas.

Step 1: Creating CSV and Excel Files with Dummy Data

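Create CSV File: Open a text editor such as Notepad or any code editor and enter the following data. Do not put spaces after the commas; pandas would otherwise keep the leading space as part of the column names (for example ' Age').

Name,Age,Score
Srikanth,28,85
Snigdha,22,78
Mary,31,92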

Save this file as sample_data.csv in the C:\ML_Projects directory.
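
If typing the file by hand is inconvenient, the same CSV can be written from Python. The following is a minimal sketch using only the standard library; it writes to a temporary directory rather than C:\ML_Projects (an assumption made so the snippet runs anywhere), and the helper names write_sample_csv and read_header are hypothetical.

```python
import csv
import tempfile
from pathlib import Path

# The three dummy rows from the step above
rows = [
    {"Name": "Srikanth", "Age": 28, "Score": 85},
    {"Name": "Snigdha", "Age": 22, "Score": 78},
    {"Name": "Mary", "Age": 31, "Score": 92},
]

def write_sample_csv(path):
    """Write the dummy rows with a header line and no spaces after commas."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Name", "Age", "Score"])
        writer.writeheader()
        writer.writerows(rows)

def read_header(path):
    """Return the column names stored in the first line of the file."""
    with open(path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f))

# A temporary directory stands in for C:\ML_Projects (assumption)
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sample_data.csv"
    write_sample_csv(csv_path)
    header = read_header(csv_path)
    print(header)
```

Reading the header back is a quick way to confirm that the column names contain no stray spaces before loading the file with pandas.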

Create Excel File: Use Microsoft Excel or Google Sheets to create this file. Enter the data below:

Name     Course  Sem
Rajesh   BCA     1
Ramesh   BCA     2
Swati    BCOM    1
Florina  BCOM    3
Pooja    BBA     2
Raghu    BBA     4

Save this file as sample_data.xlsx in the C:\ML_Projects directory.

Step 2: Python Code to Load and Explore the Data

import pandas as pd

# Define the file paths

csv_file_path = 'C:\\ML_Projects\\sample_data.csv'

excel_file_path = 'C:\\ML_Projects\\sample_data.xlsx'

# Load the CSV file

data_csv = pd.read_csv(csv_file_path)

print("CSV File Data:")

print(data_csv)

# Load the Excel file (reading .xlsx files requires the openpyxl package)

data_excel = pd.read_excel(excel_file_path)

print("\nExcel File Data:")

print(data_excel)

# Basic Data Exploration

print("\nData Descriptions: ")

print("CSV Data Description:")

print(data_csv.describe())

print("\nExcel Data Description:")

print(data_excel.describe())

# Displaying data types

print("\nData Types in CSV File:")

print(data_csv.dtypes)

print("\nData Types in Excel File:")

print(data_excel.dtypes)

Program 5

Write a program to visualize the dataset to gain insights using Matplotlib by plotting scatter plots and bar charts.

Step 1: Create the CSV File:

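Create a CSV file with the data below, containing student study hours and exam scores. Save this file as study_data.csv in the C:\ML_Projects directory. Do not put spaces after the commas, so that the column names match the ones used in the code.

Student ID,Study Hours,Exam Score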

1,5,82

2,2,48

3,8,90

4,1,35

5,3,50

6,4,66

7,9,95

8,6,75

9,7,88

10,0.5,30

11,10,96

12,0,20

13,12,98

Step 2: Python Code:

import pandas as pd

import matplotlib.pyplot as plt

# Load the data

data = pd.read_csv('C:\\ML_Projects\\study_data.csv')

# Scatter plot of Study Hours vs Exam Scores

plt.figure(figsize=(14, 7))

plt.subplot(1, 2, 1) # 1 row, 2 columns, 1st subplot

plt.scatter(data['Study Hours'], data['Exam Score'], color='dodgerblue', edgecolor='k', alpha=0.7)

plt.title('Study Hours vs. Exam Scores')

plt.xlabel('Study Hours')

plt.ylabel('Exam Scores')

plt.grid(True)

# Bar chart of Average Exam Score by Study Hour Range

# Creating bins for study hour ranges

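# The bins are closed on the right and include the lowest edge, so that
# 0 hours falls into '0-2' and 12 hours falls into '10-12'

bins = [0, 2, 4, 6, 8, 10, 12]

labels = ['0-2', '2-4', '4-6', '6-8', '8-10', '10-12']

data['Study Hour Range'] = pd.cut(data['Study Hours'], bins=bins, labels=labels, include_lowest=True)

grouped_data = data.groupby('Study Hour Range', observed=False)['Exam Score'].mean()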

plt.subplot(1, 2, 2) # 1 row, 2 columns, 2nd subplot

grouped_data.plot(kind='bar', color='salmon')

plt.title('Average Exam Score by Study Hour Range')

plt.xlabel('Study Hour Range')

plt.ylabel('Average Exam Score')

plt.xticks(rotation=0) # Keep the category labels horizontal

plt.tight_layout() # Adjust subplots to fit into figure area.

plt.show()
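
To see what the binning and averaging step computes, the grouping can be reproduced by hand. The sketch below uses only the standard library and assumes right-closed bins with the lowest edge included (an assumption; with half-open bins of the form [a, b) the student with 12 study hours would fall outside the last bin). The helper name range_for is hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# (study hours, exam score) pairs taken from the study_data table
records = [(5, 82), (2, 48), (8, 90), (1, 35), (3, 50), (4, 66), (9, 95),
           (6, 75), (7, 88), (0.5, 30), (10, 96), (0, 20), (12, 98)]

edges = [0, 2, 4, 6, 8, 10, 12]
range_labels = ['0-2', '2-4', '4-6', '6-8', '8-10', '10-12']

def range_for(hours):
    """Return the label of the right-closed bin containing hours, or None."""
    if hours < edges[0] or hours > edges[-1]:
        return None
    # bisect_left finds the first edge >= hours; the bin index is one less,
    # except for the lowest edge itself, which is assigned to the first bin.
    return range_labels[max(bisect_left(edges, hours) - 1, 0)]

# Collect the exam scores that fall into each study hour range
scores_by_range = defaultdict(list)
for hours, score in records:
    label = range_for(hours)
    if label is not None:
        scores_by_range[label].append(score)

# Average score per range, in bin order, skipping empty ranges
average_by_range = {
    label: sum(scores_by_range[label]) / len(scores_by_range[label])
    for label in range_labels
    if scores_by_range[label]
}
print(average_by_range)
```

These averages are the bar heights one would expect in the second subplot under the same binning assumption.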

Program 6

Write a program to Handle missing data, encode categorical variables, and perform feature
scaling.

import pandas as pd

from sklearn.impute import SimpleImputer

from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Create dummy data

data = {

    'Age': [25, 30, None, 28, 35],

    'Gender': ['Female', 'Male', 'Male', 'Female', 'Male'],

    'Income': [50000, 60000, 45000, None, 70000]

}

df = pd.DataFrame(data)

# Handling missing data

imputer = SimpleImputer(strategy='mean')

df[['Age', 'Income']] = imputer.fit_transform(df[['Age', 'Income']])

# Print data after handling missing values

print("Data after handling missing values:")

print(df)

# Encoding categorical variables

encoder = OneHotEncoder()

encoded_data = encoder.fit_transform(df[['Gender']]).toarray()

# Print data after categorical encoding

encoded_df = pd.DataFrame(encoded_data, columns=encoder.get_feature_names_out(['Gender']))

print("\nData after categorical encoding:")

print(encoded_df)

# Feature scaling

scaler = StandardScaler()

scaled_data = scaler.fit_transform(df[['Age', 'Income']])

# Print data after feature scaling

scaled_df = pd.DataFrame(scaled_data, columns=['Scaled Age', 'Scaled Income'])

print("\nData after feature scaling:")

print(scaled_df)
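
The arithmetic behind mean imputation and standard scaling can be checked by hand. The following is a minimal standard-library sketch, not the scikit-learn implementation; impute_mean and standardize are hypothetical helper names, and the scaling divides by the population standard deviation, which is what StandardScaler uses.

```python
from statistics import fmean, pstdev

# The two numeric columns from the dummy data above
ages = [25, 30, None, 28, 35]
incomes = [50000, 60000, 45000, None, 70000]

def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = fmean(observed)
    return [fill if v is None else float(v) for v in values]

def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

imputed_ages = impute_mean(ages)
imputed_incomes = impute_mean(incomes)
scaled_ages = standardize(imputed_ages)
scaled_incomes = standardize(imputed_incomes)

print("Imputed ages:", imputed_ages)
print("Imputed incomes:", imputed_incomes)
print("Scaled ages:", [round(v, 3) for v in scaled_ages])
print("Scaled incomes:", [round(v, 3) for v in scaled_incomes])
```

After scaling, each column has mean 0 and population standard deviation 1, which puts Age and Income on a comparable scale.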

Program 7

Write a program to implement a k-Nearest Neighbours (k-NN) classifier using scikit-learn, train the classifier on the dataset, and evaluate its performance.

import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import accuracy_score

# Dummy student data: features are exam score 1 and exam score 2; target is pass (1) or fail (0)

X = np.array([[80, 75], [95, 90], [60, 50], [45, 30], [30, 40], [85, 95], [70, 60], [50, 55],

[40, 45], [60, 70]])

y = np.array([1, 1, 0, 0, 0, 1, 1, 0, 0, 1]) # Binary classes for demonstration

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Initialize the k-NN classifier with k=3

knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier on the training data

knn.fit(X_train, y_train)

# Evaluate the classifier's performance

y_pred = knn.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)

print("Accuracy on the test set: {:.2f}".format(accuracy))

# Take user input for exam scores

exam_score1 = float(input("Enter Exam Score 1: "))

exam_score2 = float(input("Enter Exam Score 2: "))

# Prepare the user input for prediction

user_input = np.array([[exam_score1, exam_score2]])

# Use the trained k-NN classifier to predict the outcome

predicted_outcome = knn.predict(user_input)

if predicted_outcome[0] == 1:

    print("Based on the exam scores provided, the student is predicted to pass.")

else:

    print("Based on the exam scores provided, the student is predicted to fail.")
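
The idea behind the classifier can be sketched without scikit-learn: rank the training points by Euclidean distance to the query and take a majority vote among the k closest. This is an illustrative sketch, not the library's implementation; it uses all ten dummy points as training data (no train/test split), and knn_predict is a hypothetical helper name.

```python
from collections import Counter
from math import dist

# The dummy exam scores and pass (1) / fail (0) labels from the program above
training_points = [(80, 75), (95, 90), (60, 50), (45, 30), (30, 40),
                   (85, 95), (70, 60), (50, 55), (40, 45), (60, 70)]
training_labels = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]

def knn_predict(query, k=3):
    """Predict the label of query by majority vote among the k nearest points."""
    # Sort the (point, label) pairs by Euclidean distance to the query
    ranked = sorted(zip(training_points, training_labels),
                    key=lambda pair: dist(pair[0], query))
    # Count the labels of the k closest points and return the most common one
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print("Scores (90, 85):", "pass" if knn_predict((90, 85)) == 1 else "fail")
print("Scores (35, 35):", "pass" if knn_predict((35, 35)) == 1 else "fail")
```

With an odd k and two classes, the vote cannot tie, which is one reason k=3 is a convenient choice for a binary problem.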

Program 8

Write a program to implement a linear regression model for regression tasks and train the model on a dataset with a continuous target variable.

import numpy as np

from sklearn.linear_model import LinearRegression

# Dummy house price prediction data: features (house size, number of bedrooms) and target variable (house price)

X = np.array([[1000, 2], [1500, 3], [1200, 2], [1800, 4], [900, 2], [2000, 3]])

y = np.array([300000, 400000, 350000, 500000, 280000, 450000])

# Initialize the Linear Regression model

model = LinearRegression()

# Train the model on the dataset

model.fit(X, y)

# Take input from the user for new house data

size = float(input("Enter the size of the house in sqft: "))

bedrooms=int(input("Enter the number of bedrooms: "))

new_data = np.array([[size, bedrooms]])

# Predict the price for the new house data

predicted_price = model.predict(new_data)

# Print the predicted price for the new house data

print("Predicted price for a house with size {} sqft and {} bedrooms: Rs.{:.2f}".format(size, bedrooms, predicted_price[0]))
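
Linear regression fits an intercept and one coefficient per feature by ordinary least squares. The sketch below shows one way to compute such a fit by solving the normal equations with plain Python; it is an illustration under that assumption, not how scikit-learn solves the problem internally, and the helper names (solve_linear_system, fit_least_squares, predict_price) and the example house (1600 sqft, 3 bedrooms) are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_least_squares(features, targets):
    """Return [intercept, coef_1, ..., coef_k] minimizing the squared error."""
    # Prepend a constant 1.0 to every row so the first coefficient is the intercept
    rows = [[1.0] + [float(v) for v in row] for row in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict_price(coeffs, row):
    """Evaluate intercept + sum(coef * feature) for one sample."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], row))

# The dummy house data from the program above
houses = [[1000, 2], [1500, 3], [1200, 2], [1800, 4], [900, 2], [2000, 3]]
prices = [300000, 400000, 350000, 500000, 280000, 450000]

coeffs = fit_least_squares(houses, prices)
print("Intercept and coefficients:", [round(c, 2) for c in coeffs])
print("Predicted price for 1600 sqft, 3 bedrooms: Rs.{:.2f}".format(
    predict_price(coeffs, [1600, 3])))
```

Solving the normal equations directly is adequate for a tiny, well-behaved dataset like this one; for larger or nearly collinear data a numerically more robust solver is the safer choice.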

Program 9

Write a program to implement a decision tree classifier using scikit-learn and visualize the
decision tree and understand its splits.

import numpy as np

from sklearn.tree import DecisionTreeClassifier, plot_tree

from sklearn.tree import export_text

import matplotlib.pyplot as plt

# Custom dummy data for fruit classification

#Features: [Weight, Texture] -> Target: [Fruit Type]

x = np.array([[150, 0], [170, 1], [120, 0], [140, 1], [200, 1], [130, 0]])

y = np.array(['Apple', 'Orange', 'Apple', 'Orange', 'Melon', 'Apple'])

# Initialize the Decision Tree Classifier

clf = DecisionTreeClassifier(random_state=42)

clf.fit(x, y)

# Visualize the Decision Tree splits

tree_rules = export_text(clf, feature_names=['Weight', 'Texture'])

print("Decision Tree Classifier Rules: \n", tree_rules)

# Plot the Decision Tree

plt.figure(figsize=(10, 6))

plot_tree(clf, filled=True, feature_names=['Weight', 'Texture'], class_names=list(clf.classes_))

plt.show()
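
To understand the splits, it helps to compute the impurity measure by hand. DecisionTreeClassifier uses Gini impurity by default, and a split is scored by the weighted impurity of the two child nodes. The sketch below evaluates one candidate split, Texture <= 0.5, chosen here purely for illustration (it is not claimed to be the split the fitted tree selects); gini and weighted_gini are hypothetical helper names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# (weight, texture, fruit) rows from the dummy data above
samples = [(150, 0, 'Apple'), (170, 1, 'Orange'), (120, 0, 'Apple'),
           (140, 1, 'Orange'), (200, 1, 'Melon'), (130, 0, 'Apple')]

root_labels = [fruit for _, _, fruit in samples]
left_labels = [fruit for _, texture, fruit in samples if texture <= 0.5]
right_labels = [fruit for _, texture, fruit in samples if texture > 0.5]

print("Gini at the root: {:.4f}".format(gini(root_labels)))
print("Gini of the left child: {:.4f}".format(gini(left_labels)))
print("Gini of the right child: {:.4f}".format(gini(right_labels)))
print("Weighted Gini after the split: {:.4f}".format(
    weighted_gini(left_labels, right_labels)))
```

A lower weighted impurity than the root means the split separates the classes better; the gini values printed in the plotted tree nodes can be read the same way.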

Program 10

Write a program to implement K-Means clustering and visualize clusters.

import numpy as np

import matplotlib.pyplot as plt

from sklearn.cluster import KMeans

# Generate dummy customer data (Age, Income)

X = np.array([[30, 50000], [35, 60000], [40, 80000], [25, 30000], [45, 100000], [20, 20000], [50,
120000], [55, 150000], [60, 140000], [28, 40000]])

# Initialize K-Means with 3 clusters

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)

kmeans.fit(X)

# Get cluster labels and cluster centers

labels = kmeans.labels_

centers = kmeans.cluster_centers_

# Visualize the clusters

plt.figure(figsize=(8, 6))

plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50, alpha=0.8)

plt.scatter(centers[:, 0], centers[:, 1], c='red', s=200, marker='X', label='Centroids')

plt.xlabel('Age')

plt.ylabel('Income')

plt.title('K-Means Clustering of Customers')

plt.legend()

plt.show()
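
The clustering itself alternates two steps: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, repeating until the assignments stop changing. The following is a minimal sketch of that loop in plain Python. It starts from the first three customers as initial centroids (an assumption; scikit-learn's KMeans uses k-means++ initialization by default, so its cluster numbering and possibly its result can differ), and the helper names assign, update and lloyd are hypothetical.

```python
from math import dist

# The dummy (age, income) customers from the program above
customers = [(30, 50000), (35, 60000), (40, 80000), (25, 30000), (45, 100000),
             (20, 20000), (50, 120000), (55, 150000), (60, 140000), (28, 40000)]

def assign(points, centroids):
    """Return, for every point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            for p in points]

def update(points, assignments, centroids):
    """Move each centroid to the mean of its members; keep it if it has none."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == i]
        if not members:
            new_centroids.append(old)
            continue
        new_centroids.append(
            tuple(sum(axis) / len(members) for axis in zip(*members)))
    return new_centroids

def lloyd(points, centroids, max_iter=100):
    """Alternate assignment and update steps until assignments are stable."""
    assignments = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, assignments, centroids)
        new_assignments = assign(points, centroids)
        if new_assignments == assignments:
            break
        assignments = new_assignments
    return assignments, centroids

# Initial centroids: the first three customers (assumption for this sketch)
cluster_ids, final_centroids = lloyd(customers, customers[:3])
print("Cluster of each customer:", cluster_ids)
print("Final centroids:", final_centroids)
```

Because Income is several orders of magnitude larger than Age, the distances here are dominated by income; scaling the features first would let both contribute.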
