Data Science Programs

Uploaded by

senthur kannan thirugnanasambanthan


Question: Create a Pandas program to read a CSV file, fill missing values with the column mean, and group the data by a specified category to calculate the average of a numerical column.

Answer:

import pandas as pd

# Read the CSV file into a DataFrame
file_path = 'data.csv'  # Replace with your CSV file path
data = pd.read_csv(file_path)

# Fill missing values in each numeric column with that column's mean
# (non-numeric columns are left unchanged)
data = data.fillna(data.mean(numeric_only=True))

# Specify the category column and numerical column
category_column = 'Category'  # Replace with the name of your category column
numerical_column = 'Value'  # Replace with the name of your numerical column

# Group the data by the category column and calculate the average of the numerical column
grouped_data = data.groupby(category_column)[numerical_column].mean()

# Display the results
print("Average of numerical column grouped by category:")
print(grouped_data)
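To try these steps without a data.csv on disk, the following is a minimal sketch that reads a small in-memory CSV instead. The sample rows are made up for illustration, and the column names Category and Value are the same placeholders used in the answer.

```python
import io

import pandas as pd

# Hypothetical sample data; the empty Value field is parsed as NaN
csv_text = """Category,Value
A,10
A,
B,30
B,50
"""

data = pd.read_csv(io.StringIO(csv_text))

# Fill missing numeric values with the column mean (computed over non-missing values)
data = data.fillna(data.mean(numeric_only=True))

# Average of Value within each Category
grouped_data = data.groupby("Category")["Value"].mean()
print(grouped_data)
```

Note that the missing value is filled with the mean of the whole column, not the mean of its own category, so the fill can pull a group's average toward the overall average.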

Question: Implement a k-nearest neighbors (KNN) classifier using scikit-learn to predict labels from the Iris dataset, and evaluate the model's accuracy.

Answer:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split the dataset into training and testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize the features for better performance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create the KNN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Train the classifier
knn.fit(X_train, y_train)

# Predict labels for the test set
y_pred = knn.predict(X_test)

# Evaluate the model's accuracy
accuracy = accuracy_score(y_test, y_pred)

# Display the accuracy
print("Accuracy of the KNN classifier:", accuracy)
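A single 80/20 split gives one accuracy figure that depends on which rows land in the test set. As a possible variation (not part of the original answer), the sketch below wraps the scaler and the classifier in a scikit-learn pipeline and reports 5-fold cross-validated accuracy. The choice of 5 folds is an assumption made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Scaling lives inside the pipeline, so it is re-fit on each training fold only
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))

# One accuracy score per fold (cv=5 is an illustrative choice)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Keeping the scaler inside the pipeline mirrors the fit_transform/transform split in the answer: the test data never influences the scaling statistics.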

Question: Write a Python program to load a CSV file into a Pandas DataFrame and display summary statistics (mean, median, and mode) for numerical columns.

Answer:

import pandas as pd

# Load the CSV file into a DataFrame
file_path = 'data.csv'  # Replace with the path to your CSV file
data = pd.read_csv(file_path)

# Display the DataFrame
print("DataFrame:")
print(data)

# Calculate and display summary statistics for numerical columns
numerical_data = data.select_dtypes(include=['number'])

# Mean
mean_values = numerical_data.mean()
print("\nMean of numerical columns:")
print(mean_values)

# Median
median_values = numerical_data.median()
print("\nMedian of numerical columns:")
print(median_values)

# Mode
mode_values = numerical_data.mode()
print("\nMode of numerical columns:")
print(mode_values.iloc[0])  # Display the first mode for simplicity
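Taking only the first row of mode() hides ties. The sketch below, using a made-up DataFrame, shows what mode() returns when a column has more than one mode and one way to collect all of them. The names df and all_modes are illustrative.

```python
import pandas as pd

# Hypothetical data: column "a" has two modes (1 and 2), column "b" has one (5)
df = pd.DataFrame({"a": [1, 1, 2, 2, 3], "b": [5, 5, 6, 7, 8]})

modes = df.mode()
print(modes)          # one row per mode; columns with fewer modes are padded with NaN
print(modes.iloc[0])  # first (smallest) mode of each column

# Collect every mode per column, dropping the NaN padding
all_modes = {col: modes[col].dropna().tolist() for col in modes.columns}
print(all_modes)
```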

Question: Write a Dask program to load a large CSV file, filter the data based on specific criteria, and save the results to a new CSV file.

Answer:

import dask.dataframe as dd

# Load the large CSV file into a Dask DataFrame
file_path = 'large_data.csv'  # Replace with the path to your large CSV file
data = dd.read_csv(file_path)

# Define the filtering criteria (e.g., filter rows where 'column_name' > 50)
filtered_data = data[data['column_name'] > 50]  # Replace 'column_name' and condition as needed

# Save the filtered data to a new CSV file
output_file_path = 'filtered_data.csv'
filtered_data.to_csv(output_file_path, single_file=True, index=False)

print(f"Filtered data has been saved to {output_file_path}")

Question: Write a Python function to calculate the mean, median, and mode of a given list of numerical values.

Answer:

from statistics import mean, median, mode, StatisticsError

def calculate_statistics(numbers):
    """
    Calculate the mean, median, and mode of a list of numerical values.

    Args:
        numbers (list): A list of numerical values.

    Returns:
        dict: A dictionary containing the mean, median, and mode.
    """
    if not numbers:
        return {"mean": None, "median": None, "mode": None}

    try:
        stats = {
            "mean": mean(numbers),
            "median": median(numbers),
            "mode": mode(numbers),
        }
    except StatisticsError:
        # Python < 3.8 raises StatisticsError when there is no unique mode;
        # Python 3.8+ returns the first mode encountered instead
        stats = {
            "mean": mean(numbers),
            "median": median(numbers),
            "mode": "No unique mode",
        }

    return stats

# Example usage
numbers = [10, 20, 20, 30, 40]
result = calculate_statistics(numbers)
print("Mean:", result["mean"])
print("Median:", result["median"])
print("Mode:", result["mode"])
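When several values tie for most common, statistics.mode reports only one of them on Python 3.8 and later. If every mode is wanted, one option is statistics.multimode (available from Python 3.8). The helper name all_modes below is hypothetical and only illustrates the call.

```python
from statistics import multimode

def all_modes(numbers):
    """Return every most-common value in first-seen order (empty list for empty input)."""
    return multimode(numbers)

print(all_modes([10, 20, 20, 30, 40]))  # [20]
print(all_modes([1, 1, 2, 2, 3]))       # [1, 2]
print(all_modes([]))                    # []
```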
