Data Science Programs
mean, and group the data by a specified category to calculate the average of a numerical
column.
Answer:
import pandas as pd
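# Placeholder values; replace with your own file path and column names
file_path = 'data.csv'
category_column = 'category'
numerical_column = 'value'
data = pd.read_csv(file_path)
# Fill missing values in numerical columns with the column mean
data = data.fillna(data.mean(numeric_only=True))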
# Group the data by the category column and calculate the average of the numerical column
grouped_data = data.groupby(category_column)[numerical_column].mean()
print(grouped_data)
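To check the fill-then-group pattern above without a CSV file, the following sketch builds a small DataFrame in memory. The column names `category` and `value` and the sample numbers are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; None becomes NaN in the numeric column
data = pd.DataFrame({
    "category": ["A", "A", "B", "B"],
    "value": [10.0, None, 30.0, 50.0],
})

# The column mean ignores NaN: (10 + 30 + 50) / 3 = 30
data = data.fillna(data.mean(numeric_only=True))

# Average of "value" within each category
grouped_data = data.groupby("category")["value"].mean()
print(grouped_data)
```

With this sample, the missing value in category A is filled with 30, so category A averages 20 and category B averages 40. Note that the fill uses the mean of the whole column, not the mean of each group.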
labels from the Iris dataset, and evaluate the model's accuracy.
Answer:
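from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
iris = load_iris()
X, y = iris.data, iris.target
# Split the dataset into training and testing sets (80% train, 20% test); random_state is an arbitrary seed
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))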
Question: Write a Python program to load a CSV file into a Pandas DataFrame and display
Answer:
import pandas as pd
file_path = 'data.csv' # Replace with the path to your CSV file
data = pd.read_csv(file_path)
print("DataFrame:")
print(data)
numerical_data = data.select_dtypes(include=['number'])
# Mean
mean_values = numerical_data.mean()
print("\nMean of numerical columns:")
print(mean_values)
# Median
median_values = numerical_data.median()
print("\nMedian of numerical columns:")
print(median_values)
# Mode
mode_values = numerical_data.mode()
print("\nMode of numerical columns:")
print(mode_values)
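Unlike `mean()` and `median()`, `DataFrame.mode()` returns a DataFrame, because a column can have several equally frequent values. The sketch below uses made-up columns `a` and `b` to show the shape of the result and one way to keep only the first mode per column.

```python
import pandas as pd

# Hypothetical data: column "a" has two modes (1 and 2), column "b" has one (5)
numerical_data = pd.DataFrame({"a": [1, 1, 2, 2], "b": [5, 5, 6, 7]})

# One row per mode; columns with fewer modes are padded with NaN
mode_values = numerical_data.mode()
print(mode_values)

# Keep only the first (smallest) mode of each column
first_mode = mode_values.iloc[0]
print(first_mode)
```

Taking `iloc[0]` is a simplification; whether to report one mode or all of them depends on the analysis.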
Question: Write a Dask program to load a large CSV file, filter the data based on specific
Answer:
import dask.dataframe as dd
file_path = 'large_data.csv' # Replace with the path to your large CSV file
data = dd.read_csv(file_path)
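# Define the filtering criteria (e.g., filter rows where 'column_name' > 50)
filtered_data = data[data['column_name'] > 50]
# Dask evaluates lazily; head() triggers computation on the first partition and shows up to five rows
print(filtered_data.head())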
Question: Write a Python function to calculate the mean, median, and mode of a given list of
numerical values.
Answer:
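from statistics import StatisticsError, mean, median, mode
def calculate_statistics(numbers):
    """
    Calculate the mean, median, and mode of a list of numbers.
    Args:
        numbers: A non-empty list of numerical values.
    Returns:
        A dictionary with the keys "mean", "median", and "mode".
    """
    if not numbers:
        # Assumed behavior: reject empty input instead of returning partial results
        raise ValueError("numbers must not be empty")
    try:
        stats = {
            "mean": mean(numbers),
            "median": median(numbers),
            "mode": mode(numbers),
        }
    except StatisticsError:
        # Handle cases where mode is not defined (e.g., all values occur equally)
        stats = {
            "mean": mean(numbers),
            "median": median(numbers),
            "mode": None,
        }
    return stats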
# Example usage
numbers = [1, 2, 2, 3, 4] # Sample data; replace with your own values
result = calculate_statistics(numbers)
print("Mean:", result["mean"])
print("Median:", result["median"])
print("Mode:", result["mode"])
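On Python 3.8 and later, `statistics.mode` returns the first of several equally common values instead of raising `StatisticsError`, so the `except` branch above is mainly reached on older versions. If every tied value is needed, `statistics.multimode` is one option. A minimal sketch with made-up sample data:

```python
from statistics import multimode

# Hypothetical sample: 1 and 2 both occur twice
numbers = [1, 1, 2, 2, 3]

# multimode returns all most-common values, in order of first appearance
modes = multimode(numbers)
print("Modes:", modes)

# For an empty list it returns an empty list rather than raising
print("Modes of empty list:", multimode([]))
```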