
Data Science Practical

This document contains a series of Python and R programs covering tasks such as string concatenation, creating student lists, merging dictionaries, working with NumPy arrays, reading files, and developing machine learning models. It also covers data visualization and clustering using Matplotlib, Seaborn, and scikit-learn. Each program is presented with its aim, code, and expected output.


Program -1

AIM-1: Write a program to concatenate two strings in the Python environment.


CODE:
a = "hello"
b = "world"
c = a + b
print(c)

OUTPUT:
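helloworld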

Program -2
AIM-2: Write a program to create a student name list and count the number of students.
CODE:
# Create an empty list to store student names
student_names = []
num_students = int(input("Enter the number of students: "))
for i in range(num_students):
    student_name = input(f"Enter the name of student {i + 1}: ")
    student_names.append(student_name)
print("\nList of Students:")
for student in student_names:
    print(student)
print("\nTotal number of students:", len(student_names))

OUTPUT:

Program -3
AIM-3: Write a Python program to create and merge two dictionaries.
CODE:

# Define two dictionaries


dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
# Merge with dictionary unpacking; keys in dict2 overwrite duplicates from dict1
merged_dict = {**dict1, **dict2}
print(merged_dict)

OUTPUT:
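{'a': 1, 'b': 3, 'c': 4}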

Program -4
AIM-4: Write a Python program to get the number of dimensions of a NumPy array.
CODE:
import numpy as np
# Example NumPy array
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Get the number of dimensions of the array
num_dimensions = arr.ndim
# Print the number of dimensions
print("Number of dimensions:", num_dimensions)

OUTPUT:
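Number of dimensions: 2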

Program -5
AIM-5: Write a Python program to read txt, xlsx, and csv files.
CODE:
#f= open("C:\\Notes\\DataScience\\Practical\\demofile.txt","r")
f=open("demofile.txt","r")
print(f.read())
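
The open()/read() pattern above covers the txt case. For the xlsx and csv cases, a minimal sketch using pandas (assuming files named demofile.xlsx and demofile.csv exist alongside the script, and that the openpyxl package is installed for Excel support):

import pandas as pd

# Read an Excel workbook into a DataFrame (pandas uses openpyxl for .xlsx)
df_xlsx = pd.read_excel("demofile.xlsx")
print(df_xlsx.head())

# Read a CSV file into a DataFrame
df_csv = pd.read_csv("demofile.csv")
print(df_csv.head())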

OUTPUT:

Program -6
AIM-6: Develop models using Naive Bayes, SVM, and clustering in the Python environment.
CODE:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris

# Step 1: Load the Iris dataset


data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names) # Features
y = pd.Series(data.target) # Target labels (0, 1, 2 for the three species)

# Step 2: Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

# Step 3: Initialize the Naive Bayes model (Gaussian Naive Bayes)


nb_model = GaussianNB()

# Step 4: Train the model


nb_model.fit(X_train, y_train)

# Step 5: Predict on the test set
y_pred = nb_model.predict(X_test)

# Step 6: Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
# Display results
print(f"Accuracy: {accuracy:.4f}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

# Support Vector Machine (SVM) model


# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from sklearn.svm import SVC # Support Vector Classifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris

# Step 1: Load the Iris dataset


data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names) # Features
y = pd.Series(data.target) # Target labels (0, 1, 2 for the three species)

# Step 2: Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

# Step 3: Initialize the SVM model (Support Vector Classifier)


svm_model = SVC(kernel='linear')  # You can change the kernel type to 'poly', 'rbf', or 'sigmoid'

# Step 4: Train the model


svm_model.fit(X_train, y_train)

# Step 5: Predict on the test set


y_pred = svm_model.predict(X_test)

# Step 6: Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

class_report = classification_report(y_test, y_pred)
# Display results
print(f"Accuracy: {accuracy:.4f}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

# K-Means clustering
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Step 1: Load the Iris dataset


data = load_iris()

X = pd.DataFrame(data.data, columns=data.feature_names) # Features

# Step 2: Standardize the features (Important for K-Means)


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 3: Apply K-Means Clustering


# We will use 3 clusters since we know there are 3 species in the Iris dataset
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)

# Step 4: Get the cluster labels and centroids


labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Step 5: Visualize the Clusters using PCA (to reduce to 2D for plotting)
pca = PCA(n_components=2) # Reduce the data to 2D for visualization
X_pca = pca.fit_transform(X_scaled)
# Create a DataFrame for the PCA components and cluster labels
df_pca = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df_pca['Cluster'] = labels

# Step 6: Plot the clusters


plt.figure(figsize=(8, 6))
plt.scatter(df_pca['PC1'], df_pca['PC2'], c=df_pca['Cluster'], cmap='viridis',
marker='o')

# Project the centroids from the scaled feature space into the PCA space before plotting
centroids_pca = pca.transform(centroids)
plt.scatter(centroids_pca[:, 0], centroids_pca[:, 1], s=200, c='red', marker='X', label='Centroids')
plt.title('K-Means Clustering of Iris Dataset (PCA-reduced to 2D)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.show()
# Optional: Print the cluster centers and labels
print("Cluster Centers (Centroids):")
print(centroids)
print("\nCluster Labels for Each Sample:")
print(labels)
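
The choice of 3 clusters above relies on knowing the Iris dataset has three species. When the number of clusters is unknown, one common heuristic is the elbow method; a minimal sketch, reusing X_scaled and the imports from the code above:

# Elbow method: fit K-Means for a range of k and record the inertia
# (within-cluster sum of squares); the "elbow" in the curve suggests k
inertias = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(X_scaled)
    inertias.append(km.inertia_)
plt.plot(k_values, inertias, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Choosing k')
plt.show()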

Program -7
AIM-7: Predict car prices from features such as mileage and age using data visualization (histograms and scatter plots), SVM, and clustering.
CODE:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR # Support Vector Regression
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import mean_squared_error
# Step 1: Create a mock car dataset
np.random.seed(42)
# Create random data for car features
data = {
'Mileage': np.random.uniform(5000, 150000, 200), # Mileage in km
'Age': np.random.randint(1, 20, 200), # Age of the car in years
'Engine_Size': np.random.uniform(1.0, 5.0, 200), # Engine size in liters
'Car_Price': np.random.uniform(10000, 50000, 200) # Car Price in USD
}
# Create a DataFrame
df = pd.DataFrame(data)
# Step 2: Data Visualization
# Plot histograms of the features

plt.figure(figsize=(12, 10))
plt.subplot(2, 2, 1)
sns.histplot(df['Mileage'], kde=True)
plt.title('Mileage Distribution')
plt.subplot(2, 2, 2)
sns.histplot(df['Age'], kde=True)
plt.title('Age Distribution')
plt.subplot(2, 2, 3)
sns.histplot(df['Engine_Size'], kde=True)
plt.title('Engine Size Distribution')
plt.subplot(2, 2, 4)
sns.histplot(df['Car_Price'], kde=True)
plt.title('Car Price Distribution')
plt.tight_layout()
plt.show()
# Scatter plots to explore relationships
plt.figure(figsize=(12, 6))
# Scatter plot between mileage and car price
plt.subplot(1, 2, 1)
sns.scatterplot(x=df['Mileage'], y=df['Car_Price'])
plt.title('Mileage vs Car Price')
# Scatter plot between age and car price
plt.subplot(1, 2, 2)
sns.scatterplot(x=df['Age'], y=df['Car_Price'])
plt.title('Age vs Car Price')
plt.tight_layout()

plt.show()
# Step 3: Preprocessing and Feature Scaling
X = df[['Mileage', 'Age', 'Engine_Size']] # Features
y = df['Car_Price'] # Target
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Standardize features using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Step 4: SVM Model (Support Vector Regression)
svm_model = SVR(kernel='rbf')
svm_model.fit(X_train_scaled, y_train)
# Predict car prices using the test data
y_pred_svm = svm_model.predict(X_test_scaled)
# Evaluate the model
mse_svm = mean_squared_error(y_test, y_pred_svm)
print(f"Support Vector Regression Model Mean Squared Error:
{mse_svm:.2f}")
# Step 5: K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42)
df['Cluster'] = kmeans.fit_predict(X[['Mileage', 'Age', 'Engine_Size']])
# Visualize the clusters
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['Mileage'], y=df['Car_Price'], hue=df['Cluster'],
palette='Set2', s=100)

plt.title('Car Price vs Mileage with Clusters')
plt.xlabel('Mileage')
plt.ylabel('Car Price')
plt.legend(title='Cluster')
plt.show()
# Step 6: Scatter plot of Clusters (Age vs Car Price)
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['Age'], y=df['Car_Price'], hue=df['Cluster'], palette='Set2',
s=100)
plt.title('Car Price vs Age with Clusters')
plt.xlabel('Age')
plt.ylabel('Car Price')
plt.legend(title='Cluster')
plt.show()

OUTPUT:

Program -8
AIM-8: Write a Python program to predict income using data visualization, model development, and clustering.
CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import mean_squared_error

# Step 1: Create a mock dataset for predicting income


np.random.seed(42)
# Create random data for features
data = {
'Age': np.random.randint(18, 65, 500), # Age in years
'Education_Level': np.random.randint(1, 5, 500),  # Education level (1: High School, 2: Bachelor, 3: Master, 4: PhD)
'Years_of_Experience': np.random.randint(0, 40, 500),  # Years of work experience
'Hours_Worked_Per_Week': np.random.randint(20, 80, 500),  # Hours worked per week
'Income': np.random.uniform(30000, 120000, 500) # Income in USD

}
# Create a DataFrame
df = pd.DataFrame(data)

# Step 2: Data Visualization


# Plot histograms for each feature
plt.figure(figsize=(12, 10))
plt.subplot(2, 2, 1)
sns.histplot(df['Age'], kde=True)
plt.title('Age Distribution')
plt.subplot(2, 2, 2)
sns.histplot(df['Education_Level'], kde=True)
plt.title('Education Level Distribution')
plt.subplot(2, 2, 3)
sns.histplot(df['Years_of_Experience'], kde=True)
plt.title('Years of Experience Distribution')
plt.subplot(2, 2, 4)
sns.histplot(df['Income'], kde=True)
plt.title('Income Distribution')
plt.tight_layout()
plt.show()
# Scatter plots to explore relationships with income
plt.figure(figsize=(12, 6))
# Scatter plot between Age and Income
plt.subplot(1, 2, 1)
sns.scatterplot(x=df['Age'], y=df['Income'])

plt.title('Age vs Income')
# Scatter plot between Years of Experience and Income
plt.subplot(1, 2, 2)
sns.scatterplot(x=df['Years_of_Experience'], y=df['Income'])
plt.title('Years of Experience vs Income')
plt.tight_layout()
plt.show()

# Step 3: Preprocessing and Feature Scaling


X = df[['Age', 'Education_Level', 'Years_of_Experience',
'Hours_Worked_Per_Week']] # Features
y = df['Income'] # Target (Income)
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Standardize the features using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 4: Linear Regression Model (Predicting Income)


reg_model = LinearRegression()
reg_model.fit(X_train_scaled, y_train)
# Predict income using the test data
y_pred = reg_model.predict(X_test_scaled)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)

print(f"Linear Regression Model Mean Squared Error: {mse:.2f}")

# Step 5: K-Means Clustering


# Use K-Means to find patterns/clusters in the income data
kmeans = KMeans(n_clusters=4, random_state=42)  # We'll use 4 clusters for this example
df['Cluster'] = kmeans.fit_predict(X[['Age', 'Education_Level',
'Years_of_Experience',
'Hours_Worked_Per_Week']])
# Visualize the clusters
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['Age'], y=df['Income'], hue=df['Cluster'], palette='Set2',
s=100)
plt.title('Income vs Age with Clusters')
plt.xlabel('Age')
plt.ylabel('Income')
plt.legend(title='Cluster')
plt.show()
# Scatter plot of Experience vs Income with clusters
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['Years_of_Experience'], y=df['Income'], hue=df['Cluster'],
palette='Set2', s=100)
plt.title('Income vs Years of Experience with Clusters')
plt.xlabel('Years of Experience')
plt.ylabel('Income')
plt.legend(title='Cluster')
plt.show()

OUTPUT:

Program -9
AIM-9: Write a program to create a data frame in R.
CODE:
# Create individual vectors for each column
names <- c("Alice", "Bob", "Charlie", "David", "Eva")
ages <- c(25, 30, 35, 40, 45)
departments <- c("HR", "IT", "Marketing", "Finance", "IT")
salaries <- c(50000, 60000, 70000, 80000, 90000)

# Combine these vectors into a data frame


employee_data <- data.frame(Name = names, Age = ages, Department =
departments, Salary = salaries)
# Display the data frame
print(employee_data)

OUTPUT:
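     Name Age Department Salary
1   Alice  25         HR  50000
2     Bob  30         IT  60000
3 Charlie  35  Marketing  70000
4   David  40    Finance  80000
5     Eva  45         IT  90000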

Program -10
AIM-10: Write a program to create a basic scatter plot in R.
CODE:
# Sample data for x and y axes
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
# Create the scatter plot
plot(x, y,
     main = "Basic Scatter Plot",  # Title of the plot
     xlab = "X Axis",              # Label for the x-axis
     ylab = "Y Axis",              # Label for the y-axis
     pch = 19,                     # Point character (19 is a filled circle)
     col = "blue")                 # Color of the points

OUTPUT:
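A scatter plot titled "Basic Scatter Plot" showing five blue filled-circle points along the line y = 2x, with axes labeled "X Axis" and "Y Axis".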
