Data Science Practical
OUTPUT:
Program -2
AIM-2: Write a program to create a list of student names and count the number of students.
CODE:
# Create an empty list to store student names
student_names = []
# Read how many students will be entered
num_students = int(input("Enter the number of students: "))
# Collect each name and append it to the list
for i in range(num_students):
    student_name = input(f"Enter the name of student {i + 1}: ")
    student_names.append(student_name)
# Display the list and the total count
print("\nList of Students:")
for student in student_names:
    print(student)
print("\nTotal number of students:", len(student_names))
OUTPUT:
Program -3
AIM-3: Write a Python program to create and merge two dictionaries.
CODE:
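The listing for this program is missing from this copy. The following is only a minimal sketch of one way to meet the aim, not the original code. The dictionary names and contents (`marks_a`, `marks_b`) are invented for illustration. It shows two standard ways to merge: `update()` on a copy, and `**` unpacking.

```python
# Hypothetical sample data, invented for illustration
marks_a = {"Alice": 85, "Bob": 72}
marks_b = {"Bob": 90, "Charlie": 66}

# Method 1: copy the first dictionary, then update the copy with the second
merged_update = marks_a.copy()
merged_update.update(marks_b)

# Method 2: dictionary unpacking builds a new dictionary in one expression
merged_unpack = {**marks_a, **marks_b}

print("Merged with update():", merged_update)
print("Merged with unpacking:", merged_unpack)
```

In both methods, when a key appears in both dictionaries the value from the second dictionary is kept, and the original dictionaries are left unchanged.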
OUTPUT:
Program -4
AIM-4: Write a Python program to get the number of dimensions of a NumPy array.
CODE:
import numpy as np
# Example NumPy array
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Get the number of dimensions of the array
num_dimensions = arr.ndim
# Print the number of dimensions
print("Number of dimensions:", num_dimensions)
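As a small extension of the listing above, the sketch below compares `ndim` across arrays of different dimensionality. The array names and values are arbitrary and chosen only for illustration.

```python
import numpy as np

# Arrays of increasing dimensionality (values are arbitrary)
scalar = np.array(7)                         # 0-D array
vector = np.array([1, 2, 3])                 # 1-D array
matrix = np.array([[1, 2, 3], [4, 5, 6]])    # 2-D array
cube = np.zeros((2, 3, 4))                   # 3-D array

for name, a in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # ndim always equals the length of the shape tuple
    print(name, "ndim =", a.ndim, "shape =", a.shape)
```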
OUTPUT:
Program -5
AIM-5: Write a Python program to read txt, xlsx, and csv files.
CODE:
# f = open("C:\\Notes\\DataScience\\Practical\\demofile.txt", "r")
# Open the text file with a context manager so it is closed automatically
with open("demofile.txt", "r") as f:
    print(f.read())
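The listing above reads only the text file, while the aim also mentions csv and xlsx files. A minimal sketch of how all three could be handled is given below, assuming pandas is installed. `pd.read_csv` and `pd.read_excel` are standard pandas functions, and `read_excel` additionally needs an Excel engine such as openpyxl for .xlsx files. The helper name `read_any` and the file names `demofile.csv` and `demofile.xlsx` are assumptions made for illustration.

```python
import os
import pandas as pd

def read_any(path):
    """Read a .txt, .csv, or .xlsx file (hypothetical helper for illustration)."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".txt":
        with open(path, "r") as f:
            return f.read()          # plain text returned as one string
    if ext == ".csv":
        return pd.read_csv(path)     # returns a DataFrame
    if ext == ".xlsx":
        return pd.read_excel(path)   # returns a DataFrame; needs an Excel engine such as openpyxl
    raise ValueError(f"Unsupported file type: {ext}")

# Example calls (file names are assumptions; the files must exist):
# print(read_any("demofile.txt"))
# print(read_any("demofile.csv"))
# print(read_any("demofile.xlsx"))
```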
OUTPUT:
Program -6
AIM-6: Develop models using Naive Bayes, SVM, and clustering in a Python environment.
CODE:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris
# Step 5: Predict on the test set
y_pred = nb_model.predict(X_test)
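The steps before the prediction (loading the data, splitting it, and training the model) are missing from this copy. The sketch below shows one plausible end-to-end version on the Iris dataset using the same names (`nb_model`, `X_test`, `y_pred`). The 30% test split and the random seed of 42 are assumptions, not values taken from the original.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix

# Load the Iris dataset (features X, class labels y)
data = load_iris()
X, y = data.data, data.target

# Hold out part of the data for testing (30% and seed 42 are assumptions)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Fit a Gaussian Naive Bayes classifier on the training split
nb_model = GaussianNB()
nb_model.fit(X_train, y_train)

# Predict on the test set and evaluate the predictions
y_pred = nb_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
print("Confusion Matrix:")
print(conf_matrix)
```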
from sklearn.svm import SVC # Support Vector Classifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris
class_report = classification_report(y_test, y_pred)
# Display results
print(f"Accuracy: {accuracy:.4f}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)
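Only the imports and the reporting lines of the SVM part survive in this copy. A minimal sketch of the full flow is given below, assuming the Iris dataset, an RBF kernel, feature scaling, and a 30% test split with seed 42. None of these choices are confirmed by the original listing.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC  # Support Vector Classifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the data and split it (split size and seed are assumptions)
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42)

# Scale features; the scaler is fitted on the training split only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the classifier (the RBF kernel is an assumption)
svm_model = SVC(kernel="rbf")
svm_model.fit(X_train_scaled, y_train)

# Predict and compute the metrics that the reporting lines print
y_pred = svm_model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
print(f"Accuracy: {accuracy:.4f}")
```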
# Clustering
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
X = pd.DataFrame(data.data, columns=data.feature_names) # Features
# Step 5: Visualize the Clusters using PCA (to reduce to 2D for plotting)
pca = PCA(n_components=2) # Reduce the data to 2D for visualization
X_pca = pca.fit_transform(X_scaled)
# Create a DataFrame for the PCA components and cluster labels
df_pca = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df_pca['Cluster'] = labels
plt.scatter(centroids[:, 0], centroids[:, 1], s=200, c='red', marker='X',
label='Centroids')
plt.title('K-Means Clustering of Iris Dataset (PCA-reduced to 2D)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.show()
# Optional: Print the cluster centers and labels
print("Cluster Centers (Centroids):")
print(centroids)
print("\nCluster Labels for Each Sample:")
print(labels)
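Several steps of the clustering listing (loading the data, scaling, and fitting K-Means) are missing from this copy. The sketch below fills them in under stated assumptions: three clusters, `n_init=10`, seed 42, and centroids projected into PCA space so they can be plotted alongside `X_pca`. It reuses the names from the surviving lines (`X`, `X_scaled`, `labels`, `centroids`, `df_pca`) and omits the plotting.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load the Iris features into a DataFrame
data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names)

# Standardize the features so each contributes equally to distances
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means (3 clusters, n_init and seed are assumptions)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Reduce to 2D and project the cluster centers into the same space
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
centroids = pca.transform(kmeans.cluster_centers_)

# Combine the PCA components with the cluster labels
df_pca = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df_pca['Cluster'] = labels
print(df_pca.head())
print(centroids)
```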
Program -7
AIM-7: Analyze car prices using features such as mileage and age, with histograms, scatter plots, data visualization, SVM, and clustering.
CODE:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR # Support Vector Regression
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import mean_squared_error
# Step 1: Create a mock car dataset
np.random.seed(42)
# Create random data for car features
data = {
    'Mileage': np.random.uniform(5000, 150000, 200),    # Mileage in km
    'Age': np.random.randint(1, 20, 200),               # Age of the car in years
    'Engine_Size': np.random.uniform(1.0, 5.0, 200),    # Engine size in liters
    'Car_Price': np.random.uniform(10000, 50000, 200)   # Car price in USD
}
# Create a DataFrame
df = pd.DataFrame(data)
# Step 2: Data Visualization
# Plot histograms of the features
plt.figure(figsize=(12, 10))
plt.subplot(2, 2, 1)
sns.histplot(df['Mileage'], kde=True)
plt.title('Mileage Distribution')
plt.subplot(2, 2, 2)
sns.histplot(df['Age'], kde=True)
plt.title('Age Distribution')
plt.subplot(2, 2, 3)
sns.histplot(df['Engine_Size'], kde=True)
plt.title('Engine Size Distribution')
plt.subplot(2, 2, 4)
sns.histplot(df['Car_Price'], kde=True)
plt.title('Car Price Distribution')
plt.tight_layout()
plt.show()
# Scatter plots to explore relationships
plt.figure(figsize=(12, 6))
# Scatter plot between mileage and car price
plt.subplot(1, 2, 1)
sns.scatterplot(x=df['Mileage'], y=df['Car_Price'])
plt.title('Mileage vs Car Price')
# Scatter plot between age and car price
plt.subplot(1, 2, 2)
sns.scatterplot(x=df['Age'], y=df['Car_Price'])
plt.title('Age vs Car Price')
plt.tight_layout()
plt.show()
# Step 3: Preprocessing and Feature Scaling
X = df[['Mileage', 'Age', 'Engine_Size']] # Features
y = df['Car_Price'] # Target
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Standardize features using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Step 4: SVM Model (Support Vector Regression)
svm_model = SVR(kernel='rbf')
svm_model.fit(X_train_scaled, y_train)
# Predict car prices using the test data
y_pred_svm = svm_model.predict(X_test_scaled)
# Evaluate the model
mse_svm = mean_squared_error(y_test, y_pred_svm)
print(f"Support Vector Regression Model Mean Squared Error: {mse_svm:.2f}")
# Step 5: K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42)
# Cluster on the three features (unscaled here, so Mileage, which has the largest range, dominates the distances)
df['Cluster'] = kmeans.fit_predict(X)
# Visualize the clusters
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['Mileage'], y=df['Car_Price'], hue=df['Cluster'],
palette='Set2', s=100)
plt.title('Car Price vs Mileage with Clusters')
plt.xlabel('Mileage')
plt.ylabel('Car Price')
plt.legend(title='Cluster')
plt.show()
# Step 6: Scatter plot of Clusters (Age vs Car Price)
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['Age'], y=df['Car_Price'], hue=df['Cluster'], palette='Set2',
s=100)
plt.title('Car Price vs Age with Clusters')
plt.xlabel('Age')
plt.ylabel('Car Price')
plt.legend(title='Cluster')
plt.show()
OUTPUT:
Program -8
AIM-8: Write a Python program to predict income using data visualization, model development, and clustering.
CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import mean_squared_error
}
# Create a DataFrame
df = pd.DataFrame(data)
plt.title('Age vs Income')
# Scatter plot between Years of Experience and Income
plt.subplot(1, 2, 2)
sns.scatterplot(x=df['Years_of_Experience'], y=df['Income'])
plt.title('Years of Experience vs Income')
plt.tight_layout()
plt.show()
print(f"Linear Regression Model Mean Squared Error: {mse:.2f}")
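Most of this program's listing is missing from this copy. The sketch below shows one way the surviving fragments could fit together, not the original code. The column names `Age`, `Years_of_Experience`, and `Income` come from the surviving lines. The mock data, the income formula, the 30% test split, and the choice of three clusters are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import mean_squared_error

# Mock dataset: all values and the income formula are invented for illustration
rng = np.random.default_rng(42)
n = 200
age = rng.integers(22, 60, n)
experience = np.clip(age - 22 - rng.integers(0, 5, n), 0, None)
income = 20000 + 1500 * experience + 200 * age + rng.normal(0, 3000, n)
df = pd.DataFrame({'Age': age, 'Years_of_Experience': experience, 'Income': income})

# Features and target, then a train/test split (30% and seed 42 are assumptions)
X = df[['Age', 'Years_of_Experience']]
y = df['Income']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Standardize features; the scaler is fitted on the training split only
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit linear regression and evaluate on the test split
model = LinearRegression()
model.fit(X_train_scaled, y_train)
mse = mean_squared_error(y_test, model.predict(X_test_scaled))
print(f"Linear Regression Model Mean Squared Error: {mse:.2f}")

# Group people into 3 clusters using the scaled features (3 is an assumption)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df['Cluster'] = kmeans.fit_predict(scaler.transform(X))
print(df['Cluster'].value_counts())
```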
OUTPUT:
Program -9
AIM-9: Write a program to create a data frame in R.
CODE:
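# Create individual vectors for each column
names <- c("Alice", "Bob", "Charlie", "David", "Eva")
ages <- c(25, 30, 35, 40, 45)
departments <- c("HR", "IT", "Marketing", "Finance", "IT")
salaries <- c(50000, 60000, 70000, 80000, 90000)
# Combine the vectors into a data frame and print it
# (the data frame name and column names below are illustrative)
employee_df <- data.frame(Name = names, Age = ages, Department = departments, Salary = salaries)
print(employee_df)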
OUTPUT:
Program -10
AIM-10: Write a program to create basic scatter plot in R.
CODE:
# Sample data for x and y axes
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
# Create the scatter plot
plot(x, y,
     main = "Basic Scatter Plot",  # Title of the plot
     xlab = "X Axis",              # Label for the x-axis
     ylab = "Y Axis",              # Label for the y-axis
     pch = 19,                     # Point character (19 is a filled circle)
     col = "blue")                 # Color of the points
OUTPUT: