
Data Science Practical

This document contains a series of Python and R programs covering tasks such as string concatenation, creating student lists, merging dictionaries, working with NumPy arrays, reading files, and developing machine learning models. It also covers data visualization and clustering using Matplotlib, Seaborn, and scikit-learn. Each program is presented with its aim, code, and expected output.


Program -1

AIM-1: Write a program to concatenate two strings in the Python environment.


CODE:
a = "hello"
b = "world"
c = a + b
print(c)

OUTPUT:
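helloworld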

Program -2
AIM-2: Write a program to create a student name list and count the number of students.
CODE:
# Create an empty list to store student names
student_names = []
num_students = int(input("Enter the number of students: "))
for i in range(num_students):
    student_name = input(f"Enter the name of student {i + 1}: ")
    student_names.append(student_name)
print("\nList of Students:")
for student in student_names:
    print(student)
print("\nTotal number of students:", len(student_names))

OUTPUT:

Program -3
AIM-3: Write a Python program to create and merge two dictionaries.
CODE:

# Define two dictionaries


dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}
# Merge with dictionary unpacking; keys in dict2 overwrite duplicates from dict1
merged_dict = {**dict1, **dict2}
print(merged_dict)

OUTPUT:
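{'a': 1, 'b': 3, 'c': 4}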

Program -4
AIM-4: Write a Python program to get the number of dimensions of a NumPy array.
CODE:
import numpy as np
# Example NumPy array
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Get the number of dimensions of the array
num_dimensions = arr.ndim
# Print the number of dimensions
print("Number of dimensions:", num_dimensions)

OUTPUT:
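Number of dimensions: 2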

Program -5
AIM-5: Write a Python program to read txt, xlsx, and csv files.
CODE:
#f= open("C:\\Notes\\DataScience\\Practical\\demofile.txt","r")
f=open("demofile.txt","r")
print(f.read())
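
The open()/read() pattern above covers the txt case. For the xlsx and csv cases, a minimal sketch using pandas (assuming files named demofile.xlsx and demofile.csv exist alongside the script, and that the openpyxl package is installed for Excel support):

import pandas as pd

# Read an Excel workbook into a DataFrame (pandas uses openpyxl for .xlsx)
df_xlsx = pd.read_excel("demofile.xlsx")
print(df_xlsx.head())

# Read a CSV file into a DataFrame
df_csv = pd.read_csv("demofile.csv")
print(df_csv.head())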

OUTPUT:

Program -6
AIM-6: Develop models using Naive Bayes, SVM, and clustering in the Python environment.
CODE:
# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris

# Step 1: Load the Iris dataset


data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names) # Features
y = pd.Series(data.target) # Target labels (0, 1, 2 for the three species)

# Step 2: Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

# Step 3: Initialize the Naive Bayes model (Gaussian Naive Bayes)


nb_model = GaussianNB()

# Step 4: Train the model


nb_model.fit(X_train, y_train)

# Step 5: Predict on the test set
y_pred = nb_model.predict(X_test)

# Step 6: Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)
# Display results
print(f"Accuracy: {accuracy:.4f}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

# Support Vector Machine (SVM) model


# Import necessary libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from sklearn.svm import SVC # Support Vector Classifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.datasets import load_iris

# Step 1: Load the Iris dataset


data = load_iris()
X = pd.DataFrame(data.data, columns=data.feature_names) # Features
y = pd.Series(data.target) # Target labels (0, 1, 2 for the three species)

# Step 2: Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

# Step 3: Initialize the SVM model (Support Vector Classifier)


svm_model = SVC(kernel='linear')  # You can change the kernel type to 'poly', 'rbf', or 'sigmoid'

# Step 4: Train the model


svm_model.fit(X_train, y_train)

# Step 5: Predict on the test set


y_pred = svm_model.predict(X_test)

# Step 6: Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

class_report = classification_report(y_test, y_pred)
# Display results
print(f"Accuracy: {accuracy:.4f}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

# K-Means clustering
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Step 1: Load the Iris dataset


data = load_iris()

X = pd.DataFrame(data.data, columns=data.feature_names) # Features

# Step 2: Standardize the features (Important for K-Means)


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 3: Apply K-Means Clustering


# We will use 3 clusters since we know there are 3 species in the Iris dataset
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)

# Step 4: Get the cluster labels and centroids


labels = kmeans.labels_
centroids = kmeans.cluster_centers_

# Step 5: Visualize the Clusters using PCA (to reduce to 2D for plotting)
pca = PCA(n_components=2) # Reduce the data to 2D for visualization
X_pca = pca.fit_transform(X_scaled)
# Create a DataFrame for the PCA components and cluster labels
df_pca = pd.DataFrame(X_pca, columns=['PC1', 'PC2'])
df_pca['Cluster'] = labels

# Step 6: Plot the clusters


plt.figure(figsize=(8, 6))
plt.scatter(df_pca['PC1'], df_pca['PC2'], c=df_pca['Cluster'], cmap='viridis',
marker='o')

# Project the centroids from the scaled feature space into the PCA space before plotting
centroids_pca = pca.transform(centroids)
plt.scatter(centroids_pca[:, 0], centroids_pca[:, 1], s=200, c='red', marker='X', label='Centroids')
plt.title('K-Means Clustering of Iris Dataset (PCA-reduced to 2D)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.show()
# Optional: Print the cluster centers and labels
print("Cluster Centers (Centroids):")
print(centroids)
print("\nCluster Labels for Each Sample:")
print(labels)
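
The choice of 3 clusters above relies on knowing the Iris dataset has three species. When the number of clusters is unknown, one common heuristic is the elbow method; a minimal sketch, reusing X_scaled and the imports from the code above:

# Elbow method: fit K-Means for a range of k and record the inertia
# (within-cluster sum of squares); the "elbow" in the curve suggests k
inertias = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(X_scaled)
    inertias.append(km.inertia_)
plt.plot(k_values, inertias, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Choosing k')
plt.show()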

Program -7
AIM-7: Predict car prices from features such as mileage and age using data visualization (histograms and scatter plots), SVM, and clustering.
CODE:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR # Support Vector Regression
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import mean_squared_error
# Step 1: Create a mock car dataset
np.random.seed(42)
# Create random data for car features
data = {
'Mileage': np.random.uniform(5000, 150000, 200), # Mileage in km
'Age': np.random.randint(1, 20, 200), # Age of the car in years
'Engine_Size': np.random.uniform(1.0, 5.0, 200), # Engine size in liters
'Car_Price': np.random.uniform(10000, 50000, 200) # Car Price in USD
}
# Create a DataFrame
df = pd.DataFrame(data)
# Step 2: Data Visualization
# Plot histograms of the features

plt.figure(figsize=(12, 10))
plt.subplot(2, 2, 1)
sns.histplot(df['Mileage'], kde=True)
plt.title('Mileage Distribution')
plt.subplot(2, 2, 2)
sns.histplot(df['Age'], kde=True)
plt.title('Age Distribution')
plt.subplot(2, 2, 3)
sns.histplot(df['Engine_Size'], kde=True)
plt.title('Engine Size Distribution')
plt.subplot(2, 2, 4)
sns.histplot(df['Car_Price'], kde=True)
plt.title('Car Price Distribution')
plt.tight_layout()
plt.show()
# Scatter plots to explore relationships
plt.figure(figsize=(12, 6))
# Scatter plot between mileage and car price
plt.subplot(1, 2, 1)
sns.scatterplot(x=df['Mileage'], y=df['Car_Price'])
plt.title('Mileage vs Car Price')
# Scatter plot between age and car price
plt.subplot(1, 2, 2)
sns.scatterplot(x=df['Age'], y=df['Car_Price'])
plt.title('Age vs Car Price')
plt.tight_layout()

plt.show()
# Step 3: Preprocessing and Feature Scaling
X = df[['Mileage', 'Age', 'Engine_Size']] # Features
y = df['Car_Price'] # Target
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Standardize features using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Step 4: SVM Model (Support Vector Regression)
svm_model = SVR(kernel='rbf')
svm_model.fit(X_train_scaled, y_train)
# Predict car prices using the test data
y_pred_svm = svm_model.predict(X_test_scaled)
# Evaluate the model
mse_svm = mean_squared_error(y_test, y_pred_svm)
print(f"Support Vector Regression Model Mean Squared Error:
{mse_svm:.2f}")
# Step 5: K-Means Clustering
kmeans = KMeans(n_clusters=3, random_state=42)
df['Cluster'] = kmeans.fit_predict(X[['Mileage', 'Age', 'Engine_Size']])
# Visualize the clusters
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['Mileage'], y=df['Car_Price'], hue=df['Cluster'],
palette='Set2', s=100)

plt.title('Car Price vs Mileage with Clusters')
plt.xlabel('Mileage')
plt.ylabel('Car Price')
plt.legend(title='Cluster')
plt.show()
# Step 6: Scatter plot of Clusters (Age vs Car Price)
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['Age'], y=df['Car_Price'], hue=df['Cluster'], palette='Set2',
s=100)
plt.title('Car Price vs Age with Clusters')
plt.xlabel('Age')
plt.ylabel('Car Price')
plt.legend(title='Cluster')
plt.show()

OUTPUT:

Program -8
AIM-8: Write a Python program to predict income using data visualization, model development, and clustering.
CODE:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import mean_squared_error

# Step 1: Create a mock dataset for predicting income


np.random.seed(42)
# Create random data for features
data = {
'Age': np.random.randint(18, 65, 500), # Age in years
'Education_Level': np.random.randint(1, 5, 500),  # Education level (1: High School, 2: Bachelor, 3: Master, 4: PhD)
'Years_of_Experience': np.random.randint(0, 40, 500),  # Years of work experience
'Hours_Worked_Per_Week': np.random.randint(20, 80, 500),  # Hours worked per week
'Income': np.random.uniform(30000, 120000, 500) # Income in USD

}
# Create a DataFrame
df = pd.DataFrame(data)

# Step 2: Data Visualization


# Plot histograms for each feature
plt.figure(figsize=(12, 10))
plt.subplot(2, 2, 1)
sns.histplot(df['Age'], kde=True)
plt.title('Age Distribution')
plt.subplot(2, 2, 2)
sns.histplot(df['Education_Level'], kde=True)
plt.title('Education Level Distribution')
plt.subplot(2, 2, 3)
sns.histplot(df['Years_of_Experience'], kde=True)
plt.title('Years of Experience Distribution')
plt.subplot(2, 2, 4)
sns.histplot(df['Income'], kde=True)
plt.title('Income Distribution')
plt.tight_layout()
plt.show()
# Scatter plots to explore relationships with income
plt.figure(figsize=(12, 6))
# Scatter plot between Age and Income
plt.subplot(1, 2, 1)
sns.scatterplot(x=df['Age'], y=df['Income'])

plt.title('Age vs Income')
# Scatter plot between Years of Experience and Income
plt.subplot(1, 2, 2)
sns.scatterplot(x=df['Years_of_Experience'], y=df['Income'])
plt.title('Years of Experience vs Income')
plt.tight_layout()
plt.show()

# Step 3: Preprocessing and Feature Scaling


X = df[['Age', 'Education_Level', 'Years_of_Experience',
'Hours_Worked_Per_Week']] # Features
y = df['Income'] # Target (Income)
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Standardize the features using StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Step 4: Linear Regression Model (Predicting Income)


reg_model = LinearRegression()
reg_model.fit(X_train_scaled, y_train)
# Predict income using the test data
y_pred = reg_model.predict(X_test_scaled)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)

print(f"Linear Regression Model Mean Squared Error: {mse:.2f}")

# Step 5: K-Means Clustering


# Use K-Means to find patterns/clusters in the income data
kmeans = KMeans(n_clusters=4, random_state=42)  # We'll use 4 clusters for this example
df['Cluster'] = kmeans.fit_predict(X[['Age', 'Education_Level',
'Years_of_Experience',
'Hours_Worked_Per_Week']])
# Visualize the clusters
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['Age'], y=df['Income'], hue=df['Cluster'], palette='Set2',
s=100)
plt.title('Income vs Age with Clusters')
plt.xlabel('Age')
plt.ylabel('Income')
plt.legend(title='Cluster')
plt.show()
# Scatter plot of Experience vs Income with clusters
plt.figure(figsize=(8, 6))
sns.scatterplot(x=df['Years_of_Experience'], y=df['Income'], hue=df['Cluster'],
palette='Set2', s=100)
plt.title('Income vs Years of Experience with Clusters')
plt.xlabel('Years of Experience')
plt.ylabel('Income')
plt.legend(title='Cluster')
plt.show()

OUTPUT:

Program -9
AIM-9: Write a program to create a data frame in R.
CODE:
# Create individual vectors for each column
names <- c("Alice", "Bob", "Charlie", "David", "Eva")
ages <- c(25, 30, 35, 40, 45)
departments <- c("HR", "IT", "Marketing", "Finance", "IT")
salaries <- c(50000, 60000, 70000, 80000, 90000)

# Combine these vectors into a data frame


employee_data <- data.frame(Name = names, Age = ages, Department =
departments, Salary = salaries)
# Display the data frame
print(employee_data)

OUTPUT:
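     Name Age Department Salary
1   Alice  25         HR  50000
2     Bob  30         IT  60000
3 Charlie  35  Marketing  70000
4   David  40    Finance  80000
5     Eva  45         IT  90000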

Program -10
AIM-10: Write a program to create a basic scatter plot in R.
CODE:
# Sample data for x and y axes
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)
# Create the scatter plot
plot(x, y,
     main = "Basic Scatter Plot",  # Title of the plot
     xlab = "X Axis",              # Label for the x-axis
     ylab = "Y Axis",              # Label for the y-axis
     pch = 19,                     # Point character (19 is a filled circle)
     col = "blue")                 # Color of the points

OUTPUT:
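A scatter plot titled "Basic Scatter Plot" showing five blue filled-circle points along the line y = 2x, with axes labeled "X Axis" and "Y Axis".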
