Python Programming & Data Science Lab Manual
LAB MANUAL
Program Outcomes
0 0 3 1.5
Course Objectives:
Understand the implementation procedures for the machine learning algorithms. (L2)
Identify and apply Machine Learning algorithms to solve real-world problems (L1)
1. Write a program to demonstrate a) Different numeric data types and b) To perform different
Arithmetic Operations on numbers in Python.
5. Write a program to demonstrate a) arrays b) array indexing such as slicing, integer array
indexing and Boolean array indexing along with their basic operations in NumPy.
6. Write a program to compute summary statistics such as mean, median, mode, standard
deviation and variance of the given different types of data.
7. Write a script named copyfile.py. This script should prompt the user for the names of two text
files. The contents of the first file should be read and written to the second file.
8. Write a program to demonstrate Regression analysis with residual plots on a given data set.
9. Write a program to demonstrate the working of the decision tree-based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
10. Write a program to implement the Naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
11. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions using Java/Python ML library classes.
12. Write a program to implement k-Means clustering algorithm to cluster the set of data stored in
.CSV file. Compare the results of various “k” values for the quality of clustering.
13. Write a program to build Artificial Neural Network and test the same using appropriate data
sets.
EXPERIMENT-1
Write a program to demonstrate a) Different numeric data types and b) To perform different
Arithmetic Operations on numbers in Python
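No listing accompanies this experiment, so the following is a minimal sketch of what it asks for. It shows Python's three built-in numeric types and the common arithmetic operators. The variable names and sample values are illustrative assumptions, not part of the original manual.

```python
# Numeric data types: int, float and complex
a = 7            # int
b = 2.5          # float
c = 3 + 4j       # complex

for value in (a, b, c):
    print(value, "is of type", type(value).__name__)

# Arithmetic operations on two integers (sample values chosen for illustration)
x, y = 17, 5
results = {
    "addition": x + y,
    "subtraction": x - y,
    "multiplication": x * y,
    "true division": x / y,       # always returns a float
    "floor division": x // y,     # discards the fractional part
    "modulus": x % y,             # remainder of the division
    "exponentiation": x ** y,
}
for name, value in results.items():
    print(f"{name}: {value}")
```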
EXPERIMENT-2
# Creating a List
List = []
print("Blank List: ")
print(List)
# Creating a List of numbers
List = [10, 20, 14]
print("\nList of numbers: ")
print(List)
# Creating a List of strings and accessing
# using index
List = ["Python", "Programming"]
print("\nList Items: ")
print(List[0])
print(List[1])
List = []
print("Initial blank List: ")
print(List)
# Addition of Elements
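The listing stops at the comment above. As a hedged sketch of the usual ways to add elements to a list, the example below uses the built-in `append`, `insert` and `extend` methods. The name `numbers` and its values are assumptions chosen for illustration.

```python
numbers = []
numbers.append(1)          # add a single element at the end
numbers.append(2)
numbers.insert(0, 10)      # insert 10 at index 0, shifting the rest right
numbers.extend([3, 4])     # append every element of another iterable
print(numbers)             # [10, 1, 2, 3, 4]
```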
EXPERIMENT-3
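# empty tuple
my_tuple = ()
print(my_tuple)
# tuple of integers
my_tuple = (1, 2, 3)
print(my_tuple)
# nested tuple
my_tuple = ("mouse", [8, 4, 6], (1, 2, 3))
print(my_tuple)
# accessing elements by index
my_tuple = ('p','e','r','m','i','t')
print(my_tuple[0]) # 'p'
print(my_tuple[5]) # 't'
# nested tuple (values assumed; chosen so that the nested index below yields 's')
n_tuple = ("mouse", [8, 4, 6], (1, 2, 3))
# nested index
print(n_tuple[0][3]) # 's'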
EXPERIMENT-4
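# empty dictionary
my_dict = {}
# using dict()
my_dict = dict({'name': 'Jack', 'age': 26})
# Output: Jack
print(my_dict['name'])
# Output: 26
print(my_dict.get('age'))
# Output: None
print(my_dict.get('address'))
# Indexing a missing key raises KeyError
try:
    print(my_dict['address'])
except KeyError as error:
    print("KeyError:", error)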
EXPERIMENT-5
Write a program to demonstrate a) arrays b) array indexing such as slicing, integer array indexing
and Boolean array indexing along with their basic operations in NumPy.
import numpy as np
list1 = [1, 2, 3, 4, 5, 6]
list2 = [10, 9, 8, 7, 6, 5]
a1 = np.array(list1)
a2 = np.array(list2)
print(a1*a2)
import numpy as np
a = np.arange(10, 1, -2)
import numpy as np
a = np.arange(20)
# a[start:stop:step]
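The listing ends at the slicing pattern above. The sketch below illustrates the three indexing styles named in the experiment statement (slicing, integer array indexing and Boolean array indexing) on the same `np.arange(20)` array. The names `idx`, `mask` and `m` are assumptions introduced for illustration.

```python
import numpy as np

a = np.arange(20)

# Slicing: a[start:stop:step]
print(a[2:10:2])          # elements at positions 2, 4, 6, 8
print(a[::-1][:3])        # the last three elements, in reverse order

# Integer array indexing: pick arbitrary positions
idx = np.array([0, 5, 19])
print(a[idx])

# Boolean array indexing: keep elements that satisfy a condition
mask = a % 5 == 0
print(a[mask])

# The same ideas on a 2-D array
m = a.reshape(4, 5)
print(m[1:3, ::2])        # rows 1 and 2, every second column
print(m[[0, 3], [1, 4]])  # the elements at (0, 1) and (3, 4)
```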
EXPERIMENT-6
Write a program to compute summary statistics such as mean, median, mode, standard deviation
and variance of the given different types of data.
import pandas as pd
import numpy as np
import statistics as st
# Load the data
df = pd.read_csv("data_desc.csv")
print(df.shape)
print(df.info())
print(df.loc[:,'Age'].mean())
print(df.loc[:,'Income'].mean())
# Row-wise mean of the numeric columns, first five rows
print(df.mean(axis = 1, numeric_only = True)[0:5])
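The listing above covers only the mean. The remaining statistics from the experiment statement can be sketched with the standard `statistics` module, as below. The `ages` values are hypothetical stand-ins for a column such as `Age`, since the contents of data_desc.csv are not given in the manual.

```python
import statistics as st

# Hypothetical sample values; in the lab these would come from a CSV column
ages = [23, 25, 25, 31, 36]

summary = {
    "mean": st.mean(ages),
    "median": st.median(ages),
    "mode": st.mode(ages),
    "standard deviation": st.stdev(ages),   # sample standard deviation
    "variance": st.variance(ages),          # sample variance
}
for name, value in summary.items():
    print(f"{name}: {value}")
```

On a pandas column the corresponding methods are `median()`, `mode()`, `std()` and `var()`.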
EXPERIMENT-7
Write a script named copyfile.py. This script should prompt the user for the names of two text files. The
contents of the first file should be read and written to the second file.
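The manual gives no listing for this script, so here is one possible sketch. The function names `copy_file` and `main`, the prompt wording and the UTF-8 encoding are all assumptions. The copying logic is kept separate from the prompting so that it can be reused or tested without keyboard input.

```python
def copy_file(source_name, target_name):
    """Copy the text of source_name into target_name; return the character count."""
    with open(source_name, "r", encoding="utf-8") as source:
        contents = source.read()
    with open(target_name, "w", encoding="utf-8") as target:
        target.write(contents)
    return len(contents)


def main():
    # Prompt the user for the two file names
    source_name = input("Enter the name of the input file: ")
    target_name = input("Enter the name of the output file: ")
    try:
        count = copy_file(source_name, target_name)
    except OSError as error:
        # Covers a missing input file, permission problems and similar errors
        print("Could not copy the file:", error)
    else:
        print(f"Copied {count} characters from {source_name} to {target_name}.")
```

To run it interactively as copyfile.py, end the file with a call to `main()`.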
EXPERIMENT-8
Write a program to demonstrate Regression analysis with residual plots on a given data set.
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color = "m", marker = "o", s = 30)
    # predicted response vector
    y_pred = b[0] + b[1]*x
    # plotting the regression line
    plt.plot(x, y_pred, color = "g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # The source listing ends here; the lines below are an assumed completion
    # that estimates the coefficients and plots the fitted line.
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
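The experiment statement asks for residual plots, which the listing does not show. A minimal sketch under stated assumptions follows: `fit_line` repeats the least-squares formulas of `estimate_coef`, while `residuals` and `plot_residuals` are hypothetical helper names introduced here. The data are the x and y observations from the listing.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares intercept and slope (same formulas as estimate_coef)."""
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    ss_xy = np.sum(y * x) - n * m_y * m_x
    ss_xx = np.sum(x * x) - n * m_x * m_x
    b_1 = ss_xy / ss_xx
    b_0 = m_y - b_1 * m_x
    return b_0, b_1

def residuals(x, y, b):
    """Observed response minus the response predicted by the fitted line."""
    return y - (b[0] + b[1] * x)

def plot_residuals(x, res):
    # Imported here so the numeric part runs even where no display is available
    import matplotlib.pyplot as plt
    plt.scatter(x, res, color="b", marker="o", s=30)
    plt.axhline(0, color="r", linestyle="--")   # reference line at zero residual
    plt.xlabel("x")
    plt.ylabel("residual")
    plt.show()

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
b = fit_line(x, y)
res = residuals(x, y, b)
print("Intercept b_0 =", b[0], "slope b_1 =", b[1])
print("Residuals:", np.round(res, 3))
```

Calling `plot_residuals(x, res)` draws the residuals against x. For a well-fitting linear model they should scatter around the zero line without an obvious pattern.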
EXPERIMENT-9
Write a program to demonstrate the working of the decision tree-based ID3 algorithm.
# Importing the required packages
import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
# Function importing Dataset
def importdata():
    balance_data = pd.read_csv(
        'https://archive.ics.uci.edu/ml/machine-learning-'+
        'databases/balance-scale/balance-scale.data',
        sep= ',', header = None)
    # Printing the dataset shape
    print ("Dataset Length: ", len(balance_data))
    print ("Dataset Shape: ", balance_data.shape)
    # Printing the dataset observations
    print ("Dataset: ",balance_data.head())
    return balance_data
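The listing above only loads the data. ID3 chooses each split by maximizing information gain, and that step can be sketched in plain Python as below. The function names and the four-row toy data set are hypothetical, introduced purely for illustration. Note that scikit-learn's `DecisionTreeClassifier` implements an optimized CART algorithm rather than ID3; passing `criterion="entropy"` is the closest setting it offers.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy reduction obtained by splitting on the attribute at attr_index."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """Index of the attribute ID3 would split on first."""
    return max(range(len(rows[0])), key=lambda i: information_gain(rows, labels, i))

# Hypothetical toy data: (outlook, windy) -> play
rows = [("sunny", "no"), ("sunny", "yes"), ("rain", "no"), ("rain", "yes")]
labels = ["yes", "yes", "no", "no"]

print("Entropy of the labels:", entropy(labels))
for i, name in enumerate(["outlook", "windy"]):
    print(f"Information gain for {name}:", information_gain(rows, labels, i))
print("Best attribute index:", best_attribute(rows, labels))
```

In this toy data the first attribute separates the classes perfectly, so it has the larger gain and would become the root of the tree.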
EXPERIMENT-10
Write a program to implement the Naïve Bayesian classifier for a sample training data set stored as
a .CSV file.
class NaiveBayesClassifier:
    def __init__(self, X, y):
        '''X and y denote the features and the target labels respectively'''
        self.X, self.y = X, y
        self.N = len(self.X) # Length of the training set
        self.dim = len(self.X[0]) # Dimension of the vector of features
        self.attrs = [[] for _ in range(self.dim)] # Here we'll store the columns of the training set
        # Output classes with the number of occurrences in the training set.
        # In this case we have only 2 classes
        self.output_dom = {}
        self.data = [] # To store every row [Xi, yi]
        for i in range(len(self.X)):
            for j in range(self.dim):
                # if we have never seen this value for this attr before,
                # then we add it to the attrs array in the corresponding position
                if not self.X[i][j] in self.attrs[j]:
                    self.attrs[j].append(self.X[i][j])
            # if we have never seen this output class before,
            # then we add it to the output_dom and count one occurrence for now
            if not self.y[i] in self.output_dom.keys():
                self.output_dom[self.y[i]] = 1
            # otherwise, we increment the occurrence of this output in the training set by 1
            else:
                self.output_dom[self.y[i]] += 1
            # store the row
            self.data.append([self.X[i], self.y[i]])

    def classify(self, entry):
        solve = None # Final result
        max_arg = -1 # partial maximum
        for y in self.output_dom.keys():
            # The source listing breaks off at this loop. The body below is an assumed
            # completion that scores each class as P(y) times the product of P(x_i | y).
            prob = self.output_dom[y] / self.N
            for i in range(self.dim):
                cases = [row for row in self.data if row[0][i] == entry[i] and row[1] == y]
                prob *= len(cases) / self.output_dom[y]
            if prob > max_arg:
                max_arg = prob
                solve = y
        return solve
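The experiment also asks for the classifier's accuracy on a few test rows read from a .CSV file. The self-contained sketch below shows one way to do that for categorical attributes. The function names, the inline CSV text and the test rows are hypothetical. It uses raw frequency estimates with no smoothing, so a value never seen with a class gives that class a score of zero.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical training data; in the lab this text would be read from a .CSV file
CSV_TEXT = """outlook,windy,play
sunny,no,yes
sunny,yes,yes
overcast,no,yes
rain,no,no
rain,yes,no
overcast,yes,no
"""

def load_rows(text):
    """Parse CSV text and drop the header row; the last column is the class."""
    return list(csv.reader(io.StringIO(text)))[1:]

def train(rows):
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    for row in rows:
        for i, value in enumerate(row[:-1]):
            value_counts[(i, row[-1])][value] += 1
    return class_counts, value_counts

def predict(model, features):
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total                                  # prior P(y)
        for i, value in enumerate(features):
            score *= value_counts[(i, label)][value] / count   # likelihood P(x_i | y)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, rows):
    hits = sum(predict(model, row[:-1]) == row[-1] for row in rows)
    return hits / len(rows)

model = train(load_rows(CSV_TEXT))
# Hypothetical test rows with their true labels in the last position
test_rows = [["sunny", "no", "yes"], ["rain", "yes", "no"],
             ["overcast", "no", "yes"], ["overcast", "yes", "yes"]]
print("Accuracy:", accuracy(model, test_rows))
```

For real data, replace `CSV_TEXT` with the contents of the training file and hold out some rows for testing.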
EXPERIMENT-11
Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
# Data generation
train_num = 200
test_num = 100
config = {
    'Corn': [[150, 190], [40, 70], [2,4]],
    'Potato': [[30, 60], [7, 10], [1, 2]],
    'grass': [[10, 40], [10, 40], [0, 1]]
}
plants = list(config.keys())
dataset = pd.DataFrame(columns=['height(cm)', 'Leaf length(cm)', 'Stem diameter(cm)', 'type'])
index = 0
# Natural
for p in config:
    for i in range(int(train_num/3-3)):
        row = []
Data visualization
We can see the distribution of data points by drawing a scatter diagram of the
dataset = pd.read_csv('potato_train_data.csv')
labels = pd.read_csv('potato_train_label.csv')
features = list(dataset.keys())
classes = np.array(['Corn', 'Potato', 'grass'])
for i in range(3):
    labels.loc[labels['type']==classes[i], 'type'] = i
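The listing above works with generated plant data, whereas the experiment statement names the iris data set. As a hedged sketch of that task, the example below uses scikit-learn's bundled iris data with `KNeighborsClassifier` and prints the correct and wrong predictions separately. The 70/30 split, `n_neighbors=5` and `random_state=0` are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target)

# Fit a 5-nearest-neighbour classifier and predict the held-out samples
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# Separate the test samples into correct and wrong predictions
correct, wrong = [], []
for features, actual, predicted in zip(X_test, y_test, predictions):
    record = (features.tolist(), iris.target_names[actual], iris.target_names[predicted])
    (correct if actual == predicted else wrong).append(record)

print("Correct predictions:")
for features, actual, predicted in correct:
    print(features, "actual:", actual, "predicted:", predicted)

print("Wrong predictions:")
for features, actual, predicted in wrong:
    print(features, "actual:", actual, "predicted:", predicted)

print("Accuracy:", len(correct) / len(predictions))
```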
EXPERIMENT-12
Write a program to implement k-Means clustering algorithm to cluster the set of data stored in
.CSV file.
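No listing is given for this experiment. The sketch below implements the basic k-means iteration (Lloyd's algorithm) in NumPy and compares several values of k by their inertia, the sum of squared distances from each point to its nearest centroid. The function name `k_means`, the seeds and the synthetic three-blob data are assumptions; in the lab the points would be loaded from the .CSV file instead.

```python
import numpy as np

def k_means(points, k, n_iter=100, seed=0):
    """Cluster points into k groups; return (labels, centroids, inertia)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:          # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break                         # converged
        centroids = new_centroids
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    inertia = float((distances.min(axis=1) ** 2).sum())
    return labels, centroids, inertia

# Synthetic stand-in for the CSV data: three 2-D blobs of 30 points each
rng = np.random.default_rng(42)
points = np.vstack([rng.normal(loc=center, scale=0.5, size=(30, 2))
                    for center in [(0, 0), (5, 5), (0, 5)]])

inertia_by_k = {k: k_means(points, k)[2] for k in range(1, 6)}
for k, inertia in inertia_by_k.items():
    print(f"k = {k}: inertia = {inertia:.2f}")
```

A common way to judge the quality of clustering is to look for the "elbow", the value of k after which the inertia stops falling sharply. Results depend on the random initialization, so running the algorithm with several seeds and keeping the best result is advisable.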