Data Science Fundamentals Lab
CONTENTS
Sl. No.   Name of the Experiment   Marks (100)   Staff Signature
4. Frequency distributions
5. Averages
6. Variability
7. Normal curves
9. Correlation coefficient
10. Regression
AIM
ALGORITHM
Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of array
Step4: Stop
PROGRAM
import numpy as np
# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
OUTPUT
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
print(a[1:])
Output
[[1 2 3]
 [3 4 5]
 [4 5 6]]
After slicing
[[3 4 5]
 [4 5 6]]
Output:
Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
 [4 5]
 [5 6]]
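The program that produced the output above is not reproduced in this record. A minimal sketch that would print the same lines, assuming the same array `a` as in the slicing example and NumPy's ellipsis (`...`) indexing; the print labels are copied from the output, everything else is an assumption:

```python
import numpy as np

a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
print("Our array is:")
print(a)

# a[..., 1] selects index 1 along the last axis, i.e. the second column
print("The items in the second column are:")
print(a[..., 1])

# a[1, ...] selects the second row
print("The items in the second row are:")
print(a[1, ...])

# a[..., 1:] keeps every row but drops the first column
print("The items column 1 onwards are:")
print(a[..., 1:])
```

The ellipsis stands for "all remaining axes", so for a 2D array `a[..., 1]` is equivalent to `a[:, 1]`.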
Result:
Thus, working with NumPy arrays was successfully completed.
Ex. No.: 2 Create a dataframe using a list of elements
Aim:
ALGORITHM
Step1: Start
Step2: import numpy and pandas module
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop
PROGRAM
import numpy as np
import pandas as pd
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])
# first row holds the column labels, first column holds the row labels
print(pd.DataFrame(data=data[1:, 1:],
                   index=data[1:, 0],
                   columns=data[0, 1:]))
# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))
Output:
     Col1 Col2
Row1    1    2
Row2    3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2
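Only the first two tables in the output correspond to the program listed above; the statements behind the remaining ones are not shown. A minimal sketch that would yield tables of that shape, assuming a dictionary, a single-column list, and a Series as inputs. The variable names and the exact input values are assumptions inferred from the output, not the original code:

```python
import numpy as np
import pandas as pd

# Dictionary input: the keys become the column labels
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
df_from_dict = pd.DataFrame(my_dict)
print(df_from_dict)

# A single column 'A' holding the values 4..7
df_single = pd.DataFrame(data=[4, 5, 6, 7], index=range(4), columns=['A'])
print(df_single)

# Series input: the Series index becomes the row index, the column is labelled 0
my_series = pd.Series({"United Kingdom": "London", "India": "New Delhi",
                       "United States": "Washington", "Belgium": "Brussels"})
df_from_series = pd.DataFrame(my_series)
print(df_from_series)

# Shape (rows, columns) and number of rows of a 2 x 3 dataframe
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
print(df.shape)
print(len(df.index))
```

`df.shape` returns the `(rows, columns)` tuple, while `len(df.index)` counts rows only.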
Result:
Thus, working with Pandas dataframes was successfully completed.
Ex. No.:3 Basic plots using Matplotlib
Aim:
ALGORITHM
Step1: Start
Step2: import Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Display the output
Step5: Stop
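Program:3a
import matplotlib.pyplot as plt
# x axis values
x = [1, 2, 3]
# corresponding y axis values
y = [2, 4, 1]
# plot the points and display the figure
plt.plot(x, y)
plt.show()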
Program:3b
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# 'o' is for circles and 'r' is for red
plt.plot(b, "or")
plt.show()
Output:
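Program:3c
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]
# use fig whenever you want the
# output in a new window; also
# specify the window size you
# want the answer to be displayed in
fig = plt.figure(figsize=(10, 10))
# create the subplots on a 2 x 2 grid (positions 1, 2 and 4)
sub1 = fig.add_subplot(2, 2, 1)
sub2 = fig.add_subplot(2, 2, 2)
sub4 = fig.add_subplot(2, 2, 4)
# 'sb' = blue squares, 'or' = red circles, 'Dm' = magenta diamonds
sub1.plot(a, 'sb')
sub2.plot(b, 'or')
sub4.plot(c, 'Dm')
plt.show()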
Output:
Result:
Thus, the Python program for basic plots using Matplotlib was successfully completed.
Ex. No.:4 Frequency distributions
Aim:
To count the frequency of occurrence of words in a body of text, a task often needed during
text processing.
ALGORITHM
Program:
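import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg
# requires the NLTK 'gutenberg' corpus and tokenizer data to be downloaded
sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)
# collect the first 50 tokens
wlist = []
for i in range(50):
    wlist.append(token[i])
# count how often each token occurs
freq = nltk.FreqDist(wlist)
print(freq.most_common(10))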
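The counting step itself does not depend on NLTK. A minimal sketch using `collections.Counter` from the standard library, assuming a short illustrative string and a naive whitespace tokenizer in place of the Gutenberg corpus and `word_tokenize`:

```python
from collections import Counter

# Illustrative text only; the lab program reads Blake's poems from the Gutenberg corpus
text = "the tiger and the lamb and the rose"
tokens = text.split()      # naive whitespace tokenizer (assumption)
freq = Counter(tokens)     # maps each token to its number of occurrences

# most_common() lists (token, count) pairs from most to least frequent
for word, count in freq.most_common():
    print(word, count)
```

Here "the" occurs three times and "and" twice; every other token occurs once.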
Result:
Thus, the Python programs to count the frequency of occurrence of words in a body of text and
to compute a conditional frequency distribution were successfully completed.
Ex. No.:5 Averages
Aim:
To compute weighted averages in Python, either by defining your own functions or by using NumPy.
ALGORITHM
Program:5c
weighted_avg_m3
Output:
44225.35
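The data set behind the value above is not included in this record. A minimal sketch of the two approaches named in the aim, using hypothetical values and weights; the helper name `weighted_average` is also an assumption:

```python
import numpy as np

# Hypothetical data, not the data set behind the output above
values = [10.0, 20.0, 30.0]
weights = [1, 2, 3]

def weighted_average(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Approach 1: own function
avg_own = weighted_average(values, weights)
# Approach 2: NumPy's built-in weighted mean
avg_np = np.average(values, weights=weights)

print(round(avg_own, 2), round(float(avg_np), 2))
```

Both approaches evaluate (10*1 + 20*2 + 30*3) / (1 + 2 + 3) = 140 / 6, so the two printed numbers agree.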
Result:
Thus, the computation of weighted averages in Python, both with user-defined functions and with
NumPy, was successfully completed.
Ex. No.: 6. Variability
Aim:
To write a python program to calculate the variance.
ALGORITHM
Program:
# Python code to demonstrate variance()
# function on varying range of data-types
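The body of the program is missing from this record. A minimal sketch of what the comments describe, assuming the standard library's `statistics.variance` (sample variance) applied to integers, fractions, and decimals; the sample values are illustrative, not the original data:

```python
from statistics import variance, pvariance
from fractions import Fraction
from decimal import Decimal

# Illustrative samples of different numeric types
ints = [1, 2, 3, 4, 5]
fracs = [Fraction(1, 2), Fraction(3, 2), Fraction(5, 2)]
decs = [Decimal("0.5"), Decimal("1.5"), Decimal("2.5")]

# variance() divides the sum of squared deviations by n - 1 (sample variance)
print("Variance of ints:", variance(ints))
print("Variance of fractions:", variance(fracs))
print("Variance of decimals:", variance(decs))

# pvariance() divides by n instead (population variance)
print("Population variance of ints:", pvariance(ints))
```

For `ints` the squared deviations from the mean 3 sum to 10, giving a sample variance of 10 / 4 and a population variance of 10 / 5.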
Output :
Result:
Thus the computation for variance was successfully completed.
Ex. No.:7 Normal Curve
Aim:
To create a normal curve using python program.
ALGORITHM
Program:
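import matplotlib.pyplot as plt
import seaborn as sb
# data (x values) and pdf (density values) must be defined before plotting
sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()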
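The plotting calls above use `data` and `pdf`, whose definitions are not shown in this record. A minimal sketch of how they could be built with NumPy alone, assuming a mean of 5.3 and a standard deviation of 1 over the range 1 to 10; these parameters are illustrative assumptions:

```python
import numpy as np

# Assumed parameters for illustration
mean, std = 5.3, 1.0

# x values ("Heights") on a fine grid
data = np.arange(1, 10, 0.01)

# Normal probability density: exp(-(x - mean)^2 / (2 std^2)) / (std * sqrt(2 pi))
pdf = np.exp(-0.5 * ((data - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

# The curve peaks at the mean
print(data[np.argmax(pdf)], pdf.max())
```

Passing these two arrays as the x and y values of a line plot gives the familiar bell shape centred on the mean.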
Output:
Result:
Thus, the Python program to create a normal curve was successfully completed.
Ex. No.: 8 Correlation and scatter plots
Aim:
To write a python program for correlation with scatter plot
ALGORITHM
Program:
# Data
#Plot
# Plot
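Only the section comments of this program survive. A minimal sketch of the two steps they name, assuming randomly generated paired data and NumPy's `np.corrcoef`; the data, the seed, and the helper name `plot_scatter` are all assumptions, not the original program:

```python
import numpy as np

# Data: hypothetical paired observations with a strong positive relationship
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 2 * x + rng.normal(scale=0.5, size=200)

# Pearson correlation, read from the off-diagonal of the 2 x 2 correlation matrix
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))

# Plot
def plot_scatter(x, y):
    """Scatter plot of y against x; needs Matplotlib installed."""
    import matplotlib.pyplot as plt
    plt.scatter(x, y)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.title("Scatter plot of y against x")
    plt.show()
```

Calling `plot_scatter(x, y)` opens the figure; points lying close to a rising straight line correspond to a correlation near +1.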
Result:
Thus, the Python program for correlation and scatter plots was successfully completed.
Ex. No.: 9 Correlation coefficient
Aim:
To write a python program to compute correlation coefficient.
ALGORITHM
Program:
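import math
def correlation_coefficient(X, Y, n):
    sum_X = sum_Y = sum_XY = sq_X = sq_Y = 0
    i = 0
    while i < n:
        # sum of elements of array X (likewise Y, X*Y, X^2 and Y^2).
        sum_X, sum_Y, sum_XY = sum_X + X[i], sum_Y + Y[i], sum_XY + X[i] * Y[i]
        sq_X, sq_Y = sq_X + X[i] ** 2, sq_Y + Y[i] ** 2
        i = i + 1
    return (n * sum_XY - sum_X * sum_Y) / math.sqrt((n * sq_X - sum_X ** 2) * (n * sq_Y - sum_Y ** 2))
# Driver code
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
print('{0:.6f}'.format(correlation_coefficient(X, Y, len(X))))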
Output :
0.953463
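As a cross-check on the value above, the same coefficient can be obtained from NumPy. A minimal sketch, assuming `np.corrcoef` is an acceptable substitute for the hand-written sums; it is not part of the original program:

```python
import numpy as np

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# np.corrcoef returns the 2 x 2 correlation matrix; entry [0, 1] is r(X, Y)
r = np.corrcoef(X, Y)[0, 1]
print('{0:.6f}'.format(r))  # prints 0.953463
```

By hand: the deviations from the means (21 and 28) give a cross-product sum of 60 and squared-deviation sums of 90 and 44, so r = 60 / sqrt(90 * 44).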
Result:
Thus the computation for correlation coefficient was successfully completed.
Ex. No.: 10 Simple Linear Regression
Aim:
To write a python program for Simple Linear Regression
ALGORITHM
Program:
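import numpy as np
import matplotlib.pyplot as plt
def estimate_coef(x, y):
    # assumed implementation: ordinary least-squares estimates of b_0 and b_1
    n = np.size(x)
    m_x, m_y = np.mean(x), np.mean(y)
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)
def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {:.4f}\nb_1 = {:.4f}".format(b[0], b[1]))
    # plotting the observations and the regression line
    plt.scatter(x, y)
    plt.plot(x, b[0] + b[1] * x)
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()
main()
Output:
Estimated coefficients:
b_0 = 1.2364
b_1 = 1.1697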
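As a cross-check on the coefficients for the observations in `main()`, the same line can be fitted with NumPy. A minimal sketch, assuming `np.polyfit` with degree 1 as a stand-in for `estimate_coef`; for this data it gives b_0 of about 1.2364 and b_1 of about 1.1697:

```python
import numpy as np

# Same observations as in main()
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# polyfit returns coefficients from the highest degree down: slope first, then intercept
b_1, b_0 = np.polyfit(x, y, 1)
print("b_0 = {:.4f}, b_1 = {:.4f}".format(b_0, b_1))
```

By hand: the means are 4.5 and 6.5, the cross-deviation sum is 96.5 and the squared x-deviation sum is 82.5, so b_1 = 96.5 / 82.5 and b_0 = 6.5 - 4.5 * b_1.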
Graph:
Result:
Thus the computation for Simple Linear Regression was successfully completed.