Python_for_DataScience

Unit I Basics of Python 10
Introduction – Setting working directory – Creating and saving, File execution, clearing
console, removing variables from environment, clearing environment – variable creation –
Operators – Data types and its associated operations – sequence data types – conditions and
branching – Functions – Virtual Environments
Introduction to Python
Python is a versatile and popular programming language known for its simplicity and
readability. It is widely used in various fields, including web development, data analysis,
artificial intelligence, and scientific computing.
Setting Working Directory
The working directory is the folder where Python scripts are executed and where files are
read from or written to. To set the working directory:
Coding
import os
# To get the current working directory
print(os.getcwd())
# To change the working directory
os.chdir('path_to_directory')
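As a minimal sketch of the two calls above, the following switches into a temporary folder and then restores the original working directory. The temporary folder stands in for 'path_to_directory' and is an assumption made so that the example runs on any machine.

```python
import os
import tempfile

original_dir = os.getcwd()            # remember where we started

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)                     # change the working directory
    inside = os.getcwd()              # now reports the temporary folder
    os.chdir(original_dir)            # switch back before the folder is deleted

restored = os.getcwd()
print(restored == original_dir)
```

Switching back explicitly matters because every relative path in the rest of the script is resolved against the current working directory.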
Creating and Saving Files
Creating and saving files in Python can be done using the open() function:
# Create and write to a file
with open('example.txt', 'w') as file:
    file.write("Hello, World!")

# Read from a file
with open('example.txt', 'r') as file:
    content = file.read()
    print(content)
File Execution
To execute a Python file from the console or terminal:
python filename.py
Clearing Console
Clearing the console can be done using a system call. Run only the command that matches your operating system:
import os
# Clear console (Windows)
os.system('cls')
# Clear console (Unix/Linux/macOS)
os.system('clear')
Removing Variables from Environment
To remove a variable from the environment:
# Create a variable
x = 10
# Delete the variable

del x
Clearing Environment
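Python has no built-in command for clearing the entire environment. You can empty the
global namespace, but this also removes built-in entries such as __name__ and
__builtins__, so use it with care:
# Delete all names in the global scope (including built-in entries)
globals().clear()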
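A gentler sketch, assuming the goal is to drop only user-defined names (those that do not begin with an underscore), is shown below. The helper name clear_user_variables is hypothetical, and it is demonstrated on an ordinary dictionary rather than on the live global namespace.

```python
def clear_user_variables(namespace):
    """Delete every name in `namespace` that does not start with an underscore."""
    for name in list(namespace):          # copy the keys, since we delete while looping
        if not name.startswith('_'):
            del namespace[name]

# Demonstrated on a plain dict that mimics a small global namespace.
fake_globals = {'__name__': '__main__', 'x': 10, 'name': 'John'}
clear_user_variables(fake_globals)
print(fake_globals)
```

In an interactive session the same helper could be called as clear_user_variables(globals()); note that this would also remove imported modules and the helper itself.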
Variable Creation
Variables in Python are created by simply assigning a value to a name:
x = 5 # Integer
y = 3.14 # Float
name = "John" # String
Operators
Python supports various operators:
• Arithmetic Operators: +, -, *, /, %, //, **
• Comparison Operators: ==, !=, >, <, >=, <=
• Logical Operators: and, or, not
• Assignment Operators: =, +=, -=, *=, /=, %=, //=, **=
• Bitwise Operators: &, |, ^, ~, <<, >>
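A short sketch of how several of these operators behave on small integers. The values 7 and 2 are arbitrary and chosen only for illustration.

```python
a, b = 7, 2

true_div = a / b        # 3.5, "/" always returns a float
floor_div = a // b      # 3, rounds down to a whole number
remainder = a % b       # 1
power = a ** b          # 49

both = (a > b) and (b > 0)   # True, logical "and" of two comparisons

bit_and = a & b         # 0b111 & 0b010 -> 2
bit_or = a | b          # 0b111 | 0b010 -> 7
bit_xor = a ^ b         # 0b111 ^ 0b010 -> 5
shifted = a << 1        # 14, shifting left by one bit doubles the value

a += 3                  # augmented assignment, a is now 10
```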
Data Types and Associated Operations
• Numbers: Integers, Floats, Complex numbers
  o Operations: Arithmetic, type conversion, etc.
• Strings: Immutable sequences of characters
  o Operations: Concatenation, slicing, formatting, etc.
• Lists: Mutable sequences
  o Operations: Indexing, slicing, appending, inserting, removing, etc.
• Tuples: Immutable sequences
  o Operations: Indexing, slicing, etc.
• Sets: Unordered collections of unique elements
  o Operations: Union, intersection, difference, etc.
• Dictionaries: Key-value pairs
  o Operations: Accessing, updating, removing elements, etc.
Sequence Data Types
• Lists
my_list = [1, 2, 3, 4, 5]
• Tuples
my_tuple = (1, 2, 3, 4, 5)
• Strings
my_string = "Hello, World!"
• Ranges
my_range = range(1, 10)
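All four sequence types support indexing, slicing, and len() in the same way. The sketch below reuses the values from the list above to show this, and also shows that only the list can be changed in place. The result variable names are chosen here for illustration.

```python
my_list = [1, 2, 3, 4, 5]
my_tuple = (1, 2, 3, 4, 5)
my_string = "Hello, World!"
my_range = range(1, 10)

# Indexing works identically on every sequence type.
first_items = (my_list[0], my_tuple[0], my_string[0], my_range[0])
last_items = (my_list[-1], my_tuple[-1], my_string[-1], my_range[-1])

# Slicing returns a sequence of the same type.
slices = (my_list[1:3], my_tuple[1:3], my_string[1:3], list(my_range[1:3]))
lengths = (len(my_list), len(my_tuple), len(my_string), len(my_range))

my_list[0] = 99                 # lists are mutable
tuple_is_immutable = False
try:
    my_tuple[0] = 99            # tuples are not
except TypeError:
    tuple_is_immutable = True
```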
Conditions and Branching
Python uses if, elif, and else for conditional branching:
x = 10
if x > 0:
    print("Positive")
elif x < 0:
    print("Negative")
else:
    print("Zero")
Functions
Functions are defined using the def keyword:
def greet(name):
    return f"Hello, {name}!"

print(greet("Alice"))
Virtual Environments
Virtual environments allow you to create isolated Python environments for different projects:
# Create a virtual environment
python -m venv myenv
# Activate the virtual environment (Windows)

myenv\Scripts\activate
# Activate the virtual environment (Unix/Linux/MacOS)

source myenv/bin/activate
# Deactivate the virtual environment

deactivate
Unit II PYTHON DATA STRUCTURES, PACKAGES 10
List – Tuples- Set – Dictionary – Its associated functions - File handling - Modes– Reading
and writing files - Introduction to Pandas – Series – Data frame – Indexing and loading –
Data manipulation – Merging – Group by – Scales – Pivot table – Date and time.
Lists
Lists are ordered, mutable collections of items.
Creating Lists:
my_list = [1, 2, 3, 4, 5]
Functions and Methods:
• append(x): Add an item to the end.
• extend(iterable): Extend list by appending elements from an iterable.
• insert(i, x): Insert an item at a given position.
• remove(x): Remove first item with value x.
• pop([i]): Remove and return item at position i (default last).
• clear(): Remove all items.
• index(x[, start[, end]]): Return index of first item with value x.
• count(x): Return number of times x appears.
• sort(key=None, reverse=False): Sort items in place.
• reverse(): Reverse the elements in place.
• copy(): Return a shallow copy.
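A short walk through these list methods, with the state of the list noted after each call. The starting values are arbitrary.

```python
nums = [3, 1, 2]

nums.append(5)            # [3, 1, 2, 5]
nums.extend([4, 1])       # [3, 1, 2, 5, 4, 1]
nums.insert(0, 9)         # [9, 3, 1, 2, 5, 4, 1]
nums.remove(1)            # removes only the first 1 -> [9, 3, 2, 5, 4, 1]
last = nums.pop()         # returns 1 and leaves [9, 3, 2, 5, 4]

position = nums.index(5)  # 3
ones = nums.count(1)      # 0, both 1s are gone

backup = nums.copy()      # independent shallow copy of the current state
nums.sort()               # sorts in place -> [2, 3, 4, 5, 9]
nums.reverse()            # reverses in place -> [9, 5, 4, 3, 2]
```

Because sort() and reverse() modify the list in place and return None, the copy taken beforehand is the only record of the earlier order.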
Tuples
Tuples are ordered, immutable collections of items.
Creating Tuples:
my_tuple = (1, 2, 3, 4, 5)
Functions and Methods:
• count(x): Return the number of times x appears.
• index(x): Return the index of the first item with value x.
Sets
Sets are unordered collections of unique items.
Creating Sets:
my_set = {1, 2, 3, 4, 5}
Functions and Methods:
• add(x): Add an item.
• remove(x): Remove an item (raises KeyError if it is not present).
• discard(x): Remove an item if present.
• pop(): Remove and return an arbitrary item.
• clear(): Remove all items.
• union(*others): Return the union.
• intersection(*others): Return the intersection.
• difference(*others): Return the difference.
• symmetric_difference(other): Return the symmetric difference.
• issubset(other): Check if set is subset of other.
• issuperset(other): Check if set is superset of other.
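The set operations can be sketched on two small sets. The values are arbitrary; the example also contrasts discard(), which ignores a missing item, with remove(), which raises KeyError.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a.union(b)                     # {1, 2, 3, 4, 5}
common = a.intersection(b)             # {3, 4}
only_a = a.difference(b)               # {1, 2}
either = a.symmetric_difference(b)     # {1, 2, 5}

is_subset = {3, 4}.issubset(a)         # True
is_superset = a.issuperset({1, 2})     # True

a.discard(10)                          # 10 is absent, nothing happens
remove_failed = False
try:
    a.remove(10)                       # 10 is absent, KeyError is raised
except KeyError:
    remove_failed = True
```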
Dictionaries
Dictionaries are collections of key-value pairs. Since Python 3.7 they preserve insertion order.
Creating Dictionaries:
my_dict = {'a': 1, 'b': 2, 'c': 3}
Functions and Methods:
• keys(): Return a new view of the dictionary's keys.
• values(): Return a new view of the dictionary's values.
• items(): Return a new view of the dictionary's items.
• get(key[, default]): Return the value for key if key is in the dictionary, else default.
• setdefault(key[, default]): Insert key with a value of default if key is not in the dictionary.
• update([other]): Update the dictionary with the key/value pairs from other.
• pop(key[, default]): Remove specified key and return the corresponding value.
• popitem(): Remove and return the most recently inserted (key, value) pair.
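A short sketch of the dictionary methods in sequence. The stock dictionary and its keys are hypothetical and used only for illustration.

```python
stock = {'apples': 3, 'pears': 0}

kiwis = stock.get('kiwis', 0)          # 0, the key is missing so the default is returned
added = stock.setdefault('kiwis', 5)   # inserts 'kiwis': 5 and returns 5

stock.update({'pears': 7, 'plums': 2}) # overwrites 'pears', adds 'plums'

removed = stock.pop('apples')          # 3, and 'apples' is gone
missing = stock.pop('melons', None)    # None, the default avoids a KeyError

key, value = stock.popitem()           # last inserted pair: ('plums', 2)

remaining_keys = list(stock.keys())    # ['pears', 'kiwis']
remaining_items = list(stock.items())  # [('pears', 7), ('kiwis', 5)]
```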
File Handling
Modes:
• 'r': Read (default).
• 'w': Write (truncate file).
• 'x': Create (fail if exists).
• 'a': Append.
• 'b': Binary mode.
• 't': Text mode (default).
• '+': Update (read and write).
Reading and Writing Files:
# Writing to a file
with open('example.txt', 'w') as file:
    file.write("Hello, World!")

# Reading from a file
with open('example.txt', 'r') as file:
    content = file.read()
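The remaining modes can be sketched in the same way. The example below works inside a temporary folder (an assumption made so that nothing is left behind on disk) and exercises 'x', 'a', 'r', 'w', and 'rb' in turn.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'example.txt')

    with open(path, 'x') as f:            # 'x' creates the file
        f.write('first line\n')

    exclusive_failed = False
    try:
        open(path, 'x')                   # the file exists now, so 'x' fails
    except FileExistsError:
        exclusive_failed = True

    with open(path, 'a') as f:            # 'a' adds to the end
        f.write('second line\n')

    with open(path, 'r') as f:            # 'r' reads text
        lines = f.readlines()

    with open(path, 'w') as f:            # 'w' truncates before writing
        f.write('replaced\n')

    with open(path, 'rb') as f:           # 'b' returns raw bytes
        raw = f.read()

print(lines)
```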
Introduction to Pandas
Pandas is a powerful data manipulation library in Python.
Series
A Series is a one-dimensional labeled array capable of holding any data type.
Creating a Series:
import pandas as pd
data = [1, 2, 3, 4, 5]
series = pd.Series(data)
DataFrame
A DataFrame is a two-dimensional labeled data structure.
Creating a DataFrame:
data = {
    'Column1': [1, 2, 3],
    'Column2': [4, 5, 6]
}
df = pd.DataFrame(data)
Indexing and Loading
Indexing:
df['Column1'] # Access a single column
df[['Column1', 'Column2']] # Access multiple columns
df.iloc[0] # Access a row by integer position
df.loc[0] # Access a row by index label
Loading Data:
df = pd.read_csv('file.csv') # Load CSV file

df = pd.read_excel('file.xlsx') # Load Excel file
Data Manipulation
Basic Operations:
df['NewColumn'] = df['Column1'] + df['Column2'] # Add a new column

df.drop('Column1', axis=1, inplace=True) # Drop a column
df.rename(columns={'OldName': 'NewName'}, inplace=True) # Rename a column
Merging
Combining DataFrames:
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
# concat stacks df2 below df1; pass ignore_index=True to renumber the rows
merged_df = pd.concat([df1, df2])
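pd.concat stacks DataFrames on top of each other. To combine rows that share a key, pandas also provides pd.merge. A minimal sketch follows, using two hypothetical tables (employees and salaries) invented for illustration.

```python
import pandas as pd

employees = pd.DataFrame({'emp_id': [1, 2, 3],
                          'name': ['John', 'Emma', 'Sam']})
salaries = pd.DataFrame({'emp_id': [2, 3, 4],
                         'salary': [60000, 55000, 70000]})

# Inner join: keep only emp_id values present in both tables.
inner = pd.merge(employees, salaries, on='emp_id', how='inner')

# Left join: keep every employee; a missing salary becomes NaN.
left = pd.merge(employees, salaries, on='emp_id', how='left')

print(inner)
print(left)
```

The how argument also accepts 'right' and 'outer', which keep all rows of the second table or of both tables respectively.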
Group By
Grouping data:
grouped = df.groupby('Column1')
summary = grouped['Column2'].sum()
Scales
Scaling data can be done using libraries like sklearn.preprocessing.
Example:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
Pivot Table
Creating a pivot table:
pivot = df.pivot_table(values='Value', index='Index', columns='Columns', aggfunc='mean')
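The call above assumes a DataFrame that has columns named 'Value', 'Index', and 'Columns'. As a worked sketch with made-up data (the sales table and its column names are hypothetical), the same method averages amounts per region and product:

```python
import pandas as pd

sales = pd.DataFrame({
    'region':  ['North', 'North', 'South', 'South', 'South'],
    'product': ['A', 'B', 'A', 'A', 'B'],
    'amount':  [10, 20, 30, 50, 40],
})

# One row per region, one column per product, mean amount in each cell.
pivot = sales.pivot_table(values='amount', index='region',
                          columns='product', aggfunc='mean')
print(pivot)
```

South has two rows for product A (30 and 50), so that cell holds their mean.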

Date and Time
Handling date and time data:
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Day'] = df['Date'].dt.day
Unit III: Packages for Data Analysis
Numpy – 1D and 2D numpy – Associated operations –Broadcasting - Linear algebra and

related operations – Indexing and other operations – Matplotlib – scatterplot – line plot – bar
plot – histogram – box plot – pair plot – Case study on regression and classification.
Numpy
Numpy is a powerful numerical computing library in Python, providing support for large
multi-dimensional arrays and matrices along with a large collection of high-level
mathematical functions.
1D and 2D Numpy Arrays
Creating 1D Arrays:
import numpy as np
arr_1d = np.array([1, 2, 3, 4, 5])
Creating 2D Arrays:
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Associated Operations
Basic Operations:
# Element-wise addition
result = arr_1d + 2
# Element-wise subtraction
result = arr_1d - 2
# Element-wise multiplication
result = arr_1d * 2
# Element-wise division
result = arr_1d / 2
Aggregations:
# Sum of elements
np.sum(arr_1d)
# Mean of elements
np.mean(arr_1d)
# Standard deviation
np.std(arr_1d)
# Maximum and minimum

np.max(arr_1d)
np.min(arr_1d)
Broadcasting
Broadcasting allows NumPy to perform element-wise operations on arrays of different shapes.
Example:
arr = np.array([1, 2, 3])

scalar = 2
result = arr + scalar # [3, 4, 5]
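Broadcasting also applies between arrays, not only between an array and a scalar. The sketch below, using arbitrary values, adds a row vector and a column vector to a 2x3 matrix, and shows that shapes which cannot be aligned raise a ValueError.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])          # shape (2, 3)
row = np.array([10, 20, 30])            # shape (3,), stretched across both rows
col = np.array([[100], [200]])          # shape (2, 1), stretched across all columns

added_row = matrix + row                # [[11, 22, 33], [14, 25, 36]]
added_col = matrix + col                # [[101, 102, 103], [204, 205, 206]]

incompatible = False
try:
    matrix + np.array([1, 2])           # shape (2,) does not match the last axis (3)
except ValueError:
    incompatible = True
```

NumPy compares shapes from the last axis backwards; two axes are compatible when they are equal or when one of them is 1.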
Linear Algebra and Related Operations
Dot Product:
a = np.array([1, 2])
b = np.array([3, 4])
dot_product = np.dot(a, b) # 11
Matrix Multiplication:
A = np.array([[1, 2], [3, 4]])

B = np.array([[5, 6], [7, 8]])
result = np.matmul(A, B)
Inverse of a Matrix:
matrix = np.array([[1, 2], [3, 4]])

inverse = np.linalg.inv(matrix)
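A quick way to check an inverse is to multiply it back and compare with the identity matrix. The sketch below does that for the same matrix, and also solves a small linear system with np.linalg.solve; the right-hand side b is an arbitrary value chosen for illustration.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])
inverse = np.linalg.inv(matrix)

# matrix @ inverse should equal the 2x2 identity, up to rounding error.
identity_check = np.allclose(matrix @ inverse, np.eye(2))

determinant = np.linalg.det(matrix)     # 1*4 - 2*3 = -2, non-zero so the inverse exists

# Solve matrix @ x = b directly, without forming the inverse.
b = np.array([5, 11])
x = np.linalg.solve(matrix, b)          # x is approximately [1, 2]
```

For solving systems, np.linalg.solve is generally preferred over multiplying by an explicit inverse because it is faster and numerically more stable.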
Indexing and Other Operations
Indexing:
arr = np.array([1, 2, 3, 4, 5])
# Accessing elements
element = arr[0] # 1
# Slicing
subarray = arr[1:3] # [2, 3]
Reshaping:
arr = np.array([[1, 2, 3], [4, 5, 6]])

reshaped = arr.reshape((3, 2)) # [[1, 2], [3, 4], [5, 6]]
Matplotlib
Matplotlib is a plotting library for creating static, interactive, and animated visualizations in
Python.
Scatter Plot
Creating a Scatter Plot:
import matplotlib.pyplot as plt
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 4, 3, 2, 1])
plt.scatter(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.show()
Line Plot
Creating a Line Plot:
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot')
plt.show()
Bar Plot
Creating a Bar Plot:
categories = ['A', 'B', 'C']

values = [10, 20, 15]
plt.bar(categories, values)
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Plot')
plt.show()
Histogram
Creating a Histogram:
data = np.random.randn(1000)
plt.hist(data, bins=30)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()
Box Plot
Creating a Box Plot:
data = [np.random.normal(0, std, 100) for std in range(1, 4)]
plt.boxplot(data, vert=True, patch_artist=True)

plt.xlabel('Distribution')
plt.ylabel('Value')
plt.title('Box Plot')
plt.show()
Pair Plot
Creating a Pair Plot:
import seaborn as sns

import pandas as pd
df = pd.DataFrame({
'A': np.random.randn(100),
'B': np.random.randn(100),
'C': np.random.randn(100),
'D': np.random.randn(100)
})
sns.pairplot(df)
plt.show()
Case Study: Regression and Classification
Regression
Linear Regression Example:
from sklearn.linear_model import LinearRegression
# Sample data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([1, 2, 3, 4, 5])
# Create and train the model

model = LinearRegression()
model.fit(X, y)
# Make predictions
predictions = model.predict(X)
# Plot results
plt.scatter(X, y, color='blue')
plt.plot(X, predictions, color='red')
plt.xlabel('X')
plt.ylabel('y')
plt.title('Linear Regression')
plt.show()
Classification
Logistic Regression Example:
from sklearn.linear_model import LogisticRegression

from sklearn.datasets import load_iris
# Load data
iris = load_iris()
X = iris.data
y = iris.target
# Create and train the model

model = LogisticRegression(max_iter=200)
model.fit(X, y)
# Make predictions
predictions = model.predict(X)
# Plot results
plt.scatter(X[:, 0], X[:, 1], c=predictions, cmap='viridis')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Logistic Regression Classification')
plt.show()
Programs for Python for Data Science
Basic Python programs:

Addition of two numbers
a = eval(input("enter first no "))
b = eval(input("enter second no "))
c = a + b
print("the sum is", c)
Output
enter first no 5
enter second no 6
the sum is 11

Area of rectangle
l = eval(input("enter the length of rectangle "))
b = eval(input("enter the breadth of rectangle "))
a = l * b
print(a)
Output
enter the length of rectangle 5
enter the breadth of rectangle 6
30

Area & circumference of circle
r = eval(input("enter the radius of circle "))
a = 3.14 * r * r
c = 2 * 3.14 * r
print("the area of circle", a)
print("the circumference of circle", c)
Output
enter the radius of circle 4
the area of circle 50.24
the circumference of circle 25.12

Calculate simple interest
p = eval(input("enter principal amount "))
n = eval(input("enter no of years "))
r = eval(input("enter rate of interest "))
si = p * n * r / 100
print("simple interest is", si)
Output
enter principal amount 5000
enter no of years 4
enter rate of interest 6
simple interest is 1200.0

Calculate engineering cutoff
p = eval(input("enter physics marks "))
c = eval(input("enter chemistry marks "))
m = eval(input("enter maths marks "))
cutoff = p / 4 + c / 4 + m / 2
print("cutoff =", cutoff)
Output
enter physics marks 100
enter chemistry marks 99
enter maths marks 96
cutoff = 97.75

Check voting eligibility
age = eval(input("enter your age "))
if age >= 18:
    print("eligible for voting")
else:
    print("not eligible for voting")
Output
enter your age 19
eligible for voting

Find greatest of three numbers
a = eval(input("enter the value of a "))
b = eval(input("enter the value of b "))
c = eval(input("enter the value of c "))
if a > b:
    if a > c:
        print("the greatest no is", a)
    else:
        print("the greatest no is", c)
else:
    if b > c:
        print("the greatest no is", b)
    else:
        print("the greatest no is", c)
Output
enter the value of a 9
enter the value of b 1
enter the value of c 8
the greatest no is 9
Programs on for loop
Print n natural numbers
for i in range(1, 5, 1):
    print(i)
Output
1
2
3
4

Print n odd numbers
for i in range(1, 10, 2):
    print(i)
Output
1
3
5
7
9

Print n even numbers
for i in range(2, 10, 2):
    print(i)
Output
2
4
6
8

Print squares of numbers
for i in range(1, 5, 1):
    print(i * i)
Output
1
4
9
16

Print cubes of numbers
for i in range(1, 5, 1):
    print(i * i * i)
Output
1
8
27
64
Programs on while loop
Print n natural numbers
i = 1
while i <= 5:
    print(i)
    i = i + 1
Output
1
2
3
4
5

Print n even numbers
i = 2
while i <= 10:
    print(i)
    i = i + 2
Output
2
4
6
8
10

Print n odd numbers
i = 1
while i <= 10:
    print(i)
    i = i + 2
Output
1
3
5
7
9

Print n squares of numbers
i = 1
while i <= 5:
    print(i * i)
    i = i + 1
Output
1
4
9
16
25

Print n cubes of numbers
i = 1
while i <= 3:
    print(i * i * i)
    i = i + 1
Output
1
8
27

Find sum of n numbers
i = 1
total = 0
while i <= 10:
    total = total + i
    i = i + 1
print(total)
Output
55

Factorial of n / product of n numbers
i = 1
product = 1
while i <= 10:
    product = product * i
    i = i + 1
print(product)
Output
3628800
Programs on functions
Sum of two numbers using function
def add():
    a = eval(input("enter a value "))
    b = eval(input("enter b value "))
    c = a + b
    print("the sum is", c)

add()
Output
enter a value 6
enter b value 4
the sum is 10

Area of rectangle using function
def area():
    l = eval(input("enter the length of rectangle "))
    b = eval(input("enter the breadth of rectangle "))
    a = l * b
    print("the area of rectangle is", a)

area()
Output
enter the length of rectangle 20
enter the breadth of rectangle 5
the area of rectangle is 100

Swap two values of variables
def swap():
    a = eval(input("enter a value "))
    b = eval(input("enter b value "))
    c = a
    a = b
    b = c
    print("a=", a, "b=", b)

swap()
Output
enter a value 3
enter b value 5
a= 5 b= 3

Check whether a number is divisible by 5
def div():
    n = eval(input("enter n value "))
    if n % 5 == 0:
        print("the number is divisible by 5")
    else:
        print("the number not divisible by 5")

div()
Output
enter n value 10
the number is divisible by 5

Find remainder and quotient of given numbers
def remainder():
    a = eval(input("enter a "))
    b = eval(input("enter b "))
    R = a % b
    print("the remainder is", R)

def quotient():
    a = eval(input("enter a "))
    b = eval(input("enter b "))
    Q = a / b
    print("the quotient is", Q)

remainder()
quotient()
Output
enter a 6
enter b 3
the remainder is 0
enter a 8
enter b 4
the quotient is 2.0

Convert the temperature
def ctof():
    c = eval(input("enter temperature in centigrade "))
    f = (1.8 * c) + 32
    print("the temperature in Fahrenheit is", f)

def ftoc():
    f = eval(input("enter temp in Fahrenheit "))
    c = (f - 32) / 1.8
    print("the temperature in centigrade is", c)

ctof()
ftoc()
Output (values shown rounded)
enter temperature in centigrade 37
the temperature in Fahrenheit is 98.6
enter temp in Fahrenheit 100
the temperature in centigrade is 37.78

Program for basic calculator
def add():
    a = eval(input("enter a value "))
    b = eval(input("enter b value "))
    c = a + b
    print("the sum is", c)

def sub():
    a = eval(input("enter a value "))
    b = eval(input("enter b value "))
    c = a - b
    print("the diff is", c)

def mul():
    a = eval(input("enter a value "))
    b = eval(input("enter b value "))
    c = a * b
    print("the mul is", c)

def div():
    a = eval(input("enter a value "))
    b = eval(input("enter b value "))
    c = a / b
    print("the div is", c)

add()
sub()
mul()
div()
Output
enter a value 10
enter b value 10
the sum is 20
enter a value 10
enter b value 10
the diff is 0
enter a value 10
enter b value 10
the mul is 100
enter a value 10
enter b value 10
the div is 1.0
NUMPY ARRAYS
ALGORITHM
Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of array
Step4: Stop
PROGRAM
import numpy as np
# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type:", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions:", arr.ndim)
# Printing shape of array
print("Shape of array:", arr.shape)
# Printing size (total number of elements) of array
print("Size of array:", arr.size)
# Printing type of elements in array
print("Array stores elements of type:", arr.dtype)
OUTPUT
Array is of type: <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int32
(The element type is platform dependent; on most Linux and macOS systems it is int64.)
PROGRAM TO PERFORM ARRAY SLICING
import numpy as np
a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])
print(a)
print("After slicing")
print(a[1:])
Output
[[1 2 3]
 [3 4 5]
 [4 5 6]]
After slicing
[[3 4 5]
 [4 5 6]]
CREATE A DATAFRAME USING A LIST OF ELEMENTS.
ALGORITHM
Step1: Start
Step2: Import numpy and pandas module
Step3: Create DataFrames from a NumPy array, a dictionary, another DataFrame, and a Series
Step4: Print the output
Step5: Stop
PROGRAM
import numpy as np
import pandas as pd
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])
print(pd.DataFrame(data=data[1:, 1:],
                   index=data[1:, 0],
                   columns=data[0, 1:]))
# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))
# Take a dictionary as input to your DataFrame
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
print(pd.DataFrame(my_dict))
# Take a DataFrame as input to your DataFrame
my_df = pd.DataFrame(data=[4, 5, 6, 7], index=range(0, 4), columns=['A'])
print(pd.DataFrame(my_df))
# Take a Series as input to your DataFrame
my_series = pd.Series({"United Kingdom": "London", "India": "New Delhi",
                       "United States": "Washington", "Belgium": "Brussels"})
print(pd.DataFrame(my_series))
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
# Use the `shape` property
print(df.shape)
# Or use the `len()` function with the `index` property
print(len(df.index))
Output:
     Col1 Col2
Row1    1    2
Row2    3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2
BASIC PLOTS USING MATPLOTLIB
ALGORITHM
Step1: Start
Step2: Import Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop
Program:3a
# importing the required module
import matplotlib.pyplot as plt
# x axis values
x = [1, 2, 3]
# corresponding y axis values
y = [2, 4, 1]
# plotting the points
plt.plot(x, y)
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')
# giving a title to my graph
plt.title('My first graph!')
# function to show the plot
plt.show()
Output:
Program:3b
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# "o" is for circles and "r" is for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))
# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')
c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label='4th Rep')
# get the current axes
ax = plt.gca()
# hide the right and top boundary lines of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
# set the bounds of the left boundary line to a fixed range
ax.spines['left'].set_bounds(-3, 40)
# set the interval of the tick marks on the x-axis
plt.xticks(list(range(-3, 10)))
# set the interval of the tick marks on the y-axis
plt.yticks(list(range(-3, 20, 3)))
# the legend shows which color denotes which series
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])
# annotate writes text on the graph; xy is the position on the graph
plt.annotate('Temperature V / s Days', xy=(1.01, -2.15))
# give a title to the graph
plt.title('All Features Discussed')
plt.show()
Output:
Program:
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]
# use a figure whenever you want the output in a new window,
# and specify the window size you want
fig = plt.figure(figsize=(10, 10))
# creating multiple plots in a single figure
sub1 = plt.subplot(2, 2, 1)
sub2 = plt.subplot(2, 2, 2)
sub3 = plt.subplot(2, 2, 3)
sub4 = plt.subplot(2, 2, 4)
sub1.plot(a, 'sb')
# x-axis ticks advance by 1 within the specified range
sub1.set_xticks(list(range(0, 10, 1)))
sub1.set_title('1st Rep')
sub2.plot(b, 'or')
# x-axis ticks advance by 2 within the specified range
sub2.set_xticks(list(range(0, 10, 2)))
sub2.set_title('2nd Rep')
# a list can be passed directly to plot instead of a variable
sub3.plot(list(range(0, 22, 3)), 'vg')
sub3.set_xticks(list(range(0, 10, 1)))
sub3.set_title('3rd Rep')
sub4.plot(c, 'Dm')
# similarly, set the ticks for the y-axis:
# range(start (inclusive), end (exclusive), step)
sub4.set_yticks(list(range(0, 24, 2)))
sub4.set_title('4th Rep')
# without plt.show() no plot will be visible
plt.show()
Output:
Normal Curve
ALGORITHM
Step 1: Start the Program
Step 2: Import norm from scipy.stats
Step 3: Import packages numpy, matplotlib and seaborn
Step 4: Create the distribution
Step 5: Visualize the distribution
Step 6: Stop the process
Program:
# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb
# Creating the distribution
data = np.arange(1, 10, 0.01)
pdf = norm.pdf(data, loc=5.3, scale=1)
# Visualizing the distribution
sb.set_style('whitegrid')
# x and y are passed as keywords; recent seaborn versions require this
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()
Output:
CORRELATION AND SCATTER PLOTS
ALGORITHM
Step 1: Start the Program
Step 2: Create variable x using random function
Step 3: Create variables y1, y2 from x, and y3 using random function
Step 4: Plot the scatter plot
Step 5: Print the result
Step 6: Stop the process
Program:
# Scatterplot and Correlations
import numpy as np
import matplotlib.pyplot as plt
# Data
x = np.random.randn(100)
y1 = x * 5 + 9
y2 = -5 * x
y3 = np.random.randn(100)
# Plot
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y1, label=f'y1, Correlation = {np.round(np.corrcoef(x, y1)[0, 1], 2)}')
plt.scatter(x, y2, label=f'y2, Correlation = {np.round(np.corrcoef(x, y2)[0, 1], 2)}')
plt.scatter(x, y3, label=f'y3, Correlation = {np.round(np.corrcoef(x, y3)[0, 1], 2)}')
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()
Output
SIMPLE LINEAR REGRESSION
ALGORITHM
Step 1: Start the Program
Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)
    # predicted response vector
    y_pred = b[0] + b[1] * x
    # plotting the regression line
    plt.plot(x, y_pred, color="g")
    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')
    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
    \nb_1 = {}".format(b[0], b[1]))
    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
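Output (values rounded to four decimals):
Estimated coefficients:
b_0 = 1.2364
b_1 = 1.1697
(For this data SS_xy = 96.5 and SS_xx = 82.5, so b_1 = 96.5 / 82.5 and b_0 = 6.5 - 4.5 * b_1.)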
Graph:
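As a cross-check on the hand-written formulas, the same least-squares line can be fitted with np.polyfit, which returns the coefficients from the highest degree to the lowest. This is a sketch that reuses the observations from the program above; the variable names slope and intercept are chosen here for illustration.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# Degree-1 fit: polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# The same quantities from the deviation formulas, for comparison.
n = x.size
ss_xy = np.sum(x * y) - n * x.mean() * y.mean()
ss_xx = np.sum(x * x) - n * x.mean() ** 2
b_1 = ss_xy / ss_xx
b_0 = y.mean() - b_1 * x.mean()

print(np.isclose(slope, b_1), np.isclose(intercept, b_0))
```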
MATPLOTLIB
Draw a line in a diagram from position (1, 3) to (2, 8) then to (6, 1) and
finally to position (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 2, 6, 8])
ypoints = np.array([3, 8, 1, 10])
plt.plot(xpoints, ypoints)
plt.show()
To draw a line from (1, 3) to (8, 10), pass the two arrays [1, 8] and [3, 10]
to the plot function:
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints)
plt.show()
Markers
Use the marker keyword argument to mark each point. The line again runs from
(1, 3) to (8, 10):
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints, marker = 'o')
plt.show()
Marker Size
Use the ms (markersize) keyword argument to set the marker size. Here the same
line from (1, 3) to (8, 10) is drawn with markers of size 20:
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints, marker = 'o', ms = 20)
plt.show()
Marker Color
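Use the mec (markeredgecolor) keyword argument to set the color of the marker
edge, and mfc (markerfacecolor) to set the color inside it. Here the line from
(1, 3) to (8, 10) is drawn with size-20 markers that have a red edge:
import matplotlib.pyplot as plt
import numpy as np
xpoints = np.array([1, 8])
ypoints = np.array([3, 10])
plt.plot(xpoints, ypoints, marker = 'o', ms = 20, mec = 'r')
# Alternatives that set both edge and face color:
# plt.plot(xpoints, ypoints, marker = 'o', ms = 20, mec = '#4CAF50', mfc = '#4CAF50')
# plt.plot(xpoints, ypoints, marker = 'o', ms = 20, mec = 'hotpink', mfc = 'hotpink')
plt.show()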
Create Labels for a Plot
With Pyplot, you can use the xlabel() and ylabel() functions to set a label
for the x- and y-axis.
import numpy as np
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
plt.plot(x, y)
plt.title("Sports Watch Data")
plt.xlabel("Average Pulse")
plt.ylabel("Calorie Burnage")
plt.show()
Set Font Properties for Title and Labels
You can use the fontdict parameter in xlabel(), ylabel(), and title() to set font
properties for the title and labels.
Example
Set font properties for the title and labels:
import numpy as np
x = np.array([80, 85, 90, 95, 100, 105, 110, 115, 120, 125])
y = np.array([240, 250, 260, 270, 280, 290, 300, 310, 320, 330])
font1 = {'family':'serif','color':'blue','size':20}
font2 = {'family':'serif','color':'darkred','size':15}
plt.title("Sports Watch Data", fontdict = font1)
plt.xlabel("Average Pulse", fontdict = font2)
plt.ylabel("Calorie Burnage", fontdict = font2)
plt.plot(x, y)
plt.show()
Matplotlib Scatter
With Pyplot, you can use the scatter() function to draw a scatter plot.
The scatter() function plots one dot for each observation. It needs two arrays
of the same length, one for the values of the x-axis, and one for values on
the y-axis:
import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.show()
ColorMap

import numpy as np
x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
colors = np.array([0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100])
plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar()
plt.show()
Creating Bars
With Pyplot, you can use the bar() function to draw bar graphs:

import numpy as np
x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])
plt.bar(x,y)
plt.show()
import numpy as np
x = np.array(["A", "B", "C", "D"])
y = np.array([3, 8, 1, 10])
plt.bar(x, y, color = "red")
plt.show()
Histogram
A histogram is a graph showing frequency distributions.
It is a graph showing the number of observations within each given interval.
In Matplotlib, we use the hist() function to create histograms.
The hist() function will use an array of numbers to create a histogram, the
array is sent into the function as an argument.
import numpy as np
x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()
Creating Pie Charts

With Pyplot, you can use the pie() function to draw pie charts:

import numpy as np
y = np.array([35, 25, 25, 15])
plt.pie(y)
plt.show()
import numpy as np
y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
plt.pie(y, labels = mylabels)
plt.show()
Explode
The explode parameter, if specified, and not None, must be an array with one
value for each wedge.
Each value represents how far from the center each wedge is displayed:

import numpy as np
y = np.array([35, 25, 25, 15])
myexplode = [0.2, 0, 0, 0]
plt.pie(y, labels = mylabels, explode = myexplode)
plt.show()
Legend
To add a list of explanations for each wedge, use the legend() function:

import numpy as np
y = np.array([35, 25, 25, 15])

plt.pie(y, labels = mylabels)

plt.legend()
plt.show()
Python program to perform Data Manipulation operations using Pandas package.
import pandas as pd
# Create a DataFrame
data = {
    'Name': ['John', 'Emma', 'Sam', 'Lisa', 'Tom'],
    'Age': [25, 30, 28, 32, 27],
    'Country': ['USA', 'Canada', 'Australia', 'UK', 'Germany'],
    'Salary': [50000, 60000, 55000, 70000, 52000]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
# Selecting columns
name_age = df[['Name', 'Age']]
print("\nName and Age columns:")
print(name_age)
# Filtering rows
filtered_df = df[df['Country'] == 'USA']
print("\nFiltered DataFrame (Country = 'USA'):")
print(filtered_df)
# Sorting by a column
sorted_df = df.sort_values('Salary', ascending=False)
print("\nSorted DataFrame (by Salary in descending order):")
print(sorted_df)
# Aggregating data
average_salary = df['Salary'].mean()
print("\nAverage Salary:", average_salary)
# Adding a new column
df['Experience'] = [3, 6, 4, 8, 5]
print("\nDataFrame with added Experience column:")
print(df)
# Updating values
df.loc[df['Name'] == 'Emma', 'Salary'] = 65000
print("\nDataFrame after updating Emma's Salary:")
print(df)
# Deleting a column
df = df.drop('Experience', axis=1)
print("\nDataFrame after deleting Experience column:")
print(df)
