INTRODUCTION TO PYTHON
(PART III)
Presenter: Prof. Amit Kumar Das
Assistant Professor,
Dept. of Computer Science and Engg.,
Institute of Engineering & Management.
IMPORTANT LIBRARIES IN PYTHON
scikit-learn – Primary library providing machine learning algorithms
NumPy – Fundamental package for scientific computing
SciPy – Package providing mathematical functions and statistical distributions
matplotlib – Primary library supporting scientific plotting, e.g. line diagrams, histograms, scatter plots
pandas – Primary library providing data manipulation functionalities
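As a quick reference, a minimal sketch of the conventional import aliases for these libraries is shown below (the aliases are community conventions, not requirements):
# Conventional import aliases (illustrative only)
import numpy as np                 # scientific computing with n-dimensional arrays
import scipy.stats as stats        # statistical distributions and functions from SciPy
import matplotlib.pyplot as plt    # scientific plotting
import pandas as pd                # data manipulation
from sklearn import datasets       # scikit-learn (machine learning), e.g. its bundled data sets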
BASIC PYTHON LIBRARIES - NUMPY
The NumPy package contains functionality for multidimensional arrays, high-level mathematical
functions (e.g. linear algebra and Fourier transform operations), random number generators, etc.
In scikit-learn, the NumPy array is the primary data structure used to input data. Any data used
needs to be converted to a NumPy array.
numpy.array(object, dtype, copy, order, subok, ndmin)
dtype means data-type, i.e. the desired data-type for the array. If not given, the type will be
determined as the minimum type required to hold the objects in the sequence.
empty - Return a new uninitialized array
full - Return a new array of given shape filled with value
ones - Return a new array setting values to one
zeros - Return a new array setting values to zero
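A minimal sketch of these creation routines (the shapes, fill value and dtype below are arbitrary examples):
import numpy as np
a = np.array([1, 2, 3], dtype=np.float64)  # array from a sequence with an explicit dtype
b = np.empty((2, 2))                       # new array, values left uninitialized
c = np.full((2, 3), 7)                     # new array filled with the value 7
d = np.ones((2, 3))                        # new array of ones
e = np.zeros((2, 3))                       # new array of zeros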
BASIC PYTHON LIBRARIES - NUMPY
# Defining an array variable with data ...
import numpy as np
arr1 = np.empty((2,3))
arr2 = np.array([[10,2,3], [23,45,67]])
print(arr2)
[[10  2  3]
 [23 45 67]]
# Create an array of 1s ...
arr3 = np.ones((2,3))
[[ 1.,  1.,  1.],
 [ 1.,  1.,  1.]]
# Create an array of 0s ...
arr4 = np.zeros((2,3), dtype=np.int64)
[[0, 0, 0],
 [0, 0, 0]]
# Create an array with random numbers ...
np.random.random((2,2))
[[ 0.47448072,  0.49876875],
 [ 0.29531478,  0.48425055]]
BASIC PYTHON LIBRARIES – NUMPY (CONTD.)
# Defining 1-D array variable with data ...
var2 = np.empty(4)
var2[0] = 5.67
var2[1] = 2
var2[2] = 56
var2[3] = 304
print(var2)
[ 5.67 2. 56. 304. ]
print(var2.shape) # Returns the shape of the array ...
(4,)
print(var2.size) # Returns the size of the array ...
4
# Defining 2-D array variable with data ...
var3 = np.empty((2,3))
var3[0][0] = 5.67
var3[0][1] = 2
var3[0][2] = 56
var3[1][0] = .09
var3[1][1] = 132
var3[1][2] = 1056
print(var3)
[[ 5.67000000e+00 2.00000000e+00 5.60000000e+01]
[ 9.00000000e-02 1.32000000e+02 1.05600000e+03]]
[Note: The same result will be obtained with dtype=np.float64]
print(var3.shape)
(2, 3)
BASIC PYTHON LIBRARIES – NUMPY (CONTD.)
# Same declaration with dtype mentioned (element assignments repeated as above) ...
var3 = np.empty((2,3), dtype=np.int64)
[[   5,    2,   56],
 [   0,  132, 1056]]
print(var3[1]) # Returns a row of an array ...
[ 0 132 1056]
print(var3[[0, 1]]) # Returns multiple rows of an array ...
[[ 5 2 56]
[ 0 132 1056]]
print(var3[:, 2]) # Returns a column of an array ...
[ 56 1056]
print(var3[:, [1, 2]]) # Returns multiple columns of an array ...
[[ 2 56]
[ 132 1056]]
print(var3[1][2]) # Returns a cell value of an array ...
1056
print(var3[1, 2]) # Returns a cell value of an array ...
1056
print(np.transpose(var3)) # Returns transpose of an array ...
[[ 5 0]
[ 2 132]
[ 56 1056]]
print(var3.reshape(3,2)) # Returns a re-shaped array ...
[[ 5 2]
[ 56 0]
[ 132 1056]]
BASIC PYTHON LIBRARIES – NUMPY (CONTD.)
Create and concatenate arrays:
import numpy as np
arr1 = np.empty((2,3), dtype=np.int64)
arr1[0][0] = 5.67
arr1[0][1] = 2
arr1[0][2] = 56
arr1[1][0] = .09
arr1[1][1] = 132
arr1[1][2] = 1056
[[ 5, 2, 56],
[ 0, 132, 1056]]
arr2 = np.empty((1,3), dtype=np.int64)
arr2[0][0] = 37
arr2[0][1] = 2.193
arr2[0][2] = 5609
[[ 37, 2, 5609]]
BASIC PYTHON LIBRARIES – NUMPY (CONTD.)
arr_concat = np.concatenate((arr1, arr2), axis=0)
print(arr_concat)
[[ 5 2 56]
[ 0 132 1056]
[ 37 2 5609]]
var2.min() # Returns minimum value stored in an array ...
2.0
var2.max() # Returns maximum value stored in an array ...
304.0
var2.cumsum() # Returns cumulative sum of the values stored in an array ...
array([ 5.67, 7.67, 63.67, 367.67])
var2.mean() # Returns mean or average value stored in an array ...
91.917500000000004
var2.std() # Returns standard deviation of values stored in an array ...
124.2908299865682
BASIC PYTHON LIBRARIES – PANDAS
pandas is a Python package providing fast and flexible functionalities designed to work with
“relational” or “labeled” data.
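For example, labeled data can be built directly from a Python dict (a minimal sketch; the column names and values here are made up for illustration):
# Hypothetical labeled data: column labels and an automatic row index
import pandas as pd
cars = pd.DataFrame({'car name': ['car_a', 'car_b'], 'mpg': [24.0, 31.5]})
print(cars)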
import pandas as pd # “pd” is just an alias for pandas
data = pd.read_csv("auto-mpg.csv") # Loads data from a .csv file (here, the auto-mpg data set)
type(data) # To find the type of the data set object loaded
pandas.core.frame.DataFrame
data.shape # To find the dimensions, i.e. number of rows and columns, of the data set loaded
(398, 9)
nrow_count = data.shape[0] # To find just the number of rows
print(nrow_count)
398
ncol_count = data.shape[1] # To find just the number of columns
print(ncol_count)
9
BASIC PYTHON LIBRARIES – PANDAS (CONTD.)
data.columns # To get the columns of a dataframe
Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
'acceleration', 'model year', 'origin', 'car name'],
dtype='object')
# To change the column names of a dataframe, e.g. 'mpg' in this case ...
data.columns = ['miles_per_gallon', 'cylinders', 'displacement', 'horsepower', 'weight', 'acceleration', 'model year', 'origin', 'car name']
data.columns # To get the revised column names of the dataframe ...
Index(['miles_per_gallon', 'cylinders', ...], dtype='object')
data.rename(columns={'displacement': 'disp'}, inplace=True) # To rename an individual column ...
BASIC PYTHON LIBRARIES – PANDAS (CONTD.)
data.head() # By default displays top 5 rows
data.head(3) # To display the top 3 rows
data.tail() # By default displays bottom 5 rows
data.tail(3) # To display the bottom 3 rows
data.loc[200, 'cylinders'] # Will return the cell value of the row with index 200 and column 'cylinders' of the data frame
6
Alternatively, we can use the following code:
data.get_value(200,'cylinders')
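Note that DataFrame.get_value() has been deprecated and removed in newer pandas releases; the .at accessor provides the same scalar lookup:
data.at[200, 'cylinders'] # label-based scalar access; equivalent result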
data_cyl = data.loc[:, "car name"]
data_cyl.head()
0 chevrolet chevelle malibu
1 buick skylark 320
2 plymouth satellite
3 amc rebel sst
4 ford torino
Name: car name, dtype: object
BASIC PYTHON LIBRARIES – PANDAS (CONTD.)
Find missing values in a data set:
import numpy as np
import pandas as pd
# Creation of a data set with missing values ...
var1 = [np.nan, np.nan, np.nan, 10.1, 12, 123.14, 0.121]
var2 = [40.2, 11.78, 7801, 0.25, 34.2, np.nan, np.nan]
var3 = [1234, np.nan, 34.5, np.nan, 78.25, 14.5, np.nan]
df = pd.DataFrame({'Attr_1': var1, 'Attr_2': var2, 'Attr_3': var3})
print(df)
Attr_1 Attr_2 Attr_3
0 NaN 40.20 1234.00
1 NaN 11.78 NaN
2 NaN 7801.00 34.50
3 10.100 0.25 NaN
4 12.000 34.20 78.25
5 123.140 NaN 14.50
6 0.121 NaN NaN
# Find missing values in a data set
miss_val = df[df['Attr_1'].isnull()]
print(miss_val)
Attr_1 Attr_2 Attr_3
0 NaN 40.20 1234.0
1 NaN 11.78 NaN
2 NaN 7801.00 34.5
BASIC PYTHON LIBRARIES – PANDAS (CONTD.)
>>> np.mean(data[["mpg"]])
23.514573
>>> np.median(data[["mpg"]])
23.0
>>> np.var(data[["mpg"]])
60.936119
>>> np.std(data[["mpg"]])
7.806159
BASIC PYTHON LIBRARIES – MATPLOTLIB
Constructing a box plot for the Iris data set
A popular data set in machine learning
Consists of 3 different types of iris flower - Setosa, Versicolour, and Virginica
4 columns - Sepal Length, Sepal Width, Petal Length and Petal Width
First, the datasets module has to be imported from scikit-learn
>>> from sklearn import datasets
# import some data to play with
>>> iris = datasets.load_iris()
>>> import matplotlib.pyplot as plt
>>> X = iris.data[:, :4]
>>> plt.boxplot(X)
>>> plt.show()
BASIC PYTHON LIBRARIES – MATPLOTLIB (CONTD.)
Box plot for Iris data set (all features):
BASIC PYTHON LIBRARIES – MATPLOTLIB (CONTD.)
>>> plt.boxplot(X[:, 1])
>>> plt.show()
Box plot for Iris data set (single feature)
BASIC PYTHON LIBRARIES – MATPLOTLIB (CONTD.)
>>> import matplotlib.pyplot as plt
>>> X = iris.data[:, :1]
Histogram:
>>> plt.hist(X)
>>> plt.xlabel('Sepal length')
>>> plt.show()
BASIC PYTHON LIBRARIES – MATPLOTLIB (CONTD.)
Scatterplot of Iris data set : Sepal length vs. Petal length
>>> X = iris.data[:, :4] # We take the first 4 features
>>> y = iris.target
>>> plt.scatter(X[:, 2], X[:, 0], c=y, cmap=plt.cm.Set1, edgecolor='k')
>>> plt.xlabel('Petal length')
>>> plt.ylabel('Sepal length')
>>> plt.show()
BASIC PYTHON LIBRARIES – MATPLOTLIB (CONTD.)
Scatterplot of Iris data set : Sepal length vs. Petal length
DATA PRE-PROCESSING
Mainly deals with two things –
Handling outliers
Remediating missing values
Primary measures for remediating outliers and missing values are:
Removing specific rows containing outliers / missing values
Imputing the value (i.e. outlier / missing value) with a standard
statistical measure e.g. mean or median or mode for that attribute
Estimate the value (i.e. outlier / missing value) based on the value of the
attribute in similar records and replace it with the estimated value
Cap the values within 1.5 × IQR limits (a capping sketch is shown after this list)
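A minimal sketch of the capping (winsorizing) approach, assuming a pandas DataFrame ds and a numeric column col (the function name and arguments are chosen here for illustration):
def cap_outlier(ds, col):
    # Cap values lying outside the 1.5 x IQR limits at the limits themselves
    quart1 = ds[col].quantile(0.25)
    quart3 = ds[col].quantile(0.75)
    IQR = quart3 - quart1
    low_val = quart1 - 1.5*IQR
    high_val = quart3 + 1.5*IQR
    ds[col] = ds[col].clip(lower=low_val, upper=high_val)
    return ds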
DATA PRE-PROCESSING (CONTD.)
>>> df = pd.read_csv("auto-mpg.csv")
Finding missing values in a data set:
>>> miss_val = df[df['horsepower'].isnull()]
>>> print(miss_val)
DATA PRE-PROCESSING (CONTD.)
Finding Outliers (Option 1) :
>>> import matplotlib.pyplot as plt
>>> X = data["mpg"]
>>> plt.boxplot(X)
>>> plt.show()
>>> outliers = plt.boxplot(X)["fliers"][0].get_data()[1]
>>> outliers
array([ 46.6])
DATA PRE-PROCESSING (CONTD.)
Finding Outliers (Option 2) :
def find_outlier(ds, col):
    quart1 = ds[col].quantile(0.25)
    quart3 = ds[col].quantile(0.75)
    IQR = quart3 - quart1 # Inter-quartile range
    low_val = quart1 - 1.5*IQR
    high_val = quart3 + 1.5*IQR
    ds = ds.loc[(ds[col] < low_val) | (ds[col] > high_val)]
    return ds
>>> outliers = find_outlier(data, "mpg")
>>> outliers
mpg cylinders displacement horsepower weight acceleration \
322 46.6 4 86.0 65.0 2110 17.9
model year origin car name
322 80 3 mazda glc
DATA PRE-PROCESSING (CONTD.)
Removing records with missing values / outliers:
We can drop the rows / columns with missing values using the code below.
>>> data.dropna(axis=0, how='any')
In a similar way, outlier values can be removed.
def remove_outlier(ds, col):
    quart1 = ds[col].quantile(0.25)
    quart3 = ds[col].quantile(0.75)
    IQR = quart3 - quart1 # Inter-quartile range
    low_val = quart1 - 1.5*IQR
    high_val = quart3 + 1.5*IQR
    df_out = ds.loc[(ds[col] > low_val) & (ds[col] < high_val)]
    return df_out
>>> data = remove_outlier(data, "mpg")
DATA PRE-PROCESSING (CONTD.)
Imputing standard values:
Only the affected rows are identified, and in those rows the value of the attribute is replaced
with the mean value of that attribute.
>>> hp_mean = np.mean(data['horsepower'])
>>> imputedrows = data[data['horsepower'].isnull()]
>>> imputedrows = imputedrows.replace(np.nan, hp_mean)
Then the portion of the data set not having any missing row is kept apart.
>>> missval_removed_rows = data.dropna(subset=['horsepower'])
Then join back the imputed rows and the remaining part of the data set.
>>> data_mod = missval_removed_rows.append(imputedrows, ignore_index=True)
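Note that DataFrame.append() has since been removed in recent pandas versions; pd.concat() gives the same result:
>>> data_mod = pd.concat([missval_removed_rows, imputedrows], ignore_index=True)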
In a similar way, outlier values can be imputed.
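For instance, a minimal sketch of mean imputation for outliers, reusing the find_outlier() and remove_outlier() helpers defined earlier (the variable names here are illustrative):
>>> outlier_rows = find_outlier(data, 'mpg').copy()
>>> outlier_rows['mpg'] = np.mean(data['mpg']) # replace the outlier values with the mean
>>> clean_rows = remove_outlier(data, 'mpg') # rows whose 'mpg' value is not an outlier
>>> data_mod = pd.concat([clean_rows, outlier_rows], ignore_index=True)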
THANK YOU &
STAY TUNED!