Unit IV

The document provides an overview of the Pandas library in Python, detailing its capabilities for data analysis, cleaning, and manipulation. It explains key components such as Series and DataFrames, along with various methods for data operations including head, tail, mean, and dropna. Additionally, it covers how to read CSV files and perform statistical analyses on datasets.

Uploaded by

chatgp8766

Unit IV:

Pandas
Pranav Gupta
INTRODUCTION
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating
data.
• Pandas allows us to analyze big data and make conclusions based on
statistical theories.
• Pandas can clean messy data sets, and make them readable and
relevant.
• Pandas can also delete rows that are not relevant or that contain
wrong values, such as empty or NULL values. This is called cleaning the
data.
SERIES
• A Pandas Series is like a column in a table.
• It is a one-dimensional array holding data of any type.
• EXAMPLE:
import pandas as pd

a = [2,3,4]

var = pd.Series(a)

print(var)
LABELS
• If nothing else is specified, the values are labeled with their index
number.
• First value has index 0, second value has index 1 etc.
• This label can be used to access a specified value.
• EXAMPLE: print(var[0])
• Using index argument, the user can name their own labels.
EXAMPLE
#CODE:
import pandas as pd
a = [2,3,4]
var = pd.Series(a, index = ["x", "y", "z"])
print(var)
KEY/VALUE OBJECTS AS SERIES
#CODE:
import pandas as pd

num = {"day1": 420, "day2": 380, "day3": 390}

var = pd.Series(num)

print(var)
DATAFRAMES
• A Pandas DataFrame is a two-dimensional data
structure, like a two-dimensional array, or a table with
rows and columns.
EXAMPLE:
import pandas as pd
data = { "S_No": [420, 380, 390],
"Name": ['Dean', 'Jane', 'Shaw']}
df = pd.DataFrame(data, index = ["Section E",
"Section F","Section G"])
print(df)
LOCATE ROW
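• DataFrames store data in the form of rows and columns.
• To access one or more rows, the loc attribute is used in
pandas. It selects rows by their index label.
• EXAMPLE: print(df.loc[0]) OR print(df.loc[[0, 1]])
• These two calls assume the default integer index. With named
labels, as in the DataFrame above, use the label itself:
print(df.loc["Section E"])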
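The label-based lookup described above can be sketched as follows. This is a minimal illustration that reuses the DataFrame from the DATAFRAMES slide; the variable names row and rows are only for this sketch.

```python
import pandas as pd

data = {"S_No": [420, 380, 390], "Name": ["Dean", "Jane", "Shaw"]}
df = pd.DataFrame(data, index=["Section E", "Section F", "Section G"])

# A single label returns that row as a Series.
row = df.loc["Section E"]
print(row)

# A list of labels (double brackets) returns a DataFrame with those rows.
rows = df.loc[["Section E", "Section F"]]
print(rows)
```

Because this DataFrame has named labels, df.loc[0] would raise a KeyError here; the integer form only works with the default 0, 1, 2, ... index.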
READ CSV IN PANDAS
• A simple way to store big data sets is to use CSV files
(comma separated files).
• CSV files contain plain text and are a well-known format
that can be read by everyone, including Pandas.
• The to_string() method is used to print the entire DataFrame.
EXAMPLE:
import pandas as pd
df = pd.read_csv('[Link]')
print(df.to_string())
DATAFRAME OPERATIONS
• Pandas contains numerous methods and functions that
can be used to manipulate data.
• These operations are used by data analysts to find the correlation,
maximum, minimum, mean, and null values in a dataset.
• Some of the DataFrame operations are head, tail, info, shape,
reshape, columns, isnull, dropna, mean, sum, describe, value_counts,
corr, loc, iloc, and apply.
1. head():
• This method returns a specified number of rows from
the top of the DataFrame.
• It returns the first 5 rows if a number is not specified.
• The column names are also returned, in addition to
the specified rows.
• SYNTAX: dataframe.head(n)
• Here, n is optional: the number of rows to return. Default value is 5.
WITHOUT PARAMETER n: dataframe.head()
WITH PARAMETER n: dataframe.head(n)
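A minimal sketch of both calls. The ten-row DataFrame and its column names are hypothetical data made up for this illustration, not taken from the slides.

```python
import pandas as pd

# Hypothetical data: ten rows, so the default of 5 is visible.
df = pd.DataFrame({
    "id": range(1, 11),
    "score": [55, 62, 71, 48, 90, 84, 77, 66, 59, 93],
})

first_five = df.head()     # n omitted, so the first 5 rows are returned
first_three = df.head(3)   # n = 3, so the first 3 rows are returned

print(first_five)
print(first_three)
```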
2. tail()
• This method returns a specified number of rows from the
bottom of the DataFrame.
• It returns the last 5 rows if a number is not
specified.
• The column names are also returned, in addition to the
specified rows.
• SYNTAX: dataframe.tail(n)
• Here, n is optional: the number of rows to return. Default value is 5.
WITHOUT PARAMETER n: dataframe.tail()
WITH PARAMETER n: dataframe.tail(n)
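A minimal sketch of tail(), again on hypothetical ten-row data. It also shows that the returned rows keep their original index labels.

```python
import pandas as pd

# Hypothetical data with the default integer index 0..9.
df = pd.DataFrame({
    "id": range(1, 11),
    "score": [55, 62, 71, 48, 90, 84, 77, 66, 59, 93],
})

last_five = df.tail()    # n omitted, so the last 5 rows are returned
last_two = df.tail(2)    # n = 2, so the last 2 rows are returned

print(last_five)
print(last_two)
```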
3. info()
• This method prints information about the DataFrame.
• The information includes the number of columns, column
labels, column data types, memory usage, range index, and
the number of non-null values in each column.
• This method prints the information itself. There is no need to
wrap it in the print() function.
• SYNTAX:
dataframe.info(verbose, buf, max_cols, memory_usage, show_counts)
• The null_counts parameter of older versions was removed in
pandas 2.0; use show_counts instead.
WITHOUT ANY PARAMETERS: dataframe.info()
4. shape
• This property returns a tuple containing the shape of the
DataFrame.
• The shape is the number of rows and columns of the
DataFrame.
• SYNTAX: dataframe.shape
5. columns
• This property returns the label of each column in the
DataFrame.
• SYNTAX: dataframe.columns
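The three inspection tools above (info(), shape, and columns) can be tried together. This is a minimal sketch on a small hypothetical DataFrame; note that shape and columns are properties, so they are written without parentheses.

```python
import pandas as pd

# Hypothetical data with one missing value in the Marks column.
df = pd.DataFrame({
    "Name": ["Dean", "Jane", "Shaw"],
    "Marks": [81.5, None, 67.0],
})

df.info()             # prints the summary itself and returns None
print(df.shape)       # tuple of (number of rows, number of columns)
print(df.columns)     # Index object holding the column labels
```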
6. isnull()
• This method returns a DataFrame object where all the
values are replaced with a Boolean value True for NULL
values, and otherwise False.
• This method takes no parameters.
• SYNTAX: dataframe.isnull()
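A minimal sketch of isnull() on hypothetical data. Summing the resulting Boolean DataFrame is a common way to count the NULL values per column, because True counts as 1.

```python
import pandas as pd

# Hypothetical data with one NULL in each column.
df = pd.DataFrame({
    "Name": ["Dean", None, "Shaw"],
    "Marks": [81.5, 74.0, None],
})

mask = df.isnull()             # same shape as df, True where a value is NULL
print(mask)

null_per_column = mask.sum()   # number of NULL values in each column
print(null_per_column)
```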
7. dropna()
• This method removes the rows that contain NULL values.
• It returns a new DataFrame object unless
the inplace parameter is set to True. In that case,
the dropna() method removes the rows from the original
DataFrame instead.
• SYNTAX: dataframe.dropna(axis, how, thresh, subset,
inplace)
Parameter   Value                       Description
axis        0, 1, 'index', 'columns'    Optional, default 0. 0 and 'index' remove ROWS that contain NULL values; 1 and 'columns' remove COLUMNS that contain NULL values.
how         'all', 'any'                Optional, default 'any'. Specifies whether to remove the row or column when ALL values are NULL, or if ANY value is NULL.
thresh      Number                      Optional. Specifies the number of NOT NULL values required to keep the row.
subset      List                        Optional. Specifies where to look for NULL values.
inplace     True, False                 Optional, default False. If True: the removal is done on the current DataFrame. If False: returns a copy where the removal is done.
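The parameters in the table can be compared side by side. This is a minimal sketch on hypothetical data; the variable names are only for this illustration.

```python
import pandas as pd

# Hypothetical data: row 1 has one NULL, row 2 has only NULL values.
df = pd.DataFrame({
    "Name": ["Dean", "Jane", None, "Shaw"],
    "Marks": [81.5, None, None, 67.0],
})

any_null = df.dropna()                    # how='any': drop rows with at least one NULL
all_null = df.dropna(how="all")           # drop rows only when every value is NULL
by_subset = df.dropna(subset=["Name"])    # look for NULL values in the Name column only
at_least_2 = df.dropna(thresh=2)          # keep rows with at least 2 NOT NULL values

print(any_null)
print(all_null)
```

Each call returns a new DataFrame, so df itself still has all four rows afterwards.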
8. mean()
• The mean() method returns a Series with the mean value
of each column.
• By specifying the column axis (axis='columns'),
the mean() method searches column-wise and returns the
mean value for each row.
• SYNTAX: dataframe.mean(axis, skipna, level,
numeric_only, kwargs)
Parameter      Value                       Description
axis           0, 1, 'index', 'columns'    Optional, default 0. Which axis to check.
skipna         True, False                 Optional, default True. Set to False if the result should NOT skip NULL values.
level          Number, level name          Optional, default None. Specifies which level (in a hierarchical multi-index) to check along. Older pandas only: this parameter was removed in pandas 2.0.
numeric_only   None, True, False           Optional. Specifies whether to only check numeric values. Default None in older pandas, False since pandas 2.0.
kwargs                                     Optional keyword arguments. These arguments have no effect, but could be accepted by a NumPy function.
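A minimal sketch of the axis and skipna parameters, using hypothetical marks with one missing value.

```python
import pandas as pd

# Hypothetical data: the last Science mark is missing.
df = pd.DataFrame({
    "Maths": [80, 60, 70],
    "Science": [90, 70, None],
})

col_means = df.mean()                  # default axis=0: one mean per column
row_means = df.mean(axis="columns")    # one mean per row
strict = df.mean(skipna=False)         # NULL values are NOT skipped

print(col_means)
print(row_means)
print(strict)
```

With skipna=False, any column that contains a NULL value gets a NULL (NaN) mean, which is why strict reports NaN for Science.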
9. sum()
• This method adds all values in each column and returns the
sum for each column.
• By specifying the column axis (axis='columns'),
the sum() method searches column-wise and returns the
sum of each row.
• SYNTAX: dataframe.sum(axis, skipna, level, numeric_only,
min_count, kwargs)
Parameter      Value                       Description
axis           0, 1, 'index', 'columns'    Optional, default 0. Which axis to check.
skipna         True, False                 Optional, default True. Set to False if the result should NOT skip NULL values.
level          Number, level name          Optional, default None. Specifies which level (in a hierarchical multi-index) to check along. Older pandas only: this parameter was removed in pandas 2.0.
numeric_only   None, True, False           Optional. Specifies whether to only check numeric values. Default None in older pandas, False since pandas 2.0.
min_count      Number                      Optional, default 0. Specifies the minimum number of NOT NULL values that need to be present to perform the action.
kwargs                                     Optional keyword arguments. These arguments have no effect, but could be accepted by a NumPy function.
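A minimal sketch of sum(), including the min_count parameter. The data is hypothetical: the Science column has only one value that is not NULL.

```python
import pandas as pd

# Hypothetical data: two of the three Science marks are missing.
df = pd.DataFrame({
    "Maths": [80, 60, 70],
    "Science": [90, None, None],
})

col_sums = df.sum()                  # default axis=0: one sum per column
row_sums = df.sum(axis="columns")    # one sum per row, NULL values skipped
needs_two = df.sum(min_count=2)      # require at least 2 NOT NULL values per column

print(col_sums)
print(row_sums)
print(needs_two)
```

Science has fewer than two values present, so with min_count=2 its result is NULL (NaN) instead of 90.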
10. describe()
• This method returns a statistical description of the data in the
DataFrame.
• SYNTAX: dataframe.describe(percentiles, include, exclude,
datetime_is_numeric)
Parameter             Value                               Description
percentiles           Numbers between 0 and 1             Optional, a list of percentiles to include in the result. Default is [.25, .50, .75].
include               None, 'all', list of data types     Optional, a list of the data types to allow in the result.
exclude               None, list of data types            Optional, a list of the data types to disallow in the result.
datetime_is_numeric   True, False                         Optional, default False. Set to True to treat datetime data as numeric. Older pandas only: this parameter was removed in pandas 2.0.
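A minimal sketch of describe() with the percentiles and include parameters. The marks and grades are hypothetical data for this illustration.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column.
df = pd.DataFrame({
    "Marks": [50, 60, 70, 80, 90],
    "Grade": ["C", "C", "B", "A", "A"],
})

summary = df.describe()                        # numeric columns only by default
everything = df.describe(include="all")        # text columns are summarized too
custom = df.describe(percentiles=[0.1, 0.9])   # ask for the 10th and 90th percentiles

print(summary)
print(everything)
print(custom)
```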
11. corr()
• This method finds the correlation (relationship) between
each pair of columns in a DataFrame.
• SYNTAX: dataframe.corr(method, min_periods)

Parameter     Value                                      Description
method        'kendall', 'pearson', 'spearman', func     Optional, default 'pearson'. Specifies which method to use, or a callable function.
min_periods   Number                                     Optional. Specifies the minimum number of observations required per pair of columns to have a valid result.
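A minimal sketch of corr(). The three columns are hypothetical and deliberately simple: score rises exactly in step with hours, and errors falls exactly in step with hours.

```python
import pandas as pd

# Hypothetical data with perfectly linear relationships.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "score": [10, 20, 30, 40, 50],
    "errors": [5, 4, 3, 2, 1],
})

pearson = df.corr()                       # default method='pearson'
spearman = df.corr(method="spearman")     # rank-based alternative

print(pearson)
print(spearman)
```

The result is a square table with one row and one column per input column. Values close to 1 indicate a strong positive relationship and values close to -1 a strong negative one.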
12. iloc
• The iloc property gets, or sets, the value(s) at the specified
integer positions.
• Specify both row and column with an index.
• To access more than one row, use double brackets and specify
the indexes, separated by commas.
• To slice the DataFrame, specify the from and to indexes, separated
by a colon. The to index is not included in the result.
• SYNTAX: dataframe.iloc[[0, 2]] OR dataframe.iloc[0:2]
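A minimal sketch of position-based access with iloc, reusing the DataFrame from the DATAFRAMES slide. Positions start at 0 and ignore the index labels.

```python
import pandas as pd

data = {"S_No": [420, 380, 390], "Name": ["Dean", "Jane", "Shaw"]}
df = pd.DataFrame(data, index=["Section E", "Section F", "Section G"])

first_row = df.iloc[0]      # row at position 0, returned as a Series
cell = df.iloc[1, 1]        # row position 1, column position 1
picked = df.iloc[[0, 2]]    # double brackets: rows at positions 0 and 2
sliced = df.iloc[0:2]       # slice: positions 0 and 1 (position 2 is excluded)

print(first_row)
print(cell)
print(picked)
print(sliced)
```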
13. apply()
• The apply() method allows you
to apply a function along one of
the axes of the DataFrame.
• The default is 0, the index axis,
which passes each column to the function.
• SYNTAX: dataframe.apply(func,
axis, raw, result_type,
args, kwds)
Parameter     Value                                    Description
func                                                   Required. A function to apply to the DataFrame.
axis          0, 1, 'index', 'columns'                 Optional, default 0. Which axis to apply the function to.
raw           True, False                              Optional, default False. Set to True if the row/column should be passed as an ndarray object.
result_type   'expand', 'reduce', 'broadcast', None    Optional, default None. Specifies how the result will be returned.
args          a tuple                                  Optional, arguments to send into the function.
kwds          keyword arguments                        Optional, keyword arguments to send into the function.
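A minimal sketch of apply() along both axes, plus the args parameter. The helper functions value_range and scale are hypothetical names written for this illustration.

```python
import pandas as pd

# Hypothetical marks data.
df = pd.DataFrame({
    "Maths": [80, 60, 70],
    "Science": [90, 70, 50],
})

def value_range(values):
    """Difference between the largest and smallest value."""
    return values.max() - values.min()

def scale(values, factor):
    """Multiply every value by factor."""
    return values * factor

per_column = df.apply(value_range)           # axis=0: the function receives each column
per_row = df.apply(value_range, axis=1)      # axis=1: the function receives each row
doubled = df.apply(scale, args=(2,))         # args supplies the extra argument factor=2

print(per_column)
print(per_row)
print(doubled)
```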
14. value_counts()
• This method returns a Series containing the frequency of
each distinct row in the Dataframe.
• SYNTAX: dataframe.value_counts(subset=None,
normalize=False, sort=True, ascending=False,
dropna=True)
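A minimal sketch of value_counts() on hypothetical data. Each distinct combination of values in a row is counted once per occurrence.

```python
import pandas as pd

# Hypothetical data: the row ("E", "A") appears twice.
df = pd.DataFrame({
    "Section": ["E", "E", "F", "E"],
    "Grade": ["A", "A", "B", "B"],
})

counts = df.value_counts()                       # frequency of each distinct row
shares = df.value_counts(normalize=True)         # proportions instead of counts
by_section = df.value_counts(subset=["Section"]) # count on the Section column only

print(counts)
print(shares)
print(by_section)
```

The results are sorted from most to least frequent by default (sort=True, ascending=False).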
