EDA All Functions

The document discusses various exploratory data analysis functions in pandas such as df.head(), df.tail(), df.info(), df.describe(), df.shape, df.columns, and df.dtypes. It provides examples of using each function on a sample DataFrame and describes what each function shows.

3/25/24, 10:42 PM EDA All Functions

Notebook by Tariq Ahmed (WP: +923070996076)
1. df.head(n): Returns the first n rows of the DataFrame.
2. df.tail(n): Returns the last n rows of the DataFrame.
3. df.info(): Provides information about the DataFrame, including column names, data types,
and non-null value counts.
4. df.describe(): Computes various descriptive statistics for numerical columns in the
DataFrame, such as count, mean, standard deviation, and percentiles.
5. df.shape: Returns the dimensions (number of rows and columns) of the DataFrame.
6. df.columns: Returns the column names of the DataFrame.
7. df.dtypes: Returns the data types of each column in the DataFrame.
8. df.isnull(): Checks for missing values and returns a DataFrame of the same shape with
True/False values indicating the presence of missing values.
9. df.dropna(): Returns a copy of the DataFrame with every row that contains a missing value removed.
10. df.fillna(value): Fills missing values in the DataFrame with a specified value.
11. df.groupby(by): Groups the DataFrame by one or more columns and returns a GroupBy
object for further aggregation and analysis.
12. df.sort_values(by): Sorts the DataFrame by one or more columns.
13. df.merge(df2): Merges two DataFrames based on common columns or indices.
14. df.pivot_table(values, index, columns): Creates a pivot table from the DataFrame,
aggregating values based on specified columns.
15. df.apply(func): Applies a function along an axis of the DataFrame: to each column by default, or to each row with axis=1.
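Items 13 to 15 in the list above (merge, pivot_table, apply) are not demonstrated later in the notebook. The following is a minimal sketch on two small, made-up tables; the names sales and users and all of their values are hypothetical and only loosely mirror the Diwali sales columns.

```python
import pandas as pd

# Hypothetical sales table (illustrative values only)
sales = pd.DataFrame({
    "User_ID": [1, 2, 1, 3],
    "State": ["Gujarat", "Delhi", "Gujarat", "Delhi"],
    "Gender": ["F", "M", "F", "F"],
    "Amount": [100.0, 250.0, 50.0, 300.0],
})
# Hypothetical lookup table keyed on User_ID
users = pd.DataFrame({"User_ID": [1, 2, 3], "Cust_name": ["Asha", "Ravi", "Meena"]})

# merge: join on the shared User_ID column (an inner join by default)
merged = sales.merge(users, on="User_ID")

# pivot_table: total Amount per State (rows) and Gender (columns);
# State/Gender combinations with no rows are filled with 0
pivot = sales.pivot_table(values="Amount", index="State", columns="Gender",
                          aggfunc="sum", fill_value=0)

# apply: by default the function receives each column as a Series ...
col_max = sales[["User_ID", "Amount"]].apply(lambda col: col.max())
# ... and with axis=1 it receives each row instead
row_label = sales.apply(lambda row: f"{row['State']}-{row['Gender']}", axis=1)

print(merged)
print(pivot)
print(col_max)
print(row_label)
```

For elementwise transformations of a single column, calling a vectorized Series method (for example sales["Amount"] * 2) is usually faster than apply.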

Import libraries
In [2]: import pandas as pd

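Check the encoding Python uses to open the CSV file. Note that open() reports the platform's default encoding (here cp1252, typical on Windows); it does not inspect the file's contents, so the result is only a hint about which encoding to pass to read_csv.

In [5]: with open('Diwali Sales Data.csv') as f:
            print(f)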

<_io.TextIOWrapper name='Diwali Sales Data.csv' mode='r' encoding='cp1252'>
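Because open() only reports Python's default encoding, a rough way to check the file itself is to try decoding its raw bytes with a few candidate encodings. This is a heuristic sketch using only the standard library; the function name guess_encoding and the candidate list are illustrative assumptions. Dedicated packages such as chardet or charset-normalizer perform statistical detection instead.

```python
from pathlib import Path


def guess_encoding(data, candidates=("utf-8", "cp1252")):
    """Return the first candidate encoding that decodes all of `data`, else None.

    Heuristic only: cp1252 accepts almost every byte value, so a successful
    decode does not prove the file was actually written in that encoding.
    """
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


if __name__ == "__main__":
    # Hypothetical usage on the notebook's file, if it exists in the working directory
    path = Path("Diwali Sales Data.csv")
    if path.exists():
        print(guess_encoding(path.read_bytes()))
```

The returned name can then be passed on, for example pd.read_csv('Diwali Sales Data.csv', encoding=enc). UTF-8 is tried first because it is strict: random cp1252 text rarely decodes as valid UTF-8.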

Import the CSV file

In [6]: df=pd.read_csv('Diwali Sales Data.csv',encoding='cp1252')

localhost:8888/nbconvert/html/Class EDA/EDA All Functions.ipynb?download=false 1/9


df.head() displays the first few rows of a DataFrame (five by default, or n rows with df.head(n)). pandas is a popular data manipulation and analysis library.
In [7]: df.head()

Out[7]:
   User_ID  Cust_name  Product_ID  Gender  Age Group  Age  Marital_Status  State           Zone      ...
0  1002903  Sanskriti  P00125942   F       26-35      28   0               Maharashtra     Western   ...
1  1000732  Kartik     P00110942   F       26-35      35   1               Andhra Pradesh  Southern  ...
2  1001990  Bindu      P00118542   F       26-35      35   1               Uttar Pradesh   Central   ...
3  1001425  Sudevi     P00237842   M       0-17       16   0               Karnataka       Southern  ...
4  1000588  Joni       P00057942   M       26-35      28   1               Gujarat         Western   ...

df.tail() returns the last rows of the DataFrame (five by default, or n rows with df.tail(n)).
In [10]: df.tail()

Out[10]:
       User_ID  Cust_name    Product_ID  Gender  Age Group  Age  Marital_Status  State           Zone      ...
11246  1000695  Manning      P00296942   M       18-25      19   1               Maharashtra     Western   ...
11247  1004089  Reichenbach  P00171342   M       26-35      33   0               Haryana         Northern  ...
11248  1001209  Oshin        P00201342   F       36-45      40   0               Madhya Pradesh  Central   ...
11249  1004023  Noonan       P00059442   M       36-45      37   0               Karnataka       Southern  ...
11250  1002744  Brumley      P00281742   F       18-25      19   0               Maharashtra     Western   ...
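Both head() and tail() take an optional row count n. A minimal sketch on a hypothetical 10-row DataFrame (df_demo and its values are made up) shows the default, an explicit n, and a negative n, which for head() returns every row except the last |n|.

```python
import pandas as pd

# Hypothetical 10-row DataFrame standing in for the sales data
df_demo = pd.DataFrame({"Order": range(1, 11),
                        "Amount": [i * 100 for i in range(1, 11)]})

first_five = df_demo.head()          # default n=5
first_three = df_demo.head(3)        # explicit n
last_two = df_demo.tail(2)           # last 2 rows
all_but_last_two = df_demo.head(-2)  # negative n: all rows except the last 2

print(first_three)
print(last_two)
```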

df.info() prints a concise summary of the DataFrame: the number of rows (the index range) and columns, each column's name and data type, the count of non-null values in each column, and the memory usage of the DataFrame.

In [8]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11251 entries, 0 to 11250
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 User_ID 11251 non-null int64
1 Cust_name 11251 non-null object
2 Product_ID 11251 non-null object
3 Gender 11251 non-null object
4 Age Group 11251 non-null object
5 Age 11251 non-null int64
6 Marital_Status 11251 non-null int64
7 State 11251 non-null object
8 Zone 11251 non-null object
9 Occupation 11251 non-null object
10 Product_Category 11251 non-null object
11 Orders 11251 non-null int64
12 Amount 11239 non-null float64
13 Status 0 non-null float64
14 unnamed1 0 non-null float64
dtypes: float64(3), int64(4), object(8)
memory usage: 1.3+ MB
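df.info() prints its report and returns None, so the text cannot be stored by assignment. A minimal sketch, assuming a small hypothetical df_demo, captures the report with the buf parameter and also lists the columns that are entirely empty, as Status and unnamed1 are above (0 non-null).

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: one partly missing column and one entirely empty column
df_demo = pd.DataFrame({
    "Age": [28, 35, 16],
    "Amount": [23952.0, np.nan, 188.0],
    "Status": [np.nan, np.nan, np.nan],
})

buffer = io.StringIO()
result = df_demo.info(buf=buffer)  # writes the report into buffer, returns None
report = buffer.getvalue()
print(report)

# Columns in which every value is missing
empty_cols = df_demo.columns[df_demo.isna().all()].tolist()
print(empty_cols)
```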

df.describe() generates descriptive statistics for the numeric columns of a DataFrame: count, mean, standard deviation, minimum, the 25th, 50th (median) and 75th percentiles, and maximum.
In [9]: df.describe()

Out[9]:
            User_ID           Age  Marital_Status        Orders        Amount  Status  unnamed1
count  1.125100e+04  11251.000000    11251.000000  11251.000000  11239.000000     0.0       0.0
mean   1.003004e+06     35.421207        0.420318      2.489290   9453.610858     NaN       NaN
std    1.716125e+03     12.754122        0.493632      1.115047   5222.355869     NaN       NaN
min    1.000001e+06     12.000000        0.000000      1.000000    188.000000     NaN       NaN
25%    1.001492e+06     27.000000        0.000000      1.500000   5443.000000     NaN       NaN
50%    1.003065e+06     33.000000        0.000000      2.000000   8109.000000     NaN       NaN
75%    1.004430e+06     43.000000        1.000000      3.000000  12675.000000     NaN       NaN
max    1.006040e+06     92.000000        1.000000      4.000000  23952.000000     NaN       NaN
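describe() can also summarize text columns and report other percentiles. A minimal sketch on a hypothetical df_demo: calling describe() on a non-numeric column returns count, unique, top (most frequent value) and freq, and the percentiles argument changes which quantiles appear (the median is always included).

```python
import pandas as pd

# Hypothetical data with one text and one numeric column
df_demo = pd.DataFrame({
    "Gender": ["F", "F", "M", "F"],
    "Amount": [100.0, 200.0, 300.0, 400.0],
})

numeric_summary = df_demo.describe()            # numeric columns only (default)
text_summary = df_demo["Gender"].describe()     # count, unique, top, freq
custom = df_demo.describe(percentiles=[0.1, 0.9])  # 10%, 50% and 90% rows

print(numeric_summary)
print(text_summary)
print(custom)
```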



df.shape returns the dimensions of the DataFrame as a (rows, columns) tuple. It is an attribute, so it is written without parentheses.
In [11]: df.shape
Out[11]: (11251, 15)

df.columns returns the column labels of the DataFrame as an Index object.
In [13]: df.columns
Out[13]:
Index(['User_ID', 'Cust_name', 'Product_ID', 'Gender', 'Age Group', 'Age',
       'Marital_Status', 'State', 'Zone', 'Occupation', 'Product_Category',
       'Orders', 'Amount', 'Status', 'unnamed1'],
      dtype='object')
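Since shape is a tuple and columns is an Index, both are easy to use in code. A minimal sketch on a hypothetical df_demo unpacks the dimensions, converts the column labels to a plain list, and checks whether a column exists.

```python
import pandas as pd

# Hypothetical three-column DataFrame
df_demo = pd.DataFrame({
    "User_ID": [1, 2, 3],
    "Age Group": ["0-17", "26-35", "55+"],
    "Amount": [1.0, 2.0, 3.0],
})

n_rows, n_cols = df_demo.shape           # tuple unpacking: (rows, columns)
column_names = df_demo.columns.tolist()  # Index -> plain Python list
has_amount = "Amount" in df_demo.columns  # membership test on column labels

print(n_rows, n_cols, column_names, has_amount)
```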

df.dtypes shows the data type of each column in the DataFrame.
In [14]: df.dtypes
Out[14]:
User_ID               int64
Cust_name            object
Product_ID           object
Gender               object
Age Group            object
Age                   int64
Marital_Status        int64
State                object
Zone                 object
Occupation           object
Product_Category     object
Orders                int64
Amount              float64
Status              float64
unnamed1            float64
dtype: object
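The dtypes are often used to pick out columns of one kind. A minimal sketch on a hypothetical df_demo uses select_dtypes to separate numeric from non-numeric columns and astype to change a column's type.

```python
import pandas as pd

# Hypothetical mixed-type DataFrame
df_demo = pd.DataFrame({
    "Age": [28, 35],
    "Amount": [9453.6, 188.0],
    "State": ["Delhi", "Gujarat"],
})

numeric_cols = df_demo.select_dtypes(include="number").columns.tolist()
text_cols = df_demo.select_dtypes(exclude="number").columns.tolist()

# Convert a column to another dtype and assign it back
df_demo["Age"] = df_demo["Age"].astype("float64")
print(df_demo.dtypes)
```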

df.isnull(): Checks for missing values and returns a DataFrame of the same shape with True/False values indicating the presence of missing values.
In [15]: df.isnull()



Out[15]:
       User_ID  Cust_name  Product_ID  Gender  Age Group  Age    Marital_Status  State  Zone   Occupation  ...
0      False    False      False       False   False      False  False           False  False  False       ...
1      False    False      False       False   False      False  False           False  False  False       ...
2      False    False      False       False   False      False  False           False  False  False       ...
3      False    False      False       False   False      False  False           False  False  False       ...
4      False    False      False       False   False      False  False           False  False  False       ...
...    ...      ...        ...         ...     ...        ...    ...             ...    ...    ...         ...
11246  False    False      False       False   False      False  False           False  False  False       ...
11247  False    False      False       False   False      False  False           False  False  False       ...
11248  False    False      False       False   False      False  False           False  False  False       ...
11249  False    False      False       False   False      False  False           False  False  False       ...
11250  False    False      False       False   False      False  False           False  False  False       ...

11251 rows × 15 columns

df.isnull().sum() counts the missing values in each column (each True counts as 1 when summed).
In [16]: df.isnull().sum()
Out[16]:
User_ID                 0
Cust_name               0
Product_ID              0
Gender                  0
Age Group               0
Age                     0
Marital_Status          0
State                   0
Zone                    0
Occupation              0
Product_Category        0
Orders                  0
Amount                 12
Status              11251
unnamed1            11251
dtype: int64
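Raw counts are easier to judge as percentages of the row count. A minimal sketch on a hypothetical df_demo computes the share of missing values per column (the mean of a True/False column is the fraction of True values), lists only the columns that have gaps, and totals the missing values in the whole DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Amount is half missing, Status is entirely missing
df_demo = pd.DataFrame({
    "Amount": [100.0, np.nan, 300.0, np.nan],
    "Status": [np.nan] * 4,
    "Orders": [1, 2, 3, 4],
})

missing_count = df_demo.isnull().sum()         # NaNs per column
missing_pct = df_demo.isnull().mean() * 100    # percent of NaNs per column
cols_with_missing = missing_count[missing_count > 0].index.tolist()
total_missing = int(df_demo.isnull().sum().sum())  # NaNs in the whole DataFrame

print(missing_pct)
```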

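df.dropna(): Returns a new DataFrame without the rows that contain any missing value. Because every row has NaN in Status (and in unnamed1), the result here is empty: only the column headers are displayed.
In [17]: df.dropna()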



Out[17]:
   User_ID  Cust_name  Product_ID  Gender  Age Group  Age  Marital_Status  State  Zone  Occupation  Product_Category  ...
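When a column is entirely empty, a plain dropna() removes every row, as happened above. A minimal sketch on a hypothetical df_demo shows the usual alternatives: drop all-NaN columns first with axis=1 and how="all", then drop rows only where a specific column is missing with subset.

```python
import numpy as np
import pandas as pd

# Hypothetical data: Status is empty, like Status in the sales data
df_demo = pd.DataFrame({
    "Amount": [100.0, np.nan, 300.0],
    "Status": [np.nan, np.nan, np.nan],
})

everything_dropped = df_demo.dropna()              # every row has a NaN in Status
no_empty_cols = df_demo.dropna(axis=1, how="all")  # drop columns that are all NaN
clean = no_empty_cols.dropna(subset=["Amount"])    # drop rows missing Amount only

print(everything_dropped.shape)
print(clean)
```

dropna() returns a new DataFrame, so df_demo itself keeps all three rows unless the result is assigned back.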

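df.drop('column name', axis=1, inplace=True) removes an entire column from the DataFrame (axis=1 refers to columns). Here it drops unnamed1, which contains only missing values.
In [23]: df.drop('unnamed1', axis=1, inplace=True)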
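drop() also accepts a columns= keyword and a list of labels, which avoids the axis argument. A minimal sketch on a hypothetical df_demo removes several columns at once without inplace and uses errors="ignore" so that a label that is no longer present does not raise a KeyError.

```python
import pandas as pd

# Hypothetical data with two unwanted columns
df_demo = pd.DataFrame({
    "Amount": [1.0, 2.0],
    "Status": [None, None],
    "unnamed1": [None, None],
})

# Returns a new DataFrame; df_demo is left unchanged
trimmed = df_demo.drop(columns=["Status", "unnamed1"])

# unnamed1 is already gone, so errors="ignore" skips it instead of raising
trimmed_again = trimmed.drop(columns=["unnamed1"], errors="ignore")

print(trimmed.columns.tolist())
```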

df.fillna(value): Fills missing values in the DataFrame with a specified value and returns a new DataFrame (df itself is unchanged).
In [27]: # Fill missing values with a constant value
         df_filled = df.fillna(0)
         df_filled

Out[27]:
       User_ID  Cust_name    Product_ID  Gender  Age Group  Age  Marital_Status  State           Zone      ...
0      1002903  Sanskriti    P00125942   F       26-35      28   0               Maharashtra     Western   ...
1      1000732  Kartik       P00110942   F       26-35      35   1               Andhra Pradesh  Southern  ...
2      1001990  Bindu        P00118542   F       26-35      35   1               Uttar Pradesh   Central   ...
3      1001425  Sudevi       P00237842   M       0-17       16   0               Karnataka       Southern  ...
4      1000588  Joni         P00057942   M       26-35      28   1               Gujarat         Western   ...
...    ...      ...          ...         ...     ...        ...  ...             ...             ...       ...
11246  1000695  Manning      P00296942   M       18-25      19   1               Maharashtra     Western   ...
11247  1004089  Reichenbach  P00171342   M       26-35      33   0               Haryana         Northern  ...
11248  1001209  Oshin        P00201342   F       36-45      40   0               Madhya Pradesh  Central   ...
11249  1004023  Noonan       P00059442   M       36-45      37   0               Karnataka       Southern  ...
11250  1002744  Brumley      P00281742   F       18-25      19   0               Maharashtra     Western   ...

11251 rows × 14 columns
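A single constant such as 0 is rarely right for every column. fillna() also accepts a dict mapping column names to fill values; a minimal sketch on a hypothetical df_demo (the "Unknown" placeholder is an illustrative choice).

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in a numeric and a text column
df_demo = pd.DataFrame({
    "Amount": [100.0, np.nan, 300.0],
    "Occupation": ["IT", None, "Banking"],
})

# Each column gets its own fill value; columns not in the dict are untouched
filled = df_demo.fillna({"Amount": 0.0, "Occupation": "Unknown"})
print(filled)
```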

Fill missing values with the mean of the column



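df['column name'] = df['column name'].fillna(df['column name'].mean())
Assigning the result back is safer than df['column name'].fillna(..., inplace=True): newer pandas versions warn that calling an inplace method on a selected column is chained assignment, and with copy-on-write enabled it does not update df.
In [33]: df['Amount'] = df['Amount'].fillna(df['Amount'].mean())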

In [34]: # check that Amount no longer has missing values
         df.isnull().sum()
Out[34]:
User_ID                 0
Cust_name               0
Product_ID              0
Gender                  0
Age Group               0
Age                     0
Marital_Status          0
State                   0
Zone                    0
Occupation              0
Product_Category        0
Orders                  0
Amount                  0
Status              11251
dtype: int64
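The mean is pulled toward large values, so the median is a common alternative when a column is skewed. A minimal sketch on a hypothetical df_demo with one outlier compares the two fills and assigns the result back rather than using inplace=True.

```python
import numpy as np
import pandas as pd

# Hypothetical Amount column with one gap and one large outlier
df_demo = pd.DataFrame({"Amount": [100.0, 200.0, np.nan, 10_000.0]})

# mean() and median() skip NaN by default
mean_filled = df_demo["Amount"].fillna(df_demo["Amount"].mean())
median_filled = df_demo["Amount"].fillna(df_demo["Amount"].median())

# Assign back to the DataFrame instead of calling fillna(..., inplace=True) on the column
df_demo["Amount"] = median_filled
print(mean_filled[2], median_filled[2])
```

Here the mean of the three known values is about 3433.33 while the median is 200.0, so the outlier dominates the mean-based fill.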

df.groupby(by) groups a DataFrame by one or more columns. It splits the rows into groups based on the unique values in the specified column(s) so that an operation, such as a mean, can be applied to each group independently.
In [53]: grouped = df.groupby(['Product_ID', 'Cust_name'])
         mean_age = grouped['Age'].mean()
         print(mean_age)

Product_ID  Cust_name
P00000142   Adrian       19.0
            Akshat       27.0
            Armstrong    34.0
            Arun         33.0
            Atkinson     46.0
                         ...
P0099442    Amol         26.0
            Astrea       35.0
            Grant        32.0
            Siddharth    36.0
P0099742    Shatayu      13.0
Name: Age, Length: 10948, dtype: float64

In [54]: # the same result in one line
         mean_values = df.groupby(['Product_ID', 'Cust_name'])['Age'].mean()
         mean_values

Out[54]:
Product_ID  Cust_name
P00000142   Adrian       19.0
            Akshat       27.0
            Armstrong    34.0
            Arun         33.0
            Atkinson     46.0
                         ...
P0099442    Amol         26.0
            Astrea       35.0
            Grant        32.0
            Siddharth    36.0
P0099742    Shatayu      13.0
Name: Age, Length: 10948, dtype: float64
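groupby() is most often followed by one or more aggregations. A minimal sketch on a hypothetical df_demo uses named aggregation in agg() to compute several statistics per State with readable column names, sorts the groups by total spend, and uses as_index=False to keep the group key as an ordinary column.

```python
import pandas as pd

# Hypothetical sales rows
df_demo = pd.DataFrame({
    "State": ["Delhi", "Gujarat", "Delhi", "Gujarat", "Delhi"],
    "Gender": ["F", "M", "M", "F", "F"],
    "Amount": [100.0, 200.0, 300.0, 400.0, 500.0],
    "Orders": [1, 2, 3, 4, 1],
})

# Named aggregation: output_column=(input_column, function)
summary = df_demo.groupby("State").agg(
    total_amount=("Amount", "sum"),
    avg_amount=("Amount", "mean"),
    n_orders=("Orders", "sum"),
)

# Groups ordered by total spend, largest first
top_states = summary.sort_values("total_amount", ascending=False)

# as_index=False returns Gender as a regular column instead of the index
by_gender = df_demo.groupby("Gender", as_index=False)["Amount"].sum()

print(top_states)
print(by_gender)
```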

df.sort_values(by): Sorts the DataFrame by one or more columns (ascending by default) and returns the sorted DataFrame.
In [59]: # df.sort_values(by='Column1')  # sort by a single column
         df.sort_values(by='Amount')

Out[59]:
       User_ID  Cust_name    Product_ID  Gender  Age Group  Age  Marital_Status  State           Zone      ...
11250  1002744  Brumley      P00281742   F       18-25      19   0               Maharashtra     Western   ...
11249  1004023  Noonan       P00059442   M       36-45      37   0               Karnataka       Southern  ...
11248  1001209  Oshin        P00201342   F       36-45      40   0               Madhya Pradesh  Central   ...
11247  1004089  Reichenbach  P00171342   M       26-35      33   0               Haryana         Northern  ...
11246  1000695  Manning      P00296942   M       18-25      19   1               Maharashtra     Western   ...
...    ...      ...          ...         ...     ...        ...  ...             ...             ...       ...
4      1000588  Joni         P00057942   M       26-35      28   1               Gujarat         Western   ...
3      1001425  Sudevi       P00237842   M       0-17       16   0               Karnataka       Southern  ...
2      1001990  Bindu        P00118542   F       26-35      35   1               Uttar Pradesh   Central   ...
1      1000732  Kartik       P00110942   F       26-35      35   1               Andhra Pradesh  Southern  ...
0      1002903  Sanskriti    P00125942   F       26-35      28   0               Maharashtra     Western   ...

11251 rows × 14 columns

In [56]: # Sort by multiple columns: by Age, then by Amount within equal ages
         # df.sort_values(by=['Column1', 'Column2'])
         df.sort_values(by=['Age', 'Amount'])



Out[56]:
       User_ID  Cust_name  Product_ID  Gender  Age Group  Age  Marital_Status  State           Zone      ...
11240  1001425  Sudevi     P00044742   F       0-17       12   0               Delhi           Central   ...
11109  1004135  Jayanti    P00229742   F       0-17       12   0               Delhi           Central   ...
10804  1001673  Lampkin    P00277442   F       0-17       12   0               Gujarat         Western   ...
10774  1001926  Barton     P00157542   M       0-17       12   1               Madhya Pradesh  Central   ...
9505   1005403  Caroline   P00195742   M       0-17       12   1               Haryana         Northern  ...
...    ...      ...        ...         ...     ...        ...  ...             ...             ...       ...
2951   1002204  Dilbeck    P00246642   M       55+        92   0               Madhya Pradesh  Central   ...
2698   1005658  Poirier    P00227942   M       55+        92   0               Karnataka       Southern  ...
1106   1001176  Alice      P00128942   M       55+        92   0               Uttar Pradesh   Central   ...
612    1002526  Shreya     P00271142   M       55+        92   1               Uttar Pradesh   Central   ...
359    1003036  Prescott   P00255842   F       55+        92   0               Uttarakhand     Central   ...

11251 rows × 14 columns
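sort_values() also controls direction per column and where missing values go. A minimal sketch on a hypothetical df_demo sorts descending, mixes directions across two columns with a list passed to ascending, moves NaN to the top with na_position="first", and uses nlargest() as a shortcut for the top rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a tie in Age and one missing Amount
df_demo = pd.DataFrame({
    "Age": [35, 12, 35, 92],
    "Amount": [500.0, 188.0, np.nan, 23952.0],
})

# Largest Amount first; NaN is placed last by default
largest_first = df_demo.sort_values(by="Amount", ascending=False)

# Age ascending, and within equal ages, Amount descending
mixed = df_demo.sort_values(by=["Age", "Amount"], ascending=[True, False])

# Missing Amounts first instead of last
nan_first = df_demo.sort_values(by="Amount", na_position="first")

# The two rows with the highest Amount (NaN is ignored)
top_two = df_demo.nlargest(2, "Amount")

print(largest_first)
print(mixed)
```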
