Data Handling Using Pandas-I
Pandas:
It is a package useful for data analysis and manipulation.
Pandas provides an easy way to create, manipulate and wrangle data.
Pandas provides powerful and easy-to-use data structures, as well as the means to quickly perform operations on these structures. Its data structures are:
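1. Series (one-dimensional)
2. Data Frame (two-dimensional)
3. Panel (three-dimensional; no longer available in current versions of pandas)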
e.g.- a Series with the default index:
Index Data
0 10
1 15
2 18
3 22
How to create a Series with an ndarray
Program-
import pandas as pd
import numpy as np
arr=np.array([10,15,18,22])   # Here we create an array of 4 values.
s = pd.Series(arr)
print(s)
Output-
0 10
1 15
2 18
3 22
(The left column is the default index; the right column is the data.)
Program-
import pandas as pd
import numpy as np
arr=np.array(['a','b','c','d'])
s=pd.Series(arr, index=['first','second','third','fourth'])
print(s)
Output-
first a
second b
third c
fourth d
Creating a series from Scalar value
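A minimal sketch of creating a Series from a scalar value: when an index is supplied, the scalar is repeated once for every label. The value 7 and the labels 'a', 'b', 'c' are illustrative.

```python
import pandas as pd

# The scalar 7 is repeated once for each label in the index.
s = pd.Series(7, index=['a', 'b', 'c'])
print(s)
```

Without an index, pd.Series(7) gives a Series with a single element.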
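head(): It is used to access the first 5 rows of a series. head(n), e.g. s.head(3), returns the first n rows.
tail(): It is used to access the last 5 rows of a series. tail(n) returns the last n rows.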
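A short sketch of head() and tail() on a Series with 8 illustrative values, showing the default of 5 rows and an explicit n.

```python
import pandas as pd

# A Series with 8 values and the default index 0..7 (values are illustrative).
s = pd.Series([10, 15, 18, 22, 30, 35, 40, 45])

first_five = s.head()     # first 5 rows (default n=5)
first_three = s.head(3)   # first 3 rows
last_five = s.tail()      # last 5 rows (default n=5)
last_two = s.tail(2)      # last 2 rows
print(first_three)
print(last_two)
```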
A Series provides loc, iloc and [] to access its elements (rows).
1. Selection Using loc (index label) :-
2. Selection Using iloc (integer position) :-
3. Selection Using [] :
series_name[index]
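A minimal sketch of the three selection methods, reusing the Series with the index first/second/third/fourth created earlier. Note that a loc slice includes its end label, while an iloc slice excludes its end position.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd'], index=['first', 'second', 'third', 'fourth'])

# 1. loc selects by index label; a label slice includes both ends.
print(s.loc['second'])           # b
print(s.loc['second':'third'])   # second -> b, third -> c

# 2. iloc selects by integer position; a position slice excludes the end.
print(s.iloc[0])                 # a
print(s.iloc[1:3])               # positions 1 and 2 -> b, c

# 3. [] with an index label.
print(s['fourth'])               # d
```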
Indexing in Series
Slicing in Series
A Series is sliced as series_name[start:end:step], where start is the position of the first item, end is the position where slicing stops (the item at end is not included), and step is the increment between the items you would like.
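A short sketch of start, end and step on a Series with the default integer index (values are illustrative). With the default index, these slices are positional.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22, 30, 35])   # default index 0..5

print(s[1:4])     # start=1, end=4 (exclusive): positions 1, 2, 3
print(s[::2])     # every 2nd item: positions 0, 2, 4
print(s[::-1])    # step -1 reverses the series
```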
DATAFRAME
DATAFRAME STRUCTURE
INDEX DATA
0 ROHIT MI 13
1 VIRAT RCB 17
2 HARDIK MI 14
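A minimal sketch that builds the table above as a DataFrame. The column names Name, Team and Value are hypothetical, since the table does not name its columns; the row index 0, 1, 2 is created by default.

```python
import pandas as pd

# Column names are hypothetical; the original table does not name its columns.
players = {
    'Name': ['ROHIT', 'VIRAT', 'HARDIK'],
    'Team': ['MI', 'RCB', 'MI'],
    'Value': [13, 17, 14],
}
df = pd.DataFrame(players)
print(df)
print(df.shape)     # (rows, columns)
```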
PROPERTIES OF DATAFRAME
A DataFrame can be created from:
1. Series
2. Lists
3. Dictionary
4. A numpy 2D array
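A minimal sketch of two of the sources listed above, a list of lists and a 2D NumPy array. The column names num, char, A, B and C and all values are illustrative.

```python
import numpy as np
import pandas as pd

# From a list of lists: each inner list becomes one row.
df_list = pd.DataFrame([[1, 'a'], [2, 'b']], columns=['num', 'char'])

# From a 2D NumPy array: array rows become DataFrame rows.
arr = np.array([[10, 20, 30], [40, 50, 60]])
df_arr = pd.DataFrame(arr, columns=['A', 'B', 'C'])

print(df_list)
print(df_arr)
```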
Program-
import pandas as pd
s = pd.Series(['a','b','c','d'])
df=pd.DataFrame(s)
print(df)
Output-
0
0 a
1 b
2 c
3 d
(The default column name is 0.)
Data Frame from Dictionary of Series
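A minimal sketch of a DataFrame built from a dictionary of Series: each key becomes a column, and the row index is the union of the Series' indexes. Labels and values are illustrative.

```python
import pandas as pd

# Each key becomes a column; the row index is the union of the Series' indexes.
data = {
    'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
    'two': pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd']),
}
df = pd.DataFrame(data)
print(df)   # 'one' has no value for label 'd', so that cell is NaN
```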
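Iteration on Rows and Columns
1. iterrows(): iterates over the rows, giving (index, row) pairs, where each row is a Series.
2. iteritems(): iterates over the columns, giving (column name, column) pairs. It was removed in pandas 2.0; use items() instead.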
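A short sketch of both iteration methods on a small two-column DataFrame (data taken from the employee example). It uses items(), the current name for iteritems().

```python
import pandas as pd

df = pd.DataFrame({'empid': [101, 102], 'ename': ['Sachin', 'Vinod']})

# iterrows() yields (index label, row as a Series) pairs.
for idx, row in df.iterrows():
    print(idx, row['empid'], row['ename'])

# items() yields (column name, column as a Series) pairs.
for col_name, col in df.items():
    print(col_name, list(col))
```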
Select operation in data frame
Example -
import pandas as pd
empdata={ 'empid':[101,102,103,104,105,106],
'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi'],
'Doj':['12-01-2012','15-01-2012','05-09-2007','17-01-2012','05-09-2007','16-01-2012'] }
df=pd.DataFrame(empdata)
print(df)
Output-
empid ename Doj
0 101 Sachin 12-01-2012
1 102 Vinod 15-01-2012
2 103 Lakhbir 05-09-2007
3 104 Anil 17-01-2012
4 105 Devinder 05-09-2007
5 106 UmaSelvi 16-01-2012
>>df.empid or df['empid']
0 101
1 102
2 103
3 104
4 105
5 106
Name: empid, dtype: int64
>>df[['empid','ename']]
empid ename
0 101 Sachin
1 102 Vinod
2 103 Lakhbir
3 104 Anil
4 105 Devinder
5 106 UmaSelvi
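To Add & Rename a column in a data frame
import pandas as pd
s = pd.Series([10,15,18,22])
df=pd.DataFrame(s)
df.columns=['List1']                 # rename the default column 0 to List1
print(df)
df['List2']=20                       # add a new column with a constant value
print(df)
df['List3']=df['List1']+df['List2']  # add a column computed from two columns
print(df)
Output-
List1
0 10
1 15
2 18
3 22
List1 List2
0 10 20
1 15 20
2 18 20
3 22 20
List1 List2 List3
0 10 20 30
1 15 20 35
2 18 20 38
3 22 20 42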
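Columns can also be renamed with the rename() method, which maps old names to new ones and returns a new DataFrame. A minimal sketch using the same values:

```python
import pandas as pd

df = pd.DataFrame(pd.Series([10, 15, 18, 22]))   # default column name is 0

# rename() maps old column names to new ones and returns a new DataFrame.
df = df.rename(columns={0: 'List1'})
df['List2'] = 20                          # add a column with a constant value
df['List3'] = df['List1'] + df['List2']   # add a column computed from others
print(df)
```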
To Delete a Column using drop()
import pandas as pd
s= pd.Series([10,20,30,40])
df=pd.DataFrame(s)
df.columns=['List1']
df['List2']=40
df1=df.drop('List2',axis=1)   # axis=1 means to delete data column-wise
df2=df1.drop([2,3],axis=0)    # axis=0 means to delete data row-wise, using the given index labels
print(df)
print("After deletion::")
print(df1)
print("After row deletion::")
print(df2)
Output-
List1 List2
0 10 40
1 20 40
2 30 40
3 40 40
After deletion::
List1
0 10
1 20
2 30
3 40
After row deletion::
List1
0 10
1 20
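drop() also accepts the columns= and index= keywords in place of axis=1 and axis=0. A short sketch with the same values; drop() returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

df = pd.DataFrame({'List1': [10, 20, 30, 40], 'List2': 40})

# columns= and index= are an alternative to passing axis=1 / axis=0.
no_list2 = df.drop(columns=['List2'])    # same as df.drop('List2', axis=1)
no_rows = df.drop(index=[2, 3])          # same as df.drop([2, 3], axis=0)
print(no_list2)
print(no_rows)
```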
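Accessing the data frame through loc[] and iloc[] or indexing using labels
Pandas provides the loc[] and iloc[] indexers to access a subset of a data frame by row/column. loc[] selects by labels and iloc[] selects by integer positions.
Syntax-
df.loc[row_labels, column_labels]
Syntax-
df.iloc[row_positions, column_positions]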
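A minimal sketch of loc[] and iloc[] on a cut-down employee DataFrame. As with a Series, a loc slice includes its end label, while an iloc slice excludes its end position.

```python
import pandas as pd

empdata = {'empid': [101, 102, 103, 104],
           'ename': ['Sachin', 'Vinod', 'Lakhbir', 'Anil']}
df = pd.DataFrame(empdata)

# loc: row labels and column labels; a label slice includes the end label.
print(df.loc[1:2, 'ename'])    # rows 1 and 2 -> Vinod, Lakhbir

# iloc: integer positions; a position slice excludes the end.
print(df.iloc[1:3, 0])         # rows at positions 1, 2 of column 0 -> 102, 103
print(df.iloc[0, 1])           # single cell -> Sachin
```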
The method head() returns the first 5 rows and the method tail() returns the last 5 rows.
import pandas as pd
empdata={ 'empid':[101,102,103,104,105,106],
'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi'],
'Doj':['12-01-2012','15-01-2012','05-09-2007','17-01-2012','05-09-2007','16-01-2012'] }
df=pd.DataFrame(empdata)
print(df)
print(df.head())
print(df.tail())
Output-
Data Frame:
empid ename Doj
0 101 Sachin 12-01-2012
1 102 Vinod 15-01-2012
2 103 Lakhbir 05-09-2007
3 104 Anil 17-01-2012
4 105 Devinder 05-09-2007
5 106 UmaSelvi 16-01-2012
head() displays the first 5 rows:
empid ename Doj
0 101 Sachin 12-01-2012
1 102 Vinod 15-01-2012
2 103 Lakhbir 05-09-2007
3 104 Anil 17-01-2012
4 105 Devinder 05-09-2007
tail() displays the last 5 rows:
empid ename Doj
1 102 Vinod 15-01-2012
2 103 Lakhbir 05-09-2007
3 104 Anil 17-01-2012
4 105 Devinder 05-09-2007
5 106 UmaSelvi 16-01-2012
To display the first 2 rows we can use head(2), to return the last 2 rows we can use tail(2), and to return the 3rd to 5th rows (positions 2, 3 and 4) we can write df[2:5].
import pandas as pd
empdata={ 'empid':[101,102,103,104,105,106],
'ename':['Sachin','Vinod','Lakhbir','Anil','Devinder','UmaSelvi'],
'Doj':['12-01-2012','15-01-2012','05-09-2007','17-01-2012','05-09-2007','16-01-2012'] }
df=pd.DataFrame(empdata)
print(df)
print(df.head(2))
print(df.tail(2))
print(df[2:5])
Output-
empid ename Doj
0 101 Sachin 12-01-2012
1 102 Vinod 15-01-2012
2 103 Lakhbir 05-09-2007
3 104 Anil 17-01-2012
4 105 Devinder 05-09-2007
5 106 UmaSelvi 16-01-2012
head(2):
empid ename Doj
0 101 Sachin 12-01-2012
1 102 Vinod 15-01-2012
tail(2):
empid ename Doj
4 105 Devinder 05-09-2007
5 106 UmaSelvi 16-01-2012
df[2:5]:
empid ename Doj
2 103 Lakhbir 05-09-2007
3 104 Anil 17-01-2012
4 105 Devinder 05-09-2007
To concatenate DataFrames along columns, you can pass the axis parameter as 1 to pd.concat().
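A minimal sketch of pd.concat() in both directions, with illustrative frames: axis=0 (the default) stacks rows, and axis=1 places frames side by side, aligning rows on the index.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
df3 = pd.DataFrame({'C': [9, 10]})

# axis=0 stacks rows; ignore_index=True renumbers them 0..n-1.
rows = pd.concat([df1, df2], axis=0, ignore_index=True)

# axis=1 places DataFrames side by side, aligning rows on the index.
cols = pd.concat([df1, df3], axis=1)
print(rows)
print(cols)
```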
Merge operation in data frame
Two Data Frames might hold different kinds of information about the same entity and be linked by some common feature/column. To join these Data Frames, pandas provides functions such as merge() and join().
1. Full Outer Join :- The full outer join combines the results of both the left and the right outer joins. The joined data frame contains all records from both data frames and fills in NaN for missing matches on either side. You can perform a full outer join by passing outer in the how argument of the merge() function.
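A minimal sketch of a full outer join, assuming two hypothetical data frames that share the key column empid.

```python
import pandas as pd

# Two hypothetical data frames sharing the key column 'empid'.
left = pd.DataFrame({'empid': [101, 102, 103], 'ename': ['Sachin', 'Vinod', 'Anil']})
right = pd.DataFrame({'empid': [102, 103, 104], 'salary': [50000, 60000, 55000]})

# how='outer' keeps every empid from both sides; missing values become NaN.
outer = pd.merge(left, right, on='empid', how='outer')
print(outer)
```

Here empid 101 has no salary and empid 104 has no ename, so each gets NaN in the missing column.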
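2. Inner Join :- The inner join keeps only the records whose key appears in both data frames. This is the default in the merge() function; you can also pass inner in the how argument.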
3. Right Join :- The right join produces a complete set of records from data frame B (right side data frame) with the matching records (where available) in data frame A (left side data frame). If there is no match, the columns from data frame A will contain NaN. You have to pass right in the how argument of the merge() function.
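A minimal sketch of a right join with the same hypothetical data frames: every empid from the right frame is kept, in its original order.

```python
import pandas as pd

left = pd.DataFrame({'empid': [101, 102, 103], 'ename': ['Sachin', 'Vinod', 'Anil']})
right = pd.DataFrame({'empid': [102, 103, 104], 'salary': [50000, 60000, 55000]})

# how='right' keeps all rows of the right frame; empid 104 has no ename (NaN).
right_join = pd.merge(left, right, on='empid', how='right')
print(right_join)
```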
4. Left Join :- The left join produces a complete set of records from data frame A (left side data frame) with the matching records (where available) in data frame B (right side data frame). If there is no match, the columns from data frame B will contain NaN. You have to pass left in the how argument of the merge() function.
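A minimal sketch of a left join with the same hypothetical data frames: every empid from the left frame is kept, in its original order.

```python
import pandas as pd

left = pd.DataFrame({'empid': [101, 102, 103], 'ename': ['Sachin', 'Vinod', 'Anil']})
right = pd.DataFrame({'empid': [102, 103, 104], 'salary': [50000, 60000, 55000]})

# how='left' keeps all rows of the left frame; empid 101 has no salary (NaN).
left_join = pd.merge(left, right, on='empid', how='left')
print(left_join)
```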
5. Joining on Index :- Sometimes you have to perform the join on the indexes (row labels). For that, set right_index (for the index of the right data frame) and left_index (for the index of the left data frame) to True.
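A minimal sketch of joining on the index, assuming two hypothetical data frames whose row labels are employee ids. With no how argument, merge() performs an inner join.

```python
import pandas as pd

# Row labels act as the key; names and values are illustrative.
left = pd.DataFrame({'ename': ['Sachin', 'Vinod']}, index=[101, 102])
right = pd.DataFrame({'salary': [50000, 60000]}, index=[101, 102])

# left_index=True and right_index=True join on the row labels of both frames.
joined = pd.merge(left, right, left_index=True, right_index=True)
print(joined)
```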
CSV File