Dataframe Notes
Pandas:
It is a Python package for data analysis and manipulation.
Pandas provides an easy way to create, manipulate and
wrangle data.
Pandas provides powerful and easy-to-use data structures,
as well as the means to quickly perform operations on these
structures.
Data structures in pandas:
1. Series
2. Data Frame
3. Panel (deprecated in pandas 0.20 and removed in pandas 0.25)
Index Data
0 10
1 15
2 18
3 22
CREATED BY: SACHIN BHARDWAJ PGT(CS) KV NO1 TEZPUR, VINOD VERMA PGT (CS) KV OEF KANPUR
Syntax to create a Series:
<Series Object> = pandas.Series(data, index=idx)
The index argument is optional. data may be a Python sequence (such as a list),
an ndarray, a scalar value or a Python dictionary.
Program-
import pandas as pd
import numpy as np
arr = np.array([10, 15, 18, 22])    # Here we create an array of 4 values.
s = pd.Series(arr)
print(s)
Output-
Default Index    Data
0                10
1                15
2                18
3                22
How to create Series with Mutable index
Program-
print(s)
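A minimal sketch of a Series with a user-defined index, assuming that "mutable index" here means an index that the user supplies and can later reassign. The labels 'a' to 'd' and 'w' to 'z' are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical labels, not taken from the notes.
s = pd.Series([10, 15, 18, 22], index=['a', 'b', 'c', 'd'])
print(s)

# The whole index can be replaced by assigning a new list of labels.
s.index = ['w', 'x', 'y', 'z']
print(s)
```

The data values stay the same; only the labels attached to them change.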
Creating a series from Scalar value
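A short sketch of this case, assuming an index of four positions (the value 5 and the index are illustrative only). When data is a scalar, an index is needed so that pandas knows how many times to repeat the value.

```python
import pandas as pd

# The scalar 5 is repeated once for every index label.
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)
```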
head(): It is used to access the first 5 rows of a series.
Note: To access the first 3 rows we can call
series_name.head(3)
tail(): It is used to access the last 5 rows of a series.
Note: To access the last 4 rows we can call
series_name.tail(4)
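The following sketch shows head() and tail() on a small Series. The seven values are made up for illustration.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22, 25, 30, 34])

first_five = s.head()     # first 5 rows by default
first_three = s.head(3)   # first 3 rows
last_five = s.tail()      # last 5 rows by default
last_four = s.tail(4)     # last 4 rows

print(first_three)
print(last_four)
```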
Selection in Series
A Series provides loc (label-based), iloc (position-based) and [] to
access its elements.
Syntax:- series_name.loc[StartRange : StopRange]
(StartRange and StopRange are index labels; both are included.)
Syntax:- series_name.iloc[StartRange : StopRange]
(StartRange and StopRange are integer positions; StopRange is excluded.)
Syntax:- series_name[StartRange : StopRange] or series_name[index]
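A minimal sketch of the three ways of selecting from a Series, assuming a Series with the hypothetical labels 'a' to 'd'.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22], index=['a', 'b', 'c', 'd'])

by_label = s.loc['b':'c']   # label slice, both ends included
by_position = s.iloc[1:3]   # positional slice, stop position excluded
single = s['a']             # one element by its index label
sliced = s[1:3]             # integer slice with [] works by position

print(by_label)
print(by_position)
print(single)
```

Both slices above return the values 15 and 18, because the label slice includes 'c' while the positional slice stops before position 3.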
Slicing in Series
Example :-
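A possible illustration of slicing, assuming the usual start:stop:step form that Python sequences use. The values are made up.

```python
import pandas as pd

s = pd.Series([10, 15, 18, 22, 25, 30])

middle = s[1:4]        # positions 1, 2 and 3
every_second = s[::2]  # every second element, starting at position 0
reversed_s = s[::-1]   # all elements in reverse order

print(middle)
print(every_second)
print(reversed_s)
```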
DATAFRAME
DATAFRAME - It is a two-dimensional object that represents data
in the form of rows and columns. It is similar to a spreadsheet
or an SQL table. This is the most commonly used pandas object.
Once we store the data in a DataFrame, we can perform various
operations that are useful in analyzing and understanding the data.
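DATAFRAME STRUCTURE
INDEX    PLAYERNAME    IPLTEAM    BASEPRICEINCR    (COLUMNS)
0        ROHIT         MI         13
1        VIRAT         RCB        17
2        HARDIK        MI         14
The first column is the INDEX, the header row holds the COLUMNS, and
the remaining cells are the DATA.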
PROPERTIES OF DATAFRAME
A DataFrame can be created from:
1. Series
2. Lists
3. Dictionary
4. A numpy 2D array
Program-
import pandas as pd
s = pd.Series(['a', 'b', 'c', 'd'])
df = pd.DataFrame(s)
print(df)
Output-
   0
0  a
1  b
2  c
3  d
The default column name is 0.
DataFrame from Dictionary of Series
Example-
Example-
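A minimal sketch of building a DataFrame from a dictionary of Series. The subject names, student names and marks are hypothetical. Each dictionary key becomes a column, and the row index is the union of the indexes of all the Series.

```python
import pandas as pd

# Hypothetical data: the second Series has no entry for 'Chetan'.
marks = {
    'Maths': pd.Series([90, 85, 70], index=['Amit', 'Bina', 'Chetan']),
    'Science': pd.Series([80, 95], index=['Amit', 'Bina']),
}

df = pd.DataFrame(marks)
print(df)
```

Where a Series has no value for a row label, pandas fills the cell with NaN.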
Iteration on Rows and Columns
1. iterrows ()
2. iteritems() (named items() in current pandas; iteritems() was removed in pandas 2.0)
iterrows()
Example -
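A short sketch of both kinds of iteration, assuming a small two-column data frame. It uses items() for the column-wise loop, since iteritems() is not available in pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame({'empid': [101, 102], 'ename': ['Sachin', 'Vinod']})

# Row-wise: iterrows() yields (index label, row as a Series).
for index, row in df.iterrows():
    print(index, row['empid'], row['ename'])

# Column-wise: items() yields (column name, column as a Series).
for column_name, column in df.items():
    print(column_name, column.tolist())

row_labels = [index for index, _ in df.iterrows()]
column_names = [name for name, _ in df.items()]
```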
>>df.empid or df['empid']
0    101
1    102
2    103
3    104
4    105
5    106
Name: empid, dtype: int64
>>df[['empid','ename']]
   empid     ename
0    101    Sachin
1    102     Vinod
2    103   Lakhbir
3    104      Anil
4    105  Devinder
5    106  UmaSelvi
To Add & Rename a column in data frame
import pandas as pd
s = pd.Series([10, 15, 18, 22])
df = pd.DataFrame(s)
df.columns = ['List1']    # To rename the default column of the Data Frame as List1
Output-
   List1
0     10
1     15
2     18
3     22
df['List2'] = 20                           # To add a column List2 holding 20 in every row
df['List3'] = df['List1'] + df['List2']    # To add a column List3 as the sum of List1 and List2
>>del df['List3']    # We can simply delete a column by passing the column name in subscript with df
>>df
   List1  List2
0     10     20
1     15     20
2     18     20
3     22     20
To Delete a Column Using drop()
import pandas as pd
s = pd.Series([10, 20, 30, 40])
df = pd.DataFrame(s)
df.columns = ['List1']
df['List2'] = 40
df1 = df.drop('List2', axis=1)         # axis=1 means to delete data column wise
df2 = df.drop(index=[2, 3], axis=0)    # axis=0 means to delete data row wise with the given index
# drop() returns a new data frame; df itself is not changed
print(df)
print(" After deletion::")
print(df1)
print(" After row deletion::")
print(df2)
Output-
   List1  List2
0     10     40
1     20     40
2     30     40
3     40     40
 After deletion::
   List1
0     10
1     20
2     30
3     40
 After row deletion::
   List1  List2
0     10     40
1     20     40
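Accessing the data frame through loc[] and iloc[] or indexing
Accessing the data frame through loc[] using labels
To access a single row: df.loc[row_label]
To access a single column: df.loc[:, column_label]
To access the first 3 rows: df.loc[0:2] (with the default index; both ends of a label slice are included)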
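A minimal sketch of label-based access with loc[], assuming a small employee data frame with the default integer index.

```python
import pandas as pd

df = pd.DataFrame({
    'empid': [101, 102, 103, 104],
    'ename': ['Sachin', 'Vinod', 'Lakhbir', 'Anil'],
})

single_row = df.loc[0]              # the row labelled 0, returned as a Series
single_column = df.loc[:, 'ename']  # all rows of the column 'ename'
first_three = df.loc[0:2]           # labels 0, 1 and 2 (stop label included)

print(single_row)
print(single_column)
print(first_three)
```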
Accessing the data frame through iloc[]
iloc[] selects rows and columns by integer position.
Syntax- df.iloc[StartRow : StopRow, StartColumn : StopColumn]
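A minimal sketch of position-based access with iloc[], assuming a small employee data frame. Unlike label slices, positional slices exclude the stop position.

```python
import pandas as pd

df = pd.DataFrame({
    'empid': [101, 102, 103, 104],
    'ename': ['Sachin', 'Vinod', 'Lakhbir', 'Anil'],
})

first_row = df.iloc[0]         # row at position 0
second_column = df.iloc[:, 1]  # column at position 1 ('ename')
first_three = df.iloc[0:3]     # positions 0, 1 and 2
block = df.iloc[1:3, 0:2]      # rows 1-2, columns 0-1

print(first_three)
print(block)
```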
Data Frame
          Doj  empid     ename
0  12-01-2012    101    Sachin
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi
head() displays the first 5 rows
          Doj  empid     ename
0  12-01-2012    101    Sachin
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder
tail() displays the last 5 rows
          Doj  empid     ename
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi
To display the first 2 rows we can use head(2), to return the
last 2 rows we can use tail(2), and to return the 3rd to 5th rows
we can write df[2:5] (the stop position 5 is excluded).
import pandas as pd
empdata = {'Doj': ['12-01-2012', '15-01-2012', '05-09-2007',
                   '17-01-2012', '05-09-2007', '16-01-2012'],
           'empid': [101, 102, 103, 104, 105, 106],
           'ename': ['Sachin', 'Vinod', 'Lakhbir', 'Anil', 'Devinder', 'UmaSelvi']}
df = pd.DataFrame(empdata)
print(df)
print(df.head(2))
print(df.tail(2))
Output-
          Doj  empid     ename
0  12-01-2012    101    Sachin
1  15-01-2012    102     Vinod
2  05-09-2007    103   Lakhbir
3  17-01-2012    104      Anil
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi
          Doj  empid   ename
0  12-01-2012    101  Sachin
1  15-01-2012    102   Vinod
          Doj  empid     ename
4  05-09-2007    105  Devinder
5  16-01-2012    106  UmaSelvi
Example-1
1. Full Outer Join:- The full outer join combines the results of
both the left and the right outer joins. The joined data frame will
contain all records from both data frames and fill in NaNs for
missing matches on either side. You can perform a full outer join
by passing how='outer' to the merge() function.
Example-
Example-
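A minimal sketch of a full outer join, assuming two small data frames A and B that share a key column named id. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: id 1 exists only in A, id 4 exists only in B.
A = pd.DataFrame({'id': [1, 2, 3], 'name': ['Sachin', 'Vinod', 'Anil']})
B = pd.DataFrame({'id': [2, 3, 4], 'dept': ['CS', 'Maths', 'Physics']})

full = pd.merge(A, B, on='id', how='outer')
print(full)
```

The result has one row for each of the ids 1 to 4. dept is NaN for id 1 and name is NaN for id 4.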
3. Right Join:- The right join produces a complete set of
records from data frame B (right side data frame) with the
matching records (where available) in data frame A (left side data
frame). If there is no match, the columns coming from the left side
will contain null (NaN). You have to pass how='right' to the
merge() function.
Example-
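A minimal sketch of a right join, assuming two small data frames A and B that share a key column named id. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: id 4 exists only in B (the right side).
A = pd.DataFrame({'id': [1, 2, 3], 'name': ['Sachin', 'Vinod', 'Anil']})
B = pd.DataFrame({'id': [2, 3, 4], 'dept': ['CS', 'Maths', 'Physics']})

right = pd.merge(A, B, on='id', how='right')
print(right)
```

Every row of B is kept. For id 4, which has no match in A, the name column coming from the left side is NaN.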
4. Left Join:- The left join produces a complete set of records
from data frame A (left side data frame) with the matching
records (where available) in data frame B (right side data frame).
If there is no match, the columns coming from the right side will
contain null (NaN). You have to pass how='left' to the merge()
function.
Example-
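A minimal sketch of a left join, assuming two small data frames A and B that share a key column named id. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical data: id 1 exists only in A (the left side).
A = pd.DataFrame({'id': [1, 2, 3], 'name': ['Sachin', 'Vinod', 'Anil']})
B = pd.DataFrame({'id': [2, 3, 4], 'dept': ['CS', 'Maths', 'Physics']})

left = pd.merge(A, B, on='id', how='left')
print(left)
```

Every row of A is kept. For id 1, which has no match in B, the dept column coming from the right side is NaN.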
5. Joining on Index:- Sometimes you have to perform the
join on the indexes or the row labels. For that you have to specify
right_index (for the indexes of the right data frame) and
left_index (for the indexes of the left data frame) as True.
Example-
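A minimal sketch of joining on the index, assuming two data frames whose row labels are hypothetical employee ids. No how argument is given, so merge() uses its default, an inner join.

```python
import pandas as pd

# Hypothetical data: the row labels act as the join key.
A = pd.DataFrame({'name': ['Sachin', 'Vinod', 'Anil']}, index=[101, 102, 103])
B = pd.DataFrame({'dept': ['CS', 'Maths', 'Physics']}, index=[102, 103, 104])

joined = pd.merge(A, B, left_index=True, right_index=True)
print(joined)
```

Only the labels 102 and 103 appear in both indexes, so the result has two rows.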
CSV File
A CSV is a comma separated values file, which allows
data to be saved in a tabular format. A CSV file is a simple
text file that holds the kind of table found in a spreadsheet
or database. Files in the CSV format can be imported into and
exported from programs that store data in tables, such as
Microsoft Excel or OpenOffice.
The data fields in a CSV file are most often
separated, or delimited, by a comma. The data in each row
are delimited by commas and individual rows are separated
by newlines.
To create a CSV file, first choose your
favorite text editor, such as Notepad, and open a new file.
Then enter the text data you want the file to contain,
separating each value with a comma and each row with a
new line. Save the file with the extension .csv. You can
open the file using MS Excel or another spreadsheet
program, which will display the data as a table.
The pd.read_csv() function is used to read a CSV file into a data frame.
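A minimal sketch of reading CSV data. To stay self-contained it reads from an in-memory text buffer (io.StringIO) holding made-up data; with a real file you would pass the file path to pd.read_csv() instead.

```python
import io
import pandas as pd

# Made-up CSV text: the first line holds the column names.
csv_text = "empid,ename\n101,Sachin\n102,Vinod\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df)
```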
Exporting data from dataframe to CSV File
The to_csv() method of a data frame is used to write it to a CSV file.
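A minimal sketch of exporting with to_csv(), assuming a small made-up data frame. The file name emp.csv is hypothetical, and the file is written to a temporary folder so that the example leaves nothing behind.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({'empid': [101, 102], 'ename': ['Sachin', 'Vinod']})

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'emp.csv')  # hypothetical file name
    df.to_csv(path, index=False)            # index=False leaves out the row labels
    with open(path) as f:
        text = f.read()

print(text)
```

Without index=False, the row labels 0 and 1 would be written as an extra first column.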