Class12 Pandas Notes
SCHOOL
I.P Extension, Patparganj, Delhi, 110092
Class XII
Data Handling Using Pandas
Introduction to Python Libraries:
NumPy, Pandas and Matplotlib are three Python libraries that are widely used for scientific and
analytical work. These libraries allow us to manipulate, transform and visualise data easily and
efficiently.
NumPy, which stands for ‘Numerical Python’, is a library that can be used for numerical data
analysis and scientific computing.
PANDAS (PANel DAta) is a high-level data manipulation tool used for analysing data.
It is built on packages like NumPy and Matplotlib and gives us a single, convenient place to do
most of our data analysis and visualisation work. Pandas has three important data structures:
1. Series
2. DataFrame
3. Panel (removed in recent versions of Pandas)
The Matplotlib library in Python is used for plotting graphs and visualisation. Using Matplotlib,
we can generate publication quality plots, histograms, bar charts etc.
Pandas is used when data is in tabular format, whereas NumPy is used for numeric array-based data.
Pandas is used for data analysis and visualisation, whereas NumPy is used for numerical calculations.
Installing Pandas:
NOTE: Pandas can be installed only when Python is already installed on the system.
Data Structures in Pandas:
A data structure is a collection of data values and the operations that can be applied to that data.
Two commonly used data structures in Pandas are:
1. Series
2. DataFrame
Series:
A Series is a one-dimensional array containing a sequence of values of any data type (int, float,
list, string, etc.). By default, a Series has numeric data labels starting from zero. The data label
associated with a particular value is called its index. We can also assign values of other data
types as index. Examples of series containing names of cities and fruits are given below.
Index Data
0 Delhi
1 Faridabad
2 Jaipur
3 Mumbai
4 Bangalore
Index Data
0 Mango
1 Guava
2 Banana
3 Grapes
4 Water melon
Example of Series containing month name as index and number of days as Data:
Index Data
Jan 31
Feb 28
Mar 31
April 30
May 31
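An empty Series can be created by calling Series() without any arguments:
>>> import pandas as pd
>>> s1 = pd.Series()
>>> s1
Series([], dtype: float64)
NOTE: Recent versions of Pandas show dtype: object for an empty Series.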
OUTPUT
0 2
1 4
2 8
3 12
4 14
5 20
dtype: int64
Observe that output is shown in two columns – the index is on the left and the data value is
on the right.
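The statements that produce the output above are not shown in these notes. A minimal sketch,
assuming the series was created from a plain Python list (the name series1 is hypothetical):

```python
import pandas as pd

# Assumed data: the six values visible in the output above
series1 = pd.Series([2, 4, 8, 12, 14, 20])

# Index labels 0 to 5 are generated automatically
print(series1)
```

Because no index is passed, Pandas supplies the default numeric index starting from zero.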
We can also assign user-defined labels to the index and use them to access elements of a
Series. The following example has a numeric index in random order.
Here, data values Raman, Rosy and Ram have index values 1, 7 and 9 respectively.
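The code for this example is missing from the notes. A possible reconstruction, assuming the
three names are passed as a list together with an index list (the name series2 is hypothetical):

```python
import pandas as pd

# Assumed reconstruction: three names with a non-sequential numeric index
series2 = pd.Series(["Raman", "Rosy", "Ram"], index=[1, 7, 9])
print(series2)

# With a numeric index, the number in square brackets is a label, not a position
print(series2[7])  # Rosy
```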
OUTPUT
Feb 2
Mar 3
Apr 4
dtype: int64
Here, data values 2, 3 and 4 have index values Feb, Mar and Apr respectively
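The statement that produces this output is not shown. A minimal sketch, assuming the values and
the month labels are passed as two lists (the name series3 is hypothetical):

```python
import pandas as pd

# Assumed reconstruction: values 2, 3, 4 labelled with month names
series3 = pd.Series([2, 3, 4], index=["Feb", "Mar", "Apr"])
print(series3)

# A value can now be accessed through its label
print(series3["Mar"])  # 3
```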
We can create a series from a one-dimensional (1D) NumPy array, as shown below:
Output:
0 6
1 4
2 8
3 9
dtype: int32
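The code for the output above is not included in the notes. A sketch of how it could look,
assuming the array holds the four values shown (the names array1 and series4 are hypothetical):

```python
import numpy as np
import pandas as pd

# Assumed data: the four values visible in the output above
array1 = np.array([6, 4, 8, 9])

# The Series takes its values from the array and gets the default index 0 to 3
series4 = pd.Series(array1)
print(series4)
```

The dtype that is displayed (int32 or int64) depends on the platform and the NumPy version.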
When index labels are passed with the array, the length of the index and the array must be
the same; otherwise a ValueError is raised, as shown below.
OUTPUT:
ValueError: Length of values (4) does not match length of index (5)
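The code that raises this error is not shown. A minimal sketch, assuming four values and five
labels (the label names and the variable names are hypothetical); the exact wording of the
message can differ between Pandas versions:

```python
import numpy as np
import pandas as pd

array1 = np.array([6, 4, 8, 9])                # four values
labels = ["Jan", "Feb", "Mar", "Apr", "May"]   # five labels (assumed names)

message = None
try:
    series5 = pd.Series(array1, index=labels)
except ValueError as err:
    # The lengths 4 and 5 do not match, so no Series is created
    message = str(err)
    print("ValueError:", message)
```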
When a series is created from a dictionary, the keys of the dictionary become the index
of the series, so there is no need to declare the index as a separate list. Let us do some
practicals.
Practical 1: Pass a dictionary directly to the Series() method
import pandas as pd
S2 = pd.Series({2 : "Feb", 3 : "Mar", 4 : "Apr"})
print(S2) #Display the series
OUTPUT:
2 Feb
3 Mar
4 Apr
dtype: object
NOTE: In the above example, you can see that the keys of the dictionary become the index of the
Series.
Practical 2: Store the dictionary in a variable and pass the variable to the Series() method
import pandas as pd
d = {"One" : 1, "Two" : 2, "Three" : 3, "Four" : 4}
S2 = pd.Series(d)
print(S2) #Display the series
OUTPUT:
One 1
Two 2
Three 3
Four 4
dtype: int64
Practical 3: Let's try to pass an index while creating a Series from a dictionary. Index labels
that are not keys of the dictionary get the value NaN.
import pandas as pd
d = {"One" : 1, "Two" :2, "Three" : 3, "Four" : 4}
S2 = pd.Series(d, index=["A", "B", "C", "D"])
print(S2)
OUTPUT:
A NaN
B NaN
C NaN
D NaN
dtype: float64
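All values are NaN above because none of the labels A, B, C and D is a key of the dictionary.
The following sketch (the name S3 and the chosen labels are only for illustration) mixes labels
that are keys with one that is not:

```python
import pandas as pd

d = {"One": 1, "Two": 2, "Three": 3, "Four": 4}

# "Two" and "Four" are keys of d; "Five" is not a key
S3 = pd.Series(d, index=["Two", "Four", "Five"])
print(S3)
```

Only the matching keys are picked up, in the order given by the index; the unmatched label gets
NaN, and the presence of NaN turns the values into floats.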
import pandas as pd
d = [12, 13, 14, 15]
S2 = pd.Series(data = [d]*2, index = d)
print(S2) #Display the series
OUTPUT
ValueError: Length of values (2) does not match length of index (4)
import pandas as pd
d = [12, 13, 14, 15]
S2 = pd.Series(data=[d]*4, index=d)
print(S2) #Display the series
OUTPUT
12 [12, 13, 14, 15]
13 [12, 13, 14, 15]
14 [12, 13, 14, 15]
15 [12, 13, 14, 15]
dtype: object
6. Creation of Series using String:
Practical 1:
import pandas as pd
S2 = pd.Series(['a', 'b', 'c']) #The strings must be passed together as one list
print(S2) #Display the series
OUTPUT
0 a
1 b
2 c
dtype: object
Practical 2:
import pandas as pd
S2 = pd.Series(['anil', 'bhuvan', 'ravi']) #The strings must be passed together as one list
print(S2) #Display the series
OUTPUT
0 anil
1 bhuvan
2 ravi
dtype: object
Practical 3:
import pandas as pd
S2 = pd.Series(['anil', 'bhuvan', 'ravi'], index = [1, 4, 7])
print(S2) #Display the series
OUTPUT
1 anil
4 bhuvan
7 ravi
dtype: object
We can change the existing index of a Series by assigning new labels to its index attribute:
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry)
seriesCapCntry.index = [10, 20, 30, 40] #This statement changes the index of the Series
print(seriesCapCntry)
OUTPUT
India NewDelhi
USA WashingtonDC
UK London
France Paris
dtype: object
10 NewDelhi
20 WashingtonDC
30 London
40 Paris
dtype: object
There are two common ways for accessing the elements of a series: Indexing and Slicing.
1. Indexing :
Indexing in Series is used to access elements in a series. Indexes are of two types:
Positional index
Labelled index
A positional index takes an integer value that corresponds to its position in the series
starting from 0, whereas a labelled index takes any user-defined label as index.
import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(S2[2]) #Display the third value of the series using its index value
print(S2[0]) #Display the first value of the series using its index value
OUTPUT:
17
31
import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(s2[2]) #Wrong Series name
OUTPUT:
NameError: name 's2' is not defined
import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(S2[5]) #Wrong index
OUTPUT:
KeyError: 5
import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(S2[-1]) #Negative index
OUTPUT:
KeyError: -1
import pandas as pd
d = [1, 2, 3]
S2 = pd.Series(d, index=["One", "Two", "Three"])
print(S2[-1])
OUTPUT:
3
Practical 5: What happens if we give a negative index enclosed in square brackets (that is, as a
list) to access an element?
import pandas as pd
d = [1, 2, 3]
S2 = pd.Series(d, index=["One", "Two", "Three"])
print(S2[[-1]])
OUTPUT:
Three 3
dtype: int64
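Recent versions of Pandas warn about, or no longer accept, integer positions inside plain square
brackets when the index holds string labels, so the results above may depend on the installed
version. A sketch that avoids the ambiguity by using .iloc for positions and .loc for labels:

```python
import pandas as pd

S2 = pd.Series([1, 2, 3], index=["One", "Two", "Three"])

print(S2.iloc[-1])      # 3, the last element by position
print(S2.iloc[[-1]])    # a one-element Series holding the last element
print(S2.loc["Three"])  # 3, the same element accessed by its label
```

Here .iloc always treats its argument as a position and .loc always treats it as a label.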
In the following example, value NewDelhi is displayed for the labelled index India.
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry['India']) #Using Labelled index
OUTPUT:
NewDelhi
We can also access an element of the series using the positional index:
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[1])
OUTPUT:
WashingtonDC
More than one element of a series can be accessed using a list of positional integers.
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[[1, 2]])
OUTPUT:
USA WashingtonDC
UK London
dtype: object
More than one element of a series can also be accessed using a list of index labels as shown
in the following examples:
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[['India', 'UK']]) #Accessing multiple elements using index labels
OUTPUT:
India NewDelhi
UK London
dtype: object
2. Slicing :
Sometimes, we may need to extract a part of a series. This can be done through slicing. This
is similar to slicing used with lists. When we use positional indices for slicing, the value at the
end index position is excluded. For example:
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[0:2]) #Here the value at index 0 and 1 will be extracted
OUTPUT:
India NewDelhi
USA WashingtonDC
dtype: object
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[0:1]) #Here the value at index 0 will be extracted
OUTPUT:
India NewDelhi
dtype: object
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[-1 : -3 : -1]) #Here the last two values will be extracted in reverse order
OUTPUT:
France Paris
UK London
dtype: object
import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[: : -1]) #Here the series will be extracted in reverse order
OUTPUT:
France Paris
UK London
USA WashingtonDC
India NewDelhi
dtype: object
If labelled indexes are used for slicing, then value at the end index label is also included in
the output, for example:
Practical 1:
import pandas as pd
S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S2["Two" : "Five"])
OUTPUT:
Two 2
Three 3
Four 4
Five 5
dtype: int64
Practical 2:
import pandas as pd
S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S2["One" : "Three"])
OUTPUT:
One 1
Two 2
Three 3
dtype: int64
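The difference between the two kinds of slicing can be seen side by side in the following
sketch, which uses .iloc for positions and .loc for labels (the names by_position and by_label
are only for illustration):

```python
import pandas as pd

S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])

# Positional slice: positions 1 and 2 are taken, position 3 is excluded
by_position = S2.iloc[1:3]

# Label slice: the end label "Three" is included
by_label = S2.loc["Two":"Three"]

print(by_position)
print(by_label)
```

Both slices contain the elements labelled Two and Three, although the positional slice names
position 3 as its end and the label slice names Three.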
We can modify the values of series elements by assigning the value to the keys of the series
as shown in the following example:
Example 1:
import pandas as pd
S2 = pd.Series([1,2,3,4,5], index=["One","Two","Three","Four","Five"])
S2["Two"] = 22
print(S2)
OUTPUT:
One 1
Two 22
Three 3
Four 4
Five 5
dtype: int64
Example 2:
import pandas as pd
S2 = pd.Series([1,2,3,4,5], index=["One","Two","Three","Four","Five"])
S2[["Two", "Three"]] = 22 #A list of labels updates several elements at once
print(S2)
OUTPUT:
One 1
Two 22
Three 22
Four 4
Five 5
dtype: int64
Example 3:
import pandas as pd
S2 = pd.Series([1,2,3,4,5], index=["One","Two","Three","Four","Five"])
S2[1 : 4] = 22 #we can use slicing to modify the value
print(S2)
OUTPUT:
One 1
Two 22
Three 22
Four 22
Five 5
dtype: int64
Observe that updating the values in a series using slicing also excludes the value at the end
index position.
However, the value at the end index label is changed when slicing is done using labels.
Example 4:
import pandas as pd
S2 = pd.Series([1,2,3,4,5], index=["One","Two","Three","Four","Five"])
S2["One" : "Four"] = 22
print(S2)
OUTPUT:
One 22
Two 22
Three 22
Four 22
Five 5
dtype: int64
Practice Exercise :
Q1. Consider the following Series and write the output of the following:
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
a. print(S["Two"])
b. print(S[2])
c. print(S[0])
d. print(S[0 : 2])
e. print(S["Two" : "Three"])
f. print(S[: : -1])
g. print(S[1 : 4])
h. print(S[["Two", "Four"]])
i. print(S["Two" : "Four"])
j. print(S[-1])
SOLUTIONS
a. 2
b. 3
c. 1
d.
One 1
Two 2
dtype: int64
e.
Two 2
Three 3
dtype: int64
f.
Five 5
Four 4
Three 3
Two 2
One 1
dtype: int64
g.
Two 2
Three 3
Four 4
dtype: int64
h.
Two 2
Four 4
dtype: int64
i.
Two 2
Three 3
Four 4
dtype: int64
j.
5
Attributes of Series:
We can access various properties of a series by using its attributes with the series name.
Syntax of using attribute is given below
<Series Name>.<Attribute Name>
values: This attribute returns all the values of the series in the form of a NumPy array.
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
S.name="Sample"
print(S)
OUTPUT
One 1
Two 2
Three 3
Four 4
Five 5
dtype: int64
------------------------------------------
One 1
Two 2
Three 3
Four 4
Five 5
Name: Sample, dtype: int64
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
S.index.name="Number"
print(S)
OUTPUT:
One 1
Two 2
Three 3
Four 4
Five 5
dtype: int64
------------------------------------------
Number
One 1
Two 2
Three 3
Four 4
Five 5
dtype: int64
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
print(S.values)
print("------------------------------------------")
print(S.size)
print("------------------------------------------")
print(S.empty)
OUTPUT:
One 1
Two 2
Three 3
Four 4
Five 5
dtype: int64
------------------------------------------
[1 2 3 4 5]
------------------------------------------
5
------------------------------------------
False
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
print(S.index)
print("------------------------------------------")
print(S.hasnans)
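The output of this last example is not shown in the notes. The sketch below repeats the two
attributes and adds a second series with a missing value for comparison (the name T and its data
are hypothetical):

```python
import numpy as np
import pandas as pd

S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S.index)    # the index labels One to Five
print(S.hasnans)  # False, because S has no missing values

# Hypothetical series with one missing value
T = pd.Series([1, np.nan, 3])
print(T.hasnans)  # True
```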
Methods of Series:
In this section, we are going to discuss methods available for Pandas Series. Let us consider
the following Series.
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S)
1. head(n): This method returns the first n members of the series. If the value for n is not
passed, then by default the first five members are displayed. For example:
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S.head(3)) #Display First three members of Series
print("------------------------------------------------")
print(S.head()) #Display first five members of Series as no argument is passed
OUTPUT:
One 1
Two 2
Three 3
dtype: int64
------------------------------------------------
One 1
Two 2
Three 3
Four 4
Five 5
dtype: int64
2. tail(n): This method returns the last n members of the series. If the value for n is not
passed, then by default the last five members are displayed. For example:
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S.tail(4)) #Display last four members of Series
print("------------------------------------------------")
print(S.tail()) #Display last five members of Series as no argument is passed
OUTPUT:
Four 4
Five 5
Six 6
Seven 7
dtype: int64
------------------------------------------------
Three 3
Four 4
Five 5
Six 6
Seven 7
dtype: int64
3. count( ): This method returns the number of non-NaN values in the Series. For
example:
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S.count())
OUTPUT:
7
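The series above has no missing values, so count() equals the number of elements. The following
sketch uses a hypothetical series with two NaN values (the name marks and its data are assumed)
to show the difference between size and count():

```python
import numpy as np
import pandas as pd

# Hypothetical data: five elements, two of them missing
marks = pd.Series([10, np.nan, 30, np.nan, 50])

print(marks.size)     # 5, all elements including NaN
print(marks.count())  # 3, only the non-NaN elements
```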
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print([S>5]) #will return True for those values of Series which satisfy the condition
print("----------------------------------------")
print(S[S>5]) #will return those values of Series which satisfy the condition
OUTPUT:
[One False
Two False
Three False
Four False
Five False
Six True
Seven True
dtype: bool]
----------------------------------------
Six 6
Seven 7
dtype: int64
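The example above filters a Series with a condition. The sketch below stores the condition in a
variable first and then combines two conditions (the names mask, big and middle are only for
illustration):

```python
import pandas as pd

S = pd.Series([1, 2, 3, 4, 5, 6, 7],
              index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])

# The comparison produces a Series of True/False values with the same index
mask = S > 5
print(mask)

# Using that boolean Series inside square brackets keeps only the True rows
big = S[mask]
print(big)

# Conditions are combined with & (and) or | (or); each one needs its own parentheses
middle = S[(S > 2) & (S < 6)]
print(middle)
```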
We can delete elements from Series using drop( ) method. To delete a particular element
we have to pass the index of the element to be deleted.
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S)
print("-------------------------------------------")
print(S.drop("Four"))#This statement will delete only one element.
OUTPUT:
One 1
Two 2
Three 3
Four 4
Five 5
Six 6
Seven 7
dtype: int64
-------------------------------------------
One 1
Two 2
Three 3
Five 5
Six 6
Seven 7
dtype: int64
import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S)
print("-------------------------------------------")
print(S.drop(["Four", "Five"])) #This statement will delete two elements.
OUTPUT:
One 1
Two 2
Three 3
Four 4
Five 5
Six 6
Seven 7
dtype: int64
-------------------------------------------
One 1
Two 2
Three 3
Six 6
Seven 7
dtype: int64
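Note that drop( ) returns a new Series and leaves the original unchanged. A short sketch of this
behaviour (the name shorter is only for illustration):

```python
import pandas as pd

S = pd.Series([1, 2, 3, 4, 5, 6, 7],
              index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])

shorter = S.drop(["Four", "Five"])
print(len(S))        # 7, the original Series still has all elements
print(len(shorter))  # 5, the returned Series has two elements fewer

# To keep the change under the same name, assign the result back
S = S.drop("One")
print(S)
```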