Matplotlib Ncert
Matplotlib
In this chapter
»» Introduction
»» Plotting using Matplotlib
»» Customisation of Plots
»» The Pandas Plot Function (Pandas Visualisation)
4.1 Introduction
We have learned how to organise and analyse
data and perform various statistical operations
on Pandas DataFrames. Likewise, in Class XI, we
have learned how to analyse numerical data using
NumPy. The results obtained after analysis are used
to make inferences or draw conclusions about data
as well as to make important business decisions.
Sometimes, it is not easy to infer by merely looking
at the results. In such cases, visualisation helps
in better understanding of the results of the analysis.
Data visualisation means graphical or pictorial
representation of the data using graphs, charts,
etc. The purpose of plotting data is to visualise
variation or show relationships between variables.
2021–22
plotting area, legend, axis labels, ticks, title, etc. (Figure
4.1). Each function makes some change to a figure: for
example, it creates a figure, creates a plotting area in a
figure, plots some lines in a plotting area, decorates the
plot with labels, etc.
It is always expected that the data presented through
charts are easily understood. Hence, while presenting data
we should always give a chart title, label the axes of the
chart and provide a legend in case we have more than one
set of plotted data.
To plot x versus y, we can write plt.plot(x,y). The
show() function is used to display the figure created
using the plot() function.
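The points above about titles, axis labels and legends can be sketched as follows. This is only an illustration: the data values, series names and label strings are hypothetical, and the Agg backend line is there only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]      # hypothetical x values
y1 = [2, 4, 6, 8]     # hypothetical first data series
y2 = [1, 3, 5, 7]     # hypothetical second data series

# Each call to plot() adds one line; 'label' is the text shown in the legend
plt.plot(x, y1, label="Series A")
plt.plot(x, y2, label="Series B")

plt.title("Two series on one chart")  # chart title
plt.xlabel("x values")                # x axis label
plt.ylabel("y values")                # y axis label
legend = plt.legend()                 # needed because there is more than one line
plt.show()
```

Without the call to legend(), both lines would still be drawn, but a reader could not tell which line belongs to which series.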
Let us consider that in a city, the maximum temperature
of a day is recorded for three consecutive days. Program
4-1 demonstrates how to plot temperature values for
the given dates. The output generated is a line chart.
Program 4-1 Plotting Temperature against Date
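The listing of Program 4-1 is not reproduced here. The following is a minimal sketch of what such a program might look like. The dates, temperature values and label strings are hypothetical placeholders, not data from the text, and the Agg backend line is there only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

# Hypothetical dates and maximum temperatures (illustrative values only)
date = ["25/12", "26/12", "27/12"]
temp = [8.5, 10.5, 6.8]

# Plot temperature against date; plot() returns a list of line objects
lines = plt.plot(date, temp)
plt.xlabel("Date")
plt.ylabel("Temperature")
plt.title("Date wise Temperature")
plt.show()
```

Because the dates are given as strings, Matplotlib treats them as categories and places them at equal spacing along the x axis.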
boxplot(x[, notch, sym, vert, whis, ...]) Make a box and whisker plot.
xticks([ticks, labels]) Get or set the current tick locations and labels of the x-axis.
yticks([ticks, labels]) Get or set the current tick locations and labels of the y-axis.
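As a brief illustration of the xticks() and yticks() entries above, the sketch below sets custom tick locations and labels and then reads them back. The data values and the label strings are made up for the example, and the Agg backend line is there only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

x = [1, 2, 3]       # hypothetical positions
y = [10, 20, 15]    # hypothetical values
plt.plot(x, y)

# Set tick locations and their labels on the x axis
plt.xticks([1, 2, 3], ["Mon", "Tue", "Wed"])
# Set only the tick locations on the y axis
plt.yticks([10, 15, 20])

# Called with no arguments, xticks() returns the current locations and labels
locs, labels = plt.xticks()
plt.show()
```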
4.3.2 Colour
It is also possible to format the plot further by changing
the colour of the plotted data. Table 4.4 shows the list of
colours that are supported. We can use either the character
codes or the colour names as values of the color
parameter of the plot() function.
Table 4.4 Colour abbreviations for plotting
Character Colour
‘b’ blue
‘g’ green
‘r’ red
‘c’ cyan
‘m’ magenta
‘y’ yellow
‘k’ black
‘w’ white
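As a small sketch of the two ways of giving a colour, the example below requests red once by its character code and once by its name. The data values are hypothetical, and the Agg backend line is there only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]  # hypothetical data

# Colour given as a character code from Table 4.4
line_code, = plt.plot(x, [1, 2, 3, 4], color="r")
# Colour given as a colour name
line_name, = plt.plot(x, [2, 3, 4, 5], color="red")
plt.show()
```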
height = [121.9,124.5,129.5,134.6,139.7,147.3,
152.4, 157.5,162.6]
weight= [19.7,21.3,23.5,25.9,28.5,32.1,35.7,39.6,
43.2]
Let us plot a line chart where:
i. x axis will represent weight
ii. y axis will represent height
iii. x axis label should be “Weight in kg”
iv. y axis label should be “Height in cm”
v. colour of the line should be green
vi. use * as marker
vii. marker size should be 10
viii. The title of the chart should be “Average
weight with respect to average height”.
ix. Line style should be dashed
x. Linewidth should be 2.
import matplotlib.pyplot as plt
import pandas as pd
height=[121.9,124.5,129.5,134.6,139.7,147.3,152.4,157.5,162.6]
weight=[19.7,21.3,23.5,25.9,28.5,32.1,35.7,39.6,43.2]
df=pd.DataFrame({"height":height,"weight":weight})
#Set xlabel for the plot
plt.xlabel('Weight in kg')
#Set ylabel for the plot
plt.ylabel('Height in cm')
#Set chart title
plt.title('Average weight with respect to average height')
#plot a dashed green line of width 2 with '*' markers of size 10
plt.plot(df.weight,df.height,marker='*',markersize=10,color='green',
         linewidth=2,linestyle='dashed')
plt.show()
In the above program, we created the DataFrame using two
lists, and in the plot() function we passed the weight and
height columns of the DataFrame. The output is shown
in Figure 4.4.
Continuous data are measured while discrete data are obtained
by counting. Height and weight are examples of continuous
data; they can be in decimals. The total number of students
in a class is discrete; it can never be in decimals.
hist Histogram
box Boxplot
area Area plot
pie Pie plot
scatter Scatter plot
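To illustrate how these kind values are used, the sketch below draws the same DataFrame three ways by changing only the kind argument of plot(). The column names and marks are hypothetical, and the Agg backend line is there only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open windows
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical marks of five students in two tests (illustrative values only)
df = pd.DataFrame({"test1": [55, 62, 70, 48, 81],
                   "test2": [60, 58, 75, 52, 79]})

# Each call draws in a new figure and returns the Axes it drew on
ax_hist = df.plot(kind="hist")   # histogram of both columns
ax_box = df.plot(kind="box")     # one box per column
ax_area = df.plot(kind="area")   # stacked area plot
plt.show()
```

The scatter and pie kinds need extra arguments (x and y for scatter, y for pie), so they are left out of this sketch.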
Depict the sales for the three weeks using a Line chart. It
should have the following:
i. Chart title as “Mela Sales Report”.
ii. x axis label as “Days”.
iii. y axis label as “Sales in Rs”.
Line colours are red for week 1, blue for week 2 and brown
for week 3.
import pandas as pd
import matplotlib.pyplot as plt
# reads "MelaSales.csv" to df by giving path to the file
df=pd.read_csv("MelaSales.csv")
#create a line plot of different color for each week
df.plot(kind='line', color=['red','blue','brown'])
# Set title to "Mela Sales Report"
plt.title('Mela Sales Report')
# Label x axis as "Days"
plt.xlabel('Days')
# Label y axis as "Sales in Rs"
plt.ylabel('Sales in Rs')
#Display the figure
plt.show()
Marker ="*"
Marker size=10
linestyle="--"
Linewidth =3
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("MelaSales.csv")
#creates plot of different color for each week
df.plot(kind='line', color=['red','blue','brown'],marker="*",
        markersize=10,linewidth=3,linestyle="--")
Program 4-10
import pandas as pd
import matplotlib.pyplot as plt
#read the CSV file with specified columns
#usecols parameter to extract only two required columns
data=pd.read_csv("Min_Max_Seasonal_IMD_2017.csv",
usecols=['ANNUAL - MIN','ANNUAL - MAX'])
df=pd.DataFrame(data)
#plot histogram for 'ANNUAL - MIN'
df.plot(kind='hist',y='ANNUAL - MIN',
        title='Annual Minimum Temperature (1901-2017)')
plt.xlabel('Temperature')
plt.ylabel('Number of times')
#plot histogram for both 'ANNUAL - MIN' and 'ANNUAL - MAX'
df.plot(kind='hist',
Activity 4.2
What value does each
bubble on the plot at
Figure 4.14 represent?
Program 4-13
import numpy as np
import matplotlib.pyplot as plt
discount= np.array([10,20,30,40,50])
saleInRs=np.array([40000,45000,48000,50000,100000])
size=discount*10
plt.scatter(x=discount,y=saleInRs,s=size,color='red',linewidth=3,
            marker='*',edgecolor='blue')
plt.title('Sales Vs Discount')
plt.xlabel('Discount offered')
plt.ylabel('Sales in Rs')
plt.show()
Think and Reflect
What would happen if we use df.plot(kind='scatter')
instead of plt.scatter() in Program 4-13?
Waseem Ali 95 76 79 77 89
Kulpreet Singh 78 81 75 76 88
Annie Mathews 88 63 67 77 80
Shiksha 95 55 51 59 80
Naveen Gupta 82 55 63 56 74
Taleem Ahmed 73 49 54 60 77
Pragati Nigam 80 50 51 54 76
Usman Abbas 92 43 51 48 69
Gurpreet Kaur 60 43 55 52 71
Sameer Murthy 60 43 55 52 71
Angelina 78 33 39 48 68
Angad Bedi 62 43 51 48 54
Year  Sunny Bunny Resort  Happy Lucky Resort  Breezy Windy Resort
2014  4.75                3                   4.5
2015  2.5                 4                   2
2016  3.5                 2.5                 3
2017  4                   2                   3.5
2018  1.5                 4.5                 1
Program 4-15
import pandas as pd
import matplotlib.pyplot as plt
#read the CSV file in 'data'
data= pd.read_csv('compareresort.csv')
#convert 'data' into a DataFrame 'df'
df= pd.DataFrame(data)
#plot a box plot for the DataFrame 'df' with a title
df.plot(kind='box',title='Compare Resorts')
#set xlabel,ylabel
plt.xlabel('Resorts')
plt.ylabel('Rating (5 years)')
#display the plot
plt.show()
Think and Reflect
Which of the three resorts should be awarded? Give
reasons.
Activity 4.3
Plot a pie to display the
radius of the planets
and also give an
appropriate title to
the plot.
Figure 4.18: A boxplot as output of Program 4-15.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'mass': [0.330, 4.87 , 5.97],
'radius': [2439.7, 6051.8, 6378.1]},
index=['Mercury', 'Venus', 'Earth'])
df.plot(kind='pie',y='mass')
plt.show()
Program 4-17
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'GeoArea':[83743,78438,22327,22429,21081,16579,10486],
                 'ForestCover':[67353,27692,17280,17321,19240,13464,8073]},
                index=['Arunachal Pradesh','Assam','Manipur','Meghalaya',
                       'Mizoram','Nagaland','Tripura'])
df.plot(kind='pie',y='ForestCover',
        title='Forest cover of North Eastern states',legend=False)
plt.show()
Think and Reflect
What effect did 'legend=False' in Program 4-17 have on
the output?
Program 4-18
import pandas as pd
import matplotlib.pyplot as plt
df=pd.DataFrame({'GeoArea':[83743,78438,22327,22429,21081,16579,10486],
                 'ForestCover':[67353,27692,17280,17321,19240,13464,8073]},
                index=['Arunachal Pradesh','Assam','Manipur','Meghalaya',
                       'Mizoram','Nagaland','Tripura'])
exp=[0.1,0,0,0,0.2,0,0]
#explode the first wedge to the 0.1 level and the fifth to the 0.2 level
c=['r','g','m','c','brown','pink','purple']
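The rest of Program 4-18 is not reproduced here. As a hedged sketch, the lists exp and c could be passed to the pie plot through the explode and colors parameters, which Pandas forwards to Matplotlib's pie() function. The title is borrowed from Program 4-17 and the autopct argument is an assumption added for illustration; neither is confirmed as part of the original listing. The Agg backend line is there only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(
    {'GeoArea': [83743, 78438, 22327, 22429, 21081, 16579, 10486],
     'ForestCover': [67353, 27692, 17280, 17321, 19240, 13464, 8073]},
    index=['Arunachal Pradesh', 'Assam', 'Manipur', 'Meghalaya',
           'Mizoram', 'Nagaland', 'Tripura'])

# Offset of each wedge from the centre: first wedge 0.1, fifth wedge 0.2
exp = [0.1, 0, 0, 0, 0.2, 0, 0]
# One colour per wedge, in the same order as the index
c = ['r', 'g', 'm', 'c', 'brown', 'pink', 'purple']

# Assumed completion: explode and colors are handed on to Matplotlib's pie();
# autopct (an illustrative addition) prints each share as a percentage
ax = df.plot(kind='pie', y='ForestCover',
             title='Forest cover of North Eastern states',
             legend=False, explode=exp, colors=c, autopct='%.2f')
plt.show()
```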
Summary
• A plot is a graphical representation of a data set
which is also interchangeably known as a graph or
chart. It is used to show the relationship between
two or more variables.
• In order to be able to use Python’s Data
Visualisation library, we need to import the
pyplot module from Matplotlib library using the
following statement: import matplotlib.pyplot as
plt, where plt is an alias or an alternative name
for matplotlib.pyplot. You can keep any alias of
your choice.
• The pyplot module houses functions to create a
figure (plot), create a plotting area in a figure, plot
lines, bars, histograms, etc., in a plotting area, decorate
the plot with labels, etc.
Exercise
1. What is the purpose of the Matplotlib library?
2. What are some of the major components of any
graph or plot?
3. Name the function which is used to save the plot.
4. Write short notes on different customisation options
available with any plot.
5. What is the purpose of a legend?
6. Define Pandas visualisation.
7. What is open data? Name any two websites from
which we can download open data.
8. Give an example of data comparison where we can
use the scatter plot.
9. Name the plot which displays the statistical summary.
Note: Give appropriate title, set xlabel and ylabel while
attempting the following questions.