Chapter 4 Data Visualizations
Introduction to Matplotlib
Build data visualizations with the Matplotlib library
Setting up Matplotlib
Basic Plotting with Matplotlib
Lecturer. Hanad Mohamud Mohamed
Customizing Plots
Advanced Plotting Techniques
Build data visualizations with the Seaborn library
Data Visualization
Data visualization in the context of data science is a critical aspect that involves the
graphical representation of information and data. By using visual elements like charts,
graphs, and maps, data visualization tools and techniques provide an accessible way to
see and understand trends, outliers, and patterns in data.
Visualization serves several key roles in data science:
Simplifying Complex Data: Data science often deals with large, complex datasets.
Visualization helps in breaking down these complexities, making the data more
understandable. By converting numbers and metrics into visual stories, it becomes easier
for data scientists and stakeholders to grasp difficult concepts or identify new patterns.
Communicating Findings: One of the primary uses of data visualization is to communicate
results to stakeholders. Effective visualizations can convey the findings of complex data
analyses in a manner that is accessible to a non-technical audience. This is crucial in
decision-making processes where insights drawn from data need to be understood by
those who may not have a background in data science.
Some widely used data visualization libraries
Matplotlib: Matplotlib is a widely used Python library for creating static, interactive, and
animated visualizations. It offers extensive customization options, making it ideal for
creating a wide range of 2D graphs and plots. Its versatility and ease of use make it a
popular choice for beginners and professionals alike.
Seaborn: Built on top of Matplotlib, Seaborn simplifies the process of creating beautiful
and informative statistical graphics. It integrates closely with pandas data structures and
provides high-level functions for creating complex visualizations, including heatmaps and
violin plots, with more aesthetic default styles.
Plotnine (ggplot): Inspired by R’s ggplot2, Plotnine is based on the Grammar of Graphics,
allowing users to create complex visualizations by adding layers of components. It’s
particularly useful for those familiar with ggplot2, providing a way to use similar
functionality in Python.
Bokeh: Bokeh excels at creating interactive and real-time streaming visualizations in web
browsers. It's great for creating highly interactive plots, dashboards, and data
applications that can connect to real-time data feeds.
Pygal: Pygal, a dynamic SVG charting library, stands out for its ability to produce SVG
(Scalable Vector Graphics) charts. It’s particularly suited for creating charts that are
easily scalable and can be embedded in web pages.
Plotly: Plotly is known for its interactive plots and is highly compatible with web
applications. It supports a wide range of charts and graphs, including 3D charts, and
integrates well with other libraries and frameworks.
Geoplotlib: This library is designed for creating maps and geographical plots, making it
perfect for visualizing spatial data. Geoplotlib can be used to create a variety of map
types, including heatmaps and dot density maps.
Missingno: Designed for visualizing missing data, Missingno provides a small
toolset of flexible and easy-to-use visualizations and utilities for identifying patterns of
missing data in datasets, which is crucial for effective data cleaning and preprocessing.
Folium: Folium is particularly useful for visualizing geospatial data in Python. It leverages
the mapping strengths of the Leaflet.js library, enabling the creation of sophisticated maps
and geographical data visualizations directly within the Python environment.
Each of these libraries has its own strengths and ideal use cases, offering a rich set of tools
for data analysts and scientists to visually interpret and present data.
Data Visualization
We will focus on the Matplotlib and Seaborn data visualization libraries.
"A picture is worth a thousand words". Most of us are familiar with this expression. Data
visualization plays an essential role in the representation of both small and large-scale data. It
especially applies when trying to explain the analysis of increasingly large datasets.
Data visualization is the discipline of trying to understand data by placing it in
a visual context. Its main goal is to distill large datasets into visual graphics to allow for an easy
understanding of complex relationships within the data.
Several data visualization libraries are available in Python, such as Matplotlib, Seaborn, and
Folium.
Purpose of Data Visualization
• Better analysis
• Quick action
• Identifying patterns
• Finding errors
• Understanding the story
• Exploring business insights
• Grasping the latest trends
Matplotlib for plotting a dataframe
• Matplotlib is a comprehensive package for data visualization and
comes as part of the Anaconda distribution.
• Pandas integrates conveniently with Matplotlib, which means that data
contained in a DataFrame can be plotted with its plot() method.
• You can select the plot type that suits your data through the kind
argument (e.g., scatter, bar, box, pie, hist, …).
• Please see the resource at the end for more information on the various plots
and arguments of the plot() function.
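As a minimal sketch of the pandas integration described above, the snippet below plots a small DataFrame twice, once as a bar chart and once as a scatter plot. The column names and values are hypothetical and exist only for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, made up for illustration only
df = pd.DataFrame({
    "student": ["A", "B", "C", "D"],
    "hours": [5, 8, 4, 9],
    "marks": [72, 85, 64, 90],
})

# kind selects the plot type; plot() returns the Matplotlib Axes it drew on
ax_bar = df.plot(x="student", y="marks", kind="bar", legend=False)
ax_bar.set_ylabel("Marks")

# A scatter plot needs both an x and a y column
ax_scatter = df.plot(x="hours", y="marks", kind="scatter")
```

Calling plt.show() afterwards displays both figures.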
Plotting library
Matplotlib is the whole Python package (library) used to create 2D graphs and plots
from Python scripts. pyplot is a module in Matplotlib that supports a very wide
variety of graphs and plots, such as histograms, bar charts, power spectra, and error charts.
Used together with NumPy, it provides a plotting environment similar to MATLAB's.
pyplot provides the state-machine interface to the plotting library in Matplotlib. This means
that figures and axes are created implicitly and automatically to achieve the desired plot.
For example, calling plot from pyplot automatically creates the necessary figure and
axes. Setting a title then automatically applies that title to the
current axes object. The pyplot interface is well suited to interactive plotting and
simple scripts; for more complex figures, Matplotlib's explicit (object-oriented)
interface is generally recommended.
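The difference between the implicit state-machine style and creating the figure and axes yourself can be sketched as follows. The data values are arbitrary, and the variable names are chosen only for this illustration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Implicit (state-machine) style: pyplot creates the figure and axes for us
plt.plot(x, y)
plt.title("Implicit style")
implicit_ax = plt.gca()  # the "current axes" that the two calls above acted on

# Explicit style: create the figure and axes ourselves, then call their methods
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Explicit style")
```

Both styles produce the same kind of plot; the explicit style simply makes it visible which axes each call modifies. Calling plt.show() afterwards displays the two figures.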
Matplotlib – pyplot features
• Line plot
• Bar graph
• Histogram
• Pie chart
• Scatter plot
• etc.
Matplotlib – line plot
Line Plot
A line plot (line chart) is a graph that shows how a value changes along
an ordered axis.
It is drawn as a series of data points connected by straight
line segments. Line plots are generally used to display trends over time. A
line plot can be created using the plot() function available in
the pyplot module. Besides plotting the line itself, we can explicitly define
the grid, the x- and y-axis scales and labels, the title, display options, etc.
Example program
import numpy as np
import matplotlib.pyplot as plt

year = [2020, 2021, 2022, 2023]
ca202_pass_percentage = [90, 92, 94, 95]
ca204_pass_percentage = [89, 91, 93, 95]

# One plot() call per line; label is the text shown in the legend
plt.plot(year, ca202_pass_percentage, color='g', label='ca202')
plt.plot(year, ca204_pass_percentage, color='orange', label='ca204')
plt.xticks(year)  # tick only at whole years
plt.xlabel('Year')
plt.ylabel('Pass percentage')
plt.title('Pass percentage class of 2020')
plt.legend()
plt.show()
Note: To draw several lines, call the
plot() function once per line with
suitable arguments.
Line Plot customization
• Custom line color
• plt.plot(year, passpercentage, color='orange')
• Change the value of the color argument, e.g. 'b' for blue, 'r' for red, 'c' for cyan, …
• Custom line style
• plt.plot([1, 1.1, 1, 1.1, 1], linestyle='-', linewidth=4)
• Set linestyle to '-' for a solid line, '--' for dashed, '-.' for dash-dot, or ':' for dotted
• Custom line width
• plt.plot('x', 'y', data=df, linewidth=22)
• Set linewidth as required
• Title
• plt.title('pass class of 2020') - change it as required
• Label - plt.xlabel('Year') - change the x or y label as required
• Legend - plt.legend(('ca202', 'ca204'), loc='upper
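The customization options listed above can be combined in one plot. The sketch below reuses the pass-percentage data from the earlier example; the particular colors, line styles, widths, and the legend's loc and frameon values are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt

year = [2020, 2021, 2022, 2023]
ca202 = [90, 92, 94, 95]
ca204 = [89, 91, 93, 95]

# color, linestyle and linewidth are set per line; label feeds the legend
plt.plot(year, ca202, color="g", linestyle="--", linewidth=2, label="ca202")
plt.plot(year, ca204, color="orange", linestyle=":", linewidth=4, label="ca204")

plt.xticks(year)  # tick only at whole years
plt.grid(True)    # draw grid lines
plt.xlabel("Year")
plt.ylabel("Pass percentage")
plt.title("Pass percentage by course")

# loc and frameon values here are illustrative, not prescribed by the slides
legend = plt.legend(loc="lower right", frameon=False)
ax = plt.gca()  # handle to the axes that pyplot drew on
```

Calling plt.show() afterwards displays the figure.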
Plotting with Pyplot
Plot bar graphs
import numpy as np
import matplotlib.pyplot as plt

label = ['Mohamed', 'Alim', 'Warfa', 'Hirsi', 'Samater', 'Daud']
per = [94, 85, 45, 25, 50, 54]
index = np.arange(len(label))  # one x position per student
plt.bar(index, per)
plt.xlabel('Student Name', fontsize=8)
plt.ylabel('Percentage', fontsize=5)
# Replace the numeric positions with the student names
plt.xticks(index, label, fontsize=8, rotation=30)
plt.title('Percentage of marks achieved by students of Class XII')
plt.show()
# Note: use plt.barh() for horizontal bars
Bar graph customization
• Custom bar color
• plt.bar(index, per, color="green", edgecolor="blue")
• Change the values of the color and edgecolor arguments, e.g. 'b' for blue, 'r' for red, 'c' for cyan, …
• Custom edge line style
• plt.bar(index, per, color="green", edgecolor="blue", linewidth=4, linestyle='--')
• Set linestyle to '-' for a solid line, '--' for dashed, '-.' for dash-dot, or ':' for dotted
• Custom edge line width
• plt.bar(index, per, color="green", edgecolor="blue", linewidth=4)
• Set linewidth as required
• Title
• plt.title('Percentage of marks achieved by students of Class XII') - change it as required
• Label - plt.xlabel('Student Name', fontsize=5) - change the x or y label as required
• Legend - plt.legend() - change the loc and frameon properties as required
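As a rough sketch of these bar options applied together, the example below draws the same student data as horizontal bars with barh(). The edge style, the legend label, and the loc and frameon values are illustrative assumptions.

```python
import matplotlib.pyplot as plt

label = ['Mohamed', 'Alim', 'Warfa', 'Hirsi', 'Samater', 'Daud']
per = [94, 85, 45, 25, 50, 54]

# barh() takes the categories first and the bar lengths second
bars = plt.barh(label, per, color="green", edgecolor="blue",
                linewidth=2, linestyle="--", label="Percentage")
plt.xlabel("Percentage")
plt.ylabel("Student Name")
plt.title("Percentage of marks achieved by students of Class XII")

# loc and frameon values here are arbitrary choices for illustration
legend = plt.legend(loc="lower right", frameon=False)
```

Passing the names directly to barh() lets Matplotlib place and label the bars, so no separate index array or xticks() call is needed. Calling plt.show() afterwards displays the figure.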
Matplotlib Histogram
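A histogram groups numeric values into bins and draws one bar per bin, so it shows how the values are distributed. The following is a minimal sketch using plt.hist(); the marks are synthetic random numbers generated only for this illustration, and the bin count of 10 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data: 200 made-up marks drawn from a normal distribution
rng = np.random.default_rng(seed=0)
marks = rng.normal(loc=60, scale=12, size=200)

# hist() returns the count per bin, the bin edges, and the drawn bars
counts, bin_edges, patches = plt.hist(marks, bins=10,
                                      color="green", edgecolor="blue")
plt.xlabel("Marks")
plt.ylabel("Number of students")
plt.title("Distribution of marks (synthetic data)")
```

Calling plt.show() afterwards displays the figure.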
Setting up Seaborn
To start working with Seaborn, both Seaborn and Matplotlib must be
imported into the workspace, conventionally under the aliases sns and plt.
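A minimal setup sketch might look like the following. The small DataFrame is a hypothetical stand-in, not the film data set used later in the course, and its column names and values are invented for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

sns.set_theme()  # apply Seaborn's default styling to Matplotlib figures

# Hypothetical stand-in data; NOT the film data set from the course
films = pd.DataFrame({
    "genre": ["Drama", "Comedy", "Action", "Drama", "Comedy", "Action"],
    "rating": [7.8, 6.5, 7.1, 8.2, 6.9, 6.4],
})

# barplot() aggregates rating per genre (mean by default) and returns the Axes
ax = sns.barplot(data=films, x="genre", y="rating")
ax.set_title("Mean rating per genre (made-up data)")
```

Because Seaborn draws on ordinary Matplotlib axes, the returned ax can be customized with the same methods used earlier, and plt.show() displays the result.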
The most well-known of these data visualization libraries in Python, Matplotlib, enables
users to generate visualizations like histograms, scatter plots, bar charts, pie charts and
much more.
Seaborn is another useful visualization library that is built on top of Matplotlib. It provides
data visualizations that are typically more aesthetic and statistically sophisticated. Having a
solid understanding of how to use both of these libraries is essential for any data scientist
or data analyst, as they both provide easy methods for visualizing data for insight.
Note: We are going to apply Seaborn to the film data set.