Module 8. Data Visualization with Python

https://labs.cognitiveclass.ai/v2/tools/jupyterlite
Table of contents
Imports
Line Plots (Series/Dataframe)
Area Plots
Histograms
Colors available in Matplotlib
Bar Charts (Dataframe)
Bar Charts (Dataframe) – Horizontal
Waffle Charts
Waffle Charts – function
Regression Plots
Scatter Plot – Plotly.express
Line Plot
Bar Chart
Bubble Chart
Histogram
Pie Chart
Sunburst Charts
Maps
Basic Map
A. Stamen Toner Maps - Black and White
B. Stamen Terrain Maps
C. Maps with Markers
D. Choropleth Maps
Scatter Plot
Line Plot
Visualizing Data using Matplotlib

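Imports
# we are using the inline backend
%matplotlib inline
import numpy as np # used later for histogram bin edges and the waffle chart
import pandas as pd # used later to build and display dataframes
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches # needed for the waffle chart legend
print('Matplotlib version: ', mpl.__version__) # >= 2.0.0
print(plt.style.available)
mpl.style.use(['ggplot']) # optional: for ggplot-like style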
Line Plots (Series/Dataframe)

#First, we will extract the data series for Haiti.
haiti = df_can.loc['Haiti', years] # passing in years 1980 - 2013 to exclude the 'total' column
haiti.head()
haiti.plot()
haiti.index = haiti.index.map(int) # let's change the index values of Haiti to type integer for plotting
haiti.plot(kind='line')
plt.title('Immigration from Haiti')
plt.ylabel('Number of immigrants')
plt.xlabel('Years')
plt.show() # need this line to show the updates made to the figure
haiti.plot(kind='line')
plt.title('Immigration from Haiti')
plt.ylabel('Number of Immigrants')
plt.xlabel('Years')
# annotate the 2010 Earthquake.
# syntax: plt.text(x, y, label)
plt.text(2000, 6000, '2010 Earthquake') # x and y are in data coordinates (year, number of immigrants)
plt.show()
Area Plots
**Option 1: Scripting layer (procedural method) - using matplotlib.pyplot as 'plt'**
df_top5.index = df_top5.index.map(int)
df_top5.plot(kind='area',
             alpha=0.25, # 0 - 1, default value alpha = 0.5 (transparency)
             stacked=False,
             figsize=(20, 10)) # pass a tuple (x, y) size
plt.title('Immigration Trend of Top 5 Countries')
plt.xlabel('Years')
plt.show()
**Option 2: Artist layer (object-oriented method) - using an Axes instance from Matplotlib (preferred)**
# option 2: preferred option with more flexibility
ax = df_top5.plot(kind='area', alpha=0.35, figsize=(20, 10))
ax.set_title('Immigration Trend of Top 5 Countries')
ax.set_ylabel('Number of Immigrants')
ax.set_xlabel('Years')
Histograms
df_can['2013'].plot(kind='hist', figsize=(8, 5))
# add a title to the histogram
plt.title('Histogram of Immigration from 195 Countries in 2013')
# add y-label
plt.ylabel('Number of Countries')
# add x-label
plt.xlabel('Number of Immigrants')
plt.show()
Notice that the x-axis tick labels do not line up with the bin edges. This can be fixed by passing in
an xticks keyword that contains the list of bin edges, as follows:
# 'bin_edges' is a list of bin intervals
count, bin_edges = np.histogram(df_can['2013'])
df_can['2013'].plot(kind='hist', figsize=(8, 5), xticks=bin_edges)
plt.title('Histogram of Immigration from 195 countries in 2013') # add a title to the histogram
plt.ylabel('Number of Countries') # add y-label
plt.xlabel('Number of Immigrants') # add x-label
plt.show()
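To see why the tick positions should come from the bin edges, here is a minimal sketch in plain Python of how equal-width bin edges follow from the data range. The sample values and the helper name equal_width_bin_edges are made up for illustration; np.histogram with its default of 10 bins returns the same kind of edge list alongside the counts.

```python
def equal_width_bin_edges(values, bins=10):
    """Return bins + 1 equally spaced edges covering min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(bins + 1)]


# hypothetical immigrant counts for a handful of countries
sample = [0, 120, 450, 3000, 5000]
edges = equal_width_bin_edges(sample, bins=10)
print(edges)  # 11 edges, each 500 apart
```

Passing such a list as xticks places one tick on every bin boundary, which is what the call above does with bin_edges.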
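# generate histogram (first attempt: pandas treats each year as a separate series)
df_can.loc[['Denmark', 'Norway', 'Sweden'], years].plot.hist()

# transpose so that each country becomes a column
# (assumed definition of df_t; it is not shown in these notes)
df_t = df_can.loc[['Denmark', 'Norway', 'Sweden'], years].transpose()
# generate histogram
df_t.plot(kind='hist', figsize=(10, 6))
plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')
plt.ylabel('Number of Years')
plt.show()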
Let's make a few modifications to improve the impact and aesthetics of the previous plot:
• increase the number of bins to 15 by passing in the bins parameter;
• set transparency to 60% by passing in the alpha parameter;
• label the x-axis by passing in the x-label parameter;
• change the colors of the plots by passing in the color parameter.
# let's get the x-tick values
count, bin_edges = np.histogram(df_t, 15)
# un-stacked histogram
df_t.plot(kind='hist',
          figsize=(10, 6),
          bins=15,
          alpha=0.6,
          xticks=bin_edges,
          color=['coral', 'darkslateblue', 'mediumseagreen'])
plt.show()
If we do not want the plots to overlap each other, we can stack them using
the stacked parameter. Let's also adjust the min and max x-axis labels to remove the extra gap
on the edges of the plot. We can pass a tuple (min, max) using the xlim parameter, as shown
below.
count, bin_edges = np.histogram(df_t, 15)
xmin = bin_edges[0] - 10 # first bin value is 31.0, adding buffer of 10 for aesthetic purposes
xmax = bin_edges[-1] + 10 # last bin value is 308.0, adding buffer of 10 for aesthetic purposes
# stacked Histogram
df_t.plot(kind='hist',
          figsize=(10, 6),
          bins=15,
          xticks=bin_edges,
          color=['coral', 'darkslateblue', 'mediumseagreen'],
          stacked=True,
          xlim=(xmin, xmax))
plt.show()
Colors available in Matplotlib
import matplotlib
# 'hex_code' avoids shadowing the built-in hex()
for name, hex_code in matplotlib.colors.cnames.items():
    print(name, hex_code)
Bar Charts (Dataframe)

# step 1: get the data
df_iceland = df_can.loc['Iceland', years]
df_iceland.head()
# step 2: plot data
df_iceland.plot(kind='bar', figsize=(10, 6))
plt.xlabel('Year') # add x-label to the plot
plt.ylabel('Number of immigrants') # add y-label to the plot
plt.title('Icelandic immigrants to Canada from 1980 to 2013') # add title to the plot
plt.show()
df_iceland.plot(kind='bar', figsize=(10, 6), rot=90)
plt.xlabel('Year')
plt.title('Icelandic Immigrants to Canada from 1980 to 2013')
# Annotate arrow
plt.annotate('', # s: str. will leave it blank for no text
             xy=(32, 70), # place head of the arrow at point (year 2012 , pop 70)
             xytext=(28, 20), # place base of the arrow at point (year 2008 , pop 20)
             xycoords='data', # will use the coordinate system of the object being annotated
             arrowprops=dict(arrowstyle='->', connectionstyle='arc3', color='blue', lw=2)
             )
# Annotate Text
plt.annotate('2008 - 2011 Financial Crisis', # text to display
             xy=(28, 30), # start the text at point (year 2008 , pop 30)
             rotation=72.5, # based on trial and error to match the arrow
             va='bottom', # want the text to be vertically 'bottom' aligned
             ha='left', # want the text to be horizontally 'left' aligned.
             )
plt.show()
Bar Charts (Dataframe) – Horizontal

Preliminary step
# generate plot
df_top15.plot(kind='barh', figsize=(12, 12), color='steelblue')
plt.title('Top 15 Countries Contributing to the Immigration to Canada between 1980 - 2013')
# annotate value labels to each country
for index, value in enumerate(df_top15):
    label = format(int(value), ',') # format int with commas
    # place text at the end of bar (subtracting 47000 from x, and 0.1 from y to make it fit within the bar)
    plt.annotate(label, xy=(value - 47000, index - 0.10), color='white')
plt.show()
Waffle Charts
Preliminary step
Step 1. The first step in creating a waffle chart is determining the proportion of each category with
respect to the total.
# compute the proportion of each category with respect to the total
total_values = df_dsn['Total'].sum()
category_proportions = df_dsn['Total'] / total_values
# print out proportions
pd.DataFrame({"Category Proportion": category_proportions})
Step 2. The second step is defining the overall size of the waffle chart.
width = 40 # width of chart

height = 10 # height of chart
total_num_tiles = width * height # total number of tiles
print(f'Total number of tiles is {total_num_tiles}.')
Step 3. The third step is using the proportion of each category to determine its respective number of
tiles.
# compute the number of tiles for each category
tiles_per_category = (category_proportions * total_num_tiles).round().astype(int)
# print out number of tiles per category
pd.DataFrame({"Number of tiles": tiles_per_category})
Step 4. The fourth step is creating a matrix that resembles the waffle chart and populating it.
# initialize the waffle chart as an empty matrix
waffle_chart = np.zeros((height, width), dtype = np.uint)
# define indices to loop through waffle chart
category_index = 0
tile_index = 0
# populate the waffle chart
for col in range(width):
    for row in range(height):
        tile_index += 1
        # if the number of tiles populated for the current category is equal to its corresponding allocated tiles...
        if tile_index > sum(tiles_per_category[0:category_index]):
            # ...proceed to the next category
            category_index += 1
        # set the class value to an integer, which increases with class
        waffle_chart[row, col] = category_index
print('Waffle chart populated!')
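Steps 1 to 4 can be traced by hand on a small grid. The sketch below repeats the same logic in plain Python on made-up totals (50, 30, 20) and a 2 x 5 grid; the function name waffle_matrix and the sample numbers are illustrative and not part of the course dataset.

```python
def waffle_matrix(totals, height, width):
    """Build a height x width grid of 1-based category ids, filled column by column."""
    grand_total = sum(totals)
    num_tiles = height * width
    # tiles allocated to each category, rounded to the nearest integer
    tiles = [round(t / grand_total * num_tiles) for t in totals]
    grid = [[0] * width for _ in range(height)]
    category, filled = 0, 0
    for col in range(width):
        for row in range(height):
            filled += 1
            # move to the next category once the current allocation is used up
            if filled > sum(tiles[:category]):
                category += 1
            grid[row][col] = category
    return tiles, grid


tiles, grid = waffle_matrix([50, 30, 20], height=2, width=5)
print(tiles)  # tiles per category
for row in grid:
    print(row)
```

Because each category's tile count is rounded independently, the counts are not guaranteed to add up to the total number of tiles for every input; the example totals were chosen so that they do.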
Step 5. Map the waffle chart matrix into a visual.

# instantiate a new figure object
fig = plt.figure()
# use matshow to display the waffle chart
colormap = plt.cm.coolwarm
plt.matshow(waffle_chart, cmap=colormap)
plt.colorbar()
plt.show()
Step 6. Prettify the chart.
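fig = plt.figure()
# use matshow to display the waffle chart (plt.colorbar() needs an image to attach to)
colormap = plt.cm.coolwarm
plt.matshow(waffle_chart, cmap=colormap)
plt.colorbar()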
# get the axis
ax = plt.gca()
# set minor ticks
ax.set_xticks(np.arange(-.5, (width), 1), minor=True)
ax.set_yticks(np.arange(-.5, (height), 1), minor=True)
# add gridlines based on minor ticks
ax.grid(which='minor', color='w', linestyle='-', linewidth=2)
plt.xticks([])
plt.yticks([])
plt.show()
Step 7. Create a legend and add it to chart.
fig = plt.figure()
# use matshow to display the waffle chart (plt.colorbar() needs an image to attach to)
colormap = plt.cm.coolwarm
plt.matshow(waffle_chart, cmap=colormap)
plt.colorbar()
# get the axis
ax = plt.gca()
# set minor ticks
ax.set_xticks(np.arange(-.5, (width), 1), minor=True)
ax.set_yticks(np.arange(-.5, (height), 1), minor=True)
# add gridlines based on minor ticks
ax.grid(which='minor', color='w', linestyle='-', linewidth=2)
plt.xticks([])
plt.yticks([])
# compute cumulative sum of individual categories to match color schemes between chart and legend
values_cumsum = np.cumsum(df_dsn['Total'])
total_values = values_cumsum.iloc[-1]
# create legend
legend_handles = []
for i, category in enumerate(df_dsn.index.values):
    label_str = category + ' (' + str(df_dsn['Total'].iloc[i]) + ')'
    color_val = colormap(float(values_cumsum.iloc[i]) / total_values)
    legend_handles.append(mpatches.Patch(color=color_val, label=label_str))
# add legend to chart
plt.legend(handles=legend_handles,
           loc='lower center',
           ncol=len(df_dsn.index.values),
           bbox_to_anchor=(0., -0.2, 0.95, .1)
           )
plt.show()
Waffle Charts – function
def create_waffle_chart(categories, values, height, width, colormap, value_sign=''):
    values = list(values) # accept a list or a pandas Series

    # compute the proportion of each category with respect to the total
    total_values = sum(values)
    category_proportions = [(float(value) / total_values) for value in values]

    # compute the total number of tiles
    total_num_tiles = width * height # total number of tiles
    print('Total number of tiles is', total_num_tiles)

    # compute the number of tiles for each category
    tiles_per_category = [round(proportion * total_num_tiles) for proportion in category_proportions]

    # print out number of tiles per category
    for i, tiles in enumerate(tiles_per_category):
        print(categories[i] + ': ' + str(tiles))

    # initialize the waffle chart as an empty matrix
    waffle_chart = np.zeros((height, width))

    # define indices to loop through waffle chart
    category_index = 0
    tile_index = 0

    # populate the waffle chart
    for col in range(width):
        for row in range(height):
            tile_index += 1
            # if the number of tiles populated for the current category
            # is equal to its corresponding allocated tiles...
            if tile_index > sum(tiles_per_category[0:category_index]):
                # ...proceed to the next category
                category_index += 1
            # set the class value to an integer, which increases with class
            waffle_chart[row, col] = category_index

    # instantiate a new figure object and display the waffle chart with matshow
    fig = plt.figure()
    plt.matshow(waffle_chart, cmap=colormap)
    plt.colorbar()

    # get the axis
    ax = plt.gca()

    # set minor ticks
    ax.set_xticks(np.arange(-.5, (width), 1), minor=True)
    ax.set_yticks(np.arange(-.5, (height), 1), minor=True)

    # add gridlines based on minor ticks
    ax.grid(which='minor', color='w', linestyle='-', linewidth=2)
    plt.xticks([])
    plt.yticks([])

    # compute cumulative sum of individual categories to match color schemes between chart and legend
    values_cumsum = np.cumsum(values)
    total_values = values_cumsum[len(values_cumsum) - 1]

    # create legend
    legend_handles = []
    for i, category in enumerate(categories):
        if value_sign == '%':
            label_str = category + ' (' + str(values[i]) + value_sign + ')'
        else:
            label_str = category + ' (' + value_sign + str(values[i]) + ')'
        color_val = colormap(float(values_cumsum[i]) / total_values)
        legend_handles.append(mpatches.Patch(color=color_val, label=label_str))

    # add legend to chart
    plt.legend(
        handles=legend_handles,
        loc='lower center',
        ncol=len(categories),
        bbox_to_anchor=(0., -0.2, 0.95, .1)
    )
    plt.show()

width = 40 # width of chart
height = 10 # height of chart
categories = df_dsn.index.values # categories
values = df_dsn['Total'] # corresponding values of categories
colormap = plt.cm.coolwarm # color map class
create_waffle_chart(categories, values, height, width, colormap)
Regression Plots
# install seaborn
# !pip3 install seaborn
# import library
import seaborn as sns
print('Seaborn installed and imported!')
Create a new dataframe that stores the total number of landed immigrants to Canada per year
from 1980 to 2013.
# we can use the sum() method to get the total population per year
df_tot = pd.DataFrame(df_can[years].sum(axis=0))
# change the years to type float (useful for regression later on)
df_tot.index = map(float, df_tot.index)
# reset the index to put it back in as a column in the df_tot dataframe
df_tot.reset_index(inplace=True)
# rename columns
df_tot.columns = ['year', 'total']
# view the final dataframe
df_tot.head()
sns.regplot(x='year', y='total', data=df_tot)
plt.figure(figsize=(15, 10))
ax = sns.regplot(x='year', y='total', data=df_tot, color='green', marker='+', scatter_kws={'s': 200})
ax.set(xlabel='Year', ylabel='Total Immigration') # add x- and y-labels
ax.set_title('Total Immigration to Canada from 1980 - 2013') # add title
plt.show()
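sns.regplot draws the scatter points together with a fitted linear regression line. As a rough illustration of what that line represents, here is a sketch of an ordinary least-squares fit in plain Python. The data points and the function name fit_line are made up for the example and are not taken from the immigration dataset.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# hypothetical yearly totals that grow by exactly 10 per year
sample_years = [1980.0, 1981.0, 1982.0, 1983.0]
sample_totals = [100.0, 110.0, 120.0, 130.0]
slope, intercept = fit_line(sample_years, sample_totals)
print(slope, intercept)
```

The slope is the estimated change in the total per year, which is the trend the regression line in the plot makes visible.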
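plt.figure(figsize=(15, 10))
sns.set(font_scale=1.5)
sns.set_style('ticks') # change background to white background
# redraw the plot so the new style settings take effect
ax = sns.regplot(x='year', y='total', data=df_tot, color='green', marker='+', scatter_kws={'s': 200})
ax.set(xlabel='Year', ylabel='Total Immigration')
ax.set_title('Total Immigration to Canada from 1980 - 2013')
plt.show()
plt.figure(figsize=(15, 10))
sns.set(font_scale=1.5)
sns.set_style('whitegrid')
# redraw the plot so the new style settings take effect
ax = sns.regplot(x='year', y='total', data=df_tot, color='green', marker='+', scatter_kws={'s': 200})
ax.set(xlabel='Year', ylabel='Total Immigration')
ax.set_title('Total Immigration to Canada from 1980 - 2013')
plt.show()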
Scatter Plot – Plotly.express
# Import required libraries
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
# First we create a figure using go.Figure and add a trace to it through go.Scatter
fig = go.Figure(data=go.Scatter(x=data['Distance'], y=data['DepTime'], mode='markers',
                                marker=dict(color='red')))
# Update the layout through `update_layout`. Here we add a title to the plot and titles to the x and y axes.
fig.update_layout(title='Distance vs Departure Time', xaxis_title='Distance', yaxis_title='DepTime')
# Display the figure
fig.show()
Line Plot - plotly.graph_objects
# Group the data by Month and compute average over arrival delay time.
line_data = data.groupby('Month')['ArrDelay'].mean().reset_index()
# Create line plot here
fig = go.Figure(data=go.Scatter(x=line_data['Month'], y=line_data['ArrDelay'], mode='lines',
                                marker=dict(color='green')))
fig.update_layout(title='Month vs Average Flight Delay Time', xaxis_title='Month',
                  yaxis_title='ArrDelay')
fig.show()
Bar Chart
# Use plotly express bar chart function px.bar. Provide input data, x and y axis variable, and title of the chart.
# This will give total number of flights to the destination state.
fig = px.bar(bar_data, x="DestState", y="Flights",
             title='Total number of flights to the destination state split by reporting airline')
fig.show()
Bubble Chart
# Create bubble chart here
fig = px.scatter(bub_data, x="Reporting_Airline", y="Flights", size="Flights",
                 hover_name="Reporting_Airline", title='Reporting Airline vs Number of Flights',
                 size_max=60)
fig.show()
Histogram
# Create histogram here
fig = px.histogram(data, x="ArrDelay")
fig.show()
Pie Chart
# Use px.pie function to create the chart. Input dataset.
# Values parameter will set values associated to the sector. 'Month' feature is passed to it.
# labels for the sector are passed to the `names` parameter.
fig = px.pie(data, values='Month', names='DistanceGroup',
             title='Distance group proportion by month')
fig.show()
Sunburst Charts
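# Create sunburst chart here
# px.sunburst takes the hierarchy levels in `path` and the sector sizes in `values`.
# The column names 'Month', 'DestStateName' and 'Flights' are assumptions about the flight dataset.
fig = px.sunburst(data, path=['Month', 'DestStateName'], values='Flights')
fig.show()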
Maps
Basic Map
#!pip3 install folium==0.5.0
import folium
# define Mexico's geolocation coordinates
mexico_latitude = 23.6345
mexico_longitude = -102.5528
# define the map centered around Mexico with a higher zoom level
mexico_map = folium.Map(location=[mexico_latitude, mexico_longitude], zoom_start=4)
# display the map of Mexico
mexico_map
A. Stamen Toner Maps - Black and White
# create a Stamen Toner map of the world centered around Canada
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4, tiles='Stamen Toner')
# display map
world_map
B. Stamen Terrain Maps
# create a Stamen Terrain map of the world centered around Canada
world_map = folium.Map(location=[56.130, -106.35], zoom_start=4, tiles='Stamen Terrain')
# display map
world_map
C. Maps with Markers
Before the map
# Download the dataset and read it into a pandas dataframe:
from js import fetch
import io
URL = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Police_Department_Incidents_-_Previous_Year__2016_.csv'
resp = await fetch(URL)
text = io.BytesIO((await resp.arrayBuffer()).to_py())
df_incidents = pd.read_csv(text)
print('Dataset downloaded and read into a pandas dataframe!')
# keep only the first 100 crimes (assumed limit, matching the comments below; not shown in these notes)
limit = 100
df_incidents = df_incidents.iloc[0:limit, :]
# approximate San Francisco latitude and longitude (assumed values; not defined in these notes)
latitude = 37.77
longitude = -122.42
# create map and display it
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)
# display the map of San Francisco
sanfran_map
# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()
# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )
# add incidents to map
sanfran_map.add_child(incidents)
# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()
# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )
# add pop-up text to each marker on the map
latitudes = list(df_incidents.Y)
longitudes = list(df_incidents.X)
labels = list(df_incidents.Category)
for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(sanfran_map)
# add incidents to map
sanfran_map.add_child(incidents)
# Group the markers into different clusters. Each cluster is then represented by the number of
# crimes in each neighborhood. These clusters can be thought of as pockets of San Francisco
# which you can then analyze separately.
from folium import plugins
# let's start again with a clean copy of the map of San Francisco
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)
# instantiate a mark cluster object for the incidents in the dataframe
incidents = plugins.MarkerCluster().add_to(sanfran_map)
# loop through the dataframe and add each data point to the mark cluster
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)
# display map
sanfran_map
D. Choropleth Maps
In order to create a Choropleth map, we need a GeoJSON file that defines the areas/boundaries
of the state, county, or country that we are interested in. In our case, since we are endeavoring
to create a world map, we want a GeoJSON that defines the boundaries of all world countries.
For your convenience, we will be providing you with this file, so let's go ahead and load it.
# download countries geojson file
from js import fetch
import io
import json
URL = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/world_countries.json'
resp = await fetch(URL)
data = io.BytesIO((await resp.arrayBuffer()).to_py())
world_geo = json.load(data)
print('GeoJSON file loaded!')
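The key_on='feature.properties.name' argument in the choropleth call is a dotted path into each GeoJSON feature, and the value found there is matched against the first column named in columns. Here is a minimal sketch in plain Python of how such a path can be resolved, using a tiny hand-written FeatureCollection. The two features, their empty geometries and the helper resolve_key are made up for illustration; the real world_countries.json carries full country polygons.

```python
import json

# a tiny stand-in for a GeoJSON file (geometries left empty for brevity)
geojson_text = '''
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature", "properties": {"name": "Canada"}, "geometry": null},
   {"type": "Feature", "properties": {"name": "Mexico"}, "geometry": null}
 ]}
'''
sample_geo = json.loads(geojson_text)


def resolve_key(feature, key_on):
    """Follow a dotted path such as 'feature.properties.name' inside one feature."""
    parts = key_on.split('.')[1:]  # drop the leading 'feature'
    value = feature
    for part in parts:
        value = value[part]
    return value


names = [resolve_key(f, 'feature.properties.name') for f in sample_geo['features']]
print(names)
```

If the names in the GeoJSON file and the names in the dataframe's key column are spelled differently, those countries will not be matched, so it is worth comparing the two lists before plotting.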
# create a plain world map
world_map = folium.Map(location=[0, 0], zoom_start=2)
# generate choropleth map using the total immigration of each country to Canada from 1980 to 2013
world_map.choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Immigration to Canada'
)
# display map
world_map
# Notice how the legend is displaying a negative boundary or threshold. Let's fix that by
# defining our own thresholds and starting with 0 instead of -6,918!
# create a numpy array of length 6 with linear spacing from the minimum total immigration to
# the maximum total immigration
threshold_scale = np.linspace(df_can['Total'].min(),
                              df_can['Total'].max(),
                              6, dtype=int)
threshold_scale = threshold_scale.tolist() # change the numpy array to a list
# make sure that the last value of the list is greater than the maximum immigration
threshold_scale[-1] = threshold_scale[-1] + 1
# let Folium determine the scale.
world_map = folium.Map(location=[0, 0], zoom_start=2)
world_map.choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    threshold_scale=threshold_scale,
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Immigration to Canada',
    reset=True
)
world_map
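To make the threshold computation concrete, here is a plain-Python sketch that is roughly equivalent to the np.linspace(..., 6, dtype=int) call for non-negative inputs, followed by the same +1 adjustment of the last value. The function name linear_thresholds and the range 0 to 5000 are made up for illustration.

```python
def linear_thresholds(lo, hi, num=6):
    """Evenly spaced integer thresholds from lo to hi, with the last one nudged above hi."""
    step = (hi - lo) / (num - 1)
    # int() truncates, which mirrors converting the spaced values to integers
    scale = [int(lo + i * step) for i in range(num)]
    # make the last threshold strictly greater than the maximum value
    scale[-1] = scale[-1] + 1
    return scale


thresholds = linear_thresholds(0, 5000)
print(thresholds)
```

Six thresholds define five color bins; the final +1 is there so that the country with the maximum total still falls inside the last bin.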
