Module 8. Data Visualization With Python

The document discusses various data visualization techniques using Matplotlib in Python, including line plots, area plots, histograms, bar charts, scatter plots, and more. It provides code examples to generate these different plot types using the Matplotlib library and customize aspects like colors, titles, labels, and annotations. The goal is to teach how to effectively visualize data through different plot types commonly used for exploration and communication of insights.

Uploaded by

gia fer
Module 8.

Data Visualization with Python


https://labs.cognitiveclass.ai/v2/tools/jupyterlite

Table of Contents
Imports
Line Plots (Series/Dataframe)
Area Plots
Histograms
Colors available in Matplotlib
Bar Charts (Dataframe)
Bar Charts (Dataframe) – Horizontal
Waffle Charts
Waffle Charts – Function
Regression Plots
Scatter Plot – Plotly.express
Line Plot
Bar Chart
Bubble Chart
Histogram
Pie Chart
Sunburst Charts
Maps
Basic Map
A. Stamen Toner Maps - Black and White
B. Stamen Terrain Maps
C. Maps with Markers
D. Choropleth Maps
Scatter Plot
Line Plot

Visualizing Data using Matplotlib


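Imports
# we are using the inline backend
%matplotlib inline

import numpy as np # used later for np.histogram, np.zeros and np.linspace
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches # used for the waffle chart legend

print('Matplotlib version: ', mpl.__version__) # >= 2.0.0
print(plt.style.available)

mpl.style.use(['ggplot']) # optional: for ggplot-like style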

Line Plots (Series/Dataframe)


#First, we will extract the data series for Haiti.

haiti = df_can.loc['Haiti', years] # passing in years 1980 - 2013 to exclude the 'total' column

haiti.head()

haiti.plot()

haiti.index = haiti.index.map(int) # let's change the index values of Haiti to type integer for plotting

haiti.plot(kind='line')
plt.title('Immigration from Haiti')

plt.ylabel('Number of immigrants')

plt.xlabel('Years')

plt.show() # need this line to show the updates made to the figure

haiti.plot(kind='line')

plt.title('Immigration from Haiti')

plt.ylabel('Number of Immigrants')

plt.xlabel('Years')

# annotate the 2010 Earthquake.

# syntax: plt.text(x, y, label)

plt.text(2000, 6000, '2010 Earthquake') # x is a year (the index is of type int), y is a number of immigrants

plt.show()

Area Plots

Option 1: Scripting layer (procedural method), using matplotlib.pyplot as 'plt'

df_top5.index = df_top5.index.map(int)

df_top5.plot(kind='area',
             alpha=0.25, # 0 - 1, default value alpha = 0.5 (transparency)
             stacked=False,
             figsize=(20, 10)) # pass a tuple (x, y) size

plt.title('Immigration Trend of Top 5 Countries')

plt.ylabel('Number of Immigrants')

plt.xlabel('Years')
plt.show()

Option 2: Artist layer (object-oriented method), using an Axes instance from Matplotlib (preferred)

# option 2: preferred option with more flexibility

ax = df_top5.plot(kind='area', alpha=0.35, figsize=(20, 10))

ax.set_title('Immigration Trend of Top 5 Countries')

ax.set_ylabel('Number of Immigrants')

ax.set_xlabel('Years')

Histograms

df_can['2013'].plot(kind='hist', figsize=(8, 5))

# add a title to the histogram

plt.title('Histogram of Immigration from 195 Countries in 2013')

# add y-label

plt.ylabel('Number of Countries')

# add x-label

plt.xlabel('Number of Immigrants')

plt.show()
Notice that the x-axis tick labels do not line up with the bin edges. This can be fixed by passing in
an xticks keyword that contains the list of bin edges, as follows:

# 'bin_edges' is a list of bin intervals

count, bin_edges = np.histogram(df_can['2013'])

df_can['2013'].plot(kind='hist', figsize=(8, 5), xticks=bin_edges)

plt.title('Histogram of Immigration from 195 countries in 2013') # add a title to the histogram

plt.ylabel('Number of Countries') # add y-label

plt.xlabel('Number of Immigrants') # add x-label

plt.show()
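
As a rough illustration of why the bin edges make good tick positions, the sketch below derives equal-width bin edges in plain Python. It is meant to mirror what np.histogram does by default (equal-width bins between the minimum and the maximum); the sample values and the helper name equal_width_bins are made up for illustration and are not part of the lab.

```python
def equal_width_bins(values, bins=10):
    """Return (counts, edges) for equal-width bins spanning min..max.

    Assumes the values are not all equal (otherwise the bin width is zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # the last bin is closed on the right, so the maximum falls into it
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts, edges


sample = [0, 5, 12, 18, 25, 40, 41, 77, 90, 100]  # hypothetical data
sample_counts, sample_edges = equal_width_bins(sample, bins=5)
print(sample_edges)   # one more edge than there are bins
print(sample_counts)  # how many values fall into each bin
```

Passing such edges as xticks places one tick on every bar boundary, which is what the xticks=bin_edges argument above achieves.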

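# generate histogram (plotted this way, each year column is treated as a separate series)
df_can.loc[['Denmark', 'Norway', 'Sweden'], years].plot.hist()

# transpose the dataframe so that each country becomes a column
df_t = df_can.loc[['Denmark', 'Norway', 'Sweden'], years].transpose()

# generate histogram
df_t.plot(kind='hist', figsize=(10, 6))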

plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')

plt.ylabel('Number of Years')

plt.xlabel('Number of Immigrants')

plt.show()

Let's make a few modifications to improve the impact and aesthetics of the previous plot:

• increase the number of bins to 15 by passing in the bins parameter;
• set transparency to 60% by passing in the alpha parameter;
• label the x-axis by passing in the x-label;
• change the colors of the plots by passing in the color parameter.

# let's get the x-tick values

count, bin_edges = np.histogram(df_t, 15)

# un-stacked histogram

df_t.plot(kind='hist',
          figsize=(10, 6),
          bins=15,
          alpha=0.6,
          xticks=bin_edges,
          color=['coral', 'darkslateblue', 'mediumseagreen'])

plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')

plt.ylabel('Number of Years')

plt.xlabel('Number of Immigrants')

plt.show()

If we do not want the plots to overlap each other, we can stack them using
the stacked parameter. Let's also adjust the min and max x-axis limits to remove the extra gap
on the edges of the plot. We can pass a tuple (min, max) using the xlim parameter, as shown
below.

count, bin_edges = np.histogram(df_t, 15)

xmin = bin_edges[0] - 10 # first bin value is 31.0, subtracting a buffer of 10 for aesthetic purposes
xmax = bin_edges[-1] + 10 # last bin value is 308.0, adding a buffer of 10 for aesthetic purposes

# stacked Histogram

df_t.plot(kind='hist',
          figsize=(10, 6),
          bins=15,
          xticks=bin_edges,
          color=['coral', 'darkslateblue', 'mediumseagreen'],
          stacked=True,
          xlim=(xmin, xmax))

plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')

plt.ylabel('Number of Years')

plt.xlabel('Number of Immigrants')

plt.show()

Colors available in Matplotlib

import matplotlib.colors

# print every named color together with its hex code
for name, hex_code in matplotlib.colors.cnames.items():
    print(name, hex_code)

Bar Charts (Dataframe) 


# step 1: get the data

df_iceland = df_can.loc['Iceland', years]

df_iceland.head()
# step 2: plot data

df_iceland.plot(kind='bar', figsize=(10, 6))

plt.xlabel('Year') # add to x-label to the plot

plt.ylabel('Number of immigrants') # add y-label to the plot

plt.title('Icelandic immigrants to Canada from 1980 to 2013') # add title to the plot

plt.show()

df_iceland.plot(kind='bar', figsize=(10, 6), rot=90)

plt.xlabel('Year')

plt.ylabel('Number of Immigrants')

plt.title('Icelandic Immigrants to Canada from 1980 to 2013')

# Annotate arrow
plt.annotate('', # s: str. will leave it blank for no text
             xy=(32, 70), # place head of the arrow at point (year 2012 , pop 70)
             xytext=(28, 20), # place base of the arrow at point (year 2008 , pop 20)
             xycoords='data', # will use the coordinate system of the object being annotated
             arrowprops=dict(arrowstyle='->', connectionstyle='arc3', color='blue', lw=2)
             )

# Annotate Text
plt.annotate('2008 - 2011 Financial Crisis', # text to display
             xy=(28, 30), # start the text at point (year 2008 , pop 30)
             rotation=72.5, # based on trial and error to match the arrow
             va='bottom', # want the text to be vertically 'bottom' aligned
             ha='left', # want the text to be horizontally 'left' aligned
             )

plt.show()

Bar Charts (Dataframe) – Horizontal


Preliminary step (prepare df_top15)

# generate plot

df_top15.plot(kind='barh', figsize=(12, 12), color='steelblue')

plt.xlabel('Number of Immigrants')

plt.title('Top 15 Countries Contributing to the Immigration to Canada between 1980 - 2013')

# annotate value labels to each country
for index, value in enumerate(df_top15):
    label = format(int(value), ',') # format int with commas

    # place text at the end of bar (subtracting 47000 from x, and 0.1 from y to make it fit within the bar)
    plt.annotate(label, xy=(value - 47000, index - 0.10), color='white')

plt.show()

Waffle Charts

Preliminary step (prepare df_dsn)

Step 1. The first step in creating a waffle chart is determining the proportion of each category with
respect to the total.

# compute the proportion of each category with respect to the total

total_values = df_dsn['Total'].sum()

category_proportions = df_dsn['Total'] / total_values

# print out proportions

pd.DataFrame({"Category Proportion": category_proportions})

Step 2. The second step is defining the overall size of the waffle chart.

width = 40 # width of chart


height = 10 # height of chart

total_num_tiles = width * height # total number of tiles

print(f'Total number of tiles is {total_num_tiles}.')

Step 3. The third step is using the proportion of each category to determine its respective number of
tiles.

# compute the number of tiles for each category

tiles_per_category = (category_proportions * total_num_tiles).round().astype(int)

# print out number of tiles per category

pd.DataFrame({"Number of tiles": tiles_per_category})
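
Steps 1 to 3 can be sketched without pandas. The helper name allocate_tiles and the category totals below are hypothetical; only the 40 x 10 grid comes from the steps above. The second call shows a caveat of rounding each category independently: the allocated tiles do not always add up to the grid size.

```python
def allocate_tiles(values, total_tiles):
    """Round each category's share of the grid to a whole number of tiles."""
    total = sum(values)
    return [round(v * total_tiles / total) for v in values]


grid_tiles = 40 * 10  # width * height, as in Step 2

even_split = allocate_tiles([50, 30, 20], grid_tiles)   # hypothetical totals
print(even_split, sum(even_split))

uneven_split = allocate_tiles([1, 1, 1], grid_tiles)    # thirds do not divide 400 evenly
print(uneven_split, sum(uneven_split))
```

When the sum falls short of (or exceeds) the grid size, a few tiles end up unassigned or spill into a non-existent category, so it can be worth checking the sum before populating the chart.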

Step 4. The fourth step is creating a matrix that resembles the waffle chart and populating it.

# initialize the waffle chart as an empty matrix

waffle_chart = np.zeros((height, width), dtype = np.uint)

# define indices to loop through waffle chart

category_index = 0

tile_index = 0

# populate the waffle chart
for col in range(width):
    for row in range(height):
        tile_index += 1

        # if the number of tiles populated for the current category exceeds its allocated tiles...
        if tile_index > sum(tiles_per_category[0:category_index]):
            # ...proceed to the next category
            category_index += 1

        # set the class value to an integer, which increases with class
        waffle_chart[row, col] = category_index

print('Waffle chart populated!')
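
To make the fill order of Step 4 easier to see, here is a small sketch on a 2 x 3 grid using plain lists. It is an equivalent formulation rather than the loop above: the helper fill_waffle and the tile counts are made up, and any tile left over when the counts do not fill the grid simply stays 0.

```python
def fill_waffle(tiles_per_category, height, width):
    """Fill a height x width grid column by column with 1-based category ids."""
    # expand the counts into a flat sequence such as [1, 1, 1, 2, 2, 3]
    flat = [cat for cat, n in enumerate(tiles_per_category, start=1) for _ in range(n)]
    grid = [[0] * width for _ in range(height)]
    for k, cat in enumerate(flat[: height * width]):
        col, row = divmod(k, height)  # walk down a column before moving right
        grid[row][col] = cat
    return grid


small_grid = fill_waffle([3, 2, 1], height=2, width=3)
for grid_row in small_grid:
    print(grid_row)
```

Category 1 occupies the whole first column and the top of the second, which is the same column-first pattern the nested for loops produce.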

Step 5. Map the waffle chart matrix into a visual.


# instantiate a new figure object

fig = plt.figure()

# use matshow to display the waffle chart

colormap = plt.cm.coolwarm

plt.matshow(waffle_chart, cmap=colormap)

plt.colorbar()

plt.show()

Step 6. Prettify the chart.

# instantiate a new figure object

fig = plt.figure()

# use matshow to display the waffle chart

colormap = plt.cm.coolwarm

plt.matshow(waffle_chart, cmap=colormap)

plt.colorbar()

# get the axis

ax = plt.gca()

# set minor ticks

ax.set_xticks(np.arange(-.5, (width), 1), minor=True)

ax.set_yticks(np.arange(-.5, (height), 1), minor=True)

# add gridlines based on minor ticks

ax.grid(which='minor', color='w', linestyle='-', linewidth=2)

plt.xticks([])

plt.yticks([])

plt.show()
Step 7. Create a legend and add it to chart.

# instantiate a new figure object

fig = plt.figure()

# use matshow to display the waffle chart

colormap = plt.cm.coolwarm

plt.matshow(waffle_chart, cmap=colormap)

plt.colorbar()

# get the axis

ax = plt.gca()

# set minor ticks

ax.set_xticks(np.arange(-.5, (width), 1), minor=True)

ax.set_yticks(np.arange(-.5, (height), 1), minor=True)

# add gridlines based on minor ticks

ax.grid(which='minor', color='w', linestyle='-', linewidth=2)

plt.xticks([])

plt.yticks([])

# compute cumulative sum of individual categories to match color schemes between chart and legend
values_cumsum = np.cumsum(df_dsn['Total'])
total_values = values_cumsum.iloc[-1]

# create legend
legend_handles = []
for i, category in enumerate(df_dsn.index.values):
    label_str = category + ' (' + str(df_dsn['Total'].iloc[i]) + ')'
    color_val = colormap(float(values_cumsum.iloc[i]) / total_values)
    legend_handles.append(mpatches.Patch(color=color_val, label=label_str))

# add legend to chart
plt.legend(handles=legend_handles,
           loc='lower center',
           ncol=len(df_dsn.index.values),
           bbox_to_anchor=(0., -0.2, 0.95, .1)
           )

plt.show()

Waffle Charts – Function

def create_waffle_chart(categories, values, height, width, colormap, value_sign=''):
    values = list(values)  # plain list, so values[i] below is positional

    # compute the proportion of each category with respect to the total
    total_values = sum(values)
    category_proportions = [(float(value) / total_values) for value in values]

    # compute the total number of tiles
    total_num_tiles = width * height
    print('Total number of tiles is', total_num_tiles)

    # compute the number of tiles for each category
    tiles_per_category = [round(proportion * total_num_tiles) for proportion in category_proportions]

    # print out number of tiles per category
    for i, tiles in enumerate(tiles_per_category):
        print(categories[i] + ': ' + str(tiles))

    # initialize the waffle chart as an empty matrix
    waffle_chart = np.zeros((height, width))

    # define indices to loop through waffle chart
    category_index = 0
    tile_index = 0

    # populate the waffle chart
    for col in range(width):
        for row in range(height):
            tile_index += 1

            # if the number of tiles populated for the current category
            # exceeds its allocated tiles...
            if tile_index > sum(tiles_per_category[0:category_index]):
                # ...proceed to the next category
                category_index += 1

            # set the class value to an integer, which increases with class
            waffle_chart[row, col] = category_index

    # instantiate a new figure object
    fig = plt.figure()

    # use matshow to display the waffle chart with the colormap passed in
    plt.matshow(waffle_chart, cmap=colormap)
    plt.colorbar()

    # get the axis
    ax = plt.gca()

    # set minor ticks
    ax.set_xticks(np.arange(-.5, (width), 1), minor=True)
    ax.set_yticks(np.arange(-.5, (height), 1), minor=True)

    # add gridlines based on minor ticks
    ax.grid(which='minor', color='w', linestyle='-', linewidth=2)

    plt.xticks([])
    plt.yticks([])

    # compute cumulative sum of individual categories to match color schemes between chart and legend
    values_cumsum = np.cumsum(values)
    total_values = values_cumsum[-1]

    # create legend
    legend_handles = []
    for i, category in enumerate(categories):
        if value_sign == '%':
            label_str = category + ' (' + str(values[i]) + value_sign + ')'
        else:
            label_str = category + ' (' + value_sign + str(values[i]) + ')'

        color_val = colormap(float(values_cumsum[i]) / total_values)
        legend_handles.append(mpatches.Patch(color=color_val, label=label_str))

    # add legend to chart
    plt.legend(
        handles=legend_handles,
        loc='lower center',
        ncol=len(categories),
        bbox_to_anchor=(0., -0.2, 0.95, .1)
    )
    plt.show()

width = 40 # width of chart

height = 10 # height of chart

categories = df_dsn.index.values # categories

values = df_dsn['Total'] # corresponding values of categories

colormap = plt.cm.coolwarm # color map class

create_waffle_chart(categories, values, height, width, colormap)

Regression Plots
# install seaborn

# !pip3 install seaborn

# import library

import seaborn as sns

print('Seaborn installed and imported!')

Create a new dataframe that stores the total number of landed immigrants to Canada per year
from 1980 to 2013.

# we can use the sum() method to get the total population per year

df_tot = pd.DataFrame(df_can[years].sum(axis=0))

# change the years to type float (useful for regression later on)

df_tot.index = df_tot.index.map(float)

# reset the index to put it back in as a column in the df_tot dataframe

df_tot.reset_index(inplace=True)

# rename columns

df_tot.columns = ['year', 'total']

# view the final dataframe

df_tot.head()

sns.regplot(x='year', y='total', data=df_tot)

plt.figure(figsize=(15, 10))

ax = sns.regplot(x='year', y='total', data=df_tot, color='green', marker='+', scatter_kws={'s': 200})

ax.set(xlabel='Year', ylabel='Total Immigration') # add x- and y-labels

ax.set_title('Total Immigration to Canada from 1980 - 2013') # add title

plt.show()
plt.figure(figsize=(15, 10))

sns.set(font_scale=1.5)

sns.set_style('ticks') # change background to white background

ax = sns.regplot(x='year', y='total', data=df_tot, color='green', marker='+', scatter_kws={'s': 200})

ax.set(xlabel='Year', ylabel='Total Immigration')

ax.set_title('Total Immigration to Canada from 1980 - 2013')

plt.show()

plt.figure(figsize=(15, 10))

sns.set(font_scale=1.5)

sns.set_style('whitegrid')

ax = sns.regplot(x='year', y='total', data=df_tot, color='green', marker='+', scatter_kws={'s': 200})

ax.set(xlabel='Year', ylabel='Total Immigration')

ax.set_title('Total Immigration to Canada from 1980 - 2013')


plt.show()
Scatter Plot – Plotly.express

# Import required libraries

import pandas as pd

import plotly.express as px

import plotly.graph_objects as go


# First we create a figure using go.Figure and adding trace to it through go.scatter

fig = go.Figure(data=go.Scatter(x=data['Distance'], y=data['DepTime'], mode='markers',


marker=dict(color='red')))

# Updating layout through `update_layout`. Here we are adding a title to the plot and providing titles
# for the x and y axes.

fig.update_layout(title='Distance vs Departure Time', xaxis_title='Distance', yaxis_title='DepTime')

# Display the figure

fig.show()

Line Plot - plotly.graph_objects

# Group the data by Month and compute average over arrival delay time.

line_data = data.groupby('Month')['ArrDelay'].mean().reset_index()
# Create line plot here

fig = go.Figure(data=go.Scatter(x=line_data['Month'], y=line_data['ArrDelay'], mode='lines',


marker=dict(color='green')))

fig.update_layout(title='Month vs Average Flight Delay Time', xaxis_title='Month',


yaxis_title='ArrDelay')

fig.show()

Bar Chart

# Use the plotly express bar chart function px.bar. Provide input data, x and y axis variables, and the title of the chart.

# This will give the total number of flights to the destination state.

fig = px.bar(bar_data, x="DestState", y="Flights", title='Total number of flights to the destination state split by reporting airline')
fig.show()

Bubble Chart

# Create bubble chart here

fig = px.scatter(bub_data, x="Reporting_Airline", y="Flights", size="Flights",

hover_name="Reporting_Airline", title='Reporting Airline vs Number of Flights',


size_max=60)

fig.show()

Histogram
# Create histogram here

fig = px.histogram(data, x="ArrDelay")

fig.show()
Pie Chart
# Use px.pie function to create the chart. Input dataset.

# Values parameter will set values associated to the sector. 'Month' feature is passed to it.

# labels for the sector are passed to the `names` parameter.

fig = px.pie(data, values='Month', names='DistanceGroup', title='Distance group proportion by month')

fig.show()

Sunburst Charts
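# Create sunburst chart here
# (the column names are an assumption; adjust path and values to the dataset at hand)

fig = px.sunburst(data, path=['Month', 'DestState'], values='Flights')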

fig.show()

Maps

Basic Map
#!pip3 install folium==0.5.0

import folium

# define Mexico's geolocation coordinates
mexico_latitude = 23.6345
mexico_longitude = -102.5528

# define the map centered around Mexico with a higher zoom level
mexico_map = folium.Map(location=[mexico_latitude, mexico_longitude], zoom_start=4)

# display the map
mexico_map

A. Stamen Toner Maps - Black and White

# create a Stamen Toner map of the world centered around Canada

world_map = folium.Map(location=[56.130, -106.35], zoom_start=4, tiles='Stamen Toner')

# display map

world_map

B. Stamen Terrain Maps

# create a Stamen Terrain map of the world centered around Canada

world_map = folium.Map(location=[56.130, -106.35], zoom_start=4, tiles='Stamen Terrain')

# display map

world_map

C. Maps with Markers

Before the map

# Download the dataset and read it into a pandas dataframe:

from js import fetch

import io

URL = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/Police_Department_Incidents_-_Previous_Year__2016_.csv'

resp = await fetch(URL)

text = io.BytesIO((await resp.arrayBuffer()).to_py())

df_incidents = pd.read_csv(text)

print('Dataset downloaded and read into a pandas dataframe!')

# keep only the first 100 incidents, as the comments below assume
df_incidents = df_incidents.iloc[0:100]

# San Francisco latitude and longitude values
latitude = 37.77
longitude = -122.42

# create map and display it
sanfran_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# display the map of San Francisco
sanfran_map

# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add incidents to map
sanfran_map.add_child(incidents)

# instantiate a feature group for the incidents in the dataframe
incidents = folium.map.FeatureGroup()

# loop through the 100 crimes and add each to the incidents feature group
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )

# add pop-up text to each marker on the map
latitudes = list(df_incidents.Y)
longitudes = list(df_incidents.X)
labels = list(df_incidents.Category)

for lat, lng, label in zip(latitudes, longitudes, labels):
    folium.Marker([lat, lng], popup=label).add_to(sanfran_map)

# add incidents to map
sanfran_map.add_child(incidents)

# Group the markers into different clusters. Each cluster is then represented by the number of
# crimes in each neighborhood. These clusters can be thought of as pockets of San Francisco
# which you can then analyze separately.

from folium import plugins

# let's start again with a clean copy of the map of San Francisco

sanfran_map = folium.Map(location = [latitude, longitude], zoom_start = 12)

# instantiate a mark cluster object for the incidents in the dataframe

incidents = plugins.MarkerCluster().add_to(sanfran_map)

# loop through the dataframe and add each data point to the mark cluster
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)

# display map

sanfran_map

D. Choropleth Maps

In order to create a Choropleth map, we need a GeoJSON file that defines the areas/boundaries
of the state, county, or country that we are interested in. In our case, since we are endeavoring
to create a world map, we want a GeoJSON that defines the boundaries of all world countries.
For your convenience, we will be providing you with this file, so let's go ahead and load it.
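
For orientation, here is a minimal sketch of the structure such a GeoJSON file has and of how a dotted path like 'feature.properties.name' (the kind of value passed as key_on for a choropleth) points at a field inside each feature. The two "countries" and the helper resolve_key are invented for illustration; this is not how folium is implemented internally.

```python
# a tiny FeatureCollection with two made-up square "countries"
sample_geo = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Atlantis"},
            "geometry": {"type": "Polygon",
                         "coordinates": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]]},
        },
        {
            "type": "Feature",
            "properties": {"name": "Lemuria"},
            "geometry": {"type": "Polygon",
                         "coordinates": [[[2, 0], [2, 1], [3, 1], [3, 0], [2, 0]]]},
        },
    ],
}


def resolve_key(feature, key_on):
    """Follow a dotted path such as 'feature.properties.name' inside one feature."""
    node = feature
    for part in key_on.split(".")[1:]:  # skip the leading 'feature'
        node = node[part]
    return node


region_names = [resolve_key(f, "feature.properties.name") for f in sample_geo["features"]]
print(region_names)
```

The names found this way are what the country column of the dataframe has to match for a region to be colored.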

# download countries geojson file

from js import fetch

import io

import json

URL = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/world_countries.json'

resp = await fetch(URL)

data = io.BytesIO((await resp.arrayBuffer()).to_py())

world_geo = json.load(data)

print('GeoJSON file loaded!')

# create a plain world map

world_map = folium.Map(location=[0, 0], zoom_start=2)

# generate choropleth map using the total immigration of each country to Canada from 1980 to 2013
world_map.choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Immigration to Canada'
)

# display map
world_map

# Notice how the legend is displaying a negative boundary or threshold. Let's fix that by
# defining our own thresholds and starting with 0 instead of -6,918!

# create a numpy array of length 6 with linear spacing from the minimum total immigration
# to the maximum total immigration
threshold_scale = np.linspace(df_can['Total'].min(),
                              df_can['Total'].max(),
                              6, dtype=int)

threshold_scale = threshold_scale.tolist() # change the numpy array to a list

# make sure that the last value of the list is greater than the maximum immigration
threshold_scale[-1] = threshold_scale[-1] + 1

# pass our own threshold_scale instead of letting Folium determine the scale
world_map = folium.Map(location=[0, 0], zoom_start=2)

world_map.choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    threshold_scale=threshold_scale,
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Immigration to Canada',
    reset=True
)

world_map
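
The threshold computation can be checked by hand with a small sketch in plain Python. It is meant to mirror np.linspace(minimum, maximum, 6, dtype=int) followed by the +1 adjustment, for non-negative inputs; the helper name linear_thresholds and the 0 to 1000 range are assumptions chosen so the result is easy to verify.

```python
def linear_thresholds(lo, hi, steps=6):
    """Evenly spaced integer thresholds from lo to hi, with the last one nudged above hi."""
    scale = [int(lo + i * (hi - lo) / (steps - 1)) for i in range(steps)]
    scale[-1] += 1  # the top threshold must be strictly greater than the maximum
    return scale


sample_thresholds = linear_thresholds(0, 1000)  # hypothetical minimum and maximum
print(sample_thresholds)
```

Six thresholds give five color bins, and nudging the last value keeps the country with the maximum total inside the top bin.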
