Module 8. Data Visualization With Python
Table of contents
Imports
Line Plots (Series/Dataframe)
Area Plots
Histograms
Colors available in Matplotlib
Bar Charts (Dataframe)
Bar Charts (Dataframe) – Horizontal
Waffle Charts
Waffle Charts – function
Regression Plots
Scatter Plot – Plotly.express
Line Plot
Bar Chart
Bubble Chart
Histogram
Pie Chart
Sunburst Charts
Maps
Basic Map
A. Stamen Toner Maps - Black and White
B. Stamen Terrain Maps
C. Maps with Markers
D. Choropleth Maps
Scatter Plot
Line Plot
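Imports
%matplotlib inline
import matplotlib.pyplot as plt
print(plt.style.available)  # list the Matplotlib styles that can be applied
Line Plots (Series/Dataframe)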
haiti = df_can.loc['Haiti', years] # passing in years 1980 - 2013 to exclude the 'total' column
haiti.head()
haiti.plot()
haiti.index = haiti.index.map(int) # let's change the index values of Haiti to type integer for plotting
haiti.plot(kind='line')
plt.title('Immigration from Haiti')
plt.ylabel('Number of immigrants')
plt.xlabel('Years')
plt.show() # need this line to show the updates made to the figure
haiti.plot(kind='line')
plt.ylabel('Number of Immigrants')
plt.xlabel('Years')
plt.show()
Area Plots
df_top5.index = df_top5.index.map(int)
df_top5.plot(kind='area',
             stacked=False)  # unstacked: the areas overlap instead of accumulating
plt.ylabel('Number of Immigrants')
plt.xlabel('Years')
plt.show()
**Option 2: Artist layer (object-oriented method) - using an Axes instance from Matplotlib (preferred)**
ax.set_ylabel('Number of Immigrants')
ax.set_xlabel('Years')
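The Artist-layer option can be sketched as follows. This is only a minimal illustration: the small `df_top5` frame holds made-up values (not the immigration dataset), the country labels are placeholders, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for df_top5: years as the index, one column per country
df_top5 = pd.DataFrame(
    {"Country A": [100, 120, 150], "Country B": [80, 90, 70]},
    index=[1980, 1981, 1982],
)

# DataFrame.plot returns the Matplotlib Axes it drew on
ax = df_top5.plot(kind="area", stacked=False, alpha=0.35, figsize=(10, 6))

# Artist layer: configure the plot through the Axes instance instead of plt.*
ax.set_title("Immigration trend (illustrative data)")
ax.set_ylabel("Number of Immigrants")
ax.set_xlabel("Years")
```

Working through `ax` keeps every setting attached to one specific plot, which tends to matter once a figure holds several subplots.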
Histograms
# add y-label
plt.ylabel('Number of Countries')
# add x-label
plt.xlabel('Number of Immigrants')
plt.show()
Notice that the x-axis labels do not match the bin edges. This can be fixed by passing in
an xticks keyword that contains the list of bin edges, as follows:
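A minimal sketch of where such bin edges can come from, assuming they are computed with NumPy's `np.histogram`. The `values` array is illustrative only and is not the immigration data.

```python
import numpy as np

# Illustrative values only (not the immigration dataset)
values = np.array([1, 3, 4, 7, 8, 12, 15, 18, 22, 30])

# np.histogram splits the data range into 10 equal-width bins by default
count, bin_edges = np.histogram(values)

print(count)      # number of values that fall in each bin
print(bin_edges)  # the 11 edges that delimit the 10 bins

# These edges can then be passed as tick positions, for example:
# pd.Series(values).plot(kind='hist', xticks=bin_edges)
```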
plt.show()
# generate histogram
plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')
plt.ylabel('Number of Years')
plt.xlabel('Number of Immigrants')
plt.show()
Let's make a few modifications to improve the impact and aesthetics of the previous plot:
# un-stacked histogram
df_t.plot(kind='hist',
          figsize=(10, 6),
          bins=15,
          alpha=0.6,  # transparency so overlapping bars remain visible
          xticks=bin_edges)
plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')
plt.ylabel('Number of Years')
plt.xlabel('Number of Immigrants')
plt.show()
If we do not want the plots to overlap each other, we can stack them using
the stacked parameter. Let's also adjust the min and max x-axis labels to remove the extra gap
on the edges of the plot. We can pass a tuple (min, max) using the xlim parameter, as shown
below.
xmin = bin_edges[0] - 10   # first bin value is 31.0, adding buffer of 10 for aesthetic purposes
xmax = bin_edges[-1] + 10  # last bin value is 308.0, adding buffer of 10 for aesthetic purposes
# stacked Histogram
df_t.plot(kind='hist',
          figsize=(10, 6),
          bins=15,
          xticks=bin_edges,
          stacked=True,
          xlim=(xmin, xmax))
plt.title('Histogram of Immigration from Denmark, Norway, and Sweden from 1980 - 2013')
plt.ylabel('Number of Years')
plt.xlabel('Number of Immigrants')
plt.show()
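Colors available in Matplotlib
import matplotlib.colors
for name, hex in matplotlib.colors.cnames.items():  # named colors and their hex codes
    print(name, hex)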
Bar Charts (Dataframe)
df_iceland.head()
# step 2: plot data
plt.title('Icelandic immigrants to Canada from 1980 to 2013') # add title to the plot
plt.show()
plt.xlabel('Year')
plt.ylabel('Number of Immigrants')
# Annotate arrow (the arrowstyle value is an assumption)
plt.annotate('',
             xy=(32, 70),      # place head of the arrow at point (year 2012, pop 70)
             xytext=(28, 20),  # place base of the arrow at point (year 2008, pop 20)
             xycoords='data',  # will use the coordinate system of the object being annotated
             arrowprops=dict(arrowstyle='->'))
# Annotate Text
plt.annotate('...',            # annotation text (not given here)
             xy=(28, 30))      # start the text at point (year 2008, pop 30)
plt.show()
Bar Charts (Dataframe) – Horizontal
# generate plot
plt.xlabel('Number of Immigrants')
plt.show()
Waffle Charts
Preliminary step
Step 1. The first step in creating a waffle chart is determining the proportion of each category with
respect to the total.
total_values = df_dsn['Total'].sum()
Step 3. The third step is using the proportion of each category to determine its respective number of
tiles.
Step 4. The fourth step is creating a matrix that resembles the waffle chart and populating it.
category_index = 0
tile_index = 0
tile_index += 1
# if the number of tiles populated for the current category is equal to its corresponding allocated tiles...
category_index += 1
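The steps above (proportions, tiles per category, matrix population) can be sketched end to end as follows. This is a sketch under stated assumptions, not the original notebook code: the `totals` dictionary is a hypothetical stand-in for `df_dsn['Total']`, and the 40 x 10 chart size is assumed. If rounding made the tile counts add up to less than the chart size, the loop would need an extra guard.

```python
import numpy as np

# Hypothetical category totals (stand-in for df_dsn['Total'])
totals = {"Denmark": 300, "Norway": 200, "Sweden": 500}

width, height = 40, 10  # assumed chart dimensions
total_tiles = width * height

# Step 1: proportion of each category with respect to the total
total_values = sum(totals.values())
proportions = [value / total_values for value in totals.values()]

# Step 3: number of tiles allocated to each category
tiles_per_category = [round(p * total_tiles) for p in proportions]

# Step 4: populate the matrix column by column
waffle_chart = np.zeros((height, width), dtype=int)
category_index = 0
tile_index = 0
for col in range(width):
    for row in range(height):
        tile_index += 1
        # once the current category has used up its allocated tiles,
        # move on to the next category
        if tile_index > sum(tiles_per_category[:category_index + 1]):
            category_index += 1
        waffle_chart[row, col] = category_index

print(tiles_per_category)
```

The resulting integer matrix is what `plt.matshow` colors, one color per category index.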
fig = plt.figure()
colormap = plt.cm.coolwarm
plt.matshow(waffle_chart, cmap=colormap)
plt.colorbar()
plt.show()
fig = plt.figure()
colormap = plt.cm.coolwarm
plt.matshow(waffle_chart, cmap=colormap)
plt.colorbar()
ax = plt.gca()
plt.xticks([])
plt.yticks([])
plt.show()
Step 7. Create a legend and add it to the chart.
fig = plt.figure()
colormap = plt.cm.coolwarm
plt.matshow(waffle_chart, cmap=colormap)
plt.colorbar()
ax = plt.gca()
plt.xticks([])
plt.yticks([])
# compute cumulative sum of individual categories to match color schemes between chart and legend
values_cumsum = np.cumsum(df_dsn['Total'])
total_values = values_cumsum[len(values_cumsum) - 1]
# create legend
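legend_handles = []
for i, category in enumerate(df_dsn.index.values):
    label_str = category + ' (' + str(df_dsn['Total'].iloc[i]) + ')'
    color_val = colormap(float(values_cumsum[i])/total_values)
    legend_handles.append(mpatches.Patch(color=color_val, label=label_str))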
# add legend to chart
plt.legend(handles=legend_handles,
           loc='lower center',
           ncol=len(df_dsn.index.values))  # one legend column per category
plt.show()
plt.xticks([])
plt.yticks([])
# compute cumulative sum of individual categories to match color schemes between chart and legend
values_cumsum = np.cumsum(values)
total_values = values_cumsum[len(values_cumsum) - 1]
# create legend
legend_handles = []
for i, category in enumerate(categories):
    if value_sign == '%':
        label_str = category + ' (' + str(values[i]) + value_sign + ')'
    else:
        label_str = category + ' (' + value_sign + str(values[i]) + ')'
    color_val = colormap(float(values_cumsum[i])/total_values)
    legend_handles.append(mpatches.Patch(color=color_val, label=label_str))
Regression Plots
# install seaborn
# import library
import seaborn as sns
Create a new dataframe that stores the total number of landed immigrants to Canada per year
from 1980 to 2013.
# we can use the sum() method to get the total population per year
df_tot = pd.DataFrame(df_can[years].sum(axis=0))
# change the years to type float (useful for regression later on)
df_tot.reset_index(inplace=True)
# rename columns
df_tot.head()
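The construction of `df_tot` can be sketched on a tiny frame as follows. The miniature `df_can` below is hypothetical illustrative data, and the column names `year` and `total` are assumptions, since they are not given above.

```python
import pandas as pd

# Hypothetical miniature of df_can: one row per country, one column per year
years = ["1980", "1981", "1982"]
df_can = pd.DataFrame(
    {"1980": [10, 20], "1981": [30, 40], "1982": [50, 60]},
    index=["Country A", "Country B"],
)

# total per year, summed over all countries
df_tot = pd.DataFrame(df_can[years].sum(axis=0))

# change the years to type float (useful for regression later on)
df_tot.index = df_tot.index.map(float)

# move the years out of the index into a regular column
df_tot.reset_index(inplace=True)

# rename columns (the names 'year' and 'total' are assumptions)
df_tot.columns = ["year", "total"]

print(df_tot)
```

With the years stored as a numeric column, the frame can be handed to a regression plot with `year` on the x-axis and `total` on the y-axis.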
plt.figure(figsize=(15, 10))
plt.show()
plt.figure(figsize=(15, 10))
sns.set(font_scale=1.5)
plt.show()
plt.figure(figsize=(15, 10))
sns.set(font_scale=1.5)
sns.set_style('whitegrid')
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
https://labs.cognitiveclass.ai/v2/tools/jupyterlite
# First we create a figure using go.Figure and add a trace to it through go.Scatter
# Update the layout through `update_layout`. Here we add a title to the plot and titles to the x and y axes.
fig.show()
# Group the data by Month and compute average over arrival delay time.
line_data = data.groupby('Month')['ArrDelay'].mean().reset_index()
# Create line plot here
fig.show()
Bar Chart
# Use plotly express bar chart function px.bar. Provide input data, x and y axis variable, and title of the chart.
Bubble Chart
fig.show()
Histogram
# Create histogram here
fig.show()
Pie Chart
# Use px.pie function to create the chart. Input dataset.
# Values parameter will set values associated to the sector. 'Month' feature is passed to it.
fig.show()
Sunburst Charts
# Create sunburst chart here
fig.show()
Maps
Basic Map
#!pip3 install folium==0.5.0
import folium
mexico_latitude = 23.6345
mexico_longitude = -102.5528
# define the map centered around Mexico (pass zoom_start to set a higher zoom level)
mexico_map = folium.Map(location=[mexico_latitude, mexico_longitude])
mexico_map
A. Stamen Toner Maps - Black and White
# display map
world_map
B. Stamen Terrain Maps
# display map
world_map
C. Maps with Markers
Before the map
import pandas as pd
URL = ('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/'
       'IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/'
       'Police_Department_Incidents_-_Previous_Year__2016_.csv')
df_incidents = pd.read_csv(URL)  # pandas can read a CSV directly from a URL
sanfran_map
incidents = folium.map.FeatureGroup()
# loop through the 100 crimes and add each to the incidents feature group
# (assumes latitude is stored in column Y and longitude in column X)
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            radius=5,  # define how big you want the circle markers to be
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )
sanfran_map.add_child(incidents)
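incidents = folium.map.FeatureGroup()
# loop through the 100 crimes and add each to the incidents feature group
# (assumes latitude is stored in column Y and longitude in column X)
for lat, lng in zip(df_incidents.Y, df_incidents.X):
    incidents.add_child(
        folium.features.CircleMarker(
            [lat, lng],
            color='yellow',
            fill=True,
            fill_color='blue',
            fill_opacity=0.6
        )
    )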
longitudes = list(df_incidents.X)
labels = list(df_incidents.Category)
sanfran_map.add_child(incidents)
# Group the markers into different clusters. Each cluster is then represented by the number of
# crimes in each neighborhood. These clusters can be thought of as pockets of San Francisco
# which you can then analyze separately.
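# let's start again with a clean copy of the map of San Francisco
from folium import plugins
incidents = plugins.MarkerCluster().add_to(sanfran_map)
# loop through the dataframe and add each data point to the marker cluster
# (assumes latitude is stored in column Y and longitude in column X)
for lat, lng, label in zip(df_incidents.Y, df_incidents.X, df_incidents.Category):
    folium.Marker(
        location=[lat, lng],
        icon=None,
        popup=label,
    ).add_to(incidents)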
# display map
sanfran_map
D. Choropleth Maps
In order to create a Choropleth map, we need a GeoJSON file that defines the areas/boundaries
of the state, county, or country that we are interested in. In our case, since we are endeavoring
to create a world map, we want a GeoJSON that defines the boundaries of all world countries.
For your convenience, we will be providing you with this file, so let's go ahead and load it.
import json
from urllib.request import urlopen
URL = ('https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/'
       'IBMDeveloperSkillsNetwork-DV0101EN-SkillsNetwork/Data%20Files/world_countries.json')
# download the GeoJSON file and parse it into a Python dictionary
with urlopen(URL) as data:
    world_geo = json.load(data)
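To make the role of the GeoJSON file more concrete, here is a small sketch of its structure and of what a path such as `feature.properties.name` points at. The one-feature collection below is hand-written and hypothetical ("Exampleland" is not a real entry), whereas the real file contains one feature per country with its actual borders.

```python
import json

# A hand-written, minimal GeoJSON FeatureCollection (hypothetical content)
geojson_text = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Exampleland"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [0, 1], [1, 1], [1, 0], [0, 0]]]}}
  ]
}
"""
world_geo = json.loads(geojson_text)

# key_on='feature.properties.name' refers to this field of every feature;
# its values must match the entries of the dataframe's key column
names = [feature["properties"]["name"] for feature in world_geo["features"]]
print(names)
```

Countries whose dataframe name does not match any `name` property in the GeoJSON simply receive no data-driven color, so comparing the two lists is a useful first check.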
# generate choropleth map using the total immigration of each country to Canada from 1980 to 2013
world_map.choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Immigration to Canada'
)
# display map
world_map
# Notice how the legend is displaying a negative boundary or threshold. Let's fix that by
# defining our own thresholds and starting with 0 instead of -6,918!
# create a numpy array of length 6 that has linear spacing from the minimum total immigration
# to the maximum total immigration
threshold_scale = np.linspace(df_can['Total'].min(),
                              df_can['Total'].max(),
                              6, dtype=int)
# make sure that the last value of the list is greater than the maximum immigration
threshold_scale[-1] = threshold_scale[-1] + 1
world_map.choropleth(
    geo_data=world_geo,
    data=df_can,
    columns=['Country', 'Total'],
    key_on='feature.properties.name',
    threshold_scale=threshold_scale,
    fill_color='YlOrRd',
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name='Immigration to Canada',
    reset=True
)
world_map