Visualizing Data
May 24, 2021
1 Visualizing Data
A fundamental part of the data scientist’s toolkit is data visualization. Although it is very easy to
create visualizations, it’s much harder to produce good ones.
1.1 matplotlib
A wide variety of tools exists for visualizing data. We will use the widely used matplotlib library.
Matplotlib is a Python 2D plotting library that produces publication-quality figures in a variety of
hardcopy formats and interactive environments across platforms.
Checking Matplotlib Version
[ ]: import matplotlib
print(matplotlib.__version__)
3.4.2
1.1.1 Line Graph
Plotting a Simple Line Graph
[ ]: # Plotting a Simple Line Graph
import matplotlib.pyplot as plt
squares = [1, 4, 9, 16, 25]
plt.plot(squares)
plt.show()
Changing the Label Type and Graph Thickness
[ ]: # Changing the Label Type and Graph Thickness
import matplotlib.pyplot as plt
squares = [1, 4, 9, 16, 25]
plt.plot(squares, linewidth=3)
# Set chart title and label axes.
plt.title("Square Numbers", fontsize=16)
# loc = 'left'
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
# Set size of tick labels.
plt.tick_params(axis='both', labelsize=14)
plt.show()
Correcting the Plot
[ ]: # Correcting the Plot
# Given only y-values, plot() numbers the points from x = 0, so pass the inputs explicitly.
import matplotlib.pyplot as plt
input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
plt.plot(input_values, squares, linewidth=3)
# Set chart title and label axes.
plt.title("Square Numbers", fontsize=16)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
# Set size of tick labels.
plt.tick_params(labelsize=16)
plt.show()
Markers
[ ]: # Markers
import matplotlib.pyplot as plt
input_values = [1, 2, 3, 4, 5]
squares = [1, 4, 9, 16, 25]
plt.plot(input_values, squares, marker='8')
# Other markers to try: '*', 'v', '^', 'd', 'D'
# Set chart title and label axes.
plt.title("Square Numbers", fontsize=16)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
# Set size of tick labels.
plt.tick_params(axis='both', labelsize=12)
plt.show()
Linestyle
You can use the keyword argument linestyle, or the shorter ls, to change the style of the plotted line:
[ ]: # Linestyle
import matplotlib.pyplot as plt
squares = [1, 4, 9, 16, 25]
plt.plot(squares, marker='o', ls='-.')
# Try ls='dashed' as well.
# Set chart title and label axes.
plt.title("Square Numbers", fontsize=16)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
# Set size of tick labels.
plt.tick_params(axis='both', labelsize=12)
plt.show()
• linestyle can be shortened to ls
• dotted can be written as ':'
• dashed can be written as '--'
Style               Or
'solid' (default)   '-'
'dotted'            ':'
'dashed'            '--'
'dashdot'           '-.'
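To compare the four styles side by side, the following is a minimal sketch that draws one line per style and labels it with both spellings. The names LINE_STYLES and plot_line_styles, and the vertical offset of 5 between lines, are illustrative choices and not part of the original notebook.

```python
import matplotlib.pyplot as plt

# Long names and their shorthand equivalents, as listed in the table above.
LINE_STYLES = {"solid": "-", "dotted": ":", "dashed": "--", "dashdot": "-."}


def plot_line_styles():
    """Draw one line per style, each shifted upward so the lines do not overlap."""
    fig, ax = plt.subplots()
    squares = [1, 4, 9, 16, 25]
    for offset, (name, short) in enumerate(LINE_STYLES.items()):
        shifted = [value + 5 * offset for value in squares]
        ax.plot(shifted, ls=short, label=f"{name} ('{short}')")
    ax.legend()
    return fig, ax


fig, ax = plot_line_styles()
```

In a notebook the figure is rendered when the cell finishes; in a script, call plt.show() afterwards.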
Line Color
You can use the keyword argument color or the shorter c to set the color of the line:
[ ]: # Reuses plt and squares from the cell above.
plt.plot(squares, c='green')
# You can use c instead of color.
plt.show()
Multiple Lines
You can plot as many lines as you like by adding more plt.plot() calls:
[ ]: import matplotlib.pyplot as plt
y1 = [3, 8, 1, 10]
y2 = [6, 2, 7, 11]
y3 = [5, 1, 12, 13]
plt.plot(y1, c='r')
plt.plot(y2, c='b')
plt.plot(y3)
plt.show()
Grid Lines
With Pyplot, you can use the grid() function to add grid lines to the plot.
[ ]: import matplotlib.pyplot as plt
subjects = ['Maths', 'Phy', 'Chem', 'CS']
marks = [83, 75, 89, 76]
plt.plot(subjects, marks, linewidth=3)
# Set chart title and label axes.
plt.title("Marks Details:", fontsize=16)
plt.xlabel("Subjects", fontsize=14)
plt.ylabel("Marks", fontsize=14)
# Set size of tick labels.
plt.tick_params(axis='both', labelsize=12)
plt.grid()
plt.show()
TRY IT YOURSELF
Change the attributes of the grid, such as its color, line style, and line width.
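As one possible starting point for this exercise, the sketch below restricts the grid to horizontal lines and restyles them. The particular color, line style, and width are arbitrary choices for illustration, and the object-oriented interface (fig, ax) is used here in place of the plt functions from the cell above.

```python
import matplotlib.pyplot as plt

subjects = ["Maths", "Phy", "Chem", "CS"]
marks = [83, 75, 89, 76]

fig, ax = plt.subplots()
ax.plot(subjects, marks, linewidth=3)
# Draw only the horizontal grid lines (those attached to the y-axis),
# dashed, gray, and thin.
ax.grid(axis="y", color="gray", linestyle="--", linewidth=0.5)
ax.set_title("Marks Details")
```

Passing axis="x" or axis="both" selects the vertical lines or both directions; in a script, finish with plt.show().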
Matplotlib Subplots
Display Multiple Plots
With the subplot() function you can draw multiple plots in one figure:
[ ]: import matplotlib.pyplot as plt
# plot 1: row 1 of 1, column 2 of 3
x = [0, 1, 2, 3]
y = [3, 8, 1, 10]
plt.subplot(1, 3, 2)
plt.title("First")
plt.plot(x, y)
# plot 2: column 3 of 3
x = [0, 1, 2, 3]
y = [10, 20, 30, 40]
plt.subplot(1, 3, 3)
plt.title("Second")
plt.plot(x, y)
# plot 3: column 1 of 3
x = [4, 3, 2, 1]
y = [16, 9, 4, 1]
plt.subplot(1, 3, 1)
plt.title("Third")
plt.plot(x, y)
# Title for the whole figure.
plt.suptitle("Three plots")
plt.show()
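The cell above places each plot by its index in a grid. An alternative, sketched below under the assumption that all panels share one row, is plt.subplots(), which creates the figure and every Axes object in a single call. The function name draw_three_panels and the figsize value are illustrative and not taken from the notebook.

```python
import matplotlib.pyplot as plt


def draw_three_panels():
    """Create a 1 x 3 grid of plots and return the figure and its axes."""
    # One row, three columns; axes is an array of three Axes objects.
    fig, axes = plt.subplots(1, 3, figsize=(9, 3))
    panels = [
        ("Third", [4, 3, 2, 1], [16, 9, 4, 1]),
        ("First", [0, 1, 2, 3], [3, 8, 1, 10]),
        ("Second", [0, 1, 2, 3], [10, 20, 30, 40]),
    ]
    for ax, (title, x, y) in zip(axes, panels):
        ax.plot(x, y)
        ax.set_title(title)
    fig.suptitle("Three plots")
    fig.tight_layout()  # keep titles and tick labels from overlapping
    return fig, axes


fig, axes = draw_three_panels()
```

Because each panel has its own Axes object, later customization (labels, grids, limits) can target a panel directly instead of relying on which subplot is currently active.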
TRY IT YOURSELF
Draw two or more graphs side by side as subplots.
1.1.2 Scatter Plot
[ ]: import matplotlib.pyplot as plt
plt.scatter(2, 4)
plt.show()
[ ]: import matplotlib.pyplot as plt
plt.scatter(2, 4, s=200, marker='1')
# Set chart title and label axes.
plt.title("Square Numbers", fontsize=24)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
# Set size of tick labels.
plt.tick_params(axis='both', labelsize=14)
plt.show()
[ ]: import matplotlib.pyplot as plt
x_values = [1, 2, 3, 4, 5]
y_values = [1, 4, 9, 16, 25]
plt.scatter(x_values, y_values)
# Set chart title and label axes.
plt.title("Square Numbers", fontsize=16)
plt.xlabel("Value", fontsize=14)
plt.ylabel("Square of Value", fontsize=14)
# Set size of tick labels.
plt.tick_params(axis='both', labelsize=14)
plt.show()
Calculating Data Automatically
[ ]: import matplotlib.pyplot as plt
x_values = list(range(1, 801, 13))
y_values = [x**2 for x in x_values]
plt.scatter(x_values, y_values, s=20)
# Set the range for each axis: [xmin, xmax, ymin, ymax].
plt.axis([0, 900, 0, 900000])
plt.show()
[ ]: # Compare Plots
import matplotlib.pyplot as plt
x1 = list(range(1, 30, 2))
y1 = [x*2 for x in x1]
x2 = list(range(0, 30, 2))
y2 = [x*3 for x in x2]
plt.scatter(x1, y1, s=20)
plt.scatter(x2, y2, s=20)
plt.show()
[ ]: import matplotlib.pyplot as plt
# Day one: the age and speed of 13 cars.
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
plt.scatter(x, y, color='r')
# Day two: the age and speed of 15 cars.
x = [2,2,8,1,15,8,12,9,7,3,11,4,7,14,12]
y = [100,105,84,105,90,99,90,95,94,100,79,112,91,80,85]
plt.scatter(x, y, color='g')
plt.show()
Using a Colormap
[ ]: import matplotlib.pyplot as plt
x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]
colors = [0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100]
plt.scatter(x, y, c=colors, cmap='Greens')
plt.show()
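A colormap is easier to read when the figure also shows which shade corresponds to which number. The sketch below repeats the scatter plot above and adds a colorbar; the label text "color value" is a placeholder, since the notebook does not say what the colors list represents.

```python
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
colors = [0, 10, 20, 30, 40, 45, 50, 55, 60, 70, 80, 90, 100]

fig, ax = plt.subplots()
# scatter() returns the collection of points; the colorbar reads its
# colormap and value range from that object.
points = ax.scatter(x, y, c=colors, cmap="Greens")
fig.colorbar(points, ax=ax, label="color value")
```

Each number in colors is mapped onto the 'Greens' scale, so larger values appear as darker green.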
TRY IT YOURSELF
Cubes: A number raised to the third power is a cube. Plot the first five cubic numbers, and then
plot the first 5000 cubic numbers, using a scatter plot with a colormap.
1.1.3 Bars
With Pyplot, you can use the bar() function to draw bar graphs:
[ ]: import matplotlib.pyplot as plt
x = ["MATHS", "SCIENCE", "ENGLISH", "HINDI"]
y = [49, 35, 41, 31]
plt.bar(x, y, color='g')
# You can use plt.yticks to set the tick positions on the y-axis:
# plt.yticks(range(0, 101, 10))
plt.show()
[ ]: plt.bar(x, y)
plt.show()
[ ]: plt.bar(x, y, width=0.5)
plt.show()
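Bars can also be drawn horizontally with barh(). The following is a minimal sketch that reuses the subjects and marks from the cells above; the variable names subjects and marks and the axis label are illustrative.

```python
import matplotlib.pyplot as plt

subjects = ["MATHS", "SCIENCE", "ENGLISH", "HINDI"]
marks = [49, 35, 41, 31]

fig, ax = plt.subplots()
# barh() takes the categories first and the bar lengths second;
# height plays the role that width plays in bar().
bars = ax.barh(subjects, marks, height=0.5, color="g")
ax.set_xlabel("Marks")
```

Horizontal bars tend to suit long category names, because the labels are written left to right along the y-axis.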
1.1.4 Legends
Plot legends give meaning to a visualization.
[50]: import matplotlib.pyplot as plt
x = list(range(0, 11))
y = [2 * v for v in x]
plt.figure(dpi=150)
plt.plot(x, y, c='b', marker='^', label='$2x$')
y2 = [v**2 for v in x]
# The first segment has no label, so it gets no legend entry.
plt.plot(x[0:3], y2[0:3], c='r')
plt.plot(x[2:], y2[2:], c='r', ls='--', marker='o', label='$x^2$')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')
plt.xticks(range(0, 11))
plt.yticks(range(0, 101, 10))
plt.grid()
plt.title("Legend Example")
plt.legend(fancybox=True, framealpha=1.0, shadow=True, borderpad=1)
plt.show()
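Only plots that carry a label appear in the legend, and the legend's position and heading can be set explicitly. The sketch below illustrates both points; the constant line at 50 and the heading "Functions" are made up for the example.

```python
import matplotlib.pyplot as plt

x = list(range(0, 11))

fig, ax = plt.subplots()
ax.plot(x, [2 * v for v in x], label="$2x$")
ax.plot(x, [v**2 for v in x], label="$x^2$")
ax.plot(x, [50] * len(x), ls=":")  # no label, so no legend entry

# loc fixes the legend's corner; title adds a heading above the entries.
legend = ax.legend(loc="upper left", title="Functions")
```

Without loc, matplotlib picks the position it judges to overlap the data least.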
1.1.5 Pie Chart
[3]: import matplotlib.pyplot as plt
# plt.figure(dpi=100)
y = [35, 25, 25, 20, 15, 10, 4]
mylabels = ["A", "B", "C", "D", "E", "F", "G"]
# plt.pie(y, labels=mylabels, autopct='%.2f%%')
myexplode = [0.1, 0.1, 0.1, 0.1, 0.1, 0.3, 0.3]
plt.pie(y, labels=mylabels, explode=myexplode,
        shadow=True, autopct='%.2f%%', counterclock=False,
        radius=2, startangle=90)
plt.title("Fruits", loc="left")
# plt.legend(title='Fruits', loc='upper left')
plt.show()
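The values passed to pie() do not have to add up to 100: each wedge is sized by its share of the total, and autopct prints that share. The sketch below computes the same percentages by hand for comparison; the title text and the names total and percentages are illustrative.

```python
import matplotlib.pyplot as plt

y = [35, 25, 25, 20, 15, 10, 4]
labels = ["A", "B", "C", "D", "E", "F", "G"]

# The values sum to 134, so they are not percentages themselves.
total = sum(y)
percentages = [100 * value / total for value in y]

fig, ax = plt.subplots()
# autopct formats each wedge's share of the total with two decimals.
ax.pie(y, labels=labels, autopct="%.2f%%", startangle=90, counterclock=False)
ax.set_title("Share of each category")
```

For instance, wedge A covers 35 / 134 of the circle, which is about 26.12 percent.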