
Data Visualization Plot Types Guide

The document provides a comprehensive reference table for various data visualization plots, detailing their types, appropriate use cases, and required data types. It includes descriptions of single variable plots like dot plots and histograms, as well as two-variable and multivariate plots such as scatter plots and parallel coordinate plots. Each plot type is accompanied by its pros, cons, and examples of usage to aid in selecting the appropriate visualization method for different data scenarios.

Uploaded by iron pump
Copyright © All Rights Reserved

Data Visualization Plots - Reference Table

Plot Type | When to Use | Datatype Needed
Dot Plot | Display individual data points for small datasets; show distribution and exact values | Numerical (continuous or discrete)
Jitter Plot | Show distribution of data points with added random noise to avoid overplotting | Numerical (continuous or discrete)
Error Bar Plot | Display mean/median values with uncertainty or variability (standard deviation, confidence intervals) | Numerical (continuous) + error values
Box Plot | Show distribution summary (median, quartiles, outliers); compare distributions across categories | Numerical (continuous)
Histogram | Show frequency distribution of continuous data; understand data shape and spread | Numerical (continuous)
Pie Plot | Show proportions of categorical data as parts of a whole | Categorical + numerical counts/percentages
Scatter Plot | Show relationship between two continuous variables; identify correlations and patterns | Two numerical (continuous) variables
Bar Plot | Compare values across categories; show means or counts for different groups | Categorical (x-axis) + Numerical (y-axis)
Log Log Plot | Visualize relationships spanning multiple orders of magnitude; identify power-law relationships | Two numerical (positive values only)
Line Plot | Show trends over time or ordered categories; display continuous relationships | Numerical (both axes), often time-series
Parallel Coordinate Plot | Compare multiple numerical variables across observations; identify patterns in high-dimensional data | Multiple numerical variables + optional categorical
Pair Plot | Visualize pairwise relationships between all numerical variables; show distributions and correlations | Multiple numerical variables + optional categorical
Stacked Plot | Show cumulative totals over time or categories; display part-to-whole relationships | Time/categorical (x-axis) + multiple numerical series
Heatmap | Display matrix data with color intensity; show correlations or values across two categorical dimensions | Matrix of numerical values or 2D categorical grid
Violin Plot | Combine box plot with kernel density plot; show full distribution shape across categories | Categorical (x-axis) + Numerical (y-axis)

Single Variable Plots [ UNIVARIATE PLOTS ]

1. Dot Plot
2. Jitter Plot
3. Error Bar Plot
4. Box and Whisker Plot
5. Histogram

Dot Plot =
➔​ A Dot Plot is a simple chart that displays individual data points along a single axis
(usually the x-axis).
➔​ Each dot represents one observation.
➔​ If multiple observations have the same value, the dots are stacked vertically.
➔​ It’s one of the simplest and most direct ways to visualize small datasets, showing
both frequency and distribution clearly.

➔​ When To Use :
◆​ You have a small to medium-sized dataset (e.g., < 100 data points).
◆​ You want to see each individual value and how often it occurs.
◆​ You need to show the distribution of a single variable (univariate data).
◆​ You want to easily identify clusters, gaps, and outliers.

➔​ Pros
-​ Easy to read and interpret — shows exact values.
-​ Reveals clusters, gaps, and outliers clearly.
-​ Simple and quick to make for small datasets.

➔​ Cons
-​ Becomes cluttered with large datasets.
-​ Hard to use for continuous or wide-range data.
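A minimal Matplotlib sketch of a stacked dot plot, assuming a small hypothetical list of quiz scores: each repeat of a value is drawn one level higher than the previous one, which produces the vertical stacks described above. The `scores` data is illustrative only.

```python
from collections import Counter

import matplotlib.pyplot as plt

# Hypothetical quiz scores (illustrative data, not from the source)
scores = [3, 5, 5, 6, 7, 7, 7, 8, 8, 10]

# Build (x, y) positions: the k-th occurrence of a value is drawn at height k
counts = Counter(scores)
xs, ys = [], []
for value, count in sorted(counts.items()):
    for level in range(1, count + 1):
        xs.append(value)
        ys.append(level)

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(xs, ys, 'o', markersize=10)
ax.set_yticks(list(range(1, max(counts.values()) + 1)))
ax.set_ylim(0, max(counts.values()) + 1)
ax.set_xlabel("Score")
ax.set_ylabel("Count")
ax.set_title("Dot Plot of Quiz Scores")
plt.show()
```

Each dot still corresponds to exactly one observation, so the exact values stay readable; this is why the technique stops scaling once the stacks get tall.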

Jitter Plot =
-​ A Jitter Plot is a variation of the Dot Plot where random noise (jitter) is added to one
axis (usually the y-axis) to prevent overlapping of data points.
-​ It helps visualize individual data points when many values are identical, which would
otherwise stack up and overlap in a regular dot plot.

-​ In short: “A jitter plot = dot plot + tiny random displacement.”

- Use a jitter plot when:
- You have many identical or repeated data points that overlap.
- You want to see all individual observations instead of overlapping dots.
- You are comparing categorical groups with continuous values.

-​ Pros =
-​ Avoids overplotting by separating overlapping points
-​ Shows true density and spread of data
-​ Easy to see outliers even in overlapping data
-​ Good for categorical vs numeric comparisons

-​ Cons
- The random noise (jitter) slightly distorts exact values when it is added to the value axis; jittering only the category axis avoids this.
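A minimal jitter-plot sketch with NumPy and Matplotlib, assuming three hypothetical groups of integer ratings. Here the noise is added only to the category (x) axis, so the plotted ratings keep their exact values; this is one design choice for avoiding the distortion mentioned above, not the only way to jitter.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)  # fixed seed so the jitter is reproducible

# Hypothetical ratings (1-5) for three groups; many values repeat
groups = {
    "A": [3, 3, 4, 4, 4, 5, 5, 3, 4, 4],
    "B": [2, 2, 3, 3, 3, 2, 4, 3, 2, 3],
    "C": [4, 5, 5, 5, 4, 5, 5, 4, 5, 5],
}

fig, ax = plt.subplots()
for position, (name, values) in enumerate(groups.items()):
    values = np.asarray(values)
    # Jitter only the category axis (x); the measured values (y) stay exact
    x = position + rng.uniform(-0.15, 0.15, size=values.size)
    ax.scatter(x, values, alpha=0.6)

ax.set_xticks(np.arange(len(groups)))
ax.set_xticklabels(list(groups))
ax.set_xlabel("Group")
ax.set_ylabel("Rating")
ax.set_title("Jitter Plot")
plt.show()
```

Without the `rng.uniform(...)` term, all ten points of a group would collapse onto at most five positions.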

Error Bar Plot =
An error bar plot is a type of graph that displays data with an indication of the uncertainty or variability associated with each measurement or estimate.

Basic Structure
-​ Data points are plotted as symbols (dots, squares, etc.) showing the mean or central
value
-​ Error bars extend vertically and/or horizontally from each point, showing the range
of uncertainty
-​ The bars typically consist of a line with caps (small perpendicular lines) at each end

What Error Bars Represent

Error bars can show different types of variability:
- Standard Deviation (SD) - Shows the spread of individual data points around the mean
- Standard Error (SE or SEM) - Shows the precision of the mean estimate (SD divided by the square root of the sample size)
- Confidence Intervals (CI) - Shows a range of plausible values for the population mean (e.g., a 95% CI)
- Range - Shows the minimum and maximum values
- Interquartile Range (IQR) - Shows the middle 50% of the data
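The sketch below computes several of these quantities for one hypothetical sample using only Python's standard library. The 1.96 multiplier assumes a normal approximation, an illustrative simplification for a sample this small.

```python
import math
import statistics

# Hypothetical plant heights (cm) from one treatment group
heights = [18.5, 21.0, 19.5, 22.0, 20.5, 19.0, 21.5, 18.0]

n = len(heights)
mean = statistics.mean(heights)
sd = statistics.stdev(heights)     # sample standard deviation (divides by n - 1)
sem = sd / math.sqrt(n)            # standard error of the mean
# Approximate 95% CI with the normal critical value 1.96; for small n a
# t critical value (about 2.36 for n = 8) gives a wider, more accurate interval
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem
data_range = (min(heights), max(heights))

print(f"mean={mean:.2f}  SD={sd:.2f}  SEM={sem:.2f}")
print(f"95% CI ~ ({ci_low:.2f}, {ci_high:.2f})  range={data_range}")
```

Any of these (for example `sd`, `sem`, or the CI half-width `1.96 * sem`) could be passed as `yerr` to `plt.errorbar`, and they produce bars of very different lengths for the same data.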

How to Read Them

- The center point usually represents the mean (or another central value, such as the median)
- The length of the error bar indicates the amount of uncertainty
- Longer bars = more variability or less precision
- Shorter bars = less variability or more precision
- Overlapping error bars between groups suggest the difference may not be statistically significant, but this is only a rough guide: how to judge overlap depends on whether the bars show SD, SE, or CI

Common Uses
Scientific research - displaying experimental results with measurement uncertainty
Clinical trials - comparing treatment effects across groups
Business analytics - showing forecasts with confidence ranges
Quality control - monitoring process variation
Survey data - presenting poll results with margins of error

Example
Imagine testing three different fertilizers on plant growth:
-​ Fertilizer A: mean height = 20 cm ± 2 cm (error bar from 18-22 cm)
-​ Fertilizer B: mean height = 25 cm ± 1 cm (error bar from 24-26 cm)
-​ Fertilizer C: mean height = 22 cm ± 4 cm (error bar from 18-26 cm)
The plot would show Fertilizer B has the tallest plants with the most consistent results,
while Fertilizer C shows the most variability.
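The fertilizer example can be drawn with Matplotlib's `errorbar` roughly as follows. The means and ± values come from the example; because the example does not say whether they are SDs, SEs, or CIs, the sketch leaves that unspecified, whereas a real figure should label it.

```python
import matplotlib.pyplot as plt

# Values from the fertilizer example (mean height and ± error, in cm)
fertilizers = ["A", "B", "C"]
means = [20, 25, 22]
errors = [2, 1, 4]  # the example does not state whether these are SD, SE, or CI

positions = list(range(len(fertilizers)))
fig, ax = plt.subplots()
ax.errorbar(positions, means, yerr=errors, fmt='o', capsize=6, markersize=8)
ax.set_xticks(positions)
ax.set_xticklabels(fertilizers)
ax.set_xlabel("Fertilizer")
ax.set_ylabel("Mean plant height (cm)")
ax.set_title("Plant Height by Fertilizer")
plt.show()

# Lower and upper ends of each bar, matching the ranges quoted in the example
bar_ends = {f: (m - e, m + e) for f, m, e in zip(fertilizers, means, errors)}
print(bar_ends)  # {'A': (18, 22), 'B': (24, 26), 'C': (18, 26)}
```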

Important Considerations
-​ Always label what the error bars represent (SD, SE, CI, etc.) - different measures
have different interpretations
-​ Error bars are crucial for understanding whether observed differences are
meaningful or just due to random variation
- Error bars describe precision (reproducibility), not accuracy (correctness): a measurement can have small error bars and still be systematically biased
Box Plot
A box plot (also called a box-and-whisker plot) is a graphical representation of the
distribution of a dataset that shows its central tendency, spread, and outliers in a simple
way. It is commonly used in statistics to visualize the summary of numerical data.

Components of a Box Plot
1.​ Minimum (Lower Whisker): The smallest data point excluding outliers.
2.​ First Quartile (Q1, Lower Edge of Box): The 25th percentile. 25% of data lies below
this value.
3.​ Median (Q2, Middle Line of Box): The 50th percentile. Divides the data into two
halves.
4.​ Third Quartile (Q3, Upper Edge of Box): The 75th percentile. 75% of data lies below
this value.
5.​ Maximum (Upper Whisker): The largest data point excluding outliers.
6.​ Interquartile Range (IQR): IQR = Q3 - Q1. Measures the middle 50% spread.
7.​ Outliers: Points that lie below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
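The component definitions above translate directly into NumPy. This sketch uses a hypothetical dataset and NumPy's default (linear-interpolation) percentiles; other quartile conventions can give slightly different Q1 and Q3 values.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical dataset with one unusually large value
data = np.array([12, 15, 17, 18, 19, 20, 21, 22, 24, 25, 48])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = data[(data < lower_fence) | (data > upper_fence)]
inliers = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inliers.min(), inliers.max()  # whisker ends

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"fences=({lower_fence}, {upper_fence}), "
      f"whiskers=({whisker_low}, {whisker_high}), outliers={outliers}")

fig, ax = plt.subplots()
ax.boxplot(data)  # Matplotlib's default whis=1.5 applies the same 1.5*IQR rule
ax.set_ylabel("Value")
ax.set_title("Box Plot")
plt.show()
```

Here Q1 = 17.5, Q3 = 23, so the upper fence is 31.25 and the value 48 is drawn as an outlier while the upper whisker stops at 25.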

When to Use
-​ To see the spread of data.
-​ To identify skewness (if median is closer to Q1 or Q3).
-​ To detect outliers.
-​ To compare distributions across different groups.

Pros
-​ Simple visual summary of data.
-​ Highlights median, quartiles, and outliers clearly.
-​ Good for comparing multiple datasets side by side.

Cons
- Does not show the full shape of the distribution (e.g., how data clusters within quartiles).
-​ Can hide multimodal distributions.

Histogram
-​ A histogram is a type of bar chart that groups numeric data into bins (intervals) and
displays how many data points fall into each bin.
-​ It’s commonly used to visualize the frequency distribution (or probability
distribution) of continuous or discrete numerical data.

Structure of a Histogram
-​ X-axis (horizontal): Represents the range of values divided into bins (intervals).
-​ Y-axis (vertical): Represents the frequency (count) or density (probability) of data
points within each bin.
-​ Bars: Each bar’s height corresponds to how many data points fall within that bin.

import matplotlib.pyplot as plt

data = [56, 67, 45, 90, 45, 55, 70, 65, 45, 50, 60, 75, 80, 40, 35]
# Bin edges every 10 points: [30, 40), [40, 50), ..., [90, 100]
plt.hist(data, bins=[30, 40, 50, 60, 70, 80, 90, 100], edgecolor='black')
plt.xlabel("Score Range")
plt.ylabel("Frequency")
plt.title("Histogram with Bin Size = 10")
plt.show()
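To see the numbers behind the bars, `numpy.histogram` can count the same scores into the same bin edges (Matplotlib's `hist` relies on NumPy's binning). Every bin is half-open, [left, right), except the last, which also includes its right edge.

```python
import numpy as np

# Same scores and bin edges as the histogram example above
data = [56, 67, 45, 90, 45, 55, 70, 65, 45, 50, 60, 75, 80, 40, 35]
bins = [30, 40, 50, 60, 70, 80, 90, 100]

counts, edges = np.histogram(data, bins=bins)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    closing = "]" if right == edges[-1] else ")"  # only the last bin is closed
    print(f"[{left}, {right}{closing}: {count}")
```

The counts are 1, 4, 3, 3, 2, 1, 1, which are exactly the bar heights in the plot; they sum to the 15 scores.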
TWO VARIABLE PLOTS [ BIVARIATE PLOTS ]
1. Bar Plot (grouped)
2. Scatter Plot
3. Line Plot
4. Log-Log Plot

Scatter Plot
A scatter plot is a type of graph used to visualize the relationship between two numerical
variables. Each point on the plot represents a pair of values (x, y) from your dataset.

Key Features:
●​ X-axis: Represents one variable.
●​ Y-axis: Represents the other variable.
●​ Points: Each point shows a single observation in the dataset.​

Purpose:
1.​ Show correlation: You can quickly see if two variables are related.
○​ Positive correlation → points trend upward.
○​ Negative correlation → points trend downward.
○ No correlation → points show no clear upward or downward trend.
2.​ Detect patterns or clusters: Groups of points might indicate natural clusters.
3.​ Identify outliers: Points that are far from others are easy to spot.

import matplotlib.pyplot as plt

# Heights (cm) and weights (kg)
heights = [150, 160, 165, 155, 170, 175, 180, 160, 165, 170]
weights = [50, 55, 60, 58, 65, 70, 75, 60, 62, 68]

# Gender labels: 'M' (male) or 'F' (female)
genders = ['F', 'M', 'M', 'F', 'M', 'M', 'M', 'F', 'F', 'M']

# Assign a color to each point based on its category
colors = ['blue' if gender == 'M' else 'pink' for gender in genders]

plt.scatter(heights, weights, c=colors, s=100, edgecolor='black')
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.title("Scatter Plot: Heights vs Weights by Gender")
plt.show()
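The correlation that the scatter plot shows visually can be quantified with Pearson's correlation coefficient r. A minimal sketch using `numpy.corrcoef` on the same heights and weights:

```python
import numpy as np

# Same heights and weights as the scatter plot example above
heights = [150, 160, 165, 155, 170, 175, 180, 160, 165, 170]
weights = [50, 55, 60, 58, 65, 70, 75, 60, 62, 68]

# Pearson r: +1 = perfect positive, -1 = perfect negative, 0 = no linear trend
r = np.corrcoef(heights, weights)[0, 1]
print(f"r = {r:.2f}")
```

A value near +1 (about 0.96 here) matches the upward trend in the plot. Because r only captures linear association, it complements the scatter plot rather than replacing it.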
Log-Log Plot
A log-log plot is a graph where both the x-axis and the y-axis use logarithmic scales instead of linear scales. It is widely used in science, engineering, and data analysis.
Why Use Log-Log Plots?
1. Visualizing Wide-Ranging Data
When your data spans several orders of magnitude (e.g., from 1 to 1,000,000), a linear plot
becomes unusable. Log scales compress large values and expand small ones, making all data
visible.
2. Identifying Power Laws
This is the most important application. Suppose two variables follow a power-law relationship:

y = a × x^b
Taking logarithms of both sides:
log(y) = log(a) + b × log(x)
This is linear in log(x): on a log-log plot, a power law appears as a straight line whose slope equals the exponent b and whose intercept is log(a).
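Because the power law becomes a straight line in log-log space, a straight-line fit to log(x) and log(y) recovers b (the slope) and a (from the intercept). A minimal sketch with synthetic, noise-free data where a = 3 and b = 2 are chosen purely for illustration; with real, noisy data this least-squares fit is only a quick estimate.

```python
import numpy as np

# Synthetic data following y = a * x**b exactly (a = 3, b = 2 are illustrative)
a_true, b_true = 3.0, 2.0
x = np.logspace(0, 3, 20)  # 1 to 1000, evenly spaced on a log scale
y = a_true * x**b_true

# Fit a straight line to (log10 x, log10 y): slope = b, intercept = log10(a)
slope, intercept = np.polyfit(np.log10(x), np.log10(y), deg=1)
a_est = 10**intercept

print(f"b ~ {slope:.3f}, a ~ {a_est:.3f}")
```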

How to Read a Log-Log Plot

Scale Interpretation:
Each major tick mark represents a power of 10 (or another base)
Equal distances on the axis represent equal ratios, not differences
Moving from 1 to 10 covers the same distance as 10 to 100

Slope Interpretation:

The slope of a line tells you the power law exponent:


Slope = 1: Linear relationship (y ∝ x)
Slope = 2: Quadratic relationship (y ∝ x²)
Slope = -1: Inverse relationship (y ∝ 1/x)
Slope = 0.5: Square root relationship (y ∝ √x)

import numpy as np
import matplotlib.pyplot as plt

# X values (start at 1: zero cannot be shown on a log scale)
x = np.linspace(1, 10, 100)

# Different relationships
y1 = x**1     # Slope = 1 → linear
y2 = x**2     # Slope = 2 → quadratic
y3 = x**-1    # Slope = -1 → inverse
y4 = x**0.5   # Slope = 0.5 → square root

# Plot log-log
plt.figure(figsize=(8, 6))
plt.loglog(x, y1, label='Slope = 1 (y ∝ x)')
plt.loglog(x, y2, label='Slope = 2 (y ∝ x²)')
plt.loglog(x, y3, label='Slope = -1 (y ∝ 1/x)')
plt.loglog(x, y4, label='Slope = 0.5 (y ∝ √x)')

plt.xlabel("X (log scale)")
plt.ylabel("Y (log scale)")
plt.title("Log-Log Plot with Different Slopes")
plt.grid(True, which="both", ls="--")
plt.legend()
plt.show()
Multivariate Plots :
1.​ Stacked Plot
2.​ Parallel Coordinate Plot
3.​ Pair Plot
4.​ Heatmap

Stacked Plot
A stacked plot (often stacked area plot or stacked bar plot) is a way to visualize multiple
variables over a common axis (usually time or categories), where the values are “stacked”
on top of each other.
- Shows the total of all variables at each point.
- Helps to show both individual contributions and the overall trend.
Types of Stacked Plots
-​ Stacked Area Plot – continuous data (like time series)
-​ Stacked Bar Plot – categorical data

Example

import matplotlib.pyplot as plt
import pandas as pd

# Sample data
data = {
    'Month': ['Jan', 'Feb', 'Mar', 'Apr', 'May'],
    'Product_A': [3, 4, 5, 2, 6],
    'Product_B': [2, 3, 4, 3, 2],
    'Product_C': [1, 2, 1, 4, 3]
}
df = pd.DataFrame(data)
df.set_index('Month', inplace=True)

# Plot stacked area plot
df.plot(kind='area', stacked=True, alpha=0.7)
plt.title("Stacked Area Plot of Products")
plt.ylabel("Sales")
plt.show()

# Plot stacked bar plot
df.plot(kind='bar', stacked=True, alpha=0.7)
plt.title("Stacked Bar Plot of Products")
plt.ylabel("Sales")
plt.show()
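The "total at each point" that a stacked plot shows is a running sum across the series. This pandas sketch reuses the monthly sales above to compute the top edge of each layer, the stack totals, and each product's share of the total:

```python
import pandas as pd

# Same monthly sales as the stacked plot example above
df = pd.DataFrame(
    {
        'Product_A': [3, 4, 5, 2, 6],
        'Product_B': [2, 3, 4, 3, 2],
        'Product_C': [1, 2, 1, 4, 3],
    },
    index=['Jan', 'Feb', 'Mar', 'Apr', 'May'],
)

# Each layer is drawn on top of the running total of the layers below it
layer_tops = df.cumsum(axis=1)
totals = df.sum(axis=1)          # height of the full stack for each month
shares = df.div(totals, axis=0)  # part-to-whole proportions (each row sums to 1)

print(layer_tops)
print(totals)
```

The last column of `layer_tops` equals `totals`, which is the outline of the whole stack; plotting `shares` with `stacked=True` instead would give a 100% stacked chart.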

Parallel Coordinate Plot

A parallel coordinate plot is a visualization technique for displaying multivariate data. Instead of plotting variables on perpendicular axes (as in a scatter plot), it uses multiple parallel vertical (or horizontal) axes, one for each variable. Each data point is represented as a polyline that connects its values across all axes.
When to Use Parallel Coordinate Plots

Best for:

● Visualizing multivariate data (4+ variables)
● Detecting patterns, trends, and correlations across multiple dimensions
● Identifying clusters or groups in high-dimensional data
● Spotting outliers across multiple variables
● Comparing observations across many variables simultaneously

Use cases:

● Comparing performance metrics across multiple dimensions
● Analyzing product specifications with multiple features
● Quality control with multiple measured parameters
● Exploring relationships in high-dimensional datasets

Avoid when:

● You have only 2-3 variables (use scatter plots instead)
● You have too many data points (lines overlap and become messy)
● Variables are on vastly different scales (normalizing them first can help)

Key Characteristics

● Each vertical axis = one variable/dimension
● Each line connecting across the axes = one observation/data point
● Lines that run mostly parallel between two adjacent axes = positive correlation between those variables
● Lines that mostly cross between two adjacent axes = negative correlation
● Bundled lines = similar observations

import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

data = {
    'Student': ['Alice', 'Bob', 'Charlie', 'Diana', 'Eve'],
    'Math': [85, 72, 90, 68, 95],
    'Science': [88, 75, 85, 70, 92],
    'English': [90, 80, 75, 85, 88],
    'History': [82, 85, 80, 90, 86]
}

df = pd.DataFrame(data)

# Create categorical variable for coloring (bands based on the Math score)
df['Performance'] = pd.cut(df['Math'], bins=[0, 75, 85, 100],
                           labels=['Low', 'Medium', 'High'])

# parallel_coordinates assigns colors in order of first appearance,
# so sort rows Low -> Medium -> High to pair them with red, orange, green
df = df.sort_values('Performance')

plt.figure(figsize=(10, 6))
parallel_coordinates(df, 'Performance',
                     cols=['Math', 'Science', 'English', 'History'],
                     color=['red', 'orange', 'green'])
plt.title('Student Performance - Parallel Coordinates')
plt.xlabel('Subject')
plt.ylabel('Score')
plt.legend(loc='best')
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
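As noted under "Avoid when", columns on very different scales are a problem because pandas' `parallel_coordinates` draws all columns against one shared y-axis, so a large-valued column flattens the others. A minimal sketch, assuming a small hypothetical car dataset, that min-max scales each numeric column to [0, 1] before plotting:

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical car data: the columns live on very different scales
cars = pd.DataFrame({
    'Type': ['Compact', 'Compact', 'SUV', 'SUV', 'Sports'],
    'Weight_kg': [1200, 1300, 2100, 2300, 1500],
    'Horsepower': [110, 130, 250, 280, 450],
    'MPG': [38, 35, 22, 20, 18],
})

features = ['Weight_kg', 'Horsepower', 'MPG']
numeric = cars[features]
# Min-max scaling maps every column to [0, 1] so no single axis dominates
scaled = (numeric - numeric.min()) / (numeric.max() - numeric.min())
scaled['Type'] = cars['Type']

fig, ax = plt.subplots(figsize=(8, 5))
parallel_coordinates(scaled, 'Type', cols=features, ax=ax,
                     color=['tab:blue', 'tab:orange', 'tab:green'])
ax.set_ylabel('Scaled value (0 = column min, 1 = column max)')
ax.set_title('Parallel Coordinates after Min-Max Scaling')
plt.show()
```

Min-max scaling keeps each column's ordering but discards its units, so the axis labels should say the values are scaled; standardizing (z-scores) is a common alternative.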

Pair Plot (Scatter Plot Matrix)

Definition
A pair plot (also called a scatter plot matrix or SPLOM) is a grid of plots that shows:

● Scatter plots for every pair of numerical variables (off-diagonal)
● Distribution plots (histograms or KDE) for each individual variable (diagonal)

It displays all pairwise relationships in a dataset simultaneously, making it a powerful tool for exploring multivariate data.

When to Use Pair Plots

Best for:

● Initial exploratory data analysis (EDA) of multivariate datasets
● Identifying correlations and relationships between variables
● Detecting clusters or groupings in data
● Spotting outliers across multiple dimensions
● Understanding distributions of individual variables
● Comparing patterns across different categories/classes

Use cases:

● Machine learning: feature analysis before modeling
● Statistical analysis: checking assumptions and relationships
● Scientific data: exploring experimental results
● Business analytics: customer segmentation analysis

Avoid when:

● You have too many variables (more than 6-7 becomes cluttered)
● You have only categorical variables (use other plots)
● You need precise measurements (pair plots show patterns, not exact values)
● You have extremely large datasets (too slow to render)

Key Characteristics
● Grid layout: n×n grid for n variables
● Diagonal: Shows the distribution of each single variable
● Off-diagonal: Shows scatter plots between pairs of variables
● Symmetry: The upper and lower triangles show the same pairs with the axes swapped (mirror images across the diagonal)
● Color coding: Often colored by a categorical variable to reveal clusters

SYNTAX
import seaborn as sns

sns.pairplot(
    data,                           # DataFrame
    vars=['var1', 'var2', 'var3'],  # Which variables to plot
    hue='category',                 # Color by category
    diag_kind='hist',               # 'auto', 'hist', 'kde', or None
    kind='scatter',                 # 'scatter', 'kde', 'hist', or 'reg'
    palette='Set1',                 # Color palette
    markers=['o', 's', 'D'],        # One marker per hue level
    height=2.5,                     # Size of each subplot (inches)
    aspect=1                        # Width/height ratio
)
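If seaborn is not available, pandas provides a simpler scatter plot matrix through `pandas.plotting.scatter_matrix`. A minimal sketch on hypothetical random data (`y` is built to depend on `x`, while `z` is unrelated); unlike `sns.pairplot`, it has no `hue` argument for coloring by category.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(seed=1)

# Hypothetical measurements: 'y' depends on 'x', 'z' is independent noise
x = rng.normal(0, 1, 200)
df = pd.DataFrame({
    'x': x,
    'y': 2 * x + rng.normal(0, 0.5, 200),
    'z': rng.normal(0, 1, 200),
})

# n x n grid: histograms on the diagonal, pairwise scatter plots elsewhere
axes = scatter_matrix(df, diagonal='hist', alpha=0.5, figsize=(6, 6))
plt.suptitle('Scatter Plot Matrix (pandas)')
plt.show()
```

In the resulting 3×3 grid the x-y panels show a tight upward band, while the panels involving `z` look like shapeless clouds.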

[Figure: Iris dataset example]

Heatmap
import seaborn as sns

sns.heatmap(
    data,              # 2D array or DataFrame of numbers
    annot=True,        # Show values in cells
    fmt='.2f',         # Number format
    cmap='coolwarm',   # Color scheme
    vmin=0, vmax=100,  # Value range
    center=50,         # Center point for diverging colors
    linewidths=1,      # Cell border width
    square=True        # Make cells square-shaped
)
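A correlation matrix is a common input for a heatmap. The sketch below, on hypothetical random data, builds one with pandas and draws it with Matplotlib's `imshow`, writing each value in its cell to mimic `annot=True, fmt='.2f'`. It is a Matplotlib-only alternative, not how seaborn implements `heatmap`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2)

# Hypothetical numeric dataset with one strongly related pair of columns
a = rng.normal(size=100)
df = pd.DataFrame({
    'a': a,
    'b': a + rng.normal(scale=0.3, size=100),
    'c': rng.normal(size=100),
})

corr = df.corr()  # Pearson correlation matrix, values in [-1, 1]

fig, ax = plt.subplots(figsize=(5, 4))
# Diverging colormap with symmetric limits so that 0 sits in the middle
im = ax.imshow(corr.to_numpy(), cmap='coolwarm', vmin=-1, vmax=1)
ax.set_xticks(np.arange(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(np.arange(len(corr.index)))
ax.set_yticklabels(corr.index)

# Write each value in its cell (like annot=True, fmt='.2f')
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha='center', va='center')

fig.colorbar(im, ax=ax)
ax.set_title('Correlation Heatmap (Matplotlib only)')
plt.show()
```

Setting `vmin=-1, vmax=1` plays the same role as `center=0` in the seaborn call: equal positive and negative correlations get equally strong colors.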
Visualizing multiple distributions at the same time
-​ Violin Plot
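A violin plot can also be drawn with Matplotlib alone via `Axes.violinplot`. In this sketch the three groups are hypothetical random samples, and group B is deliberately bimodal: its two peaks show up as a pinched violin, the kind of shape a box plot alone can hide.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=3)

# Hypothetical groups: B is bimodal (two clusters around 40 and 60)
group_a = rng.normal(50, 5, 200)
group_b = np.concatenate([rng.normal(40, 3, 100), rng.normal(60, 3, 100)])
group_c = rng.normal(55, 10, 200)

fig, ax = plt.subplots()
# One violin per dataset, placed at x = 1, 2, 3 by default; median line shown
parts = ax.violinplot([group_a, group_b, group_c], showmedians=True)
ax.set_xticks([1, 2, 3])
ax.set_xticklabels(['A', 'B', 'C'])
ax.set_ylabel('Value')
ax.set_title('Violin Plot (Matplotlib)')
plt.show()
```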

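Plot Type | When to Use | Detailed Syntax | Datatype Needed
(The snippets assume import matplotlib.pyplot as plt, import seaborn as sns, import pandas as pd, and an existing DataFrame df containing the named columns.)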

Dot Plot
When to use: Display individual data points for small datasets; show distribution and exact values
Datatype needed: Numerical (continuous or discrete)
Matplotlib:
plt.plot(x_values, y_values, 'o', markersize=8, color='blue')
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.title('Dot Plot')
plt.show()
Seaborn:
sns.stripplot(data=df, y='column_name', color='blue', size=8)  # function name garbled in the source; stripplot assumed
plt.title('Dot Plot')
plt.show()

Jitter Plot
When to use: Show distribution of data points with added random noise to avoid overplotting
Datatype needed: Numerical (continuous or discrete)
Seaborn:
sns.stripplot(data=df, x='category', y='value', jitter=True, alpha=0.6, size=5)
plt.title('Jitter Plot')
plt.xlabel('Category')
plt.ylabel('Value')
plt.show()
Alternative:
sns.swarmplot(data=df, x='category', y='value')  # function name garbled in the source; swarmplot assumed
plt.show()

Error Bar Plot
When to use: Display mean/median values with uncertainty or variability (standard deviation, confidence intervals)
Datatype needed: Numerical (continuous) + error values
Matplotlib:
x = [1, 2, 3, 4, 5]
y = [10, 15, 13, 18, 20]
errors = [1, 2, 1.5, 2, 1.8]
plt.errorbar(x, y, yerr=errors, fmt='o-', capsize=5, capthick=2, ecolor='red', markersize=8)
plt.xlabel('X Label')
plt.ylabel('Y Label')
plt.title('Error Bar Plot')
plt.grid(True, alpha=0.3)
plt.show()
With horizontal errors (x_errors and y_errors are lists the same length as x):
plt.errorbar(x, y, xerr=x_errors, yerr=y_errors, fmt='s', capsize=5)
Box Plot
When to use: Show distribution summary (median, quartiles, outliers); compare distributions across categories
Datatype needed: Numerical (continuous)
Matplotlib:
data = [list1, list2, list3]  # one list of values per group
plt.boxplot(data, tick_labels=['Group1', 'Group2', 'Group3'], patch_artist=True, notch=True)  # tick_labels was named labels before Matplotlib 3.9
plt.ylabel('Values')
plt.title('Box Plot')
plt.grid(True, alpha=0.3)
plt.show()
Seaborn:
sns.boxplot(data=df, x='category', y='value', palette='Set2', width=0.5)
plt.title('Box Plot')
plt.show()
Single variable:
sns.boxplot(y=df['column'])

Histogram
When to use: Show frequency distribution of continuous data; understand data shape and spread
Datatype needed: Numerical (continuous)
Matplotlib:
plt.hist(data, bins=30, edgecolor='black', color='skyblue', alpha=0.7)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.grid(True, alpha=0.3)
plt.show()
Seaborn:
sns.histplot(data=df, x='column', bins=30, kde=True, color='blue')
plt.title('Histogram with KDE')
plt.show()
With custom bins:
plt.hist(data, bins=[0, 10, 20, 30, 40, 50], density=True)
Pie Plot
When to use: Show proportions of categorical data as parts of a whole
Datatype needed: Categorical + numerical counts/percentages
Matplotlib:
labels = ['Category A', 'Category B', 'Category C', 'Category D']
sizes = [30, 25, 20, 25]
colors = ['gold', 'lightcoral', 'lightskyblue', 'lightgreen']
explode = (0.1, 0, 0, 0)  # pull the first slice out slightly
plt.pie(sizes, labels=labels, colors=colors, autopct='%1.1f%%', startangle=90, explode=explode, shadow=True)
plt.title('Pie Chart')
plt.axis('equal')  # draw the pie as a circle
plt.show()
From DataFrame:
df['category'].value_counts().plot.pie(autopct='%1.1f%%', figsize=(8, 8))
Bivariate Plots
Plot Type | When to Use | Detailed Syntax | Datatype Needed

Scatter Plot
When to use: Show relationship between two continuous variables; identify correlations and patterns
Datatype needed: Two numerical (continuous) variables
Matplotlib:
plt.scatter(x, y, s=50, c='blue', alpha=0.6, edgecolors='black', linewidth=0.5)
plt.xlabel('X Variable')
plt.ylabel('Y Variable')
plt.title('Scatter Plot')
plt.grid(True, alpha=0.3)
plt.show()
Seaborn:
sns.scatterplot(data=df, x='col1', y='col2', hue='category', size='size_col', palette='viridis', alpha=0.7)
plt.title('Scatter Plot')
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
With regression line:
sns.regplot(data=df, x='col1', y='col2', scatter_kws={'alpha': 0.5})  # function name garbled in the source; regplot assumed

Bar Plot
When to use: Compare values across categories; show means or counts for different groups
Datatype needed: Categorical (x-axis) + Numerical (y-axis)
Matplotlib:
categories = ['A', 'B', 'C', 'D']
values = [25, 40, 30, 55]
plt.bar(categories, values, color='steelblue', edgecolor='black', width=0.6)
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Plot')
plt.grid(axis='y', alpha=0.3)
plt.show()
Seaborn:
sns.barplot(data=df, x='category', y='value', hue='subcategory', palette='Set2', errorbar='sd', capsize=0.1)  # errorbar='sd' replaces the older ci='sd' (seaborn 0.12+)
plt.title('Bar Plot')
plt.legend(title='Subcategory')
plt.show()
Horizontal:
plt.barh(categories, values)
Log Log Plot
When to use: Visualize relationships spanning multiple orders of magnitude; identify power-law relationships
Datatype needed: Two numerical variables (positive values only)
Matplotlib:
plt.loglog(x, y, 'o-', linewidth=2, markersize=6, color='blue')
plt.xlabel('X (log scale)')
plt.ylabel('Y (log scale)')
plt.title('Log-Log Plot')
plt.grid(True, which='both', alpha=0.3)
plt.show()
Alternative method:
plt.plot(x, y, 'o-')
plt.xscale('log')
plt.yscale('log')
plt.xlabel('X (log scale)')
plt.ylabel('Y (log scale)')
plt.grid(True, which='both', ls='--', alpha=0.5)
plt.show()

Line Plot
When to use: Show trends over time or ordered categories; display continuous relationships
Datatype needed: Numerical (both axes), often time-series data
Matplotlib:
plt.plot(x, y, color='blue', linewidth=2, linestyle='-', marker='o', markersize=6, label='Series 1')
plt.xlabel('Time/X Variable')
plt.ylabel('Y Variable')
plt.title('Line Plot')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.show()
Seaborn:
sns.lineplot(data=df, x='time', y='value', hue='category', style='category', markers=True, dashes=False)
plt.title('Line Plot')
plt.show()
Multiple lines:
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
Multivariate Plots
Plot Type | When to Use | Detailed Syntax | Datatype Needed

Parallel Coordinate Plot
When to use: Compare multiple numerical variables across observations; identify patterns in high-dimensional data
Datatype needed: Multiple numerical variables + optional categorical class variable
Pandas:
from pandas.plotting import parallel_coordinates
parallel_coordinates(df, 'class_column', color=['blue', 'red', 'green'], alpha=0.5)
plt.title('Parallel Coordinates Plot')
plt.xlabel('Variables')
plt.ylabel('Values')
plt.legend(loc='best')
plt.grid(True, alpha=0.3)
plt.show()
Plotly (interactive):
import plotly.express as px
fig = px.parallel_coordinates(df, color='class_column', dimensions=['var1', 'var2', 'var3', 'var4'], color_continuous_scale=px.colors.diverging.Tealrose)  # color expects a numeric column (e.g. class codes)
fig.show()

Pair Plot
When to use: Visualize pairwise relationships between all numerical variables; show distributions and correlations
Datatype needed: Multiple numerical variables + optional categorical variable for coloring
Seaborn:
sns.pairplot(data=df, hue='category', palette='husl', diag_kind='kde', plot_kws={'alpha': 0.6, 's': 50, 'edgecolor': 'k'}, height=2.5)
plt.suptitle('Pair Plot', y=1.02)
plt.show()
Specific columns:
sns.pairplot(data=df, vars=['col1', 'col2', 'col3'], hue='category', markers=['o', 's', 'D'], diag_kind='hist')
plt.show()
With regression:
sns.pairplot(df, kind='reg', diag_kind='kde')
Stacked Plot
When to use: Show cumulative totals over time or categories; display part-to-whole relationships
Datatype needed: Time/categorical (x-axis) + multiple numerical series
Matplotlib:
x = [1, 2, 3, 4, 5]
y1 = [1, 2, 3, 4, 5]
y2 = [1, 1, 2, 2, 3]
y3 = [2, 2, 2, 3, 3]
plt.stackplot(x, y1, y2, y3, labels=['Series 1', 'Series 2', 'Series 3'], colors=['#1f77b4', '#ff7f0e', '#2ca02c'], alpha=0.8)
plt.xlabel('X Variable')
plt.ylabel('Cumulative Value')
plt.title('Stacked Area Plot')
plt.legend(loc='upper left')
plt.grid(True, alpha=0.3)
plt.show()
Pandas:
df.plot.area(stacked=True, alpha=0.7, figsize=(10, 6))
plt.title('Stacked Area Chart')
plt.show()

Heatmap
When to use: Display matrix data with color intensity; show correlations or values across two categorical dimensions
Datatype needed: Matrix of numerical values or 2D grid with categorical dimensions
Seaborn:
correlation_matrix = df.corr(numeric_only=True)  # numeric_only skips text columns (pandas 1.5+)
sns.heatmap(correlation_matrix, annot=True, fmt='.2f', cmap='coolwarm', center=0, square=True, linewidths=1, cbar_kws={'shrink': 0.8})
plt.title('Correlation Heatmap')
plt.tight_layout()
plt.show()
Custom data (pivot_table is a 2D DataFrame, e.g. built with df.pivot_table(...)):
sns.heatmap(data=pivot_table, annot=True, cmap='YlGnBu', linecolor='white', linewidths=0.5)
plt.xlabel('X Categories')
plt.ylabel('Y Categories')
plt.title('Heatmap')
plt.show()
With clustering:
sns.clustermap(df.corr(numeric_only=True), cmap='viridis', annot=True)
Violin Plot
When to use: Combine box plot with kernel density plot; show full distribution shape across categories
Datatype needed: Categorical (x-axis) + Numerical (y-axis), optional secondary categorical for hue
Seaborn:
sns.violinplot(data=df, x='category', y='value', hue='subcategory', split=False, palette='muted', inner='quartile', density_norm='width')  # density_norm replaced scale in seaborn 0.13
plt.title('Violin Plot')
plt.xlabel('Category')
plt.ylabel('Value')
plt.legend(title='Subcategory', bbox_to_anchor=(1.05, 1))
plt.show()
With split violins:
sns.violinplot(data=df, x='category', y='value', hue='binary_var', split=True, palette='Set2')
plt.show()
Horizontal:
sns.violinplot(data=df, x='value', y='category', orient='h')
