Unit3_4) Matplotlib and seaborn.ipynb - Colab

The document provides an overview of various data visualization techniques using Matplotlib and Seaborn, including box plots, histograms, bar charts, pie charts, line plots, area charts, scatter plots, and heatmaps. Each visualization type is described with its purpose, suitable data types, and example code for implementation. The document emphasizes the importance of selecting the appropriate chart based on the nature of the data being represented.

Uploaded by

shivam511439
Copyright
© All Rights Reserved

Understanding Canvas

Using Matplotlib library and seaborn


Box Plot

Summary:

Minimum: The smallest data point (excluding outliers).

First quartile (Q1): 25th percentile, the lower edge of the box.

Median (Q2): 50th percentile, shown as a line within the box.

Third quartile (Q3): 75th percentile, the upper edge of the box.

Maximum: The largest data point (excluding outliers).

Interquartile Range (IQR): The height of the box represents the IQR (Q3 - Q1).

Whiskers: Extend to the smallest and largest values within 1.5 × IQR of the quartiles.

Outliers: Points outside 1.5 × IQR from Q1 and Q3 are plotted individually.
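
The quantities in this summary can be checked by hand. The sketch below is illustrative only: it uses Python's statistics module with the "inclusive" quantile method, which interpolates linearly between data points. That choice is an assumption, since other quartile conventions give slightly different values. The helper name box_plot_stats and the extra value 40 are hypothetical and not part of the notebook.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Return the numbers a box plot draws, using the 1.5 x IQR rule."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


data = [7, 8, 5, 6, 10, 15, 13]
summary = box_plot_stats(data)
print(summary)

# Hypothetical extra value far from the rest, to show an outlier being flagged.
summary_with_outlier = box_plot_stats(data + [40])
print(summary_with_outlier["outliers"])
```

For the sample data no point falls outside the fences, so the whiskers reach the smallest and largest values. Once 40 is added, it is reported as an outlier and the upper whisker stays at 15.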

Types of Data Suitable for a Box Plot:

Continuous data: Example: Heights, weights, temperatures, or sales figures.

Ordinal data: Data with meaningful order (e.g., ratings on a scale).

Grouped data: Comparing continuous data across categories (e.g., test scores by gender or region).

Box plots are not suitable for discrete or categorical data without an inherent order or quantitative value.

import matplotlib.pyplot as plt
import seaborn as sns

data = [7, 8, 5, 6, 10, 15, 13]

sns.boxplot(data=data)
plt.title("Box Plot")
plt.show()

import matplotlib.pyplot as plt
import seaborn as sns

# Data for multiple box plots
data1 = [7, 8, 5, 6, 10, 15, 13]
data2 = [3, 4, 2, 5, 8, 12, 9]
data3 = [10, 12, 8, 15, 17, 20, 22]

# Combine the data into a single list
data = [data1, data2, data3]

# Create the box plots
sns.boxplot(data=data)
plt.title("Multiple Box Plots")
plt.xlabel("Datasets")
plt.ylabel("Values")
plt.xticks(ticks=[0, 1, 2], labels=["Data1", "Data2", "Data3"])  # Custom labels
plt.show()

Histogram

Summary:
Distribution Representation: Visualizes the frequency distribution of a dataset.

Bins (Intervals): The x-axis is divided into intervals or "bins," representing ranges of data values.

Frequency: The y-axis represents the frequency or count of data points within each bin.

Bar Heights: The height of each bar indicates the number of data points in the corresponding bin.

Continuous Data: The bars touch each other, indicating the continuous nature of the data.

Customizable Bin Size: Bin width affects the level of detail; narrower bins show more detail, while wider bins summarize data.
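
To make the effect of bin size concrete, the following sketch counts the sample values by hand for two bin settings. It assumes equal-width bins that span the minimum to the maximum of the data, with the last bin closed on the right, which is how an integer bins argument is normally interpreted. The helper name equal_width_counts is hypothetical, and the sketch assumes the values are not all identical.

```python
def equal_width_counts(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    edges = [low + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # The last bin is closed on the right, so the maximum lands in it.
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    return edges, counts


data = [7, 8, 5, 6, 10, 15, 13]

edges3, counts3 = equal_width_counts(data, bins=3)  # wide bins: coarse summary
edges5, counts5 = equal_width_counts(data, bins=5)  # narrow bins: more detail

print(counts3)
print(counts5)
```

With three bins the gap between 10 and 13 is hidden inside wider intervals. With five bins an empty interval appears, which is the extra detail that narrower bins expose.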

Types of Data Suitable for a Histogram:

Continuous data: Example: Heights, weights, income, or time intervals.

Grouped or binned discrete data: Example: Exam scores (e.g., 0–10, 11–20, etc.).

Univariate data: Focuses on a single variable at a time.

Histograms are not suitable for categorical data (nominal or ordinal) because they are designed for numerical data, either continuous or grouped discrete. For categorical data, a bar chart is more appropriate.

import matplotlib.pyplot as plt

data = [7, 8, 5, 6, 10, 15, 13]


plt.hist(data, bins=3, color='blue', edgecolor='black')
plt.title("Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

Bar Chart (Column Chart)

Summary:

Discrete Categories: Used to represent data across discrete categories or groups.

Bars: Individual bars represent categories. Height (vertical bar chart) or length (horizontal bar chart) represents the value associated with the category.

Axis Labels:

X-axis: Represents the categories.

Y-axis: Represents the frequency, count, or any other value metric.

Space Between Bars: Bars are separated by gaps to emphasize discrete categories.

Customization: Can be clustered (side-by-side for multiple variables) or stacked (segments within a bar).

Comparative Analysis: Useful for comparing values across categories or groups.
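
A clustered (side-by-side) layout can be sketched as follows. This is a minimal example rather than the notebook's own code: the bar width of 0.4 and the manual offsets around each category position are assumptions chosen so that two bars share one category slot.

```python
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C']
group1 = [5, 10, 15]
group2 = [3, 7, 12]

bar_width = 0.4  # assumed width; two bars of 0.4 fill most of a unit-wide slot
positions = list(range(len(categories)))  # one slot per category
left = [p - bar_width / 2 for p in positions]   # centers for Group 1
right = [p + bar_width / 2 for p in positions]  # centers for Group 2

bars1 = plt.bar(left, group1, width=bar_width, label='Group 1')
bars2 = plt.bar(right, group2, width=bar_width, label='Group 2')
plt.xticks(ticks=positions, labels=categories)  # label the middle of each pair
plt.title("Clustered Column Chart")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.legend()
plt.show()
```

Shifting each group by half a bar width keeps the pair centered on its category tick, so the tick labels still line up with the categories.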

Types of Data Suitable for a Bar Chart:

Categorical data: Example: Product types, regions, or brands.

Ordinal data: Example: Rankings or ratings with a meaningful order (e.g., good, better, best).

Aggregated Numerical Data: Example: Sales totals, survey counts, or averages grouped by category.

Nominal Data: Data without a meaningful order (e.g., colors, cities).

Bar charts are not suitable for continuous data unless it has been grouped into discrete intervals. For continuous data, a histogram or line chart is more appropriate.
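
Raw categorical observations usually need to be counted before they can be drawn as bars. The sketch below shows one way to do that with collections.Counter. The responses list is hypothetical sample data, and the explicit order list is an assumption used to keep the ordinal order on the axis.

```python
from collections import Counter

# Hypothetical raw survey responses (ordinal categories).
responses = ["good", "best", "good", "better", "good", "best"]

counts = Counter(responses)

# Counter does not know the ordinal order, so state it explicitly.
order = ["good", "better", "best"]
categories = order
values = [counts[name] for name in order]

print(categories)
print(values)
```

The resulting categories and values lists have the same shape as the ones passed to plt.bar in the examples that follow.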

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']


values = [10, 20, 15, 25]
plt.bar(categories, values, color='orange')
plt.title("Bar Plot")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.show()

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C']


group1 = [5, 10, 15]
group2 = [3, 7, 12]

plt.bar(categories, group1, label='Group 1')


plt.bar(categories, group2, bottom=group1, label='Group 2')
plt.title("Stacked Column Chart")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.legend()
plt.show()

# Horizontal variant: plt.bar draws vertical bars (columns), plt.barh draws horizontal bars
categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 25]
plt.barh(categories, values, color='skyblue')
plt.title("Horizontal Bar Plot")
plt.xlabel("Values")
plt.ylabel("Categories")
plt.show()

Pie Chart

Summary:

Circular Representation: Displays data as slices of a circle.

Proportions: Each slice represents a proportion or percentage of the whole dataset.

Labels: Categories and their corresponding values or percentages are often labeled directly on the chart or in a legend.

Single Variable Focus: Ideal for showing the composition of a dataset.

Visual Impact: Helps identify dominant categories or segments at a glance.

Limitations: Becomes less effective when there are too many slices or when values are very similar.
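
The proportion behind each slice is simply the value divided by the total. The short sketch below computes those percentages for the sample values and formats them with the same '%1.2f%%' pattern that is passed as autopct in the pie chart example. It is a plain calculation for illustration and does not draw anything.

```python
categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 25]

total = sum(values)
percentages = [100 * v / total for v in values]

# Same format string as the autopct argument: two decimals and a percent sign.
labels = ['%1.2f%%' % p for p in percentages]

for name, label in zip(categories, labels):
    print(name, label)
```

Because the slices always sum to 100%, a pie chart only makes sense when the categories together form a complete whole.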

Types of Data Suitable for a Pie Chart:

Categorical data: Example: Market share by company, population by region.

Nominal data: Example: Product categories, job roles.

Percentages or Proportions: Example: Survey results (e.g., "Yes," "No," "Maybe" responses).

Limited Categories: Works best with few categories (usually fewer than 6–8).

Pie charts are not suitable for datasets with a large number of categories, overlapping categories, or comparisons across multiple groups. A bar chart or stacked bar chart might be a better alternative for such cases.

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']


values = [10, 20, 15, 25]
plt.pie(values, labels=categories, autopct='%1.2f%%', startangle=0)
plt.title("Pie Chart")
plt.show()

Line Plot

Summary:

Continuous Data Representation: Visualizes trends, changes, or patterns over time or across sequential data points.

Data Points: Plotted at specific intervals and connected by straight lines.

X-Axis: Represents independent variables, often time or a sequence (e.g., days, months, years).

Y-Axis: Represents dependent variables, showing values corresponding to each point on the x-axis.

Trend Analysis: Highlights upward, downward, or stable trends.

Multiple Lines: Allows comparison between multiple variables or groups.

Customization: Can include markers, dashed lines, or smoothing for enhanced interpretation.
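
The customization options above can be sketched as follows. The example draws the raw series as a dashed line with markers and overlays a smoothed version. The smoothing method is an assumption: a simple 3-point moving average implemented by the hypothetical helper moving_average, which is only one of many possible smoothers.

```python
import matplotlib.pyplot as plt


def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

window = 3
smoothed = moving_average(y, window)
# Each average covers three points, so plot it at the middle x value.
x_smoothed = x[window // 2: len(x) - window // 2]

plt.plot(x, y, marker='o', linestyle='--', color='green', label="Raw")
plt.plot(x_smoothed, smoothed, color='black', label="3-point average")
plt.title("Line Chart with Smoothing")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()
```

A moving average shortens the series by window - 1 points, which is why the smoothed line starts and ends one step inside the raw one.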

Types of Data Suitable for a Line Chart:

Continuous data: Example: Temperature, stock prices, or revenue over time.

Time-series data: Example: Monthly sales, daily website traffic, or yearly population growth.

Sequential Data: Example: Progression of a measurement (e.g., growth rate, cumulative counts).

Comparative Data: Example: Comparing trends for different groups or categories.

Line charts are not suitable for categorical data without a natural order or for visualizing proportions (use bar or pie charts for these). They work best when showing data relationships over a continuum.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y, marker='o', color='green', label="Line")
plt.title("Line Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
z = [3, 7, 4, 6, 12]
plt.plot(x, y, marker='o', color='green', label="Line1")
plt.plot(x, z, marker='v', color='red', label="Line2")
plt.title("Line Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

Area Chart

Summary:

Continuous Data Representation: Similar to a line chart but with the area below the line filled with color or shading.

Trend Visualization: Highlights trends and changes over time for one or multiple datasets.

Cumulative Impact: Stacked area charts show cumulative totals across categories or groups.

X-Axis: Represents independent variables, often time or sequential data.

Y-Axis: Represents dependent variables, showing values corresponding to each x-axis point.

Visual Emphasis: The filled area emphasizes magnitude and volume.

Multiple Series: Can display overlapping or stacked series to compare groups.
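
A stacked area chart, as opposed to overlapping areas, can be sketched with plt.stackplot. The example reuses the sample series from the line plot section. The totals list is computed only to show what the top edge of the stack represents and is not needed for the plot itself.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
z = [3, 7, 4, 6, 12]

# The top edge of the stack is the running total of the series.
totals = [a + b for a, b in zip(y, z)]

# Each series is drawn on top of the previous one instead of from zero.
layers = plt.stackplot(x, y, z, labels=["Area1", "Area2"], alpha=0.7)
plt.title("Stacked Area Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend(loc='upper left')
plt.show()

print(totals)
```

In the stacked form only the bottom series can be read directly from the y-axis. The upper band shows its contribution as thickness, so the emphasis is on the cumulative total.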

Types of Data Suitable for an Area Chart:

Continuous data: Example: Website traffic over time, monthly revenue.

Time-series data: Example: Quarterly sales, temperature changes over a year.

Cumulative Data: Example: Total expenses vs. income, energy usage over a period.

Comparative Data: Example: Stacked area charts to show contributions of different sources to a total.

Area charts are not suitable for datasets with too many overlapping series or for precise value comparison between groups (use line or bar charts instead). They work best when the emphasis is on trends and magnitude over time.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.fill_between(x, y, color='lightblue', alpha=0.7)
plt.plot(x, y, color='blue', label="Area")
plt.title("Area Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
z = [3, 7, 4, 6, 12]

plt.fill_between(x, y, color='lightblue', alpha=0.7)


plt.plot(x, y, color='blue', label="Area1")

plt.fill_between(x, z, color='red', alpha=0.2)


plt.plot(x, z, color='red', label="Area2")

plt.title("Area Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

Scatter Plot

Summary:

Bivariate Data Representation: Plots two variables against each other to observe their relationship.

Points: Each point represents a single data observation, with coordinates corresponding to values of the two variables.

X-Axis and Y-Axis: The x-axis represents the independent variable, and the y-axis represents the dependent variable.

Correlation: Highlights relationships such as positive, negative, or no correlation between variables.

Clustering: Identifies patterns or groupings within the data points.

Outliers: Visually reveals data points that deviate significantly from the overall pattern.

Customization: Points can vary in size (bubble chart), color, or shape to represent additional dimensions of data.

Types of Data Suitable for a Scatter Plot:

Numerical data: Example: Exam scores vs. study hours, height vs. weight.

Bivariate data: Example: Age vs. income, temperature vs. electricity consumption.

Data for Correlation Analysis: Example: Relationship between advertising spend and sales.

Multi-Dimensional Data: With color or size encoding, it can show three or more variables (e.g., GDP, population, and region).

Scatter plots are not suitable for categorical data without numeric relationships or for datasets requiring summaries of proportions or distributions. They excel in exploring relationships, trends, and variability in bivariate data.
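
The correlation and multi-dimensional ideas can be combined in one sketch. The example below computes the Pearson correlation coefficient by hand for the sample points and then encodes two extra variables as marker size and color. The sizes and levels lists are hypothetical data invented for illustration, and the helper pearson_r assumes neither variable is constant.

```python
import matplotlib.pyplot as plt


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / (var_x * var_y) ** 0.5


x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
sizes = [40, 80, 120, 160, 200]      # hypothetical third variable -> marker area
levels = [0.1, 0.5, 0.3, 0.9, 0.7]   # hypothetical fourth variable -> color

r = pearson_r(x, y)

points = plt.scatter(x, y, s=sizes, c=levels, cmap='viridis')
plt.colorbar(points, label="Level")
plt.title(f"Scatter Plot (r = {r:.2f})")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
```

For these five points the coefficient is 0.5, a moderate positive relationship, which matches the loose upward drift visible in the plot.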

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
plt.scatter(x, y, color='red', label='Points')
plt.title("Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

Heatmap

Summary:

Matrix Representation: Displays data in a matrix format where each cell's color represents a value or intensity.

Color Gradient: The color scale (often from low to high) represents the magnitude of values, providing a visual way to understand data distribution.

X-Axis and Y-Axis: Used to represent categorical or ordinal variables (e.g., time periods, regions, or features) that define the rows and columns of the matrix.

Correlation Visualization: Often used to show relationships, such as correlation matrices or similarity between variables.

Dense Information: Allows easy identification of patterns, clusters, or outliers in large datasets.

Customizable Colors: Color schemes can be adjusted to suit the context or to emphasize specific trends.
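
A correlation heatmap, one of the uses mentioned above, can be sketched as follows. The variable names and sample values are hypothetical. Fixing the color scale to the range -1 to 1 is a design choice assumed here so that the midpoint of the 'coolwarm' palette corresponds to zero correlation.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical variables, one per row, five observations each.
names = ["hours", "score", "errors"]
samples = np.array([
    [1, 2, 3, 4, 5],   # hours
    [2, 4, 1, 3, 5],   # score
    [5, 4, 3, 2, 1],   # errors
])

# np.corrcoef treats each row as a variable by default.
corr = np.corrcoef(samples)

sns.heatmap(corr, annot=True, fmt=".2f", cmap='coolwarm',
            vmin=-1, vmax=1, xticklabels=names, yticklabels=names)
plt.title("Correlation Heatmap")
plt.show()
```

The matrix is symmetric with ones on the diagonal, so the strongest colors off the diagonal point to the variable pairs worth inspecting further.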

Types of Data Suitable for a Heatmap:

Numerical data: Example: Correlation between variables, performance metrics across time and categories.

Tabular Data: Example: Sales figures across months and regions, temperature variations across time and locations.

Matrix Data: Example: Gene expression data, user activity on a website (user-item interaction matrix).

Comparative Data: Example: Survey responses (e.g., rating scales) across multiple groups.

Categorical Data (with aggregated numerical values): Example: Counts or frequencies of categories over different time periods.

Heatmaps are not ideal for visualizing highly detailed individual values, as they focus on patterns and trends in large datasets. For precise comparisons of individual data points, a scatter plot, bar chart, or table may be more appropriate.

import seaborn as sns


import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(5, 5)
sns.heatmap(data, annot=True, cmap='coolwarm')
plt.title("Heatmap")
plt.show()
