Unit3_4) Matplotlib and seaborn.ipynb - Colab

The document provides an overview of various data visualization techniques using Matplotlib and Seaborn, including box plots, histograms, bar charts, pie charts, line plots, area charts, scatter plots, and heatmaps. Each visualization type is described with its purpose, suitable data types, and example code for implementation. The document emphasizes the importance of selecting the appropriate chart based on the nature of the data being represented.

Uploaded by

shivam511439

Understanding Canvas

Using Matplotlib library and seaborn

Box Plot

Summary:

Minimum: The smallest data point (excluding outliers).

First quartile (Q1): 25th percentile, the lower edge of the box.

Median (Q2): 50th percentile, shown as a line within the box.

Third quartile (Q3): 75th percentile, the upper edge of the box.

Maximum: The largest data point (excluding outliers).

Interquartile Range (IQR): The height of the box represents the IQR (Q3 - Q1).

Whiskers: Extend to the smallest and largest values within 1.5 × IQR of the quartiles.

Outliers: Points outside 1.5 × IQR from Q1 and Q3 are plotted individually.
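
The quantities listed above can be computed by hand. The following is a minimal sketch, assuming NumPy's default (linear-interpolation) percentiles; the value 40 is a hypothetical data point added to the notebook's sample so that one outlier appears.

```python
import numpy as np

# Sample values from the notebook, plus a hypothetical extreme value (40)
data = np.array([7, 8, 5, 6, 10, 15, 13, 40])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences: points beyond these limits are drawn as individual outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = data[(data >= lower_fence) & (data <= upper_fence)]
outliers = data[(data < lower_fence) | (data > upper_fence)]

# Whiskers end at the most extreme data points that are still inside the fences
whisker_low, whisker_high = inside.min(), inside.max()

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Whiskers: {whisker_low} to {whisker_high}, outliers: {outliers.tolist()}")
```

Note that the whiskers stop at actual data points (5 and 15 here), not at the fence values themselves.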

Types of Data Suitable for a Box Plot:

Continuous data: Example: Heights, weights, temperatures, or sales figures.

Ordinal data: Data with meaningful order (e.g., ratings on a scale).

Grouped data: Comparing continuous data across categories (e.g., test scores by gender or region).

Box plots are not suitable for discrete or categorical data without an inherent order or quantitative value.

import matplotlib.pyplot as plt

import seaborn as sns

data = [7, 8, 5, 6, 10, 15, 13]

sns.boxplot(data=data)
plt.title("Box Plot")
plt.show()

import matplotlib.pyplot as plt

import seaborn as sns

# Data for multiple box plots

data1 = [7, 8, 5, 6, 10, 15, 13]
data2 = [3, 4, 2, 5, 8, 12, 9]
data3 = [10, 12, 8, 15, 17, 20, 22]

# Combine the data into a single list

data = [data1, data2, data3]

# Create the box plots

sns.boxplot(data=data)
plt.title("Multiple Box Plots")
plt.xlabel("Datasets")
plt.ylabel("Values")
plt.xticks(ticks=[0, 1, 2], labels=["Data1", "Data2", "Data3"]) # Custom labels
plt.show()

Histogram

Summary:

Distribution Representation: Visualizes the frequency distribution of a dataset.

Bins (Intervals): The x-axis is divided into intervals or "bins," representing ranges of data values.

Frequency: The y-axis represents the frequency or count of data points within each bin.

Bar Heights: The height of each bar indicates the number of data points in the corresponding bin.

Continuous Data: The bars touch each other, indicating the continuous nature of the data.

Customizable Bin Size: Bin width affects the level of detail; narrower bins show more detail, while wider bins summarize data.
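
To see how bin width changes the picture, the counts can be inspected without drawing anything. This is a small sketch using np.histogram on the same sample as the notebook's histogram; the choice of 3 versus 5 bins is only illustrative.

```python
import numpy as np

data = [7, 8, 5, 6, 10, 15, 13]  # same sample as the notebook's histogram

# np.histogram returns the per-bin counts and the bin edges without drawing anything
counts_wide, edges_wide = np.histogram(data, bins=3)
counts_narrow, edges_narrow = np.histogram(data, bins=5)

print("3 bins:", counts_wide.tolist(), np.round(edges_wide, 2).tolist())
print("5 bins:", counts_narrow.tolist(), edges_narrow.tolist())
```

With 3 wide bins the counts are [4, 1, 2]; with 5 narrower bins they are [2, 2, 1, 0, 2], which reveals an empty interval between 11 and 13 that the wider bins hide.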

Types of Data Suitable for a Histogram:

Continuous data: Example: Heights, weights, income, or time intervals.

Grouped or binned discrete data: Example: Exam scores (e.g., 0–10, 11–20, etc.).

Univariate data: Focuses on a single variable at a time.

Histograms are not suitable for categorical data (nominal or ordinal) as they are designed for numerical, continuous, or grouped discrete data. For categorical data, a bar chart is more appropriate.
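
As a sketch of the categorical case, the values can be tallied first and the counts drawn as a bar chart. The survey responses below are hypothetical and are not part of the notebook.

```python
from collections import Counter
import matplotlib.pyplot as plt

# Hypothetical survey responses (nominal categories with no numeric scale)
responses = ["Yes", "No", "Yes", "Maybe", "Yes", "No"]

counts = Counter(responses)        # tally each category
labels = list(counts.keys())
heights = list(counts.values())

plt.bar(labels, heights, color='orange')
plt.title("Responses by Category")
plt.xlabel("Response")
plt.ylabel("Count")
plt.show()
```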

import matplotlib.pyplot as plt

data = [7, 8, 5, 6, 10, 15, 13]

plt.hist(data, bins=3, color='blue', edgecolor='black')
plt.title("Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()

Column chart

Summary:

Discrete Categories: Used to represent data across discrete categories or groups.

Bars: Individual bars represent categories. Height (vertical bar chart) or length (horizontal bar chart) represents the value associated with the category.

Axis Labels:

X-axis: Represents the categories.

Y-axis: Represents the frequency, count, or any other value metric.

Space Between Bars: Bars are separated by gaps to emphasize discrete categories.

Customization: Can be clustered (side-by-side for multiple variables) or stacked (segments within a bar).

Comparative Analysis: Useful for comparing values across categories or groups.
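
A clustered (side-by-side) layout can be sketched by shifting each group's bars around a shared category position. The bar width of 0.4 is an arbitrary choice for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

categories = ['A', 'B', 'C']
group1 = [5, 10, 15]
group2 = [3, 7, 12]

positions = np.arange(len(categories))  # one slot per category: 0, 1, 2
width = 0.4                             # width of each bar within a slot

# Shift each group half a bar-width left or right so the bars sit side by side
bars1 = plt.bar(positions - width / 2, group1, width=width, label='Group 1')
bars2 = plt.bar(positions + width / 2, group2, width=width, label='Group 2')

plt.xticks(positions, categories)       # label each slot with its category
plt.title("Clustered Column Chart")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.legend()
plt.show()
```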

Types of Data Suitable for a Bar Chart:

Categorical data: Example: Product types, regions, or brands.

Ordinal data: Example: Rankings or ratings with a meaningful order (e.g., good, better, best).

Aggregated Numerical Data: Example: Sales totals, survey counts, or averages grouped by category.

Nominal Data: Data without a meaningful order (e.g., colors, cities).

Bar charts are not suitable for continuous data unless it has been grouped into discrete intervals. For continuous data, a histogram or line chart is more appropriate.

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']

values = [10, 20, 15, 25]
plt.bar(categories, values, color='orange')
plt.title("Bar Plot")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.show()

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C']

group1 = [5, 10, 15]
group2 = [3, 7, 12]

plt.bar(categories, group1, label='Group 1')

plt.bar(categories, group2, bottom=group1, label='Group 2')
plt.title("Stacked Column Chart")
plt.xlabel("Categories")
plt.ylabel("Values")
plt.legend()
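plt.show()

import matplotlib.pyplot as plt

# Horizontal bar plot: plt.barh draws horizontal bars, whereas plt.bar draws vertical columns
categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 25]
plt.barh(categories, values, color='skyblue')
plt.title("Horizontal Bar Plot")
plt.xlabel("Values")      # with barh, the values run along the x-axis
plt.ylabel("Categories")  # and the categories along the y-axis
plt.show()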

Pie Chart

Summary:

Circular Representation: Displays data as slices of a circle.

Proportions: Each slice represents a proportion or percentage of the whole dataset.

Labels: Categories and their corresponding values or percentages are often labeled directly on the chart or in a legend.

Single Variable Focus: Ideal for showing the composition of a dataset.

Visual Impact: Helps identify dominant categories or segments at a glance.

Limitations: Becomes less effective when there are too many slices or when values are very similar.
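
The proportion behind each slice is simply the value divided by the total. A short sketch using the same sample values as the notebook's pie chart:

```python
categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 25]

total = sum(values)
# Each slice's share of the whole, as a percentage rounded to two decimals
percentages = [round(100 * v / total, 2) for v in values]

for name, pct in zip(categories, percentages):
    print(f"{name}: {pct}%")
```

These are the same figures that a format string such as autopct='%1.2f%%' prints on the slices: 14.29%, 28.57%, 21.43%, and 35.71%.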

Types of Data Suitable for a Pie Chart:

Categorical data: Example: Market share by company, population by region.

Nominal data: Example: Product categories, job roles.

Percentages or Proportions: Example: Survey results (e.g., "Yes," "No," "Maybe" responses).

Limited Categories: Works best with a small number of categories (usually fewer than 6–8).

Pie charts are not suitable for datasets with a large number of categories, overlapping categories, or comparisons across multiple groups. A bar chart or stacked bar chart might be a better alternative for such cases.

import matplotlib.pyplot as plt

categories = ['A', 'B', 'C', 'D']

values = [10, 20, 15, 25]
plt.pie(values, labels=categories, autopct='%1.2f%%', startangle=0)
plt.title("Pie Chart")
plt.show()

Line Plot

Summary:

Continuous Data Representation: Visualizes trends, changes, or patterns over time or across sequential data points.

Data Points: Plotted at specific intervals and connected by straight lines.

X-Axis: Represents independent variables, often time or a sequence (e.g., days, months, years).

Y-Axis: Represents dependent variables, showing values corresponding to each point on the x-axis.

Trend Analysis: Highlights upward, downward, or stable trends.

Multiple Lines: Allows comparison between multiple variables or groups.

Customization: Can include markers, dashed lines, or smoothing for enhanced interpretation.
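
The customization options can be sketched as follows: a dashed line with markers for the raw values, plus a simple smoothed line. The 3-point moving average is an assumption chosen for illustration; it is only one of several possible smoothing methods.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 5, 7, 11])

# 3-point moving average; mode='valid' keeps only windows that fit entirely in the data
window = 3
smoothed = np.convolve(y, np.ones(window) / window, mode='valid')
x_smoothed = x[1:-1]  # centre of each 3-point window

plt.plot(x, y, marker='o', linestyle='--', color='green', label="Raw (dashed)")
plt.plot(x_smoothed, smoothed, color='red', label="3-point moving average")
plt.title("Line Chart with Smoothing")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()
```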

Types of Data Suitable for a Line Chart:

Continuous data: Example: Temperature, stock prices, or revenue over time.

Time-series data: Example: Monthly sales, daily website traffic, or yearly population growth.

Sequential Data: Example: Progression of a measurement (e.g., growth rate, cumulative counts).

Comparative Data: Example: Comparing trends for different groups or categories.

Line charts are not suitable for categorical data without a natural order or for visualizing proportions (use bar or pie charts for these). They work best when showing data relationships over a continuum.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.plot(x, y, marker='o', color='green', label="Line")
plt.title("Line Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
z = [3, 7, 4, 6, 12]
plt.plot(x, y, marker='o', color='green', label="Line1")
plt.plot(x, z, marker='v', color='red', label="Line2")
plt.title("Line Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

Area Chart

Summary:

Continuous Data Representation: Similar to a line chart but with the area below the line filled with color or shading.

Trend Visualization: Highlights trends and changes over time for one or multiple datasets.

Cumulative Impact: Stacked area charts show cumulative totals across categories or groups.

X-Axis: Represents independent variables, often time or sequential data.

Y-Axis: Represents dependent variables, showing values corresponding to each x-axis point.

Visual Emphasis: The filled area emphasizes magnitude and volume.

Multiple Series: Can display overlapping or stacked series to compare groups.
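
For the stacked variant, Matplotlib's plt.stackplot places each series on top of the previous one, so the top edge shows the cumulative total. A minimal sketch with two sample series; the labels are placeholders.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
z = [3, 7, 4, 6, 12]

# Cumulative total at each x: this is the top edge of the stacked chart
totals = [a + b for a, b in zip(y, z)]

# stackplot draws z on top of y instead of overlapping the two areas
layers = plt.stackplot(x, y, z, labels=["Series 1", "Series 2"], alpha=0.7)
plt.title("Stacked Area Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend(loc='upper left')
plt.show()
```

Calling plt.fill_between once per series, by contrast, starts every area at zero, so the areas overlap rather than stack.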

Types of Data Suitable for an Area Chart:

Continuous data: Example: Website traffic over time, monthly revenue.

Time-series data: Example: Quarterly sales, temperature changes over a year.

Cumulative Data: Example: Total expenses vs. income, energy usage over a period.

Comparative Data: Example: Stacked area charts to show contributions of different sources to a total.

Area charts are not suitable for datasets with too many overlapping series or for precise value comparison between groups (use line or bar charts instead). They work best when the emphasis is on trends and magnitude over time.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
plt.fill_between(x, y, color='lightblue', alpha=0.7)
plt.plot(x, y, color='blue', label="Area")
plt.title("Area Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
z = [3, 7, 4, 6, 12]

plt.fill_between(x, y, color='lightblue', alpha=0.7)

plt.plot(x, y, color='blue', label="Area1")

plt.fill_between(x, z, color='red', alpha=0.2)

plt.plot(x, z, color='red', label="Area2")

plt.title("Area Chart")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

Scatter Plot

Summary:

Bivariate Data Representation: Plots two variables against each other to observe their relationship.

Points: Each point represents a single data observation, with coordinates corresponding to values of the two variables.

X-Axis and Y-Axis: The x-axis represents the independent variable, and the y-axis represents the dependent variable.

Correlation: Highlights relationships such as positive, negative, or no correlation between variables.

Clustering: Identifies patterns or groupings within the data points.

Outliers: Visually reveals data points that deviate significantly from the overall pattern.

Customization: Points can vary in size (bubble chart), color, or shape to represent additional dimensions of data.

Types of Data Suitable for a Scatter Plot:

Numerical data: Example: Exam scores vs. study hours, height vs. weight.

Bivariate data: Example: Age vs. income, temperature vs. electricity consumption.

Data for Correlation Analysis: Example: Relationship between advertising spend and sales.

Multi-Dimensional Data: With color or size encoding, it can show three or more variables (e.g., GDP, population, and region).

Scatter plots are not suitable for categorical data without numeric relationships or for datasets requiring summaries of proportions or distributions. They excel in exploring relationships, trends, and variability in bivariate data.
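
The correlation that a scatter plot suggests can be quantified, and a third variable can be encoded as marker size and color. The sketch below reuses a small x/y sample; the weight values are hypothetical and exist only to demonstrate the encoding.

```python
import numpy as np
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]

# Pearson correlation coefficient between x and y
# (the off-diagonal entry of the 2x2 correlation matrix)
r = np.corrcoef(x, y)[0, 1]

# Hypothetical third variable, encoded as marker size and colour
weight = [20, 80, 40, 120, 200]

points = plt.scatter(x, y, s=weight, c=weight, cmap='viridis', alpha=0.8)
plt.colorbar(points, label="Third variable (hypothetical)")
plt.title(f"Scatter Plot (r = {r:.2f})")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
```

For this sample r is 0.5, a moderate positive correlation.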

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 3, 5]
plt.scatter(x, y, color='red', label='Points')
plt.title("Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

Heatmap

Summary:

Matrix Representation: Displays data in a matrix format where each cell's color represents a value or intensity.

Color Gradient: The color scale (often from low to high) represents the magnitude of values, providing a visual way to understand data distribution.

X-Axis and Y-Axis: Used to represent categorical or ordinal variables (e.g., time periods, regions, or features) that define the rows and columns of the matrix.

Correlation Visualization: Often used to show relationships, such as correlation matrices or similarity between variables.

Dense Information: Allows easy identification of patterns, clusters, or outliers in large datasets.

Customizable Colors: Color schemes can be adjusted to suit the context or to emphasize specific trends.
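
A correlation matrix is one common input for a heatmap. The sketch below builds one from hypothetical random data (the feature names f1 to f3 and the dependence between f1 and f3 are invented for illustration) and draws it with Matplotlib's plt.imshow. Passing the same matrix to sns.heatmap with annot=True should give an annotated equivalent.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)        # fixed seed so the figure is reproducible
samples = rng.normal(size=(100, 3))   # hypothetical data: 100 rows, 3 features
# Make the third feature depend on the first so that one strong correlation appears
samples[:, 2] = samples[:, 0] + 0.5 * samples[:, 2]

names = ["f1", "f2", "f3"]            # hypothetical feature names
corr = np.corrcoef(samples, rowvar=False)  # 3x3 correlation matrix (columns are variables)

# Fix the colour scale to the full correlation range so that 0 maps to the neutral colour
image = plt.imshow(corr, cmap='coolwarm', vmin=-1, vmax=1)
plt.colorbar(image, label="Correlation")
plt.xticks(range(len(names)), names)
plt.yticks(range(len(names)), names)
plt.title("Correlation Heatmap")
plt.show()
```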

Types of Data Suitable for a Heatmap:

Numerical data: Example: Correlation between variables, performance metrics across time and categories.

Tabular Data: Example: Sales figures across months and regions, temperature variations across time and locations.

Matrix Data: Example: Gene expression data, user activity on a website (user-item interaction matrix).

Comparative Data: Example: Survey responses (e.g., rating scales) across multiple groups.

Categorical Data (with aggregated numerical values): Example: Counts or frequencies of categories over different time periods.

Heatmaps are not ideal for visualizing highly detailed individual values, as they focus on patterns and trends in large datasets. For precise comparisons of individual data points, a scatter plot, bar chart, or table may be more appropriate.

import seaborn as sns

import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(5, 5)
sns.heatmap(data, annot=True, cmap='coolwarm')
plt.title("Heatmap")
plt.show()
