Introduction To Data Visualisation
Introduction
Data visualization is the graphical
representation of information and data.
By using visual elements like charts,
graphs, and maps, data visualization tools
provide an accessible way to see and
understand trends, outliers, and patterns
in data.
Visualizing data is a valuable skill for Data
Scientists and Data Analysts.
It is easy to pick up but hard to master.
As you develop visualizations, you need to
keep in mind how the user will interact
with your work.
Your visualizations should be useful in
helping the user analyze the data.
Visualizations and tables allow the data to
be more accessible to others and more
understandable if done correctly.
Definition
“Data visualization is the language of
decision making. Good charts effectively
convey information. Great charts enable,
inform, and improve decision making.” —
Dante Vitagliano
Is Data Visualization Art or Science?
Data visualization is part art and part science.
The challenge is to get the art right without getting the
science wrong and vice versa.
A data visualization first and foremost has to accurately
convey the data. It must not mislead or distort. If one
number is twice as large as another, but in the visualization
they look to be about the same, then the visualization is
wrong.
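As a rough illustration of how a chart can distort a ratio, the sketch below assumes a bar chart whose value axis may start somewhere other than zero. The function name `drawn_ratio` and the numbers are hypothetical and chosen only for illustration. A distortion can exaggerate a difference, as here, or hide one.

```python
def drawn_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of the drawn bar heights for values a and b
    when the value axis starts at `baseline` instead of zero."""
    return (a - baseline) / (b - baseline)


# Hypothetical values: the first number is exactly twice the second.
honest = drawn_ratio(100, 50)                  # axis starts at 0
truncated = drawn_ratio(100, 50, baseline=40)  # axis starts at 40

print(honest)     # bars keep the true 2-to-1 ratio
print(truncated)  # bars are drawn 60 units vs 10 units tall
```

With the truncated axis the first bar is drawn six times as tall as the second, although the data only differ by a factor of two.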
At the same time, a data visualization should be aesthetically
pleasing.
Good visual presentations tend to enhance the message of
the visualization.
If a figure contains jarring colors, imbalanced visual elements,
or other features that distract, then the viewer will find it
harder to inspect the figure and interpret it correctly.
Brief History Of Data Visualization
The concept of using pictures to understand data has been
around for centuries, from maps and graphs in the 17th
century to the invention of the pie chart in the early 1800s.
Several decades later, one of the most cited examples of
statistical graphics occurred when Charles Minard mapped
Napoleon’s invasion of Russia. The map depicted the size of
the army as well as the path of Napoleon’s retreat from
Moscow — and tied that information to temperature and
time scales for a more in-depth understanding of the event.
The idea of visualizing data is not new. Over a hundred years
ago, scientists and thinkers found themselves drowning in
their own flood of data — and to help understand it, they
invented the very idea of infographics and pictorial graphs.
They were manually plotted and printed on huge canvases or
billboards.
History
With the advent of technology and computation, we can
process large amounts of data at lightning-fast speeds. Today,
data visualization has become a rapidly evolving blend of
science and art that is certain to change the corporate
landscape over the next few years.
The last three decades have seen the field of data
visualization explode into dozens of focus areas.
The new-age software tools enable businesses, researchers,
and individuals to explore their data in new and increasingly
imaginative ways and continue to refine the science and art
of data visualization to bring it to new heights.
Now we have the ability to chart more data, faster, and we
can engage the audience with animated or interactive
visualizations with the latest technologies of 3D Graphics,
Virtual and Augmented Reality — there is more magic in
graphs than ever before.
Why do we need Data
Visualization?
Discovering the patterns
Raw data, tables, and text captured can give you the
numbers and information required, but when you
represent the data in a graphical format, it becomes
easier to find patterns, correlations, and hidden
trends in the data.
Complex data
With the influx of huge quantities of big data from
businesses, arriving at enormous volume and velocity
and often in raw formats, it is essential to distill that
information into comprehensible modules in order to
make sense of it. Representing numbers in a visual
format helps you simplify complex structured
tables into simple, comprehensible information.
Importance
User interaction & experience: UX design is the prime
deciding factor for the success of any product. When it
comes to new age media design and web interactions,
interactive data visualization techniques play a vital role in
creating an enriching and seamless user experience,
allowing the user to interact with the data and understand
the data points and the story behind them.
Content: With web traffic reaching higher peaks than
ever, users are being overloaded with information on the
web. Telling interesting stories through visuals such as
infographics can help deliver insights to the end user in a
powerful and comprehensible manner. In this age of
content, infographics and similar visualization techniques play a
key role in every medium.
Data Storytelling: Being a highly creative skill,
data visualization can tell a story that gives
significance to the data. Effective visualization can
appeal to the emotions of readers, put faces to
numbers, and introduce a narrative to the data.
Statistics and numbers can overwhelm many
audiences and therefore lose their significance.
When organized in an infographic, it becomes
much easier to quickly draw meaning from data.
The purpose of an infographic is to simplify a
complex idea, which makes infographics great
educational tools, especially when presenting an
overview of a topic instead of an in-depth
analysis.
Applications of Data
Visualizations
Business Analytics & Enterprise Reporting: Visualizations are
predominantly used in enterprise analytics and business
intelligence interfaces, where stakeholders get an overview of the
data related to their business. Dashboards and reports are
common in almost every company across all fields.
Digital & Virtual Presentations: Board meetings, investor
pitches, and companywide decks all contain data visualizations.
Recent technological advancements have made virtual presentations
with 3D visuals in VR/AR a possibility. VR simulation is another
exciting area with much still to be explored.
AI & Machine learning: Data visualizations are a necessity in the
Data Science field. Correlation plots, accuracy graphs, loss function
graphs, etc. are extremely important tools for a data scientist to
make sense of the data, build and tune models, and present the end
results to the stakeholders.
Data Visualization Tools
We have numerous tools and
technologies used for data visualizations
ranging from open-source software to
enterprise BI solutions.
BI Tools: Tableau, PowerBI, Looker, Qlik
Sense, MS Excel, etc.
Visual libraries: Matplotlib, D3.js,
Seaborn, etc.
Job roles in Data Visualization
Different types of data science problems call for different
skill sets, so there are several roles in this field:
◦ Data Visualization Engineer
◦ Business Intelligence Analyst
◦ UI Designer
◦ Data Analyst
Data is playing an increasingly dominating role in the market today,
and the ability to handle the data tools is the most sought-after skill
the industry is looking for.
With the massive increase in volume and variety of data, data
visualization has become more evolved and resourceful over the
years making it a must-have skill for non-technical consumers also.
Irrespective of the role within a team/company, data visualization is
a skill that’s necessary for all professionals.
Overview of Data Visualization
Shapes of Data
Marks and channels
Common Visualization Idioms
Color and Size in Visualization
Data Reduction
Shapes of Data: Introduction
A distribution of data item values may be
symmetrical or asymmetrical. Two common
examples of symmetry and asymmetry are the
'normal distribution' and the 'skewed distribution'.
In a symmetrical distribution the two sides of the
distribution are a mirror image of each other.
A normal distribution is a true symmetric
distribution of observed values.
Shapes of the Data
The shape of the data determines the type
of tools that can be used to draw
conclusions from it. Here is how to
graphically plot out the data to find its shape:
Step 1: Plot Data into Categories
To begin with, the data must be divided into
equal categories. The categories must have
equal intervals to make the data meaningful.
Then a frequency table must be prepared
from the available data set and the number
of times an item occurs within an interval
category must be noted down.
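Step 1 can be sketched in a few lines of Python. This is a minimal example under stated assumptions: the helper `frequency_table`, the sample heights, and the interval width of 10 are all hypothetical and not taken from the slides.

```python
from collections import Counter


def frequency_table(values, start, width, n_bins):
    """Count how many values fall into each of n_bins equal-width
    intervals [lo, hi), beginning at `start`."""
    counts = Counter()
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # ignore values outside the covered range
            counts[index] += 1
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(n_bins)
    ]


# Hypothetical sample: heights in centimetres.
heights_cm = [151, 158, 162, 163, 165, 167, 168, 171, 172, 174, 179, 186]
table = frequency_table(heights_cm, start=150, width=10, n_bins=4)

for lo, hi, count in table:
    print(f"{lo}-{hi}: {count}")
```

Each row gives one interval and the number of items that fall inside it, which is the frequency table the later steps build on.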
Step 2: Draw a Histogram
The next step is to plot the data intervals on a graph
paper and create a histogram. A histogram is nothing
but a bar chart of a continuous set of data with equal
intervals.
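To show the idea of Step 2 without graph paper, the sketch below draws a histogram as text, with one row of `#` characters per interval. The bars run sideways rather than upwards, and the function `text_histogram` and the frequency counts are hypothetical illustration values.

```python
def text_histogram(table):
    """Render (label, count) pairs as horizontal bars of '#' characters."""
    width = max(len(label) for label, _ in table)
    return [f"{label.ljust(width)} | {'#' * count}" for label, count in table]


# Hypothetical frequency table with equal intervals.
freq = [("150-160", 2), ("160-170", 5), ("170-180", 4), ("180-190", 1)]
lines = text_histogram(freq)
print("\n".join(lines))
```

The length of each bar is proportional to the frequency of its interval, which is the defining property of a histogram with equal intervals.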
Step 3: Join the Midpoints to Find the Shape
The next step is to plot the midpoints of the bars of
the histogram. These midpoints must then be joined
to develop the curve of the data that is also called
the shape of the data.
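Step 3 can be sketched as follows: each bar contributes one point, placed at the midpoint of its interval and at the height of its frequency. The helper name `frequency_polygon` and the table values are assumptions made for this example.

```python
def frequency_polygon(table):
    """Return one (midpoint, count) point per (lo, hi, count) row.
    Joining these points in order traces the shape of the data."""
    return [((lo + hi) / 2, count) for lo, hi, count in table]


# Hypothetical frequency table: (interval start, interval end, frequency).
table = [(150, 160, 2), (160, 170, 5), (170, 180, 4), (180, 190, 1)]
points = frequency_polygon(table)

for midpoint, count in points:
    print(midpoint, count)
```

Reading the points from left to right, the counts rise to a single high point and then fall, so this hypothetical data set has a mounded shape.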
Amongst the many characteristics of the shape of the
data that are important, perhaps the prime category
is symmetry. The reasons for the same have been
listed below.
Characteristics of Shape
The shape of the data is of prime importance because
statistical techniques have been developed that estimate
probabilities based on that shape. The details are as follows:
Symmetrical Data: Symmetrical data is the easiest type of data
to work with, because many statistical techniques have been
developed for it. The most common symmetrical distribution is
the normal curve, also called the bell curve. For normally distributed
data, standard measurements give the probability of a data point
occurring based on the number of standard deviations it lies
from the mean. From a Six Sigma point of view, this helps in
understanding how the results of a process are
likely to be distributed.
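The link between standard deviations and probability can be checked with Python's `statistics.NormalDist`. This is a small sketch that assumes the data are exactly normal; the helper `prob_within` is a hypothetical name.

```python
from statistics import NormalDist


def prob_within(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of the mean."""
    standard = NormalDist(mu=0.0, sigma=1.0)
    return standard.cdf(k) - standard.cdf(-k)


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

For a normal curve, about 68% of values lie within one standard deviation of the mean, about 95% within two, and about 99.7% within three.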
Many things that are measured continuously, in nature as well as
in operations, approximately follow the normal distribution. It is
for this reason that the applications of symmetrical data are enormous.
Skewed Data: Often the data is not
symmetrical, i.e. it is skewed towards one
side. Data can be either positively or
negatively skewed. There are statistical
techniques available which help us find out
the probability distributions of skewed data
too. However, such techniques are less
widely taught than those for the normal
distribution, because many standard
methods assume data that is approximately
normal.
Descriptions of shape
The shape of a distribution will fall somewhere in a
continuum where a flat distribution might be considered
central and where types of departure from this include:
mounded (or unimodal), U-shaped, J-shaped, reverse-J shaped
and multi-modal.
A bimodal distribution would have two high points rather
than one. The shape of a distribution is sometimes
characterized by the behaviors of the tails (as in a long or
short tail). For example, a flat distribution can be said either
to have no tails, or to have short tails.
A normal distribution is usually regarded as having short
tails, while an exponential distribution has exponential tails
and a Pareto distribution has long tails.
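The difference between short and long tails can be made concrete by asking how likely a value beyond some far-out point is. The sketch below uses the standard tail formulas for the three distributions. The parameter choices (standard normal, rate 1, scale 1 and shape 2) are assumptions picked only for illustration.

```python
import math
from statistics import NormalDist


def normal_tail(x: float) -> float:
    """P(X > x) for a standard normal distribution."""
    return 1.0 - NormalDist().cdf(x)


def exponential_tail(x: float, rate: float = 1.0) -> float:
    """P(X > x) for an exponential distribution with the given rate."""
    return math.exp(-rate * x)


def pareto_tail(x: float, scale: float = 1.0, shape: float = 2.0) -> float:
    """P(X > x) for a Pareto distribution; equals 1 at or below `scale`."""
    return (scale / x) ** shape if x > scale else 1.0


x = 5.0
print(normal_tail(x), exponential_tail(x), pareto_tail(x))
```

At the same point, the normal tail is the smallest, the exponential tail is larger, and the Pareto tail is the largest, which matches the short, exponential and long tail descriptions.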
Center
When we talk about center, shape, or spread, we
are talking about the distribution of the data, or
how the data is spread across the graph.
The center of a distribution gives you exactly
what it sounds like. It tells you the center or
median of the data. When you look at a graph, it
will be the value where approximately half of your
data is on one side and the rest of your data is on
the other side. The median point of your data set
is the middle number if you were to put your
data in ascending order. Let's say we are taking
surveys of different groups of people and their
donut eating habits. For the first group of people,
we have this graph:
The center of this graph is 5.
We see that our center is 5 because half of the people are to the left and the other half
are to the right.
Another way to describe the center is to take the mean or average of all your data.
When you describe your center in terms of mean and median, you might find that they
are slightly different. Your mean might be more or less than your median. We will discuss
what skewed means in just a little bit, but as far as the center is concerned, if your graph
is skewed, then you will want to use the median as your center.
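The two ways of describing the center can be compared with Python's `statistics` module. Both donut data sets below are hypothetical. The first is built so that its center is 5, and the second has one unusually large value that pulls the mean away from the median.

```python
from statistics import mean, median

# Hypothetical survey: donuts eaten per person, roughly symmetric.
donuts_symmetric = [3, 4, 4, 5, 5, 5, 6, 6, 7]

# Hypothetical survey with one person who eats far more than the rest.
donuts_skewed = [1, 1, 1, 2, 2, 3, 4, 12]

print(mean(donuts_symmetric), median(donuts_symmetric))
print(mean(donuts_skewed), median(donuts_skewed))
```

In the symmetric group the mean and the median agree. In the skewed group the mean is larger than the median, which is why the median is the safer description of the center for skewed data.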
Shape
Depending on the group of people we survey about
their donut eating habits, we will get different sets of
data. When graphed, we can get different looking
graphs. We use shape to describe the different types
of graphs we will see. There are four different ways in
which we can describe a graph's shape.
1. We can say a graph is symmetric if the left and
right sides of the graph are mirror images of each
other. The graph below, for example, is symmetric
because the left side is a mirror image of the right
side. At either end of the distribution, only 1 person
chooses to eat 3 donuts and only 1 person chooses to eat 7 donuts.
Going closer to the center, 2 people choose to eat 4 donuts
and 2 people choose to eat 6 donuts. The two sides are
mirror images of each other.
2. Sometimes, our graph will look like a rollercoaster and will have
a number of peaks, or areas where the graph is higher than the
surrounding areas. If there is only one peak, then we call it unimodal. If
this one peak occurs at the center of the graph, it is also called bell-
shaped. Doesn't the graph below look like a bell? If it has two peaks, then
we will call it bimodal.
Example of a bell-shaped graph
3. If our graph has more data on one side rather than the other, we call
it skewed. If there are more to the right, we call it skewed left. For our
donuts eaten survey, this would mean that more people choose to eat
more donuts and fewer people choose to eat just a few. If our graph has
more data to the left, then we would say that our graph is skewed right.
For our donuts survey, it would mean that more people prefer to eat
fewer donuts. A good way to remember this is to view the graph as a
slide. If you slide down to the right, then it is skewed right, and if you slide
down to the left, then it is skewed left.
4. If our survey of people's donut eating habits showed that the same
number of people chose each amount of donuts, then our graph would
look flat all across the top, and we call it uniform. A uniform shape
has no peaks and is not skewed.
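The four descriptions above can be turned into a small checker. This is a simplified sketch, not a standard statistical test: `describe_shape` is a hypothetical helper, a peak is counted only when a bar is strictly higher than both neighbours, and skew is judged by comparing how much data sits in the left half and the right half of the graph.

```python
def describe_shape(counts):
    """Describe a bar graph given its bar heights from left to right."""
    counts = list(counts)
    n = len(counts)
    padded = [0] + counts + [0]  # treat the space beyond each end as height 0
    peaks = sum(
        1 for i in range(1, n + 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )
    left = sum(counts[: n // 2])         # data in the left half
    right = sum(counts[(n + 1) // 2:])   # data in the right half
    if right > left:
        skew = "left"    # more data on the right means skewed left
    elif left > right:
        skew = "right"   # more data on the left means skewed right
    else:
        skew = "none"
    return {
        "symmetric": counts == counts[::-1],
        "uniform": len(set(counts)) == 1,
        "peaks": peaks,
        "skew": skew,
    }


# Hypothetical bar heights for 3, 4, 5, 6 and 7 donuts eaten.
print(describe_shape([1, 2, 3, 2, 1]))
print(describe_shape([5, 4, 2, 1, 1]))
print(describe_shape([2, 2, 2, 2]))
```

The first graph is symmetric with one peak, the second has most of its data on the left and is skewed right, and the third is uniform with no peaks.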
Warm Up
OBJECTIVE: Learn to describe the shape of a distribution of data
using appropriate vocabulary.
James kept track of how much he spent on lunch each day for a week, and
got the following results: $5, $7, $4, $5, $10, $6, $5
Alexis asked 10 people what their favorite color is, and got the following
answers:
Blue, green, orange, yellow, blue, blue, red, black, red, green
4. Why doesn’t it make sense to find the mean, median, or range of this
data?
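The two warm-up data sets can be explored with a short script. The earlier warm-up questions are not shown here, so this sketch assumes they concern the mean, median and range of James's spending. For Alexis's color data, which has no numeric order, it counts how often each answer occurs instead.

```python
from collections import Counter
from statistics import mean, median

# James's lunch spending in dollars, as listed above.
lunch = [5, 7, 4, 5, 10, 6, 5]
lunch_mean = mean(lunch)
lunch_median = median(lunch)
lunch_range = max(lunch) - min(lunch)
print(lunch_mean, lunch_median, lunch_range)

# Alexis's survey answers: categories, not numbers.
colors = ["blue", "green", "orange", "yellow", "blue",
          "blue", "red", "black", "red", "green"]
color_counts = Counter(colors)
most_common_color, votes = color_counts.most_common(1)[0]
print(most_common_color, votes)
```

The colors cannot be added or put in order, so the only summary that applies to them is a count of each answer.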
Launch – Next Step
Let's take a look at the middle names of two classes, and compare their center and
spread.
[Figure: two bar graphs, one per class. x-axis: Number of Letters (0 to 8); y-axis: Frequency.]
Launch B
“These bar graphs have the same center and spread,
but they are completely different!”
[Figure: bar graphs for Mrs. Smith's Class and Mr. Cheever's Class. x-axis: Number of Letters (0 to 8); y-axis: Frequency (0 to 5).]
Summary – Vocabulary
1. Symmetry: When it is graphed, a symmetric distribution can be
divided at the center so that each half is a reflection of the other.
2. Peak: A point on the graph that is higher than the points directly to
the left and right.
3. Skewed left/right: A graph is skewed left if it is highest to the right,
then becomes lower as it goes left. A graph is skewed right if it is
highest at the left and lowers as it goes right.
4. Uniform: A graph that is evenly spread out, with no peaks.
Distributions may also have unusual features. The two most common ones
are:
6. Gap: An area of the distribution where there are no entries in the data set.
7. Outlier: An element of the data set that is much higher or much lower than
all the other elements.
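Gaps and outliers can also be found by a short script. This sketch assumes whole-number data. The vocabulary above does not say how much higher or lower a value must be to count as an outlier, so the code uses the common 1.5 × IQR rule as an assumption. The function names and the data set are hypothetical.

```python
from statistics import quantiles


def find_gaps(values):
    """Whole numbers between the smallest and largest value
    that never occur in the data set."""
    present = set(values)
    return [v for v in range(min(values), max(values) + 1) if v not in present]


def find_outliers(values, k=1.5):
    """Values more than k interquartile ranges below the first quartile
    or above the third quartile (an assumed rule, see the note above)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical data: number of letters in ten middle names.
letters = [3, 4, 4, 5, 5, 5, 6, 6, 7, 15]
print(find_gaps(letters))
print(find_outliers(letters))
```

Here the value 15 sits far from the rest of the data and is reported as an outlier, and the empty stretch from 8 to 14 is reported as a gap.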
Summary – Looking Back
Let’s go back to the two graphs we looked at earlier. How can we describe
them using the vocabulary that we just learned?
[Figure: bar graphs for Mrs. Smith's Class and Mr. Cheever's Class. x-axis: Number of Letters (0 to 8); y-axis: Frequency (0 to 5).]
Hint: Go right down the vocabulary list and identify each one.
Remember: Symmetry, Peaks, Skew, Uniform, Unusual features.
Practice – Interactive Classwork
We will now complete the back side of your class work. Describe the
shape of each graph with as much detail as possible.
Practice
Describe the SHAPE of this graph:
Gaps: No Gaps
Uniformity: Uniform
Gaps: Gap at 7
Outliers: Outlier at 8
Practice
Describe the SHAPE of this graph:
Gaps: One Gap at 4
Outliers: No Outliers
Practice
Describe the SHAPE of this graph:
Gaps: No Gaps
Outliers: No Outliers
Take it further! Can you calculate a center for this graph?