
Data Visualization

The document discusses data visualization and data analysis. It defines data visualization as using visual elements like charts and graphs to portray data in order to discover useful patterns. It then describes the different phases of the data analysis process, including data requirements specification, collection, processing, cleaning, analysis, and communication. Finally, it discusses different types of data visualization, categorizing them as temporal, hierarchical, network, multidimensional, or geospatial. It also lists some common graph types used in data visualization.

Uploaded by

Norsam L. Ampuan

Data Visualization

Data Analysis is a process of collecting, transforming, cleaning, and modeling data with the
goal of discovering the required information. The results are then communicated to suggest
conclusions and support decision-making. Data visualization is at times used to portray the
data so that useful patterns are easier to discover. The terms Data Modeling and Data
Analysis are often used interchangeably.
The Data Analysis Process consists of the following phases, which are iterative in nature.

Data Requirements Specification

The data required for analysis is based on a question or an experiment. Based on the
requirements of those directing the analysis, the data necessary as inputs to the analysis is
identified (e.g., Population of people). Specific variables regarding a population (e.g., Age and
Income) may be specified and obtained. Data may be numerical or categorical.

Data Collection

Data Collection is the process of gathering information on targeted variables identified as data
requirements. The emphasis is on ensuring accurate and honest collection of data. Data
Collection ensures that data gathered is accurate such that the related decisions are valid.
Data Collection provides both a baseline to measure and a target to improve.
Data is collected from various sources ranging from organizational databases to the
information in web pages. The data thus obtained, may not be structured and may contain
irrelevant information. Hence, the collected data is required to be subjected to Data
Processing and Data Cleaning.

Data Processing

The data that is collected must be processed or organized for analysis. This includes
structuring the data as required for the relevant Analysis Tools. For example, the data might
have to be placed into rows and columns in a table within a Spreadsheet or Statistical
Application. A Data Model might have to be created.

Data Cleaning

The processed and organized data may be incomplete, contain duplicates, or contain errors.
Data Cleaning is the process of preventing and correcting these errors. There are several types
of Data Cleaning, depending on the type of data. For example, when cleaning financial
data, certain totals might be compared against reliable published numbers or defined
thresholds. Likewise, quantitative methods can be used to detect outliers, which are
subsequently excluded from the analysis.
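
As a rough illustration of the cleaning steps described above, the following Python sketch drops incomplete and duplicate records and then flags outliers with the interquartile-range (IQR) rule. The sample records, the function names clean_records and iqr_outliers, and the 1.5 multiplier are assumptions made for this example. The IQR rule is only one of several outlier methods, and nothing here prescribes it.

```python
from statistics import quantiles

def clean_records(records):
    """Drop rows with missing values and exact duplicates (first copy is kept)."""
    seen = set()
    cleaned = []
    for row in records:
        if any(value is None for value in row.values()):
            continue  # incomplete record
        key = tuple(sorted(row.items()))
        if key in seen:
            continue  # duplicate record
        seen.add(key)
        cleaned.append(row)
    return cleaned

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample data, invented for this sketch.
records = [
    {"age": 34, "income": 52000},
    {"age": 34, "income": 52000},   # duplicate
    {"age": 41, "income": None},    # incomplete
    {"age": 29, "income": 48000},
    {"age": 38, "income": 51000},
    {"age": 45, "income": 55000},
    {"age": 52, "income": 250000},  # suspiciously high
]

cleaned = clean_records(records)
incomes = [row["income"] for row in cleaned]
outliers = iqr_outliers(incomes)
```

Whether a flagged value is really an error or a genuine extreme is a judgment call, so in practice the outliers would be reviewed before being excluded.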

Data Analysis

Data that is processed, organized and cleaned would be ready for the analysis. Various data
analysis techniques are available to understand, interpret, and derive conclusions based on
the requirements. Data Visualization may also be used to examine the data in graphical
format, to obtain additional insight regarding the messages within the data.
Statistical Data Models such as Correlation and Regression Analysis can be used to identify the
relations among the data variables. These models, which are descriptive of the data, help to
simplify analysis and communicate results.
The process might require additional Data Cleaning or additional Data Collection, and hence
these activities are iterative in nature.
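
To make the idea of a regression model concrete, here is a minimal Python sketch of a least-squares straight-line fit between two variables. The function name linear_fit and the age and income figures are hypothetical and chosen only for illustration. A real analysis would normally rely on a statistical package and would also check how well the line fits.

```python
def linear_fit(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: age in years, income in thousands.
ages = [25, 30, 35, 40, 45]
incomes = [30, 36, 41, 47, 52]

slope, intercept = linear_fit(ages, incomes)
predicted_at_50 = slope * 50 + intercept
```

The slope summarizes how much the second variable changes, on average, for each unit of the first, which is the kind of relation the models above are meant to describe.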

Communication

The results of the data analysis are to be reported in a format as required by the users to
support their decisions and further action. The feedback from the users might result in
additional analysis.
Data analysts can choose data visualization techniques, such as tables and charts, which
help in communicating the message clearly and efficiently to the users. The analysis tools
provide facilities to highlight the required information with color codes and formatting in
tables and charts.
What is Data Visualization?

Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way to
see and understand trends, outliers, and patterns in data.

In the world of Big Data, data visualization tools and technologies are essential to
analyze massive amounts of information and make data-driven decisions.
Our eyes are drawn to colors and patterns. We can quickly identify red from blue, square from
circle. Our culture is visual, including everything from art and advertisements to TV and movies.

Data visualization is another form of visual art that grabs our interest and keeps our
eyes on the message.

Big Data is here and we need to know what it says

As the “age of Big Data” kicks into high-gear, visualization is an increasingly key tool to
make sense of the trillions of rows of data generated every day. Data visualization helps to tell
stories by curating data into a form easier to understand, highlighting the trends and outliers. A
good visualization tells a story, removing the noise from data and highlighting the useful
information.

However, it’s not simply as easy as just dressing up a graph to make it look better or
slapping on the “info” part of an infographic. Effective data visualization is a delicate balancing
act between form and function. The plainest graph could be too boring to catch any notice or it
may make a powerful point; the most stunning visualization could utterly fail at conveying the
right message or it could speak volumes. The data and the visuals need to work together, and
there’s an art to combining great analysis with great storytelling.

5 Types of Big Data Visualization Categories

Temporal

Data visualizations belong in the temporal category if they satisfy two conditions: that they are
linear, and that they are one-dimensional. Temporal visualizations normally feature lines that
either stand alone or overlap with each other, with a start and finish time.

The plus? These are familiar charts we can recognize from school and the workplace, which
means we have an easier understanding when we read them.
Examples of temporal data visualization include:
 Scatter plots
 Timelines
 Polar area diagrams
 Line graphs
 Time series sequences

Hierarchical

Data visualizations that belong in the hierarchical category are those that order groups within
larger groups. Hierarchical visualizations are best suited if you’re looking to display clusters of
information, especially if they flow from a single origin point.

The downside to these graphs is that they tend to be more complex and difficult to read, which
is why the tree diagram is used most often. It is the simplest to follow due to its linear path.
Examples of hierarchical data visualizations include:

 Tree diagrams
 Ring charts
 Sunburst diagrams

Network

Datasets connect deeply with other datasets. Network data visualizations show how they relate
to one another within a network. In other words, they demonstrate relationships between
datasets without wordy explanations.
Examples of network data visualizations include:

 Matrix charts
 Node-link diagrams
 Word clouds
 Alluvial diagrams

Multidimensional

Just like the name, multidimensional data visualizations have multiple dimensions. This means
that there are always 2 or more variables in the mix to create a 3D data visualization. Because
of the many concurrent layers and datasets, these types of visualizations tend to be the most
vibrant or eye-catching visuals. Another plus? These visuals can break a ton of data down
to key takeaways.
Examples of multidimensional data visualizations include:

 Scatter plots
 Pie charts
 Venn diagrams
 Stacked bar graphs
 Histograms

Geospatial

Geospatial or spatial data visualizations relate to real life physical locations, overlaying familiar
maps with different data points. These types of data visualizations are commonly used to
display sales or acquisitions over time, and can be most recognizable for their use in political
campaigns or to display market penetration in multinational corporations.
Examples of geospatial data visualizations include:

 Flow map
 Density map
 Cartogram
 Heat map

The 17 Most Common Graph Types

Presentation of data and information is not simply about picking any data visualization
design. Matching data to the right information visualization begins by answering 5 key
questions:

1. What relationship am I trying to understand between my data sets?
2. Do I want to understand the distribution of data and look for outliers?
3. Am I looking to compare multiple values or looking to analyze a single value over
time?
4. Am I interested in analyzing trends in my data sets?
5. Is this visualization an important part of my overarching data story?

With those questions (and your answers) in mind, we’ll dive into the 17 most common graph
types you can mix and match to build the best data visualization and bring your data story to
life. We’ll provide you with the data viz 101 and best practices, so feel free to navigate to
the one you want to explore the most.
1. Bar Chart

At some point or another, you've either seen, interacted with, or built a bar chart. Bar
charts are such a popular graph visualization because of how easily you can scan them for quick
information. Bar charts organize data into rectangular bars that make it a breeze to compare
related data sets.

When do I use a bar chart visualization?

Use a bar chart for the following reasons:

 You want to compare two or more values in the same category
 You want to compare parts of a whole
 You don’t have too many groups (fewer than 10 works best)
 You want to understand how multiple similar data sets relate to each other

Don’t use a bar chart for the following reasons:

 The category you’re visualizing only has one value associated with it
 You want to visualize continuous data

Best practices for a bar chart visualization

If you use a bar chart, here are the key design best practices:

 Use consistent colours and labeling throughout so that you can identify relationships
more easily
 Simplify the length of the y-axis labels and don’t forget to start from 0 so you can keep
your data in order
2. Line Chart

Like bar charts, line charts help to visualize data in a compact and precise format which makes
it easy to rapidly scan information in order to understand trends. Line charts are used to show
resulting data relative to a continuous variable - most commonly time or money. The proper
use of color in this visualization is necessary because different colored lines can make it even
easier for users to analyze information.

When do I use a line chart visualization?

Use a line chart for the following reasons:

 You want to understand trends, patterns, and fluctuations in your data
 You want to compare different yet related data sets with multiple series
 You want to make projections beyond your data

Don’t use a line chart for the following reason:

 You want to demonstrate an in-depth view of your data

Best practices for a line chart visualization

If you use a line chart, here are the key design best practices:
 Along with using a different colour for each category you’re comparing, make sure you
also use solid lines to keep the line chart clear and concise
 To avoid confusion, try not to compare more than 4 categories in one line chart

3. Scatterplot

Scatterplots are the right data visualizations to use when there are many different data points,
and you want to highlight similarities in the data set. This is useful when looking for outliers or
for understanding the distribution of your data.
If the data forms a band extending from lower left to upper right, there is most likely a
positive correlation between the two variables. If the band runs from upper left to lower
right, a negative correlation is probable. If it is hard to see a pattern, there is probably
no correlation.
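
The visual reading described above can be cross-checked numerically. The sketch below computes the Pearson correlation coefficient and labels its direction. The function names, the sample points, and the 0.3 cut-off for "no clear correlation" are assumptions made for this example and not an established standard.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe_correlation(r, threshold=0.3):
    """Label the direction of r; the threshold is an arbitrary assumption."""
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "no clear correlation"

# Hypothetical scatterplot points.
xs = [1, 2, 3, 4, 5]
rising = [2, 4, 5, 4, 6]      # band from lower left to upper right
falling = [6, 4, 5, 4, 2]     # band from upper left to lower right
scattered = [3, 5, 2, 5, 3]   # no visible pattern

labels = [describe_correlation(pearson_r(xs, ys))
          for ys in (rising, falling, scattered)]
```

Note that the coefficient only measures straight-line association, so a curved pattern on the scatterplot can still produce a value near zero.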

When do I use a scatter plot visualization?

Use a scatterplot for the following reasons:

 You want to show the relationship between two variables
 You want a compact data visualization

Don’t use a scatterplot for the following reasons:

 You want to rapidly scan information
 You want clear and precise data points
Best practices for a scatter plot visualization

If you use a scatterplot, here are the key design best practices:

 Although trend lines are a great way to analyze the data on a scatterplot, ensure you
stick to 1 or 2 trend lines to avoid confusion
 Don’t forget to start at 0 for the y-axis

4. Sparkline

Sparklines are arguably the best data visualization for showing trends because of how compact
they are. They get the job done when it comes to painting a picture for your audience fast.
Though, it is important to make sure your audience understands how to read sparklines
correctly to optimize their use.
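
To show how compact a sparkline can be, here is a small Python sketch that renders a series as a row of Unicode block characters. The function name sparkline and the weekly figures are hypothetical. Dashboard tools draw sparklines as small line graphics, so this text version is only an approximation of the idea.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight heights, lowest to highest

def sparkline(values):
    """Map each value to one of eight block heights, scaled to the series range."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        return BLOCKS[0] * len(values)  # flat series
    last = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - low) / span * last)] for v in values)

# Hypothetical metric tracked over seven periods.
weekly_signups = [12, 15, 11, 18, 22, 19, 25]
line = sparkline(weekly_signups)
print(line, weekly_signups[-1])  # trend next to the current value
```

Because the heights are scaled to each series' own minimum and maximum, the output shows the shape of the trend and not precise values, which matches the limits listed below.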

When do I use a sparkline visualization?

Use a sparkline for the following reasons:

 You can pair it with a metric that has a current status value tracked over a specific time
period
 You want to show a specific trend behind a metric

Don’t use a sparkline for the following reasons:

 You want to plot multiple series
 You want to illustrate precise data points (i.e. individual values)

Best practices for a sparkline visualization

If you use a sparkline, here are the key design best practices:

 To assist with readability, consider adding indicators on the side that give a better
glimpse into the data, like in the example above
 Stick to one colour for your sparklines to keep them consistent on your dashboard

5. Pie Chart

Pie charts are an interesting graph visualization. At a high-level, they're easy to read and
understand because the parts-of-a-whole relationship is made very obvious. But top data visual
experts agree that one of their disadvantages is that the percentage of each section isn’t
obvious without adding numerical values to each slice of the pie.
So, what’s the point? As long as you stick to best practices, pie charts can be a quick way to scan
information.

When do I use a pie chart visualization?

Use a pie chart for the following reasons:


 You want to compare relative values
 You want to compare parts of a whole
 You want to rapidly scan metrics

Don’t use a pie chart for the following reason:

 You want to precisely compare data

Best practices for a pie chart visualization

If you use a pie chart, here are the key design best practices:

 Make sure that the pie slices add up to 100%. To make this easier, add the numerical
values and percentages to your pie chart
 Order the pieces of your pie according to size
 Use a pie chart if you have only up to 5 categories to compare. If you have too many
categories, you won’t be able to differentiate between the slices
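
The best practices above can be sketched in Python as a data-preparation step: order the slices by size, keep at most five, and compute percentages that add up to 100. The function name pie_slices, the "Other" label used for grouping the smallest categories, and the traffic figures are assumptions made for this example.

```python
def pie_slices(counts, max_slices=5):
    """Return (label, value, percent) tuples, largest first, with at most max_slices."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    if len(ordered) > max_slices:
        head = ordered[:max_slices - 1]
        other = sum(value for _, value in ordered[max_slices - 1:])
        ordered = head + [("Other", other)]  # grouped remainder goes last
    total = sum(value for _, value in ordered)
    return [(label, value, 100 * value / total) for label, value in ordered]

# Hypothetical visits by traffic source.
visits = {"Search": 40, "Direct": 25, "Social": 15,
          "Email": 10, "Referral": 6, "Ads": 4}

slices = pie_slices(visits)
for label, value, percent in slices:
    print(f"{label}: {value} ({percent:.0f}%)")
```

Grouping the remainder is one possible way to stay within five slices. If the grouped slice ends up large, a bar chart may be the better choice.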

6. Gauge

Gauges typically only compare two values on a scale: they compare a current value and a target
value, which often indicates whether your progress is either good or bad, in the green or in the
red.
When do I use a gauge visualization?

Use a gauge for the following reason:

 You want to track single metrics that have a clear, in the moment objective

Don’t use a gauge for the following reasons:

 You want to track multiple metrics
 You’re looking to visualize precise data points

Best practices for a gauge visualization

If you use a gauge, here are the key design best practices:

 Feel free to play around with the size and shape of the gauge. Whether it’s an arc, a
circle or a line, it’ll get the same job done
 Keep the colours consistent with what means “good” or “bad” for you and your
numbers


7. Waterfall Chart
A waterfall chart is an information visualization that shows how an initial value is affected
by intermediate values and results in a final value. The values can be either negative or
positive.
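
The arithmetic behind a waterfall chart is a running total. The following Python sketch, with a hypothetical function name and made-up figures, computes where each floating bar starts and ends.

```python
def waterfall_steps(initial, changes):
    """Return (label, start, end) for each change, plus the final value."""
    steps = []
    running = initial
    for label, delta in changes:
        start = running
        running += delta
        steps.append((label, start, running))
    return steps, running

# Hypothetical customer counts over one month.
changes = [("New sales", 30), ("Refunds", -8), ("Upgrades", 12), ("Churn", -14)]
steps, final_value = waterfall_steps(100, changes)

for label, start, end in steps:
    direction = "increase" if end > start else "decrease"
    print(f"{label}: {start} -> {end} ({direction})")
```

Each bar begins where the previous one ended, which is what lets the chart reveal how the final number is composed.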

When do I use a waterfall chart visualization?

Use a waterfall chart for the following reason:

 To reveal the composition or makeup of a number

Don’t use a waterfall chart for the following reason:

 You want to focus on more than one number or metric

Best practices for a waterfall chart visualization

If you use a waterfall chart, here are the key design best practices:

 Use contrasting colors to highlight differences in data sets
 Choose warm colors to indicate increases and cool colors to indicate decreases

8. Funnel Chart

A funnel chart is your data visualization of choice if you want to display a series of steps and the
completion rate for each step. This can be used to track the sales process, a marketing funnel or
the conversion rate across a series of pages or steps. Funnel charts are most often used to
represent how something moves through different stages in a process. A funnel chart displays
values as progressively decreasing proportions amounting to 100 percent in total.
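
As an illustration of the completion rates a funnel chart displays, this Python sketch computes, for each stage, its share of the first stage and its conversion from the previous stage. The function name funnel_rates and the stage counts are hypothetical, and treating the first stage as 100 percent is an assumption about how the proportions are scaled.

```python
def funnel_rates(stages):
    """Return (name, count, percent_of_first_stage, percent_of_previous_stage)."""
    top = stages[0][1]
    previous = top
    rows = []
    for name, count in stages:
        rows.append((name, count, 100 * count / top, 100 * count / previous))
        previous = count
    return rows

# Hypothetical e-commerce funnel.
stages = [
    ("Visited site", 1000),
    ("Viewed product", 400),
    ("Added to cart", 100),
    ("Purchased", 25),
]

rows = funnel_rates(stages)
for name, count, of_top, of_prev in rows:
    print(f"{name}: {count} ({of_top:.1f}% of visitors, {of_prev:.1f}% of previous step)")
```

The step-to-step figure shows where the largest drop-off happens, while the share of the first stage gives the width of each funnel section.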
When do I use a funnel chart visualization?

Use a funnel chart for the following reason:

 To display a series of steps and each step’s completion rate

Don’t use a funnel chart for the following reason:

 To visualize individual, unconnected metrics

Best practices for a funnel chart visualization

If you use a funnel chart, here are the key design best practices:

 Scale the size of each section to accurately reflect the size of its data set
 Use contrasting colors or one color in gradating hues, from darkest to lightest as the
size of the funnel decreases

9. Heat Map

A heat map or choropleth map is a data visualization that shows the relationship between two
measures and provides rating information. The rating information is displayed using varying
colors or saturation and can express ratings such as high to low, bad to awesome, or needs
improvement to working well.
It can also be a thematic map in which the area inside recognized boundaries is shaded in
proportion to the data being represented.
When do I use a heat map visualization?

Use a heat map for the following reasons:

 To show a relationship between two measures
 To illustrate an important detail
 To use a rating system

Don’t use a heat map for the following reason:

 To visualize individual, unconnected metrics

Best practices for a heat map visualization

If you use a heat map, here are the key design best practices:

 Use a simple map outline to avoid distracting from the data
 Use a single color in varying shades to show changes in data
 Avoid using multiple patterns

10. Histogram

A histogram is a data visualization that shows the distribution of data over a continuous interval
or certain time period. It looks similar to a vertical bar chart. The continuous variable shown
on the X-axis is broken into discrete intervals, and the number of data points that fall in each
interval determines the height of its bar.
Histograms give an estimate as to where values are concentrated, what the extremes are and
whether there are any gaps or unusual values throughout your data set.
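
The binning step can be sketched in a few lines of Python. The function name histogram, the ages, and the choice of 10-year bins starting at 20 are assumptions made for this example. The sketch also assumes every value is at or above the chosen start.

```python
def histogram(values, bin_width, start):
    """Count values in equal-width bins [left, right); assumes values >= start."""
    n_bins = int((max(values) - start) // bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // bin_width)] += 1
    return [(start + i * bin_width, start + (i + 1) * bin_width, count)
            for i, count in enumerate(counts)]

# Hypothetical ages of survey respondents.
ages = [22, 25, 27, 31, 34, 35, 38, 41, 44, 68]
bins = histogram(ages, bin_width=10, start=20)

for left, right, count in bins:
    print(f"{left}-{right}: {'#' * count}")
```

Empty bins are kept in the result on purpose, so a gap in the data (here, nobody aged 50 to 59) stays visible when the bars are drawn.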

When do I use a histogram visualization?

Use a histogram for the following reasons:

 To make comparisons in data sets over an interval or time
 To show a distribution of data

Don’t use a histogram for the following reason:

 To compare 3+ variables in data sets

Best practices for a histogram visualization

If you use a histogram, here are the key design best practices:

 Avoid bars that are so wide that they hide important details or so narrow that they
cause a lot of noise
 Use equal, round-numbered intervals to create bar sizes
 Use consistent colours and labeling throughout so that you can identify relationships
more easily

11. Box Plot


A box plot, or box and whisker diagram, is a visual representation of a distribution of
data, usually across groups, based on a five number summary: the minimum, first quartile, the
median (second quartile), third quartile, and the maximum.
The simplest of box plots display the full range of variation from minimum to maximum, the
likely range of variation, and a typical value. A box plot will also show the outliers.
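
The five number summary can be computed with Python's standard library, as in the sketch below. The function name five_number_summary and the scores are hypothetical. Quartile conventions differ between tools, so the "inclusive" method used here is one choice among several and other software may report slightly different quartiles.

```python
from statistics import quantiles

def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, and maximum."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2,
            "q3": q3, "max": max(values)}

# Hypothetical test scores for one group.
scores = [55, 61, 64, 68, 70, 72, 75, 79, 83]
summary = five_number_summary(scores)
```

In the plot, the box spans q1 to q3, the line inside it marks the median, and the whiskers extend toward the minimum and maximum.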

When do I use a box plot visualization?

Use a box plot for the following reasons:

 To display or compare a distribution of data
 To identify the minimum, maximum and median of data

Don’t use a box plot for the following reason:

 To visualize individual, unconnected data sets

Best practices for a box plot visualization

If you use a box plot, here are the key design best practices:

 Ensure font sizes for labels and legends are big enough and line widths are thick
enough to understand the findings easily
 If plotting multiple datasets, use different symbols, line styles or colour to differentiate
each
 Always remove unnecessary clutter from the plots
(Source: Python Graph Gallery)

12. Maps

Maps are an amazing visualization to add to your dashboard if organizing data geographically
tells an important story for your business. For example, if your dashboard is looking at
monthly sales, it could be extremely useful to see the geographic locations of your customers.
Above, you’ll find a map visualization that integrates with Salesforce to measure accounts by
country. Keep in mind that if your dashboard is looking at daily sales, this visualization may
provide less value to your day-to-day discussions.
When do I use a map visualization?

Use a map for the following reason:

 Geography is an important part of your data story

Don’t use a map for the following reasons:

 You want to show precise data points
 Geography is not an important element of the dashboard’s overarching story

Best practices for a map visualization

If you use a map visualization, here are the key design best practices:

 Avoid using multiple colours and patterns on your map. Use varying shades of the same
colour instead
 Make sure to include a legend with your map, so that everyone understands what the
data means
13. Tables

If you’re someone who wants a little bit of everything in front of you in order to make thorough
decisions, then tables are the visualization to go with. Tables are great because you can display
both data points and graphics, such as bullet charts, icons, and sparklines. This visualization
type also organizes your data into columns and rows, which is great for reporting.

Above is an example of how to bring in your Google Analytics data into a table, so that you can
see all the information you need in one place.
One thing to keep in mind is that tables can sometimes be overwhelming if you have a
dashboard with many metrics that you want to display. It's important to find a happy medium
between large amounts of data (confusing) and too little data (waste of dashboard space).
When do I use a table visualization?

Use a table for the following reasons:

 You want to display two-dimensional data sets that can be organized categorically
 You can drill-down to break up large data sets with a natural drill-down path

Don’t use a table for the following reason:

 You want to display large amounts of data

Best practices for a table visualization

If you use a table, here are the key design best practices:
 Be mindful of the order of the data. Make sure that labels, categories and numbers
come first then move on to the graphics
 Try not to have more than 10 different rows in your table to avoid clutter

14. Indicators

Indicators are useful for an at-a-glance view of a metric you need to keep track of. An indicator
is simply a number showing the current value of whichever performance metric you’re tracking.
To make it more useful, add a comparison to the previous time period to show whether your
metric is tracking up or down.
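
The comparison to the previous period amounts to a difference and a percentage change. Here is a minimal Python sketch, with a hypothetical function name and made-up figures.

```python
def indicator(current, previous):
    """Current value plus its change against the previous period."""
    change = current - previous
    if change > 0:
        direction = "up"
    elif change < 0:
        direction = "down"
    else:
        direction = "flat"
    # Percentage change is undefined when the previous value is zero.
    percent = 100 * change / previous if previous else None
    return {"value": current, "change": change,
            "percent": percent, "direction": direction}

# Hypothetical monthly active users.
this_month = indicator(current=1150, previous=1000)
```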
Some people like to get fancy with indicators and use gauges or tickers. They present the same
type of information, just in a different visual way.
15. Area Chart
An area chart is very similar to a line graph but may do a better job at highlighting the relative
differences between items. Use an area chart when you want to see how different items stack
up or contribute to the whole.

16. Radar or Spider Chart


A radar chart is useful for understanding the relative differences between items in your data.
Radar charts make it easy to compare multiple items and see if there are differences that may
be worth further investigation.
17. Treemap
A treemap is a visual tool that can be used to break down the relationships between multiple
variables in your data. They can be used strictly as a presentation vehicle to show how your
products roll up into different categories, for example. A treemap can be broken down into 2-3
different layers to show the hierarchical relationship between items.

Source:

https://www.klipfolio.com/resources/articles/what-is-data-visualization
