Data Visualization
Data Analysis is a process of collecting, transforming, cleaning, and modeling data with the
goal of discovering the required information. The results are communicated to suggest
conclusions and support decision-making. Data visualization is sometimes used to portray the
data so that useful patterns are easier to discover. Here, the terms Data Modeling and Data
Analysis are used interchangeably.
The Data Analysis Process consists of the following phases, which are iterative in nature.
Data Requirements Specification
The data required for analysis is based on a question or an experiment. Based on the
requirements of those directing the analysis, the data needed as inputs to the analysis is
identified (e.g., a population of people). Specific variables regarding that population (e.g., age
and income) may be specified and obtained. Data may be numerical or categorical.
Data Collection
Data Collection is the process of gathering information on the targeted variables identified as
data requirements. The emphasis is on accurate and honest collection, so that the data gathered
is reliable and the related decisions are valid. Data Collection provides both a baseline to
measure against and a target to improve.
Data is collected from various sources, ranging from organizational databases to information
on web pages. The data thus obtained may not be structured and may contain irrelevant
information. Hence, the collected data must undergo Data Processing and Data Cleaning.
Data Processing
The data that is collected must be processed or organized for analysis. This includes
structuring the data as required for the relevant Analysis Tools. For example, the data might
have to be placed into rows and columns in a table within a Spreadsheet or Statistical
Application. A Data Model might have to be created.
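As a minimal sketch of this structuring step, the Python snippet below turns semi-structured raw records into rows and columns with a fixed header. The `key=value` record format, the column names and the `to_table` helper are all hypothetical, chosen only for illustration; missing fields are left blank so that the cleaning phase can find them.

```python
# Hypothetical raw records, e.g. gathered from different sources in "key=value" form.
RAW_RECORDS = [
    "name=Ana; age=34; income=52000",
    "age=29; name=Ben",
    "name=Chen; income=61000; age=41",
]

def to_table(raw_records, columns):
    """Parse 'key=value' records into a list of rows ordered by `columns`."""
    rows = []
    for record in raw_records:
        fields = {}
        for part in record.split(";"):
            if "=" in part:
                key, value = part.split("=", 1)
                fields[key.strip()] = value.strip()
        # Missing fields become empty strings so every row has the same width.
        rows.append([fields.get(col, "") for col in columns])
    return rows

header = ["name", "age", "income"]
table = to_table(RAW_RECORDS, header)
for row in [header] + table:
    print("\t".join(row))
```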
Data Cleaning
The processed and organized data may be incomplete, contain duplicates, or contain errors.
Data Cleaning is the process of preventing and correcting these errors. The type of Data
Cleaning depends on the type of data. For example, when cleaning financial data, certain totals
might be compared against reliable published numbers or defined thresholds. Likewise,
quantitative methods can be used to detect outliers, which are then excluded from the
analysis.
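The cleaning steps above can be sketched in plain Python as follows. This is an illustrative example, not a prescribed method: the records are hypothetical, and the 1.5 × IQR rule used for outlier detection is one common rule of thumb chosen here as an assumption, since the text does not name a specific technique.

```python
import statistics

def clean(rows):
    """Return rows with duplicates, incomplete records and income outliers removed.

    Each row is a (name, income) tuple; income is None when missing.
    """
    # 1. Drop exact duplicates while keeping the original order.
    unique = list(dict.fromkeys(rows))
    # 2. Drop incomplete records.
    complete = [r for r in unique if r[1] is not None]
    # 3. Exclude outliers using the 1.5 * IQR rule on the income column.
    incomes = [r[1] for r in complete]
    q1, _, q3 = statistics.quantiles(incomes, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [r for r in complete if low <= r[1] <= high]

data = [
    ("Ana", 52000), ("Ben", 48000), ("Ben", 48000),   # duplicate record
    ("Chen", 61000), ("Dee", None),                    # missing income
    ("Eli", 55000), ("Fay", 50000), ("Hal", 53000),
    ("Gus", 990000),                                   # suspicious outlier
]
print(clean(data))
```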
Data Analysis
Once the data is processed, organized, and cleaned, it is ready for analysis. Various data
analysis techniques are available to understand, interpret, and derive conclusions based on
the requirements. Data Visualization may also be used to examine the data in graphical
format, to obtain additional insight regarding the messages within the data.
Statistical Data Models such as Correlation and Regression Analysis can be used to identify the
relationships among data variables. These models, which describe the data, help to simplify the
analysis and communicate the results.
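To make the regression idea concrete, here is a minimal ordinary least-squares fit of a straight line, y = intercept + slope × x, written in plain Python. The experience and income figures are hypothetical; in practice a statistical package would usually do this for you.

```python
def linear_regression(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical example: years of experience vs. income (in thousands).
years = [1, 2, 3, 4, 5]
income = [40, 44, 47, 52, 57]
slope, intercept = linear_regression(years, income)
print(f"income = {intercept:.1f} + {slope:.1f} * years")
```

The slope summarizes the relationship in a single number (here, roughly how much income rises per year of experience), which is what makes such models useful for communicating results.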
The process might require additional Data Cleaning or additional Data Collection, and hence
these activities are iterative in nature.
Communication
The results of the data analysis are reported in the format the users require to support their
decisions and further action. Feedback from the users might result in additional analysis.
Data analysts can choose data visualization techniques, such as tables and charts, that help
communicate the message clearly and efficiently to the users. Analysis tools provide features
to highlight the required information with color codes and formatting in tables and charts.
What is Data Visualization?
In the world of Big Data, data visualization tools and technologies are essential to
analyze massive amounts of information and make data-driven decisions.
Our eyes are drawn to colors and patterns. We can quickly distinguish red from blue and a square
from a circle. Our culture is visual, including everything from art and advertisements to TV and movies.
Data visualization is another form of visual art that grabs our interest and keeps our
eyes on the message.
As the “age of Big Data” kicks into high gear, visualization is an increasingly important tool to
make sense of the trillions of rows of data generated every day. Data visualization helps to tell
stories by curating data into a form that is easier to understand, highlighting the trends and outliers. A
good visualization tells a story, removing the noise from data and highlighting the useful
information.
However, it’s not as simple as dressing up a graph to make it look better or
slapping the “info” part onto an infographic. Effective data visualization is a delicate balancing
act between form and function. The plainest graph could be too boring to catch anyone’s notice, or
it could make a powerful point; the most stunning visualization could utterly fail to convey the
right message, or it could speak volumes. The data and the visuals need to work together, and
there’s an art to combining great analysis with great storytelling.
Temporal
Data visualizations belong in the temporal category if they satisfy two conditions: that they are
linear, and that they are one-dimensional. Temporal visualizations normally feature lines that
either stand alone or overlap with each other, with a start and finish time.
The plus? These are familiar charts we recognize from school and the workplace, which
makes them easier to read.
Examples of temporal data visualization include:
Scatter plots
Polar area diagrams
Time series sequences
Timelines
Line graphs
Hierarchical
Data visualizations that belong in the hierarchical category are those that order groups within
larger groups. Hierarchical visualizations are best suited if you’re looking to display clusters of
information, especially if they flow from a single origin point.
The downside to these graphs is that they tend to be more complex and difficult to read, which
is why the tree diagram is used most often. It is the simplest to follow due to its linear path.
Examples of hierarchical data visualizations include:
Tree diagrams
Ring charts
Sunburst diagrams
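Since the tree diagram is singled out as the easiest hierarchical view to follow, here is a small sketch that renders one as indented text. The company hierarchy and the `render_tree` helper are hypothetical; a real dashboard would draw the same parent/child structure graphically.

```python
def render_tree(node, children, prefix=""):
    """Return tree-diagram lines for the descendants of `node`."""
    lines = []
    kids = children.get(node, [])
    for i, child in enumerate(kids):
        last = i == len(kids) - 1
        lines.append(prefix + ("└── " if last else "├── ") + child)
        # Continue the vertical guide only if more siblings follow.
        lines.extend(render_tree(child, children, prefix + ("    " if last else "│   ")))
    return lines

# Hypothetical hierarchy flowing from a single origin point ("Company").
children = {
    "Company": ["Sales", "Engineering"],
    "Sales": ["EMEA", "Americas"],
    "Engineering": ["Platform"],
}
print("\n".join(["Company"] + render_tree("Company", children)))
```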
Network
Datasets are often deeply connected to other datasets. Network data visualizations show how
they relate to one another within a network, demonstrating the relationships between
datasets without wordy explanations.
Examples of network data visualizations include:
Matrix charts
Node-link diagrams
Word clouds
Alluvial diagrams
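A matrix chart is essentially an adjacency matrix: one row and one column per item, with a mark wherever two items are connected. The sketch below builds that matrix from a list of connections; the dataset names are hypothetical, and the output is plain text rather than a drawn chart.

```python
def adjacency_matrix(edges):
    """Build a symmetric adjacency matrix (nested lists) from undirected edges."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return nodes, matrix

# Hypothetical datasets that share keys with one another.
edges = [("customers", "orders"), ("orders", "products"), ("customers", "support")]
nodes, matrix = adjacency_matrix(edges)
print(" " * 11 + " ".join(f"{n[:3]:>3}" for n in nodes))
for name, row in zip(nodes, matrix):
    print(f"{name:>10} " + " ".join(f"{v:>3}" for v in row))
```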
Multidimensional
As the name suggests, multidimensional data visualizations have multiple dimensions. This means
that there are always two or more variables in the mix. Because of the many concurrent layers
and datasets, these types of visualizations tend to be the most vibrant or eye-catching visuals.
Another plus? These visuals can distill a large amount of data into key takeaways.
Examples of multidimensional data visualizations include:
Scatter plots
Pie charts
Venn diagrams
Stacked bar graphs
Histograms
Geospatial
Geospatial or spatial data visualizations relate to real-life physical locations, overlaying familiar
maps with different data points. These types of data visualizations are commonly used to
display sales or acquisitions over time, and are perhaps most recognizable for their use in political
campaigns or for displaying market penetration in multinational corporations.
Examples of geospatial data visualizations include:
Flow map
Density map
Cartogram
Heat map
Presentation of data and information is not simply about picking any data visualization
design. Matching data to the right information visualization begins by answering 5 key
questions:
With those questions (and your answers) in mind, we’ll dive into the most common graph
types you can mix and match to create the best data visualization for your data story. We’ll
cover the basics and best practices for each, so feel free to navigate to the one you
want to explore the most.
1. Bar Chart
At some point or another, you've either seen, interacted with, or built a bar chart. Bar
charts are such a popular graph visualization because of how easily you can scan them for quick
information. Bar charts organize data into rectangular bars that make it a breeze to compare
related data sets.
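Bar charts are normally drawn with a charting tool, but the core idea (bar length proportional to value, so categories can be compared at a glance) fits in a few lines of plain Python. This is a text-only sketch with hypothetical quarterly figures and an assumed bar width of 30 characters.

```python
def text_bar_chart(data, width=30):
    """Return one line per category with a bar scaled to the largest value."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Bar length is proportional to the value, so bars are easy to compare.
        bar = "█" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines

# Hypothetical quarterly sales figures.
sales = {"Q1": 120, "Q2": 90, "Q3": 150, "Q4": 60}
print("\n".join(text_bar_chart(sales)))
```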
When should I avoid a bar chart visualization?
The category you’re visualizing only has one value associated with it
You want to visualize continuous data
If you use a bar chart, here are the key design best practices:
2. Line Chart
Like bar charts, line charts help to visualize data in a compact and precise format, which makes
it easy to rapidly scan information and understand trends. Line charts show resulting data
relative to a continuous variable, most commonly time or money. Using color properly in this
visualization matters, because differently colored lines make it even easier for users to
analyze the information.
If you use a line chart, here are the key design best practices:
Along with using a different colour for each category you’re comparing, make sure you
also use solid lines to keep the line chart clear and concise
To avoid confusion, try not to compare more than 4 categories in one line chart
3. Scatterplot
Scatterplots are the right data visualizations to use when there are many different data points,
and you want to highlight similarities in the data set. This is useful when looking for outliers or
for understanding the distribution of your data.
If the data forms a band extending from lower left to upper right, there is most likely a positive
correlation between the two variables. If the band runs from upper left to lower right, a
negative correlation is probable. If it is hard to see a pattern, there is probably no correlation.
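The visual rule of thumb above can be checked numerically with the Pearson correlation coefficient r, which is positive for a lower-left-to-upper-right band and negative for the opposite direction. In this sketch the data is hypothetical, and the 0.3 cut-off for "no clear correlation" is an arbitrary assumption, not a standard.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def describe(r, threshold=0.3):
    """Translate r into a band direction (the threshold is an assumption)."""
    if r >= threshold:
        return "positive (lower left to upper right)"
    if r <= -threshold:
        return "negative (upper left to lower right)"
    return "no clear correlation"

# Hypothetical data: ad spend vs. site visits.
ad_spend = [1, 2, 3, 4, 5, 6]
visits = [12, 15, 14, 19, 22, 24]
r = pearson_r(ad_spend, visits)
print(f"r = {r:.2f}: {describe(r)}")
```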
If you use a scatterplot, here are the key design best practices:
Although trend lines are a great way to analyze the data on a scatterplot, ensure you
stick to 1 or 2 trend lines to avoid confusion
Don’t forget to start at 0 for the y-axis
4. Sparkline
Sparklines are arguably the best data visualization for showing trends because of how compact
they are. They get the job done when it comes to painting a picture for your audience fast.
However, it is important to make sure your audience understands how to read sparklines
correctly to get the most out of them.
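A sparkline can be approximated in plain text by mapping each value onto one of eight Unicode block characters, scaled to the series' own minimum and maximum. The sketch below pairs it with the current value, as the guidance for sparklines suggests; the sign-up numbers are hypothetical.

```python
BLOCKS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value onto one of eight block characters, scaled to the series range."""
    lo, hi = min(values), max(values)
    span = hi - lo or 1  # avoid division by zero for a flat series
    return "".join(BLOCKS[round((v - lo) / span * (len(BLOCKS) - 1))] for v in values)

# Hypothetical daily sign-ups for the past week.
signups = [3, 5, 4, 8, 12, 9, 14]
print(f"Sign-ups {sparkline(signups)} {signups[-1]}")
```

Because each sparkline is scaled to its own range, it shows the shape of the trend rather than absolute magnitude, which is one reason readers need to know how to interpret it.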
When do I use a sparkline visualization?
You can pair it with a metric that has a current status value tracked over a specific time
period
You want to show a specific trend behind a metric
If you use a sparkline, here are the key design best practices:
To assist with readability, consider adding indicators on the side that give a better
glimpse into the data
Stick to one colour for your sparklines to keep them consistent on your dashboard
5. Pie Chart
Pie charts are an interesting graph visualization. At a high level, they're easy to read and
understand because the parts-of-a-whole relationship is made very obvious. But top data visualization
experts agree that one of their disadvantages is that the percentage of each section isn’t
obvious without adding numerical values to each slice of the pie.
So, what’s the point? As long as you stick to best practices, pie charts can be a quick way to scan
information.
If you use a pie chart, here are the key design best practices:
Make sure that the pie slices add up to 100%. To make this easier, add the numerical
values and percentages to your pie chart
Order the pieces of your pie according to size
Use a pie chart only if you have up to 5 categories to compare. If you have too many
categories, you won’t be able to differentiate between the slices
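Rounding each slice independently can leave labelled percentages that add up to 99% or 101%. One way to honour the practices above is sketched below: it rejects more than five categories, orders slices by size, and uses the largest remainder method (our choice here, not something the text prescribes) so the whole-number labels always total exactly 100. The traffic figures are hypothetical.

```python
import math

def pie_percentages(counts, max_slices=5):
    """Return (label, whole-number percent) pairs, largest slice first, summing to 100."""
    if len(counts) > max_slices:
        raise ValueError(f"{len(counts)} categories; consider at most {max_slices}")
    total = sum(counts.values())
    raw = {label: value * 100 / total for label, value in counts.items()}
    floored = {label: math.floor(p) for label, p in raw.items()}
    # Largest remainder method: hand out leftover points to the biggest fractions.
    leftover = 100 - sum(floored.values())
    by_remainder = sorted(raw, key=lambda name: raw[name] - floored[name], reverse=True)
    for label in by_remainder[:leftover]:
        floored[label] += 1
    return sorted(floored.items(), key=lambda item: item[1], reverse=True)

# Hypothetical traffic sources.
traffic = {"Search": 431, "Direct": 289, "Social": 172, "Email": 108}
print(pie_percentages(traffic))
```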
6. Gauge
Gauges typically compare only two values on a scale: a current value and a target value.
The comparison often indicates whether your progress is good or bad, in the green or in the
red.
When do I use a gauge visualization?
You want to track single metrics that have a clear, in-the-moment objective
If you use a gauge, here are the key design best practices:
Feel free to play around with the size and shape of the gauge. Whether it’s an arc, a
circle or a line, it’ll get the same job done
Keep the colours consistent with what means “good” or “bad” for you and your
numbers
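The logic behind a gauge is a single comparison of a current value against a target. The sketch below returns the share of the target reached plus a green/red status; the `higher_is_better` flag is a hypothetical parameter reflecting that "good" depends on your numbers (more revenue is good, a longer ticket backlog is not). The metric values are made up.

```python
def gauge_status(current, target, higher_is_better=True):
    """Return the share of target reached and a green/red status for a gauge."""
    progress = current / target
    on_track = progress >= 1 if higher_is_better else progress <= 1
    return progress, "green" if on_track else "red"

# Hypothetical metrics: revenue (higher is better), open tickets (lower is better).
print(gauge_status(84_000, 100_000))
print(gauge_status(35, 50, higher_is_better=False))
```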
7. Waterfall Chart
A waterfall chart is an information visualization that shows how an initial value is affected
by a series of intermediate values, resulting in a final value. The intermediate values can be
either negative or positive.
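Each floating bar in a waterfall chart spans from the running total before a change to the running total after it. The sketch below computes those (bottom, top) spans for a hypothetical profit bridge; a charting library would then draw one bar per tuple.

```python
def waterfall_segments(start, changes):
    """Return (label, bottom, top) for each bar of a waterfall chart."""
    segments = [("Start", 0, start)]
    running = start
    for label, delta in changes:
        new_total = running + delta
        # A positive delta rises from the running total; a negative one falls from it.
        segments.append((label, min(running, new_total), max(running, new_total)))
        running = new_total
    segments.append(("End", 0, running))
    return segments

# Hypothetical profit bridge (in thousands).
for segment in waterfall_segments(100, [("Sales", 40), ("Returns", -15), ("Costs", -30)]):
    print(segment)
```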
If you use a waterfall chart, here are the key design best practices:
8. Funnel Chart
A funnel chart is your data visualization of choice if you want to display a series of steps and the
completion rate for each step. This can be used to track the sales process, a marketing funnel or
the conversion rate across a series of pages or steps. Funnel charts are most often used to
represent how something moves through different stages in a process. A funnel chart displays
values as progressively decreasing proportions amounting to 100 percent in total.
When do I use a funnel chart visualization?
If you use a funnel chart, here are the key design best practices:
Scale the size of each section to accurately reflect the size of its data set
Use contrasting colors, or one color in graduated shades from darkest to lightest as the
size of the funnel decreases
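The numbers behind a funnel chart are the step-to-step completion rates. The sketch below computes them for a hypothetical sales funnel and prints centred text bars whose width is proportional to each stage's count, following the advice to scale each section to its data. The 40-character maximum width is an arbitrary assumption.

```python
def funnel(stages, max_width=40):
    """Print each stage with its step conversion rate and a bar scaled to the first stage."""
    first = stages[0][1]
    previous = first
    rows = []
    for name, count in stages:
        step_rate = count / previous  # share of the previous stage that continued
        width = round(count / first * max_width)  # section size reflects its data set
        rows.append((name, count, step_rate))
        print(f"{name:<10}{'█' * width:^{max_width}} {count:>5} ({step_rate:.0%} of previous)")
        previous = count
    return rows

# Hypothetical sales funnel.
stages = [("Visitors", 1000), ("Sign-ups", 300), ("Trials", 120), ("Customers", 30)]
rows = funnel(stages)
print(f"Overall conversion: {stages[-1][1] / stages[0][1]:.1%}")
```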
9. Heat Map
A heat map or choropleth map is a data visualization that shows the relationship between two
measures and provides rating information. The rating information is displayed using varying
colors or saturation and can express ratings such as high to low, bad to awesome, or needs
improvement to working well.
It can also be a thematic map in which the area inside recognized boundaries is shaded in
proportion to the data being represented.
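At its core, a heat map bins each value into a rating level and draws that level as a color or shade. The sketch below uses five equal-width levels and text shade characters in place of colors; the ticket counts, level labels and `rate` helper are hypothetical.

```python
SHADES = "·░▒▓█"  # lightest to darkest
LABELS = ["very low", "low", "medium", "high", "very high"]

def rate(value, lo, hi):
    """Bin a value into one of five equal-width rating levels between lo and hi."""
    if hi == lo:
        return 0
    level = int((value - lo) / (hi - lo) * len(SHADES))
    return min(level, len(SHADES) - 1)  # the maximum value falls in the top level

# Hypothetical support tickets by weekday (rows) and shift (columns).
grid = {"Mon": [5, 12, 30], "Tue": [3, 8, 22], "Wed": [9, 15, 41]}
values = [v for row in grid.values() for v in row]
lo, hi = min(values), max(values)
for day, row in grid.items():
    print(day, "".join(SHADES[rate(v, lo, hi)] * 2 for v in row))
print("Legend:", ", ".join(f"{s}={label}" for s, label in zip(SHADES, LABELS)))
```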
When do I use a heat map visualization?
If you use a heat map, here are the key design best practices:
10. Histogram
A histogram is a data visualization that shows the distribution of data over a continuous interval
or certain time period. It resembles a vertical bar chart, but its adjacent bars cover consecutive
intervals. The continuous variable shown on the X-axis is broken into discrete intervals, and the
number of data points that fall in each interval determines the height of its bar.
Histograms give an estimate as to where values are concentrated, what the extremes are and
whether there are any gaps or unusual values throughout your data set.
If you use a histogram, here are the key design best practices:
Avoid bars so wide that they hide important details or so narrow that they
cause a lot of noise
Use equal, round-number intervals to set the bar widths
Use consistent colours and labeling throughout so that you can identify relationships
more easily
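The binning step described above can be sketched in a few lines: each value is assigned to an equal, round-number interval, and empty intervals are kept so that gaps in the data stay visible. The ages and the bin width of 10 are hypothetical, and this version assumes integer values and an integer bin width.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count integer values per equal-width bin; bins start at multiples of bin_width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    start, stop = min(counts), max(counts)
    # Include empty bins so gaps in the data stay visible.
    return [(edge, edge + bin_width, counts[edge])
            for edge in range(start, stop + bin_width, bin_width)]

# Hypothetical customer ages.
ages = [22, 25, 27, 31, 33, 34, 38, 41, 45, 47, 63, 68]
for low, high, count in histogram(ages, bin_width=10):
    print(f"{low:>3}-{high - 1:<3} {'#' * count} ({count})")
```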
11. Box Plot
If you use a box plot, here are the key design best practices:
Ensure font sizes for labels and legends are big enough and line widths are thick
enough to understand the findings easily
If plotting multiple datasets, use different symbols, line styles or colour to differentiate
each
Always remove unnecessary clutter from the plots
(Source: Python Graph Gallery)
12. Maps
Maps are an amazing visualization to add to your dashboard if organizing data geographically
tells an important story for your business. For example, if your dashboard is looking at
monthly sales, it could be extremely useful to see the geographic locations of your customers.
One example is a map visualization that integrates with Salesforce to measure accounts by
country. Keep in mind that if your dashboard is looking at daily sales, this visualization may
provide less value to your day-to-day discussions.
When do I use a map visualization?
If you use a map visualization, here are the key design best practices:
Avoid using multiple colours and patterns on your map. Use varying shades of the same
colour instead
Make sure to include a legend with your map, so that everyone understands what the
data means
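Following the advice to use varying shades of one colour plus a legend, the sketch below generates a small single-hue palette by interpolating between a light and a dark blue (both endpoints chosen arbitrarily), assigns each country to an equal-interval class, and prints a legend. The sales figures are hypothetical, and drawing the actual map is left to a mapping tool.

```python
def single_hue_shades(n, light=(222, 235, 247), dark=(8, 48, 107)):
    """Return n hex colours interpolated from a light to a dark shade of one hue."""
    shades = []
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0
        rgb = (round(a + (b - a) * t) for a, b in zip(light, dark))
        shades.append("#" + "".join(f"{c:02x}" for c in rgb))
    return shades

def classify(value, lo, hi, n):
    """Equal-interval class index (0..n-1) for a value between lo and hi."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * n), n - 1)

# Hypothetical monthly sales by country.
sales = {"Canada": 120, "France": 340, "Japan": 560, "Brazil": 80}
lo, hi = min(sales.values()), max(sales.values())
palette = single_hue_shades(4)
for country, value in sales.items():
    print(f"{country:<8} {value:>4} -> {palette[classify(value, lo, hi, 4)]}")
# Legend: one entry per class so readers know what each shade means.
step = (hi - lo) / 4
for i, colour in enumerate(palette):
    print(f"{colour}: {lo + i * step:.0f} to {lo + (i + 1) * step:.0f}")
```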
13. Tables
If you’re someone who wants a little bit of everything in front of you in order to make thorough
decisions, then tables are the visualization to go with. Tables are great because you can display
both data points and graphics, such as bullet charts, icons, and sparklines. This visualization
type also organizes your data into columns and rows, which is great for reporting.
For example, you can bring your Google Analytics data into a table, so that you can
see all the information you need in one place.
One thing to keep in mind is that tables can sometimes be overwhelming if you have a
dashboard with many metrics that you want to display. It's important to find a happy medium
between large amounts of data (confusing) and too little data (waste of dashboard space).
When do I use a table visualization?
You want to display two-dimensional data sets that can be organized categorically
You can drill down to break up large data sets along a natural drill-down path
If you use a table, here are the key design best practices:
Be mindful of the order of the data. Make sure that labels, categories and numbers
come first, followed by the graphics
Try not to have more than 10 different rows in your table to avoid clutter
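The table practices above (labels first, then numbers, and no more than about 10 rows) can be sketched as a small text formatter. The page names and figures are hypothetical stand-ins for analytics data, and the `text_table` helper is ours, not part of any analytics product.

```python
def text_table(header, rows, max_rows=10):
    """Format rows as an aligned text table, truncated to max_rows."""
    shown = rows[:max_rows]
    widths = [max(len(str(cell)) for cell in column) for column in zip(header, *shown)]

    def fmt(row):
        # Left-align text labels, right-align numbers.
        return " | ".join(
            f"{cell:>{w}}" if isinstance(cell, (int, float)) else f"{cell:<{w}}"
            for cell, w in zip(row, widths)
        )

    lines = [fmt(header), "-+-".join("-" * w for w in widths)]
    lines += [fmt(row) for row in shown]
    if len(rows) > max_rows:
        lines.append(f"... {len(rows) - max_rows} more rows")
    return lines

# Hypothetical web analytics rows: labels first, then numbers.
header = ["Page", "Sessions", "Bounce %"]
rows = [("/home", 5400, 41.2), ("/pricing", 1830, 35.9), ("/blog", 960, 62.5)]
print("\n".join(text_table(header, rows)))
```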
14. Indicators
Indicators are useful for an at-a-glance view of a metric you need to keep track of. An indicator
is simply a number showing the current value of whichever performance metric you’re tracking.
To make it more useful, add a comparison to the previous time period to show whether your
metric is tracking up or down.
Some people like to get fancy with indicators and use gauges or tickers. They present the same
type of information, just in a different visual way.
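An indicator with a period-over-period comparison boils down to one percentage change and an arrow. The sketch below formats that; the weekly active user figures are hypothetical, and treating a zero previous value as "no prior data" is our own assumption to avoid dividing by zero.

```python
def indicator(current, previous):
    """Format a metric with its change versus the previous period."""
    if previous == 0:
        return f"{current:,} (no prior data)"
    change = (current - previous) / previous
    arrow = "▲" if change > 0 else "▼" if change < 0 else "="
    return f"{current:,} {arrow} {change:+.1%} vs. previous period"

# Hypothetical weekly active users.
print(indicator(12_480, 11_900))
```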
15. Area Chart
An area chart is very similar to a line graph but may do a better job of highlighting the relative
differences between items. Use an area chart when you want to see how different items stack
up or contribute to the whole.
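In a stacked area chart, each series is drawn as a band that sits on top of the series before it, so the top edge of the last band is the total. The sketch below computes those (bottom, top) bands per time point for hypothetical sales channels; a charting library would then fill the area between each pair of edges.

```python
def stack_layers(series):
    """Return, per series, the (bottom, top) band at each time point of a stacked area chart."""
    bottoms = [0] * len(next(iter(series.values())))
    bands = {}
    for name, values in series.items():
        tops = [b + v for b, v in zip(bottoms, values)]
        bands[name] = list(zip(bottoms, tops))
        bottoms = tops  # the next layer sits on top of this one
    return bands

# Hypothetical sales by channel over three periods.
channels = {"Web": [10, 12, 15], "Mobile": [5, 8, 12], "Store": [7, 6, 5]}
for name, band in stack_layers(channels).items():
    print(name, band)
```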
Source:
https://www.klipfolio.com/resources/articles/what-is-data-visualization