Unit 3
Aggregate functions allow you to summarize or change the granularity of your data.
For example, you might want to know exactly how many orders your store had for a
particular year. You can use the COUNTD function to tally the exact number of unique
orders your company had, and then break the visualization down by year.
COUNTD([Order ID])
Aggregations and floating-point arithmetic: The results of some aggregations may not
always be exactly as expected. For example, you may find that the SUM function returns
a value such as -1.42e-14 for a column of numbers that you know should sum to exactly
0. This happens because the Institute of Electrical and Electronics Engineers (IEEE) 754
floating-point standard requires that numbers be stored in binary format, which means
that numbers are sometimes rounded at extremely fine levels of precision. You can
eliminate this potential distraction by using the ROUND function (see Number
Functions) or by formatting the number to show fewer decimal places.
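The same rounding effect can be reproduced outside Tableau. The following is a minimal
sketch in Python, used here only for illustration; the values are hypothetical. Three
decimals that should cancel leave a tiny residue, and rounding removes it, much as
wrapping SUM in ROUND would:

```python
# Hypothetical values that sum to exactly 0 in decimal arithmetic.
values = [0.1, 0.2, -0.3]

# Binary floating-point storage leaves a tiny non-zero residue.
raw_total = sum(values)

# Rounding to a sensible number of decimals hides the residue.
rounded_total = round(raw_total, 10)

print(raw_total)
print(rounded_total)
```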
ATTR
Syntax ATTR(expression)
Definition Returns the value of the expression if it has a single value for all
rows. Otherwise returns an asterisk. Null values are ignored.
AVG
Syntax AVG(expression)
Definition Returns the average of all the values in the expression. Null values
are ignored.
COLLECT
Syntax COLLECT(spatial)
CORR
Database limitations CORR is available with the following data sources: Tableau
data extracts, Cloudera Hive, EXASolution, Firebird (version 3.0 and
later), Google BigQuery, Hortonworks Hadoop Hive, IBM PDA
(Netezza), Oracle, PostgreSQL, Presto, SybaseIQ, Teradata, Vertica.
COUNT
Syntax COUNT(expression)
Definition Returns the number of items. Null values are not counted.
COUNTD
Syntax COUNTD(expression)
Definition Returns the number of distinct items in a group. Null values are not
counted.
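The difference between COUNT and COUNTD can be sketched in Python. This is an
illustration only, not how Tableau computes the result; the order IDs are hypothetical
and None stands in for a null value:

```python
# Hypothetical order IDs; None stands in for a null value.
order_ids = ["A-1", "A-1", "B-7", None, "C-3", "B-7"]

# Both functions skip nulls.
non_null = [value for value in order_ids if value is not None]

count = len(non_null)        # like COUNT([Order ID])
countd = len(set(non_null))  # like COUNTD([Order ID]): distinct items only

print(count, countd)
```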
COVAR
MAX
Output Same data type as the argument, or NULL if any part of the
argument is null.
Definition As a comparison, returns the maximum of the two arguments, which must
be of the same data type.
MEDIAN
Syntax MEDIAN(expression)
Definition Returns the median of an expression across all records. Null values
are ignored.
MIN
Output Same data type as the argument, or NULL if any part of the
argument is null.
Definition Returns the minimum of the two arguments, which must be of the
same data type.
Example MIN(4,7) = 4
MIN(#3/25/1986#, #2/20/2021#) = #3/25/1986#
MIN([Name]) = "Abebi"
PERCENTILE
Syntax PERCENTILE(expression, number)
STDEV
Syntax STDEV(expression)
Definition Returns the statistical standard deviation of all values in the given
expression based on a sample of the population.
STDEVP
Syntax STDEVP(expression)
Definition Returns the statistical standard deviation of all values in the given
expression based on a biased population.
SUM
Syntax SUM(expression)
Definition Returns the sum of all values in the expression. Null values are
ignored.
Notes SUM can only be used with numeric fields.
VAR
Syntax VAR(expression)
Definition Returns the statistical variance of all values in the given expression
based on a sample of the population.
VARP
Syntax VARP(expression)
Definition Returns the statistical variance of all values in the given expression
on the entire population.
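The sample and population variants differ only in the divisor: n - 1 for a sample and
n for the whole population. The sketch below uses Python's standard statistics module
to show the difference. The values are hypothetical, and the mapping to the Tableau
function names in the comments is an assumption based on the definitions above:

```python
import statistics

# Hypothetical measure values.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

sample_sd = statistics.stdev(values)           # divides by n - 1, like STDEV
population_sd = statistics.pstdev(values)      # divides by n, like STDEVP
sample_var = statistics.variance(values)       # divides by n - 1, like VAR
population_var = statistics.pvariance(values)  # divides by n, like VARP

print(sample_sd, population_sd, sample_var, population_var)
```

For the same data, the sample results are always slightly larger than the population
results, and the gap shrinks as the number of values grows.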
Sometimes your data source does not contain a field (or column) that you need for your
analysis. For example, your data source might contain fields with values for Sales and
Profit, but not for Profit Ratio. If this is the case, you can create a calculated field for
Profit Ratio using data from the Sales and Profit fields.
This topic demonstrates how to create a simple calculated field using an example.
Follow along with the steps below to learn how to create an aggregate calculation.
1. In Tableau Desktop, connect to the Sample - Superstore saved data source, which
comes with Tableau.
2. Navigate to a worksheet and select Analysis > Create Calculated Field.
3. In the calculation editor that opens, do the following:
○ Name the calculated field Margin.
○ Enter the following formula:
IIF(SUM([Sales]) !=0, SUM([Profit])/SUM([Sales]), 0)
Note: You can use the function reference to find and add aggregate
functions and other functions (like the logical IIF function in this example)
to the calculation formula. For more information, see Use the functions
reference in the calculation editor.
○ When finished, click OK.
4. The new aggregate calculation appears under Measures in the Data pane. Just like
your other fields, you can use it in one or more visualizations.
Note: Aggregate calculations are always measures.
When Margin is placed on a shelf or card in the worksheet, its name is changed to
AGG(Margin), which indicates that it is an aggregate calculation and cannot be
aggregated any further.
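The logic of the Margin formula can be sketched in Python. This is an illustration
under stated assumptions: the function name and the sample rows are hypothetical, and
only the divide-by-zero guard mirrors the IIF expression above:

```python
def margin(rows):
    """Mirror IIF(SUM([Sales]) != 0, SUM([Profit]) / SUM([Sales]), 0).

    rows is a list of (sales, profit) pairs.
    """
    total_sales = sum(sales for sales, _ in rows)
    total_profit = sum(profit for _, profit in rows)
    # Guard against division by zero, as the IIF test does.
    return total_profit / total_sales if total_sales != 0 else 0


# Hypothetical rows of (sales, profit).
print(margin([(100.0, 20.0), (300.0, 30.0)]))
print(margin([]))
```

Because both sums are taken before the division, the result is the ratio of totals for
whatever level of detail the view defines, not an average of per-row ratios.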
● For any aggregate calculation, you cannot combine an aggregated value and a
disaggregated value. For example, SUM(Price)*[Items] is not a valid expression
because SUM(Price) is aggregated and Items is not. However, SUM(Price*Items)
and SUM(Price)*SUM(Items) are both valid.
● Constant terms in an expression act as aggregated or disaggregated values as
appropriate. For example: SUM(Price*7) and SUM(Price)*7 are both valid
expressions.
● All of the functions can be evaluated on aggregated values. However, the
arguments to any given function must either all be aggregated or all disaggregated.
For example: MAX(SUM(Sales),Profit) is not a valid expression because Sales is
aggregated and Profit is not. However, MAX(SUM(Sales),SUM(Profit)) is a valid
expression.
● The result of an aggregate calculation is always a measure. This includes
expressions like ATTR(Dimension) or MIN(Dimension).
● Like predefined aggregations, aggregate calculations are computed correctly for
grand totals. Refer to Grand Totals for more information.
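Two expressions can both be valid and still give different answers. As a rough
Python illustration with hypothetical rows, SUM(Price*Items) multiplies within each row
before adding, while SUM(Price)*SUM(Items) adds each column first:

```python
# Hypothetical rows of (price, items).
rows = [(2.0, 3), (5.0, 1)]

# Like SUM(Price*Items): multiply per row, then aggregate.
sum_of_products = sum(price * items for price, items in rows)

# Like SUM(Price)*SUM(Items): aggregate each field, then multiply.
product_of_sums = sum(price for price, _ in rows) * sum(items for _, items in rows)

print(sum_of_products, product_of_sums)
```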
Table calculations
Table calculations allow you to transform values at the level of detail of the visualization
only.
For example, consider the sample table of fantasy authors and their books. If you wanted
to compute the number of years since the author released their last book, you might use
the following table calculation:
ATTR([Year Released]) - LOOKUP(ATTR([Year Released]), -1)
The result is shown below. The new column, titled Years Since Previous Book, displays
the number of years between the book released in that row and the book released in the
previous row (on the right side of the column) and shows how the table calculation is
computed (on the left side of the column).
The colors help demonstrate how the table calculation is being computed. In this case, the
table calculation is being computed down each pane.
Book ID | Book Name | Series | Year Released | Author | Years Since Previous Book
For example, in the image below, Author is removed from the viz. Since the table
calculation is computed by pane, removing Author changes the granularity and layout of
the viz (instead of two panes there is now only one). The table calculation therefore
calculates the time between 1956 and 1999.
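The effect of computing by pane can be sketched in Python. This is an illustration
only: the author names and most of the years are hypothetical, and only the 1956 and
1999 boundary comes from the text above. With panes, the difference restarts for each
author; without them, the calculation runs straight across the boundary:

```python
# Hypothetical rows of (author, year_released), sorted within each author.
books = [
    ("Author A", 1950), ("Author A", 1953), ("Author A", 1956),
    ("Author B", 1999), ("Author B", 2003),
]


def years_since_previous(rows, by_pane=True):
    """Difference from the previous row; restarts per author when by_pane is True."""
    result = []
    previous = None  # (author, year) of the previous row
    for author, year in rows:
        same_pane = previous is not None and (not by_pane or previous[0] == author)
        result.append(year - previous[1] if same_pane else None)
        previous = (author, year)
    return result


print(years_since_previous(books, by_pane=True))
print(years_since_previous(books, by_pane=False))
```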
Work with Data Fields in the Data Pane
Tableau displays data source connections and data fields for the workbook in the Data
pane on the left side of the workspace.
After you connect to your data and set up the data source with Tableau, the data source
connections and fields appear on the left side of the workbook in the Data pane.
Current data source connections appear at the top of the Data pane. When you have more
than one connection available, click a connection to select it and start working with that
data.
You build visualizations by adding fields from the Data pane to the view. For details, see
Start Building a Visualization by Dragging Fields to the View.
Fields can be organized by table (Group by Data Source Table) or folder (Group by
Folder). Dimensions are displayed above the gray line, and measures below the gray line
for each table or folder. In some cases, a table or folder might contain only dimensions, or
only measures to start with.
● Calculated fields are listed with their originating field, if all of their input fields
come from the same table.
● Sets are listed with the table with their originating field.
● Parameters are global to the workbook and are displayed in the Parameters area.
● Fields that don't belong to a specific table are displayed in the general area below
the tables. These include: aggregated calculations, calculations that use fields from
multiple tables, Measure Names, and Measure Values.
● In version 2024.2 and later: Field names are displayed in light gray text in the Data
pane when they're not related to any fields in use in the view. You can still use
these fields for analysis in the viz, but unrelated fields are evaluated differently in
analysis than fields that are related. You might see this behavior if you are using a
data source with multi-fact relationships.
Below the data source connections in the Data pane are the fields that are available in the
currently selected data source. You can toggle between the Data and Analytics panes in a
worksheet. For details on the Analytics pane, see Apply Advanced Analysis to a View
(Analytics Pane).
Fields from a single-table data source in the Data pane
● Dimension fields – Fields that contain qualitative values (such as names, dates, or
geographical data). You can use dimensions to categorize, segment, and reveal the
details in your data. Dimensions affect the level of detail in the view. Examples of
dimensions include dates, customer names, and customer segments.
● Measure fields – Fields that contain numeric, quantitative values that can be measured.
You can apply calculations to them and aggregate them. When you drag a measure
into the view, Tableau applies an aggregation to that measure (by default).
Examples of measures: sales, profit, number of employees, temperature,
frequency.
● Calculated fields – If your underlying data doesn't include all of the fields you
need to answer your questions, you can create new fields in Tableau using
calculations and then save them as part of your data source. These fields are called
calculated fields.
● Sets – Subsets of data that you define. Sets are custom fields based on existing
dimensions and criteria that you specify.
By default the field names defined in the data source are displayed in the Data pane. You
can rename fields and member names, create hierarchies, and organize the fields into
groups and folders.
Data sources contain fields. For relational data sources that you connect to, the fields are
determined by the columns of a table or view. Each field contains a unique attribute of the
data such as customer name, sales total, product type, and so on.
For cube (multidimensional) data sources, the fields are determined by the dimensions
and measures of a cube. In Tableau, cube data sources are supported only on Windows.
Each field has a data type (that you can change if needed), and a role: discrete dimension,
continuous dimension, discrete measure, or continuous measure.
Each field also includes some default settings, such as a default aggregation of SUM or
AVG, depending on the structure of the current view.
The Data pane can also contain a number of fields that do not come from your original
data: Measure Names and Measure Values, Number of Records, Latitude and Longitude.
● The Measure Values field contains all the measures in your data, collected into a
single field with continuous values. Drag individual measure fields out of the
Measure Values card to remove them from the view.
● The Measure Names field contains the names of all measures in your data,
collected into a single field with discrete values.
Count of Table
To see the count for a table, drag its Count field into the view. To see the count for all
tables, select the Count field for each table in the Data pane, and then click the Text Table
in Show Me.
You can't build calculations on top of a table's Count field, and it is aggregate-only.
To select a data source connection for analysis, click the data source connection name in
the Data pane.
To view a context menu for the data source, click Data in the top menu and then click
on the data source in the menu list.
To search for fields in the Data pane, click the magnifying glass icon and then type in
the text box.
To see the underlying data, click the View Data icon at the top of the Data pane.
In cases where Tableau has misclassified a field as a dimension or a measure, possibly
because of the data type, you can convert it and change its role.
To convert a measure to a dimension, drag the measure and drop it into the Dimensions
area in the Data pane.
When you drag a field into the view, it will have certain default settings and
characteristics. You can customize a field that is already in the view, just for that instance
of the field. Or you can change its settings in the Data pane to make the field use those
settings going forward.
You can control the definition of a field in the view, depending on how you want to work
with that field data.
Relational versus cube data
The Data panes for a relational and a cube data source are shown below. Note that the panes
look essentially the same for both data sources in that the fields are organized into
dimensions and measures. However, the cube data source contains hierarchies for
dimensions. For example, notice that the Employee dimension in the cube Data pane
contains hierarchical members such as Manager Name and Employee Dept.
Relational data sources don’t have built-in hierarchies. However, relational data sources
often have related dimensions that have an inherent hierarchy. For example, a data source
may have fields for Country, State, and City. These fields could be grouped into a
hierarchy called Location. You can assemble relational hierarchies by dragging and
dropping in the Data pane.
Data pane with relational data (left image) versus cube data (right image)
● Tableau: Tableau is a powerful data visualization tool that data analysts,
scientists, statisticians, and others use to visualize data and draw clear
conclusions from their analysis. It is popular because it can take in data and
produce the required visualization in a very short time.
● Logical functions: Tableau provides logical functions for performing logical
operations on your data. They include AND, NOT, OR, IF, ELSEIF, ELSE, CASE,
ISNULL, IFNULL, ZN, and IIF.
AND Function: The AND function is used to test multiple expressions. The syntax of the
AND function is:
<expr1> AND <expr2>
If both conditions are True, it returns True. Otherwise, it returns False.
IIF Function: The Tableau IIF function is a simple version of the IF-ELSE function.
If the condition is True, it returns the first statement; otherwise, it returns the
second statement. The syntax of the Tableau IIF function is:
IIF(Expression, True_statement, False_Statement)
NOT Function: The Tableau NOT function returns the opposite of the expression's value:
True becomes False, and vice versa. The syntax of the Tableau NOT function is:
NOT(Expression)
ISNULL Function: The Tableau ISNULL function checks whether an expression is NULL. If
it is NULL, the function returns True; otherwise, it returns False. The syntax of the
Tableau ISNULL function is:
ISNULL(Expression)
ZN Function: The Tableau ZN function returns the value of the expression if it is not
NULL, and 0 if it is NULL. In other words, ZN replaces NULL values with 0. The syntax
of the Tableau ZN function is:
ZN(Expression)
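The behavior of IIF, ISNULL, and ZN described above can be sketched as small Python
functions. This is a simplified illustration, not Tableau's implementation: None stands
in for NULL, the lowercase function names are hypothetical, and the condition passed to
iif is assumed to be a plain True or False:

```python
def iif(condition, true_value, false_value):
    """Like IIF(Expression, True_statement, False_Statement)."""
    return true_value if condition else false_value


def isnull(value):
    """Like ISNULL(Expression): True only for a null value."""
    return value is None


def zn(value):
    """Like ZN(Expression): replace a null value with 0."""
    return 0 if value is None else value


print(iif(3 > 2, "yes", "no"))
print(isnull(None), isnull(0))
print(zn(None), zn(42))
```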
IF-END
● In this example, we create a new calculated field by using the IF function on a
field.
● View the new calculated field.
● Use it in a visualization.
● Drawback: rows where the condition is false receive null values.
IF-ELSE-END
● In this example, we edit the previously created calculated field by using the
IF-ELSE function on the same field.
● View the new calculated field.
● Use it in a visualization.
● This overcomes the drawback of IF-END: rows where the condition is false
receive the ELSE value instead of null.
IF-ELSEIF-ELSE-END
● In this example, we create a new calculated field by using the
IF-ELSEIF-ELSE function on a field.
● View the new calculated field.
● Use it in a visualization.
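The three IF forms above differ mainly in what happens when no condition matches. The
Python sketch below illustrates this. The Sales field, the thresholds, and the labels
are hypothetical, and None stands in for a null value:

```python
def if_end(sales):
    # Like: IF [Sales] > 500 THEN "High" END
    if sales > 500:
        return "High"
    return None  # no branch matched, so the result is null


def if_else_end(sales):
    # Like: IF [Sales] > 500 THEN "High" ELSE "Low" END
    return "High" if sales > 500 else "Low"


def if_elseif_else_end(sales):
    # Like: IF [Sales] > 500 THEN "High" ELSEIF [Sales] > 100 THEN "Medium" ELSE "Low" END
    if sales > 500:
        return "High"
    elif sales > 100:
        return "Medium"
    else:
        return "Low"


for value in (50, 300, 900):
    print(value, if_end(value), if_else_end(value), if_elseif_else_end(value))
```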
Case Function: The CASE function is one of the logical functions in Tableau. It compares
an expression against a series of values and returns the result for the first value that
matches.
CASE <expression>
WHEN <value1> THEN <return1>
WHEN <value2> THEN <return2>
ELSE <default return>
END
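As a rough Python sketch of the CASE pattern (the function name, the region codes, and
the labels are all hypothetical), each WHEN value is compared in order and the ELSE
result is used when nothing matches:

```python
def case(expression, branches, default=None):
    """Return the result paired with the first value equal to expression.

    branches is a list of (value, result) pairs, checked in order, like
    successive WHEN ... THEN ... clauses; default plays the role of ELSE.
    """
    for value, result in branches:
        if expression == value:
            return result
    return default


# Hypothetical region codes mapped to display names.
branches = [("W", "West"), ("E", "East")]
print(case("E", branches, default="Other"))
print(case("N", branches, default="Other"))
```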
Parameters
Parameters in Tableau are dynamic values that can be used to change the behavior
of a visualization. They allow users to interact with their data by choosing from a
list of predefined values or by entering specific values themselves.
For example, you may create a calculated field that returns True if Sales is greater
than $500,000 and otherwise returns False. You can replace the constant value of
“500000” in the formula with a parameter. Then, using the parameter control, you
can dynamically change the threshold in your calculation.
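The idea of replacing a constant with a parameter can be sketched in Python. This is an
illustration only; the function name and the sales values are hypothetical. The same
test gives different results as the threshold changes, which is what moving the
parameter control does:

```python
def exceeds_threshold(sales, threshold):
    """Like a calculated field that compares Sales with a threshold parameter."""
    return sales > threshold


# Hypothetical sales totals.
sales_values = [250_000, 500_000, 750_000]

# The same calculation evaluated with two different parameter values.
at_500k = [exceeds_threshold(s, 500_000) for s in sales_values]
at_200k = [exceeds_threshold(s, 200_000) for s in sales_values]

print(at_500k)
print(at_200k)
```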
Create a parameter
1. In the Data pane, click the dropdown arrow in the upper right corner and
select Create Parameter.
2. In the Create Parameter dialog box, give the field a Name.
3. Specify the data type for the values the parameter accepts.
4. Optional: Specify a current value. This is the default value for the parameter.
5. Optional: Specify a value when the workbook opens.
6. Specify the display format to use in the parameter control (Tableau Desktop
only).
7. Specify how the parameter accepts values. You can select from the following
options:
○ All: The parameter control is a simple text field.
○ List: The parameter control provides a list of possible values for you
to select from.
■ If you select List, you must specify the list of values. Click in
the left column to type your list of values, or you can add
members of a field or paste from the clipboard by selecting
Add values from.
○ Range: The parameter control lets you select values within a specified
range.
■ If you select Range, you must specify a minimum, maximum,
and step size. The step size controls the jumps between values,
such as letting you choose each number (5, 6, 7...) or going
from 5 to 10 to 15.
8. The availability of these options is determined by the data type. For
example, a string parameter doesn't support Range.
To refresh the parameter’s list of values (or domain) whenever the workbook
opens, select List or Range, and then select When the workbook opens.
Notice that some options are grayed out because the workbook is
dynamically pulling values from the data source.
9. When finished, click OK.
The parameter is now listed in the Parameters section at the bottom of the Data
pane.
Edit a parameter
You can edit parameters from the Data pane or the parameter control. Editing is for
things like changing the allowable range or the data type. To simply change the
value of a parameter, use the parameter control.
To edit a parameter, right-click it in the Data pane and select Edit.
Delete a parameter
To delete a parameter, right-click it in the Data pane and select Delete. Any
calculated fields that use the deleted parameter become invalid.
Use a parameter
A parameter won't do anything until it's tied to an element in the viz. Parameters
can be referenced in calculations, filters, and reference lines. Parameters are global
across the workbook and can be used in any worksheet.
After the element references the parameter, be sure to Show a parameter control in
the viz (or set up a parameter action, or a dynamic parameter). If there's no way to
change the value of the parameter, it doesn't do any good to have it set up in the
first place.
Use a parameter in a calculation
To use a parameter in a calculation, type the name of the parameter and it appears
in the suggested options, just like typing a field name. You can also drag the
parameter from the Data pane and drop it in the calculation editor.
Use a parameter in a filter
Parameters give you a way to dynamically modify values in a Top N filter. Rather
than manually setting the number of values you want to show in the filter, you can
use a parameter. A list of parameters is available in the dropdown lists on the Top
tab of the Filter dialog box. Select the parameter you want to use in the filter.
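A Top N filter driven by a parameter behaves roughly like the Python sketch below. The
function name, the product names, and the sales figures are hypothetical; the point is
only that N is supplied from outside rather than fixed in the filter:

```python
def top_n(sales_by_item, n):
    """Return the names of the n items with the highest sales."""
    ranked = sorted(sales_by_item.items(), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Hypothetical sales per product.
sales_by_product = {"Chairs": 900.0, "Tables": 400.0, "Phones": 1200.0, "Paper": 150.0}

# Changing n plays the role of changing the parameter value.
print(top_n(sales_by_product, 2))
print(top_n(sales_by_product, 3))
```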
Use a parameter in a reference line
Parameters give you a way to dynamically modify a reference line, band, or box.
For example, instead of showing a reference line at a fixed location on the axis,
you can reference a parameter. Then you can use the parameter control to move the
reference line.
A list of parameters is available in the Value dropdown list in the Add Reference
Line, Band, or Box dialog box. Select the parameter you want to use. The reference
line is drawn at the Current Value specified by the parameter.
Types of Calculations in Tableau
There are three main types of calculations you can use to create calculated fields in
Tableau:
● Basic expressions
● Level of Detail (LOD) expressions
● Table calculations
Basic expressions
Basic expressions allow you to transform values or members at the data source level of
detail (a row-level calculation) or at the visualization level of detail (an aggregate
calculation).
For example, consider the following sample table, which contains data on two fantasy
authors and their books. Perhaps you want to create a column with only the author's last
name and a column that displays how many books are in each series.
Row-level calculations
To create a column that displays the author's last name for every row in the data source,
you can use the following row-level calculation that splits on a space:
SPLIT([Author], ' ', 2)
The result can be seen below. The new column, titled Author Last Name is shown on the
far right. The colors demonstrate the level of detail the calculation is performed at. In this
case, the calculation is performed at the row-level of the data source, so each row is
colored separately.
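A row-level calculation is applied to each row independently. The Python sketch below
illustrates this with a hypothetical helper that takes a 1-based token number, in the
same spirit as the SPLIT call above. The author names are placeholders, and edge cases
such as a missing token are not modeled:

```python
def split_token(text, delimiter, token_number):
    """Return the token at a 1-based position after splitting on delimiter."""
    return text.split(delimiter)[token_number - 1]


# Placeholder author names, one per row.
authors = ["First Last", "Given Surname"]

# The calculation runs once per row.
last_names = [split_token(author, " ", 2) for author in authors]
print(last_names)
```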
Book ID | Book Name | Series | Year Released | Author | Author Last Name
Aggregate calculations
To create a column that displays how many books are in each series, you can use the
following aggregate calculation:
COUNT([Series])
The result can be seen below. The new column, titled Number of Books in Series - at
Series level of detail shows how that calculation would be performed at the Series level
of detail in the view. The colors help demonstrate the level of detail in which the
calculation is being performed.
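An aggregate calculation produces one value per group in the view rather than one per
row. As a Python illustration with placeholder series names, counting rows at the
Series level of detail looks like this:

```python
from collections import Counter

# Placeholder data: the series that each book (row) belongs to.
series_per_book = ["Series X", "Series X", "Series X", "Series Y", "Series Y"]

# Like COUNT([Series]) with Series in the view: one count per series.
books_in_series = Counter(series_per_book)

print(books_in_series)
```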
Level of Detail (LOD) expressions
Level of Detail expressions (also known as LOD expressions) allow you to compute
values at the data source level and the visualization level, just like basic expressions.
However, LOD expressions give you more control over the level of granularity you want
to compute. They can be performed at a more granular level (INCLUDE), a less granular
level (EXCLUDE), or an entirely independent level (FIXED).
Follow along with the steps to learn how to create and use an LOD expression in Tableau.
Step 1: Set up the Visualization
1. Open Tableau Desktop and connect to the Sample-Superstore saved data source.
2. Navigate to a new worksheet.
3. From the Data pane, drag Region to the Columns Shelf.
4. From the Data pane, drag Sales to the Rows Shelf.
A bar chart showing the sum of sales for each region appears.
Instead of the sum of all sales per region, perhaps you want to also see the average sales
per customer for each region. You can use an LOD expression to do this.
Step 2: Create the LOD expression
1. Select Analysis > Create Calculated Field.
2. Name the calculation Sales Per Customer and enter the following LOD expression:
{ INCLUDE [Customer Name] : SUM([Sales]) }
3. When finished, click OK.
Step 3: Use the LOD expression in the visualization
1. From the Data pane, drag Sales Per Customer to the Rows shelf and place it to
the left of SUM(Sales).
2. On the Rows shelf, right-click Sales Per Customer and select Measure (Sum) >
Average.
You can now see both the sum of all sales and the average sales per customer for
each region. For example, you can see that in the Central region, sales totaled
approximately $500,000, with average sales per customer of approximately
$800.
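The two-stage computation in this walkthrough (total sales per customer first, then the
average of those totals per region) can be sketched in Python. This is an illustration
of the idea, not Tableau's query plan; the function name and the rows are hypothetical:

```python
from collections import defaultdict


def avg_sales_per_customer(rows):
    """rows holds (region, customer, sales); returns average customer total per region."""
    # Stage 1: sum sales at the finer (region, customer) level.
    per_customer = defaultdict(float)
    for region, customer, sales in rows:
        per_customer[(region, customer)] += sales

    # Stage 2: average the customer totals at the coarser region level.
    per_region = defaultdict(list)
    for (region, _), total in per_customer.items():
        per_region[region].append(total)
    return {region: sum(totals) / len(totals) for region, totals in per_region.items()}


# Hypothetical orders.
orders = [
    ("Central", "C1", 100.0), ("Central", "C1", 50.0), ("Central", "C2", 250.0),
    ("East", "C3", 300.0),
]
print(avg_sales_per_customer(orders))
```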
Use a Quick LOD expression
You can create a FIXED LOD expression without needing to enter the full calculation
into the calculation dialog.
1. In the Data pane, control-click drag the measure you want to aggregate onto the
desired dimension. A new field appears as a FIXED LOD calculation.
The aggregation in the aggregate expression will come from the default
aggregation on the measure. This is usually SUM. To change the aggregation or
otherwise edit the LOD, right-click the new field and edit the calculation.
2. Or, in the Data pane, select the measure you want to aggregate and then
control-click the dimension you want to aggregate on.
○ Right-click on the selected fields and select Create > LOD Calculation...
○ (Optional) Modify the LOD in the calculation editor.
○ Select OK .
You can create three types of LOD expressions in Tableau:
● FIXED
● INCLUDE
● EXCLUDE
LOD expression syntax
The first element after the opening curly brace is one of the following scoping keywords:
FIXED
Example
The following FIXED level of detail expression computes the sum of sales per region:
{FIXED [Region] : SUM([Sales])}
This level of detail expression, named [Sales by Region], is then placed on Text to show
total sales per region.
The view level of detail is [Region] and [State]. But FIXED level of detail expressions
don't look at the dimensions in the view, only the dimensions specified in the calculation
(here, Region). Therefore, the values for the individual states in each region are identical.
If the keyword had been INCLUDE instead of FIXED, the values would be different for
each state. INCLUDE uses the dimension in the expression ([Region]) and any additional
dimensions in the view ([State]) when evaluating the expression.
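The contrast between FIXED and INCLUDE in this example can be sketched in Python. The
rows are hypothetical and the helper name sum_by is an assumption. The view has one mark
per region and state; the FIXED value depends only on the region, while the INCLUDE
value also uses the state from the view:

```python
from collections import defaultdict

# Hypothetical rows of (region, state, sales).
rows = [("West", "CA", 100.0), ("West", "WA", 40.0), ("East", "NY", 70.0)]


def sum_by(rows, key):
    """Sum the sales column grouped by key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


# FIXED on Region: ignores the State dimension in the view.
fixed_region = sum_by(rows, lambda r: r[0])
# INCLUDE Region: Region plus the view's State dimension.
include_region = sum_by(rows, lambda r: (r[0], r[1]))

# One mark per (region, state), holding (FIXED value, INCLUDE value).
view = {(r, s): (fixed_region[r], include_region[(r, s)]) for r, s, _ in rows}
print(view)
```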
INCLUDE
INCLUDE level of detail expressions compute values using the specified dimensions in
addition to whatever dimensions are in the view.
INCLUDE can be useful when you want to calculate at a fine level of detail in the
database, but reaggregate at a coarser level of detail in your view. Fields based on
INCLUDE level of detail expressions change as you add or remove dimensions from the
view.
Example 1
This INCLUDE level of detail expression computes total sales per customer:
{ INCLUDE [Customer Name] : SUM([Sales]) }
With the LOD on the Rows shelf, aggregated as AVG, and [Region] on the Columns
shelf, the view shows the average customer sales amount per region:
Example 2
This INCLUDE level of detail expression calculates sum of sales on a per-state basis:
{ INCLUDE [State] : SUM([Sales]) }
The calculation is placed on the Rows shelf and is aggregated as an average. The
resulting visualization averages the sum of sales by state across categories.
When Segment is added to the Columns shelf and the calculation is moved to Label, the
LOD expression results update. Now you can see how the average sum of sales per state
varies across categories and segments.
EXCLUDE
EXCLUDE level of detail expressions declare dimensions to omit from the view level of
detail.
EXCLUDE is useful for 'percent of total' or 'difference from overall average' scenarios.
In those scenarios it is comparable to features such as Totals and Reference Lines.
Example 1
The following EXCLUDE level of detail expression computes the average sales total per
month and then excludes the month component:
{EXCLUDE [Order Date (Month / Year)] : AVG({FIXED [Order Date (Month / Year)] : SUM([Sales])})}
The [Order Date (Month / Year)] field used here is a custom date, which you can create
as follows:
○ Right-click "Order Date" in the Data pane and select Create > Create
Custom Date.
○ From the Detail list, select "Month / Year". Leave the selection as discrete.
The resulting view shows the difference between actual sales per month and the
average monthly sales for the entire four-year period:
Example 2
Create a level of detail expression, named "ExcludeRegion", that excludes [Region] from
the sum of [Sales]:
{EXCLUDE [Region] : SUM([Sales])}
Consider the following view, which breaks out the sum of sales by region and by month:
Putting [ExcludeRegion] on Color shades the view to show total sales by month without
the regional component.
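The effect of excluding Region from a view broken out by region and month can be
sketched in Python. The rows are hypothetical and this is only an illustration of the
idea: the sum is taken over the remaining dimension (month), and every mark in that
month receives the same total:

```python
from collections import defaultdict

# Hypothetical rows of (region, month, sales); the view is by region and month.
rows = [("West", "Jan", 100.0), ("East", "Jan", 50.0), ("West", "Feb", 30.0)]

# Drop Region from the view's level of detail and sum by what remains (month).
sales_without_region = defaultdict(float)
for _, month, sales in rows:
    sales_without_region[month] += sales

# Every (region, month) mark receives the total for its month.
exclude_region = {
    (region, month): sales_without_region[month] for region, month, _ in rows
}
print(exclude_region)
```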
Table-Scoped
It’s possible to define a level of detail expression at the table level without using any of
the scoping keywords. For example, the following expression returns the minimum
(earliest) order date for the entire table:
{MIN([Order Date])}
For example, consider the same sample table as above. If you wanted to compute when a
book series was launched, you might use the following LOD expression:
{FIXED [Series] : MIN([Year Released])}
The result can be seen below. The new column, titled Series Launched, displays the
minimum year for each series. The colors help demonstrate the level of detail in which
the calculation is being applied.
In Tableau, the calculation remains at the Series level of detail since it uses the FIXED
keyword.
If you add another field to the view (which adds more granularity) the values for the
calculation are not affected, unlike an aggregate calculation.
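The stability of a FIXED calculation can be sketched in Python. The series names, book
names, and years are hypothetical. The minimum year is computed once per series and then
attached to every row, so showing more detail in the view does not change it:

```python
# Hypothetical rows of (series, book, year_released).
books = [
    ("Series X", "Book 1", 1950),
    ("Series X", "Book 2", 1953),
    ("Series Y", "Book 3", 1999),
]

# Compute the earliest year once per series, independent of the view.
launched = {}
for series, _, year in books:
    launched[series] = min(year, launched.get(series, year))

# Attach the series-level value to each row; book-level detail does not alter it.
rows_with_launch = [(series, book, year, launched[series]) for series, book, year in books]
print(rows_with_launch)
```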