Reporting Aggregated Data Using The Group Functions

Group functions, also called aggregate functions, return one value for each set of zero or more rows. They process data from multiple rows and return a single result. Common group functions include COUNT, SUM, AVG, MIN, MAX, and more. The GROUP BY clause groups rows together and treats each group as a whole, allowing group functions to aggregate values within each group. ORDER BY is more limited when used with GROUP BY, as it can only sort by columns in the GROUP BY or select list, not other columns. Functions like scalar and aggregate functions can be nested according to their datatype compatibility rules.

Uploaded by

Florin Nedelcu
Reporting Aggregated Data Using the Group Functions

DEF. A group function returns one value for each set of zero or more rows it encounters. Another term
for group function is multirow or aggregate function. Aggregate functions are typically used with a
SELECT statement that selects many rows, where the aggregate function scans a set of rows and returns
a single answer for all of them.

The first thing to recognize about aggregate functions is that they:

o Process data from zero or more rows.
o Return one, and only one, row’s worth of data as their result.
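The "zero or more rows in, one row out" behavior can be sketched as follows. This is a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in for an Oracle session; the SHIP_CABINS columns and rows are hypothetical sample data.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SHIP_CABINS (ROOM_NUMBER INTEGER, SQ_FT INTEGER)")
conn.executemany(
    "INSERT INTO SHIP_CABINS VALUES (?, ?)",
    [(102, 160), (103, 205), (104, 533)],
)

query = "SELECT COUNT(SQ_FT), SUM(SQ_FT) FROM SHIP_CABINS WHERE SQ_FT > ?"

# Three rows qualify: the aggregates collapse them into a single result row.
some_rows = conn.execute(query, (100,)).fetchall()

# Zero rows qualify: the query still returns exactly one result row.
zero_rows = conn.execute(query, (10000,)).fetchall()

print(some_rows)  # [(3, 898)]
print(zero_rows)  # [(0, None)]
```

Note that over an empty set COUNT yields 0 while SUM yields NULL (None in Python), yet both arrive in a single row.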

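A scalar function works on a single row of data, while a group function is applied to a set of zero or
more rows. Because the two operate at different levels of detail, a select list cannot mix a row-level
expression (a plain column, or a scalar function of one) with a group function unless that expression
also appears in a GROUP BY clause. A scalar function may still wrap a group function, as in
ROUND(AVG(NR)). COUNT, MIN, and MAX accept number, character, and date data; SUM and AVG
accept only numeric data.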

Aggregate functions can be called from three places in a SELECT statement: the select list, the ORDER BY
clause, and the HAVING clause. HAVING is one of two new clauses we’ll look at in this chapter, along
with GROUP BY. An aggregate function cannot appear in the GROUP BY clause itself, which only lists the
expressions that define the groups. Both the GROUP BY and HAVING clauses are unique to the SELECT
statement; they do not exist in other SQL statements.
COUNT
Syntax: COUNT(e1)
Parameters: e1 is an expression. e1 can be any datatype.

The COUNT function determines the number of occurrences of non-NULL values. It considers the
value of an expression and determines if that value is NOT NULL for each row it encounters.

It’s worth noting that COUNT will never return a NULL value. If it encounters no values at all, it will at
least return a value of 0 (zero). This is not the case with all of the aggregates, but it’s true with
COUNT.

The DISTINCT and ALL operators can be used with aggregate functions. Here is an example showing
DISTINCT and ALL used within a COUNT function:

SELECT COUNT(DISTINCT LAST_NAME), COUNT(ALL LAST_NAME)
FROM EMPLOYEES;

COUNT(DISTINCT LAST_NAME) COUNT(ALL LAST_NAME)
------------------------- --------------------
                        5                    7
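The NULL handling and the DISTINCT count described above can be checked with a small sketch. It assumes Python's built-in sqlite3 module in place of Oracle, and the EMPLOYEES rows are hypothetical, chosen so the counts match the 5 and 7 in the listing. ALL is the default, so COUNT(LAST_NAME) stands for COUNT(ALL LAST_NAME) here.

```python
import sqlite3

# Hypothetical table and data: seven names (five distinct) plus one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMPLOYEE_ID INTEGER, LAST_NAME TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEES VALUES (?, ?)",
    [
        (1, "Smith"), (2, "Smith"), (3, "Jones"), (4, "Jones"),
        (5, "Lee"), (6, "Hoddlestein"), (7, "Worthington"),
        (8, None),  # a row whose LAST_NAME is NULL
    ],
)

all_rows, non_null_names, distinct_names = conn.execute(
    """
    SELECT COUNT(*),                  -- every row, NULL or not
           COUNT(LAST_NAME),          -- non-NULL values only (ALL is the default)
           COUNT(DISTINCT LAST_NAME)  -- each non-NULL value counted once
    FROM EMPLOYEES
    """
).fetchone()

print(all_rows, non_null_names, distinct_names)  # 8 7 5
```

COUNT(*) is included only for contrast: it counts rows rather than values, so the NULL row is included.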

SUM
Syntax: SUM(e1)
Parameters: e1 is an expression whose datatype is numeric.

The SUM function adds the numeric values in a given column. It accepts only numeric data.

MIN, MAX
Syntax: MIN(e1); MAX(e1)
Parameters: e1 is an expression with a datatype of character, date, or number.

For a given set of rows identified by a SELECT statement, MIN returns the single minimum value, and
MAX returns the single maximum value. MIN and MAX can work with numeric, date, and character data,
and they use the same basic logic that ORDER BY uses for the different datatypes:
• Numeric: Low numbers are MIN; high numbers are MAX.
• Date: Earlier dates are MIN; later dates are MAX.
• Character: ‘A’ is less than ‘Z’; ‘Z’ is less than ‘a’. The string value ‘2’ is greater than the string
value ‘100’. The character ‘1’ is less than the characters ‘10’.
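The ordering rules above can be tried out with a short sketch. It assumes Python's built-in sqlite3 module rather than Oracle, a hypothetical SAMPLES table, dates stored as ISO-8601 text (SQLite has no DATE type), and a plain binary character comparison; an Oracle session with different NLS sort settings could order characters differently.

```python
import sqlite3

# Hypothetical table: one column per datatype discussed above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SAMPLES (N INTEGER, D TEXT, LETTER TEXT, DIGITS TEXT)")
conn.executemany(
    "INSERT INTO SAMPLES VALUES (?, ?, ?, ?)",
    [
        (5, "2010-06-30", "Z", "100"),
        (12, "2009-01-15", "a", "2"),
        (7, "2011-03-01", "A", "10"),
    ],
)

def min_max(column):
    # column comes only from the fixed tuple below, never from user input.
    return conn.execute(f"SELECT MIN({column}), MAX({column}) FROM SAMPLES").fetchone()

results = {column: min_max(column) for column in ("N", "D", "LETTER", "DIGITS")}

print(results["N"])       # (5, 12)
print(results["D"])       # ('2009-01-15', '2011-03-01')
print(results["LETTER"])  # ('A', 'a')   'Z' sorts before 'a'
print(results["DIGITS"])  # ('10', '2')  as strings, '2' is greater than '100'
```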
AVG
Syntax: AVG(e1)
Parameters: e1 is an expression with a numeric datatype.

The AVG function computes the average value for a set of rows. AVG only works with numeric data. It
ignores NULL values. AVG can be used with both DISTINCT and ALL to return the average of the
distinct values in a column versus the average of all its values.

Note! Group functions can be nested in scalar functions. Example: ROUND(AVG(NR)) computes the
average of column NR and then rounds it to a whole number.
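A small sketch of SUM, AVG, AVG(DISTINCT ...), and the ROUND(AVG(NR)) nesting from the note. It assumes Python's built-in sqlite3 module as a stand-in for Oracle; the table name T and its four rows are hypothetical.

```python
import sqlite3

# Hypothetical data: three numbers and one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (NR INTEGER)")
conn.executemany("INSERT INTO T VALUES (?)", [(10,), (20,), (20,), (None,)])

total, average, distinct_average, rounded = conn.execute(
    "SELECT SUM(NR), AVG(NR), AVG(DISTINCT NR), ROUND(AVG(NR)) FROM T"
).fetchone()

print(total)             # 50: the NULL is ignored
print(average)           # 50 / 3, about 16.67: divided by 3 values, not 4 rows
print(distinct_average)  # 15.0: average of the distinct values 10 and 20
print(rounded)           # 17.0: the scalar ROUND applied to the aggregate result
```

Because the NULL is skipped, the average is taken over three values rather than four rows, which is often the detail that surprises people.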

MEDIAN
Syntax: MEDIAN(e1)
Parameters: e1 is an expression with a numeric or date data type.
MEDIAN can operate on numeric or date data types. It ignores NULL values. The MEDIAN function is
somewhat related to AVG. MEDIAN performs as you might expect: from a set of data, MEDIAN returns
either the middle value or, if that isn’t easily identified, then an interpolated value from within the
middle. In other words, MEDIAN will sort the values, and if there is an odd number of values, it will
identify the value in the middle of the list; otherwise, if there is an even number of values, it will locate
the two values in the middle of the list and perform linear interpolation between them to locate a result.
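The odd and even cases can be illustrated outside the database. This sketch uses Python's statistics.median as a stand-in for the arithmetic described above, not Oracle's MEDIAN itself; the helper name median_ignoring_nulls and the sample values are hypothetical, and None plays the role of NULL.

```python
from statistics import median

def median_ignoring_nulls(values):
    # Drop None (standing in for NULL) before taking the median.
    return median(v for v in values if v is not None)

odd_count = [225, 160, 533]          # sorted: 160, 225, 533
even_count = [225, 160, 533, 205]    # sorted: 160, 205, 225, 533
with_null = [225, None, 160, 533]

print(median_ignoring_nulls(odd_count))   # 225: the middle value
print(median_ignoring_nulls(even_count))  # 215.0: halfway between 205 and 225
print(median_ignoring_nulls(with_null))   # 225: the None is ignored
```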

RANK
Syntax: RANK(c1) WITHIN GROUP (ORDER BY e1)
Parameters: c1 is a constant; e1 is an expression with a data type matching the corresponding c1
data type. Numeric and character pairs are allowed.
In this format, the parameters can be repeated so that each constant has a corresponding expression:
c1 pairs with e1, c2 (if included) pairs with e2, and so on. Successive parameters are separated by
commas, as in RANK(c1, c2, c3) WITHIN GROUP (ORDER BY e1, e2, e3). The datatype of each constant
must match the datatype of its corresponding expression.
The RANK function calculates the rank of a value within a group of values. Ranks may not be
consecutive numbers, since tied rows all receive the same rank and each still counts toward the ranks
that follow: if three rows are tied for first, each is ranked 1, and the next row is ranked 4.

For example:
SELECT RANK(300) WITHIN GROUP (ORDER BY SQ_FT) RK FROM SHIP_CABINS;

RK
----------------------------------
6

This answer of 6 tells us that if the literal value 300 were inserted into the SHIP_CABINS table and
sorted with the existing rows by SQ_FT, it would be the sixth row in the listing. In other words, there
are five rows with a SQ_FT value less than 300.
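The reasoning behind the answer of 6 can be reproduced with an ordinary COUNT. The sketch below is an emulation, not the RANK ... WITHIN GROUP syntax itself, which SQLite does not offer in this form. It assumes Python's built-in sqlite3 module, hypothetical SHIP_CABINS rows, and a hypothetical helper named hypothetical_rank.

```python
import sqlite3

# Hypothetical data: five cabins under 300 square feet, two above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SHIP_CABINS (ROOM_NUMBER INTEGER, SQ_FT INTEGER)")
conn.executemany(
    "INSERT INTO SHIP_CABINS VALUES (?, ?)",
    [(102, 160), (103, 180), (104, 205), (105, 225), (106, 225), (107, 533), (108, 586)],
)

def hypothetical_rank(value):
    # Rank of a value that is not in the table: one more than the number
    # of existing rows that would sort strictly before it.
    (rank,) = conn.execute(
        "SELECT COUNT(*) + 1 FROM SHIP_CABINS WHERE SQ_FT < ?", (value,)
    ).fetchone()
    return rank

print(hypothetical_rank(300))  # 6: five rows have SQ_FT below 300
print(hypothetical_rank(225))  # 4: ties with the two existing 225 rows
print(hypothetical_rank(100))  # 1: nothing sorts before it
```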
FIRST, LAST
Syntax: aggregate_function KEEP (DENSE_RANK FIRST ORDER BY e1)
aggregate_function KEEP (DENSE_RANK LAST ORDER BY e1)
Parameters: e1 is an expression with a numeric or character datatype.
The aggregate functions FIRST and LAST are similar. For a given range of sorted values, they return
either the first value (FIRST) or the last value (LAST) of the population of rows defining e1, in the sorted
order.

For example:

SELECT MAX(SQ_FT) KEEP (DENSE_RANK FIRST ORDER BY GUESTS) "Largest"
FROM SHIP_CABINS;

Largest
----------------------
225

In this example, we are doing the following:

• First, we’re sorting all the rows in the SHIP_CABINS table according to the value in the GUESTS
column, and identifying the FIRST value in that sort order, which is a complex way of saying that
we’re identifying the lowest value for the GUESTS column.
• Then, for all rows with a GUESTS value that matches the lowest value we just found, we determine
the MAX value for SQ_FT. In other words, we display the highest number of square feet among all
cabins that accommodate the lowest number of guests according to the GUESTS column.
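The two steps just described can be written as a plain subquery, which may make the KEEP (DENSE_RANK FIRST ...) form easier to read. This is an equivalent-by-construction sketch, not the Oracle syntax: it assumes Python's built-in sqlite3 module (which has no KEEP clause) and hypothetical SHIP_CABINS rows chosen so the FIRST case returns the 225 shown in the listing.

```python
import sqlite3

# Hypothetical data: (ROOM_NUMBER, GUESTS, SQ_FT).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SHIP_CABINS (ROOM_NUMBER INTEGER, GUESTS INTEGER, SQ_FT INTEGER)")
conn.executemany(
    "INSERT INTO SHIP_CABINS VALUES (?, ?, ?)",
    [(102, 2, 160), (103, 2, 225), (104, 4, 533), (105, 3, 205), (106, 5, 586), (107, 5, 480)],
)

# Stand-in for: MAX(SQ_FT) KEEP (DENSE_RANK FIRST ORDER BY GUESTS)
(largest_for_fewest_guests,) = conn.execute(
    """
    SELECT MAX(SQ_FT) FROM SHIP_CABINS
    WHERE GUESTS = (SELECT MIN(GUESTS) FROM SHIP_CABINS)  -- step 1: lowest GUESTS
    """
).fetchone()

# Stand-in for: MIN(SQ_FT) KEEP (DENSE_RANK LAST ORDER BY GUESTS)
(smallest_for_most_guests,) = conn.execute(
    """
    SELECT MIN(SQ_FT) FROM SHIP_CABINS
    WHERE GUESTS = (SELECT MAX(GUESTS) FROM SHIP_CABINS)  -- step 1: highest GUESTS
    """
).fetchone()

print(largest_for_fewest_guests)  # 225
print(smallest_for_most_guests)   # 480
```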

GROUPING - to be discussed in a later chapter


Group Data by Using the GROUP BY Clause

The GROUP BY clause is an optional clause within the SELECT statement. Its purpose is to group
sets of rows together and treat each individual set as a whole. In other words, GROUP BY identifies
subsets of rows within the larger set of rows being considered by the SELECT statement. In this way, it’s
sort of like creating a series of “mini-select” statements within the larger SELECT statement.

All group functions mentioned above become much more useful when combined with the GROUP BY
clause. GROUP BY x finds each distinct value in column x and aggregates the rows that share that
value into one group.

The rules for forming a GROUP BY clause are as follows:

• The GROUP BY can specify any number of valid expressions, including columns of the table.
• Generally the GROUP BY is used to specify columns in the table that will contain common data,
in order to “group” rows together for performing some sort of aggregate function on the set of
rows.
• The only items allowed in the select list of a SELECT that includes a GROUP BY clause are
o Expressions that are specified in the GROUP BY
o Aggregate functions
• Expressions that are specified in the GROUP BY do not have to be included in the SELECT
statement’s select list.
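The rules above can be sketched with three small queries. The example assumes Python's built-in sqlite3 module as a stand-in for Oracle, and the ROOM_STYLE and ROOM_TYPE columns and their values are hypothetical. SQLite is more lenient than Oracle about what the select list may contain, so this sketch shows only the permitted forms, not the errors.

```python
import sqlite3

# Hypothetical data: (ROOM_STYLE, ROOM_TYPE, SQ_FT).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SHIP_CABINS (ROOM_STYLE TEXT, ROOM_TYPE TEXT, SQ_FT INTEGER)")
conn.executemany(
    "INSERT INTO SHIP_CABINS VALUES (?, ?, ?)",
    [
        ("Stateroom", "Standard", 160), ("Stateroom", "Large", 205),
        ("Stateroom", "Standard", 180), ("Suite", "Standard", 533),
        ("Suite", "Royal", 586),
    ],
)

# Select list holds only the grouping expression and aggregate functions.
per_style = conn.execute(
    "SELECT ROOM_STYLE, COUNT(*), MAX(SQ_FT) FROM SHIP_CABINS "
    "GROUP BY ROOM_STYLE ORDER BY ROOM_STYLE"
).fetchall()

# The grouping expression does not have to appear in the select list.
counts_only = conn.execute(
    "SELECT COUNT(*) FROM SHIP_CABINS GROUP BY ROOM_STYLE ORDER BY COUNT(*)"
).fetchall()

# GROUP BY can name more than one expression.
per_style_and_type = conn.execute(
    "SELECT ROOM_STYLE, ROOM_TYPE, COUNT(*) FROM SHIP_CABINS "
    "GROUP BY ROOM_STYLE, ROOM_TYPE ORDER BY ROOM_STYLE, ROOM_TYPE"
).fetchall()

print(per_style)    # [('Stateroom', 3, 205), ('Suite', 2, 586)]
print(counts_only)  # [(2,), (3,)]
print(per_style_and_type)
```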

ORDER BY Revisited

When a GROUP BY is used in a SELECT statement that also includes an ORDER BY clause, the
ORDER BY is somewhat restricted. Its list of columns and/or expressions is limited to the following:
• Expressions specified in the GROUP BY clause
• Expressions specified in the select list, referenced by position, name, or alias
• Aggregate functions, regardless of whether the aggregate function is specified elsewhere in the
SELECT statement
• The functions USER, SYSDATE, and UID

One thing you cannot include in the ORDER BY is a column of the table that isn’t specified
in the GROUP BY clause. That’s not the case for SELECT statements in general: in a scalar SELECT you
can ORDER BY columns in the table whether they are included in the select list or not. But that’s not
true when a GROUP BY is involved. ORDER BY is more limited.
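The permitted ORDER BY forms can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in for Oracle and hypothetical SHIP_CABINS data. SQLite does not enforce the restriction on ungrouped columns the way the text describes for Oracle, so the sketch covers only the allowed cases.

```python
import sqlite3

# Hypothetical data: (ROOM_STYLE, SQ_FT).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SHIP_CABINS (ROOM_STYLE TEXT, SQ_FT INTEGER)")
conn.executemany(
    "INSERT INTO SHIP_CABINS VALUES (?, ?)",
    [("Stateroom", 160), ("Stateroom", 205), ("Stateroom", 180), ("Suite", 533), ("Suite", 586)],
)

# An aggregate that appears nowhere else in the statement.
by_aggregate = conn.execute(
    "SELECT ROOM_STYLE FROM SHIP_CABINS GROUP BY ROOM_STYLE ORDER BY MAX(SQ_FT) DESC"
).fetchall()

# A select-list expression referenced by its alias.
by_alias = conn.execute(
    "SELECT ROOM_STYLE, COUNT(*) AS CABINS FROM SHIP_CABINS "
    "GROUP BY ROOM_STYLE ORDER BY CABINS DESC"
).fetchall()

# A select-list expression referenced by position.
by_position = conn.execute(
    "SELECT ROOM_STYLE, COUNT(*) FROM SHIP_CABINS GROUP BY ROOM_STYLE ORDER BY 2"
).fetchall()

print(by_aggregate)  # [('Suite',), ('Stateroom',)]
print(by_alias)      # [('Stateroom', 3), ('Suite', 2)]
print(by_position)   # [('Suite', 2), ('Stateroom', 3)]
```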
Nesting Functions

Now, remember that there are two general types of functions: single-row, or scalar, functions;
and multirow, or aggregate, functions. Scalar functions return one value for each row encountered by
the SQL statement in which the scalar function is applied. Aggregate functions return one value for
every set of zero or more rows encountered by the SQL statement. The rules for nesting functions differ,
depending on whether you are nesting aggregate or scalar functions.

Scalar functions can be nested multiple times, as long as the datatypes match. The reason is that
scalar functions all operate on the same level of detail within the rows: for every one row, a scalar
function returns one value. Aggregate functions behave differently when nested, because each one
rolls many rows up into a single value and so changes the level of detail.

To sum up:
• You are allowed to nest aggregate functions up to two levels deep.
• Each time you introduce an aggregate function, you are “rolling up” lower-level data into
higher-level summary data.
• Your SELECT statement’s select list must always respect the level of aggregation and can only
include expressions that are all at the same level of aggregation.
• And finally, remember that scalar functions can be nested at any time and have no effect on
the levels of row aggregation.
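The two levels of "rolling up" can be made visible by writing them as separate steps. In Oracle the nested form would be written directly, for example SELECT MAX(AVG(SQ_FT)) FROM SHIP_CABINS GROUP BY ROOM_STYLE. SQLite does not accept an aggregate nested directly inside another, so this sketch, which assumes Python's built-in sqlite3 module and hypothetical data, swaps in an inline view to reach the same result.

```python
import sqlite3

# Hypothetical data: (ROOM_STYLE, SQ_FT).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SHIP_CABINS (ROOM_STYLE TEXT, SQ_FT INTEGER)")
conn.executemany(
    "INSERT INTO SHIP_CABINS VALUES (?, ?)",
    [("Stateroom", 160), ("Stateroom", 200), ("Stateroom", 180), ("Suite", 500), ("Suite", 600)],
)

# Level one: rows are rolled up into one average per ROOM_STYLE.
level_one = conn.execute(
    "SELECT ROOM_STYLE, AVG(SQ_FT) FROM SHIP_CABINS GROUP BY ROOM_STYLE ORDER BY ROOM_STYLE"
).fetchall()

# Level two: the per-group averages are rolled up again into a single value.
# The scalar ROUND wraps the result without changing the level of aggregation.
(level_two,) = conn.execute(
    """
    SELECT ROUND(MAX(AVG_SQ_FT))
    FROM (SELECT AVG(SQ_FT) AS AVG_SQ_FT FROM SHIP_CABINS GROUP BY ROOM_STYLE)
    """
).fetchone()

print(level_one)  # [('Stateroom', 180.0), ('Suite', 550.0)]
print(level_two)  # 550.0
```

At level two there is no longer a ROOM_STYLE value to show, which is why the select list cannot mix the group column with the nested aggregate.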

HAVING clause

The HAVING clause can exclude specific groups of rows defined in the GROUP BY clause. In other
words, it performs the same task as the WHERE clause does for the rest of the SELECT statement. The
difference is that WHERE deals with individual rows, while HAVING deals with groups of rows as defined
in the GROUP BY clause.
The HAVING clause can only be invoked in a SELECT statement where the GROUP BY clause is
present.
If it is included, GROUP BY and HAVING must follow WHERE (if included) and precede ORDER BY
(if included). Table 7-2 shows these relationships.
HAVING can use the same Boolean operators that WHERE does: AND, OR, and NOT. The only
restrictions on HAVING:
• It can only be used in SELECT statements that have a GROUP BY clause.
• It can only compare expressions that reference groups as defined in the GROUP BY clause, and
aggregate functions.
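The division of labor between WHERE and HAVING can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in for Oracle and hypothetical SHIP_CABINS data. WHERE removes individual rows before grouping; HAVING then removes whole groups.

```python
import sqlite3

# Hypothetical data: (ROOM_STYLE, SQ_FT).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SHIP_CABINS (ROOM_STYLE TEXT, SQ_FT INTEGER)")
conn.executemany(
    "INSERT INTO SHIP_CABINS VALUES (?, ?)",
    [
        ("Stateroom", 120), ("Stateroom", 160), ("Stateroom", 205),
        ("Stateroom", 180), ("Suite", 533), ("Suite", 586),
    ],
)

# WHERE only: the 120 sq ft row is dropped, then both groups are reported.
without_having = conn.execute(
    "SELECT ROOM_STYLE, COUNT(*) FROM SHIP_CABINS WHERE SQ_FT > 150 "
    "GROUP BY ROOM_STYLE ORDER BY ROOM_STYLE"
).fetchall()

# Clause order: WHERE, GROUP BY, HAVING, ORDER BY.
# HAVING discards the Suite group because it has fewer than three rows.
with_having = conn.execute(
    "SELECT ROOM_STYLE, COUNT(*) FROM SHIP_CABINS WHERE SQ_FT > 150 "
    "GROUP BY ROOM_STYLE HAVING COUNT(*) >= 3 ORDER BY ROOM_STYLE"
).fetchall()

print(without_having)  # [('Stateroom', 3), ('Suite', 2)]
print(with_having)     # [('Stateroom', 3)]
```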
