Mastering SQL Window Functions - 01
Mastering SQL Window Functions: A Comprehensive Tutorial
Let’s say you have your blocks sorted into rows by color, and
you want to find the biggest block in each row. A Window
Function lets you look at each row of blocks (each group)
separately and pick out the biggest block in that group.
SELECT column_name,
WINDOW_FUNCTION(column_name) OVER (
PARTITION BY column_name
ORDER BY column_name
RANGE/ROWS BETWEEN ... AND ...
)
FROM table_name;
There you have it! You now have a high-level, bird’s-eye view
of how Window Functions work. Of course, we’ll want to look
at some basic examples to tie this all together, so we’ll do
that next.
SELECT
SaleID,
Salesperson,
SaleAmount,
SaleDate,
SUM(SaleAmount) OVER (ORDER BY SaleDate) AS RunningTotal
FROM Sales;
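To try the running-total query end to end, here is a minimal sketch that runs it through Python's built-in sqlite3 module. It assumes SQLite 3.25 or newer (the first version with window functions), and the four Sales rows are invented purely for illustration.

```python
import sqlite3

# In-memory database; window functions need SQLite 3.25 or newer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (SaleID INTEGER, Salesperson TEXT, "
    "SaleAmount INTEGER, SaleDate TEXT)"
)
# Hypothetical sample rows, invented for this sketch.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", 300, "2023-01-01"),
        (2, "Bob", 150, "2023-01-02"),
        (3, "Alice", 200, "2023-01-03"),
        (4, "Bob", 100, "2023-01-04"),
    ],
)

# Each row keeps its own values and gains the sum of all sales so far.
running = conn.execute(
    """
    SELECT SaleID, SaleAmount,
           SUM(SaleAmount) OVER (ORDER BY SaleDate) AS RunningTotal
    FROM Sales
    ORDER BY SaleDate
    """
).fetchall()

for row in running:
    print(row)
```

With these sample rows, the running total grows 300, 450, 650, 750 as the dates advance, while every original row is still present in the output.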
Let’s check out the code and results first, and it will all
become clearer. But first, see if you can spot what changes
in this code compared to the last example.
SELECT
SaleID,
Salesperson,
SaleAmount,
SaleDate,
SUM(SaleAmount) OVER (PARTITION BY Salesperson ORDER BY SaleDate) AS
CumulativeSalePerPerson
FROM Sales;
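The effect of adding PARTITION BY can be checked with a small sketch, again using Python's sqlite3 module (assuming SQLite 3.25 or newer) and made-up Sales rows. The running total restarts for each salesperson instead of accumulating across the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (SaleID INTEGER, Salesperson TEXT, "
    "SaleAmount INTEGER, SaleDate TEXT)"
)
# Hypothetical sample rows, invented for this sketch.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", 300, "2023-01-01"),
        (2, "Bob", 150, "2023-01-02"),
        (3, "Alice", 200, "2023-01-03"),
        (4, "Bob", 100, "2023-01-04"),
    ],
)

# PARTITION BY gives each salesperson an independent running total.
per_person = conn.execute(
    """
    SELECT Salesperson, SaleAmount,
           SUM(SaleAmount) OVER (
               PARTITION BY Salesperson ORDER BY SaleDate
           ) AS CumulativeSalePerPerson
    FROM Sales
    ORDER BY Salesperson, SaleDate
    """
).fetchall()

for row in per_person:
    print(row)
```

Here Alice's total goes 300 then 500, and Bob's starts over at 150 then reaches 250.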
SELECT
SaleID,
Salesperson,
SaleAmount,
SaleDate,
RANK() OVER (ORDER BY SaleAmount DESC) AS SaleRank
FROM Sales;
As we can see in the 9th row, Alice landed the biggest catch,
a sale of 450, which puts her in the top spot! Her other
sales ranked 3rd, 5th, 10th, 12th, and 14th.
-- RANK() Example
SELECT
SaleID,
Salesperson,
SaleAmount,
RANK() OVER (ORDER BY SaleAmount DESC) AS RankByAmount
FROM Sales;
-- DENSE_RANK() Example
SELECT
SaleID,
Salesperson,
SaleAmount,
DENSE_RANK() OVER (ORDER BY SaleAmount DESC) AS DenseRankByAmount
FROM Sales;
-- ROW_NUMBER() Example
SELECT
SaleID,
Salesperson,
SaleAmount,
ROW_NUMBER() OVER (ORDER BY SaleAmount DESC) AS RowNumByAmount
FROM Sales;
-- NTILE() Example
SELECT
SaleID,
Salesperson,
SaleAmount,
NTILE(4) OVER (ORDER BY SaleAmount DESC) AS Quartile
FROM Sales;
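The four ranking functions only differ visibly when there are ties, so the sketch below runs them side by side on a tiny made-up Sales table with one tie, using Python's sqlite3 module (assuming SQLite 3.25 or newer). Two choices here are assumptions of this sketch, not part of the queries above: SaleID is added as a tiebreaker for ROW_NUMBER and NTILE so their output is deterministic, and NTILE(2) is used because the sample has only four rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (SaleID INTEGER, Salesperson TEXT, SaleAmount INTEGER)"
)
# Hypothetical rows; Bob and Carol tie on SaleAmount.
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [(1, "Alice", 450), (2, "Bob", 300), (3, "Carol", 300), (4, "Dave", 200)],
)

ranks = conn.execute(
    """
    SELECT SaleID, SaleAmount,
           RANK()       OVER (ORDER BY SaleAmount DESC)         AS RankByAmount,
           DENSE_RANK() OVER (ORDER BY SaleAmount DESC)         AS DenseRankByAmount,
           ROW_NUMBER() OVER (ORDER BY SaleAmount DESC, SaleID) AS RowNumByAmount,
           NTILE(2)     OVER (ORDER BY SaleAmount DESC, SaleID) AS Half
    FROM Sales
    ORDER BY SaleAmount DESC, SaleID
    """
).fetchall()

for row in ranks:
    print(row)
```

For the tied rows, RANK gives both a 2 and then skips to 4, DENSE_RANK gives both a 2 and continues with 3, and ROW_NUMBER numbers every row uniquely.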
What is a Function?
Functions are predefined in SQL and you use them to
perform operations on data. They let you do things
like aggregating data, formatting strings, extracting
dates, and so on.
function(expression|column) OVER(
[ PARTITION BY expr_list optional]
[ ORDER BY order_list optional]
)
Let's go over the syntax piece by piece:
FROM student_score;
As you can see, we have the minimum and maximum
score across the entire dataset.
Table showing result of window function
Also, note that the same result can be achieved
using subqueries like this:
SELECT *,
(SELECT MAX(score) FROM student_score) AS maximum_score,
(SELECT MIN(score) FROM student_score) AS minimum_score
FROM student_score;
As you can see, the window function version is easier to
comprehend than the subquery method, which looks a bit
more advanced.
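The equivalence of the two approaches can be checked with a short sketch in Python's sqlite3 module (assuming SQLite 3.25 or newer). The window-function form shown here, with an empty OVER (), is an assumed version of the query, and the table columns and rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_score (student_id INTEGER, student_name TEXT, "
    "dep_name TEXT, score INTEGER)"
)
# Hypothetical rows and column layout, invented for this sketch.
conn.executemany(
    "INSERT INTO student_score VALUES (?, ?, ?, ?)",
    [
        (1, "Yusuf", "biochemistry", 70),
        (2, "Muritadoh", "biochemistry", 80),
        (3, "Tolu", "chemistry", 60),
        (4, "Ada", "chemistry", 100),
    ],
)

# An empty OVER () makes the whole result set a single window.
with_window = conn.execute(
    """
    SELECT student_id, score,
           MAX(score) OVER () AS maximum_score,
           MIN(score) OVER () AS minimum_score
    FROM student_score
    ORDER BY student_id
    """
).fetchall()

# Same output using one scalar subquery per aggregate.
with_subquery = conn.execute(
    """
    SELECT student_id, score,
           (SELECT MAX(score) FROM student_score) AS maximum_score,
           (SELECT MIN(score) FROM student_score) AS minimum_score
    FROM student_score
    ORDER BY student_id
    """
).fetchall()

print(with_window == with_subquery)
```

Both queries attach the same overall maximum and minimum to every row; only the way they are written differs.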
You can see from the above image that the aggregate
functions MAX() and AVG() are calculated for each
partition.
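A minimal sketch of per-partition aggregates, run through Python's sqlite3 module (assuming SQLite 3.25 or newer). The student_score columns and rows are assumptions made for this example; the point is only that MAX() and AVG() are computed once per department and repeated on every row of that department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_score (student_id INTEGER, student_name TEXT, "
    "dep_name TEXT, score INTEGER)"
)
# Hypothetical rows, invented for this sketch.
conn.executemany(
    "INSERT INTO student_score VALUES (?, ?, ?, ?)",
    [
        (1, "Yusuf", "biochemistry", 70),
        (2, "Muritadoh", "biochemistry", 80),
        (3, "Tolu", "chemistry", 60),
        (4, "Ada", "chemistry", 100),
    ],
)

# PARTITION BY dep_name: each department is aggregated on its own.
per_dep = conn.execute(
    """
    SELECT student_name, dep_name, score,
           MAX(score) OVER (PARTITION BY dep_name) AS dep_max,
           AVG(score) OVER (PARTITION BY dep_name) AS dep_avg
    FROM student_score
    ORDER BY dep_name, student_id
    """
).fetchall()

for row in per_dep:
    print(row)
```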
Other Examples of Window Functions
Let's go over some of the common window functions
you will work with in SQL.
FROM student_score;
As you can see in the output above, when
using DENSE_RANK, the next rank number (which is 3)
was assigned to Tolu (unlike when using RANK which
assigned Tolu a rank of 4, skipping 3 because of the
tie).
How to Use the LAG Function
LAG is used to return the offset
row before the current
row within a window. By default it returns the
previous row before the current row.
You typically use LAG when you want to compare the
value of a previous row with the current row. It's
commonly applied in time-series analysis. For
example:
SELECT
*,
LAG(score) OVER(PARTITION BY dep_name ORDER BY score)
FROM student_score;
As shown in the first partition, the first record in the
biochemistry partition (Yusuf's) does not have a
previous value (that is, no record comes before it) so
that's why null was returned. Then moving to the next
record – Muritadoh's – it has a previous record, so it
returns the previous value which is 70.
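The LAG behaviour described above can be reproduced with a small sketch in Python's sqlite3 module (assuming SQLite 3.25 or newer). The table layout, the second department, and most scores are made up; only the idea that the first row of each partition gets NULL (None in Python) comes from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_score (student_id INTEGER, student_name TEXT, "
    "dep_name TEXT, score INTEGER)"
)
# Hypothetical rows, invented for this sketch.
conn.executemany(
    "INSERT INTO student_score VALUES (?, ?, ?, ?)",
    [
        (1, "Yusuf", "biochemistry", 70),
        (2, "Muritadoh", "biochemistry", 80),
        (3, "Tolu", "chemistry", 60),
        (4, "Ada", "chemistry", 100),
    ],
)

# LAG(score) looks one row back within the same department.
lagged = conn.execute(
    """
    SELECT student_name, dep_name, score,
           LAG(score) OVER (PARTITION BY dep_name ORDER BY score) AS previous_score
    FROM student_score
    ORDER BY dep_name, score
    """
).fetchall()

for row in lagged:
    print(row)
```

The lowest score in each department has no earlier row, so its previous_score is None; every other row carries the score that precedes it in its own partition.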
How to Use the Frame Clause in ORDER BY
Now you've learned some common window functions
you might work with on a daily basis. So let's move on
to learning another key concept related to the ORDER
BY clause called the frame clause.
A frame clause, as the name implies, defines the
frame (that is, the subset of rows in a window) to which
the function is applied. You use it to specify how many
rows before or after the current row are included in
the calculation, as the SQL engine processes the rows
one after the other.
SELECT
*,
SUM(score) OVER (ORDER BY student_id ROWS BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW) AS cummulative_sum
FROM student_score;
Let's break down the window function code:
SUM(score) OVER (ORDER BY student_id ROWS BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW) AS cummulative_sum
First, in the OVER() clause, we sort the entire window
(which is the whole dataset) by the student id.
Then we specify the frame clause, which is ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This
means that all rows before the current row, plus the
current row itself, are used in the calculation.
The result is shown in the below image:
The first row in the dataset does not have any row
before it. But since the frame ends at CURRENT ROW,
the SQL engine sums just that row's score, which
equals 65.
Moving to the second row, it has 1 row before it.
So the SQL engine adds the score of the first
row, 65, to the current row's score, which is 70. That
is why the result is 135, and so on down the table.
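The walk-through above can be verified with a short sketch in Python's sqlite3 module (assuming SQLite 3.25 or newer). The first two scores, 65 and 70, follow the text; the remaining rows are invented. As a comparison that is not part of the original query, the sketch also includes a narrower frame, ROWS BETWEEN 1 PRECEDING AND CURRENT ROW, to show how the frame bounds change the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student_score (student_id INTEGER, score INTEGER)")
# First two scores match the walk-through; the rest are hypothetical.
conn.executemany(
    "INSERT INTO student_score VALUES (?, ?)",
    [(1, 65), (2, 70), (3, 80), (4, 55)],
)

frames = conn.execute(
    """
    SELECT student_id, score,
           -- Frame: every row from the start up to the current row.
           SUM(score) OVER (
               ORDER BY student_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS cumulative_sum,
           -- Frame: only the previous row and the current row.
           SUM(score) OVER (
               ORDER BY student_id
               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS last_two_sum
    FROM student_score
    ORDER BY student_id
    """
).fetchall()

for row in frames:
    print(row)
```

The cumulative column reads 65, 135, 215, 270, while the two-row frame reads 65, 135, 150, 135 because it drops older rows as it moves down the table.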
When to Use a Window Function
You've learned what window functions are in this
tutorial. Some practical cases where you can use
them are: