Mastering SQL Window Functions - 01

Mastering SQL Window Functions: A Comprehensive Tutorial

Unlock the full potential of SQL Window
Functions with this in-depth guide. From
basic understanding to advanced
techniques, elevate your data analysis and
querying skills to new heights.

What are Window Functions?

Navigating the intricate landscape of database
management and data analysis can sometimes feel like
being an explorer in uncharted territories. There’s a
certain thrill in unearthing hidden patterns and drawing
meaningful insights from raw data. SQL, with its powerful
set of tools, acts as our compass in this journey.

Along the path, we often come across roadblocks or
challenges that seem difficult to unravel with the existing
tools or techniques in our “toolbelt.” One such case that
many of us encounter is the limitations of “aggregation
functions” used with GROUP BY operations. Maybe you were
asked to calculate a new field for each row in your data,
which inherently rules out aggregation function
operations. Maybe you’ve been asked to find running
totals, averages, or other statistical measures for an
incoming stream of data that is constantly changing (AKA
non-static data).

At first glance, SQL Window Functions might seem like
just another set of commands in the extensive SQL
repertoire. However, they hold a secret power, a kind of
hidden brush, that turns rows of data into a canvas of
possibilities. Today, we set out to demystify these powerful
functions, peel back the layers, and reveal the artistry and
efficiency they bring to data analysis.

As we venture through the intricacies of Window
Functions, we’ll uncover their ability to not just answer our
queries, but to tell a story with our data, to find the rhythm
in the rows and the melody in the numbers. Whether
you’re a data novice with a curious mind or a seasoned
SQL maestro, this guide is crafted to guide your steps
through the intricate dance of Window Functions and to
open your eyes to the symphony of data that awaits.

When Should I Use Window Functions (ELI5)


Imagine that you have some building blocks, and each
building block represents some data. Your task requires
you to look at certain groups of blocks or to make new
blocks depending on the existing blocks that you have.

1. You Want to Compare Blocks Without Mixing Them Up

Imagine you want to see if one block is taller than the
blocks right next to it. A Window Function lets you look at
each block and its neighbors without mixing them all up,
so you can easily compare them.

2. You Want to Count or Add Up Blocks in a Row

If you want to count how many blocks TOTAL you have in a
column or add up their numbers, a Window Function can
do that for you, looking at each block one by one and
keeping a running total. It can help you find a running
average of those blocks as well!

3. You Want to Find the Biggest or Smallest Block in a Section

Let’s say you have your blocks sorted in rows by color, and
you want to find the biggest block in each row. A Window
Function helps you look at each row separately and pick
out the biggest block in each one.

4. You Want to Give Blocks a Score or a Rank

If you want to give each block a score or a rank based on
its size or color, a Window Function can do that too. It
looks at all the blocks, sorts them how you want, and then
gives each one a number to show its rank in the overall set
of blocks.

5. You Want to See How Blocks Compare to Their Friends

Maybe you want to see if a block is taller than the average
height of the blocks around it. A Window Function can look
at a block and its buddies, calculate the average height,
and then tell you how that block compares.

Window functions are SQL operations that perform a
calculation across a set of rows that are related to the
current row. Unlike aggregate functions used with GROUP BY,
they do not collapse rows into a single output row; the
rows retain their separate identities. They are called
window functions because they perform a calculation across
a “window” of rows. For instance, you might want to
calculate a running total of sales, or find out the highest
score in a group.

See the image below to see how they compare to
aggregation functions.

The difference between Aggregation Functions and Window Functions (explained simply). Source.
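
If the picture is hard to imagine, the same contrast can be sketched in a few lines of SQL. This is only an illustration: the demo_sales temporary table and its three rows are hypothetical (they are not this guide's sample data), and PostgreSQL syntax is assumed.

```sql
-- Hypothetical demo data, made up for this sketch.
CREATE TEMPORARY TABLE demo_sales (
    salesperson VARCHAR(30),
    sale_amount INT
);

INSERT INTO demo_sales (salesperson, sale_amount) VALUES
    ('Alice', 300),
    ('Bob',   150),
    ('Alice', 200);

-- Aggregate function with GROUP BY: one output row per salesperson.
-- Returns 2 rows: (Alice, 500) and (Bob, 150).
SELECT salesperson,
       SUM(sale_amount) AS total_sales
FROM demo_sales
GROUP BY salesperson;

-- Window function: all 3 input rows are kept, and each row carries
-- its salesperson's total next to its own amount.
-- Returns (Alice, 300, 500), (Alice, 200, 500), (Bob, 150, 150),
-- in no guaranteed order.
SELECT salesperson,
       sale_amount,
       SUM(sale_amount) OVER (PARTITION BY salesperson) AS total_sales
FROM demo_sales;
```

The only difference between the two queries is where the grouping lives: in GROUP BY the rows are merged, while in OVER (PARTITION BY ...) they are only grouped for the calculation.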

Anatomy of a Window Function

A lot of people have very complicated ways to go about
explaining Window Functions, so my goal is to make it
super simple for you!

Imagine you’re on a sightseeing bus, and you’re looking
out the window. You see things one after the other, right?
SQL Window Functions work a bit like that. They look at
your data row by row, but (and here’s the cool part) they
remember what they’ve seen before and what’s coming up
next. It’s like having a photographic memory while
sightseeing!

The hypothetical view from inside a Window Function (Created using DALLE3)

• The Function: This could be a SUM, AVG, MAX,
or any other function you need. It’s generally the
heart of the (mathematical) operation that you
want to perform! These are similar to regular
aggregate functions but do not reduce the
number of rows returned.

• OVER(): This part lets SQL know that we’re
about to do something special, setting the stage
for our Window Function. OVER() is the
cornerstone of window functions in SQL. This
clause lets us designate a “window”, or
subset of data, that the function will process.

• PARTITION BY: (Optional) If you want to
perform your calculations on specific chunks
(groups) of your data, this is how you tell SQL to
divide things up. If no PARTITION BY is specified, the
function treats all rows of the query result set as
a single partition. It works similarly to the
GROUP BY clause, but while GROUP BY
aggregates the data, PARTITION BY doesn’t; it
just groups the data for the purpose of the
window function.

• ORDER BY: (Optional) This orders the rows
within each partition. If no ORDER BY is specified,
the rows have no defined order and the function
treats all rows of the partition as a single group.

The little helpers inside of a Window Function

Let’s look at this in actual SQL Code:

SELECT column_name,
       WINDOW_FUNCTION(column_name) OVER (
           PARTITION BY column_name
           ORDER BY column_name
           RANGE/ROWS BETWEEN ... AND ...
       )
FROM table_name;

There you have it! You have a high-level bird’s eye view of
how Window Functions work. We’ll of course want to look at
some basic examples to tie this all together, so we’ll do
that next.

Examples To Make it Stick

Let’s imagine that we have some simple Sales Data and
line items for this sales data.

Some sample data for us to play with.

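
If you want to run the queries below yourself, you will need a Sales table. The sketch here is an assumption: the column names come from the queries in this guide, and rows 1, 2, 3 and 5 match the amounts discussed in the walkthrough, but row 4 ('Charlie'), all of the dates, and the column types are hypothetical stand-ins for the original sample data.

```sql
-- Hypothetical reconstruction of the Sales table used in the examples.
-- Column names follow the queries in this guide; types are assumed.
CREATE TABLE Sales (
    SaleID      INT PRIMARY KEY,
    Salesperson VARCHAR(30),
    SaleAmount  INT,
    SaleDate    DATE
);

-- Rows 1, 2, 3 and 5 mirror the amounts mentioned in the text.
-- Row 4 and every date are invented for illustration only.
INSERT INTO Sales (SaleID, Salesperson, SaleAmount, SaleDate) VALUES
    (1, 'Alice',   300, '2023-01-01'),
    (2, 'Bob',     150, '2023-01-02'),
    (3, 'Alice',   200, '2023-01-03'),
    (4, 'Charlie', 250, '2023-01-04'),
    (5, 'Bob',     300, '2023-01-05');
```

With only five rows, the later references to a 9th row or a 14th-rank sale will not be reproducible, but every query in this section will run.
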
1. Running Totals

SELECT
SaleID,
Salesperson,
SaleAmount,
SaleDate,
SUM(SaleAmount) OVER (ORDER BY SaleDate) AS RunningTotal
FROM Sales;

This will calculate the running total of SaleAmount for each
row ordered by SaleDate. The results are shown below.
Notice the new column called RunningTotal! There you
have it, we just created a new column! You may have seen
this elsewhere as a “calculated field.”

2. Cumulative Totals (By SalesPerson)

Now what if we wanted to see how each member of the
Sales team was evolving over time? It is quite important to
keep track of numbers (AKA Quotas) in a Sales team, so we
may have a different requirement of actually calculating
something like a Running Total not for the whole dataset,
but rather for each person on the team. How could we
approach this?

Let’s check out the code and results first, and it will all
become clearer. But first, see if you can spot what changes
in this code compared to the last example.

SELECT
    SaleID,
    Salesperson,
    SaleAmount,
    SaleDate,
    SUM(SaleAmount) OVER (PARTITION BY Salesperson ORDER BY SaleDate) AS CumulativeSalePerPerson
FROM Sales;

If we study the new field “CumulativeSalePerPerson” we
see that the pattern is a little harder to spot, but once we
get to the third row it becomes a lot clearer. “Alice” had a
first sale in Row 1 of “300” then she had another sale of
“200” in the third row, so her cumulative sale at that point
was then “500.” Similarly, Bob had sales represented in
the 2nd and 5th rows, which is why he does not reach
“450” until the 5th row, where he scores a sale of “300” to
add to his previous “150.” It’s that simple! Without a
window function we would need a correlated subquery or a
self-join to get the same result, which is far more
cumbersome to write and read.
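
For comparison, here is a rough sketch of what the same per-person running total looks like without a window function. The inline demo_sales rows are hypothetical (only the amounts for Alice and Bob follow the walkthrough above), and PostgreSQL syntax is assumed.

```sql
-- Hypothetical rows inlined as a CTE so the query runs on its own.
WITH demo_sales (sale_id, salesperson, sale_amount, sale_date) AS (
    VALUES
        (1, 'Alice',   300, DATE '2023-01-01'),
        (2, 'Bob',     150, DATE '2023-01-02'),
        (3, 'Alice',   200, DATE '2023-01-03'),
        (4, 'Charlie', 250, DATE '2023-01-04'),
        (5, 'Bob',     300, DATE '2023-01-05')
)
SELECT
    s.sale_id,
    s.salesperson,
    s.sale_amount,
    -- Window function version.
    SUM(s.sale_amount) OVER (
        PARTITION BY s.salesperson
        ORDER BY s.sale_date
    ) AS via_window,
    -- Correlated subquery version: for every row, re-sum all earlier
    -- (or same-day) sales by the same salesperson.
    (SELECT SUM(p.sale_amount)
     FROM demo_sales AS p
     WHERE p.salesperson = s.salesperson
       AND p.sale_date <= s.sale_date) AS via_subquery
FROM demo_sales AS s
ORDER BY s.sale_id;
-- Both columns give 300, 150, 500, 250, 450 for sale_id 1 to 5.
```

Both columns agree, but the subquery has to repeat the partitioning and ordering logic by hand in its WHERE clause.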

Cumulative Sales Performance Over Time of each SalesPerson in our sample data set. (Note: This graph was not generated with SQL)

3. Ranking Sales by SalesAmount

Now imagine if we had a sales competition going on to see
which Salesperson can get the biggest wins (catch the
largest fish).


Obviously, we would want to have an easy way to rank.
Normally, with a regular query, you may be tempted to
simply ORDER BY SaleAmount DESC but then we would lose the
existing order of the rest of the data. This is where
the RANK() function comes in clutch!

SELECT
SaleID,
Salesperson,
SaleAmount,
SaleDate,
RANK() OVER (ORDER BY SaleAmount DESC) AS SaleRank
FROM Sales;

As we can see floating down in the 9th row, Alice caught
the biggest catch of “450” placing her at the top spot! She
also had 3rd, 5th, 10th, 12th, and 14th “rank” catches.

4. Moving Average (3-Day) of SalesAmount

As a busy Sales team, it is important to look for the overall
trends that a team may be progressing towards, in order to
meet sales quotas. If you’re looking for trends rather than
totals, the 3-day moving average smooths out the daily
fluctuations and highlights the overall direction of sales.
It’s like stepping back from a painting to see the big
picture, rather than focusing on each individual
brushstroke.

For the simplicity of this example, we’ve used a 3-day
WINDOW (3-day Moving Average), but it could just as
easily have been a 7-day (Weekly MA), 30-day (Monthly),
or any period of time you decide to look at! (Note: these
window functions get quite long in a single line, so make
sure to break them up with white space for proper code
styling).

SELECT SaleID, SaleDate, Salesperson, SaleAmount,
    AVG(SaleAmount) OVER (
        ORDER BY SaleDate
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS MovingAverage
FROM Sales;

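We used ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING to look at
the row before and the row after each row (AKA our
“Window”). Note that ROWS counts rows, not calendar days,
so this is only a true 3-day average when the data has
exactly one row per day. This sliding frame is probably the
most relevant reason why anyone even thought to call these
functions by this name!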
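
When a day can have several sales, one way to keep the frame aligned with days is to collapse the data to one row per day first and only then apply the window. The sketch below assumes PostgreSQL syntax; the demo_sales rows and the daily CTE name are hypothetical.

```sql
-- Hypothetical rows: two sales fall on the same day.
WITH demo_sales (sale_date, sale_amount) AS (
    VALUES
        (DATE '2023-01-01', 300),
        (DATE '2023-01-01', 100),
        (DATE '2023-01-02', 150),
        (DATE '2023-01-03', 200)
),
-- Step 1: one row per day.
daily AS (
    SELECT sale_date,
           SUM(sale_amount) AS daily_total
    FROM demo_sales
    GROUP BY sale_date
)
-- Step 2: average each day with the day before and the day after.
SELECT sale_date,
       daily_total,
       AVG(daily_total) OVER (
           ORDER BY sale_date
           ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
       ) AS moving_average
FROM daily
ORDER BY sale_date;
-- Daily totals are 400, 150, 200, so the averages work out to
-- 275 (two rows in the frame), 250 (three rows) and 175 (two rows).
```

This still assumes there are no missing days. If a date has no sales at all, its neighbours in the frame will be further apart than one day.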

Now that we have done a couple of laps with window
functions, here is the all-important question you may be
asking yourself…

“Why do these things matter, anyway?”

Why Window Functions Matter

The main difference between Window Functions and
GROUP BY Aggregate Functions is that while an
Aggregate Function returns a single result per group of
rows (like the SUM or AVG of a group), a Window Function
will return a result for every row, often in relation to other
rows in the window (like the running total at each row).
Among my students, this has often been the KEY to
understanding how Window functions work, and
moreover WHY THEY ACTUALLY MATTER.

Mastering SQL window functions is akin to adding a
powerful tool to your data manipulation toolkit. They
provide advanced capabilities for complex data analysis
and reporting, enabling you to draw insights and make
informed decisions. Whether it’s calculating running totals,
ranking results, or comparing individual rows to
aggregated dataset metrics, window functions are
INDISPENSABLE. Embrace them in your SQL journey, and
you’ll find your queries reaching new heights of efficiency
and clarity in no time!

Before we dive even deeper into the intricacies of Window
Functions, it’s worth noting that SQL offers a myriad of
tools and functionalities to elevate your data manipulation
skills. If you’ve been following along in our SQL mastery
series, you might recall our other comprehensive SQL
guide on Mastering SQL Subqueries. Understanding
subqueries is also a crucial step in building a strong
foundation for more advanced SQL topics, including
Window Functions. If you haven’t had a chance to explore
that topic yet, I highly recommend giving it a read (or
saving it to a reading list for a rainy day) to solidify your
understanding and enhance your ability to write complex
SQL queries.

In fact, there are sometimes cases where you could use
EITHER a window function or a Subquery to accomplish
the same task. True SQL Mastery will require you to be
adept at multiple means of coming to an answer and
choosing the best path forward, which also involves
considering which is the most efficient in terms of Query
Optimization (more on that in another part of the
Mastering SQL Series).

Types of Window Functions

Just some of the different (and delicious?) FLAVORS of Window Functions. Created by DALLE3.

Now that we have a solid introduction to Window
Functions, we should take a moment to check out what
flavors of Window functions we have available to expand
our repertoire.

Aggregate Window Functions


These are similar to regular aggregate functions but do not
reduce the number of rows returned. Examples
include SUM(), AVG(), MIN(), MAX(), COUNT().

1. SUM(): This function returns the sum of a numeric
column.

2. AVG(): This function returns the average of a
numeric column.

3. COUNT(): This function returns the number of rows
(or of non-NULL values in a column) in the window.

4. MIN(): This function returns the smallest value of
the selected column.

5. MAX(): This function returns the largest value of
the selected column.
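
Here is a small sketch that puts all five on one screen. The demo_sales rows are hypothetical and PostgreSQL syntax is assumed.

```sql
-- Hypothetical rows inlined so the query runs on its own.
WITH demo_sales (salesperson, sale_amount) AS (
    VALUES
        ('Alice', 300),
        ('Bob',   150),
        ('Alice', 200),
        ('Bob',   300)
)
SELECT
    salesperson,
    sale_amount,
    COUNT(*)         OVER (PARTITION BY salesperson) AS sales_count,
    SUM(sale_amount) OVER (PARTITION BY salesperson) AS total_sales,
    AVG(sale_amount) OVER (PARTITION BY salesperson) AS average_sale,
    MIN(sale_amount) OVER (PARTITION BY salesperson) AS smallest_sale,
    MAX(sale_amount) OVER (PARTITION BY salesperson) AS largest_sale
FROM demo_sales;
-- All four rows are returned. Each Alice row shows count 2, total 500,
-- average 250, min 200, max 300. Each Bob row shows count 2, total 450,
-- average 225, min 150, max 300.
```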

Ranking Window Functions

These functions assign a rank or position to each row within
a partition of a result set (or the overall data set). Examples
are ROW_NUMBER(), RANK(), DENSE_RANK(), NTILE().

1. RANK(): This function assigns a rank to each row
within the partition of a result set. The ranks are
assigned in the order specified in the ORDER BY clause
of the OVER() clause. If two or more rows tie for a
rank, each tied row receives the same rank, and
the next rank(s) are skipped.

2. DENSE_RANK(): This function works similarly
to RANK(), but when two or more rows tie for a
rank, the next rank is not skipped. So if you have
three items at rank 2, the next rank listed would
be 3.

3. ROW_NUMBER(): This function assigns a unique row
number to each row within the partition,
regardless of duplicates. If there are duplicate
values in the ordered set, it will still assign
different row numbers to each row.

4. NTILE(): This function divides an ordered
partition into a specified number of groups, or
"tiles", and assigns a group number to each row
in the partition. This can be useful for things like
dividing a dataset into quartiles, deciles, or any
other set of groups that are as close to evenly sized
as possible.

Take a look at the various Ranking functions side-by-side
below to see how they might look in code.

-- RANK() Example
SELECT
SaleID,
Salesperson,
SaleAmount,
RANK() OVER (ORDER BY SaleAmount DESC) AS RankByAmount
FROM Sales;

-- DENSE_RANK() Example
SELECT
SaleID,
Salesperson,
SaleAmount,
DENSE_RANK() OVER (ORDER BY SaleAmount DESC) AS DenseRankByAmount
FROM Sales;

-- ROW_NUMBER() Example
SELECT
SaleID,
Salesperson,
SaleAmount,
ROW_NUMBER() OVER (ORDER BY SaleAmount DESC) AS RowNumByAmount
FROM Sales;

-- NTILE() Example
SELECT
SaleID,
Salesperson,
SaleAmount,
NTILE(4) OVER (ORDER BY SaleAmount DESC) AS Quartile
FROM Sales;
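
The differences between these four only show up when there is a tie, so here is a sketch that runs them in one query over data with two equal amounts. The demo_sales rows are hypothetical and PostgreSQL syntax is assumed.

```sql
-- Hypothetical rows: sale_id 2 and 3 tie on amount.
WITH demo_sales (sale_id, sale_amount) AS (
    VALUES
        (1, 450),
        (2, 300),
        (3, 300),
        (4, 200),
        (5, 150)
)
SELECT
    sale_id,
    sale_amount,
    RANK()       OVER (ORDER BY sale_amount DESC) AS rank_by_amount,
    DENSE_RANK() OVER (ORDER BY sale_amount DESC) AS dense_rank_by_amount,
    ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS row_num_by_amount,
    NTILE(2)     OVER (ORDER BY sale_amount DESC) AS half
FROM demo_sales
ORDER BY sale_amount DESC, sale_id;
-- rank_by_amount:       1, 2, 2, 4, 5  (rank 3 is skipped after the tie)
-- dense_rank_by_amount: 1, 2, 2, 3, 4  (no gap)
-- row_num_by_amount:    1 to 5, each used once; which tied row gets 2
--                       and which gets 3 is not guaranteed
-- half:                 1, 1, 1, 2, 2  (5 rows split into groups of 3 and 2)
```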

Value Window Functions

These functions return specific values from each partition,
allowing you to compare or calculate differences between
values in a result set.

Examples are FIRST_VALUE(), LAST_VALUE(), LEAD(), LAG().

1. FIRST_VALUE(): This function returns the first value
in an ordered set of values from a partition. For
example, you could use this function to find the
initial sale made by a salesperson.

2. LAST_VALUE(): This function returns the last value
in an ordered set of values from a partition. It
can be used to find the most recent sale amount
for a particular product. Note that it only looks as
far as the end of the window frame, so the frame
usually has to be extended to the end of the partition.

3. LEAD(): This function allows you to access data
from subsequent rows in the same result set,
providing a way to compare a current value with
values from following rows. It’s useful for
calculating the difference in sales amounts
between two consecutive days.

4. LAG(): Similar to LEAD(), the LAG() function lets you
access data from previous rows in the result set,
without the need for a self-join. This can be
handy for comparing current data with historical
data.

These functions are powerful tools for data
analysis, enabling you to navigate through your
data and gain insights from specific data points
in relation to others.

-- FIRST_VALUE() and LAST_VALUE() Example
SELECT
    SaleID,
    Salesperson,
    SaleAmount,
    FIRST_VALUE(SaleAmount) OVER (ORDER BY SaleDate) AS FirstSaleAmount,
    -- The frame is widened to the whole partition so that LAST_VALUE()
    -- sees the final row rather than stopping at the current row.
    LAST_VALUE(SaleAmount) OVER (
        ORDER BY SaleDate
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS LastSaleAmount
FROM Sales;
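
You may wonder why LAST_VALUE() needs the explicit frame when FIRST_VALUE() does not. Here is a sketch of what happens with and without it, using hypothetical inline rows and assuming PostgreSQL syntax.

```sql
-- Hypothetical rows inlined so the query runs on its own.
WITH demo_sales (sale_date, sale_amount) AS (
    VALUES
        (DATE '2023-01-01', 300),
        (DATE '2023-01-02', 150),
        (DATE '2023-01-03', 200)
)
SELECT
    sale_date,
    sale_amount,
    -- With ORDER BY and no frame clause, the frame ends at the current
    -- row, so the "last" value is just the current row's own amount.
    LAST_VALUE(sale_amount) OVER (
        ORDER BY sale_date
    ) AS last_default_frame,
    -- With the frame extended to the end, every row sees the final sale.
    LAST_VALUE(sale_amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_full_frame
FROM demo_sales
ORDER BY sale_date;
-- last_default_frame: 300, 150, 200
-- last_full_frame:    200, 200, 200
```

FIRST_VALUE() is not affected in the same way because the default frame already starts at the first row of the partition.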

-- LEAD() and LAG() Example
SELECT
SaleID,
Salesperson,
SaleAmount,
LAG(SaleAmount) OVER (ORDER BY SaleDate) AS PreviousSaleAmount,
LEAD(SaleAmount) OVER (ORDER BY SaleDate) AS NextSaleAmount
FROM Sales;
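
A common next step is to subtract the previous value from the current one to get the change between consecutive rows. A minimal sketch, with hypothetical inline rows and PostgreSQL syntax assumed:

```sql
-- Hypothetical rows inlined so the query runs on its own.
WITH demo_sales (sale_date, sale_amount) AS (
    VALUES
        (DATE '2023-01-01', 300),
        (DATE '2023-01-02', 150),
        (DATE '2023-01-03', 200)
)
SELECT
    sale_date,
    sale_amount,
    LAG(sale_amount) OVER (ORDER BY sale_date) AS previous_amount,
    -- Current amount minus the previous row's amount.
    sale_amount - LAG(sale_amount) OVER (ORDER BY sale_date) AS change_from_previous
FROM demo_sales
ORDER BY sale_date;
-- change_from_previous: NULL, -150, 50
-- The first row has no previous row, so LAG() returns NULL and the
-- subtraction is NULL as well.
```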

Window Frame Specification

This concept refers to the subset of rows used to perform
the calculations for a specific row. The window frame can
be specified using the ROWS or RANGE clause, and it can be
unbounded (considering all rows) or limited to a specific
range.

ROWS: Defines the window frame in terms of physical rows.
You can specify a fixed number of rows, or use UNBOUNDED
PRECEDING and UNBOUNDED FOLLOWING to include all rows.

RANGE: Defines the window frame in terms of the values of
the ORDER BY column rather than row positions, so rows that
share the same ORDER BY value are always included together.
Similar to ROWS, you can specify a range or use UNBOUNDED
options.

-- ROWS Window Frame Specification
SELECT
    SaleID,
    Salesperson,
    SaleAmount,
    AVG(SaleAmount) OVER (
        ORDER BY SaleDate
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS MovingAvg
FROM Sales;

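-- RANGE Window Frame Specification
-- Sums every sale whose amount is within 50 of the current row's amount
-- (this is a "nearby values" sum, not a cumulative sum).
SELECT
    SaleID,
    Salesperson,
    SaleAmount,
    SUM(SaleAmount) OVER (
        ORDER BY SaleAmount
        RANGE BETWEEN 50 PRECEDING AND 50 FOLLOWING
    ) AS NearbyAmountSum
FROM Sales;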

Window frame specification is crucial when you want to
perform calculations across a specific set of rows related
to the current row, rather than the entire partition.
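
The practical difference between ROWS and RANGE is easiest to see when two rows share the same ORDER BY value. The sketch below uses hypothetical inline rows and assumes PostgreSQL syntax.

```sql
-- Hypothetical rows: sale_id 2 and 3 share the same date.
WITH demo_sales (sale_id, sale_date, sale_amount) AS (
    VALUES
        (1, DATE '2023-01-01', 100),
        (2, DATE '2023-01-02', 200),
        (3, DATE '2023-01-02', 300),
        (4, DATE '2023-01-03', 400)
)
SELECT
    sale_id,
    sale_date,
    sale_amount,
    -- ROWS: the frame grows by exactly one physical row at a time.
    SUM(sale_amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS rows_sum,
    -- RANGE: rows with the same sale_date enter the frame together.
    SUM(sale_amount) OVER (
        ORDER BY sale_date
        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS range_sum
FROM demo_sales
ORDER BY sale_date, sale_id;
-- range_sum: 100, 600, 600, 1000 (both 2023-01-02 rows see 600)
-- rows_sum:  starts at 100 and ends at 1000; the two tied rows get
--            different running totals, and which one comes first
--            is not guaranteed because the window only orders by date.
```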

Troubleshooting Window Functions

If your window function isn’t working as expected,
consider the following:

• Check your OVER clause: The OVER clause
determines how the window function behaves.
Make sure that you've specified the PARTITION BY
and ORDER BY clauses correctly.

• Review your function’s syntax: Each window
function has its own syntax. Be sure to review
the syntax of the function you’re using to ensure
it’s correct.

• Examine the data types: Make sure the data
types you’re using in the function are
compatible. For example, you can’t perform
a SUM operation on a text field (or a column with
a hidden string value).

Optimizing Window Functions

Window functions can result in slow queries, because they
typically need to sort and scan the rows of each partition.
Here are some tips to optimize your window functions:

• Reduce the number of rows: If you can, filter
your data before applying the window function.
The fewer rows the function has to work with,
the faster your query will run. This is the best
way to make sure that you can work more
efficiently to debug and run your code, before
releasing the beast on the full breadth of your
data.

• Use appropriate indexing: If you’re
partitioning or ordering your data, ensure that
appropriate indexes exist for those columns. This
can significantly speed up the performance of
your window function.

• Avoid complex ordering: If possible, try to
avoid using multiple columns in your ORDER BY
clause within the window function. Each
additional column can increase the computation
time.

• Limit the window frame: Without an ORDER BY,
window functions consider all rows in the
partition; with an ORDER BY, the default frame
runs from the start of the partition to the
current row. If you don’t need to consider all of
those rows, use the ROWS or RANGE clause to
limit the window frame.
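
The first two tips can be sketched together as follows. Everything here is an assumption for illustration: the demo_sales temporary table, its rows, and the index name are hypothetical, PostgreSQL syntax is assumed, and whether the planner actually uses the index depends on your database and data volume.

```sql
-- Hypothetical demo table, made up for this sketch.
CREATE TEMPORARY TABLE demo_sales (
    sale_id     INT PRIMARY KEY,
    salesperson VARCHAR(30),
    sale_amount INT,
    sale_date   DATE
);

INSERT INTO demo_sales (sale_id, salesperson, sale_amount, sale_date) VALUES
    (1, 'Alice',   300, '2023-01-01'),
    (2, 'Bob',     150, '2023-01-02'),
    (3, 'Alice',   200, '2023-01-03'),
    (4, 'Charlie', 250, '2023-01-04'),
    (5, 'Bob',     300, '2023-01-05');

-- Tip "Use appropriate indexing": an index whose columns match the
-- PARTITION BY column followed by the ORDER BY column.
CREATE INDEX idx_demo_sales_person_date
    ON demo_sales (salesperson, sale_date);

-- Tip "Reduce the number of rows": the WHERE clause runs before the
-- window function, so only the filtered rows are numbered.
WITH numbered AS (
    SELECT
        sale_id,
        salesperson,
        sale_amount,
        sale_date,
        ROW_NUMBER() OVER (
            PARTITION BY salesperson
            ORDER BY sale_date
        ) AS sale_number
    FROM demo_sales
    WHERE sale_date >= DATE '2023-01-02'
)
-- A window function's result cannot be used in the WHERE clause of the
-- same query level, so the filter on sale_number goes in an outer query.
SELECT *
FROM numbered
WHERE sale_number = 1;
-- Returns each salesperson's first sale on or after 2023-01-02:
-- sale_id 2 (Bob), 3 (Alice) and 4 (Charlie).
```

Keep in mind that filtering first changes what the window sees: a running total computed after the WHERE clause starts from the first surviving row, not from the start of the table.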

With these advanced window functions and concepts at
your disposal, you can perform complex transformations
and calculations on your data, making your SQL queries
more powerful and insightful. Whether you’re ranking
results, calculating running totals, or accessing specific
values within a partition, window functions provide the
flexibility and functionality needed for advanced data
analysis. Here is a handy-dandy cheat sheet (Source:
learnsql.com) you can always use for reference, now that
we’ve gone over how to do them in-depth.

We’ve navigated quite an intricate terrain of SQL window
functions, uncovering their ability to transform complex
data analysis into more manageable tasks. These advanced
functions not only streamline our queries but also open up
a world of possibilities for data exploration and reporting.
As you continue to incorporate window functions into your
SQL repertoire, remember that the key to mastery is
practice and experimentation. So, dive in, explore, and let
window functions be your guide in the realm of advanced
SQL querying.

Window functions are an advanced type of function in SQL.
They let you work with observations more easily.

Window functions give you access to features like
advanced analytics and data manipulation without
the need to write complex queries.

In this lesson you will learn about what window
functions are and how they work. Without further ado,
let's get started.

What is a Window Function?


Before learning exactly what a window function is,
let's define the meaning of a term that will appear
frequently in this article: result set.

In SQL, a result set is the data or result that is
returned from a query. That is, it's the result (table) of
running the code of a select statement.

For you to understand what a window function is, let's
break the words down into pieces.

What exactly is a window in SQL?


A window is basically a set of rows or observations in
a table or result set. In a table you may have more
than one window depending on how you specify the
query – you will learn about this shortly. A window is
defined using the OVER() clause in SQL.
You will learn how to determine the number of
windows in a result set later in this article.

What is a Function?
Functions are predefined in SQL and you use them to
perform operations on data. They let you do things
like aggregating data, formatting strings, extracting
dates, and so on.

So window functions are SQL functions that enable
us to perform operations on a window, that is, a set
of records.

The interesting thing about window functions is that
with them you can specify the windows you want to
apply the function on. For example, we can partition
the full result set into various groups/windows.

Before we go into the syntax of Window functions,
let's have a look at the categories of window
functions.

Different Types of Window Functions

There are a lot of window functions that exist in SQL
but they are primarily categorized into 3 different
types:

• Aggregate window functions
• Value window functions
• Ranking window functions

Aggregate window functions are used to perform
operations on sets of rows in a window(s). They
include SUM(), MAX(), COUNT(), and others.

Ranking window functions are used to rank rows in a
window(s). They
include RANK(), DENSE_RANK(), ROW_NUMBER(), and others.

Value window functions return a value taken from a
particular row in a window, such as the previous, next,
first, or last row, rather than combining many rows into
one result the way aggregate functions do. They include
things like LAG(), LEAD(), FIRST_VALUE(), and others. We
will see their usefulness later in the section.

Sample Table

In this tutorial you will be working with a table
called student_score which contains data such
as student_id, student_name, dep_name and score.
You can create the table using the following code:

DROP TABLE IF EXISTS student_score;

CREATE TABLE student_score (
    student_id SERIAL PRIMARY KEY,
    student_name VARCHAR(30),
    dep_name VARCHAR(40),
    score INT
);

INSERT INTO student_score VALUES (11, 'Ibrahim', 'Computer Science', 80);
INSERT INTO student_score VALUES (7, 'Taiwo', 'Microbiology', 76);
INSERT INTO student_score VALUES (9, 'Nurain', 'Biochemistry', 80);
INSERT INTO student_score VALUES (8, 'Joel', 'Computer Science', 90);
INSERT INTO student_score VALUES (10, 'Mustapha', 'Industrial Chemistry', 78);
INSERT INTO student_score VALUES (5, 'Muritadoh', 'Biochemistry', 85);
INSERT INTO student_score VALUES (2, 'Yusuf', 'Biochemistry', 70);
INSERT INTO student_score VALUES (3, 'Habeebah', 'Microbiology', 80);
INSERT INTO student_score VALUES (1, 'Tomiwa', 'Microbiology', 65);
INSERT INTO student_score VALUES (4, 'Gbadebo', 'Computer Science', 80);
INSERT INTO student_score VALUES (12, 'Tolu', 'Computer Science', 67);

Syntax for Window Functions

In a simple expression, a window function looks like
this:

function(expression|column) OVER(
    [ PARTITION BY expr_list ]
    [ ORDER BY order_list ]
)

Let's go over the syntax piece by piece:

function(expression|column) is the window function such as
SUM() or RANK().

OVER() specifies that the function before it is a window
function, not an ordinary one. So when the SQL engine
sees the OVER clause it will know that the function
before it is a window function.

The OVER() clause has some parameters which are
optional depending on what you want to achieve. The
first one is PARTITION BY.

The PARTITION BY clause divides the result set into
different partitions/windows. For example, if you specify
the PARTITION BY clause with a column(s), then the result
set will be divided into different windows, one for each
value of that column(s).

The expr_list in the PARTITION BY clause is:

expression | column_name [, expr_list ]

This means that PARTITION BY can take an
expression, a column, or more than one occurrence of
an expression or column, separated by
commas. For example, PARTITION BY column1, column2.

The next parameter, ORDER BY, is used to sort the
observations in a window. The ORDER BY clause
takes order_list which is:

expression | column_name [ ASC | DESC ]
[ NULLS FIRST | NULLS LAST ][, order_list ]

where order_list can be an expression or column name,
and you can also specify the sort order (either
ascending or descending) and whether null
values sort first or last. The ORDER BY can also take
many expressions or column names.

As stated earlier, the OVER() clause is used to specify
the window in a result set. One thing to note is that if
no parameter is specified in the OVER() clause, the
whole result set is treated as a single window.

You use the PARTITION BY parameter to determine the
number of windows, and the ORDER BY parameter to sort
the rows within each window. Let's go over an example.

How to Use a Window Function – Example

Let's go over an example of how to use a window
function. Say for instance you want to compare the
minimum score and maximum score from all the
records in the table we created earlier. You can do
that using a window function as shown below.

Remember that not specifying a partition clause in
the OVER clause will cause the window to span
the entire dataset.

SELECT
    *,
    MAX(score) OVER() AS maximum_score,
    MIN(score) OVER() AS minimum_score
FROM student_score;

As you can see, we have the minimum and maximum
score across the entire dataset.

Table showing result of window function

Also, note that the above query can be also achieved
using subqueries like this:

SELECT *,
    (SELECT MAX(score) FROM student_score) AS maximum_score,
    (SELECT MIN(score) FROM student_score) AS minimum_score
FROM student_score;

As you can see, the window function is easier to
comprehend than the subquery method, which needs a
separate SELECT for every calculated column.

How to Use a Window Function with PARTITION BY
Say, for instance, that you want to split the dataset
into different partitions. Then you want to compare
each record in each partition with an aggregate value
or a calculated value of each partition. You can
specify the PARTITION BY clause in the OVER clause.

For example, say you want to compare the maximum
score and average score in each department with the
individual score. You can do this by specifying
the PARTITION BY clause in the OVER clause and
using it with the aggregate function you need to
achieve your desired result.

SELECT
    *,
    MAX(score) OVER(PARTITION BY dep_name) AS dep_maximum_score,
    ROUND(AVG(score) OVER(PARTITION BY dep_name), 2) AS dep_average_score
FROM student_score;

You can see that the PARTITION BY clause specified in
the OVER() clause split the result set into 4 different
partitions. This is because there are 4 different
departments in the dep_name column (which
are Biochemistry, Computer Science, Industrial Chemistry, and
Microbiology).
Now after the PARTITION BY clause, you can then
calculate the aggregate function for each record in
the different departments.

You can see from the above image that the aggregate
functions MAX() and AVG() are calculated for each
partition.
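
Since the partition's average sits on every row, you can also do arithmetic with it directly. The sketch below assumes the student_score table created earlier and PostgreSQL syntax; the diff_from_dep_average column name is just an illustrative choice.

```sql
-- Assumes the student_score table from the Sample Table section.
SELECT
    student_name,
    dep_name,
    score,
    ROUND(AVG(score) OVER (PARTITION BY dep_name), 2) AS dep_average_score,
    -- How far each student is above or below the department average.
    score - ROUND(AVG(score) OVER (PARTITION BY dep_name), 2) AS diff_from_dep_average
FROM student_score
ORDER BY dep_name, score DESC;
-- Example: the four Computer Science scores (90, 80, 80, 67) average
-- 79.25, so Joel's row shows a difference of 10.75.
```
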
Other Examples of Window Functions
Let's go over some of the common window functions
you will work with in SQL.

How to Use the ROW_NUMBER Function


You use ROW_NUMBER() to assign serial numbers to
records in a window. Say we want to assign serial
numbers to the records in a partition. For example,
we want to add row numbers to the dataset based on
their names in alphabetical order. You can do that
using the following code:
SELECT
*,
ROW_NUMBER() OVER(ORDER BY student_name) AS name_serial_number
FROM student_score;

As you can see from the above image,
the student_name with the smallest value (that is, the
one that falls earliest in the alphabet) is Gbadebo since
it starts with G. It gets 1 as its row number,
followed by the name that begins with H, and
so on.
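
The query above numbers the whole table as a single window. To restart the numbering inside each department, a PARTITION BY can be added, as in this sketch (it assumes the student_score table created earlier and PostgreSQL syntax; the alias is illustrative).

```sql
-- Assumes the student_score table from the Sample Table section.
SELECT
    *,
    ROW_NUMBER() OVER (
        PARTITION BY dep_name
        ORDER BY student_name
    ) AS name_serial_in_dep
FROM student_score
ORDER BY dep_name, student_name;
-- The numbering restarts at 1 in every department. In Biochemistry,
-- for example, Muritadoh gets 1, Nurain gets 2 and Yusuf gets 3.
```
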
How to Use the RANK Function
RANK(), as the name implies, lets you rank
observations in a window but with gaps. Let's see
what this means:

SELECT
    *,
    RANK() OVER(PARTITION BY dep_name ORDER BY score DESC)
FROM student_score;

As you can see in the above code, the result set was
partitioned into different windows based on the
department column. Then we used the ORDER
BY clause to sort the student records based on their
score in descending order in each partition. After that,
we applied the RANK function.

Now concerning the gaps, as you can see in the
highlighted part in the above image, two records in
the Computer Science department have the same
score (80). This caused both to be ranked with the
value 2 (instead of one being ranked 2 and the other
3). Because two rows share rank 2, RANK skips rank 3
and gives the next row a rank of 4, which is the gap.

You can avoid this scenario using another window
function called DENSE_RANK that ranks observations in a
window without these gaps.

How to Use the DENSE_RANK Function

DENSE_RANK is similar to RANK except that it ranks
observations in a window without gaps.

SELECT
    *,
    DENSE_RANK() OVER(PARTITION BY dep_name ORDER BY score DESC)
FROM student_score;

As you can see in the output above, when
using DENSE_RANK, the next rank number (which is 3)
was assigned to Tolu (unlike when using RANK which
assigned Tolu a rank of 4, skipping 3 because of the
tie).

How to Use the LAG Function

LAG is used to return a value from a row at a given
offset before the current row within a window. By
default it returns the value from the row immediately
before the current row.
You typically use LAG when you want to compare the
value of a previous row with the current row. It's
commonly applied in time-series analysis. For
example:
SELECT
*,
LAG(score) OVER(PARTITION BY dep_name ORDER BY score)
FROM student_score;
As shown in the first partition, the first record in the
biochemistry partition (Yusuf's) does not have a
previous value (that is, no record comes before it),
which is why NULL was returned. Moving to the next
record, Muritadoh's, there is a previous record, so
LAG returns that record's score, which is 70.
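The sketch below shows LAG with its default offset, with an explicit offset of 2, and in a simple comparison against the current row. It runs through Python's built-in sqlite3 module (assuming SQLite 3.25 or newer). The rows are made-up sample data: only Yusuf's score of 70 matches the tutorial's description, and the other scores and the column aliases are hypothetical.

```python
import sqlite3

# Made-up sample rows for illustration; this is not the tutorial's actual table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_score ("
    "student_id INTEGER, student_name TEXT, dep_name TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO student_score VALUES (?, ?, ?, ?)",
    [
        (1, "Yusuf", "Biochemistry", 70),
        (2, "Muritadoh", "Biochemistry", 75),
        (3, "Ada", "Biochemistry", 85),
    ],
)

query = """
SELECT
    student_name,
    score,
    -- Default offset of 1: the score from the previous row in the window.
    LAG(score)    OVER (PARTITION BY dep_name ORDER BY score) AS previous_score,
    -- Explicit offset of 2: the score from two rows back.
    LAG(score, 2) OVER (PARTITION BY dep_name ORDER BY score) AS two_rows_back,
    -- Compare the current row with the previous one.
    score - LAG(score) OVER (PARTITION BY dep_name ORDER BY score) AS gain
FROM student_score
ORDER BY dep_name, score;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('Yusuf', 70, None, None, None)
# ('Muritadoh', 75, 70, None, 5)
# ('Ada', 85, 75, 70, 10)
```

SQL NULL shows up as None in Python. Any arithmetic on NULL is also NULL, which is why the first row has no gain.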
How to Use the Frame Clause in ORDER BY
Now you've learned some common window functions
you might work with on a daily basis. So let's move on
to learning another key concept related to the ORDER
BY clause called the frame clause.
A frame clause, as the name implies, provides the
frame (that is, the subset of rows in a window) on
which the function is to be applied. You use it to
specify which rows before or after the current row
are included in the calculation along with the
current row. The frame is worked out again for each
row, since the SQL engine processes the rows one
after the other.

Now before we look into how to specify a frame
clause, let's look at some of the frame clause's
assumptions:

1. First, a frame clause does not apply to ranking
functions. A ranking function only ranks the
observations in the window based on the ORDER
BY clause.
2. When using an aggregate window function, you
don't have to include the ORDER BY clause. But when
you do use ORDER BY, it's a best practice to
specify the frame clause explicitly. Without one,
most SQL engines fall back to a default frame that
runs from the start of the partition to the current
row and also includes every row tied with the
current row in the ORDER BY column, which may not
be the result you expect. If you are not ordering
the observations in the window, the aggregate is
computed over the whole partition and you
don't need to specify a frame clause.
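The second point is easiest to see when the ORDER BY column contains ties. The sketch below compares three windows over the same data: no ORDER BY, ORDER BY with no frame clause, and ORDER BY with an explicit ROWS frame. It runs through Python's built-in sqlite3 module (assuming SQLite 3.25 or newer, whose default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW). The rows and column aliases are made up for illustration.

```python
import sqlite3

# Made-up sample rows for illustration; two students tie on a score of 80.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_score ("
    "student_id INTEGER, student_name TEXT, dep_name TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO student_score VALUES (?, ?, ?, ?)",
    [
        (1, "Yusuf", "Biochemistry", 70),
        (2, "Bayo", "Biochemistry", 80),
        (3, "Chidi", "Biochemistry", 80),
        (4, "Ada", "Biochemistry", 90),
    ],
)

query = """
SELECT
    score,
    -- No ORDER BY: the aggregate covers the whole window on every row.
    SUM(score) OVER () AS total,
    -- ORDER BY without a frame clause: the default frame includes tied rows.
    SUM(score) OVER (ORDER BY score) AS default_frame_sum,
    -- Explicit ROWS frame: strictly one more row at a time.
    SUM(score) OVER (
        ORDER BY score
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS rows_frame_sum
FROM student_score
ORDER BY score, rows_frame_sum;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (70, 320, 70, 70)
# (80, 320, 230, 150)
# (80, 320, 230, 230)
# (90, 320, 320, 320)
```

With the default frame, both rows that score 80 report 230, because each one's frame already contains the other. The explicit ROWS frame yields a true row-by-row running total (150, then 230).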
You can specify a frame clause using one of two
keywords: ROWS or RANGE. In this part you will learn
how to use the ROWS keyword, since it is commonly
used to specify a frame clause. The RANGE keyword is
beyond the scope of this article.
The ROWS clause defines the frame in terms of
physical row offsets from the current row. That is, it
specifies which rows are used together with the
current row in the calculation.
For example, the frame clause ROWS BETWEEN
1 PRECEDING AND 1 FOLLOWING defines a frame that
includes the current row, the 1 row preceding it, and
the 1 row following it.
Let's look at the keywords that you can use in
conjunction with the ROWS clause:
1. N PRECEDING is the keyword you use to specify how
many rows before the current row are included in
the calculation. For example, 3 PRECEDING means
the 3 rows preceding the current row.
2. N FOLLOWING works like N PRECEDING except that it
counts in the opposite direction. N
FOLLOWING specifies the number of rows after the
current row.
3. UNBOUNDED PRECEDING means all rows before the
current row.
4. UNBOUNDED FOLLOWING means all rows after the
current row.
5. CURRENT ROW is used to specify the current row.
For example, let's look at the frame clause below:

ROWS BETWEEN 2 PRECEDING AND CURRENT ROW uses up to
2 rows before the current row, along with the
current row itself, for the calculation. Near the
start of the window, where fewer than 2 earlier rows
exist, it uses however many there are.
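To make the bounded frames concrete, the sketch below counts and sums the rows inside ROWS BETWEEN 2 PRECEDING AND CURRENT ROW, and also sums over ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING. It runs through Python's built-in sqlite3 module (assuming SQLite 3.25 or newer). The rows and the column aliases are made up for illustration.

```python
import sqlite3

# Made-up sample rows for illustration; this is not the tutorial's actual table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_score ("
    "student_id INTEGER, student_name TEXT, dep_name TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO student_score VALUES (?, ?, ?, ?)",
    [
        (1, "Gbadebo", "Biochemistry", 65),
        (2, "Yusuf", "Biochemistry", 70),
        (3, "Bayo", "Computer Science", 80),
        (4, "Tolu", "Computer Science", 75),
    ],
)

query = """
SELECT
    student_id,
    score,
    -- How many rows actually fall inside the frame for this row.
    COUNT(*) OVER (
        ORDER BY student_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS rows_in_frame,
    -- Current row plus up to 2 rows before it.
    SUM(score) OVER (
        ORDER BY student_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS sum_last_3,
    -- Current row plus 1 row on each side, where those rows exist.
    SUM(score) OVER (
        ORDER BY student_id ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS sum_neighbors
FROM student_score
ORDER BY student_id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 65, 1, 65, 135)
# (2, 70, 2, 135, 215)
# (3, 80, 3, 215, 225)
# (4, 75, 3, 225, 155)
```

The rows_in_frame column shows the frame shrinking at the edges: the first row has no earlier rows, so its frame holds only itself.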
Frame clause example
Let's look at an example. Say you want to get the
cumulative sum of all the student scores. You can do
that by using a frame clause.

First, you need to decide which keywords to specify
in the frame clause.

Since you want to sum up all rows before the current
row and the current row itself, you can start the
frame at UNBOUNDED PRECEDING and end it at CURRENT
ROW. Remember that UNBOUNDED PRECEDING covers all
rows before the current row, and CURRENT ROW adds
the current row itself.
So the code to achieve that task is shown below:

SELECT
*,
SUM(score) OVER (ORDER BY student_id ROWS BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW) AS cumulative_sum
FROM student_score;
Let's break down the window function code:
SUM(score) OVER (ORDER BY student_id ROWS BETWEEN UNBOUNDED
PRECEDING AND CURRENT ROW) AS cumulative_sum
First, in the OVER() clause, we sort the entire window,
which is the whole dataset, by student_id.
Then we specify the frame clause, ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which
means that all rows before the current row, plus the
current row itself, are used for the calculation.
The result is shown in the below image:

The first row in the dataset does not have any row
before it. But since the frame ends at CURRENT
ROW, the frame still contains that row, so the sum
is simply its own score, 65.
Now move to the second row. It has 1 row before it,
so the SQL engine adds the score of the first
row (65) to the score of the current row (70). That is
why the result is 135, and so on down the table.
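The arithmetic above can be checked with a small sketch that runs the cumulative-sum query through Python's built-in sqlite3 module (assuming SQLite 3.25 or newer) and compares it with a running total computed in plain Python. The first two scores (65 and 70) follow the walkthrough; the third row and all the names are made up for illustration.

```python
import sqlite3
from itertools import accumulate

# Sample rows: scores 65 and 70 follow the walkthrough, the rest is made up.
sample = [
    (1, "Gbadebo", "Biochemistry", 65),
    (2, "Yusuf", "Biochemistry", 70),
    (3, "Tolu", "Computer Science", 80),
]
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_score ("
    "student_id INTEGER, student_name TEXT, dep_name TEXT, score INTEGER)"
)
conn.executemany("INSERT INTO student_score VALUES (?, ?, ?, ?)", sample)

query = """
SELECT
    student_id,
    score,
    SUM(score) OVER (
        ORDER BY student_id
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_sum
FROM student_score
ORDER BY student_id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# (1, 65, 65)
# (2, 70, 135)
# (3, 80, 215)

# Cross-check against a running total computed outside the database.
expected = list(accumulate(score for _, _, _, score in sample))
print(expected == [cumulative for _, _, cumulative in rows])
```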
When to Use a Window Function
You've learned what window functions are in this
tutorial. Some practical cases where you can use
them are:

1. When you want to compare an aggregate value
in a window with individual records in that
window.
2. When you want to do things like ranking,
percentile, cumulative sum or running total,
moving average, and so on.
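As an illustration of the first case, the sketch below places each department's average score next to every student's own score and computes the difference, something GROUP BY alone cannot do without a join. It runs through Python's built-in sqlite3 module (assuming SQLite 3.25 or newer). The rows and the column aliases are made up for illustration.

```python
import sqlite3

# Made-up sample rows for illustration; this is not the tutorial's actual table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student_score ("
    "student_id INTEGER, student_name TEXT, dep_name TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO student_score VALUES (?, ?, ?, ?)",
    [
        (1, "Ada", "Computer Science", 90),
        (2, "Tolu", "Computer Science", 70),
        (3, "Yusuf", "Biochemistry", 70),
        (4, "Muritadoh", "Biochemistry", 80),
    ],
)

# Every row is kept; the department average is repeated on each of its rows.
query = """
SELECT
    student_name,
    dep_name,
    score,
    AVG(score) OVER (PARTITION BY dep_name) AS dep_avg,
    score - AVG(score) OVER (PARTITION BY dep_name) AS diff_from_avg
FROM student_score
ORDER BY dep_name, student_id;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
# ('Yusuf', 'Biochemistry', 70, 75.0, -5.0)
# ('Muritadoh', 'Biochemistry', 80, 75.0, 5.0)
# ('Ada', 'Computer Science', 90, 80.0, 10.0)
# ('Tolu', 'Computer Science', 70, 80.0, -10.0)
```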
