SQL Subquery
In this tutorial, you will learn about subqueries in SQL with the help of examples.
In SQL, it's possible to place a SQL query inside another query. This inner query is known as a
subquery.
Example
-- use a subquery to select the first name of customer
-- with the maximum value of customer id
SELECT first_name
FROM Customers
WHERE customer_id = (
SELECT MAX(customer_id)
FROM Customers
);
Here, the query is divided into two parts:
the subquery selects the maximum id from the Customers table
the outer query selects the first_name of the customer with the maximum id (returned by the
subquery)
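The general pattern is SELECT column FROM table WHERE column OPERATOR (SELECT column FROM table),
where
column is the name of the column(s) to select or filter on
OPERATOR is any SQL operator that connects the two queries, such as =, >, or IN
table is the name of the table to fetch the column from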
Note: When a JOIN can express the same logic as a subquery, it is often worth considering the
JOIN. Some database engines optimize joins better than subqueries, although many modern
optimizers execute both forms equally fast.
SQL subqueries are basic tools if you want to communicate effectively with relational databases. In
this article, I provide five subquery examples demonstrating how to use scalar, multirow, and
correlated subqueries in the WHERE, FROM/JOIN, and SELECT clauses.
A subquery, or nested query, is a query placed within another SQL query. When requesting
information from a database, you may find it necessary to include a subquery in
the SELECT, FROM, JOIN, or WHERE clause. You can also use subqueries when
updating the database (i.e. in INSERT, UPDATE, and DELETE statements).
There are several types of SQL subqueries:
Scalar subqueries return a single value, i.e. exactly one row and exactly one column.
Multirow subqueries return either:
- One column with multiple rows (i.e. a list of values), or
- Multiple columns with multiple rows (i.e. a table).
Correlated subqueries are those where the inner query relies on information obtained from the
outer query.
You can read more about the different types of SQL subqueries elsewhere; here, I want to focus
on examples. As we all know, it’s always easier to grasp new concepts with real-world use cases.
So let’s get started.
Paintings
id name artist_id listed_price
11 Miracle 1 300.00
12 Sunshine 1 700.00
15 Barbie 3 250.00
18 Mountains 4 1300.00
Artists
id first_name last_name
1 Thomas Black
2 Kate Smith
3 Natali Wein
4 Francesco Benelli
Collectors
id first_name last_name
Sales
id date painting_id artist_id collector_id sales_price
1001 2021-11-01 13 2 104 2500.00
1002 2021-11-10 14 2 102 2300.00
1003 2021-11-10 11 1 102 300.00
1004 2021-11-15 16 3 103 4000.00
1005 2021-11-22 15 3 103 200.00
1006 2021-11-22 17 3 103 50.00
Now let’s explore this data using SQL queries with different types of subqueries.
Example 1 - Scalar Subquery
We’ll start with a simple example: We want to list paintings that are priced higher than the
average. Basically, we want to get painting names along with the listed prices, but only for the
ones that cost more than average. That means that we first need to find this average price;
here’s where the scalar subquery comes into play:
SELECT name, listed_price
FROM paintings
WHERE listed_price > (
SELECT AVG(listed_price)
FROM paintings
);
Our subquery is in the WHERE clause, where it filters the result set based on the listed price. This
subquery returns a single value: the average price per painting for our gallery. Each listed price is
compared to this value, and only the paintings that are priced above average make it to the final
output:
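To try the scalar subquery without a database server, here is a minimal sketch that uses Python's built-in sqlite3 module. It loads only the four painting rows listed in the sample data of this article, so the average (and therefore the result) may differ from the full gallery data. The column names id and artist_id are assumptions.

```python
import sqlite3

# In-memory database, so nothing is written to disk
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE paintings (id INTEGER, name TEXT, artist_id INTEGER, listed_price REAL);
    -- Only the four painting rows listed in this article (assumed column names)
    INSERT INTO paintings VALUES
        (11, 'Miracle',   1,  300.00),
        (12, 'Sunshine',  1,  700.00),
        (15, 'Barbie',    3,  250.00),
        (18, 'Mountains', 4, 1300.00);
""")

# The scalar subquery returns one value (the average price);
# the outer query compares every listed_price against it.
above_average = conn.execute("""
    SELECT name, listed_price
    FROM paintings
    WHERE listed_price > (SELECT AVG(listed_price) FROM paintings)
    ORDER BY listed_price
""").fetchall()

print(above_average)
```

With these four rows the average is 637.50, so only Sunshine and Mountains are returned.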
name listed_price
Laura Fisher
Christina Buffet
Steve Stevenson
Interestingly, we could get the same result without a subquery by using an INNER JOIN (or
just JOIN). This join type returns only records that can be found in both tables. So, if we join
the collectors and the sales tables, we’ll get a list of collectors with corresponding records in
the sales table. Note: I have also used the DISTINCT keyword here to remove duplicates from
the output.
Here’s the query:
SELECT DISTINCT collectors.first_name, collectors.last_name
FROM collectors
JOIN sales
ON collectors.id = sales.collector_id;
You can read more about choosing subquery vs. JOIN elsewhere in our blog.
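The equivalence between the JOIN form and a subquery form can be checked with a small sketch using Python's sqlite3 module. The collector ids below are assumptions chosen to be consistent with the collector_id values in the sales table, and the IN subquery is one possible subquery formulation, not necessarily the one the article had in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collectors (id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE sales (id INTEGER, collector_id INTEGER);
    -- Collector ids are assumed; names come from the article's output
    INSERT INTO collectors VALUES
        (101, 'Brandon', 'Cooper'), (102, 'Laura', 'Fisher'),
        (103, 'Christina', 'Buffet'), (104, 'Steve', 'Stevenson');
    INSERT INTO sales VALUES
        (1001, 104), (1002, 102), (1003, 102),
        (1004, 103), (1005, 103), (1006, 103);
""")

# JOIN form: DISTINCT removes the duplicates created by multiple sales
with_join = conn.execute("""
    SELECT DISTINCT collectors.first_name, collectors.last_name
    FROM collectors
    JOIN sales ON collectors.id = sales.collector_id
    ORDER BY collectors.last_name
""").fetchall()

# Subquery form: keep collectors whose id appears in the sales table
with_subquery = conn.execute("""
    SELECT first_name, last_name
    FROM collectors
    WHERE id IN (SELECT collector_id FROM sales)
    ORDER BY last_name
""").fetchall()

print(with_join)
print(with_subquery)
```

Both queries return the same three collectors; Brandon Cooper is absent because he has no rows in sales.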
Example 3 – Multirow Subquery with Multiple Columns
When a subquery returns a table with multiple rows and multiple columns, that subquery is
usually found in the FROM or JOIN clause. This allows you to get a table with data that was not
readily available in the database (e.g. grouped data) and then join this table with another one
from your database, if necessary.
Let’s say that we want to see the total amount of sales for each artist who has sold at least one
painting in our gallery. We may start with a subquery that draws on the sales table and
calculates the total amount of sales for each artist ID. Then, in the outer query, we combine this
information with the artists’ first names and last names to get the required output:
SELECT
artists.first_name,
artists.last_name,
artist_sales.sales
FROM artists
JOIN (
SELECT artist_id, SUM(sales_price) AS sales
FROM sales
GROUP BY artist_id
) AS artist_sales
ON artists.id = artist_sales.artist_id;
We assign a meaningful alias to the output of our subquery (artist_sales). This way, we can
easily refer to it in the outer query, when selecting the column from this table, and when
defining the join condition in the ON clause. Note: Some databases, such as MySQL and SQL Server,
will throw an error if you don't provide an alias for a subquery in the FROM or JOIN clause.
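As a runnable illustration of the derived-table join, the sketch below loads the artists and sales rows from this article into an in-memory SQLite database via Python's sqlite3 module. Only the columns the query needs are created; that trimming is a simplification, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE sales (id INTEGER, artist_id INTEGER, sales_price REAL);
    INSERT INTO artists VALUES
        (1, 'Thomas', 'Black'), (2, 'Kate', 'Smith'),
        (3, 'Natali', 'Wein'), (4, 'Francesco', 'Benelli');
    INSERT INTO sales VALUES
        (1001, 2, 2500.00), (1002, 2, 2300.00), (1003, 1, 300.00),
        (1004, 3, 4000.00), (1005, 3, 200.00), (1006, 3, 50.00);
""")

# The subquery builds a per-artist totals table; the outer query joins it to artists
artist_totals = conn.execute("""
    SELECT artists.first_name, artists.last_name, artist_sales.sales
    FROM artists
    JOIN (
        SELECT artist_id, SUM(sales_price) AS sales
        FROM sales
        GROUP BY artist_id
    ) AS artist_sales
      ON artists.id = artist_sales.artist_id
    ORDER BY artists.id
""").fetchall()

for row in artist_totals:
    print(row)
```

Francesco Benelli does not appear, because an inner JOIN keeps only artists that have a matching row in the aggregated subquery.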
Here’s the result of the query:
first_name last_name sales
Brandon Cooper 0
Laura Fisher 2
Christina Buffet 3
Steve Stevenson 1
As you see, the output of the subquery (i.e. the number of paintings) is different for each record
and depends on the output of the outer query (i.e. the corresponding collector). Thus, we are
dealing with a correlated subquery here.
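One way to obtain a per-collector count of paintings is a correlated subquery in the SELECT list. The sketch below is an assumption about the shape of such a query, run through Python's sqlite3 module; the collector ids are also assumed, chosen to match the collector_id values in the sales table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE collectors (id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE sales (id INTEGER, collector_id INTEGER);
    -- Collector ids are assumed values
    INSERT INTO collectors VALUES
        (101, 'Brandon', 'Cooper'), (102, 'Laura', 'Fisher'),
        (103, 'Christina', 'Buffet'), (104, 'Steve', 'Stevenson');
    INSERT INTO sales VALUES
        (1001, 104), (1002, 102), (1003, 102),
        (1004, 103), (1005, 103), (1006, 103);
""")

# The inner query refers to collectors.id from the outer query,
# so it is re-evaluated for every collector row.
paintings_per_collector = conn.execute("""
    SELECT first_name,
           last_name,
           (SELECT COUNT(*)
            FROM sales
            WHERE sales.collector_id = collectors.id) AS sales
    FROM collectors
    ORDER BY id
""").fetchall()

for row in paintings_per_collector:
    print(row)
```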
Check out this guide if you want to learn how to write correlated subqueries in SQL. For now,
let’s have one more correlated subquery example.
Example 5 – Correlated Subquery
This time, we want to show the first names and the last names of the artists who had zero sales
with our gallery. Let’s try to accomplish this task using a correlated subquery in
the WHERE clause:
SELECT first_name, last_name
FROM artists
WHERE NOT EXISTS (
SELECT *
FROM sales
WHERE sales.artist_id = artists.id
);
Here is what's going on in this query:
The outer query lists basic information on the artists, first checking whether there are
corresponding records in the sales table.
The inner query looks for records that correspond to the artist ID that is currently being
checked by the outer query.
If there are no corresponding records, the first name and the last name of the
corresponding artist are added to the output:
first_name last_name
Francesco Benelli
In our example, we have only one artist without any sales yet. Hopefully, he’ll land one soon.
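The NOT EXISTS query can be verified against the sample data with a short sketch using Python's sqlite3 module. Only the sales columns needed by the query are created here, which is a simplification.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE sales (id INTEGER, artist_id INTEGER);
    INSERT INTO artists VALUES
        (1, 'Thomas', 'Black'), (2, 'Kate', 'Smith'),
        (3, 'Natali', 'Wein'), (4, 'Francesco', 'Benelli');
    INSERT INTO sales VALUES
        (1001, 2), (1002, 2), (1003, 1), (1004, 3), (1005, 3), (1006, 3);
""")

# NOT EXISTS is true for an artist only when the correlated
# inner query finds no sales rows for that artist's id.
artists_without_sales = conn.execute("""
    SELECT first_name, last_name
    FROM artists
    WHERE NOT EXISTS (
        SELECT *
        FROM sales
        WHERE sales.artist_id = artists.id
    )
""").fetchall()

print(artists_without_sales)
```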
What is subquery in SQL?
A subquery is a SQL query nested inside a larger query.
A subquery may occur in:
- A SELECT clause
- A FROM clause
- A WHERE clause
The subquery can be nested inside a SELECT, INSERT, UPDATE, or DELETE statement or
inside another subquery.
A subquery is usually added within the WHERE Clause of another SQL SELECT statement.
You can use the comparison operators, such as >, <, or =. The comparison operator can
also be a multiple-row operator, such as IN, ANY, or ALL.
A subquery is also called an inner query or inner select, while the statement containing a
subquery is also called an outer query or outer select.
For a non-correlated subquery, the inner query is evaluated before its parent query so that
its results can be passed to the outer query. A correlated subquery is instead evaluated
for each row of the outer query.
You can use a subquery in a SELECT, INSERT, DELETE, or UPDATE statement to perform the
following tasks:
Compare an expression to the result of the query.
Determine if an expression is included in the results of the query.
Check whether the query selects any rows.
Syntax:
The subquery (inner query) executes once before the main query (outer query) executes.
The main query (outer query) uses the subquery result.
SQL Subqueries Example:
In this section, you will learn the requirements of using subqueries. We have the following two
tables, 'student' and 'marks', with the common field 'StudentID'.
Now we want to write a query to identify all students who get better marks than the
student whose StudentID is 'V002', but we do not know the marks of 'V002'.
To solve the problem, we require two queries. One query returns the marks (stored in the
Total_marks field) of 'V002', and a second query identifies the students who get better marks
than the result of the first query.
First query:
SELECT *
FROM `marks`
WHERE studentid = 'V002';
The two queries above identify the students who scored higher marks than the student whose
StudentID is 'V002' (Abhay).
You can combine the above two queries by placing one query inside the other. The subquery
(also called the 'inner query') is the query inside the parentheses. See the following code and
query result :
SQL Code:
SELECT a.studentid, a.name, b.total_marks
FROM student a, marks b
WHERE a.studentid = b.studentid AND b.total_marks >
(SELECT total_marks
FROM marks
WHERE studentid = 'V002');
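The combined query can be tried with Python's sqlite3 module. The rows below are hypothetical: apart from 'V002' (Abhay), all student ids, names, and marks are made up for illustration. The sketch also uses an explicit JOIN instead of the comma-separated table list, which is an equivalent and more common way to write the same join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (studentid TEXT, name TEXT);
    CREATE TABLE marks (studentid TEXT, total_marks INTEGER);
    -- Hypothetical sample rows (only V002 / Abhay is named in the text)
    INSERT INTO student VALUES
        ('V001', 'Asha'), ('V002', 'Abhay'), ('V003', 'Ben'), ('V004', 'Chen');
    INSERT INTO marks VALUES
        ('V001', 95), ('V002', 80), ('V003', 74), ('V004', 81);
""")

# The inner query looks up V002's marks; the outer query keeps higher scores
better_than_v002 = conn.execute("""
    SELECT a.studentid, a.name, b.total_marks
    FROM student a
    JOIN marks b ON a.studentid = b.studentid
    WHERE b.total_marks > (
        SELECT total_marks
        FROM marks
        WHERE studentid = 'V002'
    )
    ORDER BY a.studentid
""").fetchall()

print(better_than_v002)
```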
SQL Code:
INSERT INTO neworder
SELECT * FROM orders
WHERE advance_amount IN (2000, 5000);
SQL Code:
UPDATE neworder
SET ord_date='15-JAN-10'
WHERE ord_amount-advance_amount<
(SELECT MIN(ord_amount) FROM orders);
Subqueries with DELETE statement
DELETE statement can be used with subqueries. Here are the syntax and an example of
subqueries using DELETE statement.
Syntax:
DELETE FROM table_name
[ WHERE column_name OPERATOR
(SELECT column_name
FROM table_name
[ WHERE condition ]) ]
If we want to delete those orders from the 'neworder' table whose advance_amount is less than
the maximum advance_amount of the 'orders' table, the following SQL can be used:
Sample table: neworder
SQL Code:
DELETE FROM neworder
WHERE advance_amount<
(SELECT MAX(advance_amount) FROM orders);
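A small sketch of this DELETE, using Python's sqlite3 module. The table layout (ord_num, ord_amount, advance_amount) and all row values are hypothetical, since the sample tables are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical layout and rows for illustration only
    CREATE TABLE orders (ord_num INTEGER, ord_amount INTEGER, advance_amount INTEGER);
    CREATE TABLE neworder (ord_num INTEGER, ord_amount INTEGER, advance_amount INTEGER);
    INSERT INTO orders VALUES (1, 1000, 200), (2, 3000, 500), (3, 4500, 900);
    INSERT INTO neworder SELECT * FROM orders;
""")

# The subquery returns the largest advance in orders (900 here);
# every neworder row below that value is deleted.
deleted = conn.execute("""
    DELETE FROM neworder
    WHERE advance_amount < (SELECT MAX(advance_amount) FROM orders)
""").rowcount

remaining = conn.execute("SELECT * FROM neworder").fetchall()
print(deleted, remaining)
```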
Need of Ranking
Suppose a student is given a dataset with the exam results of 50 students. Working out the top
performers by hand means sorting through all of that information, which takes time and is
error-prone. SQL window functions such as RANK and DENSE_RANK solve this problem.
Operations that used to take a long time can now be finished in a few seconds. These functions
order the rows and assign a numeric rank to each one. Both functions are always used with the
OVER() clause.
For example, suppose a company has five employees with salaries of 50 thousand, 45 thousand,
55 thousand, 35 thousand, and 42 thousand. Ranking the salaries shows at a glance who receives
the highest salary.
We will learn how to use these functions with practical examples, and then look at the key
difference between them.
What is the RANK Function in SQL?
The RANK function is a SQL window function that calculates the rank of each row within its
result set or partition, based on the ordering you define.
Some essential points to keep in mind when using the RANK function:
The ORDER BY clause is required.
The PARTITION BY clause is optional.
Rows with identical values in the ORDER BY columns share the same rank.
The ranks that would have followed a tie are skipped, so the sequence of rank values can
contain gaps.
Syntax:
SELECT col_name,
RANK() OVER (
[PARTITION BY exp] -- optional: rank within each group
ORDER BY exp [ASC | DESC] [, exp1 ...]
) AS r
FROM table_name;
Id name subject marks
1 Ninja_1 Maths 85
2 Ninja_2 Maths 50
3 Ninja_3 Science 70
4 Ninja_4 Economics 85
5 Ninja_5 Maths 20
6 Ninja_6 English 92
7 Ninja_7 English 92
8 Ninja_8 Science 89
9 Ninja_9 Maths 63
The above table (Result) contains nine rows and four columns; now, we will use the RANK function
to compute the students' ranks without using the PARTITION BY clause.
Estimating Rank without using the PARTITION BY Clause
SELECT
name, subject, marks,
RANK() OVER (
ORDER BY marks ASC ) AS rank
FROM Result;
Output:
name Subject marks rank
Ninja_5 Maths 20 1
Ninja_2 Maths 50 2
Ninja_9 Maths 63 3
Ninja_3 Science 70 4
Ninja_1 Maths 85 5
Ninja_4 Economics 85 5
Ninja_8 Science 89 7
Ninja_6 English 92 8
Ninja_7 English 92 8
We can observe that Ninja_1 and Ninja_4 are both assigned rank 5, and Ninja_8, which
has different marks, is assigned rank 7 instead of 6.
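The gap after the tie can be reproduced with Python's sqlite3 module (window functions need SQLite 3.25 or newer, which recent Python builds bundle). This sketch uses the alias rnk instead of rank, because RANK is a reserved word in some databases; the extra ORDER BY on name only makes the printed order deterministic for tied rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Result (id INTEGER, name TEXT, subject TEXT, marks INTEGER);
    INSERT INTO Result VALUES
        (1, 'Ninja_1', 'Maths', 85), (2, 'Ninja_2', 'Maths', 50),
        (3, 'Ninja_3', 'Science', 70), (4, 'Ninja_4', 'Economics', 85),
        (5, 'Ninja_5', 'Maths', 20), (6, 'Ninja_6', 'English', 92),
        (7, 'Ninja_7', 'English', 92), (8, 'Ninja_8', 'Science', 89),
        (9, 'Ninja_9', 'Maths', 63);
""")

# RANK() over the whole table, lowest marks first
ranked = conn.execute("""
    SELECT name, marks, RANK() OVER (ORDER BY marks ASC) AS rnk
    FROM Result
    ORDER BY rnk, name
""").fetchall()

for row in ranked:
    print(row)
```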
Next, we will use the RANK function with the PARTITION BY clause, again in ascending order.
Estimating Rank using Partition Function
SELECT
name, subject, marks,
RANK() OVER (
PARTITION BY subject
ORDER BY marks ASC ) AS rank
FROM Result;
Output:
name Subject marks rank
Ninja_4 Economics 85 1
Ninja_6 English 92 1
Ninja_7 English 92 1
Ninja_5 Maths 20 1
Ninja_2 Maths 50 2
Ninja_9 Maths 63 3
Ninja_1 Maths 85 4
Ninja_3 Science 70 1
Ninja_8 Science 89 2
We observe that with the PARTITION BY clause, ranks are assigned within groups (partitions) of
the data: the clause divides the result set into smaller sections, and ranking restarts at 1 in
each one. In the above case, there are four students with the subject 'Maths'; because the
ordering is ascending, the student with the lowest marks is assigned rank 1, and the others
receive ranks 2, 3, and 4 as the marks increase.
What is DENSE_RANK Function in SQL?
The DENSE_RANK function is similar to the RANK function, with one difference: it produces
consecutive rank values without any gaps.
Some essential points need to be kept in mind while using the DENSE_RANK function:
Rows with identical values receive the same Rank.
The Rank of subsequent rows increases by one.
Syntax:
SELECT col_name,
DENSE_RANK() OVER (
[PARTITION BY exp] -- optional: rank within each group
ORDER BY exp [ASC | DESC] [, exp1 ...]
) AS r
FROM table_name;
Examples of DENSE_RANK Function
Let's see an example to understand the DENSE_RANK function better. Consider the Result table,
which has attributes like name, subject, and marks.
Id name subject marks
1 Ninja_1 Maths 85
2 Ninja_2 Maths 50
3 Ninja_3 Science 70
4 Ninja_4 Economics 85
5 Ninja_5 Maths 20
6 Ninja_6 English 92
7 Ninja_7 English 92
8 Ninja_8 Science 89
9 Ninja_9 Maths 63
The above table (Result) contains nine rows and four columns; for better understanding, let's see
an example using DENSE_RANK without the PARTITION BY clause.
Estimating DENSE_RANK without using the PARTITION BY Clause
SELECT
name, subject, marks,
DENSE_RANK() OVER (
ORDER BY marks ASC
) AS rank
FROM Result;
Output:
name Subject marks rank
Ninja_5 Maths 20 1
Ninja_2 Maths 50 2
Ninja_9 Maths 63 3
Ninja_3 Science 70 4
Ninja_1 Maths 85 5
Ninja_4 Economics 85 5
Ninja_8 Science 89 6
Ninja_6 English 92 7
Ninja_7 English 92 7
Since Ninja_1 and Ninja_4 have the same marks, they are given the same rank. Ninja_8 is then
given rank 6, because DENSE_RANK leaves no gap after a tie.
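To see both functions side by side on the same rows, here is a minimal sketch with Python's sqlite3 module (SQLite 3.25 or newer is assumed for window functions). The aliases rnk and dense_rnk are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Result (id INTEGER, name TEXT, subject TEXT, marks INTEGER);
    INSERT INTO Result VALUES
        (1, 'Ninja_1', 'Maths', 85), (2, 'Ninja_2', 'Maths', 50),
        (3, 'Ninja_3', 'Science', 70), (4, 'Ninja_4', 'Economics', 85),
        (5, 'Ninja_5', 'Maths', 20), (6, 'Ninja_6', 'English', 92),
        (7, 'Ninja_7', 'English', 92), (8, 'Ninja_8', 'Science', 89),
        (9, 'Ninja_9', 'Maths', 63);
""")

# Same ordering for both functions; only the handling of ties differs
comparison = conn.execute("""
    SELECT name, marks,
           RANK()       OVER (ORDER BY marks ASC) AS rnk,
           DENSE_RANK() OVER (ORDER BY marks ASC) AS dense_rnk
    FROM Result
    ORDER BY marks, name
""").fetchall()

for row in comparison:
    print(row)
```

After the tie at 85 marks, RANK jumps from 5 to 7 while DENSE_RANK continues with 6.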
Another example is DENSE_RANK with PARTITION clause.
Estimating DENSE_RANK using Partition Function
SELECT
name, subject, marks,
DENSE_RANK() OVER (
PARTITION BY subject
ORDER BY marks ASC
) AS rank
FROM Result;
Output:
name Subject marks rank
Ninja_4 Economics 85 1
Ninja_6 English 92 1
Ninja_7 English 92 1
Ninja_5 Maths 20 1
Ninja_2 Maths 50 2
Ninja_9 Maths 63 3
Ninja_1 Maths 85 4
Ninja_3 Science 70 1
Ninja_8 Science 89 2
Comparison Table between RANK and DENSE_RANK
RANK: The next rank is skipped if two or more rows have identical values in the ORDER BY
columns. Example: if two workers have the same value, they will both obtain rank 1, and
the following employee will receive rank 3.
DENSE_RANK: The next rank is not skipped if two or more rows have the same values in the
ORDER BY columns and obtain the same dense rank. Example: if two workers have the same value,
they will both be assigned rank 1, and the subsequent employee will be given a rank of 2.
A quick summary of SQL ranking function
RANK: The RANK function is a SQL function that calculates each row's rank based on the defined
attributes and clauses. After rows with the same values tie, it skips the ranks that follow.
DENSE_RANK: The DENSE_RANK function is a SQL function that assigns a rank number to each
row. It does not skip any rank after rows with the same values tie.
Frequently Asked Questions
How can you use RANK and DENSE_RANK to identify outliers?
RANK and DENSE_RANK can help to spot outliers when you inspect the ranked output. For example,
if most records have ranks in the range of 20 to 50 but one record has a rank of 100, this may
indicate an abnormality.
What are some limitations of using RANK and DENSE_RANK that we need to keep in mind
before using it?
One limitation of RANK and DENSE_RANK is that they are not well suited to datasets with many
ties or heavily skewed distributions. To rank such datasets, you may need to adjust the ranking
method (for example, by adding tie-breaking columns to ORDER BY) or choose an alternative.
What are some use cases for RANK() and DENSE_RANK() in SQL?
Some common use cases for rank and dense_rank functions in SQL include identifying top
performers, ranking products or identifying trends from data. They are used to answer
questions like “what are top 10 performers in a company over last month” or “what are the top
5 highest rated movies over last year.”
How do the RANK and DENSE_RANK functions work with NULL values?
For ranking purposes, NULL values in the ORDER BY column are treated as equal to one another,
so rows with NULL share the same rank. They can still affect the ranking order: depending on
the database and the sort direction, NULL values are ranked either before or after all other
values.
We perform calculations on data using aggregate functions such as MAX, MIN, and AVG, which
return a single output row for a group of rows. SQL Server also provides SQL RANK functions,
which assign a rank to individual rows according to a categorization you specify. Unlike
aggregate functions, they return a value for each participating row. SQL RANK functions are
also known as window functions.
Note: The term "window" here does not relate to the Microsoft Windows operating system.
We have the following rank functions.
ROW_NUMBER()
RANK()
DENSE_RANK()
NTILE()
In the SQL RANK functions, we use the OVER() clause to define a set of rows in the result set.
We can also use SQL PARTITION BY clause to define a subset of data in a partition. You can also
use Order by clause to sort the results in a descending or ascending order.
Before we explore these SQL RANK functions, let’s prepare sample data. In this sample data, we
have exam results for three students in Maths, Science and English subjects.
CREATE TABLE ExamResult
(StudentName VARCHAR(70),
 Subject     VARCHAR(20),
 Marks       INT
);
INSERT INTO ExamResult
VALUES
('Lily', 'Maths', 65),
('Lily', 'Science', 80),
('Lily', 'english', 70),
('Isabella', 'Maths', 50),
('Isabella', 'Science', 70),
('Isabella', 'english', 90),
('Olivia', 'Maths', 55),
('Olivia', 'Science', 60),
('Olivia', 'english', 89);
We have the following sample data in the ExamResult table.
By default, ORDER BY sorts the data in ascending order, and row numbers are assigned in that
order, so the row with marks 50 gets row number 1.
We can specify descending order in the ORDER BY clause, and the row numbers change accordingly.
SELECT Studentname,
       Subject,
       Marks,
       ROW_NUMBER() OVER(ORDER BY Marks DESC) RowNumber
FROM ExamResult;
RANK() SQL RANK Function
We use the RANK() SQL Rank function to assign a rank to each row in the result set. We have
student results for three subjects, and we want to rank each student's results by marks.
For example, student Isabella got the highest marks in the English subject and the lowest
marks in the Maths subject. As per the marks, Isabella gets the first rank in English and
third place in Maths.
Execute the following query to get this result set. In this query, you can note the following
things:
We use the PARTITION BY Studentname clause to perform calculations on each student group
Each subset gets ranks as per the Marks in descending order
The result set uses the ORDER BY clause to sort results on Studentname and rank
SELECT Studentname,
       Subject,
       Marks,
       RANK() OVER(PARTITION BY Studentname ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY Studentname,
         Rank;
Let's execute the following SQL Rank function query and look at the result set. In this query,
we did not specify the SQL PARTITION BY clause to divide the data into smaller subsets. We use
the SQL Rank function with an OVER clause ordered by the Marks column (in descending order) to
get ranks for the respective rows.
SELECT Studentname,
       Subject,
       Marks,
       RANK() OVER(ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY Rank;
In the output, we can see that each student gets a rank as per their marks, irrespective of the
specific subject. For example, the highest and lowest marks in the complete result set are 90
and 50 respectively. In the result set, the highest mark gets RANK 1, and the lowest mark gets
RANK 9. If two students get the same marks (in our example, row numbers 4 and 5), their ranks
are also the same.
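These ranks can be checked with a short sketch that loads the ExamResult sample data into SQLite through Python's sqlite3 module (SQLite 3.25 or newer is assumed for window functions). The alias rnk is used instead of Rank only to stay clear of reserved words in other databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ExamResult (StudentName TEXT, Subject TEXT, Marks INTEGER);
    INSERT INTO ExamResult VALUES
        ('Lily', 'Maths', 65), ('Lily', 'Science', 80), ('Lily', 'english', 70),
        ('Isabella', 'Maths', 50), ('Isabella', 'Science', 70), ('Isabella', 'english', 90),
        ('Olivia', 'Maths', 55), ('Olivia', 'Science', 60), ('Olivia', 'english', 89);
""")

# Rank all nine rows together, highest marks first
overall = conn.execute("""
    SELECT Marks, RANK() OVER (ORDER BY Marks DESC) AS rnk
    FROM ExamResult
    ORDER BY rnk
""").fetchall()

print(overall)
```

The two rows with 70 marks share rank 4, rank 5 is skipped, and the lowest mark (50) ends up with rank 9.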
Let’s use DENSE_RANK function in combination with the SQL PARTITION BY clause.
SELECT Studentname,
       Subject,
       Marks,
       DENSE_RANK() OVER(PARTITION BY Subject ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY Studentname,
         Rank;
No two students have the same marks within a subject; therefore, the result set is the same as
with the RANK function in this case.
Let's update the student mark with the following query and rerun the query.
UPDATE ExamResult SET Marks = 70 WHERE Studentname = 'Isabella' AND Subject = 'Maths';
We can see that in the student group, Isabella got the same marks in Maths and Science subjects.
The rank is also the same for both subjects in this case.
Let’s see the difference between RANK() and DENSE_RANK() SQL Rank function with the
following query.
Query 1
SELECT Studentname,
       Subject,
       Marks,
       RANK() OVER(PARTITION BY StudentName ORDER BY Marks) Rank
FROM ExamResult
ORDER BY Studentname,
         Rank;
Query 2
SELECT Studentname,
       Subject,
       Marks,
       DENSE_RANK() OVER(PARTITION BY StudentName ORDER BY Marks) Rank
FROM ExamResult
ORDER BY Studentname,
         Rank;
In the output, you can see a gap in the RANK function output within a partition. There is no
gap in the DENSE_RANK function output.
Isabella has the same marks in two subjects. The RANK function assigns rank 1 to both rows,
skips rank two, and gives the next row rank three.
The DENSE_RANK function keeps the ranks consecutive and does not leave any gap between the
values.
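A compact way to observe this difference is to apply the update and compute both functions in one query. The sketch below uses Python's sqlite3 module (SQLite 3.25 or newer assumed) and filters to Isabella's rows only to keep the output short; the aliases rnk and dense_rnk are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ExamResult (StudentName TEXT, Subject TEXT, Marks INTEGER);
    INSERT INTO ExamResult VALUES
        ('Lily', 'Maths', 65), ('Lily', 'Science', 80), ('Lily', 'english', 70),
        ('Isabella', 'Maths', 50), ('Isabella', 'Science', 70), ('Isabella', 'english', 90),
        ('Olivia', 'Maths', 55), ('Olivia', 'Science', 60), ('Olivia', 'english', 89);
    -- Same update as in the text: Isabella now has 70 in both Maths and Science
    UPDATE ExamResult SET Marks = 70
    WHERE StudentName = 'Isabella' AND Subject = 'Maths';
""")

isabella = conn.execute("""
    SELECT Subject, Marks,
           RANK()       OVER (PARTITION BY StudentName ORDER BY Marks) AS rnk,
           DENSE_RANK() OVER (PARTITION BY StudentName ORDER BY Marks) AS dense_rnk
    FROM ExamResult
    WHERE StudentName = 'Isabella'
    ORDER BY Marks, Subject
""").fetchall()

for row in isabella:
    print(row)
```

The english row gets rank 3 from RANK but rank 2 from DENSE_RANK.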
Similarly, NTILE(3) divides the nine rows into three groups of three records each.
SELECT *,
       NTILE(3) OVER(
       ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY rank;
We can use SQL PARTITION BY clause to have more than one partition. In the following query,
each partition on subjects is divided into two groups.
SELECT *,
       NTILE(2) OVER(PARTITION BY subject ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY subject, rank;
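Both NTILE queries can be tried on the original (pre-update) sample data with Python's sqlite3 module; SQLite 3.25 or newer is assumed. The alias bucket is an illustrative name used in place of Rank.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ExamResult (StudentName TEXT, Subject TEXT, Marks INTEGER);
    INSERT INTO ExamResult VALUES
        ('Lily', 'Maths', 65), ('Lily', 'Science', 80), ('Lily', 'english', 70),
        ('Isabella', 'Maths', 50), ('Isabella', 'Science', 70), ('Isabella', 'english', 90),
        ('Olivia', 'Maths', 55), ('Olivia', 'Science', 60), ('Olivia', 'english', 89);
""")

# NTILE(3) over all nine rows: three buckets of three rows each
thirds = conn.execute("""
    SELECT Marks, NTILE(3) OVER (ORDER BY Marks DESC) AS bucket
    FROM ExamResult
    ORDER BY bucket, Marks DESC
""").fetchall()

# NTILE(2) per subject: three rows per partition split into buckets of 2 and 1
halves = conn.execute("""
    SELECT Subject, Marks,
           NTILE(2) OVER (PARTITION BY Subject ORDER BY Marks DESC) AS bucket
    FROM ExamResult
    ORDER BY Subject, bucket, Marks DESC
""").fetchall()

print(thirds)
print(halves)
```

When the rows cannot be divided evenly, as with three rows and two buckets, the earlier buckets receive the extra row.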
We can use the OFFSET FETCH command starting from SQL Server 2012 to fetch a specific
number of records.
WITH StudentRanks AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY Marks) AS Ranks
FROM ExamResult
)
SELECT StudentName, Marks
FROM StudentRanks
ORDER BY Ranks OFFSET 1 ROWS FETCH NEXT 3 ROWS ONLY;
RANK: It assigns the rank number to each row in a partition. It skips the number for similar
values.
Dense_RANK: It assigns the rank number to each row in a partition. It does not skip the number
for similar values.
NTILE(N): It divides the number of rows as per specified partition and assigns unique value in
the partition.
Writing Subqueries in SQL
In this lesson we'll cover:
Subquery basics
Using subqueries to aggregate in multiple stages
Subqueries in conditional logic
Joining subqueries
Subqueries and UNIONs
In this lesson, you will continue to work with the same San Francisco Crime data used in
a previous lesson.
Subquery basics
Subqueries (also known as inner queries or nested queries) are a tool for performing operations
in multiple steps. For example, if you wanted to take the sums of several columns, then average
all of those values, you'd need to do each aggregation in a distinct step.
Subqueries can be used in several places within a query, but it's easiest to start with
the FROM clause. Here's an example of a basic subquery:
SELECT sub.*
FROM (
SELECT *
FROM tutorial.sf_crime_incidents_2014_01
WHERE day_of_week = 'Friday'
) sub
WHERE sub.resolution = 'NONE'
Let's break down what happens when you run the above query:
First, the database runs the "inner query"—the part between the parentheses:
SELECT *
FROM tutorial.sf_crime_incidents_2014_01
WHERE day_of_week = 'Friday'
If you were to run this on its own, it would produce a result set like any other query. It might
sound like a no-brainer, but it's important: your inner query must actually run on its own, as the
database will treat it as an independent query. Once the inner query runs, the outer query will
run using the results from the inner query as its underlying table:
SELECT sub.*
FROM (
<<results from inner query go here>>
) sub
WHERE sub.resolution = 'NONE'
Subqueries are required to have names, which are added after parentheses the same way you
would add an alias to a normal table. In this case, we've used the name "sub."
A quick note on formatting: The important thing to remember when using subqueries is to
provide some way for the reader to easily determine which parts of the query will be
executed together. Most people do this by indenting the subquery in some way. The examples
in this tutorial are indented quite far—all the way to the parentheses. This isn't practical if you
nest many subqueries, so it's fairly common to only indent two spaces or so.
Practice Problem
Write a query that selects all Warrant Arrests from
the tutorial.sf_crime_incidents_2014_01 dataset, then wrap it in an outer query that only
displays unresolved incidents.
The above examples, as well as the practice problem don't really require subqueries—they solve
problems that could also be solved by adding multiple conditions to the WHERE clause. These
next sections provide examples for which subqueries are the best or only way to solve their
respective problems.
Using subqueries to aggregate in multiple stages
What if you wanted to figure out how many incidents get reported on each day of the week?
Better yet, what if you wanted to know how many incidents happen, on average, on a Friday in
December? In January? There are two steps to this process: counting the number of incidents
each day (inner query), then determining the monthly average (outer query):
SELECT LEFT(sub.date, 2) AS cleaned_month,
sub.day_of_week,
AVG(sub.incidents) AS average_incidents
FROM (
SELECT day_of_week,
date,
COUNT(incidnt_num) AS incidents
FROM tutorial.sf_crime_incidents_2014_01
GROUP BY 1,2
) sub
GROUP BY 1,2
ORDER BY 1,2
If you're having trouble figuring out what's happening, try running the inner query individually
to get a sense of what its results look like. In general, it's easiest to write inner queries first and
revise them until the results make sense to you, then to move on to the outer query.
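The two-stage aggregation can be tried on a tiny stand-in dataset with Python's sqlite3 module. The table name incidents and its five rows are hypothetical, and the dates are assumed to be stored as MM/DD/YYYY text. SQLite has no LEFT() function, so the sketch uses substr(date, 1, 2) as the equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for tutorial.sf_crime_incidents_2014_01
    CREATE TABLE incidents (incidnt_num INTEGER, date TEXT, day_of_week TEXT);
    INSERT INTO incidents VALUES
        (1, '01/03/2014', 'Friday'),
        (2, '01/03/2014', 'Friday'),
        (3, '01/03/2014', 'Friday'),
        (4, '01/10/2014', 'Friday'),
        (5, '01/04/2014', 'Saturday');
""")

# Stage 1 (inner query): count incidents per calendar day.
# Stage 2 (outer query): average those daily counts per month and weekday.
monthly_average = conn.execute("""
    SELECT substr(sub.date, 1, 2) AS cleaned_month,
           sub.day_of_week,
           AVG(sub.incidents) AS average_incidents
    FROM (
        SELECT day_of_week,
               date,
               COUNT(incidnt_num) AS incidents
        FROM incidents
        GROUP BY 1, 2
    ) sub
    GROUP BY 1, 2
    ORDER BY 1, 2
""").fetchall()

print(monthly_average)
```

The two Fridays have 3 and 1 incidents, so the Friday average is 2.0; the single Saturday has 1.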
Practice Problem
Write a query that displays the average number of monthly incidents for each category. Hint:
use tutorial.sf_crime_incidents_cleandate to make your life a little easier.
The above query works because the result of the subquery is only one cell. Most conditional
logic will work with subqueries containing one-cell results. However, when the inner query
returns multiple results, you need an operator that accepts a set of values, and IN is the
most common choice:
SELECT *
FROM tutorial.sf_crime_incidents_2014_01
WHERE Date IN (SELECT date
FROM tutorial.sf_crime_incidents_2014_01
ORDER BY date
LIMIT 5
)
Note that you should not include an alias when you write a subquery in a conditional statement.
This is because the subquery is treated as an individual value (or set of values in the IN case)
rather than as a table.
Joining subqueries
You may remember that you can filter queries in joins. It's fairly common to join a subquery that
hits the same table as the outer query rather than filtering in the WHERE clause. The following
query produces the same results as the previous example:
SELECT *
FROM tutorial.sf_crime_incidents_2014_01 incidents
JOIN ( SELECT date
FROM tutorial.sf_crime_incidents_2014_01
ORDER BY date
LIMIT 5
) sub
ON incidents.date = sub.date
This can be particularly useful when combined with aggregations. When you join, the
requirements for your subquery output aren't as stringent as when you use the WHERE clause.
For example, your inner query can output multiple results. The following query ranks all of the
results according to how many incidents were reported in a given day. It does this by
aggregating the total number of incidents each day in the inner query, then using those values
to sort the outer query:
SELECT incidents.*,
sub.incidents AS incidents_that_day
FROM tutorial.sf_crime_incidents_2014_01 incidents
JOIN ( SELECT date,
COUNT(incidnt_num) AS incidents
FROM tutorial.sf_crime_incidents_2014_01
GROUP BY 1
) sub
ON incidents.date = sub.date
ORDER BY sub.incidents DESC, time
Practice Problem
Write a query that displays all rows from the three categories with the fewest incidents
reported.
Subqueries can be very helpful in improving the performance of your queries. Let's revisit
the Crunchbase Data briefly. Imagine you'd like to aggregate all of the companies receiving
investment and companies acquired each month. You could do that without subqueries if you
wanted to, but don't actually run this as it will take minutes to return:
SELECT COALESCE(acquisitions.acquired_month, investments.funded_month) AS month,
COUNT(DISTINCT acquisitions.company_permalink) AS companies_acquired,
COUNT(DISTINCT investments.company_permalink) AS investments
FROM tutorial.crunchbase_acquisitions acquisitions
FULL JOIN tutorial.crunchbase_investments investments
ON acquisitions.acquired_month = investments.funded_month
GROUP BY 1
Note that in order to do this properly, you must join on date fields, which causes a massive "data
explosion." Basically, what happens is that you're joining every row in a given month from one
table onto every row in that month in the other table, so the number of rows returned is
very large. Because of this multiplicative effect, you must use COUNT(DISTINCT) instead
of COUNT to get accurate counts. You can see this below:
The following query shows 7,414 rows:
SELECT COUNT(*) FROM tutorial.crunchbase_acquisitions
If you'd like to understand this a little better, you can do some extra research on cartesian
products. It's also worth noting that the FULL JOIN and COUNT above actually run pretty fast;
it's the COUNT(DISTINCT) that takes forever. More on that in the lesson on optimizing queries.
Of course, you could solve this much more efficiently by aggregating the two tables separately,
then joining them together so that the counts are performed across far smaller datasets:
SELECT COALESCE(acquisitions.month, investments.month) AS month,
acquisitions.companies_acquired,
investments.companies_rec_investment
FROM (
SELECT acquired_month AS month,
COUNT(DISTINCT company_permalink) AS companies_acquired
FROM tutorial.crunchbase_acquisitions
GROUP BY 1
) acquisitions
FULL JOIN (
SELECT funded_month AS month,
COUNT(DISTINCT company_permalink) AS companies_rec_investment
FROM tutorial.crunchbase_investments
GROUP BY 1
) investments
ON acquisitions.month = investments.month
ORDER BY 1 DESC
Note: We used a FULL JOIN above just in case one table had observations in a month that the
other table didn't. We also used COALESCE to display months when the acquisitions subquery
didn't have month entries (presumably no acquisitions occurred in those months). We strongly
encourage you to re-run the query without some of these elements to better understand how
they work. You can also run each of the subqueries independently to get a better understanding
of them as well.
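To experiment with the aggregate-then-join pattern without the tutorial data, here is a small sketch using Python's sqlite3 module. The tables and rows are hypothetical, and a plain JOIN is swapped in for the FULL JOIN because both toy tables cover the same months and older SQLite versions do not support FULL JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE acquisitions (company TEXT, acquired_month TEXT);
    CREATE TABLE investments  (company TEXT, funded_month TEXT);
    INSERT INTO acquisitions VALUES
        ('a1', '2013-01'), ('a2', '2013-01'), ('a3', '2013-02');
    INSERT INTO investments VALUES
        ('i1', '2013-01'), ('i2', '2013-01'), ('i3', '2013-01'), ('i1', '2013-02');
""")

# Each subquery collapses its table to one row per month before the join,
# so the join only has to match a handful of already-aggregated rows.
rows = conn.execute("""
    SELECT acq.month,
           acq.companies_acquired,
           inv.companies_rec_investment
    FROM (
        SELECT acquired_month AS month,
               COUNT(DISTINCT company) AS companies_acquired
        FROM acquisitions
        GROUP BY acquired_month
    ) acq
    JOIN (
        SELECT funded_month AS month,
               COUNT(DISTINCT company) AS companies_rec_investment
        FROM investments
        GROUP BY funded_month
    ) inv ON acq.month = inv.month
    ORDER BY acq.month DESC
""").fetchall()

print(rows)
```

Running each inner SELECT on its own is a quick way to check the per-month counts before they are joined.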
Practice Problem
Write a query that counts the number of companies founded and acquired by quarter starting in
Q1 2012. Create the aggregations in two separate queries, then join them.
SELECT *
FROM tutorial.crunchbase_investments_part1
UNION ALL
SELECT *
FROM tutorial.crunchbase_investments_part2
It's certainly not uncommon for a dataset to come split into several parts, especially if the data
passed through Excel at any point (Excel can only handle ~1M rows per spreadsheet). The two
tables used above can be thought of as different parts of the same dataset—what you'd almost
certainly like to do is perform operations on the entire combined dataset rather than on the
individual parts. You can do this by using a subquery:
SELECT COUNT(*) AS total_rows
FROM (
SELECT *
FROM tutorial.crunchbase_investments_part1
UNION ALL
SELECT *
FROM tutorial.crunchbase_investments_part2
) sub
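The same idea can be tried on a small scale. This sketch uses Python's sqlite3 module with two hypothetical tables, investments_part1 and investments_part2, standing in for the two parts of a split dataset; the alias combined is also an illustrative choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE investments_part1 (company TEXT, amount INTEGER);
    CREATE TABLE investments_part2 (company TEXT, amount INTEGER);
    INSERT INTO investments_part1 VALUES ('a', 10), ('b', 20);
    INSERT INTO investments_part2 VALUES ('c', 30), ('a', 10);
""")

# The subquery stacks both parts; the outer query treats the result
# as if it were a single table.
total_rows = conn.execute("""
    SELECT COUNT(*) AS total_rows
    FROM (
        SELECT * FROM investments_part1
        UNION ALL
        SELECT * FROM investments_part2
    ) AS combined
""").fetchone()[0]

print(total_rows)
```

UNION ALL keeps duplicate rows, so the row ('a', 10) that appears in both parts is counted twice.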
SQL Subquery
Summary: in this tutorial, you will learn about the SQL subquery and how to use the subqueries
to form flexible SQL statements.
SQL subquery basics
Consider the following employees and departments tables from the sample database:
Suppose you have to find all employees who are located at the location with the id 1700. You
might come up with the following solution.
First, find all departments located at the location whose id is 1700:
SELECT
*
FROM
departments
WHERE
location_id = 1700;
Second, find all employees that belong to the location 1700 by using the department id list of
the previous query:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
department_id IN (1 , 3, 8, 10, 11)
ORDER BY first_name , last_name;
This solution has two problems. To start with, you have to look at the departments table to
check which departments belong to location 1700. However, the original question did not
refer to any specific departments; it referred to location 1700.
Because of the small data volume, you can get the list of departments easily. However, in a real
system with a high volume of data, this might be problematic.
Another problem is that you have to revise the queries whenever you want to find employees
who are located at a different location.
A much better solution to this problem is to use a subquery. By definition, a subquery is a query
nested inside another query such as SELECT, INSERT, UPDATE, or DELETE statement. In this
tutorial, we are focusing on the subquery used with the SELECT statement.
In this example, you can combine the two queries above as follows:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
department_id IN (SELECT
department_id
FROM
departments
WHERE
location_id = 1700)
ORDER BY first_name , last_name;
The query placed within the parentheses is called a subquery. It is also known as an inner query
or inner select. The query that contains the subquery is called an outer query or an outer select.
To execute the query, the database system conceptually first executes the subquery, substitutes
the subquery between the parentheses with its result (the list of ids of the departments located
at location 1700), and then executes the outer query.
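The query above can be tried end to end with a throwaway database. This is a minimal sketch using Python's sqlite3 module; the rows are invented and the tables contain only the columns the query needs, so they are an assumption rather than the actual sample database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (
        department_id INTEGER PRIMARY KEY, department_name TEXT, location_id INTEGER);
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT,
        department_id INTEGER);
    INSERT INTO departments VALUES
        (1, 'Administration', 1700), (2, 'Marketing', 1800), (3, 'Purchasing', 1700);
    INSERT INTO employees VALUES
        (1, 'Ann', 'Lee', 1), (2, 'Bob', 'Ray', 2), (3, 'Cy', 'Fox', 3);
""")

# The inner query yields the department ids at location 1700 (1 and 3);
# the outer query keeps the employees who work in those departments.
rows = conn.execute("""
    SELECT employee_id, first_name, last_name
    FROM employees
    WHERE department_id IN (SELECT department_id
                            FROM departments
                            WHERE location_id = 1700)
    ORDER BY first_name, last_name
""").fetchall()

print(rows)
```

Changing 1700 to another location id is the only edit needed to answer the same question for a different location.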
You can use a subquery in many places such as:
With the IN or NOT IN operator
With comparison operators
With the EXISTS or NOT EXISTS operator
With the ANY or ALL operator
In the FROM clause
In the SELECT clause
SQL subquery examples
Let’s take some examples of using the subqueries to understand how they work.
SQL subquery with the IN or NOT IN operator
In the previous example, you saw how the subquery was used with the IN operator. The
following example uses a subquery with the NOT IN operator to find all employees who are not
located at location 1700:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
department_id NOT IN (SELECT
department_id
FROM
departments
WHERE
location_id = 1700)
ORDER BY first_name , last_name;
You can also use a subquery with a comparison operator. For example, a subquery can return
the highest salary of all employees, and the outer query can then find the employees whose
salary is equal to that highest salary.
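A subquery that returns the highest salary, combined with the equality operator in the outer query, might look like the following sketch. It uses Python's sqlite3 module and a made-up employees table with only the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT, salary INTEGER);
    INSERT INTO employees VALUES (1, 'Ann', 9000), (2, 'Bob', 12000), (3, 'Cy', 12000);
""")

# The scalar subquery returns exactly one value (the highest salary),
# so it can sit on the right-hand side of the = operator.
rows = conn.execute("""
    SELECT employee_id, first_name, salary
    FROM employees
    WHERE salary = (SELECT MAX(salary) FROM employees)
    ORDER BY employee_id
""").fetchall()

print(rows)
```

Two employees share the highest salary in this toy data, and both are returned.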
The following statement finds all employees whose salaries are greater than the average salary
of all employees:
SELECT
employee_id, first_name, last_name, salary
FROM
employees
WHERE
salary > (SELECT
AVG(salary)
FROM
employees);
In this example, first, the subquery returns the average salary of all employees. Then, the outer
query uses the greater than operator to find all employees whose salaries are greater than the
average.
SQL subquery with the EXISTS or NOT EXISTS operator
The EXISTS operator checks for the existence of rows returned from the subquery. It returns
true if the subquery contains any rows. Otherwise, it returns false.
The syntax of the EXISTS operator is as follows:
EXISTS (subquery)
The NOT EXISTS operator is the opposite of the EXISTS operator.
NOT EXISTS (subquery)
The following example finds all departments that have at least one employee with a salary
greater than 10,000:
SELECT
department_name
FROM
departments d
WHERE
EXISTS( SELECT
1
FROM
employees e
WHERE
salary > 10000
AND e.department_id = d.department_id)
ORDER BY department_name;
Similarly, the following statement finds all departments that do not have any employee with a
salary greater than 10,000:
SELECT
department_name
FROM
departments d
WHERE
NOT EXISTS( SELECT
1
FROM
employees e
WHERE
salary > 10000
AND e.department_id = d.department_id)
ORDER BY department_name;
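Both queries can be compared side by side on a toy dataset. The sketch below uses Python's sqlite3 module; the department names and salaries are invented, and the Sales department deliberately has no employees.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department_id INTEGER, salary INTEGER);
    INSERT INTO departments VALUES (1, 'Finance'), (2, 'Marketing'), (3, 'Sales');
    INSERT INTO employees VALUES (1, 1, 12000), (2, 1, 8000), (3, 2, 9000);
""")

# {negate} is replaced by '' or 'NOT' to build the two variants of the query.
query = """
    SELECT department_name
    FROM departments d
    WHERE {negate} EXISTS (SELECT 1
                           FROM employees e
                           WHERE e.salary > 10000
                             AND e.department_id = d.department_id)
    ORDER BY department_name
"""

has_high_earner = [r[0] for r in conn.execute(query.format(negate=""))]
no_high_earner = [r[0] for r in conn.execute(query.format(negate="NOT"))]

print(has_high_earner)
print(no_high_earner)
```

Note that a department without any employees satisfies NOT EXISTS, because the subquery returns no rows for it.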
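You can also use a subquery in the FROM clause. For example, a query that returns the average
salary of each department can be used as a subquery in the FROM clause to calculate the
average of the departments' average salaries as follows: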
SELECT
ROUND(AVG(average_salary), 0)
FROM
(SELECT
AVG(salary) average_salary
FROM
employees
GROUP BY department_id) department_salary;
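The average of the department averages is generally not the same as the overall average salary. The following sketch, using Python's sqlite3 module and invented salaries, runs the query above next to a plain AVG to show the difference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, department_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES (1, 1, 4000), (2, 1, 6000), (3, 2, 9000);
""")

# Department averages are 5000 and 9000; the outer query averages those two values.
avg_of_avgs = conn.execute("""
    SELECT ROUND(AVG(average_salary), 0)
    FROM (SELECT AVG(salary) average_salary
          FROM employees
          GROUP BY department_id) department_salary
""").fetchone()[0]

# The overall average weights every employee equally instead.
overall_avg = conn.execute("SELECT AVG(salary) FROM employees").fetchone()[0]

print(avg_of_avgs, round(overall_avg, 2))
```

The derived table gives each department the same weight regardless of its headcount, which is why the two numbers differ.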
Now you should understand what an SQL subquery is and how to use subqueries to form
flexible SQL statements.
SQL Correlated Subquery
Summary: in this tutorial, you will learn about the SQL correlated subquery which is
a subquery that uses values from the outer query.
Introduction to SQL correlated subquery
Let’s start with an example.
See the following employees table in the sample database:
The following query finds employees whose salary is greater than the average salary of all
employees:
SELECT
employee_id,
first_name,
last_name,
salary
FROM
employees
WHERE
salary > (SELECT
AVG(salary)
FROM
employees);
In this example, the subquery is used in the WHERE clause. There are some points that you can
see from this query:
First, you can execute the subquery that returns the average salary of all employees
independently.
SELECT
AVG(salary)
FROM
employees;
Second, the database system needs to evaluate the subquery only once.
Third, the outer query makes use of the result returned from the subquery. The outer query
depends on the subquery for its value. However, the subquery does not depend on the outer
query. Sometimes, this kind of subquery is called a plain subquery.
Unlike a plain subquery, a correlated subquery is a subquery that uses the values from the outer
query. Also, a correlated subquery may be evaluated once for each row selected by the outer
query. Because of this, a query that uses a correlated subquery may be slow.
A correlated subquery is also known as a repeating subquery or a synchronized subquery.
SQL correlated subquery examples
Let’s see a few more examples of correlated subqueries to understand them better.
SQL correlated subquery in the WHERE clause example
The following query finds all employees whose salary is higher than the average salary of the
employees in their departments:
SELECT
employee_id,
first_name,
last_name,
salary,
department_id
FROM
employees e
WHERE
salary > (SELECT
AVG(salary)
FROM
employees
WHERE
department_id = e.department_id)
ORDER BY
department_id ,
first_name ,
last_name;
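To see the correlation at work, the query can be run against a few hand-made rows. This sketch uses Python's sqlite3 module; the employees and salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY, first_name TEXT,
        department_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ann', 1, 4000), (2, 'Bob', 1, 6000),
        (3, 'Cy',  2, 9000), (4, 'Di',  2, 7000);
""")

# The inner query refers to e.department_id from the outer query, so its
# result (the department average) depends on the row being examined.
rows = conn.execute("""
    SELECT employee_id, first_name, salary, department_id
    FROM employees e
    WHERE salary > (SELECT AVG(salary)
                    FROM employees
                    WHERE department_id = e.department_id)
    ORDER BY department_id, first_name
""").fetchall()

print(rows)
```

Department 1 averages 5000 and department 2 averages 8000, so one employee from each department is above their own department's average.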
In this tutorial, you have learned about the SQL correlated subquery and how to apply it to form
a complex query.
SQL ALL
Summary: in this tutorial, you will learn about the SQL ALL operator and how to use it to
compare a value with a set of values.
Introduction to the SQL ALL operator
The SQL ALL operator is a logical operator that compares a single value with a single-column set
of values returned by a subquery.
The following illustrates the syntax of the SQL ALL operator:
WHERE column_name comparison_operator ALL (subquery)
The SQL ALL operator must be preceded by a comparison operator such as >, >=, <, <=, <>, = and
followed by a subquery. Some database systems such as Oracle allow a list of literal values
instead of a subquery.
Note that if the subquery returns no row, the condition in the WHERE clause is always true.
Assuming that the subquery returns one or more rows, the following table illustrates the
meaning of the SQL ALL operator:
Condition      Meaning
c > ALL(…)     The values in column c must be greater than the biggest value in the set to evaluate to true.
c >= ALL(…)    The values in column c must be greater than or equal to the biggest value in the set to evaluate to true.
c < ALL(…)     The values in column c must be less than the lowest value in the set to evaluate to true.
c <= ALL(…)    The values in column c must be less than or equal to the lowest value in the set to evaluate to true.
c <> ALL(…)    The values in column c must not be equal to any value in the set to evaluate to true.
c = ALL(…)     The values in column c must be equal to every value in the set to evaluate to true.
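The rules in the table can be checked with a small emulation in plain Python. This is only a sketch of the semantics, not how a database evaluates ALL: sql_all is a hypothetical helper, the values are made up, and NULL handling is deliberately left out.

```python
import operator


def sql_all(value, op, values):
    """Emulate `value op ALL (values)` for non-NULL inputs (hypothetical helper)."""
    # all() over an empty iterable is True, matching ALL over an empty subquery.
    return all(op(value, v) for v in values)


subquery_result = [6000, 13000]  # made-up single-column result set

gt_all = sql_all(14000, operator.gt, subquery_result)  # above the biggest value
ge_all = sql_all(13000, operator.ge, subquery_result)  # equal to the biggest value
lt_all = sql_all(6000, operator.lt, subquery_result)   # not below the lowest value
ne_all = sql_all(7000, operator.ne, subquery_result)   # differs from every value
eq_all = sql_all(6000, operator.eq, subquery_result)   # cannot equal two different values
empty_all = sql_all(1, operator.gt, [])                # empty set

print(gt_all, ge_all, lt_all, ne_all, eq_all, empty_all)
```

The = ALL case only succeeds when every value in the set is the same, which is why it is usually paired with a subquery that returns a single value.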
SQL ALL examples
We will use the employees table from the sample database for the demonstration:
This query returned 13,000, which is lower than any salary returned by the query that used
the ALL operator above.
SQL ALL with the greater than or equal to operator
The following shows the syntax of the SQL ALL operator with the greater than or equal to
operator:
SELECT
*
FROM
table_name
WHERE
column_name >= ALL (subquery);
The query returns all rows whose values in the column_name are greater than or equal to all the
values returned by the subquery.
For example, the following query finds all employees whose salaries are greater than or equal to
the highest salary of employees in the Marketing department:
SELECT
first_name, last_name, salary
FROM
employees
WHERE
salary >= ALL (SELECT
salary
FROM
employees
WHERE
department_id = 2)
ORDER BY salary;
The result set includes Michael, whose salary of 13,000 is equal to the highest salary of
employees in the Marketing department.
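Not every database implements ALL; SQLite, for example, does not. When the subquery is known to return at least one non-NULL row, >= ALL can be rewritten as a comparison with MAX. The sketch below shows that rewrite with Python's sqlite3 module on invented rows; it is a substitute technique, not the ALL operator itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT, department_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 1, 9000), ('Bob', 2, 6000), ('Cy', 2, 13000), ('Di', 3, 17000);
""")

# salary >= ALL (salaries in department 2) rewritten as salary >= MAX(...).
rows = conn.execute("""
    SELECT first_name, salary
    FROM employees
    WHERE salary >= (SELECT MAX(salary)
                     FROM employees
                     WHERE department_id = 2)
    ORDER BY salary
""").fetchall()

print(rows)
```

The two forms differ when the subquery returns no rows: ALL evaluates to true for every row, while MAX returns NULL and the comparison matches nothing.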
SQL ALL with the less than operator
The following illustrates the ALL operator used with the less than operator:
SELECT
*
FROM
table_name
WHERE
column_name < ALL (subquery);
This query returns all rows whose values in the column_name are smaller than the smallest
value returned by the subquery.
The following statement finds the lowest salary of employees in the Marketing department:
SELECT
MIN(salary)
FROM
employees
WHERE
department_id = 2;
To find all employees whose salaries are less than the lowest salary of employees in
the Marketing department, you use the ALL operator with the less than operator as follows:
SELECT
first_name, last_name, salary
FROM
employees
WHERE
salary < ALL (SELECT
salary
FROM
employees
WHERE
department_id = 2)
ORDER BY salary DESC;
SQL ALL with the less than or equal to operator
The following shows the syntax of the ALL operator used with the less than or equal to
operator:
SELECT
*
FROM
table_name
WHERE
column_name <= ALL (subquery);
For example, the following statement finds all employees whose salaries are less than or equal
to the lowest salary of employees in the Marketing department:
SELECT
first_name, last_name, salary
FROM
employees
WHERE
salary <= ALL (SELECT
salary
FROM
employees
WHERE
department_id = 2)
ORDER BY salary DESC;
SQL ALL with the not equal to operator
The following query returns all rows whose values in the column_name are not equal to any
values returned by the subquery:
SELECT
*
FROM
table_name
WHERE
column_name <> ALL (subquery);
For example, to find employees whose salaries are not equal to the average salary of any
department, you use the query below:
SELECT
first_name, last_name, salary
FROM
employees
WHERE
salary <> ALL (SELECT
AVG(salary)
FROM
employees
GROUP BY department_id)
ORDER BY salary DESC;
Notice that the subquery finds the average salary of employees by department by using
the AVG() function and the GROUP BY clause.
SQL ALL with the equal to operator
When you use the ALL operator with the equal to operator, the query finds all rows whose
values in the column_name are equal to every value returned by the subquery:
SELECT
*
FROM
table_name
WHERE
column_name = ALL (subquery);
The following example finds all employees whose salaries are equal to the highest salary of
employees in the Marketing department:
SELECT
first_name, last_name, salary
FROM
employees
WHERE
salary = ALL (SELECT
MAX(salary)
FROM
employees
WHERE
department_id = 2);
In this tutorial, you have learned how to use the SQL ALL operator to test whether a value
matches a set of values returned by a subquery.
SQL ANY
Summary: in this tutorial, you will learn about the SQL ANY operator and how to use it to
compare a value with a set of values.
Introduction to the SQL ANY operator
The ANY operator is a logical operator that compares a value with a set of values returned by a
subquery. The ANY operator must be preceded by a comparison operator >, >=, <, <=, =, <> and
followed by a subquery.
The following illustrates the syntax of the ANY operator:
WHERE column_name comparison_operator ANY (subquery)
If the subquery returns no rows, the condition evaluates to false. Assuming that the subquery
returns one or more rows, the following table illustrates the meaning of the ANY operator when
it is used with each comparison operator:
Condition      Meaning
x = ANY (…)    The values in column x must match one or more values in the set to evaluate to true.
x != ANY (…)   The values in column x must not match one or more values in the set to evaluate to true.
x > ANY (…)    The values in column x must be greater than the smallest value in the set to evaluate to true.
x < ANY (…)    The values in column x must be smaller than the biggest value in the set to evaluate to true.
x >= ANY (…)   The values in column x must be greater than or equal to the smallest value in the set to evaluate to true.
x <= ANY (…)   The values in column x must be smaller than or equal to the biggest value in the set to evaluate to true.
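As with ALL, the meaning of each row in the table can be checked with a short emulation in plain Python. This is a sketch of the semantics only: sql_any is a hypothetical helper, the set of values is made up, and NULL handling is ignored.

```python
import operator


def sql_any(value, op, values):
    """Emulate `value op ANY (values)` for non-NULL inputs (hypothetical helper)."""
    # any() over an empty iterable is False, matching ANY over an empty subquery.
    return any(op(value, v) for v in values)


subquery_result = [4150, 8000]  # made-up single-column result set

eq_any = sql_any(8000, operator.eq, subquery_result)      # matches one value
ne_any = sql_any(8000, operator.ne, subquery_result)      # differs from at least one value
gt_any = sql_any(4200, operator.gt, subquery_result)      # above the smallest value
gt_any_low = sql_any(4150, operator.gt, subquery_result)  # not above the smallest value
lt_any = sql_any(7999, operator.lt, subquery_result)      # below the biggest value
empty_any = sql_any(1, operator.eq, [])                   # empty set

print(eq_any, ne_any, gt_any, gt_any_low, lt_any, empty_any)
```

Note that != ANY is true as soon as the set contains two different values, so it is rarely what you want; <> ALL expresses "matches none of the values".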
SQL ANY examples
For the demonstration, we will use the employees table from the sample database:
To find all employees whose salaries are equal to the average salary of any department, you
use the following query:
SELECT
first_name,
last_name,
salary
FROM
employees
WHERE
salary = ANY (
SELECT
AVG(salary)
FROM
employees
GROUP BY
department_id)
ORDER BY
first_name,
last_name,
salary;
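Because the subquery is not correlated, the comparison is made against the averages of all departments, not only the employee's own. The = ANY form is equivalent to IN, which is what the following sketch uses, since SQLite does not implement ANY. The rows are invented and the code relies on Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (first_name TEXT, department_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 1, 4000), ('Bob', 1, 6000),
        ('Cy',  2, 5000), ('Di',  2, 9000);
""")

# Department averages are 5000 (department 1) and 7000 (department 2).
# IN plays the role of = ANY here.
rows = conn.execute("""
    SELECT first_name, salary
    FROM employees
    WHERE salary IN (SELECT AVG(salary)
                     FROM employees
                     GROUP BY department_id)
    ORDER BY first_name
""").fetchall()

print(rows)
```

Cy is returned even though the matching average belongs to department 1 rather than to Cy's own department.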
Note that the lowest average salary is 4,150. A query that uses the greater than operator with
ANY and the same subquery returns all employees whose salaries are greater than this lowest
average salary.
Using SQL ANY with the greater than or equal to operator example
The following statement returns all employees whose salaries are greater than or equal to the
average salary of at least one department, that is, to the lowest department average:
SELECT
first_name,
last_name,
salary
FROM
employees
WHERE
salary >= ANY (SELECT
AVG(salary)
FROM
employees
GROUP BY department_id)
ORDER BY first_name , last_name , salary;
Using SQL ANY with the less than operator example
The following query finds all employees whose salaries are less than the average salary of at
least one department:
SELECT
first_name,
last_name,
salary
FROM
employees
WHERE
salary < ANY (SELECT
AVG(salary)
FROM
employees
GROUP BY department_id)
ORDER BY salary DESC;
In this example, the query returns the employees whose salaries are smaller than the highest
average salary across departments.
Using SQL ANY with the less than or equal to operator example
To find employees whose salaries are less than or equal to the average salary of at least one
department, you use the following query:
SELECT
first_name,
last_name,
salary
FROM
employees
WHERE
salary <= ANY (SELECT
AVG(salary)
FROM
employees
GROUP BY department_id)
ORDER BY salary DESC;
The result set includes the employees whose salaries are lower than or equal to the highest
average salary across departments.
Now you should know how to use the SQL ANY operator to form a condition by comparing a
value with a set of values.
Introduction to the SQL EXISTS operator
The EXISTS operator allows you to specify a subquery to test for the existence of rows. The
following illustrates the syntax of the EXISTS operator:
EXISTS (subquery)
The EXISTS operator returns true if the subquery returns any rows. Otherwise, it returns false.
The EXISTS operator can stop processing the subquery as soon as it finds a row, so you can
use it to improve query performance.
SQL EXISTS operator example
We will use the employees and dependents tables in the sample database for the
demonstration.
The following statement finds all employees who have at least one dependent:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
EXISTS( SELECT
1
FROM
dependents
WHERE
dependents.employee_id = employees.employee_id);
The subquery is correlated. For each row in the employees table, the subquery checks whether
there is a corresponding row in the dependents table. If there is, the subquery returns a row,
which makes the outer query include the current row of the employees table in the result set.
If there is no corresponding row, the subquery returns no rows, so the outer query does not
include the current row of the employees table in the result set.
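A practical consequence is that EXISTS returns each employee at most once, no matter how many dependents match. The following sketch uses Python's sqlite3 module with made-up rows and contrasts the EXISTS query with a plain join, which is included only for comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE dependents (dependent_id INTEGER PRIMARY KEY, employee_id INTEGER);
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO dependents VALUES (1, 1), (2, 1), (3, 3);  -- Ann has two dependents
""")

# EXISTS only asks whether at least one matching dependent row is present.
with_dependents = [r[0] for r in conn.execute("""
    SELECT first_name
    FROM employees
    WHERE EXISTS (SELECT 1
                  FROM dependents
                  WHERE dependents.employee_id = employees.employee_id)
    ORDER BY first_name
""")]

# A join returns one row per matching dependent instead.
joined = [r[0] for r in conn.execute("""
    SELECT first_name
    FROM employees
    JOIN dependents ON dependents.employee_id = employees.employee_id
    ORDER BY first_name
""")]

print(with_dependents)
print(joined)
```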
SQL NOT EXISTS
To negate the EXISTS operator, you use the NOT operator as follows:
NOT EXISTS (subquery)
For example, the following query finds employees who do not have any dependents:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
NOT EXISTS( SELECT
1
FROM
dependents
WHERE
dependents.employee_id = employees.employee_id);
SQL EXISTS and NULL
If the subquery returns NULL, the EXISTS operator still evaluates to true. This is because
the EXISTS operator only checks for the existence of rows returned by the subquery. It does not
matter whether the values in a row are NULL.
In the following example, the subquery returns NULL but the EXISTS operator still evaluates to
true:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
EXISTS( SELECT NULL)
ORDER BY first_name , last_name;
The query returns all rows in the employees table.
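The difference between a subquery that returns a NULL row and one that returns no rows at all can be checked quickly. This sketch uses Python's sqlite3 module and a made-up employees table; the WHERE 1 = 0 condition is just a simple way to force an empty result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT);
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
""")

total = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

# One row containing NULL still counts as a row, so EXISTS is true for every employee.
null_row = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE EXISTS (SELECT NULL)"
).fetchone()[0]

# A subquery with no rows makes EXISTS false for every employee.
no_rows = conn.execute(
    "SELECT COUNT(*) FROM employees "
    "WHERE EXISTS (SELECT NULL FROM employees WHERE 1 = 0)"
).fetchone()[0]

print(total, null_row, no_rows)
```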
In this tutorial, you have learned how to use the SQL EXISTS operator to test for the existence of
rows returned by a subquery.