Database Management System
Lecture 2
Relational Algebra and SQL
* Some materials adapted from R. Ramakrishnan, J. Gehrke
Today’s Agenda
• Relational Algebra
• Complex SQL
Relational Algebra
Relational DB and Algebra
SQL
• Practical definition of relational DB
• Operates on tables (bags)
• Operations
  • Keywords
  • Statements: SELECT, FROM, WHERE, …
• The default is to produce a bag of rows as a query result
• Want a set? Use DISTINCT
Relational Algebra
• Mathematical definition of relational DB
• Operates on relations (sets)
• Operations
  • Set-based operations
  • Intersection, Union, ...
Describing a relational DB mathematically
• Two ingredients
• A relation is a set of tuples
• Define query operators as set functions
Recap: Cross product with Set
• Let A = {a, b, c} and B = {1, 2}
• Cross product in set theory is defined as ordered pairs (2-tuples) where each
pair consists of an element from A and B
A × B = {(a, 1), (b, 1), (c, 1), (a, 2), (b, 2), (c, 2)}
• How about A = {a, b, c}, B = {1, 2}, and C = {α, β}?
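The cross products above can be checked with a few lines of Python. This is only a sketch using the standard `itertools.product` function; the variable names `AxB` and `AxBxC` are chosen here for illustration.

```python
from itertools import product

A = {"a", "b", "c"}
B = {1, 2}
C = {"α", "β"}

# A × B: every ordered pair (x, y) with x taken from A and y taken from B
AxB = set(product(A, B))

# A × B × C: every ordered triple with one element from each set
AxBxC = set(product(A, B, C))

print(len(AxB))    # 3 * 2 pairs
print(len(AxBxC))  # 3 * 2 * 2 triples
```

With three sets, each element of the result is a 3-tuple, so the size is the product of the three set sizes.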
Defining Relations
Person(name, salary, num, status)
name = {all possible strings of 30 characters}
salary = {real numbers between 0 and 100,000,000}
num = {integer between 0 and 9999}
status = {“a”, “b”}
• Any instance of the relation is always a subset (⊆) of the cross product of its attribute domains
• name × salary × num × status
• One element of a relation is called a tuple
• A relation is always a set by definition
Recap: Set Theory
A = {1, 3, 5, 7} B = {1, 2, 3, 4}
• What do these return?
• A∩B
• A∪B
• A–B
• A×B
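One way to check the answers to these four questions is with Python's built-in set type. This is a sketch; the result variable names are chosen here for illustration.

```python
from itertools import product

A = {1, 3, 5, 7}
B = {1, 2, 3, 4}

intersection = A & B          # elements in both A and B
union = A | B                 # elements in A or B (duplicates collapse)
difference = A - B            # elements in A that are not in B
cross = set(product(A, B))    # all ordered pairs (a, b)

print(intersection, union, difference, len(cross))
```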
Relational Algebra has Additional Operations
A = {1, 3, 5, 7} B = {1, 2, 3, 4}
• Introducing new operators
(C for condition, L for attribute list, R for renaming specification)
• A⋈cB
• A÷B
• 𝝈c (A)
• 𝜋 L(A)
• 𝜌 R(A)
Relational Algebra as a Query Language
• We don’t normally use relational algebra directly
• DBMS products don’t let you write relational algebra queries directly
• But, it is used internally in a DBMS to represent a query plan
• It is also often used in theoretical work on databases
• (although fragments of first order logic are frequently used as well ... )
Relational Algebra Queries w/out Operators
• What does the following SQL query return?
SELECT *
FROM Student;
Student
Student
John Cusack
Will Smith
• Answer: Student
(It is called the identity function)
• A relation name by itself is a valid relational algebra query
• Listing the relation name just returns the tuples in the relation
Relational Algebra: Selection operator (𝝈)
Account
Number Owner Balance Type
7003001 Jane Smith 1,000,000 Savings
7003003 Alfred Hitchcock 4,400,200 Savings
7003005 Takumi Fujiwara 2,230,000 Checking
7003007 Brian Mills 1,200,000 Savings
• The relational algebra query
𝝈 Balance<3,000,000 (Account)
• Is similar to the SQL query
SELECT *
FROM Account
WHERE Balance < 3000000;
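The selection can be tried out with Python's standard `sqlite3` module. The sketch below loads the four Account rows from the slide into an in-memory database; the column types are assumptions, since the slide does not give a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the slide only lists the attribute names
conn.execute(
    "CREATE TABLE Account (Number INTEGER, Owner TEXT, Balance INTEGER, Type TEXT)"
)
conn.executemany(
    "INSERT INTO Account VALUES (?, ?, ?, ?)",
    [
        (7003001, "Jane Smith", 1000000, "Savings"),
        (7003003, "Alfred Hitchcock", 4400200, "Savings"),
        (7003005, "Takumi Fujiwara", 2230000, "Checking"),
        (7003007, "Brian Mills", 1200000, "Savings"),
    ],
)

# Selection keeps whole rows (all four attributes) that satisfy the predicate
selected = conn.execute(
    "SELECT * FROM Account WHERE Balance < 3000000 ORDER BY Number"
).fetchall()

for row in selected:
    print(row)
```

Note that the numeric literal is written without thousands separators, because SQL does not accept commas inside a number.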
Relational Algebra: Selection operator (𝝈)
• Select (𝝈) is a unary operator:
𝝈:R→R
• It is always applied to a single relation
𝝈 Balance<3,000,000 (Account)
• 𝝈 is the select operator
• Balance<3,000,000 is the predicate (condition), of the form
Attribute Comparator (≥, >, =, ≠, <, ≤) Attribute|Constant
• Account is a relation or relational algebra expression
Exercises
• 𝝈 Balance<3,000,000 (Account)
• 𝝈 Number<7003005 (Account)
• 𝝈 Balance=Number (Account)
• 𝝈 Type=“checking” (𝝈 Balance<3,000,000 (Account))
Account
Number Owner Balance Type
7003001 Jane Smith 1,000,000 Savings
7003003 Alfred Hitchcock 4,400,200 Savings
7003005 Takumi Fujiwara 2,230,000 Checking
7003007 Brian Mills 1,200,000 Savings
Relational Algebra: Projection operator (𝜋)
Account
Number Owner Balance Type
7003001 Jane Smith 1,000,000 Savings
7003003 Alfred Hitchcock 4,400,200 Savings
7003005 Takumi Fujiwara 2,230,000 Checking
7003007 Brian Mills 1,200,000 Savings
• The relational algebra query:
𝜋 Number, Owner (Account)
• Is similar to the SQL query
SELECT Number, Owner
FROM Account;
Relational Algebra: Projection operator (𝜋)
• Projection (𝜋) is a unary operator:
𝜋:R→R
• It is always applied to a single relation
𝜋 Number, Owner (Account)
• 𝜋 is the projection operator
• Number, Owner is the list of attributes to keep
• Account is a relation or relational algebra expression
Example
𝜋 Owner (Account)
Vs.
SELECT Number
FROM Account;
Account
Number Owner Balance Type
7003001 Jane Smith 1,000,000 Savings
7003003 Alfred Hitchcock 4,400,200 Savings
7003005 Takumi Fujiwara 2,230,000 Checking
7003007 Brian Mills 1,200,000 Savings
7003009 Alfred Hitchcock 3,400,200 Checking
Result of 𝜋 Owner (Account)
Owner
Jane Smith
Alfred Hitchcock
Takumi Fujiwara
Brian Mills
• Relations are always sets
• Query answer is a set of names
• and Alfred Hitchcock appears just once in the answer
Result of SELECT Number FROM Account;
Number
7003001
7003003
7003005
7003007
7003009
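The difference between set and bag semantics shows up when projecting on Owner, because one owner holds two accounts. The sketch below uses Python's `sqlite3` module with the five Account rows from this slide; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Account (Number INTEGER, Owner TEXT, Balance INTEGER, Type TEXT)"
)
conn.executemany(
    "INSERT INTO Account VALUES (?, ?, ?, ?)",
    [
        (7003001, "Jane Smith", 1000000, "Savings"),
        (7003003, "Alfred Hitchcock", 4400200, "Savings"),
        (7003005, "Takumi Fujiwara", 2230000, "Checking"),
        (7003007, "Brian Mills", 1200000, "Savings"),
        (7003009, "Alfred Hitchcock", 3400200, "Checking"),
    ],
)

# Plain SELECT returns a bag: one output row per input row
owners_bag = [r[0] for r in conn.execute("SELECT Owner FROM Account")]

# DISTINCT gives the set that the relational algebra projection produces
owners_set = {r[0] for r in conn.execute("SELECT DISTINCT Owner FROM Account")}

print(len(owners_bag), len(owners_set))
```

Projecting on Number never shows this difference, because account numbers are already unique in the table.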
Combining Select and Project
• Are any of these equivalent ?
𝜋 Owner(𝝈 Balance < 3,000,000 (Account))
𝝈 Balance<3,000,000(𝜋 Owner, Balance (Account))
𝜋 Owner(𝝈 Balance<3,000,000(𝜋 Owner, Balance(Account)))
𝝈 Type = “checking” (𝝈 Balance<3,000,000(𝜋 Owner, Balance(Account)))
Account
Number Owner Balance Type
7003001 Jane Smith 1,000,000 Savings
7003003 Alfred Hitchcock 4,400,200 Savings
7003005 Takumi Fujiwara 2,230,000 Checking
7003007 Brian Mills 1,200,000 Savings
7003009 Alfred Hitchcock 3,400,200 Checking
Relational Algebra: Cross Product operator (×)
• Used in the basic definition of a relation
• “An instance of a relation is a subset of the cross product of its domains”
• Is also an operator in the relational algebra
Example
• Suppose we have the following two relations
Teacher(TID, Tname) Course(CID, Cname)
Teacher
TID Tname
101 Emma Thompson
105 Billy Elliot
110 John Waine
Course
CID Cname
346 How to Act
491 How to Think
• The cross product produces every possible combination of teachers and courses
Teacher × Course
SELECT * FROM Teacher, Course;
TID Tname CID Cname
101 Emma Thompson 346 How to Act
101 Emma Thompson 491 How to Think
105 Billy Elliot 346 How to Act
105 Billy Elliot 491 How to Think
110 John Waine 346 How to Act
110 John Waine 491 How to Think
Relational Algebra: Join operator (⋈)
• Join (⋈) is a binary operator
⋈:R×R→R
• It is always applied to two relations and returns one
Account ⋈ Number=Accnt Deposit
• Account and Deposit are each a relation or relational algebra expression
• Number=Accnt is the join predicate (condition), of the form
Attribute Comparator (≥, >, =, ≠, <, ≤) Attribute
Relational Algebra: Join operator (⋈)
Account
Number Owner Balance Type
Deposit
Accnt TxID Date Amount
• The relational algebra query
Account ⋈ Number=Accnt(Deposit)
• is equivalent to
𝝈 Number = Accnt (Account × Deposit)
Relational Algebra: Join operator (⋈)
• The join operator is defined for convenience
R1 ⋈ a1=a2 R2 ≡ 𝝈 a1=a2 (R1 × R2)
• Any query with a join can always be rewritten as a cross product followed by a selection
Notes on Join
• Each simple Boolean predicate in the join condition must compare an
attribute from one relation to an attribute in the other relation
Account ⋈ Number = Accnt ∧ Type = “checking” Deposit
• Type = “checking” is not a join condition
• If you have a join with NO condition, then it is just a cross product
Examples
S instance of Student
sid name advisor age
101 Bill 301 20
102 John 302 20
103 Edward 301 19
104 Albert 301 19
105 Thompson 302 19
F instance of Faculty
fid name age
301 Morrison 45
302 Groot 37
• S ⋈ advisor=fid (F)
select * from Student as s, Faculty as f where s.advisor = f.fid;
• S ⋈ S.age < F.age (F)
select * from Student as s, Faculty as f where s.age < f.age;
• The most common join is called an equi-join (for equality condition)
R1 ⋈ A1 = A2 R2
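The sketch below runs these joins on the S and F instances with Python's `sqlite3` module. It assumes the second join condition compares the two age attributes, and the column types are also assumptions. It checks that JOIN ... ON returns the same rows as a selection over the cross product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (sid INTEGER, name TEXT, advisor INTEGER, age INTEGER);
CREATE TABLE Faculty (fid INTEGER, name TEXT, age INTEGER);
INSERT INTO Student VALUES
  (101, 'Bill', 301, 20), (102, 'John', 302, 20), (103, 'Edward', 301, 19),
  (104, 'Albert', 301, 19), (105, 'Thompson', 302, 19);
INSERT INTO Faculty VALUES (301, 'Morrison', 45), (302, 'Groot', 37);
""")

# Equi-join written with the JOIN ... ON syntax
equi_join = conn.execute(
    "SELECT s.name, f.name FROM Student AS s JOIN Faculty AS f "
    "ON s.advisor = f.fid ORDER BY s.sid"
).fetchall()

# The same join written as a selection over the cross product
cross_then_select = conn.execute(
    "SELECT s.name, f.name FROM Student AS s, Faculty AS f "
    "WHERE s.advisor = f.fid ORDER BY s.sid"
).fetchall()

# A join on a non-equality condition (assumed here to compare ages)
theta_join = conn.execute(
    "SELECT s.name, f.name FROM Student AS s, Faculty AS f WHERE s.age < f.age"
).fetchall()

print(equi_join == cross_then_select, len(theta_join))
```

Every student is younger than every faculty member in this instance, so the age comparison keeps all pairs of the cross product.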
From an SQL statement to a relational algebra expression
SELECT DISTINCT attributes
FROM T1, T2, …
WHERE conditions
corresponds to
𝜋 attributes(𝝈 conditions (T1 × T2 × … ))
• SELECT-FROM-WHERE queries are sometimes described as equivalent to the Select-Project-Join (SPJ) subset of relational algebra
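To make the translation concrete, the sketch below implements the three operators over Python sets of tuples and evaluates one query in the 𝜋(𝝈(T1 × T2)) form. The helper names `cross`, `select`, and `project` are hypothetical, attributes are addressed by position, and the data is the Student and Faculty instance from the join examples.

```python
from itertools import product

# Student tuples: (sid, name, advisor, age); Faculty tuples: (fid, name, age)
Student = {
    (101, "Bill", 301, 20), (102, "John", 302, 20), (103, "Edward", 301, 19),
    (104, "Albert", 301, 19), (105, "Thompson", 302, 19),
}
Faculty = {(301, "Morrison", 45), (302, "Groot", 37)}

def cross(r1, r2):
    """Cross product: concatenate every tuple of r1 with every tuple of r2."""
    return {t1 + t2 for t1, t2 in product(r1, r2)}

def select(predicate, r):
    """Selection: keep the tuples that satisfy the predicate."""
    return {t for t in r if predicate(t)}

def project(positions, r):
    """Projection: keep the listed positions; the set removes duplicates."""
    return {tuple(t[i] for i in positions) for t in r}

# SELECT DISTINCT F.name FROM Student S, Faculty F
# WHERE S.advisor = F.fid AND S.age = 19
# Positions in the combined tuple: 0 sid, 1 name, 2 advisor, 3 age, 4 fid, 5 name, 6 age
result = project(
    [5],
    select(lambda t: t[2] == t[4] and t[3] == 19, cross(Student, Faculty)),
)
print(result)
```

Because `project` builds a set, it behaves like SELECT DISTINCT rather than a plain SELECT.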
Complex SQL
More SQL query constructs
1. SELECT …: extensions such as SUM, COUNT, MIN, AVG, etc.
2. FROM …: extensions include various kinds of JOINs
3. WHERE …: additional comparators, e.g., EXISTS, IN, ANY
4. (SELECT … FROM … WHERE …) UNION (SELECT … FROM … WHERE …): operators that take two or more complete SQL queries as arguments, e.g., UNION and INTERSECT
5. ORDER BY …, GROUP BY …, HAVING …: several additional clauses
Sample Database
• Let’s consider the following DB for the examples
Customer(Number, Name, Address, Crating, Camount, Cbalance, Salesperson)
Salesperson(Number, Name, Address, Office)
foreign key: Customer.Salesperson -> Salesperson.Number
• We will also use other DBs from time to time
SELECT (1/4)
• Aggregate Operators: COUNT, SUM, MIN, MAX, and AVG
SELECT MIN(Cbalance), MAX(Cbalance), AVG(Cbalance)
FROM Customer;
SELECT MIN(Cbalance), MAX(Cbalance), AVG(Cbalance)
FROM Customer
WHERE age > 35;
• If one aggregate operator appears in the SELECT clause
• ALL OF THE ENTRIES in the select clause MUST BE AN AGGREGATE OPERATOR
• Unless the query includes a GROUP BY clause (more on this later)
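A small sketch of the aggregate operators with Python's `sqlite3` module. It uses three Customer rows that appear later in these slides. The column types are assumptions, and the filter `Camount > 9000` is a hypothetical stand-in chosen here because the sample Customer schema has no age attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (Number INTEGER, Name TEXT, Address TEXT, "
    "Crating INTEGER, Camount INTEGER, Cbalance INTEGER, Salesperson INTEGER)"
)
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Smith", "1st Str.", 700, 10000, 9000, 55),
        (2, "Jones", "2nd Str.", 700, 8000, 4000, 77),
        (3, "Mills", "3rd Str.", 700, 11000, 8000, None),
    ],
)

# Aggregates over the whole table collapse all rows into one output row
overall = conn.execute(
    "SELECT MIN(Cbalance), MAX(Cbalance), AVG(Cbalance) FROM Customer"
).fetchone()

# WHERE filters rows before the aggregates are computed
# (hypothetical condition, not from the slides)
filtered = conn.execute(
    "SELECT MIN(Cbalance), MAX(Cbalance), AVG(Cbalance) "
    "FROM Customer WHERE Camount > 9000"
).fetchone()

print(overall, filtered)
```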
Stop to think
• What would/should the query result be?
• Is it allowed?
SELECT Name, Crating, AVG(Cbalance)
FROM Customer;
SELECT (2/4)
• What is the difference between these two queries?
SELECT COUNT(Name)
FROM Customer;
Vs.
SELECT DISTINCT Name
FROM Customer;
• When will these two queries return the same answer?
• or what are the conditions for it to happen
SELECT (3/4)
• What is the implication of using DISTINCT
• When computing the SUM or AVG of an attribute?
SUM(DISTINCT age) Vs. SUM(age)
The SUM or AVG will be computed over the distinct values only
• When computing the MIN or MAX of an attribute?
MIN(DISTINCT age) Vs. MIN(age)
No difference: the result does not depend on whether or not duplicates are removed
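The effect of DISTINCT inside an aggregate can be seen on any column with repeated values. The sketch below uses Python's `sqlite3` module and a hypothetical one-column table `Ratings` holding the Crating values (700, 700, 700) of the three sample customers; the table name is chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical helper table with a single column of repeated values
conn.execute("CREATE TABLE Ratings (Crating INTEGER)")
conn.executemany("INSERT INTO Ratings VALUES (?)", [(700,), (700,), (700,)])

sum_all, sum_distinct = conn.execute(
    "SELECT SUM(Crating), SUM(DISTINCT Crating) FROM Ratings"
).fetchone()

min_all, min_distinct = conn.execute(
    "SELECT MIN(Crating), MIN(DISTINCT Crating) FROM Ratings"
).fetchone()

count_all, count_distinct = conn.execute(
    "SELECT COUNT(Crating), COUNT(DISTINCT Crating) FROM Ratings"
).fetchone()

print(sum_all, sum_distinct, min_all, min_distinct, count_all, count_distinct)
```

SUM and COUNT change when duplicates are removed first, while MIN is unaffected.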
SELECT (4/4)
• SELECT clause list can also include simple arithmetic expressions using
+, -, *, /
SELECT (Camount - Cbalance) AS AvailableCredit, Name
FROM Customer
WHERE Camount > 0;
FROM: Syntactic sugar and new operators
• There are a number of join types that can be expressed in the FROM clause
• Syntactic sugar that can be expressed using SELECT-FROM-WHERE queries
  • Inner join (the regular join)
  • Cross join
  • Natural join
• New operators
  • Left outer join
  • Right outer join
  • Full outer join
FROM
• These two queries are equivalent
1. SELECT [Link], [Link]
FROM Customer C JOIN Salesperson S ON C.Salesperson = S.Number
WHERE [Link] < 6;
𝜋 [Link], [Link] (𝝈 [Link] < 6 (Customer ⋈ C.Salesperson = S.Number Salesperson))
2. SELECT [Link], [Link]
FROM Customer C, Salesperson S
WHERE C.Salesperson = S.Number AND [Link] < 6;
𝜋 [Link], [Link] (𝝈 [Link] < 6 ∧ C.Salesperson = S.Number (Customer × Salesperson))
FROM: JOIN with USING clause
• JOIN with USING clause when attributes in the 2 tables have the same name
Course(CNumber, CName, Description)
Teacher(TNumber, TName, Phone)
Offering(CNumber, TNumber, Time, Days, Room)
• These two queries are equivalent
SELECT [Link], [Link], Room
FROM Course C JOIN Offering USING(CNumber);
SELECT [Link], [Link], Room
FROM Course C JOIN Offering O ON C.CNumber = O.CNumber;
• The column named in the USING clause doesn’t need (and can’t have) a correlation name
FROM: Basic Join ≡ (INNER) JOIN
• For the INNER JOIN
SELECT [Link], [Link]
FROM Customer C INNER JOIN Salesperson S ON C.Salesperson = S.Number;
• The query result includes all “matches” but excludes
• customer rows that do not have a Salesperson
• Salesperson rows that are not assigned to any customers
• The keyword “INNER” is optional
• The above query is equivalent to
SELECT [Link], [Link]
FROM Customer C JOIN Salesperson S ON C.Salesperson = S.Number;
FROM: cross product ≡ CROSS JOIN
• The following queries are equivalent
SELECT *
FROM Customer, Salesperson;
SELECT *
FROM Customer CROSS JOIN Salesperson;
FROM: Equi-Join vs. Natural Join (1/3)
• When the join is based on equality of attributes, we always have two identical attributes in the result
Faculty
Name DeptID
Smith 1
James 2
Brown 3
Johnson 1
Robert
Department
DeptID DeptName
1 Engineering
2 Communications
3 Marketing
SELECT *
FROM Faculty F INNER JOIN Department D
ON F.DeptID = D.DeptID;
Equi-Join result
F.Name F.DeptID D.DeptID D.DeptName
Smith 1 1 Engineering
Johnson 1 1 Engineering
James 2 2 Communications
Brown 3 3 Marketing
FROM: Equi-Join vs. Natural Join (2/3)
• Equi-Join with the USING construct: applicable to columns that have the same name
Faculty
Name DeptID
Smith 1
James 2
Brown 3
Johnson 1
Robert
Department
DeptID DeptName
1 Engineering
2 Communications
3 Marketing
SELECT *
FROM Faculty F INNER JOIN Department D
USING (DeptID);
Result of Equi-Join with USING construct
Name DeptID DeptName
Smith 1 Engineering
Johnson 1 Engineering
James 2 Communications
Brown 3 Marketing
FROM: Equi-Join vs. Natural Join (3/3)
• NATURAL JOIN: Equi-Join that keeps only one column for each pair of equally named columns
Faculty
Name DeptID
Smith 1
James 2
Brown 3
Johnson 1
Robert
Department
DeptID DeptName
1 Engineering
2 Communications
3 Marketing
SELECT *
FROM Faculty NATURAL JOIN Department;
NATURAL JOIN result
Name DeptID DeptName
Smith 1 Engineering
Johnson 1 Engineering
James 2 Communications
Brown 3 Marketing
• If you don’t specify which attributes to join on, natural join will join on all attributes with the same name
FROM: more on NATURAL JOIN (1/2)
• NATURAL JOIN is like a “macro” that joins tables with an equality condition
for all attributes with the same name
Course(CNumber, CName, Description)
Teacher(TNumber, TName, Phone)
Offering(CNumber, TNumber, Time, Days, Room)
• NATURAL JOIN automatically drops one column from each pair of duplicate columns
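The three join spellings can be compared side by side on the Faculty and Department tables from the preceding slides. This is a sketch with Python's `sqlite3` module; the column types are assumptions, Robert's missing DeptID is assumed to be NULL, and `columns_and_rows` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Faculty (Name TEXT, DeptID INTEGER);
CREATE TABLE Department (DeptID INTEGER, DeptName TEXT);
INSERT INTO Faculty VALUES
  ('Smith', 1), ('James', 2), ('Brown', 3), ('Johnson', 1), ('Robert', NULL);
INSERT INTO Department VALUES
  (1, 'Engineering'), (2, 'Communications'), (3, 'Marketing');
""")

def columns_and_rows(sql):
    """Run a query and return (list of output column names, list of rows)."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()

on_cols, on_rows = columns_and_rows(
    "SELECT * FROM Faculty F INNER JOIN Department D ON F.DeptID = D.DeptID"
)
using_cols, using_rows = columns_and_rows(
    "SELECT * FROM Faculty INNER JOIN Department USING (DeptID)"
)
natural_cols, natural_rows = columns_and_rows(
    "SELECT * FROM Faculty NATURAL JOIN Department"
)

print(len(on_cols), using_cols, natural_cols)
```

The ON form keeps both DeptID columns, while USING and NATURAL JOIN keep one. Robert has no department, so he is absent from all three results.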
FROM: more on NATURAL JOIN (2/2)
• List the course and teacher name for all course offerings
• This query can be expressed with the NATURAL JOIN or with an INNER JOIN
• These two queries are equivalent
SELECT CName, TName
FROM Course C, Offering O, Teacher T
WHERE C.CNumber = O.CNumber AND O.TNumber = T.TNumber;
SELECT CName, TName
FROM Course NATURAL JOIN Offering NATURAL JOIN Teacher;
• They are equivalent because the join attributes have the same attribute names
• But is it always useful?
FROM: INNER JOIN Vs. OUTER JOIN (1/2)
• For the INNER JOIN
SELECT [Link], [Link]
FROM Customer C INNER JOIN Salesperson S ON C.Salesperson = S.Number;
• the query result does not include (see “FROM: Basic Join ≡ (INNER) JOIN”)
• a customer that does not have a salesperson
• a salesperson that is not assigned to any customers
Customer
Number Name Address Crating Camount Cbalance Salesperson
1 Smith 1st Str. 700 10,000 9,000 55
2 Jones 2nd Str. 700 8,000 4,000 77
3 Mills 3rd Str. 700 11,000 8,000 NULL
Salesperson
Number Name Address Office
55 Miller 5th Str. 101
77 Khan 7th Str. 102
83 Dunham 8th Str. 103
FROM: INNER JOIN Vs. OUTER JOIN (2/2)
• An INNER (regular) JOIN includes only those customers that have
salespersons (only the matches)
SELECT [Link], [Link]
FROM Customer AS C INNER JOIN Salesperson AS S
ON C.Salesperson = S.Number;
• A LEFT OUTER JOIN will include all matches plus all customers that do not have a Salesperson
• A RIGHT OUTER JOIN will include all matches plus all salespersons that are not assigned to any customers
• A FULL OUTER JOIN will include all of these
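The sketch below runs the inner and outer joins on the sample Customer and Salesperson rows with Python's `sqlite3` module. The selected columns (the two Name attributes) and the column types are assumptions. To stay portable to engines without RIGHT or FULL OUTER JOIN, the right outer join is written as a left outer join with the operands swapped, and the full outer join is emulated by combining both results in Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (Number INTEGER, Name TEXT, Address TEXT,
  Crating INTEGER, Camount INTEGER, Cbalance INTEGER, Salesperson INTEGER);
CREATE TABLE Salesperson (Number INTEGER, Name TEXT, Address TEXT, Office INTEGER);
INSERT INTO Customer VALUES
  (1, 'Smith', '1st Str.', 700, 10000, 9000, 55),
  (2, 'Jones', '2nd Str.', 700, 8000, 4000, 77),
  (3, 'Mills', '3rd Str.', 700, 11000, 8000, NULL);
INSERT INTO Salesperson VALUES
  (55, 'Miller', '5th Str.', 101), (77, 'Khan', '7th Str.', 102),
  (83, 'Dunham', '8th Str.', 103);
""")

# Inner join: only customers with a matching salesperson
inner = conn.execute(
    "SELECT C.Name, S.Name FROM Customer C INNER JOIN Salesperson S "
    "ON C.Salesperson = S.Number ORDER BY C.Number"
).fetchall()

# Left outer join: additionally keeps customers without a salesperson
left = conn.execute(
    "SELECT C.Name, S.Name FROM Customer C LEFT OUTER JOIN Salesperson S "
    "ON C.Salesperson = S.Number ORDER BY C.Number"
).fetchall()

# Right outer join, written as a left outer join with the tables swapped
right = conn.execute(
    "SELECT C.Name, S.Name FROM Salesperson S LEFT OUTER JOIN Customer C "
    "ON C.Salesperson = S.Number ORDER BY S.Number"
).fetchall()

# Full outer join, emulated as the union of the two outer joins
full = set(left) | set(right)

print(inner, left, right, sorted(full, key=str), sep="\n")
```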
FROM: LEFT OUTER JOIN
INNER JOIN on C.Salesperson = S.Number gives:
1 Smith 1st Str. 700 10,000 9,000 55 55 Miller 5th Str. 101
2 Jones 2nd Str. 700 8,000 4,000 77 77 Khan 7th Str. 102
LEFT OUTER JOIN on C.Salesperson = S.Number gives:
1 Smith 1st Str. 700 10,000 9,000 55 55 Miller 5th Str. 101
2 Jones 2nd Str. 700 8,000 4,000 77 77 Khan 7th Str. 102
3 Mills 3rd Str. 700 11,000 8,000 NULL NULL NULL NULL NULL
Customer
Number Name Address Crating Camount Cbalance Salesperson
1 Smith 1st Str. 700 10,000 9,000 55
2 Jones 2nd Str. 700 8,000 4,000 77
3 Mills 3rd Str. 700 11,000 8,000 NULL
Salesperson
Number Name Address Office
55 Miller 5th Str. 101
77 Khan 7th Str. 102
83 Dunham 8th Str. 103
FROM: RIGHT OUTER JOIN
INNER JOIN on C.Salesperson = S.Number gives:
1 Smith 1st Str. 700 10,000 9,000 55 55 Miller 5th Str. 101
2 Jones 2nd Str. 700 8,000 4,000 77 77 Khan 7th Str. 102
RIGHT OUTER JOIN on C.Salesperson = S.Number gives:
1 Smith 1st Str. 700 10,000 9,000 55 55 Miller 5th Str. 101
2 Jones 2nd Str. 700 8,000 4,000 77 77 Khan 7th Str. 102
NULL NULL NULL NULL NULL NULL NULL 83 Dunham 8th Str. 103
Customer
Number Name Address Crating Camount Cbalance Salesperson
1 Smith 1st Str. 700 10,000 9,000 55
2 Jones 2nd Str. 700 8,000 4,000 77
3 Mills 3rd Str. 700 11,000 8,000 NULL
Salesperson
Number Name Address Office
55 Miller 5th Str. 101
77 Khan 7th Str. 102
83 Dunham 8th Str. 103
FROM: FULL OUTER JOIN (not supported in MySQL)
INNER JOIN on C.Salesperson = S.Number gives:
1 Smith 1st Str. 700 10,000 9,000 55 55 Miller 5th Str. 101
2 Jones 2nd Str. 700 8,000 4,000 77 77 Khan 7th Str. 102
FULL OUTER JOIN on C.Salesperson = S.Number gives:
1 Smith 1st Str. 700 10,000 9,000 55 55 Miller 5th Str. 101
2 Jones 2nd Str. 700 8,000 4,000 77 77 Khan 7th Str. 102
3 Mills 3rd Str. 700 11,000 8,000 NULL NULL NULL NULL NULL
NULL NULL NULL NULL NULL NULL NULL 83 Dunham 8th Str. 103
Customer
Number Name Address Crating Camount Cbalance Salesperson
1 Smith 1st Str. 700 10,000 9,000 55
2 Jones 2nd Str. 700 8,000 4,000 77
3 Mills 3rd Str. 700 11,000 8,000 NULL
Salesperson
Number Name Address Office
55 Miller 5th Str. 101
77 Khan 7th Str. 102
83 Dunham 8th Str. 103
FROM: a form of subquery
• You can put a complete query expression in the FROM clause
• also known as nested queries or subqueries
• Parentheses are important
SELECT ...
FROM Employee E, (SELECT ... FROM ... WHERE ...)
WHERE ...
Relational Algebra Operators
Eight standard relational algebra operators
• 𝜋 project: we have seen already
• 𝝈 select: we have seen already
• ∪ union: from set theory
• ∩ intersect: from set theory
• – difference: from set theory
(∪, ∩, and – can only be used with union-compatible relations)
• × cross product: we have seen already
• ⋈ join: we have seen already
• ÷ divide
• 𝜌 renaming
Union-compatible relations
• Two relations are union-compatible if
• they have the same number of attributes
• corresponding attributes have the same domains
• Example
Checking(CNum: int, COwner: string, CBalance: int)
Savings(SNum: int, SOwner: string, SBalance: int)
Example: ∪ union
Checking ∪ Savings
Checking
Cnum Cowner Cbalance
101 Smith 1000
102 Mills 2000
104 Jones 1000
105 Schwab 3000
Savings
Snum Sowner Sbalance
103 Smith 5000
Result of Checking ∪ Savings (note that the attribute names are from the first relation in the query)
Cnum Cowner Cbalance
101 Smith 1000
102 Mills 2000
104 Jones 1000
105 Schwab 3000
103 Smith 5000
SELECT CNum, COwner, CBalance
FROM Checking
UNION
SELECT SNum, SOwner, SBalance
FROM Savings;
Example: ∩ intersection
Checking ∩ Savings
• It is empty: no tuples appear in both relations
𝜋 Cowner(Checking) ∩ 𝜋 Sowner(Savings)
• Smith: the only owner in Savings
Checking
Cnum Cowner Cbalance
101 Smith 1000
102 Mills 2000
104 Jones 1000
105 Schwab 3000
Savings
Snum Sowner Sbalance
103 Smith 5000
Example: – difference (EXCEPT is not supported in MySQL before version 8.0.31)
• Find all tuples that are in the Checking relation but are not in the Savings relation
Checking − Savings
• Everyone in Checking except Smith
𝜋 COwner(Checking) − 𝜋 SOwner(Savings)
• Workaround for the difference operation (example query)
SELECT * FROM p LEFT OUTER JOIN q ON [Link] = [Link] WHERE [Link] IS NULL
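The workaround can be compared with EXCEPT on the Checking and Savings data from the union example. This is a sketch with Python's `sqlite3` module (SQLite supports EXCEPT directly). The column types are assumptions, and the join is on owner names, matching the second difference query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Checking (CNum INTEGER, COwner TEXT, CBalance INTEGER);
CREATE TABLE Savings (SNum INTEGER, SOwner TEXT, SBalance INTEGER);
INSERT INTO Checking VALUES
  (101, 'Smith', 1000), (102, 'Mills', 2000),
  (104, 'Jones', 1000), (105, 'Schwab', 3000);
INSERT INTO Savings VALUES (103, 'Smith', 5000);
""")

# Difference written with the EXCEPT operator
with_except = conn.execute(
    "SELECT COwner FROM Checking EXCEPT SELECT SOwner FROM Savings ORDER BY 1"
).fetchall()

# Same result without EXCEPT: keep the left rows that found no join partner.
# DISTINCT is needed because EXCEPT returns a set.
with_left_join = conn.execute(
    "SELECT DISTINCT c.COwner FROM Checking c "
    "LEFT OUTER JOIN Savings s ON c.COwner = s.SOwner "
    "WHERE s.SOwner IS NULL ORDER BY c.COwner"
).fetchall()

print(with_except, with_left_join)
```

The IS NULL test works because an unmatched left row is padded with NULLs on the right side. It would misfire if the join column itself could legitimately be NULL in the right table.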
UNION and INTERSECTION
• Two complete queries with UNION in between
(SELECT [Link]
FROM Customer C
WHERE [Link] LIKE 'B%')
UNION
(SELECT [Link]
FROM Salesperson S
WHERE [Link] LIKE 'B%');
• Two complete queries with INTERSECT in between
(SELECT [Link]
FROM Customer C)
INTERSECT
(SELECT [Link]
FROM Salesperson S);
• Two complete queries with EXCEPT (i.e., DIFFERENCE) in between
(SELECT [Link]
FROM Customer C)
EXCEPT
(SELECT [Link]
FROM Salesperson S);
• MySQL versions before 8.0.31 don’t support EXCEPT
ALL in UNION, INTERSECT, and EXCEPT
• If you don’t specify ALL, the result is computed on sets
• Eliminate duplicates from first operand
• Eliminate duplicates from second operand
• Compute operation
• Eliminate duplicates from result
• Note the difference and choose wisely
• UNION Vs. UNION ALL
• INTERSECT Vs. INTERSECT ALL
• EXCEPT Vs. EXCEPT ALL
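The difference between the set and bag versions is easiest to see with UNION. The sketch below uses Python's `sqlite3` module and the owner columns of the Checking and Savings tables from the earlier union example; the column types are assumptions. SQLite does not provide INTERSECT ALL or EXCEPT ALL, so only UNION ALL is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Checking (CNum INTEGER, COwner TEXT, CBalance INTEGER);
CREATE TABLE Savings (SNum INTEGER, SOwner TEXT, SBalance INTEGER);
INSERT INTO Checking VALUES
  (101, 'Smith', 1000), (102, 'Mills', 2000),
  (104, 'Jones', 1000), (105, 'Schwab', 3000);
INSERT INTO Savings VALUES (103, 'Smith', 5000);
""")

# UNION works on sets: the owner who appears in both tables is listed once
union_rows = conn.execute(
    "SELECT COwner FROM Checking UNION SELECT SOwner FROM Savings"
).fetchall()

# UNION ALL works on bags: every input row is kept
union_all_rows = conn.execute(
    "SELECT COwner FROM Checking UNION ALL SELECT SOwner FROM Savings"
).fetchall()

print(len(union_rows), len(union_all_rows))
```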
GROUP BY
• Any SQL query can have the answer “grouped”
• one output row for each group
SELECT Salesperson, COUNT(*)
FROM Customer;
Vs.
SELECT Salesperson, COUNT(*)
FROM Customer
GROUP BY Salesperson;
Customer
Number Name Address Crating Camount Cbalance Salesperson
1 Smith 1st Str. 700 10,000 9,000 55
2 Jones 2nd Str. 700 8,000 4,000 77
3 Mills 3rd Str. 700 11,000 8,000 NULL
Result of the grouped query
Salesperson COUNT(*)
55 1
77 1
NULL 1
GROUP BY
SELECT Salesperson, COUNT(*)
FROM Customer
GROUP BY Salesperson;
Customer
Number Name Address Crating Camount Cbalance Salesperson
1 Smith 1st Str. 700 10,000 9,000 55
2 Jones 2nd Str. 700 8,000 4,000 77
3 Mills 3rd Str. 700 11,000 8,000 NULL
4 Bill 4th Str. 700 13,000 5,000 55
5 Jane 5th Str. 800 3,000 3,000 55
6 Harley 8th Str. 700 2,000 8,000 20
7 Khale 9th Str. 900 6,000 1,000 77
Example: GROUP BY
SELECT Salesperson, COUNT(*)
FROM Customer
GROUP BY Salesperson;
1. Make groups, resulting in 4 groups
2. Evaluate “SELECT Salesperson, COUNT(*)” for each group
Customer
Number Name Address Crating Camount Cbalance Salesperson
1 Smith 1st Str. 700 10,000 9,000 55
2 Jones 2nd Str. 700 8,000 4,000 77
3 Mills 3rd Str. 700 11,000 8,000 NULL
4 Bill 4th Str. 700 13,000 5,000 55
5 Jane 5th Str. 800 3,000 3,000 55
6 Harley 8th Str. 700 2,000 8,000 20
7 Khale 9th Str. 900 6,000 1,000 77
Result
Salesperson COUNT(*)
55 3
NULL 1
77 2
20 1
SQL HAVING
• HAVING clause specifies a predicate evaluated against each group
• A group is in the result if it satisfies the HAVING condition
SELECT Salesperson, COUNT(*)
FROM Customer
GROUP BY Salesperson HAVING COUNT(*) > 1;
Customer
Number Name Address Crating Camount Cbalance Salesperson
1 Smith 1st Str. 700 10,000 9,000 55
2 Jones 2nd Str. 700 8,000 4,000 55
3 Mills 3rd Str. 700 11,000 8,000 NULL
Result
Salesperson COUNT(*)
55 2
Example: GROUP BY
SELECT Salesperson, COUNT(*)
FROM Customer
GROUP BY Salesperson
HAVING COUNT(*) > 1;
1. Make groups, resulting in 4 groups
2. Check if COUNT(*) > 1 holds
3. Evaluate “SELECT Salesperson, COUNT(*)” for each group
Customer
Number Name Address Crating Camount Cbalance Salesperson
1 Smith 1st Str. 700 10,000 9,000 55
2 Jones 2nd Str. 700 8,000 4,000 77
3 Mills 3rd Str. 700 11,000 8,000 NULL
4 Bill 4th Str. 700 13,000 5,000 55
5 Jane 5th Str. 800 3,000 3,000 55
6 Harley 8th Str. 700 2,000 8,000 20
7 Khale 9th Str. 900 6,000 1,000 77
Groups before HAVING
Salesperson COUNT(*)
55 3
NULL 1
77 2
20 1
Result after HAVING COUNT(*) > 1
Salesperson COUNT(*)
55 3
77 2
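The grouping steps can be reproduced with Python's `sqlite3` module. To keep the sketch short, the Customer table below carries only the Number, Name, and Salesperson columns of the seven sample rows, which is a simplification made here. The column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified Customer table: only the columns the query needs
conn.execute("CREATE TABLE Customer (Number INTEGER, Name TEXT, Salesperson INTEGER)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [
        (1, "Smith", 55), (2, "Jones", 77), (3, "Mills", None), (4, "Bill", 55),
        (5, "Jane", 55), (6, "Harley", 20), (7, "Khale", 77),
    ],
)

# One output row per group; the NULL salesperson forms a group of its own
grouped = dict(
    conn.execute(
        "SELECT Salesperson, COUNT(*) FROM Customer GROUP BY Salesperson"
    ).fetchall()
)

# HAVING filters whole groups after they have been formed
having = dict(
    conn.execute(
        "SELECT Salesperson, COUNT(*) FROM Customer "
        "GROUP BY Salesperson HAVING COUNT(*) > 1"
    ).fetchall()
)

print(grouped, having)
```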
Note on GROUP BY, HAVING
• The only items that can appear in the SELECT clause of a “grouped” query are
• the grouping attributes
• aggregate operators that are applied to the group
• Thus, the following is not legal
SELECT Name
FROM Customer GROUP BY Salesperson;
• Because there can be more than one name for each group
Exercise
Team(Name, Games, Wins, Losses, Conference)
Player(Name, Hits, AtBats, HomeRuns, Team)
Player.Team -> Team.Name
• Write SQL queries for the following
• Average number of wins and losses across teams
• Average number of wins and losses per conference
• Batting average for each player, where batting average is the number of hits divided by at bats
ORDER BY
• Sort the result of a query
SELECT Number, Name, Salesperson
FROM Customer
ORDER BY Name;
Customer (before sorting)
Number Name … Salesperson
1 Smith … 55
2 Jones … 77
3 Mills … NULL
4 Bill … 55
5 Jane … 55
6 Harley … 20
7 Khale … 77
Result (sorted by Name)
Number Name … Salesperson
4 Bill … 55
6 Harley … 20
5 Jane … 55
2 Jones … 77
7 Khale … 77
3 Mills … NULL
1 Smith … 55
ORDER BY
• Sort the result of a query
SELECT Number, Name, Salesperson
FROM Customer
ORDER BY Name DESC;
Customer (before sorting)
Number Name … Salesperson
1 Smith … 55
2 Jones … 77
3 Mills … NULL
4 Bill … 55
5 Jane … 55
6 Harley … 20
7 Khale … 77
Result (sorted by Name, descending)
Number Name … Salesperson
1 Smith … 55
3 Mills … NULL
7 Khale … 77
2 Jones … 77
5 Jane … 55
6 Harley … 20
4 Bill … 55
ORDER BY
• Sort the result of a query
SELECT Number, Name, Salesperson
FROM Customer
ORDER BY Name, Salesperson;
Customer (before sorting)
Number Name … Salesperson
1 Smith … 55
2 Jones … 77
3 Mills … NULL
4 Bill … 55
5 Jane … 55
6 Harley … 20
7 Khale … 77
8 Bill … 20
Result (sorted by Name, then by Salesperson)
Number Name … Salesperson
8 Bill … 20
4 Bill … 55
6 Harley … 20
5 Jane … 55
2 Jones … 77
7 Khale … 77
3 Mills … NULL
1 Smith … 55
Subqueries
• A subquery can be used in the WHERE clause (in addition to the FROM clause)
SELECT C1.Name, C1.Number -- outer query
FROM Customer C1
WHERE C1.Crating = (SELECT MAX(C2.Crating) -- inner query
FROM Customer C2);
• Inner query returns
• A single value that represents the max credit rating
• Outer query returns
• The name and number of the customer with the highest credit rating
Example
SELECT C1.Name, C1.Number
FROM Customer C1
WHERE C1.Crating = (SELECT MAX(C2.Crating)
FROM Customer C2);
1. FROM clause in outer query
2. Take a row from the Customer table
3. Check if the row satisfies the WHERE clause
4. Evaluate the inner query (result: 800)
5. Evaluate if Crating is equal to the result
Customer
Number Name Address Crating Camount Cbalance Salesperson
1 Smith 1st Str. 200 10,000 9,000 55
2 Jones 2nd Str. 800 8,000 4,000 55
3 Mills 3rd Str. 700 11,000 8,000 NULL
Subqueries
• Subqueries can be used in the WHERE clause (in addition to the FROM clause)
SELECT C1.Name, C1.Number
FROM Customer C1
WHERE C1.Crating = (SELECT MAX(C2.Crating)
FROM Customer C2);
• Six comparators: =, >, <, >=, <=, <> (not equal)
• inner query must return a single value
• If the inner query does not mention any attributes from the outer query (C1
not mentioned in the inner query)
• Then you only need to evaluate the inner query once
• The inner (sub) query is NOT correlated
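Because the inner query does not refer to the outer row, it can be evaluated once and its value reused. The sketch below shows both forms with Python's `sqlite3` module on the three Customer rows from the preceding example (ratings 200, 800, 700). The selected columns (Name and Number) and the column types are assumptions, and only the columns needed by the query are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified Customer table: only the columns the query needs
conn.execute("CREATE TABLE Customer (Number INTEGER, Name TEXT, Crating INTEGER)")
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Smith", 200), (2, "Jones", 800), (3, "Mills", 700)],
)

# Nested form: the subquery sits inside the WHERE clause
nested = conn.execute(
    "SELECT C1.Name, C1.Number FROM Customer C1 "
    "WHERE C1.Crating = (SELECT MAX(C2.Crating) FROM Customer C2)"
).fetchall()

# Two-step form: evaluate the uncorrelated inner query once ...
max_rating = conn.execute("SELECT MAX(Crating) FROM Customer").fetchone()[0]

# ... then feed the single value into the outer query
two_step = conn.execute(
    "SELECT Name, Number FROM Customer WHERE Crating = ?", (max_rating,)
).fetchall()

print(max_rating, nested, two_step)
```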
Subqueries: SOME/ALL comparison
SELECT S.Name
FROM Salesperson S
WHERE S.Number = SOME (SELECT C.Salesperson
FROM Customer C
WHERE C.Crating = 700);
• For SOME, the expression must be true for at least one row in the subquery answer
• “ANY” is equivalent to SOME
• What does this query return?
The name of each salesperson that has a customer with a credit rating of 700
Subqueries: SOME/ALL comparison
SELECT S.Name
FROM Salesperson S
WHERE S.Number = ALL (SELECT C.Salesperson
FROM Customer C
WHERE C.Crating = 700);
• For ALL, the expression must be true for all rows in the subquery answer
• What does this query return?
The name of the salesperson that has all the customers
with a rating of 700 (if such a salesperson exists)
Subqueries: IN/NOT IN comparison (1/4)
SELECT [Link], [Link]
FROM Customer C1
WHERE [Link] IN (SELECT Name
FROM Salesperson);
• With IN, the attribute matches at least one value returned from the
subquery
• Same as “= SOME”
Subqueries: IN/NOT IN comparison (2/4)
SELECT [Link], [Link]
FROM Customer C1
WHERE [Link] NOT IN (SELECT Name
FROM Salesperson);
• With NOT IN, the attribute matches none of the values returned from the
subquery
• Same as “<> ALL”
Subqueries: IN/NOT IN comparison (3/4)
• Are these equivalent?
• Do we need to use DISTINCT for these to be equivalent?
• Is the subquery correlated?
SELECT [Link], [Link]
FROM Salesperson S
WHERE S.Number IN (SELECT C.Salesperson
FROM Customer C);
SELECT DISTINCT [Link], [Link]
FROM Salesperson S, Customer C
WHERE S.Number = C.Salesperson;
Subqueries: IN/NOT IN comparison (4/4)
SELECT [Link], [Link]
FROM Salesperson S
WHERE [Link] IN (SELECT [Link]
FROM Customer C
WHERE [Link] = [Link]);
• Because the subquery mentions an attribute from a table in the outer query
• The subquery must be (re-)evaluated for each row in the outer query (each time the
WHERE clause is evaluated)
• Correlated subqueries can be very expensive!
Subqueries: EXISTS/NOT EXISTS (1/2)
SELECT [Link]
FROM Customer C
WHERE EXISTS (SELECT *
FROM Salesperson S
WHERE [Link] = [Link] AND
[Link] = [Link]);
• If the answer to the subquery is not empty ... then the EXISTS predicate
returns TRUE
• Is this subquery correlated?
• What does this query return?
Subqueries: EXISTS/NOT EXISTS (2/2)
SELECT [Link]
FROM Customer C
WHERE EXISTS (SELECT *
FROM Salesperson S
WHERE [Link] = [Link] AND
[Link] = [Link]);
• Four predicates can be applied to a subquery
• EXISTS : is the subquery answer non-empty?
• NOT EXISTS : is the subquery answer empty?
• UNIQUE : is the subquery answer free of duplicate rows?
• NOT UNIQUE : does the subquery answer contain duplicate rows?
Missing Relational Algebra Operator: Divide
Divide Operator (see “Eight standard relational algebra operators”)
• Suppose we have an extra table in our database
Account
Number Owner Balance Type
7003001 Jane Smith 1,000,000 Savings
7003003 Alfred Hitchcock 4,400,200 Savings
7003005 Takumi Fujiwara 2,230,000 Checking
7003007 Brian Mills 1,200,000 Savings
AccountTypes
Type
Checking
Savings
• How do we find customers that have at least one account of each account type?
𝜋 Owner,Type(Account) ÷ AccountTypes
• Find account owners who have ALL types of accounts
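Division has no direct SQL keyword, so one way to see what it computes is a small sketch over Python sets. The function name `divide` is hypothetical; `owner_type` holds 𝜋 Owner,Type(Account) for the four rows above.

```python
# π Owner,Type (Account) for the four Account rows on this slide
owner_type = {
    ("Jane Smith", "Savings"),
    ("Alfred Hitchcock", "Savings"),
    ("Takumi Fujiwara", "Checking"),
    ("Brian Mills", "Savings"),
}
account_types = {"Checking", "Savings"}

def divide(pairs, divisor):
    """Return every x such that (x, y) is in pairs for ALL y in divisor."""
    candidates = {x for x, _ in pairs}
    return {x for x in candidates if all((x, y) in pairs for y in divisor)}

# With only these four accounts, nobody owns both account types
result = divide(owner_type, account_types)

# With the extra Checking account of Alfred Hitchcock (account 7003009,
# from the earlier projection example), he holds every account type
result_with_extra = divide(
    owner_type | {("Alfred Hitchcock", "Checking")}, account_types
)

print(result, result_with_extra)
```

An owner qualifies only if the pairs contain that owner together with every value in the divisor, which is what "ALL types of accounts" asks for.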
For Next Week
• Review – Quiz on the material
• Ch. 4 to 4.2
• Ch. 5.5
• Reading assignments
• Ch. 2-2.5
• Ch. 3.5
• Be sure you understand
• Aggregate operations
• how join operates
• set operators
• GROUP BY, HAVING, ORDER BY, Subqueries