Relational Algebra
Introduction
• Query languages are specialized languages for
asking questions, or queries, that involve the data
in a database.
• Queries in relational algebra are composed using a
collection of operators.
• Every operator in relational algebra accepts (one
or two) relation instances as arguments and returns
a relation instance as the result.
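– A relational algebra expression is defined recursively:
it is a relation, a unary operator applied to one
expression, or a binary operator applied to two
expressions.
– Relational algebra is a procedural query language.
• An expression defines a step-by-step procedure for
computing the desired answer.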
Basic Operators
• There are six basic operators in relational
algebra:
– select ( σ )
– project ( π )
– union ( ∪ )
– set difference ( − )
– Cartesian product ( × )
– rename ( ρ )
• Selection ( σ )
– The selection operation specifies the tuples to
retain through a selection condition.
– The selection condition is a Boolean combination
(an expression using the logical connectives ∧ and
∨) of terms that have the form
• attribute op constant
• attribute1 op attribute2
(op is one of the comparison operators <, ≤, =, ≠,
≥, >).
Relation R
a  b  c   d
…  …  1   7
…  …  5   7
…  …  12  3
…  …  23  10
σ_{a=b ∧ d>5}( R )
a  b  c   d
…  …  1   7
…  …  23  10
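Selection can be sketched in a few lines of Python. This is an illustration only: a relation is modelled as a list of dictionaries, `select` is a hypothetical helper name, and the values of a and b are made up for the example (only c and d follow the example relation R).

```python
def select(relation, condition):
    """Keep only the tuples of `relation` for which `condition` is true."""
    return [row for row in relation if condition(row)]

# Values of a and b are hypothetical; c and d follow the example relation R.
R = [
    {"a": "x", "b": "x", "c": 1, "d": 7},
    {"a": "x", "b": "y", "c": 5, "d": 7},
    {"a": "y", "b": "y", "c": 12, "d": 3},
    {"a": "y", "b": "y", "c": 23, "d": 10},
]

# Selection condition: a = b AND d > 5
result = select(R, lambda t: t["a"] == t["b"] and t["d"] > 5)
for row in result:
    print(row)
```

The condition is passed as a function, so any Boolean combination of comparisons can be used without changing `select`.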
• Projection
– The projection operator allows us to extract
columns from a relation.
– The subscript specifies the fields to be retained.
(The other fields are ‘projected out’).
– The schema of the result of a projection is
determined by the fields that are projected.
– Duplicate rows are eliminated from the final
result.
• This follows from the definition of a relation as a set
of tuples.
Relation R
a  b   c
…  10  1
…  20  1
…  30  1
…  40  2
π_{a,c}( R ), before eliminating duplicate rows
a  c
…  1
…  1
…  1
…  2
π_{a,c}( R ), final result
a  c
…  1
…  1
…  2
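A minimal Python sketch of projection with duplicate elimination follows. The helper name `project` and the values of a are assumptions made for illustration; b and c follow the example relation R.

```python
def project(relation, fields):
    """Keep only `fields` of each tuple and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[f] for f in fields)
        if key not in seen:          # a relation is a set: no duplicates
            seen.add(key)
            result.append(dict(zip(fields, key)))
    return result

# Values of a are hypothetical; b and c follow the example relation R.
R = [
    {"a": "x", "b": 10, "c": 1},
    {"a": "x", "b": 20, "c": 1},
    {"a": "y", "b": 30, "c": 1},
    {"a": "y", "b": 40, "c": 2},
]

result = project(R, ["a", "c"])
print(result)
```

Four input rows shrink to three because the first two rows agree on both a and c once b is projected out.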
Employee
f_name l_name id sex salary superid dno
Joseph Chan 999999 M 29500 654321 4
Victor Wong 001100 M 30000 888555 5
Carrie Kwan 898989 F 26000 654321 4
Joyce Fong 345345 F 12000 777888 4
Find all employees who work in department 4 and whose
salary is greater than 25000.
σ_{dno=4 ∧ salary>25000}( Employee )
f_name l_name id sex salary superid dno
Joseph Chan 999999 M 29500 654321 4
Carrie Kwan 898989 F 26000 654321 4
1. Find the employee names and department numbers of all
employees.
π_{f_name,l_name,dno}( Employee )
f_name l_name dno
Joseph Chan 4
Victor Wong 5
Carrie Kwan 4
Joyce Fong 4
2. Find the sex and department number of all employees.
π_{sex,dno}( Employee )
sex dno
M 4
M 5
F 4
(The row <F, 4> appears once: Carrie Kwan and Joyce Fong give the same projected row, and the duplicate is eliminated.)
– Composing operations gives a relational-algebra
expression.
– The result of a relational-algebra expression is
always a relation.
π_{f_name,l_name}( σ_{dno=4 ∧ salary>25000}( Employee ) )
f_name l_name
Joseph Chan
Carrie Kwan
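The composition of selection and projection can be sketched in Python as two steps, where the output of the inner operator feeds the outer one. The rows are copied from the Employee table; the variable names are illustrative, and only the fields needed here are kept.

```python
# Rows copied from the Employee table; only the fields used here are kept.
employee = [
    {"f_name": "Joseph", "l_name": "Chan", "salary": 29500, "dno": 4},
    {"f_name": "Victor", "l_name": "Wong", "salary": 30000, "dno": 5},
    {"f_name": "Carrie", "l_name": "Kwan", "salary": 26000, "dno": 4},
    {"f_name": "Joyce", "l_name": "Fong", "salary": 12000, "dno": 4},
]

# Inner operator: selection on dno = 4 AND salary > 25000.
selected = [e for e in employee if e["dno"] == 4 and e["salary"] > 25000]

# Outer operator: projection on f_name, l_name (a set drops duplicates).
result = {(e["f_name"], e["l_name"]) for e in selected}
print(sorted(result))
```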
• Union ( ∪ )
– R ∪ S returns a relation instance containing all tuples
that occur in either relation instance R or relation instance S (or
both).
– The union operation is commutative: R ∪ S = S ∪ R
– Duplicate tuples are eliminated.
– R and S must be union-compatible:
• They have the same number of fields,
• The corresponding fields have the same domains.
– The field names are not important.
– The schema of the result is defined to be identical to the
schema of R.
• The fields of R ∪ S inherit their names from R.
R
a  b
…  1
…  2
…  1
S
a  b
…  2
…  3
R ∪ S
a  b
…  1
…  2
…  1
…  3
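The union rules can be sketched in Python as follows. This is only an illustration: a relation is modelled as a (schema, set of tuples) pair, `union` is a hypothetical helper, and the values in the first column are made up. S deliberately uses different field names to show that the result takes its names from R.

```python
def union(r, s):
    """Union of two relations given as (schema, set_of_tuples) pairs.

    Only the number of fields is checked here; checking that the
    corresponding domains match is left out of this sketch.
    """
    r_schema, r_rows = r
    s_schema, s_rows = s
    if len(r_schema) != len(s_schema):
        raise ValueError("R and S are not union-compatible")
    # The result inherits its schema from R; the set removes duplicates.
    return r_schema, r_rows | s_rows

# First-column values are hypothetical.
R = (("a", "b"), {("x", 1), ("x", 2), ("y", 1)})
S = (("c", "d"), {("x", 2), ("y", 3)})

schema, rows = union(R, S)
print(schema, sorted(rows))
```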
• Set Difference
– Set difference R – S returns a relation instance
containing all the tuples that occur in R but not
in S.
– The set difference is not commutative: in
general, R − S ≠ S − R.
– The relations R and S must be union-
compatible.
– The schema of the result is defined to be
identical to the schema of R.
R
a  b
…  1
…  2
…  1
S
a  b
…  2
…  3
R − S
a  b
…  1
…  1
• Intersection ( ∩ )
– R ∩ S returns a relation instance containing all
tuples that occur in both R and S.
– The relations R and S must be union-
compatible.
– The schema of the result is defined to be
identical to the schema of R.
– Intersection is not considered a basic operation,
as it can be derived from the basic operations as
shown below:
R ∩ S = R − ( R − S )
R
a  b
…  1
…  2
…  1
S
a  b
…  2
…  3
R ∩ S
a  b
…  2
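Python's built-in sets make it easy to check the identity R ∩ S = R − ( R − S ) on a small instance, and to see that set difference is not commutative. The first-column values below are hypothetical.

```python
# First-column values are hypothetical.
R = {("x", 1), ("x", 2), ("y", 1)}
S = {("x", 2), ("y", 3)}

r_minus_s = R - S                  # tuples in R but not in S
s_minus_r = S - R                  # generally different from R - S
intersection = R - (R - S)         # intersection built from difference only

print(sorted(r_minus_s), sorted(s_minus_r), sorted(intersection))
```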
• Rename ( ρ )
– ρ( R(F), E ) or ρ( R, E )
• where E is an arbitrary relational algebra expression
– The result is a relation named R.
– R is the same as E, except that the fields may be renamed
according to F.
– F is called the renaming list; each entry has the form
oldname → newname or position → newname.
Example
ρ( C ( sid → identity ), E )
We may rename more than one field:
ρ( C ( sid → identity, child → dependent ), E )
Example
ρ( C ( 3 → identity ), E )
The result is named C.
The 3rd attribute of E is renamed “identity” in C.
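A renaming list can be sketched in Python as a dictionary applied to a schema. The function name `rename` and the three-field schema assumed for E are hypothetical; the sketch only handles the schema, since the tuples themselves are unchanged by ρ.

```python
def rename(schema, renaming):
    """Return a copy of `schema` with fields renamed.

    `schema` is a tuple of field names; keys of `renaming` are either
    old field names or 1-based positions, values are the new names.
    """
    new_schema = list(schema)
    for old, new in renaming.items():
        index = old - 1 if isinstance(old, int) else schema.index(old)
        new_schema[index] = new
    return tuple(new_schema)

# Hypothetical schema for the expression E.
E = ("sid", "child", "age")

by_name = rename(E, {"sid": "identity", "child": "dependent"})
by_position = rename(E, {3: "identity"})
print(by_name)
print(by_position)
```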
• Cartesian product (Cross product)
– R × S returns a relation instance whose schema
contains all the fields of R followed by all the
fields of S.
– The result contains one tuple <r,s>
(concatenation of tuples r and s) for each pair
of tuples r ∈ R, s ∈ S.
R
a  b
…  1
…  2
S
c  d   e
…  10  +
…  10  +
…  20  -
…  10  -
R × S
a  b  c  d   e
…  1  …  10  +
…  1  …  10  +
…  1  …  20  -
…  1  …  10  -
…  2  …  10  +
…  2  …  10  +
…  2  …  20  -
…  2  …  10  -
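The cross-product can be sketched with `itertools.product` from the Python standard library. The values of a and c are hypothetical; b, d and e follow the example instances.

```python
from itertools import product

# Values of a and c are hypothetical; b, d and e follow the example.
R = [("x", 1), ("y", 2)]                      # fields: a, b
S = [("x", 10, "+"), ("y", 10, "+"),
     ("y", 20, "-"), ("z", 10, "-")]          # fields: c, d, e

# One output tuple <r,s> for each pair of tuples r in R, s in S.
cross = [r + s for r, s in product(R, S)]

for row in cross:
    print(row)
```

The result has len(R) * len(S) tuples, which is why a cross-product grows quickly with the size of its inputs.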
Query: For each female employee, retrieve a list
of the names of her dependents.
EMPLOYEE
f_name l_name id bdate addr sex salary super_id dno
DEPENDENT
eid dependent-name sex bdate relationship
Female_emps ← σ_{sex=‘F’}( Employee )
Empnames ← π_{f_name,l_name,id}( Female_emps )
Female_emps
f_name l_name id bdate address sex salary superid dno
Alicia Chan 998877 2-Jul-70 231, Cai Road, HK F 9500 654321 4
Jennifer Wong 654321 20-June-60 342, Cheung Road, HK F 30000 888555 4
Joyce Fong 345345 19-Dec-80 23, Young Road, HK F 12000 777888 5
Empnames
f_name l_name id
Alicia Chan 998877
Jennifer Wong 654321
Joyce Fong 345345
Dependents
eid dep_name sex bdate relationship
334455 Alice F 5-Apr-90 Daughter
334455 Theodore M 3-Mar-92 Son
654321 Abner M 29-Feb-94 Son
123456 Alice F 2-Nov-97 Daughter
Emp_dependents ← Empnames × Dependents
Emp_dependents
f_name l_name id eid dep_name sex bdate relationship
Alicia Chan 998877 334455 Alice F 5-Apr-90 Daughter
Alicia Chan 998877 334455 Theodore M 3-Mar-92 Son
Alicia Chan 998877 654321 Abner M 29-Feb-94 Son
Alicia Chan 998877 123456 Alice F 2-Nov-97 Daughter
Jennifer Wong 654321 334455 Alice F 5-Apr-90 Daughter
Jennifer Wong 654321 334455 Theodore M 3-Mar-92 Son
Jennifer Wong 654321 654321 Abner M 29-Feb-94 Son
Jennifer Wong 654321 123456 Alice F 2-Nov-97 Daughter
Joyce Fong 345345 334455 Alice F 5-Apr-90 Daughter
Joyce Fong 345345 334455 Theodore M 3-Mar-92 Son
Joyce Fong 345345 654321 Abner M 29-Feb-94 Son
Joyce Fong 345345 123456 Alice F 2-Nov-97 Daughter
Actual_dependents ← σ_{id=eid}( Emp_dependents )
Result ← π_{f_name,l_name,dep_name}( Actual_dependents )
Actual_dependents
f_name l_name id eid dep_name sex bdate relationship
Jennifer Wong 654321 654321 Abner M 29-Feb-94 Son
Result
f_name l_name dep_name
Jennifer Wong Abner
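The three steps of this query (cross-product, selection on id = eid, projection) can be traced in Python. The rows are copied from the Empnames and Dependents tables; the variable names mirror the relation names and the tuple layout is an assumption of this sketch.

```python
from itertools import product

# (f_name, l_name, id), copied from Empnames
empnames = [
    ("Alicia", "Chan", "998877"),
    ("Jennifer", "Wong", "654321"),
    ("Joyce", "Fong", "345345"),
]
# (eid, dep_name, sex, bdate, relationship), copied from Dependents
dependents = [
    ("334455", "Alice", "F", "5-Apr-90", "Daughter"),
    ("334455", "Theodore", "M", "3-Mar-92", "Son"),
    ("654321", "Abner", "M", "29-Feb-94", "Son"),
    ("123456", "Alice", "F", "2-Nov-97", "Daughter"),
]

# Emp_dependents: every employee paired with every dependent.
emp_dependents = [e + d for e, d in product(empnames, dependents)]

# Actual_dependents: keep pairs where id (index 2) equals eid (index 3).
actual_dependents = [t for t in emp_dependents if t[2] == t[3]]

# Result: project on f_name, l_name, dep_name.
result = {(t[0], t[1], t[4]) for t in actual_dependents}
print(len(emp_dependents), result)
```

Twelve intermediate tuples are built to produce a single answer, which motivates the join operator.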
• Join
– Because the sequence of operations × followed by σ is
quite common, a special operation called the “join”
operation ( ⋈ ) was created to specify them as a single
operation.
– Join can be defined as a cross-product followed by
selections and, sometimes, projections.
– The result of a cross-product is typically much larger
than the result of a join, so it is very important to
recognize joins and implement them without
materializing the underlying cross-product.
• Condition Join
– The most general version of join operation
accepts a join condition c.
– The join condition is identical to a selection
condition in form.
– The operation is defined as follows:
R ⋈_{c} S = σ_{c}( R × S )
S
sid sname rating age
22 Dustin 7 45.0
31 Lubber 8 55.5
58 Rusty 10 35.5
R
sid bid day
22 101 10/10/96
58 103 11/12/96
S ⋈_{S.sid < R.sid} R
(sid) sname rating age (sid) bid day
22 Dustin 7 45.0 58 103 11/12/96
31 Lubber 8 55.5 58 103 11/12/96
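A condition join can be sketched in Python as a nested loop that tests the join condition on each pair, so the full cross-product is never stored. The helper name `theta_join` is hypothetical; the rows are copied from the instances S and R.

```python
def theta_join(r, s, condition):
    """Concatenate each pair of tuples (x, y) for which condition(x, y) holds.

    Pairs are filtered as they are generated, so the cross-product
    itself is never materialized.
    """
    return [x + y for x in r for y in s if condition(x, y)]

# (sid, sname, rating, age)
S = [(22, "Dustin", 7, 45.0), (31, "Lubber", 8, 55.5), (58, "Rusty", 10, 35.5)]
# (sid, bid, day)
R = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]

# Join condition: S.sid < R.sid
result = theta_join(S, R, lambda s, r: s[0] < r[0])
for row in result:
    print(row)
```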
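Example
Query: Find the names of employees with the highest salary.
π_{l_name,f_name}( Employee ) −
π_{Employee.l_name, Employee.f_name}(
Employee ⋈_{Employee.salary < F.salary} ρ( F, Employee ) )
The second term finds every employee who earns less than some other
employee; removing them leaves the employees with the highest salary.
Note: We assume that <l_name,f_name> is a key in this relation.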
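The same idea can be checked in Python: join Employee with a copy of itself on "salary is lower", collect the names on the lower side, and subtract them from all names. The names and salaries are taken from the earlier Employee table; the tuple layout is an assumption of this sketch.

```python
# (f_name, l_name, salary), taken from the earlier Employee table
employee = [
    ("Joseph", "Chan", 29500),
    ("Victor", "Wong", 30000),
    ("Carrie", "Kwan", 26000),
    ("Joyce", "Fong", 12000),
]

# Projection of all (l_name, f_name) pairs.
all_names = {(l, f) for f, l, _ in employee}

# Self-join on Employee.salary < F.salary, projected on Employee's names:
# everyone who earns less than somebody else.
not_highest = {(e[1], e[0]) for e in employee for f in employee if e[2] < f[2]}

highest = all_names - not_highest
print(highest)
```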
• Equi-join
– A special case of the join operation is when the
join condition consists solely of equalities
(connected by ∧) of the form
R.name1 = S.name2
– In the resulting relation, S.name2 is
dropped by an additional projection operation.
R
a  b  c
…  1  …
…  5  …
…  4  …
…  1  …
…  2  …
S
b  e  f
1  X  …
3  X  …
1  X  …
2  Y  …
3  Y  …
R ⋈_{R.b=S.b} S
a  b  c  e  f
…  1  …  X  …
…  1  …  X  …
…  1  …  X  …
…  1  …  X  …
…  2  …  Y  …
• Natural Join
– A further special case of the join operation R ⋈ S
is an equijoin in which equalities
are specified on all fields having the same
names in R and S.
– We can simply omit the join condition.
– The resulting schema contains the attributes of
R followed by the attributes in S that are not in
R.
– If the two relations have no attributes in
common, R ⋈ S is simply the cross-product.
R
a  b  c  d
…  1  …  X
…  2  …  X
…  4  …  Y
…  1  …  Y
…  2  …  Y
S
b  d  e
1  X  …
3  X  …
1  X  …
2  Y  …
3  Y  …
R ⋈ S
a  b  c  d  e
…  1  …  X  …
…  1  …  X  …
…  2  …  Y  …
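A natural join can be sketched in Python by finding the field names the two relations share and requiring equality on all of them. The helper name `natural_join` and the values of a, c and e are hypothetical; b and d follow the example instances.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    if not r or not s:
        return []
    common = [f for f in r[0] if f in s[0]]   # fields with the same name
    result = []
    for x in r:
        for y in s:
            if all(x[f] == y[f] for f in common):
                row = dict(x)                 # attributes of R first
                row.update({f: v for f, v in y.items() if f not in common})
                result.append(row)
    return result

# Values of a, c and e are hypothetical; b and d follow the example.
R = [
    {"a": "a1", "b": 1, "c": "c1", "d": "X"},
    {"a": "a2", "b": 2, "c": "c2", "d": "X"},
    {"a": "a3", "b": 4, "c": "c3", "d": "Y"},
    {"a": "a4", "b": 1, "c": "c4", "d": "Y"},
    {"a": "a5", "b": 2, "c": "c5", "d": "Y"},
]
S = [
    {"b": 1, "d": "X", "e": "e1"},
    {"b": 3, "d": "X", "e": "e2"},
    {"b": 1, "d": "X", "e": "e3"},
    {"b": 2, "d": "Y", "e": "e4"},
    {"b": 3, "d": "Y", "e": "e5"},
]

result = natural_join(R, S)
for row in result:
    print(row)
```

When the two relations share no field names, `common` is empty, every pair passes the test, and the function returns the cross-product.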
• Division
– The division operation is useful for expressing
certain kinds of queries, for example, “find the
names of sailors who have reserved all boats.”
– Example
• Consider two relations A and B.
• A has exactly two fields x and y.
• B has just one field y, with the same domain as in A.
• The division operation A/B is the set of all x values
(in the form of unary tuples) such that for every y
value in a tuple of B, there is a tuple <x,y> in A.
[Figure: example instances of A ( x, y ) and B ( y ), and the result A/B ( x ).]
Another way to understand division is as
follows:
For each x value in A, consider the set of y
values that appear in tuples of A with that
x value. If this set contains all y values in
B, the x value is in the result of A/B.
An analogy with integer division may also help
to understand division. For integers A and B,
A/B is the largest integer Q such that Q * B ≤ A.
For relation instances A and B, A/B is the
largest relation instance Q such that Q × B ⊆ A.
An x value is disqualified if attaching some y value
from B to it gives a tuple <x,y> that is not in A.
We can compute the disqualified x values
using the following algebra expression:
π_{x}( ( π_{x}( A ) × B ) − A )
Thus, A/B is
π_{x}( A ) − π_{x}( ( π_{x}( A ) × B ) − A )
The division above can be written in terms of basic operations:
Temp1 ← π_{A−B}( A )
Temp2 ← π_{A−B}( ( Temp1 × B ) − A )
Result = Temp1 − Temp2
Here the subscript A−B stands for the attributes in A but not in B.
Note that it is not a valid notation in relational algebra.
[Figure: the instances Temp1, ( Temp1 × B ) − A, and Temp2, and the result A/B, for the example A and B.]
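The Temp1 / Temp2 construction can be followed step by step in Python. This is a sketch under stated assumptions: A is a set of (x, y) pairs, B is a set of y values, `divide` is a hypothetical helper, and the sample values are made up for illustration.

```python
def divide(a, b):
    """A/B for A a set of (x, y) pairs and B a set of y values."""
    temp1 = {x for x, _ in a}                          # pi_x(A)
    missing = {(x, y) for x in temp1 for y in b} - a   # (Temp1 x B) - A
    temp2 = {x for x, _ in missing}                    # disqualified x values
    return temp1 - temp2                               # Temp1 - Temp2

# Hypothetical sample instances.
A = {("s1", 1), ("s1", 2), ("s1", 3),
     ("s2", 1),
     ("s3", 1), ("s3", 2),
     ("s4", 2)}
B = {1, 2}

Q = divide(A, B)
print(sorted(Q))
```

Here s2 and s4 are disqualified because the pairs (s2, 2) and (s4, 1) are missing from A, and every pair built from the result and B is in A.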
Employee fname lname id bdate address salary sid dno
Works_on id pno
Query: Retrieve the names of employees who work on all the projects
that ‘John Sung’ works on.
Sung ← σ_{fname=“John” ∧ lname=“Sung”}( Employee )
Sung_pnos ← π_{pno}( Works_on ⋈ Sung )
(fields of Works_on ⋈ Sung: fname lname id bdate address salary sid dno pno)
Result_id ← Works_on / Sung_pnos
Result ← π_{fname,lname}( Result_id ⋈ Employee )
4.8 More examples
Sailors ( sid, sname, age )
Boats ( bid, bname, color )
Reserves ( sid, bid, date )
Consider the schemas above; the primary key fields are
underlined.
Query 1: Find the names of sailors who have
reserved the boat with bid = 103.
- Solution 1 (more efficient: the selection is applied before the join):
π_{sname}( ( σ_{bid=103} Reserves ) ⋈ Sailors )
- Solution 2:
ρ( Temp1, σ_{bid=103} Reserves )
ρ( Temp2, Temp1 ⋈ Sailors )
π_{sname}( Temp2 )
- Solution 3:
π_{sname}( σ_{bid=103}( Reserves ⋈ Sailors ) )
Query 2: Find the names of sailors who have reserved at
least one red boat.
- Solution 1:
π_{sname}( ( σ_{color=‘red’} Boats ) ⋈ Reserves ⋈ Sailors )
- Solution 2:
π_{sname}( π_{sid}( ( π_{bid} σ_{color=‘red’} Boats ) ⋈ Reserves ) ⋈ Sailors )
Query 3: Find the names of sailors who have reserved at
least one red or one green boat.
We can identify all red or green boats, then find sailors who have
reserved one of these boats.
- Solution 1:
ρ( Tempboats, ( σ_{color=‘red’ ∨ color=‘green’} Boats ) )
π_{sname}( Tempboats ⋈ Reserves ⋈ Sailors )
What happens if ∨ is replaced by
∧ in this query?
- Solution 2 (Query 3):
π_{sname}( ( ( σ_{color=‘red’} Boats ) ⋈ Reserves ) ⋈ Sailors ) ∪
π_{sname}( ( ( σ_{color=‘green’} Boats ) ⋈ Reserves ) ⋈ Sailors )
Query 4: Find the names of sailors who have reserved at least one
red boat and at least one green boat.
Adapting Solution 1 of Query 3 by replacing ∨ with ∧ would not work,
because no boat is both red and green. We must identify sailors
who have reserved red boats, sailors who have reserved green
boats, then find the intersection.
Note that sid is a key for Sailors, but sname is not a key,
so the intersection is taken on sid rather than on sname.
ρ( Tempred, π_{sid}( ( σ_{color=‘red’} Boats ) ⋈ Reserves ) )
ρ( Tempgreen, π_{sid}( ( σ_{color=‘green’} Boats ) ⋈ Reserves ) )
π_{sname}( ( Tempred ∩ Tempgreen ) ⋈ Sailors )
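The Tempred / Tempgreen solution to Query 4 can be sketched in Python on a small instance. All rows below are hypothetical, the date field of Reserves and the bname field of Boats are omitted for brevity, and `sids_reserving` is an illustrative helper. The last line also shows why replacing ∨ by ∧ in the selection on Boats returns nothing.

```python
# Hypothetical instances; fields not needed for the query are omitted.
sailors = {(22, "Dustin"), (31, "Lubber"), (58, "Rusty")}   # (sid, sname)
boats = {(101, "red"), (102, "green"), (103, "green")}      # (bid, color)
reserves = {(22, 101), (22, 102), (31, 103), (58, 101)}     # (sid, bid)

def sids_reserving(color):
    """sids of sailors who reserved at least one boat of the given color."""
    bids = {bid for bid, c in boats if c == color}
    return {sid for sid, bid in reserves if bid in bids}

tempred = sids_reserving("red")
tempgreen = sids_reserving("green")

# Intersect on sid (a key), then look up the names.
both = {name for sid, name in sailors if sid in (tempred & tempgreen)}

# A single boat is never red AND green, so this selection is empty.
red_and_green_boats = {bid for bid, c in boats if c == "red" and c == "green"}
print(both, red_and_green_boats)
```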
Query 5: Find the names of sailors who have reserved
all boats.
ρ( Tempsids, ( π_{sid,bid} Reserves ) / ( π_{bid} Boats ) )
π_{sname}( Tempsids ⋈ Sailors )
What if we simply do
Reserves / ( π_{bid} Boats ) ?
date sid bid
2-3-2002 007 A
3-7-2002 007 B
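The effect of dividing Reserves directly, without first projecting on sid and bid, can be tried in Python. The two Reserves rows are the ones from the small table under Query 5; the assumption that A and B are the only boats, and the helper `divide`, are illustrative.

```python
def divide(a, b):
    """A/B where the last field of each A tuple is matched against B."""
    heads = {t[:-1] for t in a}                # all fields except the last
    return {h for h in heads if all(h + (y,) in a for y in b)}

# Assumption: A and B are the only boats.
boat_ids = {"A", "B"}
# (date, sid, bid), from the small Reserves table under Query 5
reserves = {("2-3-2002", "007", "A"), ("3-7-2002", "007", "B")}

# Dividing Reserves directly keeps date in the result, so a sailor
# qualifies only if all boats were reserved on the same date.
with_date = divide(reserves, boat_ids)

# Projecting on (sid, bid) first gives the intended answer.
without_date = divide({(sid, bid) for _, sid, bid in reserves}, boat_ids)
print(with_date, without_date)
```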
Query 6: Find the names of sailors who have reserved all red boats.
ρ( Tempsids, ( π_{sid,bid} Reserves ) / ( π_{bid}( σ_{color=‘red’} Boats ) ) )
π_{sname}( Tempsids ⋈ Sailors )
Remarks
• Only standard relational algebra is covered
in the lecture.
– A relation is assumed to be a set of records (no
duplicate records).
– Extended operators such as sorting operators
and aggregation operators are not covered.