CS145 Lecture Notes #6
Relational Algebra
Steps in Building and Using a Database
1. Design schema
2. Create schema in DBMS
3. Load initial data
4. Repeat: execute queries and updates on the database
Database Query Languages
What is a query?
Given a database, ask questions, get answers
Example: get all students who are now taking CS145
Example (from the TPC-D benchmark):
“The Volume Shipping Query finds, for two given nations, the gross discounted revenues derived from lineitems in which parts were
shipped from a supplier in either nation to a customer in the other nation during 1995 and 1996. The query lists the supplier nation, the
customer nation, the year, and the revenue from shipments that took place in that year. The query orders the answer by supplier nation,
customer nation, and year (all ascending).”
Some queries are easy to pose, some are not
Some queries are easy for DBMS to answer, some are not
Relational Query Languages
Formal: Relational Algebra, Relational Calculus, Datalog
Practical: SQL, Quel, QBE (Query-by-Example)
What is a relational query?
Input: a number of relations in your database
Output: one relation as the answer
Relational Algebra
Basic operators: selection, projection, cross product, union, difference,
and renaming
Additional operators (can be defined using basic ones): theta-join,
natural join, intersection, etc.
Operands: relations
Input relation(s) $\rightarrow$ operator $\rightarrow$ output relation
Jun Yang 1 CS145 Spring 1999
Example:
Student(SID, name, age, GPA)
Take(SID, CID)
Course(CID, title)
Selection
Notation: $\sigma_p(R)$
Purpose: pick rows according to some criteria
Input: a table $R$
Output: has the same columns as $R$, but only the rows of $R$ that satisfy $p$
Example: the student with SID 123
Example: students with GPA higher than 3.0
Example: straight-A students under 18 or over 21
The selection predicate in general can include any columns of $R$, constants,
comparisons such as $=$, $\leq$, etc., and Boolean connectives $\wedge$ (and),
$\vee$ (or), $\neg$ (not)
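The three selection examples above can be sketched in Python. This is only an illustration, not part of the original notes: the sample rows and the helper name `select` are hypothetical, and "straight-A" is assumed to mean a GPA of exactly 4.0. In algebra notation the queries would read roughly $\sigma_{SID = 123}(Student)$, $\sigma_{GPA > 3.0}(Student)$, and $\sigma_{GPA = 4.0 \wedge (age < 18 \vee age > 21)}(Student)$.

```python
# Hypothetical sample rows for Student(SID, name, age, GPA); not from the notes.
Student = [
    {"SID": 123, "name": "Ann", "age": 17, "GPA": 4.0},
    {"SID": 456, "name": "Bob", "age": 20, "GPA": 3.5},
    {"SID": 789, "name": "Cat", "age": 22, "GPA": 4.0},
    {"SID": 321, "name": "Dan", "age": 19, "GPA": 2.8},
]


def select(pred, table):
    """sigma_pred(table): keep exactly the rows for which pred(row) is true."""
    return [row for row in table if pred(row)]


# The student with SID 123.
sid_123 = select(lambda r: r["SID"] == 123, Student)

# Students with GPA higher than 3.0.
high_gpa = select(lambda r: r["GPA"] > 3.0, Student)

# Straight-A students under 18 or over 21 (assumes straight-A means GPA == 4.0).
straight_a = select(
    lambda r: r["GPA"] == 4.0 and (r["age"] < 18 or r["age"] > 21),
    Student,
)
```

The output of `select` always has the same columns as its input; only the set of rows shrinks.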
Projection
Notation: $\pi_L(R)$
Purpose: pick columns to output
Input: a table $R$
Output: has only the columns of $R$ listed in $L$
Example: SID’s and names of all students
Example: SID’s of students taking classes
Notice the elimination of duplicate rows
Example of composing $\sigma$ and $\pi$: names of students under 18
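A minimal Python sketch of projection with duplicate elimination, plus the composition of selection and projection. The sample rows and the helper names `project` and `select` are assumptions made for illustration. In algebra notation the three queries would read roughly $\pi_{SID, name}(Student)$, $\pi_{SID}(Take)$, and $\pi_{name}(\sigma_{age < 18}(Student))$.

```python
# Hypothetical sample rows; not from the notes.
Student = [
    {"SID": 123, "name": "Ann", "age": 17, "GPA": 4.0},
    {"SID": 456, "name": "Bob", "age": 20, "GPA": 3.5},
]
Take = [
    {"SID": 123, "CID": "CS145"},
    {"SID": 123, "CID": "MATH51"},
    {"SID": 456, "CID": "CS145"},
]


def select(pred, table):
    """sigma_pred(table): keep the rows for which pred(row) is true."""
    return [row for row in table if pred(row)]


def project(columns, table):
    """pi_columns(table): keep only the listed columns, dropping duplicate rows."""
    out, seen = [], set()
    for row in table:
        key = tuple(row[c] for c in columns)
        if key not in seen:  # relations are sets, so each row appears once
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out


sids_and_names = project(["SID", "name"], Student)
sids_taking = project(["SID"], Take)  # SID 123 takes two classes but is listed once
names_under_18 = project(["name"], select(lambda r: r["age"] < 18, Student))
```

Note the order of composition in the last query: the selection has to run first, because `age` is no longer available once only `name` has been projected.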
Product and Joins
Cross Product
Notation: $R \times S$
Purpose: pair rows from two tables
Input: two tables $R$ and $S$
Output: for each row $r$ in $R$ and each row $s$ in $S$, output a row $rs$; the output
table has the columns of $R$ and the columns of $S$
Example: Student $\times$ Take
If column names conflict, prefix the names with the table name and a dot
Looks odd to glue unrelated tuples together; why use $\times$ then?
Example: names of students and CID’s of the courses they are taking
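One way to see why the cross product is useful is to follow it with a selection and a projection, roughly $\pi_{name, CID}(\sigma_{Student.SID = Take.SID}(Student \times Take))$. The Python sketch below is illustrative only: the sample rows and the helper `cross` are hypothetical, and the helper reads each table's columns from its first row.

```python
# Hypothetical sample rows; not from the notes.
Student = [
    {"SID": 123, "name": "Ann", "age": 17, "GPA": 4.0},
    {"SID": 456, "name": "Bob", "age": 20, "GPA": 3.5},
]
Take = [
    {"SID": 123, "CID": "CS145"},
    {"SID": 123, "CID": "MATH51"},
    {"SID": 456, "CID": "CS145"},
]


def cross(r_name, r, s_name, s):
    """r x s: pair every row of r with every row of s.

    Column names that occur in both tables are prefixed with the table
    name and a dot, e.g. "Student.SID" and "Take.SID".
    """
    if not r or not s:
        return []
    shared = set(r[0]) & set(s[0])
    out = []
    for x in r:
        for y in s:
            row = {}
            for k, v in x.items():
                row[f"{r_name}.{k}" if k in shared else k] = v
            for k, v in y.items():
                row[f"{s_name}.{k}" if k in shared else k] = v
            out.append(row)
    return out


product = cross("Student", Student, "Take", Take)  # 2 * 3 = 6 rows

# Selection keeps only the pairs that actually belong together ...
matched = [t for t in product if t["Student.SID"] == t["Take.SID"]]

# ... and projection keeps the two columns of interest.
names_and_cids = sorted({(t["name"], t["CID"]) for t in matched})
```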
Theta-Join
Notation: $R \bowtie_p S$
Purpose: relate rows from two tables according to some criteria
Shorthand for: $\sigma_p(R \times S)$
Example: names of students and CID’s of the courses they are taking
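The theta-join can be sketched directly from its definition as a selection over a cross product. In the Python below, the sample rows and the helpers `cross` and `theta_join` are hypothetical; as a simplification, every output column is qualified with its table name, not only the conflicting ones. The query corresponds roughly to $\pi_{name, CID}(Student \bowtie_{Student.SID = Take.SID} Take)$.

```python
# Hypothetical sample rows; not from the notes.
Student = [
    {"SID": 123, "name": "Ann", "age": 17, "GPA": 4.0},
    {"SID": 456, "name": "Bob", "age": 20, "GPA": 3.5},
]
Take = [
    {"SID": 123, "CID": "CS145"},
    {"SID": 123, "CID": "MATH51"},
    {"SID": 456, "CID": "CS145"},
]


def cross(r_name, r, s_name, s):
    """r x s, with every column qualified as "table.column" (a simplification)."""
    return [
        {
            **{f"{r_name}.{k}": v for k, v in x.items()},
            **{f"{s_name}.{k}": v for k, v in y.items()},
        }
        for x in r
        for y in s
    ]


def theta_join(r_name, r, s_name, s, pred):
    """Theta-join defined as a selection with pred over the cross product."""
    return [row for row in cross(r_name, r, s_name, s) if pred(row)]


joined = theta_join(
    "Student", Student, "Take", Take,
    lambda t: t["Student.SID"] == t["Take.SID"],
)
names_and_cids = sorted({(t["Student.name"], t["Take.CID"]) for t in joined})
```

A real DBMS would normally avoid materializing the full cross product; the definition only fixes what the result is, not how it is computed.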
Natural Join
Notation: $R \bowtie S$
Purpose: relate rows from two tables, and
enforce equality on all common attributes
eliminate one copy of common attributes
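Shorthand for: $\pi_L(R \bowtie_p S)$, where $p$ equates all attributes common to $R$ and $S$, and $L$ lists the attributes of $R$ and $S$ with each common attribute appearing only once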
Example: Student $\bowtie$ Take
Example: names of students taking calculus
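A small Python sketch of natural join, applied to the calculus example. The sample rows, the helper `natural_join`, and the assumption that the course title is stored literally as "calculus" are all hypothetical. In algebra notation the query would read roughly $\pi_{name}(Student \bowtie Take \bowtie \sigma_{title = \text{calculus}}(Course))$.

```python
# Hypothetical sample rows; not from the notes.
Student = [
    {"SID": 123, "name": "Ann", "age": 17, "GPA": 4.0},
    {"SID": 456, "name": "Bob", "age": 20, "GPA": 3.5},
]
Take = [
    {"SID": 123, "CID": "CS145"},
    {"SID": 123, "CID": "MATH51"},
    {"SID": 456, "CID": "CS145"},
]
Course = [
    {"CID": "CS145", "title": "databases"},
    {"CID": "MATH51", "title": "calculus"},
]


def natural_join(r, s):
    """Natural join: match rows that agree on all common attributes.

    Merging the two dicts keeps a single copy of each common attribute.
    """
    out = []
    for x in r:
        for y in s:
            common = set(x) & set(y)
            if all(x[c] == y[c] for c in common):
                out.append({**x, **y})
    return out


student_take = natural_join(Student, Take)

# Assumes the course title is stored literally as "calculus".
calculus = [c for c in Course if c["title"] == "calculus"]
calculus_names = sorted({t["name"] for t in natural_join(student_take, calculus)})
```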
Set Operators
Union: $R \cup S$
Difference: $R - S$
Intersection: $R \cap S$
Input: two tables $R$ and $S$ with identical schema
Output: has the same schema as $R$ and $S$
Duplicate rows are eliminated (as usual) in union
$R \cap S$ is just a shorthand for $R - (R - S)$
Example of union:
Student(SID, name, age, GPA)
GradStudent(SID, name, age, GPA, advisor)
Find all student SID’s
Example of difference: CID’s of the courses that nobody is taking
What if we also want course titles?
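The set-operator examples can be sketched with Python's built-in sets, using tuples as rows. All sample rows here are hypothetical. Because Student and GradStudent have different schemas, the sketch projects both onto SID before taking the union, roughly $\pi_{SID}(Student) \cup \pi_{SID}(GradStudent)$. The difference example reads roughly $\pi_{CID}(Course) - \pi_{CID}(Take)$, and one possible way to recover titles is to join that result back with Course.

```python
# Hypothetical sample rows as tuples; not from the notes.
Student = {(123, "Ann", 17, 4.0), (456, "Bob", 20, 3.5)}
GradStudent = {(456, "Bob", 20, 3.5, "Prof. X"), (900, "Eve", 25, 3.9, "Prof. Y")}
Take = {(123, "CS145"), (456, "CS145")}
Course = {("CS145", "databases"), ("MATH51", "calculus"), ("PHIL1", "logic")}

# Union needs identical schemas, so project both tables onto SID first.
all_sids = {row[0] for row in Student} | {row[0] for row in GradStudent}

# Difference: CIDs of the courses that nobody is taking.
untaken_cids = {cid for cid, _title in Course} - {cid for _sid, cid in Take}

# To also get titles, join the untaken CIDs back with Course.
untaken_courses = {(cid, title) for cid, title in Course if cid in untaken_cids}

# Intersection expressed with difference only: R - (R - S).
r, s = {1, 2, 3}, {2, 3, 4}
intersection = r - (r - s)
```

SID 456 occurs in both tables but appears once in `all_sids`, which is the duplicate elimination mentioned above.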
Renaming
Notation: $\rho_S(R)$, or $\rho_{S(A_1, A_2, \ldots)}(R)$
Purpose: rename a table and/or its columns
Example: SID’s of all pairs of classmates
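Renaming is what makes it possible to join a table with itself. A rough algebra form of the classmates query is $\pi_{T1.SID, T2.SID}(\rho_{T1}(Take) \bowtie_{T1.CID = T2.CID \wedge T1.SID < T2.SID} \rho_{T2}(Take))$. The Python sketch below uses hypothetical sample rows and a hypothetical helper `rename`; the condition `T1.SID < T2.SID` is an assumption that excludes pairing a student with themselves and reporting each pair twice.

```python
# Hypothetical sample rows for Take(SID, CID); not from the notes.
Take = [(123, "CS145"), (456, "CS145"), (789, "CS145"), (123, "MATH51")]


def rename(new_name, table, columns):
    """rho_new_name(table): rows become dicts keyed "new_name.column"."""
    return [
        {f"{new_name}.{c}": v for c, v in zip(columns, row)}
        for row in table
    ]


t1 = rename("T1", Take, ["SID", "CID"])
t2 = rename("T2", Take, ["SID", "CID"])

# Theta-join of the two renamed copies, then projection onto the two SIDs.
classmates = sorted({
    (a["T1.SID"], b["T2.SID"])
    for a in t1
    for b in t2
    if a["T1.CID"] == b["T2.CID"] and a["T1.SID"] < b["T2.SID"]
})
```

Without the two distinct names, the join condition could not tell the two copies of `SID` and `CID` apart.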
Summary of Relational Algebra
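$E ::= R \mid \sigma_p(E) \mid \pi_L(E) \mid E \times E \mid E \cup E \mid E - E \mid \rho_S(E)$
where $R$ is any table in the database
... plus additional ones defined in terms of the above: $E \bowtie_p E$, $E \bowtie E$, $E \cap E$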
One tricky example: which students have the highest GPA?
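The query is tricky because none of the operators introduced in these notes computes a maximum. One possible formulation, offered here as a sketch and not necessarily the one intended in lecture, is to find every student whose GPA is lower than some other student's and subtract those from the set of all students: $\pi_{SID}(Student) - \pi_{S1.SID}(\rho_{S1}(Student) \bowtie_{S1.GPA < S2.GPA} \rho_{S2}(Student))$. The Python below uses hypothetical sample rows and names its intermediate results.

```python
# Hypothetical sample rows for Student(SID, name, age, GPA); not from the notes.
Student = [
    (123, "Ann", 17, 4.0),
    (456, "Bob", 20, 3.5),
    (789, "Cat", 22, 4.0),
    (321, "Dan", 19, 2.8),
]

# Intermediate table 1: SIDs of all students.
all_sids = {s[0] for s in Student}

# Intermediate table 2: SIDs of students beaten by at least one other student.
# This is a self-join of two copies of Student on S1.GPA < S2.GPA.
beaten_sids = {s1[0] for s1 in Student for s2 in Student if s1[3] < s2[3]}

# Whoever is never beaten has the highest GPA; ties are all kept.
top_sids = all_sids - beaten_sids
```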
When an expression gets too hairy, it helps to name some intermediate
result tables:
Or use an expression tree: