Ch-1- Query Processing and Optimization (2)
Query Processing
2.1 Query and Query Language
Query
A query is a piece of code that is sent to a database in
order to get information back from the database.
It is the means of retrieving data or information from the
database.
A database "query" is basically a "question" that you ask
the database.
The term 'query' means to search, to question, or to find.
When you query a database, you're searching for
information in the database.
Queries are one of the things that make databases so
powerful.
The result of the query is the information that is returned
by the database management system.
Queries are usually constructed using SQL (structured
query language) which resembles a high-level
programming language.
As with many other tasks, you can query a database either
programmatically or via a user interface.
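An example of a query is:
SELECT DISTINCT Credit_Hr FROM COURSE ORDER BY CName;
The query returns the distinct values of the 'Credit_Hr'
field from the COURSE table and sorts the result by CName.
Some DBMSs reject an ORDER BY column that is not in the
SELECT DISTINCT list; there, sort by Credit_Hr instead.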
Why Query?
Usually, you will be selective with how much data you
want returned.
If you have a lot of data in your database, you probably
don't want to see everything.
More likely, you'll only want to see data that fits certain
criteria.
For example,
you might only want to see how many individuals in
your database live in a given city.
Or you might only want to see which individuals have
registered with your database within a given time
period.
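The two selective questions above can be sketched programmatically with Python's standard-library sqlite3 module. The `person` table, its columns, and the sample rows are hypothetical and serve only as an illustration.

```python
import sqlite3

# Hypothetical table and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, city TEXT, registered TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [
        ("Abebe", "Addis Ababa", "2023-01-15"),
        ("Sara", "Adama", "2023-03-02"),
        ("Tony", "Addis Ababa", "2024-06-20"),
    ],
)

# How many individuals live in a given city?
(in_city,) = conn.execute(
    "SELECT COUNT(*) FROM person WHERE city = ?", ("Addis Ababa",)
).fetchone()

# Which individuals registered within a given time period?
in_period = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM person "
        "WHERE registered BETWEEN ? AND ? ORDER BY name",
        ("2023-01-01", "2023-12-31"),
    )
]

print(in_city)    # 2
print(in_period)  # ['Abebe', 'Sara']
conn.close()
```

Both queries return only the rows that match the condition, not the whole table.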
Query Language
Query languages are computer languages used to make queries
into databases and information systems.
Broadly, they can be classified as either database query
languages or information retrieval query languages.
The difference is that a database query language attempts
to give factual answers to factual questions, while an
information retrieval query language attempts to find
documents containing information that is relevant to an
area of inquiry.
Query Languages != programming languages
QLs are not intended to be used for complex calculations;
they support easy, efficient access to large data sets.
Examples include:
Contextual Query Language (CQL) is a formal language for
representing queries to information retrieval systems such
as web indexes or bibliographic catalogues;
QUEL is a relational database access language, similar in
most ways to SQL;
SQL is a well known query language and Data
Manipulation Language for relational databases;
XQuery is a query language for XML data sources;
.QL is a proprietary object-oriented query language for
querying relational databases; successor of Datalog;
DMX is a query language for Data Mining models;
Datalog is a query language for deductive databases;
Different query languages exist for different types of
databases.
For example MS Access uses SQL, which stands for
Structured Query Language.
MS Access contains Tables, Forms, and Queries.
The Forms are used to enter or display the data,
The Tables are where the data is saved, and
The Queries are used to search for specific data.
The relational model (database) supports simple, powerful
query languages that are called Relational Query Languages.
Example Relational Query Languages: QUEL, SQL
Relational Query Languages
A variety of QLs are used by relational DBMSs for
manipulating relations.
Some of them are procedural:
The user tells the system exactly what data is needed and
how to manipulate it.
Others are non-procedural:
The user states what data is needed rather than how it is
to be retrieved.
Two mathematical query languages form the basis for
relational languages:
Relational Algebra
Relational Calculus
Relational Algebra: described as a procedural language.
More operational, very useful for representing execution
plans.
Used to tell the DBMS how to build a new relation from
one or more relations in the database.
Relational Calculus: described as a non-procedural
language.
Lets users describe what they want, rather than how to
compute it (non-operational, declarative).
Used to formulate the definition of a relation in terms of
one or more database relations.
All tuples, possibly from several relations, are
manipulated in one statement without looping.
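The contrast between the two styles can be sketched in Python as follows. This is only an analogy under stated assumptions: the `staff` tuples and the helper functions `select` and `project` are hypothetical, and a Python set comprehension stands in for a calculus expression.

```python
# Hypothetical Staff tuples: (staffNo, lName, salary).
staff = {("S1", "White", 30000), ("S2", "Beech", 12000), ("S3", "Ford", 9000)}


def select(relation, predicate):
    """Keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}


def project(relation, indexes):
    """Keep only the attributes at the given positions."""
    return {tuple(t[i] for i in indexes) for t in relation}


# Algebra style (procedural): state the steps, select first, then project.
algebra_result = project(select(staff, lambda t: t[2] > 10000), [1])

# Calculus style (declarative): describe the wanted tuples in one statement,
# roughly {t.lName | t in Staff and t.salary > 10000}.
calculus_result = {(lname,) for (_, lname, salary) in staff if salary > 10000}

print(algebra_result == calculus_result)  # True
```

Both forms produce the same relation, which mirrors the equivalence of the two languages.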
Both are the bases for relational languages
Both the algebra and calculus are formal, non-user-friendly
languages.
They have been used as the basis for other, high-level
data manipulation languages (DMLs) for relational
databases.
Both are equivalent to each other. For every expression in
the algebra, there is an equivalent expression in the
calculus.
Both the operands and the results are relations and so the
output from one operation can become the input to
another operation.
Relational Algebra
Relational algebra operations enable a user to specify basic
retrieval requests.
The result of the retrieval is a new relation, which may
have been formed from one or more relations.
The new relations can be further manipulated using
operations of the same algebra.
A sequence of relational algebra operations forms a
relational algebra expression, whose result will also be
a relation that represents the result of a database query.
The output from one operation can become the input to
another operation (nesting is possible)
Fundamental operations in relational algebra are:
Selection ( σ )
Projection ( ∏ )
Cross/Cartesian-Product ( x )
Set-Difference ( - )
Union ( ∪ )
Intersection ( ∩ )
Join
Division operations
Using these we can build up sophisticated database
queries.
The first five of those operations perform most of the data
retrieval operations that we are interested in.
Intersection, Join, Division operations are operations that
can be expressed in terms of the five basic operations.
The selection and projection are unary operations, since
they operate on one relation.
The other operations work on pairs of relations and are
therefore called binary operations.
[Figure: illustrations of the relational algebra operations:
(a) Selection, (b) Projection, (c) Cartesian Product
(R = {a, b}, S = {1, 2, 3}, R x S = {a1, a2, a3, b1, b2, b3}),
R ∪ S, R ∩ S, R - S]
• Staff
staffNo  fName  lName  position  sex  DOB  salary  branchNo
1) Selection (Or Restriction)
Selects subset of tuples/rows in a relation that satisfy
selection condition.
The Selection operation is applied to each tuple
individually
The degree of the resulting relation is the same as the
original relation but the cardinality (no. of tuples) is less
than or equal to the original relation.
The Selection operator is commutative.
A set of conditions can be combined using the Boolean
operations ∧ (AND), ∨ (OR), and ~ (NOT).
No duplicates in result!
Notation: σ<Selection Condition>(<Relation Name>)
Selection condition general form:
<attribute name> <comparison operator> <constant value>
<attribute name> <comparison operator> <attribute name>
Comparison operators: =, >, >=, <, <=, !=
Example: List all staff with a salary greater than $10,000.
σsalary > 10000(Staff)
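A minimal Python sketch of the selection in this example follows. The `staff` rows and the helper `select` are hypothetical; only the attributes needed for the condition are shown.

```python
# Hypothetical Staff rows, for illustration only.
staff = [
    {"staffNo": "S1", "lName": "White", "salary": 30000},
    {"staffNo": "S2", "lName": "Beech", "salary": 12000},
    {"staffNo": "S3", "lName": "Ford", "salary": 9000},
]


def select(relation, condition):
    """Return the tuples of `relation` that satisfy `condition`."""
    return [row for row in relation if condition(row)]


# Selection with the condition salary > 10000 over Staff.
high_paid = select(staff, lambda row: row["salary"] > 10000)
print([row["staffNo"] for row in high_paid])  # ['S1', 'S2']

# Commutativity: applying two selections in either order gives the same result.
name_before_g = lambda row: row["lName"] < "G"
salary_over_10k = lambda row: row["salary"] > 10000
one_way = select(select(staff, salary_over_10k), name_before_g)
other_way = select(select(staff, name_before_g), salary_over_10k)
print(one_way == other_way)  # True
```

Each result row keeps all attributes of Staff, so the degree is unchanged while the cardinality drops from 3 to 2.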
2) Projection
Selects certain attributes while discarding the other from
the base relation.
Creates a vertical partitioning:
one with the needed columns (attributes) containing
results of the operation and other containing the
discarded Columns.
Deletes attributes that are not in projection list.
The Projection operator has to eliminate duplicates, because
duplicate removal is necessary to ensure that the
resulting table is also a relation.
Notation: ∏<Selected Attributes>(<Relation Name>)
or ∏a1, …, an(R)
Example: Produce a list of salaries for all staff, showing
only staffNo, fName, lName, and salary details.
∏staffNo, fName, lName, salary(Staff)
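Projection and its duplicate elimination can be sketched in Python as below. The `staff` rows, the `ATTRS` tuple, and the helper `project` are hypothetical names used only for illustration.

```python
# Hypothetical Staff rows as (staffNo, fName, lName, position, salary).
ATTRS = ("staffNo", "fName", "lName", "position", "salary")
staff = [
    ("S1", "John", "White", "Manager", 30000),
    ("S2", "Ann", "Beech", "Assistant", 12000),
    ("S3", "Mary", "Howe", "Assistant", 9000),
]


def project(relation, wanted):
    """Keep only the `wanted` attributes; a set drops duplicate tuples."""
    idx = [ATTRS.index(name) for name in wanted]
    return {tuple(row[i] for i in idx) for row in relation}


salaries = project(staff, ["staffNo", "fName", "lName", "salary"])
positions = project(staff, ["position"])

print(len(salaries))      # 3
print(sorted(positions))  # [('Assistant',), ('Manager',)]
```

Projecting on position alone yields two tuples from three rows, because the repeated 'Assistant' value is eliminated.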
Exercise:
List all staff with a salary greater than $10,000,
showing only staffNo, fName, lName, and salary
details
3) Union Operation
Denoted as R ∪ S
The union of two relations R and S defines a relation that
contains all the tuples of R or S or both R and S, duplicate
tuples being eliminated.
R and S must be union-compatible:
Same number of fields.
Corresponding fields have the same type.
Example:
S1 ∪ S2
4) Intersection Operation
Denoted as R ∩ S
The intersection operation defines a relation consisting of
the set of all tuples that are in both R and S.
R and S must be union-compatible.
Example:
S1 ∩ S2
5) Set Difference
Denoted as R-S
The set difference operation defines a relation consisting
of the tuples that are in relation R, but not in S.
R and S must be union-compatible.
Example:
S1 - S2
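The three set operations (union, intersection, set difference) map directly onto Python's built-in set type. The contents of `s1` and `s2` below are hypothetical, since the slides give no data for S1 and S2.

```python
# Hypothetical union-compatible relations: two fields each, same types.
s1 = {(22, "Dustin"), (31, "Lubber"), (58, "Rusty")}
s2 = {(28, "Yuppy"), (31, "Lubber"), (58, "Rusty")}

union = s1 | s2         # tuples in S1 or S2 or both, duplicates eliminated
intersection = s1 & s2  # tuples in both S1 and S2
difference = s1 - s2    # tuples in S1 but not in S2

print(sorted(union))
print(sorted(intersection))  # [(31, 'Lubber'), (58, 'Rusty')]
print(sorted(difference))    # [(22, 'Dustin')]

# Intersection can be expressed with set difference alone.
print(intersection == s1 - (s1 - s2))  # True
```

The last line illustrates why intersection is not counted among the five basic operations.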
6) Cartesian Product
Denoted as R x S
The Cartesian product operation defines a relation that is
the concatenation of every tuple of relation R with every
tuple of relation S.
Each row of R is paired with each row of S.
If R has attributes (A1, A2, . . ., An) and S has attributes
(B1, B2, . . ., Bm), the result of R x S is a relation Q of
degree n + m with attributes
Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
The resulting relation Q has one tuple for each
combination of tuples, one from R and one from S.
If R has p tuples and S has q tuples, then R x S has
p * q tuples, that is, | R x S | = p * q.
Cartesian Product Example
Employee
Name  SSN
John  999999999
Tony  777777777
Dependents
EmployeeSSN  Dname
999999999    Emily
777777777    Joe
Employee x Dependents
Name  SSN        EmployeeSSN  Dname
John  999999999  999999999    Emily
John  999999999  777777777    Joe
Tony  777777777  999999999    Emily
Tony  777777777  777777777    Joe
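The Employee x Dependents example can be reproduced with a short Python sketch that uses `itertools.product` from the standard library. Representing each relation as a list of tuples is an assumption made for illustration.

```python
from itertools import product

employee = [("John", "999999999"), ("Tony", "777777777")]    # (Name, SSN)
dependents = [("999999999", "Emily"), ("777777777", "Joe")]  # (EmployeeSSN, Dname)

# Concatenate every Employee tuple with every Dependents tuple.
cross = [e + d for e, d in product(employee, dependents)]

for row in cross:
    print(row)

print(len(cross))     # 4, the product of the two cardinalities (2 * 2)
print(len(cross[0]))  # 4, the sum of the two degrees (2 + 2)
```

Only the first and last rows pair an employee with his own dependent, which is why a join is usually preferred over a plain product.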
7) Join Operation
Typically we want only combinations of the Cartesian
product that satisfy certain conditions and so we would
normally use a Join Operation instead of the Cartesian
product operation.
Join is a derivative of the Cartesian product: it is
equivalent to performing a selection, with the join
predicate as the selection formula, over the Cartesian
product of the operand relations.
The Join operation is one of the most difficult operations
to implement efficiently in an RDBMS and is one of the
reasons why relational systems have intrinsic
performance problems.
There are various forms of Join operations, each with
subtle differences:
Theta join (θ-join)
Equijoin (a particular type of Theta join)
Natural join
Outer join
Semi join
• Client
clientNo  fName  lName  telNo  prefType  maxRent
• Viewing
clientNo  propertyNo  viewDate  comment
Theta join (θ-join)
R ⋈F S
The theta join operation defines a relation that contains
the tuples satisfying the predicate F from the Cartesian
product of R and S.
The predicate F is of the form R.ai θ S.bi, where θ may
be one of the comparison operators (<, <=, >, >=, =, ≠).
In the case where the predicate F contains only equality
(=), the term Equijoin is used instead.
A Theta join can be rewritten using the basic Selection and
Cartesian product operations:
R ⋈F S = σF(R x S)
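The rewriting of a theta join as a selection over a Cartesian product can be sketched in Python as follows. The `client` and `prop` relations, their values, and the helper `theta_join` are hypothetical and chosen only to show a predicate with the >= operator.

```python
from itertools import product

# Hypothetical relations: client (clientNo, maxRent), prop (propertyNo, rent).
client = [("CR56", 350), ("CR74", 750)]
prop = [("PG4", 350), ("PG36", 375), ("PA14", 650)]


def theta_join(r, s, predicate):
    """Selection with `predicate` over the Cartesian product of r and s."""
    return [x + y for x, y in product(r, s) if predicate(x, y)]


# F is client.maxRent >= prop.rent.
affordable = theta_join(client, prop, lambda c, p: c[1] >= p[1])

for row in affordable:
    print(row)
print(len(affordable))  # 4 of the 6 product tuples satisfy F
```

Replacing the predicate with an equality test on a shared attribute would turn the same function into an Equijoin.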
Example
List the names and comments of all clients who have
viewed a property for rent.
(∏clientNo, fName, lName(Client)) ⋈Client.clientNo = Viewing.clientNo
(∏clientNo, propertyNo, comment(Viewing))
Natural Join
R ⋈ S
The natural join is an Equijoin of the two relations R and
S over all common attributes x.
One occurrence of each common attribute is eliminated
from the result.
Ex. List the names and comments of all clients who have
viewed a property for rent, using a natural join to remove
one of the duplicate clientNo attributes:
(∏clientNo, fName, lName(Client)) ⋈
(∏clientNo, propertyNo, comment(Viewing))
Another Natural Join Example
R =
A  B
X  Y
X  Z
Y  Z
Z  V
S =
B  C
Z  U
V  W
Z  V
R ⋈ S =
A  B  C
X  Z  U
X  Z  V
Y  Z  U
Y  Z  V
Z  V  W
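This natural join example, with R(A, B) and S(B, C), can be checked with a few lines of Python. Representing each relation as a set of tuples is an assumption made for illustration.

```python
r = {("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "V")}  # R(A, B)
s = {("Z", "U"), ("V", "W"), ("Z", "V")}              # S(B, C)

# Equijoin on the common attribute B, keeping a single copy of B.
natural = {(a, b, c) for (a, b) in r for (b2, c) in s if b == b2}

for row in sorted(natural):
    print(row)
```

The tuple (X, Y) of R has no partner in S, so it does not appear in the result.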
Outer/left join
R ⟕ S
The (left) outer join is a join in which tuples from R that
do not have matching values in the common attributes of S
are also included in the result relation.
Missing values in the second relation are set to null.
Example (with R and S as in the natural join example above):
R ⟕ S =
A  B  C
X  Y  NULL
X  Z  U
X  Z  V
Y  Z  U
Y  Z  V
Z  V  W
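A minimal Python sketch of the left outer join for this example is given below. The helper `left_outer_join` is a hypothetical name, and Python's `None` stands in for NULL.

```python
r = [("X", "Y"), ("X", "Z"), ("Y", "Z"), ("Z", "V")]  # R(A, B)
s = [("Z", "U"), ("V", "W"), ("Z", "V")]              # S(B, C)


def left_outer_join(r, s):
    """Join on B; R tuples without a partner in S are padded with None."""
    result = []
    for a, b in r:
        matches = [c for (b2, c) in s if b2 == b]
        if matches:
            result.extend((a, b, c) for c in matches)
        else:
            result.append((a, b, None))  # no partner in S: pad with null
    return result


outer = left_outer_join(r, s)
for row in outer:
    print(row)
```

Compared with the natural join, the only extra tuple is (X, Y, None).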
Semijoin
R ⋉F S
Defines a relation that contains the tuples of R that
participate in the join of R with S.
It decreases the number of tuples that need to be handled
to form the join.
It is particularly useful for computing joins in distributed
systems.
Example:
List complete details of all staff who work at the branch in Glasgow.
Staff ⋉Staff.branchNo = Branch.branchNo (σBranch.city = 'Glasgow'(Branch))
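The semijoin in this example can be sketched in Python as below. The `staff` and `branch` rows are hypothetical and only a few attributes are shown.

```python
# Hypothetical rows: staff (staffNo, lName, branchNo), branch (branchNo, city).
staff = [
    ("SL21", "White", "B005"),
    ("SG37", "Beech", "B003"),
    ("SG14", "Ford", "B003"),
]
branch = [("B005", "London"), ("B003", "Glasgow"), ("B007", "Aberdeen")]

# Selection on Branch: keep only the Glasgow branches.
glasgow = [b for b in branch if b[1] == "Glasgow"]

# Semijoin: keep the Staff tuples that have a partner in the selected
# branches; no Branch attributes are added to the result.
glasgow_branch_nos = {b[0] for b in glasgow}
result = [s for s in staff if s[2] in glasgow_branch_nos]

print(result)  # [('SG37', 'Beech', 'B003'), ('SG14', 'Ford', 'B003')]
```

Only the set of matching branch numbers is needed to filter Staff, which is why the operation suits distributed joins.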
8) Division Operation
R ÷ S
Let relation R be defined over the attribute set A and
relation S be defined over the attribute set B, such that
B ⊆ A.
Let C = A - B, that is, C is the set of attributes of R that
are not attributes of S.
The Division operation defines a relation over the
attributes C that consists of the set of tuples from R that
match the combination of every tuple in S.
Example: Identify all clients who have viewed all
properties with three rooms.
(∏clientNo, propertyNo(Viewing)) ÷
(∏propertyNo(σrooms = 3(PropertyForRent)))
Example of Division A/B
A =
Sno  Pno
S1   P1
S1   P2
S1   P3
S1   P4
S2   P1
S2   P2
S3   P2
S4   P2
S4   P4
B1 (Pno): P2
B2 (Pno): P2, P4
B3 (Pno): P1, P2, P4
A/B1 (Sno): S1, S2, S3, S4
A/B2 (Sno): S1, S4
A/B3 (Sno): S1
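The A/B example can be checked with a small Python sketch. The helper `divide` is a hypothetical name, and the relations are written as Python sets for illustration.

```python
# A(Sno, Pno) from the division example.
a = {
    ("S1", "P1"), ("S1", "P2"), ("S1", "P3"), ("S1", "P4"),
    ("S2", "P1"), ("S2", "P2"),
    ("S3", "P2"),
    ("S4", "P2"), ("S4", "P4"),
}
b1 = {"P2"}
b2 = {"P2", "P4"}
b3 = {"P1", "P2", "P4"}


def divide(a, b):
    """Return the Sno values of A that are paired with every Pno in B."""
    candidates = {sno for sno, _ in a}
    return {sno for sno in candidates if all((sno, pno) in a for pno in b)}


print(sorted(divide(a, b1)))  # ['S1', 'S2', 'S3', 'S4']
print(sorted(divide(a, b2)))  # ['S1', 'S4']
print(sorted(divide(a, b3)))  # ['S1']
```

As the divisor gains tuples, fewer Sno values qualify, because each one must match every Pno in B.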