Database System Notes
Introduction
• Q: What is a Database ?
• Answer from Pratt/Adamski:
o A Database (DB) is a structure that can store information about:
1. multiple types of entities,
2. the attributes that describe those entities; and
3. the relationships among the entities
• Answer from Elmasri/Navathe:
o A Database (DB) is a collection of related data with the following properties:
1. A DB is logically coherent and has some relevant meaning
2. A DB is designed, built and populated with data for a specific purpose
3. A DB represents some aspect of the real world.
• Answer from Kroenke: An integrated, self-describing collection of related data
o Integrated: Data is stored in a uniform way, typically all in one place (a single
physical computer for example)
o Self-Describing: A database maintains a description of the data it contains
(Catalog)
o Related: Data has some relationship to other data. In a University we have
students who take courses taught by professors
o By taking advantage of relationships and integration, we can provide information
to users as opposed to simply data.
o We can also say that the database is a model of what the users perceive.
o Three main categories of models:
1. User or Conceptual Models: How users perceive the world and/or the
business.
2. Logical Models: Represent the logic of how a business operates. For
example, the relationship between different entities and the flow of data
through the organization. Based on the User's model.
3. Physical Models: Represent how the database is actually implemented on
a computer system. This is based on the logical model.
• Database Management System (DBMS)
A collection of software programs that are used to define, construct, maintain and
manipulate data in a database.
• Database System (DBS) contains:
The Database +
The DBMS +
Application Programs (what users interact with)
File Systems
Example Database
An Example Database
• 1940's, 50's Initial use of computers as calculators. Limited data, focus on algorithms.
Science, military applications.
• 1960's Business uses. Organizational data, customer data, sales, inventory, accounting,
etc. File system based, high emphasis on applications programs to extract and assimilate
data. Larger amounts of data, relatively simple calculations.
• 1970's The relational model. Data separated into individual tables. Related by keys.
Initially required heavy system resources. Examples: Oracle, Sybase, Informix, Digital
RDB, IBM DB2.
• 1980's Microcomputers - the IBM PC, Apple Macintosh. Database programs such as
DBase (sort of), Paradox, FoxPro, MS Access. Individual users can create and maintain small
databases.
• Late 1980's Local area networks. Workgroups sharing resources such as files, printers,
e-mail.
Client/Server: The database resides on a central server; application programs run on client
PCs attached to the server over a LAN.
• 1990's Internet and World Wide Web make databases of all kinds available from a single
type of client - the Web Browser. Data warehousing and Data Mining also emerge.
• Other types of Databases:
o Object-Oriented Database Systems. Objects (data and methods) stored
persistently.
o Distributed Database Systems. Copies of data reside at different locations for
redundancy or for performance reasons.
Contents of a Database
A Database contains:
• User Data
• Metadata
• Indexes
• Application metadata
User Data
• Data users work with directly by entering, updating and viewing.
• For our purposes, data will be generally stored in tables with some relationships between
tables.
• Each table has one or more columns. A set of column values forms a database record (a row).
• Recall our example database for the bank. What were some problems we discussed ?
• Here is one improvement - split into 2 tables:
Customer Table
CustomerID  Name                Address          City         State
123         Mr. Smith           123 Lexington    Smithville   KY
124         Mrs. Jones          12 Davis Ave.    Smithville   KY
125         Mr. Axe             443 Grinder Ln.  Broadville   GA
127         Mr. & Mrs. Builder  661 Parker Rd.   Streetville  GA
Accounts Table
• The customer table has 4 records and 5 columns. The Accounts table has 7 records and 3
columns.
• Note relationship between the two tables - CustomerID column.
• How should we split data into the tables ? What are the relationships between the tables ?
These are questions that are answered by Database Modeling and Database Design.
Metadata
• Metadata are typically stored in System tables or System Catalog and are typically only
directly accessible by the DBMS or by the system administrator.
Have a look at the Database Documentor feature of MS Access (under the tools menu,
choose Analyze and then Documentor).
This tool queries the system tables to give all kinds of Metadata for tables, etc. in an MS
Access database.
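The same idea can be tried outside MS Access. The following is a minimal sketch that assumes SQLite (through Python's built-in sqlite3 module) rather than Access; the table name customer is only an illustration based on the Customer table above. It asks the system catalog which tables exist and which columns a table has.

```python
import sqlite3

# In-memory SQLite database; the table mirrors the Customer table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT, address TEXT, city TEXT, state TEXT)"
)

# sqlite_master is SQLite's system catalog: one row per table, index, view, etc.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]

print(catalog)
print(columns)
```

Note that we never stored the column names ourselves; the database keeps this description of its own contents.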
Indexes
• In keeping with our desire to provide users with several different views of data, indexes
provide an alternate means of accessing user data. Sorting and Searching:
• An index for our new banking example might include the account numbers in a sorted
order.
• Indexes allow the database to access a record without having to search through the entire
table.
• Updating data requires an extra step: The index must also be updated.
• Example: Index in a book consists of two things:
1) A Keyword stored in order
2) A pointer to the rest of the information. In the case of the book, the pointer is a page
number.
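As a rough sketch of the book-index idea, the code below keeps a sorted list of (key, pointer) pairs next to an unsorted list of account rows. The account numbers come from the bank example in these notes; the function names lookup and insert are made up for this illustration, and a real DBMS would normally use a more elaborate structure (such as a B-tree) than a sorted list.

```python
from bisect import bisect_left

# Account rows in arbitrary order: (Account_Number, Customer_Id, Balance)
accounts = [
    (9987, 123, 4000.00),
    (9980, 123, 2000.00),
    (8811, 124, 1000.00),
    (4422, 125, 6000.00),
    (4433, 125, 9000.00),
    (3322, 127, 500.00),
    (1122, 127, 800.00),
]

# The index: (key, pointer) pairs sorted by key. The pointer is the row position.
index = sorted((row[0], pos) for pos, row in enumerate(accounts))
index_keys = [key for key, _ in index]


def lookup(account_number):
    """Binary-search the index, then follow the pointer to the row."""
    i = bisect_left(index_keys, account_number)
    if i < len(index_keys) and index_keys[i] == account_number:
        return accounts[index[i][1]]
    return None


def insert(row):
    """Adding a row takes an extra step: the index must be updated too."""
    accounts.append(row)
    i = bisect_left(index_keys, row[0])
    index_keys.insert(i, row[0])
    index.insert(i, (row[0], len(accounts) - 1))


print(lookup(4433))
```

The lookup never scans the whole accounts list, while every insert pays the cost of keeping the index in sorted order.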
Applications Metadata
• Many DBMS have storage facilities for forms, reports, queries and other application
components.
• Applications Metadata is accessed via the database development programs.
• Example: Look at the Documentor tool in MS Access. It can also show metadata for
Queries, Forms, Reports, etc.
• Database Design: The activity of specifying the schema of a database in a given data
model
• Database Schema: The structure of a database that:
o Captures data types, relationships and constraints in data
o Is independent of any application program
o Changes infrequently
• Data Model:
o A set of primitives for defining the structure of a database.
o A set of operations for specifying retrieval and updates on a database
o Examples: Relational, Hierarchical, Networked, Object-Oriented
• Database Instance or State: The actual data contained in a database at a given time.
The Database Development Process
The following is a very brief outline describing the database development process.
• User needs assessment and requirements gathering: Determine what the users are
looking for, what functions should be supported, how the system should behave.
• Data Modeling: Based on user requirements, form a logical model of the system. This
logical model is then converted to a physical data model (tables, columns, relationships,
etc.) that will be implemented.
• Implementation: Based on the data model, a database can be created. Applications are
then written to perform the required functions.
• Testing: The system is tested using real data.
• Deployment: The system is deployed to users. Maintenance of the system begins.
There are many variations to this basic development process. A Systems Analysis and Design
course (such as CIS 3900 for undergraduates, CIS 9490 for graduates) covers these topics in
greater detail.
For our Bank example, let's assume that the managers are interested in creating a database to
track their customers and accounts.
• Tables
CUSTOMERS
Customer_Id, Name, Street, City, State, Zip
ACCOUNTS
Customer_Id, Account_Number, Account_Type, Date_Opened, Balance
Note that we use an artificial identifier (a number we make up) for the customer called
Customer_Id. Given a Customer_Id, we can uniquely identify the remaining information.
We call Customer_Id a Key for the CUSTOMERS table.
Notice that when naming columns in the tables we always use an underscore character
and do not use any other punctuation. Even though Access allows you to use spaces, etc.,
it is not a good idea.
• Relationships
The relationship between CUSTOMERS and ACCOUNTS is by Customer_Id.
Since a customer may have more than one account at the bank, we call this a One to
Many relationship. (1:N).
• Domains
A domain is a set of values that a column may have. Domain also includes the type and
length or size of data found in each column.
CUSTOMERS
Column                Domain
                      Data Type  Size
Customer_Id (Key)     Integer    20
Name                  Character  30
Street                Character  30
City                  Character  25
State                 Character  2
Zip                   Character  5
ACCOUNTS
Column                Domain
                      Data Type  Size
Customer_Id (FK)      Integer    20
Account_Number (Key)  Integer    15
Account_Type          Character  2
Date_Opened           Date
Balance               Real       12,2
Customer_Id Account_Number Account_Type Date_Opened Balance
123 9987 Checking 10/12/89 4000.00
123 9980 Savings 10/12/89 2000.00
124 8811 Savings 01/05/92 1000.00
125 4422 Checking 12/01/94 6000.00
125 4433 Savings 12/01/94 9000.00
127 3322 Savings 08/22/94 500.00
127 1122 Checking 11/13/88 800.00
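To make the one-to-many relationship concrete, here is a small sketch that groups the account rows above under their customer. The names come from the Customer table earlier in these notes; the variable accounts_by_customer is just an illustrative name.

```python
from collections import defaultdict

# CUSTOMERS: Customer_Id -> Name
customers = {
    123: "Mr. Smith",
    124: "Mrs. Jones",
    125: "Mr. Axe",
    127: "Mr. & Mrs. Builder",
}

# ACCOUNTS: (Customer_Id, Account_Number, Account_Type, Balance)
accounts = [
    (123, 9987, "Checking", 4000.00),
    (123, 9980, "Savings", 2000.00),
    (124, 8811, "Savings", 1000.00),
    (125, 4422, "Checking", 6000.00),
    (125, 4433, "Savings", 9000.00),
    (127, 3322, "Savings", 500.00),
    (127, 1122, "Checking", 800.00),
]

accounts_by_customer = defaultdict(list)
for customer_id, account_number, account_type, balance in accounts:
    # Every foreign key value must match an existing key in CUSTOMERS.
    assert customer_id in customers
    accounts_by_customer[customer_id].append(account_number)

for customer_id, name in customers.items():
    print(name, accounts_by_customer[customer_id])
```

Each customer (the "one" side) maps to a list of accounts (the "many" side), while each account row carries exactly one Customer_Id.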
• Business Rules
Business rules allow us to specify constraints on what data can appear in tables and what
operations can be performed on data in tables. For example:
• E-R Models are Conceptual Models of the database. They can not be directly
implemented in a database.
• Many variations of E-R Modeling used in practice.
• Mainly differences in notation, symbols used to represent the 4 main constructs.
Exercise: Come up with a list of attributes for each of the entities above.
Exercise: Choose one of your attributes as the identifier for each of the entities above.
E-R Diagrams
• The most common way to represent the E-R constructs is by using a diagram
• There are a wide variety of notations for E-R Diagrams. Most of the differences concern
how relationships are specified and how attributes are shown.
• In almost all variations, entities are depicted as rectangles with either pointed or rounded
corners. The entity name appears inside.
• Relationships can be displayed as diamonds (see below) or can be simply line segments
between two entities.
• For Relationships, need to convey: Relationship name, degree, cardinality, optionality
(minimal cardinality)
• Here we will give examples from 4 variations: The Kroenke textbook, Elmasri/Navathe
textbook, Oracle Designer/2000 and Visible Analyst.
• Relationship Name: Displayed just inside the relationship diamond.
• Degree: Shown by line segments between the relationship diamond and 2 or more
entities.
• Cardinality: Displayed between the participating entity and the relationship diamond next
to the relationship line. Split up the cardinality.
• Optionality: Mandatory participation indicated by double relationship line
Optional participation indicated by a single relationship line.
• One ORDER must be placed by one and only one CUSTOMER.
• One CUSTOMER may be placing zero or more ORDERS.
• One ORDER may be made up of zero or more ITEMS.
• One ITEM must be an item on one and only one ORDER.
There is a set of tools that can print these "relationship sentences".
• Visible Analyst Workbench (VAW) uses the rounded box to show an Attributive Entity -
one that depends on the existence of a fundamental entity (noted by just the rectangle).
• The relationships use the following symbols:
o For cardinality, the crow's feet are used to show a "Many" side of a relationship.
o A single line shows a "One" side of the relationship.
o Optional participation is shown with an open circle. Thus in the above diagram, a
Customer May place zero or more Orders.
o Mandatory participation is shown with two hash marks. Thus in the above
diagram, an Order Must be placed by one and only one Customer.
Displaying Attributes
• Technically, an Entity-Relationship diagram should show only entities and their
relationships.
• Consider: Entity-Relationship-Attribute (ERA) model.
• Two main ways to display attributes associated with an entity.
1. Attributes appear in ovals attached to the entity. Gets messy.
2. List attributes inside of the entity box.
Weak Entities
• Broad definition. Weak Entity: An entity that depends on another for its existence.
• Elmasri/Navathe definition: Weak entity: Entity types that do not have key attributes of
their own.
• ID Dependent Entity: A weak entity that includes the identifier of the related strong
entity.
• Examples of strong entities:
People, Employees, Customers, Clients, Vendors, Students
Products, Services, Parts, Resources, Materials
Banks
• Examples of ID Dependent entities: Dependents (of employees), Bank Branches (of
Banks).
• ID Dependent entities are sometimes shown with curved boxes as in the Visible Analyst
ER example. Note that an ITEM can not exist by itself. It must be identified with a
specific Order.
• The Elmasri/Navathe notation shows the ID Dependent entity with a double box. The
"identifying relationship" (from the strong entity to the weak entity) is shown with a
double diamond.
• Final note: ID Dependent entities will always result in relations (and later on tables) with
composite keys.
Subtype Entities
• Attributes of two or more Entities may overlap significantly but not completely.
• Consider:
Phone Call (Source#, Destination#, Time of day, Duration)
LongDistance Call (Source#, Destination#, Time of day, Duration, Long distance Carrier)
Cell Phone Call (Source#, Destination#, Time of day, LandTime, AirTime)
• One approach would be to put all of the attributes into a single entity.
• Second approach, put common attributes into a parent or supertype entity and then have
3 subtype entities.
• Relationship is called an IS-A relationship.
The above diagram uses the Oracle Designer/2000 symbols for Supertype/Subtype.
Below is the same diagram drawn using E-R symbols from the Elmasri/Navathe book.
The d in the circle indicates that the subtype entities are disjoint. Only one subtype entity can
participate in an instance.
As before, the double line between the Call entity and the d in the circle indicates the
relationship is mandatory.
4. The order of columns is insignificant
5. No two rows (tuples) in a relation can be identical.
6. The order of the rows (tuples) is insignificant.
• From our discussion of E-R Modelling, we know that an Entity typically corresponds to a
relation and that the Entity's attributes become attributes of the relation.
• We also discussed how, depending on the relationships between entities, copies of
attributes (the identifiers) were placed in related relations.
Functional Dependencies
• Key: One or more attributes that uniquely identify a tuple (row) in a relation.
• The selection of keys will depend on the particular application being considered.
• Users can offer some guidance as to what would make an appropriate key. Also this is
pretty much an art as opposed to an exact science.
• Recall that no two tuples in a relation should have exactly the same values, thus in the
worst case a candidate key would consist of all of the attributes in a relation.
• A key functionally determines a tuple (row).
• Not all determinants are keys.
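A functional dependency can be checked against sample data. The sketch below uses a hypothetical helper, fd_holds, on the ACCOUNTS rows from the bank example. Keep in mind that sample data can only show that a dependency fails; a dependency that happens to hold in one instance is not proven for the application in general.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this instance: rows that agree
    on the lhs attributes must also agree on the rhs attributes."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


accounts = [
    {"Customer_Id": 123, "Account_Number": 9987, "Account_Type": "Checking"},
    {"Customer_Id": 123, "Account_Number": 9980, "Account_Type": "Savings"},
    {"Customer_Id": 124, "Account_Number": 8811, "Account_Type": "Savings"},
    {"Customer_Id": 125, "Account_Number": 4422, "Account_Type": "Checking"},
    {"Customer_Id": 125, "Account_Number": 4433, "Account_Type": "Savings"},
    {"Customer_Id": 127, "Account_Number": 3322, "Account_Type": "Savings"},
    {"Customer_Id": 127, "Account_Number": 1122, "Account_Type": "Checking"},
]

# Account_Number -> Customer_Id holds: each account belongs to one customer.
print(fd_holds(accounts, ["Account_Number"], ["Customer_Id"]))
# Customer_Id -> Account_Number fails: customer 123 has two accounts.
print(fd_holds(accounts, ["Customer_Id"], ["Account_Number"]))
```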
Modification Anomalies
• Once our E-R model has been converted into relations, we may find that some relations
are not properly specified. There can be a number of problems:
o Deletion Anomaly: Deleting a tuple from a relation results in some related information
(from another entity) being lost.
o Insertion Anomaly: Inserting a tuple into a relation requires we have information from two
or more entities - this situation might not be feasible.
• Here is a quick example: A company has a Purchase order form:
LINE_ITEMS (PO_Number, ItemNum, PartNum, Description, Price, Qty)
PO_HEADER (PO_Number, PODate, Vendor, Ship_To, ...)
Normalization
• Relations can fall into one or more categories (or classes) called Normal Forms
• Normal Form: A class of relations free from a certain set of modification anomalies.
• Normal forms are given names such as:
o First normal form (1NF)
o Second normal form (2NF)
o Third normal form (3NF)
o Boyce-Codd normal form (BCNF)
o Fourth normal form (4NF)
o Fifth normal form (5NF)
o Domain-Key normal form (DK/NF)
• These forms are cumulative. A relation in Third normal form is also in 2NF and 1NF.
• A relation is in second normal form (2NF) if all of its non-key attributes are dependent on
all of the key.
• Relations that have a single attribute for a key are automatically in 2NF.
• This is one reason why we often use artificial identifiers as keys.
• In the example below, Close Price is dependent on (Company, Date) and also on (Symbol, Date).
• The following example relation is not in 2NF:
STOCKS (Company, Symbol, Headquarters, Date, Close_Price)
Company   Symbol  Headquarters   Date      Close Price
IBM       IBM     Armonk, NY     01/05/94  101.00
IBM       IBM     Armonk, NY     01/06/94  100.50
IBM       IBM     Armonk, NY     01/07/94  102.00
Netscape  NETS    Sunnyvale, CA  01/05/94  33.00
Netscape  NETS    Sunnyvale, CA  01/06/94  112.00
• Company, Date -> Close Price
• Symbol, Date -> Close Price
• Company -> Symbol, Headquarters
• Symbol -> Company, Headquarters
• Consider that Company, Date -> Close Price.
So we might use Company, Date as our key.
However: Company -> Headquarters
This violates the rule for 2NF. Also, consider the insertion and deletion anomalies.
• One Solution: Split this up into two relations:
COMPANY (Company, Symbol, Headquarters)
STOCKS (Symbol, Date, Close_Price)
Company Symbol Headquarters
IBM IBM Armonk, NY
Netscape NETS Sunnyvale, CA
• Company -> Symbol, Headquarters
• Symbol -> Company, Headquarters
Symbol Date Close Price
IBM 01/05/94 101.00
IBM 01/06/94 100.50
IBM 01/07/94 102.00
NETS 01/05/94 33.00
NETS 01/06/94 112.00
• Symbol, Date -> Close Price
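One way to convince ourselves that the split loses nothing is to rebuild the original relation from the two new ones. The following sketch represents relations as Python sets of tuples (our own representation, chosen only for illustration) and joins COMPANY and STOCKS on Symbol.

```python
# Original STOCKS relation:
# (Company, Symbol, Headquarters, Date, Close_Price)
stocks_original = [
    ("IBM", "IBM", "Armonk, NY", "01/05/94", 101.00),
    ("IBM", "IBM", "Armonk, NY", "01/06/94", 100.50),
    ("IBM", "IBM", "Armonk, NY", "01/07/94", 102.00),
    ("Netscape", "NETS", "Sunnyvale, CA", "01/05/94", 33.00),
    ("Netscape", "NETS", "Sunnyvale, CA", "01/06/94", 112.00),
]

# Project onto the two new relations; sets drop the duplicate tuples.
company = {(c, s, hq) for c, s, hq, _, _ in stocks_original}
stocks = {(s, d, p) for _, s, _, d, p in stocks_original}

# Join the two relations back together on Symbol.
rejoined = {
    (c, s, hq, d, p)
    for c, s, hq in company
    for s2, d, p in stocks
    if s == s2
}

print(len(company), len(stocks))
print(rejoined == set(stocks_original))
```

The headquarters of each company is now stored once instead of once per trading day, and the join still returns exactly the original five tuples.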
• A relation is in third normal form (3NF) if it is in second normal form and it contains no
transitive dependencies.
• Consider relation R containing attributes A, B and C.
If A -> B and B -> C then A -> C
• Transitive Dependency: Three attributes with the above dependencies.
• Example: At CUNY:
• Course_Code -> Course_Num, Section
• Course_Num, Section -> Classroom, Professor
• Example: At Rutgers:
• Course_Index_Num -> Course_Num, Section
• Course_Num, Section -> Classroom, Professor
• Example:
Company County Tax Rate
IBM Putnam 28%
AT&T Bergen 26%
• Company -> County and County -> Tax Rate, thus Company -> Tax Rate
• What happens if we remove AT&T ?
We lose information about 2 different themes: the company's county and that county's tax rate.
• Split this up into two relations:
Company County
IBM Putnam
AT&T Bergen
• Company -> County
County Tax Rate
Putnam 28%
Bergen 26%
• County -> Tax Rate
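The deletion anomaly can be seen directly by removing AT&T from both designs. This is only a sketch using sets of tuples; the variable names are made up for the illustration.

```python
# Single relation: (Company, County, Tax Rate)
combined = {("IBM", "Putnam", "28%"), ("AT&T", "Bergen", "26%")}

# Decomposed relations
company_county = {("IBM", "Putnam"), ("AT&T", "Bergen")}
county_tax = {("Putnam", "28%"), ("Bergen", "26%")}

# Remove AT&T from each design.
combined_after = {t for t in combined if t[0] != "AT&T"}
company_county_after = {t for t in company_county if t[0] != "AT&T"}

# Do we still know the tax rate for Bergen county?
bergen_known_combined = any(county == "Bergen" for _, county, _ in combined_after)
bergen_known_split = any(county == "Bergen" for county, _ in county_tax)

print(bergen_known_combined, bergen_known_split)
```

In the single relation the Bergen tax rate disappears along with AT&T; in the split design the county_tax relation is untouched by the deletion.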
FundID InvestmentType Manager
99 Common Stock Smith
99 Municipal Bonds Jones
33 Common Stock Green
22 Growth Stocks Brown
11 Common Stock Smith
• FundID, InvestmentType -> Manager
• FundID, Manager -> InvestmentType
• Manager -> InvestmentType
• In this case, the combination FundID and InvestmentType form a candidate key because
we can use FundID,InvestmentType to uniquely identify a tuple in the relation.
• Similarly, the combination FundID and Manager also form a candidate key because we
can use FundID, Manager to uniquely identify a tuple.
• Manager by itself is not a candidate key because we cannot use Manager alone to
uniquely identify a tuple in the relation.
• Is this relation R(FundID, InvestmentType, Manager) in 1NF, 2NF or 3NF ?
Given we pick FundID, InvestmentType as the Primary Key: 1NF for sure.
2NF because the only non-key attribute (Manager) is dependent on all of the key.
3NF because there are no transitive dependencies.
• Consider what happens if we delete the tuple with FundID 22. We lose the fact that
Brown manages the InvestmentType "Growth Stocks."
• The following are steps to normalize a relation into BCNF:
1. List all of the determinants.
2. See if each determinant can act as a key (candidate keys).
3. For any determinant that is not a candidate key, create a new relation from the
functional dependency. Retain the determinant in the original relation.
• For our example:
Rorig(FundID, InvestmentType, Manager)
1. The determinants are:
FundID, InvestmentType
FundID, Manager
Manager
2. Which determinants can act as keys ?
FundID, InvestmentType YES
FundID, Manager YES
Manager NO
3. Create a new relation from the functional dependency:
Rnew(Manager, InvestmentType)
Rorig(FundID, Manager)
In this last step, we have retained the determinant "Manager" in the original
relation Rorig.
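The three steps above can be sketched in code for this one example. The helper closure (our own name) computes every attribute that a set of attributes determines; a determinant can act as a key when its closure covers the whole relation. This is a single pass over the example, not a general BCNF algorithm, and the closure test checks for a superkey, which is sufficient here because the two key determinants are already minimal.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


relation = frozenset({"FundID", "InvestmentType", "Manager"})

# Step 1: the functional dependencies; each left-hand side is a determinant.
fds = [
    (frozenset({"FundID", "InvestmentType"}), frozenset({"Manager"})),
    (frozenset({"FundID", "Manager"}), frozenset({"InvestmentType"})),
    (frozenset({"Manager"}), frozenset({"InvestmentType"})),
]

# Step 2: keep the dependencies whose determinant cannot act as a key.
violations = [(lhs, rhs) for lhs, rhs in fds if not relation <= closure(lhs, fds)]

# Step 3: move each violating dependency into its own relation and
# retain the determinant in the original relation.
r_new = []
r_orig = set(relation)
for lhs, rhs in violations:
    r_new.append(lhs | rhs)
    r_orig -= rhs - lhs

print(sorted(r_orig))
print([sorted(r) for r in r_new])
```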
Fourth Normal Form (4NF)
• A few characteristics:
1. No regular functional dependencies
2. All three attributes taken together form the key.
3. Latter two attributes are independent of one another.
4. Insertion anomaly: Cannot add a stock fund without adding a bond fund (NULL
Value). Must always maintain the combinations to preserve the meaning.
• Stock Fund and Bond Fund form a multivalued dependency on Portfolio ID.
• PortfolioID ->-> Stock Fund
• PortfolioID ->-> Bond Fund
• Resolution: Split into two tables with the common key:
Portfolio ID Stock Fund
999 Janus Fund
999 Scudder Global Fund
888 Kaufmann Fund
Portfolio ID Bond Fund
999 Municipal Bonds
999 Dreyfus Short-Intermediate Municipal Bond Fund
888 T. Rowe Price Emerging Markets Bond Fund
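As a sketch of why the split is safe, the code below joins the two tables back together on Portfolio ID. It assumes that the single three-column relation held every stock fund and bond fund combination for each portfolio, which is what the multivalued dependencies require.

```python
# (Portfolio ID, Stock Fund)
stock_funds = {
    (999, "Janus Fund"),
    (999, "Scudder Global Fund"),
    (888, "Kaufmann Fund"),
}

# (Portfolio ID, Bond Fund)
bond_funds = {
    (999, "Municipal Bonds"),
    (999, "Dreyfus Short-Intermediate Municipal Bond Fund"),
    (888, "T. Rowe Price Emerging Markets Bond Fund"),
}

# Joining on Portfolio ID yields every stock/bond combination per portfolio.
combined = {
    (pid, stock, bond)
    for pid, stock in stock_funds
    for pid2, bond in bond_funds
    if pid == pid2
}

# Projecting the combined relation gives back the two smaller tables.
stocks_again = {(pid, stock) for pid, stock, _ in combined}
bonds_again = {(pid, bond) for pid, _, bond in combined}

print(len(combined))
```

Portfolio 999 needs four tuples in the combined relation (2 stock funds x 2 bond funds) but only two tuples in each of the split tables, and adding a third stock fund would add one tuple to the split design instead of two to the combined one.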
• There are certain conditions under which after decomposing a relation, it cannot be
reassembled back into its original form.
• We don't consider these issues here.
De-Normalization
• Consider the following relation:
CUSTOMER (CustomerID, Name, Address, City, State, Zip)
• This relation is not in DK/NF because it contains a functional dependency not implied by
the key.
• Zip -> City, State
• We can normalize this into DK/NF by splitting the CUSTOMER relation into two:
CUSTOMER (CustomerID, Name, Address, Zip)
CODES (Zip, City, State)
• We may pay a performance penalty - each customer address lookup requires we look in
two relations (tables).
• In such cases, we may de-normalize the relations to achieve a performance improvement.
All-in-One Example
Many of you asked for a "complete" example that would run through all of the normal forms
from beginning to end using the same tables. This is tough to do, but here is an attempt:
Example relation:
EMPLOYEE ( Name, Project, Task, Office, Phone )
Example Data:
First Normal Form
Bill 100X T2
Bill 200Y T1
Bill 200Y T2
Sue 100X T33
Sue 200Y T33
Sue 300Z T33
Ed 100X T2
Office Phone
400 1400
442 1442
588 1588
EMPLOYEE_PROJECT (Name, Project)
Name Project
R4 (Office, Phone)
Office Phone
400 1400
442 1442
588 1588
Relation Name
CUSTOMER (CustomerID, Name, Street, City, State, Zip, Phone)
Example Data
Functional Dependencies
Normalization
Check both CUSTOMER and ZIPCODE to ensure they are both in 1NF up to BCNF.
Relational Algebra:
Textbook                               Chapter
Elmasri/Navathe (3rd ed.)              Chapter 7
Kroenke (7th ed.)                      Chapter 8
Connolly/Begg (3rd ed.)                Chapter 4
Rob/Coronel (5th ed.)                  N/A
Hoffer, Prescott & McFadden (6th ed.)  N/A
Mata-Toledo / Cushman                  Schaum's Outlines Ch. 2
• Recall, the Relational Model consists of the elements: relations, which are made up of
attributes.
• A relation is a set of attributes with values for each attribute such that:
1. Each attribute value must be a single value only (atomic).
2. All values for a given attribute must be of the same type (or domain).
3. Each attribute name must be unique.
4. The order of attributes is insignificant
5. No two rows (tuples) in a relation can be identical.
6. The order of the rows (tuples) is insignificant.
• Relational Algebra is a collection of operations on Relations.
• Relations are operands and the result of an operation is another relation.
• Two main collections of relational operators:
1. Set theory operations:
Union, Intersection, Difference and Cartesian product.
2. Specific Relational Operations:
Selection, Projection, Join, Division
S
First Last Age
Forrest Gump 36
Sally Green 28
DonJuan DeMarco 27
• Union: R ∪ S
Result: Relation with tuples from R and S with duplicates removed.
• Difference: R - S
Result: Relation with tuples from R but not from S
• Intersection: R ∩ S
Result: Relation with tuples that appear in both R and S.
[Figure: result tables for R ∪ S, R - S and R ∩ S]
o Union and Intersection are commutative operations
R ∪ S = S ∪ R
R ∩ S = S ∩ R
o Difference operation is NOT commutative.
R - S ≠ S - R
o The resulting relations may not have meaningful names for the attributes.
Convention is to use the attribute names from the first relation.
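Python's built-in set type behaves much like these operators, so the properties can be tried out quickly. In this sketch each relation is a set of tuples; S is the relation shown above and T is the relation from the exercises that follow. The attribute names are not stored at all here, which sidesteps the naming question.

```python
# S(First, Last, Age) from above
S = {
    ("Forrest", "Gump", 36),
    ("Sally", "Green", 28),
    ("DonJuan", "DeMarco", 27),
}

# T(fName, lName, Score) from the exercises
T = {
    ("William", "Smith", 44),
    ("Sally", "Green", 28),
    ("Mary", "Kontrary", 27),
}

union = S | T          # duplicates are removed automatically
intersection = S & T   # tuples that appear in both
s_minus_t = S - T      # tuples in S but not in T
t_minus_s = T - S      # tuples in T but not in S

print(len(union))
print(intersection)
print(s_minus_t != t_minus_s)
```

Union and intersection give the same result in either order, while the two differences contain completely different tuples.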
Exercises
• Assume relation T
fName lName Score
William Smith 44
Sally Green 28
Mary Kontrary 27
• Compute R ∪ T
Compute R ∩ T
Show that R - T is not equal to T - R
Cartesian Product
• Produce all combinations of tuples from two relations.
Dinner Dessert
Steak Ice Cream
Lobster Cheesecake
R X S
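A quick sketch of the Cartesian product, assuming that the table above shows two one-attribute relations side by side: R(Dinner) and S(Dessert). That reading is an assumption made for this illustration.

```python
from itertools import product

R = [("Steak",), ("Lobster",)]          # R(Dinner), assumed
S = [("Ice Cream",), ("Cheesecake",)]   # S(Dessert), assumed

# Every tuple of R is paired with every tuple of S.
r_x_s = [r + s for r, s in product(R, S)]

for row in r_x_s:
    print(row)
```

The result always has (number of tuples in R) x (number of tuples in S) tuples, here 2 x 2 = 4.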
Selection Operator
AND  T  F        OR  T  F        NOT  T  F
T    T  F        T   T  T             F  T
F    F  F        F   T  F
Selection Examples
• Select only those Employees who are neither in the CS department nor Adjuncts:
σ NOT (Rank = 'Adjunct' OR Dept = 'CS') (EMP)
Result:
Exercises
For this expression, use R and S from the Set Theoretic Operations section above.
Projection Operator
Projection Examples
Name Dept
Smith CS
Jones Econ
Green Econ
Brown CS
Smith Fin
• The selection and projection operators can be combined to perform both operations.
• Show the names of all employees working in the CS department:
π name ( σ Dept = 'CS' (EMP) )
Results:
Name
Smith
Brown
• Show the name and rank of those Employees who are neither in the CS department nor
Adjuncts:
π name, rank ( σ NOT (Rank = 'Adjunct' OR Dept = 'CS') (EMP) )
Result:
Name Rank
Green Assistant
Smith Associate
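The two examples above can be reproduced with a pair of small helper functions. This is only a sketch: select and project are our own names, and the EMP rows are reconstructed from the result tables in these notes. The ranks of the two CS employees are not recoverable from those results, so the values used for them below are assumptions.

```python
EMP = [
    {"Name": "Smith", "Dept": "CS", "Rank": "Adjunct"},      # rank assumed
    {"Name": "Jones", "Dept": "Econ", "Rank": "Adjunct"},    # implied by the result above
    {"Name": "Green", "Dept": "Econ", "Rank": "Assistant"},
    {"Name": "Brown", "Dept": "CS", "Rank": "Associate"},    # rank assumed
    {"Name": "Smith", "Dept": "Fin", "Rank": "Associate"},
]


def select(relation, predicate):
    """Selection: keep the tuples for which the predicate is true."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Projection: keep only the named attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attributes}
        if projected not in result:
            result.append(projected)
    return result


cs_names = project(select(EMP, lambda r: r["Dept"] == "CS"), ["Name"])

others = project(
    select(EMP, lambda r: not (r["Rank"] == "Adjunct" or r["Dept"] == "CS")),
    ["Name", "Rank"],
)

print(cs_names)
print(others)
```

Because the result of each operator is again a relation, the output of select can be passed straight into project, just as the expressions nest on paper.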
Exercises
For this expression, use R and S from the Set Theoretic Operations section above.
Aggregate Functions
• Find the total payroll for the Economics department: SUM (salary) ( σ Dept = 'Econ' (EMP) )
Results:
SUM(salary)
85000
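A rough sketch of the same aggregate in code. The salaries for Green, Brown and Smith (Fin) appear in the join results later in these notes; the salary for Jones is inferred here as 85000 - 50000 = 35000, and the CS employee named Smith is left out because that salary is not recoverable from this text.

```python
EMP = [
    {"Name": "Jones", "Dept": "Econ", "Salary": 35000},  # inferred value
    {"Name": "Green", "Dept": "Econ", "Salary": 50000},
    {"Name": "Brown", "Dept": "CS", "Salary": 65000},
    {"Name": "Smith", "Dept": "Fin", "Salary": 60000},
]

# Selection first (Dept = 'Econ'), then the SUM aggregate over salary.
econ_payroll = sum(row["Salary"] for row in EMP if row["Dept"] == "Econ")

# The same aggregate computed once per department (a grouped total).
payroll_by_dept = {}
for row in EMP:
    payroll_by_dept[row["Dept"]] = payroll_by_dept.get(row["Dept"], 0) + row["Salary"]

print(econ_payroll)
print(payroll_by_dept)
```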
Join Operation
• Join operations bring together two relations and combine their attributes and tuples in a
specific fashion.
Join Examples
Assume we have the EMP relation from above and the following DEPART relation:
Green 160 Econ 50000 Econ 200 555-1234
Brown 420 CS 65000 CS 404 555-1212
Smith 500 Fin 60000 Fin 501 555-4321
• Find all information on every employee including their department info where the
employee works in an office numbered less than the department main office:
EMP ⋈ (emp.office < depart.mainoffice) AND (emp.dept = depart.dept) DEPART
Results:
Natural Join
• Notice in the generic (Theta) join operation, any attributes in common (such as dept
above) are repeated.
• The Natural Join operation removes these duplicate attributes.
• The natural join operator is: *
• We can also assume using * that the join condition will be = on the two attributes in
common.
• Example: EMP * DEPART
Results:
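The difference between the theta join and the natural join can be sketched as follows. Only the three EMP rows that are visible in the join output above are used, and the column names name, salary and phone are assumptions (office, dept and mainoffice come from the join condition). The functions theta_join and natural_join are illustrative names.

```python
EMP = [
    {"name": "Green", "office": 160, "dept": "Econ", "salary": 50000},
    {"name": "Brown", "office": 420, "dept": "CS", "salary": 65000},
    {"name": "Smith", "office": 500, "dept": "Fin", "salary": 60000},
]
DEPART = [
    {"dept": "Econ", "mainoffice": 200, "phone": "555-1234"},
    {"dept": "CS", "mainoffice": 404, "phone": "555-1212"},
    {"dept": "Fin", "mainoffice": 501, "phone": "555-4321"},
]


def theta_join(left, right, condition, left_name, right_name):
    """Pair every left tuple with every right tuple and keep the pairs that
    satisfy the condition. Attribute names are prefixed with the relation
    name, so common attributes such as dept appear twice."""
    result = []
    for l in left:
        for r in right:
            if condition(l, r):
                row = {f"{left_name}.{k}": v for k, v in l.items()}
                row.update({f"{right_name}.{k}": v for k, v in r.items()})
                result.append(row)
    return result


def natural_join(left, right):
    """Join on equality of all common attributes and keep each one once."""
    result = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[k] == r[k] for k in common):
                result.append({**l, **r})
    return result


smaller_office = theta_join(
    EMP,
    DEPART,
    lambda e, d: e["office"] < d["mainoffice"] and e["dept"] == d["dept"],
    "emp",
    "depart",
)
emp_depart = natural_join(EMP, DEPART)

print([row["emp.name"] for row in smaller_office])
print(emp_depart[0])
```

With these rows the theta join keeps Green (160 < 200) and Smith (500 < 501) but not Brown (420 is not less than 404). Each theta join tuple has seven attributes because dept is repeated, while each natural join tuple has six.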
Outer Join
• In the Join operations so far, only those tuples from both relations that satisfy the join
condition are included in the output relation.
• The Outer join includes other tuples as well according to a few rules.
• Three types of outer joins:
1. Left Outer Join includes all tuples in the left hand relation and includes only
those matching tuples from the right hand relation.
2. Right Outer Join includes all tuples in the right hand relation and includes
only those matching tuples from the left hand relation.
3. Full Outer Join includes all tuples from both the left hand relation and the right
hand relation.
PEOPLE:
Name   Age  Food
Alice  21   Hamburger
Bill   24   Pizza
Carl   23   Beer
Dina   19   Shrimp
MENU:
Food       Day
Pizza      Monday
Hamburger  Tuesday
Chicken    Wednesday
Pasta      Thursday
Tacos      Friday
Bill 24 Pizza Pizza Monday
Alice 21 Hamburger Hamburger Tuesday
NULL NULL NULL Chicken Wednesday
NULL NULL NULL Pasta Thursday
NULL NULL NULL Tacos Friday
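The three outer joins can be sketched with the PEOPLE and MENU relations, joining on Food. The function names are our own and Python's None stands in for NULL. The right outer join below produces the same five tuples as the result table just above.

```python
PEOPLE = [
    ("Alice", 21, "Hamburger"),
    ("Bill", 24, "Pizza"),
    ("Carl", 23, "Beer"),
    ("Dina", 19, "Shrimp"),
]
MENU = [
    ("Pizza", "Monday"),
    ("Hamburger", "Tuesday"),
    ("Chicken", "Wednesday"),
    ("Pasta", "Thursday"),
    ("Tacos", "Friday"),
]

NULL = None


def left_outer_join(people, menu):
    """Every PEOPLE tuple appears; unmatched ones are padded with NULLs."""
    result = []
    for name, age, food in people:
        matches = [m for m in menu if m[0] == food]
        if matches:
            result.extend((name, age, food) + m for m in matches)
        else:
            result.append((name, age, food, NULL, NULL))
    return result


def right_outer_join(people, menu):
    """Every MENU tuple appears; unmatched ones are padded with NULLs."""
    result = []
    for m in menu:
        matches = [p for p in people if p[2] == m[0]]
        if matches:
            result.extend(p + m for p in matches)
        else:
            result.append((NULL, NULL, NULL) + m)
    return result


def full_outer_join(people, menu):
    """Left outer join plus the MENU tuples that matched nobody."""
    result = left_outer_join(people, menu)
    foods_eaten = {p[2] for p in people}
    result.extend((NULL, NULL, NULL) + m for m in menu if m[0] not in foods_eaten)
    return result


for row in right_outer_join(PEOPLE, MENU):
    print(row)
```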
Outer Union
• The Outer Union operation is applied to partially union compatible relations.
• Operator is: *
• Example: PEOPLE * MENU
Dina 19 Shrimp NULL
NULL NULL Pizza Monday
NULL NULL Hamburger Tuesday
NULL NULL Chicken Wednesday
NULL NULL Pasta Thursday
NULL NULL Tacos Friday
When doing homework assignments and projects, it is very helpful to be able to type these
relational algebra symbols into MS Word or another word processor. Since we mainly use MS
Word or another word processor running in Microsoft Windows, we demonstrate them here.
Most of the relational algebra symbols can be produced using the "Symbol" font. One way to do
this is to use the Symbol choice on the Insert menu in MS Word. This is shown below:
By default, the symbols displayed on this screen will use the Symbol font.
Some symbols such as join and outer join are not available in this fashion. For these you can
copy and paste the graphics in the MS Word file linked here. All of the relational algebra
symbols are included.
1. Data Definition Language (DDL) Used to create (define) data structures such as
tables, indexes, clusters
2. Data Manipulation Language (DML) is used to store, retrieve and update data
from tables.
SQL Data Types
• Each implementation of SQL uses slightly different names for the data types.
Character Strings
Data type                                 Storage size              Range
Boolean                                   2 bytes                   True or False.
Integer                                   2 bytes                   -32,768 to 32,767.
Long (long integer)                       4 bytes                   -2,147,483,648 to 2,147,483,647.
Single (single-precision floating-point)  4 bytes                   -3.402823E38 to -1.401298E-45 for negative values;
                                                                    1.401298E-45 to 3.402823E38 for positive values.
Double (double-precision floating-point)  8 bytes                   -1.79769313486232E308 to -4.94065645841247E-324 for
                                                                    negative values; 4.94065645841247E-324 to
                                                                    1.79769313486232E308 for positive values.
Currency (scaled integer)                 8 bytes                   -922,337,203,685,477.5808 to 922,337,203,685,477.5807.
Date                                      8 bytes                   January 1, 100 to December 31, 9999.
Object                                    4 bytes                   Any Object reference.
String (variable-length)                  10 bytes + string length  0 to approx. 2 billion (approx. 65,400 for MS Windows
                                                                    version 3.1).
String (fixed-length)                     Length of string          1 to approximately 65,400.
Variant (with numbers)                    16 bytes                  Any numeric value up to the range of a Double.
Variant (with characters)                 22 bytes + string length  Same range as for variable-length String.
• Oracle supports the following data types:
o Numeric: BINARY_INTEGER, DEC, DECIMAL, DOUBLE PRECISION,
FLOAT, INT, INTEGER, NATURAL, NATURALN, NUMBER, NUMERIC,
PLS_INTEGER, POSITIVE, POSITIVEN, REAL, SMALLINT
o Date: DATE
Note: Also stores time.
o Character: CHAR, CHARACTER, STRING, VARCHAR, VARCHAR2
o Others: BOOLEAN, LONG, LONG RAW, RAW
Note: You will not need to memorize the above two tables for exams, etc. They are only
there for your reference.
• Define access privileges on objects
• SQL2 specification supports the creation of multiple schemas per database each with a
distinct owner and authorized users.
Creating a Schema
Note: To try out these SQL examples in MS Access, go to the Queries form and choose New, then choose Design
View and then close the next dialog box. Under the View menu, choose SQL. From this point, you can type in any
SQL statement and execute it. Note that MS Access's DDL syntax is extremely limited. Most of the DDL statements
below (including domains, NOT NULL constraints and referential integrity constraints) are not supported.
• Creating a Table:
    CREATE TABLE employee (
        Last_Name      VARCHAR(20) NOT NULL,
        First_name     VARCHAR(18) NOT NULL,
        Soc_Sec        VARCHAR(11) NOT NULL,
        Date_of_Birth  DATE,
        Salary         NUMBER(8,2)
    );

    CREATE TABLE dependant (
        Last_Name         VARCHAR(20) NOT NULL,
        First_name        VARCHAR(18) NOT NULL,
        Soc_Sec           VARCHAR(11) NOT NULL,
        Date_of_Birth     DATE,
        Employee_Soc_Sec  VARCHAR(11) NOT NULL
    );

• Note: When naming tables, columns and other database objects, do not include spaces in
the names. For example, do not call the last name column: Last Name
If you wish to separate words in a name, use the underscore character.
• Specifying Primary and Foreign keys:
• CREATE TABLE order_header (
• order_number NUMBER(10,0) NOT NULL,
• order_date DATE,
• sales_person VARCHAR(25),
• bill_to VARCHAR(35),
• bill_to_address VARCHAR(45),
• bill_to_city VARCHAR(20),
• bill_to_state VARCHAR(2),
• bill_to_zip VARCHAR(10),
• PRIMARY KEY (order_number)
• );
•
• CREATE TABLE order_items (
• order_number NUMBER(10,0) NOT NULL,
• line_item NUMBER(4,0) NOT NULL,
• part_number VARCHAR(12) NOT NULL,
~ 49 ~
• quantity NUMBER(4,0),
• PRIMARY KEY (order_number, line_item),
• FOREIGN KEY (order_number)
• REFERENCES order_header (order_number),
• FOREIGN KEY (part_number)
• REFERENCES parts (part_number)
• );
•
• CREATE INDEX order_index
• ON order_header (order_number) ASC ;
•
• CREATE INDEX items_index
• ON order_items (order_number, line_item) ASC ;
•
• Example from MS Access:
• CREATE TABLE employee (
• FirstName TEXT,
• LastName TEXT,
• ssn INTEGER
• CONSTRAINT ssnConstraint PRIMARY KEY
• );
•
• CREATE INDEX employee_index
• ON employee (ssn) ;
• Constraints on attributes:
o NOT NULL - Attribute may not take a NULL value
o DEFAULT - Store a given default value if no value is specified
o PRIMARY KEY - Indicate which attribute(s) form the primary key
o FOREIGN KEY - Indicate which attribute(s) form a foreign key.
This enforces referential integrity
o UNIQUE - Indicates which attribute(s) must have unique values.
• Specify when constraint should be enforced:
o Immediate
o Deferrable until commit time
• Referential Integrity Constraint: Specify the behavior for child tuples when a parent
tuple is modified.
• Action to take if referential integrity is violated:
o SET NULL - The child tuple's foreign key is set to NULL, leaving orphans.
o SET DEFAULT - Set the value of the foreign key to some default value.
o CASCADE - Child tuples are updated (or deleted) according to the action taken on the
parent tuple.
• Examples of ON DELETE and ON UPDATE
• CREATE TABLE order_items (
• order_number NUMBER(10,0) NOT NULL,
• line_item NUMBER(4,0) NOT NULL,
• part_number VARCHAR(12) NOT NULL,
• quantity NUMBER(4,0),
• PRIMARY KEY (order_number, line_item),
• FOREIGN KEY (order_number)
• REFERENCES order_header (order_number)
• ON DELETE SET DEFAULT
• ON UPDATE CASCADE,
• FOREIGN KEY (part_number)
• REFERENCES parts (part_number)
• );
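The referential actions above can be tried out with a short script. The following is a minimal sketch in Python using the standard sqlite3 module, which is an assumption of this example (the notes target Oracle-style SQL and MS Access). The column types and sample rows are made up, and ON DELETE CASCADE is used in place of SET DEFAULT because order_number is NOT NULL and has no DEFAULT to fall back on.

```python
import sqlite3

# In-memory database; table names follow the notes, data is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE order_header (
    order_number INTEGER NOT NULL,
    order_date   TEXT,
    PRIMARY KEY (order_number)
);
CREATE TABLE order_items (
    order_number INTEGER NOT NULL,
    line_item    INTEGER NOT NULL,
    quantity     INTEGER,
    PRIMARY KEY (order_number, line_item),
    FOREIGN KEY (order_number)
        REFERENCES order_header (order_number)
        ON DELETE CASCADE
        ON UPDATE CASCADE
);
INSERT INTO order_header VALUES (1, '1995-01-01');
INSERT INTO order_items  VALUES (1, 1, 10);
INSERT INTO order_items  VALUES (1, 2, 5);
""")

# Changing the parent key cascades to the child rows.
conn.execute("UPDATE order_header SET order_number = 2 WHERE order_number = 1")
after_update = conn.execute(
    "SELECT order_number, line_item FROM order_items ORDER BY line_item"
).fetchall()

# A child row with no matching parent violates referential integrity.
try:
    conn.execute("INSERT INTO order_items VALUES (99, 1, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the parent removes its children.
conn.execute("DELETE FROM order_header WHERE order_number = 2")
remaining = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]

print(after_update, orphan_rejected, remaining)
```

Swapping CASCADE for SET NULL would only work here if the foreign key column allowed NULL values.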
• Constraints can also be given names so that they can later be modified or dropped easily.
• CREATE TABLE order_header (
• order_number NUMBER(10,0) NOT NULL,
• order_date DATE,
• sales_person VARCHAR(25),
• bill_to VARCHAR(35),
• bill_to_address VARCHAR(45),
• bill_to_city VARCHAR(20),
• bill_to_state VARCHAR(2),
• bill_to_zip VARCHAR(10),
• CONSTRAINT pk_order_header
• PRIMARY KEY (order_number)
• );
• CREATE TABLE order_header (
• order_number NUMBER(10,0) NOT NULL,
• order_date DATE,
• sales_person VARCHAR(25),
• bill_to VARCHAR(35),
• bill_to_address VARCHAR(45),
• bill_to_city VARCHAR(20),
• bill_to_state VARCHAR(2),
• bill_to_zip VARCHAR(10)
• );
•
• ALTER TABLE order_header
• ADD CONSTRAINT pk_order_header
• PRIMARY KEY (order_number);
•
•
• CREATE TABLE order_items (
• order_number NUMBER(10,0) NOT NULL,
• line_item NUMBER(4,0) NOT NULL,
• part_number VARCHAR(12) NOT NULL,
• quantity NUMBER(4,0)
• );
•
• ALTER TABLE order_items ADD
• CONSTRAINT pk_order_items
• PRIMARY KEY (order_number, line_item) ;
•
• ALTER TABLE order_items ADD
• CONSTRAINT fk1_order_items
• FOREIGN KEY (order_number)
• REFERENCES order_header (order_number)
• ON DELETE SET DEFAULT
• ON UPDATE CASCADE;
•
• ALTER TABLE order_items ADD
• CONSTRAINT fk2_order_items
• FOREIGN KEY (part_number)
• REFERENCES parts (part_number)
• ON DELETE SET DEFAULT
• ON UPDATE CASCADE;
Removing Schema Components with DROP
• Changing Attributes:
ALTER TABLE student ALTER last_name VARCHAR(35);
ALTER TABLE student ALTER gpa DROP DEFAULT;
ALTER TABLE student ALTER gpa SET DEFAULT 0.00;
• Adding Attributes:
ALTER TABLE student ADD admission DATE;
• Removing Attributes (not widely implemented):
ALTER TABLE student DROP home_phone;
• General syntax:
• INSERT INTO tablename (column1, column2, ... columnX)
• VALUES (val1, val2, ... valX);
• Examples:
• INSERT INTO employee (first_name, last_name, street, city, state, zip)
• VALUES ("Buddy", "Rich", "123 Sticks Ln.", "Fillville", "TN",
"31212");
•
• INSERT INTO stocks (symbol, close_date, close_price)
• VALUES ("IBM", "03-JUN-94", 104.25);
•
• INSERT INTO student_grades (student_id, test_name, score, grade)
• VALUES (101, "Quiz 1", 88, "B+");
• Quotes are placed around the data depending on the Data type and on the specific
RDBMS being used:
• Main way of getting data out of tables is with the SELECT statement.
• SELECT syntax:
• SELECT column1, column2, ... columnN
• FROM tableA, tableB, ... tableZ
• WHERE condition1 AND/OR condition2 ... AND/OR conditionM
• GROUP BY column1, ...
• HAVING condition
• ORDER BY column1, column2, ... columnN
• SELECT employee_id, last_name, first_name
• FROM employees
• WHERE salary > 40000
• ORDER BY last_name, first_name DESC
•
• SELECT *
• FROM employees
• ORDER BY 2;
•
• SELECT symbol, close_price
• FROM stocks
• WHERE close_date > "01-JAN-95" AND
• symbol = "IBM"
• ORDER BY close_date
•
•
• SELECT symbol, close_date, close_price
• FROM stocks
• WHERE close_date >= "01-JAN-95"
• ORDER BY symbol, close_date
Mary CIS 98
Sue Marketing 88
Tom Finance 92
Alex CIS 79
Sam Marketing 89
Jane Finance 83
...
Note: To try out these examples, create the table in MS Access and enter the data shown above.
Go to the Queries form and choose New, then choose Design View and then close the next dialog
box. Under the View menu, choose SQL.
• Mary CIS 98
• Tom Finance 92
• Sam Marketing 89
Note the two aliases given to the students table: s1 and s2. These allow us to refer to
different views of the same table.
• In the FROM portion, list all tables separated by commas. Called a Join.
• The WHERE part becomes the Join Condition
• Example table EMPLOYEE:
• Name Department Salary
• Joe Finance 50000
• Alice Finance 52000
• Jill MIS 48000
• Jack MIS 32000
• Fred Accounting 33000
•
•
• Example table DEPARTMENT:
• Department Location
• Finance NJ
• MIS CA
• Accounting CA
• Marketing NY
• List all of the employees working in California:
• SELECT employee.name
• FROM employee, department
• WHERE employee.department = department.department
• AND department.location = 'CA';
•
• Results:
• NAME
• --------------------------------
• Jill
• Jack
• Fred
• List each employee name and what state (location) they work in. List them in order of
location and name:
• SELECT employee.name, department.location
• FROM employee, department
• WHERE employee.department = department.department
• ORDER BY department.location, employee.name;
•
• Results:
• NAME LOCATION
• --------------- -------------
• Fred CA
• Jack CA
• Jill CA
• Alice NJ
• Joe NJ
• List each department and all employees that work there. Show the department and
location even if no employees work there.
• SELECT department.department, department.location,
• employee.name
• FROM employee RIGHT JOIN department
• ON employee.department = department.department
• ORDER BY department.location, employee.name;
•
•
• Results:
• DEPARTMENT LOCATION NAME
• ------------- ---------------- ----------------
• Accounting CA Fred
• MIS CA Jack
• MIS CA Jill
• Finance NJ Alice
• Finance NJ Joe
• Marketing NY NULL
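The join examples above can be reproduced with the sample EMPLOYEE and DEPARTMENT data. This is a minimal sketch using Python's sqlite3 module (an assumption of this example, not the DBMS used in the notes). Because some SQLite versions do not support RIGHT JOIN, the outer join is written as the equivalent LEFT JOIN with the two tables swapped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee   (name TEXT, department TEXT, salary INTEGER);
CREATE TABLE department (department TEXT, location TEXT);
INSERT INTO employee VALUES
    ('Joe', 'Finance', 50000), ('Alice', 'Finance', 52000),
    ('Jill', 'MIS', 48000), ('Jack', 'MIS', 32000),
    ('Fred', 'Accounting', 33000);
INSERT INTO department VALUES
    ('Finance', 'NJ'), ('MIS', 'CA'), ('Accounting', 'CA'), ('Marketing', 'NY');
""")

# Inner join: only employees whose department matches a row in department.
in_california = [row[0] for row in conn.execute("""
    SELECT employee.name
    FROM employee, department
    WHERE employee.department = department.department
      AND department.location = 'CA'
    ORDER BY employee.name
""")]

# Outer join: every department appears, even one with no employees.
all_departments = conn.execute("""
    SELECT department.department, department.location, employee.name
    FROM department LEFT JOIN employee
      ON employee.department = department.department
    ORDER BY department.location, employee.name
""").fetchall()

print(in_california)
for row in all_departments:
    print(row)
```

The Marketing row comes back with None in the name column, which is how sqlite3 reports the NULL shown in the results table.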
• What is the highest paid salary in California ?
• SELECT MAX(employee.salary)
• FROM employee, department
• WHERE employee.department = department.department
• AND department.location = 'CA';
•
• Results:
• MAX(SALARY)
• ------------
• 48000
• Cartesian Product of the two tables:
• SELECT *
• FROM employee, department;
•
•
• Results:
• Name employee.Department Salary department.Department Location
• Joe Finance 50000 Finance NJ
• Joe Finance 50000 MIS CA
• Joe Finance 50000 Accounting CA
• Joe Finance 50000 Marketing NY
• Alice Finance 52000 Finance NJ
• Alice Finance 52000 MIS CA
• Alice Finance 52000 Accounting CA
• Alice Finance 52000 Marketing NY
• Jill MIS 48000 Finance NJ
• Jill MIS 48000 MIS CA
• Jill MIS 48000 Accounting CA
• Jill MIS 48000 Marketing NY
• Jack MIS 32000 Finance NJ
• Jack MIS 32000 MIS CA
• Jack MIS 32000 Accounting CA
• Jack MIS 32000 Marketing NY
• Fred Accounting 33000 Finance NJ
• Fred Accounting 33000 MIS CA
• Fred Accounting 33000 Accounting CA
• Fred Accounting 33000 Marketing NY
• In which states do our employees work ?
• SELECT DISTINCT department.location
• FROM employee, department
• WHERE employee.department = department.department;
• From our Bank Accounts example.
List the Customer name and their total account holdings:
• SELECT customers.LastName, Sum(Balance)
• FROM customers, accounts
• WHERE customers.CustomerID = accounts.customerid
• GROUP BY customers.LastName
•
• Results:
•
• LASTNAME SUM(BALANCE)
• --------- ------------
• Axe $15,000.00
• Builder $1,300.00
• Jones $1,000.00
• Smith $6,000.00
• We can also use a Column Alias to change the title of the columns
• SELECT customers.LastName, Sum(Balance) AS TotalBalance
• FROM customers, accounts
• WHERE customers.CustomerID = accounts.customerid
• GROUP BY customers.LastName
•
• Results:
•
• LASTNAME TotalBalance
• --------- ------------
• Axe $15,000.00
• Builder $1,300.00
• Jones $1,000.00
• Smith $6,000.00
• Here is a combination of a calculated expression and a column alias:
• SELECT name, department,
• salary AS CurrentSalary,
• (salary * 1.03) AS ProposedRaise
• FROM employee;
•
• Results:
•
• name department CurrentSalary ProposedRaise
• -------- ------------ ------------- -------------
• Alice Finance 52000 53560
• Fred Accounting 33000 33990
• Jack MIS 32000 32960
• Jill MIS 48000 49440
• Joe Finance 50000 51500
•
•
• Recall some of the E-R diagrams and relations we dealt with had a recursive relationship.
• For example: A student can tutor one or more other students. A student has only one
tutor.
STUDENTS (StudentID, Name, Student_TutorID)
• Student Tutor
• ---------- ----------
• Alex Bill
• Mary Bill
• Sue Bill
• Liz Mary
• Ed Mary
• Petra Sue
• The above is called a "recursive" query because it accesses the same table two times.
• We give the table two aliases called s1 and tutors so that we can compare different
aspects of the same table.
• However, as is, the result is missing something: Bill Smith does not appear in the Student
column because he has no tutor. Use LEFT JOIN:
• SELECT s1.name AS Student, tutors.name AS Tutor
• FROM students s1 LEFT JOIN students tutors
• ON s1.student_tutorid = tutors.studentid;
•
• Results:
•
• Student Tutor
• ---------- ----------
• Bill
• Alex Bill
• Mary Bill
• Sue Bill
• Liz Mary
• Ed Mary
• Petra Sue
• Here is one more twist: Suppose we were interested in those students who do not tutor
anyone? Use RIGHT JOIN
• How many students does each tutor work with ?
• SELECT s1.name AS TutorName,
• COUNT(tutors.student_tutorid) AS NumberTutored
• FROM students s1, students tutors
• WHERE s1.studentid = tutors.student_tutorid
• GROUP BY s1.name;
•
• Results:
•
• TutorName NumberTutored
• ---------- -------------
• Bill 3
• Mary 2
• Sue 1
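The self-join queries on the STUDENTS relation can be checked with a small script. This is a minimal sketch using Python's sqlite3 module. The StudentID values are hypothetical (the notes only show names), and the aliases tutor and tutee are chosen here for readability. The "students who do not tutor anyone" query is written as a LEFT JOIN with the roles swapped, which is equivalent to the RIGHT JOIN the notes suggest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE students (
    studentid       INTEGER PRIMARY KEY,
    name            TEXT,
    student_tutorid INTEGER REFERENCES students (studentid)
);
-- Hypothetical IDs; Bill has no tutor, so his tutor id is NULL.
INSERT INTO students VALUES
    (1, 'Bill', NULL), (2, 'Alex', 1), (3, 'Mary', 1), (4, 'Sue', 1),
    (5, 'Liz', 3), (6, 'Ed', 3), (7, 'Petra', 4);
""")

# Every student with their tutor; LEFT JOIN keeps Bill, who has none.
pairs = conn.execute("""
    SELECT tutee.name AS Student, tutor.name AS Tutor
    FROM students tutee LEFT JOIN students tutor
      ON tutee.student_tutorid = tutor.studentid
    ORDER BY tutee.studentid
""").fetchall()

# How many students does each tutor work with?
counts = conn.execute("""
    SELECT tutor.name AS TutorName, COUNT(*) AS NumberTutored
    FROM students tutee, students tutor
    WHERE tutee.student_tutorid = tutor.studentid
    GROUP BY tutor.name
    ORDER BY tutor.name
""").fetchall()

# Students who do not tutor anyone: no tutee row points at them.
non_tutors = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM students s LEFT JOIN students tutee
      ON tutee.student_tutorid = s.studentid
    WHERE tutee.studentid IS NULL
    ORDER BY s.name
""")]

print(pairs, counts, non_tutors)
```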
• Typical Logic expressions:
COLUMN = value
Also:
• < > = != <= >=
In the above case, the subquery returns a set of tuples. The IN clause returns true when a
tuple matches a member of the set.
• WHERE e3.salary < employee.salary)
•
• Results:
• name salary
• ----------- ----------
• Joe 50000
• Jill 48000
• Fred 33000
•
The above query shows all employees' names and salaries where there is at least one
person who makes more money (the first EXISTS) and at least one person who makes less
money (the second EXISTS).
• NOT EXISTS:
• SELECT name, salary
• FROM employee
• WHERE NOT EXISTS
• (SELECT name
• FROM EMPLOYEE e2
• WHERE e2.salary > employee.salary)
•
• Results:
• name salary
• --------- ----------
• Alice 52000
•
•
The above query shows all employees for whom there does not exist an employee who is
paid more, i.e., the highest paid employee.
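Both the EXISTS and NOT EXISTS queries can be run against the sample EMPLOYEE table. This is a minimal sketch using Python's sqlite3 module, which is an assumption of this example. The inner table is aliased (e2, e3) so that employee.salary inside the subquery refers to the row of the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER);
INSERT INTO employee VALUES
    ('Joe', 'Finance', 50000), ('Alice', 'Finance', 52000),
    ('Jill', 'MIS', 48000), ('Jack', 'MIS', 32000),
    ('Fred', 'Accounting', 33000);
""")

# Employees with at least one higher-paid and one lower-paid colleague.
middle = conn.execute("""
    SELECT name, salary
    FROM employee
    WHERE EXISTS (SELECT name FROM employee e2
                  WHERE e2.salary > employee.salary)
      AND EXISTS (SELECT name FROM employee e3
                  WHERE e3.salary < employee.salary)
    ORDER BY salary DESC
""").fetchall()

# Employees for whom no higher-paid employee exists.
top = conn.execute("""
    SELECT name, salary
    FROM employee
    WHERE NOT EXISTS (SELECT name FROM employee e2
                      WHERE e2.salary > employee.salary)
""").fetchall()

print(middle, top)
```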
• LIKE operator:
Use the LIKE operator to perform a partial string match. Generally, the % character is
used as the wild card although in some DBMS, the * character is used.
Show all employees whose name contains the letter 'e' and the letter 'n' in any order:
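One way to write such a query is to combine two LIKE conditions with AND, one per letter. The following is a minimal sketch using Python's sqlite3 module; the names in the table are hypothetical sample data, not taken from the notes. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, so 'Ned' matches '%n%'; other DBMSs may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (name TEXT);
-- Hypothetical names chosen to show matches and non-matches.
INSERT INTO employee VALUES ('Jane'), ('Ned'), ('Nancy'), ('Steve'), ('Glenn');
""")

# Each LIKE tests for one letter anywhere in the name; AND requires both.
matches = [row[0] for row in conn.execute("""
    SELECT name
    FROM employee
    WHERE name LIKE '%e%'
      AND name LIKE '%n%'
    ORDER BY name
""")]

print(matches)
```

A single pattern such as '%e%n%' would only find names where the 'e' comes before the 'n', which is why two separate conditions are used.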
• Give an Employee a raise:
• UPDATE employee
• SET salary = salary * 1.05
• WHERE employee_id = 'E1001';
Defining Views
Data Storage Characteristics:
• For a significant amount of data, we require persistent, inexpensive, reliable and sharable
storage methods with relatively rapid access time.
• Persistent - Data persists (lives on) after power is removed.
• Inexpensive - typically measured on a $ per Megabyte basis.
• Reliable - Should not have to be replaced due to excessive errors.
• Sharable - Should facilitate sharing of data among many users.
• Access time - Data should be accessible in a relatively short period of time.
• Access time is the dominant cost to consider
• Access time consists of:
1. Seek time - moving the disk read/write head to the right track
2. Disk Rotation time - waiting for the disk to rotate the track under the head
3. Transfer time - time to actually read the data (blocks) from the disk and place it
on the bus for main memory.
• The goal is to minimize seek and disk rotation delay by orienting related data on the same
or adjacent tracks.
• Block - The smallest unit of data a disk can read or write.
• Block Size - the size of the block. Typically 512 bytes, 1024, 2048, ... 32 Kilobytes.
• Relations are stored on disk with each tuple (record) written one after the other (end to
end).
• Blocking Factor - the number of tuples (records) that can fit into a single block.
• Example: EMPLOYEE takes 100 bytes to store one tuple (record).
If the Block Size is 2,000 bytes, then we can store 20 EMPLOYEE tuples (records) in
one block.
Thus the Blocking factor is 2000/100 = 20
• In general: f = floor(B/R), where f is the blocking factor, B is the block size and R is
the record size.
• Fixed length records: Each record is of fixed length. Pad with spaces, etc.
• Variable Length records: Each record is only as long as the data it contains.
• Unspanned Records: A record is found in one and only one block. i.e., records do not
span across block boundaries.
• Spanned Records: Records are allowed to span across block boundaries.
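The blocking factor calculation above can be written as a small helper. This is a minimal sketch assuming fixed-length, unspanned records; the function names blocking_factor and blocks_needed are made up for this example.

```python
def blocking_factor(block_size: int, record_size: int) -> int:
    """Whole fixed-length records that fit in one block (unspanned records)."""
    if block_size <= 0 or record_size <= 0:
        raise ValueError("sizes must be positive")
    return block_size // record_size  # floor(B / R)


def blocks_needed(num_records: int, block_size: int, record_size: int) -> int:
    """Blocks required to store num_records unspanned records."""
    f = blocking_factor(block_size, record_size)
    if f == 0:
        raise ValueError("record is larger than a block; spanned records needed")
    return -(-num_records // f)  # ceiling division


# The EMPLOYEE example from the notes: 100-byte tuples, 2,000-byte blocks.
print(blocking_factor(2000, 100))
print(blocks_needed(1000, 2000, 100))
```

With a 2,048-byte block the blocking factor is still 20; the remaining 48 bytes per block go unused because records may not span blocks.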
File Operations
Find Select
Insert Insert
Modify Update
Delete Delete
• Unordered file - New record is inserted at the end of the file.
o Insert takes constant time.
o Select, Update and Delete take n/2 time on average.
(n is the number of records)
• Ordered file - New record is inserted in order, in the file.
o Insert takes log2n to find the position plus the time to re-organize records.
o Select, Update, Delete take at least log2n
• Indexed file - New record is inserted at the end of the file.
o An index is maintained that points to the location on disk where the record is
found.
o Insert takes constant time for the data itself plus log2n for the index
o Select, Update, Delete take log2n lookup on the index followed by constant time
to access data record.
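The n/2 and log2n costs above can be observed by counting how many records each search strategy examines. This is a minimal in-memory sketch (a Python list stands in for the file, and one list access stands in for one record read); it illustrates the growth rates, not real disk timings.

```python
import math


def linear_search_probes(records, key):
    """Unordered file: scan from the start, counting records examined."""
    for probes, value in enumerate(records, start=1):
        if value == key:
            return probes
    return len(records)


def binary_search_probes(sorted_records, key):
    """Ordered file: halve the search range each step, counting records examined."""
    lo, hi = 0, len(sorted_records) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_records[mid] == key:
            return probes
        if sorted_records[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return probes


n = 1024
keys = list(range(n))

# Average cost of finding each key once in an unordered file: about n/2.
avg_linear = sum(linear_search_probes(keys, k) for k in keys) / n

# Worst-case cost in an ordered file: about log2(n).
max_binary = max(binary_search_probes(keys, k) for k in keys)

print(avg_linear, max_binary, math.log2(n))
```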
Types of Indexing
Ordered Index
Hashing
• Identify a function f that takes as input, the key for a relation and returns, as output, the
physical disk address for the rest of the data in the record.
• Example: Assume employee records.
Function f takes the ascii values of the first and last name and adds them.
The numeric result is the physical address for the record.
• Selection time is constant.
• It is possible that function f maps two different keys to the same address (a collision). In
this case, we use a series of hash buckets.
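The hashing scheme above can be sketched in a few lines. This is a minimal in-memory illustration: the bucket count (NUM_BUCKETS) and the modulo step are assumptions added here so that the sum of ASCII values always lands on a valid bucket, and a Python list stands in for each disk bucket.

```python
NUM_BUCKETS = 7  # hypothetical number of buckets

# One list per bucket; colliding keys share a bucket.
buckets = [[] for _ in range(NUM_BUCKETS)]


def f(first_name: str, last_name: str) -> int:
    """Add the ASCII values of both names, then map the sum to a bucket."""
    total = sum(ord(ch) for ch in first_name + last_name)
    return total % NUM_BUCKETS


def insert(first_name, last_name, record):
    """Store the record in the bucket chosen by the hash function."""
    buckets[f(first_name, last_name)].append(((first_name, last_name), record))


def select(first_name, last_name):
    """Hash to the bucket, then scan only that bucket for the exact key."""
    for key, record in buckets[f(first_name, last_name)]:
        if key == (first_name, last_name):
            return record
    return None


# "Amy" and "May" contain the same letters, so the two keys collide.
insert("Amy", "Lee", {"salary": 40000})
insert("May", "Lee", {"salary": 45000})

print(f("Amy", "Lee") == f("May", "Lee"))
print(select("May", "Lee"))
```

Selection stays close to constant time as long as each bucket holds only a few records.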
Database System Architectures:
• There are a number of database system architectures presently in use.
• One must examine several criteria:
1. Where do the data and DBMS reside ?
2. Where are the application programs executed (e.g., which CPU) ? This may
include the user interface.
3. Where are business rules enforced ?
Personal Computer - Stand-Alone Database
• PCs are connected to a local area network (LAN).
• A single file server stores a single copy of the database files.
• PCs on the LAN map a drive letter (or volume name) on the file server.
• Applications run on each PC on the LAN and access the same set of files on the file
server. The application is also the DBMS.
• Business rules are enforced in the applications - Also, the applications must handle
concurrency control. Possibly by file locking.
• Each user runs a copy of the same application and accesses the same files.
• Example: Sharing MS Access files on a file server.
• Advantages:
o (limited) Ability to share data among several users
o Costs of storage spread out among users
o Most components are now commodity items - prices falling
• Disadvantages:
o Limited data sharing ability - a few users at most
• Client machines:
o Run own copy of an operating system.
o Run one or more applications using the client machine's CPU, memory.
o Application communicates with DBMS server running on server machine through
a Database Driver
o Database driver (middleware) makes a connection to the DBMS server over a
network.
o Examples of clients: PCs with MS Windows operating system. Forms and reports
developed in: PowerBuilder, MS Access, Borland Delphi, Oracle Developer, MS
Visual Basic, "C" or "C++", etc.
• Server Machines:
o Run own copy of an operating system.
o Run a Database Management System that manages a database.
o Provides a Listening daemon that accepts connections from client machines and
submits transactions to DBMS on behalf of the client machines.
o Examples: Sun Sparc server running UNIX operating system. RDBMS such as
Oracle Server, Sybase, Informix, DB2, etc.
PC with Windows operating system.
• Middleware:
o Small portion of software that sits between client and server.
o Establishes a connection from the client to the server and passes commands (e.g.,
SQL) between them.
o See ODBC below.
o Examples:
For Oracle: SQL*Net (or Net8) running on both client and server.
For Sybase: Sybase Open Client and Open Server.
• Business rules may be enforced at:
Distributed Database Architecture
Data Partitioning
Horizontal Partitioning:
Partition 1
Customer ID Name Address City State Zip
1001 Mr. Smith 123 Lexington Smithville KY 91232
1002 Mrs. Jones 12 Davis Ave. Smithville KY 91232
Partition 2
Customer ID Name Address City State Zip
1003 Mr. Axe 443 Grinder Ln. Broadville GA 81992
1004 Mr. Builder 661 Parker Rd. Streetville GA 81990
Vertical Partitioning:
Partition 1 Partition 2
CustID Name CustID Address City State Zip
1001 Mr. Smith 1001 123 Lexington Smithville KY 91232
1002 Mrs. Jones 1002 12 Davis Ave. Smithville KY 91232
1003 Mr. Axe 1003 443 Grinder Ln. Broadville GA 81992
1004 Mr. Builder 1004 661 Parker Rd. Streetville GA 81990
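The two partitioning schemes can be illustrated with the customer rows above. This is a minimal sketch in which plain Python dictionaries stand in for tuples; splitting horizontally by State is an assumption that happens to match the two partitions shown. It also shows how the original relation is recovered: a union for horizontal partitions, a join on the key for vertical ones.

```python
customers = [
    {"CustID": 1001, "Name": "Mr. Smith", "Address": "123 Lexington",
     "City": "Smithville", "State": "KY", "Zip": "91232"},
    {"CustID": 1002, "Name": "Mrs. Jones", "Address": "12 Davis Ave.",
     "City": "Smithville", "State": "KY", "Zip": "91232"},
    {"CustID": 1003, "Name": "Mr. Axe", "Address": "443 Grinder Ln.",
     "City": "Broadville", "State": "GA", "Zip": "81992"},
    {"CustID": 1004, "Name": "Mr. Builder", "Address": "661 Parker Rd.",
     "City": "Streetville", "State": "GA", "Zip": "81990"},
]

# Horizontal partitioning: whole rows are assigned to a partition.
horizontal = {}
for row in customers:
    horizontal.setdefault(row["State"], []).append(row)

# Vertical partitioning: columns are split; the key is repeated in each part.
part1 = [{"CustID": r["CustID"], "Name": r["Name"]} for r in customers]
part2 = [{k: r[k] for k in ("CustID", "Address", "City", "State", "Zip")}
         for r in customers]

# Reconstruction: union of the horizontal partitions...
rebuilt_h = sorted((r for part in horizontal.values() for r in part),
                   key=lambda r: r["CustID"])

# ...and a join of the vertical partitions on CustID.
by_id = {r["CustID"]: r for r in part2}
rebuilt_v = [{**p, **by_id[p["CustID"]]} for p in part1]

print(sorted(horizontal), len(part1), len(part2))
```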
Data Replication
• In Asynchronous replication, we take snapshots of a master database and propagate the
changes to other sites on some periodic basis.
• In general, distributed database systems offer more flexibility, higher performance and
greater levels of independence than centralized systems.
• However, distributed database systems are also much harder to design, develop,
control and administer. Security is also more difficult to enforce.
BTW, for those of you from the UNIX world, simply replace "PCs" with "UNIX Workstations"
in the phrases above.
• Triggers are procedures or functions stored in the DBMS and are invoked when certain
events occur.
• Events include: Inserting a new row into a table, updating data in a table, deleting a row
in a table, etc.
• Example: A trigger may fire after each time an inventory record is updated. The trigger
will automatically insert a new Order record in the Orders table if the quantity in
inventory falls below a certain level.
• Triggers are used to enforce business rules that all applications that use the database must
adhere to.
• Programming triggers requires special attention to how transactions execute.
Triggers may cause locks to be held longer than expected or may have other side effects.
• Most major DBMS support triggers. e.g., Oracle supports triggers written in PL/SQL.
IBM DB2 supports triggers written in just about any language such as "C" and Java.
• Stored Procedures are similar to triggers: They are functions and procedures that are
stored in the database. Stored procedures may be called by triggers or by application
programs.
• Stored procedures are useful in cases when standard applications logic must be
implemented across all applications. Copies of this code do not need to be distributed to
the clients.
• Also very useful when a large number of database accesses must be done with just a
small result being passed back to the client.
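The inventory trigger described above can be sketched with SQLite, driven from Python's sqlite3 module. Everything specific here is an assumption for illustration: the table layouts, the reorder_level column, the trigger name reorder_part, and the rule of ordering twice the reorder level. Trigger syntax also differs between DBMSs (Oracle, for example, uses PL/SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (
    part_number   TEXT PRIMARY KEY,
    quantity      INTEGER,
    reorder_level INTEGER
);
CREATE TABLE orders (part_number TEXT, quantity INTEGER);

-- Fires after each update of quantity; places an order only when the
-- quantity crosses from at-or-above the reorder level to below it.
CREATE TRIGGER reorder_part
AFTER UPDATE OF quantity ON inventory
WHEN NEW.quantity < NEW.reorder_level
 AND OLD.quantity >= OLD.reorder_level
BEGIN
    INSERT INTO orders (part_number, quantity)
    VALUES (NEW.part_number, NEW.reorder_level * 2);
END;

INSERT INTO inventory VALUES ('P100', 50, 20);
""")

# Still above the reorder level: the trigger condition is false.
conn.execute("UPDATE inventory SET quantity = 30 WHERE part_number = 'P100'")
orders_after_first = conn.execute("SELECT * FROM orders").fetchall()

# Falls below the reorder level: the trigger inserts an order.
conn.execute("UPDATE inventory SET quantity = 15 WHERE part_number = 'P100'")
# Already below the level: no second order is placed.
conn.execute("UPDATE inventory SET quantity = 10 WHERE part_number = 'P100'")
orders = conn.execute("SELECT part_number, quantity FROM orders").fetchall()

print(orders_after_first, orders)
```

Because the rule lives in the database, every application that updates inventory gets the same behavior without carrying its own copy of the logic.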
Stored procedures in the DBMS are used to accept input from HTML forms,
perform the appropriate query and then format the results in HTML.
MultiUser Databases:
• Multiuser database - more than one user processes the database at the same time
• Several issues arise:
1. How can we prevent users from interfering with each other's work ?
2. How can we safely process transactions on the database without corrupting or
losing data ?
3. If there is a problem (e.g., power failure or system crash), how can we recover
without losing all of our data ?
Transaction Processing
• We need the ability to control how transactions are run in a multiuser database.
• A transaction is a set of read and write operations that must either commit or abort.
• Consider the following transaction that reserves a seat on an airplane flight and charges
the customer:
1. Read customer information
2. Write reservation information
3. Write charges
• Suppose that after the second step, the database crashes. Or for some reason, changes can
not be written...
• Transactions can either reach a commit point, where all actions are permanently saved in
the database or they can abort in which case none of the actions are saved.
• Another way to say this is transactions are Atomic. All operations in a transaction must
be executed as a single unit - Logical Unit of Work.
• Consider two users, each executing similar transactions:
• Example #1:
•
• User A User B
• Read Salary for emp 101 Read Salary for emp 101
• Multiply salary by 1.03 Multiply salary by 1.04
• Write Salary for emp 101 Write Salary for emp 101
•
•
• Example #2:
•
• User A User B
• Read inventory for Prod 200 Read inventory for Prod 200
• Decrement inventory by 5 Decrement inventory by 7
• Write inventory for Prod 200 Write inventory for Prod 200
• First, what should the values for salary (in the first example) really be ?
• The DBMS must find a way to execute these two transactions concurrently and ensure
the result is what the users (and designers) intended.
• These two are examples of the Lost Update or Concurrent Update problem. Some
changes to the database can be overwritten.
• Consider how the operations for users A and B might be interleaved as in example #2.
Assume there are 10 units in inventory for Prod 200:
• Read inventory for Prod 200 for user A
• Read inventory for Prod 200 for user B
• Decrement inventory by 5 for user A
• Decrement inventory by 7 for user B
• Write inventory for Prod 200 for user A
• Write inventory for Prod 200 for user B
• In the first case, the incorrect amount (3) is written to the database. This is called the Lost
Update problem because we lost the update from User A - it was overwritten by user B.
• The second example works because we let user A write the new value of Prod 200 before
user B can read it. Thus User B's decrement operation will fail.
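The lost update in the inventory example can be simulated step by step. This is a minimal sketch: the run_schedule function and its (user, action, amount) step format are made up for this illustration, and the rule that a decrement fails when the user's copy of the inventory is too small is an assumption consistent with the description above.

```python
def run_schedule(initial, schedule):
    """Run read/decrement/write steps against one shared inventory value.

    Each user works on a private copy between its read and its write.
    Returns the final stored value and the list of users whose decrement failed.
    """
    stored = initial
    local = {}
    failed = []
    for user, action, amount in schedule:
        if action == "read":
            local[user] = stored
        elif action == "decrement":
            if local[user] < amount:
                failed.append(user)      # not enough units in this user's copy
            else:
                local[user] -= amount
        elif action == "write" and user not in failed:
            stored = local[user]
    return stored, failed


interleaved = [
    ("A", "read", None), ("B", "read", None),
    ("A", "decrement", 5), ("B", "decrement", 7),
    ("A", "write", None), ("B", "write", None),
]
serial = [
    ("A", "read", None), ("A", "decrement", 5), ("A", "write", None),
    ("B", "read", None), ("B", "decrement", 7), ("B", "write", None),
]

print(run_schedule(10, interleaved))  # B overwrites A's update
print(run_schedule(10, serial))       # B sees only 5 units, so its decrement fails
```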
• Here is another example. Users A and B share a bank account. Assume an initial balance
of $200.
• User A reads the balance
• User A deducts $100 from the balance
• User B reads the balance
• User A writes the new balance of $100
• User B deducts $100 from the balance
• User B writes the new balance of $100
• The reason we get the wrong final result (remaining balance of $100) is that
transaction B was allowed to read stale data. This is called the inconsistent read problem.
• Suppose, instead of interleaving (mixing) the operations of the two transactions, we
execute one after the other (note it makes no difference which order: A then B, or B then
A)
• User A reads the balance
• User A deducts $100 from the balance
• User A writes the new balance of $100
• User B reads the balance (which is now $100)
• User B deducts $100 from the balance
• User B writes the new balance of $0
• If we insist only one transaction can execute at a time, in serial order, then performance
will be quite poor.
• Concurrency Control is a method for controlling or scheduling the operations in such a
way that concurrent transactions can be executed.
• If we do concurrency control properly, then we can maximize transaction throughput
while avoiding any chance of the update problems shown above.
• Transaction throughput: The number of transactions we can perform in a given time
period. Often reported as Transactions per second or TPS.
• A group of two or more concurrent transactions is serializable if we can order their
operations so that the final result is the same as if we had run them in serial order (one
after another).
• Consider transactions A, B and C. Each has 3 operations. If executing:
A1, B1, A2, C1, C2, B2, A3, B3, C3
has the same result as executing:
A1, A2, A3, B1, B2, B3, C1, C2, C3
then the above schedule of transactions and operations is serializable.
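The definition above suggests a simple experiment: run an interleaved schedule, run every serial ordering of the same transactions, and compare the final results. The sketch below does exactly that for one starting state. It is a simplification for illustration only; the step format, the item names and the helper names execute and is_serializable are assumptions, and a real DBMS does not test serializability this way.

```python
from itertools import permutations


def execute(schedule, initial):
    """Run (txn, op, item, arg) steps; each txn keeps private working copies."""
    db = dict(initial)
    work = {}
    for txn, op, item, arg in schedule:
        if op == "read":
            work[(txn, item)] = db[item]
        elif op == "add":
            work[(txn, item)] += arg
        elif op == "write":
            db[item] = work[(txn, item)]
    return db


def is_serializable(schedule, initial):
    """True if the final state matches that of some serial order (final-state check)."""
    txns = []
    for step in schedule:
        if step[0] not in txns:
            txns.append(step[0])
    outcome = execute(schedule, initial)
    for order in permutations(txns):
        serial = [s for t in order for s in schedule if s[0] == t]
        if execute(serial, initial) == outcome:
            return True
    return False


# A1, B1, A2, C1, C2, B2, A3, B3, C3 with each transaction on its own item.
independent = [
    ("A", "read", "x", None), ("B", "read", "y", None), ("A", "add", "x", 10),
    ("C", "read", "z", None), ("C", "add", "z", 30), ("B", "add", "y", 20),
    ("A", "write", "x", None), ("B", "write", "y", None), ("C", "write", "z", None),
]

# The shared bank account schedule: both users deduct $100 from the same balance.
shared = [
    ("A", "read", "balance", None), ("A", "add", "balance", -100),
    ("B", "read", "balance", None), ("A", "write", "balance", None),
    ("B", "add", "balance", -100), ("B", "write", "balance", None),
]

print(is_serializable(independent, {"x": 0, "y": 0, "z": 0}))
print(is_serializable(shared, {"balance": 200}))
```

The interleaved bank schedule ends with a balance of $100, while both serial orders end with $0, so it is not serializable.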
Characteristics of Locks
• Locks may be applied to:
1. a single data item (value)
2. an entire row of a table
3. a page (memory segment) (many rows worth)
4. an entire table
5. an entire database
• The most commonly implemented locking mechanism is called Two-Phase Locking or
2PL. 2PL is a concurrency control mechanism that ensures serializability.
• 2PL has two phases: Growing and shrinking.
1. A transaction acquires locks on data items it will need to complete the transaction.
This is called the growing phase.
2. Once a lock is released, no other lock may be acquired. This is called the
shrinking phase.
• Consider our prior example, this time using locks:
• User A places an exclusive lock on the balance
• User A reads the balance
• User A deducts $100 from the balance
•
• User B attempts to place a lock on the balance
• but fails because A already has an exclusive lock
• User B is placed into a wait state
• User A writes the new balance of $100
• User A releases the exclusive lock on the balance
•
• User B places an exclusive lock on the balance
• User B reads the balance
• User B deducts $100 from the balance
• User B writes the new balance of $0
• Here is a more involved example:
• User A places a shared lock on item raise_rate
• User A reads raise_rate
• User A places an exclusive lock on item Amy_salary
• User A reads Amy_salary
•
• User B places a shared lock on item raise_rate
• User B reads raise_rate
•
• User A calculates a new salary as Amy_salary * (1+raise_rate)
•
• User B places an exclusive lock on item Bill_salary
• User B reads Bill_salary
• User B calculates a new salary as Bill_salary * (1+raise_rate)
• User B writes Bill_salary
•
• User A writes Amy_salary
• User A releases exclusive lock on Amy_salary
•
• User B releases exclusive lock on Bill_Salary
• User B releases shared lock on raise_rate
•
• User A releases shared lock on raise_rate
• Here is another example:
• User A places a shared lock on raise_rate
•
• User B attempts to place an exclusive lock on raise_rate
• Placed into a wait state
•
• User A places an exclusive lock on item Amy_salary
• User A reads raise_rate
• User A releases shared lock on raise_rate
•
• User B places an exclusive lock on raise_rate
•
• User A reads Amy_salary
•
• User B reads raise_rate
•
• User A calculates a new salary as Amy_salary * (1+raise_rate)
•
• User B writes a new raise_rate
• User B releases exclusive lock on raise_rate
•
• User A writes Amy_salary
• User A releases exclusive lock on Amy_salary
•
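The locking rules used in the examples above can be captured in a toy lock table. This is a minimal single-threaded sketch, not how a real DBMS implements locking: the class name TwoPhaseLockManager and its methods are made up for illustration, "S" and "X" stand for shared and exclusive locks, and acquire simply reports whether the lock is granted or the transaction would have to wait.

```python
class TwoPhaseLockManager:
    """Toy lock table with shared (S) and exclusive (X) locks and the 2PL rule."""

    def __init__(self):
        self.locks = {}         # item -> {transaction: mode}
        self.shrinking = set()  # transactions that have released a lock

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if txn must wait."""
        if txn in self.shrinking:
            raise RuntimeError(f"{txn} violates 2PL: acquire after release")
        holders = self.locks.setdefault(item, {})
        others = [m for t, m in holders.items() if t != txn]
        if mode == "S":
            compatible = all(m == "S" for m in others)  # S is compatible with S only
        else:
            compatible = not others                     # X needs the item to itself
        if compatible and holders.get(txn) != "X":
            holders[txn] = mode
        return compatible

    def release(self, txn, item):
        """Release a lock; the transaction enters its shrinking phase."""
        self.locks.get(item, {}).pop(txn, None)
        self.shrinking.add(txn)


lm = TwoPhaseLockManager()
a_shared = lm.acquire("A", "raise_rate", "S")     # granted
b_waits = lm.acquire("B", "raise_rate", "X")      # refused: A holds a shared lock
a_exclusive = lm.acquire("A", "Amy_salary", "X")  # granted, A is still growing
lm.release("A", "raise_rate")                     # A starts shrinking
b_granted = lm.acquire("B", "raise_rate", "X")    # now granted

try:
    lm.acquire("A", "Bill_salary", "X")           # not allowed after a release
    violation_detected = False
except RuntimeError:
    violation_detected = True

print(a_shared, b_waits, a_exclusive, b_granted, violation_detected)
```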
Deadlock
• User B placed into a wait state
• ...
•
• This is called a deadlock. One transaction has locked some of the resources and is
waiting for locks so it can complete. A second transaction has locked those needed items
but is awaiting the release of locks the first transaction is holding so it can continue.
• Two main ways to deal with deadlock.
1. Prevent it in the first place by giving each transaction exclusive rights to acquire
all locks needed before proceeding.
2. Allow the deadlock to occur, then break it by aborting one of the transactions.
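For the second approach, the DBMS first has to notice that a deadlock exists. One common technique, sketched below, is to record which transaction is waiting for which and look for a cycle in that "wait-for" graph. The function name find_deadlock, the dictionary format and the choice of victim are assumptions made for this illustration.

```python
def find_deadlock(waits_for):
    """Return the transactions forming a cycle in the wait-for graph, or None.

    waits_for maps each transaction to the transactions whose locks it awaits.
    """
    visited = set()
    path = []

    def visit(txn):
        if txn in path:                    # came back to a txn on the current path
            return path[path.index(txn):]
        if txn in visited:
            return None
        visited.add(txn)
        path.append(txn)
        for other in waits_for.get(txn, ()):
            cycle = visit(other)
            if cycle:
                return cycle
        path.pop()
        return None

    for txn in waits_for:
        cycle = visit(txn)
        if cycle:
            return cycle
    return None


# A waits for a lock held by B while B waits for a lock held by A.
deadlocked = {"A": ["B"], "B": ["A"]}
# B waits for A, but A is not waiting for anyone.
no_deadlock = {"A": [], "B": ["A"]}

cycle = find_deadlock(deadlocked)
victim = cycle[-1]  # hypothetical policy: abort the last transaction in the cycle

print(cycle, victim, find_deadlock(no_deadlock))
```

Aborting the victim releases its locks, which lets the other transaction continue; the victim is then restarted.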
Reprocessing
• We apply a similar technique: Make periodic saves of the database (a time consuming
operation). However, maintain a more intelligent log of the transactions that have been
applied. This transaction log includes before images and after images.
• Before Image: A copy of the table record (or page) of data before it was changed by the
transaction.
• After Image: A copy of the table record (or page) of data after it was changed by the
transaction.
• Rollback: Undo any partially completed transactions (ones in progress when the crash
occurred) by applying the before images to the database.
• Rollforward: Redo the transactions by applying the after images to the database. This is
done for transactions that were committed before the crash.
• Recovery process uses both rollback and rollforward to restore the database.
• In the worst case, we would need to rollback to the last database save and then
rollforward to the point just before the crash.
• Checkpoints can also be taken (less time consuming) in between database saves.
• The DBMS flushes all pending transactions and writes all data to disk and to the
transaction log.
• Database can be recovered from the last checkpoint in much less time.
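Rollback and rollforward can be illustrated with a tiny log of before and after images. This is a minimal sketch: a Python dictionary stands in for the database, each log entry is a (transaction, item, before image, after image) tuple, and the transaction names and values are made up. It assumes no two transactions in the log touch the same item.

```python
def rollback(database, log, committed):
    """Undo uncommitted transactions by applying before images, newest first."""
    restored = dict(database)
    for txn, item, before, after in reversed(log):
        if txn not in committed:
            restored[item] = before
    return restored


def rollforward(saved_copy, log, committed):
    """Redo committed transactions on a saved copy by applying after images."""
    restored = dict(saved_copy)
    for txn, item, before, after in log:
        if txn in committed:
            restored[item] = after
    return restored


saved_copy = {"balance": 200, "seats": 10}   # last database save
log = [
    ("T1", "balance", 200, 100),             # T1 committed before the crash
    ("T2", "seats", 10, 9),                  # T2 was still in progress
]
committed = {"T1"}
at_crash = {"balance": 100, "seats": 9}      # contents on disk at the crash

print(rollback(at_crash, log, committed))
print(rollforward(saved_copy, log, committed))
```

Both paths arrive at the same state: T1's committed change is kept and T2's partial change is gone.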
Database Backup