CPSC 304
Database Systems
- Instructors: Hazra Imran and Laks V.S. Lakshmanan
- Textbook: Database Management Systems by Raghu Ramakrishnan & Johannes Gehrke, 3rd Edition.
Most (not all) of my slides are based on the above textbook.
Laks V.S. Lakshmanan; Based on Ramakrishnan & Gehrke, DB Management Systems
and on George Tsiknis and Raymond Ng's slides.
Administrivia
Two Sections
- Lecture notes are separate for the two sections (Hazra) and (Laks).
- However, tutorials are common: you may go to any one tutorial.
- Lecture notes for both sections will be posted on Piazza.
Unit 1:Intro
Your TAs
- See
https://piazza.com/ubc.ca/winterterm22016/cpsc304/staff
for information on your TAs.
- The fastest way to get information or help about CPSC 304 is our online discussion forum, Piazza.
  - Signup: https://piazza.com/ubc.ca/winterterm22016/cpsc304
  - Piazza class home: piazza.com/ubc.ca/winterterm22016/cpsc304/home (the go-to place for most things CPSC 304).
  - Course home page (Laks section):
    http://www.cs.ubc.ca/~laks/cpsc304/304home.html
A few Administrative Details
- Online discussion of course material:
  - We will use the Piazza system (www.piazza.com) for all online discussion of course material. Piazza is a next-generation question-and-answer system designed to help you get answers to your questions fast. The best way to get an answer to a question about 304 is to post it on Piazza. Piazza allows instructors, TAs, and students to answer questions, and makes it easy to edit both questions and answers to improve them. To join the Piazza group for 304, go to piazza.com/ubc.ca/winterterm22016/cpsc304 and follow the instructions. Be sure to give your full name, your student ID, and the email address with which you want to be registered on Piazza for this course.
More Admin. Details
- Please note that Piazza is a service hosted in the United States. (It is a startup that originated at Stanford University.) If you wish to preserve your anonymity on Piazza, so that none of your personal information is stored in the United States, you can create a new anonymous email account and ask Dylan (aka Wengqiang Dong) ([email protected]) to register you with that account. (If you already use Gmail, Hotmail, or another US-hosted email account, this is essentially moot.)
- We'll also use Piazza to post announcements and additional course-related material.
- Register your clickers ASAP! (See the course home page for tips.)
Overview
Unit 1
Introduction
Read: Text, Chapter 1
Our focus:
- what the course is about
- what to expect from the course
Learning Goals
- Explain what a database is
- Explain what a database management system (DBMS) is
- Explain the benefits of using a DBMS
- Describe the basic structure of a DBMS
Why database systems? 1/2
- One of the most successful industries.
- What powers your ATMs, shopping portals, and travel planning/reservation sites, ...?
- Social networking & recommender systems: a DBMS is the underlying core powering Facebook, MySpace, Flickr, del.icio.us, Yahoo! Answers, rottentomatoes.com, ...
- Data management encompasses all major applications of CS.
- Data management will remain important forever:
  - continued improvement/extension of relational technology;
  - developing technologies for managing data that is not (well) managed today, e.g., text, multimedia, web data, graphs, matrices, ...
Why database systems? 2/2
Suppose we are building a system to store the information pertaining to the university using, say, Java.
What do we need to do?
- Store data in files (how should we organize them?)
- Write programs to access and update the data
- Make sure that updates don't mess things up
- Provide different views of, and access to, the data for different users (registrar versus students)
- Write programs to deal with crashes
- And do all of the above from scratch for each application!
A lot of work!
Alternatively:
- Use a DBMS to store and maintain the data.
What is a database?
- A database is an organized collection of related data, usually (but not necessarily) stored on disk. It is typically:
  - important and online
  - shared
  - secured
  - well-designed
  - of variable size
- A DB typically models some real-world enterprise:
  - entities (e.g., students, courses, movies, songs, cameras, ...)
  - relationships (e.g., Ford stars in Star Wars: The Force Awakens; Mary takes CPSC 304; Jack rented a Civic from Hertz).
What is a DBMS?
- A Database Management System (DBMS) is a collection of software designed to store and manage databases. It is used to:
  - define, modify, and query a database
  - provide access control
  - allow concurrent access
  - maintain integrity
  - provide backup and recovery from crashes
  - provide uniform data administration/maintenance
    - provides centralized control and easy data management
    - reduces application development time/effort
  - etc.
- OK, we have to do much less now, but ...
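To make the "maintain integrity" and "recovery from crashes" bullets concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in DBMS (an assumption; the course does not prescribe SQLite). The CHECK constraint on gpa is hypothetical. The point is that the DBMS, not the application, rejects the bad row, and that the `with conn:` block makes two inserts atomic, so a failure in the second one also undoes the first.

```python
import sqlite3

# SQLite (via Python's standard sqlite3 module) stands in for "a DBMS" here.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE student (
           sid  INTEGER PRIMARY KEY,
           name TEXT NOT NULL,
           gpa  REAL CHECK (gpa BETWEEN 0 AND 100)  -- hypothetical constraint
       )"""
)

# Maintain integrity: the DBMS itself rejects a row that breaks a constraint.
rejected = False
try:
    conn.execute("INSERT INTO student VALUES (1, 'Smith G.', 150)")
except sqlite3.IntegrityError:
    rejected = True
conn.rollback()  # end the (empty) transaction opened by the failed INSERT

# Atomicity: both inserts take effect together or not at all.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("INSERT INTO student VALUES (2, 'Chan J.', 80)")
        conn.execute("INSERT INTO student VALUES (2, 'Campeau J.', 70)")  # same sid
except sqlite3.IntegrityError:
    pass

count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(rejected, count)  # True 0
```

Without a DBMS, each application would have to write this checking and undo logic itself.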
How do we design such large systems?
We do it in stages, and we use a different model to represent the information at each stage:
1. First, we model the real-world concepts using high-level models, called conceptual models.
2. Then we translate the world model into a database model (called a logical data model) that the database system understands.
3. We make this data model as efficient as possible (optimization stage).
4. We design the code to query and maintain the data.
5. We write the code for the application we want to develop.
In this course, we'll look at all of these.
Conceptual Data Models
- Also called semantic data models.
- High-level (i.e., abstract) models used for capturing the real world at the initial stage.
- We'll use the most popular one: the Entity-Relationship (ER) model.
Logical Data Models
- A major purpose of a DB system: to provide an abstract view of the data using a data model.
- Data model: a collection of tools for describing data, data relationships, semantics, and constraints.
- We'll use the relational model, the most widely used model today.
  - Main concept: the relation, basically a table with rows and columns.
  - Relations represent all the information needed for the application.
- Other models:
  - object-oriented model
  - semi-structured data models
  - older models: the network model and the hierarchical model
An Example of a Relational Database
student
sid      | name       | address                | major | gpa
92001200 | Smith G.   | 1234 W. 12th Ave, Van. | CPSC  | 85
93001250 | Chan J.    | 2556 Fraser St., Van.  | MATH  | 80
94001150 | Campeau J. | null                   | null  | null

course
dept | num | title               | credits
CPSC | 124 | Principles of CS I  |
CPSC | 126 | Principles of CS II |
MATH | 100 | Calculus I          |
etc.
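To experiment with this instance, it can be loaded into any relational DBMS. A minimal sketch using Python's built-in sqlite3 module (an assumption; any DBMS would do) follows. The department column is named `dept` to match the logical schema of the university database; the slide does not show credit values, so they are left as NULL, and Python's None is stored as SQL NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the DBMS
conn.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, "
             "address TEXT, major TEXT, gpa REAL)")
conn.execute("CREATE TABLE course (dept TEXT, num TEXT, title TEXT, "
             "credits INTEGER, PRIMARY KEY (dept, num))")

# Each Python tuple becomes one row; None is stored as SQL NULL.
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?, ?)", [
    (92001200, "Smith G.", "1234 W. 12th Ave, Van.", "CPSC", 85),
    (93001250, "Chan J.", "2556 Fraser St., Van.", "MATH", 80),
    (94001150, "Campeau J.", None, None, None),
])
# Credits are not given on the slide, so they are left NULL.
conn.executemany("INSERT INTO course (dept, num, title) VALUES (?, ?, ?)", [
    ("CPSC", "124", "Principles of CS I"),
    ("CPSC", "126", "Principles of CS II"),
    ("MATH", "100", "Calculus I"),
])

rows = conn.execute("SELECT sid, name, major FROM student ORDER BY sid").fetchall()
for row in rows:
    print(row)
```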
Optimizing the Database
- We'll learn how to create a design that:
  - captures all the information we need to store
  - takes up less space
  - is easy to maintain
- We'll look at different normal forms (don't worry about them for now!)
Designing queries
- We'll mainly use the Structured Query Language (SQL).
  Example: find all the CPSC students who have a gpa of at least 75%:
  Student(sid, name, address, major, gpa)

    select name
    from Student
    where major = 'CPSC' and
          gpa >= 75

- SQL is a declarative language: the query processor figures out how to answer the query efficiently.
- We'll also look at relational algebra and at a high-level, intuitive logical language called Datalog.
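The query can be run unchanged against the example instance. A sketch using Python's sqlite3 module (assumed here as a convenient DBMS): we state only what we want, and the DBMS decides how to find it. Campeau J., whose major is null, is not returned, because comparing null with 'CPSC' does not yield true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for the DBMS
conn.execute("CREATE TABLE Student (sid INTEGER, name TEXT, address TEXT, "
             "major TEXT, gpa REAL)")
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?, ?)", [
    (92001200, "Smith G.", "1234 W. 12th Ave, Van.", "CPSC", 85),
    (93001250, "Chan J.", "2556 Fraser St., Van.", "MATH", 80),
    (94001150, "Campeau J.", None, None, None),
])

# The query from the slide, verbatim apart from layout.
query = """
    SELECT name
    FROM Student
    WHERE major = 'CPSC' AND gpa >= 75
"""
names = [name for (name,) in conn.execute(query)]
print(names)  # ['Smith G.']
```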
Datalog Sampler
- Redo the previous query using Datalog:
- Find all the CPSC students who have a gpa of at least 75%:
- answer(Name) :- student(Sid, Name, Addr, Major, Gpa),
                  Major = 'CPSC', Gpa >= 75.
  Constants: 'CPSC' and 75. Variables: Name, Sid, Addr, Major, Gpa.
Student(sid, name, address, major, gpa)
Datalog Sampler
- Redo the previous query using Datalog:
- Find all the CPSC students who have a gpa of at least 75%:
- answer(Name) :- student(Sid, Name, Addr, Major, Gpa),
                  Major = 'CPSC', Gpa >= 75.
  Don't-care variables: Sid and Addr (their values are never used).
Student(sid, name, address, major, gpa)
Datalog Sampler
- Redo the previous query using Datalog:
- Find all the CPSC students who have a gpa of at least 75%:
- answer(Name) :- student(_, Name, _, 'CPSC', Gpa),
                  Gpa >= 75.
Note: Datalog is essentially a pattern-driven query language.
Student(sid, name, address, major, gpa)
An Intuitive Way to Think about Datalog
[Figure: the student table (sid, name, address, major, gpa) with N under name, CPSC under major, and G under gpa; a "conditions" box holds G >= 75.]
- Values of (variable) N are returned as the answer.
- Variable G is constrained.
- CPSC and 75 are constants.
- We don't care about the values of sid and address.
General Remarks about the Course
Create an Application
- In the term project, you will use:
  - a programming language to write code for an application, and
  - a standard interface to the database to access the data for the application.
- We'll focus on Java & JDBC:
  - you can also use C++ and ODBC,
  - or HTML and PHP.
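The project pairs a host language with a standard database interface (JDBC for Java). As a rough analogue only, Python's DB-API 2.0 (PEP 249), implemented by the standard sqlite3 module, follows the same pattern: open a connection, send parameterized SQL, fetch rows. The functions `enroll` and `courses_of` and the sample rows are hypothetical application code, and SQLite stands in for whatever DBMS a project actually uses.

```python
import sqlite3

def enroll(conn, sid, dept, num):
    """Hypothetical application function: record that a student takes a course."""
    with conn:  # one transaction: committed on success, rolled back on error
        # "?" placeholders let the driver pass values safely, much like a JDBC
        # PreparedStatement; never build SQL by pasting strings together.
        conn.execute(
            "INSERT INTO enrolled (sid, dept, num) VALUES (?, ?, ?)",
            (sid, dept, num),
        )

def courses_of(conn, sid):
    """Hypothetical application function: list (dept, num) for one student."""
    cur = conn.execute(
        "SELECT dept, num FROM enrolled WHERE sid = ? ORDER BY dept, num", (sid,)
    )
    return cur.fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrolled (sid INTEGER, dept TEXT, num TEXT, grade INTEGER)")
enroll(conn, 92001200, "CPSC", "304")
enroll(conn, 92001200, "MATH", "100")
print(courses_of(conn, 92001200))  # [('CPSC', '304'), ('MATH', '100')]
```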
Special Topics
In addition to core topics on relational databases, we'll discuss two special topics on data analytics:
  - Data Warehousing & OLAP
  - Data Mining
Course Learning Outcomes
At the end of the course, you will be able to:
- describe how relational databases store and retrieve information
- develop a database that satisfies the needs of a small enterprise using the principles of relational database design
- express queries using formal database languages like relational algebra and Datalog
- express queries using SQL
- develop a complete data-centric application with transactions and a user interface using Java, JDBC (or equivalent), and a popular DBMS
- explain key concepts and techniques used in:
  - Data Warehousing & OLAP
  - Data Mining
Course Activities
- You'll learn the most by doing:
  - assigned reading before each lecture
  - in-class clicker exercises (lectures)
  - homework assignments started in tutorials
  - a project in groups of 4 that goes through the design and implementation of a realistic database application
  - additional exercises posted on Piazza from time to time (for your own practice), not for grades
Examinations and Grading
- Midterms (70 minutes each):
  - Tuesday, January 31, in class
  - Tuesday, March 7, in class
- Final examination:
  - April; date to be scheduled by the university.
- Course grades will be calculated as follows:
  - 12% for the tutorial work (homework assignments)
  - 3% for the clicker questions
  - 20% for the project
  - 20% for the midterm exams
  - 45% for the final exam
Plus 0.5 point for simply completing a midterm survey. If there are other surveys, you will hear from us.
Passing the course
- To pass the course, you must obtain an overall passing mark and pass both the project and the final.
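The grading scheme and the passing rule above amount to a weighted sum plus three threshold checks. A sketch follows; the 50% pass mark and the averaging of the two midterms into one mark are assumptions (the slides state neither), and the sample marks are hypothetical.

```python
# Weights (percent of the course grade) copied from the grading slide.
WEIGHTS = {"tutorials": 12, "clickers": 3, "project": 20, "midterms": 20, "final": 45}
SURVEY_BONUS = 0.5   # "plus 0.5 point for completing a midterm survey"
PASS_MARK = 50       # ASSUMPTION: the slides do not state the passing mark

def course_grade(scores, did_survey=False):
    """Weighted course grade out of 100, plus the survey bonus.

    `scores` maps each component in WEIGHTS to a mark out of 100.
    ASSUMPTION: the two midterms are averaged into one "midterms" mark.
    """
    total = sum(weight * scores[part] / 100 for part, weight in WEIGHTS.items())
    return total + (SURVEY_BONUS if did_survey else 0.0)

def passes_course(scores, did_survey=False):
    """Pass = overall passing mark AND passing project AND passing final."""
    overall = course_grade(scores, did_survey)
    return (overall >= PASS_MARK
            and scores["project"] >= PASS_MARK
            and scores["final"] >= PASS_MARK)

scores = {
    "tutorials": 90,
    "clickers": 100,
    "project": 85,
    "midterms": (70 + 90) / 2,  # hypothetical midterm marks, averaged
    "final": 45,
}
print(round(course_grade(scores, did_survey=True), 2))
print(passes_course(scores, did_survey=True))
```

With these marks the overall grade is about 67.55, above the assumed pass mark, yet `passes_course` returns False because the final is below it: a passing overall mark is not enough on its own.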
Course Resources
- Piazza home for the course: piazza.com/ubc.ca/winterterm22016/cpsc304/home
  - all course materials: notes, sample exams, solutions, ...
  - online discussion forum: Piazza
    - announcements, additional material
    - tutorials, project, homework, etc.
- CPSC 304 home page (Laks section):
  http://www.cs.ubc.ca/~laks/cpsc304/304home.html
- Office hours:
  - Laks: face-to-face hour: Tue 5:00-6:00 pm; Piazza hour: Thu 5:00-6:00 pm.
  - See the course home page for details on TAs and tutorials/labs.
- Visit the course pages and Piazza often to catch the latest updates.
Back to Databases - DBMSs
- What else are DBMSs good for?
- Who are the main users of databases?
- What is the basic structure of a DBMS?
Levels of Abstraction
- Database systems usually allow users to access the data at three abstraction levels:
  - Physical level: shows how the data are actually stored.
  - Logical (or conceptual) level: shows the data using the system's data model (i.e., tables, etc.).
  - External (or view) level: describes different parts of the database to different users
    - for convenience, security, or other reasons;
    - e.g., compare the views of a bank database for a customer, a teller, a bank manager, and the database administrator.
[Figure: three levels stacked top to bottom: View 1, View 2, View 3 (external level); Logical Level; Physical Level.]
Schema and Instances
- Similar to types and variables in programming languages.
- Schema: the structure of the database.
  - Physical schema: the database structure at the physical level.
  - Logical schema (or conceptual schema, or just schema): the database structure at the logical level.
- Instance: the actual content of the database at a particular point in time.
  - Analogous to the value of a variable.
- Physical data independence: the ability to modify the physical schema without changing the logical schema.
  - Applications depend on the logical schema.
- Logical data independence: provided by views.
  - The ability to change the logical schema without changing the applications that depend on the views.
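Both kinds of data independence can be seen in a small sketch, again using Python's sqlite3 module as a stand-in DBMS (an assumption). The view `student_info`, the index name, and the split into `student_name`/`student_major` are hypothetical. The application query reads only the view, so it returns the same answer after a physical change (a new index) and after a logical change (splitting the table and redefining the view).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original logical schema plus a view that the application depends on.
conn.executescript("""
    CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT, major TEXT);
    INSERT INTO student VALUES (92001200, 'Smith G.', 'CPSC'),
                               (93001250, 'Chan J.', 'MATH');
    CREATE VIEW student_info AS SELECT sid, name, major FROM student;
""")

APP_QUERY = "SELECT name FROM student_info WHERE major = 'CPSC'"  # never changes
before = conn.execute(APP_QUERY).fetchall()

# Physical change: add an index. Only the access path changes, not the answer.
conn.execute("CREATE INDEX idx_student_major ON student(major)")
after_index = conn.execute(APP_QUERY).fetchall()

# Logical change: split the table, then redefine the view over the new tables.
conn.executescript("""
    CREATE TABLE student_name  (sid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student_major (sid INTEGER PRIMARY KEY, major TEXT);
    INSERT INTO student_name  SELECT sid, name  FROM student;
    INSERT INTO student_major SELECT sid, major FROM student;
    DROP VIEW student_info;
    DROP TABLE student;
    CREATE VIEW student_info AS
        SELECT n.sid, n.name, m.major
        FROM student_name n JOIN student_major m ON n.sid = m.sid;
""")
after_split = conn.execute(APP_QUERY).fetchall()

print(before == after_index == after_split)  # True
```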
Relational Example: University Database
- Logical schema:
  - student(sid: integer, name: string, address: string, major: string, gpa: float)
  - course(dept: string, num: string, credits: integer)
  - enrolled(sid: integer, dept: string, num: string, grade: integer)
- The physical schema will define:
  - the type of file that stores each relation
  - the type of records for each file
  - any indexes on the files, etc.
- External schema (view):
  - course_info(dept: string, num: string, enrollment: integer)
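The slide gives only the signature of the external schema `course_info`. One plausible definition, sketched in SQLite via Python's sqlite3 module (an assumption, as is the sample data), derives `enrollment` by counting the rows of `enrolled` for each course; the LEFT JOIN keeps courses that have no students and reports 0 for them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (dept TEXT, num TEXT, credits INTEGER,
                         PRIMARY KEY (dept, num));
    CREATE TABLE enrolled (sid INTEGER, dept TEXT, num TEXT, grade INTEGER);

    -- External schema: a hypothetical definition of course_info.
    CREATE VIEW course_info AS
        SELECT c.dept AS dept, c.num AS num, COUNT(e.sid) AS enrollment
        FROM course c LEFT JOIN enrolled e
             ON c.dept = e.dept AND c.num = e.num
        GROUP BY c.dept, c.num;

    -- Hypothetical sample data (credits and grades left NULL).
    INSERT INTO course (dept, num) VALUES ('CPSC', '304'), ('MATH', '100');
    INSERT INTO enrolled (sid, dept, num) VALUES (92001200, 'CPSC', '304'),
                                                 (93001250, 'CPSC', '304');
""")

rows = conn.execute(
    "SELECT dept, num, enrollment FROM course_info ORDER BY dept, num"
).fetchall()
print(rows)  # [('CPSC', '304', 2), ('MATH', '100', 0)]
```

Users of the view see only dept, num, and enrollment; they never need to know how enrollment is computed or that `enrolled` exists.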
An Instance of the University Database
student
sid      | name       | address                | major | gpa
92001200 | Smith G.   | 1234 W. 12th Ave, Van. | CPSC  | 85
93001250 | Chan J.    | 2556 Fraser St., Van.  | MATH  | 80
94001150 | Campeau J. | null                   | null  | null

course
dept | num | title               | credits
CPSC | 124 | Principles of CS I  |
CPSC | 126 | Principles of CS II |
MATH | 100 | Calculus I          |
etc.
Database Primary Users
- End users (including senior management)
- DB application programmers
- Webmasters
- DB administrator (DBA):
  - maintains the conceptual, physical, and external schemas
    - modifies the design as requirements change; determines the impact
  - handles security and authorization
  - handles backups and recoveries
  - checks database performance and performs the necessary tuning
    - indexes, reorganization, query analysis
Structure of a DBMS
- A typical DBMS has a layered architecture.
- This is one of several possible architectures; each system has its own variations.
[Figure: layered DBMS architecture, top to bottom: Query Parsing and Optimization; Query (Operator) Evaluator; Files and Access Methods; Buffer Management; Disk Space Management; DB. Alongside the layers: Transaction & Lock Manager and Recovery Manager.]
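The Buffer Management layer keeps recently used disk pages in memory so that the layers above rarely wait on the disk. Below is a toy sketch, assuming least-recently-used (LRU) replacement (real systems use various policies) and ignoring dirty-page write-back and concurrency. `BufferPool`, `pin`, and `unpin` are hypothetical names, and a Python dict stands in for the Disk Space Management layer.

```python
from collections import OrderedDict

class BufferPool:
    """Toy buffer manager: caches up to `capacity` pages in memory and evicts
    the least recently used page that no one is currently using (pinned)."""

    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk              # dict page_id -> bytes; stands in for disk
        self.frames = OrderedDict()   # page_id -> data, least recently used first
        self.pins = {}                # page_id -> number of current users
        self.reads = 0                # simulated disk reads (buffer misses)

    def pin(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)         # hit: now most recently used
        else:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = self.disk[page_id]  # miss: "read" the page
            self.reads += 1
        self.pins[page_id] = self.pins.get(page_id, 0) + 1
        return self.frames[page_id]

    def unpin(self, page_id):
        self.pins[page_id] -= 1

    def _evict(self):
        # Oldest unpinned page is the victim; pinned pages must stay in memory.
        victim = next((p for p in self.frames if self.pins.get(p, 0) == 0), None)
        if victim is None:
            raise RuntimeError("buffer is full and every page is pinned")
        del self.frames[victim]
        self.pins.pop(victim, None)

disk = {i: f"page {i}".encode() for i in range(5)}
pool = BufferPool(capacity=2, disk=disk)
for pid in [0, 1, 0, 2, 0, 1]:
    pool.pin(pid)
    pool.unpin(pid)
print(pool.reads, list(pool.frames))  # 4 [0, 1]
```

Six page requests cost only four disk reads here, because the repeated requests for page 0 are served from memory.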
Key Takeaways from this lecture
- Why DBMS?
- Physical, logical, and conceptual models
- Schema and instance
- (Super) simple queries using SQL and Datalog
Summary
- DBMSs provide many benefits in maintaining & querying large datasets, including:
  - abstract representations of the data at different levels
  - recovery from system crashes
  - concurrent access, data integrity, and security
  - quick application development
- The main goal of 304 is to:
  - give you knowledge of how a DB application works
  - train you as a database programmer
  - provide you with the first steps toward becoming a DBA
- 404, on the other hand:
  - looks deeper into a DBMS
  - trains you more as a DBA