Normalization
(UNF to 3NF)
For AFW2851 use only
Disclaimer
This guide is NOT designed to replace the lecture
notes; it is meant to make reading the lecture
notes EASIER.
This guide will NOT tell you what a primary/foreign
key is (again, lecture notes!).
The examples in this guide are from the lecture notes
because, with a different way of explaining things,
hopefully you can understand the lecture notes better.
Finally, I CAN be wrong, so let me know if you see
something you feel is not right so I can help make this
guide better.
UNF
You should have seen this table before:
However, the way this table is shown might not be intuitive (it's
CORRECT, just that you might not get the right idea of what UNF
really is).
Now, if you look at, say, project number 18, you might think that
project is represented by 4 rows, right?
Not quite. It's actually one row, just that real-life databases can only
show it this way.
Fortunately, there are other ways of showing project number 18 (and all
the other projects):

PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
                      118      James J. Frommer     General Support        $18.36    45.3
                      104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
                      102      Darlene M. Smithson  DSS Analyst            $45.95    44

In this way, it is much easier to spot repeating groups, and you can
see that project 18 is really more like 1 row, but with many different
values of emp_name, job_class, etc.
Repeating groups are attributes/columns that have many values
associated with one value of the key attributes/columns in the table
(most likely proj_num).
Note: I may use the words attribute/column interchangeably in this
guide, but we (both you and I) should all try to stick to the word
attribute.

CUST_ID  CUST_NAME      CUST_ADDRESS           CUST_PHONE  CUST_EMAIL
23       John Travolta  No. 22, Beverly Hills  555-3341    jt@hollywood.com

It's possible for any customer to have more than one email address.
So if we add one more email address, it will look like this:

CUST_ID  CUST_NAME      CUST_ADDRESS           CUST_PHONE  CUST_EMAIL
23       John Travolta  No. 22, Beverly Hills  555-3341    jt@hollywood.com
23       John Travolta  No. 22, Beverly Hills  555-3341    john.t@beverly.com

So the issue is figuring out which attributes can have multiple values.
If an attribute can, it is definitely part of a repeating group.
If you know for a fact that an attribute can only have one value
(whether by common sense or because the question says so), then all is
good!
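
If it helps to see that check spelled out, here is a minimal Python sketch (not from the lecture notes). It assumes CUST_ID is the key of the customer example, and the function name multi_valued_attributes is made up for illustration. Keep in mind that data can only show that an attribute already has several values; whether it can have several is still down to common sense or the question.

```python
from collections import defaultdict

# The two customer rows from the example above. CUST_ID is assumed to be the key.
rows = [
    {"CUST_ID": 23, "CUST_NAME": "John Travolta",
     "CUST_ADDRESS": "No. 22, Beverly Hills", "CUST_PHONE": "555-3341",
     "CUST_EMAIL": "jt@hollywood.com"},
    {"CUST_ID": 23, "CUST_NAME": "John Travolta",
     "CUST_ADDRESS": "No. 22, Beverly Hills", "CUST_PHONE": "555-3341",
     "CUST_EMAIL": "john.t@beverly.com"},
]


def multi_valued_attributes(rows, key):
    """Return the attributes that take more than one value for a single key value."""
    seen = defaultdict(lambda: defaultdict(set))
    for row in rows:
        for attribute, value in row.items():
            if attribute != key:
                seen[row[key]][attribute].add(value)
    return sorted({
        attribute
        for attributes in seen.values()
        for attribute, values in attributes.items()
        if len(values) > 1
    })


print(multi_valued_attributes(rows, "CUST_ID"))  # ['CUST_EMAIL']
```
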
So just copy the values of proj_num and proj_name into the empty
cells (this process actually divides 1 row into multiple rows).
Before Step 1

PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
                      118      James J. Frommer     General Support        $18.36    45.3
                      104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
                      102      Darlene M. Smithson  DSS Analyst            $45.95    44

After Step 1

PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
18        Amber Wave  118      James J. Frommer     General Support        $18.36    45.3
18        Amber Wave  104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
18        Amber Wave  102      Darlene M. Smithson  DSS Analyst            $45.95    44

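
The copy-down step can also be sketched in Python. This is only an illustration, not how the lecture notes do it: project 18 is stored as one record with its repeating group in a nested list, and the names ASSIGNMENTS and fill_down are made up for this sketch.

```python
# One UNF "row": project 18, with its repeating group held in a nested list.
# The key ASSIGNMENTS is a made-up name for this sketch.
project_18 = {
    "PROJ_NUM": 18,
    "PROJ_NAME": "Amber Wave",
    "ASSIGNMENTS": [
        {"EMP_NUM": 114, "EMP_NAME": "Annelise Jones",
         "JOB_CLASS": "Applications Designer", "CHG_HOUR": 48.10, "HOURS": 24.6},
        {"EMP_NUM": 118, "EMP_NAME": "James J. Frommer",
         "JOB_CLASS": "General Support", "CHG_HOUR": 18.36, "HOURS": 45.3},
        {"EMP_NUM": 104, "EMP_NAME": "Anne K. Ramoras *",
         "JOB_CLASS": "Systems Analyst", "CHG_HOUR": 96.75, "HOURS": 32.4},
        {"EMP_NUM": 102, "EMP_NAME": "Darlene M. Smithson",
         "JOB_CLASS": "DSS Analyst", "CHG_HOUR": 45.95, "HOURS": 44.0},
    ],
}


def fill_down(project):
    """Copy PROJ_NUM and PROJ_NAME onto every member of the repeating group."""
    rows = []
    for assignment in project["ASSIGNMENTS"]:
        row = {"PROJ_NUM": project["PROJ_NUM"], "PROJ_NAME": project["PROJ_NAME"]}
        row.update(assignment)
        rows.append(row)
    return rows


flat_rows = fill_down(project_18)
for row in flat_rows:
    print(row["PROJ_NUM"], row["PROJ_NAME"], row["EMP_NUM"], row["EMP_NAME"])
```

One nested record goes in and four flat rows come out, which is the "1 row becomes multiple rows" effect described above.
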
How to do it:
(1) Look at proj_num + proj_name: does this combination uniquely identify a row?
Nope, 18 Amber Wave can mean any of the 4 rows.
(2) Look at proj_num + emp_num: Yes! 18 114 uniquely identifies the first row.
(3) Look at proj_num + emp_name: Yes! But emp_num already does the job well
enough, so ignore this. (Besides, you never know if an employee has multiple
names.)
(4) Proj_num and job_class? No, you could have many Systems Analysts in project
18. So 18 Systems Analyst would not uniquely identify a row.
(5) Repeat the process of finding combinations. If you can't find a combination of
2 attributes, look at combinations of 3, then 4, and so on (the highest combination
size you should reach is the total number of attributes - 1; can you guess why?).
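
The search in steps (1) to (5) can be sketched as a brute-force loop in Python. Treat this as a rough illustration only. The first four rows are the project 18 rows from this guide; the last two rows are HYPOTHETICAL, added so the data reflects the "what if" reasoning in the steps (a second Systems Analyst on project 18, and an employee who works on a second project). The function names are made up too.

```python
from itertools import combinations

COLUMNS = ("PROJ_NUM", "PROJ_NAME", "EMP_NUM", "EMP_NAME",
           "JOB_CLASS", "CHG_HOUR", "HOURS")

ROWS = [
    # The four project 18 rows from this guide.
    (18, "Amber Wave", 114, "Annelise Jones", "Applications Designer", 48.10, 24.6),
    (18, "Amber Wave", 118, "James J. Frommer", "General Support", 18.36, 45.3),
    (18, "Amber Wave", 104, "Anne K. Ramoras *", "Systems Analyst", 96.75, 32.4),
    (18, "Amber Wave", 102, "Darlene M. Smithson", "DSS Analyst", 45.95, 44.0),
    # HYPOTHETICAL rows, not from the lecture notes.
    (18, "Amber Wave", 199, "Made-Up Analyst", "Systems Analyst", 96.75, 32.4),
    (99, "Made-Up Project", 114, "Annelise Jones", "Applications Designer", 48.10, 24.6),
]


def identifies_rows(attributes):
    """True if no two rows share the same values for the given attributes."""
    positions = [COLUMNS.index(name) for name in attributes]
    seen = set()
    for row in ROWS:
        values = tuple(row[p] for p in positions)
        if values in seen:
            return False
        seen.add(values)
    return True


def smallest_identifying_combinations():
    """Try combinations of 1 attribute, then 2, then 3, ... and stop at the first hits."""
    for size in range(1, len(COLUMNS) + 1):
        found = [c for c in combinations(COLUMNS, size) if identifies_rows(c)]
        if found:
            return found
    return []


print(identifies_rows(("PROJ_NUM", "PROJ_NAME")))   # step (1): False
print(identifies_rows(("PROJ_NUM", "EMP_NUM")))     # step (2): True
print(identifies_rows(("PROJ_NUM", "JOB_CLASS")))   # step (4): False
print(smallest_identifying_combinations())
```

On this sample the loop also reports the name-based combinations (for example proj_num + emp_name). As step (3) says, the number-based one is the better choice, and that judgement is something the data alone cannot make for you.
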

PROJ_NUM  EMP_NUM  PROJ_NAME   EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        114      Amber Wave  Annelise Jones       Applications Designer  $48.10    24.6
18        118      Amber Wave  James J. Frommer     General Support        $18.36    45.3
18        104      Amber Wave  Anne K. Ramoras *    Systems Analyst        $96.75    32.4
18        102      Amber Wave  Darlene M. Smithson  DSS Analyst            $45.95    44

Before Step 1

PROJ_NUM  PROJ_NAME   EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        Amber Wave  114      Annelise Jones       Applications Designer  $48.10    24.6
                      118      James J. Frommer     General Support        $18.36    45.3
                      104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
                      102      Darlene M. Smithson  DSS Analyst            $45.95    44

We normally nominate the id

After Step 1

Projects
PROJ_NUM  PROJ_NAME
18        Amber Wave

Employee_Assignments
PROJ_NUM  EMP_NUM  EMP_NAME             JOB_CLASS              CHG_HOUR  HOURS
18        114      Annelise Jones       Applications Designer  $48.10    24.6
18        118      James J. Frommer     General Support        $18.36    45.3
18        104      Anne K. Ramoras *    Systems Analyst        $96.75    32.4
18        102      Darlene M. Smithson  DSS Analyst            $45.95    44

End of 1NF
Method 1
Rpt_Format(proj_num, proj_name, emp_num, emp_name, job_class, chg_hour, hours)
Method 2
Projects(proj_num, proj_name)
Employee Assignments(proj_num, emp_num, emp_name, job_class, chg_hour, hours)
Note: It is impossible for a table whose primary key is a single attribute to
have a partial dependency. Can you guess why?
Non-key attribute = any attribute that is not part of a primary key or a foreign key
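
Here is one way to express the partial dependency test in Python. It is a sketch under assumptions, not the lecture notes' method: dependencies are written down by hand as (determinant, dependent) pairs, the primary key is assumed to be PROJ_NUM + EMP_NUM, and the (PROJ_NUM, EMP_NUM) -> HOURS entry is my own addition for contrast. The name is_partial is made up.

```python
def is_partial(determinant, primary_key):
    """True when the determinant is only PART of the primary key."""
    determinant = frozenset(determinant)
    primary_key = frozenset(primary_key)
    # "<" on sets means "proper subset": inside the key, but not the whole key.
    return bool(determinant) and determinant < primary_key


PRIMARY_KEY = {"PROJ_NUM", "EMP_NUM"}  # assumed composite key of the 1NF table

DEPENDENCIES = [
    ({"PROJ_NUM"}, "PROJ_NAME"),
    ({"EMP_NUM"}, "EMP_NAME"),
    ({"PROJ_NUM", "EMP_NUM"}, "HOURS"),  # assumption: needs the whole key
]

partial = [(sorted(lhs), rhs) for lhs, rhs in DEPENDENCIES
           if is_partial(lhs, PRIMARY_KEY)]
print(partial)  # [(['PROJ_NUM'], 'PROJ_NAME'), (['EMP_NUM'], 'EMP_NAME')]

# With a single-attribute key there is no "part of the key" left to depend on.
print(is_partial({"PROJ_NUM"}, {"PROJ_NUM"}))  # False
```
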
[Figure: the 1NF table with its partial dependencies marked: PROJ_NUM -> PROJ_NAME and EMP_NUM -> EMP_NAME]
1NF to 2NF
Now that you have identified all the partial dependencies and written them down,
create new tables based on these dependencies, and name the tables.
Rpt_Format(PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
Partial Dependencies:
(PROJ_NUM -> PROJ_NAME)
(EMP_NUM -> EMP_NAME)
Attributes that are moved into the new tables don't have to stay in the old
table. So after 2NF...

Projects
PROJ_NUM  PROJ_NAME

Employees
EMP_NUM  EMP_NAME

Employee Assignments
PROJ_NUM  EMP_NUM  JOB_CLASS  CHG_HOUR  HOURS

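
As a rough illustration (my own sketch, not part of the lecture notes), the split into these three tables is just "keep some columns, drop duplicate rows". The helper name project_columns is made up.

```python
COLUMNS = ("PROJ_NUM", "EMP_NUM", "PROJ_NAME", "EMP_NAME",
           "JOB_CLASS", "CHG_HOUR", "HOURS")

# The 1NF rows for project 18, as listed earlier in this guide.
ROWS_1NF = [
    (18, 114, "Amber Wave", "Annelise Jones", "Applications Designer", 48.10, 24.6),
    (18, 118, "Amber Wave", "James J. Frommer", "General Support", 18.36, 45.3),
    (18, 104, "Amber Wave", "Anne K. Ramoras *", "Systems Analyst", 96.75, 32.4),
    (18, 102, "Amber Wave", "Darlene M. Smithson", "DSS Analyst", 45.95, 44.0),
]


def project_columns(columns):
    """Keep only the named columns and drop duplicate rows (order preserved)."""
    positions = [COLUMNS.index(name) for name in columns]
    result = []
    for row in ROWS_1NF:
        values = tuple(row[p] for p in positions)
        if values not in result:
            result.append(values)
    return result


projects = project_columns(("PROJ_NUM", "PROJ_NAME"))
employees = project_columns(("EMP_NUM", "EMP_NAME"))
employee_assignments = project_columns(
    ("PROJ_NUM", "EMP_NUM", "JOB_CLASS", "CHG_HOUR", "HOURS"))

print(projects)  # [(18, 'Amber Wave')]
print(employees)
print(employee_assignments)
```

Notice that "Amber Wave" is now stored once instead of four times, which is the point of removing the partial dependency.
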
[Figure: the Method 2 tables Projects (PROJ_NUM, PROJ_NAME) and Employee_Assignments, with the partial dependency on EMP_NAME marked]
Partial Dependencies:
Emp_num -> emp_name
End of 2NF
At 1NF for Method 2...
Projects(proj_num, proj_name)
Employee Assignments(proj_num, emp_num, emp_name, job_class, chg_hour, hours)
Partial Dependencies:
Emp_num -> emp_name

Projects
PROJ_NUM  PROJ_NAME

Employee Assignments
PROJ_NUM  EMP_NUM  JOB_CLASS  CHG_HOUR  HOURS

Employees
EMP_NUM  EMP_NAME

End of Method 1 = End of Method 2! (Not always the case, just for this example.)
Transitive Dependencies
Transitive dependencies are easy to look for. Just see if any non-key attribute
is completely determined by another non-key attribute.

Projects
PROJ_NUM  PROJ_NAME

Employee Assignments
PROJ_NUM  EMP_NUM  JOB_CLASS  CHG_HOUR  HOURS

Employees
EMP_NUM  EMP_NAME

The Projects and Employees tables only have 1 non-PK attribute each, so they are
definitely in 3NF already! So we look at the Employee Assignments table:
Does job_class affect chg_hour? Yes! DSS analysts definitely earn a specific
amount, for example.
Does job_class affect hours? No, 2 people with the same job can work a
different number of hours.
Does chg_hour affect job_class? No! 2 people with different jobs can charge
the same amount! (This is an example where A influences B, but B doesn't
influence A.)
Chg_hour doesn't influence hours. Hours doesn't influence anything.
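
The questions above can be turned into a small Python check, sketched below under loud assumptions. The first four rows are the project 18 values from this guide; the last two rows are HYPOTHETICAL, invented to play out the "2 people with..." arguments. Also note the limits of this approach: sample data can only rule a dependency OUT (by finding a counterexample), it can never prove one. The name contradicted is made up.

```python
from itertools import permutations

NON_KEY = ("JOB_CLASS", "CHG_HOUR", "HOURS")

ROWS = [
    # Non-key values of the four project 18 rows from this guide.
    {"JOB_CLASS": "Applications Designer", "CHG_HOUR": 48.10, "HOURS": 24.6},
    {"JOB_CLASS": "General Support", "CHG_HOUR": 18.36, "HOURS": 45.3},
    {"JOB_CLASS": "Systems Analyst", "CHG_HOUR": 96.75, "HOURS": 32.4},
    {"JOB_CLASS": "DSS Analyst", "CHG_HOUR": 45.95, "HOURS": 44.0},
    # HYPOTHETICAL: a different job that happens to charge the same amount.
    {"JOB_CLASS": "Made-Up Job", "CHG_HOUR": 45.95, "HOURS": 44.0},
    # HYPOTHETICAL: a second DSS Analyst who worked different hours.
    {"JOB_CLASS": "DSS Analyst", "CHG_HOUR": 45.95, "HOURS": 24.6},
]


def contradicted(a, b):
    """True if two rows share a value of `a` but disagree on `b`."""
    first_seen = {}
    for row in ROWS:
        # setdefault stores row[b] the first time row[a] appears,
        # and returns the stored value on every later appearance.
        if first_seen.setdefault(row[a], row[b]) != row[b]:
            return True
    return False


survivors = [(a, b) for a, b in permutations(NON_KEY, 2) if not contradicted(a, b)]
print(survivors)  # [('JOB_CLASS', 'CHG_HOUR')]
```
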
2NF to 3NF
Now that you have identified all the transitive dependencies and written them
down, create new tables based on these dependencies, and name the tables.
At 2NF...
Projects(proj_num, proj_name)
Employees(emp_num, emp_name)
Employee Assignments(proj_num, emp_num, job_class, chg_hour, hours)
Transitive Dependencies:
Job_class -> chg_hour

Projects
PROJ_NUM  PROJ_NAME

Employees
EMP_NUM  EMP_NAME

Employee Assignments
PROJ_NUM  EMP_NUM  JOB_CLASS  CHG_HOUR  HOURS
(Transitive dependency: JOB_CLASS -> CHG_HOUR)

End of 3NF

Projects
PROJ_NUM  PROJ_NAME

Employees
EMP_NUM  EMP_NAME

Employee Assignments
PROJ_NUM  EMP_NUM  JOB_CLASS  HOURS

Job Rates
JOB_CLASS  CHG_HOUR
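
As a sanity check, you can confirm that nothing was lost along the way by joining the four 3NF tables back together. The Python below is only a sketch: the tables are filled with the project 18 values from this guide, simple dictionaries stand in for key lookups, and rebuild_report is a made-up name.

```python
# The four 3NF tables, filled with the project 18 values from this guide.
projects = {18: "Amber Wave"}                      # PROJ_NUM -> PROJ_NAME
employees = {                                      # EMP_NUM -> EMP_NAME
    114: "Annelise Jones",
    118: "James J. Frommer",
    104: "Anne K. Ramoras *",
    102: "Darlene M. Smithson",
}
job_rates = {                                      # JOB_CLASS -> CHG_HOUR
    "Applications Designer": 48.10,
    "General Support": 18.36,
    "Systems Analyst": 96.75,
    "DSS Analyst": 45.95,
}
employee_assignments = [                           # PROJ_NUM, EMP_NUM, JOB_CLASS, HOURS
    (18, 114, "Applications Designer", 24.6),
    (18, 118, "General Support", 45.3),
    (18, 104, "Systems Analyst", 32.4),
    (18, 102, "DSS Analyst", 44.0),
]


def rebuild_report():
    """Follow each foreign key to rebuild the original wide rows."""
    report = []
    for proj_num, emp_num, job_class, hours in employee_assignments:
        report.append((
            proj_num, projects[proj_num],
            emp_num, employees[emp_num],
            job_class, job_rates[job_class],
            hours,
        ))
    return report


for row in rebuild_report():
    print(row)
```

Each printed row matches a row of the 1NF table (PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS), so the 3NF tables still hold all the original information.
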