Database Normalization Explained in Simple English
To start, let's look at some data which hasn't been normalized and discuss
some potential pitfalls. Once these are understood, I think you'll better
appreciate the reason to normalize the data. Consider the following table:
The first thing to notice is this table serves many purposes including:
1.
2.
3.
4.
2.
currently in Chicago. Our table is a small example, but if it were larger,
this could potentially involve hundreds of updates.
Also consider what would happen if John Hunt quits. If we remove his
entry, then we lose the information for New York.
These situations are modification anomalies. There are three modification
anomalies that can occur:
Insert Anomaly
There are facts we cannot record until we know information for the entire row.
In our example we cannot record a new sales office until we also know the
salesperson. Why? Because in order to create the record, we need to provide
a primary key. In our case this is the EmployeeID.
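
Here is a minimal sketch of that situation, using Python's built-in sqlite3 module. Only EmployeeID comes from our example; the other column names, the Atlanta office, and the phone number are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical single-table design. INT (rather than INTEGER) is used so that
# SQLite does not silently generate a key when none is supplied.
conn.execute("""
    CREATE TABLE SalesStaff (
        EmployeeID   INT NOT NULL PRIMARY KEY,
        SalesPerson  TEXT,
        SalesOffice  TEXT,
        OfficeNumber TEXT
    )
""")

# Try to record a new office before any salesperson has been assigned to it.
try:
    conn.execute(
        "INSERT INTO SalesStaff (SalesOffice, OfficeNumber) "
        "VALUES ('Atlanta', '404-555-0100')"
    )
    office_recorded = True
except sqlite3.IntegrityError as exc:
    # No EmployeeID means no primary key, so the row is rejected.
    office_recorded = False
    print("Insert rejected:", exc)
```

The office is a perfectly good fact on its own, but this table has nowhere to put it until an employee exists.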
Update Anomaly
The same information is recorded in multiple rows. For instance, if the office
number changes, then there are multiple updates that need to be made. If
these updates are not successfully completed across all rows, then an
inconsistency occurs.
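
To see how such an inconsistency can creep in, here is a small sketch with Python's sqlite3 module. John Hunt and the two offices come from our example; the other salespeople, the phone numbers, and the column names apart from EmployeeID are placeholders I invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SalesStaff (
        EmployeeID   INT NOT NULL PRIMARY KEY,
        SalesPerson  TEXT,
        SalesOffice  TEXT,
        OfficeNumber TEXT
    )
""")
conn.executemany(
    "INSERT INTO SalesStaff VALUES (?, ?, ?, ?)",
    [
        (1, "Salesperson A", "Chicago", "312-555-0100"),
        (2, "John Hunt", "New York", "212-555-0100"),
        (3, "Salesperson B", "Chicago", "312-555-0100"),
    ],
)

# The Chicago number changes, but the update only reaches one of the two
# Chicago rows (here because it was filtered by employee, not by office).
conn.execute(
    "UPDATE SalesStaff SET OfficeNumber = '312-555-0199' WHERE EmployeeID = 1"
)

# Any office that now reports more than one number is inconsistent.
inconsistent = conn.execute("""
    SELECT SalesOffice, COUNT(DISTINCT OfficeNumber)
    FROM SalesStaff
    GROUP BY SalesOffice
    HAVING COUNT(DISTINCT OfficeNumber) > 1
""").fetchall()
print(inconsistent)  # [('Chicago', 2)]
```

Nothing in the table stops this from happening; the database has no way to know that the two Chicago rows are supposed to agree.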
Deletion Anomaly
Deletion of a row can cause more than one set of facts to be removed. For
instance, if John Hunt retires, then deleting that row causes us to lose
information about the New York office.
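
The John Hunt case can be sketched the same way. As before, this uses Python's sqlite3 module, and everything except John Hunt, the office names, and EmployeeID is a placeholder of my own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SalesStaff (
        EmployeeID   INT NOT NULL PRIMARY KEY,
        SalesPerson  TEXT,
        SalesOffice  TEXT,
        OfficeNumber TEXT
    )
""")
conn.executemany(
    "INSERT INTO SalesStaff VALUES (?, ?, ?, ?)",
    [
        (1, "Salesperson A", "Chicago", "312-555-0100"),
        (2, "John Hunt", "New York", "212-555-0100"),
        (3, "Salesperson B", "Chicago", "312-555-0100"),
    ],
)

def known_offices(connection):
    """Return the distinct office names currently stored in the table."""
    rows = connection.execute(
        "SELECT DISTINCT SalesOffice FROM SalesStaff ORDER BY SalesOffice"
    )
    return [row[0] for row in rows]

offices_before = known_offices(conn)  # ['Chicago', 'New York']

# John Hunt is the only salesperson in New York, so removing his row
# also removes the only record of that office and its number.
conn.execute("DELETE FROM SalesStaff WHERE SalesPerson = 'John Hunt'")

offices_after = known_offices(conn)   # ['Chicago']
print(offices_before, offices_after)
```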
Clearly, if the customers were somehow in one column, our query would be
simpler. Also, consider if you want to run a query and sort by customer. The
way the table is currently defined, this isn't possible unless you use three
separate queries with a UNION.
These anomalies can be eliminated or reduced by properly separating the data
into different tables, so that each table serves a single purpose. The process
to do this is called normalization, and the various stages you can achieve are
called the normal forms.
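
As a rough illustration of the three-query UNION mentioned above, here is a sketch using Python's sqlite3 module. I'm assuming the customers sit in three columns named Customer1, Customer2, and Customer3; those column names, the customer names, and the salespeople other than John Hunt are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SalesStaff (
        EmployeeID  INT NOT NULL PRIMARY KEY,
        SalesPerson TEXT,
        Customer1   TEXT,
        Customer2   TEXT,
        Customer3   TEXT
    )
""")
conn.executemany(
    "INSERT INTO SalesStaff VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Salesperson A", "Acme", "Globex", None),
        (2, "John Hunt", "Initech", "Umbrella", "Hooli"),
        (3, "Salesperson B", "Bravo", None, None),
    ],
)

# One SELECT per customer column, glued together with UNION so the
# combined result can be sorted by customer.
rows = conn.execute("""
    SELECT Customer1 AS Customer, SalesPerson
    FROM SalesStaff WHERE Customer1 IS NOT NULL
    UNION
    SELECT Customer2, SalesPerson
    FROM SalesStaff WHERE Customer2 IS NOT NULL
    UNION
    SELECT Customer3, SalesPerson
    FROM SalesStaff WHERE Customer3 IS NOT NULL
    ORDER BY Customer
""").fetchall()

for customer, sales_person in rows:
    print(customer, "-", sales_person)
```

If a fourth customer column were ever added, this query would need a fourth SELECT, which is a good hint that the customers want a table of their own.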
Definition of Normalization
There are three common forms of normalization: 1st, 2nd, and 3rd normal form.
There are several additional forms, such as BCNF, but I consider those
advanced, and not too necessary to learn in the beginning. The forms are
progressive, meaning that to qualify for 3rd normal form a table must first
satisfy the rules for 2nd normal form, and 2nd normal form must adhere to those
for 1st normal form. Before we discuss the various forms and rules in detail,
let's summarize the various forms:
First Normal Form: The information is stored in a relational table, each
column contains atomic values, and there are no repeating groups of columns.
Second Normal Form: The table is in first normal form and all the
columns depend on the table's primary key.
Third Normal Form: The table is in second normal form and none of its
columns is transitively dependent on the primary key.
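
To give a feel for where these rules lead, here is one possible split of the sales data into single-purpose tables, sketched with Python's sqlite3 module. The table names, column names, and sample values are my own assumptions, not a definitive design; customers would get a table of their own in the same spirit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Facts about offices live here, once per office.
    CREATE TABLE SalesOffice (
        OfficeID     INTEGER PRIMARY KEY,
        OfficeName   TEXT NOT NULL UNIQUE,
        OfficeNumber TEXT
    );
    -- Facts about salespeople live here, with a pointer to their office.
    CREATE TABLE SalesPerson (
        EmployeeID INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL,
        OfficeID   INTEGER REFERENCES SalesOffice(OfficeID)
    );
""")
conn.executemany(
    "INSERT INTO SalesOffice VALUES (?, ?, ?)",
    [(1, "Chicago", "312-555-0100"), (2, "New York", "212-555-0100")],
)
conn.executemany(
    "INSERT INTO SalesPerson VALUES (?, ?, ?)",
    [(1, "Salesperson A", 1), (2, "John Hunt", 2), (3, "Salesperson B", 1)],
)

# Insert: a new office can be recorded before anyone works there.
conn.execute(
    "INSERT INTO SalesOffice (OfficeName, OfficeNumber) "
    "VALUES ('Atlanta', '404-555-0100')"
)

# Update: the Chicago number is stored once, so one row changes.
cursor = conn.execute(
    "UPDATE SalesOffice SET OfficeNumber = '312-555-0199' "
    "WHERE OfficeName = 'Chicago'"
)
rows_updated = cursor.rowcount

# Delete: removing John Hunt leaves the New York office in place.
conn.execute("DELETE FROM SalesPerson WHERE Name = 'John Hunt'")
offices = [
    row[0]
    for row in conn.execute("SELECT OfficeName FROM SalesOffice ORDER BY OfficeName")
]
print(rows_updated, offices)  # 1 ['Atlanta', 'Chicago', 'New York']
```

Each of the three modification anomalies described earlier goes away here simply because every fact has exactly one home.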
Do not get too hung up if you don't know what these rules mean at the
moment; we'll explain them in detail in the next post using examples. For now
it's important to understand there are three rules which build upon each
other.
More tutorials are to follow! Remember, if you have other questions you want
answered, post a comment or tweet me. I'm here to help you. What other topics
would you like to know more about?