Lecture-7
De-normalization
1
Shah, 2015
Shah, 2015
2
Shah, 2015
De-normalization Normalization
Too many tables
4+ Normal Forms
3rd Normal Form
2nd Normal Form
Data Cubes 1st Normal Form
Data Lists
Flat Table One big flat file
3
Shah, 2015
It is not chaos, more like a “controlled crash”
with the aim of performance enhancement
without loss of information.
Normalization is a rule of thumb in DBMS, but in
DSS ease of use is achieved by way of
denormalization.
De-normalization comes in many flavors, such as
combining tables, splitting tables, adding data etc.,
but all done very carefully.
4
Shah, 2015
Bringing “close” dispersed but related data items.
Query performance in DSS significantly dependent
on physical data model.
Very early studies showed performance difference in
orders of magnitude for different number de-
normalized tables and rows per table.
The level of de-normalization should be carefully
considered.
5
Shah, 2015
De-normalization specifically improves
performance by either:
Reducing the number of tables and hence the reliance
on joins, which consequently speeds up performance.
Reducing the number of joins required during query
execution, or
Reducing the number of rows to be retrieved from the
Primary Data Table.
6
Shah, 2015
1. Carefully do a cost-benefit analysis
(frequency of use, additional storage, join
time).
2. Do a data requirement and storage
analysis.
3. Weigh against the maintenance issue of
the redundant data (triggers used).
4. When in doubt, don’t denormalize.
7
Shah, 2015
Dealing with the abundance of star schemas.
Fast access of time series data for analysis.
Fast aggregate (sum, average etc.) results and complicated
calculations.
Multidimensional analysis (e.g. geography) in a complex
hierarchy.
Dealing with few updates but many join queries.
De-normalization will ultimately affect the database size and query
performance.
8
Shah, 2015
1. Collapsing Tables.
- Two entities with a One-to-One relationship.
- Two entities with a Many-to-Many relationship.
2. Splitting Tables (Horizontal/Vertical Splitting).
3. Pre-Joining.
4. Adding Redundant Columns (Reference Data).
5. Derived Attributes (Summary, Total, Balance etc).
9
Shah, 2015
ColA ColB
denormalized
ColA ColB ColC
normalized
ColA ColC
Reduced storage space.
Reduced update time.
Does not changes business view.
Reduced foreign keys.
Reduced indexing.
10