Slowly Changing Dimension: Categories
Prof. Sunita Sahu
Assistant Prof, VESIT, Mumbai

o Slowly changing dimensions are dimensions whose attributes change slowly and irregularly over time, rather than on a regular, time-based schedule.
o In a data warehouse, changes to dimension attributes must be tracked so that historical data can be reported correctly.
o The usual changes to dimension tables are classified into three types:
• Type 1
• Type 2
• Type 3

[Figure: example star schema. The Order Facts table (Product Key, Time Key, Customer Key, Salesperson Key, Order Dollars, Cost Dollars, Margin Dollars, Sale Units) joins four dimensions: Product (Product Key, Product Name, Product Code, Product Line, Brand); Customer (Customer Key, Customer Name, Customer Code, Marital Status, Address, State, Zip); Salesperson (Salesperson Key, Salesperson Name, Territory, Region); Time (Time Key, Date, Month, Quarter, Year).]

Type 1 Changes: Error Correction
o Type 1 changes usually relate to correcting errors in the source system.
o For example, in the customer dimension, a customer's name is changed to correct a spelling mistake.

Type 1 Changes, cont.
General Principles for Type 1 changes:
• Usually, the changes relate to the correction of errors in the source system.
• Sometimes the change in the source system has no significance.
• The old value in the source system needs to be discarded.
• The change in the source system need not be preserved in the DWH.

Applying Type 1 Changes
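The following is a minimal Python sketch of a Type 1 change. It assumes the customer dimension is held in memory as a dict keyed by Customer_ID. `CustomerRow` and `apply_type1` are hypothetical names used only for illustration. The attribute is overwritten in place, as in the Cust_1 Corporate-to-Retail example, so no new row is added and the key stays the same.

```python
from dataclasses import dataclass, fields


@dataclass
class CustomerRow:
    customer_id: int    # key of the dimension row; a Type 1 change never touches it
    customer_name: str
    customer_type: str


def apply_type1(dim: dict, customer_id: int, attribute: str, new_value: str) -> None:
    """Type 1 change: overwrite one attribute in place; the old value is discarded."""
    allowed = {f.name for f in fields(CustomerRow)} - {"customer_id"}
    if attribute not in allowed:
        raise ValueError(f"cannot overwrite {attribute!r}")
    setattr(dim[customer_id], attribute, new_value)  # no new row is added


customer_dim = {1: CustomerRow(1, "Cust_1", "Corporate")}
apply_type1(customer_dim, 1, "customer_type", "Retail")
print(customer_dim[1])
# CustomerRow(customer_id=1, customer_name='Cust_1', customer_type='Retail')
```

Because the old value is simply lost, any report that ran before the change cannot be reproduced afterwards. That is acceptable only when the old value was an error or had no significance.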
• Overwrite the attribute value in the dimension table row with the new value.
• The old value of the attribute is not preserved.
• No other changes are made in the dimension table row.
• The key of this dimension table row and any other key values are not affected.
• Easiest to implement.
o Before the change:
Customer_ID Customer_Name Customer_Type
1 Cust_1 Corporate
o After the change:
Customer_ID Customer_Name Customer_Type
1 Cust_1 Retail

Type 2 Changes:
o Consider the marital status of a customer.
o One of the DWH's requirements is to track orders by marital status.
o All orders before 11/10/2004 should be counted under Marital Status = Single, and all orders after that date under Marital Status = Married.
o We need to aggregate the orders before and after the marriage separately.

Type 2 Changes, cont.
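The marital status requirement can be illustrated with a minimal Python sketch. It assumes the Type 2 change has already produced two customer rows, one per marital status, each with its own surrogate key. It also assumes each order fact carries the surrogate key that was current when the order was loaded. The customer code "C100", the order amounts, and the function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical customer dimension after a Type 2 change: the same customer
# (natural key "C100") appears twice, each version with its own surrogate key.
customer_dim = {
    1: {"customer_code": "C100", "marital_status": "Single"},
    2: {"customer_code": "C100", "marital_status": "Married"},
}

# Hypothetical order facts: key 1 was current before the marriage, key 2 after.
order_facts = [
    {"customer_key": 1, "order_dollars": 120.0},
    {"customer_key": 1, "order_dollars": 80.0},
    {"customer_key": 2, "order_dollars": 200.0},
]


def orders_by_marital_status(facts, dim):
    """Sum order dollars per marital status by joining facts to the dimension."""
    totals = defaultdict(float)
    for fact in facts:
        status = dim[fact["customer_key"]]["marital_status"]
        totals[status] += fact["order_dollars"]
    return dict(totals)


print(orders_by_marital_status(order_facts, customer_dim))
# {'Single': 200.0, 'Married': 200.0}
```

No date logic is needed at query time. The split between "before" and "after" the marriage is already encoded in which surrogate key each order references.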
o General Principles for Type 2 changes:
• They usually relate to true changes in source systems.
• There is a need to preserve history in the DWH. This type of change partitions the history in the DWH.
• Every change for the same attribute must be preserved.

Type 2 Implementation
o The steps:
• Add a new dimension table row with the new value of the changed attribute.
• An effective date is included in the dimension table.
• There are no changes to the original row in the dimension table.
• The key of the original row is not affected.
• The new row is inserted with a new surrogate key.

Type 2 Example
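The Type 2 steps can be sketched in Python as follows. This is a minimal version that assumes the dimension is an in-memory list of rows and that new surrogate keys are taken as the current maximum plus one. `CustomerRow`, `apply_type2`, and the change date 2011-01-01 are hypothetical. The 31-12-9999 end date follows the example table.

```python
from dataclasses import dataclass, replace
from datetime import date

# Open-ended end date, as in the 31-12-9999 values of the example table.
OPEN_END = date(9999, 12, 31)


@dataclass(frozen=True)
class CustomerRow:
    customer_key: int      # surrogate key: a new one for every version of the customer
    customer_name: str
    customer_type: str
    start_date: date       # effective date of this version
    end_date: date = OPEN_END


def apply_type2(dim: list, current: CustomerRow, effective: date, **changes) -> CustomerRow:
    """Type 2 change: append a new row carrying the new attribute value(s).

    Following the Type 2 steps, the original row (and its key) is left untouched.
    """
    new_key = max(row.customer_key for row in dim) + 1
    new_row = replace(current, customer_key=new_key, start_date=effective,
                      end_date=OPEN_END, **changes)
    dim.append(new_row)
    return new_row


customer_dim = [CustomerRow(1, "Cust_1", "Corporate", date(2010, 7, 22))]
apply_type2(customer_dim, customer_dim[0], date(2011, 1, 1), customer_type="Retail")
for row in customer_dim:
    print(row.customer_key, row.customer_name, row.customer_type,
          row.start_date, row.end_date)
# 1 Cust_1 Corporate 2010-07-22 9999-12-31
# 2 Cust_1 Retail 2011-01-01 9999-12-31
```

This sketch leaves the original row unchanged, as the steps describe. A common variant also sets the old row's End_Date to the day before the change, or keeps a current-row flag, so that queries can pick the current version directly.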
o Before the change:
Customer_ID Customer_Name Customer_Type Start_Date End_Date
1 Cust_1 Corporate 22-07-2010 31-12-9999
o After the change:
Customer_ID Customer_Name Customer_Type Start_Date End_Date
1 Cust_1 Corporate 22-07-2010 31-12-9999
2 Cust_1 Retail 22-07-2010 31-12-9999

Type 3 Changes
o In a Type 3 Slowly Changing Dimension, there are two columns for the particular attribute of interest: one holding the original value and one holding the current value.
o There is also a column that indicates when the current value became active.
o Not common at all
• Time-consuming
o We want to track history without carrying a heavy burden.
o There are many soft changes, and we don't care about the "far" history.

Type 3 Changes, cont.
o General Principles:
• They usually relate to "soft" or tentative changes in the source systems.
• There is a need to keep track of history with the old and new values of the changed attribute.
• They are used to compare performance across the transition.
• They provide the ability to track forward and backward.

Type 3
• No new dimension row is needed.
• Existing queries seamlessly switch to the current value.
• Any queries that need to use the old value must be revised accordingly.
• The technique works best for one soft change at a time.
• If there is a succession of changes, more sophisticated techniques must be devised.

Type 3 Example
o Before the change:
Customer Key Name State
1001 Williams New York
o After Williams moved from New York to Los Angeles, the original row is updated in place, giving the following table (assuming the change takes effect on February 20, 2010):
Customer Key Name Original State Current State Effective Date
1001 Williams New York Los Angeles 20-FEB-2010
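Below is a minimal Python sketch of the Type 3 update. It assumes one in-memory record per customer with separate original and current columns. `CustomerType3` and `apply_type3` are hypothetical names. The date of the second move is also hypothetical. The first update reproduces the Williams row above. The second update shows how a further change overwrites the current value.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerType3:
    customer_key: int
    name: str
    original_state: str
    current_state: str
    effective_date: Optional[date] = None  # when current_state became active


def apply_type3(row: CustomerType3, new_state: str, effective: date) -> None:
    """Type 3 change: update the existing row in place; no new row is added."""
    row.current_state = new_state        # original_state keeps the first value
    row.effective_date = effective


williams = CustomerType3(1001, "Williams", "New York", "New York")
apply_type3(williams, "Los Angeles", date(2010, 2, 20))
print(williams.original_state, "->", williams.current_state, williams.effective_date)
# New York -> Los Angeles 2010-02-20

# A second move (hypothetical date) overwrites current_state: "Los Angeles" is gone.
apply_type3(williams, "Texas", date(2011, 6, 1))
print(williams.original_state, "->", williams.current_state)
# New York -> Texas
```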
Type 3
o Advantages
• This does not increase the size of the table, since the existing row is updated in place.
• This allows us to keep part of the history.
o Disadvantages
• Type 3 cannot keep the full history when an attribute changes more than once. For example, if Williams later moves to Texas, Current State is overwritten with Texas and the Los Angeles information is lost.

Large Dimension Tables
o A dimension table can be large because of two factors:
• Very deep: the dimension has a very large number of rows.
• Very wide: the dimension has a large number of attributes or columns.
o In a data warehouse, the customer and product dimensions are typically the ones likely to be large.
o Such customer dimension tables may have as many as 100 million rows.
o The product dimension of a large retailer is also very large.

Junk Dimension
o A junk dimension is simply a structure that provides a convenient place to store junk attributes. It is a collection of random transactional codes, flags, and/or text attributes that are unrelated to any particular dimension.
o OLTP tables are often full of flag fields and yes/no attributes. Many of these are used for operational support and have no documentation except the column names and the memory of the person who created them. Such attributes do not integrate easily into conventional dimensions such as Customer, Vendor, Time, Location, and Product. You also don't want to carry bad design into the data warehouse. However, some of these miscellaneous attributes contain data with significant business value, so you have to do something with them.

Junk Dimension, cont.
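The following minimal Python sketch shows one way to build a junk dimension. It assumes three hypothetical Y/N flags (`is_gift`, `is_online`, `paid_by_card`). The junk dimension holds one row per combination of flag values, and each fact row stores a single junk key instead of the separate flags. The flag names, the order ID, and the helper `junk_key` are all illustrative. Enumerating every combination is one option. Loading only the combinations that actually occur is another.

```python
from itertools import product

# Hypothetical low-cardinality flags taken from a transaction record.
FLAG_NAMES = ("is_gift", "is_online", "paid_by_card")

# Junk dimension: one row per combination of Y/N values, keyed by a surrogate key.
junk_dim = {
    key: dict(zip(FLAG_NAMES, combo))
    for key, combo in enumerate(product("YN", repeat=len(FLAG_NAMES)), start=1)
}

# Reverse lookup used while loading facts: flag combination -> surrogate key.
junk_key_for = {tuple(row[name] for name in FLAG_NAMES): key
                for key, row in junk_dim.items()}


def junk_key(transaction: dict) -> int:
    """Replace the transaction's separate flags with a single junk dimension key."""
    return junk_key_for[tuple(transaction[name] for name in FLAG_NAMES)]


txn = {"order_id": 42, "is_gift": "N", "is_online": "Y", "paid_by_card": "Y"}
fact_row = {"order_id": txn["order_id"], "junk_key": junk_key(txn)}
print(len(junk_dim), fact_row)
# 8 {'order_id': 42, 'junk_key': 5}
```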
o Advantages of a junk dimension:
• It provides a recognizable location for related codes, indicators, and their descriptors in a dimensional framework.
• It avoids the creation of multiple dimension tables.
• It provides a smaller, quicker point of entry for queries than keeping these attributes directly in the fact table.
o An interesting use for a junk dimension is to capture the
context of a specific transaction. While our common,
conformed dimensions contain the key dimensional
attributes of interest, there are likely attributes about the
transaction that are not known until the transaction is
processed.

[Figure: example junk dimension with Y/N flag columns]

Rapidly Changing Dimensions
o A dimension is rapidly changing if one or more of its attributes changes frequently.
o When you deal with a Type 2 change, you create an additional dimension table row with the new value of the changed attribute. By doing so, you are able to preserve the history.
o Consider the customer dimension. Here the number of rows tends to be large, sometimes a million or more. But significant attributes in a customer dimension may change many times a year. Rapidly changing large dimensions can be too problematic for the Type 2 approach, because every change adds another row to an already large table.

Rapidly Changing Dimensions, cont.
o One effective approach is to break the large dimension table into one or more simpler dimension tables. How can you accomplish this?
o You break off the rapidly changing attributes into another dimension table, leaving the slowly changing attributes behind in the original table.

Solution to Rapidly Changing Dimensions
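Splitting off the rapidly changing attributes can be sketched in Python as follows. In Kimball's terminology the split-off table is usually called a mini-dimension. This minimal version assumes three hypothetical rapidly changing attributes (`income_band`, `credit_rating`, `purchase_frequency`) and an in-memory dict that maps each combination of their values to a surrogate key. `split_customer` and the sample values are illustrative only.

```python
# Hypothetical rapidly changing attributes of a customer.
RAPID_ATTRIBUTES = ("income_band", "credit_rating", "purchase_frequency")


def split_customer(record: dict, mini_dim: dict) -> tuple:
    """Split a customer record into its slowly changing part and a mini-dimension key.

    mini_dim maps each combination of rapidly changing values to a surrogate key,
    so a change in those values reuses or adds a small mini-dimension row instead
    of adding a new row to the large customer dimension.
    """
    profile = tuple(record[name] for name in RAPID_ATTRIBUTES)
    if profile not in mini_dim:
        mini_dim[profile] = len(mini_dim) + 1   # next surrogate key
    slow_part = {k: v for k, v in record.items() if k not in RAPID_ATTRIBUTES}
    return slow_part, mini_dim[profile]


mini_dim = {}
before = {"customer_key": 1001, "name": "Williams", "state": "New York",
          "income_band": "50-75K", "credit_rating": "A", "purchase_frequency": "High"}
after = dict(before, credit_rating="B")   # only a rapidly changing attribute changed

slow_before, key_before = split_customer(before, mini_dim)
slow_after, key_after = split_customer(after, mini_dim)
print(slow_before == slow_after, key_before, key_after)
# True 1 2
```

The customer row itself does not change. Each fact row then carries both the customer key and the mini-dimension key that was current when the fact was loaded.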
o Large dimensions call for special considerations.
o Because of the sheer size, many data warehouse
functions involving large dimensions may be slow
and inefficient.
o You need to address the following issues by using
effective design methods, by choosing proper
indexes, and by applying other optimizing
techniques: