C Lecture
The bottom-up approach starts with experiments and prototypes. This is useful in the early
stage of business modeling and technology development. It allows an organization to move
forward at considerably less expense and to evaluate the benefits of the technology before
making significant commitments.
In the combined approach, an organization can exploit the planned and strategic nature of
the top-down approach while retaining the rapid implementation and opportunistic
application of the bottom-up approach.
The warehouse design process consists of the following steps:
1|IT DWDM
1. Choose a business process to model, for example, orders, invoices, shipments, inventory,
account administration, sales, or the general ledger. If the business process is organizational
and involves multiple complex object collections, a data warehouse model should be
followed. However, if the process is departmental and focuses on the analysis of one kind of
business process, a data mart model should be chosen.
2. Choose the grain of the business process. The grain is the fundamental, atomic level of data
to be represented in the fact table for this process, for example, individual transactions,
individual daily snapshots, and so on.
3. Choose the dimensions that will apply to each fact table record. Typical dimensions are
time, item, customer, supplier, warehouse, transaction type, and status.
4. Choose the measures that will populate each fact table record. Typical measures are numeric
additive quantities like dollars sold and units sold.
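The last three steps can be illustrated with a minimal Python sketch. It assumes a grain of one row per individual sales transaction and uses hypothetical names (SalesFact, total_by, and the sample keys); none of these come from the lecture. Because the measures are additive, they can be summed along any chosen dimension.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesFact:
    # Dimension keys (hypothetical names), one per chosen dimension.
    time_key: str
    item_key: str
    customer_key: str
    # Additive measures.
    dollars_sold: float
    units_sold: int


# Assumed grain: one row per individual sales transaction.
facts = [
    SalesFact("2024-01-05", "I1", "C1", 20.0, 2),
    SalesFact("2024-01-05", "I2", "C1", 15.0, 1),
    SalesFact("2024-01-06", "I1", "C2", 10.0, 1),
]


def total_by(rows, dimension):
    """Sum both additive measures, grouped by one dimension key."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        key = getattr(row, dimension)
        totals[key][0] += row.dollars_sold
        totals[key][1] += row.units_sold
    return {key: (dollars, units) for key, (dollars, units) in totals.items()}


by_item = total_by(facts, "item_key")
by_day = total_by(facts, "time_key")
print(by_item)
print(by_day)
```

A coarser grain, such as daily snapshots, would store fewer rows but could no longer answer questions about individual transactions.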
A Three-Tier Data Warehouse Architecture:
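[Figure: three-tier data warehouse architecture. Top tier: front-end tools (query/report,
analysis) producing output. Middle tier: OLAP server. Bottom tier: data warehouse server with
metadata repository, fed from data sources by extract, clean, transform, load, and refresh.]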
Tier-1:
The bottom tier is a warehouse database server that is almost always a relational database system.
Back-end tools and utilities are used to feed data into the bottom tier from operational databases
or other external sources (such as customer profile information provided by external consultants).
These tools and utilities perform data extraction, cleaning, and transformation (e.g., to merge
similar data from different sources into a unified format), as well as load and refresh functions to
update the data warehouse. The data are extracted using application program interfaces known as
gateways. A gateway is supported by the underlying DBMS and allows client programs to generate
SQL code to be executed at a server. Examples of gateways include ODBC (Open Database
Connectivity) and OLE DB (Object Linking and Embedding, Database) by Microsoft and JDBC
(Java Database Connectivity). This tier also contains a metadata repository, which stores
information about the data warehouse and its contents.
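The extract, clean, transform, and load functions described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two in-memory sources, their field names, and the sales table are hypothetical, and an in-memory SQLite database (reached through Python's standard sqlite3 module) stands in for the warehouse server and its gateway.

```python
import sqlite3

# Two hypothetical sources that describe the same kind of data differently.
source_a = [{"cust": " Alice ", "amount_usd": "12.50"},
            {"cust": "Bob", "amount_usd": "7.00"}]
source_b = [("CAROL", 300), ("bob", None)]  # (name, amount in cents); None = missing


def extract():
    """Yield every raw record, tagged with the source it came from."""
    yield from (("a", rec) for rec in source_a)
    yield from (("b", rec) for rec in source_b)


def clean_and_transform(tagged):
    """Return (customer, dollars) in one unified format, or None if unusable."""
    origin, rec = tagged
    if origin == "a":
        name, dollars = rec["cust"], float(rec["amount_usd"])
    else:
        name, cents = rec
        if cents is None:  # cleaning: drop records with a missing amount
            return None
        dollars = cents / 100
    return (name.strip().title(), dollars)


def load(conn, rows):
    """Send parameterized SQL to the (stand-in) warehouse server."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, dollars REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
rows = [r for r in map(clean_and_transform, extract()) if r is not None]
load(conn, rows)
loaded = conn.execute(
    "SELECT customer, dollars FROM sales ORDER BY customer").fetchall()
print(loaded)
```

A refresh would rerun the same pipeline on only the records that changed since the previous load; that part is left out of the sketch.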
Tier-3:
The top tier is a front-end client layer, which contains query and reporting tools, analysis tools,
and/or data mining tools (e.g., trend analysis, prediction, and so on).
... a few gigabytes to hundreds of gigabytes, terabytes, or beyond. An enterprise data
warehouse may be implemented on traditional mainframes, computer super servers, or parallel
architecture platforms. It requires extensive business modeling and may take years to design
and build.
2. Data mart:
A data mart contains a subset of corporate-wide data that is of value to a specific group of
users. The scope is confined to specific selected subjects. For example, a marketing data
mart may confine its subjects to customer, item, and sales. The data contained in data
marts tend to be summarized.
Data marts are usually implemented on low-cost departmental servers that are
UNIX/LINUX- or Windows-based. The implementation cycle of a data mart is more
likely to be measured in weeks rather than months or years. However, it may involve
complex integration in the long run if its design and planning were not enterprise-wide.
Depending on the source of data, data marts can be categorized as independent or
dependent. Independent data marts are sourced from data captured from one or more
operational systems or external information providers, or from data generated locally
within a particular department or geographic area. Dependent data marts are sourced
directly from enterprise data warehouses.
3. Virtual warehouse:
A virtual warehouse is a set of views over operational databases. For efficient query
processing, only some of the possible summary views may be materialized.
A virtual warehouse is easy to build but requires excess capacity on operational database
servers.
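The difference between a plain view and a materialized summary view can be sketched with Python's standard sqlite3 module. The orders table and view names are hypothetical. SQLite has no native materialized views, so the sketch assumes a plain table holding a stored copy of the view result as a stand-in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical operational table.
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, item TEXT, dollars REAL);
INSERT INTO orders (item, dollars) VALUES ('tv', 400.0), ('tv', 450.0), ('radio', 50.0);

-- Virtual view: recomputed against the operational table on every query.
CREATE VIEW sales_by_item AS
    SELECT item, SUM(dollars) AS dollars_sold, COUNT(*) AS n_orders
    FROM orders GROUP BY item;
""")

# Stand-in for materialization: store the view's current result in a table.
conn.execute("CREATE TABLE sales_by_item_mat AS SELECT * FROM sales_by_item")

# A new operational row arrives after the summary was materialized.
conn.execute("INSERT INTO orders (item, dollars) VALUES ('radio', 60.0)")

virtual = dict(conn.execute("SELECT item, dollars_sold FROM sales_by_item"))
materialized = dict(conn.execute("SELECT item, dollars_sold FROM sales_by_item_mat"))
print(virtual)       # reflects the new row
print(materialized)  # stale until refreshed
```

The virtual view is always current but costs a scan of the operational table on each query, which is where the excess capacity requirement comes from. The stored copy is cheap to read but must be refreshed.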
Star schema:
[Figure: star schema for sales, with a central sales fact table and dimension tables including
item (item key, supplier type), branch, and location (location key, street, city, country).]
Snowflake schema:
A snowflake schema for AllElectronics sales is given in the figure. Here, the sales fact table is identical to that of
the star schema figure. The main difference between the two schemas is in the definition of dimension tables.
The single dimension table for item in the star schema is normalized in the snowflake schema, resulting in new
item and supplier tables. For example, the item dimension table now contains the attributes item key, item name,
brand, type, and supplier key, where supplier key is linked to the supplier dimension table, containing supplier key
and supplier type information. Similarly, the single dimension table for location in the star schema can be
normalized into two new tables: location and city. The city key in the new location table links to the city dimension.
Notice that further normalization can be performed on province or state and country in the snowflake schema.
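The item and supplier normalization described above can be sketched with Python's standard sqlite3 module. The column names follow the attributes listed in the text. The sample rows, the reduced sales fact table (item key plus two measures only), and the query are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked item dimension: supplier details live in their own table.
CREATE TABLE supplier (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT,
    supplier_key INTEGER REFERENCES supplier(supplier_key));
-- Simplified fact table for the sketch (other dimension keys omitted).
CREATE TABLE sales (item_key INTEGER REFERENCES item(item_key),
                    dollars_sold REAL, units_sold INTEGER);

INSERT INTO supplier VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item VALUES (10, 'TV', 'Acme', 'electronics', 1),
                        (11, 'Radio', 'Acme', 'electronics', 2),
                        (12, 'Phone', 'Zed', 'electronics', 1);
INSERT INTO sales VALUES (10, 400.0, 1), (11, 50.0, 2), (12, 300.0, 1), (10, 420.0, 1);
""")

# Reaching supplier_type takes two joins: fact -> item -> supplier.
by_supplier_type = conn.execute("""
    SELECT s.supplier_type, SUM(f.dollars_sold)
    FROM sales AS f
    JOIN item AS i ON i.item_key = f.item_key
    JOIN supplier AS s ON s.supplier_key = i.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()
print(by_supplier_type)
```

In a star schema, supplier type would be a column of the item table itself, so the same query would need only one join. The snowflake form trades that extra join for less redundancy in the dimension tables.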
[Figure: snowflake schema for sales, showing the supplier dimension table (supplier key), the
location dimension table (location key), and the city dimension table (city key, country).]
Fact constellation:
A fact constellation schema is shown in the figure. This schema specifies two fact tables, sales and shipping. The
sales table definition is identical to that of the star schema. The shipping table has five dimensions, or keys: item
key, time key, shipper key, from location, and to location, and two measures: dollars cost and units shipped.
A fact constellation schema allows dimension tables to be shared between fact tables. For example, the dimension
tables for time, item, and location are shared between both the sales and shipping fact tables.
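A minimal sketch of dimension sharing, again using Python's standard sqlite3 module. The shipping columns follow the keys and measures named above. The sales table is reduced to two keys, only the item dimension is modeled, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared dimension table.
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT);
-- Two fact tables that both reference it.
CREATE TABLE sales (item_key INTEGER, time_key TEXT,
                    dollars_sold REAL, units_sold INTEGER);
CREATE TABLE shipping (item_key INTEGER, time_key TEXT, shipper_key INTEGER,
                       from_location INTEGER, to_location INTEGER,
                       dollars_cost REAL, units_shipped INTEGER);

INSERT INTO item VALUES (1, 'TV'), (2, 'Radio');
INSERT INTO sales VALUES (1, '2024-01', 400.0, 1), (2, '2024-01', 100.0, 2);
INSERT INTO shipping VALUES (1, '2024-01', 7, 100, 200, 25.0, 1),
                            (2, '2024-01', 7, 100, 201, 10.0, 2);
""")

# Aggregate each fact table on its own, then combine through the shared
# item dimension. Joining the two fact tables directly would multiply rows.
rows = conn.execute("""
    SELECT i.item_name, s.dollars_sold, sh.dollars_cost
    FROM item AS i
    JOIN (SELECT item_key, SUM(dollars_sold) AS dollars_sold
          FROM sales GROUP BY item_key) AS s ON s.item_key = i.item_key
    JOIN (SELECT item_key, SUM(dollars_cost) AS dollars_cost
          FROM shipping GROUP BY item_key) AS sh ON sh.item_key = i.item_key
    ORDER BY i.item_name
""").fetchall()
print(rows)
```

Because both fact tables use the same item keys, sales revenue and shipping cost can be compared per item without maintaining two copies of the item dimension.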
In data warehousing, there is a distinction between a data warehouse and a data mart.
A data warehouse collects information about subjects that span the entire organization, such as customers, items,
sales, assets, and personnel, and thus its scope is enterprise-wide. For data warehouses, the fact constellation
schema is commonly used, since it can model multiple, interrelated subjects. A data mart, on the other hand, is a
departmental subset of the data warehouse that focuses on selected subjects, and thus its scope is department-wide.
For data marts, the star or snowflake schema is commonly used, since both are geared toward modeling single
subjects, although the star schema is more popular and efficient.