According to D.W.
Inmon :
DWH
Subject Oriented
Integrated
Time Varient
Non Volatile
D.W Implementation
Approach
Top Down Approach : D.W.Inmon
Bottom up Approach : Ralph Kimball
SQL Set Operators
Desgined Subject oriented to design
Analysis
Business info collected from various
sources
Allows to anlysis the data with time Eg:
Month, YOY
Once the data entered in DW cannot
change
First Develop EDW then Devlop
Datamarts
First Develop Datamarts then Develp
EDW
Syntax
UNION[ALL]
,MINUS,INTERSECT
String Functions
CONCAT
CONCAT(String1,String2 [..])
RTRIM
RTRIM('String')
LTRIM
LTRIN('String')
TRIM
TRIM('String')
SUBSTRING('String', Start integer, Length of
String)
SUBSTRING
Analytic function
COALESCE Similar to Case
IsNULL (allows only two
arguments)
Returns the first non-null expression in
the list
Replaces NULL with the specified
replacement value.
ROW_NUMBER
COALESCE ( expression [ ,...n ] )
ISNULL ( check_expression,
replacement_value )
ROW_NUMBER() OVER(ORDER BY Column
nam)
RANK
RANK() OVER (ORDER BY Col )
DENSE_RANK
ROW_NUMBER with
PARTITION
DENSE_RANK() OVER (ORDER BY Col )
ROW_NUMBER() OVER(PARTITION BY Col
ORDER BY Col ASC)
Ranking within your ordered partition
No ranks are skipped if there are ranks
with
multiple items
This is kinda like using a Row_number
with Group by
eg:CONVERT(VARCHAR(19),GETDATE())
General function that
converts an expression of one data
type to another
CONVERT()
Returns the sequential number
Null Functions
ISNULL(), NVL(), IFNULL() and
COALESCE()
Joins
Inner Join
Returns all rows when there is at least
one match in BOTH tables
Dimension Table
Return all rows from the left table, and
the matched rows from the right table
Return all rows from the right table,
and the matched rows from the left
table
Return all rows when there is a match
in ONE of the tables
If a table contains primary keys and it
gives the detailed info about business
then such a table
- Entry Points to the fact tables
- Typically in De-Normalized form
- Generally Static and descriptive fields
- Typically used by Group by in SQL
- Typically either Primarkey and
Dimensional attribu
Fact Table
A fact table which contains foreign keys
to dimension tables and numeric facts
- The term FACT represents a single
business measure. E.g. Sales, Qty Sold
- Facts can be detailed level facts or
summarized facts
- Typically the MOST NORMALIZED
TABLE in a dimensional model
- contain HUGE DATA VOLUMES
running into millions of rows
LEFT JOIN
RIGHT JOIN
FULL JOIN
Types of Dimension tables
Degnerated Dimension
Junk Dimeonsions
Slowley Changing Dimension
Conformed Dimension
Fast Chaning Dimensions
data that is dimensional in nature but
stored in a fact table
contain miscellaneous data like flags,
gender, text values etc which is not
useful for reporting
If the data values are changed slowly in
a column or in a row over the period of
time then that dimension table
If Dimension table shared with multiple
fact tables
Changes very fast Eg: Acc Bal,Income
etc
Types of Facts
Additive
Semi Additive
Non Additive
Transformations
Measures that can be added across all
dimensions
Measures that can be added across few
dimensions and not with others
Measures that cannot be added across
all dimensions
It is the process of transforming the
data into a required business format
Data Aggregation
process of integrating the data from
multiple input
Process of removing unwanted/error
out/inaccurate data
multiple detailed values are
summarized into a single unit
Data Purging
Earsing of the data completely
Data Profiling
examining data available from an
existing information source,collecting
stastics and summerise the data with
Aggrate functions (Min,MAX,AVG)
Data Merging
Data Cleansing/Scrubbing
Schemas
Star Schema
Snowflake Schema
Hybrid Schema/Galaxy
Schema/
Fact constellation
It has single fact table connected to
dimension tables like a star.
- The star schema is highly
denormalized
- Simple structure -> easy to
understand schema
- Relatively long time of loading data
into dimension tables (reduendent of
data)
- Performance less compare to Snow
flake
It is an extension of the star schema.In
snowflake schema, very large
dimension tables are normalized into
multiple tables. It is used when a
dimensional table becomes very big
- Highly normalized
- Complex compare to star schema
- Less time to load (due to normalized
data)
- Very good in performance
Combination of Star and Snowflake
schema