Data Warehouse Architecture

Data Warehouse Architecture
Data warehouses and their architectures very depending upon the elements of an organization's
situation.
Three common architectures are:
o Data Warehouse Architecture: Basic

o Data Warehouse Architecture: With Staging Area
o Data Warehouse Architecture: With Staging Area and Data Marts
Data Warehouse Architecture: Basic
Operational System
An operational system is a method used in data warehousing to refer to a system that is used to
process the day-to-day transactions of an organization.
Flat Files
A Flat file system is a system of files in which transactional data is stored, and every file in the
system must have a different name.
Meta Data
A set of data that defines and gives information about other data.
Meta Data used in Data Warehouse for a variety of purpose, including:

Meta Data summarizes necessary information about data, which can make finding and
work with instances of data more accessible. For example, author, data build, and data
changed, and file size are examples of very basic document metadata.
Metadata is used to direct a query to the most appropriate data source.
Lightly and highly summarized data
The area of the data warehouse saves all the predefined lightly and highly summarized
(aggregated) data generated by the warehouse manager.
he goals of the summarized information are to speed up query performance. The

summarized record is updated continuously as new information is loaded into the
warehouse.
End-User access Tools
The principal purpose of a data warehouse is to provide information to the business

managers for strategic decision-making. These customers interact with the warehouse
using end-client access tools.
The examples of some of the end-user access tools can be:
o Reporting and Query Tools

o Application Development Tools
o Executive Information Systems Tools
o Online Analytical Processing Tools
o Data Mining Tools
Data Warehouse Architecture: With Staging Area

We must clean and process your operational information before put it into the warehouse.
We can do this programmatically, although data warehouses uses a staging area (A place
where data is processed before entering the warehouse).
A staging area simplifies data cleansing and consolidation for operational method coming
from multiple source systems, especially for enterprise data warehouses where all relevant
data of an enterprise is consolidated.
Data Warehouse Staging Area is a temporary location where a record from source
systems is copied.
Data Warehouse Architecture: With Staging Area and

Data Marts
We may want to customize our warehouse's architecture for multiple groups within our
organization.
We can do this by adding data marts. A data mart is a segment of a data warehouses that
can provided information for reporting and analysis on a section, unit, department or
operation in the company, e.g., sales, payroll, production, etc.
The figure illustrates an example where purchasing, sales, and stocks are separated. In this
example, a financial analyst wants to analyze historical data for purchases and sales or
mine historical information to make predictions about customer behavior.
Properties of Data Warehouse Architectures
1. Separation: Analytical and transactional processing should be kept apart as much as

possible.
2. Scalability: Hardware and software architectures should be simple to upgrade the data
volume, which has to be managed and processed, and the number of user's requirements,
which have to be met, progressively increase.
3. Extensibility: The architecture should be able to perform new operations and
technologies without redesigning the whole system.
4. Security: Monitoring accesses are necessary because of the strategic data stored in the
data warehouses.
5. Administerability: Data Warehouse management should not be complicated.
Types of Data Warehouse Architectures
Single-Tier Architecture
Single-Tier architecture is not periodically used in practice. Its purpose is to minimize the
amount of data stored to reach this goal; it removes data redundancies.
The figure shows the only layer physically available is the source layer. In this method, data
warehouses are virtual. This means that the data warehouse is implemented as a
multidimensional view of operational data created by specific middleware, or an
intermediate processing layer.
The vulnerability of this architecture lies in its failure to meet the requirement for
separation between analytical and transactional processing. Analysis queries are agreed to
operational data after the middleware interprets them. In this way, queries affect
transactional workloads.
Two-Tier Architecture
The requirement for separation plays an essential role in defining the two-tier architecture
for a data warehouse system, as shown in fig:
Although it is typically called two-layer architecture to highlight a separation between

physically available sources and data warehouses, in fact, consists of four subsequent data
flow stages:
1. Source layer: A data warehouse system uses a heterogeneous source of data. That data is
stored initially to corporate relational databases or legacy databases, or it may come from
an information system outside the corporate walls.
2. Data Staging: The data stored to the source should be extracted, cleansed to remove
inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one
standard schema. The so-named Extraction, Transformation, and Loading Tools
(ETL) can combine heterogeneous schemata, extract, transform, cleanse, validate, filter, and
load source data into a data warehouse.
3. Data Warehouse layer: Information is saved to one logically centralized individual
repository: a data warehouse. The data warehouses can be directly accessed, but it can also
be used as a source for creating data marts, which partially replicate data warehouse
contents and are designed for specific enterprise departments. Meta-data repositories store
information on sources, access procedures, data staging, users, data mart schema, and so
on.
4. Analysis: In this layer, integrated data is efficiently, and flexible accessed to issue reports,
dynamically analyze information, and simulate hypothetical business scenarios. It should
feature aggregate information navigators, complex query optimizers, and customer-friendly
GUIs.
Three-Tier Architecture
The three-tier architecture consists of the source layer (containing multiple source system),
the reconciled layer and the data warehouse layer (containing both data warehouses and
data marts). The reconciled layer sits between the source data and data warehouse.
The main advantage of the reconciled layer is that it creates a standard reference data
model for a whole enterprise. At the same time, it separates the problems of source data
extraction and integration from those of data warehouse population. In some cases,
the reconciled layer is also directly used to accomplish better some operational tasks,
such as producing daily reports that cannot be satisfactorily prepared using the corporate
applications or generating data flows to feed external processes periodically to benefit
from cleaning and integration.
This architecture is especially useful for the extensive, enterprise-wide systems. A

disadvantage of this structure is the extra file storage space used through the extra
redundant reconciled layer. It also makes the analytical tools a little further away from
being real-time.
Data reconciliation (DR) is a term typically used to describe a verification phase during a data
migration where the target data is compared against original source data to ensure that the migration
architecture has transferred the data correctly.
Why is Data Reconciliation important?

During a data migration, it is possible for mistakes to be made in the mapping and transformation
logic. Also, runtime failures such as network dropouts or broken transactions can lead to data being
left in an invalid state. These problems can lead to a range of issues such as:
 Missing records
 Missing values
 Incorrect values
 Duplicated records
 Badly formatted values
 Broken relationships across tables or systems
Without the data reconciliation stage, these issues can go unnoticed, severely damage the overall
accuracy of your data and lead to inaccurate insights and issues with customer service.

Data Warehouse Architecture

Uploaded by

Data Warehouse Architecture

Uploaded by

Data Warehouse Architecture

Three common architectures are:

o Data Warehouse Architecture: Basic

Data Warehouse Architecture: Basic

Meta Data used in Data Warehouse for a variety of purpose, including:

Metadata is used to direct a query to the most appropriate data source.

Lightly and highly summarized data

he goals of the summarized information are to speed up query performance. The

End-User access Tools

The principal purpose of a data warehouse is to provide information to the business

The examples of some of the end-user access tools can be:

o Reporting and Query Tools

Data Warehouse Architecture: With Staging Area

Data Warehouse Architecture: With Staging Area and

Properties of Data Warehouse Architectures

1. Separation: Analytical and transactional processing should be kept apart as much as

5. Administerability: Data Warehouse management should not be complicated.

Types of Data Warehouse Architectures

Although it is typically called two-layer architecture to highlight a separation between

This architecture is especially useful for the extensive, enterprise-wide systems. A

Why is Data Reconciliation important?

You might also like