45. Data Integration and Transformation
Integration & Transformation
Introduction
Data integration: combining data from multiple sources into a coherent store.
Data transformation: the process of converting data from one format to another.
Integration & Transformation
Entity Identification
What?
Challenging issue
Redundancy
Definition
Issue
Causes of redundancy
Data Mining
Redundancy Avoidance
Handling Redundancy
in Data Integration
Object identification
Derivable data
Redundancy Avoidance
Avoiding Redundancy
Correlation
Covariance analysis
Careful integration
Redundancy Avoidance
Correlation Analysis
Detects redundancy by measuring how strongly one attribute implies, or is related to, another, based on the available data.
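The idea of measuring how strongly two attributes relate can be sketched with the Pearson correlation coefficient, built here on top of covariance. This is a minimal illustration, not a prescribed method: the attribute names, the sample values, and the 0.95 threshold are all assumptions chosen for the example.

```python
from math import sqrt


def covariance(xs, ys):
    """Population covariance of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def pearson(xs, ys):
    """Pearson correlation coefficient, in the range [-1, 1]."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical attributes: the same price stored in dollars and in cents,
# plus an unrelated attribute for comparison.
price_usd = [10.0, 12.5, 20.0, 31.0]
price_cents = [1000, 1250, 2000, 3100]
units_sold = [7, 3, 9, 4]

THRESHOLD = 0.95  # assumed cut-off for this sketch, not a standard value

r_redundant = pearson(price_usd, price_cents)
r_other = pearson(price_usd, units_sold)

# A coefficient close to +1 or -1 suggests one attribute may be redundant.
is_redundant = abs(r_redundant) > THRESHOLD
```

A high absolute correlation only flags a candidate for removal; whether the attribute is truly redundant still needs a careful look at how the sources were integrated.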
Data Transformation
Smoothing
Attribute/feature
Construction
Aggregation
Normalization
Discretization
Data Transformation
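Smoothing & Attribute Construction
Smoothing removes noise from the data, using techniques such as binning, regression, and clustering.
Attribute construction builds new attributes from the given ones to help the mining process.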
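Of the smoothing techniques listed, binning is the simplest to show in code. The sketch below assumes smoothing by bin means over equal-frequency bins, which is one common variant; the function name and the sample prices are hypothetical.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size items,
    and replace every value with the mean of its bin.

    The result is returned in sorted order, not in the input order.
    """
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical noisy price data.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, bin_size=3)
# Bins: [4, 8, 15], [21, 21, 24], [25, 28, 34] with means 9, 22 and 29.
```

Larger bins smooth more aggressively but also hide more of the original variation.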
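Aggregation & Normalization
Aggregation applies summary or aggregation operations to the data.
Normalization scales attribute values so that they fall within a smaller range.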
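A short sketch of both operations: daily records are aggregated into monthly totals, and the totals are then rescaled. The slides do not name a normalization method, so min-max scaling to the range 0 to 1 is an assumption here, as are the function names and the sales figures.

```python
from collections import defaultdict

# Hypothetical daily sales records: (month, amount).
daily_sales = [
    ("Jan", 120.0), ("Jan", 80.0),
    ("Feb", 250.0),
    ("Mar", 50.0), ("Mar", 100.0),
]


def aggregate_by_key(records):
    """Sum the amounts of all records that share the same key."""
    totals = defaultdict(float)
    for key, amount in records:
        totals[key] += amount
    return dict(totals)


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so map everything to new_min.
        return [new_min for _ in values]
    return [new_min + (v - lo) / (hi - lo) * (new_max - new_min) for v in values]


monthly = aggregate_by_key(daily_sales)  # Jan 200.0, Feb 250.0, Mar 150.0
normalized = min_max_normalize(list(monthly.values()))
```

Min-max scaling is sensitive to outliers, because a single extreme value stretches the range for every other value.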
Discretization
Raw values of a numeric attribute are replaced by either:
Interval labels (e.g., 0–10, 11–20)
Conceptual labels (e.g., youth, adult, senior)
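Both kinds of label can be sketched for an age attribute. The interval boundaries follow the 0–10, 11–20 pattern from the slide; the age cut-offs for youth, adult, and senior are assumptions made for this example, and the function names are hypothetical.

```python
def interval_label(value):
    """Map a non-negative integer to an interval label: 0-10, 11-20, 21-30, ..."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value <= 10:
        return "0-10"
    lower = ((value - 1) // 10) * 10 + 1
    return f"{lower}-{lower + 9}"


def concept_label(age):
    """Map an age to a conceptual label, using assumed cut-offs."""
    if age <= 20:  # assumed upper bound for "youth"
        return "youth"
    if age < 60:   # assumed lower bound for "senior" is 60
        return "adult"
    return "senior"


ages = [5, 11, 20, 37, 64]
interval_labels = [interval_label(a) for a in ages]
concept_labels = [concept_label(a) for a in ages]
```

Interval labels keep a coarse numeric ordering, while conceptual labels trade that detail for categories that are easier to interpret.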