45. Data Integration and Transformation

Data integration involves combining data from multiple sources into a coherent store, while transformation converts data between formats. Redundancy avoidance identifies and removes duplicate or unnecessary data through correlation analysis and careful integration. Data transformation prepares data for mining by smoothing noise, constructing new attributes, aggregating values, normalizing numeric ranges, and discretizing continuous variables.

Uploaded by

amna shahid

Data Mining

Data Integration and Transformation
Integration & Transformation

Introduction

Data integration: combining data from multiple sources into a coherent store.

Data transformation: the process of converting data from one format to another.
Entity Identification

What? Matching real-world entities from different sources during the integration of schemas or objects.

A challenging issue.
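
To make the entity identification problem concrete, here is a minimal sketch in Python. The two sources, their attribute names (customer_id, cust_number, full_name) and the schema mapping are all hypothetical and are not taken from the slides; real integration usually needs metadata and fuzzier matching than this.

```python
# Hypothetical records from two sources that describe overlapping customers.
source_a = [
    {"customer_id": 101, "name": "Ann Lee"},
    {"customer_id": 102, "name": "Bob Ray"},
]
source_b = [
    {"cust_number": "102", "full_name": "bob ray "},
    {"cust_number": "103", "full_name": "Cy Fox"},
]

# Assumed schema mapping: which attribute in source B corresponds to
# which attribute in the common schema.
SCHEMA_MAP_B = {"cust_number": "customer_id", "full_name": "name"}


def to_common_schema(record, mapping):
    """Rename attributes so both sources use the same names."""
    return {mapping.get(key, key): value for key, value in record.items()}


def normalize(record):
    """Bring values to a comparable form (integer key, tidy name)."""
    return {
        "customer_id": int(record["customer_id"]),
        "name": " ".join(str(record["name"]).split()).title(),
    }


def integrate(records_a, records_b):
    """Merge both sources, keeping one record per real-world entity."""
    merged = {}
    for record in records_a:
        clean = normalize(record)
        merged[clean["customer_id"]] = clean
    for record in records_b:
        clean = normalize(to_common_schema(record, SCHEMA_MAP_B))
        # Same key means same entity: keep the record already stored.
        merged.setdefault(clean["customer_id"], clean)
    return [merged[key] for key in sorted(merged)]


integrated = integrate(source_a, source_b)
```

Customer 102 appears in both sources under different attribute names and formats, but ends up as a single record in the integrated store.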
Redundancy

Definition

Issue

Causes of redundancy
Redundancy Avoidance

Handling Redundancy in Data Integration

Object identification

Derivable data

Avoiding Redundancy

Correlation

Covariance analysis

Careful integration

Correlation analysis finds redundancies by measuring how strongly two given attributes imply or relate to each other, based on the available data.
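
As an illustration of correlation and covariance analysis on numeric attributes, the sketch below computes the population covariance and the Pearson correlation coefficient in plain Python. The attribute names and values are invented for the example. A coefficient near +1 or -1 suggests one attribute may be derivable from the other; how close is close enough is a judgment call that the slides do not specify.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance of two equally long numeric attributes."""
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def pearson(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    std_x = sqrt(covariance(xs, xs))
    std_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


# Hypothetical attributes: price_cents is derivable from price_usd.
price_usd = [10.0, 20.0, 30.0, 40.0]
price_cents = [1000.0, 2000.0, 3000.0, 4000.0]
units_sold = [7.0, 3.0, 8.0, 2.0]

r_redundant = pearson(price_usd, price_cents)  # perfectly correlated pair
r_weak = pearson(price_usd, units_sold)        # much weaker relationship
```

Here price_cents carries no information beyond price_usd, so it is a candidate for removal during integration, while units_sold is not.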
Data Transformation

Smoothing

Attribute/feature construction

Aggregation

Normalization

Discretization
Smoothing & Attribute Construction

Smoothing: removes noise from the data. Techniques include binning, regression, and clustering.

Attribute construction: new attributes are constructed from the given set of attributes and added to help the mining process.
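
The two ideas above can be sketched in a few lines of Python. The smoothing part uses binning with equal-frequency bins, replacing each value by its bin mean; this is only one of several binning variants. The construction part derives a hypothetical area attribute from width and height. All data values and attribute names are assumptions made for the example.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace
    every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        bin_mean = sum(bin_values) / len(bin_values)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


# Hypothetical noisy price data.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed_prices = smooth_by_bin_means(prices, 3)

# Attribute construction: add area_m2, derived from two existing attributes.
rooms = [
    {"width_m": 3.0, "height_m": 4.0},
    {"width_m": 2.5, "height_m": 2.0},
]
rooms_with_area = [
    {**room, "area_m2": room["width_m"] * room["height_m"]} for room in rooms
]
```

Each group of three sorted prices collapses to a single representative value, which removes small fluctuations at the cost of detail.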
Aggregation & Normalization

Aggregation: summary or aggregation operations are applied to the data.

Normalization: attribute values are scaled to fall within a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0.
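
A minimal Python sketch of both operations follows. Aggregation is shown as rolling hypothetical daily sales up to monthly totals, and normalization as min-max scaling, which is one common way to map values into a target range (the slides do not name a specific method). The data and function names are assumptions made for the example.

```python
from collections import defaultdict


def monthly_totals(daily_sales):
    """Aggregate (date, amount) pairs into totals per 'YYYY-MM' month."""
    totals = defaultdict(float)
    for date, amount in daily_sales:
        totals[date[:7]] += amount
    return dict(totals)


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into the range [new_min, new_max]."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("cannot rescale a constant attribute")
    return [
        new_min + (value - low) / (high - low) * (new_max - new_min)
        for value in values
    ]


# Hypothetical data.
daily_sales = [("2024-01-05", 100.0), ("2024-01-20", 50.0), ("2024-02-03", 75.0)]
monthly = monthly_totals(daily_sales)

incomes = [12000.0, 54000.0, 98000.0]
unit_range = min_max(incomes)               # scaled into 0.0 to 1.0
signed_range = min_max(incomes, -1.0, 1.0)  # scaled into -1.0 to 1.0
```

Min-max scaling preserves the relative spacing of the original values, so the middle income lands slightly below the midpoint of either target range.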
Discretization

Raw values of a numeric attribute are replaced by:

Interval labels (0–10, 11–20, etc.)

Conceptual labels (youth, adult, senior).
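
Both kinds of label can be sketched in Python for a hypothetical integer age attribute. The interval boundaries follow the pattern on the slide (0-10, 11-20, and so on, written with a plain hyphen in the code). The age cutoffs for youth, adult, and senior are assumptions chosen for illustration; the slides do not define them.

```python
def interval_label(age):
    """Map a non-negative integer age to a label such as '0-10' or '11-20'."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 10:
        return "0-10"
    lower = ((age - 1) // 10) * 10 + 1
    return f"{lower}-{lower + 9}"


def concept_label(age):
    """Map an age to a conceptual label. Cutoffs (20 and 60) are assumed."""
    if age <= 20:
        return "youth"
    if age <= 60:
        return "adult"
    return "senior"


ages = [7, 11, 20, 35, 72]
interval_labels = [interval_label(age) for age in ages]
concept_labels = [concept_label(age) for age in ages]
```

Interval labels keep a coarse sense of magnitude, while conceptual labels move the attribute one level up a concept hierarchy and are easier to read in mined patterns.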