Unit-2-Part-1 Data Mining
&
Motivating Challenges
UNIT - II
By
M. Rajesh Reddy
WHAT IS DATA MINING?
• Post-processing:
• ensures that only valid and useful results are incorporated into the
decision support system.
• Visualization
• allows analysts to explore the data and the data
mining results from a variety of viewpoints.
• High Dimensionality
• Non-traditional Analysis
Motivating Challenges:
• Scalability
• Data set sizes are on the order of gigabytes (GB), terabytes (TB), or petabytes (PB).
access)
• High Dimensionality
• Data sets with hundreds or thousands of attributes
are now common.
• Example
• Bioinformatics: microarray technology has
produced gene expression data involving
thousands of features.
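• Data sets with temporal or spatial components
also tend to have high dimensionality.
• Example: a data set that contains measurements of
temperature at various locations. If the measurements
are taken repeatedly over an extended period, the number
of dimensions grows in proportion to the number of
measurements taken.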
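To illustrate the temperature example, here is a minimal Python sketch. It assumes each location is one record and each time step is one attribute. The function name build_records, the site names, and the synthetic temperature values are all hypothetical and only serve to show how the attribute count grows with the observation period.

```python
def build_records(locations, num_days, readings_per_day):
    """Return one record per location with one attribute per time step.

    The temperature values are synthetic placeholders, not real data.
    """
    num_steps = num_days * readings_per_day
    records = {}
    for i, loc in enumerate(locations):
        # Each reading becomes its own attribute (dimension) of the record.
        records[loc] = [15.0 + i + 0.01 * t for t in range(num_steps)]
    return records


locations = ["site_a", "site_b", "site_c"]  # hypothetical site names
for days in (1, 30, 365):
    records = build_records(locations, days, readings_per_day=24)
    dims = len(records[locations[0]])
    print(f"{days} day(s): {len(records)} records x {dims} attributes")
```

The number of records stays fixed at three, while the number of attributes grows from 24 to 8760 as the observation period goes from one day to one year.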
• Non-traditional Analysis:
• Traditional statistical approach: hypothesize-and-test paradigm.
• A hypothesis is proposed,
• an experiment is designed to gather the data, and
• then the data is analyzed with respect to the hypothesis.
• Current data analysis tasks
• often require the generation and evaluation of thousands of hypotheses.
• Some data mining (DM) techniques automate the process of hypothesis
generation and evaluation.
• Data sets frequently involve non-traditional types of data
and data distributions.
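The idea of automating hypothesis generation and evaluation can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not a specific algorithm from the slides: each "hypothesis" is taken to be a candidate pair of items that may occur together, and it is "evaluated" by its support (the fraction of transactions containing both items). The transactions and the function names generate_hypotheses, evaluate, and frequent_itemsets are hypothetical.

```python
from itertools import combinations


def generate_hypotheses(transactions, size=2):
    """Enumerate every candidate itemset of the given size."""
    items = sorted(set().union(*transactions))
    return list(combinations(items, size))


def evaluate(hypothesis, transactions):
    """Support: fraction of transactions containing all items of the hypothesis."""
    hits = sum(1 for t in transactions if set(hypothesis) <= t)
    return hits / len(transactions)


def frequent_itemsets(transactions, size=2, min_support=0.5):
    """Keep only the hypotheses whose support reaches the threshold."""
    result = {}
    for hypothesis in generate_hypotheses(transactions, size):
        support = evaluate(hypothesis, transactions)
        if support >= min_support:
            result[hypothesis] = support
    return result


# Hypothetical market-basket data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

for itemset, support in frequent_itemsets(transactions).items():
    print(itemset, support)
```

With only four distinct items there are six candidate pairs. The number of candidates grows combinatorially with the number of items, which is why generating and evaluating hypotheses by hand does not scale and automated techniques are needed.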