DATA MINING & DATA WAREHOUSING
DEFINITION
Data mining is the extraction of valid,
previously unknown information from data.
OR
Data mining is the extraction of hidden
predictive information from large databases.
It is a powerful new technology with great
potential to help companies focus on the
most important information in their data
warehouses.
Why Do We Need Data Mining?
To handle the bulk of data held by enterprises
and thereby help increase their margins.
To turn incomprehensible data into usable
information.
Data mining combines ideas from statistics,
machine learning, databases and parallel
computing.
Goals of Data Mining
Prediction
Identification
Classification
Optimization
Prediction
How certain attributes within the data will
behave in the future.
For example:
What customers buy when given a discount.
How much sales volume a store generates in a
given period.
Whether deleting a sales line would yield more
profit.
Uses techniques such as regression and
correlation.
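To make the regression idea concrete, here is a minimal sketch of an ordinary least-squares line fit used for prediction. The data (discount percentage against weekly units sold) and the function name `fit_line` are made up for illustration and do not come from the slides.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: discount offered (%) and units sold that week.
discounts = [0, 5, 10, 15, 20]
units_sold = [100, 110, 120, 130, 140]

slope, intercept = fit_line(discounts, units_sold)

# Predict sales for a discount level that has not been tried yet.
predicted = slope * 25 + intercept
print(f"Predicted units at 25% discount: {predicted:.1f}")
```

A straight line is the simplest possible model; real sales data would usually need more variables and a check on how well the line fits.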
Identification
Data patterns are used to identify the existence
of an item, an event or an activity.
Intruders trying to break into a computer
system may be identified by the programs
executed, files accessed and CPU time per
session.
The existence of a gene may be identified by a
certain sequence of nucleotide symbols in
the DNA sequence.
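As a toy illustration of identification by pattern, the sketch below scans a DNA string for every occurrence of a short nucleotide motif. The sequence, the motif and the helper name `find_motif` are assumptions chosen for the example; real gene identification is far more involved than a substring search.

```python
def find_motif(sequence, motif):
    """Return every 0-based position where motif occurs in sequence."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        # Continue one character later so overlapping hits are also found.
        start = sequence.find(motif, start + 1)
    return positions


# Made-up sequence and motif, for illustration only.
dna = "TTATGCCGATGA"
hits = find_motif(dna, "ATG")
print(hits)
```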
Classification
Data is partitioned so that different classes or
patterns can be identified based on a
combination of parameters.
Customers can be classified as discount
seekers, shoppers in a rush, loyal regular
customers, shoppers attached to name
brands, etc.
Classification can help in categorizing food as
health food, party food, school lunch food,
etc.
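The customer classes above can be sketched as a small rule-based classifier that assigns a class from a combination of parameters. The parameters, thresholds and rule order below are hypothetical; in practice a mining tool would learn such rules from data rather than have them written by hand.

```python
def classify_customer(discount_share, minutes_per_visit,
                      visits_per_month, brand_share):
    """Assign a customer class from four hypothetical parameters.

    discount_share: fraction of purchases made with a discount (0..1)
    minutes_per_visit: average time spent in the store per visit
    visits_per_month: average number of visits per month
    brand_share: fraction of purchases that are name-brand items (0..1)
    """
    # Rules are checked in order; the first matching rule wins.
    if discount_share >= 0.5:
        return "discount seeker"
    if minutes_per_visit < 10:
        return "shopper in a rush"
    if visits_per_month >= 8:
        return "loyal regular customer"
    if brand_share >= 0.7:
        return "name-brand shopper"
    return "unclassified"


print(classify_customer(0.8, 30, 2, 0.1))
print(classify_customer(0.1, 5, 2, 0.1))
print(classify_customer(0.1, 30, 12, 0.1))
```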
Data Mining Approaches
Verification-Driven Data Mining:- Querying and
reporting, with the output presented in graphical,
tabular and textual forms, through multi-dimensional
analysis and through statistical analysis.
Discovery-Driven Data Mining:- At present there are
four different discovery-driven data mining
approaches:-
Predictive modeling, including neural nets.
Link analysis, which attempts to establish
links between records.
Database segmentation, which partitions the data
into collections of related records.
Deviation detection, which identifies points that do
not fit in a segment.
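Of the four discovery-driven approaches, deviation detection is the easiest to show in a few lines. The sketch below flags values that lie more than a chosen number of standard deviations from the mean (a z-score test). The sales figures, the function name `find_deviations` and the threshold of 2.0 are assumptions for illustration; other deviation measures are equally valid.

```python
from statistics import mean, pstdev


def find_deviations(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily sales for one segment; one day stands out.
daily_sales = [100, 102, 98, 101, 99, 100, 180]
outliers = find_deviations(daily_sales)
print(outliers)
```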
Data Mining Process (A KDD Process)
Data mining is the core of knowledge
discovery.
Steps of a KDD Process
Learning the application domain:
relevant prior knowledge and goals of the
application.
Creating a target data set: data selection.
Data cleaning and preprocessing.
Data reduction and transformation:
finding useful features, dimensionality /
variable reduction, invariant representation.
Choosing the functions of data mining:
summarization, classification, regression,
association, clustering.
Choosing the mining algorithms.
Data mining: searching for patterns of interest.
Pattern evaluation and knowledge presentation:
visualization, transformation, removing
redundant patterns, etc.
Use of discovered knowledge.
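The KDD steps can be sketched as a chain of small functions: select, clean, transform, mine and evaluate. Everything in the example is hypothetical: the basket records, the function names, the choice of item-pair counting as the mining function and the 0.6 support threshold. It is meant only to show how the steps feed into one another.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records; one has a missing basket, one uses mixed case.
raw_records = [
    {"customer": "A", "store": "north", "basket": ["bread", "milk"]},
    {"customer": "B", "store": "north", "basket": ["bread", "milk", "eggs"]},
    {"customer": "C", "store": "north", "basket": None},
    {"customer": "D", "store": "south", "basket": ["milk", "eggs"]},
    {"customer": "E", "store": "north", "basket": ["Bread", "Milk"]},
]


def select(records, store):
    """Create the target data set: keep records for one store."""
    return [r for r in records if r["store"] == store]


def clean(records):
    """Data cleaning: drop records with a missing or empty basket."""
    return [r for r in records if r["basket"]]


def transform(records):
    """Data transformation: reduce each record to a set of lowercase items."""
    return [frozenset(item.lower() for item in r["basket"]) for r in records]


def mine(baskets):
    """Data mining: count how often each pair of items is bought together."""
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts


def evaluate(pair_counts, n_baskets, min_support=0.6):
    """Pattern evaluation: keep pairs found in enough of the baskets."""
    return {
        pair: count / n_baskets
        for pair, count in pair_counts.items()
        if count / n_baskets >= min_support
    }


baskets = transform(clean(select(raw_records, "north")))
patterns = evaluate(mine(baskets), len(baskets))
print(patterns)
```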
Typical Data Mining System Architecture
Diagram layers, in the order shown:
GRAPHICAL USER INTERFACE
PATTERN EVALUATION
DATA MINING ENGINE
DATA WAREHOUSE
DATA CLEANING & FILTERING, DATA INTEGRATION
DATABASE, DATA WAREHOUSE
KNOWLEDGE BASE (shown beside the layers)
Process of Data Mining
DATA WAREHOUSE -> SELECT -> SELECTED DATA
-> TRANSFORM -> TRANSFORMED DATA
-> MINE -> EXTRACTED DATA
-> ASSIMILATE -> ASSIMILATION
APPLICATIONS OF DATA MINING
Data mining predicts future trends and behaviors,
allowing businesses to make proactive,
knowledge-driven decisions.
Using a method called “neural segmentation”, a
number of different types of purchase patterns can
be identified, and customer groupings can then
be associated with these patterns.
To minimize the use of resources, it is necessary to
identify which factors affect crop yield, among
such items as chemical fertilizers and additives.
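The idea of grouping customers by purchase pattern can be illustrated with a simple clustering sketch. Note that this uses plain k-means on one variable, not the “neural segmentation” method named above, which the slides do not describe. The spend figures, the starting centroids and the function name `kmeans_1d` are assumptions for illustration.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Very small k-means for one-dimensional data.

    values: list of numbers to segment
    centroids: starting guesses, one per segment
    """
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the segment of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its segment.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical average spend per visit for eight customers.
spend = [10, 12, 11, 50, 52, 49, 100, 98]
centroids, clusters = kmeans_1d(spend, centroids=[10, 50, 100])
print(clusters)
```

Each resulting cluster is one customer grouping, which could then be labeled and linked to marketing actions.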
Data warehousing helps answer:
Which are lowest/highest margin customers?
What is the most effective distribution channel?
Who are my customers and what products are they buying?
What product promotions have the biggest impact on revenue?
What impact will new products/services have on revenue & margins?
Which customers are most likely to go to the competition?
WHAT IS DATA WAREHOUSE?
DATA COLLECTED FROM ONE OR MANY
SYSTEMS THAT EXIST WITHIN AND OUTSIDE
THE ORGANIZATION. THE DATA IS
STRUCTURED IN SUCH A WAY AS TO REDUCE
THE AMOUNT OF TIME THAT IT TAKES TO
PRODUCE RELIABLE INFORMATION.
WHY DO WE NEED DATA
WAREHOUSING?
It has both hardware and software
components, which facilitate better
decision-making in large companies.
To provide a consistent common source for
corporate information.
To store large volumes of historical detail
data from mission critical applications.
To improve the ability to access, report against
and analyze information.
To solve or improve business processes.
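As a small illustration of reporting against a common data source, the sketch below loads a tiny in-memory SQLite database and ranks customers by margin. The table layout, column names and figures are hypothetical; a real warehouse would hold far more history and would normally run on a dedicated platform.

```python
import sqlite3

# In-memory database standing in for a warehouse (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    revenue REAL,
    cost REAL
);
INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO sales VALUES
    (1, 1, 100.0, 80.0),
    (2, 1, 200.0, 150.0),
    (3, 2, 300.0, 150.0);
""")

# Margin per customer = total profit / total revenue, highest first.
rows = conn.execute("""
SELECT c.name,
       SUM(s.revenue - s.cost) / SUM(s.revenue) AS margin
FROM sales AS s
JOIN customers AS c ON c.customer_id = s.customer_id
GROUP BY c.name
ORDER BY margin DESC
""").fetchall()

for name, margin in rows:
    print(f"{name}: {margin:.1%}")
```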