
1.1 DM-intro

Uploaded by

23310020
Data Mining

Why Data Mining?

• Huge amounts of data are available
• The data must be converted into meaningful information
• Patterns must be extracted from large databases
• DM applies various techniques to data to extract patterns


Evolution of Database Technology

• DM grew out of advances in information technology and the need to process data
• 1960s: file processing systems
  – Difficulties with file processing led to databases
• 1970s – early 1980s: database management systems
  – Different database models
  – Indexing and accessing methods (B-trees, hashing)
  – SQL, forms and reports
  – OLTP
• Advanced database systems (mid-1980s – present)
  – Object-relational model
  – Spatial, multimedia, scientific and engineering databases
• Advanced data analysis (data warehousing, late 1980s – present)
• Web-based databases (1990s – present)
• Together these lead to a new generation of integrated data and information systems


Figure – DM Evolution

Data Collection and Database Creation (1960s and earlier)
• Primitive file processing

Database Management Systems (1970s – early 1980s)
• Hierarchical and network database systems
• Relational database systems
• Data modeling tools: entity-relationship models, etc.
• Indexing and accessing methods: B-trees, hashing, etc.
• Query languages: SQL, etc.
• User interfaces, forms and reports
• Query processing and query optimization
• Transactions, concurrency control and recovery
• On-line transaction processing (OLTP)

Advanced Database Systems (mid-1980s – present)
• Advanced data models: extended relational, object-relational, etc.
• Advanced applications: spatial, temporal, multimedia, active, stream and sensor, scientific and engineering, knowledge based

Advanced Data Analysis: Data Warehousing and Data Mining (late 1980s – present)
• Data warehouse and OLAP
• Data mining and knowledge discovery: generalization, classification, association, clustering, frequent pattern and structured pattern analysis, outlier analysis, trend and deviation analysis, etc.
• Advanced data mining applications: stream data mining, bio-data mining, time-series analysis, text mining, Web mining, intrusion detection, etc.
• Data mining and society: privacy-preserving data mining

Web-based Databases (1990s – present)
• XML-based database systems
• Integration with information retrieval
• Data and information integration

New Generation of Integrated Data and Information Systems (present – future)
Data Rich, Information Poor
What is Data Mining?

Finding precious nuggets in raw material

Extracting (mining) knowledge from data
• Pattern extraction from large databases
• What is a pattern?
• An insight into the data (a hidden, not yet uncovered regularity such as a similarity)
• Pattern = knowledge
• Example: gold mining


In other words…
Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data

Alternative names
• Knowledge discovery (mining) in databases (KDD)
• Knowledge mining from databases
• Knowledge extraction
• Data/pattern analysis
• Information harvesting
• Business intelligence


What kind of information are we collecting?
A non-exclusive list of the kinds of information collected in digital form in databases and in flat files:
• Business transactions
• Scientific data: medical and personal data
• Surveillance video and pictures
• Satellite sensing
• Games
• Digital media
• CAD and software engineering data
• Virtual worlds
• Text reports and memos (e-mail messages)
• The World Wide Web repositories
Definitions
Knowledge

Information can be converted into knowledge about historical patterns and future trends.

For example, summary information on retail supermarket sales can be analyzed in terms of promotional efforts to provide knowledge of consumer buying behavior.

A manufacturer or retailer can then determine which items are most susceptible to promotional efforts.
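The supermarket example can be made concrete with a small sketch. The sales figures, the item names, and the `promotion_uplift` helper below are hypothetical and serve only as illustration; the "uplift" ratio is one simple way, among many, to express how susceptible an item is to promotion.

```python
from collections import defaultdict

# Hypothetical weekly records: (item, was the item promoted?, units sold).
sales = [
    ("cereal", False, 100), ("cereal", True, 180),
    ("milk", False, 200), ("milk", True, 210),
    ("soda", False, 80), ("soda", True, 200),
]

def promotion_uplift(records):
    """Return {item: mean promoted units / mean non-promoted units}."""
    # For each item keep [sum of units, number of weeks] per promotion flag.
    totals = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for item, promoted, units in records:
        bucket = totals[item][promoted]
        bucket[0] += units
        bucket[1] += 1
    uplift = {}
    for item, by_flag in totals.items():
        promo_sum, promo_weeks = by_flag[True]
        base_sum, base_weeks = by_flag[False]
        # Skip items that lack either promoted or regular weeks.
        if promo_weeks and base_weeks and base_sum:
            uplift[item] = (promo_sum / promo_weeks) / (base_sum / base_weeks)
    return uplift

uplift = promotion_uplift(sales)
most_susceptible = max(uplift, key=uplift.get)
```

With this toy data the summary information (sales) becomes knowledge (which item responds most strongly to promotion), which is the conversion described above.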
Data Mining Applications
Data mining is highly useful in the following domains −

• Market Analysis and Management
• Corporate Analysis & Risk Management
• Fraud Detection

Apart from these, data mining can also be used in the areas of production control, customer retention, science exploration, sports, astronomy, and Internet Web Surf-Aid.


Data Mining Applications

Market Analysis and Management

• Customer Profiling − what kind of people buy what kind of products
• Identifying Customer Requirements − the best products for different customers (and the factors that attract new customers)
• Cross Market Analysis − associations/correlations between product sales
• Target Marketing − clusters of model customers who share the same characteristics, such as interests, spending habits, income, etc.
• Determining Customer Purchasing Patterns − recurring patterns in what customers buy
• Providing Summary Information − multidimensional summary reports
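Cross market analysis is commonly expressed through association rules. The sketch below is a minimal illustration, assuming hypothetical market baskets and two helper functions (`support`, `confidence`) that are not part of the slides; it computes the two standard measures for the rule "bread ⇒ butter".

```python
# Hypothetical market baskets; each set is one customer transaction.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return both / base if base else 0.0

# Rule under test: customers who buy bread also buy butter.
rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
```

Here 3 of 5 baskets contain both items (support 0.6), and 3 of the 4 baskets with bread also contain butter (confidence 0.75).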


Corporate Analysis and Risk Management

• Finance Planning and Asset Evaluation − cash flow analysis and prediction, contingent claim analysis to evaluate assets
• Resource Planning − summarizing and comparing the resources and spending
• Competition − monitoring competitors and market directions

Fraud Detection

• Used in credit card services and in telecommunications to detect fraudulent telephone calls
• For telephone fraud, it analyzes the destination of the call, the duration of the call, the time of the day or week, etc.
• It looks for patterns that deviate from expected norms
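One simple way to flag "patterns that deviate from expected norms" is a z-score test on a single attribute such as call duration. This is only a sketch under stated assumptions: the call durations are hypothetical, the `flag_deviations` helper and its threshold of 3 standard deviations are illustrative choices, and real fraud detection would combine many attributes.

```python
from statistics import mean, stdev

def flag_deviations(history, new_values, threshold=3.0):
    """Return the new values lying more than `threshold` sample standard
    deviations away from the mean of the historical values."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # No variation in the history: anything different is a deviation.
        return [v for v in new_values if v != mu]
    return [v for v in new_values if abs(v - mu) / sigma > threshold]

# Hypothetical call durations in minutes for one subscriber.
past_call_minutes = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]
new_call_minutes = [5, 7, 45]

suspicious = flag_deviations(past_call_minutes, new_call_minutes)
```

With this data the 45-minute call is flagged, while the 5- and 7-minute calls stay within the subscriber's usual range.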


Data Mining in Knowledge Discovery Process
Knowledge discovery from data

The KDD process includes:

• Data Cleaning (to remove noise and inconsistent data)
• Data Integration (multiple data sources are combined)
• Data Selection (data relevant to the analysis task are retrieved from the database)
• Data Transformation (data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations)
• Data Mining (the essential process, in which intelligent methods are applied to extract data patterns)
• Pattern Evaluation (to identify the truly interesting patterns representing knowledge, based on some interestingness measures)
• Knowledge Presentation (visualization and knowledge representation techniques are used to present the mined knowledge to the user)
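The KDD steps listed above can be sketched as a chain of small functions. Everything in this example is an assumption made for illustration: the two toy data sources, the function names, and the use of a simple frequency count as the "mining" step. It is not a definitive implementation of any step.

```python
from collections import Counter

# Two hypothetical sources; None and negative quantities stand in for noise.
store_a = [{"customer": "c1", "item": "Milk", "qty": 2},
           {"customer": "c2", "item": None, "qty": 1}]
store_b = [{"customer": "c1", "item": "bread", "qty": 1},
           {"customer": "c3", "item": "milk ", "qty": 3},
           {"customer": "c3", "item": "pen", "qty": -1}]

def clean(rows):
    """Data cleaning: drop rows with a missing item or an impossible quantity."""
    return [r for r in rows if r["item"] and r["qty"] > 0]

def integrate(*sources):
    """Data integration: combine several sources into one list of rows."""
    return [row for source in sources for row in source]

def select(rows, items_of_interest):
    """Data selection: keep only rows relevant to the analysis task."""
    return [r for r in rows if r["item"].strip().lower() in items_of_interest]

def transform(rows):
    """Data transformation: normalize item names and aggregate quantities."""
    totals = Counter()
    for r in rows:
        totals[r["item"].strip().lower()] += r["qty"]
    return totals

def mine(totals):
    """Data mining (toy version): rank items by total quantity sold."""
    return totals.most_common()

def evaluate(patterns, min_qty):
    """Pattern evaluation: keep patterns that meet an interestingness threshold."""
    return [(item, qty) for item, qty in patterns if qty >= min_qty]

def present(patterns):
    """Knowledge presentation: render the patterns as readable lines."""
    return [f"{item}: {qty} units" for item, qty in patterns]

integrated = integrate(clean(store_a), clean(store_b))
selected = select(integrated, {"milk", "bread"})
report = present(evaluate(mine(transform(selected)), min_qty=2))
```

Each function corresponds to one step, so the final line reads in the same order as the process itself: transform, mine, evaluate, present.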
Knowledge Discovery (KDD) Process
• Data mining: the core of the knowledge discovery process
Data Mining in Knowledge Discovery Process
Applications
• Sales/Marketing (buying patterns)
• Banking (amounts spent by customers on credit cards)
• Insurance & Health Care (identifying customers)
• Medicine (characterizing patient behavior)
• Biology
Biological Applications

• Sequence analysis
• Genome annotation
• Analysis of gene and protein expression
• Analysis of mutations in cancer
• Protein structure prediction
• Modeling biological systems
• Protein-protein docking, and so on
Some of the Biological DM tools
Knowledge Discovery (KDD) Process
• Data mining: the core of the knowledge discovery process

Figure – KDD process (in order): Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation
