Data Warehousing and Data Mining: Lecture Notes on Unit 1
SURESH BABU M
ASST PROF
IT DEPT
VJIT
1960s:
Data collection, database creation, IMS and network DBMS
1970s:
Relational data model, relational DBMS implementation
1980s:
RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
Application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s:
Data mining, data warehousing, multimedia databases, and Web databases
2000s:
Stream data management and mining
Data mining and its applications
Web technology (XML, data integration) and global information systems
Alternative names
Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, business intelligence, etc.
Other Applications
Text mining (newsgroups, email, documents) and Web mining
Stream data mining
Bioinformatics and bio-data analysis
Ex. 1: Market Analysis and Management
Where does the data come from? Credit card transactions, loyalty cards, discount coupons, customer complaint calls, plus (public) lifestyle studies
Target marketing
Find clusters of “model” customers who share the same characteristics: interest, income level, spending habits, etc.
Determine customer purchasing patterns over time
Resource planning
Competition
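The target-marketing step above (finding clusters of customers with similar characteristics) can be sketched with a small k-means routine. This is only an illustration: the notes do not name a clustering algorithm, so k-means is an assumption here, and the customer figures, the kmeans helper, and the choice of the first k points as starting centroids are all hypothetical.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means; returns (centroids, labels).

    Assumption for this sketch: the first k points serve as the
    initial centroids, which keeps the result deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members)
                                     for v in zip(*members))
    return centroids, labels

# Hypothetical customers: (annual income in thousands, monthly spend in hundreds)
customers = [(20, 2), (80, 9), (22, 3), (25, 2), (85, 8), (90, 10)]
centroids, labels = kmeans(customers, k=2)

for customer, label in zip(customers, labels):
    print(customer, "-> cluster", label)
```

In practice the attributes would be rescaled first, since an attribute with a large numeric range (income here) otherwise dominates the distance.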
Figure (labels only): Data Cleaning, Data Integration, Task-relevant Data, Data Mining
Figure (labels only): Data Exploration (statistical summary, querying, and reporting); Decision Making (end user); increasing potential to support business decisions
Object-relational databases
Multimedia database
Text databases
Data mining may generate thousands of patterns: Not all of them are interesting
Suggested approach: Human-centered, query-based, focused mining
Interestingness measures
A pattern is interesting if it is easily understood by humans, valid on new or test data with
some degree of certainty, potentially useful, novel, or validates some hypothesis that a user
seeks to confirm
Subjective: based on user’s belief in the data, e.g., unexpectedness, novelty, actionability, etc.
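As a rough illustration of filtering patterns by interestingness, the sketch below scores an association rule by support and confidence. These two measures are standard in association-rule mining but are not spelled out in the notes above, so treat them as an assumption; the baskets, thresholds, and function names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) over the transactions."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0  # the rule never applies, so it carries no evidence
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base

def is_interesting(transactions, antecedent, consequent,
                   min_support, min_confidence):
    """Keep a rule only if it passes both user-chosen thresholds."""
    both = set(antecedent) | set(consequent)
    return (support(transactions, both) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)

# Hypothetical market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

print(support(baskets, {"diapers", "beer"}))
print(confidence(baskets, {"diapers"}, {"beer"}))
print(is_interesting(baskets, {"diapers"}, {"beer"}, 0.3, 0.6))
```

Thresholds like these only cover the statistical side; the subjective measures listed above (unexpectedness, novelty, actionability) still depend on the user.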
Figure (labels only): Pattern Evaluation; Data Mining Engine; Knowledge Base; Database or Data Warehouse Server
Mining methodology
Mining different kinds of knowledge from diverse data types, e.g., bio, stream, Web
Performance: efficiency, effectiveness, and scalability
Pattern evaluation: the interestingness problem
Incorporation of background knowledge
Handling noise and incomplete data
Parallel, distributed and incremental mining methods
Integration of the discovered knowledge with existing knowledge: knowledge fusion
User interaction
Data mining query languages and ad-hoc mining
Expression and visualization of data mining results
Interactive mining of knowledge at multiple levels of abstraction
Applications and social impacts
Protection of data security, integrity, and privacy
A KDD process includes data cleaning, data integration, data selection, transformation,
data mining, pattern evaluation, and knowledge presentation
Data mining functionalities: characterization, discrimination, association, classification,
clustering, outlier and trend analysis, etc.
Data mining systems and architectures
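The KDD steps listed in the summary above can be read as a pipeline in which each stage feeds the next. The sketch below is a toy version under stated assumptions: the records, field names, and every function are hypothetical, and frequency counting stands in for a real mining algorithm.

```python
from collections import Counter

# Hypothetical records from two sources; None marks a missing value.
store_sales = [
    {"customer": "a", "item": "milk", "amount": 3.0},
    {"customer": "b", "item": "bread", "amount": None},
    {"customer": "a", "item": "bread", "amount": 2.5},
]
web_sales = [
    {"customer": "c", "item": "milk", "amount": 3.2},
    {"customer": "c", "item": "milk", "amount": 3.1},
]

def clean(rows):
    """Data cleaning: drop records with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for src in sources for r in src]

def select(rows, fields):
    """Data selection: keep only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows):
    """Transformation: reduce each record to its item name."""
    return [r["item"] for r in rows]

def mine(items):
    """Data mining: count item frequencies (stand-in for a real algorithm)."""
    return Counter(items)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns seen at least min_count times."""
    return {p: n for p, n in patterns.items() if n >= min_count}

def present(patterns):
    """Knowledge presentation: format patterns as readable lines."""
    return [f"{p}: {n} purchases" for p, n in sorted(patterns.items())]

rows = integrate(clean(store_sales), clean(web_sales))
items = transform(select(rows, ["item"]))
report = present(evaluate(mine(items), min_count=2))
print(report)
```

Real systems often iterate between these stages, for example returning to cleaning after pattern evaluation exposes bad data, rather than running them once in order.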
S. Chakrabarti. Mining the Web: Statistical Analysis of Hypertext and Semi-Structured Data. Morgan Kaufmann, 2002
T. Dasu and T. Johnson. Exploratory Data Mining and Data Cleaning. John Wiley & Sons, 2003
U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery and Data Mining. AAAI/MIT Press, 1996
U. Fayyad, G. Grinstein, and A. Wierse. Information Visualization in Data Mining and Knowledge Discovery. Morgan Kaufmann, 2001
J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2nd ed., 2006
D. J. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, 2001
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001
P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Wiley, 2005
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2nd ed., 2005