Introduction To Data Mining: A.J.M.M. (Ton) Weijters
[Figure: Data Mining and Knowledge Discovery in relation to Machine Learning, Statistics, Databases, and Visualization]
• Statistics:
– more theory-based
– more focused on testing hypotheses
• Machine Learning:
– more heuristics-based than theory-based
– focused on improving the performance of learning algorithms
• Data Mining and Knowledge Discovery:
– Data Mining: one step in the Knowledge Discovery process (applying the Machine Learning algorithm)
– Knowledge Discovery: the whole process, including data cleaning, learning, and the integration and visualization of results
• Distinctions are fuzzy
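To make the Data Mining vs. Knowledge Discovery distinction concrete, here is a minimal Python sketch in which mining is a single function inside a larger process that also cleans the data and presents the result. The toy data, the function names, and the choice of learner are all hypothetical: the "mining" step is a simplified OneR-style rule (majority class per value of one fixed attribute, without OneR's attribute-selection step), chosen only because it is short.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (outlook, windy, play); None marks a missing value.
RAW = [
    ("sunny", False, "no"),
    ("sunny", True, "no"),
    ("overcast", False, "yes"),
    ("overcast", True, "yes"),
    ("rainy", False, "yes"),
    ("rainy", True, "no"),
    ("rainy", False, "yes"),
    (None, False, "yes"),    # incomplete: removed by cleaning
    ("sunny", False, None),  # incomplete: removed by cleaning
]


def clean(records):
    """Pre-processing (part of Knowledge Discovery): drop incomplete records."""
    return [r for r in records if None not in r]


def mine(records, attr=0, target=-1):
    """The Data Mining step: a simplified OneR-style rule on one fixed attribute.

    For each value of the chosen attribute, predict its most frequent class.
    """
    counts = defaultdict(Counter)
    for r in records:
        counts[r[attr]][r[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


def present(rule, attr_name="outlook", target_name="play"):
    """Post-processing (part of Knowledge Discovery): turn the model into readable rules."""
    return [f"IF {attr_name} = {v} THEN {target_name} = {cls}"
            for v, cls in sorted(rule.items())]


def knowledge_discovery(records):
    """The whole process: cleaning, mining, and presenting the results."""
    return present(mine(clean(records)))


for line in knowledge_discovery(RAW):
    print(line)
```

Running it prints three rules (overcast and rainy map to "yes", sunny to "no"). Swapping `mine` for a different learner leaves the surrounding Knowledge Discovery steps unchanged, which is the point of the distinction.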
[Figure: the Knowledge Discovery process. Business Understanding + Data Understanding + Data Preparation: 80% of the time; Modeling (applying the mining algorithm): 20%; Monitoring]
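Because understanding and preparing the data take most of the effort, it helps to automate the first checks. The Python sketch below illustrates two Data Understanding tasks, Describe Data and Verify Data Quality, on a hypothetical table. The records, the validity rules in `VALID`, and the report layout are assumptions for illustration only; it also assumes a non-empty table in which every record has the same keys.

```python
# Hypothetical raw records; None marks a missing value.
# Assumes a non-empty table where every record has the same keys.
ROWS = [
    {"age": 34, "income": 52000, "region": "north"},
    {"age": None, "income": 61000, "region": "south"},
    {"age": 29, "income": -1, "region": "north"},  # -1 looks like a sentinel
    {"age": 41, "income": 48000, "region": None},
]

# Hypothetical validity rules for attributes with a known range.
VALID = {
    "age": lambda a: 0 <= a <= 120,
    "income": lambda x: x >= 0,
}


def describe_data(rows):
    """Describe Data: record count plus missing and distinct counts per attribute."""
    report = {"records": len(rows), "attributes": {}}
    for name in rows[0]:
        values = [r.get(name) for r in rows]
        known = [v for v in values if v is not None]
        report["attributes"][name] = {
            "missing": len(values) - len(known),
            "distinct": len(set(known)),
        }
    return report


def verify_data_quality(rows, rules):
    """Verify Data Quality: list (row index, attribute) pairs that break a rule."""
    issues = []
    for i, r in enumerate(rows):
        for name, ok in rules.items():
            v = r.get(name)
            # Missing values are reported by describe_data, not flagged here.
            if v is not None and not ok(v):
                issues.append((i, name))
    return issues


print(describe_data(ROWS))
print(verify_data_quality(ROWS, VALID))
```

On this table the quality check flags only the negative income in row 2; the missing age and region show up in the description report instead.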
Phases and Tasks
Generic tasks per phase, each followed by its outputs:
• Business Understanding
– Determine Business Objectives: Background; Business Objectives; Business Success Criteria
– Situation Assessment: Inventory of Resources; Requirements, Assumptions, and Constraints; Risks and Contingencies; Terminology; Costs and Benefits
– Determine Data Mining Goal: Data Mining Goals; Data Mining Success Criteria
• Data Understanding
– Collect Initial Data: Initial Data Collection Report
– Describe Data: Data Description Report
– Explore Data: Data Exploration Report
– Verify Data Quality: Data Quality Report
• Data Preparation
– Data Set: Data Set Description
– Select Data: Rationale for Inclusion / Exclusion
– Clean Data: Data Cleaning Report
– Construct Data: Derived Attributes; Generated Records
– Integrate Data: Merged Data
– Format Data: Reformatted Data
• Modeling
– Select Modeling Technique: Modeling Technique; Modeling Assumptions
– Generate Test Design: Test Design
– Build Model: Parameter Settings; Models; Model Description
– Assess Model: Model Assessment; Revised Parameter Settings
• Evaluation
– Evaluate Results: Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models
– Review Process: Review of Process
– Determine Next Steps: List of Possible Actions; Decision
• Deployment
– Plan Deployment: Deployment Plan
– Plan Monitoring and Maintenance: Monitoring and Maintenance Plan
– Produce Final Report: Final Report; Final Presentation
– Review Project: Experience Documentation
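The Data Preparation tasks (Select, Clean, Construct, Integrate, Format Data) can be chained as small functions. The Python sketch below is one possible arrangement under stated assumptions: the two source tables, the attribute names, the cleaning policy (drop rows with missing values), and `REFERENCE_YEAR` are all hypothetical, and real projects would document each choice in the corresponding report.

```python
# Hypothetical source tables (e.g. from two different departments).
CUSTOMERS = [
    {"id": 1, "name": "Ann", "birth_year": 1980, "region": "north"},
    {"id": 2, "name": "Bob", "birth_year": None, "region": "south"},
    {"id": 3, "name": "Cem", "birth_year": 1995, "region": "north"},
]
ORDERS = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 3, "amount": 40.0},
]

REFERENCE_YEAR = 2024  # assumption: year used to derive age


def select_data(rows, keep):
    """Select Data: keep only the attributes judged relevant."""
    return [{k: r[k] for k in keep} for r in rows]


def clean_data(rows):
    """Clean Data: here simply drop rows with a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def construct_data(rows):
    """Construct Data: add a derived attribute (age) computed from birth_year."""
    return [{**r, "age": REFERENCE_YEAR - r["birth_year"]} for r in rows]


def integrate_data(customers, orders):
    """Integrate Data: merge the total order amount per customer into each row."""
    totals = {}
    for o in orders:
        totals[o["customer_id"]] = totals.get(o["customer_id"], 0.0) + o["amount"]
    return [{**c, "total_spent": totals.get(c["id"], 0.0)} for c in customers]


def format_data(rows, columns):
    """Format Data: reformat records into fixed-order tuples for a learner."""
    return [tuple(r[c] for c in columns) for r in rows]


def prepare(customers, orders):
    rows = select_data(customers, ["id", "birth_year", "region"])
    rows = clean_data(rows)
    rows = construct_data(rows)
    rows = integrate_data(rows, orders)
    return format_data(rows, ["id", "age", "region", "total_spent"])


print(prepare(CUSTOMERS, ORDERS))
```

Here Bob is dropped during cleaning because his birth year is missing; a different policy (e.g. imputing a value) would change the prepared data set, which is why such choices belong in the Data Cleaning Report.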