Data Science Process
[Figure: Data Science; recovered label: 5. Knowledge]
What Motivated Data Mining?
Why Is It Important?
• Wide availability of huge amounts of
data and the imminent need for turning
such data into useful information and
knowledge.
• Data mining can be viewed as a result of
the natural evolution of information
technology.
Data science process frameworks
• One of the most popular data science process frameworks is
CRISP-DM, an acronym for Cross Industry Standard Process for Data
Mining.
• The CRISP-DM process is the most widely adopted framework for
developing data science solutions.
• Other data science frameworks include:
• SEMMA, an acronym for Sample, Explore, Modify, Model, and Assess.
• DMAIC, an acronym for Define, Measure, Analyze, Improve, and
Control, used in Six Sigma practice.
• The Selection, Preprocessing, Transformation, Data Mining,
Interpretation, and Evaluation framework used in the knowledge
discovery in databases (KDD) process.
CRISP-DM process
CRISP-DM is a process model with six phases that naturally describes
the data science life cycle. It's like a set of guardrails to help you
plan, organize, and implement your data science (or machine learning)
project.
The Business Understanding phase focuses on understanding the
objectives and requirements of the project.
• Understanding of the customer’s needs
The Data Understanding phase focuses on identifying, collecting, and
analyzing the data sets.
[Figure: CRISP-DM cycle with six phases: Business Understanding, Data
Understanding, Data Preparation, Modeling, Evaluation, Deployment]
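As a rough illustration, the six phases can be written down as an ordered list in Python. The names `Phase` and `next_phase` are hypothetical, and the wrap-around from Deployment back to Business Understanding is an assumption that reflects the "life cycle" wording. Real projects also move back and forth between phases, which this sketch ignores.

```python
from enum import Enum


class Phase(Enum):
    """The six CRISP-DM phases, in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(phase: Phase) -> Phase:
    """Return the phase after `phase`, wrapping around after deployment.

    The wrap-around is an assumption made for this sketch.
    """
    phases = list(Phase)
    return phases[(phases.index(phase) + 1) % len(phases)]


if __name__ == "__main__":
    # Print the phases in order, e.g. "1 Business Understanding".
    for phase in Phase:
        print(phase.value, phase.name.replace("_", " ").title())
```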
Data science Process
• The data science process is a generic set of steps that is
independent of the problem, the algorithm, and the data science tool.
• The fundamental objective of any process that involves
data science is to address the analysis question.
• The technique used to answer the business question could be a
learning algorithm such as a decision tree or an artificial neural
network, or simply a visualization such as a scatterplot.
• The software tool used to develop and implement the data
science algorithm could be custom code, RapidMiner, R, Weka,
SAS, Oracle Data Miner, or Python.
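The "algorithm agnostic" idea above can be sketched in plain Python. The process function stays the same while the learner is swapped. Everything here is hypothetical and chosen only for illustration: the names `majority_learner`, `stump_learner`, and `run_process`, the learner interface, and the toy data. The stump is a one-split decision tree, not a full decision tree implementation.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Assumed interface for this sketch: a learner takes training values and
# labels and returns a function that predicts a label for a new value.
Predictor = Callable[[float], str]
Learner = Callable[[Sequence[float], Sequence[str]], Predictor]


def majority_learner(xs: Sequence[float], ys: Sequence[str]) -> Predictor:
    """Baseline: always predict the most common training label."""
    most_common = Counter(ys).most_common(1)[0][0]
    return lambda x: most_common


def stump_learner(xs: Sequence[float], ys: Sequence[str]) -> Predictor:
    """One-split decision tree: choose the threshold with fewest training errors."""
    labels = sorted(set(ys))
    best = None  # (errors, threshold, label_at_or_below, label_above)
    for t in sorted(set(xs)):
        for lo in labels:
            for hi in labels:
                errors = sum((lo if x <= t else hi) != y for x, y in zip(xs, ys))
                if best is None or errors < best[0]:
                    best = (errors, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: lo if x <= t else hi


def run_process(learner: Learner,
                train: Sequence[Tuple[float, str]],
                test: Sequence[Tuple[float, str]]) -> float:
    """Fit any learner on `train`, then return its accuracy on `test`."""
    xs, ys = zip(*train)
    predict = learner(xs, ys)
    return sum(predict(x) == y for x, y in test) / len(test)


if __name__ == "__main__":
    # Toy data, invented for the example.
    train = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (8.0, "high"), (9.0, "high")]
    test = [(1.5, "low"), (8.5, "high"), (2.5, "low"), (9.5, "high")]
    for learner in (majority_learner, stump_learner):
        print(learner.__name__, run_process(learner, train, test))
```

Because `run_process` only depends on the learner interface, a different algorithm can be tried without changing the surrounding steps.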
Data Science Process
[Figure: data science process diagram; recovered labels: Business
Understanding, Data Understanding, 1. Prior Knowledge, Deployment,
4. Application]