How To Build and Deploy ML Projects
litan.ilany@intel.com
Agenda
• Introduction
• Machine Learning: Exploration vs Solution
• CRISP-DM
• Data Flow considerations
• Other key considerations
• Q&A
Introduction – Litan Ilany
Advanced analytics team
Machine Learning
• Statistics
• Pattern recognition
• Generalization / Inductive Inference
• Types of learning:
• Supervised vs Unsupervised Learning
• Passive vs Active & Reinforcement Learning
• Batch vs Online Learning
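As a concrete illustration of supervised learning, here is a minimal 1-nearest-neighbour classifier in pure Python. The data points and labels are toy values invented for the sketch, not examples from the talk:

```python
# A minimal sketch of supervised learning: a 1-nearest-neighbour classifier.
# Training data is a list of (features, label) pairs; prediction copies the
# label of the closest training example. All values here are toy data.

def predict_1nn(train, point):
    """Return the label of the training example closest to `point`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    _, label = min(train, key=lambda ex: sq_dist(ex[0], point))
    return label

train = [((0.0, 0.0), "pass"), ((1.0, 1.0), "fail"), ((0.1, 0.2), "pass")]
print(predict_1nn(train, (0.9, 0.8)))  # closest neighbour is (1.0, 1.0) -> fail
```

The same data without labels would be an unsupervised problem (e.g. clustering the points), which is the distinction the bullet above draws.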
ML – Algorithm vs Solution
• “Given a data matrix…” – such a ready-made matrix does not exist in real life
ML project – Go / No-Go Decision
• Execution feasibility:
• Technology is accessible
• Model can be executed in a timely manner
• Model’s I/O flow and size are reasonable
CRISP-DM
Cross-Industry Standard Process for Data Mining
CRISP-DM
• Business Understanding – what is the problem we are dealing with?
• Data Understanding – what is the data we are working with?
• Data Preparation – what are the transformations and extractions to be done on the data?
• Modeling – what is the data model we should use?
• Evaluation – does the model meet the project goals?
• Deployment – how should we use the model we developed?
CRISP-DM: Business Understanding
• Assess situation
• Determine data mining goals and success criteria
• Determine project plan
CRISP-DM: Business Understanding – example (Smart CI)
• Each git-push is integrated into the main repository – after a series of tests passes
• Pushes are integrated in batches (they can’t be checked one by one)
• A single bug in the code causes the entire integration to fail
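One way a model can help here is to rank the pushes in a failed batch by predicted risk, so the most suspicious push is re-tested first. The sketch below is purely illustrative: the feature names and weights are assumptions, not the talk's actual model:

```python
# Rank pushes in a failed integration batch by a hypothetical risk score,
# so the riskiest push is re-tested first. The features ("lines_changed",
# "files_touched", "touches_core") and weights are invented for this sketch.

WEIGHTS = {"lines_changed": 0.01, "files_touched": 0.05, "touches_core": 0.5}

def risk(push):
    """Linear risk score over whichever features the push record has."""
    return sum(WEIGHTS[k] * push.get(k, 0) for k in WEIGHTS)

def rank_pushes(pushes):
    """Return the batch ordered from most to least suspicious."""
    return sorted(pushes, key=risk, reverse=True)

batch = [
    {"id": "a", "lines_changed": 10, "files_touched": 1, "touches_core": 0},
    {"id": "b", "lines_changed": 500, "files_touched": 12, "touches_core": 1},
]
print([p["id"] for p in rank_pushes(batch)])  # riskiest first: ['b', 'a']
```

A real model would replace the hand-set weights with learned ones, but the deployment shape (score, sort, re-test in order) stays the same.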
CRISP-DM: Business Understanding – example (Smart CI)
• Project plan: Data → Modeling → Deployment → ML solution
CRISP-DM: Modeling
• Build model
• Assess model
CRISP-DM: Modeling – example (Smart CI)
• Model assessment:
• Which model should we choose?
• How can we measure it?

Measure     Model A               Model B
Accuracy    (55+12)/100 = 67%     (35+25)/100 = 60%

[Confusion matrices: Predicted \ Actual (Passed / Failed) with totals over 100 pushes, for Model A and Model B]
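The accuracy figures above follow directly from the correct-prediction counts; only those counts and the total of 100 pushes survive from the slide's tables, so the check below uses just those:

```python
# Recompute the slide's accuracy figures. Accuracy is the fraction of
# predictions that matched the actual outcome (correct / total).

def accuracy(correct, total):
    """Fraction of predictions that matched the actual outcome."""
    return correct / total

print(accuracy(55 + 12, 100))  # Model A: 0.67
print(accuracy(35 + 25, 100))  # Model B: 0.6
```

Accuracy alone can mislead when the classes are imbalanced (most pushes pass), which is one reason the evaluation step later weighs the individual outcome types instead.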
CRISP-DM: Modeling – example (Smart CI)
• Business ease of explanation (complex → simple): NN, GBT, SVM, Random Forest, KNN, Decision Tree, Regression
CRISP-DM: Modeling – example (Smart CI)
[Chart: models (GBT, SVM, NN, Random Forest, KNN, Decision Tree, Regression) plotted against business ease of explanation]
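One reason decision trees and regressions sit at the "simple" end of that spectrum is that their logic can be read back verbatim as a plain rule. A toy illustration of such an auditable model, with invented feature names and thresholds:

```python
# A decision-tree-style model expressed as readable if/else rules that a
# business stakeholder can audit directly. The features ("lines_changed",
# "touches_core") and thresholds are invented for this sketch.

def predict(push):
    if push["lines_changed"] > 300:
        return "fail"
    if push["touches_core"]:
        return "fail"
    return "pass"

print(predict({"lines_changed": 20, "touches_core": False}))   # pass
print(predict({"lines_changed": 400, "touches_core": False}))  # fail
```

An NN or GBT with the same accuracy cannot be summarized this way, which is exactly the trade-off the chart captures.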
CRISP-DM: Evaluation – example (Smart CI)
• TAT reduction:
• TP = 50% reduction (x2 faster)
• FN = 0% reduction
• FP = −500–5000% “reduction” (x5–50 slower)
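Those per-outcome factors can be folded into a single expected-TAT figure once the outcome rates are known. In the sketch below the x2 speedup for TP and the x5–x50 slowdown for FP come from the slide; the outcome rates themselves are invented for illustration:

```python
# Expected turnaround time (TAT) relative to a baseline of 1.0. Per the
# slide: a TP halves TAT (0.5), a FN leaves it unchanged (1.0), a FP
# multiplies it by 5-50 (5.0 used here). TN also leaves it unchanged.
# The outcome probabilities passed in below are invented, not measured.

def expected_tat(p_tp, p_tn, p_fp, p_fn, fp_factor=5.0):
    assert abs(p_tp + p_tn + p_fp + p_fn - 1.0) < 1e-9, "rates must sum to 1"
    return p_tp * 0.5 + (p_tn + p_fn) * 1.0 + p_fp * fp_factor

print(round(expected_tat(0.5, 0.4, 0.05, 0.05), 2))  # 0.95: a net speedup
```

The asymmetry is the point of the slide: because an FP costs 5–50x while a TP saves only 2x, even a small FP rate can wipe out the gain, so the model must be tuned for precision, not accuracy.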
CRISP-DM: Data Flow
• Data flow architecture
• Data schema
• Data flow validation
• Data flow implementation
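A data schema agreed between the deployment architecture and the model can be enforced by a lightweight validator at the flow's entry point. A minimal sketch, with hypothetical field names and types:

```python
# Validate an incoming record against an agreed schema before it reaches
# the model. The field names and types below are hypothetical examples.

SCHEMA = {"push_id": str, "lines_changed": int, "touches_core": bool}

def validate(record, schema=SCHEMA):
    """Return a list of schema violations; an empty list means valid."""
    errors = []
    for field, typ in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], typ):
            errors.append(f"{field}: expected {typ.__name__}")
    return errors

bad = {"push_id": "abc", "lines_changed": "10", "touches_core": True}
print(validate(bad))  # ['lines_changed: expected int']
```

Rejecting malformed records here, rather than inside the model, keeps data-flow validation a separate, testable stage as the slide suggests.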
Other key considerations
• Use Git (or another version-control platform)
• Automate the research process (trial-and-error)
• Use Docker containers
• TEST YOUR CODE (don’t treat it as a black box)
• ML technical debt – in both code and data
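“Test your code” applies to the ML pipeline itself, not just the surrounding glue: feature extraction in particular is where silent bugs hide. A minimal sketch of a unit-testable feature function (the feature itself is a hypothetical example):

```python
# A small, deterministic feature function is easy to unit-test; treating
# the pipeline as a black box would let a bug here pass unnoticed.

def failure_rate(history):
    """Fraction of past test runs that failed; 0.0 for an empty history."""
    if not history:
        return 0.0
    return sum(1 for outcome in history if outcome == "failed") / len(history)

# Plain asserts double as regression tests (a pytest run would collect
# equivalent test functions automatically).
assert failure_rate([]) == 0.0
assert failure_rate(["passed", "failed", "failed", "passed"]) == 0.5
print("all feature tests passed")
```

Pinning behavior like the empty-history edge case also guards against the data side of ML technical debt, where upstream changes quietly shift a feature's meaning.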
References
CRISP-DM (Wikipedia)