
How to Build and Deploy Machine Learning Projects

Litan Ilany, Advanced Analytics
litan.ilany@intel.com
Agenda
• Introduction
• Machine Learning: Exploration vs Solution
• CRISP-DM
• Data Flow considerations
• Other key considerations
• Q&A

Introduction – Litan Ilany
litan.ilany@intel.com

• Data Scientist Leader on Intel's Advanced Analytics team
• Holds an M.Sc. in Information Systems Engineering from BGU (focused on Machine Learning and Reinforcement Learning)
• Married + 2, lives in Kiryat Motzkin
Advanced Analytics team

Radical improvement of critical processes; helping build AI-competitive products:
• Design – validation
• Product Dev – cost, quality, performance
• Sales – unlimited scale
• Health – smart clinical trials
• Industrial AI / IoT Analytics – Edge-Fog-Cloud platform

Breakthrough technology that scales
Machine Learning
• Statistics
• Pattern recognition
• Generalization / Inductive Inference

• Types of learning:
• Supervised vs Unsupervised Learning
• Passive vs Active & Reinforcement Learning
• Batch vs Online Learning

ML – Algorithm vs Solution
• “Given a data matrix…” – does not exist in real life

• Pareto Principle (80/20 rule)


• Technical aspects
• Business needs
• Extreme cases

ML project – Go / No-Go decision

• Business feasibility:
  • Problem definition is clear
  • Partner is willing to invest / change
  • Enough ROI / impact
• Data feasibility:
  • Enough accessible & connected data
  • Data measures what they care about ("signal")
  • Data is accurate
• Execution feasibility:
  • Technology is accessible
  • Model can be executed in a timely manner and at a reasonable size
  • Model's I/O flow is reasonable
CRISP-DM
Cross-Industry Standard Process for Data Mining

• A structured methodology for DM projects
• Based on practical, real-world experience
• Conceived in 1996-7
CRISP-DM – the question each phase answers:
• Business Understanding – what is the problem we are dealing with?
• Data Understanding – what is the data we are working with?
• Data Preparation – what are the transformations and extractions to be done on the data?
• Modeling – what data model should we use?
• Evaluation – does the model meet the project goals?
• Deployment – how should we use the model we developed?
CRISP-DM: Business Understanding

• Determine business objective
• Assess situation
• Determine data-mining goals and success criteria
• Determine project plan
CRISP-DM: Business Understanding – Example

Example: Smart CI
• Each git-push is integrated with the main repository – after a series of tests passes
• Multiple git-pushes are integrated together (can't check them one-by-one)
• A bug in the code causes the entire integration to fail

Flow: Git push → Integration → CI tests → Passed / Failed
The same flow with personal CI tests added before integration:

Flow: Git push → Personal CI tests (Passed) → Integration → CI tests → Passed / Failed
CRISP-DM: Business Understanding – Example

• Goals and success criteria:
  • Reduce turnaround time (TAT)
  • At least 20% time reduction

• Project plan – ML solution:
  Git push → ML solution
  • Predicted "pass" → Integration + CI tests → Passed / Failed
  • Predicted "fail" → Personal CI tests first (Passed / Failed), then integration
CRISP-DM: Data Understanding

• Collect initial data
• Describe data
• Explore data
• Verify data quality

Example:
• Git-log files (unstructured data):
  • Commits – numerical / binary
  • Files, folders – numerical / binary
  • Lines – numerical
• Git DB (structured data):
  • Users – categorical
  • Timestamps, etc.
• Historical test results (labels)
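As a concrete illustration of turning a git log into the numerical features above, here is a minimal sketch. The input format mimics `git log --numstat` output but is simplified for illustration; real logs (binary files, renames) need more robust parsing.

```python
# Sketch: per-push feature extraction from numstat-style git-log lines.
# Assumes each line is either "commit <hash>" or "added<TAB>deleted<TAB>path".

def extract_push_features(numstat_lines):
    """Count commits, touched files/folders, and changed lines."""
    commits, files, folders, lines = 0, set(), set(), 0
    for line in numstat_lines:
        if line.startswith("commit "):
            commits += 1
            continue
        parts = line.split("\t")
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            added, deleted, path = int(parts[0]), int(parts[1]), parts[2]
            lines += added + deleted
            files.add(path)
            folders.add(path.rsplit("/", 1)[0] if "/" in path else ".")
    return {"commits": commits, "n_files": len(files),
            "n_folders": len(folders), "lines_changed": lines}

log = [
    "commit abc123",
    "10\t2\tsrc/model.py",
    "3\t0\tsrc/utils/io.py",
    "commit def456",
    "1\t1\tREADME.md",
]
features = extract_push_features(log)
print(features)  # {'commits': 2, 'n_files': 3, 'n_folders': 3, 'lines_changed': 17}
```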
CRISP-DM: Data Preparation

• Integrate data from multiple sources
• Format data
• Feature extraction
• Clean data
• Construct data
  • Derive attributes – transformation
  • Fill in missing values
• Data balancing (if needed)
• Feature selection

Example:
• Generate features from the log
• Generate and clean user features
• Normalize counters
• Thousands of features – remove unnecessary ones
• Reduce data imbalance
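Two of the preparation steps above (filling missing values, normalizing counters) can be sketched in a few lines. A real project would typically use pandas or scikit-learn; this pure-Python version just makes the logic explicit.

```python
# Minimal sketch of median imputation and min-max normalization.

def impute_median(column):
    """Replace None with the median of the observed values."""
    observed = sorted(v for v in column if v is not None)
    mid = len(observed) // 2
    median = (observed[mid] if len(observed) % 2
              else (observed[mid - 1] + observed[mid]) / 2)
    return [median if v is None else v for v in column]

def min_max_normalize(column):
    """Scale values to [0, 1] so counters are comparable across features."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

lines_changed = [10, None, 30, 20]
filled = impute_median(lines_changed)   # [10, 20, 30, 20]
scaled = min_max_normalize(filled)      # [0.0, 0.5, 1.0, 0.5]
print(filled, scaled)
```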
CRISP-DM: Modeling

• Select modeling technique
  • Consider compute resources, computation time, number of features, business needs
• Generate test design
  • Train/test split, cross-validation
  • Simulation (chronological order)
• Build model
• Assess model

Example:
• We'll check various ML models with various hyperparameters
• Simulation with a weekly training phase
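The "simulation (chronological order)" point matters here: shuffling pushes across time would let the model peek at the future. A minimal rolling-origin split sketch, mimicking the weekly retraining setup:

```python
# Chronological (rolling-origin) splits: training data always precedes
# test data in time, unlike a shuffled train/test split.

def chronological_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs in strictly chronological order."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = list(range(0, k * fold))
        test = list(range(k * fold, (k + 1) * fold))
        yield train, test

splits = list(chronological_splits(n_samples=8, n_splits=3))
for train, test in splits:
    print(train, "->", test)
# [0, 1] -> [2, 3]
# [0, 1, 2, 3] -> [4, 5]
# [0, 1, 2, 3, 4, 5] -> [6, 7]
```

scikit-learn's `TimeSeriesSplit` implements the same idea with more options.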
CRISP-DM: Modeling – Example (Smart CI)

• Model assessment:
  • Which model should we choose?
  • How can we measure it?

Model A (counts of pushes):
Predicted \ Actual   Passed   Failed   Total
Predicted pass          55       18       73
Predicted fail          15       12       27
Total                   70       30      100

Model B (counts of pushes):
Predicted \ Actual   Passed   Failed   Total
Predicted pass          35        5       40
Predicted fail          35       25       60
Total                   70       30      100

Measure     Model A                  Model B
Accuracy    (55+12)/100 = 67%        (35+25)/100 = 60%
Precision   55/73 = 75%              35/40 = 88%
Recall      55/70 = 79%              35/70 = 50%
FPR*        18/30 = 60%              5/30 = 17%
*Lower is better
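The four measures follow mechanically from the confusion-matrix counts. A minimal sketch, taking "pass" as the positive class (so TP = predicted pass & passed, FP = predicted pass & failed):

```python
# Reproducing the slide's metrics from the two confusion matrices.

def metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "fpr": fp / (fp + tn),  # false-positive rate: lower is better
    }

model_a = metrics(tp=55, fp=18, fn=15, tn=12)
model_b = metrics(tp=35, fp=5, fn=35, tn=25)
print(model_a)  # accuracy 0.67, precision ~0.75, recall ~0.79, fpr 0.60
print(model_b)  # accuracy 0.60, precision ~0.88, recall 0.50, fpr ~0.17
```

Note how accuracy alone favors Model A, while precision and FPR favor Model B; which one wins depends on the business cost of each error type (see the Evaluation phase).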
CRISP-DM: Modeling – Example (Smart CI)

[Chart: models (SVM, KNN, NN, GBT, Random Forest, Decision Tree, Regression) placed along an axis of business ease of explanation, from complex to simple]
CRISP-DM: Modeling – Example (Smart CI)

[Chart: expected value (in simulations) vs business ease of explanation; Random Forest and GBT score highest expected value, Regression and KNN lowest]
CRISP-DM: Evaluation

• Evaluate results
  • In terms of business needs
• Review process
• Determine next steps

Example:
Predicted \ Actual   Passed   Failed
Predicted pass         TP       FP
Predicted fail         FN       TN

• TAT reduction per outcome:
  • TP = 50% reduction (×2 faster)
  • FN = 0% reduction
  • FP = −500% to −5000% "reduction" (×5–50 slower)
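Combining these per-outcome TAT effects with a model's confusion matrix gives an expected TAT relative to today's flow (1.0 = unchanged). A back-of-the-envelope sketch; the ×10 FP penalty is an assumption within the slide's ×5–50 range, and TN is assumed unchanged since the slide doesn't specify it:

```python
# Expected turnaround time under a given model, relative to baseline 1.0.
# TP: x2 faster (0.5); FN and TN: unchanged (1.0, TN by assumption);
# FP: fp_penalty times slower (assumed 10x here, slide says 5-50x).

def expected_tat(tp, fp, fn, tn, fp_penalty=10.0):
    total = tp + fp + fn + tn
    cost = tp * 0.5 + (fn + tn) * 1.0 + fp * fp_penalty
    return cost / total

tat_a = expected_tat(tp=55, fp=18, fn=15, tn=12)  # Model A: 2.345
tat_b = expected_tat(tp=35, fp=5, fn=35, tn=25)   # Model B: 1.275
print(tat_a, tat_b)
```

Under this cost model the lower-FPR Model B wins despite its lower accuracy, which is exactly why the slide evaluates in business terms rather than raw accuracy.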
CRISP-DM: Deployment

• Plan and deploy the model
• Plan the monitoring and maintenance process

Example:
• Integrate with the existing CI system
• A weekly automatic process that retrains the model
• A weekly automatic process that monitors the model's performance and suggests better hyperparameters (if needed)
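The weekly maintenance loop described above can be sketched as a single decision function. All names here are illustrative stand-ins; the real jobs would be scheduled (e.g. via cron) and wired into the CI system:

```python
# Sketch of the weekly job: retrain, compare against the deployed model's
# score, and decide whether to promote or escalate to hyperparameter search.

def weekly_update(train_fn, score_fn, deployed_score, drift_threshold=0.05):
    """Retrain and decide what to do with the new model."""
    model = train_fn()
    new_score = score_fn(model)
    if new_score >= deployed_score:
        return "promote", new_score
    if deployed_score - new_score > drift_threshold:
        return "suggest_hyperparameter_search", new_score
    return "keep_deployed", new_score

# Toy stand-ins for the real training/scoring jobs:
action, score = weekly_update(train_fn=lambda: "model-v2",
                              score_fn=lambda m: 0.71,
                              deployed_score=0.68)
print(action, score)  # promote 0.71
```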
CRISP-DM: Data Flow

Data-flow concerns map onto the CRISP-DM cycle:
• Data flow architecture
• Data schema
• Data validation
• Data flow implementation
Other key considerations
• Use Git (or another version-control platform)
• Automate the research process (trial and error)
• Use Docker containers
• TEST YOUR CODE (don't treat it as a black box)
• ML technical debt – code and data
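"TEST YOUR CODE" applies to feature and preprocessing code as much as to application code. A plain-assert sketch (pytest would collect a function like this as-is); `normalize` is a hypothetical stand-in for any transformation in your pipeline:

```python
# Unit-testing a pipeline transformation with plain assertions.

def normalize(values):
    """Center values to zero mean so downstream models see comparable features."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def test_normalize():
    out = normalize([1.0, 2.0, 3.0])
    assert sum(out) == 0.0             # mean removed
    assert out == [-1.0, 0.0, 1.0]     # order and spacing preserved
    assert normalize([5.0]) == [0.0]   # single-element edge case

test_normalize()
print("all tests passed")
```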
References

• CRISP-DM (Wikipedia)
• 4 Things Data Scientists Should Learn from Software Engineers
• Machine Learning: The High-Interest Credit Card of Technical Debt
