Using Django, Docker and Scikit-Learn To Bootstrap Your Machine Learning Project

The document discusses using Docker, Django, and Scikit-learn for machine learning projects. It begins with an overview of machine learning and an example project using Naive Bayes classification. It then covers engineering machine learning systems, including tools like Jupyter notebooks, Scikit-learn, and Pandas. The document emphasizes using Docker for reproducibility and describes integrating Docker with Django to allow updating models through a web API. This allows data scientists to iterate on models in notebooks within Docker containers.

Uploaded by

Dejan Nastevski


Using Django, Docker and Scikit-learn to bootstrap your Machine Learning Project
Lorena Mesa
@loooorenanicole
PythonDay Mexico 2017
http://bit.ly/2s5R01V
Hello, I’m Lorena Mesa.

How I’ll approach today’s chat.
1. Review of machine learning
2. Anatomy of a data science team
3. Engineering a machine learning problem
4. Iterating on machine learning engineering with Docker, Django, and scikit-learn (sklearn)
What is machine learning?
Machine Learning

is a subfield of computer science [that] stud[ies] pattern recognition and computational learning [in] artificial intelligence. [It] explores the construction and study of algorithms that can learn from and make predictions on data.
Machine Learning, another definition

A computer program is said to learn from experience (E) with respect to some task (T) and some performance measure (P), if its performance on T, as measured by P, improves with experience E.

(Tom Mitchell, Machine Learning, Ch. 1)

Example Project:
Predicting Altruism with a Naive
Bayes Classifier
Free acts of pizza, a Reddit subreddit
Free acts of pizza

The dataset contains 5671 requests.

Labeled training data (4040 requests):

- Successful (994) labelled as True
- Unsuccessful (3046) labelled as False

Unlabeled data has 1631 requests.
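As a quick sanity check on these counts, the sketch below computes the class balance of the labeled requests and the accuracy that a trivial "always predict unsuccessful" baseline would reach. The counts come from the slide; the variable names are illustrative only.

```python
# Counts taken from the slide.
successful = 994
unsuccessful = 3046
unlabeled = 1631

labeled = successful + unsuccessful
total = labeled + unlabeled

# Share of labeled requests that received a pizza.
success_rate = successful / labeled

# Accuracy of always predicting the majority class (False).
majority_baseline = max(successful, unsuccessful) / labeled

print(f"labeled={labeled} total={total}")
print(f"success rate={success_rate:.3f}")
print(f"majority-class baseline accuracy={majority_baseline:.3f}")
```

Because roughly three in four labeled requests are unsuccessful, a classifier has to beat that majority-class baseline before its accuracy says anything useful.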

Task: Classify a piece of data

Is a pizza request successful? Is it altruistic or not?

Experience: Labeled training data

Request_id | No
Request_id | Yes

Performance Measurement: Is the label correct?

Verify if the request is successful or not
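One simple way to express the performance measure P is accuracy: the fraction of requests whose predicted label matches the known label. The sketch below is a minimal pure-Python version; the request ids and labels are hypothetical and only illustrate the bookkeeping.

```python
def accuracy(predicted, actual):
    """Fraction of request ids whose predicted label matches the true label."""
    if not actual:
        raise ValueError("no labeled requests to score")
    correct = sum(
        1 for request_id, label in actual.items()
        if predicted.get(request_id) == label
    )
    return correct / len(actual)


# Hypothetical request ids and labels, for illustration only.
actual = {"req_1": False, "req_2": True, "req_3": False, "req_4": False}
predicted = {"req_1": False, "req_2": False, "req_3": False, "req_4": True}

score = accuracy(predicted, actual)
print(score)
```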
Anatomy of a Data Science Team
IBM UX Personas Applied to Engineering
Data Science Teams
Complementary skill sets; for example, consider my team:
- (3) Data scientists: PhD Natural Language Processing, Predictive Analytics, Economics
- (1) Software engineer: background in platform engineering and data analysis
- Designated infrastructure support
Engineering a Machine Learning
Project
Python Tools Used by Data Scientists
- Executable code + analysis environments: Jupyter notebooks
- Machine learning: sklearn
- Database: DataGrip, or another database IDE
- Data analysis: Pandas
- Plotting: matplotlib, bokeh
- Data visualization: seaborn

Jake VanderPlas, PyCon 2017 keynote

Why Python has been adopted by the scientific community

- Python is glue (plays well with other languages)
- “Batteries included” + 3rd party modules
- Simple + dynamic
- Open ethos is well suited to science
“Before machine learning can be usable, it must first be reusable.”

- (modified, “software” replaced with “machine learning”) Ralph Johnson

Feature engineering is expensive; it takes time to:
- Shape the data
- Select which features to use
- Collect data!

Simple machine learning pipeline.

Data science is fundamentally embedded within a different system from production.

What is the handoff between data science and production?

Handoff between data science and production

Simplified Machine Learning Project
1. Get and shape the data
2. Train the model on the data
3. Pickle the model (or save it with joblib)
4. Use it! Predict on the data

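import pickle

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# get_xy() is the project's own helper returning features X and labels y
X, y = get_xy()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1111)

model = MultinomialNB().fit(X_train, y_train)

# Persist the trained model to disk
filename = 'pizza_classifier_latest.pkl'
with open(filename, 'wb') as f:
    pickle.dump(model, f)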

You can use sklearn pipelines to apply transformations, with scoring indicators as well.
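A minimal sketch of that idea, assuming scikit-learn and joblib are installed: a `Pipeline` bundles the text vectorizer and the Naive Bayes classifier, so a single saved file carries both the transformation and the model. The request texts, labels, step names, and file name below are made up for illustration and are not the talk's data.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical request texts and outcomes, for illustration only.
texts = [
    "lost my job and could use a pizza tonight",
    "student, broke until payday, will pay it forward",
    "pizza please",
    "give me pizza now",
]
labels = [True, True, False, False]

# Vectorizing and classifying in one object keeps both steps in the saved file.
pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("nb", MultinomialNB()),
])
pipeline.fit(texts, labels)

# Scoring: mean accuracy on the data passed in (here, the training data).
train_accuracy = pipeline.score(texts, labels)

# Persist with joblib, then reload and predict.
path = os.path.join(tempfile.mkdtemp(), "pizza_pipeline_latest.pkl")
joblib.dump(pipeline, path)
restored = joblib.load(path)

new_request = ["could use a pizza, will pay it forward"]
prediction = restored.predict(new_request)
```

Scoring on the training data, as done here for brevity, overstates real performance; in practice the score would come from a held-out split such as the one produced by `train_test_split`.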
Reproducibility matters.
How do we engineer for that?
Docker
A Docker container is like a big executable tarball (with an explicit format) that includes everything needed to run it: code, system tools, libraries, settings!

Also, according to Kelsey Hightower, “the first rule of Python is you don’t use the system installed version of Python”.

Step 1: Write a Dockerfile (cached layers)

Step 2: Build the Docker image
docker build -t 'predicting-altruism:latest' .
Step 3: Run the Docker image in a container
docker run -d -ti -p 8888:8888 -v ~/local_path/to/notebooks:/home/jupyter/notebooks predicting-altruism

Example Dockerfile
FROM python:3
RUN pip install virtualenv

RUN useradd jupyter

RUN adduser jupyter sudo
RUN mkdir /home/jupyter/
ADD entrypoint.sh /home/jupyter/
RUN chmod +x /home/jupyter/entrypoint.sh
ADD requirements.txt /home/jupyter/
ADD notebooks/ /home/jupyter/notebooks

RUN chown jupyter:jupyter /home/jupyter/

VOLUME ["/home/jupyter/notebooks"]

WORKDIR /home/jupyter

RUN virtualenv myenv && pip install -r /home/jupyter/requirements.txt

ENV SHELL=/bin/bash
ENV USER=jupyter
EXPOSE 8888

ENTRYPOINT ["/bin/bash", "/home/jupyter/entrypoint.sh"]

Model versioning
Example Dockerfile with a volume
Docker volumes provide a mountable data directory, so an individual can check notebooks in and out as they see fit.

ADD notebooks/ /home/jupyter/notebooks

...

VOLUME ["/home/jupyter/notebooks"]

Whenever a data scientist or other team member is ready to save their work, a model pickled inside the Docker container under the mounted directory is written directly to the mounted data volume.
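The slides do not show a naming scheme for saved models, so the following is only one possible convention: write each model under a timestamped file name in the mounted directory and refresh a `_latest` copy next to it. The helper `save_versioned`, the file-name pattern, and the stand-in model are all assumptions made for illustration; inside the container the directory would be `/home/jupyter/notebooks`.

```python
import pickle
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def save_versioned(model, directory, name="pizza_classifier"):
    """Pickle `model` under a timestamped name and refresh the `_latest` copy."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    versioned = directory / f"{name}_{stamp}.pkl"
    with open(versioned, "wb") as f:
        pickle.dump(model, f)
    latest = directory / f"{name}_latest.pkl"
    shutil.copyfile(versioned, latest)
    return versioned, latest


# A temporary directory stands in for the mounted volume.
volume = Path(tempfile.mkdtemp())
# A plain dict stands in for a trained sklearn model.
model = {"alpha": 1.0, "classes": [False, True]}
versioned, latest = save_versioned(model, volume)

with open(latest, "rb") as f:
    restored = pickle.load(f)
```

Keeping the timestamped files alongside the `_latest` copy means an earlier model can be restored by file name without rebuilding the image.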

Django-izing Docker + sklearn model
Process for updating a model
Now the process becomes:

1. Write a Dockerfile with a mountable data volume
2. Embed the Dockerfile in a Django API
3. Add the Jupyter notebook to the mountable data volume in the Django API
4. Call the http://localhost:8000/api/model/create/predictaltriusm endpoint to build the new Docker image
5. Spin up the Docker container and let the data scientist iterate on the model
6. Save the model!
7. Move the model to wherever it needs to live in production

Wrap docker-py into Django endpoint

import os

from django.conf import settings
# url() is the older Django API; newer Django versions use django.urls.re_path
from django.conf.urls import url
from django.http import HttpResponseServerError, JsonResponse
from docker import APIClient
from io import BytesIO


def create_image(request, model, path=None):
    # Assumption: the Dockerfile sits at the project root (settings.BASE_DIR)
    if not path:
        path = os.path.join(settings.BASE_DIR, 'Dockerfile')

    try:
        # Read the Dockerfile as bytes so its line breaks are preserved
        with open(path, 'rb') as d:
            f = BytesIO(d.read())

        # Point to the Docker instance
        cli = APIClient(base_url='tcp://192.168.99.100:2376')

        # decode=True yields JSON-serializable dicts instead of raw bytes
        response = [line for line in cli.build(
            fileobj=f, rm=True, tag=model, decode=True
        )]

        return JsonResponse({'image': response})

    except Exception:
        return HttpResponseServerError()


urlpatterns = [
    url(r'^create/image/(?P<model>\w{0,50})', create_image, name='create_image'),
]

For more information on the Docker Python SDK, reference the docs on the low-level API.
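The low-level build call takes the Dockerfile as a file-like object through `fileobj`. A Dockerfile is line-oriented, so the line breaks have to survive the trip into memory; joining the stripped lines with spaces would merge every instruction into one. The sketch below shows the difference using only the standard library. The helper name `dockerfile_to_fileobj` and the two-line Dockerfile are hypothetical, and no Docker daemon is contacted.

```python
import tempfile
from io import BytesIO
from pathlib import Path


def dockerfile_to_fileobj(path):
    """Read a Dockerfile from disk into an in-memory, file-like object."""
    data = Path(path).read_bytes()  # bytes, with newlines intact
    return BytesIO(data)


# A throwaway Dockerfile, for illustration only.
tmp_dir = Path(tempfile.mkdtemp())
dockerfile_path = tmp_dir / "Dockerfile"
dockerfile_path.write_text("FROM python:3\nRUN pip install virtualenv\n")

fileobj = dockerfile_to_fileobj(dockerfile_path)
instructions = fileobj.getvalue().decode("utf-8").splitlines()

# Joining with spaces instead collapses everything into a single instruction.
flattened = " ".join(line.strip() for line in instructions)
```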
docker build -t 'naive-bayes' .
It’s that simple.
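One detail worth checking when the image tag comes from the URL: the pattern `\w{0,50}` in the `urlpatterns` entry matches letters, digits, and underscores, but not hyphens. The sketch below applies the same regular expression with Python's `re` module directly, which only approximates Django's resolver; the helper `captured_model` and the example paths are illustrative.

```python
import re

# Same pattern as in the urlpatterns entry.
pattern = re.compile(r'^create/image/(?P<model>\w{0,50})')


def captured_model(path):
    """Return the `model` group the pattern captures from a URL path, if any."""
    match = pattern.match(path)
    return match.group('model') if match else None


underscore = captured_model('create/image/naive_bayes')
hyphen = captured_model('create/image/naive-bayes')

print(underscore)
print(hyphen)
```

With this pattern a request for `naive-bayes` would be captured as `naive`, so the image would be tagged differently from the command-line build. If hyphenated tags are wanted, a character class such as `[\w-]` with an end anchor is one option.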

Want to learn more?

Talks:
- Kelsey Hightower, PyCon 2017 closing keynote on Docker + Kubernetes
- Jake VanderPlas, The Python Visualization Landscape
- Kevin Goetsch, Deploying Machine Learning using sklearn pipelines
- Lorena Mesa, Predicting free Pizza with Python
Books:
- Introduction to Machine Learning with Python, Sarah Guido, O’Reilly
GitHub:
- Docker with Jupyter Notebook mountable volume
- Docker-py (Read the Docs)
Thank you!
http://bit.ly/2s5R01V | @loooorenanicole
