
Using Django, Docker and Scikit-Learn To Bootstrap Your Machine Learning Project

The document discusses using Docker, Django, and Scikit-learn for machine learning projects. It begins with an overview of machine learning and an example project using Naive Bayes classification. It then covers engineering machine learning systems, including tools like Jupyter notebooks, Scikit-learn, and Pandas. The document emphasizes using Docker for reproducibility and describes integrating Docker with Django to allow updating models through a web API. This allows data scientists to iterate on models in notebooks within Docker containers.

Uploaded by

Dejan Nastevski

Using Django, Docker and Scikit-learn to bootstrap your Machine Learning Project
Lorena Mesa
@loooorenanicole
PythonDay Mexico 2017
http://bit.ly/2s5R01V
Hello, I’m Lorena Mesa.

How I’ll approach today’s chat.
1. Review of machine learning
2. Anatomy of a data science team
3. Engineering a machine learning problem
4. Iterating on machine learning engineering with Docker, Django, and scikit-learn (sklearn)
What is machine learning?
Machine Learning is a subfield of computer science [that] stud[ies] pattern recognition and computational learning [in] artificial intelligence. [It] explores the construction and study of algorithms that can learn from and make predictions on data.
Machine Learning, another definition

A computer program is said to learn from experience (E) with respect to some task (T) and some performance measure (P), if its performance on T, as measured by P, improves with experience E.

(Ch. 1, Machine Learning, Tom Mitchell)


Example Project: Predicting Altruism with a Naive Bayes Classifier
Free acts of pizza, a Reddit subreddit
Free acts of pizza

The dataset contains 5671 requests.

Labeled training data (4040 requests):
- Successful (994) labeled as True
- Unsuccessful (3046) labeled as False

Unlabeled data has 1631 requests.
Task: Classify a piece of data

Is a pizza request successful? Is it altruistic or not?
Experience: Labeled training data

Request_id | No
Request_id | Yes
Performance Measurement: Is the label correct?

Verify if the request is successful or not
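One way to make the performance measurement concrete is plain accuracy: the share of requests whose predicted label matches the true one. This is a minimal sketch, not the talk's own code; the `accuracy` helper and the sample labels are hypothetical.

```python
def accuracy(predicted, actual):
    """Fraction of requests whose predicted label matches the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labeled request")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels: True means the request received a pizza.
actual = [True, False, False, True, False]
predicted = [True, False, True, True, False]
score = accuracy(predicted, actual)  # 4 of 5 labels match
print(score)
```

Because 3046 of the 4040 labeled requests are unsuccessful, always predicting False would already score about 75%, so accuracy alone should be read against that baseline.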
Anatomy of a Data Science Team
IBM UX Personas Applied to Engineering
Data Science Teams
Complementary skill sets, for example consider my team:
- (3) Data scientists: PhD Natural Language Processing, Predictive Analytics, Economics
- (1) Software engineer: historically platform engineering and data analyst
- Designated infrastructure support
Engineering a Machine Learning Project
Python Tools Used by Data Scientists
- Executable code + analysis environments: Jupyter notebooks
- Machine learning: sklearn
- Database: DataGrip, or another database IDE
- Data analysis: Pandas
- Plotting: matplotlib, bokeh
- Data visualization: seaborn

Jake VanderPlas, PyCon 2017 keynote
Why Python has been adopted by the scientific community

- Python is glue (plays well with other languages)
- “Batteries included” + 3rd party modules
- Simple + dynamic
- Open ethos is well suited to science
“Before [machine learning] can be usable, it must first be reusable.”

- (modified) Ralph Johnson
Feature engineering is expensive. It takes time to:
- Shape the data
- Select which features to use
- Collect data!
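To give a feel for what shaping the data and selecting features can involve, here is a small bag-of-words sketch using only the standard library. It is an illustration under assumptions, not the talk's pipeline: the helper names, the request texts, and the choice of "most frequent tokens" as the selection rule are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(texts, max_features=5):
    """Select the most frequent tokens across all texts as features."""
    counts = Counter(token for text in texts for token in tokenize(text))
    return [token for token, _ in counts.most_common(max_features)]


def vectorize(text, vocabulary):
    """Shape one text into a fixed-length vector of token counts."""
    counts = Counter(tokenize(text))
    return [counts[token] for token in vocabulary]


# Hypothetical request texts, invented for illustration.
requests = [
    "Lost my job this week, would love a pizza",
    "Pizza for my kids tonight, will pay it forward",
]
vocabulary = build_vocabulary(requests)
features = [vectorize(text, vocabulary) for text in requests]
```

Each request ends up as a row of counts with one column per selected token, which is the shape a Naive Bayes classifier over word counts expects.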

Simple machine learning pipeline.


Data science is fundamentally embedded within a different system from production

What is the handoff between data science and production?

Handoff between data science and production


Simplified Machine Learning Project
1. Get and shape the data
2. Train the model on the data
3. Pickle the model (or save it with joblib)
4. Use it! Predict on the data

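import pickle

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# get_xy() is the talk's helper (not shown) returning features X and labels y
X, y = get_xy()
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1111)

model = MultinomialNB().fit(X_train, y_train)

# Pickle the trained model so it can be reloaded later
filename = 'pizza_classifier_latest.pkl'
with open(filename, 'wb') as f:
    pickle.dump(model, f)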
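Step 4, using the saved model, might look like the following round trip with joblib. This is a sketch under assumptions: the tiny count matrix and labels are made up, and the model is written to a temporary directory rather than wherever the talk's project keeps it.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Hypothetical word-count features (rows are requests) and success labels.
X = np.array([[2, 0, 1], [0, 3, 0], [1, 0, 2], [0, 2, 1]])
y = np.array([True, False, True, False])

model = MultinomialNB().fit(X, y)

# Save the fitted model, then reload it as a separate process would.
path = os.path.join(tempfile.mkdtemp(), "pizza_classifier_latest.pkl")
joblib.dump(model, path)
restored = joblib.load(path)

# The restored model predicts on new count vectors like the original.
new_requests = np.array([[3, 0, 1], [0, 4, 0]])
predictions = restored.predict(new_requests)
```

The reloaded object is a full estimator, so production code only needs the file and a compatible sklearn version, not the training data.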

You can use sklearn pipelines to apply transformations with scoring indicators as well
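As a rough illustration of that idea, the sketch below chains a `CountVectorizer` and a `MultinomialNB` in a `Pipeline` and scores it with cross-validation. The texts, labels, step names, and the choice of 2 folds are assumptions made for the example, not the talk's data or settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical request texts and outcomes, invented for illustration.
texts = [
    "lost my job and would love a pizza",
    "hungry student, will pay it forward",
    "just want free pizza",
    "pizza now please",
    "family fell on hard times this month",
    "give me pizza",
    "out of work, kids are hungry tonight",
    "send pizza lol",
]
labels = [True, True, False, False, True, False, True, False]

# Chain the text-to-counts transformation and the classifier.
pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("classifier", MultinomialNB()),
])

# Score with 2-fold cross-validation; the vectorizer is refit per fold.
scores = cross_val_score(pipeline, texts, labels, cv=2, scoring="accuracy")

# Fit on everything and classify a new, unseen request.
pipeline.fit(texts, labels)
prediction = pipeline.predict(["lost my job, kids are hungry"])
```

Keeping the transformation inside the pipeline means the pickled artifact carries its own preprocessing, which helps with the reproducibility concern raised next.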
Reproducibility matters.
How do we engineer for that?
Docker
A Docker image is a big executable tarball (with an explicit format) that includes everything needed to run it: code, system tools, libraries, settings! A container is a running instance of that image.

Also, according to Kelsey Hightower, “the first rule of Python is you don’t
use the system installed version of Python”

Step 1: Write a Dockerfile (cached layers)
Step 2: Build the Docker image
docker build -t 'predicting-altruism:latest' .
Step 3: Run the Docker image in a container
docker run -d -ti -p 8888:8888 -v ~/local_path/to/notebooks:/home/jupyter/notebooks predicting-altruism
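The same run step could also be issued from Python through the Docker SDK's high-level client. The sketch below is an assumption-laden illustration: `run_notebook_container` is a hypothetical helper, and `client` is expected to be something like `docker.from_env()`, passed in so the function can be exercised without a daemon.

```python
import os


def run_notebook_container(client, notebooks_dir,
                           image="predicting-altruism:latest"):
    """Start the notebook image detached, publishing port 8888 and
    mounting the host notebooks directory into the container."""
    host_dir = os.path.abspath(os.path.expanduser(notebooks_dir))
    return client.containers.run(
        image,
        detach=True,                 # -d
        tty=True,                    # -t
        stdin_open=True,             # -i
        ports={"8888/tcp": 8888},    # -p 8888:8888
        volumes={                    # -v host_dir:/home/jupyter/notebooks
            host_dir: {"bind": "/home/jupyter/notebooks", "mode": "rw"},
        },
    )


# Usage, assuming the `docker` package is installed and a daemon is reachable:
#   import docker
#   client = docker.from_env()
#   client.images.build(path=".", tag="predicting-altruism:latest")
#   container = run_notebook_container(client, "~/local_path/to/notebooks")
```

Each keyword argument mirrors one flag of the command line above, so the two forms can be compared side by side.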

Example Dockerfile
FROM python:3
RUN pip install virtualenv

RUN useradd jupyter
RUN adduser jupyter sudo
RUN mkdir /home/jupyter/
ADD entrypoint.sh /home/jupyter/
RUN chmod +x /home/jupyter/entrypoint.sh
ADD requirements.txt /home/jupyter/
ADD notebooks/ /home/jupyter/notebooks

RUN chown jupyter:jupyter /home/jupyter/

VOLUME ["/home/jupyter/notebooks"]

WORKDIR /home/jupyter

RUN virtualenv myenv && pip install -r /home/jupyter/requirements.txt

ENV SHELL=/bin/bash
ENV USER=jupyter
# EXPOSE takes only the container port; the host mapping is set by docker run -p
EXPOSE 8888

ENTRYPOINT ["/bin/bash", "/home/jupyter/entrypoint.sh"]
Model versioning
Example Dockerfile with a volume
Docker volumes provide a mountable data directory, so team members can check notebooks in and out as they see fit

ADD notebooks/ /home/jupyter/notebooks

...

VOLUME ["/home/jupyter/notebooks"]

When a data scientist or other team member saves their work, a model pickled inside the Docker container is written to the mounted data volume directory
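One simple way to version models saved to the volume is to write each one under a timestamped filename and refresh a "latest" copy next to it. This is a sketch of that idea, not the talk's scheme: `save_versioned`, the naming pattern, and the stand-in directory and model are all hypothetical.

```python
import os
import pickle
import shutil
import tempfile
from datetime import datetime, timezone


def save_versioned(model, directory, name="pizza_classifier"):
    """Pickle the model under a timestamped filename and refresh the
    '<name>_latest.pkl' copy, returning the versioned path."""
    os.makedirs(directory, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    versioned = os.path.join(directory, f"{name}_{stamp}.pkl")
    with open(versioned, "wb") as f:
        pickle.dump(model, f)
    shutil.copyfile(versioned, os.path.join(directory, f"{name}_latest.pkl"))
    return versioned


# Stand-in for the mounted volume (/home/jupyter/notebooks in the talk);
# a temporary directory and a plain dict keep the sketch runnable anywhere.
volume = tempfile.mkdtemp()
path = save_versioned({"alpha": 1.0}, volume)
```

Older versions stay on disk for comparison or rollback, while consumers that only want the newest model can keep reading the "latest" file.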

Django-izing Docker + sklearn model
Process for updating a model
Now the process becomes:

1. Write a Dockerfile with a mountable data volume
2. Embed the Dockerfile in a Django API
3. Add the Jupyter notebook to the mountable data volume in the Django API
4. Call the http://localhost:8000/api/model/create/predictaltriusm endpoint to build the new Docker image
5. Spin up a Docker container and let the data scientist iterate on the model
6. Save the model!
7. Move the model to wherever it needs to live in production
Wrap docker-py into Django endpoint
from io import BytesIO

from django.conf.urls import url  # Django < 4.0; newer versions use re_path
from django.http import HttpResponseServerError, JsonResponse
from docker import APIClient

def create_image(request, model, path=None):
    # BASE_DIR is defined elsewhere in the project; it must point at the Dockerfile
    if not path:
        path = BASE_DIR

    try:
        # Read the Dockerfile as bytes, keeping its line breaks intact
        with open(path, 'rb') as d:
            f = BytesIO(d.read())

        # Point to the Docker instance
        cli = APIClient(base_url='tcp://192.168.99.100:2376')

        # decode=True yields JSON-serializable dicts instead of raw bytes
        response = [line for line in cli.build(
            fileobj=f, rm=True, tag=model, decode=True
        )]

        return JsonResponse({'image': response})

    except Exception:
        return HttpResponseServerError()

urlpatterns = [
    url(r'^create/image/(?P<model>\w{0,50})', create_image, name='create_image'),
]

For more information on the Docker Python SDK, see the docs on the low-level API.
docker build -t 'naive-bayes' .
It’s that simple.
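A caller of the build endpoint usually wants to know whether the build succeeded, not the whole log. The sketch below condenses build output into a small summary. It assumes the output is a stream of JSON objects carrying `stream` and `error` keys, and `summarize_build` and the sample lines are hypothetical.

```python
import json


def summarize_build(lines):
    """Collect log text and any error from Docker build output.

    Accepts raw JSON lines (bytes or str) or already-decoded dicts.
    """
    log, error = [], None
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        if isinstance(line, dict):
            entries = [line]
        else:
            # A raw chunk may hold several newline-separated JSON objects.
            entries = [json.loads(part)
                       for part in line.splitlines() if part.strip()]
        for entry in entries:
            if "stream" in entry:
                log.append(entry["stream"].rstrip("\n"))
            if "error" in entry:
                error = entry["error"]
    return {"ok": error is None, "log": log, "error": error}


# Hypothetical build output, shaped like the assumed JSON stream.
sample = [
    b'{"stream": "Step 1/2 : FROM python:3\\n"}\r\n',
    b'{"stream": "Successfully built abc123\\n"}\r\n',
]
result = summarize_build(sample)
```

Returning a summary like this from the view would let the client check one `ok` flag instead of scanning every build line.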

Want to learn more?

Talks:

- Kelsey Hightower PyCon 2017 closing keynote on Docker + Kubernetes
- Jake VanderPlas The Python Visualization Landscape
- Kevin Goetsch Deploying Machine Learning using sklearn pipelines
- Lorena Mesa - Predicting free Pizza with Python
Books:
- Introduction to Machine Learning with Python, Sarah Guido, O’Reilly
GitHub:
- Docker with Jupyter Notebook mountable volume
- Docker-py (Read the Docs)
Thank you!
http://bit.ly/2s5R01V | @loooorenanicole
