This repository is the codebase for Onclusive's machine learning service mesh. It contains the modular implementation of the logic powering the ML stack:
- internal libraries
- internally maintained core images
- applications serving internally maintained models
- applications serving externally maintained models
A top-level doc can be found here.
All internal libraries can be found here. See each individual library for detailed documentation.
An overview of developer utilities and existing images can be found here.
All internal core images can be found here. See each individual core image for detailed documentation.
ML projects are decomposed into multiple pre-defined steps that represent an abstraction of the model lifecycle at Onclusive.
An overview of developer utilities and existing images can be found here.
- ingest: if the data needed for training is external to Onclusive, an ingest step is needed to bring the data into our internal storage.
  - see here for this component's doc
- register: register the features to be used in the train component.
  - see here for this component's doc
- train: model training and registration to the internal model registry.
  - see here for this component's doc
- compile: model compilation (optimized for serving) and registration to the internal model registry.
  - see here for this component's doc
- serve: model served as a REST API (see the request sketch below).
  - see here for this component's doc
- backfill: backfilling.
  - see here for this component's doc
Strict abstraction boundaries help express the invariants and logical consistency of each component's behaviour (input, processing and output). This allows us to create well-defined patterns that can be applied to implement each of these steps on new projects. Not all of these steps are mandatory: for instance, a pre-trained model used for zero-shot learning will not have ingest or train steps.
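To make the serve contract concrete, here is a minimal request sketch against a serve component. The host, port, route and payload shape are illustrative assumptions, not the actual service contract:

```bash
# Hypothetical request to a locally running serve component.
# Host, port, route and payload are placeholders, not the real API contract.
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{"text": "Onclusive announces a new product line."}'
```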
An overview of developer utilities and existing apps can be found here.
All internal apps can be found here. See each individual app for detailed documentation.
If you are on macOS, you can run the script `./bin/bootstrap/darwin`, which will set up your local machine for development. If you are using Linux, use `./bin/bootstrap/linux`.
Windows setup is not supported yet - to be explored. If you want to contribute to this, please reach out to the MLOps team.
Set up your AWS credentials for the dev and prod ML accounts (default region name should be us-east-2 and default output format should be json). Ask @mlops on Slack to get your credentials created if you don't have them already.
For your dev credentials:

```bash
aws configure --profile dev
```

For your prod credentials:

```bash
aws configure --profile prod
```
You can also switch profiles at any time by updating the environment variable as follows:

```bash
export AWS_PROFILE=dev
```
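To verify that a profile resolves to the expected account, you can query the caller identity (a standard AWS CLI command; the profile names follow the setup above):

```bash
# Confirm the dev profile is wired to the right account
aws sts get-caller-identity --profile dev
```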
As all images used in projects and apps are based on our core Docker images, it saves time to build all of them upfront. Run the commands:

```bash
make docker.build/python-base
make docker.build/gpu-base
make docker.build/neuron-inference
```
This takes about 10 minutes to run, so go stretch your legs, get a coffee, or consult our Contribution Guide.
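Once the builds finish, you can sanity-check that the images are available locally. The grep patterns below assume the image names contain the make target names, which may not match the actual tagging scheme:

```bash
# List the core images built above (name patterns are an assumption)
docker images | grep -E 'python-base|gpu-base|neuron-inference'
```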
See here for a detailed step-by-step guide on how to contribute to the mesh.
Python is our language of choice. To manage versions effectively, we recommend using pyenv. Use it to set up your environment with the repository's official Python version.
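A minimal sketch of that setup, assuming a hypothetical 3.9.13 as the repository's official version (check the repo's .python-version or pyproject.toml for the real one):

```bash
# Install and pin the interpreter locally (3.9.13 is a placeholder, not the official version)
pyenv install 3.9.13
pyenv local 3.9.13
# Point poetry's virtualenv at the pyenv-managed interpreter
poetry env use "$(pyenv which python)"
```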
If poetry commands take unusually long to run, it's a good idea to clear the PyPI cache:

```bash
poetry cache clear pypi --all
```
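After clearing the cache, re-resolving the environment is usually enough to get back to normal speed:

```bash
# Reinstall dependencies from scratch after the cache purge
poetry install
```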
If a Docker build fails, the error message is:

```
Failed to solve with frontend dockerfile.v0: failed to create LLB definition: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
```
Run the following command:

```bash
export DOCKER_BUILDKIT=0
```
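You can also disable BuildKit for a single invocation instead of exporting it for the whole shell session (standard shell environment-prefix syntax, applied to one of the make targets above):

```bash
# One-off build with BuildKit disabled
DOCKER_BUILDKIT=0 make docker.build/python-base
```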
If you run into this error, you can use the make command:

```bash
make clean
```