Neural Magic's DeepSparse is an inference runtime that can now be deployed directly from the Google Cloud Marketplace. DeepSparse supports various machine types on Google Cloud, so you can quickly deploy the infrastructure that works best for your use case, based on cost and performance.
A Compute Engine VM integrated with DeepSparse can be launched via the GCP console or programmatically via Python. For the console workflow, follow the guide in our blog. If you are interested in configuring and launching an instance with DeepSparse in Python, follow the step-by-step guide below.
You will need the google-cloud-compute, google-api-core, and click Python packages, as well as the gcloud CLI:
pip install google-cloud-compute google-api-core click
Select a Project and Subscribe to the DeepSparse Inference Runtime from the Google Cloud Marketplace.
After you click Launch, you will land on a page where you can enable any required APIs you don't already have enabled on your Google Cloud account.
At this point, you may continue the instance configuration in the GCP console. However, if you prefer launching an instance using Python from your local machine, refer to this script. The script launches a Compute Engine virtual machine integrated with DeepSparse.
You can launch the instance by running the following command in your terminal:
python gcp.py launch-instance --project-id <PROJECT-ID> --zone <ZONE> --instance-name <INSTANCE-NAME> --machine-type <MACHINE-TYPE>
If you wish to customize the instance further before launching it, refer to the arguments available in the create_instance function in the script.
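To give a sense of what those arguments control, here is a minimal sketch of how a helper like create_instance might assemble the Compute Engine instance configuration. The function name, defaults (machine type, image, disk size), and overall shape are illustrative assumptions; the field names follow the Compute Engine REST API, and the actual script may differ.

```python
def build_instance_config(
    instance_name: str,
    zone: str,
    machine_type: str = "n2-highcpu-16",
    source_image: str = "projects/debian-cloud/global/images/family/debian-11",
    disk_size_gb: int = 100,
) -> dict:
    """Return a Compute Engine instance resource as a plain dict."""
    return {
        "name": instance_name,
        # Machine types are addressed relative to the zone
        "machine_type": f"zones/{zone}/machineTypes/{machine_type}",
        "disks": [
            {
                "boot": True,
                "auto_delete": True,  # remove the disk when the VM is deleted
                "initialize_params": {
                    "source_image": source_image,
                    "disk_size_gb": disk_size_gb,
                },
            }
        ],
        # One NIC on the default network with an ephemeral external IP
        "network_interfaces": [
            {
                "network": "global/networks/default",
                "access_configs": [
                    {"type": "ONE_TO_ONE_NAT", "name": "External NAT"}
                ],
            }
        ],
    }
```

A structure like this is what ultimately gets passed to the google-cloud-compute client's instance insert call; adjusting the arguments here is how you would change the machine type, boot image, or disk size before launch.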
After running the script, run the following gcloud CLI command to SSH into your running instance. Pass in the same values you used in Step 3: the instance name, zone, and project ID:
gcloud compute ssh <INSTANCE_NAME> --zone <ZONE> --project <PROJECT_ID>
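If you want to drive this step from Python as well, you can build the same gcloud invocation as an argument list and hand it to subprocess. The helper name below is a hypothetical convenience, not part of the script:

```python
import subprocess


def ssh_command(instance_name: str, zone: str, project_id: str) -> list:
    # Build the gcloud invocation shown above as an argument list,
    # avoiding shell quoting issues when passed to subprocess.run.
    return [
        "gcloud", "compute", "ssh", instance_name,
        "--zone", zone,
        "--project", project_id,
    ]


# Example (requires an installed, authenticated gcloud CLI):
# subprocess.run(ssh_command("my-instance", "us-central1-a", "my-project"), check=True)
```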
Once logged into the instance, you can use all of the DeepSparse features such as benchmarking, pipelines, and the server. Here's an example of benchmarking a pruned-quantized version of BERT trained on SQuAD:
deepsparse.benchmark zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned95_obs_quant-none -i [64,128] -b 64 -nstreams 1 -s sync
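Here, -i sets the input shape ([64,128] for this BERT model), -b the batch size, -nstreams the number of concurrent streams, and -s the scheduling scenario (sync or async). If you benchmark several configurations, a small helper that assembles the command can keep runs consistent; the function below is a hypothetical sketch, not part of DeepSparse:

```python
def benchmark_command(
    model_stub: str,
    input_shape: str = "[64,128]",
    batch_size: int = 64,
    num_streams: int = 1,
    scenario: str = "sync",
) -> list:
    # Assemble the deepsparse.benchmark invocation as an argument list,
    # ready to be passed to subprocess.run.
    return [
        "deepsparse.benchmark", model_stub,
        "-i", input_shape,        # input shape fed to the model
        "-b", str(batch_size),    # batch size
        "-nstreams", str(num_streams),  # concurrent inference streams
        "-s", scenario,           # scheduling scenario: sync or async
    ]
```

Sweeping over batch sizes or stream counts then becomes a simple loop over calls to this helper.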
When you are finished, delete the instance to avoid incurring further charges:
python gcp.py delete-instance --project-id <PROJECT-ID> --zone <ZONE> --instance-name <INSTANCE-NAME>