[PT2 Inference] Prototype of Inference Runtime (pytorch#108482)
Summary:
This diff demonstrates a simplified E2E workflow for the PT2 Inference stack:
1. Model authoring with `torch.export()`
2. Model processing with `aot_inductor.compile()`
3. Model serving with a new Inference Runtime API, named `ModelRunner`

`torch.export()` and `aot_inductor.compile()` produce a zip file using `PyTorchStreamWriter`. The runtime reads the zip file back with `PyTorchStreamReader`. The zip file contains {F1080328179}

More discussion on packaging can be found in https://docs.google.com/document/d/1C-4DP5yu7ZhX1aB1p9JcVZ5TultDKObM10AqEtmZ-nU/edit?usp=sharing

The runtime can now switch between two execution modes:
1. Graph Interpreter mode, implemented based on Sigmoid's Executor
2. AOTInductor mode, implemented based on FBAOTInductorModel

Test Plan:
buck2 run mode/dev-nosan mode/inplace -c fbcode.enable_gpu_sections=True //sigmoid/inference/test:e2e_test

Export and lower with AOTInductor:
buck2 run mode/dev-nosan mode/inplace -c fbcode.enable_gpu_sections=True sigmoid/inference:export_package

Run with GraphInterpreter and AOTInductor:
buck2 run mode/dev-nosan //sigmoid/inference:main

Reviewed By: suo

Differential Revision: D47781098

Pull Request resolved: pytorch#108482
Approved by: https://github.com/zhxchen17
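
A minimal sketch of the three-step workflow described above. The `torch.export.export()` call follows the public PyTorch API; `aot_inductor.compile()` and `ModelRunner` are the APIs introduced by this diff, and their exact signatures in the commented-out lines below are assumptions for illustration, not the definitive interface.

```python
import torch


class MyModel(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1


model = MyModel().eval()
example_inputs = (torch.randn(4, 8),)

# Step 1: model authoring -- capture the model as an ExportedProgram.
exported = torch.export.export(model, example_inputs)

# Step 2: model processing -- lower with AOTInductor into a zip package
# written via PyTorchStreamWriter (signature assumed from this diff).
# package_path = aot_inductor.compile(exported, example_inputs)

# Step 3: model serving -- the runtime reads the package back with
# PyTorchStreamReader and executes it through ModelRunner, in either
# Graph Interpreter mode or AOTInductor mode (hypothetical usage).
# runner = ModelRunner(package_path)
# outputs = runner.run(example_inputs)
```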