## Installation
```bash
pip install whisper-live
```

### Setting up NVIDIA/TensorRT-LLM for TensorRT backend
- Please follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) to install [NVIDIA/TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and build the Whisper-TensorRT engine.

## Getting Started
The server supports two backends: `faster_whisper` and `tensorrt`. If you are running the `tensorrt` backend, follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md).

### Running the Server
- [Faster Whisper](https://github.com/SYSTRAN/faster-whisper) backend
```bash
python3 run_server.py --port 9090 \
--backend faster_whisper
```

- TensorRT backend. Currently, we recommend only the Docker setup for TensorRT, as described in the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md). Make sure you follow the readme and build your TensorRT engines before running the server with the TensorRT backend.
```bash
# Run English only model
python3 run_server.py --port 9090 \
--backend tensorrt \
--whisper_tensorrt_path /home/TensorRT-LLM/examples/whisper/whisper_small_en

# Run Multilingual model
python3 run_server.py --port 9090 \
--backend tensorrt \
--whisper_tensorrt_path /home/TensorRT-LLM/examples/whisper/whisper_small \
--trt_multilingual
```
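Alternatively, the server can be started from Python instead of `run_server.py` (a minimal sketch using the `TranscriptionServer` class from this repo; with no backend argument it falls back to the server's default backend):
```python
from whisper_live.server import TranscriptionServer

# Listen on all interfaces on port 9090, matching the client examples below.
server = TranscriptionServer()
server.run("0.0.0.0", 9090)
```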


### Running the Client
- To transcribe an audio file:
```python
from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=False,
    lang="en",
    translate=False,
    model_size="small"
)

client("tests/jfk.wav")
```
This command transcribes the specified audio file (`tests/jfk.wav`) using the Whisper model. It connects to the server running on localhost at port 9090. The `is_multilingual` option enables transcription in multiple languages, and the `lang` option specifies the target language for transcription, in this case English (`"en"`). Set `translate` to `True` to translate from the source language to English, or to `False` to transcribe in the source language.


- To transcribe from microphone:
```python
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
"localhost",
9090,
is_multilingual=True,
lang="hi",
translate=True,
model_size="small"
)
client()
```
This command captures audio from the microphone and sends it to the server for transcription. It uses the multilingual option with `hi` as the selected language, enabling the multilingual feature and specifying the target language and task. We use Whisper `small` by default, but it can be changed to any other size depending on your requirements and the hardware running the server.

- To transcribe from an HLS stream:
```python
from whisper_live.client import TranscriptionClient

# Replace host and port with your server's address, e.g. "localhost" and 9090.
client = TranscriptionClient(host, port, is_multilingual=True, lang="en", translate=False)
client(hls_url="http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/bbc_1xtra.isml/bbc_1xtra-audio%3d96000.norewind.m3u8")
```
This command streams audio into the server from an HLS stream. It uses the same options as the previous command, enabling the multilingual feature and specifying the target language and task.

## Transcribe audio from browser
- Run the server with your desired backend as shown [here](https://github.com/collabora/WhisperLive?tab=readme-ov-file#running-the-server)

### Chrome Extension
- Refer to [Audio-Transcription-Chrome](https://github.com/collabora/whisper-live/tree/main/Audio-Transcription-Chrome#readme) to use Chrome extension.
## Whisper Live Server in Docker
```bash
docker build . -t whisper-live -f docker/Dockerfile.cpu
docker run -it -p 9090:9090 whisper-live:latest
```
**Note**: This only builds the docker image for the `faster_whisper` backend. Follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) to set up and use the TensorRT backend. By default we use the "small" model size; to build a docker image for a different model size, change the size in server.py and then build the docker image.
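Once the container is running, you can sanity-check that the websocket server is reachable (a minimal sketch, assuming the default `9090:9090` port mapping shown above):
```python
import socket

# A plain TCP connection to the published port is enough to confirm the
# server inside the container is accepting connections.
with socket.create_connection(("localhost", 9090), timeout=5):
    print("WhisperLive server is reachable on port 9090")
```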

## Future Work
- [ ] Add translation to other languages on top of transcription.
- [x] TensorRT backend for Whisper.

## Contact

# Whisper-TensorRT
We have only tested the TensorRT backend in Docker, so we recommend Docker for a smooth TensorRT backend setup.

## Installation
- Install [docker](https://docs.docker.com/engine/install/)
- Install [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- Pull the pytorch docker image.
```bash
docker pull nvcr.io/nvidia/pytorch:23.10-py3
```
- Clone this repo.
```bash
git clone https://github.com/collabora/WhisperLive.git
```
- Next, we run the docker image and mount the WhisperLive repo to the container's `/home` directory.
```bash
docker run -it --gpus all --shm-size=64g -v /path/to/WhisperLive:/home/WhisperLive nvcr.io/nvidia/pytorch:23.10-py3
```
- Build `tensorrt-llm`.
```bash
cd /home/
cp WhisperLive/scripts/install_tensorrt_llm.sh .
bash install_tensorrt_llm.sh
```
This should clone the [NVIDIA/TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) repo and build it as well.

- Test the installation.
```bash
export ENV=${ENV:-/etc/shinit_v2}
source $ENV
python -c "import torch; import tensorrt; import tensorrt_llm"
```
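If the imports succeed, you can also print the installed versions (this assumes the packages expose `__version__`, which recent TensorRT and TensorRT-LLM releases do):
```bash
python -c "import tensorrt, tensorrt_llm; print(tensorrt.__version__, tensorrt_llm.__version__)"
```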

## Whisper TensorRT Engine
- Change the working directory to the [whisper example directory](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper) in TensorRT-LLM.
```bash
cd /home/TensorRT-LLM/examples/whisper
```
- Currently, by default TensorRT-LLM only supports `large-v2` and `large-v3`. In this repo, we use `small.en`.

- Edit `build.py` to support all the model sizes, i.e. `["tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en"]`. To do that, add the model size you prefer for your WhisperLive server to the [`choices`](https://github.com/NVIDIA/TensorRT-LLM/blob/a75618df24e97ecf92b8899ca3c229c4b8097dda/examples/whisper/build.py#L58) list, as sketched below.
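For example, the argument definition could be widened like this (an illustrative sketch; the exact argparse code in `build.py` may differ):
```python
# In TensorRT-LLM/examples/whisper/build.py: extend the accepted model sizes.
parser.add_argument(
    "--model_name",
    type=str,
    default="large-v2",
    choices=["tiny", "tiny.en", "base", "base.en",
             "small", "small.en", "medium", "medium.en",
             "large-v2", "large-v3"],
)
```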

- Download the models from [here](https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/__init__.py#L17C1-L30C2)
```bash
# small.en model
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt

# small multilingual model
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt
```
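The long hex segment in each URL is the file's SHA256 checksum, so you can optionally verify the downloads before building:
```bash
# Check the checkpoints against the SHA256 embedded in their URLs.
echo "f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872  assets/small.en.pt" | sha256sum -c
echo "9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794  assets/small.pt" | sha256sum -c
```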

- For this demo, we build the `small.en` and `small` (multilingual) TensorRT engines.
```bash
pip install -r requirements.txt

# convert small.en
python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name small.en

# convert small multilingual model
python3 build.py --output_dir whisper_small --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name small
```

- The `small.en` TensorRT engine is saved in the `/home/TensorRT-LLM/examples/whisper/whisper_small_en` dir; if you converted the `small` multilingual model, it is saved in the `/home/TensorRT-LLM/examples/whisper/whisper_small` dir.
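To confirm the build output, list the engine directory (exact file names vary with the TensorRT-LLM version):
```bash
ls -lh /home/TensorRT-LLM/examples/whisper/whisper_small_en
```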

## Run WhisperLive Server with TensorRT Backend
```bash
cd /home/WhisperLive
bash scripts/setup.sh
pip install -r requirements.txt

# Required to create the mel spectrogram
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz

# Run English only model
python3 run_server.py --port 9090 \
                      --backend tensorrt \
                      --whisper_tensorrt_path /home/TensorRT-LLM/examples/whisper/whisper_small_en

# Run Multilingual model
python3 run_server.py --port 9090 \
                      --backend tensorrt \
                      --whisper_tensorrt_path /home/TensorRT-LLM/examples/whisper/whisper_small \
                      --trt_multilingual
```
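With the server up, you can connect from another terminal using the Python client from the main README (shown here for the English-only engine):
```python
from whisper_live.client import TranscriptionClient

# Connect to the TensorRT-backed server started above.
client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=False,
    lang="en",
    translate=False,
    model_size="small"
)

client("tests/jfk.wav")  # transcribe a local audio file
```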
