## Installation
```bash
pip install whisper-live
```

### Setting up NVIDIA/TensorRT-LLM for TensorRT backend
- Please follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) to install [NVIDIA/TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) and build the Whisper-TensorRT engine.

## Getting Started
The server supports two backends: `faster_whisper` and `tensorrt`. If you are running the `tensorrt` backend, follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md).

### Running the Server
- [Faster Whisper](https://github.com/SYSTRAN/faster-whisper) backend
```bash
python3 run_server.py --port 9090 \
--backend faster_whisper
```

- TensorRT backend. Currently, we recommend only the Docker setup for TensorRT, as described in the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md). Make sure you follow the readme and build your TensorRT engines before running the server with the TensorRT backend.
```bash
# Run English only model
python3 run_server.py --port 9090 \
--backend tensorrt \
--whisper_tensorrt_path /home/TensorRT-LLM/examples/whisper/whisper_small_en

# Run Multilingual model
python3 run_server.py --port 9090 \
--backend tensorrt \
--whisper_tensorrt_path /home/TensorRT-LLM/examples/whisper/whisper_small \
--trt_multilingual
```
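Alternatively, the server can be started from Python instead of `run_server.py` (a minimal sketch using the `TranscriptionServer` class from this repo; with no backend argument it falls back to the server's default backend):
```python
from whisper_live.server import TranscriptionServer

# Listen on all interfaces on port 9090, matching the client examples below.
server = TranscriptionServer()
server.run("0.0.0.0", 9090)
```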


### Running the Client
- To transcribe an audio file:
```python
from whisper_live.client import TranscriptionClient

client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=False,
    lang="en",
    translate=False,
    model_size="small"
)

client("tests/jfk.wav")
```
This command transcribes the specified audio file (`tests/jfk.wav`) using the Whisper model. It connects to the server running on localhost at port 9090. The `is_multilingual` option enables transcription in multiple languages, and the `lang` option specifies the target language for transcription, in this case English (`"en"`). Set `translate` to `True` to translate from the source language to English, or to `False` to transcribe in the source language.


- To transcribe from microphone:
```python
from whisper_live.client import TranscriptionClient
client = TranscriptionClient(
"localhost",
9090,
is_multilingual=True,
lang="hi",
translate=True,
model_size="small"
)
client()
```
This command captures audio from the microphone and sends it to the server for transcription. It uses the multilingual option with `hi` as the selected language, enabling the multilingual feature and specifying the target language and task. We use Whisper `small` by default, but it can be changed to any other size depending on your requirements and the hardware running the server.

- To transcribe from an HLS stream:
```python
from whisper_live.client import TranscriptionClient

# Replace host and port with your server's address, e.g. "localhost" and 9090.
client = TranscriptionClient(host, port, is_multilingual=True, lang="en", translate=False)
client(hls_url="http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/bbc_1xtra.isml/bbc_1xtra-audio%3d96000.norewind.m3u8")
```
This command streams audio into the server from an HLS stream. It uses the same options as the previous command, enabling the multilingual feature and specifying the target language and task.

## Transcribe audio from browser
- Run the server with your desired backend as shown [here](https://github.com/collabora/WhisperLive?tab=readme-ov-file#running-the-server)

### Chrome Extension
- Refer to [Audio-Transcription-Chrome](https://github.com/collabora/whisper-live/tree/main/Audio-Transcription-Chrome#readme) to use Chrome extension.
## Whisper Live Server in Docker
```bash
docker build . -t whisper-live -f docker/Dockerfile.cpu
docker run -it -p 9090:9090 whisper-live:latest
```
**Note**: This only builds the docker image for the `faster_whisper` backend. Follow the [TensorRT_whisper readme](https://github.com/collabora/WhisperLive/blob/main/TensorRT_whisper.md) to set up and use the TensorRT backend. By default we use the "small" model size; to build a docker image for a different model size, change the size in server.py and then build the docker image.
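Once the container is running, you can sanity-check that the websocket server is reachable (a minimal sketch, assuming the default `9090:9090` port mapping shown above):
```python
import socket

# A plain TCP connection to the published port is enough to confirm the
# server inside the container is accepting connections.
with socket.create_connection(("localhost", 9090), timeout=5):
    print("WhisperLive server is reachable on port 9090")
```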

## Future Work
- [ ] Add translation to other languages on top of transcription.
- [x] TensorRT backend for Whisper.

## Contact

# Whisper-TensorRT
We have only tested the TensorRT backend in Docker, so we recommend Docker for a smooth TensorRT backend setup.

## Installation
- Install [docker](https://docs.docker.com/engine/install/)
- Install [nvidia-container-toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html)
- Pull the pytorch docker image.
```bash
docker pull nvcr.io/nvidia/pytorch:23.10-py3
```
- Clone this repo.
```bash
git clone https://github.com/collabora/WhisperLive.git
```
- Next, we run the docker image and mount the WhisperLive repo to the container's `/home` directory.
```bash
docker run -it --gpus all --shm-size=64g -v /path/to/WhisperLive:/home/WhisperLive nvcr.io/nvidia/pytorch:23.10-py3
```
- Build `tensorrt-llm`.
```bash
cd /home/
cp WhisperLive/scripts/install_tensorrt_llm.sh .
bash install_tensorrt_llm.sh
```
This should clone the [NVIDIA/TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM) repo and build it as well.

- Test the installation.
```bash
export ENV=${ENV:-/etc/shinit_v2}
source $ENV
python -c "import torch; import tensorrt; import tensorrt_llm"
```
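If the imports succeed, you can also print the installed versions (this assumes the packages expose `__version__`, which recent TensorRT and TensorRT-LLM releases do):
```bash
python -c "import tensorrt, tensorrt_llm; print(tensorrt.__version__, tensorrt_llm.__version__)"
```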

## Whisper TensorRT Engine
- Change the working directory to the [whisper example directory](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper) in TensorRT-LLM.
```bash
cd /home/TensorRT-LLM/examples/whisper
```
- Currently, by default TensorRT-LLM only supports `large-v2` and `large-v3`. In this repo, we use `small.en`.

- Edit `build.py` to support all the model sizes, i.e. `["tiny", "tiny.en", "base", "base.en", "small", "small.en", "medium", "medium.en"]`. To do that, add the model size you prefer for your WhisperLive server to the [`choices`](https://github.com/NVIDIA/TensorRT-LLM/blob/a75618df24e97ecf92b8899ca3c229c4b8097dda/examples/whisper/build.py#L58) list, as sketched below.
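For example, the argument definition could be widened like this (an illustrative sketch; the exact argparse code in `build.py` may differ):
```python
# In TensorRT-LLM/examples/whisper/build.py: extend the accepted model sizes.
parser.add_argument(
    "--model_name",
    type=str,
    default="large-v2",
    choices=["tiny", "tiny.en", "base", "base.en",
             "small", "small.en", "medium", "medium.en",
             "large-v2", "large-v3"],
)
```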

- Download the models from [here](https://github.com/openai/whisper/blob/ba3f3cd54b0e5b8ce1ab3de13e32122d0d5f98ab/whisper/__init__.py#L17C1-L30C2)
```bash
# small.en model
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt

# small multilingual model
wget --directory-prefix=assets https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt
```
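The long hex segment in each URL is the file's SHA256 checksum, so you can optionally verify the downloads before building:
```bash
# Check the checkpoints against the SHA256 embedded in their URLs.
echo "f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872  assets/small.en.pt" | sha256sum -c
echo "9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794  assets/small.pt" | sha256sum -c
```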

- For this demo, we build the `small.en` and `small` (multilingual) TensorRT engines.
```bash
pip install -r requirements.txt

# convert small.en
python3 build.py --output_dir whisper_small_en --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name small.en

# convert small multilingual model
python3 build.py --output_dir whisper_small --use_gpt_attention_plugin --use_gemm_plugin --use_bert_attention_plugin --model_name small
```

- The `small.en` TensorRT engine is saved in the `/home/TensorRT-LLM/examples/whisper/whisper_small_en` dir; if you converted the `small` multilingual model, it is saved in the `/home/TensorRT-LLM/examples/whisper/whisper_small` dir.
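To confirm the build output, list the engine directory (exact file names vary with the TensorRT-LLM version):
```bash
ls -lh /home/TensorRT-LLM/examples/whisper/whisper_small_en
```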

## Run WhisperLive Server with TensorRT Backend
```bash
cd /home/WhisperLive
bash scripts/setup.sh
pip install -r requirements.txt

# Required to create the mel spectrogram
wget --directory-prefix=assets https://raw.githubusercontent.com/openai/whisper/main/whisper/assets/mel_filters.npz

# Run English only model
python3 run_server.py --port 9090 \
                      --backend tensorrt \
                      --whisper_tensorrt_path /home/TensorRT-LLM/examples/whisper/whisper_small_en

# Run Multilingual model
python3 run_server.py --port 9090 \
                      --backend tensorrt \
                      --whisper_tensorrt_path /home/TensorRT-LLM/examples/whisper/whisper_small \
                      --trt_multilingual
```
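With the server up, you can connect from another terminal using the Python client from the main README (shown here for the English-only engine):
```python
from whisper_live.client import TranscriptionClient

# Connect to the TensorRT-backed server started above.
client = TranscriptionClient(
    "localhost",
    9090,
    is_multilingual=False,
    lang="en",
    translate=False,
    model_size="small"
)

client("tests/jfk.wav")  # transcribe a local audio file
```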
