[Doc] Update document for RWKV (#293)
Hzfengsy authored Jun 2, 2023
1 parent d3c6053 commit c856439
Showing 6 changed files with 128 additions and 10 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -39,7 +39,7 @@ MLC LLM offers a repeatable, systematic, and customizable workflow that empowers

## How does MLC Enable Universal Native Deployment?

The cornerstone of our solution is machine learning compilation ([MLC](https://mlc.ai/)), which we leverage to efficiently deploy AI models. We build on the shoulders of open-source ecosystems, including tokenizers from Hugging Face and Google, as well as open-source LLMs like Llama, Vicuna, Dolly, MOSS and more. Our primary workflow is based on [Apache TVM Unity](https://github.com/apache/tvm/tree/unity), an exciting ongoing development in the Apache TVM Community.
The cornerstone of our solution is machine learning compilation ([MLC](https://mlc.ai/)), which we leverage to efficiently deploy AI models. We build on the shoulders of open-source ecosystems, including tokenizers from Hugging Face and Google, as well as open-source LLMs like Llama, Vicuna, Dolly, MOSS, RWKV and more. Our primary workflow is based on [Apache TVM Unity](https://github.com/apache/tvm/tree/unity), an exciting ongoing development in the Apache TVM Community.

- Dynamic shape: We bake a language model as a TVM IRModule with native dynamic shape support, avoiding the need for extra padding to the maximum length and reducing both computation amount and memory usage.
- Composable ML compilation optimizations: we perform many model deployment optimizations, such as compilation code transformation, fusion, memory planning, library offloading, and manual code optimization; these can be easily incorporated as TVM IRModule transformations exposed as Python APIs, as sketched below.
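
A minimal sketch of this composability, assuming the TVM Unity `relax` Python API; the toy model and the particular passes are illustrative, not MLC LLM's actual compilation pipeline:

```python
# Minimal sketch of composable IRModule transformations (assumes the
# TVM Unity relax API; toy model and pass selection are illustrative).
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class ToyModel:
    @R.function
    def main(x: R.Tensor(("n", 16), "float32")) -> R.Tensor(("n", 16), "float32"):
        # "n" is a symbolic (dynamic) sequence dimension: no padding needed.
        with R.dataflow():
            y = R.add(x, x)
            R.output(y)
        return y

# Optimizations compose as ordinary IRModule -> IRModule passes.
seq = tvm.ir.transform.Sequential([
    relax.transform.LegalizeOps(),  # lower high-level ops to loop-level TIR
    relax.transform.FuseOps(),      # fuse adjacent operators
])
mod = seq(ToyModel)
print(mod)  # transformed IRModule, ready for further passes or codegen
```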
@@ -112,4 +112,4 @@ walkthrough of our approaches.

This project is initiated by members from CMU catalyst, UW SAMPL, SJTU, OctoML and the MLC community. We would love to continue developing and supporting the open-source ML community.

This project is only possible thanks to the shoulders of the open-source ecosystems that we stand on. We want to thank the Apache TVM community and developers of the TVM Unity effort. We thank the open-source ML community members who made these models publicly available, and the PyTorch and Hugging Face communities that make them accessible. We would like to thank the teams behind Vicuna, SentencePiece, LLaMA, Alpaca and MOSS. We would also like to thank the Vulkan, Swift, C++, Python, and Rust communities that enable this project.
This project is only possible thanks to the shoulders of the open-source ecosystems that we stand on. We want to thank the Apache TVM community and developers of the TVM Unity effort. We thank the open-source ML community members who made these models publicly available, and the PyTorch and Hugging Face communities that make them accessible. We would like to thank the teams behind Vicuna, SentencePiece, LLaMA, Alpaca, MOSS and RWKV. We would also like to thank the Vulkan, Swift, C++, Python, and Rust communities that enable this project.
26 changes: 24 additions & 2 deletions docs/model-zoo.rst
@@ -41,14 +41,32 @@ Below is a list of off-the-shelf prebuilt models compiled by the MLC-LLM community.
- `RedPajama <https://www.together.xyz/blog/redpajama>`__
- * Weight storage data type: int4
* Running data type: float32
* Symmetric quantization
- `link <https://huggingface.co/mlc-ai/mlc-chat-RedPajama-INCITE-Chat-3B-v1-q4f32_0>`__
* - `RedPajama-INCITE-Chat-3B-v1-q4f16_0`
- `RedPajama <https://www.together.xyz/blog/redpajama>`__
- * Weight storage data type: int4
* Running data type: float16
* Symmetric quantization
- `link <https://huggingface.co/mlc-ai/mlc-chat-RedPajama-INCITE-Chat-3B-v1-q4f16_0>`__
* - `rwkv-raven-1b5-q8f16_0`
- `RWKV <https://github.com/BlinkDL/RWKV-LM>`__
- * Weight storage data type: uint8
* Running data type: float16
* Symmetric quantization
- `link <https://huggingface.co/mlc-ai/mlc-chat-rwkv-raven-1b5-q8f16_0>`__
* - `rwkv-raven-3b-q8f16_0`
- `RWKV <https://github.com/BlinkDL/RWKV-LM>`__
- * Weight storage data type: uint8
* Running data type: float16
* Symmetric quantization
- `link <https://huggingface.co/mlc-ai/mlc-chat-rwkv-raven-3b-q8f16_0>`__
* - `rwkv-raven-7b-q8f16_0`
- `RWKV <https://github.com/BlinkDL/RWKV-LM>`__
- * Weight storage data type: uint8
* Running data type: float16
* Symmetric quantization
- `link <https://huggingface.co/mlc-ai/mlc-chat-rwkv-raven-7b-q8f16_0>`__
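
The quantization codes above follow the pattern ``q<weight-bits><runtime-type>_<version>``; for example, ``q8f16_0`` stores weights in 8 bits (as uint8) and computes in float16 with a symmetric scheme. As a reading aid, below is a small NumPy sketch of symmetric 8-bit quantization; it is illustrative only, not MLC-LLM's exact packing scheme.

.. code:: python

   # Illustrative sketch of symmetric 8-bit quantization (not MLC-LLM's
   # exact packing): weights stored as uint8, dequantized to float16.
   import numpy as np

   def quantize_sym(w, bits=8):
       qmax = 2 ** (bits - 1) - 1                  # 127 for 8 bits
       scale = np.abs(w).max() / qmax              # one scale; zero-point is 0
       q = np.round(w / scale).clip(-qmax, qmax)   # signed integer grid
       return (q + qmax).astype(np.uint8), scale   # shift into uint8 storage

   def dequantize_sym(stored, scale, qmax=127):
       return ((stored.astype(np.int32) - qmax) * scale).astype(np.float16)

   w = np.random.randn(8).astype(np.float32)
   stored, scale = quantize_sym(w)
   print(dequantize_sym(stored, scale))            # approximately w, in float16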

You can check `MLC-LLM pull requests <https://github.com/mlc-ai/mlc-llm/pulls?q=is%3Aopen+is%3Apr+label%3Anew-models>`__ to track the ongoing efforts of new models. We encourage users to upload their compiled models to Hugging Face and share them with the community.

@@ -84,6 +102,10 @@ MLC-LLM supports the following model architectures:
- `GPT-J <https://github.com/kingoflolz/mesh-transformer-jax>`__
- `Relax Code <https://github.com/mlc-ai/mlc-llm/blob/main/mlc_llm/relax_model/gptj.py>`__
- * `MOSS <https://github.com/OpenLMLab/MOSS>`__
* - ``rwkv``
- `RWKV <https://github.com/BlinkDL/RWKV-LM>`__
- `Relax Code <https://github.com/mlc-ai/mlc-llm/blob/main/mlc_llm/relax_model/rwkv.py>`__
- * `RWKV-raven <https://github.com/BlinkDL/RWKV-LM>`__

For models within these model architectures, you can check the :doc:`/tutorials/compile-models` on how to compile models. Please create a new issue if you want to request a new model architecture. Our tutorial :doc:`/tutorials/bring-your-own-models` introduces how to bring a new model architecture to MLC-LLM.

73 changes: 73 additions & 0 deletions docs/tutorials/compile-models.rst
@@ -192,6 +192,79 @@ In the command above, ``--model`` specifies the name of the model to build, ``--
python3 build.py --model RedPajama-INCITE-Chat-3B-v1 --target android --max-seq-len 768 --quantization q4f16_0
.. tab:: rwkv-raven-1b5/3b/7b

.. tabs::

.. tab:: Target: CUDA

.. code:: shell

   # For 1.5B model
   python3 build.py --hf-path=RWKV/rwkv-raven-1b5 --target cuda --quantization q8f16_0
   # For 3B model
   python3 build.py --hf-path=RWKV/rwkv-raven-3b --target cuda --quantization q8f16_0
   # For 7B model
   python3 build.py --hf-path=RWKV/rwkv-raven-7b --target cuda --quantization q8f16_0

.. tab:: Metal

On an Apple Silicon powered Mac, build for Apple Silicon:

.. code:: shell

   # For 1.5B model
   python3 build.py --hf-path=RWKV/rwkv-raven-1b5 --target metal --quantization q8f16_0
   # For 3B model
   python3 build.py --hf-path=RWKV/rwkv-raven-3b --target metal --quantization q8f16_0
   # For 7B model
   python3 build.py --hf-path=RWKV/rwkv-raven-7b --target metal --quantization q8f16_0

On an Apple Silicon powered Mac, build for an x86 Mac:

.. code:: shell

   # For 1.5B model
   python3 build.py --hf-path=RWKV/rwkv-raven-1b5 --target metal_x86_64 --quantization q8f16_0
   # For 3B model
   python3 build.py --hf-path=RWKV/rwkv-raven-3b --target metal_x86_64 --quantization q8f16_0
   # For 7B model
   python3 build.py --hf-path=RWKV/rwkv-raven-7b --target metal_x86_64 --quantization q8f16_0

.. tab:: Vulkan

On Linux, build for Linux:

.. code:: shell

   # For 1.5B model
   python3 build.py --hf-path=RWKV/rwkv-raven-1b5 --target vulkan --quantization q8f16_0
   # For 3B model
   python3 build.py --hf-path=RWKV/rwkv-raven-3b --target vulkan --quantization q8f16_0
   # For 7B model
   python3 build.py --hf-path=RWKV/rwkv-raven-7b --target vulkan --quantization q8f16_0

On Linux, build for Windows:

.. code:: shell

   # For 1.5B model
   python3 build.py --hf-path=RWKV/rwkv-raven-1b5 --target vulkan --quantization q8f16_0 --llvm-mingw path/to/llvm-mingw
   # For 3B model
   python3 build.py --hf-path=RWKV/rwkv-raven-3b --target vulkan --quantization q8f16_0 --llvm-mingw path/to/llvm-mingw
   # For 7B model
   python3 build.py --hf-path=RWKV/rwkv-raven-7b --target vulkan --quantization q8f16_0 --llvm-mingw path/to/llvm-mingw

.. tab:: iPhone/iPad

.. code:: shell

   # For 1.5B model
   python3 build.py --hf-path=RWKV/rwkv-raven-1b5 --target iphone --quantization q8f16_0
   # For 3B model
   python3 build.py --hf-path=RWKV/rwkv-raven-3b --target iphone --quantization q8f16_0
   # For 7B model
   python3 build.py --hf-path=RWKV/rwkv-raven-7b --target iphone --quantization q8f16_0

.. tab:: Other models

.. tabs::
21 changes: 16 additions & 5 deletions docs/tutorials/deploy-models.rst
@@ -140,7 +140,7 @@ This section introduces how to prepare and upload the model you built.
.. note::
Before proceeding, you should first have the model built manually.
At this moment, the iOS/Android/web apps released by MLC LLM only support **specific model architectures with specific quantization modes**. Particularly,

- the :ref:`released iOS/iPadOS app <iPhone-download-app>` supports models structured by LLaMA-7B and quantized by ``q3f16_0``, and models structured by GPT-NeoX-3B and quantized by ``q4f16_0``.
- the :ref:`released Android app <Android-download-app>` supports models structured by LLaMA-7B and quantized by ``q4f16_0``.
- the `Web LLM demo page <https://mlc.ai/web-llm/>`_ supports models structured by LLaMA-7B and quantized by ``q4f32_0``, and models structured by GPT-NeoX-3B and quantized by both ``q4f16_0`` and ``q4f32_0``.
@@ -168,7 +168,7 @@ Opening that file, the ``model_lib`` field specifies the model library name we u
"model_lib": "vicuna-v1-7b-q3f16_0",
...
}
.. tab:: GPT-NeoX-3B

The model is expected to be quantized by ``q4f16_0``:
@@ -179,7 +179,18 @@ Opening that file, the ``model_lib`` field specifies the model library name we u
"model_lib": "RedPajama-INCITE-Chat-3B-v1-q4f16_0",
...
}
.. tab:: RWKV

The model is expected to be quantized by ``q8f16_0``:

.. code::

   {
       "model_lib": "rwkv-raven-1b5-q8f16_0",
       ...
   }

.. tab:: Android

.. tabs::
@@ -194,7 +205,7 @@ Opening that file, the ``model_lib`` field specifies the model library name we u
"model_lib": "vicuna-v1-7b-q4f16_0",
...
}
.. tab:: Web

.. tabs::
@@ -209,7 +220,7 @@ Opening that file, the ``model_lib`` field specifies the model library name we u
"model_lib": "vicuna-v1-7b-q4f32_0",
...
}
.. tab:: GPT-NeoX-3B

If the model is quantized by ``q4f16_0``:
3 changes: 3 additions & 0 deletions ios/prepare_params.sh
@@ -8,6 +8,9 @@ mkdir -p dist
declare -a builtin_list=(
"RedPajama-INCITE-Chat-3B-v1-q4f16_0"
# "vicuna-v1-7b-q3f16_0"
# "rwkv-raven-1b5-q8f16_0"
# "rwkv-raven-3b-q8f16_0"
# "rwkv-raven-7b-q8f16_0"
)

for model in "${builtin_list[@]}"
11 changes: 10 additions & 1 deletion site/index.md
@@ -54,6 +54,7 @@ please install the latest [Vulkan driver](https://developer.nvidia.com/vulkan-dr
Vulkan driver, as the CUDA driver may not work well.

After installing all the dependencies, just follow the instructions below to install the CLI app:

```shell
# Create a new conda environment and activate the environment.
conda create -n mlc-chat
@@ -84,13 +85,20 @@ cd dist/prebuilt
git clone https://huggingface.co/mlc-ai/mlc-chat-RedPajama-INCITE-Chat-3B-v1-q4f16_0
cd ../..
mlc_chat_cli --local-id RedPajama-INCITE-Chat-3B-v1-q4f16_0

# Download prebuilt weights of RWKV-raven-1.5B/3B/7B
cd dist/prebuilt
git clone https://huggingface.co/mlc-ai/mlc-chat-rwkv-raven-1b5-q8f16_0
# or git clone https://huggingface.co/mlc-ai/mlc-chat-rwkv-raven-3b-q8f16_0
# or git clone https://huggingface.co/mlc-ai/mlc-chat-rwkv-raven-7b-q8f16_0
cd ../..
mlc_chat_cli --local-id rwkv-raven-1b5-q8f16_0  # Replace the local id if you use the 3B or 7B model.
```

<p align="center">
<img src="gif/linux-demo.gif" width="80%">
</p>


### Web Browser

Please check out [WebLLM](https://mlc.ai/web-llm/), our companion project that deploys models natively to browsers. Everything here runs inside the browser with no server support and is accelerated with WebGPU.
@@ -104,4 +112,5 @@ Please check out [WebLLM](https://mlc.ai/web-llm/), our companion project that d
walkthrough of our approaches.

## Disclaimer

The pre-packaged demos are for research purposes only, subject to the model License.
