Android apps which demonstrate the use of OpenAI's CLIP model for zero-shot image classification and text-image retrieval using clip.cpp
- Fully on-device inference of the CLIP model (generate text and image embeddings)
- Uses JNI bindings over clip.cpp, which is itself based on ggml, for efficient inference
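Once the model has produced an image embedding and one text embedding per candidate label, zero-shot classification reduces to comparing vectors. The following is a minimal sketch of that step, assuming embeddings arrive as plain `FloatArray`s. The names `cosineSimilarity`, `softmax` and `zeroShotClassify` are illustrative and not part of the clip.cpp bindings, and the default `scale` of 100 is an assumption modelled on the temperature CLIP-style models typically use.

```kotlin
import kotlin.math.exp
import kotlin.math.sqrt

// Cosine similarity between two embeddings of equal length.
// Note: a zero-length (all-zero) vector yields NaN.
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Numerically stable softmax: subtract the maximum before exponentiating.
fun softmax(logits: FloatArray): FloatArray {
    val max = logits.maxOrNull() ?: return FloatArray(0)
    val exps = FloatArray(logits.size) { exp(logits[it] - max) }
    val sum = exps.sum()
    return FloatArray(exps.size) { exps[it] / sum }
}

// Hypothetical helper: returns one probability per candidate label.
// `scale` is an assumed temperature, not a value read from the model file.
fun zeroShotClassify(
    imageEmbedding: FloatArray,
    labelEmbeddings: List<FloatArray>,
    scale: Float = 100f
): FloatArray {
    val logits = FloatArray(labelEmbeddings.size) {
        scale * cosineSimilarity(imageEmbedding, labelEmbeddings[it])
    }
    return softmax(logits)
}
```

The label whose probability is highest is the predicted class. A larger `scale` makes the distribution sharper, while a smaller one spreads probability across labels.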
The project consists of two Gradle modules, `app-zero-shot-classify` and `app-text-image-search`, which contain the source files for the 'Zero Shot Image Classification' and 'Text-Image-Search' apps respectively.
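For the text-image search case, retrieval can be thought of as ranking stored image embeddings by their similarity to the embedding of the query text. The sketch below is one way to do that, assuming embeddings are `FloatArray`s that are L2-normalized before comparison so that a dot product equals cosine similarity. `l2Normalize`, `dot` and `topKImages` are hypothetical names introduced for illustration and are not taken from the app's sources.

```kotlin
import kotlin.math.sqrt

// Scale a vector to unit length so that a dot product equals cosine similarity.
// An all-zero vector is returned unchanged (as a copy).
fun l2Normalize(v: FloatArray): FloatArray {
    var sumSq = 0f
    for (x in v) sumSq += x * x
    val norm = sqrt(sumSq)
    return if (norm == 0f) v.copyOf() else FloatArray(v.size) { v[it] / norm }
}

// Dot product of two vectors of equal length.
fun dot(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embeddings must have the same dimension" }
    var sum = 0f
    for (i in a.indices) sum += a[i] * b[i]
    return sum
}

// Hypothetical helper: indices of the k images most similar to the query, best first.
fun topKImages(query: FloatArray, imageEmbeddings: List<FloatArray>, k: Int): List<Int> {
    val q = l2Normalize(query)
    return imageEmbeddings
        .mapIndexed { index, emb -> index to dot(q, l2Normalize(emb)) }
        .sortedByDescending { it.second }
        .take(k)
        .map { it.first }
}
```

This linear scan is adequate for a small gallery. For large collections, normalizing the image embeddings once at indexing time avoids repeating that work on every query.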
- Clone the project and open the resulting directory in Android Studio. An automatic Gradle build should start; if it does not, click on the `Build` menu and select `Make Project`.

  ```bash
  git clone https://github.com/shubham0204/CLIP-Android --depth=1
  ```
- Connect the test device to the computer and make sure that the computer recognizes it (for example, it should appear in the output of `adb devices`).
- Download one of the GGUF models from the HuggingFace repository. For instance, if we download the `CLIP-ViT-B-32-laion2B-s34B-b79K_ggml-model-f16.gguf` model, we need to push it to the test device's file system using `adb push`:

  ```bash
  adb push CLIP-ViT-B-32-laion2B-s34B-b79K_ggml-model-f16.gguf /data/local/tmp/clip_model_fp16.gguf
  ```
> [!INFO]
> The GGUF model does not have to be placed in the `/data/local/tmp` directory. A path in the app's internal storage, or any other directory that the app can access, is also allowed.
- For both modules, in `MainActivityViewModel.kt`, ensure that the `MODEL_PATH` variable points to the correct model path on the test device. For instance, if the model is pushed to `/data/local/tmp/clip_model_fp16.gguf`, then `MODEL_PATH` should be set to `/data/local/tmp/clip_model_fp16.gguf`. You can also configure the `NUM_THREADS` and `VERBOSITY` variables.

  ```kotlin
  private val MODEL_PATH = "/data/local/tmp/clip_model_fp16.gguf"
  private val NUM_THREADS = 4
  private val VERBOSITY = 1
  ```
- Select one of the modules in the `Run / Debug Configuration` dropdown in the top menu bar, and run the app on the test device by clicking the `Run` button (Shift + F10) in Android Studio.
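A mistyped `MODEL_PATH` typically only surfaces once the native code tries to load the file, so a small pre-flight check before initializing the model can produce a clearer error. The sketch below uses only `java.io.File` and `Runtime` from the standard library. `checkModelFile` and `clampThreads` are hypothetical helpers, not functions that exist in this project, and whether a given directory is readable depends on the device and the app's permissions.

```kotlin
import java.io.File

// Returns null if the file looks loadable, or a human-readable problem otherwise.
fun checkModelFile(path: String): String? {
    val file = File(path)
    return when {
        !file.exists() -> "Model file not found: $path"
        !file.isFile -> "Not a regular file: $path"
        !file.canRead() -> "Model file is not readable by the app: $path"
        file.length() == 0L -> "Model file is empty: $path"
        else -> null
    }
}

// Keep the requested thread count between 1 and the number of available cores.
fun clampThreads(requested: Int): Int =
    requested.coerceIn(1, Runtime.getRuntime().availableProcessors())
```

Calling `checkModelFile(MODEL_PATH)` and surfacing a non-null result in the UI or log is one option. `clampThreads(NUM_THREADS)` guards against requesting more threads than the device has cores.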
- Navigate to this fork of clip.cpp, clone the `add-android-sample` branch, and open the resulting directory in Android Studio.
- The project contains two modules, `app` and `clip`. The AAR of the `clip` module can be found in the `app-text-image-search/libs` and `app-zero-shot-classify/libs` directories of this project. Running `gradlew clip:assemble` should build the debug and release versions of `clip` as an AAR.
- The AAR can be added to the `libs` directory of your project and declared as a dependency in the `build.gradle` file of the app module:

  ```
  dependencies {
      // ...
      implementation(files("libs/clip.aar"))
      // ...
  }
  ```
- CLIP: Connecting Text and Images
- Learning Transferable Visual Models From Natural Language Supervision
- clip.cpp
- ggml
- shubham0204's PR that adds JNI bindings to clip.cpp
```
@article{DBLP:journals/corr/abs-2103-00020,
  author  = {Alec Radford and
             Jong Wook Kim and
             Chris Hallacy and
             Aditya Ramesh and
             Gabriel Goh and
             Sandhini Agarwal and
             Girish Sastry and
             Amanda Askell and
             Pamela Mishkin and
             Jack Clark and
             Gretchen Krueger and
             Ilya Sutskever},
  title   = {Learning Transferable Visual Models From Natural Language Supervision},
  journal = {CoRR},
  volume  = {abs/2103.00020},
  year    = {2021}
}
```