Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic

Shikra is an MLLM designed to kick off referential dialogue by excelling at spatial coordinate inputs and outputs in natural language, without additional vocabularies, position encoders, pre-/post-detection, or external plug-in models.

News

[07/03] We released the code, data and Shikra-7B checkpoint.

[06/28] We released Shikra: Unleashing Multimodal LLM’s Referential Dialogue Magic, which is designed to kick off referential dialogue. Check out the paper.

Install

conda create -n shikra python=3.10
conda activate shikra
pip install -r requirements.txt

Configure accelerate

accelerate config

Shikra weights

We release Shikra weights as delta weights to comply with the LLaMA model license. You can add our delta to the original LLaMA weights to obtain the Shikra weights.

Instructions:

Get the original LLaMA weights in the Hugging Face format by following the instructions here.
Use the following script to obtain the Shikra weights by applying our delta (shikra-7b-delta). It automatically downloads the delta weights from our Hugging Face account.

python mllm/models/shikra/apply_delta.py \
    --base /path/to/llama-7b \
    --target /output/path/to/shikra-7b \
    --delta shikras/shikra-7b-delta-v1
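To make the idea of delta weights concrete, here is a minimal sketch, assuming the delta is a plain per-parameter difference so that target = base + delta. It uses Python lists in place of real tensors, and the function name `apply_delta` and the parameter names are illustrative only; the actual `apply_delta.py` script works on model checkpoints and may do more than this.

```python
# Sketch only: illustrates "base + delta = target" on plain Python lists.
# Parameter names and values below are made up for the demonstration.

def apply_delta(base, delta):
    """Return a new state dict where each entry is base[name] + delta[name]."""
    missing = set(delta) - set(base)
    if missing:
        raise KeyError(f"base weights lack parameters: {sorted(missing)}")
    target = {}
    for name, delta_values in delta.items():
        base_values = base[name]
        if len(base_values) != len(delta_values):
            raise ValueError(f"shape mismatch for {name}")
        # Element-wise addition recovers the released weights.
        target[name] = [b + d for b, d in zip(base_values, delta_values)]
    return target


base = {"layer.weight": [1.0, 2.0, 3.0], "layer.bias": [0.5]}
delta = {"layer.weight": [0.25, -1.0, 0.0], "layer.bias": [0.5]}
target = apply_delta(base, delta)
print(target)
```

Because only the difference is distributed, the result is usable only by someone who already holds the original LLaMA weights.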

Train

After preparing data, you can train the model using the command:

accelerate launch --num_processes 4 \
        --main_process_port 23786 \
        mllm/pipeline/finetune.py \
        config/shikra_pretrain_final19_stage2.py \
        --cfg-options model_args.model_name_or_path=/path/to/init/checkpoint

Both mmengine-style arguments and Hugging Face Trainer arguments are supported. For example, you can change the number of epochs and the output directory like this:

accelerate launch --num_processes 4 \
        --main_process_port 23786 \
        mllm/pipeline/finetune.py \
        config/shikra_pretrain_final19_stage2.py \
        --cfg-options model_args.model_name_or_path=/path/to/init/checkpoint \
        --num_train_epochs 3 \
        --output_dir /path/to/new/exp/dir

Here, --cfg-options a=balabala b=balabala is an mmengine-style argument; its values override the corresponding settings predefined in the config file. --num_train_epochs and --output_dir are Hugging Face Trainer arguments.
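The override behaviour of dotted `key=value` pairs can be pictured with a small sketch. This is not mmengine's implementation: `merge_overrides` is a hypothetical helper, the `type` entry in the sample config is made up, and values are kept as strings here for simplicity.

```python
import copy


def merge_overrides(config, overrides):
    """Apply 'a.b.c=value' style overrides to a nested dict, returning a copy."""
    merged = copy.deepcopy(config)
    for item in overrides:
        dotted_key, value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = merged
        for key in parents:
            # Walk down the nested dict, creating levels that do not exist yet.
            node = node.setdefault(key, {})
        node[leaf] = value
    return merged


# Hypothetical stand-in for the settings predefined in a config file.
config = {"model_args": {"model_name_or_path": None, "type": "shikra"}}
merged = merge_overrides(
    config, ["model_args.model_name_or_path=/path/to/init/checkpoint"]
)
print(merged)
```

Only the named key changes; sibling settings in the same section keep their predefined values.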

Inference

After preparing data, you can run inference with the model using the command:

accelerate launch --num_processes 4 \
        --main_process_port 23786 \
        mllm/pipeline/finetune.py \
        config/shikra_eval_multi_pope.py \
        --cfg-options model_args.model_name_or_path=/path/to/checkpoint

Both mmengine-style arguments and Hugging Face Trainer arguments are supported. For example, you can change the evaluation batch size like this:

accelerate launch --num_processes 4 \
        --main_process_port 23786 \
        mllm/pipeline/finetune.py \
        config/shikra_eval_multi_pope.py \
        --cfg-options model_args.model_name_or_path=/path/to/checkpoint \
        --per_device_eval_batch_size 1

Here, --cfg-options a=balabala b=balabala is an mmengine-style argument; its values override the corresponding settings predefined in the config file. --per_device_eval_batch_size is a Hugging Face Trainer argument.
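One way a single command line can carry both kinds of argument is to parse the config-related options first and pass everything else through. The sketch below uses only the standard library's `argparse.parse_known_args`; `split_args` is a hypothetical helper, and the repository's own entry point may organise this differently.

```python
import argparse


def split_args(argv):
    """Separate config-related options from the remaining (Trainer-style) ones."""
    parser = argparse.ArgumentParser()
    parser.add_argument("config", help="path to the config file")
    parser.add_argument("--cfg-options", nargs="+", default=[],
                        help="key=value overrides for the config file")
    # parse_known_args returns unrecognised arguments instead of erroring out.
    known, remaining = parser.parse_known_args(argv)
    return known, remaining


known, trainer_args = split_args([
    "config/shikra_eval_multi_pope.py",
    "--cfg-options", "model_args.model_name_or_path=/path/to/checkpoint",
    "--per_device_eval_batch_size", "1",
])
print(known.cfg_options)
print(trainer_args)
```

The leftover list could then be handed to whatever parser handles the Trainer arguments.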

The prediction results are saved in output_dir/multitest_xxxx_extra_prediction.jsonl, in the same order as the input dataset.
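Since the predictions keep the input order, they can be matched back to the inputs by position. The following is a rough sketch, assuming one JSON object per line; the field names `question` and `pred`, the demo file name, and both helper functions are hypothetical, so adapt them to the actual file contents.

```python
import json
import tempfile
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def pair_with_inputs(inputs, predictions):
    """Pair inputs and predictions by position, checking the counts agree."""
    inputs, predictions = list(inputs), list(predictions)
    if len(inputs) != len(predictions):
        raise ValueError(f"{len(inputs)} inputs but {len(predictions)} predictions")
    return list(zip(inputs, predictions))


# Demo with made-up records written to a temporary file.
with tempfile.TemporaryDirectory() as tmp:
    pred_path = Path(tmp) / "demo_prediction.jsonl"
    pred_path.write_text('{"pred": "yes"}\n{"pred": "no"}\n', encoding="utf-8")
    inputs = [{"question": "Is there a dog?"}, {"question": "Is there a cat?"}]
    pairs = pair_with_inputs(inputs, read_jsonl(pred_path))

for sample, prediction in pairs:
    print(sample, prediction)
```

The length check guards against silently misaligned pairs if a run was interrupted before all predictions were written.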

Examples

Cite

@article{chen2023shikra,
  title={Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic},
  author={Chen, Keqin and Zhang, Zhao and Zeng, Weili and Zhang, Richong and Zhu, Feng and Zhao, Rui},
  journal={arXiv preprint arXiv:2306.15195},
  year={2023}
}

Acknowledgement

This repo benefits from LLaVA, Vicuna and ChatGLM-Efficient-Tuning. Thanks for their wonderful work.
