LAMM

LAMM (pronounced /læm/; the name means a cute lamb, in appreciation of LLaMA) is a growing open-source community that helps researchers and developers quickly train and evaluate Multi-modal Large Language Models (MLLMs), and further build multi-modal AI agents that bridge the gap between ideas and execution, enabling seamless interaction between humans and AI.

🌏 Project Page

Updates

📆 [2024-03]

  1. Ch3Ef is available!
  2. The Ch3Ef paper is released on arXiv!
  3. The Ch3Ef dataset and leaderboard are available!

📆 [2023-12]

  1. DepictQA: Depicted Image Quality Assessment based on Multi-modal Language Models, released on arXiv!
  2. MP5: A Multi-modal LLM-based Open-ended Embodied System in Minecraft, released on arXiv!

📆 [2023-11]

  1. ChEF: A Comprehensive Evaluation Framework for MLLMs, released on arXiv!
  2. Octavius: Mitigating Task Interference in MLLMs by combining Mixture-of-Experts (MoE) with LoRAs, released on arXiv!
  3. The camera-ready version of LAMM is available on arXiv.

📆 [2023-10]

  1. LAMM is accepted to the NeurIPS 2023 Datasets & Benchmarks Track! See you in December!

📆 [2023-09]

  1. A lightweight training framework for V100 or RTX 3090 is available! LLaMA2-based fine-tuning is also online.
  2. Our demo has moved to OpenXLab.

📆 [2023-07]

  1. Checkpoints & leaderboard of LAMM on Hugging Face have been updated to the new code base.
  2. Evaluation code for both 2D and 3D tasks is ready.
  3. Command-line demo tools have been updated.

📆 [2023-06]

  1. LAMM: a 2D & 3D dataset & benchmark for MLLMs.
  2. Watch the LAMM demo video on YouTube or Bilibili!
  3. The full paper with appendix is available on arXiv.
  4. The LAMM dataset is released on Hugging Face & OpenDataLab for the research community!
  5. LAMM code is available for the research community!

Paper List

Publications

Preprints

Citation

LAMM

@article{yin2023lamm,
    title={LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark},
    author={Yin, Zhenfei and Wang, Jiong and Cao, Jianjian and Shi, Zhelun and Liu, Dingning and Li, Mukai and Sheng, Lu and Bai, Lei and Huang, Xiaoshui and Wang, Zhiyong and others},
    journal={arXiv preprint arXiv:2306.06687},
    year={2023}
}

Assessment of Multimodal Large Language Models in Alignment with Human Values

@misc{shi2024assessment,
      title={Assessment of Multimodal Large Language Models in Alignment with Human Values}, 
      author={Zhelun Shi and Zhipin Wang and Hongxing Fan and Zaibin Zhang and Lijun Li and Yongting Zhang and Zhenfei Yin and Lu Sheng and Yu Qiao and Jing Shao},
      year={2024},
      eprint={2403.17830},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

ChEF

@misc{shi2023chef,
      title={ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models}, 
      author={Zhelun Shi and Zhipin Wang and Hongxing Fan and Zhenfei Yin and Lu Sheng and Yu Qiao and Jing Shao},
      year={2023},
      eprint={2311.02692},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Octavius

@misc{chen2023octavius,
      title={Octavius: Mitigating Task Interference in MLLMs via MoE}, 
      author={Zeren Chen and Ziqin Wang and Zhen Wang and Huayang Liu and Zhenfei Yin and Si Liu and Lu Sheng and Wanli Ouyang and Yu Qiao and Jing Shao},
      year={2023},
      eprint={2311.02684},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

DepictQA

@article{depictqa,
    title={Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models},
    author={You, Zhiyuan and Li, Zheyuan and Gu, Jinjin and Yin, Zhenfei and Xue, Tianfan and Dong, Chao},
    journal={arXiv preprint arXiv:2312.08962},
    year={2023}
}

MP5

@misc{qin2023mp5,
  title         = {MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception}, 
  author        = {Yiran Qin and Enshen Zhou and Qichang Liu and Zhenfei Yin and Lu Sheng and Ruimao Zhang and Yu Qiao and Jing Shao},
  year          = {2023},
  eprint        = {2312.07472},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}

Get Started

Please see the tutorial for the basic usage of this repo.
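
As a rough illustration only, the sketch below shows one way to fetch the dataset from the Hugging Face Hub before following the tutorial. It is not the official setup script; the repository ID and local directory are placeholders (assumptions), so replace them with the values listed on the project page and in the tutorial.

# Minimal sketch (assumption, not the official setup script): download a
# dataset snapshot from the Hugging Face Hub using huggingface_hub.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="<lamm-dataset-id>",  # placeholder dataset ID; take the real one from the project page
    repo_type="dataset",          # fetch a dataset repo rather than a model
    local_dir="data/LAMM",        # assumed local layout; adjust to the tutorial's paths
)
print(f"Dataset downloaded to {local_path}")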

License

The project is released under the CC BY-NC 4.0 license (non-commercial use only), and models trained on the dataset must not be used outside of research purposes.
