Shanghai Jiao Tong University · Shanghai · https://www.sjtu.edu.cn/
Stars
Tracking and collecting papers, projects, and other resources related to Segment Anything.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
The official repo of Qwen-VL (通义千问-VL), the chat & pretrained large vision-language model proposed by Alibaba Cloud.
A state-of-the-art open visual language model | multimodal pretrained model
A one-stop repository for generative AI research updates, interview resources, notebooks, and much more!
mPLUG-Owl: The Powerful Multi-modal Large Language Model Family
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
[Nature Communications] The official codes for "Towards Building Multilingual Language Model for Medicine"
The official code to build the PMC-OA dataset
[NeurIPS 2024 D&B Track, Spotlight] UltraMedical: Building Specialized Generalists in Biomedicine
The official codes for "PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents"
PMC-VQA is a large-scale medical visual question-answering dataset containing 227k VQA pairs over 149k images that cover various modalities and diseases.
Mixture-of-Experts for Large Vision-Language Models
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, B…
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
Multi-Aspect Vision Language Pretraining - CVPR2024
Official implementation of SAM-Med2D
Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox. Equipped with high …
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
The official codes for "PMC-LLaMA: Towards Building Open-source Language Models for Medicine"
M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models
[CVPR 2024] FairCLIP: Harnessing Fairness in Vision-Language Learning
The official code for "Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data".
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.
The official repository of the paper "A Refer-and-Ground Multimodal Large Language Model for Biomedicine"
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)