Xidian University, Shaanxi, China
Stars
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
GUICourse: From General Vision Language Models to Versatile GUI Agents
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI.
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.
Meditron is a suite of open-source medical Large Language Models (LLMs).
The model, data and code for the visual GUI Agent SeeClick
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
Code for paper: Improved GUI Grounding via Iterative Narrowing
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Building a comprehensive and handy list of papers for GUI agents
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
This is an official implementation for "Learning a Cross-Modality Anomaly Detector for Remote Sensing Imagery" (TIP 2024)
[ECCV 2024] Tokenize Anything via Prompting
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
PyTorch implementation of convolutional neural network visualization techniques
Collection of AWESOME vision-language models for vision tasks
A collection of awesome autoregressive visual generation models
The paper collections for the autoregressive models in vision.