Xidian University
Shaanxi, China
Stars
GUICourse: From General Vision Language Models to Versatile GUI Agents
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI.
Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.
Meditron is a suite of open-source medical Large Language Models (LLMs).
The model, data and code for the visual GUI Agent SeeClick
OS-ATLAS: A Foundation Action Model For Generalist GUI Agents
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Building a comprehensive and handy list of papers for GUI agents
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
[ECCV 2024] Tokenize Anything via Prompting
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
PyTorch implementation of convolutional neural network visualization techniques
Collection of AWESOME vision-language models for vision tasks
A collection of awesome autoregressive visual generation models
The paper collections for the autoregressive models in vision.
A method to increase the speed and lower the memory footprint of existing vision transformers.
This is a repo to track the latest autoregressive visual generation papers.
[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…