xdjiangkai
  • Xidian University
  • Shaanxi China


GUICourse: From General Vision Language Models to Versatile GUI Agents

Python 92 6 Updated Jul 17, 2024

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.

34 1 Updated Oct 15, 2024

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

Python 1,494 206 Updated Dec 12, 2024

Implementation of the general framework for AMIE, from the paper "Towards Conversational Diagnostic AI", out of Google DeepMind

Python 54 Updated Sep 16, 2024

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI.

53 Updated Nov 27, 2024

Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.

Python 1,618 204 Updated Aug 13, 2024

Meditron is a suite of open-source medical Large Language Models (LLMs).

Python 1,916 173 Updated Apr 10, 2024

distributed trainer for LLMs

Python 548 78 Updated May 20, 2024

The model, data and code for the visual GUI Agent SeeClick

HTML 247 12 Updated Nov 22, 2024

Towards Generalist Biomedical AI

Python 332 49 Updated Feb 17, 2024

OS-ATLAS: A Foundation Action Model For Generalist GUI Agents

198 6 Updated Nov 19, 2024
Jupyter Notebook 1 Updated Dec 9, 2024
Jupyter Notebook 8 1 Updated Aug 1, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 5,176 391 Updated Aug 7, 2024

Building a comprehensive and handy list of papers for GUI agents

Python 46 2 Updated Dec 12, 2024

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

Python 5,232 564 Updated Aug 8, 2024

[ECCV 2024] Tokenize Anything via Prompting

Jupyter Notebook 543 21 Updated Dec 11, 2024

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

JavaScript 12,676 44,437 Updated Dec 2, 2024

A Chinese-language guide to embodied AI

678 32 Updated Dec 11, 2024

Login helper for the Xidian University campus network, with support for automatic login and launch at system startup.

Rust 10 1 Updated Oct 22, 2024

PyTorch implementation of convolutional neural network visualization techniques

Python 7,903 1,493 Updated Oct 10, 2022

Collection of AWESOME vision-language models for vision tasks

2,606 222 Updated Dec 3, 2024

a collection of awesome autoregressive visual generation models

51 Updated Dec 8, 2024

The paper collections for the autoregressive models in vision.

306 10 Updated Dec 12, 2024
Python 46 10 Updated Nov 26, 2024

MVD Evaluation Scripts

Python 68 11 Updated Aug 7, 2024

A method to increase the speed and lower the memory footprint of existing vision transformers.

Python 976 70 Updated Jun 17, 2024

This is a repo to track the latest autoregressive visual generation papers.

69 Updated Dec 5, 2024

[NeurIPS 2024 Oral][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ultra-sim…

Python 6,114 411 Updated Dec 6, 2024