xdjiangkai
  • Xidian University
  • Shaanxi China

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Python 2,860 265 Updated Jun 4, 2024

Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding

Python 237 15 Updated Aug 11, 2024

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Python 3,099 220 Updated Dec 3, 2024

Inference code for Llama models

Python 57,080 9,641 Updated Aug 18, 2024

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Python 520 38 Updated Dec 26, 2024

GUICourse: From General Vision Language Models to Versatile GUI Agents

Python 95 6 Updated Jul 17, 2024

GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI.

34 1 Updated Dec 17, 2024

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks

Python 1,612 230 Updated Jan 3, 2025

Implementation of the general framework for AMIE, from the Google DeepMind paper "Towards Conversational Diagnostic AI"

Python 54 Updated Sep 16, 2024

GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI.

58 Updated Nov 27, 2024

Large Language-and-Vision Assistant for Biomedicine, built towards multimodal GPT-4 level capabilities.

Python 1,654 202 Updated Aug 13, 2024

Meditron is a suite of open-source medical Large Language Models (LLMs).

Python 1,925 174 Updated Apr 10, 2024

Distributed trainer for LLMs

Python 554 79 Updated May 20, 2024

The model, data and code for the visual GUI Agent SeeClick

HTML 274 13 Updated Nov 22, 2024

Towards Generalist Biomedical AI

Python 338 51 Updated Feb 17, 2024

OS-ATLAS: A Foundation Action Model For Generalist GUI Agents

221 8 Updated Nov 19, 2024

Code for paper: Improved GUI Grounding via Iterative Narrowing

Jupyter Notebook 6 Updated Dec 20, 2024
Jupyter Notebook 8 1 Updated Aug 1, 2024

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Python 5,271 403 Updated Aug 7, 2024

Building a comprehensive and handy list of papers for GUI agents

Python 147 7 Updated Jan 4, 2025

AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.

Python 5,312 578 Updated Aug 8, 2024

This is an official implementation of "Learning a Cross-Modality Anomaly Detector for Remote Sensing Imagery" (TIP 2024)

11 Updated Dec 21, 2024

[ECCV 2024] Tokenize Anything via Prompting

Jupyter Notebook 554 24 Updated Dec 11, 2024

GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes

JavaScript 12,880 44,815 Updated Dec 30, 2024

Embodied-AI-Guide: a beginner's guide to embodied AI

1,054 50 Updated Jan 5, 2025

Xidian University campus network login helper, with support for automatic login and launch at startup.

Rust 10 1 Updated Oct 22, 2024

PyTorch implementation of convolutional neural network visualization techniques

Python 7,916 1,490 Updated Jan 1, 2025

Collection of AWESOME vision-language models for vision tasks

2,678 227 Updated Dec 3, 2024

A collection of awesome autoregressive visual generation models

57 Updated Dec 29, 2024

A collection of papers on autoregressive models in vision.

343 12 Updated Dec 27, 2024