Long Context Transfer from Language to Vision

Python 348 18 Updated Nov 20, 2024
Python 3,086 265 Updated Oct 16, 2024

NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Python 406 24 Updated Oct 20, 2024
Python 125 20 Updated Oct 31, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 22,600 2,215 Updated Nov 28, 2024

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,846 901 Updated Oct 22, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance.

Python 6,367 492 Updated Dec 10, 2024

The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".

Python 226 9 Updated Feb 5, 2024

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Python 747 45 Updated Jul 29, 2024

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Python 3,334 336 Updated Nov 3, 2024

Progressive Spatio-Temporal Prototype Matching for Text-Video Retrieval --ICCV2023 Oral

Python 90 1 Updated Nov 2, 2023

[ICLR'24 spotlight] Chinese and English Multimodal Large Model Series (Chat and Paint) | A Chinese-English bilingual multimodal large model series based on the CPM foundation models

Python 1,097 92 Updated Jun 13, 2024
Python 747 46 Updated Jul 8, 2024

Official repo for VideoComposer: Compositional Video Synthesis with Motion Controllability

Python 908 83 Updated Nov 11, 2023

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, B…

Python 475 36 Updated Apr 21, 2024

Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts

Python 319 36 Updated Aug 1, 2023

Codes for VPGTrans: Transfer Visual Prompt Generator across LLMs. VL-LLaMA, VL-Vicuna.

Python 270 25 Updated Oct 13, 2023

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Python 37,238 4,574 Updated Dec 10, 2024

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,235 552 Updated Oct 28, 2024

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 26,454 3,360 Updated Jul 23, 2024

[NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models

Python 156 23 Updated Dec 9, 2024

This is the code of ECCV 2022 (Oral) paper "Fine-Grained Scene Graph Generation with Data Transfer".

Jupyter Notebook 97 7 Updated Jan 24, 2023

Official repository for the A-OKVQA dataset

Python 66 7 Updated May 8, 2024

Code repository for "It's About Time: Analog Clock Reading in the Wild"

Python 71 10 Updated Jun 15, 2024

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Python 27 1 Updated Jul 18, 2023

Visual Relation Grounding in Videos (ECCV'20, Spotlight)

Python 57 7 Updated Dec 8, 2022

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

Python 133 13 Updated Jul 25, 2024

Video as Conditional Graph Hierarchy for Multi-Granular Question Answering (AAAI'22, Oral)

Python 33 4 Updated Sep 17, 2022
Python 97 32 Updated Aug 28, 2018

The source code of our COLING'18 paper "Few-Shot Charge Prediction with Discriminative Legal Attributes".

Python 129 29 Updated Dec 20, 2018