ChatArena (or Chat Arena) provides multi-agent language game environments for LLMs. The goal is to develop the communication and collaboration capabilities of AIs.

Python 1,375 133 Updated May 27, 2024

VisualWebArena is a benchmark for multimodal agents.

Python 258 50 Updated Nov 9, 2024

O1 Replication Journey: A Strategic Progress Report – Part I

1,663 49 Updated Nov 30, 2024

FQGAN: Factorized Visual Tokenization and Generation

34 Updated Nov 28, 2024

Code for ROICtrl: Boosting Instance Control for Visual Generation

Python 99 Updated Dec 10, 2024

Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Python 608 29 Updated Dec 12, 2024

[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos

Python 77 2 Updated Dec 8, 2024

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Python 25,278 2,437 Updated Dec 13, 2024

Out-of-the-box (OOTB) GUI Agent for Windows and macOS

Python 993 90 Updated Dec 11, 2024

[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).

Python 664 87 Updated Nov 13, 2024

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 12,846 901 Updated Oct 22, 2024

(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator

Python 110 Updated Oct 17, 2024

connecting humans and agents

Python 62 7 Updated Dec 6, 2024

(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment

Python 74 2 Updated Nov 7, 2024

[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.

Python 44 Updated Oct 14, 2024

[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.

Python 57 4 Updated Nov 27, 2024

[NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"

Python 166 7 Updated Mar 4, 2024

CVPR and NeurIPS poster examples and templates. May we have in-person poster sessions soon!

1,528 143 Updated May 9, 2023

Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"

Python 38 3 Updated Apr 18, 2024

A programming framework for agentic AI 🤖 (PyPi: autogen-agentchat)

Python 35,778 5,173 Updated Dec 13, 2024

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Python 4,399 231 Updated Jun 14, 2024

Set-of-Mark Prompting for GPT-4V and LMMs

Python 1,215 98 Updated Aug 19, 2024

[ICLR 2024] SWE-bench: Can Language Models Resolve Real-world GitHub Issues?

Python 2,079 365 Updated Dec 10, 2024

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

6,954 414 Updated Jul 28, 2024

[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web"

Jupyter Notebook 737 104 Updated Jul 30, 2024

Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,080 45 Updated Dec 11, 2024

An LLM-based Web Navigating Agent (KDD'24)

Python 764 62 Updated Sep 27, 2024

Generative Visual Question Answering in PyTorch

Python 4 1 Updated Jun 22, 2022

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,789 117 Updated Oct 30, 2024