Stars
ChatArena (or Chat Arena) is a library of multi-agent language game environments for LLMs. The goal is to develop communication and collaboration capabilities in AIs.
VisualWebArena is a benchmark for multimodal agents.
O1 Replication Journey: A Strategic Progress Report – Part I
Code for ROICtrl: Boosting Instance Control for Visual Generation
Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.
[NeurIPS 2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"
CVPR and NeurIPS poster examples and templates. May we have in-person poster sessions soon!
Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"
A programming framework for agentic AI 🤖 (PyPI: autogen-agentchat)
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
[ICLR 2024] SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web"
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.