Stars
Eko (Eko Keeps Operating) - Build Production-ready Agentic Workflow with Natural Language - eko.fellou.ai
A high-throughput and memory-efficient inference and serving engine for LLMs
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curatio…
🤠 Agent-as-a-Judge and DevAI dataset
ChatArena (or Chat Arena) provides Multi-Agent Language Game Environments for LLMs. The goal is to develop communication and collaboration capabilities of AIs.
VisualWebArena is a benchmark for multimodal agents.
Code for ROICtrl: Boosting Instance Control for Visual Generation
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Out-of-the-box (OOTB) GUI Agent for Windows and macOS
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large multimodal models (LMMs) such as GPT-4V(ision).
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
(ECCV 2024) Empowering Multimodal Large Language Model as a Powerful Data Generator
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
[ICML 2024 Oral] Official code repository for MLLM-as-a-Judge.
[NeurIPS 2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"
CVPR and NeurIPS poster examples and templates. May we have in-person poster session soon!
Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"
A programming framework for agentic AI 🤖 PyPI: autogen-agentchat Discord: https://aka.ms/autogen-discord Office Hour: https://aka.ms/autogen-officehour
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
[ICLR 2024] SWE-bench: Can Language Models Resolve Real-world Github Issues?
The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.