Stars
Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
Framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
GPT4V-level open-source multi-modal model based on Llama3-8B
A state-of-the-art open visual language model | multimodal pretrained model
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
Letta (formerly MemGPT) is a framework for creating LLM services with memory.
Build multi-modal Agents with memory, knowledge, tools and reasoning. Chat with them using a beautiful Agent UI.
💬 Ready-to-use, flexible RAG chatbot. A knowledge-base question-answering system built on large language models and RAG.
BISHENG is an open LLM devops platform for next generation Enterprise AI applications. Powerful and comprehensive features include: GenAI workflow, RAG, Agent, Unified model management, Evaluation,…
Resources about LLM applications built on the RAG pattern
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
Lumina-T2X is a unified framework for Text to Any Modality Generation
An expert benchmark aiming to comprehensively evaluate the aesthetic perception capacities of MLLMs.
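Llama Chinese Community: Llama3 online demo and fine-tuned models are now available, the latest Llama3 learning materials are aggregated in real time, and all code has been updated for Llama3. The goal is to build the best Chinese Llama large model, fully open source and commercially usable.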
Build a Perplexity-Inspired Answer Engine Using Next.js, Groq, Llama-3, Langchain, OpenAI, Upstash, Brave & Serper
Stable Diffusion web UI
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Clarity AI | AI Image Upscaler & Enhancer - free and open-source Magnific Alternative
Unofficial implementation of BRIA RMBG Model for ComfyUI
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators