Stars
Code for the paper "LongKey: Keyphrase Extraction for Long Documents"
Let your Claude be able to think
Using SQL's Turing Completeness to Build Tetris
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement 💥💥💥
Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
HuixiangDou: Overcoming Group Chat Scenarios with LLM-based Technical Assistance
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑‍🔬
Uses CLIP and SAM to segment any instance you specify with a text prompt naming that instance
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
Generate a transcript for your favourite Manga: Detect manga characters, text blocks and panels. Order panels. Cluster characters. Match texts to their speakers. Perform OCR.
Official inference repo for FLUX.1 models
A Comprehensive Toolkit for High-Quality PDF Content Extraction
[NeurIPS'24 Spotlight] EVE: Encoder-Free Vision-Language Models
"Hung-yi Lee Deep Learning Tutorial" (recommended by Prof. Hung-yi Lee 👍, the "Apple Book" 🍎). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
[NeurIPS 2024] Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; it supports both images and video.
This repo contains the code for a 1D tokenizer and generator
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI