Stars
Fully open reproduction of DeepSeek-R1
The World's Largest Decentralized AGI Multimodal Dataset
Robust Molecular Structure Recognition with Image-to-Graph Generation
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.
Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding
Python tool for converting files and office documents to Markdown.
OpenOCR: A general OCR system with accuracy and efficiency. Supporting 24 Scene Text Recognition methods trained from scratch on large-scale real datasets, and will continue to add the latest methods.
A course on aligning smol models.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
No-GIL Python environment featuring NVIDIA Deep Learning libraries.
The AI-native database built for LLM applications, providing incredibly fast hybrid search of dense vector, sparse vector, tensor (multi-vector), and full-text
A simple screen parsing tool towards pure vision based GUI agent
Janus-Series: Unified Multimodal Understanding and Generation Models
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
RoseTTAFold2 protein/nucleic acid complex prediction
Baichuan-Omni: Towards Capable Open-source Omni-modal LLM 🌊
MixTeX multimodal LaTeX, ZhEn, and, Table OCR. It performs efficient CPU-based inference in a local offline on Windows.
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second.
🚀🤖 Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper