CogView4, CogView3-Plus and CogView3 (ECCV 2024)
Generate Any 3D Scene in Seconds
Qwen3-Coder is the code version of Qwen3
Video understanding codebase from FAIR for reproducing video models
Towards Real-World Vision-Language Understanding
A Unified Framework for Text-to-3D and Image-to-3D Generation
Multimodal-Driven Architecture for Customized Video Generation
Official implementation of DreamCraft3D
Diffusion Transformer with Fine-Grained Chinese Understanding
Diversity-driven optimization and large-model reasoning ability
Phi-3.5 for Mac: Locally-run Vision and Language Models
Revolutionizing Database Interactions with Private LLM Technology
The official PyTorch implementation of Google's Gemma models
GLM-4.6V/4.5V/4.1V-Thinking, towards versatile multimodal reasoning
Unified Multimodal Understanding and Generation Models
Sharp Monocular Metric Depth in Less Than a Second
DeepSeek Coder: Let the Code Write Itself
Implementation of the Surya Foundation Model for Heliophysics
A state-of-the-art open visual language model
Chat & pretrained large vision language model
Implementation of "MobileCLIP" (CVPR 2024)
Official implementation of Watermark Anything with Localized Messages
High-resolution models for human tasks
CLIP: Predict the most relevant text snippet given an image