Stars
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
Efficient 2:4 sparse training algorithms and implementations
Code for paper "Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models"
need more time for construction
Knowledge Transfer and Domain Adaptation for Fine-Grained Remote Sensing Image Segmentation
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
✨✨Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
Official PyTorch implementation of "Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images".
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
VoiceBench: Benchmarking LLM-Based Voice Assistants
A resource for learning about machine learning and deep learning
A curated list for Efficient Large Language Models
Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
ManiBox: Enhancing Spatial Grasping Generalization via Scalable Simulation Data Generation
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer.
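🚀 Next Generation AI One-Stop Internationalization Solution. 🚀 Next-generation AI one-stop solution for business (B-side) and consumer (C-side) use, supporting OpenAI, Midjourney, Claude, iFlytek Spark, Stable Diffusion, DALL·E, ChatGLM, Tongyi Qianwen, Tencent Hunyuan, 360 Zhinao, Baichuan AI, Volcano Ark, New Bing, Gemini, Moonshot …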
Toward a Multimodal Language Model: an implementation of GPT-4o/Project Astra