• MongoDB Atlas runs apps anywhere Icon
    MongoDB Atlas runs apps anywhere

    Deploy in 115+ regions with the modern database for every enterprise.

    MongoDB Atlas gives you the freedom to build and run modern applications anywhere—across AWS, Azure, and Google Cloud. With global availability in over 115 regions, Atlas lets you deploy close to your users, meet compliance needs, and scale with confidence across any geography.
    Start Free
  • Try Google Cloud Risk-Free With $300 in Credit Icon
    Try Google Cloud Risk-Free With $300 in Credit

    No hidden charges. No surprise bills. Cancel anytime.

    Use your credit across every product. Compute, storage, AI, analytics. When it runs out, 20+ products stay free. You only pay when you choose to.
    Start Free
  • 1
    Label Sleuth

    Label Sleuth

    Open source no-code system for text annotation and building of text

    An open-source no-code system for text annotation and building text classifiers. No AI knowledge needed. From task definition to working model in just a few hours! While domain experts label their data, Label Sleuth automatically trains in the background-appropriate machine learning models. To avoid wasted labeling effort, Label Sleuth employs active learning techniques to guide the user in what they should be labeled next. Domain experts can quickly start labeling their data through an intuitive user interface. ...
    Downloads: 5 This Week
    Last Update:
    See Project
  • 2
    OpenCompass

    OpenCompass

    OpenCompass is an LLM evaluation platform

    ...Pre-support for 20+ HuggingFace and API models, a model evaluation scheme of 50+ datasets with about 300,000 questions, comprehensively evaluating the capabilities of the models in five dimensions. One line command to implement task division and distributed evaluation, completing the full evaluation of billion-scale models in just a few hours. Support for zero-shot, few-shot, and chain-of-thought evaluations, combined with standard or dialogue type prompt templates, to easily stimulate the maximum performance of various models.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 3
    MetaVoice-1B

    MetaVoice-1B

    Foundational model for human-like, expressive TTS

    MetaVoice — in the form of its source repository “metavoice-src” — is a large-scale text-to-speech (TTS) model. Specifically, the base model (MetaVoice-1B) uses around 1.2 billion parameters and has been trained on a massive dataset — reportedly around 100,000 hours of speech data. The goal is to provide human-like, expressive, and flexible TTS: able to generate natural-sounding speech that can handle diverse inputs and likely generalize over voice styles, intonation, prosody, and perhaps multiple languages or accents. With that scale and dataset volume, MetaVoice aims to push the boundary of what open-source TTS models can achieve: high fidelity, natural prosody, and robustness even for edge cases. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 4
    ChatTTS

    ChatTTS

    A generative speech model for daily dialogue

    ChatTTS is an open-source conversational text-to-speech model optimized for dialogue, developed by 2Noise. Trained on 100,000+ hours of English and Chinese conversation data, it excels at generating expressive prosody—pauses, interjections, laughter—for more natural-sounding speech synthesis in assistant and chatbot applications.
    Downloads: 5 This Week
    Last Update:
    See Project
  • AI-generated apps that pass security review Icon
    AI-generated apps that pass security review

    Stop waiting on engineering. Build production-ready internal tools with AI—on your company data, in your cloud.

    Retool lets you generate dashboards, admin panels, and workflows directly on your data. Type something like “Build me a revenue dashboard on my Stripe data” and get a working app with security, permissions, and compliance built in from day one. Whether on our cloud or self-hosted, create the internal software your team needs without compromising enterprise standards or control.
    Try Retool free
  • 5
    MiniMind

    MiniMind

    Train a 26M-parameter GPT from scratch in just 2h

    minimind is a framework that enables users to train a 26-million-parameter GPT (Generative Pre-trained Transformer) model from scratch in approximately two hours. It provides a streamlined process for data preparation, model training, and evaluation, making it accessible for individuals and organizations to develop their own language models without extensive computational resources.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 6
    Audiblez

    Audiblez

    Generate audiobooks from e-books

    ...It focuses on making audiobook creation easy and fast: from a single command, the tool splits an e-book into chapters, synthesizes audio for each section, and then merges the results into a structured audiobook with chapter-based WAV files and a final .m4b container. The Kokoro-82M model it uses is compact (82M parameters) yet natural sounding, trained on under 100 hours of audio, and supports multiple languages, including English (US/UK), Spanish, French, Hindi, Italian, Japanese, Brazilian Portuguese, and Mandarin Chinese. Audiblez can run entirely from the command line via a PyPI package or through a simple cross-platform GUI built on wxPython, giving both advanced users and non-technical users an accessible workflow.
    Downloads: 15 This Week
    Last Update:
    See Project
  • 7
    HY-Motion 1.0

    HY-Motion 1.0

    HY-Motion model for 3D character animation generation

    ...Built on advanced deep learning architectures that combine Diffusion Transformer (DiT) and flow matching techniques, HY-Motion scales these approaches to the billion-parameter level, resulting in strong instruction-following capabilities and richer motion outputs compared to existing open-source models. The training strategy for the HY-Motion series includes extensive pre-training on thousands of hours of varied motion data, fine-tuning on curated high-quality datasets, and reinforcement learning with human feedback, which improves both the plausibility and adaptability of generated motion sequences.
    Downloads: 4 This Week
    Last Update:
    See Project
  • 8
    AutoClip

    AutoClip

    AI-powered video clipping and highlight generation

    AutoClip is an open-source, AI-powered video processing system designed to automate the extraction of “highlight” segments from full-length videos — ideal for creators who want to generate bite-sized clips, compilations, or highlight reels without manually sifting through hours of footage. The system supports downloading videos from major platforms (e.g. YouTube, Bilibili), or accepting local uploads, and then applies AI analysis to identify segments worth clipping based on content (e.g. high energy moments, speech, or other heuristics). Once highlights are identified, AutoClip can automatically cut those segments and optionally assemble them into a compilation, thus greatly reducing manual video editing effort. ...
    Downloads: 9 This Week
    Last Update:
    See Project
  • 9
    SenseVoice

    SenseVoice

    Multilingual speech recognition and audio understanding model

    ...It provides capabilities such as automatic speech recognition, spoken language identification, speech emotion recognition, and audio event detection within a single system. SenseVoice is trained on more than 400,000 hours of speech data and supports over 50 languages for multilingual recognition tasks. It is built to achieve high transcription accuracy while maintaining efficient inference performance. It includes different model variants optimized for either speed or accuracy, allowing developers to choose a configuration suitable for their use case. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • Gemini 3 and 200+ AI Models on One Platform Icon
    Gemini 3 and 200+ AI Models on One Platform

    Access Google's best plus Claude, Llama, and Gemma. Fine-tune and deploy from one console.

    Build generative AI apps with Vertex AI. Switch between models without switching platforms.
    Start Free
  • 10
    MODMAIL

    MODMAIL

    A feature rich discord Modmail bot

    ...All further DM messages will automatically relay to that channel; any available staff can respond within the channel. Schedule tasks in human time, e.g. ?close in 2 hours silently. Editing and deleting messages are synced. Support for the diverse range of message contents (multiple images, files). Paginated commands interfaces via reactions. When you close a thread, Modmail will generate a log link and post it to your log channel.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 11
    Qwen-2.5-VL

    Qwen-2.5-VL

    Qwen2.5-VL is the multimodal large language model series

    Qwen2.5 is a series of large language models developed by the Qwen team at Alibaba Cloud, designed to enhance natural language understanding and generation across multiple languages. The models are available in various sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B parameters, catering to diverse computational requirements. Trained on a comprehensive dataset of up to 18 trillion tokens, Qwen2.5 models exhibit significant improvements in instruction following, long-text generation...
    Downloads: 6 This Week
    Last Update:
    See Project
  • 12
    LingBot-VLA

    LingBot-VLA

    A Pragmatic VLA Foundation Model

    LingBot-VLA is an open-source Vision-Language-Action (VLA) foundational AI model designed to serve as a general “brain” for real-world robotic manipulation by grounding multimodal perception and language into actionable motions. It has been pretrained on tens of thousands of hours of real robotic interaction data across multiple robot platforms, which enables it to generalize well to diverse morphologies and tasks without needing extensive retraining on each new bot. The model aims to bridge vision, language understanding, and motor control within one unified architecture, making it capable of understanding high-level instructions and generating coherent low-level actions in physical environments. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 13
    AI YouTube Shorts Generator

    AI YouTube Shorts Generator

    A python tool that uses GPT-4, FFmpeg, and OpenCV

    ...The tool streamlines multiple steps of the tedious short-form video workflow: highlight detection, clipping, subtitle generation, cropping to vertical 9:16 format, and final rendering — reducing hours of editing to a mostly automated pipeline. Because it supports both local and online video sources, it's flexible whether you're working with your own recorded content or repurposing existing longer-form videos.
    Downloads: 2 This Week
    Last Update:
    See Project
  • 14
    nanochat

    nanochat

    The best ChatGPT that $100 can buy

    ...The repository stitches together every stage of the lifecycle: tokenizer training, pretraining a Transformer on a large web corpus, mid-training on dialogue and multiple-choice tasks, supervised fine-tuning, optional reinforcement learning for alignment, and finally efficient inference with caching. Its north star is approachability and speed: you can boot a fresh GPU box and drive the whole pipeline via a single script, producing a usable chat model in hours and a clear markdown report of what happened. The code is written to be read—concise training loops, transparent configs, and minimal wrappers—so you can audit each step, tweak it, and rerun without getting lost in framework indirection.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 15
    WavTokenizer

    WavTokenizer

    SOTA discrete acoustic codec models with 40/75 tokens per second

    WavTokenizer is a state-of-the-art discrete acoustic codec designed specifically for audio language modeling, capable of compressing 24 kHz audio into just 40 or 75 tokens per second while preserving high perceptual quality. It is built to represent speech, music, and general audio with extremely low bitrate, making it ideal as a front-end for large audio language models like GPT-4o and similar architectures. The model uses a single-quantizer design together with temporal compression to...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 16
    Surya

    Surya

    Implementation of the Surya Foundation Model for Heliophysics

    Surya is an open‑source, AI‑based foundation model for heliophysics developed collaboratively by NASA (via the IMPACT AI team) and IBM. Named after the Sanskrit word for “sun,” Surya is trained on nine years of high‑resolution solar imagery from NASA’s Solar Dynamics Observatory (SDO). It is designed to forecast solar phenomena—such as flares, solar wind, irradiance, and active region behavior—by predicting future solar images with a sophisticated long–short vision transformer architecture,...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 17
    Vidi2

    Vidi2

    Large Multimodal Models for Video Understanding and Editing

    ...Vidi targets applications like intelligent video editing, automated video search, content analysis, and editing assistance, enabling users to efficiently locate relevant segments and objects in hours-long footage. The system is built with open-source release in mind, giving developers access to model code, inference scripts, and evaluation pipelines so they can reproduce research results or integrate Vidi into their own video-processing workflows.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 18
    Lightweight' GAN

    Lightweight' GAN

    Implementation of 'lightweight' GAN, proposed in ICLR 2021

    ...The main contribution of the paper is a skip-layer excitation in the generator, paired with autoencoding self-supervised learning in the discriminator. Quoting the one-line summary "converge on single gpu with few hours' training, on 1024 resolution sub-hundred images". Augmentation is essential for Lightweight GAN to work effectively in a low data setting. You can test and see how your images will be augmented before they pass into a neural network (if you use augmentation). The general recommendation is to use suitable augs for your data and as many as possible, then after some time of training disable the most destructive (for image) augs. ...
    Downloads: 0 This Week
    Last Update:
    See Project
  • 19
    lora-svc

    lora-svc

    Singing voice change based on whisper, lora for singing voice clone

    singing voice change based on whisper, and lora for singing voice clone. You will feel the beauty of the code from this project. Uni-SVC main branch is for singing voice clone based on whisper with speaker encoder and speaker adapter. Uni-SVC main target is to develop lora for SVC. With lora, maybe clone a singer just need 10 stence after 10 minutes train. Each singer is a plug-in of the base model.
    Downloads: 0 This Week
    Last Update:
    See Project
  • 20
    Horovod

    Horovod

    Distributed training framework for TensorFlow, Keras, PyTorch, etc.

    Horovod was originally developed by Uber to make distributed deep learning fast and easy to use, bringing model training time down from days and weeks to hours and minutes. With Horovod, an existing training script can be scaled up to run on hundreds of GPUs in just a few lines of Python code. Horovod can be installed on-premise or run out-of-the-box in cloud platforms, including AWS, Azure, and Databricks. Horovod can additionally run on top of Apache Spark, making it possible to unify data processing and model training into a single pipeline. ...
    Downloads: 4 This Week
    Last Update:
    See Project
  • 21
    VALL-E

    VALL-E

    PyTorch implementation of VALL-E (Zero-Shot Text-To-Speech)

    ...Specifically, we train a neural codec language model (called VALL-E) using discrete codes derived from an off-the-shelf neural audio codec model, and regard TTS as a conditional language modeling task rather than continuous signal regression as in previous work. During the pre-training stage, we scale up the TTS training data to 60K hours of English speech which is hundreds of times larger than existing systems. VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt. Experiment results show that VALL-E significantly outperforms the state-of-the-art zero-shot TTS system in terms of speech naturalness and speaker similarity. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 22
    Knock Knock

    Knock Knock

    Get notified when your training ends

    Knock Knock is a lightweight Python utility created by the Hugging Face team that allows developers to receive notifications when long-running machine learning tasks finish or fail. Training deep learning models often takes hours or even days, making it inconvenient for engineers to constantly monitor progress manually. The library solves this problem by adding simple decorators or command-line commands that automatically send notifications when a process completes or crashes. These alerts can be delivered through several communication platforms such as email, Slack, Telegram, or other messaging services. ...
    Downloads: 2 This Week
    Last Update:
    See Project
  • 23
    ...Its current focus is on boosting techniques, Haar-like features, and face detection. PyCV provides the world's fastest method for training a face detector, in a few hours.
    Downloads: 0 This Week
    Last Update:
    See Project
  • Previous
  • You're on page 1
  • Next
MongoDB Logo MongoDB