multimodal

Here are 919 public repositories matching this topic...

Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.

nodejs desktop-app webui ai-agents multimodal rag vector-database llm localai local-llm ollama llm-webui lmstudio llm-application agent-framework-javascript crewai llama3 custom-ai-agents

Updated Dec 24, 2024
JavaScript

jina-ai / serve

Star

☁️ Build multimodal AI applications with cloud-native stack

Updated Dec 20, 2024
Python

haotian-liu / LLaVA

Star

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

chatbot llama multimodal multi-modality gpt-4 foundation-models visual-language-learning chatgpt instruction-tuning vision-language-model llava llama2 llama-2

Updated Aug 12, 2024
Python

microsoft / unilm

Star

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Updated Dec 15, 2024
Python

NVIDIA / NeMo

Star

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr multimodal speech-translation large-language-models speaker-diariazation generative-ai

Updated Dec 24, 2024
Python

mediar-ai / screenpipe

Star

build ai agents that have the full context, open source, runs locally, developer friendly. 24/7 screen, mic, keyboard recording and control

machine-learning ai computer-vision ml agi vision agents multimodal llm

Updated Dec 23, 2024
TypeScript

rerun-io / rerun

Star

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

visualization python rust computer-vision cpp robotics multimodal

Updated Dec 23, 2024
Rust

bentoml / BentoML

Star

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering ai-inference llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated Dec 24, 2024
Python

AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.

ui beam agi openai gpt mistral multimodal groq openai-api gpt-4 large-language-models stable-diffusion generative-ai chatgpt chatgpt-ui gpt-5 anthropic

Updated Dec 23, 2024
TypeScript

facebookresearch / mmf

Star

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

deep-learning dialog pytorch vqa pretrained-models captioning multimodal multi-tasking textvqa hateful-memes

Updated Nov 15, 2024
Python

SkalskiP / courses

Sponsor

Star

This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)

nlp machine-learning natural-language-processing tutorial deep-neural-networks computer-vision deep-learning transformers generative-model multimodal mlops stable-diffusion

Updated Apr 22, 2024
Python

swyxio / ai-notes

Star

notes for software engineers getting up to speed on new AI developments. Serves as datastore for https://latent.space writing, and product brainstorming, but has cleaned up canonical references under the /Resources folder.

ai openai gpt multimodal gpt-3 prompt-engineering stable-diffusion

Updated Dec 18, 2024
HTML

modelscope / ms-swift

Star

Use PEFT or Full-parameter to finetune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) or 100+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).

Updated Dec 24, 2024
Python

kyegomez / tree-of-thoughts

Sponsor

Star

Plug in and Play Implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models that Elevates Model Reasoning by atleast 70%

deep-learning prompt artificial-intelligence multimodal gpt4 prompt-learning prompt-tuning prompt-engineering chatgpt

Updated Oct 29, 2024
Python

livekit / agents

Star

Build real-time multimodal AI applications 🤖🎙️📹

real-time video ai voice agents voice-assistant multimodal

Updated Dec 23, 2024
Python

IDEA-CCNL / Fengshenbang-LM

Star

Fengshenbang-LM(封神榜大模型)是IDEA研究院认知计算与自然语言研究中心主导的大模型开源体系，成为中文AIGC和认知智能的基础设施。

transformers pytorch chinese-nlp pretrained-models distributed-training multimodal aigc

Updated Aug 13, 2024
Python

luban-agi / Awesome-AIGC-Tutorials

Star

Curated tutorials and resources for Large Language Models, AI Painting, and more.

nlp awesome ai deep-learning tutorials multimodal courses-resource aigc llm midjourney prompt-engineering stable-diffusion chatgpt

Updated Mar 31, 2024

jina-ai / discoart

Star

🪩 Create Disco Diffusion artworks in one line

generative-art cross-modal diffusion prompts creative-ai creative-art multimodal clip-guided-diffusion dalle disco-diffusion midjourney imgen discodiffusion latent-diffusion stable-diffusion

Updated May 16, 2023
Python

rom1504 / img2dataset

Star

Easily turn large sets of image urls to an image dataset. Can download, resize and package 100M urls in 20h on one machine.

image big-data deep-learning dataset image-dataset download-images multimodal

Updated Aug 7, 2024
Python

TEN-framework / TEN-Agent

Star

TEN Agent is a conversational AI powered by the TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatible with popular workflow platforms like Dify and Coze.

Updated Dec 23, 2024
Python

Improve this page

Add a description, image, and links to the multimodal topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the multimodal topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

multimodal

Here are 919 public repositories matching this topic...

Mintplex-Labs / anything-llm

jina-ai / serve

haotian-liu / LLaVA

microsoft / unilm

NVIDIA / NeMo

mediar-ai / screenpipe

rerun-io / rerun

bentoml / BentoML

enricoros / big-AGI

facebookresearch / mmf

SkalskiP / courses

swyxio / ai-notes

modelscope / ms-swift

kyegomez / tree-of-thoughts

livekit / agents

IDEA-CCNL / Fengshenbang-LM

luban-agi / Awesome-AIGC-Tutorials

jina-ai / discoart

rom1504 / img2dataset

TEN-framework / TEN-Agent

Improve this page

Add this topic to your repo