明文视界's AI Site

AI is advancing rapidly. The following are excellent projects, applications, and resources collected and organized since 2024-11-23.

Entries added later are marked with the date they were added:

Official project websites

CreateAI 2025-01-14

MiniMax - Co-creating intelligence with users 2025-01-14

MiniPerplx 2025-01-14

Project IDX 2025-01-14

Dify Marketplace 2025-01-14

即创 - One-stop intelligent creative production and management platform 2025-01-14

通义万相 (Tongyi Wanxiang) - AI creative painting, AI art, artificial intelligence - Alibaba Cloud 2025-01-14

Qwen 2025-01-14

GitHub projects

DrewThomasson/ebook2audiobook: Convert ebooks to audiobooks with chapters and metadata using dynamic AI models and voice cloning. Supports 1,107+ languages! 2025-01-14

Snowfallingplum/SHMT: [NeurIPS 2024] SHMT: Self-supervised Hierarchical Makeup Transfer via Latent Diffusion Models 2025-01-14

xyfJASON/ctrlora: Codebase for "CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation" 2025-01-14

VITA-MLLM/VITA: ✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction 2025-01-14

NJU-PCALab/STAR: STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution 2025-01-14

wileewang/TransPixar 2025-01-14

Stability-AI/stable-point-aware-3d 2025-01-14

SagiPolaczek/NeuralSVG: Official implementation of NeuralSVG 2025-01-14

ali-vilab/TeaCache: Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model 2025-01-14

LituRout/RF-Inversion: Rectified Flow Inversion (RF-Inversion) 2025-01-14

fudan-generative-vision/hallo3: Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Diffusion Transformer Networks 2025-01-14

sczhou/Upscale-A-Video: [CVPR 2024] Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution 2025-01-14

somanchiu/ReSwapper: ReSwapper aims to reproduce the implementation of inswapper. This repository provides code for training, inference, and includes pretrained weights. 2025-01-14

wangzhiyaoo/SVFR: Official implementation of SVFR. 2025-01-14

Hugging Face

Ebook2audiobook V2.0 Beta - a Hugging Face Space by drewThomasson 2025-01-14

LatentSync - a Hugging Face Space by fffiloni 2025-01-14

Projects awaiting release

SeedVR 2025-01-14

FaceLift: Single Image to 3D Head with View Generation and GS-LRM 2025-01-14

Official project websites

Deepseek Artifacts - Experience the power of the world's best open source model. 2025-01-05

TypingMind — LLM Frontend Chat UI for AI models 2025-01-05

STORM 2025-01-05

CAD Software for Hardware Design | Zoo 2025-01-05

Making computing simpler | OpenBayes (贝式计算) 2025-01-05

GitHub projects

Fanghua-Yu/SUPIR: SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild. Our new online demo is also released at suppixel.ai. 2025-01-05

declare-lab/TangoFlux: TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching 2025-01-05

n8n-io/n8n: Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations. 2025-01-05

FoundationVision/Infinity: Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis 2025-01-05

supermemoryai/supermemory: Build your own second brain with supermemory. It's a ChatGPT for your bookmarks. Import tweets or save websites and content using the chrome extension. 2025-01-05

jianzongwu/DiffSensei: Implementation of "DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation" 2025-01-05

bytedance/LatentSync: Taming Stable Diffusion for Lip Sync! 2025-01-05

DAMO-NLP-SG/multimodal_textbook: The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining" 2025-01-05

DAMO-NLP-SG/VideoRefer: The code for "VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM" 2025-01-05

hustvl/LightningDiT: [arXiv'25] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models 2025-01-05

Hugging Face

TangoFlux - a Hugging Face Space by declare-lab 2025-01-05

Kokoro TTS - a Hugging Face Space by hexgrad 2025-01-05

AniPortrait Official - a Hugging Face Space by ZJYang 2025-01-05

Projects awaiting release

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control 2025-01-05

CodeElo 2025-01-05

Official project websites

Lovable 2025-01-02

Monica - ChatGPT AI Assistant | GPT-4o, Claude 3.5, Gemini 1.5 2025-01-02

Voicenotes: Transcribe notes, meetings & ask AI 2025-01-02

YouMind - AI Creation System 2025-01-02

GitHub projects

IDEA-Research/GroundingDINO: [ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection" 2025-01-02

Yuan-ManX/ai-game-devtools: Here we will keep track of the latest AI Game Development Tools, including LLM, Agent, Code, Writer, Image, Texture, Shader, 3D Model, Animation, Video, Audio, Music, Singing Voice and Analytics. 🔥 2025-01-02

yandex-research/switti: The code and models for the paper: Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis 2025-01-02

FreedomIntelligence/HuatuoGPT-o1: Medical o1, Towards medical complex reasoning with LLMs 2025-01-02

vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs 2025-01-02

sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models. 2025-01-02

modelscope/DiffSynth-Studio: Enjoy the magic of Diffusion models! 2025-01-02

cline/cline: Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way. 2025-01-02

OpenDriveLab/AgiBot-World: World's First Large-scale High-quality Robotic Manipulation Benchmark 2025-01-02

huggingface/smolagents: 🤗 smolagents: a barebones library for agents. Agents write python code to call tools and orchestrate other agents. 2025-01-02

TMElyralab/MusePose: MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation 2025-01-02

SpatialVision/Orient-Anything 2025-01-02

ixarchakos/try-off-anyone: Official repository of "TryOffAnyone: Tiled Cloth Generation from a Dressed Person" 2025-01-02

Hugging Face

MMAudio — generating synchronized audio from video/text - a Hugging Face Space by hkchengrex 2025-01-02

Anychat - a Hugging Face Space by akhaliq 2025-01-02

FacePoke - a Hugging Face Space by jbilcke-hf 2025-01-02

AI Comic Factory - a Hugging Face Space by jbilcke-hf 2025-01-02

Switti - a Hugging Face Space by dbaranchuk 2025-01-02

Dokdo Multimodal - a Hugging Face Space by ginipick 2025-01-02

Dokdo - a Hugging Face Space by ginigen 2025-01-02

Projects awaiting release

Feat2GS 2025-01-02

GenHMR: Generative Human Mesh Recovery 2025-01-02

1.58-bit FLUX 2025-01-02

HSfM 2025-01-02

PERSE: Personalized 3D Generative Avatars from A Single Portrait 2025-01-02

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control 2025-01-02

Official project websites

Project Odyssey 2024-12-30

Run ComfyUI workflows online and deploy APIs in one click - ComfyOnline 2024-12-30

Zhipu AI (智谱AI) Open Platform 2024-12-30

Replit – Build apps and sites with AI 2024-12-30

AIGCPanel | Open-source AI digital human system 2024-12-30

StepFun (阶跃星辰) Open Platform 2024-12-30

Fireworks - Fastest Inference for Generative AI 2024-12-30

Baichuan large model - Gathering the world's knowledge for brilliant writing - Baichuan AI (百川智能) 2024-12-30

DomoAI | AI Art Generator & Video to Animation Converter 2024-12-30

CreateAI 2024-12-30

Magnific AI — The magic image Upscaler & Enhancer 2024-12-30

Odyssey 2024-12-30

Nexa AI | Enterprise-Grade On-Device AI for Every Device 2024-12-30

Humane Ai Pin | See the World, Not Your Screen. | Humane 2024-12-30

书生 2024-12-30

Taipy — Build Python Data & BI web applications 2024-12-30

GitHub projects

facebookresearch/blt: Code for BLT research paper 2024-12-30

VideoVerses/VideoVAEPlus 2024-12-30

TencentARC/DI-PCG: Code release of our paper "DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation". 2024-12-30

QwenLM/Qwen2-VL: Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud. 2024-12-30

web-infra-dev/midscene: An AI-powered automation SDK can control the page, perform assertions, and extract data in JSON format using natural language. 2024-12-30

hpcaitech/Open-Sora: Open-Sora: Democratizing Efficient Video Production for All 2024-12-30

osanseviero/geminiCoder: Create apps with Gemini 2024-12-30

IamCreateAI/Ruyi-Models 2024-12-30

rasbt/LLMs-from-scratch: Implement a ChatGPT-like LLM in PyTorch from scratch, step by step 2024-12-30

getmaxun/maxun: 🔥 Open-source no-code web data extraction platform. Turn websites to APIs and spreadsheets with no-code robots in minutes! [In Beta] 2024-12-30

SakanaAI/asal: Automating the Search for Artificial Life with Foundation Models! 2024-12-30

fallenshock/FlowEdit: Official implementation of the paper: "FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models" 2024-12-30

mingyuan-zhang/LMM: Large Motion Model for Unified Multi-Modal Motion Generation 2024-12-30

TencentARC/StereoCrafter: A framework to convert any 2D videos to immersive stereoscopic 3D 2024-12-30

THUDM/CogAgent: An open-sourced end-to-end VLM-based GUI Agent 2024-12-30

AriaUI/Aria-UI: Aria-UI: Visual Grounding for GUI Instructions 2024-12-30

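modstart-lib/aigcpanel: AigcPanel is a simple, easy-to-use, all-in-one AI digital human system. It supports video synthesis, voice synthesis, and voice cloning, and simplifies local model management with one-click import and use of AI models. 2024-12-30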

krystalan/DRT-o1: DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought 2024-12-30

zsyOAOA/InvSR: Arbitrary-steps Image Super-resolution via Diffusion Inversion 2024-12-30

livekit/agents: Build real-time multimodal AI applications 🤖🎙️📹 2024-12-30

modelscope/ClearerVoice-Studio: An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc. 2024-12-30

baaivision/See3D: You See it, You Got it: Learning 3D Creation on Pose-Free Videos at Scale 2024-12-30

Nutlope/picMenu: Visualize menus in seconds with AI 2024-12-30

Avaiga/taipy: Turns Data and AI algorithms into production-ready web applications in no time. 2024-12-30

Hugging Face

QVQ 72B Preview - a Hugging Face Space by Qwen 2024-12-30

LuminaBrush - a Hugging Face Space by lllyasviel 2024-12-30

InvSR - a Hugging Face Space by OAOA 2024-12-30

ClearerVoice-Studio (Speech Enhancement, Separation and Extraction) - a Hugging Face Space by alibabasglab 2024-12-30

Projects awaiting release

Lifting Motion to the 3D World via 2D Diffusion 2024-12-30

Synthesizing Moving People with 3D Control 2024-12-30

pkulwj1994/diff_instruct_pp: We introduce Diff-Instruct++, a novel approach for human preference alignment of 1-step text-to-image generation. 2024-12-30

MegaSaM 2024-12-30

Sketch2Sound 2024-12-30

INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations 2024-12-30

From Slow Bidirectional to Fast Causal Video Generators 2024-12-30

Official project websites

Zenodo 2024-12-20

Whisk 2024-12-20

labs.google/fx 2024-12-20

无问芯穹 one-stop AI platform 2024-12-20

VideoLingo - AI Subtitles Translation 2024-12-20

GitHub projects

RedAIGC/Flux-version-LayerDiffuse 2024-12-20

microsoft/markitdown: Python tool for converting files and office documents to Markdown. 2024-12-20

franciszzj/Leffa: Learning Flow Fields in Attention for Controllable Person Image Generation 2024-12-20

wzhouxiff/ObjCtrl-2.5D: ObjCtrl-2.5D 2024-12-20

ali-vilab/FreeScale: Code for FreeScale, a tuning-free method for higher-resolution visual generation 2024-12-20

tumurzakov/AnimateDiff: AnimationDiff with train 2024-12-20

hkchengrex/MMAudio: [arXiv 2024] Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis 2024-12-20

TencentARC/BrushEdit: The official implementation of paper "BrushEdit: All-In-One Image Inpainting and Editing" 2024-12-20

TencentARC/ColorFlow: The official implementation of paper "ColorFlow: Retrieval-Augmented Image Sequence Colorization" 2024-12-20

IamCreateAI/Ruyi-Models 2024-12-20

Genesis-Embodied-AI/Genesis: A generative world for general-purpose robotics & embodied AI learning. 2024-12-20

Kedreamix/Linly-Dubbing: Intelligent multilingual AI video dubbing and translation tool - Linly-Dubbing - "AI-empowered, language without borders" 2024-12-20

genmoai/mochi: The best OSS video generation models 2024-12-20

guoyww/AnimateDiff: Official implementation of AnimateDiff. 2024-12-20

Hugging Face

BrushEdit - a Hugging Face Space by TencentARC 2024-12-20

TRELLIS - a Hugging Face Space by JeffreyXiang 2024-12-20

Projects awaiting release

Motion Prompting: Controlling Video Generation with Motion Trajectories 2024-12-20

snap-research.github.io/wonderland/ 2024-12-20

X-Portrait 2: Highly Expressive Portrait Animation 2024-12-20

Official project websites

New Chat | glhf.chat 2024-12-15

edify-3d Model by Shutterstock | NVIDIA NIM 2024-12-15

Doubao (豆包) MarsCode - Workbench 2024-12-15

Devin 2024-12-15

DeepSeek - Exploring uncharted territory 2024-12-15

Sora 2024-12-15

DeepLearning.AI - Learning Platform 2024-12-15

D5 Render official site | Real-time ray tracing rendering technology, reshaping the 3D creation workflow 2024-12-15

PromptPerfect - AI Prompt Generator and Optimizer 2024-12-15

Learn Prompting: Your Guide to Communicating with AI 2024-12-15

GitHub projects

hacksider/Deep-Live-Cam: real time face swap and one-click video deepfake with only a single image 2024-12-15

datawhalechina/llm-cookbook: An introductory LLM tutorial for developers; the Chinese edition of Andrew Ng's large language model course series 2024-12-15

f/awesome-chatgpt-prompts: This repo includes ChatGPT prompt curation to use ChatGPT better. 2024-12-15

Stability-AI/stable-audio-tools: Generative models for conditional audio generation 2024-12-15

isarandi/nlf: [NeurIPS 2024] Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation 2024-12-15

Stability-AI/stable-fast-3d: SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement 2024-12-15

lihxxx/DisPose: This repository is the official implementation of DisPose 2024-12-15

fkryan/gazelle 2024-12-15

tdrussell/diffusion-pipe: A pipeline parallel training script for diffusion models. 2024-12-15

openai/openai-cookbook: Examples and guides for using the OpenAI API 2024-12-15

promptslab/Awesome-Prompt-Engineering: This repository contains hand-curated resources for Prompt Engineering with a focus on Generative Pre-trained Transformer (GPT), ChatGPT, PaLM, etc. 2024-12-15

thunlp/Delta-CoMe: Delta-CoMe can achieve near-lossless 1-bit compression and has been accepted by NeurIPS 2024 2024-12-15

Hugging Face

FlowEdit - a Hugging Face Space by fallenshock 2024-12-15

Projects awaiting release

Project Astra - Google DeepMind 2024-12-15

Project Mariner - Google DeepMind 2024-12-15

Jules (Confidential) 2024-12-15

MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance 2024-12-15

SwiftEdit 2024-12-15

Michael Fischer 2024-12-15

Using Diffusion Priors for Video Amodal Segmentation 2024-12-15

Official project websites

Fish Audio: Free Generative AI Text To Speech & Voice Cloning 2024-12-09

Generative Foundation Model - Amazon Nova - AWS 2024-12-09

RunComfy: Top ComfyUI Platform - Fast & Easy, No Setup 2024-12-09

Prompt Engineering Guide (Chinese edition) | Prompt Engineering Guide 2024-12-09

Prompt Engineering Guide | Prompt Engineering Guide 2024-12-09

Hailuo AI Audio: Create lifelike speech 2024-12-09

GitHub projects

FunAudioLLM/CosyVoice: Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability. 2024-12-09

FunAudioLLM/SenseVoice: Multilingual Voice Understanding Model 2024-12-09

modelscope/FunASR: A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc. 2024-12-09

yformer/EfficientTAM: Efficient Track Anything 2024-12-09

jingyaogong/minimind: [Large model] Train a small 26M-parameter GPT completely from scratch in 3 hours; inference and training run on a personal GPU! 2024-12-09

kijai/ComfyUI-HunyuanVideoWrapper 2024-12-09

jianchang512/clone-voice: A sound cloning tool with a web interface, using your voice or any sound to record audio 2024-12-09

dair-ai/Prompt-Engineering-Guide: 🐙 Guides, papers, lecture, notebooks and resources for prompt engineering 2024-12-09

memoavatar/memo: Memory-Guided Diffusion for Expressive Talking Video Generation 2024-12-09

1jsingh/negtome: Official Implementation for paper: Negative Token Merging: Image-based Adversarial Feature Guidance 2024-12-09

microsoft/TRELLIS: Official repo for paper "Structured 3D Latents for Scalable and Versatile 3D Generation". 2024-12-09

Francis-Rings/StableAnimator: We present StableAnimator, the first end-to-end ID-preserving video diffusion framework, which synthesizes high-quality videos without any post-processing, conditioned on a reference image and a sequence of poses. 2024-12-09

Hugging Face

CosyVoice-300M · Studio (创空间) 2024-12-09

ChatTTS Speaker - a Hugging Face Space by taa 2024-12-09

Flux Fill Outpainting - a Hugging Face Space by multimodalart 2024-12-09

Flux.1-dev Upscaler - a Hugging Face Space by jasperai 2024-12-09

Flux.1-dev Upscaler - a Hugging Face Space by Nymbo 2024-12-09

Projects awaiting release

Muse 2024-12-09

Introducing Veo and Imagen 3 on Vertex AI | Google Cloud Blog 2024-12-09

FLOAT 2024-12-09

Genie 2: A large-scale foundation world model - Google DeepMind 2024-12-09

SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters 2024-12-09

Digital Life Project 2024-12-09

I2VControl: Disentangled and Unified Video Motion Synthesis Control 2024-12-09

DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction 2024-12-09

fugatto.github.io 2024-12-09

CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models 2024-12-09

SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance 2024-12-09

vision-xl 2024-12-09

Official project websites

语鲸 2024-12-03

深言达意 - Find words and sentences 2024-12-03

爱校对 official site - Free and efficient typo checking tool 2024-12-03

Learn About 2024-12-03

World Labs 2024-12-03

通义 tongyi.ai - Your all-round AI assistant - 通义千问 (Tongyi Qianwen) 2024-12-03

天工AI (Tiangong AI) - Deeper search, richer reading 2024-12-03

iFlytek Spark (讯飞星火) large model - AI large language model - Spark large model - iFlytek 2024-12-03

文心一言 (ERNIE Bot) 2024-12-03

Home • Hume AI 2024-12-03

Cohere | The leading AI platform for enterprise 2024-12-03

Tencent Hunyuan text-to-video 2024-12-03

PixelDance - PixelDance AI - Leading AI video generation platform 2024-12-03

GitHub projects

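hiroi-sora/Umi-OCR: OCR software, free, open source, and offline. Supports screenshots and batch image import, PDF document recognition, excluding watermarks and headers/footers, and scanning/generating QR codes. Built-in multilingual libraries. 2024-12-03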

Significant-Gravitas/AutoGPT: AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters. 2024-12-03

OpenBMB/ChatDev: Create Customized Software using Natural Language Idea (through LLM-powered Multi-Agent Collaboration) 2024-12-03

prs-eth/RollingDepth: Video Depth without Video Models 2024-12-03

Tencent/HunyuanVideo 2024-12-03

Hugging Face

TryOffDiff - a Hugging Face Space by rizavelioglu 2024-12-03

Projects awaiting release

Freditor 2024-12-03

mayuelala/FollowYourClick: [arXiv 2024] Follow-Your-Click: This repo is the official implementation of "Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts" 2024-12-03

Project websites

Search and Q&A

纳米搜索 2024-12-02

书生·浦语 (InternLM) 2024-12-02

Doubao (豆包) - AI assistant from Douyin

Google AI Studio

All-in-one (video, voice, drawing, etc.)

即梦AI (Jimeng AI) - One-stop AI creation platform

Kling AI (可灵 AI) - Next-generation AI creative productivity platform

Luma Dream Machine | AI Video Generator

KREA AI - AIGC collection, styles, free image generation available

Home - Leonardo.Ai

AI Test Kitchen

Le Chat - Mistral AI

WHEE - High-quality AI asset generator

Video

VidAU Creative Center 2024-12-02

kaze.ai - AI-powered Free Online Removing Watermark and Logos Tool 2024-11-27

Vidu - Make imagination happen

Hailuo AI Video Generator - Reimagine Video Creation

Storytelling Transformed | LTX Studio

Noisee AI - Generate music videos from music

Genmo. Create videos and images with AI.

Home | PixVerse

Wonder Dynamics

HeyGen - AI Spokesperson Video Creator

DomoAI: video to video, video to animation and more

Warpvideo AI: Change Video Style with AI

Hedra - Digital humans

AI Hug - Free online AI hug video generator

MOKI - I make short films with AI

Meshcapade | Edit character motion

BoomCut (爆剪辑) - AI content creation product and service platform from 小影科技

Painting and design

Create stunning visuals in seconds with AI.

超能画布 - Home

Design - Playground

Skybox AI - 360° panoramas

Magic Studio: Create beautiful images with AI

Remove Background from Image for Free – remove.bg

Craiyon, formerly DALL-E mini

Create - Artbreeder

NightCafe Creator

Projects - Recraft

Blendbox.ai - Multi-image compositing

Ideogram Canvas

Logo-creator.io – Generate a logo

Online background removal tool - Remove image backgrounds | remove.bg – remove.bg (Chinese site)

3D

Meshy - Free 3D Models Generated from Images and Text

Immersity AI | Convert Image and Video to 3D

Tripo AI - Generate 3D models from text or images for free

Voice and music

Free Text to Speech & AI Voice Generator | ElevenLabs

Udio AI Music Generator - Make Original Tracks in Seconds

Stable Audio - Generate

Free online text-to-speech - TTS-Online | Multiple voices and anime-style voices

Soundboard - TUNA - Download Unlimited Free Meme Sounds

Beat generator - Beat Blender (music)

网易天音 (NetEase Tianyin) - One-stop AI music creation tool - Official site

Prompts

promptoMANIA: Art prompt generator

PromptHero - Prompt collection

Code

Codeium · Free AI Code Completion & Chat 2024-12-02

MarsCode - AI IDE

ScriptEcho | AI-generated production-grade code

Other

NotebookLM | Note Taking & Research Assistant Powered by AI

扣子 (Coze) - AI agent development platform

Illuminate | Learn Your Way

LlamaOCR.com – Document to markdown

Neo AI engineer

Projects awaiting release

AnchorCrafter 2024-12-02

Generative Omnimatte: Learning to Decompose Video into Layers 2024-12-02

lehduong/OneDiffusion 2024-12-02

LipDub AI | The most realistic AI lip sync and video translation 2024-12-02

MyTimeMachine: Personalized Facial Age Transformation 2024-12-02

Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors 2024-12-02

MultiFoley 2024-12-02

Sonic: Shifting Focus to Global Audio Perception in Audio-driven Portrait Animation 2024-12-02

lewandofskee/MobileMamba: Official implementation of "MobileMamba: Lightweight Multi-Receptive Visual Mamba Network." 2024-12-02

Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation 2024-11-27

Fugatto, World’s Most Flexible Sound Machine, Debuts | NVIDIA Blog 2024-11-27

Fashion-VDM: Video Diffusion Model for Virtual Try-On

Inverse Painting: Reconstructing The Painting Process

Home | Showrunner

Video Ocean video model - Everyone is a director

PersonaTalk: Bring Attention to Your Persona in Visual Dubbing

MarDini: Masked Auto-Regressive Diffusion for Video Generation at Scale -- Meta AI Research

loopyavatar.github.io/?ref=aihub.cn

Google Vids: Online Video Creator and Editor | Google Workspace

URAvatar: Universal Relightable Gaussian Codec Avatars

DanceFusion: A Spatio-Temporal Skeleton Diffusion Transformer for Audio-Driven Dance Motion Reconstruction.

AnimateAnything

Models, resources, and workflows

Discovery | OpenArt

Shakker - Generative AI design tool with diverse models

Home · ModelScope (魔搭社区)

Comfy Workflows

FREE online image generator and model hosting site! | Tensor.Art

Civitai: The Home of Open-Source Generative AI

CodeWithGPU | A good algorithm is one that can be reproduced

LiblibAI (哩布哩布AI) - China's leading AI creation platform

ComfyUI workflows - Run online, fast, error-free

Cephalon Cloud (端脑云) - AIGC application platform

Related websites

Discover and download free videos - Pixabay

Danbooru: Anime Image Board

Discover the Best GPTs

AI工具集 | Official site of a collection of 700+ AI tools; a navigation directory of AI tools from China and abroad

Supertools | Best AI Tools Guide

AIGC导航 | 1500+ AIGC creation tools across all categories - Explore more possibilities!

Illustration community site [pixiv]

ArtStation - Explore

AIbase - Intelligently matching the AI products and websites best suited to you

Newsfeed - Sketchfab

AI Model & API Providers Analysis | Artificial Analysis

GGAC digital art platform

Weird Wonderful AI Art | ART of the future - now!

GitHub projects

Video

hmrishavbandy/FlipSketch: FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations 2024-12-02

KwaiVGI/LivePortrait: Bring portraits to life! 2024-12-02

C0untFloyd/roop-unleashed: Evolved Fork of roop with Web Server and lots of additions 2024-12-02

jdh-algo/JoyVASA 2024-12-02

PKU-YuanGroup/ConsisID: Identity-Preserving Text-to-Video Generation by Frequency Decomposition 2024-12-02

rhymes-ai/Allegro: Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input. 2024-12-02

k4yt3x/video2x: A machine learning-based lossless video super resolution framework. Est. Hack the Valley II, 2018. 2024-11-27

facefusion/facefusion: Industry leading face manipulation platform 2024-11-27

yangchris11/samurai: Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"

alibaba/Tora: The official repository for paper "Tora: Trajectory-oriented Diffusion Transformer for Video Generation"

aigc-apps/CogVideoX-Fun: 📹 A more flexible CogVideoX that can generate videos at any resolution and creates videos from images.

aigc-apps/EasyAnimate: 📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion

HVision-NKU/StoryDiffusion: Create Magic Story!

hpcaitech/Open-Sora: Open-Sora: Democratizing Efficient Video Production for All

Vision-CAIR/MiniGPT4-video

hkchengrex/Cutie: [CVPR 2024 Highlight] Putting the Object Back Into Video Object Segmentation

Picsart-AI-Research/StreamingT2V: StreamingT2V: Consistent, Dynamic, and Extendable Long Video Generation from Text

Tencent/MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance

jianchang512/pyvideotrans: Translate the video from one language to another and add dubbing, with support for API calls.

Hillobar/Rope: GUI-focused roop

sczhou/CodeFormer: [NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer

Huanshere/VideoLingo: Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team

jy0205/Pyramid-Flow: Code of Pyramidal Flow Matching for Efficient Video Generative Modeling

Vision-CAIR/LongVU

Doubiiu/ToonCrafter: [SIGGRAPH Asia 2024, Journal Track] ToonCrafter: Generative Cartoon Interpolation

VectorSpaceLab/Video-XL: 🔥🔥First-ever hour scale video understanding models

anliyuan/Ultralight-Digital-Human: An ultra-lightweight digital human model that can run in real time on mobile devices

antgroup/echomimic_v2: EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Zejun-Yang/AniPortrait: AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation

fudan-generative-vision/hallo2: Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

antgroup/echomimic: EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning

LordLiang/DrawingSpinUp: (SIGGRAPH Asia 2024) This is the official PyTorch implementation of SIGGRAPH Asia 2024 paper: DrawingSpinUp: 3D Animation from Single Character Drawings

HelloVision/HelloMeme: The official HelloMeme GitHub site

Kmcode1/SG-I2V: This is the official implementation of SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation.

facebookresearch/sapiens: High-resolution models for human tasks.

AlonzoLeeeooo/StableV2V: The official implementation of the paper titled "StableV2V: Stablizing Shape Consistency in Video-to-Video Editing".

genmoai/mochi: The best OSS video generation models

THUDM/CogVideo: text and image to video generation: CogVideoX (2024) and CogVideo (ICLR 2023)

CyberAgentAILab/TANGO: Official implementation of the paper "TANGO: Co-Speech Gesture Video Reenactment with Hierarchical Audio-Motion Embedding and Diffusion Interpolation"

IDEA-Research/MotionCLR: [Arxiv 2024] MotionCLR: Motion Generation and Training-free Editing via Understanding Attention Mechanisms

Ji4chenLi/t2v-turbo: Code repository for T2V-Turbo and T2V-Turbo-v2

Lightricks/LTX-Video: Official repository for LTX-Video

ComfyUI

sipie800/ComfyUI-PuLID-Flux-Enhanced 2024-12-02

EvilBT/ComfyUI_SLK_joy_caption_two: ComfyUI Node 2024-11-27

huchenlei/ComfyUI-layerdiffuse: Layer Diffuse custom nodes 2024-11-27

kijai/ComfyUI-IC-Light: Using IC-Light models in ComfyUI 2024-11-27

kijai/ComfyUI-CogVideoXWrapper

Lightricks/ComfyUI-LTXVideo: LTX-Video Support for ComfyUI

smthemex/ComfyUI_EchoMimic: You can use EchoMimic in ComfyUI

AIFSH/ACE-ComfyUI

logtd/ComfyUI-MochiEdit: ComfyUI nodes to edit videos using Genmo Mochi

kijai/ComfyUI-SUPIR: SUPIR upscaling wrapper for ComfyUI

HelloVision/ComfyUI_HelloMeme: Official comfyui repository of Hellomeme

alimama-creative/SDXL_EcomID_ComfyUI

AIGODLIKE/AIGODLIKE-ComfyUI-Studio: Improve the interactive experience of using ComfyUI, such as making the loading of ComfyUI models more intuitive and making it easier to create model thumbnails

ssitu/ComfyUI_UltimateSDUpscale: ComfyUI nodes for the Ultimate Stable Diffusion Upscale script by Coyote-A.

Gourieff/comfyui-reactor-node: Fast and Simple Face Swap Extension Node for ComfyUI

lldacing/ComfyUI_BiRefNet_ll

AIGODLIKE/ComfyUI-BlenderAI-node: Used for AI model generation, next-generation Blender rendering engine, texture enhancement&generation (based on ComfyUI)

smthemex/ComfyUI_Hallo2: ComfyUI_Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation

kijai/ComfyUI-Florence2: Inference Microsoft Florence2 VLM

CY-CHENYUE/ComfyUI-Molmo: Generate detailed image descriptions and analysis using Molmo models in ComfyUI.

yolain/ComfyUI-Easy-Use: In order to make it easier to use the ComfyUI, I have made some optimizations and integrations to some commonly used nodes.

sipherxyz/comfyui-art-venture

GiusTex/ComfyUI-DiffusersImageOutpaint: Diffusers Image Outpaint for ComfyUI

XLabs-AI/x-flux-comfyui

T8star1984/Comfyui-Aix-NodeMap: Comfyui's latest node organization and annotation, continuously updated, and supported by the Aix team

logtd/ComfyUI-Fluxtapoz: Nodes for image juxtaposition for Flux in ComfyUI

WASasquatch/was-node-suite-comfyui: An extensive node suite for ComfyUI with over 210 new nodes

cubiq/ComfyUI_IPAdapter_plus

cubiq/ComfyUI_InstantID

ZHO-ZHO-ZHO/ComfyUI-InstantID: Unofficial implementation of InstantID for ComfyUI

kijai/ComfyUI-MochiWrapper

kijai/ComfyUI-LivePortraitKJ: ComfyUI nodes for LivePortrait

PowerHouseMan/ComfyUI-AdvancedLivePortrait

TemryL/ComfyUI-IDM-VTON: ComfyUI adaptation of IDM-VTON for virtual try-on.

city96/ComfyUI-GGUF: GGUF Quantization support for native ComfyUI models

FizzleDorf/ComfyUI_FizzNodes: Custom Nodes for Comfyui

balazik/ComfyUI-PuLID-Flux: PuLID-Flux ComfyUI implementation

kijai/ComfyUI-PyramidFlowWrapper

stavsap/comfyui-ollama

ltdrdata/ComfyUI-Manager: ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable various custom nodes of ComfyUI. Furthermore, this extension provides a hub feature and convenience functions to access a wide range of information within ComfyUI.

erosDiffusion/ComfyUI-enricos-nodes: Compositor Node experiments

StartHua/Comfyui_CXH_joy_caption: Recommended based on comfyui node pictures: Joy_caption + MiniCPMv2_6-prompt-generator + florence2

ZHO-ZHO-ZHO/ComfyUI-YoloWorld-EfficientSAM: Unofficial implementation of YOLO-World + EfficientSAM for ComfyUI

GreenLandisaLie/AuraSR-ComfyUI: ComfyUI implementation of AuraSR

Jonseed/ComfyUI-Detail-Daemon: A port of muerrilla's sd-webui-Detail-Daemon as a node for ComfyUI, to adjust sigmas that control detail.

taabata/ComfyCanvas: Canvas to use with ComfyUI

jtydhr88/ComfyUI-Hunyuan3D-1-wrapper: ComfyUI Hunyuan3D-1-wrapper is a custom node that allows you to run Tencent/Hunyuan3D-1 in ComfyUI as a wrapper.

smthemex/ComfyUI_Sapiens: Use Sapiens to get seg, normal, pose, depth, and mask outputs

1038lab/ComfyUI-RMBG: A ComfyUI node for removing image backgrounds using RMBG-2.0.

TTPlanetPig/Comfyui_Object_Migration: A study that aims to transfer a single concept by using the DiT model's self-attention capability

DoctorDiffusion/ComfyUI-BEN: Background Erase Network - Remove backgrounds from images within ComfyUI.

marduk191/ComfyUI-Fluxpromptenhancer: A Prompt Enhancer for flux.1 in ComfyUI

Lightricks/ComfyUI-LTXVideo: LTX-Video Support for ComfyUI

WebUI

open-webui/open-webui: User-friendly AI Interface (Supports Ollama, OpenAI API, ...)

continue-revolution/sd-webui-segment-anything: Segment Anything for Stable Diffusion WebUI

lllyasviel/stable-diffusion-webui-forge

aigc-apps/sd-webui-EasyPhoto: 📷 EasyPhoto | Your Smart AI Photo Generator.

LLM

THUDM/GLM-4-Voice: GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

oobabooga/text-generation-webui: A Gradio web UI for Large Language Models.

janhq/jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)

ollama/ollama: Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.

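binary-husky/gpt_academic: Provides a practical interactive interface for GPT/GLM and other LLMs, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-translation for Python, C++, and other projects; PDF/LaTeX paper translation and summarization; parallel queries to multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and more.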

SillyTavern/SillyTavern: LLM Frontend for Power Users.

mendableai/firecrawl: 🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

InternLM/InternLM: Official release of InternLM2.5 base and chat models. 1M context support

Training scripts

hiyouga/LLaMA-Factory: Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024) 2024-12-02

kohya-ss/sd-scripts

cocktailpeanut/fluxgym: Dead simple FLUX LoRA training UI with LOW VRAM support

kijai/ComfyUI-FluxTrainer

Releases · bmaltais/kohya_ss

Akegarasu/lora-scripts: LoRA & Dreambooth training scripts & GUI use kohya-ss's trainer, for diffusion model.

Nerogar/OneTrainer: OneTrainer is a one-stop solution for all your stable diffusion training needs.

Image design

chengyou-jia/ChatGen 2024-12-02

erwold/qwen2vl-flux 2024-11-27

Yuanshi9815/OminiControl: A minimal and universal controller for FLUX.1. 2024-11-27

lllyasviel/sd-forge-layerdiffuse: [WIP] Layer Diffusion for WebUI (via Forge) 2024-11-27

ali-vilab/ACE: All-round Creator and Editor

mit-han-lab/hart: HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

ZhengPeng7/BiRefNet: [CAAI AIR'24] Bilateral Reference for High-Resolution Dichotomous Image Segmentation

YangLing0818/IterComp: IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

xinsir6/ControlNetPlus: ControlNet++: All-in-one ControlNet for image generations and editing!

Kwai-Kolors/Kolors: Kolors Team

Xiaojiu-z/Stable-Hair: Stable-Hair: Real-World Hair Transfer via Diffusion Model

yisol/IDM-VTON: [ECCV2024] IDM-VTON : Improving Diffusion Models for Authentic Virtual Try-on in the Wild

bcmi/libcom: Image composition toolbox: everything you want to know about image composition or object insertion

PixArt-alpha/PixArt-alpha: PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

black-forest-labs/flux: Official inference repo for FLUX.1 models

Stability-AI/sd3.5

lllyasviel/Omost: Your image is almost there!

gligen/GLIGEN: Open-Set Grounded Text-to-Image Generation

Tencent/HunyuanDiT: Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

lllyasviel/IC-Light: More relighting!

tencent-ailab/IP-Adapter: The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.

piddnad/DDColor: [ICCV 2023] Official implementation of "DDColor: Towards Photo-Realistic Image Colorization via Dual Decoders"

cumulo-autumn/StreamDiffusion: StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation

ToTheBeginning/PuLID: [NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment

KDE/krita: Krita is a free and open source cross-platform application that offers an end-to-end solution for creating digital art files from scratch built on the KDE and Qt frameworks.

Acly/krita-ai-diffusion: Streamlined interface for generating images with AI in Krita. Inpaint and outpaint with optional text prompt, no tweaking required.

instantX-research/InstantID: InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥

jbilcke-hf/FacePoke: Select a portrait, click to move the head around (please use your own space / GPU!)

catcathh/UltraPixel: Implementation of UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks

Zeyi-Lin/HivisionIDPhotos: ⚡️HivisionIDPhotos: a lightweight and efficient AI ID photo tool.

VectorSpaceLab/OmniGen: OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340

shallowdream204/DreamClear: [NeurIPS 2024🔥] DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation

NVlabs/consistory

instantX-research/Regional-Prompting-FLUX: Training-free Regional Prompting for Diffusion Transformers 🔥

ali-vilab/In-Context-LoRA: Official repository of In-Context LoRA for Diffusion Transformers

mit-han-lab/nunchaku: SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

ChenyangSi/FreeU: FreeU: Free Lunch in Diffusion U-Net (CVPR2024 Oral)

magic-quill/MagicQuill: Official Implementations for Paper - MagicQuill: An Intelligent Interactive Image Editing System

Nutlope/logocreator: A free + OSS logo generator powered by Flux on Together AI

NVlabs/Sana: SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

JackAILab/ConsistentID: Customized ID Consistent for human

DepthAnything/Depth-Anything-V2: [NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

tryonlabs/FLUX.1-dev-LoRA-Outfit-Generator: FLUX.1-dev LoRA Outfit Generator can create an outfit by detailing the color, pattern, fit, style, material, and type.

Speech, music

netease-youdao/EmotiVoice: EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine

haidog-yaqub/EzAudio: High-quality Text-to-Audio Generation with Efficient Diffusion Transformer

2noise/ChatTTS: A generative speech model for daily dialogue.

BytedanceSpeech/seed-tts-eval

RVC-Project/Retrieval-based-Voice-Conversion-WebUI: Easily train a good VC model with voice data <= 10 mins!

yxlllc/DDSP-SVC: Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)

voicepaw/so-vits-svc-fork: so-vits-svc fork with realtime support, improved interface and more features.

RVC-Boss/GPT-SoVITS: 1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

SWivid/F5-TTS: Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

misya11p/amt-apc: AMT-APC: Automatic Piano Cover by Fine-Tuning an Automatic Music Transcription Model

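WEIFENG2333/AsrTools: ✨ AsrTools: Smart speech-to-text tool | Efficient batch processing | User-friendly interface | No GPU required | Supports SRT/TXT output | Turn your audio into accurate text in an instant!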

open-mmlab/Amphion: Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

fishaudio/fish-speech: Brand new TTS solution

3D

VAST-AI-Research/TripoSR 2024-11-27

microsoft/MoGe: MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision

HengyiWang/spann3r: 3D Reconstruction with Spatial Memory

Tencent/Hunyuan3D-1

wenqsun/DimensionX: DimensionX: Create Any 3D and 4D Scenes from a Single Image with Controllable Video Diffusion

Text processing

zyddnys/manga-image-translator: Translate manga/image. One-click translation of text in all kinds of images. https://cotrans.touhou.ai/

chidiwilliams/buzz: Buzz transcribes and translates audio offline on your personal computer. Powered by OpenAI's Whisper.

AgentEra/Agently-Daily-News-Collector: An open-source LLM based automatically daily news collecting workflow showcase powered by Agently AI application development framework.

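LC044/WeChatMsg: Extracts WeChat chat history and exports it to HTML, Word, and Excel documents for permanent storage; analyzes chat history to generate an annual chat report; and uses chat data to train a personal AI chat assistant.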

gabrielchua/open-notebooklm: Convert any PDF into a podcast episode!

getomni-ai/zerox: Zero shot pdf OCR with gpt-4o-mini

opendatalab/PDF-Extract-Kit: A Comprehensive Toolkit for High-Quality PDF Content Extraction

Nutlope/llama-ocr: Document to Markdown OCR library with Llama 3.2 vision

opendatalab/MinerU: A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON.

Other

showlab/ShowUI: Repository for ShowUI: One Vision-Language-Action Model for GUI Visual Agent 2024-12-02

turboderp/exllamav2: A fast inference library for running LLMs locally on modern consumer-class GPUs 2024-12-02

instructor-ai/instructor: structured outputs for llms 2024-12-02

Comprehensive Guide to Prompting Techniques - Instructor 2024-12-02

huggingface/transformers.js: State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server! 2024-12-02

Ucas-HaoranWei/GOT-OCR2.0: Official code implementation of General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

deepseek-ai/DeepSeek-VL: DeepSeek-VL: Towards Real-World Vision-Language Understanding

dynobo/normcap: OCR powered screen-capture tool to capture information instead of images

modelscope/DiffSynth-Studio: Enjoy the magic of Diffusion models!

abi/screenshot-to-code: Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)

stackblitz/bolt.new: Prompt, run, edit, and deploy full-stack web applications

lean-dojo/LeanCopilot: LLMs as Copilots for Theorem Proving in Lean

geekan/MetaGPT: 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

princeton-nlp/SWE-agent: [NeurIPS 2024] SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges.

OpenCodeInterpreter/OpenCodeInterpreter: OpenCodeInterpreter is a suite of open-source code generation systems aimed at bridging the gap between large language models and sophisticated proprietary systems like the GPT-4 Code Interpreter. It significantly enhances code generation capabilities by integrating execution and iterative refinement functionalities.

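Ikaros-521/AI-Vtuber: AI Vtuber is a virtual streamer [Live2D/UE/xuniren] driven by [ChatterBot/ChatGPT/claude/langchain/chatglm/text-gen-webui/Wenda/Qwen/kimi/ollama] that can interact with viewers in real time during live streams on [Bilibili/Douyin/Kuaishou/WeChat Channels/Pinduoduo/Douyu/YouTube/twitch/TikTok] or chat directly on a local machine. It uses TTS technology [edge-tts/VITS/elevenlabs/bark/bert-vits2/Ruisheng] to generate spoken replies, with optional voice changing via [so-vits-svc/DDSP-SVC], and can coordinate with SD image generation through commands.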

3b1b/manim: Animation engine for explanatory math videos

ManimCommunity/manim: A community-maintained Python framework for creating mathematical animations.

KindXiaoming/pykan: Kolmogorov Arnold Networks

PeterH0323/Streamer-Sales: Streamer-Sales (销冠): a sales-streamer LLM that explains a product based on its given features and stimulates users' desire to buy

FujiwaraChoki/MoneyPrinter: Automate Creation of YouTube Shorts using MoviePy.

princeton-nlp/SWE-agent: SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4. It solves 12.29% of bugs in the SWE-bench evaluation set (comparable to Devin) and takes just 1.5 minutes to run (7x faster than Devin).

harry0703/MoneyPrinterTurbo: Generate high-definition short videos with one click using AI LLM.

idootop/mi-gpt: 🏠 Connect the Xiao Ai speaker to ChatGPT and Doubao, turning it into your own personal voice assistant.

wan-h/awesome-digital-human-live2d: Awesome Digital Human

openai/swarm: Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.

meta-llama/llama-recipes: Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting a number of candid inference solutions such as HF TGI, VLLM for local or cloud deployment. Demo apps to showcase Meta Llama for WhatsApp & Messenger.

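HqWu-HITCS/Awesome-Chinese-LLM: A collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are cheaper to train, including base models, vertical-domain fine-tuning and applications, datasets, tutorials, and more.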

Hannibal046/Awesome-LLM: Awesome-LLM: a curated list of Large Language Model

excalidraw/excalidraw: Virtual whiteboard for sketching hand-drawn like diagrams

meltylabs/melty: Chat first code editor. To download the packaged app:

gpt-omni/mini-omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.

Hugging Face

Qwen2vl Flux Mini Demo - a Hugging Face Space by Djrango 2024-12-02

IC Light V2-Vary - a Hugging Face Space by lllyasviel 2024-12-02

IllusionDiffusion - a Hugging Face Space by AP123 2024-12-02

ReplaceAnything - a Hugging Face Space by modelscope 2024-12-02

QwQ-32B-Preview - a Hugging Face Space by Qwen 2024-12-02

OminiControl - a Hugging Face Space by Yuanshi 2024-11-27

ACE-Chat - a Hugging Face Space by scepter-studio

MoGe - a Hugging Face Space by Ruicheng

EzAudio - a Hugging Face Space by OpenSound

NaturalSpeech3 FACodec - a Hugging Face Space by amphion

IDM VTON - a Hugging Face Space by yisol

AnimateDiff-Lightning - a Hugging Face Space by ByteDance

Omost - a Hugging Face Space by lllyasviel

CLIP Interrogator - a Hugging Face Space by pharmapsychotic

Pyramid Flow - a Hugging Face Space by Pyramid-Flow

Joy Caption Alpha Two - a Hugging Face Space by fancyfeast

IC Light V2 - a Hugging Face Space by lllyasviel

MaskGCT TTS Demo - a Hugging Face Space by amphion

OmniGen - a Hugging Face Space by Shitao

MotionCLR - a Hugging Face Space by EvanTHU

SeedEdit-APP-V1.0 - a Hugging Face Space by ByteDance

Framer - a Hugging Face Space by wwen1997

BRIA RMBG 2.0 - a Hugging Face Space by briaai

MinerU - a Hugging Face Space by opendatalab

Qwen Turbo 1M Demo - a Hugging Face Space by Qwen

DimensionX - a Hugging Face Space by fffiloni

PhotoMaker V2 - a Hugging Face Space by TencentARC

OOTDiffusion - a Hugging Face Space by levihsu

moondream2 - a Hugging Face Space by vikhyatk

Documentation and resources

Train your own ControlNet with diffusers 🧨

Stable Diffusion QR Code 101

E-Hentai/Tags - Namuwiki

Prompt spell encyclopedia (魔咒百科词典)

So-VITS-SVC 4.1 all-in-one package: the complete guide

Stable Diffusion 3.5 Prompt Guide — Stability AI

A Student’s Guide to Writing with ChatGPT | OpenAI

richards199999/Thinking-Claude: Let your Claude able to think

hesamsheikh/ml-retreat: Machine Learning Journal for Intermediate to Advanced Topics.

Midjourney Documentation and User Guide

Archive (can be skipped)

Resources for GAN Artists

Disco Diffusion Portrait Study (by @enviraldesign) - Google Docs

alibaba/animate-anything: Fine-Grained Open Domain Image Animation with Motion Guidance

prophesier/diff-svc: Singing Voice Conversion via diffusion model

TencentARC/GFPGAN: GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.

guide to installing disco v5+ locally on windows

clip_interrogator.ipynb - Colaboratory

A Traveler’s Guide to the Latent Space

Coar’s Disco Diffusion Guide

Disco Diffusion Illustrated Settings

Ai generative art tools

AI painting keywords (artwork by group members)

Artist Studies by @remi_durant

CLIP Prompt Engineering for Generative Art - matthewmcateer.me

Dataset: LAION-400-MILLION OPEN DATASET | LAION