Stars
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
[ACM MM 2021] Visible Watermark Removal via Self-calibrated Localization and Background Refinement
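This project shares the technical principles behind large models along with hands-on experience (large model engineering and deploying large model applications)
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance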
A series of large language models trained from scratch by developers @01-ai
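A complete PyTorch codebase for image classification, covering training, prediction, TTA, model ensembling, model deployment, CNN feature extraction, classification with SVM or random forest, and model distillation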
This is the repository for paper "UniQA: Unified Vision-Language Pre-training of Quality and Aesthetics"
GPT4V-level open-source multi-modal model based on Llama3-8B
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Reading list for research topics in multimodal machine learning
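A collection of AGI learning materials (mainly covering LLM and AIGC), continuously updated
A repository of AIGC-interview/CV-interview/LLMs-interview questions and answers, which also includes new ideas, new questions, new resources, and new projects from work and research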
[ICLR 2024] SemiReward: A General Reward Model for Semi-supervised Learning
A Unified Semi-Supervised Learning Codebase (NeurIPS'22)
Incremental learning for fine-grained image retrieval via feature estimation
Multi-Modal Reasoning Graph for Scene-Text Based Fine-Grained Image Classification and Retrieval
[ACMMM 2023] Learning Style-Invariant Robust Representation for Generalizable Visual Instance Retrieval
ViewDiff generates high-quality, multi-view consistent images of a real-world 3D object in authentic surroundings. (CVPR2024).
LAVIS - A One-stop Library for Language-Vision Intelligence
A collection of resources on controllable generation with text-to-image diffusion models.
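Compares the pronunciation and shape of 6,700 commonly used Chinese characters and outputs lists of similar-sounding and similar-looking characters. # Similar characters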
Open-source and strong foundation image recognition models.