Stars
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Google Research
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
LAVIS - A One-stop Library for Language-Vision Intelligence
OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative-AI (AIGC), easy-to-use APIs, awesome model zoo, diffusion models, for text-to-image genera…
Open-source and strong foundation image recognition models.
Code for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".
VOLO: Vision Outlooker for Visual Recognition
Open-source, End-to-end, Vision-Language-Action model for GUI Agent & Computer Use.
[NeurIPS'23 Spotlight] "Mind2Web: Towards a Generalist Agent for the Web"
Official repository for the paper "High-Resolution Daytime Translation Without Domain Labels" (CVPR2020, Oral)
[NeurIPS 2020] This project provides a strong single-stage baseline for Long-Tailed Classification, Detection, and Instance Segmentation (LVIS). It is also a PyTorch implementation of the NeurIPS 2…
PyTorch implementation of MAML: https://arxiv.org/abs/1703.03400
PyTorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
[NeurIPS 2023 D&B] VidChapters-7M: Video Chapters at Scale
[ICCV 2023] Code for "Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement"
Code for the Paper: Antonino Furnari and Giovanni Maria Farinella. What Would You Expect? Anticipating Egocentric Actions with Rolling-Unrolling LSTMs and Modality Attention. International Conferen…
[ECCV 2022] A generalized long-tailed challenge that incorporates both the conventional class-wise imbalance and the overlooked attribute-wise imbalance within each class. The proposed IFL together…
[ICCV 2021 Oral + TPAMI] Just Ask: Learning to Answer Questions from Millions of Narrated Videos
BotSIM - a data-efficient end-to-end Bot SIMulation toolkit for evaluation, diagnosis, and improvement of commercial chatbots
Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models
Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning
FathomNet's out-of-sample detection challenge in association with FGVC 2023