mesnico
  • Pisa, Italy

Highlights

  • Pro


Official repository of the paper "Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation"

Python 13 Updated Jan 14, 2025

(2024CVPR) MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding

Python 268 28 Updated Jul 19, 2024

[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference

Python 266 8 Updated Jan 8, 2025

Mamba SSM architecture

Python 13,807 1,188 Updated Jan 18, 2025

[ECCV'24] Official Implementation of Autoregressive Visual Entity Recognizer.

Python 12 Updated Mar 2, 2024

[CBMI2024 Best Paper] Official repository of the paper "Is CLIP the main roadblock for fine-grained open-world perception?".

Jupyter Notebook 22 Updated Jan 9, 2025

Hydra is a framework for elegantly configuring complex applications

Python 8,964 651 Updated Jan 16, 2025

PyTorch code and models for V-JEPA self-supervised learning from video.

Python 2,745 260 Updated Aug 9, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 23,114 2,281 Updated Dec 27, 2024

✨✨Latest Advances on Multimodal Large Language Models

13,573 867 Updated Jan 17, 2025

[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model

Python 314 17 Updated Nov 4, 2024

Official PyTorch implementation of the paper "TMR: Text-to-Motion Retrieval Using Contrastive 3D Human Motion Synthesis" ICCV 2023

Python 221 19 Updated Dec 11, 2023

[CVPR2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding."

Python 49 3 Updated Sep 23, 2024

Showing how to use CLIP-ViP to do video search

Python 13 2 Updated Nov 16, 2023

Scalable and user friendly neural 🧠 forecasting algorithms.

Python 3,248 376 Updated Jan 13, 2025

WildCapture This repository contains the code and dataset used in the paper titled "Leveraging Visual Attention for out-of-distribution Detection" published at ICCV 2023, Paris Out Of Distribution …

Python 3 Updated Oct 12, 2023

the AI-native open-source embedding database

Rust 17,052 1,412 Updated Jan 19, 2025

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

Python 6,748 1,226 Updated Nov 26, 2024

video anomaly detection

Python 76 11 Updated Sep 21, 2022

[NeurIPS 2023] MotionGPT: Human Motion as a Foreign Language, a unified motion-language generation model using LLMs

Python 1,559 95 Updated Apr 3, 2024

DSPy: The framework for programming—not prompting—language models

Python 21,151 1,596 Updated Jan 17, 2025

An Evaluation Framework for Temporal Information Extraction Systems

Python 17 1 Updated Dec 13, 2024

An Image/Text Retrieval Test Collection to Support Multimedia Content Creation

Jupyter Notebook 20 2 Updated Oct 21, 2023

[ECCV2022] A PyTorch implementation for TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval

Python 75 9 Updated Nov 29, 2022

Multi-modality pre-training

Python 479 38 Updated May 8, 2024

Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV'21]

Python 355 43 Updated May 19, 2022

Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).

1,166 57 Updated Jun 28, 2024

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Python 3,229 242 Updated Mar 5, 2024

Jupyter Notebook 31 2 Updated Sep 16, 2022

[TPAMI2024] Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset

Python 273 16 Updated Dec 25, 2024