- Meta AI
- New York, USA
- https://koustuvsinha.com
- @koustuvsinha
Highlights
- Pro
Stars
- ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs).
- 🔥 Aurora Series: a more efficient multimodal large language model series for video.
- 💭👀 precognition.nvim - uses virtual text and gutter signs to show available motions.
- Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
- A flexible and efficient codebase for training visually-conditioned language models (VLMs).
- Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision.
- Machine Learning Engineering Open Book.
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023).
- Cross-platform, fast, feature-rich, GPU-based terminal.
- A Neovim plugin for interactively running code with the Jupyter kernel; a fork of magma-nvim with improvements in image rendering, performance, and more.
- ICLR 2024 Spotlight: curation/training code, metadata, distribution, and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering.
- A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list).
- Utilities for decoding deep representations (like sentence embeddings) back to text.
- Open-Sora: Democratizing Efficient Video Production for All.
- EILeV: Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties.
- [EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding.
- 20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale.
- HT-Step is a large-scale article-grounding dataset of temporal step annotations on how-to videos.
- Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval [ICCV 2021].
- PyTorch code and models for V-JEPA self-supervised learning from video.
- [CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding, plus many more supported LMs such as MiniGPT-4, StableLM, and MOSS.
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering.
- Fast Differentiable Tensor Library in JavaScript and TypeScript with Bun + Flashlight.
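The "high-throughput and memory-efficient inference and serving engine for LLMs" entry above matches vLLM's tagline. Assuming it is vLLM, here is a minimal offline-generation sketch; the model name, prompt, and sampling settings are illustrative choices, not taken from this profile.

```python
# Minimal offline generation sketch, assuming the starred engine is vLLM.
# Model name, prompt, and sampling settings below are example values.
from vllm import LLM, SamplingParams

prompts = ["Summarize mechanistic interpretability in one sentence."]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Load the model; vLLM manages the KV cache and batching internally.
llm = LLM(model="facebook/opt-125m")

# Batched generation; each result carries the prompt and its completions.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```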