KAUST, Saudi Arabia
Stars
Official repository of "SAMURAI: Adapting Segment Anything Model for Zero-Shot Visual Tracking with Motion-Aware Memory"
ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation (ICCV 2023, Oral)
AI-generated apps, full stack + generative UI
A simple screen parsing tool towards pure vision based GUI agent
[ICCV 2023] Code for NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection
[ECCV 2024] Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation
Refine high-quality datasets and visual AI models
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
🌟A curated list of DUSt3R-related papers and resources, tracking recent advancements using this geometric foundation model.
This repository contains a framework for converting monocular videos into side-by-side (SBS) 3D videos. It utilizes a combination of image processing techniques and depth map predictions to generat…
[arXiv 2023] DreamGaussian4D: Generative 4D Gaussian Splatting
Awesome-LLM-3D: a curated list of resources on multi-modal large language models in the 3D world
Project page for replacing the human motion in a video with a virtual 3D human
BoxMOT: pluggable SOTA tracking modules for segmentation, object detection and pose estimation models
Virtual whiteboard for sketching hand-drawn like diagrams
Unofficial Implementation of Animate Anyone
[CVPR 2024] An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
The Startup CTO's Handbook, a book covering leadership, management and technical topics for leaders of software engineering teams
[Medical Image Analysis] Adversarial Data Augmentation with Chained Differentiable Transformations (AdvChain)
[CVPR 2022] Official PyTorch Implementation for DiffusionCLIP: Text-guided Image Manipulation Using Diffusion Models
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
🐫 CAMEL: Finding the Scaling Law of Agents. The first and the best multi-agent framework. https://www.camel-ai.org
Repo for "Benchmarking Robustness of 3D Point Cloud Recognition against Common Corruptions" https://arxiv.org/abs/2201.12296