For a categorized collection of survey papers from previous years, click here ↘️ CV-Surveys (under construction)
- Rasterized Edge Gradients: Handling Discontinuities Differentiably
- Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
🏠project
- X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-modal Reasoning
⭐code
- Visual Relationship Transformation
- Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
- Chameleon: A Data-Efficient Generalist for Dense Visual Prediction in the Wild (https://github.com/GitGyun/chameleon) (dense visual prediction)
- Unsupervised Dense Prediction using Differentiable Normalized Cuts
- Three Things We Need to Know About Transferring Stable Diffusion to Visual Dense Prediction Tasks
- Removing Rows and Columns of Tokens in Vision Transformer enables Faster Dense Prediction without Retraining
⭐code
- Copyright protection
- Image watermarking
- Certifiably Robust Image Watermark
⭐code - A Secure Image Watermarking Framework with Statistical Guarantees via Adversarial Attacks on Secret Key Networks (image watermarking)
- Not Just Change the Labels, Learn the Features: Watermarking Deep Neural Networks with Multi-View Data
⭐code - A Watermark-Conditioned Diffusion Model for IP Protection
⭐code - A Geometric Distortion Immunized Deep Watermarking Framework with Robustness Generalizability
- LaWa: Using Latent Space for In-Generation Image Watermarking
- Real Appearance Modeling for More General Deepfake Detection
- Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
⭐code - Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection
- Common Sense Reasoning for Deep Fake Detection
⭐code - Image forgery detection and localization
- Document image tampering detection
- Synthetic image detection
- OpenKD: Opening Prompt Diversity for Zero- and Few-shot Keypoint Detection
⭐code - KeypointDETR: An End-to-End 3D Keypoint Detector
⭐code
- Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos
🏠project - Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos
🏠project (3D animal motion) - Adaptive High-Frequency Transformer for Diverse Wildlife Re-Identification
⭐code
- City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web
⭐code
🏠project - A Probability-guided Sampler for Neural Implicit Surface Rendering
🏠project (rendering) - TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
🏠project - AnyLens: A Generative Diffusion Model with Any Rendering Lens(https://anylens-diffusion.github.io/)
- CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
⭐code
🏠project - METACAP: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering
🏠project - GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views
- MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References
⭐code - Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors
⭐code - CaesarNeRF: Calibrated Semantic Representation for Few-Shot Generalizable Neural Rendering
🏠project - IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
⭐code (rendering) - Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering
🏠project - VersatileGaussian: Real-time Neural Rendering for Versatile Tasks using Gaussian Splatting (neural rendering)
- UniVoxel: Fast Inverse Rendering by Unified Voxelization of Scene Representation
⭐code - GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
⭐code (scene rendering) - GMT: Enhancing Generalizable Neural Rendering via Geometry-Driven Multi-Reference Texture Transfer
⭐code - Boost Your NeRF: A Model-Agnostic Mixture of Experts Framework for High Quality and Efficient Rendering
- Invertible Neural Warp for NeRF
⭐code - VF-NeRF: Viewshed Fields for Rigid NeRF Registration
- NeRF-XL: NeRF at Any Scale with Multi-GPU
🏠project - Regularizing Dynamic Radiance Fields with Kinematic Fields
- KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter
⭐code - Dynamic Neural Radiance Field From Defocused Monocular Video
- Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering
🏠project - Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model
🏠project - GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields
⭐code
🏠project - Efficient NeRF Optimization - Not All Samples Remain Equally Hard
- MeshFeat: Multi-Resolution Features for Neural Fields on Meshes
🏠project - DecentNeRFs: Decentralized Neural Radiance Fields from Crowdsourced Images
🏠project - TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks
⭐code - BeNeRF: Neural Radiance Fields from a Single Blurry Image and Event Stream
⭐code - TriNeRFLet: A Wavelet Based Multiscale Triplane NeRF Representation
🏠project - RS-NeRF: Neural Radiance Fields from Rolling Shutter Images
⭐code - Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling
⭐code
🏠project - RaFE: Generative Radiance Fields Restoration
🏠project - Few-shot NeRF by Adaptive Rendering Loss Regularization
⭐code - Depth-guided NeRF Training via Earth Mover’s Distance
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields
⭐code - Flowed Time of Flight Radiance Fields
- Volumetric Rendering with Baked Quadrature Fields
- Taming Latent Diffusion Model for Neural Radiance Field Inpainting
🏠project - Mesh2NeRF: Direct Mesh Supervision for Neural Radiance Field Representation and Generation
🏠project
🤗huggingface - SlotLifter: Slot-guided Feature Lifting for Learning Object-Centric Radiance Fields
🏠project - FisherRF: Active View Selection and Mapping with Radiance Fields using Fisher Information
⭐code - DMiT: Deformable Mipmapped Tri-Plane Representation for Dynamic Scenes (NeRF)
- Single-Mask Inpainting for Voxel-based Neural Radiance Fields
- Content-Aware Radiance Fields: Aligning Model Complexity with Scene Intricacy Through Learned Bitwidth Quantization
⭐code - Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering
🏠project - Physically Plausible Color Correction for Neural Radiance Fields
- Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions (NeRF)
- PointNeRF++: A multi-scale, point-based Neural Radiance Field
🏠project - Omni-Recon: Harnessing Image-based Rendering for General-Purpose Neural Radiance Fields
- High-Fidelity and Transferable NeRF Editing by Frequency Decomposition
🏠project - TriNeRFLet: A Wavelet Based Triplane NeRF Representation
🏠project - Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction
🏠project - G2fR: Frequency Regularization in Grid-based Feature Encoding Neural Radiance Fields
- NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
🏠project - Novel view synthesis
- Fast View Synthesis of Casual Videos
🏠project - PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis
🏠project - RING-NeRF: Rethinking Inductive Biases for Versatile and Efficient Neural Fields
- Structured-NeRF: Hierarchical Scene Graph with Neural Representation
- URS-NeRF: Unordered Rolling Shutter Bundle Adjustment for Neural Radiance Fields
- A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
⭐code
🏠project - High-Resolution and Few-shot View Synthesis from Asymmetric Dual-lens Inputs
⭐code - Distractor-Free Novel View Synthesis via Exploiting Memorization Effect in Optimization
⭐code - NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
⭐code - FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
⭐code - Fast View Synthesis of Casual Videos with Soup-of-Planes
🏠project - CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
🏠project - MegaScenes: Scene-Level View Synthesis at Scale
⭐code - Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
⭐code (view synthesis) - NGP-RT: Fusing Multi-Level Hash Features with Lightweight Attention for Real-Time Novel View Synthesis
- Efficient Depth-Guided Urban View Synthesis
⭐code - Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
⭐code - Generalizable Human Gaussians for Sparse View Synthesis
🏠project - Thermal3D-GS: Physics-induced 3D Gaussians for Thermal Infrared Novel-view Synthesis
⭐code
- FYI: Flip Your Images for Dataset Distillation
- Neural Spectral Decomposition for Dataset Distillation
⭐code - Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching
⭐code - Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation
⭐code - COM Kitchens: An Unedited Overhead-view Video Dataset as a Vision-Language Benchmark
- Benchmarks
- MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
⭐code - DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
⭐code - Urban Waterlogging Detection: A Challenging Benchmark and Large-Small Model Co-Adapter
⭐code - MSD: A Benchmark Dataset for Floor Plan Generation of Building Complexes
- BlinkVision: A Benchmark for Optical Flow, Scene Flow and Point Tracking Estimation using RGB Frames and Events
🏠project
- SIMBA: Split Inference - Mechanisms, Benchmarks and Attacks
⭐code - A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
⭐code - BAFFLE: A Baseline of Backpropagation-Free Federated Learning
⭐code - Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
- Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
🏠project - UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
🏠project - HyTAS: A Hyperspectral Image Transformer Architecture Search Benchmark and Analysis
- OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
🏠project - PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
⭐code - Cross-Platform Video Person ReID: A New Benchmark Dataset and Adaptation Approach
⭐code - R^2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
⭐code - m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
⭐code
🤗huggingface - PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
🤗huggingface - LayeredFlow: A Real-World Benchmark for Non-Lambertian Multi-Layer Optical Flow
🏠project - HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects
⭐code - When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset
⭐code
- Datasets
- VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
⭐code - HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
⭐code - OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
- COM Kitchens: An Unedited Overhead-view Procedural Videos Dataset as a Vision-Language Benchmark
🌻dataset - Seeing Faces in Things: A Model and Dataset for Pareidolia
🌻dataset - Towards Dual Transparent Liquid Level Estimation in Biomedical Lab: Dataset, Methods and Practice
🌻dataset - GarmentCodeData: A Dataset of 3D Made-to-Measure Garments With Sewing Patterns
🏠project - SemTrack: A Large-scale Dataset for Semantic Tracking in the Wild
🌻dataset - WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
⭐code - BugNIST - a Large Volumetric Dataset for Detection under Domain Shift
- Defect Spectrum: A Granular Look of Large-scale Defect Datasets with Rich Semantics
⭐code
🏠project (large-scale defect dataset) - Raindrop Clarity: A Dual-Focused Dataset for Day and Night Raindrop Removal
⭐code - PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition
⭐code - WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
⭐code - MMVR: Millimeter-wave Multi-View Radar Dataset and Benchmark for Indoor Perception
- SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
🏠project - Caltech Aerial RGB-Thermal Dataset in the Wild
⭐code - V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception
- H-V2X: A Large Scale Highway Dataset for BEV Perception
- PetFace: A Large-Scale Dataset and Benchmark for Animal Identification
⭐code - Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework
- OmniNOCS: A unified NOCS dataset and model for 3D lifting of 2D objects
⭐code - SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark
⭐code
🏠project - Insect Identification in the Wild: The AMI Dataset
⭐code - RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception
🌻dataset
- Data augmentation
- Audio-Synchronized Visual Animation
⭐code
🏠project - Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation
🏠project - Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
- Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
⭐code - Spherical World-Locking for Audio-Visual Localization in Egocentric Videos
- Self-Supervised Audio-Visual Soundscape Stylization
🏠project - CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
⭐code (audio-visual scenarios) - Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers’ Opinion Scores
- Siamese Vision Transformers are Scalable Audio-visual Learners
⭐code (audio-visual learners) - Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
🏠project (ambient-aware action sound generation) - Audio-visual Generalized Zero-shot Learning the Easy Way
- Audio-visual segmentation
- SCAPE: A Simple and Strong Category-Agnostic Pose Estimator
⭐code - SRPose: Two-view Relative Pose Estimation with Sparse Keypoints
🏠project - FAFA: Frequency-Aware Flow-Aided Self-Supervision for Underwater Object Pose Estimation
⭐code - A Graph-Based Approach for Category-Agnostic Pose Estimation
🏠project - GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence
- OP-Align: Object-level and Part-level Alignment for Self-supervised Category-level Articulated Object Pose Estimation
⭐code - FoundPose: Unseen Object Pose Estimation with Foundation Features
🏠project - LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation
⭐code - U-COPE: Taking a Further Step to Universal 9D Category-level Object Pose Estimation
- PACE: Pose Annotations in Cluttered Environments
⭐code - 6-DoF
- An Economic Framework for 6-DoF Grasp Detection
⭐code - Pseudo-keypoint RKHS Learning for Self-supervised 6DoF Pose Estimation
- Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance
⭐code - Omni6D: Large-Vocabulary 3D Object Dataset for Category-Level 6D Object Pose Estimation
⭐code - 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
⭐code - FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
🏠project
- Camera pose estimation
- Counting
- AFreeCA: Annotation-Free Counting for All (counting)
- Zero-shot Object Counting with Good Exemplars
- ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
⭐code
🏠project (counting) - Class-Agnostic Object Counting with Text-to-Image Diffusion Model
- Shifted Autoencoders for Point Annotation Restoration in Object Counting
- See and Think: Embodied Agent in Virtual Environment
🏠project - SceneGraphLoc: Cross-Modal Coarse Visual Localization on 3D Scene Graphs
- V-IRL: Grounding Virtual Intelligence in Real Life
⭐code - Robotics
- Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
🏠project - Learning Cross-hand Policies of High-DOF Reaching and Grasping (robotics)
- DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control
⭐code - Real-time Holistic Robot Pose Estimation with Unknown States
⭐code - ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
⭐code
🏠project - Adapt2Reward: Adapting Video-Language Models to Generalizable Robotic Rewards via Failure Prompts
- GraspXL: Generating Grasping Motions for Diverse Objects at Scale
⭐code
🏠project - UGG: Unified Generative Grasping
🏠project (robotics) - Decomposed Vector-Quantized Variational Autoencoder for Human Grasp Generation
⭐code - Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation
🏠project (robotics)
- Navigation
- VPR
- Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition
⭐code - Navigation Instruction Generation with BEV Perception and Large Language Models
⭐code - Revisit Anything: Visual Place Recognition via Image Segment Retrieval
⭐code - VLAD-BuFF: Burst-aware Fast Feature Aggregation for Visual Place Recognition
⭐code - MeshVPR: Citywide Visual Place Recognition Using 3D Meshes
⭐code
- SLAM
- Deep Patch Visual SLAM
⭐code - RGBD GS-ICP SLAM
⭐code - I2-SLAM: Inverting Imaging Process for Robust Photorealistic Dense SLAM
- Hyperion - A fast, versatile symbolic Gaussian Belief Propagation framework for Continuous-Time SLAM
⭐code - SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
- LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System
- Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM
- Self-Supervised Underwater Caustics Removal and Descattering via Deep Monocular SLAM
- CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field
⭐code
- Try-On
- Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models
- Improving Virtual Try-On with Garment-focused Diffusion Models
⭐code - Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
⭐code
🏠project - Improving Diffusion Models for Authentic Virtual Try-on in the Wild
⭐code - D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On
⭐code - WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models
⭐code
- Cross-view geo-localization
- GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers
⭐code - Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network
⭐code - ConGeo: Robust Cross-view Geo-localization across Ground View Variations
⭐code
🏠project (cross-view geo-localization) - Benchmarking the Robustness of Cross-view Geo-localization Models
- CityGuessr: City-Level Video Geo-Localization on a Global Scale
- Geo-localization
- Avatars (virtual humans)
- CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images
⭐code - RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
⭐code - MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos
⭐code - PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
🏠project - iHuman: Instant Animatable Digital Humans From Monocular Videos
- PAV: Personalized Head Avatar from Unstructured Video Collection
🏠project - Disentangled Clothed Avatar Generation from Text Descriptions
🏠project (clothed avatar generation) - MagicMirror: Fast and High-Quality Avatar Generation with Constrained Search Space
🏠project - 3DFG-PIFu: 3D Feature Grids for Human Digitization from Sparse Views
- FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis
⭐code (3D human digitization) - Instant 3D Human Avatar Generation using Image Diffusion Models
🏠project - Let the Avatar Talk using Texts without Paired Training Data
- VR
- Controllable Human-Object Interaction Synthesis
🏠project - F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions
- Interaction-centric Spatio-Temporal Context Reasoning for Multi-Person Video HOI Recognition
⭐code - Look Hear: Gaze Prediction for Speech-directed Human Attention
⭐code - Boosting Gaze Object Prediction via Pixel-level Supervision from Vision Foundation Model
⭐code - Revisit Human-Scene Interaction via Space Occupancy
🏠project (human-computer interaction) - Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
⭐code - AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
- Hand-object
- NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model
- Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics
⭐code - Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?
⭐code - Coarse-to-Fine Implicit Representation Learning for 3D Hand-Object Reconstruction from a Single RGB-D Image
- De-confounded Gaze Estimation
- 3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views
⭐code - LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation
- Gaze Target Detection Based on Head-Local-Global Coordination
- LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning
⭐code
🏠project - ActionSwitch: Class-agnostic Detection of Simultaneous Actions in Streaming Videos
- Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition
- Motion Keyframe Interpolation for Any Human Skeleton using Point Cloud-based Human Motion Data Homogenisation (motion keyframe interpolation)
- Skeleton-based action recognition
- SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders
⭐code - Towards Physical World Backdoor Attacks against Skeleton Action Recognition
🏠project - S-JEPA: A Joint Embedding Predictive Architecture for Skeletal Action Recognition
🏠project - Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition
⭐code - CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner
- Few-shot action recognition
- Temporal action detection
- Temporal action localization
- HAT: History-Augmented Anchor Transformer for Online Temporal Action Localization
⭐code - Towards Adaptive Pseudo-label Learning for Semi-Supervised Temporal Action Localization
- Online Temporal Action Localization with Memory-Augmented Transformer
🏠project - Stepwise Multi-grained Boundary Detector for Point-supervised Temporal Action Localization
- Temporal action segmentation
- Long-Tail Temporal Action Segmentation with Group-wise Temporal Logit Adjustment
⭐code - Two-Stage Active Learning for Efficient Temporal Action Segmentation
- Language-Assisted Skeleton Action Understanding for Skeleton-Based Temporal Action Segmentation
⭐code - Synchronization is All You Need: Exocentric-to-Egocentric Transfer for Temporal Action Segmentation with Unlabeled Synchronized Video Pairs
⭐code
- Action quality assessment
- Semi-Supervised Teacher-Reference-Student Architecture for Action Quality Assessment
⭐code - RICA^2: Rubric-Informed, Calibrated Assessment of Actions
🏠project - Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment (action quality assessment)
- MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment
⭐code
- Action prediction
- Action recognition
- Referring Atomic Video Action Recognition
⭐code - DEAR: Depth-Enhanced Action Recognition
- Bayesian Evidential Deep Learning for Online Action Detection
- C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
⭐code - Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition
- Classification Matters: Improving Video Action Detection with Class-Specific Attention
- FinePseudo: Improving Pseudo-Labelling through Temporal-Alignablity for Semi-Supervised Fine-Grained Action Recognition
🏠project - Context-Aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast
- Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition
🏠project - On the Utility of 3D Hand Poses for Action Recognition
🏠project - POET: Prompt Offset Tuning for Continual Human Action Adaptation
⭐code - Occluded Gait Recognition with Mixture of Experts: An Action Detection Perspective
⭐code - Leveraging temporal contextualization for video action recognition
⭐code - Optimizing Factorized Encoder Models: Time and Memory Reduction for Scalable and Efficient Action Recognition
- SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
🏠project
- Action understanding
- Group activity recognition
- Epileptic seizure detection
- DriveLM: Driving with Graph Visual Question Answering
⭐code - Diffusion-Refined VQA Annotations for Semi-Supervised Gaze Following
- WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering
⭐code - GRACE: Graph-Based Contextual Debiasing for Fair Visual Question Answering
- Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge
⭐code - Compositional Substitutivity of Visual Reasoning for Visual Question Answering
⭐code - Fully Authentic Visual Question Answering Dataset from Online Communities
🏠project - An Explainable Vision Question Answer Model via Diffusion Chain-of-Thought
- Audio-video question answering
- Video question answering
- Video Question Answering with Procedural Programs
🏠project - ViLA: Efficient Video-Language Alignment for Video Question Answering
⭐code - TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning (VQA)
- AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
⭐code
- Audio-visual question answering
- Event-Based Motion Magnification
⭐code - Learning-based Axial Video Motion Magnification
🏠project - SMooDi: Stylized Motion Diffusion Model
⭐code - Length-Aware Motion Synthesis via Latent Diffusion
⭐code - HUMOS: Human Motion Model Conditioned on Body Shape
⭐code - HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
⭐code - Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs
🏠project - Generating Human Interaction Motions in Scenes with Text Control
🏠project (motion generation) - Motion Mamba: Efficient and Long Sequence Motion Generation
⭐code
🏠project - Large Motion Model for Unified Multi-Modal Motion Generation
🏠project - EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation
⭐code
🏠project - Bridging the Gap Between Human Motion and Action Semantics via Kinematics Phrases
🏠project (human motion) - TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
🏠project (human motion) - Nymeria: A Massive Collection of Egocentric Multi-modal Human Motion in the Wild (human motion)
- FreeMotion: MoCap-Free Human Motion Synthesis with Multimodal Large Language Models
- MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
⭐code - Realistic Human Motion Generation with Cross-Diffusion Models
🏠project (human motion) - CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
🏠project (controllable motion generation) - TLControl: Trajectory and Language Control for Human Motion Synthesis
🏠project (human motion synthesis) - Retrieval Robust to Object Motion Blur
⭐[code](https://github.com/Rong-Zou/Retrieval-Robust-to-Object-Motion-Blur) - 3D human motion synthesis
- Text-to-motion synthesis
- Human motion prediction
- Human motion estimation
- Motion estimation
- Dance generation
- Behavior generation
- Motion transfer
- Motion prediction
- Human-in-the-Loop Visual Re-ID for Population Size Estimation
⭐code - Person re-identification
- Keypoint Promptable Re-Identification
⭐code - Privacy-Preserving Adaptive Re-Identification without Image Transfer
- Rethinking Normalization Layers for Domain Generalizable Person Re-identification
⭐code - Domain Shifting: A Generalized Solution for Heterogeneous Cross-Modality Person Re-Identification
- VI-ReID
- Person search
- Gait recognition
- Counting
- SEED: A Simple and Effective 3D DETR in Point Clouds
⭐code - PointLLM: Empowering Large Language Models to Understand Point Clouds
⭐code
🏠project - TransCAD: A Hierarchical Transformer for CAD Sequence Inference from Point Clouds
- Learning to Adapt SAM for Segmenting Cross-domain Point Clouds
- Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
- milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing
⭐code - Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement
- Learning Local Pattern Modularization for Point Cloud Reconstruction from Unseen Classes
⭐code - T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning
⭐code - Progressive Classifier and Feature Extractor Adaptation for Unsupervised Domain Adaptation on Point Clouds
⭐code - PFGS: High Fidelity Point Cloud Rendering via Feature Splatting
⭐code - Masked Motion Prediction with Semantic Contrast for Point Cloud Sequence Learning
⭐code - To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning
- Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing
⭐code - FastPCI: Motion-Structure Guided Fast Point Cloud Frame Interpolation
⭐code - Point cloud generation
- RangeLDM: Fast Realistic LiDAR Point Cloud Generation
⭐code - Text2LiDAR: Text-guided LiDAR Point Clouds Generation via Equirectangular Transformer
⭐code - Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation
🏠project - FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation
🏠project
- Point cloud completion
- Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion
⭐code - T-CorresNet: Template Guided 3D Point Cloud Completion with Correspondence Pooling Query Generation Strategy
⭐code - AEDNet: Adaptive Embedding and Multiview-Aware Disentanglement for Point Cloud Completion
- EINet: Point Cloud Completion via Extrapolation and Interpolation
⭐code - Syn-to-Real Domain Adaptation for Point Cloud Completion via Part-based Approach
⭐code - ProtoComp: Diverse Point Cloud Completion with Controllable Prototype
⭐code
- Point cloud reconstruction
- Point cloud understanding
- Point cloud registration
- ML-SemReg: Boosting Point Cloud Registration with Multi-level Semantic Consistency
⭐code - PointRegGPT: Boosting 3D Point Cloud Registration using Generative Point-Cloud Pairs for Training
⭐code - SemReg: Semantics Constrained Point Cloud Registration
⭐code - Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning
🏠project - UMERegRobust – Universal Manifold Embedding Compatible Features for Robust Point Cloud Registration
⭐code - PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration
⭐code - Equi-GSPR: Equivariant SE(3) Graph Network Model for Sparse Point Cloud Registration (point cloud registration)
- Point Cloud Segmentation
- Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation
- HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation
⭐code - SegPoint: Segment Any Point Cloud via Large Language Model
⭐code - Localization and Expansion: A Decoupled Framework for Point Cloud Few-shot Semantic Segmentation
- Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation
⭐code - Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation
⭐code
- Point Cloud Understanding
- 3D Point Clouds
- Implicit Filtering for Learning Neural Signed Distance Functions from 3D Point Clouds
⭐code - CloudFixer: Test-Time Adaptation for 3D Point Clouds via Diffusion-Guided Geometric Transformation
⭐code - FLAT: Flux-aware Imperceptible Adversarial Attacks on 3D Point Clouds
- RISurConv: Rotation Invariant Surface Attention-Augmented Convolutions for 3D Point Cloud Classification and Segmentation
- P2P-Bridge: Diffusion Bridges for 3D Point Cloud Denoising
⭐code - Heterogeneous Graph Learning for Scene Graph Prediction in 3D Point Clouds
- Hiding Imperceptible Noise in Curvature-Aware Patches for 3D Point Cloud Attack (3D point cloud attack)
- Continuous SO(3) Equivariant Convolution for 3D Point Cloud Analysis
⭐code - Frugal 3D Point Cloud Model Training via Progressive Near Point Filtering and Fused Aggregation
- Continuous Memory Representation for Anomaly Detection
⭐code - Dissolving Is Amplifying: Towards Fine-Grained Anomaly Detection
⭐code - Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation
⭐code - GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
⭐code - Learning Diffusion Models for Multi-View Anomaly Detection
- Hierarchical Gaussian Mixture Normalizing Flow Modeling for Unified Anomaly Detection
⭐code - TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection
⭐code - Unsupervised, Online and On-The-Fly Anomaly Detection For Non-Stationary Image Distributions
⭐code - MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection
⭐code - Defect Detection
- Fault Detection
- 3D Anomaly Detection
- Industrial Anomaly Detection
- Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection
- A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization
⭐code - GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection
⭐code - AD3: Introducing a score for Anomaly Detection Dataset Difficulty assessment using VIADUCT dataset
- Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt
- Zero-Shot Anomaly Detection
- Multi-Class Anomaly Detection
- OOD
- Gradient-Regularized Out-of-Distribution Detection
- SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
- PixOOD: Pixel-Level Out-of-Distribution Detection
⭐code - An Information Theoretical View for Out-Of-Distribution Detection
- Learning Non-Linear Invariants for Unsupervised Out-of-Distribution Detection
- LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models
⭐code - ProSub: Probabilistic Open-Set Semi-Supervised Learning with Subspace-Based Out-of-Distribution Detection
⭐code - Diffusion for Out-of-Distribution Detection on Road Scenes and Beyond
⭐code - Can Your Generative Model Detect Out-of-Distribution Covariate Shift?
- Gradient-based Out-of-Distribution Detection
- Vision-Language Dual-Pattern Matching for Out-of-Distribution Detection
- TAG: Text Prompt Augmentation for Zero-Shot Out-of-Distribution Detection
⭐code
- Outlier Detection
- Zero-Shot Anomaly Segmentation
- SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers
🏠project - Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning
⭐code - Self-Supervised Learning
- CroMo-Mixup: Augmenting Cross-Model Representations for Continual Self-Supervised Learning
⭐code - HPFF: Hierarchical Locally Supervised Learning with Patch Feature Fusion
⭐code - SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning
⭐code - Efficient Unsupervised Visual Representation Learning with Explicit Cluster Balancing
- OmniSat: Self-Supervised Modality Fusion for Earth Observation
⭐code
🏠project
🌻dataset - FroSSL: Frobenius Norm Minimization for Efficient Multiview Self-Supervised Learning
- Self-supervised visual learning from interactions with objects
- Exploiting Supervised Poison Vulnerability to Strengthen Self-Supervised Defense
- GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
⭐code - On Pretraining Data Diversity for Self-Supervised Learning
⭐code - Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
⭐code - POA: Pre-training Once for Models of All Sizes
⭐code - ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders (self-supervised representation learning)
- Pose-Aware Self-Supervised Learning with Viewpoint Trajectory Regularization
🏠project (self-supervised learning) - SSL-Cleanse: Trojan Detection and Mitigation in Self-Supervised Learning
⭐code
- Semi-Supervised Learning
- Image-Feature Weak-to-Strong Consistency: An Enhanced Paradigm for Semi-Supervised Learning
- Improving 3D Semi-supervised Learning by Effectively Utilizing All Unlabelled Data
⭐code - SCOMatch: Alleviating Overtrusting in Open-set Semi-supervised Learning
⭐code - ExMatch: Self-guided Exploitation for Semi-Supervised Learning with Scarce Labeled Samples
- Rebalancing Using Estimated Class Distribution for Imbalanced Semi-Supervised Learning under Class Distribution Mismatch (semi-supervised learning)
- Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
⭐code - Flexible Distribution Alignment: Towards Long-tailed Semi-supervised Learning with Proper Calibration
⭐code
- GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition
⭐code (GNN) - Graph Neural Network Causal Explanation via Neural Causal Models
⭐code - On the Topology Awareness and Generalization Performance of Graph Neural Networks
- Causal Subgraphs and Information Bottlenecks: Redefining OOD Robustness in Graph Neural Networks
- Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search
⭐code - Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search
⭐code (distillation-aware) - SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference
- Dependency-aware Differentiable Neural Architecture Search
- DεpS: Delayed ε-Shrinking for Faster Once-For-All Training
- Model Compression
- Pruning
- Non-transferable Pruning
- Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
- Isomorphic Pruning for Vision Models
⭐code - LPViT: Low-Power Semi-structured Pruning for Vision Transformers
- PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
⭐code (pruning) - Enhanced Sparsification via Stimulative Training
⭐code - SNP: Structured Neuron-level Pruning to Preserve Attention Scores
⭐code
- Quantization
- GenQ: Quantization in Low Data Regimes with Generative Synthetic Data
⭐code - MetaAug: Meta-Data Augmentation for Post-Training Quantization
- Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients
- CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
⭐code - AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
⭐code - POCA: Post-training Quantization with Temporal Alignment for Codec Avatars
🏠project (quantization)
- KD
- Simple Unsupervised Knowledge Distillation With Space Similarity (knowledge distillation)
- Direct Distillation between Different Domains (KD)
- Harmonizing knowledge Transfer in Neural Network with Unified Distillation
- Good Teachers Explain: Explanation-Enhanced Knowledge Distillation
- The Role of Masking for Efficient Supervised Knowledge Distillation of Vision Transformers
- Improving Knowledge Distillation via Regularizing Feature Direction and Norm
- Adversarially Robust Distillation by Reducing the Student-Teacher Variance Gap (distillation)
- Improving Zero-shot Generalization of Learned Prompts via Unsupervised Knowledge Distillation
⭐code - UNIKD: UNcertainty-Filtered Incremental Knowledge Distillation for Neural Implicit Representation
⭐code - BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation
- Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation
- How to Train the Teacher Model for Effective Knowledge Distillation
- Markov Knowledge Distillation: Make Nasty Teachers trained by Self-undermining Knowledge Distillation Fully Distillable
- Spline-based Transformers
- Denoising Vision Transformers
- FairViT: Fair Vision Transformer via Adaptive Masking
- Rotary Position Embedding for Vision Transformer
⭐code - Bidirectional Progressive Transformer for Interaction Intention Anticipation
- Robustness Tokens: Towards Adversarial Robustness of Transformers
- SpecFormer: Guarding Vision Transformer Robustness via Maximum Singular Value Penalization
⭐code - PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers
- OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction
⭐code - AugDETR: Improving Multi-scale Learning for Detection Transformer
- AttnZero: Efficient Attention Discovery for Vision Transformers
⭐code - SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding
⭐code - Efficient Vision Transformers with Partial Attention
- SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
⭐code - Stitched ViTs are Flexible Vision Backbones
⭐code - Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
- Uncertainty-Driven Spectral Compressive Imaging with Spatial-Frequency Transformer
⭐code - GiT: Towards Generalist Vision Transformer through Universal Language Interface
⭐code - An Optimal Control View of LoRA and Binary Controller Design for Vision Transformers
- Fairness-aware Vision Transformer via Debiased Self-Attention
⭐code - ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
⭐code - LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
🏠project - Parameter-Efficient and Memory-Efficient Tuning for Vision Transformer: A Disentangled Approach
🏠project - LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
⭐code - Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
⭐code - BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
⭐code - An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
⭐code
- Learning to Unlearn for Robust Machine Unlearning
- Is Retain Set All You Need in Machine Unlearning? Restoring Performance of Unlearned Models with Out-Of-Distribution Images
⭐code (machine learning) - Machine Unlearning
- Adversarial Learning
- Improving Adversarial Transferability via Model Alignment
⭐code - Event Trojan: Asynchronous Event-based Backdoor Attacks
⭐code - Data Poisoning Quantization Backdoor Attack
- Flatness-aware Sequential Learning Generates Resilient Backdoors
- WBP: Training-time Backdoor Attacks through Hardware-based Weight Bit Poisoning
⭐code - Cocktail Universal Adversarial Attack on Deep Neural Networks
- TrojVLM: Backdoor Attack Against Vision Language Models
- CatchBackdoor: Backdoor Detection via Critical Trojan Neural Path Fuzzing
- Prompt-Driven Contrastive Learning for Transferable Adversarial Attacks
⭐code - Self-Supervised Representation Learning for Adversarial Attack Detection
- Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment
- CLIP-Guided Networks for Transferable Targeted Attacks
- CLIP-Guided Generative Networks for Transferable Targeted Adversarial Attacks
- Exploring Vulnerabilities in Spiking Neural Networks: Direct Adversarial Attacks on Raw Event Data
- UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening
⭐code - Inter-Class Topology Alignment for Efficient Black-Box Substitute Attacks (black-box)
- Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection
⭐code - AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models
⭐code - Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks
- DIFFender: Diffusion-Based Adversarial Defense against Patch Attacks
⭐code
- Continual Learning
- CLEO: Continual Learning of Evolving Ontologies
- One-stage Prompt-based Continual Learning
- Exemplar-free Continual Representation Learning via Learnable Drift Compensation
⭐code - Diffusion-Driven Data Replay: A Novel Approach to Combat Forgetting in Federated Class Continual Learning
⭐code - Semantic Residual Prompts for Continual Learning
⭐code - Pick-a-back: Selective Device-to-Device Knowledge Transfer in Federated Continual Learning
⭐code - RCS-Prompt: Learning Prompt to Rearrange Class Space for Prompt-based Continual Learning
⭐code - PromptFusion: Decoupling Stability and Plasticity for Continual Learning
⭐code - Information Bottleneck Based Data Correction in Continual Learning
- Revisiting Supervision for Continual Representation Learning
⭐code (continual learning) - Anytime Continual Learning for Open Vocabulary Classification
⭐code - MagMax: Leveraging Model Merging for Seamless Continual Learning
- Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning
- Transfer Learning
- Active Learning
- Dataset Quantization with Active Learning based Adaptive Sampling
- Generalized Coverage for More Robust Low-Budget Active Learning
- Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding (active learning)
- Bidirectional Uncertainty-Based Active Learning for Open-Set Annotation (active learning)
- Exploring Active Learning in Meta-Learning: Enhancing Context Set Labeling
- Reinforcement Learning
- Reinforcement Learning Meets Visual Odometry
- Large-scale Reinforcement Learning for Diffusion Models
- Reinforcement Learning via Auxiliary Task Distillation
- Reinforcement Learning Friendly Vision-Language Model for Minecraft
⭐code - Multimodal Label Relevance Ranking via Reinforcement Learning
⭐code - Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
⭐code - Diffusion Models as Optimizers for Efficient Planning in Offline RL
⭐code - Unified Local-Cloud Decision-Making via Reinforcement Learning
🏠project (reinforcement learning)
- Federated Learning
- Towards Multi-modal Transformers in Federated Learning
⭐code - FedHide: Federated Learning by Hiding in the Neighbors
- FedHARM: Harmonizing Model Architectural Diversity in Federated Learning
⭐code - FedTSA: A Cluster-based Two-Stage Aggregation Method for Model-heterogeneous Federated Learning
- Unlocking the Potential of Federated Learning: The Symphony of Dataset Distillation via Deep Generative Latents
⭐code - PFedEdit: Personalized Federated Learning via Automated Model Editing
⭐code - Overcome Modal Bias in Multi-modal Federated Learning via Balanced Modality Selection
- Fisher Calibration for Backdoor-Robust Heterogeneous Federated Learning
⭐code - Federated Learning with Local Openset Noisy Labels
⭐code - SkyMask: Attack-agnostic Robust Federated Learning with Fine-grained Learnable Masks
⭐code
- Contrastive Learning
- FlowCon: Out-of-Distribution Detection using Flow-based Contrastive Learning
⭐code - Improving Medical Multi-modal Contrastive Learning with Expert Annotations
- Contrastive Learning with Synthetic Positives (contrastive learning)
- Understanding and Mitigating Human-Labelling Errors in Supervised Contrastive Learning
- Adaptive Multi-head Contrastive Learning
- CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
⭐code (contrastive learning) - Learning the Unlearned: Mitigating Feature Suppression in Contrastive Learning
⭐code
- Class-Incremental Learning
- Rethinking Few-shot Class-incremental Learning: Learning from Yourself
⭐code - Few-shot Class Incremental Learning with Attention-Aware Self-Adaptive Prompt
- Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion
⭐code - Versatile Incremental Learning: Towards Class and Domain-Agnostic Incremental Learning
⭐code - Confidence Self-Calibration for Multi-Label Class-Incremental Learning
⭐[code](https://github.com/Kaile-Du/CSC) - Canonical Shape Projection is All You Need for 3D Few-shot Class Incremental Learning
⭐code - Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching
- PILoRA: Prototype Guided Incremental LoRA for Federated Class-Incremental Learning
⭐code - CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning
⭐code - Non-Exemplar Domain Incremental Learning via Cross-Domain Concept Integration
⭐code - On the Approximation Risk of Few-Shot Class-Incremental Learning
⭐code - iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning
⭐code - DiffClass: Diffusion-Based Class Incremental Learning
- In-Context Learning
- Multi-Task Learning
- Multiple-Instance Learning
- Multimodal Learning
- Source-Free Domain-Invariant Performance Prediction
- The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning
⭐code - DG
- Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision
⭐code - Feature Diversification and Adaptation for Federated Domain Generalization
- Soft Prompt Generation for Domain Generalization
⭐code - Integrating Markov Blanket Discovery into Causal Representation Learning for Domain Generalization
- Rethinking LiDAR Domain Generalization: Single Source as Multiple Density Domains
⭐code - Improving Zero-Shot Generalization for CLIP with Variational Adapter
- Representation Enhancement-Stabilization: Reducing Bias-Variance of Domain Generalization
⭐code - Local and Global Flatness for Federated Domain Generalization
⭐code - Learn to Preserve and Diversify: Parameter-Efficient Group with Orthogonal Regularization for Domain Generalization
- Disentangling Masked Autoencoders for Unsupervised Domain Generalization
⭐code
- DA
- Training-Free Model Merging for Multi-target Domain Adaptation
⭐code - MC-PanDA: Mask Confidence for Panoptic Domain Adaptation
⭐code - Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data
🏠project - De-Confusing Pseudo-Labels in Source-Free Domain Adaptation
- Open-set Domain Adaptation via Joint Error based Multi-class Positive and Unlabeled Learning
- Robust Nearest Neighbors for Source-Free Domain Adaptation under Class Distribution Shift
⭐code - HVCLIP: High-dimensional Vector in CLIP for Unsupervised Domain Adaptation
- Hierarchical Unsupervised Relation Distillation for Source Free Domain Adaptation
- Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence
⭐code - Improving Unsupervised Domain Adaptation: A Pseudo-Candidate Set Approach
- UDA-Bench: Revisiting Common Assumptions in Unsupervised Domain Adaptation Using a Standardized Framework
⭐code - Forget More to Learn More: Domain-specific Feature Unlearning for Semi-supervised and Unsupervised Domain Adaptation
- Train Till You Drop: Towards Stable and Robust Source-free Unsupervised 3D Domain Adaptation
- Get Your Embedding Space in Order: Domain-Adaptive Regression for Forest Monitoring
🏠project - CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning
⭐code - COD: Learning Conditional Invariant Representation for Domain Adaptation Regression
- Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Perception
⭐code - DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception
- Zero-Shot Learning
- Sapiens: Foundation for Human Vision Models
- Conceptual Codebook Learning for Vision-Language Models
- DEAL: Disentangle and Localize Concept-level Explanations for VLMs
- FlexAttention for Efficient High-Resolution Vision-Language Models
🏠project - QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
- Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
- REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
🏠project - Octopus: Embodied Vision-Language Programmer from Environmental Feedback
🏠project - GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
- Learning Chain of Counterfactual Thought for Bias-Robust Vision-Language Reasoning
⭐code - Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
⭐code - Cascade Prompt Learning for Vision-Language Model Adaptation
⭐code - The Hard Positive Truth about Vision-Language Compositionality
- Improving 2D Feature Representations by 3D-Aware Fine-Tuning
⭐code - Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Models
⭐code - Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
- ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
⭐code - FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
🏠project - Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
⭐code - GalLoP: Learning Global and Local Prompts for Vision-Language Models
- Quantized Prompt for Efficient Generalization of Vision-Language Models
⭐code - AddressCLIP: Empowering Vision-Language Models for City-wide Image Address Localization
⭐code
👍 AddressCLIP: street-level localization from a single image, an end-to-end large model for image geolocalization - SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant
- Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
🏠project - Cascade Prompt Learning for Visual-Language Model Adaptation
⭐code - Select and Distill: Selective Dual-Teacher Knowledge Transfer for Continual Learning on Vision-Language Models
🏠project - Unveiling Typographic Deceptions: Insights of the Typographic Vulnerability in Large Vision-Language Model
- Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks
- Take A Step Back: Rethinking the Two Stages in Visual Reasoning
⭐code - HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
⭐code (visual reasoning) - Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
⭐code - An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
⭐code - Improving Vision and Language Concepts Understanding with Multimodal Counterfactual Samples
⭐code - Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
⭐code - SDPT: Synchronous Dual Prompt Tuning for Fusion-based Visual-Language Pre-trained Models
⭐code - Robust Calibration of Large Vision-Language Adapters
⭐code - BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models
⭐code - CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
- MyVLM: Personalizing VLMs for User-Specific Queries
⭐code - BRAVE: Broadening the visual encoding of vision-language models
🏠project - IVTP: Instruction-guided Visual Token Pruning for Large Vision-Language Models
- ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
⭐code
🏠project - The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
⭐code - Safe-CLIP: Removing NSFW Concepts from Vision-and-Language Models
⭐code - Adapt without Forgetting: Distill Proximity from Dual Teachers in Vision-Language Models
⭐code - Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
- uCAP: An Unsupervised Prompting Method for Vision-Language Models
- Training A Small Emotional Vision Language Model for Visual Art Comprehension
⭐code - Understanding Multi-compositional learning in Vision and Language models via Category Theory
⭐code - Adversarial Prompt Tuning for Vision-Language Models
⭐code - Language-Image Pre-training with Long Captions
⭐code - CoReS: Orchestrating the Dance of Reasoning and Segmentation
⭐code
🏠project - Attention Prompting on Image for Large Vision-Language Models
⭐code - SILC: Improving Vision Language Pretraining with Self-Distillation
- SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
⭐code - AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
⭐code - Video-Language
- VLN
- LLM
- BLINK: Multimodal Large Language Models Can See but Not Perceive
🏠project - Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
- X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
⭐code
🏠project - X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
- Instruction Tuning-free Visual Token Complement for Multimodal LLMs
- Merlin: Empowering Multimodal LLMs with Foresight Minds
🏠project - Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
🏠project - MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
⭐code - UniCode: Learning a Unified Codebook for Multimodal Large Language Models
- When Do We Not Need Larger Vision Models?
⭐code - ControlLLM: Augment Language Models with Tools by Searching on Graphs
⭐code - Towards Open-Ended Visual Recognition with Large Language Models
⭐code - SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
⭐code - ST-LLM: Large Language Models Are Effective Temporal Learners
⭐code - Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions
🏠project - How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
⭐code - BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
⭐code - MoAI: Mixture of All Intelligence for Large Language and Vision Models
⭐code
🤗huggingface - Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs
🏠project - LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images
⭐code - Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
⭐code - LLMGA: Multimodal Large Language Model based Generation Assistant
⭐code
🏠project - Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
⭐code - LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
⭐code - LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
🏠project - Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
- LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
⭐code - ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
⭐code - Making Large Language Models Better Planners with Reasoning-Decision Alignment
- Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities
- Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
🏠project - AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting
⭐code - DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
⭐code - GENIXER: Empowering Multimodal Large Language Models as a Powerful Data Generator
⭐code - Elysium: Exploring Object-level Perception in Videos through Semantic Integration Using MLLMs
⭐code
- Visual Localization
- Visual Grounding
- Visual Intent Understanding
- Referring Expression Comprehension
- Vision-Language Understanding
- LatentEditor: Text Driven Local Editing of 3D Scenes
🏠project - RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting
⭐code (indoor scenes) - Forecasting Future Videos from Novel Views via Disentangled 3D Scene Representation
⭐code - Compact 3D Scene Representation via Self-Organizing Gaussian Grids
⭐code - CARFF: Conditional Auto-encoded Radiance Field for 3D Scene Forecasting
🏠project - Scene Synthesis
- Pyramid Diffusion for Fine 3D Large Scene Generation
⭐code
🏠project
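👍 Southwest Jiaotong University, the University of Leeds, and others jointly propose the Pyramid Discrete Diffusion model (PDD), a coarse-to-fine strategy for 3D outdoor scene generation - External Knowledge Enhanced 3D Scene Generation from Sketch (3D scene generation)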
- SceneTeller: Language-to-3D Scene Generation
⭐code - Forest2Seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis
- Gaussian Grouping: Segment and Edit Anything in 3D Scenes
⭐code - EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
⭐code - AnyHome: Open-Vocabulary Large-Scale Indoor Scene Generation with First-Person View Exploration (indoor scene generation)
- BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
🏠project - The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation
⭐code - Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting
🏠project (scene synthesis and editing) - WoVoGen: World Volume-aware Diffusion for Controllable Multi-camera Driving Scene Generation
⭐code (driving scene generation)
- Scene Understanding
- N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields
- Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
- SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
🏠project - Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data
⭐code - nuCraft: Crafting High Resolution 3D Semantic Occupancy for Unified 3D Scene Understanding
- R3DS: Reality-linked 3D Scenes for Panoramic Scene Understanding
🏠project - Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
⭐code - Agent3D-Zero: An Agent for Zero-shot 3D Understanding
🏠project - MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
⭐code (dense scene understanding)
- Semantic Scene Completion
- Scene Graph Generation
- OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
⭐code - Semantic Diversity-aware Prototype-based Learning for Unbiased Scene Graph Generation
⭐code - Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction
⭐code - Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
⭐code
👍 Pushing the boundaries of scene graph generation: OvSGTR achieves fully open-vocabulary scene graph generation - A Fair Ranking and New Model for Panoptic Scene Graph Generation
🏠project - Multi-Granularity Sparse Relationship Matrix Prediction Network for End-to-End Scene Graph Generation
⭐code - Towards Scene Graph Anticipation
⭐code
- Masked Angle-Aware Autoencoder for Remote Sensing Images
⭐code - Radiance Field Learners As UAV First-Person Viewers
- Geospecific View Generation - Geometry-Context Aware High-resolution Ground View Inference from Satellite Views
🏠project (satellite views) - Probabilistic Image-Driven Traffic Modeling via Remote Sensing
- UAV First-Person Viewers Are Radiance Field Learners
🏠project - MutDet: Mutually Optimizing Pre-training for Remote Sensing Object Detection
⭐code - Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
- Free-Viewpoint Video of Outdoor Sports Using a Drone
- Learning Representations of Satellite Images From Metadata Supervision(https://github.com/preligens-lab/satmip) (satellite imagery)
- Multi-scale Cross Distillation for Object Detection in Aerial Images
- LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
⭐code - PDT Uav Target Detection Dataset for Pests and Diseases Tree
⭐code - Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery
🤗huggingface (remote sensing) - Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
⭐code
- Online Vectorized HD Map Construction using Geometry
⭐code - MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty
⭐code - HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras
⭐code - Continuity Preserving Online CenterLine Graph Learning
- Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
⭐code - RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception
⭐code - MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation
- Generative End-to-End Autonomous Driving
⭐code - CARB-Net: Camera-Assisted Radar-Based Network for Vulnerable Road User Detection
⭐code (driving) - FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving
⭐code - Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks
⭐code (driving) - Gated Temporal Diffusion for Stochastic Long-Term Dense Anticipation
⭐code - CarFormer: Self-Driving with Learned Object-Centric Representations
⭐code - Image-to-Lidar Relational Distillation for Autonomous Driving Data
- Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model
⭐code - LingoQA: Video Question Answering for Autonomous Driving
⭐code - PreSight: Enhancing Autonomous Vehicle Perception with City-Scale NeRF Priors
⭐code - VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving
- TCLC-GS: Tightly Coupled LiDAR-Camera Gaussian Splatting for Autonomous Driving
- Learning to Drive via Asymmetric Self-Play
🏠project - Embodied Understanding of Driving Scenarios
⭐code - Early Anticipation of Driving Maneuvers
🏠project - RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
🏠project - Event-Aided Time-To-Collision Estimation for Autonomous Driving
🏠project - Dolphins: Multimodal Language Model for Driving
🏠project - PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
⭐code - Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
⭐code - Neural Volumetric World Models for Autonomous Driving
- SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
⭐code - Random Walk on Pixel Manifolds for Anomaly Segmentation of Complex Driving Scenes
⭐code (autonomous driving) - SLEDGE: Synthesizing Driving Environments with Generative Models and Rule-Based Traffic
- I Can't Believe It's Not Scene Flow!
⭐code (scene flow) - Safe-Sim: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries
🏠project (traffic) - UniM2AE: Multi-modal Masked Autoencoders with Unified 3D Representation for 3D Perception in Autonomous Driving
⭐code - DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
🏠project - Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)
🏠project - Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
⭐code - Improving Agent Behaviors with RL Fine-tuning for Autonomous Driving
- Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction
⭐code - Beyond the Data Imbalance: Employing the Heterogeneous Datasets for Vehicle Maneuver Prediction
⭐code - Trajectory prediction
- Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction
⭐code - NeRMo: Learning Implicit Neural Representations for 3D Human Motion Prediction
- CoMusion: Towards Consistent Stochastic Human Motion Prediction via Motion Diffusion (human motion prediction)
- Risk-Aware Self-Consistent Imitation Learning for Trajectory Planning in Autonomous Driving
- Progressive Pretext Task Learning for Human Trajectory Prediction
⭐code - DySeT: a Dynamic Masked Self-distillation Approach for Robust Trajectory Prediction
- VisionTrap: Vision-Augmented Trajectory Prediction Guided by Textual Descriptions
⭐code - Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation
⭐code - Adaptive Human Trajectory Prediction via Latent Corridors
🏠project - NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving
⭐code - MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction
⭐code - Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection
- Vehicle trajectory prediction
- Occupancy prediction
- VEON: Vocabulary-Enhanced Occupancy Prediction
- OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
⭐code - OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
🏠project - Fully Sparse 3D Occupancy Prediction
⭐code - Monocular Occupancy Prediction for Scalable Indoor Scenes
⭐code - ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers
⭐code - CVT-Occ: Cost Volume Temporal Fusion for 3D Occupancy Prediction
⭐code - GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
⭐code (3D semantic occupancy prediction)
- Lane detection
- Vehicle surveillance
- Stable Video Portraits
🏠project - Text-Guided Video Masked Autoencoder
- Multi-Modal Video Dialog State Tracking in the Wild
- Training-free Video Temporal Grounding using Large-scale Pre-trained Models
⭐code - Weakly-Supervised Spatio-Temporal Video Grounding with Variational Cross-Modal Alignment
- E3M: Zero-Shot Spatio-Temporal Video Grounding with Expectation-Maximization Multimodal Modulation
⭐code - Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective
- Fast Encoding and Decoding for Implicit Video Representation
⭐code
🏠project - DEVIAS: Learning Disentangled Video Representations of Action and Scene
⭐code - VideoStudio: Generating Consistent-Content and Multi-Scene Videos
🏠project - VAD
- Cross-Domain Learning for Video Anomaly Detection with Limited Supervision
- Learning Anomalies with Normality Prior for Unsupervised Video Anomaly Detection
⭐code - Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
⭐code - FedVAD: Enhancing Federated Video Anomaly Detection with GPT-Driven Semantic Distillation
⭐code - Interleaving One-Class and Weakly-Supervised Models with Adaptive Thresholding for Unsupervised Video Anomaly Detection
⭐code (video anomaly detection)
- Video summarization
- Video understanding
- VideoMamba: Spatio-Temporal Selective State Space Model
⭐code - VideoMamba: State Space Model for Efficient Video Understanding
⭐code - Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
⭐code - Learning Video Context as Interleaved Multimodal Sequences
⭐code - FunQA: Towards Surprising Video Comprehension
🏠project - Vamos: Versatile Action Models for Video Understanding
⭐code
🏠project - Towards Neuro-Symbolic Video Understanding
⭐code - LongVLM: Efficient Long Video Understanding via Large Language Models
⭐code - VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
⭐code
🏠project - Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding
- VideoAgent: Long-form Video Understanding with Large Language Model as Agent
🤗huggingface - InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
⭐code - Text-Conditioned Resampler For Long Form Video Understanding
- Video classification
- Video parsing
- Video frame interpolation
- Video class-incremental learning
- Video copy segment localization
- Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling
⭐code - Co-synthesis of Histopathology Nuclei Image-Label Pairs using a Context-Conditioned Joint Diffusion Model
- Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging
- Multistain Pretraining for Slide Representation Learning in Pathology
⭐code - Energy-induced Explicit quantification for Multi-modality MRI fusion
⭐code - Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging
⭐code - CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
⭐code (cardiac disease assessment) - Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
⭐code - Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer (CT)
- Bridging the Pathology Domain Gap: Efficiently Adapting CLIP for Pathology Image Analysis with Limited Labeled Data (pathology image analysis)
- Unified Medical Image Pre-training in Language-Guided Common Semantic Space
- Rethinking Deep Unrolled Model for Accelerated MRI Reconstruction
⭐code - Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation
⭐code (semi-supervised histopathology segmentation) - Histopathology image classification
- Whole slide image classification
- DGR-MIL: Exploring Diverse Global Representation in Multiple Instance Learning for Whole Slide Image Classification
⭐code - Pathology-knowledge Enhanced Multi-instance Prompt Learning for Few-shot Whole Slide Image Classification
- Snuffy: Efficient Whole Slide Image Classifier
- Norma: A Noise Robust Memory-Augmented Framework for Whole Slide Image Classification
⭐code - Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification
⭐code
- Medical image segmentation
- FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
🏠project
⭐code - PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
⭐code - The Devil is in the Statistics: Mitigating and Exploiting Statistics Difference for Generalizable Semi-supervised Medical Image Segmentation
⭐code - Domesticating SAM for Breast Ultrasound Image Segmentation via Spatial-frequency Fusion and Uncertainty Correction
⭐code - Gradient-Aware for Class-Imbalanced Semi-supervised Medical Image Segmentation
⭐code - AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking
⭐code - Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation
⭐code - I-MedSAM: Implicit Medical Image Segmentation with Segment Anything
⭐code - VP-SAM: Taming Segment Anything Model for Video Polyp Segmentation via Disentanglement and Spatio-temporal Side Network
⭐code (polyp segmentation)
- Medical image registration
- Medical report generation
- X-ray
- Medical robotics
- Biomedical imaging
- CT
- Diffusion Models as Data Mining Tools
⭐code - ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation
🏠project - Learning to Generate Conditional Tri-plane for 3D-aware Expression Controllable Portrait Animation
🏠project - HiEI: A Universal Framework for Generating High-quality Emerging Images from Natural Images
- UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
⭐code - Score Distillation Sampling with Learned Manifold Corrective
- CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models
🏠project - EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
- Inf-DiT: Upsampling any-resolution image with memory-efficient diffusion transformer
⭐code - The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
🏠project - MONTAGE: Monitoring Training for Attribution of Generative Diffusion Models
- TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling
- Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images
- Idea2Img: Iterative Self-Refinement with GPT-4V for Automatic Image Design and Generation
🏠project - OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
⭐code - DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment
- V-Trans4Style: Visual Transition Recommendation for Video Production Style Adaptation
- GAN
- CLR-GAN: Improving GANs Stability and Quality via Consistent Latent Representation and Reconstruction
⭐code - A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks
⭐code - Distilling Diffusion Models into Conditional GANs
- Exploring Guided Sampling of Conditional GANs
⭐code - Learning 3D-aware GANs from Unposed Images with Template Feature Field
🏠project
- Diffusion
- Measuring Style Similarity in Diffusion Models
⭐code - Do text-free diffusion models learn discriminative visual representations
- Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models
⭐code - ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model
- HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models
⭐code - Chains of Diffusion Models
- To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
⭐code - FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation
🏠project - Beta-Tuned Timestep Diffusion Model
- SMooDi: Stylized Motion Diffusion Model
🏠project - Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
⭐code - Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models
⭐code
🏠project - Implicit Concept Removal of Diffusion Models
🏠project - ZigMa: A DiT-style Zigzag Mamba Diffusion Model
⭐code - ColorPeel: Color Prompt Learning with Diffusion Models via Color and Shape Disentanglement
⭐code
🏠project - Timestep-Aware Correction for Quantized Diffusion Models
- Shapefusion: 3D localized human diffusion models
🏠project - MVDD: Multi-View Depth Diffusion Models
- SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
⭐code
🏠project
📺video - Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
⭐code - Compensation Sampling for Improved Convergence in Diffusion Models
⭐code - ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
⭐code
⭐code - Self-Guided Generation of Minority Samples Using Diffusion Models
⭐code
- Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems
⭐code - LogoSticker: Inserting Logos into Diffusion Models for Customized Generation
⭐code - Texture synthesis
- Image synthesis
- Editable Image Elements for Controllable Synthesis
🏠project - Assessing Sample Quality via the Latent Space of Generative Models
⭐code - SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior
🏠project - Zero-shot Text-guided Infinite Image Synthesis with LLM guidance
- $\infty$-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
⭐code
⭐code - EpipolarGAN: Omnidirectional Image Synthesis with Explicit Camera Control
- LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
- Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis
- Label-free Neural Semantic Image Synthesis
- Improving image synthesis with diffusion-negative sampling
- SCP-Diff: Spatial-Categorical Joint Prior for Diffusion Based Semantic Image Synthesis
🏠project - 2S-ODIS: Two-Stage Omni-Directional Image Synthesis by Geometric Distortion Correction
⭐code - Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis
- FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
⭐code
- Image generation
- MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
⭐code
🏠project - Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion
- Context Diffusion: In-Context Aware Image Generation
🏠project - Few-shot Defect Image Generation based on Consistency Modeling
⭐code - Linearly Controllable GAN: Unsupervised Feature Categorization and Decomposition for Image Generation and Manipulation
- PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
- AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation
⭐code - AccDiffusion: An Accurate Method for Higher-Resolution Image Generation
🏠project
👍Achieves repetition-free high-resolution image generation - Towards Reliable Advertising Image Generation Using Human Feedback
⭐code - StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
⭐code - Model-agnostic Origin Attribution of Generated Images with Few-shot Examples
- Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
⭐code - Tuning-Free Image Customization with Image and Text Guidance
🏠project - Collaborative Control for Geometry-Conditioned PBR Image Generation
🏠project - DiffiT: Diffusion Vision Transformers for Image Generation
⭐code - MultiGen: Zero-shot Image Generation from Multi-modal Prompts
- Accelerating Image Generation with Sub-path Linear Approximation Model
⭐code
- Video generation
- FreeInit: Bridging Initialization Gap in Video Diffusion Models
⭐code
🏠project - HARIVO: Harnessing Text-to-Image Models for Video Generation
🏠project - SignGen: End-to-End Sign Language Video Generation with Latent Diffusion
⭐code - DragAnything: Motion Control for Anything using Entity Representation
⭐code
🏠project - Physics-Based Interaction with 3D Objects via Video Generation
⭐code
🏠project - DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model
🏠project - Photorealistic Video Generation with Diffusion Models
🏠project - DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
⭐code - PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control
🏠project - MoVideo: Motion-Aware Video Generation with Diffusion Models
🏠project - IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
⭐code - MagDiff: Multi-Alignment Diffusion for High-Fidelity Video Generation and Editing
- Text-to-video quality assessment
- Video editing
- DragVideo: Interactive Drag-style Video Editing
🏠project - Video Editing via Factorized Diffusion Distillation
- SAVE: Protagonist Diversification with Structure Agnostic Video Editing
🏠project - DNI: Dilutional Noise Initialization for Diffusion Video Editing
- DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency
- DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing
🏠project - Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion
⭐code - Object-Centric Diffusion for Efficient Video Editing
🏠project
- Image editing
- ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion
- Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing
🤗huggingface - COMPOSE: Comprehensive Portrait Shadow Editing
- Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation (editing)
- FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
⭐code - ByteEdit: Boost, Comply and Accelerate Generative Image Editing
- RegionDrag: Fast Region-Based Image Editing with Diffusion Models
⭐code - 3DEgo: 3D Editing on the Go!
⭐code - View-Consistent 3D Editing with Gaussian Splatting
🏠project - Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors
🏠project - Watch Your Steps: Local Image and Scene Editing by Text Instructions
🏠project - Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
🏠project - Free-Editor: Zero-shot Text-driven 3D Scene Editing
🏠project - InstructGIE: Towards Generalizable Image Editing
- Lazy Diffusion Transformer for Interactive Image Editing
🏠project - DATENeRF: Depth-Aware Text-based Editing of NeRFs
⭐code
🏠project - TurboEdit: Real-time text-based disentangled real image editing
- DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
⭐code - StableDrag: Stable Dragging for Point-based Image Editing
- ST-LDM: A Universal Framework for Text-Grounded Object Generation in Real Images (image editing)
- SwapAnything: Enabling Arbitrary Object Swapping in Personalized Image Editing
🏠project - Source Prompt Disentangled Inversion for Boosting Image Editability with Diffusion Models
⭐code - Robust-Wide: Robust Watermarking against Instruction-driven Image Editing
⭐code - FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing
- Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing
⭐code - RadEdit: stress-testing biomedical vision models via diffusion image editing
- Responsible Visual Editing
⭐code - 3D-GOI: 3D GAN Omni-Inversion for Multifaceted and Multi-object Editing
🏠project - Thinking Outside the BBox: Unconstrained Generative Object Compositing (object compositing)
- EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models
- Image-to-video
- Rethinking Image-to-Video Adaptation: An Object-centric Perspective
- R2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
⭐code - PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
⭐code - ZeroI2V: Zero-Cost Adaptation of Pre-Trained Transformers from Image to Video
- Text-to-video
- E.T. the Exceptional Trajectories: Text-to-camera-trajectory generation with character awareness
🏠project - WAVE: Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing
🏠project - MotionDirector: Motion Customization of Text-to-Video Diffusion Models
⭐code
🏠project - Factorizing Text-to-Video Generation by Explicit Image Conditioning
🤗huggingface - SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
🏠project - MEVG: Multi-event Video Generation with Text-to-Video Models
🏠project - Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
🏠project - xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations
- Text-to-3D
- Diverse Text-to-3D Synthesis with Augmented Text Embedding
⭐code
🏠project - LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
🏠project - DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
⭐code - DreamReward: Aligning Human Preference in Text-to-3D Generation
🏠project - DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors
⭐code - DreamReward: Text-to-3D Generation with Human Preference
🏠project - GVGEN: Text-to-3D Generation with Volumetric Representation
⭐code - WordRobe: Text-Guided Generation of Textured 3D Garments
🏠project - UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
⭐code
🏠project - ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation
⭐code - CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
🏠project - Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting
🏠project - VividDreamer: Invariant Score Distillation for Hyper-Realistic Text-to-3D Generation
- DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
⭐code - HiFi-123: Towards High-fidelity One Image to 3D Content Generation
🏠project - JointDreamer: Ensuring Geometry Consistency and Text Congruence in Text-to-3D Generation via Joint Score Distillation
- Connecting Consistency Distillation to Score Distillation for Text-to-3D Generation
⭐code - TPA3D: Triplane Attention for Fast Text-to-3D Generation
🏠project - DreamView: Injecting View-specific Text Guidance into Text-to-3D Generation
⭐code
- Text-to-image
- Navigating Text-to-Image Generative Bias across Indic Languages
- Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
⭐code
🏠project - MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
- PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control
🏠project - PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
⭐code - Diffusion Soup: Model Merging for Text-to-Image Diffusion Models
- IMMA: Immunizing text-to-image Models against Malicious Adaptation
⭐code - Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
🏠project - ControlNet-XS: Rethinking the Control of Text-to-Image Diffusion Models as Feedback-Control Systems
- Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
⭐code - Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models
⭐code - Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models
- Textual-Visual Logic Challenge: Understanding and Reasoning in Text-to-Image Generation
⭐code - Navigating Text-to-Image Generative Bias across Indic Languages
🏠project - Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
⭐code - Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers
- DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators
🏠project - Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation
- MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
⭐code
🏠project - Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
- R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
⭐code - MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
⭐code
🏠project - ReCON: Training-Free Acceleration for Text-to-Image Synthesis with Retrieval of Concept Prompt Trajectories
🏠project - LCM-Lookahead for Encoder-based Text-to-Image Personalization
⭐code
🏠project - Lego: Learning to Disentangle and Invert Personalized Concepts Beyond Object Appearance in Text-to-Image Diffusion Models
- Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models
⭐code - Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
⭐code
👍DiffPNG achieves the best performance, demonstrating that T2I diffusion models can understand visual content at the phrase level - T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
⭐code - Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
🏠project - Latent Guard: a Safety Framework for Text-to-image Generation
⭐code - Getting it Right: Improving Spatial Consistency in Text-to-Image Models
⭐code
🏠project - Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
⭐code - Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning
⭐code - PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
🏠project - Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models
⭐code - MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation
⭐code - Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
- ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation
- PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
- AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
⭐code
🏠project - CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
⭐code - Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models
- SpeedUpNet: A Plug-and-Play Adapter Network for Accelerating Text-to-Image Diffusion Models
🏠project - TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
🏠project - An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
- Adversarial Robustification via Text-to-Image Diffusion Models
⭐code - Stable Preference: Redefining training paradigm of human preference model for Text-to-Image Synthesis
- Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
🏠project - Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
- Image-text
- Text-video alignment
- Image-text alignment
- 3D (content) generation
- Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation
🏠project - LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
🏠project - LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
🏠project - SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
🏠project - VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
🏠project - Compress3D: a Compressed Latent Space for 3D Generation from a Single Image
- AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation
🏠project - Learn to Optimize Denoising Scores: A Unified and Improved Diffusion Prior for 3D Generation
🏠project
- Visual text rendering
- GIF generation
- Layout generation
- Layout-to-image
- Image-to-image translation
- Image translation
- Text-to-4D
- Video-to-4D
- Web design
- Text-to-Garment
- Image stylization
- StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models
⭐code - ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
⭐code
🏠project (style) - InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
⭐code (stylization) - StyleCity: Large-Scale 3D Urban Scenes Stylization
🏠project (urban scene stylization) - Scene-Conditional 3D Object Stylization and Composition
- Image vectorization
- Video stitching
- Text-to-camera-trajectory generation
- Text-to-3D scene
- Identity-preserving personalization
- Subject-driven generation
- Style-content disentanglement
- Text-to-multi-motion generation
- Text-driven 3D editing
- Image interpolation
- Image compositing
- Image animation
- LivePhoto: Real Image Animation with Text-guided Motion Control
⭐code - MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
⭐code
🏠project - ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
🏠project - Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
⭐code (human image animation) - TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
🏠project
- Group photo synthesis
- Image cropping
- DECap: Towards Generalized Explicit Caption Editing via Diffusion Mechanism
- ControlCap: Controllable Region-level Captioning
⭐code (captioning) - MarineInst: A Foundation Model for Marine Image Analysis with Instance Visual Description
⭐code (visual description) - Distractors-Immune Representation Learning with Cross-modal Contrastive Regularization for Change Captioning
⭐code - CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation
⭐code - Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights
⭐code - BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
⭐code - View Selection for 3D Captioning via Diffusion Ranking
⭐code - Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning
🏠project - HiFi-Score: Fine-grained Image Description Evaluation with Hierarchical Parsing Graphs (fine-grained image description)
- Video captioning
- Dense captioning
- SAH-SCI: Self-Supervised Adapter for Efficient Hyperspectral Snapshot Compressive Imaging
- Adaptive Selection of Sampling-Reconstruction in Fourier Compressed Sensing
- Image Compression for Machine and Human Vision with Spatial-Frequency Adaptation
⭐code - Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
- Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
- BaSIC: BayesNet Structure Learning for Computational Scalable Neural Image Compression
⭐code - Region-Adaptive Transform with Segmentation Prior for Image Compression
⭐code - EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation
- Lagrangian Hashing for Compressed Neural Field Representations
🏠project - Latent Diffusion Prior Enhanced Deep Unfolding for Snapshot Spectral Compressive Imaging
⭐code (snapshot spectral compression) - Learned HDR Image Compression for Perceptually Optimal Storage and Display
⭐code - WeConvene: Learned Image Compression with Wavelet-Domain Convolution and Entropy Model
- Lossy Image Compression with Foundation Diffusion Models
- A Unified Image Compression Method for Human Perception and Multiple Vision Tasks
- Video compression
- Hierarchical Separable Video Transformer for Snapshot Compressive Imaging
⭐code - A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging
⭐code - Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression
- Long-term Temporal Context Gathering for Neural Video Compression
- Learned Rate Control for Frame-Level Adaptive Neural Video Compression via Dynamic Neural Network
- Video decoding
- Snapshot spectral imaging
- Motion estimation
- RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
⭐code
🏠project - AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval
⭐code - IRGen: Generative Modeling for Image Retrieval
⭐code - FastCAD: Real-Time CAD Retrieval and Alignment from Scans and Videos
- Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval
⭐code - FreestyleRet: Retrieving Images from Style-Diversified Queries
⭐code - Sketch-based image retrieval
- Video-text retrieval
- Image-text retrieval
- Video retrieval
- Nearest-neighbor search
- Occlusion-Aware Seamless Segmentation
⭐code - SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding
⭐code
👍 New SOTA for visual grounding! SegVG: converting the object bounding boxes of visual grounding into segmentation signals (open-sourced) - Segment, Lift and Fit: Automatic 3D Shape Labeling from 2D Prompts
- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
⭐code
🏠project - Segment and Recognize Anything at Any Granularity
⭐code - Enriching Information and Preserving Semantic Consistency in Expanding Curvilinear Object Segmentation Datasets
⭐code - Semi-supervised Segmentation of Histopathology Images with Noise-Aware Topological Consistency
⭐code - CoPT: Unsupervised Domain Adaptive Segmentation using Domain-Agnostic Text Embeddings
⭐code - From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation
- CC-SAM: Enhancing SAM with Cross-feature Attention and Context for Ultrasound Image Segmentation
- Unsupervised Moving Object Segmentation with Atmospheric Turbulence
- Lite-SAM Is Actually What You Need for Segment Everything
- Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
⭐code - Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
⭐code - RAPiD-Seg: Range-Aware Pointwise Distance Distribution Networks for 3D LiDAR Segmentation
⭐code - SPIN: Hierarchical Segmentation with Subpart Granularity in Natural Images
⭐code
- FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
⭐code - SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
🏠project - PQ-SAM: Post-training Quantization for Segment Anything Model
- Active Coarse-to-Fine Segmentation of Moveable Parts from Real Images
🏠project
- SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
⭐code - A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
⭐code - Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
⭐code - Rethinking and Improving Visual Prompt Selection for In-Context Learning Segmentation Framework
⭐code - Better Call SAL: Towards Learning to Segment Anything in Lidar
⭐code - Image matting
- 3D segmentation
- Bayesian Self-Training for Semi-Supervised 3D Segmentation
- Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels
🏠project - EgoLifter: Open-world 3D Segmentation for Egocentric Perception
🏠project
🤗huggingface - View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields
⭐code
- Video segmentation
- Instance segmentation
- Unleashing the Power of Prompt-driven Nucleus Instance Segmentation
⭐code - 3D instance segmentation
- Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
⭐code - SAM-guided Graph Cut for 3D Instance Segmentation
⭐code
🏠project - Continual Learning and Unknown Object Discovery in 3D Scenes via Self-Distillation
⭐code (instance segmentation) - OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
⭐code
- Unsupervised instance segmentation
- Open-world instance segmentation
- Panoptic segmentation
- Open Panoramic Segmentation
⭐code - A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
⭐code - Point-supervised Panoptic Segmentation via Estimating Pseudo Labels from Learnable Distance
- Strike a Balance in Continual Panoptic Segmentation
⭐code - 3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
- Semantic segmentation
- Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
⭐code - Exploring Reliable Matching with Phase Enhancement for Night-time Semantic Segmentation
- MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment
- Sparse Refinement for Efficient High-Resolution Semantic Segmentation
🏠project - Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
⭐code - Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation
⭐code - Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
⭐code - On-the-fly Category Discovery for LiDAR Semantic Segmentation
⭐code - On the Viability of Monocular Depth Pre-training for Semantic Segmentation
- Efficient Active Domain Adaptation for Semantic Segmentation by Selecting Information-rich Superpixels
⭐code - Distributed Semantic Segmentation with Efficient Joint Source and Task Decoding
- FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions
- Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation
⭐code - MeshSegmenter: Zero-Shot Mesh Segmentation via Texture Synthesis
⭐code - Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models
⭐code - ItTakesTwo: Leveraging Peer Representations for Semi-supervised LiDAR Semantic Segmentation
⭐code - Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off
⭐code - Open-Vocabulary RGB-Thermal Semantic Segmentation
⭐code - Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation
- Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
- SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds
🏠project
⭐code - Reliability in Semantic Segmentation: Can We Use Synthetic Data?
⭐code - Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation
- 3D semantic segmentation
- Cross-domain semantic segmentation
- Unsupervised semantic segmentation
- Semi-supervised semantic segmentation
- Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier
⭐code - Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation
⭐code - SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
⭐code
- Weakly supervised semantic segmentation
- Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation
⭐code
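Knowledge transfer via simulated inter-image erasing: weakly supervised semantic segmentation no longer suffers from over-expansion, enabling precise object localization! - Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation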
- DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation
- Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras
⭐code
- Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation
- Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation
⭐code - DHR: Dual Features-Driven Hierarchical Rebalancing in Inter- and Intra-Class Regions for Weakly-Supervised Semantic Segmentation
⭐code - 3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
⭐code - Diffusion-Guided Weakly Supervised Semantic Segmentation
⭐code
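- Domain-adaptive semantic segmentation
- Domain-generalized semantic segmentation
- Class-incremental semantic segmentation
- Zero-shot semantic segmentation
- Open-vocabulary semantic segmentation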
- CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation
⭐code
🏠project - Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
⭐code - In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
- ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
⭐code
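- Part segmentation
- Motion segmentation
- Smoke segmentation
- Line segment segmentation
- Scene parsing
- Interactive segmentation
- Few-shot segmentation
- Camouflaged object segmentation
- Reference image segmentation
- Referring image segmentation
- Scene text segmentation
- Open-vocabulary segmentation
- Referring expression segmentation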
- VIS
- VOS
- ActionVOS: Actions as Prompts for Video Object Segmentation
⭐code - VISA: Reasoning Video Object Segmentation via Large Language Model
⭐code - PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
⭐code - Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation
⭐code - Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
⭐code - OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
- Spatial-Temporal Multi-level Association for Video Object Segmentation
- Labeled Data Selection for Category Discovery
- Active Generation for Image Classification
⭐code - Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
🏠project - Dyn-Adapter: Towards Disentangled Representation for Efficient Visual Recognition
- Wavelet Convolutions for Large Receptive Fields
⭐code - Momentum Auxiliary Network for Supervised Local Learning
⭐code - An accurate detection is not all you need to combat label noise in web-noisy datasets
⭐code - Dual-stage Hyperspectral Image Classification Model with Spectral Supertoken
⭐code - DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks
- NOVUM: Neural Object Volumes for Robust Object Classification
⭐code - EntAugment: Entropy-Driven Adaptive Data Augmentation Framework for Image Classification
⭐code - Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels
⭐code - Discovering Unwritten Visual Classifiers with Large Language Models
- Generalized category discovery
- SelEx: Self-Expertise in Fine-Grained Generalized Category Discovery
⭐code - Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery
⭐code (Generalized Category Discovery, GCD) - Learning to Distinguish Samples for Generalized Category Discovery
⭐code - PromptCCD: Learning Gaussian Mixture Prompt Pool for Continual Category Discovery
⭐code - Online Continuous Generalized Category Discovery
⭐code (generalized category discovery) - Category Adaptation Meets Projected Distillation in Generalized Continual Category Discovery
⭐code
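- Multi-label image classification
- Few-shot classification
- Zero-shot classification
- Multi-label recognition
- Long-tailed recognition
- Fine-grained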
- On Learning Discriminative Features from Synthesized Data for Self-Supervised Fine-Grained Visual Recognition
- A Rotation-invariant Texture ViT for Fine-Grained Recognition of Esophageal Cancer Endoscopic Ultrasound Images
⭐code - Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth
- ReNoise: Real Image Inversion Through Iterative Noising
⭐code
🏠project - UniProcessor: A Text-induced Unified Low-level Image Processor
- Restoration
- MoE-DiffIR: Task-customized Diffusion Priors for Universal Compressed Image Restoration
⭐code - Panel-Specific Degradation Representation for Raw Under-Display Camera Image Restoration
⭐code - Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks
- A Comparative Study of Image Restoration Networks for General Backbone Network Design
- GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity
- Restoring Images in Adverse Weather Conditions via Histogram Transformer
- InstructIR: High-Quality Image Restoration Following Human Instructions
⭐code - Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems
⭐code - Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
- Teaching Tailored to Talent: Adverse Weather Restoration via Prompt Pool and Depth-Anything Constraint
- Seeing the Unseen: A Frequency Prompt Guided Transformer for Image Restoration
⭐code - MambaIR: A Simple Baseline for Image Restoration with State-Space Model
⭐code
👍 MambaIR: a Mamba-based baseline model for image restoration - AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
⭐code - SPIRE: Semantic Prompt-Driven Image Restoration
🏠project - Efficient Cascaded Multiscale Adaptive Network for Image Restoration
- Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration
⭐code - When Fast Fourier Transform Meets Transformer for Image Restoration
⭐code - Osmosis: RGBD Diffusion Prior for Underwater Image Restoration
🏠project - Contribution-based Low-Rank Adaptation with Pre-training Model for Real Image Restoration
🏠project - DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior
⭐code - MetaWeather: Few-Shot Weather-Degraded Image Restoration
⭐code - Depth-Aware Blind Image Decomposition for Real-World Adverse Weather Recovery
- Inpainting
- Deraining
- Denoising
- TTT-MIM: Test-Time Training with Masked Image Modeling for Denoising Distribution Shifts
⭐code - Region-Aware Sequence-to-Sequence Learning for Hyperspectral Denoising
⭐code - DualDn: Dual-domain Denoising via Differentiable ISP
⭐code - Exploiting Dual-Correlation for Multi-frame Time-of-Flight Denoising
⭐code - EDformer: Transformer-Based Event Denoising Across Varied Noise Levels
- denoiSplit: a method for joint microscopy image splitting and unsupervised denoising (denoising)
- Asymmetric Mask Scheme for Self-Supervised Real Image Denoising
⭐code - Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts
🏠project - Enhancing Plausibility Evaluation for Generated Designs with Denoising Autoencoder
⭐code
- Dehazing
- Deblurring
- Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions
⭐code - UniINR: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation
⭐code - Blind image deblurring with noise-robust kernel estimation
⭐code - Motion Aware Event Representation-driven Image Deblurring(https://github.com/ZhijingS/DA_event_deblur)
- BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting
⭐code
- Deconvolution
- Reflection removal
- Artifact removal
- Demoiréing
- Demosaicing
- Object removal
- Outpainting
- Image retouching
- Image enhancement
- LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models
⭐code - LMT-GP: Combined Latent Mean-Teacher and Gaussian Process for Semi-supervised Low-light Image Enhancement
⭐code - RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement
⭐code - Image-adaptive 3D Lookup Tables for Real-time Image Enhancement with Bilateral Grids
⭐code - NamedCurves: Learned Image Enhancement via Color Naming
- Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography
⭐code - GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval
⭐code
👍 GLARE leverages an external normal-light prior to achieve realistic low-light enhancement! - Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations
⭐code - Unveiling Advanced Frequency Disentanglement Paradigm for Low-Light Image Enhancement
⭐code
- Image quality assessment
- DSMix: Distortion-Induced Sensitivity Map Based Pre-training for No-Reference Image Quality Assessment
⭐code - A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment
- Towards Open-ended Visual Quality Comparison
⭐code
👍 Co-Instruct: teaching general-purpose multimodal large models to compare visual quality - PromptIQA: Boosting the Performance and Generalization for No-Reference Image Quality Assessment via Prompts (no-reference image quality assessment)
- Assessing UHD Image Quality from Aesthetics, Distortions, and Saliency
⭐code - Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
🏠project
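- Image aesthetic quality assessment
- Video restoration
- Video colorization
- Video enhancement
- Video deraining
- Video denoising
- Video desnowing
- Video deblurring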
- Domain-adaptive Video Deblurring via Test-time Blurring
⭐code - CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring
⭐code - Towards Real-world Event-guided Low-light Video Enhancement and Deblurring
⭐code - Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model
⭐code
- Video deflickering
- Video demosaicing
- Video quality enhancement
- Relighting
- Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design
⭐code - SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution
⭐code - Towards Robust Full Low-bit Quantization of Super Resolution Networks
- BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow
⭐code - HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
- Pairwise Distance Distillation for Unsupervised Real-World Image Super-Resolution
⭐code - UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt
⭐code - Accelerating Image Super-Resolution Networks with Pixel-Level Classification
⭐code
- Spatially-Variant Degradation Model for Dataset-free Super-resolution
⭐code - Learning Exhaustive Correlation for Spectral Super-Resolution: Where Spatial-Spectral Attention Meets Linear Dependence
- Contourlet Residual for Prompt Learning Enhanced Infrared Image Super-Resolution
⭐code - Confidence-Based Iterative Generation for Real-World Image Super-Resolution
⭐code - Adaptive Multi-modal Fusion of Spatially Variant Kernel Refinement with Diffusion Model for Blind Image Super-Resolution
- Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization
⭐code - XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
⭐code - AdaDiffSR: Adaptive Region-aware Dynamic acceleration Diffusion Model for Real-World Image Super-Resolution
- Overcoming Distribution Mismatch in Quantizing Image Super-Resolution Networks
⭐code - Rethinking Image Super Resolution from Training Data Perspectives
⭐code - MTKD: Multi-Teacher Knowledge Distillation for Image Super-Resolution
⭐code - OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
- Learning Dual-Level Deformable Implicit Representation for Real-World Scale Arbitrary Super-Resolution
⭐code - You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation
⭐code - A New Dataset and Framework for Real-World Blurred Images Super-Resolution
⭐code - Scene text image super-resolution
- VSR
- Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors
⭐code - RealViformer: Investigating Attention for Real-World Video Super-Resolution
⭐code - Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models
⭐code - SuperGaussian: Repurposing Video Models for 3D Super Resolution
🏠project - Event-Adapted Video Super-Resolution
- Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution
⭐code
- Can OOD Object Detectors Learn from Foundation Models?
⭐code - Distilling Knowledge from Large-Scale Image Models for Object Detection
- DeTra: A Unified Model for Object Detection and Trajectory Forecasting
- OpenSight: A Simple Open-Vocabulary Framework for LiDAR-Based Object Detection
- LEROjD: Lidar Extended Radar-Only Object Detection
⭐code - Bucketed Ranking-based Losses for Efficient Training of Object Detectors
⭐code - Plain-Det: A Plain Multi-Dataset Object Detector
⭐code - On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines
⭐code - Towards Open-World Object-based Anomaly Detection via Self-Supervised Outlier Synthesis
⭐code - Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection
- PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
⭐code - Crowd-SAM: SAM as a Smart Annotator for Object Detection in Crowded Scenes
⭐code - Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
⭐code - Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection
⭐code - Modality Translation for Object Detection Adaptation Without Forgetting Prior Knowledge
⭐code - T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
⭐code - Fine-grained Dynamic Network for Generic Event Boundary Detection
⭐code - CSOT: Cross-Scan Object Transfer for Semi-Supervised LiDAR Object Detection
- Bayesian Detector Combination for Object Detection with Crowdsourced Annotations
⭐code - Simplifying Source-Free Domain Adaptation for Object Detection: Effective Self-Training Strategies and Performance Insights
⭐code - Out-of-Bounding-Box Triggers: A Stealthy Approach to Cheat Object Detectors
⭐code - A Simple Background Augmentation Method for Object Detection with Diffusion Model
- Look Around and Learn: Self-Training Object Detection by Exploration
⭐code - Co-Student: Collaborating Strong and Weak Students for Sparsely Annotated Object Detection
⭐code - Benchmarking Object Detectors with COCO: A New Path Forward
🌻dataset - DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion
⭐code (object detection) - Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
⭐code - YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
⭐code - Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection
- Projecting Points to Axes: Oriented Object Detection via Point-Axis Representation
- GRA: Detecting Oriented Objects through Group-wise Rotating and Attention
- Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
⭐code - Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection
⭐code - Zero-Shot Detection of AI-Generated Images
⭐code
🏠project - MOD-UV: Learning Mobile Object Detectors from Unlabeled Videos
⭐code - Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images
🏠project - Rethinking Features-Fused-Pyramid-Neck for Object Detection
⭐code - 3D object detection
- Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
⭐code - Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene
- SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection
- MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
⭐code - Transfer Learning from Simulated to Real Scenes for Monocular 3D Object Detection
⭐code - Domain Generalization of 3D Object Detection by Density-Resampling
⭐code - Learning High-resolution Vector Representation from Multi-Camera Images for 3D Object Detection
⭐code - LiDAR-based All-weather 3D Object Detection via Prompting and Distilling 4D Radar
- Towards Stable 3D Object Detection
- RecurrentBEV: A Long-term Temporal Fusion Framework for Multi-view 3D Detection
⭐code - LISO: Lidar-only Self-Supervised 3D Object Detection
⭐code - Diffusion Model for Robust Multi-Sensor Fusion in 3D Object Detection and BEV Segmentation
⭐code - SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather
- Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
- OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation
⭐code - Better Regression Makes Better Test-time Adaptive 3D Object Detection
- Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
⭐code - Diff3DETR: Agent-based Diffusion Model for Semi-supervised 3D Object Detection
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance
⭐code - SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras
⭐code - CMD: A Cross Mechanism Domain Adaptation Dataset for 3D Object Detection
⭐code
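👍 DIG mitigates the point-cloud data discrepancies caused by different sensor mechanisms in terms of density, intensity, and geometry, significantly improving the performance of domain adaptation algorithms. - Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection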
⭐code - OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
⭐code - LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection
⭐code - MonoTTA: Fully Test-Time Adaptation for Monocular 3D Object Detection
⭐code - FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection
⭐code - General Geometry-aware Weakly Supervised 3D Object Detection
⭐code - Interactive 3D Object Detection with Prompts
- Beyond Viewpoint: Robust 3D Object Recognition under Arbitrary Views through Joint Multi-Part Representation
- Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection
⭐code - TCC-Det: Temporarily consistent cues for weakly-supervised 3D detection
⭐code - GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
⭐code
- Small object detection
- Camouflaged object detection
- CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection
👍 Effectively reduces pixel-level and instance-level noise - Learning Camouflaged Object Detection from Noisy Pseudo Label
⭐code - Just a Hint: Point-Supervised Camouflaged Object Detection
- SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection
- Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
⭐code - FocusDiffuser: Perceiving Local Disparities for Camouflaged Object Detection
⭐code
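- Long-tailed object detection
- Salient object detection
- Domain-adaptive object detection
- Few-shot object detection
- Co-salient object detection
- Open-vocabulary object detection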
- Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection
⭐code - LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
- MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection
⭐code - Region-centric Image-Language Pretraining for Open-Vocabulary Detection
⭐code - CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection
⭐code
- Watermark detection
- Shadow detection
- Open-set recognition
- Object localization
- Local All-Pair Correspondence for Point Tracking
⭐code
⭐code - Track Everything Everywhere Fast and Robustly
⭐code
🏠project - CoTracker: It is Better to Track Together
🏠project - DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
⭐code
🏠project - Decomposition Betters Tracking Everything Everywhere
⭐code - Self-Supervised Any-Point Tracking by Contrastive Random Walks
🏠project
⭐code - TAPTR: Tracking Any Point with Transformers as Detection
⭐code - MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
🏠project - SPAMming Labels: Efficient Annotations for the Trackers of Tomorrow
- SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking
⭐code - OneTrack: Demystifying the Conflict Between Detection and Tracking in End-to-End 3D Trackers
- Exploring the Feature Extraction and Relation Modeling For Light-Weight Transformer Tracking
⭐code - Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
- 3D object tracking
- Multi-object tracking
- Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking
⭐code - Walker: Self-supervised Multiple Object Tracking by Walking on Temporal Object Appearance Graphs
- Beyond MOT: Semantic Multi-Object Tracking
⭐code - PapMOT: Exploring Adversarial Patch Attack against Multiple Object Tracking
- VETRA: A Dataset for Vehicle Tracking in Aerial Imagery - New Challenges for Multi-Object Tracking
🏠project
- Cell tracking
- Parrot Captions Teach CLIP to Spot Text
⭐code - WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
- FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
🏠project - Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
⭐code - Handwritten text detection
- Align, Minimize and Diversify: A Source-Free Unsupervised Domain Adaptation Method for Handwritten Text Recognition
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
⭐code
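👍 Shanghai Jiao Tong University introduces PosFormer! It optimizes a position recognition task to assist expression recognition, setting a new SOTA for complex formula recognition! - Elegantly Written: Disentangling Writer and Character Styles for Enhancing Online Chinese Handwriting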
- NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition
- Handwritten Text Synthesis
- Scene Text Removal
- Document Understanding
- Text Segmentation
- Text Synthesis
- Text Restoration
- X-Pose: Detecting Any Keypoints
⭐code - VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
🏠project - Expressive Whole-Body 3D Gaussian Avatar
⭐code - GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation
- PoseSOR: Human Pose Can Guide Our Attention
⭐code - COSMU: Complete 3D human shape from monocular unconstrained images
- Modeling and Driving Human Body Soundfields through Acoustic Primitives
🏠project - Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions
⭐code - SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
- PoseEmbroider: Towards a 3D, Visual, Semantic-aware Human Pose Representation
- PoseAugment: Generative Human Pose Data Augmentation with Physical Plausibility for IMU-based Motion Capture
⭐code - HPE-Li: WiFi-enabled Lightweight Dual Selective Kernel Convolution for Human Pose Estimation (human pose estimation)
- EgoPoser: Robust Real-Time Egocentric Pose Estimation from Sparse and Intermittent Observations Everywhere
🏠project - You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception
⭐code - Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence
- Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
- Human Pose Recognition via Occlusion-Preserving Abstract Images
- Text-Driven Human Generation
- Multi-Person Pose Prediction
- 3D Human Pose Estimation
- MPL: Lifting 3D Human Pose from Multi-view 2D Poses
⭐code - RT-Pose: A 4D Radar Tensor-based 3D Human Pose Estimation and Localization Benchmark
🏠project - AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos
⭐code
🏠project - Mask as Supervision: Leveraging Unified Mask Information for Unsupervised 3D Pose Estimation
⭐code - 3D Human Pose Estimation via Non-Causal Retentive Networks
⭐code - UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues
- Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding
- RePOSE: 3D Human Pose Estimation via Spatio-Temporal Depth Relational Consistency
⭐code - RT-Pose: A 4D Radar-Tensor based 3D Human Pose Estimation and Localization Benchmark
🤗huggingface - EgoPoseFormer: A Simple Baseline for Stereo Egocentric 3D Human Pose Estimation
⭐code - 3DSA: Multi-View 3D Human Pose Estimation With 3D Space Attention Mechanisms
- WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation
- Rotated Orthographic Projection for Self-Supervised 3D Human Pose Estimation
- NICP: Neural ICP for 3D Human Registration at Scale
🏠project
- Human Mesh Recovery
- Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
⭐code - Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images
- Global-to-Pixel Regression for Human Mesh Recovery
- WindPoly: Polygonal Mesh Reconstruction via Winding Numbers
🏠project - Multi-RoI Human Mesh Recovery with Camera Consistency and Contrastive Losses
⭐code
- 3D Human Texture Generation
- 3D Human Generation
- StructLDM: Structured Latent Diffusion for 3D Human Generation
⭐code
🏠project
👍 A new paradigm for 3D digital human generation from Nanyang Technological University: structured diffusion models - Text Motion Translator: A Bi-Directional Model for Enhanced 3D Human Motion Generation from Open-Vocabulary Descriptions
- Text to Layer-wise 3D Clothed Human Generation
🏠project - SemanticHuman-HD: High Resolution Semantic disentangled 3D Human Generation
🏠project (3D human generation)
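- Human Reconstruction
- Motion Capture
- Sign Language Recognition
- Hand Mesh
- 3D Hand Sequence Recovery
- 3D Hand Reconstruction
- Hand Pose Estimation
- Hand Motion Prediction
- Head Pose Estimation
- Hand-Held Object Reconstruction
- Head Pose Estimation
- 4D Head Capture
- Motion Capture
- Sign Language Video Generation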
- Task-adaptive Q-Face
- Faceptor: A Generalist Model for Face Perception
⭐code - A Light Stage on Every Desk
🏠project - Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
⭐code
🏠project - ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
⭐code - Facial Affective Behavior Analysis with Instruction Tuning
⭐code
🏠project - Arc2Face: A Foundation Model for ID-Consistent Human Faces
⭐code
🏠project - GAMMA-FACE: GAussian Mixture Models Amend Diffusion Models for Bias Mitigation in Face Images
- GRAPE: Generalizable and Robust Multi-view Facial Capture
- High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering
⭐code - Face Swapping
- Face Blurring
- Face Recognition
- Towards Certifiably Robust Face Recognition
- AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition
⭐code - ARoFace: Alignment Robustness to Improve Low-Quality Face Recognition
⭐code - Personalized Privacy Protection Mask Against Unauthorized Facial Recognition
- MST-KD: Multiple Specialized Teachers Knowledge Distillation for Fair Face Recognition
- AdversariaLeak: External Information Leakage Attack Using Adversarial Samples on Face Recognition Systems
- Face Clustering
- Face Reconstruction
- Facial Expression
- Face Editing
- Face Animation
- Talking Head Synthesis
- ScanTalk: 3D Talking Heads from Unregistered Scans
⭐code - EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
🏠project (head synthesis) - EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head
⭐code - All You Need is Your Voice: Emotional Face Representation with Audio Perspective for Emotional Talking Face Generation
⭐code - Audio-driven Talking Face Generation with Stabilized Synchronization Loss
- Head360: Learning a Parametric 3D Full-Head for Free-View Synthesis in 360°
⭐code - S^3D-NeRF: Single-Shot Speech-Driven Neural Radiance Field for High Fidelity Talking Head Synthesis
- Gaussian3Diff: 3D Gaussian Diffusion for 3D Full Head Synthesis and Editing
🏠project (head synthesis) - Tri2-plane: Thinking Head Avatar via Feature Pyramid
🏠project - Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
🏠project - TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
⭐code (3D talking head synthesis)
- Animatable Head Avatars
- Face Super-Resolution
- Face Anti-Spoofing
- TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-Spoofing
⭐code
👍 Improves generalization through twofold-element fine-grained semantic guidance - DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
⭐code - Towards Unified Representation of Invariant-Specific Features in Missing Modality Face Anti-Spoofing
- Bottom-Up Domain Prompt Tuning for Generalized Face Anti-Spoofing
- Head Synthesis
- Emotion Recognition
- Facial Action Unit Detection
- Fake Face Detection
- GroundUp: Rapid Sketch-Based 3D City Massing
🏠project - Ray-Distance Volume Rendering for Neural Scene Reconstruction
- HSR: Holistic 3D Human-Scene Reconstruction from Monocular Videos
🏠project - Decomposition of Neural Discrete Representations for Large-Scale 3D Mapping
⭐code - BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
🏠project - Temporal Event Stereo via Joint Learning with Stereoscopic Flow
⭐code - GenRC: Generative 3D Room Completion from Sparse Image Collections
⭐code - Single-Photon 3D Imaging with Equi-Depth Photon Histograms
🏠project - Viewpoint textual inversion: discovering scene representations and 3D view control in 2D diffusion models
⭐code - SparseCraft: Few-Shot Neural Reconstruction through Stereopsis Guided Geometric Linearization
⭐code - 3D Congealing: 3D-Aware Image Alignment in the Wild
🏠project - ClusteringSDF: Self-Organized Neural Implicit Surfaces for 3D Decomposition
🏠project - BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
⭐code - Soft Shadow Diffusion (SSD): Physics-inspired Learning for 3D Computational Periscopy
- DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly
- An Optimization Framework to Enforce Multi-View Consistency for Texturing 3D Meshes
🏠project - Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
🏠project - Diffusion Model is a Good Pose Estimator from 3D RF-Vision
🏠project - Nuvo: Neural UV Mapping for Unruly 3D Representations
🏠project - MAP-ADAPT: Real-Time Quality-Adaptive Semantic 3D Maps
🏠project - MinD-3D: Reconstruct High-quality 3D objects in Human Brain
🏠project (3D) - UpFusion: Novel View Diffusion from Unposed Sparse View Observations
⭐code (3D) - MVS
- 3D Visual Grounding
- Empowering 3D Visual Grounding with Reasoning Capabilities
🏠project - Multi-branch Collaborative Learning Network for 3D Visual Grounding
⭐code
👍 Acc@0.5 on 3DREC improves by 3.27% and mIOU on 3DRES improves by 5.22% - ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
⭐code - Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
🏠project
- Stereo Matching
- 3DGS
- GaussReg: Fast 3D Registration with Gaussian Splatting
- 3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting
- Texture-GS: Disentangle the Geometry and Texture for 3D Gaussian Splatting Editing
⭐code - Compact3D: Smaller and Faster Gaussian Splatting with Vector Quantization
⭐code
🏠project - CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization
⭐code
🏠project (3DGS) - End-to-End Rate-Distortion Optimized 3D Gaussian Representation
- Deblurring 3D Gaussian Splatting
⭐code
🏠project - Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting
🏠project - HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
🏠project - On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy
⭐code - Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
⭐code - MVSGaussian: Fast Generalizable Gaussian Splatting Reconstruction from Multi-View Stereo
⭐code
🏠project - Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
⭐code - DGD: Dynamic 3D Gaussians Distillation
🏠project - EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
⭐code - Revising Densification in Gaussian Splatting
- HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
- RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes
- SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
- VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors
⭐code - MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation
- MIGS: Multi-Identity Gaussian Splatting via Tensor Decomposition
⭐code - SAGS: Structure-Aware 3D Gaussian Splatting
🏠project - GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time
🏠project - Gaussian in the wild: 3D Gaussian Splatting for Unconstrained Image Collections
⭐code - Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
⭐code - WaSt-3D: Wasserstein-2 Distance for Scene-to-Scene Stylization on 3D Gaussians
🏠project - MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
⭐code - GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
🏠project - DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting
⭐code
- Depth Estimation
- Leveraging Near-Field Lighting for Monocular Depth Estimation from Endoscopy Videos
⭐code
🏠project - PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation
⭐code - Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry (depth map artifacts)
- Revisit Self-supervision with Local Structure-from-Motion
- DoubleTake: Geometry Guided Depth Estimation
⭐code - Physics-informed Knowledge Transfer for Underwater Monocular Depth Estimation
- FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
- ProDepth: Boosting Self-Supervised Multi-Frame Monocular Depth with Probabilistic Fusion
⭐code - Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation
⭐code - Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
⭐code
⭐code - High-Precision Self-Supervised Monocular Depth Estimation with Rich-Resource Prior
- DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation
⭐code - SEDiff: Structure Extraction for Domain Adaptive Depth Estimation via Denoising Diffusion Models
- Camera Height Doesn't Change: Unsupervised Training for Metric Monocular Road-Scene Depth Estimation
🏠project - GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth
⭐code (depth estimation) - M2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
🏠project - Improving Domain Generalization in Self-Supervised Monocular Depth Estimation via Stabilized Adversarial Training
- Depth Completion
- Deep Cost Ray Fusion for Sparse Depth Video Completion
- OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations
⭐code - AugUndo: Scaling Up Augmentations for Monocular Depth Completion and Estimation
⭐code - Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion
⭐code
- Surface Reconstruction
- Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
🏠project - SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization
⭐code - Sur2f: A Hybrid Representation for High-Quality and Efficient Surface Reconstruction from Multi-view Images
- Improving Neural Surface Reconstruction with Feature Priors from Multi-View Images
- DiffSurf: A Transformer-based Diffusion Model for Generating and Reconstructing 3D Surfaces in Pose
- Rethinking Directional Parameterization in Neural Implicit Surface Reconstruction
- PISR: Polarimetric Neural Implicit Surface Reconstruction for Textureless and Specular Objects
⭐code - Surface Reconstruction for 3D Gaussian Splatting via Local Structural Hints
🏠project - EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding
- GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
🏠project - Improving Neural Surface Reconstruction with Feature Priors from Multi-View Images
⭐code - DynoSurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction
⭐code - Surface-Centric Modeling for High-Fidelity Generalizable Neural Surface Reconstruction
⭐code - Parameterization-driven Neural Surface Reconstruction for Object-oriented Editing in Neural Rendering
🏠project
- 3D Reconstruction
- GSD: View-Guided Gaussian Splatting Diffusion for 3D Reconstruction
🏠project - InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction
⭐code - fMRI-3D: A Comprehensive Dataset for Enhancing fMRI-based 3D Reconstruction
⭐code
🏠project
🏠project - Reconstruction and Simulation of Elastic Objects with Spring-Mass 3D Gaussians
⭐code - GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
⭐code
🏠project - latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
⭐code - MirrorGaussian: Reflecting 3D Gaussians for Reconstructing Mirror Reflections
🏠project - Resolving Scale Ambiguity in Multi-view 3D Reconstruction using Dual-Pixel Sensors
⭐code - 3D Reconstruction of Objects in Hands without Real World 3D Supervision
- Human Hair Reconstruction with Strand-Aligned 3D Gaussians
🏠project - SUP-NeRF: A Streamlined Unification of Pose Estimation and NeRF for Monocular 3D Object Reconstruction
⭐code - MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction(https://github.com/Tangshitao/MVDiffusion_plusplus)
- NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
🏠project - Sketch2Vox: Learning 3D Reconstruction from a Single Monocular Sketch Image
🌻dataset - Analysis-by-Synthesis Transformer for Single-View 3D Reconstruction
⭐code
- 3D Shape
- Synchronous Diffusion for Unsupervised Smooth Non-Rigid 3D Shape Matching
- Transferable 3D Adversarial Shape Completion using Diffusion Models
- Self-supervised Shape Completion via Involution and Implicit Correspondences
- TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation
⭐code
🏠project - Learning Neural Deformation Representation for 4D Dynamic Shape Generation
- AWOL: Analysis WithOut synthesis using Language (3D shape)
- DiscoMatch: Fast Discrete Optimisation for Geometrically Consistent 3D Shape Matching
- Video Reconstruction
- 4D Reconstruction
- 3D Textured Shapes
- Dataset Growth
⭐code - Adaptive Parametric Activation
⭐code - Nonverbal Interaction Detection
⭐code - Situated Instruction Following
🏠project
🏠project - Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging
- Unsupervised Exposure Correction
⭐code - Global Structure-from-Motion Revisited
⭐code - Fast Sprite Decomposition from Animated Graphics
🏠project - Sync from the Sea: Retrieving Alignable Videos from Large-Scale Datasets
🏠project - MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo
- Enhancing Vectorized Map Perception with Historical Rasterized Maps
⭐code - Align before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception
- DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation
⭐code - Weight Conditioning for Smooth Optimization of Neural Networks
- Bones Can't Be Triangles: Accurate and Efficient Vertebrae Keypoint Estimation through Collaborative Error Revision
⭐code - SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
🏠project - VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
⭐code - Global Counterfactual Directions
- Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution
- Computing the Lipschitz constant needed for fast scene recovery from CASSI measurements
- Pseudo-Labelling Should Be Aware of Disguising Channel Activations
- FMBoost: Boosting Latent Diffusion with Flow Matching
⭐code - Holodepth: Programmable Depth-Varying Projection via Computer-Generated Holography
🏠project - Adversarial Diffusion Distillation
- When and How do negative prompts take effect
- Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection
- SCOD: From Heuristics to Theory
- Unsupervised Representation Learning by Balanced Self Attention Matching
⭐code - DualBEV: Unifying Dual View Transformation with Probabilistic Correspondences
⭐code - Linking in Style: Understanding learned features in deep learning models
⭐code - CliffPhys: Camera-based Respiratory Measurement using Clifford Neural Networks
- Synthesizing Environment-Specific People in Photographs
🏠project - Implicit Steganography Beyond the Constraints of Modality
- Energy-Calibrated VAE with Test Time Free Lunch
- Debiasing surgeon: fantastic weights and how to find them
- SparseRadNet: Sparse Perception Neural Network on Subsampled Radar Data
- Learning Where to Look: Self-supervised Viewpoint Selection for Active Localization using Geometrical Information
- Using My Artistic Style? You Must Obtain My Authorization
⭐code - IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception
⭐code - Convex Relaxations for Manifold-Valued Markov Random Fields with Approximation Guarantees
- Adapting to Shifting Correlations with Unlabeled Data Calibration
- On Spectral Properties of Gradient-based Explanation Methods
- O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
- A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation
- Non-Line-of-Sight Estimation of Fast Human Motion with Slow Scanning Imagers
- Image Manipulation Detection With Implicit Neural Representation and Limited Supervision
- Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction
- GOEmbed: Gradient Origin Embeddings for Representation Agnostic 3D Feature Learning
🏠project - AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale
🏠project - Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers
- Learning Multimodal Latent Generative Models with Energy-Based Prior
- Hierarchical Conditioning of Diffusion Models Using Tree-of-Life for Studying Species Evolution
- Learning to Build by Building Your Own Instructions
⭐code - LNL+K: Enhancing Learning with Noisy Labels Through Noise Source Knowledge Integration
⭐code - Deep Online Probability Aggregation Clustering
- Camera Calibration using a Collimator System
⭐code - Asynchronous Bioplausible Neuron for Spiking Neural Networks for Event-Based Vision
- LITA: Language Instructed Temporal-Localization Assistant
⭐code - INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding
🏠project - Elucidating the Hierarchical Nature of Behavior with Masked Autoencoders
⭐code - MetaAT: Active Testing for Label-Efficient Evaluation of Dense Recognition Tasks
- Generalizable Symbolic Optimizer Learning
⭐code - Training A Secure Model against Data-Free Model Extraction
- EraseDraw: Learning to Insert Objects by Erasing Them from Images
- AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation
- Imaging with Confidence: Uncertainty Quantification for High-dimensional Undersampled MR Images
⭐code - Learning to Make Keypoints Sub-Pixel Accurate
⭐code - Explorative Inbetweening of Time and Space
🏠project - Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
⭐code - Instant Uncertainty Calibration of NeRFs Using a Meta-Calibrator
⭐code - Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures
- Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
- Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending
- Augmented Neural Fine-tuning for Efficient Backdoor Purification
- REDIR: Refocus-free Event-based De-occlusion Image Reconstruction
- Comprehensive Attribution: Inherently Explainable Vision Model with Feature Detector
⭐code - Pre-trained Visual Dynamics Representations for Efficient Policy Learning
- MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
🏠project - MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain
🏠project - Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem
⭐code - Hetecooper: Feature Collaboration Graph for Heterogeneous Collaborative Perception
- FARSE-CNN: Fully Asynchronous, Recurrent and Sparse Event-Based CNN
⭐code - Unmasking Bias in Diffusion Model Training
⭐code - Cross-Input Certified Training for Universal Perturbations
- Investigating Style Similarity in Diffusion Models
- Delving into Adversarial Robustness on Document Tampering Localization
⭐code - AMD: Automatic Multi-step Distillation of Large-scale Vision Models
- Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy
⭐code - JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention
⭐code - SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
- Adaptive Annealing for Robust Averaging
- Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations
- Generalizing to Unseen Domains via Text-guided Augmentation
- MO-EMT-NAS: Multi-Objective Continuous Transfer of Architectural Knowledge Between Tasks from Different Datasets
- Learning a Dynamic Privacy-preserving Camera Robust to Inversion Attacks
- MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery
- CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
- Towards Image Ambient Lighting Normalization
- Synthesizing Time-varying BRDFs via Latent Space
- HoloADMM: High-Quality Holographic Complex Field Recovery
- Fundamental Matrix Estimation Using Relative Depths
- MTaDCS: Moving Trace and Feature Density-based Confidence Sample Selection under Label Noise
⭐code - CipherDM: Secure Three-Party Inference for Diffusion Model Sampling
⭐code - Weighted Ensemble Models Are Strong Continual Learners
⭐code - Learning Equilibrium Transformation for Gamut Expansion and Color Restoration
⭐code - Implicit Neural Models to Extract Heart Rate from Video
🏠project - Learning Quantized Adaptive Conditions for Diffusion Models
- High-Fidelity Modeling of Generalizable Wrinkle Deformation
- Efficient Learning of Event-based Dense Representation using Hierarchical Memories with Adaptive Update
- SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow
⭐code - DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation
⭐code - PosterLlama: Bridging Design Ability of Language Model to Content-Aware Layout Generation
- Integration of Global and Local Representations for Fine-grained Cross-modal Alignment
- Veil Privacy on Visual Data: Concealing Privacy for Humans, Unveiling for DNNs
- A high-quality robust diffusion framework for corrupted dataset
⭐code - FRDiff: Feature Reuse for Universal Training-free Acceleration of Diffusion Models
⭐code - Leveraging Imperfect Restoration for Data Availability Attack
⭐code - Oulu Remote-photoplethysmography Physical Domain Attacks Database (ORPDAD)
⭐code - Spiking Wavelet Transformer
⭐code - Hypernetworks for Generalizable BRDF Representation
🏠project - Solving the inverse problem of microscopy deconvolution with a residual Beylkin-Coifman-Rokhlin neural network
- Photon Inhibition for Energy-Efficient Single-Photon Imaging
🏠project - RANRAC: Robust Neural Scene Representations via Random Ray Consensus
🏠project - Characterizing Model Robustness via Natural Input Gradients
- Emerging Property of Masked Token for Effective Pre-training
- SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
- Curved Diffusion: A Generative Model With Optical Geometry Control
🏠project - Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
⭐code - Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures
⭐code - RAP: Retrieval-Augmented Planner for Adaptive Procedure Planning in Instructional Videos
- Rethinking Fast Adversarial Training: A Splitting Technique To Overcome Catastrophic Overfitting
- Optimization-based Uncertainty Attribution Via Learning Informative Perturbations
- CPT-VR: Improving Surface Rendering via Closest Point Transform with View-Reflection Appearance
- Think before Placement: Common Sense Enhanced Transformer for Object Placement
- Efficient Bias Mitigation Without Privileged Information
- Region-Native Visual Tokenization
⭐code - DiffCD: A Symmetric Differentiable Chamfer Distance for Neural Implicit Surface Fitting
⭐code - Efficient Neural Video Representation with Temporally Coherent Modulation
- Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
⭐code - Concise Plane Arrangements for Low-Poly Surface and Volume Modelling
⭐code - ViPer: Visual Personalization of Generative Models via Individual Preference Learning
⭐code - How Far Can a 1-Pixel Camera Go? Solving Vision Tasks using Photoreceptors and Computationally Designed Visual Morphology
- Watching it in Dark: A Target-aware Representation Learning Framework for High-Level Vision Tasks in Low Illumination
⭐code - 3R-INN: How to be climate friendly while consuming/delivering videos
- Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
⭐code - Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge
⭐code - Idling Neurons, Appropriately Lenient Workload During Fine-tuning Leads to Better Generalization
- ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images
- Tokenize Anything via Prompting
⭐code
🤗huggingface - Efficient Diffusion-Driven Corruption Editor for Test-Time Adaptation
⭐code - Long-CLIP: Unlocking the Long-Text Capability of CLIP
⭐code - Dolfin: Diffusion Layout Transformers without Autoencoder
- Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
⭐code - Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
⭐code - Zero-Shot Image Feature Consensus with Deep Functional Maps
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
🏠project - Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks
⭐code - FuseTeacher: Modality-fused Encoders are Strong Vision Supervisors
⭐code - SmartControl: Enhancing ControlNet for Handling Rough Visual Conditions
🏠project - CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems
⭐code - FedRA: A Random Allocation Strategy for Federated Tuning to Unleash the Power of Heterogeneous Clients
⭐code - Uncertainty Calibration with Energy Based Instance-wise Scaling in the Wild Dataset
⭐code - Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging
⭐code - UniFS: Universal Few-shot Instance Perception with Point Representations
⭐code - Combining Generative and Geometry Priors for Wide-Angle Portrait Correction
⭐code - FlashTex: Fast Relightable Mesh Texturing with LightControlNet
🏠project (relighting) - Consistent 3D Line Mapping
⭐code - RSL-BA: Rolling Shutter Line Bundle Adjustment
- Discovering Novel Actions from Open World Egocentric Videos with Object-Grounded Visual Commonsense Reasoning
- EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks
⭐code - PairingNet: A Learning-based Pair-searching and -matching Network for Image Fragments
- Distributed Active Client Selection With Noisy Clients Using Model Association Scores
- Towards a Density Preserving Objective Function for Learning on Point Sets
- Task-Driven Uncertainty Quantification in Inverse Problems via Conformal Prediction
⭐code - SIGMA: Sinkhorn-Guided Masked Video Modeling
🏠project - LiDAR-Event Stereo Fusion with Hallucinations
⭐code
🏠project - Dual-Camera Smooth Zoom on Mobile Phones
⭐code
🏠project - Learning by Aligning 2D Skeleton Sequences and Multi-Modality Fusion
- Agent Attention: On the Integration of Softmax and Linear Attention
⭐code - Deep Feature Surgery: Towards Accurate and Efficient Multi-Exit Networks
⭐code - Grid-Attention: Enhancing Computational Efficiency of Large Vision Models without Fine-Tuning
⭐code - Customized Generation Reimagined: Fidelity and Editability Harmonized
⭐code - Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction (video reconstruction)
- Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views
🏠project - Controlling the World by Sleight of Hand
- Probabilistic Weather Forecasting with Deterministic Guidance-based Diffusion Model
⭐code (probabilistic weather forecasting) - Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures
⭐code - Functional Transform-Based Low-Rank Tensor Factorization for Multi-Dimensional Data Recovery
- G3R: Gradient Guided Generalizable Reconstruction
🏠project - SAIR: Learning Semantic-aware Implicit Representation
- Spectral Subsurface Scattering for Material Classification
- Instance-dependent Noisy-label Learning with Graphical Model Based Noise-rate Estimation
- Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
⭐code - A Direct Approach to Viewing Graph Solvability
- Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
⭐code - Towards Multimodal Sentiment Analysis Debiasing via Bias Purification
- Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context
- From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
⭐code - Quantization-Friendly Winograd Transformations for Convolutional Neural Networks
- LetsMap: Unsupervised Representation Learning for Label-Efficient Semantic BEV Mapping
- M3DBench: Towards Omni 3D Assistant with Interleaved Multi-modal Instructions
- Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction
⭐code - StereoGlue: Joint Feature Matching and Robust Estimation
⭐code - Factorized Diffusion: Perceptual Illusions by Noise Decomposition
🏠project - GIVT: Generative Infinite-Vocabulary Transformers
⭐code - Tiny Models are the Computational Saver for Large Models
- Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy
⭐code - SNeRV: Spectra-preserving Neural Representation for Video
⭐code - COMO: Compact Mapping and Odometry
⭐code - Multi-Sentence Grounding for Long-term Instructional Video
- Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
🏠project - Exact Diffusion Inversion via Bidirectional Integration Approximation
⭐code - McGrids: Monte Carlo-Driven Adaptive Grids for Iso-Surface Extraction
- Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density
⭐code - Dynamic Data Selection for Efficient SSL via Coarse-to-Fine Refinement
- ZeST: Zero-Shot Material Transfer from a Single Image
⭐code - PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion
- SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
🏠project - DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
🏠project - Superpixel-informed Implicit Neural Representation for Multi-Dimensional Data
- Physics-Free Spectrally Multiplexed Photometric Stereo under Unknown Spectral Composition
- Robust Fitting on a Gate Quantum Computer
- Edge-Guided Fusion and Motion Augmentation for Event-Image Stereo
- Mahalanobis Distance-based Multi-view Optimal Transport for Multi-view Crowd Localization
🏠project - On the Vulnerability of Skip Connections to Model Inversion Attacks
⭐code - Taming CLIP for Fine-grained and Structured Visual Understanding of Museum Exhibits
⭐code - GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring
- ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images
- Does Data-Efficient Generalization Exacerbate Bias in Foundation Models?
- InfMAE: A Foundation Model in The Infrared Modality (infrared)
- Teach CLIP to Develop a Number Sense for Ordinal Regression
⭐code - GlobalPointer: Large-Scale Plane Adjustment with Bi-Convex Relaxation
⭐code - ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders
- Scalar Function Topology Divergence: Comparing Topology of 3D Objects
- OneRestore: A Universal Restoration Framework for Composite Degradation
⭐code - RoofDiffusion: Constructing Roofs from Severely Corrupted Point Data via Diffusion
- Binomial Self-compensation for Motion Error in Dynamic 3D Scanning
- Encapsulating Knowledge in One Prompt
⭐code - iMatching: Imperative Correspondence Learning
- An Adaptive Screen-Space Meshing Approach for Normal Integration
- Efficient Pre-training for Localized Instruction Generation of Procedural Videos
⭐code - Shape from Heat Conduction
- Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
- Optimal Transport of Diverse Unsupervised Tasks for Robust Learning from Noisy Few-Shot Data
- Finding Visual Task Vectors
⭐code - Occupancy as Set of Points
⭐code - Learning to Robustly Reconstruct Dynamic Scenes from Low-light Spike Streams
- AID-AppEAL: Automatic Image Dataset and Algorithm for Content Appeal Enhancement and Assessment Labeling
⭐code - Retargeting Visual Data with Deformation Fields
- Delving Deep into Engagement Prediction of Short Videos
⭐code - Temporal-Mapping Photography for Event Cameras
⭐code - Six-Point Method for Multi-Camera Systems with Reduced Solution Space
⭐code - BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion
⭐code - Physical-Based Event Camera Simulator
⭐code - REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices
⭐code - Self-Training Room Layout via Geometry-aware Ray-casting
- Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback
- UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
⭐code - EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
⭐code - Where am I? Scene Retrieval with Language
- Event Camera Data Dense Pre-training
- Unsqueeze [CLS] Bottleneck to Learn Rich Representations
⭐code - VeCLIP: Improving CLIP Training via Visual-enriched Captions
⭐code - Spike-Temporal Latent Representation for Energy-Efficient Event-to-Video Reconstruction
- Catastrophic Overfitting: A Potential Blessing in Disguise
- Diffusion Reward: Learning Rewards via Conditional Video Diffusion
🏠project - Data-to-Model Distillation: Data-Efficient Learning Framework
- Neural graphics texture compression supporting random access
- ReMatching: Low-Resolution Representations for Scalable Shape Correspondence
- EgoPet: Egomotion and Interaction Data from an Animal's Perspective
🏠project - This Probably Looks Exactly Like That: An Invertible Prototypical Network
⭐code - Revisiting Feature Disentanglement Strategy in Diffusion Training and Breaking Conditional Independence Assumption in Sampling
- ArtVLM: Attribute Recognition Through Vision-Based Prefix Language Modeling
⭐code (attribute recognition) - Stream Query Denoising for Vectorized HD-Map Construction
- PartCraft: Crafting Creative Objects by Parts
⭐code - ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
⭐code - Dropout Mixture Low-Rank Adaptation for Visual Parameters-Efficient Fine-Tuning
- UNIC: Universal Classification Models via Multi-teacher Distillation
- Efficient Training of Spiking Neural Networks with Multi-Parallel Implicit Stream Architecture
⭐code (spiking neural networks) - Visual Prompting via Partial Optimal Transport
- E3V-K5: An Authentic Benchmark for Redefining Video-Based Energy Expenditure Estimation
⭐code - Understanding Physical Dynamics with Counterfactual World Modeling
🏠project - 4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
🏠project - Revisiting Calibration of Wide-Angle Radially Symmetric Cameras
⭐code (camera calibration) - STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
⭐code - Synchronization of Projective Transformations
- UniCal: Unified Neural Sensor Calibration
🏠project - Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs
⭐code - Robust Incremental Structure-from-Motion with Hybrid Features
- Any2Point: Empowering Any-modality Transformers for Efficient 3D Understanding
⭐code - CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization
⭐code - Multiscale Graph Texture Network
⭐code - Enhancing Optimization Robustness in 1-bit Neural Networks through Stochastic Sign Descent
⭐code - Domain Reduction Strategy for Non-Line-of-Sight Imaging
⭐code - BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
⭐code - Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
⭐code - Model Stock: All we need is just a few fine-tuned models
⭐code - DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
- DetailSemNet: Elevating Signature Verification through Detail-Semantic Integration
⭐code - SLIM: Spuriousness Mitigation with Minimal Human Annotations
⭐code - Scaling Backwards: Minimal Synthetic Pre-training?
⭐code - On the Evaluation Consistency of Attribution-based Explanations
⭐code - GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation
⭐code - OvSW: Overcoming Silent Weights for Accurate Binary Neural Networks
⭐code - SHINE: Saliency-aware HIerarchical NEgative Ranking for Compositional Temporal Grounding
⭐code - ReGround: Improving Textual and Spatial Grounding at No Cost
🏠project - ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
🏠project - WHAC: World-grounded Humans and Cameras
🏠project - Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy
⭐code - Neural Metamorphosis
🏠project - Light-in-Flight for a World-in-Motion
- Learning with Unmasked Tokens Drives Stronger Vision Learners
⭐code - PSALM: Pixelwise Segmentation with Large Multi-modal Model
⭐code - InsMapper: Exploring Inner-instance Information for Vectorized HD Mapping
⭐code - The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
⭐code - Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction
- Multi-Task Domain Adaptation for Language Grounding with 3D Objects
🏠project - QueryCDR: Query-based Controllable Distortion Rectification Network for Fisheye Images
⭐code (fisheye images) - BAMM: Bidirectional Autoregressive Motion Model
🏠project - Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework
- Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics
- RPBG: Towards Robust Neural Point-based Graphics in the Wild
⭐code - Memory-Efficient Fine-Tuning for Quantized Diffusion Model
⭐code - Towards Architecture-Agnostic Untrained Network Priors for Image Reconstruction with Frequency Regularization
⭐code - Similarity of Neural Architectures using Adversarial Attack Transferability
- NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation
🏠project - Robustness Preserving Fine-tuning using Neuron Importance
- A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures
- Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation
⭐code - FTBC: Forward Temporal Bias Correction for Optimizing ANN-SNN Conversion
- Test-time Model Adaptation for Image Reconstruction Using Self-supervised Adaptive Layers (image reconstruction)
- Unveiling Privacy Risks in Stochastic Neural Networks Training: Effective Image Reconstruction from Gradients (image reconstruction)
- CrossScore: A Multi-View Approach to Image Evaluation and Scoring
- ADMap: Anti-disturbance Framework for Vectorized HD Map Construction
- GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
⭐code - PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
⭐code - ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
⭐code - DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
⭐code - UL-VIO: Ultra-lightweight Visual-Inertial Odometry with Noise Robust Test-time Adaptation (visual-inertial odometry)
- Real-data-driven 2000 FPS Color Video from Mosaicked Chromatic Spikes
🤗huggingface - RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF
🏠project - LaRa: Efficient Large-Baseline Radiance Fields
⭐code - Bi-TTA: Bidirectional Test-Time Adapter for Remote Physiological Measurement
🏠project - ELSE: Efficient Deep Neural Network Inference through Line-based Sparsity Exploration
- Open-World Dynamic Prompt and Continual Visual Representation Learning
- GeoCalib: Learning Single-image Calibration with Geometric Optimization
⭐code - LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation
⭐code - Alignist: CAD-Informed Orientation Distribution Estimation by Fusing Shape and Correspondences
- Weakly-supervised Camera Localization by Ground-to-satellite Image Registration
- Learning Neural Volumetric Pose Features for Camera Localization
🏠project - SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments
⭐code - DECOLLAGE: 3D Detailization by Controllable, Localized, and Learned Geometry Enhancement
⭐code - Event-based Mosaicing Bundle Adjustment
⭐code - Reprojection Errors as Prompts for Efficient Scene Coordinate Regression
- Depth on Demand: Streaming Dense Depth from a Low Frame Rate Active Sensor
- AMEGO: Active Memory from long EGOcentric videos
⭐code - Vista3D: Unravel the 3D Darkside of a Single Image
⭐code - Agglomerative Token Clustering
🏠project - Formula-Supervised Visual-Geometric Pre-training
⭐code - Interpretability-Guided Test-Time Adversarial Defense
⭐code - Towards Model-Agnostic Dataset Condensation by Heterogeneous Models
⭐code - MVPGS: Excavating Multi-view Priors for Gaussian Splatting from Sparse Input Views
⭐code - Intrinsic Single-Image HDR Reconstruction
- Disentangled Generation and Aggregation for Robust Radiance Fields
⭐code - Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection
- Commonly Interesting Images
- Sequential Representation Learning via Static-Dynamic Conditional Disentanglement
- QuasiSim: Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer
⭐code
🏠project - Dataset Distillation by Automatic Training Trajectories
⭐code - Neural Graphics Texture Compression Supporting Random Access
- LookupViT: Compressing visual information to a limited number of tokens
- Bridging the Gap: Studio-like Avatar Creation from a Monocular Phone Capture
🏠project - Generating 3D House Wireframes with Semantics
⭐code
🏠project - Flying with Photons: Rendering Novel Views of Propagating Light
⭐code - Deep Nets with Subsampling Layers Unwittingly Discard Useful Activations at Test-Time
⭐code - MobileNetV4: Universal Models for the Mobile Ecosystem
- Gravity-aligned Rotation Averaging with Circular Regression
⭐code - Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
🏠project - HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
- DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
🏠project - TrajPrompt: Aligning Color Trajectory with Vision-Language Representations
⭐code - DomainFusion: Generalizing To Unseen Domains with Latent Diffusion Models
- Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
⭐code - Multi-Label Cluster Discrimination for Visual Representation Learning
- Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective
- MemBN: Robust Test-Time Adaptation via Batch Norm with Statistics Memory
- SeiT++: Masked Token Modeling Improves Storage-efficient Training
⭐code - MagicEraser: Erasing Any Objects via Semantics-Aware Control
- Reliable Spatial-Temporal Voxels For Multi-Modal Test-Time Adaptation
🏠project - A Cephalometric Landmark Regression Method based on Dual-encoder for High-resolution X-ray Image
⭐code - Resilience of Entropy Model in Distributed Neural Networks
⭐code - GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
🏠project - MotionChain: Conversational Motion Controllers via Multimodal Prompts
- MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
🏠project - Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models
⭐code - Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals
- Tensorial template matching for fast cross-correlation with rotations and its application for tomography
- SelfGeo: Self-supervised and Geodesic-consistent Estimation of Keypoints on Deformable Shapes
⭐code - Explain via Any Concept: Concept Bottleneck Model with Open Vocabulary Concepts
- Motion and Structure from Event-based Normal Flow
🏠project - SENC: Handling Self-collision in Neural Cloth Simulation
- Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams
- Animate Your Motion: Turning Still Images into Dynamic Videos
🏠project - Gaussian Splatting on the Move: Blur and Rolling Shutter Compensation for Natural Camera Motion
⭐code
🏠project - Relightable Neural Actor with Intrinsic Decomposition and Pose Control
🏠project - Layer-Wise Relevance Propagation with Conservation Property for ResNet
🏠project - Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
🏠project - SparseSSP: 3D Subcellular Structure Prediction from Sparse-View Transmitted Light Images
- ViG-Bias: Visually Grounded Bias Discovery and Mitigation
- DOCCI: Descriptions of Connected and Contrasting Images
🏠project - Geometry Fidelity for Spherical Images
- Efficient Inference of Vision Instruction-Following Models with Elastic Cache
⭐code - Mew: Multiplexed Immunofluorescence Image Analysis through an Efficient Multiplex Network
⭐code - Topology-Preserving Downsampling of Binary Images
- Quality Assured: Rethinking Annotation Strategies in Imaging AI
- Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models
🏠project - Data Collection-free Masked Video Modeling
- Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
⭐code - Möbius Transform for Mitigating Perspective Distortions in Representation Learning
🏠project - Foster Adaptivity and Balance in Learning with Noisy Labels
⭐code
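Efficiently solves the noisy-label problem in deep learning without prior knowledge, substantially improving model performance and robustness! - Solving Motion Planning Tasks with a Scalable Generative Model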
⭐code - 4D Contrastive Superflows are Dense 3D Representation Learners
⭐code - Learning to Complement and to Defer to Multiple Users
⭐code - Shedding More Light on Robust Classifiers under the lens of Energy-based Models
⭐code - TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data
⭐code - UMBRAE: Unified Multimodal Brain Decoding
⭐code
🏠project - Trainable Highly-expressive Activation Functions
⭐code - Controllable Navigation Instruction Generation with Chain of Thought Prompting
- Recursive Visual Programming
⭐code - Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation
⭐code - Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems
⭐code - The Gaussian Discriminant Variational Autoencoder (GdVAE): A Self-Explainable Model with Counterfactual Explanations
⭐code
🏠project - HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions
- DataDream: Few-shot Guided Dataset Generation
⭐code - Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models
⭐code
🏠project - Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation
⭐code - FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation
- Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos
⭐code - Deep Diffusion Image Prior for Efficient OOD Adaptation in 3D Inverse Problems
⭐code - Pathformer3D: A 3D Scanpath Transformer for 360° Images
⭐code - Kinetic Typography Diffusion Model
⭐code - PolyRoom: Room-aware Transformer for Floorplan Reconstruction
⭐code - Tree-D Fusion: Simulation-Ready Tree Dataset from Single Images with Diffusion Priors
- Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures
⭐code - Augmented Neural Fine-Tuning for Efficient Backdoor Purification
- Improving Hyperbolic Representations via Gromov-Wasserstein Regularization
- Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
⭐code - Efficient Training with Denoised Neural Weights
⭐code - SpaceJAM: a Lightweight and Regularization-free Method for Fast Joint Alignment of Images
⭐code - Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
⭐code - SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
🏠project - Multi-modal Relation Distillation for Unified 3D Representation Learning
- Continual Learning for Remote Physiological Measurement: Minimize Forgetting and Simplify Inference
⭐code - TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly
- SIGMA: Sinkhorn-Guided Masked Video Modeling
⭐code - Attention Beats Linear for Fast Implicit Neural Representation Generation
⭐code - Text2Place: Affordance-aware Text Guided Human Placement
⭐code - RoadPainter: Points Are Ideal Navigators for Topology transformER
- STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay
⭐code - Differentiable Convex Polyhedra Optimization from Multi-view Images
⭐code - A Diffusion Model for Simulation Ready Coronary Anatomy with Morpho-skeletal Control
⭐code - Power Variable Projection for Initialization-Free Large-Scale Bundle Adjustment
- SINDER: Repairing the Singular Defects of DINOv2
⭐code - SHIC: Shape-Image Correspondences with no Keypoint Supervision
🏠project - Semicalibrated Relative Pose from an Affine Correspondence and Monodepth (semicalibrated relative pose)
- Scalable Group Choreography via Variational Phase Manifold Learning
- Deep Companion Learning: Enhancing Generalization Through Historical Consistency
- Revisit Event Generation Model: Self-Supervised Learning of Event-to-Video Reconstruction with Implicit Neural Representations
⭐code - Neural Surface Detection for Unsigned Distance Fields
- Merging and Splitting Diffusion Paths for Semantically Coherent Panoramas
⭐code - Platypus: A Generalized Specialist Model for Reading Text in Various Forms
⭐code - RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images
⭐code - Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation
- Affine steerers for structured keypoint description
- SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning
⭐code - MMBench: Is Your Multi-modal Model an All-around Player?
⭐code
🏠project - DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
⭐code - PreLAR: World Model Pre-training with Learnable Action Representation
⭐code - Dataset Enhancement with Instance-Level Augmentations
⭐code - Non-parametric Sensor Noise Modeling and Synthesis
- Stripe Observation Guided Inference Cost-free Attention Mechanism
⭐code - Leveraging Hierarchical Feature Sharing for Efficient Dataset Condensation
- Object-Aware NIR-to-Visible Translation
⭐code
🌻dataset
Low-level Vision