
FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos

* Equal contribution; project lead; corresponding authors

TL;DR: FoundationMotion is an automated data-curation pipeline for constructing large-scale motion-understanding video datasets. Models trained on these datasets show strong improvements in motion understanding.

Introducing FoundationMotion

We Design a Fully Automated Data Curation Pipeline that Constructs Large-Scale Motion Datasets

Our dataset contains raw videos, processed videos, captions, and question-answer pairs.
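As an illustrative sketch of how one such entry might be organized (the field names below are hypothetical, not the dataset's actual schema), each sample bundles the raw clip, its processed counterpart, an auto-generated caption, and question-answer pairs:

```python
from dataclasses import dataclass, field
from typing import List, Dict

# Hypothetical sketch of one dataset entry; field names are illustrative only.
@dataclass
class MotionSample:
    raw_video: str        # path to the source clip
    processed_video: str  # path to the processed/annotated clip
    caption: str          # auto-generated motion caption
    qa_pairs: List[Dict[str, str]] = field(default_factory=list)

sample = MotionSample(
    raw_video="clip_0001_raw.mp4",
    processed_video="clip_0001_processed.mp4",
    caption="A robot arm flips a slice of toast with its right gripper.",
    qa_pairs=[{"question": "Which hand is the robot using to flip the toast?",
               "answer": "Right hand"}],
)
```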

Raw Video

Processed Video

We Introduce a Model for Improved Motion Understanding

Cooking Robot

Question

Which hand is the robot using to flip the toast?

✓ Answer: Right hand

FoundationMotion Model: NVILA-15B fine-tuned on the dataset produced by our proposed FoundationMotion automated data-collection pipeline.

Autonomous Vehicle

Question

What is the primary driving behavior demonstrated by the ego vehicle in the video?

✓ Answer: The ego vehicle changes lanes at night to avoid a car with hazard lights ahead


Performance Comparison Across Benchmarks

Figure: Bar chart comparing much larger models, baseline models, and our model fine-tuned with FoundationMotion across benchmarks. Green bars with ↑ indicators show improvements of our FoundationMotion model over the baseline.

Citation

@misc{gan2025foundationmotion,
    title={FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos}, 
    author={Yulu Gan and Ligeng Zhu and Dandan Shan and Baifeng Shi and Hongxu Yin and Boris Ivanovic and Song Han and Trevor Darrell and Jitendra Malik and Marco Pavone and Boyi Li},
    year={2025},
    eprint={2512.10927},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2512.10927}, 
}