Shape of Motion: 4D Reconstruction from a Single Video
Abstract
A monocular dynamic reconstruction method using SE(3) motion bases and data-driven priors achieves state-of-the-art performance in 3D/2D motion estimation and novel view synthesis.
Monocular dynamic reconstruction is a challenging and long-standing vision problem due to the highly ill-posed nature of the task. Existing approaches are limited in that they either depend on templates, are effective only in quasi-static scenes, or fail to model 3D motion explicitly. In this work, we introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion, from casually captured monocular videos. We tackle the under-constrained nature of the problem with two key insights. First, we exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE(3) motion bases. Each point's motion is expressed as a linear combination of these bases, facilitating soft decomposition of the scene into multiple rigidly moving groups. Second, we utilize a comprehensive set of data-driven priors, including monocular depth maps and long-range 2D tracks, and devise a method to effectively consolidate these noisy supervisory signals, resulting in a globally consistent representation of the dynamic scene. Experiments show that our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes. Project Page: https://shape-of-motion.github.io/
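To make the motion-basis idea concrete, here is a minimal sketch of how one point's motion could be formed as a weighted combination of a few shared rigid transforms. It is not the authors' implementation. The function names (`blend_bases`, `move_point`) are hypothetical. Representing rotations as unit quaternions and blending them with a normalized weighted sum is an assumption made for illustration, since the abstract does not say how the bases are parameterized or combined.

```python
import math


def normalize(q):
    """Scale a quaternion (w, x, y, z) to unit length."""
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)


def rotate(q, p):
    """Rotate 3D point p by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    px, py, pz = p
    # t = 2 * cross(q_vec, p)
    tx = 2.0 * (y * pz - z * py)
    ty = 2.0 * (z * px - x * pz)
    tz = 2.0 * (x * py - y * px)
    # p' = p + w * t + cross(q_vec, t)
    return (
        px + w * tx + (y * tz - z * ty),
        py + w * ty + (z * tx - x * tz),
        pz + w * tz + (x * ty - y * tx),
    )


def blend_bases(quats, translations, weights):
    """Blend B rigid motion bases into one rigid transform for a single point.

    Assumption: all quaternions lie in the same hemisphere (q and -q encode
    the same rotation, so mixed signs would distort the weighted sum).
    """
    total = sum(weights)
    ws = [w / total for w in weights]
    q = normalize(
        tuple(sum(w * qb[i] for w, qb in zip(ws, quats)) for i in range(4))
    )
    t = tuple(sum(w * tb[i] for w, tb in zip(ws, translations)) for i in range(3))
    return q, t


def move_point(p, quats, translations, weights):
    """Apply the blended transform to point p: rotate, then translate."""
    q, t = blend_bases(quats, translations, weights)
    r = rotate(q, p)
    return tuple(r[i] + t[i] for i in range(3))
```

In this sketch, a point whose weights concentrate on one basis moves rigidly with that basis, while points with mixed weights interpolate between groups, which is one way to read the "soft decomposition" described above.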
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- VDG: Vision-Only Dynamic Gaussian for Driving Simulation (2024)
- Dynamic Gaussian Marbles for Novel View Synthesis of Casual Monocular Videos (2024)
- MoDGS: Dynamic Gaussian Splatting from Causually-captured Monocular Videos (2024)
- MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds (2024)
- Splatter a Video: Video Gaussian Representation for Versatile Processing (2024)