arxiv:2606.28276

SimFoundry: Modular and Automated Scene Generation for Policy Learning and Evaluation

Published on Jun 26

· Submitted by

taesiri on Jun 29

NVIDIA

Upvote

Authors:

Abstract

SimFoundry enables zero-shot real-world robot policy training through automated simulation construction and diverse scene variations that improve generalization and performance prediction.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Training and evaluating robot policies in the real world is costly and difficult to scale. We introduce SimFoundry, a modular and automated system for zero-shot real-to-sim scene construction from a video. SimFoundry generates sim-ready digital twins and supports object, scene, and task editing, enabling the automated generation of diverse digital cousins: affordance-preserving variations of reconstructed real-world scenes. Policies trained on SimFoundry data transfer zero-shot to challenging real tasks involving multi-step manipulation, articulated object interaction, and bimanual interaction, and its digital cousins (variations of the original scene, objects, and tasks) facilitate generalization to new real-world conditions. Across 7 manipulation tasks and 5 policy architectures, SimFoundry simulation evaluations strongly predict real-world performance, with mean Pearson correlation 0.911 and mean maximum ranking violation 0.018. When evaluating sim-trained policies zero-shot in the real world, policies trained with object, scene, and task cousins in simulation show average task success rate improvements of 17%, 21%, and 40%, respectively. Additional details at https://research.nvidia.com/labs/gear/simfoundry/ .