OmniNWM: Omniscient Driving Navigation World Models
Abstract
OmniNWM is a unified world model for autonomous driving that generates panoramic videos, encodes actions using Plücker ray-maps, and defines dense rewards based on 3D occupancy, achieving top performance in video generation, control accuracy, and long-horizon stability.
Autonomous driving world models are expected to work effectively across three core dimensions: state, action, and reward. Existing models, however, are typically restricted to limited state modalities, short video sequences, imprecise action control, and a lack of reward awareness. In this paper, we introduce OmniNWM, an omniscient panoramic navigation world model that addresses all three dimensions within a unified framework. For state, OmniNWM jointly generates panoramic videos of RGB, semantics, metric depth, and 3D occupancy; a flexible forcing strategy enables high-quality long-horizon auto-regressive generation. For action, we introduce a normalized panoramic Plücker ray-map representation that encodes input trajectories into pixel-level signals, enabling highly precise and generalizable control over panoramic video generation. For reward, we move beyond learning reward functions with external image-based models and instead leverage the generated 3D occupancy to directly define rule-based dense rewards for driving compliance and safety. Extensive experiments demonstrate that OmniNWM achieves state-of-the-art performance in video generation, control accuracy, and long-horizon stability, while providing a reliable closed-loop evaluation framework through occupancy-grounded rewards. The project page is available at https://github.com/Arlo0o/OmniNWM.
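For intuition, here is a minimal sketch, not the authors' implementation, of how a per-pixel Plücker ray-map can be built for a single pinhole camera from intrinsics K and a camera-to-world pose (R, t), all given as NumPy arrays. The function name and the `scene_scale` normalization constant are illustrative assumptions; a normalized panoramic variant would presumably render such maps for every surround-view camera along the input trajectory and feed them as pixel-level conditioning.

```python
import numpy as np

def plucker_ray_map(K, R, t, height, width, scene_scale=1.0):
    """Per-pixel Plücker coordinates (direction, moment) as an (H, W, 6) map."""
    # Pixel centers in homogeneous image coordinates.
    u, v = np.meshgrid(np.arange(width) + 0.5, np.arange(height) + 0.5)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)              # (H, W, 3)

    # Back-project to camera rays, rotate into the world frame, normalize.
    dirs = (pix @ np.linalg.inv(K).T) @ R.T                       # (H, W, 3)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Camera center in the world frame, scaled by an assumed scene scale.
    origin = np.broadcast_to(t.reshape(1, 1, 3) / scene_scale, dirs.shape)

    # Plücker coordinates per pixel: direction d and moment o x d.
    moment = np.cross(origin, dirs)
    return np.concatenate([dirs, moment], axis=-1)                # (H, W, 6)
```

Because every pixel carries its own ray, this kind of conditioning is pixel-aligned with the generated panoramic frames, which is consistent with the paper's claim of precise, generalizable trajectory control.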
Community

OmniNWM addresses three core dimensions of autonomous driving world models:
📊 State: Joint generation of panoramic RGB, semantic, metric depth, and 3D occupancy videos
🎮 Action: Precise panoramic camera control via normalized Plücker ray-maps
🏆 Reward: Integrated occupancy-based dense rewards for driving compliance and safety (an illustrative sketch follows this list)
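To make the reward item above concrete, the sketch below shows how rule-based dense rewards could be derived from a generated semantic occupancy grid. The class ids, the ground-slice assumption, and the rule weights are illustrative assumptions, not the paper's actual reward definition; they simply mirror the compliance and safety criteria described in the abstract.

```python
import numpy as np

# Hypothetical semantic class ids; the paper's label set is not specified here.
DRIVABLE, OBSTACLE = 1, 2

def occupancy_reward(occ, ego_voxels, w_collision=1.0, w_offroad=0.5):
    """Rule-based dense reward from a semantic occupancy grid.

    occ        : (X, Y, Z) integer array of semantic labels per voxel.
    ego_voxels : (N, 3) integer voxel indices covered by the ego footprint.
    """
    ix, iy, iz = ego_voxels.T
    labels = occ[ix, iy, iz]

    # Safety rule: penalize overlap between the ego footprint and obstacles.
    collision = np.mean(labels == OBSTACLE)

    # Compliance rule: penalize ground voxels under the ego that are not drivable.
    offroad = np.mean(occ[ix, iy, 0] != DRIVABLE)   # assume z = 0 is the ground slice

    return -(w_collision * collision + w_offroad * offroad)
```

Evaluating such rules on the generated occupancy at every timestep yields a dense, per-frame reward signal without training an external image-based reward model.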
This is an automated message from the Librarian Bot. The following papers, similar to this one, were recommended by the Semantic Scholar API:
- A Comprehensive Survey on World Models for Embodied AI (2025)
- OmniScene: Attention-Augmented Multimodal 4D Scene Understanding for Autonomous Driving (2025)
- RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation (2025)
- From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors (2025)
- OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving (2025)
- VDRive: Leveraging Reinforced VLA and Diffusion Policy for End-to-end Autonomous Driving (2025)
- iMoWM: Taming Interactive Multi-Modal World Model for Robotic Manipulation (2025)
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend