arxiv:2509.05296

WinT3R: Window-Based Streaming Reconstruction with Camera Token Pool

Published on Sep 5

Submitted by lizizun on Sep 8


Authors:

Jianjun Zhou, Haoyu Guo, Haoyi Zhu, Tong He

Abstract

AI-generated summary: WinT3R is a feed-forward reconstruction model that uses a sliding window mechanism and a global camera token pool to achieve high-quality camera pose estimation at real-time speed.

We present WinT3R, a feed-forward reconstruction model capable of online prediction of precise camera poses and high-quality point maps. Previous methods suffer from a trade-off between reconstruction quality and real-time performance. To address this, we first introduce a sliding window mechanism that ensures sufficient information exchange among the frames within each window, thereby improving the quality of geometric predictions without heavy computation. In addition, we use a compact representation of cameras and maintain a global camera token pool, which improves the reliability of camera pose estimation without sacrificing efficiency. These designs enable WinT3R to achieve state-of-the-art performance in online reconstruction quality, camera pose estimation, and reconstruction speed, as validated by extensive experiments on diverse datasets. Code and model are publicly available at https://github.com/LiZizun/WinT3R.
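The abstract names two mechanisms without giving details: frames are processed in sliding windows, and a compact per-frame camera representation is kept in a global pool. The sketch below is only a toy illustration of that control flow, not the authors' implementation. The function names, the `window_size` and `stride` parameters, and the `encode_camera` callback are all hypothetical, and the real model's attention layers and token format are not represented here.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def sliding_windows(num_frames: int, window_size: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) frame ranges that cover the stream in order."""
    # Hypothetical constraint: overlapping or adjacent windows only, so no frame is skipped.
    if not 0 < stride <= window_size:
        raise ValueError("need 0 < stride <= window_size so no frame is skipped")
    start = 0
    while start < num_frames:
        end = min(start + window_size, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += stride


def stream_camera_tokens(
    frames: Sequence[str],
    window_size: int,
    stride: int,
    encode_camera: Callable[[str], str],  # hypothetical stand-in for a per-frame camera encoder
) -> List[str]:
    """Walk the stream window by window and keep one compact token per frame in a global pool."""
    pool: List[str] = []  # the "global pool": grows by one token per frame, persists across windows
    seen = 0              # number of frames whose token is already in the pool
    for start, end in sliding_windows(len(frames), window_size, stride):
        # In a real model, frames[start:end] would exchange information here.
        # Only frames that have not been seen before contribute a new token.
        for frame in frames[max(start, seen):end]:
            pool.append(encode_camera(frame))
        seen = end
    return pool
```

Under these assumptions, calling `list(sliding_windows(10, 4, 2))` gives the ranges `(0, 4)`, `(2, 6)`, `(4, 8)`, `(6, 10)`. The pool ends up with exactly one entry per frame regardless of how much the windows overlap.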
