OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction

Jin, Bu; Gu, Songen; Hu, Xiaotao; Zheng, Yupeng; Guo, Xiaoyang; Zhang, Qian; Long, Xiaoxiao; Yin, Wei

Abstract:In this paper, we propose OccTENS, a generative occupancy world model that enables controllable, high-fidelity long-term occupancy generation while maintaining computational efficiency. Different from visual generation, the occupancy world model must capture the fine-grained 3D geometry and dynamic evolution of the 3D scenes, posing great challenges for the generative models. Recent approaches based on autoregression (AR) have demonstrated the potential to predict vehicle movement and future occupancy scenes simultaneously from historical observations, but they typically suffer from \textbf{inefficiency}, \textbf{temporal degradation} in long-term generation and \textbf{lack of controllability}. To holistically address these issues, we reformulate the occupancy world model as a temporal next-scale prediction (TENS) task, which decomposes the temporal sequence modeling problem into the modeling of spatial scale-by-scale generation and temporal scene-by-scene prediction. With a \textbf{TensFormer}, OccTENS can effectively manage the temporal causality and spatial relationships of occupancy sequences in a flexible and scalable way. To enhance the pose controllability, we further propose a holistic pose aggregation strategy, which features a unified sequence modeling for occupancy and ego-motion. Experiments show that OccTENS outperforms the state-of-the-art method with both higher occupancy quality and faster inference time.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2509.03887 [cs.CV]
	(or arXiv:2509.03887v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2509.03887

Computer Science > Computer Vision and Pattern Recognition

Title:OccTENS: 3D Occupancy World Model via Temporal Next-Scale Prediction

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators