Robust Single-Stage Fully Sparse 3D Object Detection via Detachable Latent Diffusion

Qu, Wentao; Mei, Guofeng; Wang, Jing; Wu, Yujiao; Huang, Xiaoshui; Xiao, Liang

arXiv:2508.03252 (cs)

[Submitted on 5 Aug 2025 (v1), last revised 27 Aug 2025 (this version, v2)]

Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have shown success in robust 3D object detection. Existing methods often rely on score matching over 3D boxes or on pre-trained diffusion priors. However, they typically require multi-step iteration at inference, which limits efficiency. To address this, we propose RSDNet, a Robust single-stage fully Sparse 3D object Detection Network with a Detachable Latent Framework (DLF) of DDPMs. Specifically, RSDNet learns the denoising process in latent feature spaces through lightweight denoising networks that act like multi-level denoising autoencoders (DAEs). This enables RSDNet to model scene distributions under multi-level perturbations, yielding robust and reliable detection. We also reformulate the noising and denoising mechanisms of DDPMs so that DLF can construct multi-type and multi-level noise samples and targets, which enhances RSDNet's robustness to multiple perturbations. Furthermore, we introduce semantic-geometric conditional guidance to perceive object boundaries and shapes, alleviating the center feature missing problem in sparse representations and enabling RSDNet to operate in a fully sparse detection pipeline. Finally, the detachable denoising network design of DLF lets RSDNet perform single-step detection at inference, further improving efficiency. Extensive experiments on public benchmarks show that RSDNet can outperform existing methods, achieving state-of-the-art detection.
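
The idea of a denoiser that is trained in latent space and then detached at inference can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard DDPM forward process with a linear beta schedule rather than the reformulated noising described in the abstract, and every name in it (`linear_alpha_bar`, `add_noise`, `Detector`, the identity denoiser) is a hypothetical stand-in chosen for illustration.

```python
import math
import random


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def add_noise(latent, t, alpha_bar, rng):
    """Standard DDPM forward step: x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * eps."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in latent]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(latent, eps)]


class Detector:
    """Hypothetical detector whose denoiser is only needed during training."""

    def __init__(self, denoiser=None):
        self.denoiser = denoiser  # assumption: may be None once detached

    def forward(self, latent):
        # Stand-in for a detection head: one pass, no iterative sampling.
        return list(latent)

    def training_loss(self, latent, t, alpha_bar, rng):
        # DAE-style objective (an assumption): recover the clean latent
        # from a copy perturbed at noise level t.
        noisy = add_noise(latent, t, alpha_bar, rng)
        restored = self.denoiser(noisy, t)
        return sum((r - x) ** 2 for r, x in zip(restored, latent)) / len(latent)


if __name__ == "__main__":
    rng = random.Random(0)
    schedule = linear_alpha_bar(1000)
    clean = [0.5, -1.0, 2.0]

    # Training: the denoiser is attached (an identity function stands in here).
    trainer = Detector(denoiser=lambda noisy, t: noisy)
    print("loss at t=10: ", trainer.training_loss(clean, 10, schedule, rng))
    print("loss at t=900:", trainer.training_loss(clean, 900, schedule, rng))

    # Inference: the denoiser is detached and detection is a single pass.
    print("detections:", Detector().forward(clean))
```

In this sketch the noise level `t` plays the role of the perturbation level, and dropping the denoiser leaves `forward` unchanged, which is the property the abstract attributes to the detachable design.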

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2508.03252 [cs.CV]
	(or arXiv:2508.03252v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2508.03252

Submission history

From: Wentao Qu
[v1] Tue, 5 Aug 2025 09:30:39 UTC (7,849 KB)
[v2] Wed, 27 Aug 2025 11:39:11 UTC (7,850 KB)
