MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders

Lin, Baijiong; Jiang, Weisen; Chen, Pengguang; Liu, Shu; Chen, Ying-Cong

arXiv:2408.15101 (cs)

[Submitted on 27 Aug 2024 (v1), last revised 26 Jul 2025 (this version, v2)]

Abstract: Multi-task dense scene understanding, which trains a model for multiple dense prediction tasks, has a wide range of application scenarios. Capturing long-range dependencies and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba++, a novel architecture for multi-task scene understanding featuring a Mamba-based decoder. It contains two types of core blocks: the self-task Mamba (STM) block and the cross-task Mamba (CTM) block. STM handles long-range dependencies by leveraging state-space models, while CTM explicitly models task interactions to facilitate information exchange across tasks. We design two types of CTM blocks, namely F-CTM and S-CTM, to enhance cross-task interaction from the feature and semantic perspectives, respectively. Extensive experiments on the NYUDv2, PASCAL-Context, and Cityscapes datasets demonstrate the superior performance of MTMamba++ over CNN-based, Transformer-based, and diffusion-based methods while maintaining high computational efficiency. The code is available at this https URL.
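
The abstract attributes STM's long-range modeling to state-space models. As a rough illustration only, the generic discrete linear state-space recurrence can be sketched as below. This is not the MTMamba++ code: the function name `ssm_scan` and the parameters `a`, `b`, `c` are hypothetical, the state matrix is assumed diagonal, and Mamba's input-dependent (selective) parameters are deliberately left out.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal linear state-space recurrence over a 1-D sequence.

    Illustrative assumption, not the paper's block:
        h_t = a * h_{t-1} + b * x_t   (elementwise over state dimensions)
        y_t = sum(c * h_t)
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("a, b and c must have the same length")

    h = [0.0] * len(a)  # hidden state starts at zero
    y: List[float] = []
    for x_t in x:
        # Each state dimension decays by a_i and accumulates the new input.
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        # The output is a linear readout of the current state.
        y.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return y


# Impulse response of a one-dimensional state with decay 0.5.
print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
# prints [1.0, 0.5, 0.25, 0.125]
```

Because the state carries a decayed summary of all earlier inputs, an input at the first position still influences the output at every later position, which is the sense in which such a recurrence can capture long-range dependency at a cost linear in sequence length.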

Comments:	Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2408.15101 [cs.CV]
	(or arXiv:2408.15101v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2408.15101

Submission history

From: Baijiong Lin
[v1] Tue, 27 Aug 2024 14:36:46 UTC (15,194 KB)
[v2] Sat, 26 Jul 2025 14:53:04 UTC (18,844 KB)
