MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation

Chharia, Aviral; Gou, Wenbo; Dong, Haoye

doi:10.1109/CVPR52734.2025.01082

arXiv:2509.00649 (cs)

[Submitted on 31 Aug 2025]

Title: MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation

Authors: Aviral Chharia, Wenbo Gou, Haoye Dong

Abstract: While significant progress has been made in single-view 3D human pose estimation, multi-view 3D human pose estimation remains challenging, particularly in terms of generalizing to new camera configurations. Existing attention-based transformers often struggle to accurately model the spatial arrangement of keypoints, especially in occluded scenarios. Additionally, they tend to overfit specific camera arrangements and visual scenes from training data, resulting in substantial performance drops in new settings. In this study, we introduce a novel Multi-View State Space Modeling framework, named MV-SSM, for robustly estimating 3D human keypoints. We explicitly model the joint spatial sequence at two distinct levels: the feature level from multi-view images and the person keypoint level. We propose a Projective State Space (PSS) block to learn a generalized representation of joint spatial arrangements using state space modeling. Moreover, we modify Mamba's traditional scanning into an effective Grid Token-guided Bidirectional Scanning (GTBS), which is integral to the PSS block. Multiple experiments demonstrate that MV-SSM achieves strong generalization, outperforming state-of-the-art methods: +10.8 on AP25 (+24%) on the challenging three-camera setting in CMU Panoptic, +7.0 on AP25 (+13%) on varying camera arrangements, and +15.3 PCP (+38%) on Campus A1 in cross-dataset evaluations. Project Website: this https URL
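To make the idea of bidirectional scanning with a state space model concrete, here is a minimal sketch. It is not the authors' PSS block or GTBS. It assumes a plain linear, time-invariant recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t, whereas Mamba uses input-dependent (selective) parameters, and it does not model the grid-token guidance, which the abstract does not specify. The function names, matrix shapes, and sizes are all hypothetical and chosen only for illustration.

```python
import numpy as np

def ssm_scan(x, A, B, C):
    """Run h_t = A h_{t-1} + B x_t, y_t = C h_t over a (T, d_in) token sequence."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t   # state update carries information from earlier tokens
        ys.append(C @ h)      # readout for the current token
    return np.stack(ys)

def bidirectional_scan(x, A, B, C):
    """Sum a forward scan and a backward scan so each token sees both directions.

    Assumption: both directions share the same (A, B, C); a real model
    may well use separate parameters per direction.
    """
    fwd = ssm_scan(x, A, B, C)
    bwd = ssm_scan(x[::-1], A, B, C)[::-1]  # scan reversed input, then re-align
    return fwd + bwd

# Hypothetical sizes: 5 tokens (e.g. keypoints), 3 features, 4 state dimensions.
rng = np.random.default_rng(0)
T, d_in, d_state = 5, 3, 4
A = 0.9 * np.eye(d_state)               # stable decay, chosen for illustration
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_in, d_state))
tokens = rng.normal(size=(T, d_in))

out = bidirectional_scan(tokens, A, B, C)
print(out.shape)
```

A single forward scan is causal, so the first token's output depends only on the first token. Adding the backward scan lets every position aggregate context from the whole sequence, which is the general motivation for bidirectional scanning over non-temporal token orders such as spatial joints.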

Comments:	CVPR 2025; Project Website: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Cite as:	arXiv:2509.00649 [cs.CV]
	(or arXiv:2509.00649v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2509.00649
Journal reference:	CVPR, Nashville, TN, USA, 2025, pp. 11590-11599
Related DOI:	https://doi.org/10.1109/CVPR52734.2025.01082

Submission history

From: Aviral Chharia
[v1] Sun, 31 Aug 2025 00:57:41 UTC (4,190 KB)
