SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

Zhou, Gengze; Hong, Yicong; Wang, Zun; Zhao, Chongyang; Bansal, Mohit; Wu, Qi

arXiv:2412.05552 (cs)

[Submitted on 7 Dec 2024]

Abstract: Research on instruction-guided visual navigation can be broadly divided into high-level category-specific search and low-level language-guided navigation, depending on the granularity of the language instruction. The former emphasizes the exploration process, while the latter concentrates on following detailed textual commands. Despite the differing focuses of these tasks, the underlying requirements of interpreting instructions, comprehending the surroundings, and inferring action decisions remain consistent. This paper consolidates diverse navigation tasks into a unified and generic framework. We investigate the core difficulties of sharing general knowledge and exploiting task-specific capabilities in learning navigation, and propose a novel State-Adaptive Mixture of Experts (SAME) model that enables an agent to infer decisions based on different-granularity language and dynamic observations. Powered by SAME, we present a versatile agent that addresses seven navigation tasks simultaneously and either outperforms or performs comparably to task-specific agents.

Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2412.05552 [cs.CV]
	(or arXiv:2412.05552v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2412.05552

Submission history

From: Gengze Zhou
[v1] Sat, 7 Dec 2024 06:12:53 UTC (3,241 KB)
