
Showing 1–1 of 1 results for author: Cao-Dinh, D

  1. arXiv:2507.00669  [pdf, ps, other]

    cs.LG cs.AI cs.CV cs.RO

    Audio-3DVG: Unified Audio - Point Cloud Fusion for 3D Visual Grounding

    Authors: Duc Cao-Dinh, Khai Le-Duc, Anh Dao, Bach Phan Tat, Chris Ngo, Duy M. H. Nguyen, Nguyen X. Khanh, Thanh Nguyen-Tang

    Abstract: 3D Visual Grounding (3DVG) involves localizing target objects in 3D point clouds based on natural language. While prior work has made strides using textual descriptions, leveraging spoken language, known as Audio-based 3D Visual Grounding, remains underexplored and challenging. Motivated by advances in automatic speech recognition (ASR) and speech representation learning, we propose Audio-3DVG, a si…

    Submitted 1 July, 2025; originally announced July 2025.

    Comments: Work in progress, 42 pages