Showing 1–1 of 1 results for author: Hersek, S

Search v0.5.6 released 2020-02-24

arXiv:2506.00273

eess.AS cs.LG cs.SD

SoundSculpt: Direction and Semantics Driven Ambisonic Target Sound Extraction

Authors: Tuochao Chen, D Shin, Hakan Erdogan, Sinan Hersek

Abstract: This paper introduces SoundSculpt, a neural network designed to extract target sound fields from ambisonic recordings. SoundSculpt employs an ambisonic-in-ambisonic-out architecture and is conditioned on both spatial information (e.g., target direction obtained by pointing at an immersive video) and semantic embeddings (e.g., derived from image segmentation and captioning). Trained and evaluated on synthetic and real ambisonic mixtures, SoundSculpt demonstrates superior performance compared to various signal processing baselines. Our results further reveal that while spatial conditioning alone can be effective, the combination of spatial and semantic information is beneficial in scenarios where there are secondary sound sources spatially close to the target. Additionally, we compare two different semantic embeddings derived from a text description of the target sound using text encoders.
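The abstract's point that direction alone struggles when an interfering source sits close to the target can be illustrated with a classic purely spatial filter. The sketch below is not SoundSculpt and not necessarily one of the paper's baselines; it assumes first-order ambisonics in ACN channel order (W, Y, Z, X) with SN3D normalization, and the helper names `foa_gains`, `encode`, and `cardioid_beam` are hypothetical. It encodes mono sources as plane waves and steers a virtual first-order cardioid toward a pointed-at direction. Unlike the ambisonic-out model described above, this beam produces a single mono channel.

```python
import math

def foa_gains(azimuth_deg, elevation_deg):
    """First-order real spherical-harmonic gains.

    Assumed convention: ACN channel order (W, Y, Z, X), SN3D normalization,
    for which all first-order gains have unit peak amplitude.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return [
        1.0,                          # W (ACN 0): omnidirectional
        math.sin(az) * math.cos(el),  # Y (ACN 1): left-right
        math.sin(el),                 # Z (ACN 2): up-down
        math.cos(az) * math.cos(el),  # X (ACN 3): front-back
    ]

def encode(signal, azimuth_deg, elevation_deg):
    """Encode a mono signal as a plane wave into 4 FOA channels."""
    g = foa_gains(azimuth_deg, elevation_deg)
    return [[gi * s for s in signal] for gi in g]

def cardioid_beam(foa, azimuth_deg, elevation_deg):
    """Virtual first-order cardioid steered toward the given direction.

    For a plane wave from unit direction u, the output gain is
    0.5 * (1 + u . d), where d is the steering direction: 1 on-axis,
    0 from directly behind.
    """
    d = foa_gains(azimuth_deg, elevation_deg)
    out = []
    for t in range(len(foa[0])):
        w = foa[0][t]
        # Project the directional channels (Y, Z, X) onto the steering direction.
        v = sum(d[c] * foa[c][t] for c in (1, 2, 3))
        out.append(0.5 * (w + v))
    return out

if __name__ == "__main__":
    target = [1.0, -0.5, 0.25]
    interferer = [0.8, 0.8, 0.8]
    # Target at 30 degrees azimuth, interferer directly opposite at 210 degrees.
    mix = [[a + b for a, b in zip(ca, cb)]
           for ca, cb in zip(encode(target, 30, 0), encode(interferer, 210, 0))]
    print(cardioid_beam(mix, 30, 0))  # recovers the target up to rounding
```

With the interferer opposite the target, the beam rejects it completely. Moving the interferer to 45 degrees, only 15 degrees from the target, leaves its gain at 0.5 × (1 + cos 15°), about 0.98, so it passes almost unattenuated. This is the kind of spatially close case where the abstract reports that adding semantic conditioning helps.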

Submitted 30 May, 2025; originally announced June 2025.