Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation

Zhou, Hang; Xu, Xudong; Lin, Dahua; Wang, Xiaogang; Liu, Ziwei


arXiv:2007.09902 (cs)

[Submitted on 20 Jul 2020]


Abstract: Stereophonic audio is an indispensable ingredient for enhancing the human auditory experience. Recent research has explored using visual information as guidance to generate binaural or ambisonic audio from mono audio under stereo supervision. However, this fully supervised paradigm suffers from an inherent drawback: recording stereophonic audio usually requires delicate devices that are too expensive to be widely accessible. To overcome this challenge, we propose to leverage the vastly available mono data to facilitate the generation of stereophonic audio. Our key observation is that the task of visually indicated audio separation also maps independent audio sources to their corresponding visual positions, which shares a similar objective with stereophonic audio generation. We integrate both stereo generation and source separation into a unified framework, Sep-Stereo, by considering source separation as a particular type of audio spatialization. Specifically, a novel associative pyramid network architecture is designed for audio-visual feature fusion. Extensive experiments demonstrate that our framework can improve stereophonic audio generation results while performing accurate sound separation with a shared backbone.
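
One way to read the idea of treating source separation as a particular type of audio spatialization is sketched below. This is an illustrative assumption, not the authors' pipeline: two independent mono recordings are placed entirely in the left and right channels, so that separating their mix is the same as recovering the two channels of an extreme stereo signal. The function names (`sine`, `pseudo_stereo_from_mono`, `recover_channels`) and the use of a left-minus-right difference signal are hypothetical choices made for this sketch only.

```python
import math


def sine(freq_hz, n_samples, sample_rate=16000):
    """Generate a toy mono signal (a pure tone) as a list of floats."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


def pseudo_stereo_from_mono(source_a, source_b):
    """Build a pseudo-stereo training triple from two mono sources.

    Assumption for this sketch: source_a sits entirely in the left channel
    and source_b entirely in the right channel, so the mono mix is their sum.
    """
    if len(source_a) != len(source_b):
        raise ValueError("both sources must have the same number of samples")
    left = list(source_a)
    right = list(source_b)
    mono_mix = [a + b for a, b in zip(left, right)]
    return mono_mix, left, right


def recover_channels(mono_mix, diff):
    """Recover left/right channels from the mono mix and a left-minus-right signal."""
    left = [(m + d) / 2 for m, d in zip(mono_mix, diff)]
    right = [(m - d) / 2 for m, d in zip(mono_mix, diff)]
    return left, right


# Toy usage: two tones stand in for two separately recorded mono sources.
source_a = sine(440, 8)
source_b = sine(660, 8)
mono_mix, left, right = pseudo_stereo_from_mono(source_a, source_b)

# In a learned system, this difference would be what a model predicts from
# the mono mix plus visual cues; here it is computed from the ground truth.
diff = [l - r for l, r in zip(left, right)]
rec_left, rec_right = recover_channels(mono_mix, diff)
```

In this toy setting, recovering `rec_left` and `rec_right` from the mix is exactly separating `source_a` from `source_b`, which is why abundant mono recordings could plausibly serve as extra supervision for a stereo generation model.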

Comments:	To appear in Proceedings of the European Conference on Computer Vision (ECCV), 2020. Code, models, and video results are available on our webpage: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2007.09902 [cs.CV]
	(or arXiv:2007.09902v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2007.09902

Submission history

From: Hang Zhou
[v1] Mon, 20 Jul 2020 06:20:26 UTC (3,596 KB)
