Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality

Rana, Aakanksha; Ozcinar, Cagri; Smolic, Aljoscha

doi:10.1109/ICASSP.2019.8683318

arXiv:1908.06752 (cs)

[Submitted on 16 Aug 2019]

Abstract: Ambisonics, i.e., full-sphere surround sound, is essential alongside 360-degree visual content for a realistic virtual reality (VR) experience. While the capture of 360-degree visual content has gained a tremendous boost recently, estimating the corresponding spatial sound remains challenging because it requires either sound-field microphones or information about the sound-source locations. In this paper, we introduce the novel problem of generating Ambisonics for 360-degree videos using the audio-visual cue. To this end, we first introduce a novel 360-degree audio-visual dataset of 265 videos with annotated sound-source locations. Second, we design a pipeline for automatic Ambisonics estimation. Benefiting from deep learning-based audio-visual feature-embedding and prediction modules, our pipeline estimates the 3D sound-source locations and then uses these locations to encode the audio into B-format. To benchmark our dataset and pipeline, we additionally propose evaluation criteria and investigate the performance with different 360-degree input representations. Our results demonstrate the efficacy of the proposed pipeline and open up a new area of research in 360-degree audio-visual analysis for future investigation.
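
The last stage named in the abstract, encoding an estimated source location into B-format, can be illustrated with the standard first-order Ambisonics panning equations. The sketch below is not the authors' implementation: the function names `equirect_to_angles` and `encode_b_format`, the equirectangular pixel-to-angle convention, and the traditional W = s/sqrt(2) weighting are all assumptions chosen for illustration, and the paper's pipeline may use different conventions.

```python
import math


def equirect_to_angles(x, y, width, height):
    """Map an equirectangular pixel to (azimuth, elevation) in radians.

    Assumed convention (not taken from the paper): the image centre is
    straight ahead (azimuth 0, elevation 0), azimuth grows towards the
    left of the image, and elevation grows towards the top.
    """
    azimuth = (0.5 - x / width) * 2.0 * math.pi
    elevation = (0.5 - y / height) * math.pi
    return azimuth, elevation


def encode_b_format(samples, azimuth, elevation):
    """Encode mono samples into first-order B-format tuples (W, X, Y, Z).

    Uses the traditional weighting in which the omnidirectional W
    channel is scaled by 1/sqrt(2); other normalisations exist.
    """
    gain_w = 1.0 / math.sqrt(2.0)
    gain_x = math.cos(azimuth) * math.cos(elevation)  # front-back axis
    gain_y = math.sin(azimuth) * math.cos(elevation)  # left-right axis
    gain_z = math.sin(elevation)                      # up-down axis
    return [(gain_w * s, gain_x * s, gain_y * s, gain_z * s) for s in samples]


# Hypothetical usage: a source located at the centre of a 1920x960 frame.
azimuth, elevation = equirect_to_angles(960, 480, 1920, 960)
b_format = encode_b_format([0.0, 0.5, 1.0], azimuth, elevation)
```

Under these assumptions, a source at the image centre contributes only to the W and X channels, with gains 1/sqrt(2) and 1 respectively, while Y and Z stay at zero.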

Comments:	ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Subjects:	Sound (cs.SD); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1908.06752 [cs.SD]
	(or arXiv:1908.06752v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1908.06752
Related DOI:	https://doi.org/10.1109/ICASSP.2019.8683318

Submission history

From: Aakanksha Rana
[v1] Fri, 16 Aug 2019 14:49:30 UTC (8,242 KB)
