Psychoacoustic Calibration of Loss Functions for Efficient End-to-End Neural Audio Coding

Zhen, Kai; Lee, Mi Suk; Sung, Jongmo; Beack, Seungkwon; Kim, Minje

doi:10.1109/LSP.2020.3039765

arXiv:2101.00054 (cs)

[Submitted on 31 Dec 2020]

Abstract: Conventional audio coding technologies commonly leverage human perception of sound, or psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the decoded audio signals. For neural audio codecs, however, the objective nature of the loss function usually leads to suboptimal sound quality as well as high run-time complexity due to the large model size. In this work, we present a psychoacoustic calibration scheme that redefines the loss functions of neural audio coding systems so that they decode signals that are perceptually more similar to the reference, yet with much lower model complexity. The proposed loss function incorporates the global masking threshold, tolerating reconstruction error that corresponds to inaudible artifacts. Experimental results show that the proposed model outperforms a baseline neural codec that is twice as large and consumes 23.4% more bits per second. With the proposed method, a lightweight neural codec with only 0.9 million parameters performs near-transparent audio coding comparable to the commercial MPEG-1 Audio Layer III codec at 112 kbps.
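
The idea of a loss that tolerates inaudible error can be illustrated with a minimal sketch. This is not the authors' implementation: the weighting formula, the function names, and the assumption that per-bin signal power and global masking threshold are already available in dB (from a separate psychoacoustic model) are all illustrative choices made here. The first term down-weights spectral error in bins where the masking threshold dominates the signal; the second penalizes coding noise only where it rises above the threshold.

```python
import math


def masking_weights(signal_db, mask_db):
    """Per-bin weights that grow with the signal-to-mask ratio (inputs in dB).

    Hypothetical weighting: log10(1 + P/M) with P and M as linear powers,
    so a bin whose mask dominates the signal gets a weight close to zero.
    """
    return [
        math.log10(1.0 + 10.0 ** (0.1 * (p - m)))
        for p, m in zip(signal_db, mask_db)
    ]


def masked_spectral_loss(ref_mag, dec_mag, signal_db, mask_db):
    """Weighted squared error between reference and decoded spectra."""
    weights = masking_weights(signal_db, mask_db)
    return sum(
        w * (x - y) ** 2
        for w, x, y in zip(weights, ref_mag, dec_mag)
    )


def audible_noise_penalty(noise_db, mask_db):
    """Worst-case amount (in dB) by which the noise exceeds the mask.

    Noise that stays below the masking threshold contributes nothing.
    """
    return max(max(q - m, 0.0) for q, m in zip(noise_db, mask_db))
```

Under these assumptions, the same magnitude error costs far less in a bin that sits 40 dB below its masking threshold than in a bin that sits 40 dB above it, which is the behavior the abstract describes: the model is free to spend its capacity on errors a listener could actually hear.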

Subjects:	Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2101.00054 [cs.SD]
	(or arXiv:2101.00054v1 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.2101.00054
Journal reference:	IEEE Signal Processing Letters, vol. 27, pp. 2159-2163, 2020
Related DOI:	https://doi.org/10.1109/LSP.2020.3039765

Submission history

From: Minje Kim
[v1] Thu, 31 Dec 2020 19:46:46 UTC (768 KB)
