-
Real-Time Packet Loss Concealment With Mixed Generative and Predictive Model
Abstract: As deep speech enhancement algorithms have recently demonstrated capabilities greatly surpassing their traditional counterparts for suppressing noise, reverberation and echo, attention is turning to the problem of packet loss concealment (PLC). PLC is a challenging task because it not only involves real-time speech synthesis, but also frequent transitions between the received audio and the synthes…
Submitted 11 May, 2022; originally announced May 2022.
Comments: Submitted to INTERSPEECH 2022
-
Perceptually-Driven Video Coding with the Daala Video Codec
Abstract: The Daala project is a royalty-free video codec that attempts to compete with the best patent-encumbered codecs. Part of our strategy is to replace core tools of traditional video codecs with alternative approaches, many of them designed to take perceptual aspects into account, rather than optimizing for simple metrics like PSNR. This paper documents some of our experiences with these tools, which…
Submitted 8 October, 2016; originally announced October 2016.
Comments: 19 pages, Proceedings of SPIE Workshop on Applications of Digital Image Processing (ADIP), 2016
-
Daala: Building A Next-Generation Video Codec From Unconventional Technology
Abstract: Daala is a new royalty-free video codec that attempts to compete with state-of-the-art royalty-bearing codecs. To do so, it must achieve good compression while avoiding all of their patented techniques. We use technology that is as different as possible from traditional approaches to achieve this. This paper describes the technology behind Daala and discusses where it fits in the newly created AV1…
Submitted 5 August, 2016; originally announced August 2016.
Comments: 6 pages, accepted for the Multimedia Signal Processing (MMSP) Workshop, 2016
-
Daala: A Perceptually-Driven Still Picture Codec
Abstract: Daala is a new royalty-free video codec based on perceptually-driven coding techniques. We explore using its keyframe format for still picture coding and show how it has improved over the past year. We believe the technology used in Daala could be the basis of an excellent, royalty-free image format.
Submitted 16 May, 2016; originally announced May 2016.
Comments: Accepted for ICIP 2016, 5 pages
-
Daala: A Perceptually-Driven Next Generation Video Codec
Abstract: The Daala project is a royalty-free video codec that attempts to compete with the best patent-encumbered codecs. Part of our strategy is to replace core tools of traditional video codecs with alternative approaches, many of them designed to take perceptual aspects into account, rather than optimizing for simple metrics like PSNR. This paper documents some of our experiences with these tools, which…
Submitted 9 March, 2016; originally announced March 2016.
Comments: 10 pages
-
arXiv:1603.01824
Low-Complexity Iterative Sinusoidal Parameter Estimation
Abstract: Sinusoidal parameter estimation is a computationally-intensive task, which can pose problems for real-time implementations. In this paper, we propose a low-complexity iterative method for estimating sinusoidal parameters that is based on the linearisation of the model around an initial frequency estimate. We show that for N sinusoids in a frame of length L, the proposed method has a complexity of…
Submitted 6 March, 2016; originally announced March 2016.
Comments: 8 pages. arXiv admin note: substantial text overlap with arXiv:1602.05900
Journal ref: Proceedings of International Conference on Signal Processing and Communication Systems (ICSPCS), pp. 276-283, 2007
-
arXiv:1602.05900
An Iterative Linearised Solution to the Sinusoidal Parameter Estimation Problem
Abstract: Signal processing applications use sinusoidal modelling for speech synthesis, speech coding, and audio coding. Estimation of the model parameters involves non-linear optimisation methods, which can be very costly for real-time applications. We propose a low-complexity iterative method that starts from initial frequency estimates and converges rapidly. We show that for N sinusoids in a frame of len…
Submitted 17 February, 2016; originally announced February 2016.
Comments: 23 pages
Journal ref: Computers and Electrical Engineering (Elsevier), Vol. 36, No. 4, pp. 603-616, 2010
-
arXiv:1602.05526
A High-Quality Speech and Audio Codec With Less Than 10 ms Delay
Abstract: With increasing quality requirements for multimedia communications, audio codecs must maintain both high quality and low delay. Typically, audio codecs offer either low delay or high quality, but rarely both. We propose a codec that simultaneously addresses both these requirements, with a delay of only 8.7 ms at 44.1 kHz. It uses gain-shape algebraic vector quantisation in the frequency domain wit…
Submitted 17 February, 2016; originally announced February 2016.
Comments: 10 pages
Journal ref: IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 1, pp. 58-67, 2010
-
arXiv:1602.05311
A Full-Bandwidth Audio Codec With Low Complexity And Very Low Delay
Abstract: We propose an audio codec that addresses the low-delay requirements of some applications such as network music performance. The codec is based on the modified discrete cosine transform (MDCT) with very short frames and uses gain-shape quantization to preserve the spectral envelope. The short frame sizes required for low delay typically hinder the performance of transform codecs. However, at 96 kbi…
Submitted 17 February, 2016; originally announced February 2016.
Comments: 5 pages, Proceedings of EUSIPCO 2009
-
arXiv:1602.05209
Perceptual Vector Quantization For Video Coding
Abstract: This paper applies energy conservation principles to the Daala video codec using gain-shape vector quantization to encode a vector of AC coefficients as a length (gain) and direction (shape). The technique originates from the CELT mode of the Opus audio codec, where it is used to conserve the spectral envelope of an audio signal. Conserving energy in video has the potential to preserve textures ra…
Submitted 16 February, 2016; originally announced February 2016.
Comments: 11 pages, Proceedings of SPIE Visual Information Processing and Communication, 2015
Journal ref: Proc. SPIE 9410, Visual Information Processing and Communication VI, 941009 (March 4, 2015)
-
arXiv:1602.04845
High-Quality, Low-Delay Music Coding in the Opus Codec
Abstract: The IETF recently standardized the Opus codec as RFC6716. Opus targets a wide range of real-time Internet applications by combining a linear prediction coder with a transform coder. We describe the transform coder, with particular attention to the psychoacoustic knowledge built into the format. The result out-performs existing audio codecs that do not operate under real-time constraints.
Submitted 15 February, 2016; originally announced February 2016.
Comments: 10 pages, Proceedings of the 135th AES Convention, October 2013