Showing 1–19 of 19 results for author: Torcoli, M

  1. arXiv:2505.19760  [pdf, ps, other]

    eess.AS

    Navigating PESQ: Up-to-Date Versions and Open Implementations

    Authors: Matteo Torcoli, Mhd Modar Halimeh, Emanuël A. P. Habets

    Abstract: Perceptual Evaluation of Speech Quality (PESQ) is an objective quality measure that remains widely used despite its withdrawal by the International Telecommunication Union (ITU). PESQ has evolved over two decades, with multiple versions and publicly available implementations emerging during this time. The numerous versions and their updates can be overwhelming, especially for new PESQ users. This…

    Submitted 26 May, 2025; originally announced May 2025.

  2. arXiv:2504.00742  [pdf, other]

    eess.AS

    Expanding and Analyzing ODAQ -- the Open Dataset of Audio Quality

    Authors: Sascha Dick, Christoph Thompson, Chih-Wei Wu, Matteo Torcoli, Pablo Delgado, Phillip A. Williams, Emanuel Habets

    Abstract: The Open Dataset of Audio Quality (ODAQ) was recently introduced to address the scarcity of openly available audio datasets with corresponding subjective quality scores. The dataset, released under permissive licenses, comprises audio material processed using six different signal processing methods operating at five quality levels, along with corresponding subjective test results. To expand the da…

    Submitted 1 April, 2025; originally announced April 2025.

    Comments: Accepted for presentation at the Audio Engineering Society (AES) 157th Convention, October 2024, New York, USA

  3. arXiv:2503.03304  [pdf, ps, other]

    eess.AS

    On the Relation Between Speech Quality and Quantized Latent Representations of Neural Codecs

    Authors: Mhd Modar Halimeh, Matteo Torcoli, Philipp Grundhuber, Emanuël A. P. Habets

    Abstract: Neural audio signal codecs have attracted significant attention in recent years. In essence, the impressive low bitrate achieved by such encoders is enabled by learning an abstract representation that captures the properties of encoded signals, e.g., speech. In this work, we investigate the relation between the latent representation of the input signal learned by a neural codec and the quality of…

    Submitted 5 March, 2025; originally announced March 2025.

  4. arXiv:2408.08729  [pdf, ps, other]

    eess.AS cs.CL cs.SD

    ConcateNet: Dialogue Separation Using Local And Global Feature Concatenation

    Authors: Mhd Modar Halimeh, Matteo Torcoli, Emanuël Habets

    Abstract: Dialogue separation involves isolating a dialogue signal from a mixture, such as a movie or a TV program. This can be a necessary step to enable dialogue enhancement for broadcast-related applications. In this paper, ConcateNet for dialogue separation is proposed, which is based on a novel approach for processing local and global features aimed at better generalization for out-of-domain signals. C…

    Submitted 16 August, 2024; originally announced August 2024.

  5. arXiv:2405.17364  [pdf, other]

    eess.AS

    Speech Loudness in Broadcasting and Streaming

    Authors: Matteo Torcoli, Mhd Modar Halimeh, Thomas Leitz, Yannik Grewe, Michael Kratschmer, Bernhard Neugebauer, Adrian Murtaza, Harald Fuchs, Emanuël A. P. Habets

    Abstract: The introduction and regulation of loudness in broadcasting and streaming brought clear benefits to the audience, e.g., a level of uniformity across programs and channels. Yet, speech loudness is frequently reported as being too low in certain passages, which can hinder the full understanding and enjoyment of movies and TV programs. This paper proposes expanding the set of loudness-based measures…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted for presentation at the Audio Engineering Society (AES) 156th Convention, June 2024, Madrid, Spain

  6. arXiv:2401.00197  [pdf, other]

    eess.AS

    ODAQ: Open Dataset of Audio Quality

    Authors: Matteo Torcoli, Chih-Wei Wu, Sascha Dick, Phillip A. Williams, Mhd Modar Halimeh, William Wolcott, Emanuel A. P. Habets

    Abstract: Research into the prediction and analysis of perceived audio quality is hampered by the scarcity of openly available datasets of audio signals accompanied by corresponding subjective quality scores. To address this problem, we present the Open Dataset of Audio Quality (ODAQ), a new dataset containing the results of a MUSHRA listening test conducted with expert listeners from 2 international labora…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: Accepted paper. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), Seoul, Korea, April 2024

  7. arXiv:2305.19100  [pdf, other]

    eess.AS cs.SD

    Predicting Preferred Dialogue-to-Background Loudness Difference in Dialogue-Separated Audio

    Authors: Luca Resti, Martin Strauss, Matteo Torcoli, Emanuël Habets, Bernd Edler

    Abstract: Dialogue Enhancement (DE) enables the rebalancing of dialogue and background sounds to fit personal preferences and needs in the context of broadcast audio. When individual audio stems are unavailable from production, Dialogue Separation (DS) can be applied to the final audio mixture to obtain estimates of these stems. This work focuses on Preferred Loudness Differences (PLDs) between dialogue and…

    Submitted 31 May, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Paper accepted at the 15th International Conference on Quality of Multimedia Experience (QoMEX), 4 pages, 2 figures

  8. arXiv:2303.13453  [pdf, other]

    eess.AS cs.SD

    Better Together: Dialogue Separation and Voice Activity Detection for Audio Personalization in TV

    Authors: Matteo Torcoli, Emanuël A. P. Habets

    Abstract: In TV services, dialogue level personalization is key to meeting user preferences and needs. When dialogue and background sounds are not separately available from the production stage, Dialogue Separation (DS) can estimate them to enable personalization. DS was shown to provide clear benefits for the end user. Still, the estimated signals are not perfect, and some leakage can be introduced. This i…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Paper accepted to the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023), Rhodes, Greece

  9. arXiv:2210.11654  [pdf, other]

    eess.AS cs.SD

    Improved Normalizing Flow-Based Speech Enhancement using an All-pole Gammatone Filterbank for Conditional Input Representation

    Authors: Martin Strauss, Matteo Torcoli, Bernd Edler

    Abstract: Deep generative models for Speech Enhancement (SE) received increasing attention in recent years. The most prominent example are Generative Adversarial Networks (GANs), while normalizing flows (NF) received less attention despite their potential. Building on previous work, architectural modifications are proposed, along with an investigation of different conditional input representations. Despite…

    Submitted 20 October, 2022; originally announced October 2022.

    Comments: Accepted for Presentation at IEEE SLT 2022

  10. Dialogue Enhancement and Listening Effort in Broadcast Audio: A Multimodal Evaluation

    Authors: Matteo Torcoli, Thomas Robotham, Emanuël A. P. Habets

    Abstract: Dialogue enhancement (DE) plays a vital role in broadcasting, enabling the personalization of the relative level between foreground speech and background music and effects. DE has been shown to improve the quality of experience, intelligibility, and self-reported listening effort (LE). A physiological indicator of LE known from audiology studies is pupil size. The relation between pupil size and L…

    Submitted 3 August, 2022; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: Paper accepted to 14th International Conference on Quality of Multimedia Experience (QoMEX), Lippstadt, Germany, 2022 - version 2 fixes some typos

  11. arXiv:2206.02125  [pdf, other]

    eess.AS cs.SD

    Geometrically-Motivated Primary-Ambient Decomposition With Center-Channel Extraction

    Authors: Jouni Paulus, Matteo Torcoli

    Abstract: A geometrically-motivated method for primary-ambient decomposition is proposed and evaluated in an up-mixing application. The method consists of two steps, accommodating a particularly intuitive explanation. The first step consists of signal-adaptive rotations applied on the input stereo scene, which translate the primary sound sources into the center of the rotated scene. The second step applies…

    Submitted 5 June, 2022; originally announced June 2022.

    Comments: accepted into EUSIPCO 2022

  12. arXiv:2206.02124  [pdf, other]

    eess.AS cs.SD

    Sampling Frequency Independent Dialogue Separation

    Authors: Jouni Paulus, Matteo Torcoli

    Abstract: In some DNNs for audio source separation, the relevant model parameters are independent of the sampling frequency of the audio used for training. Considering the application of dialogue separation, this is shown for two DNN architectures: a U-Net and a fully-convolutional model. The models are trained with audio sampled at 8 kHz. The learned parameters are transferred to models for processing audi…

    Submitted 5 June, 2022; originally announced June 2022.

    Comments: accepted into EUSIPCO 2022

  13. arXiv:2112.09494  [pdf]

    eess.AS eess.SP

    Dialog+ in Broadcasting: First Field Tests Using Deep-Learning-Based Dialogue Enhancement

    Authors: Matteo Torcoli, Christian Simon, Jouni Paulus, Davide Straninger, Alfred Riedel, Volker Koch, Stefan Wits, Daniela Rieger, Harald Fuchs, Christian Uhle, Stefan Meltzer, Adrian Murtaza

    Abstract: Difficulties in following speech due to loud background sounds are common in broadcasting. Object-based audio, e.g., MPEG-H Audio solves this problem by providing a user-adjustable speech level. While object-based audio is gaining momentum, transitioning to it requires time and effort. Also, lots of content exists, produced and archived outside the object-based workflows. To address this, Fraunhof…

    Submitted 17 December, 2021; originally announced December 2021.

    Comments: Presented at IBC 2021 (International Broadcasting Convention)

  14. Objective Measures of Perceptual Audio Quality Reviewed: An Evaluation of Their Application Domain Dependence

    Authors: Matteo Torcoli, Thorsten Kastner, Jürgen Herre

    Abstract: Over the past few decades, computational methods have been developed to estimate perceptual audio quality. These methods, also referred to as objective quality measures, are usually developed and intended for a specific application domain. Because of their convenience, they are often used outside their original intended domain, even if it is unclear whether they provide reliable quality estimates…

    Submitted 21 October, 2021; originally announced October 2021.

    Journal ref: IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 29, 2021

  15. Controlling the Perceived Sound Quality for Dialogue Enhancement with Deep Learning

    Authors: Christian Uhle, Matteo Torcoli, Jouni Paulus

    Abstract: Speech enhancement attenuates interfering sounds in speech signals but may introduce artifacts that perceivably deteriorate the output signal. We propose a method for controlling the trade-off between the attenuation of the interfering background signal and the loss of sound quality. A deep neural network estimates the attenuation of the separated background signal such that the sound quality, qua…

    Submitted 22 July, 2021; originally announced July 2021.

    Comments: Accepted paper at ICASSP 2020

    Journal ref: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  16. Controlling the Remixing of Separated Dialogue with a Non-Intrusive Quality Estimate

    Authors: Matteo Torcoli, Jouni Paulus, Thorsten Kastner, Christian Uhle

    Abstract: Remixing separated audio sources trades off interferer attenuation against the amount of audible deteriorations. This paper proposes a non-intrusive audio quality estimation method for controlling this trade-off in a signal-adaptive manner. The recently proposed 2f-model is adopted as the underlying quality measure, since it has been shown to correlate strongly with basic audio quality in source s…

    Submitted 21 July, 2021; originally announced July 2021.

    Comments: Manuscript accepted for the 2021 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

  17. arXiv:2106.09093  [pdf, other]

    eess.AS cs.SD

    A Hands-on Comparison of DNNs for Dialog Separation Using Transfer Learning from Music Source Separation

    Authors: Martin Strauss, Jouni Paulus, Matteo Torcoli, Bernd Edler

    Abstract: This paper describes a hands-on comparison on using state-of-the-art music source separation deep neural networks (DNNs) before and after task-specific fine-tuning for separating speech content from non-speech content in broadcast audio (i.e., dialog separation). The music separation models are selected as they share the number of channels (2) and sampling rate (44.1 kHz or higher) with the consid…

    Submitted 22 June, 2021; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: accepted in INTERSPEECH 2021

  18. An Improved Measure of Musical Noise Based on Spectral Kurtosis

    Authors: Matteo Torcoli

    Abstract: Audio processing methods operating on a time-frequency representation of the signal can introduce unpleasant sounding artifacts known as musical noise. These artifacts are observed in the context of audio coding, speech enhancement, and source separation. The change in kurtosis of the power spectrum introduced during the processing was shown to correlate with the human perception of musical noise…

    Submitted 27 May, 2021; originally announced May 2021.

    Comments: Manuscript accepted for the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics

    Journal ref: 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019, pp. 90-94

  19. arXiv:1909.11549  [pdf, other]

    eess.AS cs.SD

    MPEG-H Audio for Improving Accessibility in Broadcasting and Streaming

    Authors: Christian Simon, Matteo Torcoli, Jouni Paulus

    Abstract: Broadcasting and streaming services still suffer from various levels of accessibility barriers for a significant portion of the population, limiting the access to information and culture, and in the most severe cases limiting the empowerment of people. This paper provides a brief overview of some of the most common accessibility barriers encountered. It then gives a short introduction to object-ba…

    Submitted 25 September, 2019; originally announced September 2019.

    Comments: White Paper