Search | arXiv e-print repository

arXiv:2407.20765 [pdf]

Integrating audiological datasets via federated merging of Auditory Profiles

Authors: Samira Saak, Dirk Oetting, Birger Kollmeier, Mareike Buhl

Abstract: Audiological datasets contain valuable knowledge about hearing loss in patients, which can be uncovered using data-driven, federated learning techniques. Our previous approach summarized patient information from one audiological dataset into distinct Auditory Profiles (APs). To obtain a better estimate of the audiological patient population, however, patient patterns must be analyzed across multip… ▽ More Audiological datasets contain valuable knowledge about hearing loss in patients, which can be uncovered using data-driven, federated learning techniques. Our previous approach summarized patient information from one audiological dataset into distinct Auditory Profiles (APs). To obtain a better estimate of the audiological patient population, however, patient patterns must be analyzed across multiple, separated datasets, and finally, be integrated into a combined set of APs. This study aimed at extending the existing profile generation pipeline with an AP merging step, enabling the combination of APs from different datasets based on their similarity across audiological measures. The 13 previously generated APs (NA=595) were merged with 31 newly generated APs from a second dataset (NB=1272) using a similarity score derived from the overlapping densities of common features across the two datasets. To ensure clinical applicability, random forest models were created for various scenarios, encompassing different combinations of audiological measures. A new set with 13 combined APs is proposed, providing separable profiles, which still capture detailed patient information from various test outcome combinations. The classification performance across these profiles is satisfactory. The best performance was achieved using a combination of loudness scaling, audiogram and speech test information, while single measures performed worst. The enhanced profile generation pipeline demonstrates the feasibility of combining APs across datasets, which should generalize to all datasets and could lead to an interpretable global profile set in the future. The classification models maintain clinical applicability. △ Less

Submitted 29 November, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

arXiv:2204.06907 [pdf, other]

Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features

Authors: Maximilian Karl Scharf, Sabine Hochmuth, Lena L. N. Wong, Birger Kollmeier, Anna Warzybok

Abstract: For a better understanding of the mechanisms underlying speech perception and the contribution of different signal features, computational models of speech recognition have a long tradition in hearing research. Due to the diverse range of situations in which speech needs to be recognized, these models need to be generalizable across many acoustic conditions, speakers, and languages. This contribut… ▽ More For a better understanding of the mechanisms underlying speech perception and the contribution of different signal features, computational models of speech recognition have a long tradition in hearing research. Due to the diverse range of situations in which speech needs to be recognized, these models need to be generalizable across many acoustic conditions, speakers, and languages. This contribution examines the importance of different features for speech recognition predictions of plain and Lombard speech for English in comparison to Cantonese in stationary and modulated noise. While Cantonese is a tonal language that encodes information in spectro-temporal features, the Lombard effect is known to be associated with spectral changes in the speech signal. These contrasting properties of tonal languages and the Lombard effect form an interesting basis for the assessment of speech recognition models. Here, an automatic speech recognition-based ASR model using spectral or spectro-temporal features is evaluated with empirical data. The results indicate that spectro-temporal features are crucial in order to predict the speaker-specific speech recognition threshold SRT$_{50}$ in both Cantonese and English as well as to account for the improvement of speech recognition in modulated noise, while effects due to Lombard speech can already be predicted by spectral features. △ Less

Submitted 14 April, 2022; originally announced April 2022.

Comments: Submitted to INTERSPEECH2022

arXiv:2203.09148 [pdf, other]

doi 10.1016/j.csl.2021.101329

Prediction of speech intelligibility with DNN-based performance measures

Authors: Angel Mario Castro Martinez, Constantin Spille, Jana Roßbach, Birger Kollmeier, Bernd T. Meyer

Abstract: This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities. This model does not require the clean speech reference nor the word labels during testing as the ASR decoding step, which finds the most likely sequence… ▽ More This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities. This model does not require the clean speech reference nor the word labels during testing as the ASR decoding step, which finds the most likely sequence of words given phoneme posterior probabilities, is omitted. The model is evaluated via the root-mean-squared error between the predicted and observed speech reception thresholds from eight normal-hearing listeners. The recognition task consists of identifying noisy words from a German matrix sentence test. The speech material was mixed with eight noise maskers covering different modulation types, from speech-shaped stationary noise to a single-talker masker. The prediction performance is compared to five established models and an ASR-model using word labels. Two combinations of features and networks were tested. Both include temporal information either at the feature level (amplitude modulation filterbanks and a feed-forward network) or captured by the architecture (mel-spectrograms and a time-delay deep neural network, TDNN). The TDNN model is on par with the DNN while reducing the number of parameters by a factor of 37; this optimization allows parallel streams on dedicated hearing aid hardware as a forward-pass can be computed within the 10ms of each frame. The proposed model performs almost as well as the label-based model and produces more accurate predictions than the baseline models. △ Less

Submitted 17 March, 2022; originally announced March 2022.

Journal ref: Computer Speech & Language, 74, p.101329 (2022)

arXiv:2110.01422 [pdf, ps, other]

Individualized sound pressure equalization in hearing devices exploiting an electro-acoustic model

Authors: Henning Schepker, Reinhild Rohden, Florian Denk, Birger Kollmeier, Matthias Blau, Simon Doclo

Abstract: To improve sound quality in hearing devices, the hearing device output should be appropriately equalized. To achieve optimal individualized equalization typically requires knowledge of all transfer functions between the source, the hearing device, and the individual eardrum. However, in practice the measurement of all of these transfer functions is not feasible. This study investigates sound press… ▽ More To improve sound quality in hearing devices, the hearing device output should be appropriately equalized. To achieve optimal individualized equalization typically requires knowledge of all transfer functions between the source, the hearing device, and the individual eardrum. However, in practice the measurement of all of these transfer functions is not feasible. This study investigates sound pressure equalization using different transfer function estimates. Specifically, an electro-acoustic model is used to predict the sound pressure at the individual eardrum, and average estimates are used to predict the remaining transfer functions. Experimental results show that using these assumptions a practically feasible and close-to-optimal individualized sound pressure equalization can be achieved. △ Less

Submitted 4 October, 2021; originally announced October 2021.

arXiv:2109.04241 [pdf, ps, other]

Robust single- and multi-loudspeaker least-squares-based equalization for hearing devices

Authors: Henning Schepker, Florian Denk, Birger Kollmeier, Simon Doclo

Abstract: To improve the sound quality of hearing devices, equalization filters can be used that aim at achieving acoustic transparency, i.e., listening with the device in the ear is perceptually similar to the open ear. The equalization filter needs to ensure that the superposition of the equalized signal played by the device and the signal leaking through the device into the ear canal matches a processed… ▽ More To improve the sound quality of hearing devices, equalization filters can be used that aim at achieving acoustic transparency, i.e., listening with the device in the ear is perceptually similar to the open ear. The equalization filter needs to ensure that the superposition of the equalized signal played by the device and the signal leaking through the device into the ear canal matches a processed version of the signal reaching the eardrum of the open ear. Depending on the processing delay of the hearing device, comb-filtering artifacts can occur due to this superposition, which may degrade the perceived sound quality. In this paper we propose a unified least-squares-based procedure to design single- and multi-loudspeaker equalization filters for hearing devices aiming at achieving acoustic transparency. To account for non-minimum phase components, we introduce a so-called acausality management. To reduce comb-filtering artifacts, we propose to use a frequency-dependent regularization. Experimental results using measured acoustic transfer functions from a multi-loudspeaker earpiece show that the proposed equalization filter design procedure enables to achieve robust acoustic transparency and reduces the impact of comb-filtering artifacts. A comparison between single- and multi-loudspeaker equalization shows that for both cases a robust equalization performance can be achieved for different desired open ear transfer functions. △ Less

Submitted 9 September, 2021; originally announced September 2021.

arXiv:2007.05378 [pdf, other]

DARF: A data-reduced FADE version for simulations of speech recognition thresholds with real hearing aids

Authors: David Hülsmeier, Marc René Schädler, Birger Kollmeier

Abstract: Developing and selecting hearing aids is a time consuming process which is simplified by using objective models. Previously, the framework for auditory discrimination experiments (FADE) accurately simulated benefits of hearing aid algorithms with root mean squared prediction errors below 3 dB. One FADE simulation requires several hours of (un)processed signals, which is obstructive when the signal… ▽ More Developing and selecting hearing aids is a time consuming process which is simplified by using objective models. Previously, the framework for auditory discrimination experiments (FADE) accurately simulated benefits of hearing aid algorithms with root mean squared prediction errors below 3 dB. One FADE simulation requires several hours of (un)processed signals, which is obstructive when the signals have to be recorded. We propose and evaluate a data-reduced FADE version (DARF) which facilitates simulations with signals that cannot be processed digitally, but that can only be recorded in real-time. DARF simulates one speech recognition threshold (SRT) with about 30 minutes of recorded and processed signals of the (German) matrix sentence test. Benchmark experiments were carried out to compare DARF and standard FADE exhibiting small differences for stationary maskers (1 dB), but larger differences with strongly fluctuating maskers (5 dB). Hearing impairment and hearing aid algorithms seemed to reduce the differences. Hearing aid benefits were simulated in terms of speech recognition with three pairs of real hearing aids in silence ($\geq$8 dB), in stationary and fluctuating maskers in co-located (stat. 2 dB; fluct. 6 dB), and spatially separated speech and noise signals (stat. $\geq$8 dB; fluct. 8 dB). The simulations were plausible in comparison to data from literature, but a comparison with empirical data is still open. DARF facilitates objective SRT simulations with real devices with unknown signal processing in real environments. Yet, a validation of DARF for devices with unknown signal processing is still pending since it was only tested with three similar devices. Nonetheless, DARF could be used for improving as well as for developing or model-based fitting of hearing aids. △ Less

Submitted 11 February, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

Comments: 19 pages, 14 figures, submitted to Hearing Research

arXiv:2004.06579 [pdf, other]

The Hearpiece database of individual transfer functions of an openly available in-the-ear earpiece for hearing device research

Authors: Florian Denk, Birger Kollmeier

Abstract: We present a database of acoustic transfer functions of the Hearpiece, an openly available multi-microphone multi-driver in-the-ear earpiece for hearing device research. The database includes HRTFs for 87 incidence directions as well as responses of the drivers, all measured at the four microphones of the Hearpiece as well as the eardrum in the occluded and open ear. The transfer functions were me… ▽ More We present a database of acoustic transfer functions of the Hearpiece, an openly available multi-microphone multi-driver in-the-ear earpiece for hearing device research. The database includes HRTFs for 87 incidence directions as well as responses of the drivers, all measured at the four microphones of the Hearpiece as well as the eardrum in the occluded and open ear. The transfer functions were measured in both ears of 25 human subjects and a KEMAR with anthropometric pinnae for five reinsertions of the device. We describe the measurements of the database and analyse derived acoustic parameters of the device. All regarded transfer functions are subject to differences between subjects as well as variations due to reinsertion into the same ear. Also, the results show that KEMAR measurements represent a median human ear well for all assessed transfer functions. The database is a rich basis for development, evaluation and robustness analysis of multiple hearing device algorithms and applications. The database is openly available at https://doi.org/10.5281/zenodo.3733191. △ Less

Submitted 14 April, 2020; originally announced April 2020.

Comments: 14 pages, 13 figures

Showing 1–7 of 7 results for author: Kollmeier, B