Search | arXiv e-print repository

arXiv:2407.20765

Integrating audiological datasets via federated merging of Auditory Profiles

Authors: Samira Saak, Dirk Oetting, Birger Kollmeier, Mareike Buhl

Abstract: Audiological datasets contain valuable knowledge about hearing loss in patients, which can be uncovered using data-driven, federated learning techniques. Our previous approach summarized patient information from one audiological dataset into distinct Auditory Profiles (APs). To obtain a better estimate of the audiological patient population, however, patient patterns must be analyzed across multiple, separated datasets, and finally, be integrated into a combined set of APs. This study aimed at extending the existing profile generation pipeline with an AP merging step, enabling the combination of APs from different datasets based on their similarity across audiological measures. The 13 previously generated APs (NA=595) were merged with 31 newly generated APs from a second dataset (NB=1272) using a similarity score derived from the overlapping densities of common features across the two datasets. To ensure clinical applicability, random forest models were created for various scenarios, encompassing different combinations of audiological measures. A new set with 13 combined APs is proposed, providing separable profiles, which still capture detailed patient information from various test outcome combinations. The classification performance across these profiles is satisfactory. The best performance was achieved using a combination of loudness scaling, audiogram and speech test information, while single measures performed worst.
The enhanced profile generation pipeline demonstrates the feasibility of combining APs across datasets, which should generalize to all datasets and could lead to an interpretable global profile set in the future. The classification models maintain clinical applicability.

Submitted 29 November, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

arXiv:2401.17202

Comparison of user interfaces for measuring the matrix sentence test on a smartphone

Authors: Samira Saak, Angelika Kothe, Mareike Buhl, Birger Kollmeier

Abstract: Using smartphones for mobile self-testing could provide easy access to speech intelligibility testing for a large proportion of the world population. The matrix sentence test (MST) is an ideal candidate in this context, as it is a repeatable and accurate speech test currently available in 20 languages. In clinical practice, an experimenter uses professional audiological equipment and supervises the MST, which is infeasible for smartphone-based self-testing. Therefore, it is crucial to investigate the feasibility of self-conducting the MST on a smartphone, given its restricted screen size. We compared the traditional closed matrix user interface, displaying all 50 words of the MST in a 10x5 matrix, and three alternative, newly-developed interfaces (slide, type, wheel) regarding SRT consistency, user preference, and completion time, across younger normal hearing (N=15) and older hearing impaired participants (N=14). The slide interface is most suitable for mobile implementation. While the traditional matrix interface works well for most participants, not every participant could perform the task with this interface. The newly-introduced slide interface could serve as a plausible alternative on the small screen of a smartphone. This might be more attractive for elderly patients that may exhibit more tactile and visual impairments than our test subjects employed here.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 26 pages, 9 Figures, 3 Tables, Supplementary Material

arXiv:2305.03297

Assessing Rate limits Using Behavioral and Neural Responses of Interaural-Time-Difference Cues in Fine-Structure and Envelope

Authors: Hongmei Hu, Stephan Ewert, Birger Kollmeier, Deborah Vickers

Abstract: The objective was to determine the effect of pulse rate on the sensitivity to use interaural-time-difference (ITD) cues and to explore the mechanisms behind rate-dependent degradation in ITD perception in bilateral cochlear implant (CI) listeners using CI simulations and electroencephalogram (EEG) measures. To eliminate the impact of CI stimulation artifacts and to develop protocols for the ongoing bilateral CI studies, upper-frequency limits for both behavior and EEG responses were obtained from normal hearing (NH) listeners using sinusoidal-amplitude-modulated (SAM) tones and filtered clicks with changes in either fine structure ITD or envelope ITD. Multiple EEG responses were recorded, including the subcortical auditory steady-state responses (ASSRs) and cortical auditory evoked potentials (CAEPs) elicited by stimuli onset, offset, and changes. Results indicated that acoustic change complex (ACC) responses elicited by envelope ITD changes were significantly smaller or absent compared to those elicited by fine structure ITD changes. The ACC morphologies evoked by fine structure ITD changes were similar to onset and offset CAEPs, although smaller than onset CAEPs, with the longest peak latencies for ACC responses and shortest for offset CAEPs. The study found that high-frequency stimuli clearly elicited subcortical ASSRs, but smaller than those evoked by lower carrier frequency SAM tones. The 40-Hz ASSRs decreased with increasing carrier frequencies.
Filtered clicks elicited larger ASSRs compared to high-frequency SAM tones, with the order being 40-Hz-ASSR>160-Hz-ASSR>80-Hz-ASSR>320-Hz-ASSR for both stimulus types. Wavelet analysis revealed a clear interaction between detectable transient CAEPs and 40-Hz-ASSRs in the time-frequency domain for SAM tones with a low carrier frequency.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2204.06907

Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features

Authors: Maximilian Karl Scharf, Sabine Hochmuth, Lena L. N. Wong, Birger Kollmeier, Anna Warzybok

Abstract: For a better understanding of the mechanisms underlying speech perception and the contribution of different signal features, computational models of speech recognition have a long tradition in hearing research. Due to the diverse range of situations in which speech needs to be recognized, these models need to be generalizable across many acoustic conditions, speakers, and languages. This contribution examines the importance of different features for speech recognition predictions of plain and Lombard speech for English in comparison to Cantonese in stationary and modulated noise. While Cantonese is a tonal language that encodes information in spectro-temporal features, the Lombard effect is known to be associated with spectral changes in the speech signal. These contrasting properties of tonal languages and the Lombard effect form an interesting basis for the assessment of speech recognition models. Here, an automatic speech recognition-based ASR model using spectral or spectro-temporal features is evaluated with empirical data. The results indicate that spectro-temporal features are crucial in order to predict the speaker-specific speech recognition threshold SRT$_{50}$ in both Cantonese and English as well as to account for the improvement of speech recognition in modulated noise, while effects due to Lombard speech can already be predicted by spectral features.

Submitted 14 April, 2022; originally announced April 2022.

Comments: Submitted to INTERSPEECH2022

arXiv:2203.09148

doi: 10.1016/j.csl.2021.101329

Prediction of speech intelligibility with DNN-based performance measures

Authors: Angel Mario Castro Martinez, Constantin Spille, Jana Roßbach, Birger Kollmeier, Bernd T. Meyer

Abstract: This paper presents a speech intelligibility model based on automatic speech recognition (ASR), combining phoneme probabilities from deep neural networks (DNN) and a performance measure that estimates the word error rate from these probabilities. This model does not require the clean speech reference nor the word labels during testing as the ASR decoding step, which finds the most likely sequence of words given phoneme posterior probabilities, is omitted. The model is evaluated via the root-mean-squared error between the predicted and observed speech reception thresholds from eight normal-hearing listeners. The recognition task consists of identifying noisy words from a German matrix sentence test. The speech material was mixed with eight noise maskers covering different modulation types, from speech-shaped stationary noise to a single-talker masker. The prediction performance is compared to five established models and an ASR-model using word labels. Two combinations of features and networks were tested. Both include temporal information either at the feature level (amplitude modulation filterbanks and a feed-forward network) or captured by the architecture (mel-spectrograms and a time-delay deep neural network, TDNN). The TDNN model is on par with the DNN while reducing the number of parameters by a factor of 37; this optimization allows parallel streams on dedicated hearing aid hardware as a forward-pass can be computed within the 10ms of each frame.
The proposed model performs almost as well as the label-based model and produces more accurate predictions than the baseline models.

Submitted 17 March, 2022; originally announced March 2022.

Journal ref: Computer Speech & Language, 74, p.101329 (2022)

arXiv:2111.01237

Auditory-visual scenes for hearing research

Authors: Steven van de Par, Stephan D. Ewert, Lubos Hladek, Christoph Kirsch, Julia Schütze, Josep Llorca-Bofí, Giso Grimm, Maartje M. E. Hendrikse, Birger Kollmeier, Bernhard U. Seeber

Abstract: While experimentation with synthetic stimuli in abstracted listening situations has a long standing and successful history in hearing research, an increased interest exists on closing the remaining gap towards real-life listening by replicating situations with high ecological validity in the lab. This is important for understanding the underlying auditory mechanisms and their relevance in real-life situations as well as for developing and evaluating increasingly sophisticated algorithms for hearing assistance. A range of 'classical' stimuli and paradigms have evolved to de-facto standards in psychoacoustics, which are simplistic and can be easily reproduced across laboratories. While they ideally allow for across laboratory comparisons and reproducible research, they, however, lack the acoustic stimulus complexity and the availability of visual information as observed in everyday life communication and listening situations. This contribution aims to provide and establish an extendable set of complex auditory-visual scenes for hearing research that allow for ecologically valid testing in realistic scenes while also supporting reproducibility and comparability of scientific results. Three virtual environments are provided (underground station, pub, living room), consisting of a detailed visual model, an acoustic geometry model with acoustic surface properties as well as a set of acoustic measurements in the respective real-world environments.
The current data set enables i) audio-visual research in a reproducible set of environments, ii) comparison of room acoustic simulation methods with "ground truth" acoustic measurements, iii) a condensation point for future extensions and contributions for developments towards standardized test cases for ecologically valid hearing research in complex scenes.

Submitted 1 November, 2021; originally announced November 2021.

arXiv:2110.01422

Individualized sound pressure equalization in hearing devices exploiting an electro-acoustic model

Authors: Henning Schepker, Reinhild Rohden, Florian Denk, Birger Kollmeier, Matthias Blau, Simon Doclo

Abstract: To improve sound quality in hearing devices, the hearing device output should be appropriately equalized. To achieve optimal individualized equalization typically requires knowledge of all transfer functions between the source, the hearing device, and the individual eardrum. However, in practice the measurement of all of these transfer functions is not feasible. This study investigates sound pressure equalization using different transfer function estimates. Specifically, an electro-acoustic model is used to predict the sound pressure at the individual eardrum, and average estimates are used to predict the remaining transfer functions. Experimental results show that using these assumptions a practically feasible and close-to-optimal individualized sound pressure equalization can be achieved.

Submitted 4 October, 2021; originally announced October 2021.

arXiv:2109.04241

Robust single- and multi-loudspeaker least-squares-based equalization for hearing devices

Authors: Henning Schepker, Florian Denk, Birger Kollmeier, Simon Doclo

Abstract: To improve the sound quality of hearing devices, equalization filters can be used that aim at achieving acoustic transparency, i.e., listening with the device in the ear is perceptually similar to the open ear. The equalization filter needs to ensure that the superposition of the equalized signal played by the device and the signal leaking through the device into the ear canal matches a processed version of the signal reaching the eardrum of the open ear. Depending on the processing delay of the hearing device, comb-filtering artifacts can occur due to this superposition, which may degrade the perceived sound quality. In this paper we propose a unified least-squares-based procedure to design single- and multi-loudspeaker equalization filters for hearing devices aiming at achieving acoustic transparency. To account for non-minimum phase components, we introduce a so-called acausality management. To reduce comb-filtering artifacts, we propose to use a frequency-dependent regularization. Experimental results using measured acoustic transfer functions from a multi-loudspeaker earpiece show that the proposed equalization filter design procedure enables to achieve robust acoustic transparency and reduces the impact of comb-filtering artifacts. A comparison between single- and multi-loudspeaker equalization shows that for both cases a robust equalization performance can be achieved for different desired open ear transfer functions.

Submitted 9 September, 2021; originally announced September 2021.

arXiv:2007.05378

DARF: A data-reduced FADE version for simulations of speech recognition thresholds with real hearing aids

Authors: David Hülsmeier, Marc René Schädler, Birger Kollmeier

Abstract: Developing and selecting hearing aids is a time consuming process which is simplified by using objective models. Previously, the framework for auditory discrimination experiments (FADE) accurately simulated benefits of hearing aid algorithms with root mean squared prediction errors below 3 dB. One FADE simulation requires several hours of (un)processed signals, which is obstructive when the signals have to be recorded. We propose and evaluate a data-reduced FADE version (DARF) which facilitates simulations with signals that cannot be processed digitally, but that can only be recorded in real-time. DARF simulates one speech recognition threshold (SRT) with about 30 minutes of recorded and processed signals of the (German) matrix sentence test. Benchmark experiments were carried out to compare DARF and standard FADE exhibiting small differences for stationary maskers (1 dB), but larger differences with strongly fluctuating maskers (5 dB). Hearing impairment and hearing aid algorithms seemed to reduce the differences. Hearing aid benefits were simulated in terms of speech recognition with three pairs of real hearing aids in silence ($\geq$8 dB), in stationary and fluctuating maskers in co-located (stat. 2 dB; fluct. 6 dB), and spatially separated speech and noise signals (stat. $\geq$8 dB; fluct. 8 dB). The simulations were plausible in comparison to data from literature, but a comparison with empirical data is still open. DARF facilitates objective SRT simulations with real devices with unknown signal processing in real environments.
Yet, a validation of DARF for devices with unknown signal processing is still pending since it was only tested with three similar devices. Nonetheless, DARF could be used for improving as well as for developing or model-based fitting of hearing aids.

Submitted 11 February, 2021; v1 submitted 10 July, 2020; originally announced July 2020.

Comments: 19 pages, 14 figures, submitted to Hearing Research

arXiv:2004.06579

The Hearpiece database of individual transfer functions of an openly available in-the-ear earpiece for hearing device research

Authors: Florian Denk, Birger Kollmeier

Abstract: We present a database of acoustic transfer functions of the Hearpiece, an openly available multi-microphone multi-driver in-the-ear earpiece for hearing device research. The database includes HRTFs for 87 incidence directions as well as responses of the drivers, all measured at the four microphones of the Hearpiece as well as the eardrum in the occluded and open ear. The transfer functions were measured in both ears of 25 human subjects and a KEMAR with anthropometric pinnae for five reinsertions of the device. We describe the measurements of the database and analyse derived acoustic parameters of the device. All regarded transfer functions are subject to differences between subjects as well as variations due to reinsertion into the same ear. Also, the results show that KEMAR measurements represent a median human ear well for all assessed transfer functions. The database is a rich basis for development, evaluation and robustness analysis of multiple hearing device algorithms and applications. The database is openly available at https://doi.org/10.5281/zenodo.3733191.

Submitted 14 April, 2020; originally announced April 2020.

Comments: 14 pages, 13 figures

Showing 1–10 of 10 results for author: Kollmeier, B