Acoustic Disturbance Sensing Level Detection for ASD Diagnosis and Intelligibility Enhancement

Authors: Marcelo Pillonetto, Anderson Queiroz, Rosângela Coelho

Abstract: The acoustic sensitivity of Autism Spectrum Disorder (ASD) individuals strongly affects their intelligibility in noisy urban environments. In this Letter, the disturbance sensing level is examined with perceptual listening tests that demonstrate the impact of their High Internal Noise (HIN) profile on intelligibility. This particular sensing level is then proposed as an additional aid to ASD diagnosis. A novel intelligibility enhancement scheme is also introduced for the particular circumstances of ASD. For this proposal, harmonic features estimated from speech signal frames are considered as center frequencies of auditory filterbanks. A gain factor is further applied to the output of the filtered samples. The experimental results demonstrate that the proposal improved the acoustic intelligibility of ASD and neurotypical (NT) people considering four acoustic noises at different signal-to-noise ratios.

Submitted 15 May, 2025; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: 4 pages, 3 figures, 2 tables

arXiv:2401.11829

Harmonic Detection from Noisy Speech with Auditory Frame Gain for Intelligibility Enhancement

Authors: A. Queiroz, R. Coelho

Abstract: This paper introduces a novel method, HDAG (Harmonic Detection for Auditory Gain), for speech intelligibility enhancement in noisy scenarios. In the proposed scheme, a series of selective Gammachirp filters is adopted to emphasize the harmonic components of speech, reducing the masking effects of acoustic noises. The fundamental frequency is estimated by the HHT-Amp technique. Harmonic patterns estimated with low accuracy are detected and adjusted according to the FSFFE low/high pitch separation. The central frequencies of the filterbank are defined considering the third-octave subbands, which are best suited to cover the regions most relevant to intelligibility. Before signal reconstruction, the Gammachirp-filtered components are amplified by gain factors regulated by the FSFFE classification. The proposed HDAG solution and three baseline techniques are examined considering six background noises with four signal-to-noise ratios. Three objective measures are adopted for the evaluation of speech intelligibility and quality. Several experiments are conducted to demonstrate that the proposed scheme achieves better speech intelligibility improvement when compared to the competing approaches. A perceptual listening test is further considered and corroborates the objective results.

Submitted 22 January, 2024; originally announced January 2024.

Comments: 9 pages, 6 figures, 4 tables

arXiv:2112.09896

Noisy Speech Based Temporal Decomposition to Improve Fundamental Frequency Estimation

Authors: A. Queiroz, R. Coelho

Abstract: This paper introduces a novel method to separate noisy speech into low- or high-frequency frames, in order to improve fundamental frequency (F0) estimation accuracy. In this proposal, the target signal is analyzed by means of the ensemble empirical mode decomposition. Next, the pitch information is extracted from the first decomposition modes. This feature indicates the frequency region where the F0 of speech should be located, thus separating the frames into low-frequency (LF) or high-frequency (HF). The separation is applied to correct candidates extracted from a conventional fundamental frequency detection method, hence improving the accuracy of the F0 estimate. The proposed method is evaluated in experiments with the CSTR and TIMIT databases, considering six acoustic noises under various signal-to-noise ratios. A pitch enhancement algorithm is adopted as baseline in the evaluation analysis considering three conventional estimators. Results show that the proposed method outperforms the competing strategies in terms of low/high frequency separation accuracy. Moreover, the performance metrics of the F0 estimation techniques show that the novel solution improves F0 detection accuracy more than the competing approaches under different noisy conditions.

Submitted 18 December, 2021; originally announced December 2021.

Comments: 9 pages

arXiv:2112.04949

Harmonic and non-Harmonic Based Noisy Reverberant Speech Enhancement in Time Domain

Authors: G. Zucatelli, R. Coelho

Abstract: This paper introduces the single-step time-domain method named HnH-NRSE, which is designed for simultaneous speech intelligibility and quality improvement under noisy-reverberant conditions. In this solution, harmonic and non-harmonic elements of speech are separated by applying zero-crossing and energy criteria. An objective evaluation of its non-stationarity degree is further used for an adaptive gain to treat masking components. No prior knowledge of speech statistics or room information is required for this technique. Additionally, two combined solutions, IRMO and IRMN, are proposed as composite methods for improvement on noisy-reverberant speech signals. The proposed and baseline methods are evaluated considering two intelligibility and three quality measures, applied for the objective prediction. The results show that the proposed scheme leads to a higher intelligibility and quality improvement when compared to competing methods in most scenarios. Additionally, a perceptual intelligibility listening test is performed, which corroborates these results. Furthermore, the proposed HnH-NRSE solution attains SRMR quality measure results similar to those of the composed IRMO and IRMN techniques.

Submitted 9 December, 2021; originally announced December 2021.

Comments: 9 pages
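
The HnH-NRSE abstract above mentions separating harmonic and non-harmonic speech elements with zero-crossing and energy criteria. The following is only a minimal sketch of that general idea, not the authors' method: the function names, the per-frame decision rule, and both thresholds (`zcr_max`, `energy_min`) are hypothetical choices made for illustration.

```python
import math

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)

def is_harmonic_like(frame, zcr_max=0.1, energy_min=0.01):
    """Hypothetical rule: low zero-crossing rate and enough energy.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return zero_crossing_rate(frame) <= zcr_max and short_time_energy(frame) >= energy_min

fs = 8000  # assumed sampling rate in Hz
n = 256    # assumed frame length in samples

# A 200 Hz sinusoid stands in for a voiced (harmonic) frame.
tone = [0.5 * math.sin(2 * math.pi * 200 * i / fs) for i in range(n)]
# A sign-alternating sequence stands in for a noise-like frame (maximal ZCR).
alternating = [0.5 * (-1) ** i for i in range(n)]

print(is_harmonic_like(tone), is_harmonic_like(alternating))
```

In practice the thresholds would need to be tuned to the sampling rate, frame length, and noise conditions; the abstract does not state how the criteria are combined.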

arXiv:2012.08227

doi 10.1109/LSP.2021.3084561

F0-based Gammatone Filtering for Intelligibility Gain of Acoustic Noisy Signals

Authors: A. Queiroz, R. Coelho

Abstract: This paper proposes a time-domain method to improve speech intelligibility in noisy scenarios. In the proposed approach, a series of Gammatone filters is adopted to detect the harmonic components of speech. The filter outputs are amplified to emphasize the first harmonics, reducing the masking effects of acoustic noises. The proposed GTFF0 solution and two baseline techniques are examined considering four background noises with different non-stationarity degrees. Three intelligibility measures (ESTOI, ESII and ASIIST) are adopted for objective evaluation. The experimental results show that the proposed scheme leads to a substantial speech intelligibility gain when compared to the competing approaches. Furthermore, the PESQ and WSS objective scores demonstrate that the proposed technique also provides quality improvement.

Submitted 15 December, 2020; originally announced December 2020.
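
The abstract above describes Gammatone filters placed on the harmonics of speech, with amplified outputs. The sketch below illustrates that general idea under stated assumptions and is not the GTFF0 implementation: F0 is assumed to be already known, the filter uses the standard 4th-order Gammatone impulse response with the Glasberg and Moore ERB formula, and the number of harmonics, the gain value, and all function names are hypothetical.

```python
import math

def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def gammatone_ir(fc, fs, duration=0.05, order=4):
    """Gammatone impulse response t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t),
    scaled to unit magnitude response at fc."""
    b = 1.019 * erb(fc)
    ir = []
    for k in range(int(duration * fs)):
        t = k / fs
        ir.append(t ** (order - 1) * math.exp(-2 * math.pi * b * t)
                  * math.cos(2 * math.pi * fc * t))
    # Magnitude of the frequency response at fc, used for normalization.
    re = sum(h * math.cos(2 * math.pi * fc * k / fs) for k, h in enumerate(ir))
    im = sum(h * math.sin(2 * math.pi * fc * k / fs) for k, h in enumerate(ir))
    mag = math.hypot(re, im)
    return [h / mag for h in ir]

def harmonic_centers(f0, n_harmonics, fs):
    """Center frequencies at integer multiples of F0, kept below Nyquist."""
    return [h * f0 for h in range(1, n_harmonics + 1) if h * f0 < fs / 2]

def convolve(x, h):
    """Causal FIR filtering, output truncated to the input length."""
    return [sum(h[k] * x[i - k] for k in range(min(i + 1, len(h))))
            for i in range(len(x))]

def emphasize_harmonics(x, f0, fs, n_harmonics=3, gain=2.0):
    """Sum of Gammatone band outputs at the first harmonics, each scaled by
    a fixed gain (the gain value is an illustrative assumption)."""
    out = [0.0] * len(x)
    for fc in harmonic_centers(f0, n_harmonics, fs):
        band = convolve(x, gammatone_ir(fc, fs))
        out = [o + gain * v for o, v in zip(out, band)]
    return out

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

fs = 8000  # assumed sampling rate in Hz
in_band = [math.sin(2 * math.pi * 200 * i / fs) for i in range(800)]
off_band = [math.sin(2 * math.pi * 2000 * i / fs) for i in range(800)]
# Compare steady-state output levels for a tone at F0 and a tone far from it.
y_in = emphasize_harmonics(in_band, 200, fs, n_harmonics=1)
y_off = emphasize_harmonics(off_band, 200, fs, n_harmonics=1)
print(rms(y_in[400:]) > rms(y_off[400:]))
```

A fixed gain and a per-signal F0 are simplifications; a frame-wise F0 estimate and a reconstruction step that preserves the unfiltered signal would be needed for real speech.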

arXiv:2008.09175

doi 10.1109/LSP.2021.3086405

Blind Mask to Improve Intelligibility of Non-Stationary Noisy Speech

Authors: F. Farias, R. Coelho

Abstract: This letter proposes a novel blind acoustic mask (BAM) designed to adaptively detect noise components and preserve target speech segments in the time domain. A robust standard deviation estimator is applied to the non-stationary noisy speech to identify noise masking elements. The main contribution of the proposed solution is the use of these noise statistics to derive adaptive information to define and select samples with lower noise proportion, thus preserving speech intelligibility. Additionally, no prior information on the statistics of the target speech and noise signals is required for this non-ideal mask. The BAM and three competitive methods, Ideal Binary Mask (IBM), Target Binary Mask (TBM), and Non-stationary Noise Estimation for Speech Enhancement (NNESE), are evaluated considering speech signals corrupted by three non-stationary acoustic noises and six values of signal-to-noise ratio (SNR). Results demonstrate that the BAM technique achieves intelligibility gains comparable to ideal masks while maintaining good speech quality.

Submitted 20 August, 2020; originally announced August 2020.

Comments: 5 Pages, 5 Figures, 3 Tables

arXiv:1910.02712

doi 10.1109/LSP.2019.2950618

Adaptive Reverberation Absorption using Non-stationary Masking Components Detection for Intelligibility Improvement

Authors: G. Zucatelli, R. Coelho

Abstract: This letter proposes a new time-domain absorption approach designed to reduce masking components of speech signals under noisy-reverberant conditions. In this method, the non-stationarity of corrupted signal segments is used to detect masking distortions based on a defined threshold. The non-stationarity is objectively measured and is also adopted to determine the absorption procedure. Additionally, no prior knowledge of speech statistics or of the room information is required for this technique. Three intelligibility measures (ESII, ASIIST, SRMRnorm) and a perceptual listening test are used for evaluation. The experimental results show that the proposed scheme leads to a higher intelligibility improvement when compared to competing methods.

Submitted 7 October, 2019; originally announced October 2019.

arXiv:1910.02710

Impulsive Noise Detection for Intelligibility and Quality Improvement of Speech Enhancement Methods Applied in Time-Domain

Authors: C. Medina, R. Coelho

Abstract: This letter introduces a novel speech enhancement method in the Hilbert-Huang Transform domain to mitigate the effects of acoustic impulsive noises. The estimation and selection of noise components are based on the impulsiveness index of the decomposition modes. Speech enhancement experiments are conducted considering five acoustic noises with different impulsiveness indices and non-stationarity degrees under various signal-to-noise ratios. Three speech enhancement algorithms are adopted as baseline in the evaluation analysis considering spectral and time domains. The proposed solution achieves the best results in terms of objective quality measures and speech intelligibility rates similar to those of the competitive methods.

Submitted 7 October, 2019; originally announced October 2019.

arXiv:1910.02709

Effective Acoustic Energy Sensing Exploitation for Target Sources Localization in Urban Acoustic Scenes

Authors: M. Alves, R. Coelho, E. Dranka

Abstract: This letter proposes a new approach to improve the accuracy of energy-based source localization methods in urban acoustic scenes. The proposed acoustic energy sensing flow estimation (ESFE) uses the nonstationarity degree of the sensor signals to determine the area with the highest energy concentration in the scenes. The ESFE is applied to different acoustic scenes and yields improved source localization accuracy with reduced computational complexity. The experimental results show that the proposed scheme leads to significant improvement in source localization accuracy.

Submitted 7 October, 2019; originally announced October 2019.

arXiv:1910.01967

Objective Human Affective Vocal Expression Detection and Automatic Classification with Stochastic Models and Learning Systems

Authors: V. Vieira, R. Coelho, F. Assis

Abstract: This paper presents a comprehensive analysis of affective vocal expression classification systems. In this study, state-of-the-art acoustic features are compared to two novel affective vocal prints for the detection of emotional states: the Hilbert-Huang-Hurst Coefficients (HHHC) and the vector of index of non-stationarity (INS). HHHC is here proposed as a nonlinear vocal source feature vector that represents the affective states according to their effects on the speech production mechanism. Emotional states are highlighted by the empirical mode decomposition (EMD) based method, which exploits the non-stationarity of the affective acoustic variations. Hurst coefficients (closely related to the excitation source) are then estimated from the decomposition process to compose the feature vector. Additionally, the INS vector is introduced as dynamic information to the HHHC feature. The proposed features are evaluated in speech emotion classification experiments with three databases in German and English. Three state-of-the-art acoustic features are adopted as baseline. The $α$-integrated Gaussian model ($α$-GMM) is also introduced for the emotion representation and classification. Its performance is compared to competing stochastic and machine learning classifiers. Results demonstrate that HHHC leads to significant classification improvement when compared to the baseline acoustic features. Moreover, results also show that $α$-GMM outperforms the competing classification methods. Finally, HHHC and INS are also evaluated as complementary features for the GeMAPS and eGeMAPS feature sets.

Submitted 4 October, 2019; originally announced October 2019.

Showing 1–10 of 10 results for author: Coelho, R