Showing 1–1 of 1 results for author: Csaba, Z

Search v0.5.6 released 2020-02-24

arXiv:2506.03831

cs.SD cs.MM eess.AS

Conformer-based Ultrasound-to-Speech Conversion

Authors: Ibrahim Ibrahimov, Zainkó Csaba, Gábor Gosztolya

Abstract: Deep neural networks have shown promising potential for the ultrasound-to-speech conversion task, a step toward Silent Speech Interfaces. In this work, we applied two Conformer-based DNN architectures (Conformer Base and a Conformer with bi-LSTM) to this task. Speaker-specific models were trained on the data of four speakers from the Ultrasuite-Tal80 dataset, and the generated mel spectrograms were synthesized into audio waveforms using a HiFi-GAN vocoder. Compared to a standard 2D-CNN baseline, objective measurements (MSE and mel cepstral distortion) showed no statistically significant improvement for either model. However, a MUSHRA listening test revealed that the Conformer with bi-LSTM provided better perceptual quality, while Conformer Base matched the performance of the baseline with 3x faster training, owing to its simpler architecture. These findings suggest that Conformer-based models, especially the Conformer with bi-LSTM, offer a promising alternative to CNNs for ultrasound-to-speech conversion.

Submitted 4 June, 2025; originally announced June 2025.

Comments: accepted to Interspeech 2025
