Showing 1–4 of 4 results for author: Tsiaras, V

  1. arXiv:2501.17332

    cs.SD cs.LG eess.AS

    Compact Neural TTS Voices for Accessibility

    Authors: Kunal Jain, Eoin Murphy, Deepanshu Gupta, Jonathan Dyke, Saumya Shah, Vasileios Tsiaras, Petko Petkov, Alistair Conkie

    Abstract: Contemporary text-to-speech solutions for accessibility applications can typically be classified into two categories: (i) device-based statistical parametric speech synthesis (SPSS) or unit selection (USEL) and (ii) cloud-based neural TTS. SPSS and USEL offer low latency and low disk footprint at the expense of naturalness and audio quality. Cloud-based neural TTS systems provide significantly bet…

    Submitted 28 January, 2025; originally announced January 2025.

    Comments: Accepted at ICASSP 2025

  2. arXiv:2006.05233

    eess.AS cs.SD

    A fully recurrent feature extraction for single channel speech enhancement

    Authors: Muhammed PV Shifas, Santelli Claudio, Vassilis Tsiaras, Yannis Stylianou

    Abstract: Convolutional neural network (CNN) modules are widely being used to build high-end speech enhancement neural models. However, the feature extraction power of vanilla CNN modules has been limited by the dimensionality constraint of the convolution kernels that are integrated - thereby, they have limitations to adequately model the noise context information at the feature extraction stage. To this e…

    Submitted 3 June, 2021; v1 submitted 9 June, 2020; originally announced June 2020.

    Comments: 5 pages

  3. A non-causal FFTNet architecture for speech enhancement

    Authors: Muhammed PV Shifas, Nagaraj Adiga, Vassilis Tsiaras, Yannis Stylianou

    Abstract: In this paper, we suggest a new parallel, non-causal and shallow waveform domain architecture for speech enhancement based on FFTNet, a neural network for generating high quality audio waveform. In contrast to other waveform based approaches like WaveNet, FFTNet uses an initial wide dilation pattern. Such an architecture better represents the long term correlated structure of speech in the time do…

    Submitted 8 June, 2020; originally announced June 2020.

    Comments: 5 pages

  4. arXiv:1804.09593

    eess.AS cs.SD stat.ML

    Speaker-independent raw waveform model for glottal excitation

    Authors: Lauri Juvela, Vassilis Tsiaras, Bajibabu Bollepalli, Manu Airaksinen, Junichi Yamagishi, Paavo Alku

    Abstract: Recent speech technology research has seen a growing interest in using WaveNets as statistical vocoders, i.e., generating speech waveforms from acoustic features. These models have been shown to improve the generated speech quality over classical vocoders in many tasks, such as text-to-speech synthesis and voice conversion. Furthermore, conditioning WaveNets with acoustic features allows sharing t…

    Submitted 25 April, 2018; originally announced April 2018.

    Comments: Submitted to Interspeech 2018