Showing 1–3 of 3 results for author: Sorin, A

  1. arXiv:2207.12262

    eess.AS cs.SD

    Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis

    Authors: Raul Fernandez, David Haws, Guy Lorberbom, Slava Shechtman, Alexander Sorin

    Abstract: Sequence-to-Sequence Text-to-Speech architectures that directly generate low level acoustic features from phonetic sequences are known to produce natural and expressive speech when provided with adequate amounts of training data. Such systems can learn and transfer desired speaking styles from one seen speaker to another (in multi-style multi-speaker settings), which is highly desirable for creati…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted for presentation at Interspeech 2022

  2. arXiv:1909.10302

    eess.AS cs.SD

    Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities

    Authors: Slava Shechtman, Alex Sorin

    Abstract: Modern sequence to sequence neural TTS systems provide close to natural speech quality. Such systems usually comprise a network converting linguistic/phonetic features sequence to an acoustic features sequence, cascaded with a neural vocoder. The generated speech prosody (i.e. phoneme durations, pitch and loudness) is implicitly present in the acoustic features, being mixed with spectral informati…

    Submitted 23 September, 2019; originally announced September 2019.

    Comments: Published at the 10th ISCA Speech Synthesis Workshop (SSW-10, 2019)

    Journal ref: Proc. 10th ISCA Speech Synthesis Workshop, 275-280 (2019)

  3. arXiv:1905.00590

    eess.AS cs.SD

    High quality, lightweight and adaptable TTS using LPCNet

    Authors: Zvi Kons, Slava Shechtman, Alex Sorin, Carmel Rabinovitz, Ron Hoory

    Abstract: We present a lightweight adaptable neural TTS system with high quality output. The system is composed of three separate neural network blocks: prosody prediction, acoustic feature prediction and Linear Prediction Coding Net as a neural vocoder. This system can synthesize speech with close to natural quality while running 3 times faster than real-time on a standard CPU. The modular setup of the sys…

    Submitted 26 June, 2019; v1 submitted 2 May, 2019; originally announced May 2019.

    Comments: Accepted to Interspeech 2019