Showing 1–7 of 7 results for author: Comini, G

  1. arXiv:2402.01385  [pdf]

    eess.AS cs.SD

    Del Visual al Auditivo: Sonorización de Escenas Guiada por Imagen [From Visual to Auditory: Image-Guided Scene Sonorization]

    Authors: María Sánchez, Laura Fernández, Julián Arias, Mateo Cámara, Giulia Comini, Adam Gabrys, José Luis Blanco, Juan Ignacio Godino, Luis Alfonso Hernández

    Abstract: Recent advances in image, video, text and audio generative techniques, and their use by the general public, are leading to new forms of content generation. Usually, each modality was approached separately, which poses limitations. The automatic sound recording of visual sequences is one of the greatest challenges for the automatic generation of multimodal content. We present a processing flow that…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: 10 pages, in Spanish, Tecniacústica

  2. arXiv:2307.16709  [pdf, other]

    cs.CL eess.AS

    Multilingual context-based pronunciation learning for Text-to-Speech

    Authors: Giulia Comini, Manuel Sam Ribeiro, Fan Yang, Heereen Shim, Jaime Lorenzo-Trueba

    Abstract: Phonetic information and linguistic knowledge are an essential component of a Text-to-speech (TTS) front-end. Given a language, a lexicon can be collected offline and Grapheme-to-Phoneme (G2P) relationships are usually modeled in order to predict the pronunciation for out-of-vocabulary (OOV) words. Additionally, post-lexical phonology, often defined in the form of rule-based systems, is used to co…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 5 pages, 2 figures, 5 tables. Interspeech 2023

  3. arXiv:2307.16643  [pdf, other]

    eess.AS cs.CL

    Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

    Authors: Manuel Sam Ribeiro, Giulia Comini, Jaime Lorenzo-Trueba

    Abstract: The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation. G2P conversion is beneficial to various speech processing applications, such as text-to-speech and speech recognition. However, these tend to rely on manually-annotated pronunciation dictionaries, which are often time-consuming and costly to acquire. In this paper, we propose a method to…

    Submitted 31 July, 2023; originally announced July 2023.

    Comments: 5 pages, 2 figures, 4 tables. Interspeech 2023

  4. arXiv:2207.14607  [pdf, other]

    eess.AS cs.SD

    Low-data? No problem: low-resource, language-agnostic conversational text-to-speech via F0-conditioned data augmentation

    Authors: Giulia Comini, Goeric Huybrechts, Manuel Sam Ribeiro, Adam Gabrys, Jaime Lorenzo-Trueba

    Abstract: The availability of data in expressive styles across languages is limited, and recording sessions are costly and time consuming. To overcome these issues, we demonstrate how to build low-resource, neural text-to-speech (TTS) voices with only 1 hour of conversational speech, when no other conversational data are available in the same language. Assuming the availability of non-expressive speech data…

    Submitted 29 July, 2022; originally announced July 2022.

    Comments: Accepted for presentation at Interspeech 2022

  5. arXiv:2202.08164  [pdf, other]

    eess.AS cs.CL cs.LG

    Voice Filter: Few-shot text-to-speech speaker adaptation using voice conversion as a post-processing module

    Authors: Adam Gabryś, Goeric Huybrechts, Manuel Sam Ribeiro, Chung-Ming Chien, Julian Roth, Giulia Comini, Roberto Barra-Chicote, Bartek Perz, Jaime Lorenzo-Trueba

    Abstract: State-of-the-art text-to-speech (TTS) systems require several hours of recorded speech data to generate high-quality synthetic speech. When using reduced amounts of training data, standard TTS models suffer from speech quality and intelligibility degradations, making training low-resource TTS systems problematic. In this paper, we propose a novel extremely low-resource TTS method called Voice Filt…

    Submitted 16 February, 2022; originally announced February 2022.

    Comments: Accepted at ICASSP 2022

  6. arXiv:2202.05083  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Cross-speaker style transfer for text-to-speech using data augmentation

    Authors: Manuel Sam Ribeiro, Julian Roth, Giulia Comini, Goeric Huybrechts, Adam Gabrys, Jaime Lorenzo-Trueba

    Abstract: We address the problem of cross-speaker style transfer for text-to-speech (TTS) using data augmentation via voice conversion. We assume to have a corpus of neutral non-expressive data from a target speaker and supporting conversational expressive data from different speakers. Our goal is to build a TTS system that is expressive, while retaining the target speaker's identity. The proposed approach…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: 5 pages, 3 figures, 4 tables. ICASSP 2022

  7. arXiv:2011.05707  [pdf, other]

    eess.AS cs.CL cs.SD

    Low-resource expressive text-to-speech using data augmentation

    Authors: Goeric Huybrechts, Thomas Merritt, Giulia Comini, Bartek Perz, Raahil Shah, Jaime Lorenzo-Trueba

    Abstract: While recent neural text-to-speech (TTS) systems perform remarkably well, they typically require a substantial amount of recordings from the target speaker reading in the desired speaking style. In this work, we present a novel 3-step methodology to circumvent the costly operation of recording large amounts of target data in order to build expressive style voices with as little as 15 minutes of su…

    Submitted 1 June, 2021; v1 submitted 11 November, 2020; originally announced November 2020.