Showing 1–7 of 7 results for author: Vallés-Pérez, I

  1. arXiv:2402.03407  [pdf, other]

    eess.AS cs.CL cs.LG

    Enhancing the Stability of LLM-based Speech Generation Systems through Self-Supervised Representations

    Authors: Álvaro Martín-Cortinas, Daniel Sáez-Trigueros, Iván Vallés-Pérez, Biel Tura-Vecino, Piotr Biliński, Mateusz Lajszczak, Grzegorz Beringer, Roberto Barra-Chicote, Jaime Lorenzo-Trueba

    Abstract: Large Language Models (LLMs) are one of the most promising technologies for the next era of speech generation systems, due to their scalability and in-context learning capabilities. Nevertheless, they suffer from multiple stability issues at inference time, such as hallucinations, content skipping or speech repetitions. In this work, we introduce a new self-supervised Voice Conversion (VC) archite…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 10 pages, 1 figure, 3 tables

  2. arXiv:2307.12445  [pdf, other]

    cs.SD cs.AI eess.AS

    SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces

    Authors: Ivan Vallés-Pérez, Grzegorz Beringer, Piotr Bilinski, Gary Cook, Roberto Barra-Chicote

    Abstract: Numerous examples in the literature proved that deep learning models have the ability to work well with multimodal data. Recently, CLIP has enabled deep learning systems to learn shared latent spaces between images and text descriptions, with outstanding zero- or few-shot results in downstream tasks. In this paper we explore the same idea proposed by CLIP but applied to the speech domain, where th…

    Submitted 30 January, 2024; v1 submitted 23 July, 2023; originally announced July 2023.

    Comments: In proceedings of the 26th European Conference on Artificial Intelligence ECAI 2023. 8 pages + 1 appendix page

  3. arXiv:2301.05993  [pdf, other]

    cs.CV cs.AI

    Empirical study of the modulus as activation function in computer vision applications

    Authors: Iván Vallés-Pérez, Emilio Soria-Olivas, Marcelino Martínez-Sober, Antonio J. Serrano-López, Joan Vila-Francés, Juan Gómez-Sanchís

    Abstract: In this work we propose a new non-monotonic activation function: the modulus. The majority of the reported research on nonlinearities is focused on monotonic functions. We empirically demonstrate how by using the modulus activation function on computer vision tasks the models generalize better than with other nonlinearities - up to a 15% accuracy increase in CIFAR100 and 4% in CIFAR10, relative to…

    Submitted 14 January, 2023; originally announced January 2023.

    Comments: Accepted at Engineering Applications of AI

  4. arXiv:2211.09731  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

    Authors: Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran

    Abstract: Stuttering is a speech disorder where the natural flow of speech is interrupted by blocks, repetitions or prolongations of syllables, words and phrases. The majority of existing automatic speech recognition (ASR) interfaces perform poorly on utterances with stutter, mainly due to lack of matched training data. Synthesis of speech with stutter thus presents an opportunity to improve ASR for this ty…

    Submitted 4 November, 2022; originally announced November 2022.

    Comments: 8 pages, 3 figures, 2 tables

    Journal ref: NeurIPS Workshop on SyntheticData4ML, December 2022

  5. arXiv:2204.07786  [pdf, ps, other]

    cs.LG cs.AI stat.AP

    Approaching sales forecasting using recurrent neural networks and transformers

    Authors: Iván Vallés-Pérez, Emilio Soria-Olivas, Marcelino Martínez-Sober, Antonio J. Serrano-López, Juan Gómez-Sanchís, Fernando Mateo

    Abstract: Accurate and fast demand forecast is one of the hot topics in supply chain for enabling the precise execution of the corresponding downstream processes (inbound and outbound planning, inventory placement, network planning, etc). We develop three alternatives to tackle the problem of forecasting the customer sales at day/store/item level using deep learning techniques and the Corporación Favorita d…

    Submitted 16 April, 2022; originally announced April 2022.

    Comments: Accepted for publication in Expert Systems and Applications

  6. arXiv:2110.07498  [pdf, ps, other]

    cs.CL

    End-to-end Keyword Spotting using Xception-1d

    Authors: Iván Vallés-Pérez, Juan Gómez-Sanchis, Marcelino Martínez-Sober, Joan Vila-Francés, Antonio J. Serrano-López, Emilio Soria-Olivas

    Abstract: The field of conversational agents is growing fast and there is an increasing need for algorithms that enhance natural interaction. In this work we show how we achieved state of the art results in the Keyword Spotting field by adapting and tweaking the Xception algorithm, which achieved outstanding results in several computer vision tasks. We obtained about 96% accuracy when classifying audio cli…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: In proceedings of ESANN 2021 conference. 5 pages + references

  7. arXiv:2106.05762  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows

    Authors: Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo

    Abstract: Text-to-speech systems recently achieved almost indistinguishable quality from human speech. However, the prosody of those systems is generally flatter than natural speech, producing samples with low expressiveness. Disentanglement of speaker id and prosody is crucial in text-to-speech systems to improve on naturalness and produce more variable syntheses. This paper proposes a new neural text-to-s…

    Submitted 10 June, 2021; originally announced June 2021.

    Comments: in Proceedings of Interspeech 2021 conference