Showing 1–13 of 13 results for author: Fernandez, R

Searching in archive eess.
  1. arXiv:2506.11096  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Assessing the Impact of Anisotropy in Neural Representations of Speech: A Case Study on Keyword Spotting

    Authors: Guillaume Wisniewski, Séverine Guillaume, Clara Rosina Fernández

    Abstract: Pretrained speech representations like wav2vec2 and HuBERT exhibit strong anisotropy, leading to high similarity between random embeddings. While widely observed, the impact of this property on downstream tasks remains unclear. This work evaluates anisotropy in keyword spotting for computational documentary linguistics. Using Dynamic Time Warping, we show that despite anisotropy, wav2vec2 similari…

    Submitted 6 June, 2025; originally announced June 2025.

  2. New scenarios and trends in non-traditional laboratories from 2000 to 2020

    Authors: Ricardo M. Fernandez, Felix Garcia-Loro, Gustavo Alves, Africa Lopez-Rey, Russ Meier, Manuel Castro

    Abstract: For educational institutions in STEM areas, the provision of practical learning scenarios is, traditionally, a major concern. In the 21st century, the explosion of ICTs, as well as the universalization of low-cost hardware, have allowed the proliferation of technical solutions for any field; in the case of experimentation, encouraging the emergence and proliferation of non-traditional experimentat…

    Submitted 24 January, 2025; originally announced January 2025.

    Comments: Transactions on Learning Technologies, 20 pages

    Journal ref: IEEE Transactions on Learning Technologies, 17(1), pp. 1568-1587. New York: IEEE (Institute of Electrical and Electronics Engineers) Education Society (IEEE ES) and Computer Society (IEEE CS), 04/2024. ISSN 1932-8540

  3. arXiv:2410.03930  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Reverb: Open-Source ASR and Diarization from Rev

    Authors: Nishchal Bhandari, Danny Chen, Miguel Ángel del Río Fernández, Natalie Delworth, Jennifer Drexler Fox, Migüel Jetté, Quinten McNamara, Corey Miller, Ondřej Novotný, Ján Profant, Nan Qin, Martin Ratajczak, Jean-Philippe Robichaud

    Abstract: Today, we are open-sourcing our core speech recognition and diarization models for non-commercial use. We are releasing both a full production pipeline for developers as well as pared-down research models for experimentation. Rev hopes that these releases will spur research and innovation in the fast-moving domain of voice technology. The speech recognition models released today outperform all exi…

    Submitted 24 February, 2025; v1 submitted 4 October, 2024; originally announced October 2024.

  4. arXiv:2409.10535  [pdf, other]

    cs.CV cs.AI cs.SD eess.AS

    Learning Co-Speech Gesture Representations in Dialogue through Contrastive Learning: An Intrinsic Evaluation

    Authors: Esam Ghaleb, Bulat Khaertdinov, Wim Pouw, Marlou Rasenberg, Judith Holler, Aslı Özyürek, Raquel Fernández

    Abstract: In face-to-face dialogues, the form-meaning relationship of co-speech gestures varies depending on contextual factors such as what the gestures refer to and the individual characteristics of speakers. These factors make co-speech gesture representation learning challenging. How can we learn meaningful gestures representations considering gestures' variability and relationship with speech? This pap…

    Submitted 31 August, 2024; originally announced September 2024.

    ACM Class: I.4

    Journal ref: International Conference on Multimodal Interaction (ICMI 2024)

  5. arXiv:2406.05547  [pdf, other]

    cs.SD cs.CL eess.AS

    Exploring the Benefits of Tokenization of Discrete Acoustic Units

    Authors: Avihu Dekel, Raul Fernandez

    Abstract: Tokenization algorithms that merge the units of a base vocabulary into larger, variable-rate units have become standard in natural language processing tasks. This idea, however, has been mostly overlooked when the vocabulary consists of phonemes or Discrete Acoustic Units (DAUs), an audio-based representation that is playing an increasingly important role due to the success of discrete language-mo…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  6. A Neural TTS System with Parallel Prosody Transfer from Unseen Speakers

    Authors: Slava Shechtman, Raul Fernandez

    Abstract: Modern neural TTS systems are capable of generating natural and expressive speech when provided with sufficient amounts of training data. Such systems can be equipped with prosody-control functionality, allowing for more direct shaping of the speech output at inference time. In some TTS applications, it may be desirable to have an option that guides the TTS system with an ad-hoc speech recording e…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Presented at Interspeech 2023

    Journal ref: Proc. INTERSPEECH 2023, 4853-4857 (2023)

  7. arXiv:2309.11210  [pdf, other]

    eess.AS cs.CL cs.SD

    Speak While You Think: Streaming Speech Synthesis During Text Generation

    Authors: Avihu Dekel, Slava Shechtman, Raul Fernandez, David Haws, Zvi Kons, Ron Hoory

    Abstract: Large Language Models (LLMs) demonstrate impressive capabilities, yet interaction with these models is mostly facilitated through text. Using Text-To-Speech to synthesize LLM outputs typically results in notable latency, which is impractical for fluent voice conversations. We propose LLM2Speech, an architecture to synthesize speech while text is being generated by an LLM which yields significant l…

    Submitted 20 September, 2023; originally announced September 2023.

    Comments: Under review for ICASSP 2024

  8. Short-Term Aggregated Residential Load Forecasting using BiLSTM and CNN-BiLSTM

    Authors: Bharat Bohara, Raymond I. Fernandez, Vysali Gollapudi, Xingpeng Li

    Abstract: Higher penetration of renewable and smart home technologies at the residential level challenges grid stability as utility-customer interactions add complexity to power system operations. In response, short-term residential load forecasting has become an increasing area of focus. However, forecasting at the residential level is challenging due to the higher uncertainties involved. Recently deep neu…

    Submitted 9 February, 2023; originally announced February 2023.

    Comments: This article has been accepted for publication in 2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT). This preprint is for personal use - that is solely for the purpose of research, but republication/redistribution requires IEEE permission. Please check IEEE website for more information

    Journal ref: 2022 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT)

  9. arXiv:2207.12262  [pdf, other]

    eess.AS cs.SD

    Transplantation of Conversational Speaking Style with Interjections in Sequence-to-Sequence Speech Synthesis

    Authors: Raul Fernandez, David Haws, Guy Lorberbom, Slava Shechtman, Alexander Sorin

    Abstract: Sequence-to-Sequence Text-to-Speech architectures that directly generate low level acoustic features from phonetic sequences are known to produce natural and expressive speech when provided with adequate amounts of training data. Such systems can learn and transfer desired speaking styles from one seen speaker to another (in multi-style multi-speaker settings), which is highly desirable for creati…

    Submitted 25 July, 2022; originally announced July 2022.

    Comments: Accepted for presentation at Interspeech 2022

  10. Supervised and Unsupervised Approaches for Controlling Narrow Lexical Focus in Sequence-to-Sequence Speech Synthesis

    Authors: Slava Shechtman, Raul Fernandez, David Haws

    Abstract: Although Sequence-to-Sequence (S2S) architectures have become state-of-the-art in speech synthesis, capable of generating outputs that approach the perceptual quality of natural samples, they are limited by a lack of flexibility when it comes to controlling the output. In this work we present a framework capable of controlling the prosodic output via a set of concise, interpretable, disentangled p…

    Submitted 25 January, 2021; originally announced January 2021.

    Comments: IEEE Spoken Language Technology Workshop (SLT), 2021

  11. arXiv:2011.09530  [pdf, other]

    cs.CV cs.AI eess.IV

    Neuro-Symbolic Representations for Video Captioning: A Case for Leveraging Inductive Biases for Vision and Language

    Authors: Hassan Akbari, Hamid Palangi, Jianwei Yang, Sudha Rao, Asli Celikyilmaz, Roland Fernandez, Paul Smolensky, Jianfeng Gao, Shih-Fu Chang

    Abstract: Neuro-symbolic representations have proved effective in learning structure information in vision and language. In this paper, we propose a new model architecture for learning multi-modal neuro-symbolic representations for video captioning. Our approach uses a dictionary learning-based method of learning relations between videos and their paired text descriptions. We refer to these relations as rel…

    Submitted 18 November, 2020; originally announced November 2020.

  12. arXiv:1708.06345  [pdf, other]

    cs.RO eess.SY

    Robust Optimal Planning and Control of Non-Periodic Bipedal Locomotion with A Centroidal Momentum Model

    Authors: Ye Zhao, Benito R. Fernandez, Luis Sentis

    Abstract: This study presents a theoretical method for planning and controlling agile bipedal locomotion based on robustly tracking a set of non-periodic keyframe states. Based on centroidal momentum dynamics, we formulate a hybrid phase-space planning and control method which includes the following key components: (i) a step transition solver that enables dynamically tracking non-periodic keyframe states o…

    Submitted 19 August, 2017; originally announced August 2017.

    Comments: 43 pages, 22 figures, journal, International Journal of Robotics Research, 2017. arXiv admin note: substantial text overlap with arXiv:1701.05929, arXiv:1511.04628

  13. arXiv:1511.04628  [pdf, other]

    cs.RO eess.SY

    A Framework for Planning and Controlling Non-Periodic Bipedal Locomotion

    Authors: Ye Zhao, Benito R. Fernandez, Luis Sentis

    Abstract: This study presents a theoretical framework for planning and controlling agile bipedal locomotion based on robustly tracking a set of non-periodic apex states. Based on the prismatic inverted pendulum model, we formulate a hybrid phase-space planning and control framework which includes the following key components: (1) a step transition solver that enables dynamically tracking non-periodic apex o…

    Submitted 14 November, 2015; originally announced November 2015.

    Comments: 33 pages, 18 figures, journal