Showing 1–17 of 17 results for author: Hernando, J

Searching in archive cs.
  1. arXiv:2505.24691  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    Speech-to-Text Translation with Phoneme-Augmented CoT: Enhancing Cross-Lingual Transfer in Low-Resource Scenarios

    Authors: Gerard I. Gállego, Oriol Pareras, Martí Cortada Garcia, Lucas Takanori, Javier Hernando

    Abstract: We propose a Speech-to-Text Translation (S2TT) approach that integrates phoneme representations into a Chain-of-Thought (CoT) framework to improve translation in low-resource and zero-resource settings. By introducing phoneme recognition as an intermediate step, we enhance cross-lingual transfer, enabling translation even for languages with no labeled speech data. Our system builds on a multilingu…

    Submitted 30 May, 2025; originally announced May 2025.

    Comments: Accepted at Interspeech 2025

  2. arXiv:2503.22577  [pdf, other]

    cs.CV cs.AI

    Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization

    Authors: Iñigo Pikabea, Iñaki Lacunza, Oriol Pareras, Carlos Escolano, Aitor Gonzalez-Agirre, Javier Hernando, Marta Villegas

    Abstract: Rapid advancements in Visual Language Models (VLMs) have transformed multimodal understanding but are often constrained by generating English responses regardless of the input language. This phenomenon has been termed as Image-induced Fidelity Loss (IFL) and stems from limited multimodal multilingual training data. To address this, we propose a continuous multilingual integration strategy that inj…

    Submitted 20 May, 2025; v1 submitted 28 March, 2025; originally announced March 2025.

    Comments: v2: Expanded model merging experiments. Fix duplicated subsection on limitations

  3. arXiv:2503.15243  [pdf, other]

    cs.IT cs.ET

    Integrating Sensing and Communications in 6G? Not Until It Is Secure to Do So

    Authors: Nanchi Su, Fan Liu, Jiaqi Zou, Christos Masouros, George C. Alexandropoulos, Alain Mourad, Javier Lorca Hernando, Qinyu Zhang, Tse-Tin Chan

    Abstract: Integrated Sensing and Communication (ISAC) is emerging as a cornerstone technology for forthcoming 6G systems, significantly improving spectrum and energy efficiency. However, the commercial viability of ISAC hinges on addressing critical challenges surrounding security, privacy, and trustworthiness. These challenges necessitate an end-to-end framework to safeguard both communication data and se…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 8 pages; 5 figures; submitted to an IEEE magazine

  4. Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge

    Authors: Daniel Tamayo, Aitor Gonzalez-Agirre, Javier Hernando, Marta Villegas

    Abstract: Recent research has explored methods for updating and modifying factual knowledge in large language models, often focusing on specific multi-layer perceptron blocks. This study expands on this work by examining the effectiveness of existing knowledge editing methods across languages and delving into the role of attention mechanisms in this process. Drawing from the insights gained, we propose Mass…

    Submitted 4 February, 2025; originally announced February 2025.

    Journal ref: Findings of the Association for Computational Linguistics: ACL 2024. Pages: 5831-5847

  5. Work-Efficient Parallel Non-Maximum Suppression Kernels

    Authors: David Oro, Carles Fernández, Xavier Martorell, Javier Hernando

    Abstract: In the context of object detection, sliding-window classifiers and single-shot Convolutional Neural Network (CNN) meta-architectures typically yield multiple overlapping candidate windows with similar high scores around the true location of a particular object. Non-Maximum Suppression (NMS) is the process of selecting a single representative candidate within this cluster of detections, so as to ob…

    Submitted 1 February, 2025; originally announced February 2025.

    Comments: Code: https://github.com/hertasecurity/gpu-nms

    ACM Class: D.1.3; I.4.8

    Journal ref: The Computer Journal, Volume 65, Issue 4, April 2022, Pages 773-787

  6. arXiv:2501.17893  [pdf, other]

    eess.AS cs.LG cs.SD

    Language Modelling for Speaker Diarization in Telephonic Interviews

    Authors: Miquel India, Javier Hernando, José A. R. Fonollosa

    Abstract: The aim of this paper is to investigate the benefit of combining both language and acoustic modelling for speaker diarization. Although conventional systems only use acoustic features, in some scenarios linguistic data contain high discriminative speaker information, even more reliable than the acoustic ones. In this study we analyze how an appropriate fusion of both kinds of features is able to ob…

    Submitted 28 January, 2025; originally announced January 2025.

  7. arXiv:2410.13385  [pdf, other]

    eess.AS cs.CL cs.SD

    On the Use of Audio to Improve Dialogue Policies

    Authors: Daniel Roncel, Federico Costa, Javier Hernando

    Abstract: With the significant progress of speech technologies, spoken goal-oriented dialogue systems are becoming increasingly popular. One of the main modules of a dialogue system is typically the dialogue policy, which is responsible for determining system actions. This component usually relies only on audio transcriptions, being strongly dependent on their quality and ignoring very important extralingui…

    Submitted 17 October, 2024; originally announced October 2024.

    Comments: IberSpeech 2024

  8. Double Multi-Head Attention Multimodal System for Odyssey 2024 Speech Emotion Recognition Challenge

    Authors: Federico Costa, Miquel India, Javier Hernando

    Abstract: As computer-based applications are becoming more integrated into our daily lives, the importance of Speech Emotion Recognition (SER) has increased significantly. Promoting research with innovative approaches in SER, the Odyssey 2024 Speech Emotion Recognition Challenge was organized as part of the Odyssey 2024 Speaker and Language Recognition Workshop. In this paper we describe the Double Multi-He…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: Odyssey 2024: The Speaker and Language Recognition Workshop

    Journal ref: Proc. The Speaker and Language Recognition Workshop (Odyssey 2024), 266-273

  9. Speaker Characterization by means of Attention Pooling

    Authors: Federico Costa, Miquel India, Javier Hernando

    Abstract: State-of-the-art Deep Learning systems for speaker verification are commonly based on speaker embedding extractors. These architectures are usually composed of a feature extractor front-end together with a pooling layer to encode variable-length utterances into fixed-length speaker vectors. The authors have recently proposed the use of a Double Multi-Head Self-Attention pooling for speaker recogni…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: IberSpeech 2022

    Journal ref: Proc. IberSPEECH 2022, 166-170

  10. arXiv:2301.01703  [pdf, other]

    cs.IT eess.SP

    Technology Trends for Massive MIMO towards 6G

    Authors: Yiming Huo, Xingqin Lin, Boya Di, Hongliang Zhang, Francisco Javier Lorca Hernando, Ahmet Serdar Tan, Shahid Mumtaz, Özlem Tuğfe Demir, Kun Chen-Hu

    Abstract: At the dawn of the next-generation wireless systems and networks, massive multiple-input multiple-output (MIMO) has been envisioned as one of the enabling technologies. With the continued success of being applied in the 5G and beyond, the massive MIMO technology has demonstrated its advantageousness, integrability, and extendibility. Moreover, several evolutionary features and revolutionizing tren…

    Submitted 5 January, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

    Comments: 7 pages, 5 figures. This work has been submitted to the IEEE for possible publication

  11. arXiv:2010.10937  [pdf, other]

    eess.AS cs.SD

    The UPC Speaker Verification System Submitted to VoxCeleb Speaker Recognition Challenge 2020 (VoxSRC-20)

    Authors: Umair Khan, Javier Hernando

    Abstract: This report describes the submission from Technical University of Catalonia (UPC) to the VoxCeleb Speaker Recognition Challenge (VoxSRC-20) at Interspeech 2020. The final submission is a combination of three systems. System-1 is an autoencoder based approach which tries to reconstruct similar i-vectors, whereas System-2 and -3 are Convolutional Neural Network (CNN) based siamese architectures. The…

    Submitted 27 October, 2020; v1 submitted 21 October, 2020; originally announced October 2020.

    Comments: VoxSRC-20 Workshop (Interspeech 2020 Conference)

  12. arXiv:2008.01077  [pdf, other]

    eess.AS cs.LG cs.SD

    Self-attention encoding and pooling for speaker recognition

    Authors: Pooyan Safari, Miquel India, Javier Hernando

    Abstract: The computing power of mobile devices limits the end-user applications in terms of storage size, processing, memory and energy consumption. These limitations motivate researchers for the design of more efficient deep models. On the other hand, self-attention networks based on Transformer architecture have attracted remarkable interests due to their high parallelization capabilities and strong perf…

    Submitted 3 August, 2020; originally announced August 2020.

  13. arXiv:2007.13199  [pdf, other]

    eess.AS cs.SD

    Double Multi-Head Attention for Speaker Verification

    Authors: Miquel India, Pooyan Safari, Javier Hernando

    Abstract: Most state-of-the-art Deep Learning systems for speaker verification are based on speaker embedding extractors. These architectures are commonly composed of a feature extractor front-end together with a pooling layer to encode variable-length utterances into fixed-length speaker vectors. In this paper we present Double Multi-Head Attention pooling, which extends our previous approach based on Self…

    Submitted 9 January, 2021; v1 submitted 26 July, 2020; originally announced July 2020.

  14. arXiv:2006.05388  [pdf]

    cs.HC cs.LG

    End-to-end User Recognition using Touchscreen Biometrics

    Authors: Michał Krzemiński, Javier Hernando

    Abstract: We study the touchscreen data as behavioural biometrics. The goal was to create an end-to-end system that can transparently identify users using raw data from mobile devices. The touchscreen biometrics was researched only few times in series of works with disparity in used methodology and databases. In the proposed system data from the touchscreen goes directly, without any processing, to the inpu…

    Submitted 9 June, 2020; originally announced June 2020.

  15. arXiv:1906.09890  [pdf, other]

    cs.SD cs.LG stat.ML

    Self Multi-Head Attention for Speaker Recognition

    Authors: Miquel India, Pooyan Safari, Javier Hernando

    Abstract: Most state-of-the-art Deep Learning (DL) approaches for speaker recognition work on a short utterance level. Given the speech signal, these algorithms extract a sequence of speaker embeddings from short segments and those are averaged to obtain an utterance level speaker representation. In this work we propose the use of an attention mechanism to obtain a discriminative speaker embedding given non…

    Submitted 1 July, 2019; v1 submitted 24 June, 2019; originally announced June 2019.

    Comments: 4+1 pages. 4 Figures. Accepted for Interspeech 2019

    MSC Class: 68

  16. arXiv:1706.10098  [pdf, other]

    cs.GR cs.DC

    From Big Data to Big Displays: High-Performance Visualization at Blue Brain

    Authors: Stefan Eilemann, Marwan Abdellah, Nicolas Antille, Ahmet Bilgili, Grigory Chevtchenko, Raphael Dumusc, Cyrille Favreau, Juan Hernando, Daniel Nachbaur, Pawel Podhajski, Jafet Villafranca, Felix Schürmann

    Abstract: Blue Brain has pushed high-performance visualization (HPV) to complement its HPC strategy since its inception in 2007. In 2011, this strategy has been accelerated to develop innovative visualization solutions through increased funding and strategic partnerships with other research institutions. We present the key elements of this HPV ecosystem, which integrates C++ visualization applications wit…

    Submitted 30 June, 2017; originally announced June 2017.

    Comments: ISC 2017 Visualization at Scale workshop

  17. Deep Learning for Single and Multi-Session i-Vector Speaker Recognition

    Authors: Omid Ghahabi, Javier Hernando

    Abstract: The promising performance of Deep Learning (DL) in speech recognition has motivated the use of DL in other speech technology applications such as speaker recognition. Given i-vectors as inputs, the authors proposed an impostor selection algorithm and a universal model adaptation process in a hybrid system based on Deep Belief Networks (DBN) and Deep Neural Networks (DNN) to discriminatively model…

    Submitted 8 December, 2015; originally announced December 2015.

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume: 25, Issue: 4, April 2017