Search | arXiv e-print repository

Open-Source System for Multilingual Translation and Cloned Speech Synthesis

Authors: Mateo Cámara, Juan Gutiérrez, María Pilar Daza, José Luis Blanco

Abstract: We present an open-source system designed for multilingual translation and speech regeneration, addressing challenges in communication and accessibility across diverse linguistic contexts. The system integrates Whisper for speech recognition with Voice Activity Detection (VAD) to identify speaking intervals, followed by a pipeline of Large Language Models (LLMs). For multilingual applications, the first LLM segments speech into coherent, complete sentences, which a second LLM then translates. For speech regeneration, the system uses a text-to-speech (TTS) module with voice cloning capabilities to replicate the original speaker's voice, maintaining naturalness and speaker identity. The system's open-source components can operate locally or via APIs, offering cost-effective deployment across various use cases. These include real-time multilingual translation in Zoom sessions, speech regeneration for public broadcasts, and Bluetooth-enabled multilingual playback through personal devices. By preserving the speaker's voice, the system ensures a seamless and immersive experience, whether translating or regenerating speech. This open-source project is shared with the community to foster innovation and accessibility. We provide a detailed system performance analysis, including latency and word accuracy, demonstrating its potential to enable inclusive, adaptable communication solutions in real-world multilingual scenarios.

Submitted 3 July, 2025; originally announced July 2025.

Comments: Presented at Forum Acusticum Euronoise 2025

arXiv:2411.04581

URLLC Networks enabled by STAR-RIS, Rate Splitting, and Multiple Antennas

Authors: Eduard Jorswieck, Mohammad Soleymani, Ignacio Santamaria, Jesús Gutiérrez

Abstract: The challenges in dense ultra-reliable low-latency communication networks to deliver the required service to multiple devices are addressed by three main technologies: multiple antennas at the base station (MISO), rate splitting multiple access (RSMA) with private and common message encoding, and simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS). Careful resource allocation, encompassing beamforming and RIS optimization, is required to exploit the synergy between the three. We propose an alternating optimization-based algorithm, relying on minorization-maximization. Numerical results show that the achievable second-order max-min rates of the proposed scheme outperform the baselines significantly. MISO, RSMA, and STAR-RIS all contribute to enabling ultra-reliable low-latency communication (URLLC).

Submitted 7 November, 2024; originally announced November 2024.

Comments: Accepted at 2025 International Conference on Mobile and Miniaturized Terahertz Systems (ICMMTS)

arXiv:2408.04271

Energy Efficiency Comparison of RIS Architectures in MISO Broadcast Channels

Authors: Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck, Marco Di Renzo, Jesús Gutiérrez

Abstract: In this paper, we develop energy-efficient schemes for multi-user multiple-input single-output (MISO) broadcast channels (BCs), assisted by reconfigurable intelligent surfaces (RISs). To this end, we consider three architectures of RIS: locally passive diagonal (LP-D), globally passive diagonal (GP-D), and globally passive beyond diagonal (GP-BD). In a globally passive RIS, the power of the output signal of the RIS is not greater than its input power, but some RIS elements can amplify the signal. In a locally passive RIS, no element can amplify the incident signal. We show that these RIS architectures can substantially improve energy efficiency (EE) if the static power of the RIS elements is not too high. Moreover, GP-BD RIS, which has a higher complexity and static power than LP-D RIS and GP-D RIS, provides better spectral efficiency, but its EE performance highly depends on the static power consumption and may be worse than that of its diagonal counterparts.

Submitted 8 August, 2024; originally announced August 2024.

Comments: Accepted at 25th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)

arXiv:2406.02170

MIMO Capacity Maximization with Beyond-Diagonal RIS

Authors: Ignacio Santamaria, Mohammad Soleymani, Eduard Jorswieck, Jesús Gutiérrez

Abstract: This paper addresses the problem of maximizing the capacity of a multiple-input multiple-output (MIMO) link assisted by a beyond-diagonal reconfigurable intelligent surface (BD-RIS). We maximize the capacity by alternately optimizing the transmit covariance matrix, and the BD-RIS scattering matrix, which, according to network theory, should be unitary and symmetric. These constraints make the optimization of BD-RIS more challenging than that of diagonal RIS. To find a stationary point of the capacity we maximize a sequence of quadratic problems in the manifold of unitary matrices. This leads to an efficient algorithm that always improves the capacity obtained by a diagonal RIS. Through simulation examples, we study the capacity improvement provided by a passive BD-RIS architecture over the conventional RIS model in which the phase shift matrix is diagonal.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 5 pages, 4 figures

arXiv:2303.03014

Interference Leakage Minimization in RIS-assisted MIMO Interference Channels

Authors: Ignacio Santamaria, Mohammad Soleymani, Eduard Jorswieck, Jesus Gutierrez

Abstract: We address the problem of interference leakage (IL) minimization in the $K$-user multiple-input multiple-output (MIMO) interference channel (IC) assisted by a reconfigurable intelligent surface (RIS). We describe an iterative algorithm based on block coordinate descent to minimize the IL cost function. A reformulation of the problem provides a geometric interpretation and shows interesting connections with envelope precoding and phase-only zero-forcing beamforming problems. As a result of this analysis, we derive a set of necessary (but not sufficient) conditions for a phase-optimized RIS to be able to perfectly cancel the interference on the $K$-user MIMO IC.

Submitted 6 March, 2023; originally announced March 2023.

Comments: Accepted at ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:2212.07396

Objective quality assessment of medical images and videos: Review and challenges

Authors: Rafael Rodrigues, Lucie Lévêque, Jesús Gutiérrez, Houda Jebbari, Meriem Outtas, Lu Zhang, Aladine Chetouani, Shaymaa Al-Juboori, Maria Martini, Antonio M. G. Pinheiro

Abstract: Quality assessment is a key element for the evaluation of hardware and software involved in image and video acquisition, processing, and visualization. In the medical field, user-based quality assessment is still considered more reliable than objective methods, which allow the implementation of automated and more efficient solutions. Despite increasing research on this topic in the last decade, defining quality standards for medical content remains a non-trivial task, as the focus should be on the diagnostic value assessed by expert viewers rather than the perceived quality from naïve viewers, and objective quality metrics should aim at estimating the former rather than the latter. In this paper, we present a survey of methodologies used for the objective quality assessment of medical images and videos, dividing them into visual quality-based and task-based approaches. Visual quality-based methods compute a quality index directly from visual attributes, while task-based methods, being increasingly explored, measure the impact of quality impairments on the performance of a specific task. A discussion on the limitations of state-of-the-art research on this topic is also provided, along with future challenges to be addressed.

Submitted 14 December, 2022; originally announced December 2022.

Comments: Submitted for peer review at Multimedia Tools and Applications

ACM Class: I.4; I.4.2

arXiv:2210.03589

Tracing, Ranking and Valuation of Aggregated DER Flexibility in Active Distribution Networks

Authors: Andrey Churkin, Wangwei Kong, Jose N. Melchor Gutierrez, Eduardo A. Martínez Ceseña, Pierluigi Mancarella

Abstract: The integration of distributed energy resources (DER) makes active distribution networks (ADNs) natural providers of flexibility services. However, the optimal operation of flexible units in ADNs is highly complex, which poses challenges for distribution system operators (DSOs) in aggregating DER flexibility. For example, to maximise the provision of services, flexible units must be strongly coordinated to manage network constraints, e.g., perform power swaps. Furthermore, due to the nonlinearities of aggregated DER flexibility provision, some units may need to rapidly change their outputs to enable the services. To address these challenges, this paper brings together exact AC optimal power flow (OPF) models and a cooperative game formulation and presents a new framework for tracing, ranking, and valuation of aggregated DER flexibility in ADNs. Extensive tests and simulations performed for the 33-bus radial distribution network demonstrate that the framework enables translating complex DER interactions into useful information for DSOs by ranking the criticality of flexible units and performing flexibility valuation based on its cost or economic surplus. Additionally, the framework proposes no-swap constraints and a nonlinearity metric which can be used by DSOs to identify unreliable operating regions with power swaps or rapid changes in flexible unit dispatch.

Submitted 26 May, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

arXiv:2110.02771

DNN-assisted Particle-based Bayesian Joint Synchronization and Localization

Authors: Meysam Goodarzi, Vladica Sark, Nebojsa Maletic, Jesús Gutiérrez, Giuseppe Caire, Eckhard Grass

Abstract: In this work, we propose a Deep neural network-assisted Particle Filter-based (DePF) approach to address the Mobile User (MU) joint synchronization and localization (sync\&loc) problem in ultra dense networks. In particular, DePF deploys an asymmetric time-stamp exchange mechanism between the MUs and the Access Points (APs), which, traditionally, provides us with information about the MUs' clock offset and skew. However, information about the distance between an AP and an MU is also intrinsic to the propagation delay experienced by exchanged time-stamps. In addition, to estimate the angle of arrival of the received synchronization packet, DePF draws on the multiple signal classification algorithm that is fed by the Channel Impulse Response (CIR) experienced by the sync packets. The CIR is also leveraged to determine the link condition, i.e. Line-of-Sight (LoS) or Non-LoS. Finally, to perform joint sync\&loc, DePF capitalizes on particle Gaussian mixtures that allow for a hybrid particle-based and parametric Bayesian Recursive Filtering (BRF) fusion of the aforementioned pieces of information and thus jointly estimate the position and clock parameters of the MUs. The simulation results verify the superiority of the proposed algorithm over the state-of-the-art schemes, especially that of Extended Kalman filter- and linearized BRF-based joint sync\&loc. In particular, only drawing on the synchronization time-stamp exchange and CIRs, for 90$\%$ of the cases, the absolute position and clock offset estimation errors remain below 1 meter and 2 nanoseconds, respectively.

Submitted 2 June, 2022; v1 submitted 29 September, 2021; originally announced October 2021.

arXiv:2110.01086

Assessing Distribution Network Flexibility via Reliability-based P-Q Area Segmentation

Authors: Andrey Churkin, Wangwei Kong, Jose N. Melchor Gutierrez, Pierluigi Mancarella, Eduardo A. Martinez Cesena

Abstract: This paper proposes a framework to assess the flexibility of active distribution networks (ADNs) via P-Q area segmentation, considering the reliability of flexible units (FUs). A mixed-integer quadratically constrained programming (MIQCP) model is formulated to analyse flexible active and reactive power support at the interface with transmission networks, explicitly capturing the contributions and reliability of FUs that provide flexibility services within an ADN. The numerical simulations performed for a real 124-bus UK distribution network demonstrate the optimal flexibility provision by different FUs, as well as the corresponding reliability and the impact of network reconfiguration. Distribution system operators (DSOs) can use the proposed framework to identify critical units, select an adequate combination of flexibility volumes, and manage its reliability.

Submitted 17 April, 2023; v1 submitted 3 October, 2021; originally announced October 2021.

Comments: Submitted to PSCC 2022, then resubmitted to IEEE PowerTech 2023 conference

arXiv:2008.08481

Bayesian Joint Synchronization and Localization Based on Asymmetric Time-stamp Exchange

Authors: Meysam Goodarzi, Nebojsa Maletic, Jesus Gutierrez, Eckhard Grass

Abstract: In this work, we study the joint synchronization and localization (sync&loc) of Mobile Nodes (MNs) in ultra dense networks. In particular, we deploy an asymmetric time-stamp exchange mechanism between MNs and Access Nodes (ANs), which, traditionally, provides us with information about the MNs' clock offset and skew. However, information about the distance between an AN and an MN is also intrinsic to the propagation delay experienced by exchanged time-stamps. In addition, we utilize Angle of Arrival (AoA) estimation to determine the incoming direction of time-stamp exchange packets, which gives further information about the MNs' location. Finally, we employ Bayesian Recursive Filtering (BRF) to combine the aforementioned pieces of information and jointly estimate the position and clock parameters of MNs. The simulation results indicate that the Root Mean Square Errors (RMSEs) of position and clock offset estimation are kept below 1 meter and 1 ns, respectively.

Submitted 19 August, 2020; originally announced August 2020.

Comments: IEEE International Symposium on Networks, Computers and Communications (ISNCC 2020)

arXiv:2004.09469

A Hybrid Bayesian Approach Towards Clock Offset and Skew Estimation in 5G Networks

Authors: Meysam Goodarzi, Darko Cvetkovski, Nebojsa Maletic, Jesus Gutierrez, Eckhard Grass

Abstract: In this work, we propose a hybrid Bayesian approach towards clock offset and skew estimation, thereby synchronizing large scale networks. In particular, we demonstrate the advantage of Bayesian Recursive Filtering (BRF) in alleviating time-stamping errors for pairwise synchronization. Moreover, we indicate the benefit of Factor Graphs (FGs), along with the Belief Propagation (BP) algorithm, in achieving high precision end-to-end network synchronization. Finally, we reveal the merit of hybrid synchronization, where a large-scale network is divided into local synchronization domains, for each of which a suitable synchronization algorithm (BP- or BRF-based) is utilized. The simulation results show that, despite the simplifications in the hybrid approach, the Root Mean Square Errors (RMSEs) of clock offset and skew estimation remain below 5 ns and 0.3 ppm, respectively.

Submitted 20 April, 2020; originally announced April 2020.

Comments: arXiv admin note: text overlap with arXiv:2002.12660

arXiv:2002.12660

Synchronization in 5G: a Bayesian Approach

Authors: M. Goodarzi, D. Cvetkovski, N. Maletic, J. Gutierrez, E. Grass

Abstract: In this work, we propose a hybrid approach to synchronize large scale networks. In particular, we draw on Kalman Filtering (KF) along with time-stamps generated by the Precision Time Protocol (PTP) for pairwise node synchronization. Furthermore, we investigate the merit of Factor Graphs (FGs) along with the Belief Propagation (BP) algorithm in achieving high precision end-to-end network synchronization. Finally, we present the idea of dividing the large-scale network into local synchronization domains, for each of which a suitable sync algorithm is utilized. The simulation results indicate that, despite the simplifications in the hybrid approach, the error in the offset estimation remains below 5 ns.

Submitted 28 February, 2020; originally announced February 2020.

arXiv:2001.04528

doi 10.1111/cgf.13889

On Demand Solid Texture Synthesis Using Deep 3D Networks

Authors: Jorge Gutierrez, Julien Rabin, Bruno Galerne, Thomas Hurtut

Abstract: This paper describes a novel approach for on demand volumetric texture synthesis based on a deep learning framework that allows for the generation of high quality 3D data at interactive rates. Based on a few example images of textures, a generative network is trained to synthesize coherent portions of solid textures of arbitrary sizes that reproduce the visual characteristics of the examples along some directions. To cope with memory limitations and computation complexity that are inherent to both high resolution and 3D processing on the GPU, only 2D textures referred to as "slices" are generated during the training stage. These synthetic textures are compared to exemplar images via a perceptual loss function based on a pre-trained deep network. The proposed network is very light (less than 100k parameters), therefore it only requires sustainable training (i.e. few hours) and is capable of very fast generation (around a second for $256^3$ voxels) on a single GPU. Integrated with a spatially seeded PRNG the proposed generator network directly returns an RGB value given a set of 3D coordinates. The synthesized volumes have good visual results that are at least equivalent to the state-of-the-art patch based approaches. They are naturally seamlessly tileable and can be fully generated in parallel.

Submitted 13 January, 2020; originally announced January 2020.

Showing 1–13 of 13 results for author: Gutierrez, J