An Interpretable Transformer-Based Foundation Model for Cross-Procedural Skill Assessment Using Raw fNIRS Signals

Authors: A. Subedi, S. De, L. Cavuoto, S. Schwaitzberg, M. Hackett, J. Norfleet

Abstract: Objective skill assessment in high-stakes procedural environments requires models that not only decode underlying cognitive and motor processes but also generalize across tasks, individuals, and experimental contexts. While prior work has demonstrated the potential of functional near-infrared spectroscopy (fNIRS) for evaluating cognitive-motor performance, existing approaches are often task-specific, rely on extensive preprocessing, and lack robustness to new procedures or conditions. Here, we introduce an interpretable transformer-based foundation model trained on minimally processed fNIRS signals for cross-procedural skill assessment. Pretrained using self-supervised learning on data from laparoscopic surgical tasks and endotracheal intubation (ETI), the model achieves greater than 88% classification accuracy on all tasks, with Matthews Correlation Coefficient exceeding 0.91 on ETI. It generalizes to a novel emergency airway procedure--cricothyrotomy--using fewer than 30 labeled samples and a lightweight (less than 2k parameter) adapter module, attaining an AUC greater than 87%. Interpretability is achieved via a novel channel attention mechanism--developed specifically for fNIRS--that identifies functionally coherent prefrontal sub-networks validated through ablation studies. Temporal attention patterns align with task-critical phases and capture stress-induced changes in neural variability, offering insight into dynamic cognitive states.
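The abstract above reports both classification accuracy and the Matthews Correlation Coefficient (MCC). As a minimal sketch of how the two metrics relate for a binary skill label, the snippet below computes both from a confusion matrix. The function names and the counts in the usage note are hypothetical illustrations, not values or code from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary MCC computed from confusion-matrix counts.

    Returns 0.0 when any marginal is empty, a common convention
    for the otherwise undefined 0/0 case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts chosen only to illustrate the arithmetic.
example_acc = accuracy(45, 45, 5, 5)
example_mcc = matthews_corrcoef(45, 45, 5, 5)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is one reason it is often reported alongside accuracy when class sizes may be unbalanced.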

Submitted 21 June, 2025; originally announced June 2025.

ACM Class: I.2.6; J.3; H.1.2

arXiv:2505.20146 [pdf, ps, other]

On the Robustness of RSMA to Adversarial BD-RIS-Induced Interference

Authors: Arthur S. de Sena, Jacek Kibilda, Nurul H. Mahmood, Andre Gomes, Luiz A. DaSilva, Matti Latva-aho

Abstract: This article investigates the robustness of rate-splitting multiple access (RSMA) in multi-user multiple-input multiple-output (MIMO) systems to interference attacks against channel acquisition induced by beyond-diagonal RISs (BD-RISs). Two primary attack strategies, random and aligned interference, are proposed for fully connected and group-connected BD-RIS architectures. Valid random reflection coefficients are generated exploiting the Takagi factorization, while potent aligned interference attacks are achieved through optimization strategies based on a quadratically constrained quadratic program (QCQP) reformulation followed by projections onto the unitary manifold. Our numerical findings reveal that, when perfect channel state information (CSI) is available, RSMA behaves similarly to space-division multiple access (SDMA) and thus is highly susceptible to the attack, with BD-RIS inducing severe performance loss and significantly outperforming diagonal RIS. However, under imperfect CSI, RSMA consistently demonstrates significantly greater robustness than SDMA, particularly as the system's transmit power increases.
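The abstract does not spell out how the Takagi factorization yields valid random reflection coefficients. One plausible reading, assuming the common model in which a reciprocal, lossless fully connected BD-RIS has a symmetric and unitary scattering matrix, is sketched below: the Takagi factorization of a symmetric unitary matrix has all singular values equal to one, so such a matrix can be written as U Uᵀ for a unitary U. The function names are hypothetical, and this is not necessarily the authors' exact construction.

```python
import numpy as np


def random_unitary(n: int, rng: np.random.Generator) -> np.ndarray:
    """Draw a random n x n unitary matrix via QR of a complex Gaussian matrix."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    # Rescale each column by a unit-modulus phase so the draw does not
    # depend on the sign convention of the QR routine.
    return q * (d / np.abs(d))


def random_symmetric_unitary(n: int, rng: np.random.Generator) -> np.ndarray:
    """Build a symmetric unitary matrix as U @ U.T (plain transpose, no conjugate)."""
    u = random_unitary(n, rng)
    return u @ u.T


rng = np.random.default_rng(0)
theta = random_symmetric_unitary(4, rng)  # candidate reflection matrix
```

For a group-connected architecture, the same routine could be applied independently to each group and the results placed on a block diagonal; that extension is likewise an assumption here.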

Submitted 26 May, 2025; originally announced May 2025.

arXiv:2504.14584 [pdf, ps, other]

Max-Min Fairness for Stacked Intelligent Metasurface-Assisted Multi-User MISO Systems

Authors: Nipuni Ginige, Prathapasinghe Dharmawansa, Arthur Sousa de Sena, Nurul Huda Mahmood, Nandana Rajatheva, Matti Latva-aho

Abstract: Stacked intelligent metasurface (SIM) is an emerging technology that uses multiple reconfigurable surface layers to enable flexible wave-based beamforming. In this paper, we focus on a SIM-assisted multi-user multiple-input single-output system, where it is essential to ensure that all users receive a fair and reliable service level. To this end, we develop two max-min fairness algorithms based on instantaneous channel state information (CSI) and statistical CSI. For the instantaneous CSI case, we propose an alternating optimization algorithm that jointly optimizes power allocation using geometric programming and wave-based beamforming coefficients using the gradient descent-ascent method. For the statistical CSI case, since deriving an exact expression for the average minimum achievable rate is analytically intractable, we derive a tight upper bound and thereby formulate a stochastic optimization problem. This problem is then solved, capitalizing on an alternating approach combining geometric programming and gradient descent algorithms, to obtain the optimal policies. Our numerical results show significant improvements in the minimum achievable rate compared to the benchmark schemes. In particular, for the instantaneous CSI scenario, the individual impact of the optimal wave-based beamforming is significantly higher than that of the power allocation strategy. Moreover, the proposed upper bound is shown to be tight in the low signal-to-noise ratio regime under the statistical CSI.

Submitted 20 April, 2025; originally announced April 2025.

arXiv:2504.03681 [pdf]

End-to-End Deep Learning for Real-Time Neuroimaging-Based Assessment of Bimanual Motor Skills

Authors: Aseem Subedi, Rahul, Lora Cavuoto, Steven Schwaitzberg, Matthew Hackett, Jack Norfleet, Suvranu De

Abstract: The real-time assessment of complex motor skills presents a challenge in fields such as surgical training and rehabilitation. Recent advancements in neuroimaging, particularly functional near-infrared spectroscopy (fNIRS), have enabled objective assessment of such skills with high accuracy. However, these techniques are hindered by extensive preprocessing requirements to extract neural biomarkers. This study presents a novel end-to-end deep learning framework that processes raw fNIRS signals directly, eliminating the need for intermediate preprocessing steps. The model was evaluated on datasets from three distinct bimanual motor tasks--suturing, pattern cutting, and endotracheal intubation (ETI)--using performance metrics derived from both training and retention datasets. It achieved a mean classification accuracy of 93.9% (SD 4.4) and a generalization accuracy of 92.6% (SD 1.9) on unseen skill retention datasets, with a leave-one-subject-out cross-validation yielding an accuracy of 94.1% (SD 3.6). Contralateral prefrontal cortex activations exhibited task-specific discriminative power, while motor cortex activations consistently contributed to accurate classification. The model also demonstrated resilience to neurovascular coupling saturation caused by extended task sessions, maintaining robust performance across trials. Comparative analysis confirms that the end-to-end model performs on par with or surpasses baseline models optimized for fully processed fNIRS data, with statistically similar (p<0.05) or improved prediction accuracies. By eliminating the need for extensive signal preprocessing, this work provides a foundation for real-time, non-invasive assessment of bimanual motor skills in medical training environments, with potential applications in robotics, rehabilitation, and sports.

Submitted 21 March, 2025; originally announced April 2025.

arXiv:2503.22520 [pdf, other]

Multi-stage model predictive control for slug flow crystallizers using uncertainty-aware surrogate models

Authors: Collin R. Johnson, Stijn de Vries, Kerstin Wohlgemuth, Sergio Lucia

Abstract: This paper presents a novel dynamic model for slug flow crystallizers that addresses the challenges of spatial distribution without backmixing or diffusion, potentially enabling advanced model-based control. The developed model can accurately describe the main characteristics of slug flow crystallizers, including slug-to-slug variability, but leads to high computational complexity due to the consideration of partial differential equations and population balance equations. For that reason, the model cannot be directly used for process optimization and control. To solve this challenge, we propose two different approaches, conformalized quantile regression and Bayesian last layer neural networks, to develop surrogate models with uncertainty quantification capabilities. These surrogates output a prediction of the system states together with an uncertainty of these predictions to account for process variability and model uncertainty. We use the uncertainty of the predictions to formulate a robust model predictive control approach, enabling robust real-time advanced control of a slug flow crystallizer.
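Conformalized quantile regression, one of the two surrogate approaches named above, can be illustrated with a minimal sketch of the standard split-conformal calibration step. It assumes lower and upper quantile predictions are already available from some fitted model; the function names and the toy numbers in the usage note are hypothetical and are not taken from the paper.

```python
import math


def cqr_margin(y_cal, lo_cal, hi_cal, alpha):
    """Calibration margin for split conformalized quantile regression.

    y_cal  : observed targets on a held-out calibration set
    lo_cal : predicted lower quantiles for the same points
    hi_cal : predicted upper quantiles for the same points
    alpha  : target miscoverage level (e.g. 0.1 for 90% intervals)
    """
    # Conformity score: how far each target falls outside its predicted band
    # (negative when the target lies strictly inside the band).
    scores = sorted(max(lo - y, y - hi) for y, lo, hi in zip(y_cal, lo_cal, hi_cal))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return scores[k - 1]


def cqr_interval(lo, hi, margin):
    """Widen (or shrink, if margin is negative) a predicted band by the margin."""
    return lo - margin, hi + margin


# Toy calibration data, purely illustrative.
margin = cqr_margin([1.0, 2.0, 3.0, 4.0], [0.5, 2.5, 2.0, 3.0], [1.5, 3.0, 2.5, 5.0], alpha=0.5)
interval = cqr_interval(1.0, 2.0, margin)
```

In a robust control setting, intervals of this kind could serve as the uncertainty bounds around the surrogate's state predictions; how the paper actually couples them to the controller is not specified in the abstract.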

Submitted 28 March, 2025; originally announced March 2025.

arXiv:2503.07177 [pdf, other]

The 4D Human Embryonic Brain Atlas: spatiotemporal atlas generation for rapid anatomical changes using first-trimester ultrasound from the Rotterdam Periconceptional Cohort

Authors: Wietske A. P. Bastiaansen, Melek Rousian, Anton H. J. Koning, Wiro J. Niessen, Bernadette S. de Bakker, Régine P. M. Steegers-Theunissen, Stefan Klein

Abstract: Early brain development is crucial for lifelong neurodevelopmental health. However, current clinical practice offers limited knowledge of normal embryonic brain anatomy on ultrasound, despite the brain undergoing rapid changes within the time-span of days. To provide detailed insights into normal brain development and identify deviations, we created the 4D Human Embryonic Brain Atlas using a deep learning-based approach for groupwise registration and spatiotemporal atlas generation. Our method introduced a time-dependent initial atlas and penalized deviations from it, ensuring age-specific anatomy was maintained throughout rapid development. The atlas was generated and validated using 831 3D ultrasound images from 402 subjects in the Rotterdam Periconceptional Cohort, acquired between gestational weeks 8 and 12. We evaluated the effectiveness of our approach with an ablation study, which demonstrated that incorporating a time-dependent initial atlas and penalization produced anatomically accurate results. In contrast, omitting these adaptations led to an anatomically incorrect atlas. Visual comparisons with an existing ex-vivo embryo atlas further confirmed the anatomical accuracy of our atlas. In conclusion, the proposed method successfully captures the rapid anatomical development of the embryonic brain. The resulting 4D Human Embryonic Brain Atlas provides unique insights into this crucial early life period and holds the potential for improving the detection, prevention, and treatment of prenatal neurodevelopmental disorders.

Submitted 10 March, 2025; originally announced March 2025.

arXiv:2502.18280 [pdf, ps, other]

doi 10.3384/ecp212.060

A Deep-Unfolding Approach to RIS Phase Shift Optimization Via Transformer-Based Channel Prediction

Authors: Ishan Koralege, Arthur S. de Sena, Nurul H. Mahmood, Farjam Karim, Dimuthu Lesthuruge, Samitha Gunarathne

Abstract: Reconfigurable intelligent surfaces (RISs) have emerged as a promising solution that can provide dynamic control over the propagation of electromagnetic waves. The RIS technology is envisioned as a key enabler of sixth-generation networks by offering the ability to adaptively manipulate signal propagation through the smart configuration of its phase shift coefficients, thereby optimizing signal strength, coverage, and capacity. However, the realization of this technology's full potential hinges on the accurate acquisition of channel state information (CSI). In this paper, we propose an efficient CSI prediction framework for a RIS-assisted communication system based on the machine learning (ML) transformer architecture. Architectural modifications are introduced to the vanilla transformer for multivariate time series forecasting to achieve high prediction accuracy. The predicted channel coefficients are then used to optimize the RIS phase shifts. Simulation results present a comprehensive analysis of key performance metrics, including data rate and outage probability. Our results confirm the effectiveness of the proposed ML approach and demonstrate its superiority over other baseline ML-based CSI prediction schemes such as conventional deep neural networks and long short-term memory architectures, albeit at the cost of slightly increased complexity.

Submitted 25 February, 2025; originally announced February 2025.

Comments: Accepted for Scandinavian Simulation Society (SIMS) EUROSIM 2024

Journal ref: Linkoping Electronic Conference Proceedings 212 (2024) 441-447

arXiv:2501.19109 [pdf, ps, other]

Reliability Modeling for Beyond-5G Mission Critical Networks Using Effective Capacity

Authors: Anudeep Karnam, Jobish John, Kishor C. Joshi, George Exarchakos, Sonia Heemstra de Groot, Ignas Niemegeers

Abstract: Accurate reliability modeling for ultra-reliable low latency communication (URLLC) and hyper-reliable low latency communication (HRLLC) networks is challenging due to the complex interactions between network layers required to meet stringent requirements. In this paper, we propose such a model. We consider the acknowledged mode of the radio link control (RLC) layer, utilizing separate buffers for transmissions and retransmissions, along with the behavior of physical channels. Our approach leverages the effective capacity (EC) framework, which quantifies the maximum constant arrival rate a time-varying wireless channel can support while meeting statistical quality of service (QoS) constraints. We derive a reliability model that incorporates delay violations, various latency components, and multiple transmission attempts. Our method identifies optimal operating conditions that satisfy URLLC/HRLLC constraints while maintaining near-optimal EC, ensuring the system can handle peak traffic with a guaranteed QoS. Our model reveals critical trade-offs between EC and reliability across various use cases, providing guidance for URLLC/HRLLC network design for service providers and system designers.
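For readers unfamiliar with the effective capacity framework, the standard textbook definition can be written as follows. The symbols are assumptions introduced here for illustration and do not come from the abstract: $\theta > 0$ is the QoS exponent, $S(t)$ the cumulative service offered by the channel up to time $t$, $R$ the per-block service rate, $T$ the block duration, $D$ the queueing delay, $D_{\max}$ the delay bound, and $\eta$ the probability of a non-empty buffer. The paper's own model adds RLC retransmissions and further latency components that this sketch does not capture.

```latex
% General definition of effective capacity for QoS exponent \theta
E_C(\theta) = -\lim_{t \to \infty} \frac{1}{\theta t}
              \ln \mathbb{E}\!\left[ e^{-\theta S(t)} \right]

% Special case: block fading with i.i.d. service R per block of duration T
E_C(\theta) = -\frac{1}{\theta T} \ln \mathbb{E}\!\left[ e^{-\theta T R} \right]

% Commonly used approximation of the delay-violation probability
\Pr\{ D > D_{\max} \} \approx \eta \, e^{-\theta \, E_C(\theta) \, D_{\max}}
```

A larger $\theta$ corresponds to a stricter delay requirement and a smaller supported arrival rate, which is the kind of trade-off between EC and reliability the abstract refers to.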

Submitted 31 January, 2025; originally announced January 2025.

Comments: 7 pages, This paper has been accepted for publication in the 2025 IEEE Wireless Communications and Networking Conference (WCNC)

arXiv:2501.19019 [pdf, ps, other]

Uplink Rate Splitting Multiple Access with Imperfect Channel State Information and Interference Cancellation

Authors: Farjam Karim, Nurul Huda Mahmood, Arthur S. de Sena, Deepak Kumar, Bruno Clerckx, Matti Latva-aho

Abstract: This article investigates the performance of uplink rate splitting multiple access (RSMA) in a two-user scenario, addressing an under-explored domain compared to its downlink counterpart. With the increasing demand for uplink communication in applications like the Internet-of-Things, it is essential to account for practical imperfections, such as inaccuracies in channel state information at the receiver (CSIR) and limitations in successive interference cancellation (SIC), to provide realistic assessments of system performance. Specifically, we derive closed-form expressions for the outage probability, throughput, and asymptotic outage behavior of uplink users, considering imperfect CSIR and SIC. We validate the accuracy of these derived expressions using Monte Carlo simulations. Our findings reveal that at low transmit power levels, imperfect CSIR degrades system performance significantly more than SIC imperfections. However, as the transmit power increases, the impact of imperfect CSIR diminishes, while the influence of SIC imperfections becomes more pronounced. Moreover, we highlight the impact of the rate allocation factor on user performance. Finally, our comparison with non-orthogonal multiple access (NOMA) highlights the outage performance trade-offs between RSMA and NOMA. RSMA proves to be more effective in managing imperfect CSIR and enhances performance through strategic message splitting, resulting in more robust communication.

Submitted 31 January, 2025; originally announced January 2025.

arXiv:2501.10727 [pdf, other]

doi 10.1145/3715275.3732035

In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review

Authors: Amelia Jiménez-Sánchez, Natalia-Rozalia Avlona, Sarah de Boer, Víctor M. Campello, Aasa Feragen, Enzo Ferrante, Melanie Ganz, Judy Wawira Gichoya, Camila González, Steff Groefsema, Alessa Hering, Adam Hulman, Leo Joskowicz, Dovile Juodelyte, Melih Kandemir, Thijs Kooi, Jorge del Pozo Lérida, Livie Yumeng Li, Andre Pacheco, Tim Rädsch, Mauricio Reyes, Théo Sourget, Bram van Ginneken, David Wen, Nina Weng, et al. (4 additional authors not shown)

Abstract: Datasets play a critical role in medical imaging research, yet issues such as label quality, shortcuts, and metadata are often overlooked. This lack of attention may harm the generalizability of algorithms and, consequently, negatively impact patient outcomes. While existing medical imaging literature reviews mostly focus on machine learning (ML) methods, with only a few focusing on datasets for specific applications, these reviews remain static -- they are published once and not updated thereafter. This fails to account for emerging evidence, such as biases, shortcuts, and additional annotations that other researchers may contribute after the dataset is published. We refer to these newly discovered findings of datasets as research artifacts. To address this gap, we propose a living review that continuously tracks public datasets and their associated research artifacts across multiple medical imaging applications. Our approach includes a framework for the living review to monitor data documentation artifacts, and an SQL database to visualize the citation relationships between research artifacts and datasets. Lastly, we discuss key considerations for creating medical imaging datasets, review best practices for data annotation, discuss the significance of shortcuts and demographic diversity, and emphasize the importance of managing datasets throughout their entire lifecycle. Our demo is publicly available at http://inthepicture.itu.dk/.

Submitted 2 June, 2025; v1 submitted 18 January, 2025; originally announced January 2025.

Comments: ACM Conference on Fairness, Accountability, and Transparency - FAccT 2025

arXiv:2501.08464 [pdf, other]

Time series forecasting for multidimensional telemetry data using GAN and BiLSTM in a Digital Twin

Authors: Joao Carmo de Almeida Neto, Claudio Miceli de Farias, Leandro Santiago de Araujo, Leopoldo Andre Dutra Lusquino Filho

Abstract: Research related to digital twins has been increasing in recent years. Besides mirroring the physical world in the digital one, there is a need to provide services related to the data collected and transferred to the virtual world. One of these services is forecasting the future behavior of the physical part, which could lead to applications such as preventing harmful events or designing improvements to get better performance. One strategy for predicting system operation is the use of time series models like ARIMA or LSTM, and improvements have been implemented using these algorithms. Recently, deep learning techniques based on generative models such as Generative Adversarial Networks (GANs) have been proposed to create time series, and the use of LSTM has gained more relevance in time series forecasting, but both have limitations that restrict the forecasting results. Another issue found in the literature is the challenge of handling multivariate environments/applications in time series generation. Therefore, new methods need to be studied in order to fill these gaps and, consequently, provide better resources for creating useful digital twins. This proposal studies the integration of a BiLSTM layer with a time series obtained by a GAN in order to improve the forecasting accuracy of all the features provided by the dataset and, consequently, improve behaviour prediction.

Submitted 14 January, 2025; originally announced January 2025.

arXiv:2501.05586 [pdf, other]

doi 10.1109/ICASSP49660.2025.10890068

FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion

Authors: Alef Iury Siqueira Ferreira, Lucas Rafael Gris, Augusto Seben da Rosa, Frederico Santos de Oliveira, Edresson Casanova, Rafael Teixeira Sousa, Arnaldo Candido Junior, Anderson da Silva Soares, Arlindo Galvão Filho

Abstract: This work presents FreeSVC, a promising multilingual singing voice conversion approach that leverages an enhanced VITS model with Speaker-invariant Clustering (SPIN) for better content representation and the State-of-the-Art (SOTA) speaker encoder ECAPA2. FreeSVC incorporates trainable language embeddings to handle multiple languages and employs an advanced speaker encoder to disentangle speaker characteristics from linguistic content. Designed for zero-shot learning, FreeSVC enables cross-lingual singing voice conversion without extensive language-specific training. We demonstrate that a multilingual content extractor is crucial for optimal cross-language conversion. Our source code and models are publicly available.

Submitted 9 January, 2025; originally announced January 2025.

arXiv:2412.15756 [pdf, other]

Probabilistic Latent Variable Modeling for Dynamic Friction Identification and Estimation

Authors: Victor Vantilborgh, Sander De Witte, Frederik Ostyn, Tom Lefebvre, Guillaume Crevecoeur

Abstract: Precise identification of dynamic models in robotics is essential to support control design, friction compensation, output torque estimation, etc. A longstanding challenge remains in the identification of friction models for robotic joints, given the numerous physical phenomena affecting the underlying friction dynamics, which result in nonlinear characteristics and hysteresis behaviour in particular. These phenomena prove difficult to model and capture accurately using physical analogies alone. This has motivated researchers to shift from physics-based to data-driven models. Currently, these methods are still limited in their ability to generalize effectively to typical industrial robot deployment, characterized by high- and low-velocity operations and frequent direction reversals. Empirical observations motivate the use of dynamic friction models, but these remain particularly challenging to establish. To address the current limitations, we propose to account for unidentified dynamics in the robot joints using latent dynamic states. The friction model may then utilize both the dynamic robot state and additional information encoded in the latent state to evaluate the friction torque. We cast this stochastic and partially unsupervised identification problem as a standard probabilistic representation learning problem. In this work both the friction model and latent state dynamics are parametrized as neural networks and integrated in the conventional lumped parameter dynamic robot model. The complete dynamics model is directly learned from the noisy encoder measurements in the robot joints. We use the Expectation-Maximisation (EM) algorithm to find a Maximum Likelihood Estimate (MLE) of the model parameters. The effectiveness of the proposed method is validated in terms of open-loop prediction accuracy in comparison with baseline methods, using the Kuka KR6 R700 as a test platform.

Submitted 20 December, 2024; originally announced December 2024.

arXiv:2412.08684 [pdf, other]

Coherent3D: Coherent 3D Portrait Video Reconstruction via Triplane Fusion

Authors: Shengze Wang, Xueting Li, Chao Liu, Matthew Chan, Michael Stengel, Henry Fuchs, Shalini De Mello, Koki Nagano

Abstract: Recent breakthroughs in single-image 3D portrait reconstruction have enabled telepresence systems to stream 3D portrait videos from a single camera in real-time, democratizing telepresence. However, per-frame 3D reconstruction exhibits temporal inconsistency and forgets the user's appearance. On the other hand, self-reenactment methods can render coherent 3D portraits by driving a 3D avatar built from a single reference image, but fail to faithfully preserve the user's per-frame appearance (e.g., instantaneous facial expression and lighting). As a result, neither of these two frameworks is an ideal solution for democratized 3D telepresence. In this work, we address this dilemma and propose a novel solution that maintains both coherent identity and dynamic per-frame appearance to enable the best possible realism. To this end, we propose a new fusion-based method that takes the best of both worlds by fusing a canonical 3D prior from a reference view with dynamic appearance from per-frame input views, producing temporally stable 3D videos with faithful reconstruction of the user's per-frame appearance. Trained only using synthetic data produced by an expression-conditioned 3D GAN, our encoder-based method achieves both state-of-the-art 3D reconstruction and temporal consistency on in-studio and in-the-wild datasets. https://research.nvidia.com/labs/amri/projects/coherent3d

Submitted 11 December, 2024; originally announced December 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2405.00794

arXiv:2411.17725 [pdf, ps, other]

Efficient Channel Prediction for Beyond Diagonal RIS-Assisted MIMO Systems with Channel Aging

Authors: Nipuni Ginige, Arthur Sousa de Sena, Nurul Huda Mahmood, Nandana Rajatheva, Matti Latva-aho

Abstract: Novel reconfigurable intelligent surface (RIS) architectures, known as beyond diagonal RISs (BD-RISs), have been proposed to enhance reflection efficiency and expand RIS capabilities. However, their passive nature, non-diagonal reflection matrix, and the large number of coupled reflecting elements complicate the channel state information (CSI) estimation process. The challenge further escalates in scenarios with fast-varying channels. In this paper, we address this challenge by proposing novel joint channel estimation and prediction strategies with low overhead and high accuracy for two different RIS architectures in a BD-RIS-assisted multiple-input multiple-output system under correlated fast-fading environments with channel aging. The channel estimation procedure utilizes the Tucker2 decomposition with bilinear alternating least squares, which is exploited to decompose the cascade channels of the BD-RIS-assisted system into effective channels of reduced dimension. The channel prediction framework is based on a convolutional neural network combined with an autoregressive predictor. The estimated/predicted CSI is then utilized to optimize the RIS phase shifts aiming at the maximization of the downlink sum rate. Insightful simulation results demonstrate that our proposed approach is robust to channel aging, and exhibits a high estimation accuracy. Moreover, our scheme can deliver a high average downlink sum rate, outperforming other state-of-the-art channel estimation methods. The results also reveal a remarkable reduction in pilot overhead of up to 98% compared to baseline schemes, all while imposing low computational complexity.

Submitted 21 November, 2024; originally announced November 2024.

Comments: arXiv admin note: text overlap with arXiv:2406.07387

arXiv:2410.19199 [pdf, other]

Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis

Authors: Suparna De, Ionut Bostan, Nishanth Sastry

Abstract: Recent studies have outlined the accessibility challenges faced by blind or visually impaired, and less-literate people, in interacting with social networks, in spite of facilitating technologies such as monotone text-to-speech (TTS) screen readers and audio narration of visual elements such as emojis. Emotional speech generation traditionally relies on human input of the expected emotion together with the text to synthesise, with additional challenges around data simplification (causing information loss) and duration inaccuracy, leading to lack of expressive emotional rendering. In real-life communications, the duration of phonemes can vary since the same sentence might be spoken in a variety of ways depending on the speakers' emotional states or accents (referred to as the one-to-many problem of text to speech generation). As a result, an advanced voice synthesis system is required to account for this unpredictability. We propose an end-to-end context-aware Text-to-Speech (TTS) synthesis system that derives the conveyed emotion from text input and synthesises audio that focuses on emotions and speaker features for natural and expressive speech, integrating advanced natural language processing (NLP) and speech synthesis techniques for real-time applications. Our system also showcases competitive inference time performance when benchmarked against the state-of-the-art TTS models, making it suitable for real-time accessibility applications.

Submitted 24 October, 2024; originally announced October 2024.

Journal ref: 16th International Conference on Advances in Social Networks Analysis and Mining -ASONAM-2024

arXiv:2410.07436 [pdf, other]

Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap

Authors: Georgia Channing, Juil Sock, Ronald Clark, Philip Torr, Christian Schroeder de Witt

Abstract: The rapid proliferation of AI-manipulated or generated audio deepfakes poses serious challenges to media integrity and election security. Current AI-driven detection solutions lack explainability and underperform in real-world settings. In this paper, we introduce novel explainability methods for state-of-the-art transformer-based audio deepfake detectors and open-source a novel benchmark for real-world generalizability. By narrowing the explainability gap between transformer-based audio deepfake detectors and traditional methods, our results not only build trust with human experts, but also pave the way for unlocking the potential of citizen intelligence to overcome the scalability issue in audio deepfake detection.

Submitted 9 October, 2024; originally announced October 2024.

arXiv:2410.06880 [pdf, ps, other]

Cooperative UAV-Relay based Satellite Aerial Ground Integrated Networks

Authors: Bhola, Yu-Jia Chen, Ashutosh Balakrishnan, Swades De, Li-Chun Wang

Abstract: In the post-fifth generation (5G) era, escalating user quality of service (QoS) strains terrestrial network capacity, especially in urban areas with dynamic traffic distributions. This paper introduces a novel cooperative unmanned aerial vehicle relay-based deployment (CUD) framework in satellite air-ground integrated networks (SAGIN). The CUD strategy deploys an unmanned aerial vehicle-based relay (UAVr) in an amplify-and-forward (AF) mode to enhance user QoS when terrestrial base stations fall short of network capacity. By combining low earth orbit (LEO) satellite and UAVr signals using cooperative diversity, the CUD framework enhances the signal-to-noise ratio (SNR) at the user. Comparative evaluations against existing frameworks reveal performance improvements, demonstrating the effectiveness of the CUD framework in addressing the evolving demands of next-generation networks.

Submitted 9 October, 2024; originally announced October 2024.

Comments: 5 pages, 3 figures, to appear in IEEE 100th Vehicular Technology Conference (VTC2024-Fall)

arXiv:2409.10131 [pdf, other]

Room impulse response prototyping using receiver distance estimations for high quality room equalisation algorithms

Authors: James Brooks-Park, Martin Bo Møller, Jan Østergaard, Søren Bech, Steven van de Par

Abstract: Room equalisation aims to increase the quality of loudspeaker reproduction in reverberant environments, compensating for colouration caused by imperfect room reflections and frequency-dependent loudspeaker directivity. A common technique in the field of room equalisation is to invert a prototype Room Impulse Response (RIR). Rather than inverting a single RIR at the listening position, a prototype response is composed of several responses distributed around the listening area. This paper proposes a method of impulse response prototyping, using estimated receiver positions, to form a weighted average prototype response. A method of receiver distance estimation is described, supporting the implementation of the prototype RIR. The proposed prototyping method is compared to other methods by measuring their post-equalisation spectral deviation at several positions in a simulated room.

Submitted 16 September, 2024; originally announced September 2024.

arXiv:2409.03318 [pdf, other]

Pick the Largest Margin for Robust Detection of Splicing

Authors: Julien Simon de Kergunic, Rony Abecidan, Patrick Bas, Vincent Itier

Abstract: Despite advancements in splicing detection, practitioners still struggle to fully leverage forensic tools from the literature due to a critical issue: deep learning-based detectors are extremely sensitive to their trained instances. Simple post-processing applied to evaluation images can easily decrease their performance, leading to a lack of confidence in splicing detectors for operational contexts. In this study, we show that a deep splicing detector behaves differently against unknown post-processes for different learned weights, even if it achieves similar performance on a test set from the same distribution as its training one. We connect this observation to the fact that different learnings create different latent spaces separating training samples differently. Our experiments reveal a strong correlation between the distributions of latent margins and the ability of the detector to generalize to post-processed images. We thus provide the practitioner with a way to build deep detectors that are more robust than others against post-processing operations: train the architecture under different conditions and pick the one maximizing the latent space margin.

Submitted 6 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

arXiv:2408.15867 [pdf, ps, other]

Practical Challenges for Reliable RIS Deployment in Heterogeneous Multi-Operator Multi-Band Networks

Authors: Mehdi Monemi, Mehdi Rasti, Arthur S. de Sena, Mohammad Amir Fallah, Matti Latva-Aho, Marco Di Renzo

Abstract: Reconfigurable intelligent surfaces (RISs) have been introduced as arrays of nearly passive elements with software-tunable electromagnetic properties to dynamically manipulate the reflection/transmission of radio signals. Research works in this area are focused on two applications, namely user-assist RIS aiming at tuning the RIS to enhance the quality-of-service (QoS) of target users, and the malicious RIS aiming for an attacker to degrade the QoS at victim receivers through generating intended destructive interference. While both user-assist and malicious RIS applications have been explored extensively, the impact of RIS deployments on imposing unintended interference on various wireless user equipments (UEs) remains underexplored. This paper investigates the challenges of integrating RISs into multi-carrier, multi-user, and multi-operator networks. We discuss how RIS deployments intended to benefit specific users can negatively impact other users served at various carrier frequencies through different network operators. While not an ideal solution, we discuss how ultra-narrowband metasurfaces can be incorporated into the manufacturing of RISs to mitigate some challenges of RIS deployment in wireless networks. We also present a simulation scenario to illuminate some practical challenges associated with the deployment of RISs in shared public environments.

Submitted 29 June, 2025; v1 submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.14198 [pdf]

Combined assessment of auditory distance perception and externalization

Authors: Henning Hoppe, Steven van de Par, Virginia Flanagin, Stephan D. Ewert

Abstract: This study investigates frontal auditory distance perception (ADP) and externalization in virtual audio-visual environments, considering effects of headphone rendering method, room size, reverberation, and visual representation of the room. Either head-related impulse responses from an artificial head or a spherical head model were used for diotic (monophonic) and binaural auralizations with and without real-time head tracking. The visuals were presented through a head-mounted display. Two differently sized rooms as well as an infinitely extending space (echoic and anechoic) were used in which an invisible frontal virtual sound source was located. Additionally, the effect of a freely movable loudspeaker for visually indicating perceived distances was investigated. Both ADP and externalization were significantly affected by room size, but otherwise the two perceptual quantities differed in their outcomes. Room visibility significantly affected ADP, leading to considerable overestimations and more variability in the absence of a visual environment, although externalization was not affected. The movable loudspeaker improved distance estimation significantly, however, did not affect externalization. For reverberation, a (non-significant) trend of improved ADP was observed, however, externalization was significantly improved. Different headphone renderings did not significantly affect ADP or externalization, although a clear trend was observed for externalization.

Submitted 26 August, 2024; originally announced August 2024.

Comments: This work has been submitted to The Journal of the Acoustical Society of America for possible publication

arXiv:2408.13904 [pdf]

The effect of self-motion and room familiarity on sound source localization in virtual environments

Authors: Niklas Isserstedt, Stephan D. Ewert, Virginia Flanagin, Steven van de Par

Abstract: This paper investigates the influence of lateral horizontal self-motion of participants during signal presentation on distance and azimuth perception for frontal sound sources in a rectangular room. Additionally, the effect of deviating room acoustics for a single sound presentation embedded in a sequence of presentations using a baseline room acoustics for familiarization is analyzed. For this purpose, two experiments were conducted using audiovisual virtual reality technology with dynamic head-tracking and real-time auralization over headphones combined with visual rendering of the room using a head-mounted display. Results show an improved distance perception accuracy when participants moved laterally during signal presentation instead of staying at a fixed position, with only head movements allowed. Adaptation to the room acoustics also improves distance perception accuracy. Azimuth perception seems to be independent of lateral movements during signal presentation and could even be negatively influenced by the familiarity of the used room acoustics.

Submitted 25 August, 2024; originally announced August 2024.

arXiv:2408.13035 [pdf, other]

Malicious RIS Meets RSMA: Unveiling the Robustness of Rate Splitting to RIS-Induced Attacks

Authors: A. S. de Sena, A. Gomes, J. Kibiłda, N. H. Mahmood, L. A. DaSilva, M. Latva-aho

Abstract: While the robustness of rate-splitting multiple access (RSMA) to imperfect channel state information (CSI) is well-documented, its susceptibility to attacks launched with malicious reconfigurable intelligent surfaces (RISs) remains unexplored. This paper fills this gap by investigating three potential RIS-induced attacks against RSMA in a multi-user multiple-input multiple-output (MIMO) network: random interference, aligned interference, and mitigation attack. The random interference attack employs random RIS coefficients to disrupt RSMA. The other two attacks are triggered by optimizing the RIS through weighted-sum strategies based on the projected gradient method. Simulation results reveal significant degradation caused by all the attacks under perfect CSI conditions. Remarkably, when imperfect CSI is considered, RSMA, owing to its flexible power allocation strategy designed to counter CSI-related interference, can be robust to the attacks even when the base station is blind to them. It is also shown that RSMA can significantly outperform conventional space-division multiple access (SDMA).

Submitted 23 August, 2024; originally announced August 2024.

Comments: Accepted in IEEE Global Communications Conference (GLOBECOM), Dec. 2024, Cape Town, South Africa

arXiv:2408.00437 [pdf, ps, other]

Efficient Patient Fine-Tuned Seizure Detection with a Tensor Kernel Machine

Authors: Seline J. S. de Rooij, Frederiek Wesel, Borbála Hunyadi

Abstract: Recent developments in wearable devices have made accurate and efficient seizure detection more important than ever. A challenge in seizure detection is that patient-specific models typically outperform patient-independent models. However, in a wearable device one typically starts with a patient-independent model, until patient-specific data is available. To avoid having to construct a new classifier with this data, as required in conventional kernel machines, we propose a transfer learning approach with a tensor kernel machine. This method learns the primal weights in a compressed form using the canonical polyadic decomposition, making it possible to efficiently update the weights of the patient-independent model with patient-specific data. The results show that this patient fine-tuned model reaches as high a performance as a patient-specific SVM model with a model size that is twice as small as the patient-specific model and ten times as small as the patient-independent model.

Submitted 1 August, 2024; originally announced August 2024.

Comments: 5 pages, to be published in the EUSIPCO2024 conference proceedings

arXiv:2407.15321 [pdf, other]

doi 10.1080/01431161.2024.2384098

Hierarchical Homogeneity-Based Superpixel Segmentation: Application to Hyperspectral Image Analysis

Authors: Luciano Carvalho Ayres, Sérgio José Melo de Almeida, José Carlos Moreira Bermudez, Ricardo Augusto Borsoi

Abstract: Hyperspectral image (HI) analysis approaches have recently become increasingly complex and sophisticated. Recently, the combination of spectral-spatial information and superpixel techniques has addressed some hyperspectral data issues, such as the higher spatial variability of spectral signatures and dimensionality of the data. However, most existing superpixel approaches do not account for specific HI characteristics resulting from its high spectral dimension. In this work, we propose a multiscale superpixel method that is computationally efficient for processing hyperspectral data. The Simple Linear Iterative Clustering (SLIC) oversegmentation algorithm, on which the technique is based, has been extended hierarchically. Using a novel robust homogeneity testing, the proposed hierarchical approach leads to superpixels of variable sizes but with higher spectral homogeneity when compared to the classical SLIC segmentation. For validation, the proposed homogeneity-based hierarchical method was applied as a preprocessing step in the spectral unmixing and classification tasks carried out using, respectively, the Multiscale sparse Unmixing Algorithm (MUA) and the CNN-Enhanced Graph Convolutional Network (CEGCN) methods. Simulation results with both synthetic and real data show that the technique is competitive with state-of-the-art solutions.

Submitted 21 July, 2024; originally announced July 2024.

arXiv:2407.02636 [pdf, other]

MmWave for Extended Reality: Open User Mobility Dataset, Characterisation, and Impact on Link Quality

Authors: Alexander Marinsek, Sam De Kunst, Gilles Callebaut, Lieven De Strycker, Liesbet Van der Perre

Abstract: User mobility in extended reality (XR) can have a major impact on millimeter-wave (mmWave) links and may require dedicated mitigation strategies to ensure reliable connections and avoid outage. The available prior art has predominantly focused on XR applications with constrained user mobility and limited impact on mmWave channels. We have performed dedicated experiments to extend the characterisation of relevant future XR use cases featuring a high degree of user mobility. To this end, we have carried out a tailor-made measurement campaign and conducted a characterisation of the collected tracking data, including the approximation of the data using statistical distributions. Moreover, we have provided an interpretation of the possible impact of the recorded mobility on mmWave technology. The dataset is made publicly accessible to provide a testing ground for wireless system design and to enable further XR mobility modelling.

Submitted 14 April, 2025; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: In the process of being published in the IEEE Communications Magazine, special issue FT2304 / eXtended Reality

arXiv:2407.00463 [pdf, other]

Open-Source Conversational AI with SpeechBrain 1.0

Authors: Mirco Ravanelli, Titouan Parcollet, Adel Moumen, Sylvain de Langen, Cem Subakan, Peter Plantinga, Yingzhi Wang, Pooneh Mousavi, Luca Della Libera, Artem Ploujnikov, Francesco Paissan, Davide Borra, Salah Zaiem, Zeyu Zhao, Shucong Zhang, Georgios Karakasidis, Sung-Lin Yeh, Pierre Champion, Aku Rouhe, Rudolf Braun, Florian Mai, Juan Zuluaga-Gomez, Seyed Mahed Mousavi, Andreas Nautsch, Ha Nguyen, et al. (8 additional authors not shown)

Abstract: SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.

Submitted 16 October, 2024; v1 submitted 29 June, 2024; originally announced July 2024.

Comments: Accepted to the Journal of Machine Learning Research (JMLR), Machine Learning Open Source Software

arXiv:2406.07387 [pdf, ps, other]

Machine Learning-Based Channel Prediction for RIS-assisted MIMO Systems With Channel Aging

Authors: Nipuni Ginige, Arthur Sousa de Sena, Nurul Huda Mahmood, Nandana Rajatheva, Matti Latva-aho

Abstract: Reconfigurable intelligent surfaces (RISs) have emerged as a promising technology to enhance the performance of sixth-generation (6G) and beyond communication systems. The passive nature of RISs and their large number of reflecting elements pose challenges to the channel estimation process. The associated complexity further escalates when the channel coefficients are fast-varying as in scenarios with user mobility. In this paper, we propose an extended channel estimation framework for RIS-assisted multiple-input multiple-output (MIMO) systems based on a convolutional neural network (CNN) integrated with an autoregressive (AR) predictor. The implemented framework is designed for identifying the aging pattern and predicting enhanced estimates of the wireless channels in correlated fast-fading environments. Insightful simulation results demonstrate that our proposed CNN-AR approach is robust to channel aging, exhibiting a high-precision estimation accuracy. The results also show that our approach can achieve high spectral efficiency and low pilot overhead compared to traditional methods.

Submitted 9 May, 2024; originally announced June 2024.

arXiv:2406.06208 [pdf, other]

Quantifying the effect of speech pathology on automatic and human speaker verification

Authors: Bence Mark Halpern, Thomas Tienkamp, Wen-Chin Huang, Lester Phillip Violeta, Teja Rebernik, Sebastiaan de Visscher, Max Witjes, Martijn Wieling, Defne Abur, Tomoki Toda

Abstract: This study investigates how surgical intervention for speech pathology (specifically, as a result of oral cancer surgery) impacts the performance of an automatic speaker verification (ASV) system. Using two recently collected Dutch datasets with parallel pre- and post-surgery audio from the same speaker, NKI-OC-VC and SPOKE, we assess the extent to which speech pathology influences ASV performance, and whether objective/subjective measures of speech severity are correlated with the performance. Finally, we carry out a perceptual study to compare judgements of ASV and human listeners. Our findings reveal that pathological speech negatively affects ASV performance, and the severity of the speech is negatively correlated with the performance. There is a moderate agreement in perceptual and objective scores of speaker similarity and severity; however, we could not clearly establish in the perceptual study whether the same phenomenon also exists in human perception.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 5 pages, 2 figures, 2 tables. Accepted to Interspeech 2024

ACM Class: I.2.7

arXiv:2405.10004 [pdf, other]

doi 10.1038/s41597-024-03496-6

ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset

Authors: Johannes Rückert, Louise Bloch, Raphael Brüngel, Ahmad Idrissi-Yaghir, Henning Schäfer, Cynthia S. Schmidt, Sven Koitka, Obioma Pelka, Asma Ben Abacha, Alba G. Seco de Herrera, Henning Müller, Peter A. Horn, Felix Nensa, Christoph M. Friedrich

Abstract: Automated medical image analysis systems often require large amounts of training data with high quality labels, which are difficult and time consuming to generate. This paper introduces Radiology Object in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated version of the ROCO dataset published in 2018, and adds 35,705 new images added to PMC since 2018. It further provides manually curated concepts for imaging modalities with additional anatomical and directional concepts for X-rays. The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023. The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using Unified Medical Language System (UMLS) concepts provided with each image. In addition, it can serve for pre-training of medical domain models, and evaluation of deep learning models for multi-task learning.

Submitted 18 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

Comments: Accepted for Scientific Data

arXiv:2404.09666 [pdf, other]

Deformable MRI Sequence Registration for AI-based Prostate Cancer Diagnosis

Authors: Alessa Hering, Sarah de Boer, Anindo Saha, Jasper J. Twilt, Mattias P. Heinrich, Derya Yakar, Maarten de Rooij, Henkjan Huisman, Joeran S. Bosma

Abstract: The PI-CAI (Prostate Imaging: Cancer AI) challenge led to expert-level diagnostic algorithms for clinically significant prostate cancer detection. The algorithms receive biparametric MRI scans as input, which consist of T2-weighted and diffusion-weighted scans. These scans can be misaligned due to multiple factors in the scanning process. Image registration can alleviate this issue by predicting the deformation between the sequences. We investigate the effect of image registration on the diagnostic performance of AI-based prostate cancer diagnosis. First, the image registration algorithm, developed in MeVisLab, is analyzed using a dataset with paired lesion annotations. Second, the effect on diagnosis is evaluated by comparing case-level cancer diagnosis performance between using the original dataset, rigidly aligned diffusion-weighted scans, or deformably aligned diffusion-weighted scans. Rigid registration showed no improvement. Deformable registration demonstrated a substantial improvement in lesion overlap (+10% median Dice score) and a positive yet non-significant improvement in diagnostic performance (+0.3% AUROC, p=0.18). Our investigation shows that a substantial improvement in lesion alignment does not directly lead to a significant improvement in diagnostic performance. Qualitative analysis indicated that jointly developing image registration methods and diagnostic AI algorithms could enhance diagnostic accuracy and patient outcomes.

Submitted 28 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

arXiv:2401.13161 [pdf, other]

doi 10.1109/LGRS.2024.3358694

A Generalized Multiscale Bundle-Based Hyperspectral Sparse Unmixing Algorithm

Authors: Luciano Carvalho Ayres, Ricardo Augusto Borsoi, José Carlos Moreira Bermudez, Sérgio José Melo de Almeida

Abstract: In hyperspectral sparse unmixing, a successful approach employs spectral bundles to address the variability of the endmembers in the spatial domain. However, the regularization penalties usually employed aggregate substantial computational complexity, and the solutions are very noise-sensitive. We generalize a multiscale spatial regularization approach to solve the unmixing problem by incorporating group sparsity-inducing mixed norms. Then, we propose a noise-robust method that can take advantage of the bundle structure to deal with endmember variability while ensuring inter- and intra-class sparsity in abundance estimation with reasonable computational cost. We also present a general heuristic to select the most representative abundance estimation over multiple runs of the unmixing process, yielding a solution that is robust and highly reproducible. Experiments illustrate the robustness and consistency of the results when compared to related methods.

Submitted 23 January, 2024; originally announced January 2024.

arXiv:2401.07149 [pdf, other]

Malicious RIS versus Massive MIMO: Securing Multiple Access against RIS-based Jamming Attacks

Authors: Arthur Sousa de Sena, Jacek Kibilda, Nurul Huda Mahmood, André Gomes, Matti Latva-aho

Abstract: In this letter, we study an attack that leverages a reconfigurable intelligent surface (RIS) to induce harmful interference toward multiple users in massive multiple-input multiple-output (mMIMO) systems during the data transmission phase. We propose an efficient and flexible weighted-sum projected gradient-based algorithm for the attacker to optimize the RIS reflection coefficients without knowing legitimate user channels. To counter such a threat, we propose two reception strategies. Simulation results demonstrate that our malicious algorithm outperforms baseline strategies while offering adaptability for targeting specific users. At the same time, our results show that our mitigation strategies are effective even if only an imperfect estimate of the cascade RIS channel is available.

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2401.06475 [pdf, other]

Beyond Diagonal RIS for Multi-Band Multi-Cell MIMO Networks: A Practical Frequency-Dependent Model and Performance Analysis

Authors: Arthur S. de Sena, Mehdi Rasti, Nurul H. Mahmood, Matti Latva-aho

Abstract: This paper delves into the unexplored frequency-dependent characteristics of beyond diagonal reconfigurable intelligent surfaces (BD-RISs). A generalized practical frequency-dependent reflection model is proposed as a fundamental framework for configuring fully-connected and group-connected RISs in a multi-band multi-base station (BS) multiple-input multiple-output (MIMO) network. Leveraging this practical model, multi-objective optimization strategies are formulated to maximize the received power at multiple users connected to different BSs, each operating under a distinct carrier frequency. By relying on matrix theory and exploiting the symmetric structure of the reflection matrices inherent to BD-RISs, relaxed tractable versions of the challenging problems are achieved for scenarios with obstructed and unobstructed direct channel links. The relaxed solutions are then combined with codebook-based approaches to configure the practical capacitance values for the BD-RISs. Simulation results reveal the frequency-dependent behaviors of different RIS architectures and demonstrate the effectiveness of the proposed schemes. Notably, BD-RISs exhibit high reflection performance across the intended frequency range, remarkably outperforming conventional single-connected RISs. Moreover, the proposed optimization approaches prove effective in enabling the targeted operation of BD-RISs across one or more carrier frequencies. The results also shed light on the potential for harmful interference in the absence of synchronization between RISs and adjacent BSs.

Submitted 24 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

arXiv:2312.05704 [pdf, other]

doi 10.1109/COMST.2024.3417336

On the Ground and in the Sky: A Tutorial on Radio Localization in Ground-Air-Space Networks

Authors: Hazem Sallouha, Sharief Saleh, Sibren De Bast, Zhuangzhuang Cui, Sofie Pollin, Henk Wymeersch

Abstract: The inherent limitations in scaling up ground infrastructure for future wireless networks, combined with decreasing operational costs of aerial and space networks, are driving considerable research interest in multisegment ground-air-space (GAS) networks. In GAS networks, where ground and aerial users share network resources, ubiquitous and accurate user localization becomes indispensable, not only as an end-user service but also as an enabler for location-aware communications. This breaks the convention of having localization as a byproduct in networks primarily designed for communications. To address these imperative localization needs, the design and utilization of ground, aerial, and space anchors require thorough investigation. In this tutorial, we provide an in-depth systemic analysis of the radio localization problem in GAS networks, considering ground and aerial users as targets to be localized. Starting from a survey of the most relevant works, we then define the key characteristics of anchors and targets in GAS networks. Subsequently, we detail localization fundamentals in GAS networks, considering 3D positions, orientations, and velocities. Afterward, we thoroughly analyze radio localization systems in GAS networks, detailing the system model, design aspects, and considerations for each of the three GAS anchors. Preliminary results are presented to provide a quantifiable perspective on key design aspects in GAS-based localization scenarios. We then identify the vital roles 6G enablers are expected to play in radio localization in GAS networks.

Submitted 9 August, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

Comments: Accepted for publication in IEEE Communications Surveys & Tutorials

arXiv:2311.00624 [pdf, other]

Reverberant sound field equalisation for an enhanced stereo playback experience

Authors: James Brooks-Park, Steven van de Par

Abstract: The topic of room equalisation has been at the forefront of research and product development for many years, with the aim of increasing the playback quality of loudspeakers in reverberant rooms. Traditional room equalisation systems comprise a number of filters that, when applied to the primary loudspeakers, compensate for additional room colouration. This publication introduces a novel equalisation technique where gammatone filter band energy is added to the reverberant sound field via two surround loudspeakers, leaving the direct sound from the primary loudspeakers unaltered, but the sum of direct and reverberant energy is equalised at the listening position. Unlike traditional systems, this method allows the target function of the direct sound to differ from the reverberant sound field. The proposed method is motivated by the different roles direct and reverberant sound components play in human perception of sound. Along with introducing the proposed method, results from a subjective listening test are presented, demonstrating the preference towards the proposed technique when compared to a traditional room equalisation technique and stereo playback.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.14217 [pdf, ps, other]

On the Sum Secrecy Rate of Multi-User Holographic MIMO Networks

Authors: Arthur S. de Sena, Jiguang He, Ahmed Al Hammadi, Chongwen Huang, Faouzi Bader, Merouane Debbah, Mathias Fink

Abstract: The emerging concept of extremely-large holographic multiple-input multiple-output (HMIMO), benefiting from compactly and densely packed cost-efficient radiating meta-atoms, has been demonstrated for enhanced degrees of freedom even in pure line-of-sight conditions, enabling tremendous multiplexing gain for the next-generation communication systems. Most of the reported works focus on energy and spectrum efficiency, path loss analyses, and channel modeling. The extension to secure communications remains unexplored. In this paper, we theoretically characterize the secrecy capacity of the HMIMO network with multiple legitimate users and one eavesdropper while taking into consideration artificial noise and max-min fairness. We formulate the power allocation (PA) problem and address it by following successive convex approximation and Taylor expansion. We further study the effect of fixed PA coefficients, imperfect channel state information, inter-element spacing, and the number of Eve's antennas on the sum secrecy rate. Simulation results show that significant performance gain with more than 100% increment in the high signal-to-noise ratio (SNR) regime for the two-user case is obtained by exploiting adaptive/flexible PA compared to the case with fixed PA coefficients.

Submitted 22 October, 2023; originally announced October 2023.

Comments: 7 pages, 7 figures, submitted to IEEE ICC 2024

arXiv:2309.10825 [pdf, other]

doi 10.1016/j.cmpb.2024.108395

Latent Disentanglement in Mesh Variational Autoencoders Improves the Diagnosis of Craniofacial Syndromes and Aids Surgical Planning

Authors: Simone Foti, Alexander J. Rickart, Bongjin Koo, Eimear O' Sullivan, Lara S. van de Lande, Athanasios Papaioannou, Roman Khonsari, Danail Stoyanov, N. u. Owase Jeelani, Silvia Schievano, David J. Dunaway, Matthew J. Clarkson

Abstract: The use of deep learning to undertake shape analysis of the complexities of the human head holds great promise. However, there have traditionally been a number of barriers to accurate modelling, especially when operating on both a global and local level. In this work, we will discuss the application of the Swap Disentangled Variational Autoencoder (SD-VAE) with relevance to Crouzon, Apert and Muenke syndromes. Although syndrome classification is performed on the entire mesh, it is also possible, for the first time, to analyse the influence of each region of the head on the syndromic phenotype. By manipulating specific parameters of the generative model, and producing procedure-specific new shapes, it is also possible to simulate the outcome of a range of craniofacial surgical procedures. This opens new avenues to advance diagnosis, aids surgical planning and allows for the objective evaluation of surgical outcomes.

Submitted 5 September, 2023; originally announced September 2023.

arXiv:2308.09521 [pdf, other]

Data-driven Topology and Parameter Identification in Distribution Systems with limited Measurements

Authors: Steven de Jongh, Felicitas Mueller, Fabian Osterberg, Claudio A. Cañizares, Thomas Leibfried, Kankar Bhattacharya

Abstract: This manuscript presents novel techniques for identifying the switch states, phase identification, and estimation of equipment parameters in multi-phase low voltage electrical grids, which is a major challenge in long-standing German low voltage grids that lack observability and are heavily impacted by modelling errors. The proposed methods are tailored for systems with a limited number of spatially distributed measuring devices, which measure voltage magnitudes at specific nodes and some line current magnitudes. The overall approach employs a problem decomposition strategy to divide the problem into smaller subproblems, which are addressed independently. The techniques for identifying switch states and system phases are based on heuristics and a binary optimization problem using correlation analysis of the measured time series. The estimation of equipment parameters is achieved through a data-driven regression approach and by an optimization problem, and the identification of cable types is solved using a Mixed-Integer Quadratic Programming solver. To validate the presented methods, a realistic grid is used and the presented techniques are evaluated for their resilience to data quality and time resolution, discussing the limitations of the proposed methods.

Submitted 18 August, 2023; originally announced August 2023.

arXiv:2307.02806 [pdf, ps, other]

A Singular-value-based Marker for the Detection of Atrial Fibrillation Using High-resolution Electrograms and Multi-lead ECG

Authors: Hanie Moghaddasi, Richard C. Hendriks, Borbala Hunyadi, Paul Knops, Mathijs S van Schie, Natasja M. S. de Groot, Alle-Jan van der Veen

Abstract: The severity of atrial fibrillation (AF) can be assessed from intra-operative epicardial measurements (high-resolution electrograms), using metrics such as conduction block (CB) and continuous conduction delay and block (cCDCB). These features capture differences in conduction velocity and wavefront propagation. However, they do not clearly differentiate patients with various degrees of AF while they are in sinus rhythm, and complementary features are needed. In this work, we focus on the morphology of the action potentials, and derive features to detect variations in the atrial potential waveforms. Methods: We show that the spatial variation of atrial potential morphology during a single beat may be described by changes in the singular values of the epicardial measurement matrix. The method is non-parametric and requires little preprocessing. A corresponding singular value map points at areas subject to fractionation and block. Further, we developed an experiment where we simultaneously measure electrograms (EGMs) and a multi-lead ECG. Results: The captured data showed that the normalized singular values of the heartbeats during AF are higher than during SR, and that this difference is more pronounced for the (non-invasive) ECG data than for the EGM data, if the electrodes are positioned at favorable locations. Conclusion: Overall, the singular value-based features are a useful indicator to detect and evaluate AF.
Significance: The proposed method might be beneficial for identifying electropathological regions in the tissue without estimating the local activation time.

Submitted 6 July, 2023; originally announced July 2023.

Comments: 11 pages, 17 figures

arXiv:2306.17012 [pdf]

doi 10.1109/I3DA57090.2023.10289496

Evaluation of Virtual Acoustic Environments with Different Acoustic Level of Detail

Authors: Stefan Fichna, Steven van de Par, Stephan D. Ewert

Abstract: Virtual acoustic environments enable the creation and simulation of realistic and ecologically valid daily-life situations with applications in hearing research and audiology. Here, reverberant indoor environments play an important role. For real-time applications, simplifications in the room acoustics simulation are required; however, it remains unclear what acoustic level of detail (ALOD) is necessary to capture all perceptually relevant effects. This study investigates the effect of varying ALOD in the simulation of three different real environments: a living room with a coupled kitchen, a pub, and an underground station. ALOD was varied by generating different numbers of image sources for early reflections, or by excluding geometrical room details specific for each environment. The simulations were perceptually evaluated using headphones in comparison to binaural room impulse responses measured with a dummy head in the corresponding real environments, and partly using loudspeakers. The study assessed the perceived overall difference for a pulse and a speech token. Furthermore, plausibility and externalization were evaluated. The results show that a strong reduction in ALOD is possible while obtaining similar plausibility and externalization as with the dummy head recordings. The number and accuracy of early reflections appear less relevant, provided diffuse late reverberation is appropriately accounted for.

Submitted 10 August, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

Comments: This work has been submitted to the I3DA 2023 International Conference on Immersive and 3D Audio for possible publication. Revised version after review

arXiv:2306.16967 [pdf, other]

On the relevance of acoustic measurements for creating realistic virtual acoustic environments

Authors: Siegfried Gündert, Stephan D. Ewert, Steven van de Par

Abstract: Geometrical approaches for room acoustics simulation have the advantage of requiring limited computational resources while still achieving a high perceptual plausibility. A common approach is using the image source model for direct and early reflections in connection with further simplified models such as a feedback delay network for the diffuse reverberant tail. When recreating real spaces as virtual acoustic environments using room acoustics simulation, the perceptual relevance of individual parameters in the simulation is unclear. Here we investigate the importance of underlying acoustical measurements and technical evaluation methods to obtain high-quality room acoustics simulations in agreement with dummy-head recordings of a real space. We focus on the role of source directivity. The effect of including measured, modelled, and omnidirectional source directivity in room acoustics simulations was assessed in comparison to the measured reference. Technical evaluation strategies to verify and improve the accuracy of various elements in the simulation processing chain from source, the room properties, to the receiver are presented. Perceptual results from an ABX listening experiment with random speech tokens are shown and compared with technical measures for a ranking of simulation approaches.

Submitted 29 June, 2023; originally announced June 2023.

Comments: This work has been submitted to the I3DA 2023 International Conference (IEEE Xplore Digital Library) for possible publication

arXiv:2306.16696 [pdf]

doi 10.1051/aacus/2024062

Computationally-efficient and perceptually-motivated rendering of diffuse reflections in room acoustics simulation

Authors: Stephan D. Ewert, Nico Gößling, Oliver Buttler, Steven van de Par, Hongmei Hu

Abstract: Geometrical acoustics is well suited for simulating room reverberation in interactive real-time applications. While the image source model (ISM) is exceptionally fast, the restriction to specular reflections impacts its perceptual plausibility. To account for diffuse late reverberation, hybrid approaches have been proposed, e.g., using a feedback delay network (FDN) in combination with the ISM. Here, a computationally-efficient, digital-filter approach is suggested to account for effects of non-specular reflections in the ISM and to couple scattered sound into a diffuse reverberation model using a spatially rendered FDN. Depending on the scattering coefficient of a room boundary, energy of each image source is split into a specular and a scattered part which is added to the diffuse sound field. Temporal effects as observed for an infinite ideal diffuse (Lambertian) reflector are simulated using cascaded all-pass filters. Effects of scattering and multiple (inter-) reflections caused by larger geometric disturbances at walls and by objects in the room are accounted for in a highly simplified manner. Using a single parameter to quantify deviations from an empty shoebox room, each reflection is temporally smeared using cascaded all-pass filters. The proposed method was perceptually evaluated against dummy head recordings of real rooms.

Submitted 29 June, 2023; originally announced June 2023.

Comments: This work has been submitted to Forum Acusticum 2023 for publication

arXiv:2306.03859 [pdf, other]

Parameter Estimation in Electrical Distribution Systems with limited Measurements using Regression Methods

Authors: Steven de Jongh, Felicitas Mueller, Claudio Cañizares, Thomas Leibfried, Kankar Bhattacharya

Abstract: This paper presents novel methods for parameter identification in electrical grids with small numbers of spatially distributed measuring devices, which is an issue for distribution system operators managing aged and not properly mapped underground Low Voltage (LV) grids, especially in Germany. For this purpose, the total impedance of individual branches of the overall system is estimated by measuring currents and voltages at a subset of all system nodes over time. It is shown that, under common assumptions for electrical distribution systems, an estimate of the total impedance can be made using readily computable proxies. Different regression methods are then used and compared to estimate the total impedance of the respective branches, with varying weights of the input data. The results on realistic LV feeders with different branch lengths and number of unmeasured segments are discussed and multiple influencing factors are investigated through simulations. It is shown that estimates of the total impedances can be obtained with acceptable quality under realistic assumptions.

Submitted 18 August, 2023; v1 submitted 6 June, 2023; originally announced June 2023.

arXiv:2306.01688 [pdf, other]

Packet Reception Probability: Packets That You Can't Decode Can Help Keep You Safe

Authors: Subham De, Deepak Vasisht, Hari Sundaram, Robin Kravets

Abstract: This paper provides a robust, scalable Bluetooth Low-Energy (BLE) based indoor localization solution using commodity hardware. While WiFi-based indoor localization has been widely studied, BLE has emerged as a key technology for contact-tracing in the current pandemic. To accurately estimate distance using BLE on commercial devices, systems today rely on Receiver Signal Strength Indicator (RSSI), which suffers from sampling bias and multipath effects. We propose a new metric: Packet Reception Probability (PRP) that builds on a counter-intuitive idea that we can exploit packet loss to estimate distance. We localize using a Bayesian-PRP formulation that also incorporates an explicit model of the multipath. To make deployment easy, we do not require any hardware, firmware, or driver-level changes to off-the-shelf devices, and require minimal training. PRP can achieve meter level accuracy with just 6 devices with known locations and 12 training locations. We show that fusing PRP with RSSI is beneficial at short distances < 2m. Beyond 2m, fusion is worse than PRP, as RSSI becomes effectively de-correlated with distance. Robust location accuracy at all distances and ease of deployment with PRP can help enable wide range indoor localization solutions using BLE.

Submitted 2 June, 2023; originally announced June 2023.

Comments: 14 pages, 10 figures

arXiv:2305.19710 [pdf, other]

doi 10.1109/MNET.2023.3329192

Semantic-Functional Communications in Cyber-Physical Systems

Authors: Pedro E. Goria Silva, Pedro H. J. Nardelli, Arthur S. de Sena, Harun Siljak, Niko Nevaranta, Nicola Marchetti, Rausley A. A. de Souza

Abstract: This paper explores the use of semantic knowledge inherent in the cyber-physical system (CPS) under study in order to minimize the use of explicit communication, which refers to the use of physical radio resources to transmit potentially informative data. It is assumed that the acquired data have a function in the system, usually related to its state estimation, which may trigger control actions. We propose that a semantic-functional approach can leverage the semantic-enabled implicit communication while guaranteeing that the system maintains functionality under the required performance. We illustrate the potential of this proposal through simulations of a swarm of drones jointly performing remote sensing in a given area. Our numerical results demonstrate that the proposed method offers the best design option regarding the ability to accomplish a previously established task -- remote sensing in the addressed case -- while minimising the use of radio resources by controlling the trade-offs that jointly determine the CPS performance and its effectiveness in the use of resources. In this sense, we establish a fundamental relationship between energy, communication, and functionality considering a given end application.

Submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.04661 [pdf, ps, other]

Unleashing 3D Connectivity in Beyond 5G Networks with Reconfigurable Intelligent Surfaces

Authors: Jiguang He, Aymen Fakhreddine, Arthur S. de Sena, Yu Tian, Merouane Debbah

Abstract: Reconfigurable intelligent surfaces (RISs) bring various benefits to the current and upcoming wireless networks, including enhanced spectrum and energy efficiency, soft handover, transmission reliability, and even localization accuracy. These remarkable improvements result from the reconfigurability, programmability, and adaptation capabilities of RISs for fine-tuning radio propagation environments, which can be realized in a cost- and energy-efficient manner. In this paper, we focus on the upgrade of the existing fifth-generation (5G) cellular network with the introduction of an RIS owning a full-dimensional uniform planar array structure for unleashing advanced three-dimensional connectivity. The deployed RIS is exploited for serving unmanned aerial vehicles (UAVs) flying in the sky with ultra-high data rate, a challenging task to be achieved with conventional base stations (BSs) that are designed mainly to serve ground users. By taking into account the line-of-sight probability for the RIS-UAV and BS-UAV links, we formulate the average achievable rate, analyze the effect of environmental parameters, and make insightful performance comparisons. Simulation results show that the deployment of RISs can bring impressive gains and significantly outperform conventional RIS-free 5G networks.

Submitted 2 October, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

Comments: 5 pages, 4 figures, invited paper to Asilomar Conference on Signals, Systems, and Computers 2023 (accepted)

arXiv:2304.10618 [pdf, other]

ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks

Authors: Zachary Susskind, Aman Arora, Igor D. S. Miranda, Alan T. L. Bacellar, Luis A. Q. Villon, Rafael F. Katopodis, Leandro S. de Araujo, Diego L. C. Dutra, Priscila M. V. Lima, Felipe M. G. Franca, Mauricio Breternitz Jr., Lizy K. John

Abstract: The deployment of AI models on low-power, real-time edge devices requires accelerators for which energy, latency, and area are all first-order concerns. There are many approaches to enabling deep neural networks (DNNs) in this domain, including pruning, quantization, compression, and binary neural networks (BNNs), but with the emergence of the "extreme edge", there is now a demand for even more efficient models. In order to meet the constraints of ultra-low-energy devices, we propose ULEEN, a model architecture based on weightless neural networks. Weightless neural networks (WNNs) are a class of neural model which use table lookups, not arithmetic, to perform computation. The elimination of energy-intensive arithmetic operations makes WNNs theoretically well suited for edge inference; however, they have historically suffered from poor accuracy and excessive memory usage. ULEEN incorporates algorithmic improvements and a novel training strategy inspired by BNNs to make significant strides in improving accuracy and reducing model size. We compare FPGA and ASIC implementations of an inference accelerator for ULEEN against edge-optimized DNN and BNN devices. On a Xilinx Zynq Z-7045 FPGA, we demonstrate classification on the MNIST dataset at 14.3 million inferences per second (13 million inferences/Joule) with 0.21 $μ$s latency and 96.2% accuracy, while Xilinx FINN achieves 12.3 million inferences per second (1.69 million inferences/Joule) with 0.31 $μ$s latency and 95.83% accuracy. In a 45nm ASIC, we achieve 5.1 million inferences/Joule and 38.5 million inferences/second at 98.46% accuracy, while a quantized Bit Fusion model achieves 9230 inferences/Joule and 19,100 inferences/second at 99.35% accuracy. In our search for ever more efficient edge devices, ULEEN shows that WNNs are deserving of consideration.

Submitted 20 April, 2023; originally announced April 2023.

Comments: 14 pages, 14 figures. Portions of this article draw heavily from arXiv:2203.01479, most notably sections 5E and 5F.2

arXiv:2302.11491 [pdf, other]

A Supervisory Learning Control Framework for Autonomous & Real-time Task Planning for an Underactuated Cooperative Robotic task

Authors: Sander De Witte, Tom Lefebvre, Thijs Van Hauwermeiren, Guillaume Crevecoeur

Abstract: We introduce a framework for cooperative manipulation, applied on an underactuated manipulation problem. Two stationary robotic manipulators are required to cooperate in order to reposition an object within their shared work space. Control of multi-agent systems for manipulation tasks cannot rely on individual control strategies with little to no communication between the agents that serve the common objective through swarming. Instead a coordination strategy is required that queries subtasks to the individual agents. We formulate the problem in a Task And Motion Planning (TAMP) setting, while considering a decomposition strategy that allows us to treat the task and motion planning problems separately. We solve the supervisory planning problem offline using deep Reinforcement Learning techniques resulting into a supervisory policy capable of coordinating the two manipulators into a successful execution of the pick-and-place task. Additionally, a benefit of solving the task planning problem offline is the possibility of real-time (re)planning, demonstrating robustness in the event of subtask execution failure or on-the-fly task changes. The framework achieved zero-shot deployment on the real setup with a success rate that is higher than 90%.

Submitted 22 February, 2023; originally announced February 2023.

Showing 1–50 of 144 results for author: De, S