-
An Interpretable Transformer-Based Foundation Model for Cross-Procedural Skill Assessment Using Raw fNIRS Signals
Authors:
A. Subedi,
S. De,
L. Cavuoto,
S. Schwaitzberg,
M. Hackett,
J. Norfleet
Abstract:
Objective skill assessment in high-stakes procedural environments requires models that not only decode underlying cognitive and motor processes but also generalize across tasks, individuals, and experimental contexts. While prior work has demonstrated the potential of functional near-infrared spectroscopy (fNIRS) for evaluating cognitive-motor performance, existing approaches are often task-specific, rely on extensive preprocessing, and lack robustness to new procedures or conditions. Here, we introduce an interpretable transformer-based foundation model trained on minimally processed fNIRS signals for cross-procedural skill assessment. Pretrained using self-supervised learning on data from laparoscopic surgical tasks and endotracheal intubation (ETI), the model achieves greater than 88% classification accuracy on all tasks, with a Matthews Correlation Coefficient exceeding 0.91 on ETI. It generalizes to a novel emergency airway procedure--cricothyrotomy--using fewer than 30 labeled samples and a lightweight (less than 2k parameter) adapter module, attaining an AUC greater than 87%. Interpretability is achieved via a novel channel attention mechanism--developed specifically for fNIRS--that identifies functionally coherent prefrontal sub-networks validated through ablation studies. Temporal attention patterns align with task-critical phases and capture stress-induced changes in neural variability, offering insight into dynamic cognitive states.
Submitted 21 June, 2025;
originally announced June 2025.
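The abstract reports the Matthews Correlation Coefficient (MCC) alongside accuracy. As a reminder of what that metric measures, the minimal Python sketch below computes the binary MCC from confusion-matrix counts, MCC = (TP·TN − FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The function name and the example counts are hypothetical illustrations, not code or results from the paper; a multi-class evaluation would need the generalized form of the coefficient.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal sum is zero, the usual convention for
    the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical counts for illustration only (not results from the paper).
print(round(matthews_corrcoef(tp=45, tn=47, fp=3, fn=5), 3))
```

Unlike plain accuracy, MCC stays near zero for a classifier that simply predicts the majority class, which is why it is often reported for imbalanced skill-level labels.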
-
On the Robustness of RSMA to Adversarial BD-RIS-Induced Interference
Authors:
Arthur S. de Sena,
Jacek Kibilda,
Nurul H. Mahmood,
Andre Gomes,
Luiz A. DaSilva,
Matti Latva-aho
Abstract:
This article investigates the robustness of rate-splitting multiple access (RSMA) in multi-user multiple-input multiple-output (MIMO) systems to interference attacks against channel acquisition induced by beyond-diagonal RISs (BD-RISs). Two primary attack strategies, random and aligned interference, are proposed for fully connected and group-connected BD-RIS architectures. Valid random reflection coefficients are generated by exploiting the Takagi factorization, while potent aligned interference attacks are achieved through optimization strategies based on a quadratically constrained quadratic program (QCQP) reformulation followed by projections onto the unitary manifold. Our numerical findings reveal that, when perfect channel state information (CSI) is available, RSMA behaves similarly to space-division multiple access (SDMA) and thus is highly susceptible to the attack, with BD-RIS inducing severe performance loss and significantly outperforming diagonal RIS. However, under imperfect CSI, RSMA consistently demonstrates significantly greater robustness than SDMA, particularly as the system's transmit power increases.
Submitted 26 May, 2025;
originally announced May 2025.
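The abstract generates valid random BD-RIS reflection coefficients by exploiting the Takagi factorization. One way to see why this works, sketched here under the assumption of a lossless, reciprocal fully connected BD-RIS whose reflection matrix must be symmetric and unitary: the Takagi factorization writes a complex symmetric matrix as U Σ U^T, and for a unitary matrix all singular values equal one, so any random unitary U yields a valid Θ = U U^T. The pure-Python helper names below are hypothetical and this is not the authors' code; a group-connected architecture would place such blocks along the diagonal.

```python
import math
import random

Matrix = list[list[complex]]


def random_unitary(n: int, rng: random.Random) -> Matrix:
    """Random n x n unitary matrix via Gram-Schmidt on complex Gaussian columns."""
    cols: list[list[complex]] = []
    while len(cols) < n:
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        # Remove the components along previously accepted orthonormal columns.
        for q in cols:
            proj = sum(qi.conjugate() * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(abs(vi) ** 2 for vi in v))
        if norm > 1e-12:  # Skip (practically impossible) degenerate draws.
            cols.append([vi / norm for vi in v])
    # cols[j] holds column j, so entry (i, j) is cols[j][i].
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def random_fully_connected_bdris(n: int, seed: int = 0) -> Matrix:
    """Symmetric unitary reflection matrix Theta = U U^T (Takagi form, unit singular values)."""
    rng = random.Random(seed)
    u = random_unitary(n, rng)
    # (U U^T)[i][j] = sum_k U[i][k] * U[j][k]
    return [[sum(u[i][k] * u[j][k] for k in range(n)) for j in range(n)] for i in range(n)]


theta = random_fully_connected_bdris(4, seed=1)
```

Because U^T is unitary whenever U is, the product U U^T is unitary, and (U U^T)^T = U U^T makes it symmetric, so every draw satisfies both constraints by construction.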
-
Max-Min Fairness for Stacked Intelligent Metasurface-Assisted Multi-User MISO Systems
Authors:
Nipuni Ginige,
Prathapasinghe Dharmawansa,
Arthur Sousa de Sena,
Nurul Huda Mahmood,
Nandana Rajatheva,
Matti Latva-aho
Abstract:
Stacked intelligent metasurface (SIM) is an emerging technology that uses multiple reconfigurable surface layers to enable flexible wave-based beamforming. In this paper, we focus on an SIM-assisted multi-user multiple-input single-output system, where it is essential to ensure that all users receive a fair and reliable service level. To this end, we develop two max-min fairness algorithms based on instantaneous channel state information (CSI) and statistical CSI. For the instantaneous CSI case, we propose an alternating optimization algorithm that jointly optimizes power allocation using geometric programming and wave-based beamforming coefficients using the gradient descent-ascent method. For the statistical CSI case, since deriving an exact expression for the average minimum achievable rate is analytically intractable, we derive a tight upper bound and thereby formulate a stochastic optimization problem. This problem is then solved, capitalizing on an alternating approach combining geometric programming and gradient descent algorithms, to obtain the optimal policies. Our numerical results show significant improvements in the minimum achievable rate compared to the benchmark schemes. In particular, for the instantaneous CSI scenario, the individual impact of the optimal wave-based beamforming is significantly higher than that of the power allocation strategy. Moreover, the proposed upper bound is shown to be tight in the low signal-to-noise ratio regime under the statistical CSI.
Submitted 20 April, 2025;
originally announced April 2025.
-
End-to-End Deep Learning for Real-Time Neuroimaging-Based Assessment of Bimanual Motor Skills
Authors:
Aseem Subedi,
Rahul,
Lora Cavuoto,
Steven Schwaitzberg,
Matthew Hackett,
Jack Norfleet,
Suvranu De
Abstract:
The real-time assessment of complex motor skills presents a challenge in fields such as surgical training and rehabilitation. Recent advancements in neuroimaging, particularly functional near-infrared spectroscopy (fNIRS), have enabled objective assessment of such skills with high accuracy. However, these techniques are hindered by extensive preprocessing requirements to extract neural biomarkers. This study presents a novel end-to-end deep learning framework that processes raw fNIRS signals directly, eliminating the need for intermediate preprocessing steps. The model was evaluated on datasets from three distinct bimanual motor tasks--suturing, pattern cutting, and endotracheal intubation (ETI)--using performance metrics derived from both training and retention datasets. It achieved a mean classification accuracy of 93.9% (SD 4.4) and a generalization accuracy of 92.6% (SD 1.9) on unseen skill retention datasets, with a leave-one-subject-out cross-validation yielding an accuracy of 94.1% (SD 3.6). Contralateral prefrontal cortex activations exhibited task-specific discriminative power, while motor cortex activations consistently contributed to accurate classification. The model also demonstrated resilience to neurovascular coupling saturation caused by extended task sessions, maintaining robust performance across trials. Comparative analysis confirms that the end-to-end model performs on par with or surpasses baseline models optimized for fully processed fNIRS data, with statistically similar (p<0.05) or improved prediction accuracies. By eliminating the need for extensive signal preprocessing, this work provides a foundation for real-time, non-invasive assessment of bimanual motor skills in medical training environments, with potential applications in robotics, rehabilitation, and sports.
Submitted 21 March, 2025;
originally announced April 2025.
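The abstract reports leave-one-subject-out (LOSO) cross-validation. A minimal sketch of how such folds can be formed, assuming each trial is tagged with a subject identifier (the identifiers and the function name below are made up): every trial of the held-out subject goes to the test fold, so the model is never tested on a subject it saw during training. scikit-learn's `LeaveOneGroupOut` provides an equivalent split; this stdlib version only shows the indexing logic.

```python
from collections.abc import Iterator, Sequence


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[tuple[str, list[int], list[int]]]:
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for subject in sorted(set(subject_ids)):
        test = [i for i, s in enumerate(subject_ids) if s == subject]
        train = [i for i, s in enumerate(subject_ids) if s != subject]
        yield subject, train, test


# Hypothetical trial-to-subject mapping: six trials from three subjects.
trials = ["s1", "s1", "s2", "s3", "s3", "s3"]
for held_out, train_idx, test_idx in leave_one_subject_out(trials):
    print(held_out, train_idx, test_idx)
```

Splitting by subject rather than by trial avoids leaking subject-specific signal characteristics into the test fold, which would otherwise inflate the reported accuracy.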
-
Multi-stage model predictive control for slug flow crystallizers using uncertainty-aware surrogate models
Authors:
Collin R. Johnson,
Stijn de Vries,
Kerstin Wohlgemuth,
Sergio Lucia
Abstract:
This paper presents a novel dynamic model for slug flow crystallizers that addresses the challenges of spatial distribution without backmixing or diffusion, potentially enabling advanced model-based control. The developed model can accurately describe the main characteristics of slug flow crystallizers, including slug-to-slug variability, but it leads to a high computational complexity due to the consideration of partial differential equations and population balance equations. For that reason, the model cannot be directly used for process optimization and control. To solve this challenge, we propose two different approaches, conformalized quantile regression and Bayesian last layer neural networks, to develop surrogate models with uncertainty quantification capabilities. These surrogates output a prediction of the system states together with an uncertainty of these predictions to account for process variability and model uncertainty. We use the uncertainty of the predictions to formulate a robust model predictive control approach, enabling robust real-time advanced control of a slug flow crystallizer.
Submitted 28 March, 2025;
originally announced March 2025.
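Conformalized quantile regression, one of the two surrogate approaches named above, calibrates an existing lower/upper quantile model on held-out data. The sketch below follows the standard split-conformal recipe: compute scores E_i = max(q_lo(x_i) − y_i, y_i − q_hi(x_i)) on a calibration set, take the ⌈(1 − α)(n + 1)⌉-th smallest score as Q, and report [q_lo(x) − Q, q_hi(x) + Q]. The quantile regressors themselves are assumed to exist already; the function names and numbers are hypothetical, and this is not the paper's crystallizer surrogate.

```python
import math
from collections.abc import Sequence


def cqr_correction(
    lower: Sequence[float], upper: Sequence[float], y: Sequence[float], alpha: float
) -> float:
    """Conformal correction Q computed on a held-out calibration set.

    Each score measures how far the true value falls outside the predicted
    quantile band (negative when it falls inside).
    """
    scores = sorted(max(lo - yi, yi - hi) for lo, hi, yi in zip(lower, upper, y))
    n = len(scores)
    k = math.ceil((1 - alpha) * (n + 1))  # 1-based rank of the conformal quantile
    if k > n:
        return math.inf  # Too few calibration points for the requested coverage.
    return scores[k - 1]


def cqr_interval(lower_pred: float, upper_pred: float, q: float) -> tuple[float, float]:
    """Widen (or, if q < 0, shrink) a raw quantile band by the correction q."""
    return lower_pred - q, upper_pred + q


# Hypothetical calibration data: raw band [0, 1], nine observed outcomes.
q = cqr_correction(lower=[0.0] * 9, upper=[1.0] * 9, y=[0.5] * 7 + [1.5, 2.0], alpha=0.25)
print(cqr_interval(2.0, 3.0, q))
```

The resulting band has marginal coverage of at least 1 − α when calibration and test points are exchangeable, regardless of how well the underlying quantile regressors were trained.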
-
The 4D Human Embryonic Brain Atlas: spatiotemporal atlas generation for rapid anatomical changes using first-trimester ultrasound from the Rotterdam Periconceptional Cohort
Authors:
Wietske A. P. Bastiaansen,
Melek Rousian,
Anton H. J. Koning,
Wiro J. Niessen,
Bernadette S. de Bakker,
Régine P. M. Steegers-Theunissen,
Stefan Klein
Abstract:
Early brain development is crucial for lifelong neurodevelopmental health. However, current clinical practice offers limited knowledge of normal embryonic brain anatomy on ultrasound, despite the brain undergoing rapid changes within the time-span of days. To provide detailed insights into normal brain development and identify deviations, we created the 4D Human Embryonic Brain Atlas using a deep learning-based approach for groupwise registration and spatiotemporal atlas generation. Our method introduced a time-dependent initial atlas and penalized deviations from it, ensuring age-specific anatomy was maintained throughout rapid development. The atlas was generated and validated using 831 3D ultrasound images from 402 subjects in the Rotterdam Periconceptional Cohort, acquired between gestational weeks 8 and 12. We evaluated the effectiveness of our approach with an ablation study, which demonstrated that incorporating a time-dependent initial atlas and penalization produced anatomically accurate results. In contrast, omitting these adaptations led to an anatomically incorrect atlas. Visual comparisons with an existing ex-vivo embryo atlas further confirmed the anatomical accuracy of our atlas. In conclusion, the proposed method successfully captures the rapid anatomical development of the embryonic brain. The resulting 4D Human Embryonic Brain Atlas provides unique insights into this crucial early life period and holds the potential for improving the detection, prevention, and treatment of prenatal neurodevelopmental disorders.
Submitted 10 March, 2025;
originally announced March 2025.
-
A Deep-Unfolding Approach to RIS Phase Shift Optimization Via Transformer-Based Channel Prediction
Authors:
Ishan Koralege,
Arthur S. de Sena,
Nurul H. Mahmood,
Farjam Karim,
Dimuthu Lesthuruge,
Samitha Gunarathne
Abstract:
Reconfigurable intelligent surfaces (RISs) have emerged as a promising solution that can provide dynamic control over the propagation of electromagnetic waves. The RIS technology is envisioned as a key enabler of sixth-generation networks by offering the ability to adaptively manipulate signal propagation through the smart configuration of its phase shift coefficients, thereby optimizing signal strength, coverage, and capacity. However, the realization of this technology's full potential hinges on the accurate acquisition of channel state information (CSI). In this paper, we propose an efficient CSI prediction framework for a RIS-assisted communication system based on the machine learning (ML) transformer architecture. Architectural modifications are introduced to the vanilla transformer for multivariate time series forecasting to achieve high prediction accuracy. The predicted channel coefficients are then used to optimize the RIS phase shifts. Simulation results present a comprehensive analysis of key performance metrics, including data rate and outage probability. Our results confirm the effectiveness of the proposed ML approach and demonstrate its superiority over other baseline ML-based CSI prediction schemes such as conventional deep neural networks and long short-term memory architectures, albeit at the cost of slightly increased complexity.
Submitted 25 February, 2025;
originally announced February 2025.
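The abstract uses predicted channel coefficients to optimize the RIS phase shifts. For the simplest case, assumed here purely for illustration (a single-antenna link with a direct path h_d and per-element cascaded paths h_n g_n), the received power |h_d + Σ_n h_n e^{jθ_n} g_n|² is maximized in closed form by θ_n = arg(h_d) − arg(h_n g_n), which co-phases every reflected path with the direct one. The paper's system model and optimizer may differ; the function names and random channels below are hypothetical.

```python
import cmath
import random


def align_phases(h_direct: complex, h_ris: list[complex], g_ris: list[complex]) -> list[float]:
    """Closed-form phase shifts that co-phase each reflected path with the direct path."""
    return [cmath.phase(h_direct) - cmath.phase(h * g) for h, g in zip(h_ris, g_ris)]


def received_power(
    h_direct: complex, h_ris: list[complex], g_ris: list[complex], thetas: list[float]
) -> float:
    """|h_d + sum_n h_n * exp(j*theta_n) * g_n|^2 for a single-antenna link."""
    total = h_direct + sum(h * cmath.exp(1j * t) * g for h, g, t in zip(h_ris, g_ris, thetas))
    return abs(total) ** 2


# Hypothetical (e.g. predicted) channels for a 16-element RIS.
rng = random.Random(0)
n_elements = 16
h_d = complex(rng.gauss(0, 1), rng.gauss(0, 1))
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_elements)]
g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_elements)]
print(received_power(h_d, h, g, align_phases(h_d, h, g)))
```

With aligned phases every term adds in magnitude, so the power equals (|h_d| + Σ_n |h_n||g_n|)², the upper bound for any phase choice; prediction errors in h_n and g_n translate directly into residual phase misalignment.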
-
Reliability Modeling for Beyond-5G Mission Critical Networks Using Effective Capacity
Authors:
Anudeep Karnam,
Jobish John,
Kishor C. Joshi,
George Exarchakos,
Sonia Heemstra de Groot,
Ignas Niemegeers
Abstract:
Accurate reliability modeling for ultra-reliable low latency communication (URLLC) and hyper-reliable low latency communication (HRLLC) networks is challenging due to the complex interactions between network layers required to meet stringent requirements. In this paper, we propose such a model. We consider the acknowledged mode of the radio link control (RLC) layer, utilizing separate buffers for transmissions and retransmissions, along with the behavior of physical channels. Our approach leverages the effective capacity (EC) framework, which quantifies the maximum constant arrival rate a time-varying wireless channel can support while meeting statistical quality of service (QoS) constraints. We derive a reliability model that incorporates delay violations, various latency components, and multiple transmission attempts. Our method identifies optimal operating conditions that satisfy URLLC/HRLLC constraints while maintaining near-optimal EC, ensuring the system can handle peak traffic with a guaranteed QoS. Our model reveals critical trade-offs between EC and reliability across various use cases, providing guidance for URLLC/HRLLC network design for service providers and system designers.
Submitted 31 January, 2025;
originally announced January 2025.
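The effective capacity framework referenced above is commonly defined as EC(θ) = −(1/θ) ln E[exp(−θS)], where S is the service per block and θ is the QoS exponent. A Monte Carlo sketch, assuming Rayleigh block fading with unit-length blocks and service S = log2(1 + SNR·|h|²) in bit/s/Hz; the paper's model additionally includes RLC buffers, latency components, and retransmissions, none of which are modeled here, and the function names are hypothetical.

```python
import math
import random


def effective_capacity(samples: list[float], theta: float) -> float:
    """Monte Carlo estimate of EC(theta) = -(1/theta) * ln E[exp(-theta * S)]."""
    mgf = sum(math.exp(-theta * s) for s in samples) / len(samples)
    return -math.log(mgf) / theta


def rayleigh_rates(snr_db: float, n: int, seed: int = 0) -> list[float]:
    """Per-block service log2(1 + SNR*|h|^2) with |h|^2 ~ Exp(1) (Rayleigh fading)."""
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)
    return [math.log2(1 + snr * rng.expovariate(1.0)) for _ in range(n)]


rates = rayleigh_rates(snr_db=10, n=100_000)
print("ergodic rate:", round(sum(rates) / len(rates), 3))
for theta in (0.01, 0.1, 1.0, 10.0):
    # Larger theta = stricter statistical delay constraint.
    print(theta, round(effective_capacity(rates, theta), 3))
```

As θ → 0 the estimate approaches the ergodic rate, and it decreases as θ grows, which is the EC-versus-QoS trade-off the abstract exploits when searching for operating points that still meet URLLC/HRLLC constraints.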
-
Uplink Rate Splitting Multiple Access with Imperfect Channel State Information and Interference Cancellation
Authors:
Farjam Karim,
Nurul Huda Mahmood,
Arthur S. de Sena,
Deepak Kumar,
Bruno Clerckx,
Matti Latva-aho
Abstract:
This article investigates the performance of uplink rate splitting multiple access (RSMA) in a two-user scenario, addressing an under-explored domain compared to its downlink counterpart. With the increasing demand for uplink communication in applications like the Internet-of-Things, it is essential to account for practical imperfections, such as inaccuracies in channel state information at the receiver (CSIR) and limitations in successive interference cancellation (SIC), to provide realistic assessments of system performance. Specifically, we derive closed-form expressions for the outage probability, throughput, and asymptotic outage behavior of uplink users, considering imperfect CSIR and SIC. We validate the accuracy of these derived expressions using Monte Carlo simulations. Our findings reveal that at low transmit power levels, imperfect CSIR degrades system performance significantly more than SIC imperfections do. However, as the transmit power increases, the impact of imperfect CSIR diminishes, while the influence of SIC imperfections becomes more pronounced. Moreover, we highlight the impact of the rate allocation factor on user performance. Finally, our comparison with non-orthogonal multiple access (NOMA) highlights the outage performance trade-offs between RSMA and NOMA. RSMA proves to be more effective in managing imperfect CSIR and enhances performance through strategic message splitting, resulting in more robust communication.
Submitted 31 January, 2025;
originally announced January 2025.
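The abstract validates its closed-form outage expressions with Monte Carlo simulation. The same workflow is sketched below for a much simpler, assumed setting (a single user on a Rayleigh-faded link with perfect CSIR and no SIC), where P_out = Pr{log2(1 + ρ|h|²) < R} = 1 − exp(−(2^R − 1)/ρ). It does not reproduce the paper's RSMA expressions; it only shows how an analytical curve is checked against simulation, and the function names are hypothetical.

```python
import math
import random


def outage_closed_form(snr: float, rate: float) -> float:
    """Pr{log2(1 + snr*|h|^2) < rate} with |h|^2 ~ Exp(1) (Rayleigh fading)."""
    return 1 - math.exp(-(2 ** rate - 1) / snr)


def outage_monte_carlo(snr: float, rate: float, trials: int, seed: int = 0) -> float:
    """Fraction of simulated fading realizations whose capacity falls below the target rate."""
    rng = random.Random(seed)
    outages = sum(math.log2(1 + snr * rng.expovariate(1.0)) < rate for _ in range(trials))
    return outages / trials


target_rate = 1.0  # bit/s/Hz, hypothetical
for snr_db in (0, 10, 20):
    snr = 10 ** (snr_db / 10)
    print(snr_db, outage_closed_form(snr, target_rate), outage_monte_carlo(snr, target_rate, 100_000))
```

At high SNR the closed form behaves like (2^R − 1)/ρ, i.e. a diversity order of one, which is the kind of asymptotic outage behavior the abstract derives for its more detailed RSMA model.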
-
In the Picture: Medical Imaging Datasets, Artifacts, and their Living Review
Authors:
Amelia Jiménez-Sánchez,
Natalia-Rozalia Avlona,
Sarah de Boer,
Víctor M. Campello,
Aasa Feragen,
Enzo Ferrante,
Melanie Ganz,
Judy Wawira Gichoya,
Camila González,
Steff Groefsema,
Alessa Hering,
Adam Hulman,
Leo Joskowicz,
Dovile Juodelyte,
Melih Kandemir,
Thijs Kooi,
Jorge del Pozo Lérida,
Livie Yumeng Li,
Andre Pacheco,
Tim Rädsch,
Mauricio Reyes,
Théo Sourget,
Bram van Ginneken,
David Wen,
Nina Weng
, et al. (4 additional authors not shown)
Abstract:
Datasets play a critical role in medical imaging research, yet issues such as label quality, shortcuts, and metadata are often overlooked. This lack of attention may harm the generalizability of algorithms and, consequently, negatively impact patient outcomes. While existing medical imaging literature reviews mostly focus on machine learning (ML) methods, with only a few focusing on datasets for specific applications, these reviews remain static -- they are published once and not updated thereafter. This fails to account for emerging evidence, such as biases, shortcuts, and additional annotations that other researchers may contribute after the dataset is published. We refer to these newly discovered findings of datasets as research artifacts. To address this gap, we propose a living review that continuously tracks public datasets and their associated research artifacts across multiple medical imaging applications. Our approach includes a framework for the living review to monitor data documentation artifacts, and an SQL database to visualize the citation relationships between research artifacts and datasets. Lastly, we discuss key considerations for creating medical imaging datasets, review best practices for data annotation, discuss the significance of shortcuts and demographic diversity, and emphasize the importance of managing datasets throughout their entire lifecycle. Our demo is publicly available at http://inthepicture.itu.dk/.
Submitted 2 June, 2025; v1 submitted 18 January, 2025;
originally announced January 2025.
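The living review stores citation relationships between research artifacts and datasets in an SQL database. A hypothetical minimal schema, using Python's built-in `sqlite3` module with invented table names and rows, makes the many-to-many link explicit; the project's actual schema is not described in the abstract.

```python
import sqlite3

# Hypothetical schema: one row per dataset, one per research artifact,
# and a link table recording which artifact cites which dataset.
SCHEMA = """
CREATE TABLE dataset (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE artifact (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    kind  TEXT NOT NULL CHECK (kind IN ('bias', 'shortcut', 'annotation', 'other'))
);
CREATE TABLE cites (
    artifact_id INTEGER NOT NULL REFERENCES artifact(id),
    dataset_id  INTEGER NOT NULL REFERENCES dataset(id),
    PRIMARY KEY (artifact_id, dataset_id)
);
"""


def build_demo() -> sqlite3.Connection:
    """In-memory database populated with made-up example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dataset (id, name) VALUES (?, ?)",
                     [(1, "Dataset A"), (2, "Dataset B")])
    conn.executemany("INSERT INTO artifact (id, title, kind) VALUES (?, ?, ?)",
                     [(1, "Paper reporting a shortcut", "shortcut"),
                      (2, "Extra segmentation labels", "annotation")])
    conn.executemany("INSERT INTO cites VALUES (?, ?)", [(1, 1), (2, 1), (2, 2)])
    return conn


def artifacts_for(conn: sqlite3.Connection, dataset_name: str) -> list[tuple[str, str]]:
    """All artifacts linked to a dataset, in insertion order."""
    rows = conn.execute(
        """
        SELECT a.title, a.kind
        FROM artifact AS a
        JOIN cites AS c ON c.artifact_id = a.id
        JOIN dataset AS d ON d.id = c.dataset_id
        WHERE d.name = ?
        ORDER BY a.id
        """,
        (dataset_name,),
    )
    return rows.fetchall()


print(artifacts_for(build_demo(), "Dataset A"))
```

Keeping citations in a separate link table lets one artifact (for example, a new annotation set) attach to several datasets, and new artifacts can be added without touching the dataset records.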
-
Time series forecasting for multidimensional telemetry data using GAN and BiLSTM in a Digital Twin
Authors:
Joao Carmo de Almeida Neto,
Claudio Miceli de Farias,
Leandro Santiago de Araujo,
Leopoldo Andre Dutra Lusquino Filho
Abstract:
Research on digital twins has been increasing in recent years. Besides mirroring the physical world into the digital one, digital twins need to provide services based on the data collected and transferred to the virtual world. One of these services is forecasting the future behavior of the physical part, which could enable applications such as preventing harmful events or designing improvements for better performance. One strategy for predicting system operation is to use time series models such as ARIMA or LSTM, and improvements have been built on these algorithms. Recently, deep learning techniques based on generative models such as Generative Adversarial Networks (GANs) have been proposed to generate time series, and LSTM has gained relevance in time series forecasting, but both have limitations that restrict the forecasting results. Another issue found in the literature is the challenge of handling multivariate environments and applications in time series generation. Therefore, new methods are needed to fill these gaps and, consequently, provide better resources for creating useful digital twins. This proposal studies the integration of a BiLSTM layer with a time series obtained by a GAN in order to improve the forecasting accuracy for all the features provided by the dataset and, consequently, improve behavior prediction.
Submitted 14 January, 2025;
originally announced January 2025.
-
FreeSVC: Towards Zero-shot Multilingual Singing Voice Conversion
Authors:
Alef Iury Siqueira Ferreira,
Lucas Rafael Gris,
Augusto Seben da Rosa,
Frederico Santos de Oliveira,
Edresson Casanova,
Rafael Teixeira Sousa,
Arnaldo Candido Junior,
Anderson da Silva Soares,
Arlindo Galvão Filho
Abstract:
This work presents FreeSVC, a promising multilingual singing voice conversion approach that leverages an enhanced VITS model with Speaker-invariant Clustering (SPIN) for better content representation and the State-of-the-Art (SOTA) speaker encoder ECAPA2. FreeSVC incorporates trainable language embeddings to handle multiple languages and employs an advanced speaker encoder to disentangle speaker characteristics from linguistic content. Designed for zero-shot learning, FreeSVC enables cross-lingual singing voice conversion without extensive language-specific training. We demonstrate that a multilingual content extractor is crucial for optimal cross-language conversion. Our source code and models are publicly available.
Submitted 9 January, 2025;
originally announced January 2025.
-
Probabilistic Latent Variable Modeling for Dynamic Friction Identification and Estimation
Authors:
Victor Vantilborgh,
Sander De Witte,
Frederik Ostyn,
Tom Lefebvre,
Guillaume Crevecoeur
Abstract:
Precise identification of dynamic models in robotics is essential to support control design, friction compensation, output torque estimation, etc. A longstanding challenge remains in the identification of friction models for robotic joints, given the numerous physical phenomena affecting the underlying friction dynamics, which result in nonlinear characteristics and, in particular, hysteresis behaviour. These phenomena prove difficult to model and capture accurately using physical analogies alone. This has motivated researchers to shift from physics-based to data-driven models. Currently, these methods are still limited in their ability to generalize effectively to typical industrial robot deployment, characterized by high- and low-velocity operations and frequent direction reversals. Empirical observations motivate the use of dynamic friction models, but these remain particularly challenging to establish. To address the current limitations, we propose to account for unidentified dynamics in the robot joints using latent dynamic states. The friction model may then utilize both the dynamic robot state and additional information encoded in the latent state to evaluate the friction torque. We cast this stochastic and partially unsupervised identification problem as a standard probabilistic representation learning problem. In this work, both the friction model and latent state dynamics are parametrized as neural networks and integrated in the conventional lumped parameter dynamic robot model. The complete dynamics model is directly learned from the noisy encoder measurements in the robot joints. We use the Expectation-Maximisation (EM) algorithm to find a Maximum Likelihood Estimate (MLE) of the model parameters. The effectiveness of the proposed method is validated in terms of open-loop prediction accuracy in comparison with baseline methods, using the Kuka KR6 R700 as a test platform.
Submitted 20 December, 2024;
originally announced December 2024.
-
Coherent3D: Coherent 3D Portrait Video Reconstruction via Triplane Fusion
Authors:
Shengze Wang,
Xueting Li,
Chao Liu,
Matthew Chan,
Michael Stengel,
Henry Fuchs,
Shalini De Mello,
Koki Nagano
Abstract:
Recent breakthroughs in single-image 3D portrait reconstruction have enabled telepresence systems to stream 3D portrait videos from a single camera in real-time, democratizing telepresence. However, per-frame 3D reconstruction exhibits temporal inconsistency and forgets the user's appearance. On the other hand, self-reenactment methods can render coherent 3D portraits by driving a 3D avatar built from a single reference image, but fail to faithfully preserve the user's per-frame appearance (e.g., instantaneous facial expression and lighting). As a result, neither of these two frameworks is an ideal solution for democratized 3D telepresence. In this work, we address this dilemma and propose a novel solution that maintains both coherent identity and dynamic per-frame appearance to enable the best possible realism. To this end, we propose a new fusion-based method that takes the best of both worlds by fusing a canonical 3D prior from a reference view with dynamic appearance from per-frame input views, producing temporally stable 3D videos with faithful reconstruction of the user's per-frame appearance. Trained only using synthetic data produced by an expression-conditioned 3D GAN, our encoder-based method achieves both state-of-the-art 3D reconstruction and temporal consistency on in-studio and in-the-wild datasets. https://research.nvidia.com/labs/amri/projects/coherent3d
Submitted 11 December, 2024;
originally announced December 2024.
-
Efficient Channel Prediction for Beyond Diagonal RIS-Assisted MIMO Systems with Channel Aging
Authors:
Nipuni Ginige,
Arthur Sousa de Sena,
Nurul Huda Mahmood,
Nandana Rajatheva,
Matti Latva-aho
Abstract:
Novel reconfigurable intelligent surface (RIS) architectures, known as beyond diagonal RISs (BD-RISs), have been proposed to enhance reflection efficiency and expand RIS capabilities. However, their passive nature, non-diagonal reflection matrix, and the large number of coupled reflecting elements complicate the channel state information (CSI) estimation process. The challenge further escalates in scenarios with fast-varying channels. In this paper, we address this challenge by proposing novel joint channel estimation and prediction strategies with low overhead and high accuracy for two different RIS architectures in a BD-RIS-assisted multiple-input multiple-output system under correlated fast-fading environments with channel aging. The channel estimation procedure utilizes the Tucker2 decomposition with bilinear alternating least squares, which is exploited to decompose the cascade channels of the BD-RIS-assisted system into effective channels of reduced dimension. The channel prediction framework is based on a convolutional neural network combined with an autoregressive predictor. The estimated/predicted CSI is then utilized to optimize the RIS phase shifts aiming at the maximization of the downlink sum rate. Insightful simulation results demonstrate that our proposed approach is robust to channel aging, and exhibits a high estimation accuracy. Moreover, our scheme can deliver a high average downlink sum rate, outperforming other state-of-the-art channel estimation methods. The results also reveal a remarkable reduction in pilot overhead of up to 98% compared to baseline schemes, all while imposing low computational complexity.
Submitted 21 November, 2024;
originally announced November 2024.
-
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis
Authors:
Suparna De,
Ionut Bostan,
Nishanth Sastry
Abstract:
Recent studies have outlined the accessibility challenges that blind or visually impaired people, as well as less-literate people, face when interacting with social networks, in spite of facilitating technologies such as monotone text-to-speech (TTS) screen readers and audio narration of visual elements such as emojis. Emotional speech generation traditionally relies on human input of the expected emotion together with the text to synthesise, with additional challenges around data simplification (causing information loss) and duration inaccuracy, leading to a lack of expressive emotional rendering. In real-life communication, phoneme durations vary, since the same sentence might be spoken in many ways depending on the speaker's emotional state or accent (referred to as the one-to-many problem of text-to-speech generation). As a result, an advanced voice synthesis system must account for this variability. We propose an end-to-end context-aware TTS synthesis system that derives the conveyed emotion from text input and synthesises audio that focuses on emotions and speaker features for natural and expressive speech, integrating advanced natural language processing (NLP) and speech synthesis techniques for real-time applications. Our system also shows competitive inference time when benchmarked against state-of-the-art TTS models, making it suitable for real-time accessibility applications.
Submitted 24 October, 2024;
originally announced October 2024.
-
Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap
Authors:
Georgia Channing,
Juil Sock,
Ronald Clark,
Philip Torr,
Christian Schroeder de Witt
Abstract:
The rapid proliferation of AI-manipulated or generated audio deepfakes poses serious challenges to media integrity and election security. Current AI-driven detection solutions lack explainability and underperform in real-world settings. In this paper, we introduce novel explainability methods for state-of-the-art transformer-based audio deepfake detectors and open-source a novel benchmark for real-world generalizability. By narrowing the explainability gap between transformer-based audio deepfake detectors and traditional methods, our results not only build trust with human experts, but also pave the way for unlocking the potential of citizen intelligence to overcome the scalability issue in audio deepfake detection.
Submitted 9 October, 2024;
originally announced October 2024.
-
Cooperative UAV-Relay based Satellite Aerial Ground Integrated Networks
Authors:
Bhola,
Yu-Jia Chen,
Ashutosh Balakrishnan,
Swades De,
Li-Chun Wang
Abstract:
In the post-fifth generation (5G) era, escalating user quality of service (QoS) demands strain terrestrial network capacity, especially in urban areas with dynamic traffic distributions. This paper introduces a novel cooperative unmanned aerial vehicle relay-based deployment (CUD) framework in satellite air-ground integrated networks (SAGIN). The CUD strategy deploys an unmanned aerial vehicle-based relay (UAVr) in amplify-and-forward (AF) mode to enhance user QoS when terrestrial base stations fall short of network capacity. By combining low earth orbit (LEO) satellite and UAVr signals using cooperative diversity, the CUD framework enhances the signal-to-noise ratio (SNR) at the user. Comparative evaluations against existing frameworks reveal performance improvements, demonstrating the effectiveness of the CUD framework in addressing the evolving demands of next-generation networks.
Submitted 9 October, 2024;
originally announced October 2024.
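The SNR benefit of cooperative diversity can be illustrated with the textbook model of a two-hop relay link. The sketch below assumes variable-gain amplify-and-forward relaying at the UAVr and maximal-ratio combining (MRC) of the direct LEO satellite link and the relayed link at the user; the exact channel and combining model of the CUD framework is not given in the abstract, and all function names are illustrative.

```python
import numpy as np

def db_to_linear(db):
    """Convert a value in dB to a linear power ratio."""
    return 10.0 ** (np.asarray(db, dtype=float) / 10.0)

def af_relay_snr(snr_sr, snr_rd):
    """End-to-end SNR of a two-hop variable-gain amplify-and-forward link.

    Standard closed form g1*g2 / (g1 + g2 + 1), with g1 the source-to-relay
    SNR and g2 the relay-to-destination SNR, both on a linear scale.
    """
    return snr_sr * snr_rd / (snr_sr + snr_rd + 1.0)

def mrc_combined_snr(snr_direct, snr_sr, snr_rd):
    """Output SNR when the direct (satellite-to-user) branch and the AF relay
    branch are combined with MRC: the branch SNRs simply add."""
    return snr_direct + af_relay_snr(snr_sr, snr_rd)
```

For example, a 0 dB direct satellite link combined with a relay path of 20 dB (satellite to UAVr) and 10 dB (UAVr to user) gives a combined SNR of about 10 on a linear scale (roughly 10 dB), compared with 0 dB for the direct link alone.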
-
Room impulse response prototyping using receiver distance estimations for high quality room equalisation algorithms
Authors:
James Brooks-Park,
Martin Bo Møller,
Jan Østergaard,
Søren Bech,
Steven van de Par
Abstract:
Room equalisation aims to increase the quality of loudspeaker reproduction in reverberant environments, compensating for colouration caused by imperfect room reflections and frequency-dependent loudspeaker directivity. A common technique in room equalisation is to invert a prototype room impulse response (RIR). Rather than inverting a single RIR at the listening position, this approach inverts a prototype response composed of several responses distributed around the listening area. This paper proposes a method of impulse response prototyping that uses estimated receiver positions to form a weighted-average prototype response. A method of receiver distance estimation is described, supporting the implementation of the prototype RIR. The proposed prototyping method is compared to other methods by measuring their post-equalisation spectral deviation at several positions in a simulated room.
Submitted 16 September, 2024;
originally announced September 2024.
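A weighted-average prototype can be sketched as follows. The weighting rule here (Gaussian weights that decrease with each receiver's estimated distance from the listening-area centre, with width `sigma`) is a hypothetical choice for illustration; the abstract does not specify how the paper maps estimated receiver positions to weights.

```python
import numpy as np

def prototype_rir(rirs, receiver_positions, centre, sigma=0.5):
    """Weighted average of measured RIRs using a hypothetical distance-based weighting.

    rirs: (n_receivers, n_taps) impulse responses.
    receiver_positions: (n_receivers, 3) estimated positions in metres.
    centre: (3,) listening position.
    Receivers closer to the centre get larger Gaussian weights
    exp(-d^2 / (2 sigma^2)); the weights are normalised to sum to 1.
    """
    rirs = np.asarray(rirs, dtype=float)
    offsets = np.asarray(receiver_positions, float) - np.asarray(centre, float)
    d = np.linalg.norm(offsets, axis=1)
    w = np.exp(-d**2 / (2.0 * sigma**2))
    w /= w.sum()
    return w @ rirs, w
```

Averaging time-domain RIRs measured at different positions can cause cancellation at high frequencies where the responses are not phase-aligned, so averaging magnitude responses instead is a common alternative.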
-
Pick the Largest Margin for Robust Detection of Splicing
Authors:
Julien Simon de Kergunic,
Rony Abecidan,
Patrick Bas,
Vincent Itier
Abstract:
Despite advancements in splicing detection, practitioners still struggle to fully leverage forensic tools from the literature due to a critical issue: deep learning-based detectors are extremely sensitive to their training instances. Simple post-processing applied to evaluation images can easily degrade their performance, leading to a lack of confidence in splicing detectors in operational contexts. In this study, we show that a deep splicing detector behaves differently against unknown post-processing operations for different learned weights, even when it achieves similar performance on a test set drawn from the same distribution as its training set. We connect this observation to the fact that different training runs create different latent spaces that separate the training samples differently. Our experiments reveal a strong correlation between the distribution of latent margins and the ability of the detector to generalize to post-processed images. We thus provide practitioners with a way to build deep detectors that are more robust than others against post-processing operations: train the architecture under different conditions and pick the model that maximizes the latent space margin.
Submitted 6 September, 2024; v1 submitted 5 September, 2024;
originally announced September 2024.
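The selection rule can be sketched for the simple case where each trained detector ends in a linear layer applied to its latent features. The margin definition (signed distance to that layer's decision boundary) and the use of the median as the summary statistic are assumptions made for illustration; the abstract only states that the model maximizing the latent space margin is picked.

```python
import numpy as np

def latent_margins(z, y, w, b):
    """Signed distances of latent samples to a linear decision boundary.

    z: (n, d) latent features; y: (n,) labels in {-1, +1};
    w: (d,) weights and b: scalar bias of the final linear layer.
    Positive values mean correctly classified; larger means further away.
    """
    return y * (z @ w + b) / np.linalg.norm(w)

def pick_largest_margin(heads, latents, y):
    """Return the index of the model whose median latent margin is largest.

    heads: list of (w, b) pairs, one per trained model.
    latents: matching list of latent feature arrays (each model has its own
    latent space), all computed on the same labelled samples y.
    """
    medians = [float(np.median(latent_margins(z, y, w, b)))
               for (w, b), z in zip(heads, latents)]
    return int(np.argmax(medians)), medians
```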
-
Practical Challenges for Reliable RIS Deployment in Heterogeneous Multi-Operator Multi-Band Networks
Authors:
Mehdi Monemi,
Mehdi Rasti,
Arthur S. de Sena,
Mohammad Amir Fallah,
Matti Latva-Aho,
Marco Di Renzo
Abstract:
Reconfigurable intelligent surfaces (RISs) have been introduced as arrays of nearly passive elements with software-tunable electromagnetic properties to dynamically manipulate the reflection/transmission of radio signals. Research in this area has focused on two applications, namely user-assist RISs, which are tuned to enhance the quality of service (QoS) of target users, and malicious RISs, with which an attacker degrades the QoS at victim receivers by generating intended destructive interference. While both user-assist and malicious RIS applications have been explored extensively, the unintended interference that RIS deployments impose on various wireless user equipments (UEs) remains underexplored. This paper investigates the challenges of integrating RISs into multi-carrier, multi-user, and multi-operator networks. We discuss how RIS deployments intended to benefit specific users can negatively impact other users served at various carrier frequencies by different network operators. While not an ideal solution, we discuss how ultra-narrowband metasurfaces can be incorporated into the manufacturing of RISs to mitigate some challenges of RIS deployment in wireless networks. We also present a simulation scenario to illuminate some practical challenges associated with the deployment of RISs in shared public environments.
Submitted 29 June, 2025; v1 submitted 28 August, 2024;
originally announced August 2024.
-
Combined assessment of auditory distance perception and externalization
Authors:
Henning Hoppe,
Steven van de Par,
Virginia Flanagin,
Stephan D. Ewert
Abstract:
This study investigates frontal auditory distance perception (ADP) and externalization in virtual audio-visual environments, considering the effects of headphone rendering method, room size, reverberation, and visual representation of the room. Head-related impulse responses from either an artificial head or a spherical head model were used for diotic (monophonic) and binaural auralizations, with and without real-time head tracking. The visuals were presented through a head-mounted display. Two differently sized rooms as well as an infinitely extending space (echoic and anechoic) were used, each containing an invisible frontal virtual sound source. Additionally, the effect of a freely movable loudspeaker for visually indicating perceived distances was investigated. Both ADP and externalization were significantly affected by room size, but otherwise the two perceptual quantities differed in their outcomes. Room visibility significantly affected ADP, leading to considerable overestimation and more variability in the absence of a visual environment, whereas externalization was not affected. The movable loudspeaker significantly improved distance estimation but did not affect externalization. For reverberation, a (non-significant) trend of improved ADP was observed, whereas externalization was significantly improved. Different headphone renderings did not significantly affect ADP or externalization, although a clear trend was observed for externalization.
Submitted 26 August, 2024;
originally announced August 2024.
-
The effect of self-motion and room familiarity on sound source localization in virtual environments
Authors:
Niklas Isserstedt,
Stephan D. Ewert,
Virginia Flanagin,
Steven van de Par
Abstract:
This paper investigates the influence of lateral horizontal self-motion of participants during signal presentation on distance and azimuth perception for frontal sound sources in a rectangular room. Additionally, it analyzes the effect of deviating room acoustics for a single sound presentation embedded in a sequence of presentations that use baseline room acoustics for familiarization. For this purpose, two experiments were conducted using audiovisual virtual reality technology with dynamic head tracking and real-time auralization over headphones, combined with visual rendering of the room on a head-mounted display. Results show improved distance perception accuracy when participants moved laterally during signal presentation instead of staying at a fixed position with only head movements allowed. Adaptation to the room acoustics also improves distance perception accuracy. Azimuth perception appears to be independent of lateral movements during signal presentation and may even be negatively influenced by familiarity with the room acoustics used.
Submitted 25 August, 2024;
originally announced August 2024.
-
Malicious RIS Meets RSMA: Unveiling the Robustness of Rate Splitting to RIS-Induced Attacks
Authors:
A. S. de Sena,
A. Gomes,
J. Kibiłda,
N. H. Mahmood,
L. A. DaSilva,
M. Latva-aho
Abstract:
While the robustness of rate-splitting multiple access (RSMA) to imperfect channel state information (CSI) is well-documented, its susceptibility to attacks launched with malicious reconfigurable intelligent surfaces (RISs) remains unexplored. This paper fills this gap by investigating three potential RIS-induced attacks against RSMA in a multi-user multiple-input multiple-output (MIMO) network: random interference, aligned interference, and mitigation attack. The random interference attack employs random RIS coefficients to disrupt RSMA. The other two attacks are triggered by optimizing the RIS through weighted-sum strategies based on the projected gradient method. Simulation results reveal significant degradation caused by all the attacks under perfect CSI conditions. Remarkably, when imperfect CSI is considered, RSMA, owing to its flexible power allocation strategy designed to counter CSI-related interference, can be robust to the attacks even when the base station is blind to them. It is also shown that RSMA can significantly outperform conventional space-division multiple access (SDMA).
Submitted 23 August, 2024;
originally announced August 2024.
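The projected gradient idea behind the optimized attacks can be illustrated on a deliberately simplified toy problem: one single-antenna victim, an interfering signal arriving over a direct path `a` and over RIS-reflected paths `b`, and an attacker that maximizes the received interference power |a + b^T theta|^2 subject to unit-modulus reflection coefficients. This is an assumed single-objective stand-in, not the weighted-sum multi-user MIMO formulation of the paper, and it assumes the attacker knows `a` and `b`.

```python
import numpy as np

def received_power(theta, a, b):
    """|a + b^T theta|^2: direct path a plus RIS-reflected paths b_i * theta_i."""
    return np.abs(a + b @ theta) ** 2

def project_unit_modulus(v):
    """Elementwise projection onto the unit circle, |theta_i| = 1."""
    mag = np.abs(v)
    # Map any exactly-zero entry to phase 0 to avoid division by zero.
    return np.where(mag > 0, v / np.where(mag > 0, mag, 1.0), 1.0 + 0j)

def projected_gradient_ascent(a, b, step=1.0, iters=300, seed=0):
    """Maximize |a + b^T theta|^2 over unit-modulus theta (toy attacker model)."""
    rng = np.random.default_rng(seed)
    theta = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, b.size))
    history = [received_power(theta, a, b)]
    for _ in range(iters):
        s = a + b @ theta          # current received interference signal
        grad = np.conj(b) * s      # ascent direction w.r.t. conj(theta)
        theta = project_unit_modulus(theta + step * grad)
        history.append(received_power(theta, a, b))
    return theta, np.array(history)
```

Because the objective is convex in theta and the unit-modulus projection maximizes the inner product with the pre-projection point, each iteration does not decrease the objective for any positive step size. In this toy problem the optimum is (|a| + sum_i |b_i|)^2, reached when every reflected path is phase-aligned with the direct path.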
-
Efficient Patient Fine-Tuned Seizure Detection with a Tensor Kernel Machine
Authors:
Seline J. S. de Rooij,
Frederiek Wesel,
Borbála Hunyadi
Abstract:
Recent developments in wearable devices have made accurate and efficient seizure detection more important than ever. A challenge in seizure detection is that patient-specific models typically outperform patient-independent models. However, a wearable device typically starts with a patient-independent model until patient-specific data become available. To avoid having to construct a new classifier with these data, as required by conventional kernel machines, we propose a transfer learning approach with a tensor kernel machine. This method learns the primal weights in a compressed form using the canonical polyadic decomposition, making it possible to efficiently update the weights of the patient-independent model with patient-specific data. The results show that this patient fine-tuned model reaches performance as high as a patient-specific SVM model, with a model size half that of the patient-specific model and one tenth that of the patient-independent model.
Submitted 1 August, 2024;
originally announced August 2024.
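The compressed representation can be sketched as follows: the weight tensor of a tensor kernel machine is never formed explicitly but stored as D CP factor matrices, so evaluating the model costs O(D*M*R) instead of O(M^D). The polynomial feature map, the sizes, and the function names are assumptions for illustration; training (for example by alternating least squares) and the patient-specific update are not shown.

```python
import numpy as np

def feature_map(x, m):
    """Per-feature map phi(x_d) = [1, x_d, ..., x_d^(m-1)] (assumed polynomial features).

    x: (D,) input vector -> returns (D, m), one row per input feature.
    """
    return np.vander(np.asarray(x, dtype=float), m, increasing=True)

def cp_predict(x, factors):
    """Evaluate f(x) = <W, phi(x_1) o ... o phi(x_D)> with W in rank-R CP form.

    factors: list of D arrays of shape (m, R), so that
    W = sum_r factors[0][:, r] o ... o factors[D-1][:, r].
    """
    m, r = factors[0].shape
    phi = feature_map(x, m)
    prod = np.ones(r)
    for d, w_d in enumerate(factors):
        prod = prod * (phi[d] @ w_d)   # (m,) @ (m, R) -> (R,)
    return float(prod.sum())

def cp_num_params(d, m, r):
    """Parameters stored in CP form versus entries of the full weight tensor."""
    return d * m * r, m ** d
```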
-
Hierarchical Homogeneity-Based Superpixel Segmentation: Application to Hyperspectral Image Analysis
Authors:
Luciano Carvalho Ayres,
Sérgio José Melo de Almeida,
José Carlos Moreira Bermudez,
Ricardo Augusto Borsoi
Abstract:
Hyperspectral image (HI) analysis approaches have recently become increasingly complex and sophisticated. In particular, combining spectral-spatial information with superpixel techniques has addressed some hyperspectral data issues, such as the high spatial variability of spectral signatures and the dimensionality of the data. However, most existing superpixel approaches do not account for specific HI characteristics resulting from its high spectral dimension. In this work, we propose a computationally efficient multiscale superpixel method for processing hyperspectral data. The technique is based on the Simple Linear Iterative Clustering (SLIC) oversegmentation algorithm, which we extend hierarchically. Using a novel robust homogeneity test, the proposed hierarchical approach yields superpixels of variable sizes but with higher spectral homogeneity than classical SLIC segmentation. For validation, the proposed homogeneity-based hierarchical method was applied as a preprocessing step in spectral unmixing and classification tasks, carried out with the Multiscale sparse Unmixing Algorithm (MUA) and the CNN-Enhanced Graph Convolutional Network (CEGCN) methods, respectively. Simulation results with both synthetic and real data show that the technique is competitive with state-of-the-art solutions.
Submitted 21 July, 2024;
originally announced July 2024.
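For reference, the classical SLIC assignment step compares each pixel with nearby cluster centres using a distance that mixes feature and spatial terms, D = sqrt(d_c^2 + (d_s/S)^2 m^2), where S is the grid interval and m the compactness weight. A minimal sketch for hyperspectral pixels follows, assuming a plain Euclidean spectral distance for d_c; the hierarchical extension and the robust homogeneity test proposed in the paper are not shown.

```python
import numpy as np

def slic_distance(pixel_spec, pixel_xy, centre_spec, centre_xy, S, m):
    """SLIC-style distance D = sqrt(d_c^2 + (d_s / S)^2 * m^2).

    d_c: Euclidean distance between full spectral vectors (an assumed choice).
    d_s: spatial distance in pixels.
    S: grid interval (expected superpixel size); m: compactness weight.
    """
    d_c = np.linalg.norm(np.asarray(pixel_spec, float) - np.asarray(centre_spec, float))
    d_s = np.linalg.norm(np.asarray(pixel_xy, float) - np.asarray(centre_xy, float))
    return float(np.sqrt(d_c**2 + (d_s / S) ** 2 * m**2))
```

A larger m favours compact, grid-like superpixels, while a smaller m lets superpixel boundaries follow spectral changes more closely.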
-
MmWave for Extended Reality: Open User Mobility Dataset, Characterisation, and Impact on Link Quality
Authors:
Alexander Marinsek,
Sam De Kunst,
Gilles Callebaut,
Lieven De Strycker,
Liesbet Van der Perre
Abstract:
User mobility in extended reality (XR) can have a major impact on millimeter-wave (mmWave) links and may require dedicated mitigation strategies to ensure reliable connections and avoid outage. The available prior art has predominantly focused on XR applications with constrained user mobility and limited impact on mmWave channels. We have performed dedicated experiments to extend the characterisation of relevant future XR use cases featuring a high degree of user mobility. To this end, we have carried out a tailor-made measurement campaign and conducted a characterisation of the collected tracking data, including the approximation of the data using statistical distributions. Moreover, we have provided an interpretation of the possible impact of the recorded mobility on mmWave technology. The dataset is made publicly accessible to provide a testing ground for wireless system design and to enable further XR mobility modelling.
Submitted 14 April, 2025; v1 submitted 2 July, 2024;
originally announced July 2024.
-
Open-Source Conversational AI with SpeechBrain 1.0
Authors:
Mirco Ravanelli,
Titouan Parcollet,
Adel Moumen,
Sylvain de Langen,
Cem Subakan,
Peter Plantinga,
Yingzhi Wang,
Pooneh Mousavi,
Luca Della Libera,
Artem Ploujnikov,
Francesco Paissan,
Davide Borra,
Salah Zaiem,
Zeyu Zhao,
Shucong Zhang,
Georgios Karakasidis,
Sung-Lin Yeh,
Pierre Champion,
Aku Rouhe,
Rudolf Braun,
Florian Mai,
Juan Zuluaga-Gomez,
Seyed Mahed Mousavi,
Andreas Nautsch,
Ha Nguyen
, et al. (8 additional authors not shown)
Abstract:
SpeechBrain is an open-source Conversational AI toolkit based on PyTorch, focused particularly on speech processing tasks such as speech recognition, speech enhancement, speaker recognition, text-to-speech, and much more. It promotes transparency and replicability by releasing both the pre-trained models and the complete "recipes" of code and algorithms required for training them. This paper presents SpeechBrain 1.0, a significant milestone in the evolution of the toolkit, which now has over 200 recipes for speech, audio, and language processing tasks, and more than 100 models available on Hugging Face. SpeechBrain 1.0 introduces new technologies to support diverse learning modalities, Large Language Model (LLM) integration, and advanced decoding strategies, along with novel models, tasks, and modalities. It also includes a new benchmark repository, offering researchers a unified platform for evaluating models across diverse tasks.
Submitted 16 October, 2024; v1 submitted 29 June, 2024;
originally announced July 2024.
-
Machine Learning-Based Channel Prediction for RIS-assisted MIMO Systems With Channel Aging
Authors:
Nipuni Ginige,
Arthur Sousa de Sena,
Nurul Huda Mahmood,
Nandana Rajatheva,
Matti Latva-aho
Abstract:
Reconfigurable intelligent surfaces (RISs) have emerged as a promising technology to enhance the performance of sixth-generation (6G) and beyond communication systems. The passive nature of RISs and their large number of reflecting elements pose challenges to the channel estimation process. The associated complexity further escalates when the channel coefficients are fast-varying, as in scenarios with user mobility. In this paper, we propose an extended channel estimation framework for RIS-assisted multiple-input multiple-output (MIMO) systems based on a convolutional neural network (CNN) integrated with an autoregressive (AR) predictor. The implemented framework is designed to identify the aging pattern and predict enhanced estimates of the wireless channels in correlated fast-fading environments. Simulation results demonstrate that our proposed CNN-AR approach is robust to channel aging and exhibits high estimation accuracy. The results also show that our approach can achieve high spectral efficiency and low pilot overhead compared to traditional methods.
Submitted 9 May, 2024;
originally announced June 2024.
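The autoregressive stage can be sketched as a least-squares AR(p) fit on the sequence of past estimates of a single complex channel coefficient, followed by a one-step-ahead prediction. The CNN that the paper combines with the AR predictor is omitted, and the model order p and the function names are assumptions for illustration.

```python
import numpy as np

def fit_ar(h, p):
    """Least-squares AR(p) coefficients c for a complex sequence h,
    so that h[n] ~= c[0]*h[n-1] + ... + c[p-1]*h[n-p]."""
    h = np.asarray(h, dtype=complex)
    # Column k-1 holds h[n-k] for n = p..len(h)-1, aligned with targets h[p:].
    X = np.column_stack([h[p - k : len(h) - k] for k in range(1, p + 1)])
    y = h[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_next(h, coeffs):
    """One-step-ahead prediction from the most recent p samples."""
    p = len(coeffs)
    recent = np.asarray(h, dtype=complex)[-1 : -p - 1 : -1]  # [h[-1], ..., h[-p]]
    return coeffs @ recent
```

In a channel-aging setting, one such predictor (or a vector AR model) would be applied per channel coefficient to the time series of estimates; the least-squares fit is a simple alternative to solving the Yule-Walker equations.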
-
Quantifying the effect of speech pathology on automatic and human speaker verification
Authors:
Bence Mark Halpern,
Thomas Tienkamp,
Wen-Chin Huang,
Lester Phillip Violeta,
Teja Rebernik,
Sebastiaan de Visscher,
Max Witjes,
Martijn Wieling,
Defne Abur,
Tomoki Toda
Abstract:
This study investigates how surgical intervention for speech pathology (specifically, as a result of oral cancer surgery) impacts the performance of an automatic speaker verification (ASV) system. Using two recently collected Dutch datasets, NKI-OC-VC and SPOKE, with parallel pre- and post-surgery audio from the same speakers, we assess the extent to which speech pathology influences ASV performance, and whether objective/subjective measures of speech severity are correlated with performance. Finally, we carry out a perceptual study to compare the judgements of ASV and human listeners. Our findings reveal that pathological speech negatively affects ASV performance, and that speech severity is negatively correlated with performance. There is moderate agreement between perceptual and objective scores of speaker similarity and severity; however, the perceptual study could not clearly establish whether the same phenomenon also exists in human perception.
Submitted 10 June, 2024;
originally announced June 2024.
-
ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset
Authors:
Johannes Rückert,
Louise Bloch,
Raphael Brüngel,
Ahmad Idrissi-Yaghir,
Henning Schäfer,
Cynthia S. Schmidt,
Sven Koitka,
Obioma Pelka,
Asma Ben Abacha,
Alba G. Seco de Herrera,
Henning Müller,
Peter A. Horn,
Felix Nensa,
Christoph M. Friedrich
Abstract:
Automated medical image analysis systems often require large amounts of training data with high-quality labels, which are difficult and time-consuming to generate. This paper introduces Radiology Object in COntext version 2 (ROCOv2), a multimodal dataset consisting of radiological images and associated medical concepts and captions extracted from the PMC Open Access subset. It is an updated version of the ROCO dataset published in 2018 and adds 35,705 images that have been added to PMC since 2018. It further provides manually curated concepts for imaging modalities, with additional anatomical and directional concepts for X-rays. The dataset consists of 79,789 images and has been used, with minor modifications, in the concept detection and caption prediction tasks of ImageCLEFmedical Caption 2023. The dataset is suitable for training image annotation models based on image-caption pairs, or for multi-label image classification using the Unified Medical Language System (UMLS) concepts provided with each image. In addition, it can serve for pre-training medical domain models and for evaluating deep learning models for multi-task learning.
Submitted 18 June, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Deformable MRI Sequence Registration for AI-based Prostate Cancer Diagnosis
Authors:
Alessa Hering,
Sarah de Boer,
Anindo Saha,
Jasper J. Twilt,
Mattias P. Heinrich,
Derya Yakar,
Maarten de Rooij,
Henkjan Huisman,
Joeran S. Bosma
Abstract:
The PI-CAI (Prostate Imaging: Cancer AI) challenge led to expert-level diagnostic algorithms for clinically significant prostate cancer detection. The algorithms receive biparametric MRI scans as input, which consist of T2-weighted and diffusion-weighted scans. These scans can be misaligned due to multiple factors in the scanning process. Image registration can alleviate this issue by predicting the deformation between the sequences. We investigate the effect of image registration on the diagnostic performance of AI-based prostate cancer diagnosis. First, the image registration algorithm, developed in MeVisLab, is analyzed using a dataset with paired lesion annotations. Second, the effect on diagnosis is evaluated by comparing case-level cancer diagnosis performance between using the original dataset, rigidly aligned diffusion-weighted scans, or deformably aligned diffusion-weighted scans. Rigid registration showed no improvement. Deformable registration demonstrated a substantial improvement in lesion overlap (+10% median Dice score) and a positive yet non-significant improvement in diagnostic performance (+0.3% AUROC, p=0.18). Our investigation shows that a substantial improvement in lesion alignment does not directly lead to a significant improvement in diagnostic performance. Qualitative analysis indicated that jointly developing image registration methods and diagnostic AI algorithms could enhance diagnostic accuracy and patient outcomes.
Submitted 28 June, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
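Lesion overlap is quantified with the Dice score, 2|A and B| / (|A| + |B|) for binary masks A and B. A minimal sketch follows; returning 1.0 when both masks are empty is a convention assumed here, and a reported median would be taken over the per-case or per-lesion scores.

```python
import numpy as np

def dice_score(a, b):
    """Dice = 2|A & B| / (|A| + |B|) for binary masks of any shape.

    Returns 1.0 when both masks are empty (an assumed convention).
    """
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / denom)
```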
-
A Generalized Multiscale Bundle-Based Hyperspectral Sparse Unmixing Algorithm
Authors:
Luciano Carvalho Ayres,
Ricardo Augusto Borsoi,
José Carlos Moreira Bermudez,
Sérgio José Melo de Almeida
Abstract:
In hyperspectral sparse unmixing, a successful approach employs spectral bundles to address the variability of the endmembers in the spatial domain. However, the regularization penalties usually employed add substantial computational complexity, and the solutions are very sensitive to noise. We generalize a multiscale spatial regularization approach to the unmixing problem by incorporating group sparsity-inducing mixed norms. We then propose a noise-robust method that takes advantage of the bundle structure to deal with endmember variability while ensuring inter- and intra-class sparsity in abundance estimation at reasonable computational cost. We also present a general heuristic to select the most representative abundance estimate over multiple runs of the unmixing process, yielding a solution that is robust and highly reproducible. Experiments illustrate the robustness and consistency of the results compared to related methods.
Submitted 23 January, 2024;
originally announced January 2024.
-
Malicious RIS versus Massive MIMO: Securing Multiple Access against RIS-based Jamming Attacks
Authors:
Arthur Sousa de Sena,
Jacek Kibilda,
Nurul Huda Mahmood,
André Gomes,
Matti Latva-aho
Abstract:
In this letter, we study an attack that leverages a reconfigurable intelligent surface (RIS) to induce harmful interference toward multiple users in massive multiple-input multiple-output (mMIMO) systems during the data transmission phase. We propose an efficient and flexible weighted-sum projected gradient-based algorithm for the attacker to optimize the RIS reflection coefficients without knowing legitimate user channels. To counter such a threat, we propose two reception strategies. Simulation results demonstrate that our malicious algorithm outperforms baseline strategies while offering adaptability for targeting specific users. At the same time, our results show that our mitigation strategies are effective even if only an imperfect estimate of the cascaded RIS channel is available.
Submitted 13 January, 2024;
originally announced January 2024.
-
Beyond Diagonal RIS for Multi-Band Multi-Cell MIMO Networks: A Practical Frequency-Dependent Model and Performance Analysis
Authors:
Arthur S. de Sena,
Mehdi Rasti,
Nurul H. Mahmood,
Matti Latva-aho
Abstract:
This paper delves into the unexplored frequency-dependent characteristics of beyond diagonal reconfigurable intelligent surfaces (BD-RISs). A generalized practical frequency-dependent reflection model is proposed as a fundamental framework for configuring fully-connected and group-connected RISs in a multi-band multi-base station (BS) multiple-input multiple-output (MIMO) network. Leveraging this practical model, multi-objective optimization strategies are formulated to maximize the received power at multiple users connected to different BSs, each operating under a distinct carrier frequency. By relying on matrix theory and exploiting the symmetric structure of the reflection matrices inherent to BD-RISs, relaxed tractable versions of the challenging problems are obtained for scenarios with obstructed and unobstructed direct channel links. The relaxed solutions are then combined with codebook-based approaches to configure the practical capacitance values for the BD-RISs. Simulation results reveal the frequency-dependent behaviors of different RIS architectures and demonstrate the effectiveness of the proposed schemes. Notably, BD-RISs exhibit high reflection performance across the intended frequency range, remarkably outperforming conventional single-connected RISs. Moreover, the proposed optimization approaches prove effective in enabling the targeted operation of BD-RISs across one or more carrier frequencies. The results also shed light on the potential for harmful interference in the absence of synchronization between RISs and adjacent BSs.
Submitted 24 June, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
On the Ground and in the Sky: A Tutorial on Radio Localization in Ground-Air-Space Networks
Authors:
Hazem Sallouha,
Sharief Saleh,
Sibren De Bast,
Zhuangzhuang Cui,
Sofie Pollin,
Henk Wymeersch
Abstract:
The inherent limitations in scaling up ground infrastructure for future wireless networks, combined with decreasing operational costs of aerial and space networks, are driving considerable research interest in multisegment ground-air-space (GAS) networks. In GAS networks, where ground and aerial users share network resources, ubiquitous and accurate user localization becomes indispensable, not only as an end-user service but also as an enabler for location-aware communications. This breaks the convention of having localization as a byproduct in networks primarily designed for communications. To address these imperative localization needs, the design and utilization of ground, aerial, and space anchors require thorough investigation. In this tutorial, we provide an in-depth systemic analysis of the radio localization problem in GAS networks, considering ground and aerial users as targets to be localized. Starting from a survey of the most relevant works, we then define the key characteristics of anchors and targets in GAS networks. Subsequently, we detail localization fundamentals in GAS networks, considering 3D positions, orientations, and velocities. Afterward, we thoroughly analyze radio localization systems in GAS networks, detailing the system model, design aspects, and considerations for each of the three GAS anchors. Preliminary results are presented to provide a quantifiable perspective on key design aspects in GAS-based localization scenarios. We then identify the vital roles 6G enablers are expected to play in radio localization in GAS networks.
Submitted 9 August, 2024; v1 submitted 9 December, 2023;
originally announced December 2023.
-
Reverberant sound field equalisation for an enhanced stereo playback experience
Authors:
James Brooks-Park,
Steven van de Par
Abstract:
The topic of room equalisation has been at the forefront of research and product development for many years, with the aim of increasing the playback quality of loudspeakers in reverberant rooms. Traditional room equalisation systems comprise a number of filters that, when applied to the primary loudspeakers, compensate for additional room colouration. This publication introduces a novel equalisation technique in which gammatone filter band energy is added to the reverberant sound field via two surround loudspeakers, leaving the direct sound from the primary loudspeakers unaltered while equalising the sum of direct and reverberant energy at the listening position. Unlike traditional systems, this method allows the target function of the direct sound to differ from that of the reverberant sound field. The proposed method is motivated by the different roles that direct and reverberant sound components play in human perception of sound. Along with introducing the proposed method, results from a subjective listening test are presented, demonstrating a preference for the proposed technique over a traditional room equalisation technique and stereo playback.
Submitted 1 November, 2023;
originally announced November 2023.
-
On the Sum Secrecy Rate of Multi-User Holographic MIMO Networks
Authors:
Arthur S. de Sena,
Jiguang He,
Ahmed Al Hammadi,
Chongwen Huang,
Faouzi Bader,
Merouane Debbah,
Mathias Fink
Abstract:
The emerging concept of extremely large holographic multiple-input multiple-output (HMIMO), which benefits from compactly and densely packed cost-efficient radiating meta-atoms, has been shown to provide enhanced degrees of freedom even in pure line-of-sight conditions, enabling tremendous multiplexing gain for next-generation communication systems. Most reported works focus on energy and spectrum efficiency, path loss analyses, and channel modeling; the extension to secure communications remains unexplored. In this paper, we theoretically characterize the secrecy capacity of an HMIMO network with multiple legitimate users and one eavesdropper (Eve), taking into consideration artificial noise and max-min fairness. We formulate the power allocation (PA) problem and address it using successive convex approximation and Taylor expansion. We further study the effect of fixed PA coefficients, imperfect channel state information, inter-element spacing, and the number of Eve's antennas on the sum secrecy rate. Simulation results show that exploiting adaptive/flexible PA yields a significant performance gain over fixed PA coefficients, with an increase of more than 100% in the high signal-to-noise ratio (SNR) regime for the two-user case.
Submitted 22 October, 2023;
originally announced October 2023.
-
Latent Disentanglement in Mesh Variational Autoencoders Improves the Diagnosis of Craniofacial Syndromes and Aids Surgical Planning
Authors:
Simone Foti,
Alexander J. Rickart,
Bongjin Koo,
Eimear O' Sullivan,
Lara S. van de Lande,
Athanasios Papaioannou,
Roman Khonsari,
Danail Stoyanov,
N. u. Owase Jeelani,
Silvia Schievano,
David J. Dunaway,
Matthew J. Clarkson
Abstract:
The use of deep learning to undertake shape analysis of the complexities of the human head holds great promise. However, there have traditionally been a number of barriers to accurate modelling, especially when operating on both a global and local level. In this work, we discuss the application of the Swap Disentangled Variational Autoencoder (SD-VAE) to Crouzon, Apert and Muenke syndromes. Although syndrome classification is performed on the entire mesh, it is also possible, for the first time, to analyse the influence of each region of the head on the syndromic phenotype. By manipulating specific parameters of the generative model and producing procedure-specific new shapes, it is also possible to simulate the outcome of a range of craniofacial surgical procedures. This opens new avenues to advance diagnosis, aid surgical planning and allow for the objective evaluation of surgical outcomes.
Submitted 5 September, 2023;
originally announced September 2023.
-
Data-driven Topology and Parameter Identification in Distribution Systems with limited Measurements
Authors:
Steven de Jongh,
Felicitas Mueller,
Fabian Osterberg,
Claudio A. Cañizares,
Thomas Leibfried,
Kankar Bhattacharya
Abstract:
This manuscript presents novel techniques for identifying switch states and phases and for estimating equipment parameters in multi-phase low voltage electrical grids. This is a major challenge in long-standing German low voltage grids, which lack observability and are heavily impacted by modelling errors. The proposed methods are tailored for systems with a limited number of spatially distributed measuring devices, which measure voltage magnitudes at specific nodes and some line current magnitudes. The overall approach employs a problem decomposition strategy to divide the problem into smaller subproblems, which are addressed independently. The techniques for identifying switch states and system phases are based on heuristics and a binary optimization problem using correlation analysis of the measured time series. Equipment parameters are estimated through a data-driven regression approach and an optimization problem, and cable types are identified using a Mixed-Integer Quadratic Programming solver. To validate the presented methods, a realistic grid is used, and the techniques are evaluated for their resilience to data quality and time resolution, with a discussion of the limitations of the proposed methods.
Submitted 18 August, 2023;
originally announced August 2023.
-
A Singular-value-based Marker for the Detection of Atrial Fibrillation Using High-resolution Electrograms and Multi-lead ECG
Authors:
Hanie Moghaddasi,
Richard C. Hendriks,
Borbala Hunyadi,
Paul Knops,
Mathijs S van Schie,
Natasja M. S. de Groot,
Alle-Jan van der Veen
Abstract:
The severity of atrial fibrillation (AF) can be assessed from intra-operative epicardial measurements (high-resolution electrograms), using metrics such as conduction block (CB) and continuous conduction delay and block (cCDCB). These features capture differences in conduction velocity and wavefront propagation. However, they do not clearly differentiate patients with various degrees of AF while they are in sinus rhythm, and complementary features are needed. In this work, we focus on the morphology of the action potentials, and derive features to detect variations in the atrial potential waveforms. Methods: We show that the spatial variation of atrial potential morphology during a single beat may be described by changes in the singular values of the epicardial measurement matrix. The method is non-parametric and requires little preprocessing. A corresponding singular value map points at areas subject to fractionation and block. Further, we developed an experiment where we simultaneously measure electrograms (EGMs) and a multi-lead ECG. Results: The captured data showed that the normalized singular values of the heartbeats during AF are higher than during SR, and that this difference is more pronounced for the (non-invasive) ECG data than for the EGM data, if the electrodes are positioned at favorable locations. Conclusion: Overall, the singular value-based features are a useful indicator to detect and evaluate AF. Significance: The proposed method might be beneficial for identifying electropathological regions in the tissue without estimating the local activation time.
Submitted 6 July, 2023;
originally announced July 2023.
-
Evaluation of Virtual Acoustic Environments with Different Acoustic Level of Detail
Authors:
Stefan Fichna,
Steven van de Par,
Stephan D. Ewert
Abstract:
Virtual acoustic environments enable the creation and simulation of realistic and ecologically valid daily-life situations with applications in hearing research and audiology. Here, reverberant indoor environments play an important role. For real-time applications, simplifications in the room acoustics simulation are required; however, it remains unclear what acoustic level of detail (ALOD) is necessary to capture all perceptually relevant effects. This study investigates the effect of varying ALOD in the simulation of three different real environments: a living room with a coupled kitchen, a pub, and an underground station. ALOD was varied by generating different numbers of image sources for early reflections, or by excluding geometrical room details specific to each environment. The simulations were perceptually evaluated using headphones in comparison to binaural room impulse responses measured with a dummy head in the corresponding real environments, and partly using loudspeakers. The study assessed the perceived overall difference for a pulse and a speech token. Furthermore, plausibility and externalization were evaluated. The results show that a strong reduction in ALOD is possible while obtaining similar plausibility and externalization as with the dummy head recordings. The number and accuracy of early reflections appear less relevant, provided diffuse late reverberation is appropriately accounted for.
Submitted 10 August, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
On the relevance of acoustic measurements for creating realistic virtual acoustic environments
Authors:
Siegfried Gündert,
Stephan D. Ewert,
Steven van de Par
Abstract:
Geometrical approaches for room acoustics simulation have the advantage of requiring limited computational resources while still achieving a high perceptual plausibility. A common approach is using the image source model for direct and early reflections in connection with further simplified models such as a feedback delay network for the diffuse reverberant tail. When recreating real spaces as virtual acoustic environments using room acoustics simulation, the perceptual relevance of individual parameters in the simulation is unclear. Here we investigate the importance of underlying acoustical measurements and technical evaluation methods to obtain high-quality room acoustics simulations in agreement with dummy-head recordings of a real space. We focus on the role of source directivity. The effect of including measured, modelled, and omnidirectional source directivity in room acoustics simulations was assessed in comparison to the measured reference. Technical evaluation strategies to verify and improve the accuracy of various elements in the simulation processing chain, from the source, through the room properties, to the receiver, are presented. Perceptual results from an ABX listening experiment with random speech tokens are shown and compared with technical measures for a ranking of simulation approaches.
Submitted 29 June, 2023;
originally announced June 2023.
-
Computationally-efficient and perceptually-motivated rendering of diffuse reflections in room acoustics simulation
Authors:
Stephan D. Ewert,
Nico Gößling,
Oliver Buttler,
Steven van de Par,
Hongmei Hu
Abstract:
Geometrical acoustics is well suited for simulating room reverberation in interactive real-time applications. While the image source model (ISM) is exceptionally fast, the restriction to specular reflections impacts its perceptual plausibility. To account for diffuse late reverberation, hybrid approaches have been proposed, e.g., using a feedback delay network (FDN) in combination with the ISM. Here, a computationally-efficient, digital-filter approach is suggested to account for effects of non-specular reflections in the ISM and to couple scattered sound into a diffuse reverberation model using a spatially rendered FDN. Depending on the scattering coefficient of a room boundary, the energy of each image source is split into a specular part and a scattered part, and the scattered part is added to the diffuse sound field. Temporal effects as observed for an infinite ideal diffuse (Lambertian) reflector are simulated using cascaded all-pass filters. Effects of scattering and multiple (inter-) reflections caused by larger geometric disturbances at walls and by objects in the room are accounted for in a highly simplified manner. Using a single parameter to quantify deviations from an empty shoebox room, each reflection is temporally smeared using cascaded all-pass filters. The proposed method was perceptually evaluated against dummy head recordings of real rooms.
Submitted 29 June, 2023;
originally announced June 2023.
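Two mechanisms in the diffuse-reflection entry above lend themselves to a short sketch: splitting an image source's energy E by a boundary's scattering coefficient s (specular energy (1 - s)E, scattered energy sE, so amplitudes scale by the square roots), and smearing a reflection in time with cascaded all-pass filters, which spread energy over time without changing the magnitude response. The Schroeder-style all-pass section, the function names, and the delays and gains below are generic, assumed choices; the abstract does not give the authors' filter design.

```python
import math


def split_image_source(amplitude, scattering_coeff):
    """Split an image-source amplitude into specular and scattered parts (energy-preserving)."""
    if not 0.0 <= scattering_coeff <= 1.0:
        raise ValueError("scattering coefficient must lie in [0, 1]")
    specular = amplitude * math.sqrt(1.0 - scattering_coeff)
    scattered = amplitude * math.sqrt(scattering_coeff)  # would feed the diffuse model
    return specular, scattered


def allpass(signal, delay, gain):
    """Schroeder all-pass section: y[n] = -g*x[n] + x[n-M] + g*y[n-M]."""
    if delay < 1 or not -1.0 < gain < 1.0:
        raise ValueError("need delay >= 1 and |gain| < 1 for a stable all-pass")
    out = []
    for n, x in enumerate(signal):
        x_delayed = signal[n - delay] if n >= delay else 0.0
        y_delayed = out[n - delay] if n >= delay else 0.0
        out.append(-gain * x + x_delayed + gain * y_delayed)
    return out


def cascaded_allpass(signal, stages):
    """Apply (delay, gain) all-pass stages in series; the cascade is still all-pass."""
    for delay, gain in stages:
        signal = allpass(signal, delay, gain)
    return signal


if __name__ == "__main__":
    specular, scattered = split_image_source(1.0, 0.3)
    impulse = [1.0] + [0.0] * 3000
    smeared = cascaded_allpass(impulse, [(3, 0.5), (7, 0.6), (11, 0.4)])  # hypothetical stages
    # Energy is preserved by the split and (up to truncation) by the all-pass cascade.
    print(specular**2 + scattered**2, sum(v * v for v in smeared))
```

Because each section's impulse response has unit energy, a reflection passed through the cascade keeps its level but is spread over many samples, which is the temporal smearing the entry describes.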
-
Parameter Estimation in Electrical Distribution Systems with limited Measurements using Regression Methods
Authors:
Steven de Jongh,
Felicitas Mueller,
Claudio Cañizares,
Thomas Leibfried,
Kankar Bhattacharya
Abstract:
This paper presents novel methods for parameter identification in electrical grids with small numbers of spatially distributed measuring devices, which is an issue for distribution system operators managing aged and not properly mapped underground Low Voltage (LV) grids, especially in Germany. For this purpose, the total impedance of individual branches of the overall system is estimated by measuring currents and voltages at a subset of all system nodes over time. It is shown that, under common assumptions for electrical distribution systems, an estimate of the total impedance can be made using readily computable proxies. Different regression methods are then used and compared to estimate the total impedance of the respective branches, with varying weights of the input data. The results on realistic LV feeders with different branch lengths and numbers of unmeasured segments are discussed, and multiple influencing factors are investigated through simulations. It is shown that estimates of the total impedances can be obtained with acceptable quality under realistic assumptions.
Submitted 18 August, 2023; v1 submitted 6 June, 2023;
originally announced June 2023.
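The weighted-regression idea in the parameter-estimation entry above can be illustrated with the simplest possible model: if a branch's voltage-drop magnitude dU grows roughly linearly with its current magnitude I, the total impedance magnitude is the slope in dU ≈ Z·I, and a weighted least-squares fit through the origin gives Z_hat = sum(w·dU·I) / sum(w·I²). This is an assumption-laden sketch, not the paper's proxies or regression variants, which the abstract does not spell out; the helper name and the weighting scheme are hypothetical.

```python
def weighted_impedance_estimate(voltage_drops, currents, weights=None):
    """Hypothetical helper: weighted least-squares slope Z for dU ~= Z * I (no intercept).

    voltage_drops and currents are magnitude time series (V and A) for one branch;
    weights lets some samples count more than others, e.g. high-load periods.
    """
    if len(voltage_drops) != len(currents):
        raise ValueError("voltage and current series must have equal length")
    if weights is None:
        weights = [1.0] * len(currents)
    elif len(weights) != len(currents):
        raise ValueError("one weight per sample is required")
    numerator = sum(w * u * i for w, u, i in zip(weights, voltage_drops, currents))
    denominator = sum(w * i * i for w, i in zip(weights, currents))
    if denominator == 0:
        raise ValueError("need at least one nonzero, positively weighted current sample")
    return numerator / denominator  # estimated impedance magnitude in ohms


if __name__ == "__main__":
    # Synthetic, noise-free samples of a 0.5-ohm branch.
    print(weighted_impedance_estimate([0.5, 1.0, 1.5], [1.0, 2.0, 3.0]))  # 0.5
```

Setting a sample's weight to zero removes it from the fit, which is one simple way the "varying weights of the input data" could be explored.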
-
Packet Reception Probability: Packets That You Can't Decode Can Help Keep You Safe
Authors:
Subham De,
Deepak Vasisht,
Hari Sundaram,
Robin Kravets
Abstract:
This paper provides a robust, scalable Bluetooth Low-Energy (BLE) based indoor localization solution using commodity hardware. While WiFi-based indoor localization has been widely studied, BLE has emerged as a key technology for contact-tracing in the current pandemic. To accurately estimate distance using BLE on commercial devices, systems today rely on the Received Signal Strength Indicator (RSSI), which suffers from sampling bias and multipath effects. We propose a new metric, Packet Reception Probability (PRP), that builds on the counter-intuitive idea that packet loss can be exploited to estimate distance. We localize using a Bayesian-PRP formulation that also incorporates an explicit model of the multipath. To make deployment easy, we do not require any hardware, firmware, or driver-level changes to off-the-shelf devices, and we require minimal training. PRP can achieve meter-level accuracy with just 6 devices with known locations and 12 training locations. We show that fusing PRP with RSSI is beneficial at short distances (< 2 m). Beyond 2 m, fusion is worse than PRP alone, as RSSI becomes effectively de-correlated with distance. Robust location accuracy at all distances and ease of deployment with PRP can help enable a wide range of indoor localization solutions using BLE.
Submitted 2 June, 2023;
originally announced June 2023.
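The PRP entry above turns packet loss into a distance cue. A minimal sketch of that idea, under loudly stated assumptions: PRP is taken as the fraction of advertised packets actually received; an invented logistic curve (hypothetical parameters `d50` and `slope`) stands in for the PRP-versus-distance relationship; and a binomial likelihood with a uniform prior over candidate distances gives a Bayesian posterior. The paper's calibrated model and its explicit multipath term are not reproduced, and all function names are hypothetical.

```python
import math


def empirical_prp(received, transmitted):
    """Packet Reception Probability: fraction of transmitted packets that were received."""
    if transmitted <= 0 or not 0 <= received <= transmitted:
        raise ValueError("need 0 <= received <= transmitted and transmitted > 0")
    return received / transmitted


def hypothetical_prp_model(distance_m, d50=5.0, slope=1.0):
    """Assumed logistic fall-off: PRP is 0.5 at d50 metres (illustrative only)."""
    return 1.0 / (1.0 + math.exp(slope * (distance_m - d50)))


def posterior_over_distance(received, transmitted, candidates_m):
    """Posterior over candidate distances: binomial likelihood times a uniform prior.

    The binomial coefficient and the uniform prior are constant across candidates,
    so both cancel in the normalization and are omitted.
    """
    log_likelihoods = []
    for d in candidates_m:
        # Clamp away from 0 and 1 so the logarithms stay finite.
        p = min(max(hypothetical_prp_model(d), 1e-12), 1.0 - 1e-12)
        log_likelihoods.append(received * math.log(p) + (transmitted - received) * math.log(1.0 - p))
    peak = max(log_likelihoods)  # subtract the max for numerical stability
    unnormalized = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


if __name__ == "__main__":
    candidates = [1.0, 3.0, 5.0, 7.0, 9.0]
    posterior = posterior_over_distance(50, 100, candidates)
    best = candidates[posterior.index(max(posterior))]
    print(empirical_prp(50, 100), best)  # 0.5 5.0
```

Receiving more of the transmitted packets shifts the posterior toward shorter candidate distances, which is the counter-intuitive use of loss the entry describes.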
-
Semantic-Functional Communications in Cyber-Physical Systems
Authors:
Pedro E. Goria Silva,
Pedro H. J. Nardelli,
Arthur S. de Sena,
Harun Siljak,
Niko Nevaranta,
Nicola Marchetti,
Rausley A. A. de Souza
Abstract:
This paper explores the use of semantic knowledge inherent in the cyber-physical system (CPS) under study in order to minimize the use of explicit communication, which refers to the use of physical radio resources to transmit potentially informative data. It is assumed that the acquired data have a function in the system, usually related to its state estimation, which may trigger control actions. We propose that a semantic-functional approach can leverage the semantic-enabled implicit communication while guaranteeing that the system maintains functionality under the required performance. We illustrate the potential of this proposal through simulations of a swarm of drones jointly performing remote sensing in a given area. Our numerical results demonstrate that the proposed method offers the best design option regarding the ability to accomplish a previously established task -- remote sensing in the addressed case -- while minimising the use of radio resources by controlling the trade-offs that jointly determine the CPS performance and its effectiveness in the use of resources. In this sense, we establish a fundamental relationship between energy, communication, and functionality considering a given end application.
Submitted 31 May, 2023;
originally announced May 2023.
-
Unleashing 3D Connectivity in Beyond 5G Networks with Reconfigurable Intelligent Surfaces
Authors:
Jiguang He,
Aymen Fakhreddine,
Arthur S. de Sena,
Yu Tian,
Merouane Debbah
Abstract:
Reconfigurable intelligent surfaces (RISs) bring various benefits to the current and upcoming wireless networks, including enhanced spectrum and energy efficiency, soft handover, transmission reliability, and even localization accuracy. These remarkable improvements result from the reconfigurability, programmability, and adaptation capabilities of RISs for fine-tuning radio propagation environments, which can be realized in a cost- and energy-efficient manner. In this paper, we focus on the upgrade of the existing fifth-generation (5G) cellular network with the introduction of an RIS with a full-dimensional uniform planar array structure for unleashing advanced three-dimensional connectivity. The deployed RIS is exploited for serving unmanned aerial vehicles (UAVs) flying in the sky with ultra-high data rates, a challenging task to achieve with conventional base stations (BSs) that are designed mainly to serve ground users. By taking into account the line-of-sight probability for the RIS-UAV and BS-UAV links, we formulate the average achievable rate, analyze the effect of environmental parameters, and make insightful performance comparisons. Simulation results show that the deployment of RISs can bring impressive gains and significantly outperform conventional RIS-free 5G networks.
Submitted 2 October, 2023; v1 submitted 8 May, 2023;
originally announced May 2023.
-
ULEEN: A Novel Architecture for Ultra Low-Energy Edge Neural Networks
Authors:
Zachary Susskind,
Aman Arora,
Igor D. S. Miranda,
Alan T. L. Bacellar,
Luis A. Q. Villon,
Rafael F. Katopodis,
Leandro S. de Araujo,
Diego L. C. Dutra,
Priscila M. V. Lima,
Felipe M. G. Franca,
Mauricio Breternitz Jr.,
Lizy K. John
Abstract:
The deployment of AI models on low-power, real-time edge devices requires accelerators for which energy, latency, and area are all first-order concerns. There are many approaches to enabling deep neural networks (DNNs) in this domain, including pruning, quantization, compression, and binary neural networks (BNNs), but with the emergence of the "extreme edge", there is now a demand for even more efficient models. In order to meet the constraints of ultra-low-energy devices, we propose ULEEN, a model architecture based on weightless neural networks. Weightless neural networks (WNNs) are a class of neural models that use table lookups, not arithmetic, to perform computation. The elimination of energy-intensive arithmetic operations makes WNNs theoretically well suited for edge inference; however, they have historically suffered from poor accuracy and excessive memory usage. ULEEN incorporates algorithmic improvements and a novel training strategy inspired by BNNs to make significant strides in improving accuracy and reducing model size. We compare FPGA and ASIC implementations of an inference accelerator for ULEEN against edge-optimized DNN and BNN devices. On a Xilinx Zynq Z-7045 FPGA, we demonstrate classification on the MNIST dataset at 14.3 million inferences per second (13 million inferences/Joule) with 0.21 μs latency and 96.2% accuracy, while Xilinx FINN achieves 12.3 million inferences per second (1.69 million inferences/Joule) with 0.31 μs latency and 95.83% accuracy. In a 45 nm ASIC, we achieve 5.1 million inferences/Joule and 38.5 million inferences/second at 98.46% accuracy, while a quantized Bit Fusion model achieves 9230 inferences/Joule and 19,100 inferences/second at 99.35% accuracy. In our search for ever more efficient edge devices, ULEEN shows that WNNs deserve consideration.
Submitted 20 April, 2023;
originally announced April 2023.
-
A Supervisory Learning Control Framework for Autonomous & Real-time Task Planning for an Underactuated Cooperative Robotic task
Authors:
Sander De Witte,
Tom Lefebvre,
Thijs Van Hauwermeiren,
Guillaume Crevecoeur
Abstract:
We introduce a framework for cooperative manipulation, applied to an underactuated manipulation problem. Two stationary robotic manipulators must cooperate to reposition an object within their shared workspace. Control of multi-agent systems for manipulation tasks cannot rely on individual control strategies in which agents with little to no communication serve the common objective through swarming. Instead, a coordination strategy is required that assigns subtasks to the individual agents. We formulate the problem in a Task And Motion Planning (TAMP) setting and adopt a decomposition strategy that allows us to treat the task and motion planning problems separately. We solve the supervisory planning problem offline using deep Reinforcement Learning techniques, resulting in a supervisory policy capable of coordinating the two manipulators to successfully execute the pick-and-place task. Additionally, solving the task planning problem offline enables real-time (re)planning, which provides robustness to subtask execution failures and on-the-fly task changes. The framework achieved zero-shot deployment on the real setup with a success rate above 90%.
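The replanning claim follows from the policy being computed offline: online, the supervisor only has to re-query the policy from the observed state after each subtask. A failed subtask is then handled like any other state. The toy sketch below shows that closed loop under loudly hypothetical assumptions. The five-zone workspace, the `REACH` sets, the arm names, the hand-written `supervisory_policy` (standing in for the learned deep RL policy), and the simulated executor are all illustrative and are not the authors' state space, policy, or robot interface.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical discrete abstraction of the shared workspace: the object sits
# in one of five zones, and each arm can only act on the zones it can reach.
REACH: Dict[str, Set[int]] = {"left_arm": {0, 1, 2}, "right_arm": {2, 3, 4}}
GOAL = 4

Executor = Callable[[str, int, int], int]


def supervisory_policy(zone: int) -> Optional[Tuple[str, int]]:
    """Stand-in for a learned supervisory policy: state -> (agent, target zone).

    Returns None once the goal is reached.
    """
    if zone == GOAL:
        return None
    if zone < 2:
        return ("left_arm", 2)  # move the object into the shared zone
    return ("right_arm", GOAL)  # the other arm completes the placement


def run(zone: int, execute: Executor, max_steps: int = 10) -> Tuple[int, List[Tuple[str, int, int]]]:
    """Closed-loop supervision: re-query the policy after every subtask."""
    log: List[Tuple[str, int, int]] = []
    for _ in range(max_steps):
        cmd = supervisory_policy(zone)
        if cmd is None:
            break
        agent, target = cmd
        if zone not in REACH[agent] or target not in REACH[agent]:
            raise ValueError(f"{agent} cannot move the object from {zone} to {target}")
        zone = execute(agent, zone, target)  # observed outcome, success or not
        log.append((agent, target, zone))
    return zone, log


def make_executor(fail_first: bool) -> Executor:
    """Simulated low-level execution in which the first subtask can fail."""
    calls = {"n": 0}

    def execute(agent: str, zone: int, target: int) -> int:
        calls["n"] += 1
        if fail_first and calls["n"] == 1:
            return zone  # failure: the object stays where it was
        return target

    return execute


final, log = run(0, make_executor(fail_first=True))
print(final, log)  # 4 [('left_arm', 2, 0), ('left_arm', 2, 2), ('right_arm', 4, 4)]
```

When the first subtask fails, the loop needs no special recovery logic. The supervisor observes that the object is still in zone 0, queries the same policy again, and reissues the handover before the second arm finishes the task.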
Submitted 22 February, 2023;
originally announced February 2023.