-
Rotatable Antenna-Enabled Spectrum Sharing in Cognitive Radio Systems
Authors:
Yanhua Tan,
Beixiong Zheng,
Yi Fang,
Derrick Wing Kwan Ng,
Rui Zhang,
Jie Xu
Abstract:
Rotatable antenna (RA) technology has recently drawn significant attention in wireless systems owing to its unique ability to exploit additional spatial degrees-of-freedom (DoFs) by dynamically adjusting the three-dimensional (3D) boresight direction of each antenna. In this letter, we propose a new RA-assisted cognitive radio (CR) system designed to achieve efficient spectrum sharing while mitigating interference between primary and secondary communication links. Specifically, we formulate an optimization problem for the joint design of the transmit beamforming and the boresight directions of RAs at the secondary transmitter (ST), aimed at maximizing the received signal-to-interference-plus-noise ratio (SINR) at the secondary receiver (SR), while satisfying both the interference constraint at the primary receiver (PR) and the maximum transmit power constraint at the ST. Although the formulated problem is challenging to solve due to its non-convexity and coupled variables, we develop an efficient algorithm by leveraging alternating optimization (AO) and successive convex approximation (SCA) techniques to acquire high-quality solutions. Numerical results demonstrate that the proposed RA-assisted system substantially outperforms conventional benchmark schemes in spectrum-sharing CR systems, validating RA's capability to simultaneously enhance the communication quality at the SR and mitigate interference at the PR.
Submitted 29 September, 2025;
originally announced September 2025.
-
GPS Denied IBVS-Based Navigation and Collision Avoidance of UAV Using a Low-Cost RGB Camera
Authors:
Xiaoyu Wang,
Yan Rui Tan,
William Leong,
Sunan Huang,
Rodney Teo,
Cheng Xiang
Abstract:
This paper proposes an image-based visual servoing (IBVS) framework for UAV navigation and collision avoidance using only an RGB camera. While UAV navigation has been extensively studied, it remains challenging to apply IBVS in missions involving multiple visual targets and collision avoidance. The proposed method achieves navigation without explicit path planning, and collision avoidance is realized through AI-based monocular depth estimation from RGB images. Unlike approaches that rely on stereo cameras or external workstations, our framework runs fully onboard a Jetson platform, ensuring a self-contained and deployable system. Experimental results validate that the UAV can navigate across multiple AprilTags and avoid obstacles effectively in GPS-denied environments.
Submitted 22 September, 2025;
originally announced September 2025.
-
From Image Denoisers to Regularizing Imaging Inverse Problems: An Overview
Authors:
Hong Ye Tan,
Subhadip Mukherjee,
Junqi Tang
Abstract:
Inverse problems lie at the heart of modern imaging science, with broad applications in areas such as medical imaging, remote sensing, and microscopy. Recent years have witnessed a paradigm shift in solving imaging inverse problems, where data-driven regularizers are used increasingly, leading to remarkably high-fidelity reconstruction. A particularly notable approach for data-driven regularization is to use learned image denoisers as implicit priors in iterative image reconstruction algorithms. This survey presents a comprehensive overview of this powerful and emerging class of algorithms, commonly referred to as plug-and-play (PnP) methods. We begin by providing a brief background on image denoising and inverse problems, followed by a short review of traditional regularization strategies. We then explore how proximal splitting algorithms, such as the alternating direction method of multipliers (ADMM) and proximal gradient descent (PGD), can naturally accommodate learned denoisers in place of proximal operators, and under what conditions such replacements preserve convergence. The role of Tweedie's formula in connecting optimal Gaussian denoisers and score estimation is discussed, which lays the foundation for regularization-by-denoising (RED) and more recent diffusion-based posterior sampling methods. We discuss theoretical advances regarding the convergence of PnP algorithms, both within the RED and proximal settings, emphasizing the structural assumptions that the denoiser must satisfy for convergence, such as non-expansiveness, Lipschitz continuity, and local homogeneity. We also address practical considerations in algorithm design, including choices of denoiser architecture and acceleration strategies.
Submitted 3 September, 2025;
originally announced September 2025.
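The link between Gaussian denoising and score estimation mentioned in the abstract above can be written out as a short worked equation. This is a sketch of the standard statement, not a reproduction of the survey's notation: the symbols $x$ (clean image), $y$ (noisy observation), $\sigma$ (noise level), $p_\sigma$ (density of $y$), $D_\sigma$ (MMSE denoiser), $f$ (data-fidelity term), and $\tau$ (step size) are assumptions chosen here for illustration.

```latex
% Noise model (assumed): y = x + \sigma \varepsilon, with \varepsilon \sim \mathcal{N}(0, I).
% Tweedie's formula: the MMSE denoiser is a gradient step on the log-density of y.
\[
  D_\sigma(y) \;=\; \mathbb{E}[x \mid y] \;=\; y + \sigma^{2}\, \nabla_y \log p_\sigma(y)
  \quad\Longleftrightarrow\quad
  \nabla_y \log p_\sigma(y) \;=\; \frac{D_\sigma(y) - y}{\sigma^{2}} .
\]
% Plug-and-play proximal gradient descent: the proximal operator of the
% regularizer is replaced by the denoiser D_\sigma.
\[
  x^{k+1} \;=\; D_\sigma\!\bigl(x^{k} - \tau\, \nabla f(x^{k})\bigr) .
\]
```

The first identity holds exactly only for the MMSE denoiser under Gaussian noise; a learned denoiser satisfies it approximately at best, which is one reason convergence requires extra structural assumptions on the denoiser.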
-
Fusing Structural Phenotypes with Functional Data for Early Prediction of Primary Angle Closure Glaucoma Progression
Authors:
Swati Sharma,
Thanadet Chuangsuwanich,
Royston K. Y. Tan,
Shimna C. Prasad,
Tin A. Tun,
Shamira A. Perera,
Martin L. Buist,
Tin Aung,
Monisha E. Nongpiur,
Michaël J. A. Girard
Abstract:
Purpose: To classify eyes as slow or fast glaucoma progressors in patients with primary angle closure glaucoma (PACG) using an integrated approach combining optic nerve head (ONH) structural features and sector-based visual field (VF) functional parameters. Methods: PACG patients with >5 reliable VF tests over >5 years were included. Progression was assessed in Zeiss Forum, with baseline VF within six months of OCT. Fast progression was VFI decline <-2.0% per year; slow progression >-2.0% per year. OCT volumes were AI-segmented to extract 31 ONH parameters. The Glaucoma Hemifield Test defined five regions per hemifield, aligned with RNFL distribution. Mean sensitivity per region was combined with structural parameters to train ML classifiers. Multiple models were tested, and SHAP identified key predictors. Main outcome measures: Classification of slow versus fast progressors using combined structural and functional data. Results: We analyzed 451 eyes from 299 patients. Mean VFI progression was -0.92% per year; 369 eyes progressed slowly and 82 rapidly. The Random Forest model combining structural and functional features achieved the best performance (AUC = 0.87, 2000 Monte Carlo iterations). SHAP identified six key predictors: inferior MRW, inferior and inferior-temporal RNFL thickness, nasal-temporal LC curvature, superior nasal VF sensitivity, and inferior RNFL and GCL+IPL thickness. Models using only structural or functional features performed worse with AUC of 0.82 and 0.78, respectively. Conclusions: Combining ONH structural and VF functional parameters significantly improves classification of progression risk in PACG. Inferior ONH features, MRW and RNFL thickness, were the most predictive, highlighting the critical role of ONH morphology in monitoring disease progression.
Submitted 19 August, 2025;
originally announced August 2025.
-
OpenGCRAM: An Open-Source Gain Cell Compiler Enabling Design-Space Exploration for AI Workloads
Authors:
Xinxin Wang,
Lixian Yan,
Shuhan Liu,
Luke Upton,
Zhuoqi Cai,
Yiming Tan,
Shengman Li,
Koustav Jana,
Peijing Li,
Jesse Cirimelli-Low,
Thierry Tambe,
Matthew Guthaus,
H. -S. Philip Wong
Abstract:
Gain Cell memory (GCRAM) offers higher density and lower power than SRAM, making it a promising candidate for on-chip memory in domain-specific accelerators. To support workloads with varying traffic and lifetime metrics, GCRAM also offers high bandwidth, ultra-low leakage power, and a wide range of retention times, which can be adjusted through transistor design (such as threshold voltage and channel material) and on-the-fly by changing the operating voltage. However, designing and optimizing GCRAM sub-systems can be time-consuming. In this paper, we present OpenGCRAM, an open-source GCRAM compiler capable of generating GCRAM bank circuit designs and DRC- and LVS-clean layouts for commercially available foundry CMOS, while also providing area, delay, and power simulations based on user-specified configurations (e.g., word size and number of words). OpenGCRAM enables fast, accurate, customizable, and optimized GCRAM block generation, reduces design time, ensures process compliance, and delivers performance-tailored memory blocks that meet diverse application requirements.
Submitted 14 July, 2025;
originally announced July 2025.
-
MEGANet-W: A Wavelet-Driven Edge-Guided Attention Framework for Weak Boundary Polyp Detection
Authors:
Zhe Yee Tan,
Ashwaq Qasem
Abstract:
Colorectal polyp segmentation is critical for early detection of colorectal cancer, yet weak and low-contrast boundaries significantly limit automated accuracy. Existing deep models either blur fine edge details or rely on handcrafted filters that perform poorly under variable imaging conditions. We propose MEGANet-W, a Wavelet Driven Edge Guided Attention Network that injects directional, parameter-free Haar wavelet edge maps into each decoder stage to recalibrate semantic features. The key novelties of MEGANet-W include a two-level Haar wavelet head for multi-orientation edge extraction and Wavelet Edge Guided Attention (W-EGA) modules that fuse wavelet cues with boundary and input branches. On five public polyp datasets, MEGANet-W consistently outperforms existing methods, improving mIoU by up to 2.3% and mDice by 1.2%, while introducing no additional learnable parameters. This approach improves reliability in difficult cases and offers a robust solution for medical image segmentation tasks requiring precise boundary detection.
Submitted 17 September, 2025; v1 submitted 3 July, 2025;
originally announced July 2025.
-
Automatic Rank Determination for Low-Rank Adaptation via Submodular Function Maximization
Authors:
Yihang Gao,
Vincent Y. F. Tan
Abstract:
In this paper, we propose SubLoRA, a rank determination method for Low-Rank Adaptation (LoRA) based on submodular function maximization. In contrast to prior approaches, such as AdaLoRA, that rely on first-order (linearized) approximations of the loss function, SubLoRA utilizes second-order information to capture the potentially complex loss landscape by incorporating the Hessian matrix. We show that the linearization becomes inaccurate and ill-conditioned when the LoRA parameters have been well optimized, motivating the need for a more reliable and nuanced second-order formulation. To this end, we reformulate the rank determination problem as a combinatorial optimization problem with a quadratic objective. However, solving this problem exactly is NP-hard in general. To overcome the computational challenge, we introduce a submodular function maximization framework and devise a greedy algorithm with approximation guarantees. We derive a sufficient and necessary condition under which the rank-determination objective becomes submodular, and construct a closed-form projection of the Hessian matrix that satisfies this condition while maintaining computational efficiency. Our method combines solid theoretical foundations, second-order accuracy, and practical computational efficiency. We further extend SubLoRA to a joint optimization setting, alternating between LoRA parameter updates and rank determination under a rank budget constraint. Extensive experiments on fine-tuning physics-informed neural networks (PINNs) for solving partial differential equations (PDEs) demonstrate the effectiveness of our approach. Results show that SubLoRA outperforms existing methods in both rank determination and joint training performance.
Submitted 2 July, 2025;
originally announced July 2025.
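The greedy strategy referred to in the SubLoRA abstract above can be illustrated with the textbook greedy algorithm for maximizing a monotone submodular set function under a cardinality budget. The sketch below is generic and is not the paper's algorithm: the function name `greedy_maximize`, the toy coverage objective, and the item names are all hypothetical, and the Hessian-based rank objective from the paper is not modeled.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List


def greedy_maximize(
    ground_set: Iterable[Hashable],
    f: Callable[[FrozenSet[Hashable]], float],
    budget: int,
) -> List[Hashable]:
    """Pick up to `budget` items, each time adding the largest marginal gain.

    For a monotone submodular f this classic greedy rule is known to reach
    at least a (1 - 1/e) fraction of the optimal value.
    """
    selected: List[Hashable] = []
    remaining = list(ground_set)
    for _ in range(min(budget, len(remaining))):
        current_value = f(frozenset(selected))
        best_item, best_gain = None, 0.0
        for item in remaining:
            gain = f(frozenset(selected + [item])) - current_value
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:  # no item improves the objective any further
            break
        selected.append(best_item)
        remaining.remove(best_item)
    return selected


# Toy objective (hypothetical): set coverage, a standard submodular function.
COVERAGE = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}


def coverage_value(subset: FrozenSet[Hashable]) -> float:
    """Number of distinct elements covered by the chosen items."""
    covered = set()
    for item in subset:
        covered |= COVERAGE[item]
    return float(len(covered))


chosen = greedy_maximize(COVERAGE.keys(), coverage_value, budget=2)
print(chosen, coverage_value(frozenset(chosen)))
```

In a rank-allocation setting, the ground set would be candidate rank components and the budget would be the total rank allowed; how the objective is made submodular is specific to the paper and is not shown here.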
-
DCD: A Semantic Segmentation Model for Fetal Ultrasound Four-Chamber View
Authors:
Donglian Li,
Hui Guo,
Minglang Chen,
Huizhen Chen,
Jialing Chen,
Bocheng Liang,
Pengchen Liang,
Ying Tan
Abstract:
Accurate segmentation of anatomical structures in the apical four-chamber (A4C) view of fetal echocardiography is essential for early diagnosis and prenatal evaluation of congenital heart disease (CHD). However, precise segmentation remains challenging due to ultrasound artifacts, speckle noise, anatomical variability, and boundary ambiguity across different gestational stages. To reduce the workload of sonographers and enhance segmentation accuracy, we propose DCD, an advanced deep learning-based model for automatic segmentation of key anatomical structures in the fetal A4C view. Our model incorporates a Dense Atrous Spatial Pyramid Pooling (Dense ASPP) module, enabling superior multi-scale feature extraction, and a Convolutional Block Attention Module (CBAM) to enhance adaptive feature representation. By effectively capturing both local and global contextual information, DCD achieves precise and robust segmentation, contributing to improved prenatal cardiac assessment.
Submitted 10 June, 2025;
originally announced June 2025.
-
NMCSE: Noise-Robust Multi-Modal Coupling Signal Estimation Method via Optimal Transport for Cardiovascular Disease Detection
Authors:
Peihong Zhang,
Zhixin Li,
Rui Sang,
Yuxuan Liu,
Yiqiang Cai,
Yizhou Tan,
Shengchen Li
Abstract:
Electrocardiogram (ECG) and Phonocardiogram (PCG) signals are linked by a latent coupling signal representing the electrical-to-mechanical cardiac transformation. While valuable for cardiovascular disease (CVD) detection, this coupling signal is traditionally estimated using deconvolution methods that amplify noise, limiting clinical utility. In this paper, we propose Noise-Robust Multi-Modal Coupling Signal Estimation (NMCSE), which reformulates the problem as distribution matching via optimal transport theory. By jointly optimizing amplitude and temporal alignment, NMCSE mitigates noise amplification without additional preprocessing. Integrated with our Temporal-Spatial Feature Extraction network, NMCSE enables robust multi-modal CVD detection. Experiments on the PhysioNet 2016 dataset with realistic hospital noise demonstrate that NMCSE reduces estimation errors by approximately 30% in Mean Squared Error while maintaining higher Pearson Correlation Coefficients across all tested signal-to-noise ratios. Our approach achieves 97.38% accuracy and 0.98 AUC in CVD detection, outperforming state-of-the-art methods and demonstrating robust performance for real-world clinical applications.
Submitted 2 June, 2025; v1 submitted 14 May, 2025;
originally announced May 2025.
-
Low-Rank Adaptive Structural Priors for Generalizable Diabetic Retinopathy Grading
Authors:
Yunxuan Wang,
Ray Yin,
Yumei Tan,
Hao Chen,
Haiying Xia
Abstract:
Diabetic retinopathy (DR), a serious ocular complication of diabetes, is one of the primary causes of vision loss among retinal vascular diseases. Deep learning methods have been extensively applied to DR grading. However, their performance declines significantly when applied to data outside the training distribution due to domain shifts. Domain generalization (DG) has emerged as a solution to this challenge. However, most existing DG methods overlook lesion-specific features, resulting in insufficient accuracy. In this paper, we propose a novel approach that enhances existing DG methods by incorporating structural priors, inspired by the observation that DR grading is heavily dependent on vessel and lesion structures. We introduce Low-rank Adaptive Structural Priors (LoASP), a plug-and-play framework designed for seamless integration with existing DG models. LoASP improves generalization by learning adaptive structural representations that are finely tuned to the complexities of DR diagnosis. Extensive experiments on eight diverse datasets validate its effectiveness in both single-source and multi-source domain scenarios. Furthermore, visualizations reveal that the learned structural priors intuitively align with the intricate architecture of the vessels and lesions, providing compelling insights into their interpretability and diagnostic relevance.
Submitted 27 April, 2025;
originally announced April 2025.
-
Audio-Plane: Audio Factorization Plane Gaussian Splatting for Real-Time Talking Head Synthesis
Authors:
Shuai Shen,
Wanhua Li,
Yunpeng Zhang,
Yap-Peng Tan,
Jiwen Lu
Abstract:
Talking head synthesis has emerged as a prominent research topic in computer graphics and multimedia, yet most existing methods often struggle to strike a balance between generation quality and computational efficiency, particularly under real-time constraints. In this paper, we propose a novel framework that integrates Gaussian Splatting with a structured Audio Factorization Plane (Audio-Plane) to enable high-quality, audio-synchronized, and real-time talking head generation. For modeling a dynamic talking head, a 4D volume representation, which consists of three axes in 3D space and one temporal axis aligned with audio progression, is typically required. However, directly storing and processing a dense 4D grid is impractical due to the high memory and computation cost, and lack of scalability for longer durations. We address this challenge by decomposing the 4D volume representation into a set of audio-independent spatial planes and audio-dependent planes, forming a compact and interpretable representation for talking head modeling that we refer to as the Audio-Plane. This factorized design allows for efficient and fine-grained audio-aware spatial encoding, and significantly enhances the model's ability to capture complex lip dynamics driven by speech signals. To further improve region-specific motion modeling, we introduce an audio-guided saliency splatting mechanism based on region-aware modulation, which adaptively emphasizes highly dynamic regions such as the mouth area. This allows the model to focus its learning capacity on where it matters most for accurate speech-driven animation. Extensive experiments on both the self-driven and the cross-driven settings demonstrate that our method achieves state-of-the-art visual quality, precise audio-lip synchronization, and real-time performance, outperforming prior approaches across both 2D- and 3D-based paradigms.
Submitted 26 June, 2025; v1 submitted 28 March, 2025;
originally announced March 2025.
-
Deep Learning-Based Quantitative Assessment of Renal Chronicity Indices in Lupus Nephritis
Authors:
Tianqi Tu,
Hui Wang,
Jiangbo Pei,
Xiaojuan Yu,
Aidong Men,
Suxia Wang,
Qingchao Chen,
Ying Tan,
Feng Yu,
Minghui Zhao
Abstract:
Background: Renal chronicity indices (CI) have been identified as strong predictors of long-term outcomes in lupus nephritis (LN) patients. However, assessment by pathologists is hindered by challenges such as substantial time requirements, high interobserver variation, and susceptibility to fatigue. This study aims to develop an effective deep learning (DL) pipeline that automates the assessment of CI and provides valuable prognostic insights from a disease-specific perspective. Methods: We curated a dataset comprising 282 slides obtained from 141 patients across two independent cohorts with a complete 10-year follow-up. Our DL pipeline was developed on 60 slides (22,410 patch images) from 30 patients in the training cohort and evaluated on both an internal testing set (148 slides, 77,605 patch images) and an external testing set (74 slides, 27,522 patch images). Results: The study included two cohorts with slight demographic differences, particularly in age and hemoglobin levels. The DL pipeline showed high segmentation performance across tissue compartments and histopathologic lesions, outperforming state-of-the-art methods. The DL pipeline also demonstrated a strong correlation with pathologists in assessing CI, significantly improving interobserver agreement. Additionally, the DL pipeline enhanced prognostic accuracy, particularly in outcome prediction, when combined with clinical parameters and pathologist-assessed CIs. Conclusions: The DL pipeline demonstrated accuracy and efficiency in assessing CI in LN, showing promise in improving interobserver agreement among pathologists. It also exhibited significant value in prognostic analysis and enhancing outcome prediction in LN patients, offering a valuable tool for clinical decision-making.
Submitted 26 March, 2025;
originally announced March 2025.
-
Improving Acoustic Scene Classification with City Features
Authors:
Yiqiang Cai,
Yizhou Tan,
Shengchen Li,
Xi Shao,
Mark D. Plumbley
Abstract:
Acoustic scene recordings are often collected from a diverse range of cities. Most existing acoustic scene classification (ASC) approaches focus on identifying common acoustic scene patterns across cities to enhance generalization. However, the potential acoustic differences introduced by city-specific environmental and cultural factors are overlooked. In this paper, we hypothesize that the city-specific acoustic features are beneficial for the ASC task rather than being treated as noise or bias. To this end, we propose City2Scene, a novel framework that leverages city features to improve ASC. Unlike conventional approaches that may discard or suppress city information, City2Scene transfers the city-specific knowledge from pre-trained city classification models to a scene classification model using knowledge distillation. We evaluate City2Scene on three datasets of DCASE Challenge Task 1, which include both scene and city labels. Experimental results demonstrate that city features provide valuable information for classifying scenes. By distilling city-specific knowledge, City2Scene effectively improves accuracy across a variety of lightweight CNN backbones, achieving performance competitive with the top-ranked solutions of the DCASE Challenge in recent years.
Submitted 12 June, 2025; v1 submitted 21 March, 2025;
originally announced March 2025.
-
Adapting Automatic Speech Recognition for Accented Air Traffic Control Communications
Authors:
Marcus Yu Zhe Wee,
Justin Juin Hng Wong,
Lynus Lim,
Joe Yu Wei Tan,
Prannaya Gupta,
Dillion Lim,
En Hao Tew,
Aloysius Keng Siew Han,
Yong Zhi Lim
Abstract:
Effective communication in Air Traffic Control (ATC) is critical to maintaining aviation safety, yet the challenges posed by accented English remain largely unaddressed in Automatic Speech Recognition (ASR) systems. Existing models struggle with transcription accuracy for Southeast Asian-accented (SEA-accented) speech, particularly in noisy ATC environments. This study presents the development of ASR models fine-tuned specifically for Southeast Asian accents using a newly created dataset. Our research yields significant improvements, achieving a Word Error Rate (WER) of 0.0982 (9.82%) on SEA-accented ATC speech. Additionally, the paper highlights the importance of region-specific datasets and accent-focused training, offering a pathway for deploying ASR systems in resource-constrained military operations. The findings emphasize the need for noise-robust training techniques and region-specific datasets to improve transcription accuracy for non-Western accents in ATC communications.
Submitted 27 February, 2025;
originally announced February 2025.
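The Word Error Rate quoted in the abstract above is conventionally defined as the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below shows that standard definition only; the function name `word_error_rate` and the example phrases are hypothetical, and the paper's own scoring pipeline (text normalization, tooling) is not known from the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Computed with a word-level Levenshtein distance, keeping only two rows
    of the dynamic-programming table at a time.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the processed reference prefix
    # and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i] + [0] * len(hyp_words)
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current[j] = min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            )
        previous = current
    return previous[-1] / len(ref_words)


# Hypothetical ATC-style phrase: one of six reference words is dropped.
example_wer = word_error_rate(
    "climb flight level three two zero",
    "climb flight level three zero",
)
print(f"{example_wer:.4f}")
```

On this scale, the reported 0.0982 corresponds to roughly one word error per ten reference words.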
-
Stabilization of singularly perturbed networked control systems over a single channel
Authors:
Weixuan Wang,
Alejandro I. Maass,
Dragan Nešić,
Ying Tan,
Romain Postoyan,
W. P. M. H. Heemels
Abstract:
This paper studies the emulation-based stabilization of nonlinear networked control systems with two time scales. We address the challenge of using a single communication channel for transmitting both fast and slow variables between the plant and the controller. A novel dual clock mechanism is proposed to schedule transmissions for this purpose. The system is modeled as a hybrid singularly perturbed dynamical system, and singular perturbation analysis is employed to determine individual maximum allowable transmission intervals for both fast and slow variables, ensuring semi-global practical asymptotic stability. Enhanced stability guarantees are also provided under stronger assumptions. The efficacy of the proposed method is illustrated through a numerical example.
Submitted 25 February, 2025;
originally announced February 2025.
-
Rotatable Antenna Enabled Wireless Communication System with Visual Recognition: A Prototype Implementation
Authors:
Liang Dai,
Beixiong Zheng,
Yanhua Tan,
Lipeng Zhu,
Fangjiong Chen,
Rui Zhang
Abstract:
Rotatable antenna (RA) is an emerging technology that has great potential to exploit additional spatial degrees of freedom (DoFs) by flexibly altering the three-dimensional (3D) orientation/boresight of each antenna. In this demonstration, we present a prototype of the RA-enabled wireless communication system with a visual recognition module to evaluate the performance gains provided by the RA in practical environments. In particular, a mechanically-driven RA is developed by integrating a digital servo motor, a directional antenna, and a microcontroller, which enables the dynamic adjustment of the RA orientation. Moreover, the orientation adjustment of the RA is guided by the user's direction information provided by the visual recognition module, thereby significantly enhancing system response speed and self-orientation accuracy. The experimental results demonstrate that the RA-enabled communication system achieves significant improvement in communication coverage performance compared to the conventional fixed antenna system.
Submitted 23 March, 2025; v1 submitted 24 February, 2025;
originally announced February 2025.
-
Adaptive Traffic Element-Based Streetlight Control Using Neighbor Discovery Algorithm Based on IoT Events
Authors:
Yupeng Tan,
Sheng Xu,
Chengyue Su
Abstract:
Intelligent streetlight systems divide the streetlight network into multiple sectors, activating only the streetlights in the corresponding sectors when traffic elements pass by, rather than all streetlights, effectively reducing energy waste. This strategy requires streetlights to understand their neighbor relationships to illuminate only the streetlights in their respective sectors. However, manually configuring the neighbor relationships for a large number of streetlights in complex large-scale road streetlight networks is cumbersome and prone to errors. Due to the crisscrossing nature of roads, it is also difficult to determine the neighbor relationships using GPS or communication positioning. In response to these issues, this article proposes a systematic approach to model the streetlight network as a social network and construct a neighbor relationship probabilistic graph using IoT event records of streetlights detecting traffic elements. Based on this, a multi-objective genetic algorithm based probabilistic graph clustering method is designed to discover the neighbor relationships of streetlights. Considering the characteristic that pedestrians and vehicles usually move at a constant speed on a section of a road, speed consistency is introduced as an optimization objective, which, together with traditional similarity measures, forms a multi-objective function, enhancing the accuracy of neighbor relationship discovery. Extensive experiments on simulation datasets were conducted, comparing the proposed algorithm with other probabilistic graph clustering algorithms. The results demonstrate that the proposed algorithm can more accurately identify the neighbor relationships of streetlights compared to other algorithms, effectively achieving adaptive streetlight control for traffic elements.
Submitted 1 December, 2024;
originally announced December 2024.
-
Online 4D Ultrasound-Guided Robotic Tracking Enables 3D Ultrasound Localisation Microscopy with Large Tissue Displacements
Authors:
Jipeng Yan,
Qingyuan Tan,
Shusei Kawara,
Jingwen Zhu,
Bingxue Wang,
Matthieu Toulemonde,
Honghai Liu,
Ying Tan,
Meng-Xing Tang
Abstract:
Super-Resolution Ultrasound (SRUS) imaging through localising and tracking microbubbles, also known as Ultrasound Localisation Microscopy (ULM), has demonstrated significant potential for reconstructing microvasculature and flows with sub-diffraction resolution in clinical diagnostics. However, imaging organs with large tissue movements, such as those caused by respiration, presents substantial challenges. Existing methods often require breath holding to maintain accumulation accuracy, which limits data acquisition time and ULM image saturation. To improve image quality in the presence of large tissue movements, this study introduces an approach integrating high-frame-rate ultrasound with online precise robotic probe control. Tested on a microvasculature phantom with translation motions up to 20 mm, twice the aperture size of the matrix array used, our method achieved real-time tracking of the moving phantom and an imaging volume rate of 85 Hz, keeping the majority of the target volume in the imaging field of view. ULM images of the moving cross channels in the phantom were successfully reconstructed in post-processing, demonstrating the feasibility of super-resolution imaging under large tissue motions. This represents a significant step towards ULM imaging of organs with large motion.
Submitted 25 March, 2025; v1 submitted 17 September, 2024;
originally announced September 2024.
-
Self-Supervised Elimination of Non-Independent Noise in Hyperspectral Imaging
Authors:
Guangrui Ding,
Chang Liu,
Jiaze Yin,
Xinyan Teng,
Yuying Tan,
Hongjian He,
Haonan Lin,
Lei Tian,
Ji-Xin Cheng
Abstract:
Hyperspectral imaging has been widely used for spectral and spatial identification of target molecules, yet it is often contaminated by complex noise. Current denoising methods generally rely on independent and identically distributed noise statistics and perform poorly when removing non-independent noise. Here, we demonstrate Self-supervised PErmutation Noise2noise Denoising (SPEND), a deep learning denoising architecture tailor-made for removing non-independent noise from a single hyperspectral image stack. We utilize hyperspectral stimulated Raman scattering and mid-infrared photothermal microscopy as the testbeds, where the noise is spatially correlated and spectrally varied. Based on single hyperspectral images, SPEND permutes odd and even spectral frames to generate two stacks with identical noise properties, and uses the pairs for efficient self-supervised noise-to-noise training. SPEND achieved an 8-fold signal-to-noise improvement without having access to the ground truth data. SPEND enabled accurate mapping of low concentration biomolecules in both fingerprint and silent regions, demonstrating its robustness in complex cellular environments.
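The odd/even frame pairing mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible reading, a plain split of the spectral frames by index parity; the actual permutation scheme used by SPEND is not described in the abstract, and the function name `split_by_frame_parity` is hypothetical.

```python
def split_by_frame_parity(stack):
    """Split a sequence of spectral frames into even-indexed and odd-indexed sub-stacks.

    Assumption: a plain parity split. The abstract does not specify the exact
    permutation SPEND applies, so this is only an illustration of the idea.
    """
    frames = list(stack)
    # Drop a trailing frame if the count is odd so both sub-stacks have equal length.
    usable = len(frames) - (len(frames) % 2)
    even_frames = frames[0:usable:2]
    odd_frames = frames[1:usable:2]
    return even_frames, odd_frames


# Toy stack of five "frames", each a tiny 1-pixel image.
toy_stack = [[0], [1], [2], [3], [4]]
even_frames, odd_frames = split_by_frame_parity(toy_stack)
```

In a noise-to-noise setup, one sub-stack would serve as the network input and the other as the training target, which is why equal lengths are enforced here.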
Submitted 15 September, 2024;
originally announced September 2024.
-
Exploring Differences between Human Perception and Model Inference in Audio Event Recognition
Authors:
Yizhou Tan,
Yanru Wu,
Yuanbo Hou,
Xin Xu,
Hui Bu,
Shengchen Li,
Dick Botteldooren,
Mark D. Plumbley
Abstract:
Audio Event Recognition (AER) traditionally focuses on detecting and identifying audio events. Most existing AER models tend to detect all potential events without considering their varying significance across different contexts. This makes the AER results detected by existing models often have a large discrepancy with human auditory perception. Although this is a critical and significant issue, it has not been extensively studied by the Detection and Classification of Sound Scenes and Events (DCASE) community because solving it is time-consuming and labour-intensive. To address this issue, this paper introduces the concept of semantic importance in AER, focusing on exploring the differences between human perception and model inference. This paper constructs a Multi-Annotated Foreground Audio Event Recognition (MAFAR) dataset, which comprises audio recordings labelled by 10 professional annotators. Through labelling frequency and variance, the MAFAR dataset facilitates the quantification of semantic importance and analysis of human perception. By comparing human annotations with the predictions of ensemble pre-trained models, this paper uncovers a significant gap between human perception and model inference in both semantic identification and existence detection of audio events. Experimental results reveal that human perception tends to ignore subtle or trivial events in the event semantic identification, while model inference is easily affected by events with noises. Meanwhile, in event existence detection, models are usually more sensitive than humans.
Submitted 10 September, 2024;
originally announced September 2024.
-
Passenger hazard perception based on EEG signals for highly automated driving vehicles
Authors:
Ashton Yu Xuan Tan,
Yingkai Yang,
Xiaofei Zhang,
Bowen Li,
Xiaorong Gao,
Sifa Zheng,
Jianqiang Wang,
Xinyu Gu,
Jun Li,
Yang Zhao,
Yuxin Zhang,
Tania Stathaki
Abstract:
Enhancing the safety of autonomous vehicles is crucial, especially given recent accidents involving automated systems. As passengers in these vehicles, humans' sensory perception and decision-making can be integrated with autonomous systems to improve safety. This study explores neural mechanisms in passenger-vehicle interactions, leading to the development of a Passenger Cognitive Model (PCM) and the Passenger EEG Decoding Strategy (PEDS). Central to PEDS is a novel Convolutional Recurrent Neural Network (CRNN) that captures spatial and temporal EEG data patterns. The CRNN, combined with stacking algorithms, achieves an accuracy of $85.0\% \pm 3.18\%$. Our findings highlight the predictive power of pre-event EEG data, enhancing the detection of hazardous scenarios and offering a network-driven framework for safer autonomous vehicles.
Submitted 27 March, 2025; v1 submitted 29 August, 2024;
originally announced August 2024.
-
A Mirror Descent-Based Algorithm for Corruption-Tolerant Distributed Gradient Descent
Authors:
Shuche Wang,
Vincent Y. F. Tan
Abstract:
Distributed gradient descent algorithms have come to the fore in modern machine learning, especially in parallelizing the handling of large datasets that are distributed across several workers. However, scant attention has been paid to analyzing the behavior of distributed gradient descent algorithms in the presence of adversarial corruptions instead of random noise. In this paper, we formulate a novel problem in which adversarial corruptions are present in a distributed learning system. We show how to use ideas from (lazy) mirror descent to design a corruption-tolerant distributed optimization algorithm. Extensive convergence analysis for (strongly) convex loss functions is provided for different choices of the stepsize. We carefully optimize the stepsize schedule to accelerate the convergence of the algorithm, while at the same time amortizing the effect of the corruption over time. Experiments based on linear regression, support vector classification, and softmax classification on the MNIST dataset corroborate our theoretical findings.
Submitted 5 February, 2025; v1 submitted 19 July, 2024;
originally announced July 2024.
-
MIMO Capacity Analysis and Channel Estimation for Electromagnetic Information Theory
Authors:
Jieao Zhu,
Vincent Y. F. Tan,
Linglong Dai
Abstract:
Electromagnetic information theory (EIT) is an interdisciplinary subject that serves to integrate deterministic electromagnetic theory with Shannon's stochastic information theory. Existing EIT analysis operates in the continuous space domain, which is not aligned with the practical algorithms working in the discrete space domain. This mismatch makes it significantly difficult to apply EIT methodologies to practical discrete space systems, which we call the discrete-continuous gap in this paper. To bridge this gap, we establish the discrete-continuous correspondence with a prolate spheroidal wave function (PSWF)-based ergodic capacity analysis framework. Specifically, we state and prove some discrete-continuous correspondence lemmas to establish a firm theoretical connection between discrete information-theoretic quantities and their continuous counterparts. With these lemmas, we apply the PSWF ergodic capacity bound to advanced MIMO architectures such as continuous-aperture MIMO (CAP-MIMO) and extremely large-scale MIMO (XL-MIMO). From this PSWF capacity bound, we discover the capacity saturation phenomenon both theoretically and empirically. Although the growth of MIMO performance is fundamentally limited in this EIT-based analysis framework, we reveal new opportunities in MIMO channel estimation by exploiting the EIT knowledge about the channel. Inspired by the PSWF capacity bound, we utilize continuous PSWFs to improve the pilot design of discrete MIMO channel estimators, which we call the PSWF channel estimator (PSWF-CE). Simulation results demonstrate improved performance of the proposed PSWF-CE compared to traditional minimum mean squared error (MMSE) and compressed sensing-based estimators.
Submitted 7 June, 2024;
originally announced June 2024.
-
Resilient control of networked switched systems subject to deception attack and DoS attack
Authors:
Rui Zhao,
Zhiqiang Zuo,
Ying Tan,
Yijing Wang,
Wentao Zhang
Abstract:
In this paper, the resilient control of switched systems in the presence of deception attacks and denial-of-service (DoS) attacks is addressed. Due to the interaction of the two kinds of attacks and the asynchronous phenomenon of controller mode and subsystem mode, the system dynamics become much more complex. A criterion is derived to ensure the mean square security level of the closed-loop system. This in turn reveals the balance between system resilience and control performance. Furthermore, a mixed-switching control strategy is put forward to make the system globally asymptotically stable. It is shown that the system will still converge to the equilibrium even if the deception attack occurs. Finally, simulations are carried out to verify the effectiveness of the theoretical results.
Submitted 9 May, 2024;
originally announced May 2024.
-
Multimodal Physical Fitness Monitoring (PFM) Framework Based on TimeMAE-PFM in Wearable Scenarios
Authors:
Junjie Zhang,
Zheming Zhang,
Huachen Xiang,
Yangquan Tan,
Linnan Huo,
Fengyi Wang
Abstract:
Physical function monitoring (PFM) plays a crucial role in healthcare especially for the elderly. Traditional assessment methods such as the Short Physical Performance Battery (SPPB) have failed to capture the full dynamic characteristics of physical function. Wearable sensors such as smart wristbands offer a promising solution to this issue. However, challenges exist, such as the computational complexity of machine learning methods and inadequate information capture. This paper proposes a multi-modal PFM framework based on an improved TimeMAE, which compresses time-series data into a low-dimensional latent space and integrates a self-enhanced attention module. This framework achieves effective monitoring of physical health, providing a solution for real-time and personalized assessment. The method is validated using the NHATS dataset, and the results demonstrate an accuracy of 70.6% and an AUC of 82.20%, surpassing other state-of-the-art time-series classification models.
Submitted 25 March, 2024;
originally announced April 2024.
-
Tunable Superconducting Magnetic Levitation with Self-Stability
Authors:
Qi Xu,
Yi Lin,
Yunfei Tan,
Jianzhao Geng
Abstract:
Magnetic levitation based on the flux pinning nature of type II superconductors has the merit of self-stability, making it appealing for applications such as high speed bearings, maglev trains, and space generators. However, such levitation systems physically rely on the superconductor pre-capturing magnetic flux (i.e. the field cooling process) before establishing the levitation state, which is nonadjustable afterwards. Moreover, practical type II superconductors in the levitation system inevitably suffer from various sources of energy losses, leading to continuous levitation force decay. These intrinsic drawbacks make superconducting maglev inflexible and impractical for long term operation. Here we propose and demonstrate a new form of superconducting maglev which is tunable and self-stable. The maglev system uses a closed-loop type II superconducting coil to lock the flux of a magnet, establishing self-stable levitation between the two objects. A flux pump is used to modulate the total magnetic flux of the coil without breaking its superconductivity, thus flexibly tuning levitation force and height while maintaining self-stability. For the first time, we experimentally demonstrate a self-stable type II superconducting maglev system which is able to: counteract long term levitation force decay, adjust levitation force and equilibrium position, and establish levitation under zero field cooling condition. These breakthroughs may bridge the gap between demonstrations and practical applications of type II superconducting maglevs.
Submitted 28 March, 2024;
originally announced March 2024.
-
EEG Based Generative Depression Discriminator
Authors:
Ziming Mao,
Hao wu,
Yongxi Tan,
Yuhe Jin
Abstract:
Depression is a very common but serious mood disorder. In this paper, we built a generative detection network (GDN) in accordance with three physiological laws. Our aim is for the neural network to learn the relevant brain activity from the EEG signal and, at the same time, to regenerate the target electrode signal from that brain activity. We trained two generators: the first learns the characteristics of depressed brain activity, and the second learns the characteristics of the control group's brain activity. In the test, a segment of EEG signal was fed into the two generators separately; if the relationship between the EEG signal and brain activity conforms to the characteristics of a certain category, then the signal generated by the generator of the corresponding category is more consistent with the original signal. It is thus possible to determine the category corresponding to a given segment of EEG signal. We obtained an accuracy of 92.30% on the MODMA dataset and 86.73% on the HUSM dataset. Moreover, this model is able to output explainable information, which can help the user discover possible misjudgments of the network. Our code will be released.
Submitted 19 January, 2024;
originally announced February 2024.
-
Automated Detection of Myopic Maculopathy in MMAC 2023: Achievements in Classification, Segmentation, and Spherical Equivalent Prediction
Authors:
Yihao Li,
Philippe Zhang,
Yubo Tan,
Jing Zhang,
Zhihan Wang,
Weili Jiang,
Pierre-Henri Conze,
Mathieu Lamard,
Gwenolé Quellec,
Mostafa El Habib Daho
Abstract:
Myopic macular degeneration is the most common complication of myopia and the primary cause of vision loss in individuals with pathological myopia. Early detection and prompt treatment are crucial in preventing vision impairment due to myopic maculopathy. This was the focus of the Myopic Maculopathy Analysis Challenge (MMAC), in which we participated. In Task 1 (classification of myopic maculopathy), we employed the contrastive learning framework, specifically SimCLR, to enhance classification accuracy by effectively capturing enriched features from unlabeled data. This approach not only improved the intrinsic understanding of the data but also elevated the performance of our classification model. For Task 2 (segmentation of myopic maculopathy plus lesions), we have developed independent segmentation models tailored for different lesion segmentation tasks and implemented a test-time augmentation strategy to further enhance the model's performance. As for Task 3 (prediction of spherical equivalent), we have designed a deep regression model based on the data distribution of the dataset and employed an integration strategy to enhance the model's prediction accuracy. The results we obtained are promising and have allowed us to position ourselves in the Top 6 of the classification task, the Top 2 of the segmentation task, and the Top 1 of the prediction task. The code is available at https://github.com/liyihao76/MMAC_LaTIM_Solution.
Submitted 7 January, 2024;
originally announced January 2024.
-
Multi-Objective Complementary Control
Authors:
Jiapeng Xu,
Xiang Chen,
Ying Tan,
Kemin Zhou
Abstract:
This paper proposes a novel multi-objective control framework for linear time-invariant systems in which performance and robustness can be achieved in a complementary way instead of a trade-off. In particular, a state-space solution is first established for a new stabilizing control structure consisting of two independently designed controllers coordinated with a Youla-type operator ${\bm Q}$. It is then shown by performance analysis that these two independently designed controllers operate in a naturally complementary way for a tracking control system, due to the coordination function of ${\bm Q}$ driven by the residual signal of a Luenberger observer. Moreover, it is pointed out that ${\bm Q}$ could be further optimized with an additional gain factor to achieve improved performance, through a data-driven methodology for a measured cost function.
Submitted 13 November, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
Adaptive Event-triggered Control For Strict-feedback Systems With Time-varying Parameters
Authors:
Yan Tan,
Liucang Wu,
Wenqi Liu
Abstract:
In this article, we develop a new adaptive event-triggered asymptotic control scheme for strict-feedback systems with fast time-varying parameters. To deal with time-varying parameters with unknown variation boundaries in the feedback path and the input path, we construct three adaptive laws for parameter estimation, two for the uncertain parameters in the feedback path and one for the uncertain parameters in the input path. In particular, two sets of tuning functions are introduced to avoid over-parametrization. Additionally, an event-triggering mechanism is embedded in this adaptive control framework to reduce the data transmission from the controller to the actuator. We also introduce a soft sign function to handle the perturbations caused by sampling errors to achieve asymptotic stability and avoid the so-called parameter drift. The stability analysis shows that the closed-loop system is globally uniformly asymptotically stable and the Zeno behavior can be excluded. Simulation results verify the effectiveness and performance of the proposed adaptive scheme.
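The role of an event-triggering mechanism in reducing controller-to-actuator transmissions can be illustrated with a much simpler setting than the one in this abstract. The sketch below assumes a hypothetical scalar plant x' = a·x + u with known parameters, a static feedback u = -k·x, and a fixed-threshold trigger; it is not the adaptive scheme of the paper, and a fixed threshold yields only convergence to a small neighbourhood of the origin, not the asymptotic stability claimed for the proposed design.

```python
def simulate_event_triggered(a=1.0, k=3.0, delta=0.05, x0=1.0, dt=1e-3, steps=10_000):
    """Euler simulation of x' = a*x + u_held with a fixed-threshold event trigger.

    All parameter values are illustrative assumptions. The actuator keeps the
    last transmitted input u_held; a new value is sent only when the freshly
    computed control differs from it by more than delta.
    """
    x = x0
    u_held = -k * x          # initial transmission
    transmissions = 1
    for _ in range(steps):
        u_ctrl = -k * x      # control the controller would like to apply
        if abs(u_ctrl - u_held) > delta:
            u_held = u_ctrl  # event: transmit the new control value
            transmissions += 1
        x += dt * (a * x + u_held)
    return x, transmissions


final_x, sent = simulate_event_triggered()
```

With these assumed values the state ends inside a band of roughly delta / (k - a) around zero while far fewer than `steps` control updates are transmitted, which is the trade-off an event trigger is meant to expose.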
Submitted 11 December, 2023;
originally announced December 2023.
-
Stain Consistency Learning: Handling Stain Variation for Automatic Digital Pathology Segmentation
Authors:
Michael Yeung,
Todd Watts,
Sean YW Tan,
Pedro F. Ferreira,
Andrew D. Scott,
Sonia Nielles-Vallespin,
Guang Yang
Abstract:
Stain variation is a unique challenge associated with automated analysis of digital pathology. Numerous methods have been developed to improve the robustness of machine learning methods to stain variation, but comparative studies have demonstrated limited benefits to performance. Moreover, methods to handle stain variation were largely developed for H&E stained data, with evaluation generally limited to classification tasks. Here we propose Stain Consistency Learning, a novel framework combining stain-specific augmentation with a stain consistency loss function to learn stain colour invariant features. We perform the first, extensive comparison of methods to handle stain variation for segmentation tasks, comparing ten methods on Masson's trichrome and H&E stained cell and nuclei datasets, respectively. We observed that stain normalisation methods resulted in equivalent or worse performance, while stain augmentation or stain adversarial methods demonstrated improved performance, with the best performance consistently achieved by our proposed approach. The code is available at: https://github.com/mlyg/stain_consistency_learning
Submitted 11 November, 2023;
originally announced November 2023.
-
Concealed Electronic Countermeasures of Radar Signal with Adversarial Examples
Authors:
Ruinan Ma,
Canjie Zhu,
Mingfeng Lu,
Yunjie Li,
Yu-an Tan,
Ruibin Zhang,
Ran Tao
Abstract:
Electronic countermeasures involving radar signals are an important aspect of modern warfare. Traditional electronic countermeasure techniques typically add large-scale interference signals to ensure interference effects, which can make attacks too obvious. In recent years, AI-based attack methods have emerged that can effectively solve this problem, but the attack scenarios are currently limited to time domain radar signal classification. In this paper, we focus on the time-frequency image classification scenario of radar signals. We first propose an attack pipeline for the time-frequency image scenario and a DITIMI-FGSM attack algorithm with high transferability. Then, we propose an STFT-based time domain signal attack (STDS) algorithm to solve the problem of non-invertibility in time-frequency analysis, thus obtaining the time-domain representation of the interference signal. Extensive experiments show that our attack pipeline is feasible and the proposed attack method has a high success rate.
Submitted 12 October, 2023;
originally announced October 2023.
-
Learning Regularized Monotone Graphon Mean-Field Games
Authors:
Fengzhuo Zhang,
Vincent Y. F. Tan,
Zhaoran Wang,
Zhuoran Yang
Abstract:
This paper studies two fundamental problems in regularized Graphon Mean-Field Games (GMFGs). First, we establish the existence of a Nash Equilibrium (NE) of any $λ$-regularized GMFG (for $λ\geq 0$). This result relies on weaker conditions than those in previous works for analyzing both unregularized GMFGs ($λ=0$) and $λ$-regularized MFGs, which are special cases of GMFGs. Second, we propose provably efficient algorithms to learn the NE in weakly monotone GMFGs, motivated by Lasry and Lions [2007]. Previous literature either only analyzed continuous-time algorithms or required extra conditions to analyze discrete-time algorithms. In contrast, we design a discrete-time algorithm and derive its convergence rate solely under weakly monotone conditions. Furthermore, we develop and analyze the action-value function estimation procedure during the online learning process, which is absent from algorithms for monotone GMFGs. This serves as a sub-module in our optimization algorithm. The efficiency of the designed algorithm is corroborated by empirical evaluations.
Submitted 12 October, 2023;
originally announced October 2023.
-
Deep Unrolling for Nonconvex Robust Principal Component Analysis
Authors:
Elizabeth Z. C. Tan,
Caroline Chaux,
Emmanuel Soubies,
Vincent Y. F. Tan
Abstract:
We design algorithms for Robust Principal Component Analysis (RPCA), which consists of decomposing a matrix into the sum of a low-rank matrix and a sparse matrix. We propose a deep unrolled algorithm based on an accelerated alternating projection algorithm which aims to solve RPCA in its nonconvex form. The proposed procedure combines the benefits of deep neural networks with the interpretability of the original algorithm, and it automatically learns hyperparameters. We demonstrate the unrolled algorithm's effectiveness on synthetic datasets and also on a face modeling problem, where it leads to both better numerical and visual performance.
Submitted 11 July, 2023;
originally announced July 2023.
-
Sufficient Conditions on Bipartite Consensus of Weakly Connected Matrix-weighted Networks
Authors:
Chongzhi Wang,
Haibin Shao,
Ying Tan,
Dewei Li
Abstract:
Recent advancements in bipartite consensus, a scenario where agents are divided into two disjoint sets with agents in the same set agreeing on a certain value and those in different sets agreeing on opposite or specifically related values, have highlighted its potential applications across various fields. Traditional research typically relies on the presence of a positive-negative spanning tree, which limits the practical applicability of bipartite consensus. This study relaxes that assumption by allowing for weak connectivity within the network, where paths can be weighted by semidefinite matrices. By exploring the algebraic constraints imposed by positive-negative trees and semidefinite paths, we derive sufficient conditions for achieving bipartite consensus. Our theoretical findings are validated through numerical results.
Submitted 28 September, 2024; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Dictionary Learning under Symmetries via Group Representations
Authors:
Subhroshekhar Ghosh,
Aaron Y. R. Low,
Yong Sheng Soh,
Zhuohang Feng,
Brendan K. Y. Tan
Abstract:
The dictionary learning problem can be viewed as a data-driven process to learn a suitable transformation so that data is sparsely represented directly from example data. In this paper, we examine the problem of learning a dictionary that is invariant under a pre-specified group of transformations. Natural settings include Cryo-EM, multi-object tracking, synchronization, pose estimation, etc. We specifically study this problem under the lens of mathematical representation theory. Leveraging the power of non-abelian Fourier analysis for functions over compact groups, we prescribe an algorithmic recipe for learning dictionaries that obey such invariances. We relate the dictionary learning problem in the physical domain, which is naturally modelled as being infinite dimensional, with the associated computational problem, which is necessarily finite dimensional. We establish that the dictionary learning problem can be effectively understood as an optimization instance over certain matrix orbitopes having a particular block-diagonal structure governed by the irreducible representations of the group of symmetries. This perspective enables us to introduce a band-limiting procedure which obtains dimensionality reduction in applications. We provide guarantees for our computational ansatz to provide a desirable dictionary learning outcome. We apply our paradigm to investigate the dictionary learning problem for the groups SO(2) and SO(3). While the SO(2)-orbitope admits an exact spectrahedral description, substantially less is understood about the SO(3)-orbitope. We describe a tractable spectrahedral outer approximation of the SO(3)-orbitope, and contribute an alternating minimization paradigm to perform optimization in this setting. We provide numerical experiments to highlight the efficacy of our approach in learning SO(3)-invariant dictionaries, both on synthetic and on real world data.
Submitted 25 July, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
Sentence Embedder Guided Utterance Encoder (SEGUE) for Spoken Language Understanding
Authors:
Yi Xuan Tan,
Navonil Majumder,
Soujanya Poria
Abstract:
The pre-trained speech encoder wav2vec 2.0 performs very well on various spoken language understanding (SLU) tasks. However, on many tasks, it trails behind text encoders with textual input. To improve the understanding capability of SLU encoders, various studies have used knowledge distillation to transfer knowledge from natural language understanding (NLU) encoders. We use a very simple method of distilling from a textual sentence embedder directly into wav2vec 2.0 as pre-training, utilizing paired audio-text datasets. We observed that this method is indeed capable of improving SLU task performance in fine-tuned settings, as well as full-data and few-shot transfer on a frozen encoder. However, the model performs worse on certain tasks, highlighting the strengths and weaknesses of our approach.
Submitted 20 May, 2023;
originally announced May 2023.
-
Model-driven CT reconstruction algorithm for nano-resolution X-ray phase contrast imaging
Authors:
Xuebao Cai,
Yuhang Tan,
Ting Su,
Dong Liang,
Hairong Zheng,
Jinyou Xu,
Peiping Zhu,
Yongshuai Ge
Abstract:
The low-density imaging performance of a zone-plate-based nano-resolution hard X-ray computed tomography (CT) system can be significantly improved by incorporating a grating-based Lau interferometer. Due to diffraction, however, the acquired nano-resolution phase signal may suffer from a splitting problem, which impedes the direct reconstruction of phase contrast CT (nPCT) images. To overcome this, a new model-driven nPCT image reconstruction algorithm is developed in this study. In it, the diffraction procedure is mathematically modeled as a matrix B, from which the projections without signal splitting can be generated by inversion. Furthermore, a penalized weighted least-squares model with total variation (PWLS-TV) is employed to denoise these projections, from which nPCT images with high accuracy are directly reconstructed. Numerical and physical experiments demonstrate that this new algorithm is able to work with phase projections having any splitting distance. Results also reveal that nPCT images with a higher signal-to-noise ratio (SNR) would be reconstructed from projections with larger signal splittings. In conclusion, a novel model-driven nPCT image reconstruction algorithm with high accuracy and robustness is verified for Lau-interferometer-based hard X-ray nano-resolution phase contrast imaging.
Submitted 13 October, 2023; v1 submitted 14 May, 2023;
originally announced May 2023.
-
Robust Tracking Control for Nonlinear Systems: Performance optimization via extremum seeking
Authors:
Jiapeng Xu,
Ying Tan,
Xiang Chen
Abstract:
This paper presents a controller design and optimization framework for nonlinear dynamic systems to track a given reference signal in the presence of disturbances when the task is repeated over a finite-time interval. This novel framework mainly consists of two steps. The first step is to design a robust linear quadratic tracking controller based on the existing control structure with a Youla-type filter $\tilde Q$. In the second step, an extra degree of freedom, a parameterization in terms of $\tilde Q$, is added to this design framework. This extra design parameter is tuned iteratively from the measured tracking cost function under the given disturbances and modeling uncertainties to achieve the best transient performance. The proposed method is validated with simulations on a Furuta inverted pendulum, showing significant tracking performance improvement.
Submitted 31 March, 2023;
originally announced April 2023.
-
Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
Authors:
Yi Yu,
Yufei Wang,
Wenhan Yang,
Shijian Lu,
Yap-peng Tan,
Alex C. Kot
Abstract:
Recent deep-learning-based compression methods have achieved superior performance compared with traditional approaches. However, deep learning models have proven to be vulnerable to backdoor attacks, where some specific trigger patterns added to the input can lead to malicious behavior of the models. In this paper, we present a novel backdoor attack with multiple triggers against learned image compression models. Motivated by the widely used discrete cosine transform (DCT) in existing compression systems and standards, we propose a frequency-based trigger injection model that adds triggers in the DCT domain. In particular, we design several attack objectives for various attacking scenarios, including: 1) attacking compression quality in terms of bit-rate and reconstruction quality; 2) attacking task-driven measures, such as down-stream face recognition and semantic segmentation. Moreover, a novel simple dynamic loss is designed to balance the influence of different loss terms adaptively, which helps achieve more efficient training. Extensive experiments show that with our trained trigger injection models and simple modification of encoder parameters (of the compression model), the proposed attack can successfully inject several backdoors with corresponding triggers in a single image compression model.
Submitted 28 February, 2023;
originally announced February 2023.
-
Event-triggered Hybrid Energy-aware Scheduling in Manufacturing Systems
Authors:
Zhean Shao,
Wen Li,
Ying Tan
Abstract:
Incorporating renewable energy sources (RESs) into manufacturing systems has been an active research area in order to address many challenges originating from the unpredictable nature of RESs such as photovoltaics. In energy-aware scheduling for manufacturing systems, traditional off-line scheduling techniques cannot always work well due to their lack of robustness with respect to uncertainties coming from imprecise models or unexpected situations. On the other hand, on-line scheduling or rescheduling, which can improve the robustness by using the model and the latest measurements simultaneously, suffers from a high computational cost. This work proposes a hybrid scheduling framework, which combines the advantages of both off-line scheduling and on-line scheduling, to provide a balanced solution between robustness and computational cost. A novel concept of a partially-dispatchable state is introduced. It can be treated as a constant in scheduling when the model works well. When the model does not work well, it is triggered as the variable to tune to improve the performance. Such an event-triggered structure can reduce the number of reschedulings and the computational cost while achieving a reasonable performance and enhancing system robustness. Moreover, the choice of partially-dispatchable state also provides an extra design freedom in achieving green manufacturing. Simulation examples on a manufacturing system, which consists of a 100-kW solar photovoltaic system, a 10-machine flow shop production line, a 50-kWh energy storage system, a 100-kW gas turbine, and the grid for power supply, demonstrate the validity and applicability of this event-triggered hybrid scheduling (ETHS) framework.
Submitted 1 February, 2023;
originally announced February 2023.
-
A Comprehensive Survey on Heart Sound Analysis in the Deep Learning Era
Authors:
Zhao Ren,
Yi Chang,
Thanh Tam Nguyen,
Yang Tan,
Kun Qian,
Björn W. Schuller
Abstract:
Heart sound auscultation has been applied in clinical usage for early screening of cardiovascular diseases. Due to the high demand for auscultation expertise, automatic auscultation can help with auxiliary diagnosis and reduce the burden of training professional clinicians. Nevertheless, there is a limit to classic machine learning's performance improvement in the era of big data. Deep learning has outperformed classic machine learning in many research fields, as it employs more complex model architectures with a stronger capability of extracting effective representations. Moreover, it has been successfully applied to heart sound analysis in the past years. As most review works about heart sound analysis were carried out before 2017, the present survey is the first to provide a comprehensive overview summarising papers on heart sound analysis with deep learning published in 2017--2022. This work introduces both classic machine learning and deep learning for comparison, and further offers insights about the advances and future research directions in deep learning for heart sound analysis. Our repository is publicly available at \url{https://github.com/zhaoren91/awesome-heart-sound-analysis}.
Submitted 11 May, 2024; v1 submitted 23 January, 2023;
originally announced January 2023.
-
Finding the Most Transferable Tasks for Brain Image Segmentation
Authors:
Yicong Li,
Yang Tan,
Jingyun Yang,
Yang Li,
Xiao-Ping Zhang
Abstract:
Although many studies have successfully applied transfer learning to medical image segmentation, very few of them have investigated the selection strategy when multiple source tasks are available for transfer. In this paper, we propose a prior-knowledge-guided and transferability-based framework to select the best source tasks among a collection of brain image segmentation tasks, to improve the transfer learning performance on the given target task. The framework consists of modality analysis, RoI (region of interest) analysis, and transferability estimation, such that the source task selection can be refined step by step. Specifically, we adapt the state-of-the-art analytical transferability estimation metrics to medical image segmentation tasks and further show that their performance can be significantly boosted by filtering candidate source tasks based on modality and RoI characteristics. Our experiments on brain matter, brain tumor, and white matter hyperintensities segmentation datasets reveal that transferring from different tasks under the same modality is often more successful than transferring from the same task under different modalities. Furthermore, within the same modality, transferring from the source task that has stronger RoI shape similarity with the target task can significantly improve the final transfer performance. Such similarity can be captured using the Structural Similarity index in the label space.
Submitted 2 January, 2023;
originally announced January 2023.
-
On Robust Observer Design for System Motion on SE(3) Using Onboard Visual Sensors
Authors:
Tong Zhang,
Ying Tan,
Xiang Chen,
Zike Lei
Abstract:
Onboard visual sensing has been widely used in unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs), which can be modeled as dynamic systems on SE(3). The onboard sensing outputs of the dynamic system can usually be used to derive the relative position between the feature marks and the system, but they are subject to an explicit geometrical constraint. Such a visual geometrical constraint makes the design of the visual observer on SE(3) very challenging, as it causes a time-varying or switching visible set due to the varying number of feature marks in this set along different trajectories. Moreover, the possibility of having mis-identified feature marks and modeling uncertainties might result in a divergent estimation error. This paper proposes a new robust observer design method that can accommodate these uncertainties from onboard visual sensing. The key design idea for this observer is to estimate the visible set and identify the mis-identified features from the measurements. Based on the identified uncertainties, a switching strategy is proposed to ensure bounded estimation error for any given trajectory over a fixed time interval. Simulation results are provided to demonstrate the effectiveness of the proposed robust observer.
Submitted 21 March, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
Segmentation, Classification, and Quality Assessment of UW-OCTA Images for the Diagnosis of Diabetic Retinopathy
Authors:
Yihao Li,
Rachid Zeghlache,
Ikram Brahim,
Hui Xu,
Yubo Tan,
Pierre-Henri Conze,
Mathieu Lamard,
Gwenolé Quellec,
Mostafa El Habib Daho
Abstract:
Diabetic Retinopathy (DR) is a severe complication of diabetes that can cause blindness. Although effective treatments exist (notably laser) to slow the progression of the disease and prevent blindness, the best treatment remains prevention through regular check-ups (at least once a year) with an ophthalmologist. Optical Coherence Tomography Angiography (OCTA) allows for the visualization of the retinal vascularization and the choroid at the microvascular level in great detail. This allows doctors to diagnose DR with more precision. In recent years, algorithms for DR diagnosis have emerged along with the development of deep learning and the improvement of computer hardware. However, these usually focus on retina photography. There are no current methods that can automatically analyze DR using Ultra-Wide OCTA (UW-OCTA). The Diabetic Retinopathy Analysis Challenge 2022 (DRAC22) provides a standardized UW-OCTA dataset to train and test the effectiveness of various algorithms on three tasks: lesion segmentation, quality assessment, and DR grading. In this paper, we present our solutions for the three tasks of the DRAC22 challenge. The obtained results are promising and have allowed us to position ourselves in the TOP 5 of the segmentation task, the TOP 4 of the quality assessment task, and the TOP 3 of the DR grading task. The code is available at \url{https://github.com/Mostafa-EHD/Diabetic_Retinopathy_OCTA}.
Submitted 21 November, 2022;
originally announced November 2022.
-
Robust output regulation of linear system subject to modeled and unmodeled uncertainty
Authors:
Zhicheng Zhang,
Zhiqiang Zuo,
Xiang Chen,
Ying Tan,
Yijing Wang
Abstract:
In this paper, a novel robust output regulation control framework is proposed for systems subject to noise, modeled disturbance and unmodeled disturbance to achieve tracking performance and robustness simultaneously. The output regulation scheme is utilized in the framework to track the reference in the presence of modeled disturbance, and the effect of unmodeled disturbance is reduced by an $\mathcal{H}_\infty$ compensator. A Kalman filter can also be introduced in the stabilization loop to deal with the white noise. Furthermore, the tracking error in the presence/absence of noise and disturbance is estimated. The effectiveness and performance of our proposed control framework are verified in a numerical example by applying it to the Furuta Inverted Pendulum system.
Submitted 26 October, 2022;
originally announced October 2022.
-
Are Macula or Optic Nerve Head Structures better at Diagnosing Glaucoma? An Answer using AI and Wide-Field Optical Coherence Tomography
Authors:
Charis Y. N. Chiang,
Fabian Braeu,
Thanadet Chuangsuwanich,
Royston K. Y. Tan,
Jacqueline Chua,
Leopold Schmetterer,
Alexandre Thiery,
Martin Buist,
Michaël J. A. Girard
Abstract:
Purpose: (1) To develop a deep learning algorithm to automatically segment structures of the optic nerve head (ONH) and macula in 3D wide-field optical coherence tomography (OCT) scans; (2) To assess whether 3D macula or ONH structures (or the combination of both) provide the best diagnostic power for glaucoma. Methods: A cross-sectional comparative study was performed which included wide-field swept-source OCT scans from 319 glaucoma subjects and 298 non-glaucoma subjects. All scans were compensated to improve deep-tissue visibility. We developed a deep learning algorithm to automatically label all major ONH tissue structures by using 270 manually annotated B-scans for training. The performance of our algorithm was assessed using the Dice coefficient (DC). A glaucoma classification algorithm (3D CNN) was then designed using a combination of 500 OCT volumes and their corresponding automatically segmented masks. This algorithm was trained and tested on 3 datasets: OCT scans cropped to contain the macular tissues only, those cropped to contain the ONH tissues only, and the full wide-field OCT scans. The classification performance for each dataset was reported using the AUC. Results: Our segmentation algorithm was able to segment ONH and macular tissues with a DC of 0.94 $\pm$ 0.003. The classification algorithm was best able to diagnose glaucoma using wide-field 3D-OCT volumes with an AUC of 0.99 $\pm$ 0.01, followed by ONH volumes with an AUC of 0.93 $\pm$ 0.06, and finally macular volumes with an AUC of 0.91 $\pm$ 0.11. Conclusions: This study showed that using wide-field OCT, as compared to the typical OCT images containing just the ONH or macula, may allow for a significantly improved glaucoma diagnosis. This may encourage the mainstream adoption of 3D wide-field OCT scans. For clinical AI studies that use traditional machines, we would recommend the use of ONH scans as opposed to macula scans.
Submitted 12 October, 2022;
originally announced October 2022.
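The abstract above reports segmentation quality with the Dice coefficient (DC). As an illustration only, a minimal sketch of that metric for two binary masks follows. The function name `dice_coefficient`, the flat 0/1 list representation, and the empty-mask convention are assumptions made for this example and are not taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 values (a hypothetical representation
    chosen for this sketch; real pipelines typically use arrays).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Count positions labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground size of the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return 2.0 * intersection / total
```

A value of 1.0 means the two masks coincide and 0.0 means they share no foreground element.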
-
Six-center Assessment of CNN-Transformer with Belief Matching Loss for Patient-independent Seizure Detection in EEG
Authors:
Wei Yan Peh,
Prasanth Thangavel,
Yuanyuan Yao,
John Thomas,
Yee Leng Tan,
Justin Dauwels
Abstract:
Neurologists typically identify epileptic seizures from electroencephalograms (EEGs) by visual inspection. This process is often time-consuming, especially for EEG recordings that last hours or days. To expedite the process, a reliable, automated, and patient-independent seizure detector is essential. However, developing a patient-independent seizure detector is challenging as seizures exhibit diverse characteristics across patients and recording devices. In this study, we propose a patient-independent seizure detector to automatically detect seizures in both scalp EEG and intracranial EEG (iEEG). First, we deploy a convolutional neural network with transformers and belief matching loss to detect seizures in single-channel EEG segments. Next, we extract regional features from the channel-level outputs to detect seizures in multi-channel EEG segments. Then, we apply postprocessing filters to the segment-level outputs to determine seizures' start and end points in multi-channel EEGs. Finally, we introduce the minimum overlap evaluation scoring as an evaluation metric that accounts for minimum overlap between the detection and seizure, improving upon existing assessment metrics. We trained the seizure detector on the Temple University Hospital Seizure (TUH-SZ) dataset and evaluated it on five independent EEG datasets. We evaluated the system with the following metrics: sensitivity (SEN), precision (PRE), and average and median false positive rate per hour (aFPR/h and mFPR/h). Across four adult scalp EEG and iEEG datasets, we obtained SEN of 0.617-1.00, PRE of 0.534-1.00, aFPR/h of 0.425-2.002, and mFPR/h of 0-1.003. The proposed seizure detector can detect seizures in adult EEGs and takes less than 15 s for a 30-minute EEG. Hence, this system could aid clinicians in reliably identifying seizures expeditiously, allocating more time for devising proper treatment.
Submitted 22 November, 2022; v1 submitted 29 July, 2022;
originally announced August 2022.
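The metrics named in the abstract above (SEN, PRE, false positives per hour) can be illustrated with a minimal sketch. It assumes event-level counts are already available; it does not reproduce the paper's minimum overlap evaluation scoring, and the function name `detection_metrics` and its parameters are hypothetical.

```python
def detection_metrics(true_positives, false_positives, false_negatives, recording_hours):
    """Return (sensitivity, precision, false positives per hour) for one recording.

    The counts are assumed to come from some event-matching rule decided
    elsewhere; this sketch only applies the standard definitions.
    """
    if recording_hours <= 0:
        raise ValueError("recording_hours must be positive")
    detected_or_missed = true_positives + false_negatives
    all_detections = true_positives + false_positives
    # Sensitivity: fraction of true seizures that were detected.
    sensitivity = true_positives / detected_or_missed if detected_or_missed else 0.0
    # Precision: fraction of detections that were true seizures.
    precision = true_positives / all_detections if all_detections else 0.0
    # False positive rate normalised by recording duration.
    fpr_per_hour = false_positives / recording_hours
    return sensitivity, precision, fpr_per_hour
```

Averaging or taking the median of the per-recording false positive rate over a dataset would give quantities analogous to aFPR/h and mFPR/h.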
-
Asymptotic Nash Equilibrium for the $M$-ary Sequential Adversarial Hypothesis Testing Game
Authors:
Jiachun Pan,
Yonglong Li,
Vincent Y. F. Tan
Abstract:
In this paper, we consider a novel $M$-ary sequential hypothesis testing problem in which an adversary is present and perturbs the distributions of the samples before the decision maker observes them. This problem is formulated as a sequential adversarial hypothesis testing game played between the decision maker and the adversary. This game is a zero-sum and strategic one. We assume the adversary is active under \emph{all} hypotheses and knows the underlying distribution of observed samples. We adopt this framework as it is the worst-case scenario from the perspective of the decision maker. The goal of the decision maker is to minimize the expectation of the stopping time to ensure that the test is as efficient as possible; the adversary's goal is, instead, to maximize the stopping time. We derive a pair of strategies under which the asymptotic Nash equilibrium of the game is attained. We also consider the case in which the adversary is not aware of the underlying hypothesis and hence is constrained to apply the same strategy regardless of which hypothesis is in effect. Numerical results corroborate our theoretical findings.
Submitted 20 June, 2022;
originally announced June 2022.
-
Extremely Low-light Image Enhancement with Scene Text Restoration
Authors:
Pohao Hsu,
Che-Tsung Lin,
Chun Chet Ng,
Jie-Long Kew,
Mei Yih Tan,
Shang-Hong Lai,
Chee Seng Chan,
Christopher Zach
Abstract:
Deep learning-based methods have made impressive progress in enhancing extremely low-light images, and the quality of the reconstructed images has generally improved. However, we found that most of these methods could not sufficiently recover the image details, for instance, the texts in the scene. In this paper, a novel image enhancement framework is proposed to precisely restore the scene texts while simultaneously improving the overall image quality under extremely low-light conditions. Specifically, we employ a self-regularised attention map, an edge map, and a novel text detection loss. In addition, leveraging synthetic low-light images is beneficial for image enhancement on the genuine ones in terms of text detection. The quantitative and qualitative experimental results show that the proposed model outperforms state-of-the-art methods in image restoration, text detection, and text spotting on the See In the Dark and ICDAR15 datasets.
Submitted 1 April, 2022;
originally announced April 2022.