Search | arXiv e-print repository

Unsupervised Single-Channel Speech Separation with a Diffusion Prior under Speaker-Embedding Guidance

Authors: Runwu Shi, Kai Li, Chang Li, Jiang Wang, Sihan Tan, Kazuhiro Nakadai

Abstract: Speech separation is a fundamental task in audio processing, typically addressed with fully supervised systems trained on paired mixtures. While effective, such systems typically rely on synthetic data pipelines, which may not reflect real-world conditions. Instead, we revisit the source-model paradigm, training a diffusion generative model solely on anechoic speech and formulating separation as a diffusion inverse problem. However, unconditional diffusion models lack speaker-level conditioning: they can capture local acoustic structure but produce temporally inconsistent speaker identities in separated sources. To address this limitation, we propose Speaker-Embedding guidance that, during the reverse diffusion process, maintains speaker coherence within each separated track while driving embeddings of different speakers further apart. In addition, we propose a new separation-oriented solver tailored for speech separation, and both strategies effectively enhance performance on the challenging task of unsupervised source-model-based speech separation, as confirmed by extensive experimental results. Audio samples and code are available at https://runwushi.github.io/UnSepDiff_demo.

Submitted 29 September, 2025; originally announced September 2025.

Comments: 5 pages, 2 figures, submitted to ICASSP 2026

arXiv:2509.10055 [pdf]

Data-driven optimization of sparse sensor placement in thermal hydraulic experiments

Authors: Xicheng Wang, Yun. Feng, Dmitry Grishchenko, Pavel Kudinov, Ruifeng Tian, Sichao Tan

Abstract: Thermal-Hydraulic (TH) experiments provide valuable insight into the physics of heat and mass transfer and qualified data for code development, calibration and validation. However, measurements are typically collected from sparsely distributed sensors, offering limited coverage over the domain of interest and phenomena of interest. Determination of the spatial configuration of these sensors is crucial and challenging during the pre-test design stage. This paper develops a data-driven framework for optimizing sensor placement in TH experiments, including (i) a sensitivity analysis to construct datasets, (ii) Proper Orthogonal Decomposition (POD) for dimensionality reduction, and (iii) QR factorization with column pivoting to determine optimal sensor configuration under spatial constraints. The framework is demonstrated on a test conducted in the TALL-3D Lead-bismuth eutectic (LBE) loop. In this case, the utilization of optical techniques, such as Particle Image Velocimetry (PIV), is impractical. The quantification of momentum and energy transport therefore relies heavily on readings from Thermocouples (TCs). The test section was previously instrumented with many TCs determined through a manual process combining simulation results with expert judgement. The proposed framework provides a systematic and automated approach for sensor placement. The resulting TCs exhibit high sensitivity to the variation of uncertain input parameters and enable accurate full field reconstruction while maintaining robustness against measurement noise.

Submitted 12 September, 2025; originally announced September 2025.
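
The abstract above names QR factorization with column pivoting over a POD basis as its sensor-selection step. A minimal pure-Python sketch of the pivoting idea follows. It is not the authors' implementation: the function name `select_sensors`, the `allowed` argument used to mimic spatial constraints, and the toy basis are all assumptions made for illustration.

```python
import math


def select_sensors(basis, num_sensors, allowed=None):
    """Greedy column-pivoted selection (the pivot rule of a pivoted QR).

    basis: list of rows, one per candidate location; each row holds the
           values of r modes at that location (hypothetical POD modes).
    allowed: optional iterable of location indices that may host a sensor
             (an assumed stand-in for spatial constraints).
    Returns the chosen location indices in pivot order.
    """
    residual = [list(row) for row in basis]  # work on a copy
    candidates = sorted(allowed) if allowed is not None else list(range(len(basis)))
    chosen = []
    for _ in range(num_sensors):
        # Pivot: the allowed location whose residual vector has the largest norm.
        norms = {i: sum(v * v for v in residual[i]) for i in candidates}
        pivot = max(candidates, key=lambda i: norms[i])
        if norms[pivot] <= 1e-12:
            break  # remaining locations carry no new information
        chosen.append(pivot)
        scale = math.sqrt(norms[pivot])
        q = [v / scale for v in residual[pivot]]
        # Deflate: remove the component along q from every location.
        for row in residual:
            proj = sum(a * b for a, b in zip(row, q))
            for k in range(len(row)):
                row[k] -= proj * q[k]
    return chosen


# Toy basis: 4 candidate locations, 2 modes.
toy_basis = [[1.0, 0.0], [0.9, 0.1], [0.0, 2.0], [0.1, 0.1]]
print(select_sensors(toy_basis, 2))
print(select_sensors(toy_basis, 2, allowed={1, 3}))
```

In practice a library routine such as a pivoted QR on the transposed basis would replace the hand-written loop; the loop is spelled out here only to show how each pivot picks the location least explained by the sensors already chosen.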

arXiv:2508.11115 [pdf, ps, other]

UWB-PostureGuard: A Privacy-Preserving RF Sensing System for Continuous Ergonomic Sitting Posture Monitoring

Authors: Haotang Li, Zhenyu Qi, Sen He, Kebin Peng, Sheng Tan, Yili Ren, Tomas Cerny, Jiyue Zhao, Zi Wang

Abstract: Improper sitting posture during prolonged computer use has become a significant public health concern. Traditional posture monitoring solutions face substantial barriers, including privacy concerns with camera-based systems and user discomfort with wearable sensors. This paper presents UWB-PostureGuard, a privacy-preserving ultra-wideband (UWB) sensing system that advances mobile technologies for preventive health management through continuous, contactless monitoring of ergonomic sitting posture. Our system leverages commercial UWB devices, utilizing comprehensive feature engineering to extract multiple ergonomic sitting posture features. We develop PoseGBDT to effectively capture temporal dependencies in posture patterns, addressing limitations of traditional frame-wise classification approaches. Extensive real-world evaluation across 10 participants and 19 distinct postures demonstrates exceptional performance, achieving 99.11% accuracy while maintaining robustness against environmental variables such as clothing thickness, additional devices, and furniture configurations. Our system provides a scalable, privacy-preserving mobile health solution on existing platforms for proactive ergonomic management, improving quality of life at low costs.

Submitted 14 August, 2025; originally announced August 2025.

arXiv:2507.15385 [pdf, ps, other]

Transformer-based Deep Learning Model for Joint Routing and Scheduling with Varying Electric Vehicle Numbers

Authors: Jun Kang Yap, Vishnu Monn Baskaran, Wen Shan Tan, Ze Yang Ding, Hao Wang, David L. Dowe

Abstract: The growing integration of renewable energy sources in modern power systems has introduced significant operational challenges due to their intermittent and uncertain outputs. In recent years, mobile energy storage systems (ESSs) have emerged as a popular flexible resource for mitigating these challenges. Compared to stationary ESSs, mobile ESSs offer additional spatial flexibility, enabling cost-effective energy delivery through the transportation network. However, the widespread deployment of mobile ESSs is often hindered by the high investment cost, which has motivated researchers to investigate utilising more readily available alternatives, such as electric vehicles (EVs), as mobile energy storage units instead. Hence, we explore this opportunity with a MIP-based day-ahead electric vehicle joint routing and scheduling problem in this work. However, solving the problem in a practical setting can often be computationally intractable since the existence of binary variables makes it combinatorially challenging. Therefore, we propose to simplify the problem's solution process for a MIP solver by pruning the solution search space with a transformer-based deep learning (DL) model. This is done by training the model to rapidly predict the optimal binary solutions. In addition, unlike many existing DL approaches that assume fixed problem structures, the proposed model is designed to accommodate problems with EV fleets of any size. This flexibility is essential since frequent re-training can introduce significant computational overhead. We evaluated the approach with simulations on the IEEE 33-bus system coupled with the Nguyen-Dupuis transportation network.

Submitted 21 July, 2025; originally announced July 2025.

Comments: Accepted at Industry Applications Society Annual Meeting (IAS 2025)

arXiv:2507.15307 [pdf, ps, other]

Joint Optimisation of Electric Vehicle Routing and Scheduling: A Deep Learning-Driven Approach for Dynamic Fleet Sizes

Authors: Jun Kang Yap, Vishnu Monn Baskaran, Wen Shan Tan, Ze Yang Ding, Hao Wang, David L. Dowe

Abstract: Electric Vehicles (EVs) are becoming increasingly prevalent nowadays, with studies highlighting their potential as mobile energy storage systems to provide grid support. Realising this potential requires effective charging coordination, which is often formulated as a mixed-integer programming (MIP) problem. However, MIP problems are NP-hard and often intractable when applied to time-sensitive tasks. To address this limitation, we propose a deep learning assisted approach for optimising a day-ahead EV joint routing and scheduling problem with a varying number of EVs. This problem simultaneously optimises EV routing, charging, discharging and generator scheduling within a distribution network with renewable energy sources. A convolutional neural network is trained to predict the binary variables, thereby reducing the solution search space and enabling solvers to determine the remaining variables more efficiently. Additionally, a padding mechanism is included to handle the changes in input and output sizes caused by a varying number of EVs, thus eliminating the need for re-training. In a case study on the IEEE 33-bus system and Nguyen-Dupuis transportation network, our approach reduced runtime by 97.8% when compared to an unassisted MIP solver, while retaining 99.5% feasibility and deviating less than 0.01% from the optimal solution.

Submitted 21 July, 2025; originally announced July 2025.

Comments: Accepted at International Joint Conference on Neural Networks (IJCNN 2025)
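
The abstract above mentions a padding mechanism that lets one trained network handle a varying number of EVs. The abstract does not describe the scheme itself, so the following is only a sketch of one common way to do it: pad per-EV inputs to a fixed maximum fleet size and keep a mask so that predictions for padded slots can be dropped. The names `pad_fleet`, `unpad_predictions`, `max_evs`, and the feature layout are hypothetical.

```python
def pad_fleet(ev_features, max_evs, pad_value=0.0):
    """Pad per-EV feature vectors to a fixed fleet size and return a mask.

    ev_features: list of equal-length feature vectors, one per real EV
                 (the feature layout is an assumption for illustration).
    Returns (padded_features, mask) where mask[i] is 1 for a real EV
    and 0 for a padded slot.
    """
    num_real = len(ev_features)
    if num_real > max_evs:
        raise ValueError("fleet is larger than max_evs")
    width = len(ev_features[0]) if ev_features else 0
    padding = [[pad_value] * width for _ in range(max_evs - num_real)]
    padded = [list(row) for row in ev_features] + padding
    mask = [1] * num_real + [0] * (max_evs - num_real)
    return padded, mask


def unpad_predictions(predictions, mask):
    """Keep only the predictions that belong to real EVs."""
    return [p for p, m in zip(predictions, mask) if m == 1]


# Two real EVs, network sized for four; features are (state of charge, start node).
padded, mask = pad_fleet([[0.8, 3.0], [0.5, 1.0]], max_evs=4)
print(padded)
print(mask)
# Pretend the network returned one binary decision per slot.
print(unpad_predictions([1, 0, 1, 1], mask))
```

The mask matters on the output side: binary decisions predicted for padded slots are meaningless and have to be discarded before the remaining variables are handed to a solver.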

arXiv:2507.00270 [pdf, ps, other]

EMSpice 2.1: A Coupled EM and IR Drop Analysis Tool with Joule Heating and Thermal Map Integration for VLSI Reliability

Authors: Subed Lamichhane, Haotian Lu, Sheldon X. -D. Tan

Abstract: Electromigration (EM) remains a critical reliability concern in current and future copper-based VLSI circuits. As technology scales down, EM-induced IR drop becomes increasingly severe. While several EM-aware IR drop analysis tools have been proposed, few incorporate the real impact of temperature distribution on both EM and IR drop effects. In this work, we introduce EMSpice 2.1, an enhanced tool built upon the existing coupled IR-EM analysis framework, EMSpice 2.0, for EM-aware IR drop analysis. For the first time, EMSpice 2.1 uniquely integrates Joule heating effects and practical thermal maps derived from actual chip conditions. Additionally, it features improved interoperability with commercial EDA tools, facilitating more comprehensive EM and IR drop sign-off analysis. Our findings demonstrate that specific hotspot patterns significantly impact the lifetime of interconnects and overall chip reliability due to EM failures. Furthermore, our tool exhibits strong agreement with industry-standard tools such as COMSOL, achieving a speedup of over 200 times while maintaining high accuracy.

Submitted 30 June, 2025; originally announced July 2025.

Comments: 4 Pages, accepted to SMACD 2025

arXiv:2506.00564 [pdf, ps, other]

Image Restoration Learning via Noisy Supervision in the Fourier Domain

Authors: Haosen Liu, Jiahao Liu, Shan Tan, Edmund Y. Lam

Abstract: Noisy supervision refers to supervising image restoration learning with noisy targets. It can alleviate the data collection burden and enhance the practical applicability of deep learning techniques. However, existing methods suffer from two key drawbacks. Firstly, they are ineffective in handling spatially correlated noise commonly observed in practical applications such as low-light imaging and remote sensing. Secondly, they rely on pixel-wise loss functions that only provide limited supervision information. This work addresses these challenges by leveraging the Fourier domain. We highlight that the Fourier coefficients of spatially correlated noise exhibit sparsity and independence, making them easier to handle. Additionally, Fourier coefficients contain global information, enabling more significant supervision. Motivated by these insights, we propose to establish noisy supervision in the Fourier domain. We first prove that Fourier coefficients of a wide range of noise converge in distribution to the Gaussian distribution. Exploiting this statistical property, we establish the equivalence between using noisy targets and clean targets in the Fourier domain. This leads to a unified learning framework applicable to various image restoration tasks, diverse network architectures, and different noise models. Extensive experiments validate the outstanding performance of this framework in terms of both quantitative indices and perceptual quality.

Submitted 31 May, 2025; originally announced June 2025.

arXiv:2505.16807 [pdf, ps, other]

Chirp Delay-Doppler Domain Modulation: A New Paradigm of Integrated Sensing and Communication for Autonomous Vehicles

Authors: Zhuoran Li, Shufeng Tan, Zhen Gao, Yi Tao, Zhonghuai Wu, Zhongxiang Li, Chun Hu, Dezhi Zheng

Abstract: Autonomous driving is reshaping the way humans travel, with millimeter wave (mmWave) radar playing a crucial role in this transformation to enable vehicle-to-everything (V2X). Although chirp is widely used in mmWave radar systems for its strong sensing capabilities, the lack of integrated communication functions in existing systems may limit further advancement of autonomous driving. In light of this, we first design "dedicated chirps" tailored for sensing chirp signals in the environment, facilitating the identification of idle time-frequency resources. Based on these dedicated chirps, we propose a chirp-division multiple access (Chirp-DMA) scheme, enabling multiple pairs of mmWave radar transceivers to perform integrated sensing and communication (ISAC) without interference. Subsequently, we propose two chirp-based delay-Doppler domain modulation schemes that enable each pair of mmWave radar transceivers to simultaneously sense and communicate within their respective time-frequency resource blocks. The modulation schemes are based on different multiple-input multiple-output (MIMO) radar schemes: the time division multiplexing (TDM)-based scheme offers higher communication rates, while the Doppler division multiplexing (DDM)-based scheme is suitable for working in a lower signal-to-noise ratio range. We then validate the effectiveness of the proposed DDM-based scheme through simulations. Finally, we present some challenges and issues that need to be addressed to advance ISAC in V2X for better autonomous driving. Simulation codes are provided to reproduce the results in this paper: https://github.com/LiZhuoRan0/2025-IEEE-Network-ChirpDelayDopplerModulationISAC.

Submitted 22 May, 2025; originally announced May 2025.

arXiv:2503.11324 [pdf, other]

Safe-VAR: Safe Visual Autoregressive Model for Text-to-Image Generative Watermarking

Authors: Ziyi Wang, Songbai Tan, Gang Xu, Xuerui Qiu, Hongbin Xu, Xin Meng, Ming Li, Fei Richard Yu

Abstract: With the success of autoregressive learning in large language models, it has become a dominant approach for text-to-image generation, offering high efficiency and visual quality. However, invisible watermarking for visual autoregressive (VAR) models remains underexplored, despite its importance in misuse prevention. Existing watermarking methods, designed for diffusion models, often struggle to adapt to the sequential nature of VAR models. To bridge this gap, we propose Safe-VAR, the first watermarking framework specifically designed for autoregressive text-to-image generation. Our study reveals that the timing of watermark injection significantly impacts generation quality, and watermarks of different complexities exhibit varying optimal injection times. Motivated by this observation, we propose an Adaptive Scale Interaction Module, which dynamically determines the optimal watermark embedding strategy based on the watermark information and the visual characteristics of the generated image. This ensures watermark robustness while minimizing its impact on image quality. Furthermore, we introduce a Cross-Scale Fusion mechanism, which integrates a mixture of both heads and experts to effectively fuse multi-resolution features and handle complex interactions between image content and watermark patterns. Experimental results demonstrate that Safe-VAR achieves state-of-the-art performance, significantly surpassing existing counterparts regarding image quality, watermarking fidelity, and robustness against perturbations. Moreover, our method exhibits strong generalization to an out-of-domain watermark dataset, QR Codes.

Submitted 14 March, 2025; originally announced March 2025.

arXiv:2503.02685 [pdf, other]

TReND: Transformer derived features and Regularized NMF for neonatal functional network Delineation

Authors: Sovesh Mohapatra, Minhui Ouyang, Shufang Tan, Jianlin Guo, Lianglong Sun, Yong He, Hao Huang

Abstract: Precise parcellation of functional networks (FNs) of the early developing human brain is the fundamental basis for identifying biomarkers of developmental disorders and understanding functional development. Resting-state fMRI (rs-fMRI) enables in vivo exploration of functional changes, but adult FN parcellations cannot be directly applied to neonates due to incomplete network maturation. No standardized neonatal functional atlas is currently available. To solve this fundamental issue, we propose TReND, a novel and fully automated self-supervised transformer-autoencoder framework that integrates regularized nonnegative matrix factorization (RNMF) to unveil the FNs in neonates. TReND effectively disentangles spatiotemporal features in voxel-wise rs-fMRI data. The framework integrates confidence-adaptive masks into transformer self-attention layers to mitigate noise influence. A self-supervised decoder acts as a regulator to refine the encoder's latent embeddings, which serve as reliable temporal features. For spatial coherence, we incorporate brain surface-based geodesic distances as spatial encodings along with functional connectivity from temporal features. The TReND clustering approach processes these features under sparsity and smoothness constraints, producing robust and biologically plausible parcellations. We extensively validated our TReND framework on three different rs-fMRI datasets (simulated, dHCP and HCP-YA) against comparable traditional feature extraction and clustering techniques. Our results demonstrated the superiority of the TReND framework in the delineation of neonate FNs with significantly better spatial contiguity and functional homogeneity. Collectively, we established TReND, a novel and robust framework, for neonatal FN delineation. TReND-derived neonatal FNs could serve as a neonatal functional atlas for perinatal populations in health and disease.

Submitted 4 March, 2025; originally announced March 2025.

Comments: 10 Pages, 5 figures

arXiv:2412.19990 [pdf, other]

SegKAN: High-Resolution Medical Image Segmentation with Long-Distance Dependencies

Authors: Shengbo Tan, Rundong Xue, Shipeng Luo, Zeyu Zhang, Xinran Wang, Lei Zhang, Daji Ergu, Zhang Yi, Yang Zhao, Ying Cai

Abstract: Hepatic vessels in computed tomography scans often suffer from image fragmentation and noise interference, making it difficult to maintain vessel integrity and posing significant challenges for vessel segmentation. To address this issue, we propose an innovative model: SegKAN. First, we improve the conventional embedding module by adopting a novel convolutional network structure for image embedding, which smooths out image noise and prevents issues such as gradient explosion in subsequent stages. Next, we transform the spatial relationships between Patch blocks into temporal relationships to solve the problem of capturing positional relationships between Patch blocks in traditional Vision Transformer models. We conducted experiments on a Hepatic vessel dataset, and compared to the existing state-of-the-art model, the Dice score improved by 1.78%. These results demonstrate that the proposed new structure effectively enhances the segmentation performance of high-resolution extended objects. Code will be available at https://github.com/goblin327/SegKAN

Submitted 2 January, 2025; v1 submitted 27 December, 2024; originally announced December 2024.

arXiv:2411.13862 [pdf, other]

Image Compression Using Novel View Synthesis Priors

Authors: Luyuan Peng, Mandar Chitre, Hari Vishnu, Yuen Min Too, Bharath Kalyan, Rajat Mishra, Soo Pieng Tan

Abstract: Real-time visual feedback is essential for tetherless control of remotely operated vehicles, particularly during inspection and manipulation tasks. Though acoustic communication is the preferred choice for medium-range communication underwater, its limited bandwidth renders it impractical to transmit images or videos in real-time. To address this, we propose a model-based image compression technique that leverages prior mission information. Our approach employs trained machine-learning based novel view synthesis models, and uses gradient descent optimization to refine latent representations to help generate compressible differences between camera images and rendered images. We evaluate the proposed compression technique using a dataset from an artificial ocean basin, demonstrating superior compression ratios and image quality over existing techniques. Moreover, our method exhibits robustness to introduction of new objects within the scene, highlighting its potential for advancing tetherless remotely operated vehicle operations.

Submitted 27 November, 2024; v1 submitted 21 November, 2024; originally announced November 2024.

Comments: Preprint submitted to IEEE Journal of Oceanic Engineering

arXiv:2410.18461 [pdf, other]

doi 10.1007/978-981-96-4589-3

Uncertainty-Error correlations in Evidential Deep Learning models for biomedical segmentation

Authors: Hai Siong Tan, Kuancheng Wang, Rafe Mcbeth

Abstract: In this work, we examine the effectiveness of an uncertainty quantification framework known as Evidential Deep Learning applied in the context of biomedical image segmentation. This class of models involves assigning Dirichlet distributions as priors for segmentation labels, and enables a few distinct definitions of model uncertainties. Using the cardiac and prostate MRI images available in the Medical Segmentation Decathlon for validation, we found that Evidential Deep Learning models with U-Net backbones generally yielded superior correlations between prediction errors and uncertainties relative to the conventional baseline equipped with Shannon entropy measure, Monte-Carlo Dropout and Deep Ensemble methods. We also examined these models' effectiveness in active learning, finding that relative to the standard Shannon entropy-based sampling, they yielded higher point-biserial uncertainty-error correlations while attaining similar performances in Dice-Sorensen coefficients. These superior features of EDL models render them well-suited for segmentation tasks that warrant a critical sensitivity in detecting large model errors.

Submitted 24 October, 2024; originally announced October 2024.

Comments: 15 pages

Journal ref: Published in Proceedings of TAAI 2024
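
The abstract above reports point-biserial uncertainty-error correlations. As a hedged illustration of that statistic (not the authors' evaluation code), the sketch below computes the point-biserial coefficient between a continuous uncertainty score and a binary error indicator using the textbook formula r = (M1 - M0) / s * sqrt(n1 * n0 / n^2), where s is the population standard deviation of the scores. The function name and the toy numbers are assumptions.

```python
import math


def point_biserial(uncertainty, is_error):
    """Point-biserial correlation between scores and a binary indicator.

    uncertainty: list of continuous uncertainty scores (e.g. one per pixel).
    is_error:    list of 0/1 flags, 1 where the prediction was wrong.
    """
    if len(uncertainty) != len(is_error):
        raise ValueError("inputs must have the same length")
    n = len(uncertainty)
    wrong = [u for u, e in zip(uncertainty, is_error) if e]
    right = [u for u, e in zip(uncertainty, is_error) if not e]
    if not wrong or not right:
        raise ValueError("need at least one error and one correct sample")
    mean_all = sum(uncertainty) / n
    # Population standard deviation (divide by n), as the formula requires.
    std = math.sqrt(sum((u - mean_all) ** 2 for u in uncertainty) / n)
    if std == 0.0:
        raise ValueError("uncertainty scores are constant")
    mean_wrong = sum(wrong) / len(wrong)
    mean_right = sum(right) / len(right)
    return (mean_wrong - mean_right) / std * math.sqrt(len(wrong) * len(right) / n ** 2)


# Toy example: the two wrong predictions carry the two highest uncertainties.
r = point_biserial([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1])
print(round(r, 4))
```

A value near +1 means high uncertainty coincides with errors, which is the property the abstract treats as desirable; the coefficient is numerically the same as the Pearson correlation computed with the error flag as a 0/1 variable.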

arXiv:2409.14413 [pdf]

Real-time Detection and Auto focusing of Beam Profiles from Silicon Photonics Gratings using YOLO model

Authors: Yu Dian Lim, Hong Yu Li, Simon Chun Kiat Goh, Xiangyu Wang, Peng Zhao, Chuan Seng Tan

Abstract: When observing light beams emitted from silicon photonics (SiPh) chips into free space, manual adjustment of the camera lens is often required to obtain a focused image of the light beams. In this letter, we demonstrated an auto-focusing system based on the you-only-look-once (YOLO) model. The trained YOLO model exhibits high classification accuracy of 99.7% and high confidence level >0.95 when detecting light beams from SiPh gratings. A video demonstration of real-time light beam detection, real-time computation of beam width, and auto-focusing of light beams is also included.

Submitted 22 September, 2024; originally announced September 2024.

arXiv:2409.00204 [pdf, other]

MedDet: Generative Adversarial Distillation for Efficient Cervical Disc Herniation Detection

Authors: Zeyu Zhang, Nengmin Yi, Shengbo Tan, Ying Cai, Yi Yang, Lei Xu, Qingtai Li, Zhang Yi, Daji Ergu, Yang Zhao

Abstract: Cervical disc herniation (CDH) is a prevalent musculoskeletal disorder that significantly impacts health and requires labor-intensive analysis from experts. Despite advancements in automated detection of medical imaging, two significant challenges hinder the real-world application of these methods. First, the computational complexity and resource demands present a significant gap for real-time application. Second, noise in MRI reduces the effectiveness of existing methods by distorting feature extraction. To address these challenges, we propose three key contributions: Firstly, we introduced MedDet, which leverages the multi-teacher single-student knowledge distillation for model compression and efficiency, meanwhile integrating generative adversarial training to enhance performance. Additionally, we customize the second-order nmODE to improve the model's resistance to noise in MRI. Lastly, we conducted comprehensive experiments on the CDH-1848 dataset, achieving up to a 5% improvement in mAP compared to previous methods. Our approach also delivers over 5 times faster inference speed, with approximately 67.8% reduction in parameters and 36.9% reduction in FLOPs compared to the teacher model. These advancements significantly enhance the performance and efficiency of automated CDH detection, demonstrating promising potential for future application in clinical practice. See project website https://steve-zeyu-zhang.github.io/MedDet

Submitted 18 October, 2024; v1 submitted 30 August, 2024; originally announced September 2024.

Comments: Accepted to BIBM 2024 Oral

arXiv:2408.10287 [pdf]

Recognizing Beam Profiles from Silicon Photonics Gratings using Transformer Model

Authors: Yu Dian Lim, Hong Yu Li, Simon Chun Kiat Goh, Xiangyu Wang, Peng Zhao, Chuan Seng Tan

Abstract: Over the past decade, there has been extensive work in developing integrated silicon photonics (SiPh) gratings for the optical addressing of trapped ion qubits in the ion trap quantum computing community. However, when viewing beam profiles from infrared (IR) cameras, it is often difficult to determine the corresponding heights where the beam profiles are located. In this work, we developed transformer models to recognize the corresponding height categories of beam profiles of light from SiPh gratings. The model is trained using two techniques: (1) input patches, and (2) input sequence. The model trained with input patches achieved a recognition accuracy of 0.938, while the model trained with input sequence showed a lower accuracy of 0.895. However, when the model training was repeated for 150 cycles, the model trained with input patches showed inconsistent accuracy, ranging from 0.445 to 0.959, while the model trained with input sequence exhibited higher accuracy values, between 0.789 and 0.936. The obtained outcomes can be expanded to various applications, including auto-focusing of light beams and auto-adjustment of the z-axis stage to acquire desired beam profiles.

Submitted 22 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

arXiv:2407.19544 [pdf]

Deep Generative Models-Assisted Automated Labeling for Electron Microscopy Images Segmentation

Authors: Wenhao Yuan, Bingqing Yao, Shengdong Tan, Fengqi You, Qian He

Abstract: The rapid advancement of deep learning has facilitated the automated processing of electron microscopy (EM) big data stacks. However, designing a framework that eliminates manual labeling and adapts to domain gaps remains challenging. Current research remains entangled in the dilemma of pursuing complete automation while still requiring simulations or slight manual annotations. Here we demonstrate tandem generative adversarial network (tGAN), a fully label-free and simulation-free pipeline capable of generating EM images for computer vision training. The tGAN can assimilate key features from new data stacks, thus producing a tailored virtual dataset for the training of automated EM analysis tools. Using segmenting nanoparticles for analyzing size distribution of supported catalysts as the demonstration, our findings showcased that the recognition accuracy of tGAN even exceeds the manually-labeling method by 5%. It can also be adaptively deployed to various data domains without further manual manipulation, which is verified by transfer learning from HAADF-STEM to BF-TEM. This generalizability may enable it to extend its application to a broader range of imaging characterizations, liberating microscopists and materials scientists from tedious dataset annotations.

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.16961 [pdf, other]

doi 10.1109/JOE.2025.3581356

Pose Estimation from Camera Images for Underwater Inspection

Authors: Luyuan Peng, Hari Vishnu, Mandar Chitre, Yuen Min Too, Bharath Kalyan, Rajat Mishra, Soo Pieng Tan

Abstract: High-precision localization is pivotal in underwater reinspection missions. Traditional localization methods like inertial navigation systems, Doppler velocity loggers, and acoustic positioning face significant challenges and are not cost-effective for some applications. Visual localization is a cost-effective alternative in such cases, leveraging the cameras already equipped on inspection vehicles to estimate poses from images of the surrounding scene. Amongst these, machine learning-based pose estimation from images shows promise in underwater environments, performing efficient relocalization using models trained based on previously mapped scenes. We explore the efficacy of learning-based pose estimators in both clear and turbid water inspection missions, assessing the impact of image formats, model architectures and training data diversity. We innovate by employing novel view synthesis models to generate augmented training data, significantly enhancing pose estimation in unexplored regions. Moreover, we enhance localization accuracy by integrating pose estimator outputs with sensor data via an extended Kalman filter, demonstrating improved trajectory smoothness and accuracy.

Submitted 23 July, 2024; originally announced July 2024.

Comments: Submitted to IEEE Journal of Oceanic Engineering

arXiv:2404.17126 [pdf, other]

doi 10.1016/j.compbiomed.2024.109172

Deep Evidential Learning for Radiotherapy Dose Prediction

Authors: Hai Siong Tan, Kuancheng Wang, Rafe Mcbeth

Abstract: In this work, we present a novel application of an uncertainty-quantification framework called Deep Evidential Learning in the domain of radiotherapy dose prediction. Using medical images of the Open Knowledge-Based Planning Challenge dataset, we found that this model can be effectively harnessed to yield uncertainty estimates that inherited correlations with prediction errors upon completion of network training. This was achieved only after reformulating the original loss function for a stable implementation. We found that (i) epistemic uncertainty was highly correlated with prediction errors, with various association indices comparable or stronger than those for Monte-Carlo Dropout and Deep Ensemble methods, (ii) the median error varied with uncertainty threshold much more linearly for epistemic uncertainty in Deep Evidential Learning relative to these other two conventional frameworks, indicative of a more uniformly calibrated sensitivity to model errors, (iii) relative to epistemic uncertainty, aleatoric uncertainty demonstrated a more significant shift in its distribution in response to Gaussian noise added to CT intensity, compatible with its interpretation as reflecting data noise. Collectively, our results suggest that Deep Evidential Learning is a promising approach that can endow deep-learning models in radiotherapy dose prediction with statistical robustness. Towards enhancing its clinical relevance, we demonstrate how we can use such a model to construct the predicted Dose-Volume-Histograms' confidence intervals.

Submitted 23 September, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 28 pages

Journal ref: Computers in Biology and Medicine, Vol. 182, Nov 2024, 109172

arXiv:2404.15163 [pdf, other]

Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment

Authors: Tianwei Zhou, Songbai Tan, Wei Zhou, Yu Luo, Yuan-Gen Wang, Guanghui Yue

Abstract: With the increasing maturity of the text-to-image and image-to-image generative models, AI-generated images (AGIs) have shown great application potential in advertisement, entertainment, education, social media, etc. Although remarkable advancements have been achieved in generative models, very few efforts have been paid to design relevant quality assessment models. In this paper, we propose a novel blind image quality assessment (IQA) network, named AMFF-Net, for AGIs. AMFF-Net evaluates AGI quality from three dimensions, i.e., "visual quality", "authenticity", and "consistency". Specifically, inspired by the characteristics of the human visual system and motivated by the observation that "visual quality" and "authenticity" are characterized by both local and global aspects, AMFF-Net scales the image up and down and takes the scaled images and original-sized image as the inputs to obtain multi-scale features. After that, an Adaptive Feature Fusion (AFF) block is used to adaptively fuse the multi-scale features with learnable weights. In addition, considering the correlation between the image and prompt, AMFF-Net compares the semantic features from text encoder and image encoder to evaluate the text-to-image alignment. We carry out extensive experiments on three AGI quality assessment databases, and the experimental results show that our AMFF-Net obtains better performance than nine state-of-the-art blind IQA methods. The results of ablation experiments further demonstrate the effectiveness of the proposed multi-scale input strategy and AFF block.

Submitted 23 April, 2024; originally announced April 2024.

Comments: IEEE Transactions on Broadcasting (TBC)

arXiv:2403.15132 [pdf, other]

Transfer CLIP for Generalizable Image Denoising

Authors: Jun Cheng, Dong Liang, Shan Tan

Abstract: Image denoising is a fundamental task in computer vision. While prevailing deep learning-based supervised and self-supervised methods have excelled in eliminating in-distribution noise, their susceptibility to out-of-distribution (OOD) noise remains a significant challenge. The recent emergence of contrastive language-image pre-training (CLIP) model has showcased exceptional capabilities in open-world image recognition and segmentation. Yet, the potential for leveraging CLIP to enhance the robustness of low-level tasks remains largely unexplored. This paper uncovers that certain dense features extracted from the frozen ResNet image encoder of CLIP exhibit distortion-invariant and content-related properties, which are highly desirable for generalizable denoising. Leveraging these properties, we devise an asymmetrical encoder-decoder denoising network, which incorporates dense features including the noisy image and its multi-scale features from the frozen ResNet encoder of CLIP into a learnable image decoder to achieve generalizable denoising. The progressive feature augmentation strategy is further proposed to mitigate feature overfitting and improve the robustness of the learnable decoder. Extensive experiments and comparisons conducted across diverse OOD noises, including synthetic noise, real-world sRGB noise, and low-dose CT image noise, demonstrate the superior generalization ability of our method.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR2024

arXiv:2403.01229 [pdf, other]

REWIND Dataset: Privacy-preserving Speaking Status Segmentation from Multimodal Body Movement Signals in the Wild

Authors: Jose Vargas Quiros, Chirag Raman, Stephanie Tan, Ekin Gedik, Laura Cabrera-Quiros, Hayley Hung

Abstract: Recognizing speaking in humans is a central task towards understanding social interactions. Ideally, speaking would be detected from individual voice recordings, as done previously for meeting scenarios. However, individual voice recordings are hard to obtain in the wild, especially in crowded mingling scenarios due to cost, logistics, and privacy concerns. As an alternative, machine learning models trained on video and wearable sensor data make it possible to recognize speech by detecting its related gestures in an unobtrusive, privacy-preserving way. These models themselves should ideally be trained using labels obtained from the speech signal. However, existing mingling datasets do not contain high quality audio recordings. Instead, speaking status annotations have often been inferred by human annotators from video, without validation of this approach against audio-based ground truth. In this paper we revisit no-audio speaking status estimation by presenting the first publicly available multimodal dataset with high-quality individual speech recordings of 33 subjects in a professional networking event. We present three baselines for no-audio speaking status segmentation: a) from video, b) from body acceleration (chest-worn accelerometer), c) from body pose tracks. In all cases we predict a 20Hz binary speaking status signal extracted from the audio, a time resolution not available in previous datasets.
In addition to providing the signals and ground truth necessary to evaluate a wide range of speaking status detection methods, the availability of audio in REWIND makes it suitable for cross-modality studies not feasible with previous mingling datasets. Finally, our flexible data consent setup creates new challenges for multimodal systems under missing modalities.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.18600 [pdf]

doi 10.1007/s40200-025-01596-7

Artificial Intelligence and Diabetes Mellitus: An Inside Look Through the Retina

Authors: Yasin Sadeghi Bazargani, Majid Mirzaei, Navid Sobhi, Mirsaeed Abdollahi, Ali Jafarizadeh, Siamak Pedrammehr, Roohallah Alizadehsani, Ru San Tan, Sheikh Mohammed Shariful Islam, U. Rajendra Acharya

Abstract: Diabetes mellitus (DM) predisposes patients to vascular complications. Retinal images and vasculature reflect the body's micro- and macrovascular health. They can be used to diagnose DM complications, including diabetic retinopathy (DR), neuropathy, nephropathy, and atherosclerotic cardiovascular disease, as well as forecast the risk of cardiovascular events. Artificial intelligence (AI)-enabled systems developed for high-throughput detection of DR using digitized retinal images have become clinically adopted. Beyond DR screening, AI integration also holds immense potential to address challenges associated with the holistic care of the patient with DM. In this work, we aim to comprehensively review the literature for studies on AI applications based on retinal images related to DM diagnosis, prognostication, and management. We will describe the findings of holistic AI-assisted diabetes care, including but not limited to DR screening, and discuss barriers to implementing such systems, including issues concerning ethics, data privacy, equitable access, and explainability. With the ability to evaluate the patient's health status vis a vis DM complication as well as risk prognostication of future cardiovascular complications, AI-assisted retinal image analysis has the potential to become a central tool for modern personalized medicine in patients with DM.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 44 Pages, 6 figures, 1 table, 166 references

ACM Class: J.3.2; J.3.3

arXiv:2401.13587 [pdf, other]

Deep Learning Based Adaptive Joint mmWave Beam Alignment

Authors: Daniel Tandler, Marc Gauger, Ahmet Serdar Tan, Sebastian Dörner, Stephan ten Brink

Abstract: The challenging propagation environment, combined with the hardware limitations of mmWave systems, gives rise to the need for accurate initial access beam alignment strategies with low latency and high achievable beamforming gain. Much of the recent work in this area either focuses on one-sided beam alignment, or, joint beam alignment methods where both sides of the link perform a sequence of fixed channel probing steps. Codebook-based non-adaptive beam alignment schemes have the potential to allow multiple user equipment (UE) to perform initial access beam alignment in parallel whereas adaptive schemes are favourable in achievable beamforming gain. This work introduces a novel deep learning based joint beam alignment scheme that aims to combine the benefits of adaptive, codebook-free beam alignment at the UE side with the advantages of a codebook-sweep based scheme at the base station. The proposed end-to-end trainable scheme is compatible with current cellular standard signaling and can be readily integrated into the standard without requiring significant changes to it. Extensive simulations demonstrate superior performance of the proposed approach over purely codebook-based ones.

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2311.06572 [pdf, other]

Swin UNETR++: Advancing Transformer-Based Dense Dose Prediction Towards Fully Automated Radiation Oncology Treatments

Authors: Kuancheng Wang, Hai Siong Tan, Rafe Mcbeth

Abstract: The field of Radiation Oncology is uniquely positioned to benefit from the use of artificial intelligence to fully automate the creation of radiation treatment plans for cancer therapy. This time-consuming and specialized task combines patient imaging with organ and tumor segmentation to generate a 3D radiation dose distribution to meet clinical treatment goals, similar to voxel-level dense prediction. In this work, we propose Swin UNETR++, that contains a lightweight 3D Dual Cross-Attention (DCA) module to capture the intra and inter-volume relationships of each patient's unique anatomy, which fully convolutional neural networks lack. Our model was trained, validated, and tested on the Open Knowledge-Based Planning dataset. In addition to metrics of Dose Score $\overline{S_{\text{Dose}}}$ and DVH Score $\overline{S_{\text{DVH}}}$ that quantitatively measure the difference between the predicted and ground-truth 3D radiation dose distribution, we propose the qualitative metrics of average volume-wise acceptance rate $\overline{R_{\text{VA}}}$ and average patient-wise clinical acceptance rate $\overline{R_{\text{PA}}}$ to assess the clinical reliability of the predictions.
Swin UNETR++ demonstrates near-state-of-the-art performance on validation and test dataset (validation: $\overline{S_{\text{DVH}}}$=1.492 Gy, $\overline{S_{\text{Dose}}}$=2.649 Gy, $\overline{R_{\text{VA}}}$=88.58%, $\overline{R_{\text{PA}}}$=100.0%; test: $\overline{S_{\text{DVH}}}$=1.634 Gy, $\overline{S_{\text{Dose}}}$=2.757 Gy, $\overline{R_{\text{VA}}}$=90.50%, $\overline{R_{\text{PA}}}$=98.0%), establishing a basis for future studies to translate 3D dose predictions into a deliverable treatment plan, facilitating full automation.

Submitted 12 October, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 16 pages

arXiv:2311.06552 [pdf, other]

Stain Consistency Learning: Handling Stain Variation for Automatic Digital Pathology Segmentation

Authors: Michael Yeung, Todd Watts, Sean YW Tan, Pedro F. Ferreira, Andrew D. Scott, Sonia Nielles-Vallespin, Guang Yang

Abstract: Stain variation is a unique challenge associated with automated analysis of digital pathology. Numerous methods have been developed to improve the robustness of machine learning methods to stain variation, but comparative studies have demonstrated limited benefits to performance. Moreover, methods to handle stain variation were largely developed for H&E stained data, with evaluation generally limited to classification tasks. Here we propose Stain Consistency Learning, a novel framework combining stain-specific augmentation with a stain consistency loss function to learn stain colour invariant features. We perform the first, extensive comparison of methods to handle stain variation for segmentation tasks, comparing ten methods on Masson's trichrome and H&E stained cell and nuclei datasets, respectively. We observed that stain normalisation methods resulted in equivalent or worse performance, while stain augmentation or stain adversarial methods demonstrated improved performance, with the best performance consistently achieved by our proposed approach. The code is available at: https://github.com/mlyg/stain_consistency_learning

Submitted 11 November, 2023; originally announced November 2023.

arXiv:2308.16742 [pdf, other]

Unsupervised CT Metal Artifact Reduction by Plugging Diffusion Priors in Dual Domains

Authors: Xuan Liu, Yaoqin Xie, Songhui Diao, Shan Tan, Xiaokun Liang

Abstract: During the process of computed tomography (CT), metallic implants often cause disruptive artifacts in the reconstructed images, impeding accurate diagnosis. Several supervised deep learning-based approaches have been proposed for reducing metal artifacts (MAR). However, these methods heavily rely on training with simulated data, as obtaining paired metal artifact CT and clean CT data in clinical settings is challenging. This limitation can lead to decreased performance when applying these methods in clinical practice. Existing unsupervised MAR methods, whether based on learning or not, typically operate within a single domain, either in the image domain or the sinogram domain. In this paper, we propose an unsupervised MAR method based on the diffusion model, a generative model with a high capacity to represent data distributions. Specifically, we first train a diffusion model using CT images without metal artifacts. Subsequently, we iteratively utilize the priors embedded within the pre-trained diffusion model in both the sinogram and image domains to restore the degraded portions caused by metal artifacts. This dual-domain processing empowers our approach to outperform existing unsupervised MAR methods, including another MAR method based on the diffusion model, which we have qualitatively and quantitatively validated using synthetic datasets. Moreover, our method demonstrates superior visual results compared to both supervised and unsupervised methods on clinical datasets.

Submitted 5 January, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

arXiv:2305.15887 [pdf, other]

Diffusion Probabilistic Priors for Zero-Shot Low-Dose CT Image Denoising

Authors: Xuan Liu, Yaoqin Xie, Jun Cheng, Songhui Diao, Shan Tan, Xiaokun Liang

Abstract: Denoising low-dose computed tomography (CT) images is a critical task in medical image computing. Supervised deep learning-based approaches have made significant advancements in this area in recent years. However, these methods typically require pairs of low-dose and normal-dose CT images for training, which are challenging to obtain in clinical settings. Existing unsupervised deep learning-based methods often require training with a large number of low-dose CT images or rely on specially designed data acquisition processes to obtain training data. To address these limitations, we propose a novel unsupervised method that only utilizes normal-dose CT images during training, enabling zero-shot denoising of low-dose CT images. Our method leverages the diffusion model, a powerful generative model. We begin by training a cascaded unconditional diffusion model capable of generating high-quality normal-dose CT images from low-resolution to high-resolution. The cascaded architecture makes the training of high-resolution diffusion models more feasible. Subsequently, we introduce low-dose CT images into the reverse process of the diffusion model as likelihood, combined with the priors provided by the diffusion model and iteratively solve multiple maximum a posteriori (MAP) problems to achieve denoising. Additionally, we propose methods to adaptively adjust the coefficients that balance the likelihood and prior in MAP estimations, allowing for adaptation to different noise levels in low-dose CT images.
We test our method on low-dose CT datasets of different regions with varying dose levels. The results demonstrate that our method outperforms the state-of-the-art unsupervised method and surpasses several supervised deep learning-based methods. Codes are available in https://github.com/DeepXuan/Dn-Dp.

Submitted 13 July, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

arXiv:2305.02493 [pdf, other]

RCP-RF: A Comprehensive Road-car-pedestrian Risk Management Framework based on Driving Risk Potential Field

Authors: Shuhang Tan, Zhiling Wang, Yan Zhong

Abstract: Recent years have witnessed a proliferation of traffic accidents, which has led to extensive research on Automated Vehicle (AV) technologies to reduce vehicle accidents, especially on risk assessment frameworks for AV technologies. However, existing time-based frameworks cannot handle complex traffic scenarios and ignore the influence of each moving object's motion tendency on the risk distribution, leading to performance degradation. To address this problem, we propose a comprehensive driving risk management framework named RCP-RF, based on potential field theory under a Connected and Automated Vehicles (CAV) environment, in which the pedestrian risk metric is combined into a unified road-vehicle driving risk management framework. Unlike existing algorithms, the proposed framework properly considers the motion tendency between ego and obstacle cars as well as the pedestrian factor, which can improve the performance of the driving risk model. Moreover, the proposed method requires only $O(N^2)$ time complexity. Empirical studies validate the superiority of our proposed framework against state-of-the-art methods on the real-world dataset NGSIM and a real AV platform.

Submitted 3 May, 2023; originally announced May 2023.

arXiv:2301.05536 [pdf]

doi 10.1109/TAP.2023.3235015

An Electromagnetic-Information-Theory Based Model for Efficient Characterization of MIMO Systems in Complex Space

Authors: Ruifeng Li, Da Li, Jinyan Ma, Zhaoyang Feng, Ling Zhang, Shurun Tan, Wei E. I. Sha, Hongsheng Chen, Er-Ping Li

Abstract: It is the pursuit of a multiple-input-multiple-output (MIMO) system to approach and even break the limit of channel capacity. However, it is always a big challenge to efficiently characterize the MIMO systems in complex space and get better propagation performance than the conventional MIMO systems considering only free space, which is important for guiding the power and phase allocation of antenna units. In this manuscript, an Electromagnetic-Information-Theory (EMIT) based model is developed for efficient characterization of MIMO systems in complex space. The group-T-matrix-based multiple scattering fast algorithm, the mode-decomposition-based characterization method, and their joint theoretical framework in complex space are discussed. Firstly, key informatics parameters in free electromagnetic space based on a dyadic Green's function are derived. Next, a novel group-T-matrix-based multiple scattering fast algorithm is developed to describe a representative inhomogeneous electromagnetic space. All the analytical results are validated by simulations. In addition, the complete form of the EMIT-based model is proposed to derive the informatics parameters frequently used in electromagnetic propagation, through integrating the mode analysis method with the dyadic Green's function matrix. Finally, as a proof-of-concept, microwave anechoic chamber measurements of a cylindrical array are performed, demonstrating the effectiveness of the EMIT-based model.
Meanwhile, a case of image transmission with limited power is presented to illustrate how to use this EMIT-based model to guide the power and phase allocation of antenna units for real MIMO applications.

Submitted 13 January, 2023; originally announced January 2023.

Comments: 13 pages, 14 figures

Journal ref: IEEE Transactions on Antennas and Propagation, 2023

arXiv:2301.01703 [pdf, other]

Technology Trends for Massive MIMO towards 6G

Authors: Yiming Huo, Xingqin Lin, Boya Di, Hongliang Zhang, Francisco Javier Lorca Hernando, Ahmet Serdar Tan, Shahid Mumtaz, Özlem Tuğfe Demir, Kun Chen-Hu

Abstract: At the dawn of the next-generation wireless systems and networks, massive multiple-input multiple-output (MIMO) has been envisioned as one of the enabling technologies. With the continued success of being applied in the 5G and beyond, the massive MIMO technology has demonstrated its advantageousness, integrability, and extendibility. Moreover, several evolutionary features and revolutionizing trends for massive MIMO have gradually emerged in recent years, which are expected to reshape the future 6G wireless systems and networks. Specifically, the functions and performance of future massive MIMO systems will be enabled and enhanced via combining other innovative technologies, architectures, and strategies such as intelligent omni-surfaces (IOSs)/intelligent reflecting surfaces (IRSs), artificial intelligence (AI), THz communications, cell free architecture. Also, more diverse vertical applications based on massive MIMO will emerge and prosper, such as wireless localization and sensing, vehicular communications, non-terrestrial communications, remote sensing, inter-planetary communications.

Submitted 5 January, 2023; v1 submitted 4 January, 2023; originally announced January 2023.

Comments: 7 pages, 5 figures. This work has been submitted to the IEEE for possible publication

arXiv:2211.14553 [pdf]

doi 10.14445/22315381/IJETT-V70I11P208

A Remote Baby Surveillance System with RFID and GPS Tracking

Authors: Ruven A/L Sundarajoo, Gwo Chin Chung, Wai Leong Pang, Soo Fun Tan

Abstract: In the 21st century, sending babies or children to daycare centres has become more and more common among young guardians. The balance between full-time work and child care is increasingly challenging nowadays. In Malaysia, thousands of child abuse cases have been reported from babysitting centres every year, which indeed triggers the anxiety and stress of the guardians. Hence, this paper proposes to construct a remote baby surveillance system with radio-frequency identification (RFID) and global positioning system (GPS) tracking. With the incorporation of the Internet of Things (IoT), a sensor-based microcontroller is used to detect the conditions of the baby as well as the surrounding environment and then display the real-time data as well as notifications to alert the guardians via a mobile application. These conditions include the crying and waking of the baby, as well as temperature, the mattress's wetness, and moving objects around the baby. In addition, RFID and GPS location tracking are implemented to ensure the safety of the baby, while white noise is used to increase the comfort of the baby. In the end, a prototype has been successfully developed for functionality and reliability testing. Several experiments have been conducted to measure the efficiency of the mattress's wetness detection, the RFID transmission range, the frequency spectrum of white noise, and also the output power of the solar panel. The proposed system is expected to assist guardians in ensuring the safety and comfort of their babies remotely, as well as prevent any occurrence of child abuse.

Submitted 26 November, 2022; originally announced November 2022.

Comments: 12 pages, 13 figures. Published with International Journal of Engineering Trends and Technology (IJETT)

Journal ref: International Journal of Engineering Trends and Technology, vol. 70, no. 11, pp. 81-92, 2022

arXiv:2210.14446 [pdf, other]

Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead

Authors: Piyush Behre, Naveen Parihar, Sharman Tan, Amy Shah, Eva Sharma, Geoffrey Liu, Shuangyu Chang, Hosam Khalil, Chris Basoglu, Sayan Pathak

Abstract: Segmentation for continuous Automatic Speech Recognition (ASR) has traditionally used silence timeouts or voice activity detectors (VADs), which are both limited to acoustic features. This segmentation is often overly aggressive, given that people naturally pause to think as they speak. Consequently, segmentation happens mid-sentence, hindering both punctuation and downstream tasks like machine translation for which high-quality segmentation is critical. Model-based segmentation methods that leverage acoustic features are powerful, but without an understanding of the language itself, these approaches are limited. We present a hybrid approach that leverages both acoustic and language information to improve segmentation. Furthermore, we show that including one word as a look-ahead boosts segmentation quality. On average, our models improve segmentation-F0.5 score by 9.8% over baseline. We show that this approach works for multiple languages. For the downstream task of machine translation, it improves the translation BLEU score by an average of 1.05 points.

Submitted 27 October, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

arXiv:2201.05766 [pdf, ps, other]

Integrated Sensing and Communication with mmWave Massive MIMO: A Compressed Sampling Perspective

Authors: Zhen Gao, Ziwei Wan, Dezhi Zheng, Shufeng Tan, Christos Masouros, Derrick Wing Kwan Ng, Sheng Chen

Abstract: Integrated sensing and communication (ISAC) has opened up numerous game-changing opportunities for realizing future wireless systems. In this paper, we propose an ISAC processing framework relying on millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems. Specifically, we provide a compressed sampling (CS) perspective to facilitate ISAC processing, which can not only recover the high-dimensional channel state information or/and radar imaging information, but also significantly reduce pilot overhead. First, an energy-efficient widely spaced array (WSA) architecture is tailored for the radar receiver, which enhances the angular resolution of radar sensing at the cost of angular ambiguity. Then, we propose an ISAC frame structure for time-varying ISAC systems considering different timescales. The pilot waveforms are judiciously designed by taking into account both CS theories and hardware constraints induced by hybrid beamforming (HBF) architecture. Next, we design the dedicated dictionary for WSA that serves as a building block for formulating the ISAC processing as sparse signal recovery problems. The orthogonal matching pursuit with support refinement (OMP-SR) algorithm is proposed to effectively solve the problems in the existence of the angular ambiguity. We also provide a framework for estimating the Doppler frequencies during payload data transmission to guarantee communication performances. Simulation results demonstrate the good performances of both communications and radar sensing under the proposed ISAC framework.

Submitted 9 September, 2022; v1 submitted 15 January, 2022; originally announced January 2022.

Comments: 32 pages, 15 figures, accepted by IEEE Transactions on Wireless Communications

arXiv:2110.05319 [pdf, other]

doi 10.1109/TGRS.2022.3196810

MD Loss: Efficient Training of 3D Seismic Fault Segmentation Network under Sparse Labels by Weakening Anomaly Annotation

Authors: Yimin Dou, Kewen Li, Jianbing Zhu, Timing Li, Shaoquan Tan, Zongchao Huang

Abstract: Data-driven fault detection has been regarded as a 3D image segmentation task. The models trained from synthetic data are difficult to generalize in some surveys. Recently, training 3D fault segmentation using sparse manual 2D slices is thought to yield promising results, but manual labeling has many false negative labels (abnormal annotations), which is detrimental to training and consequently to detection performance. Motivated to train 3D fault segmentation networks under sparse 2D labels while suppressing false negative labels, we analyze the training process gradient and propose the Mask Dice (MD) loss. Moreover, the fault is an edge feature, and current encoder-decoder architectures widely used for fault detection (e.g., U-shape network) are not conducive to edge representation. Consequently, Fault-Net is proposed, which is designed for the characteristics of faults, employs high-resolution propagation features, and embeds a MultiScale Compression Fusion block to fuse multi-scale information, which allows the edge information to be fully preserved during propagation and fusion, thus enabling advanced performance with few computational resources. Experiments demonstrate that MD loss supports the inclusion of human experience in training and suppresses false negative labels therein, enabling baseline models to improve performance and generalize to more surveys. Fault-Net is capable of providing a more stable and reliable interpretation of faults; it uses extremely low computational resources, and its inference is significantly faster than that of other models. Our method indicates optimal performance in comparison with several mainstream methods.

Submitted 21 June, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2106.15264 [pdf]

doi 10.1109/TPEL.2021.3074324

Dynamic Response and Stability Margin Improvement of Wireless Power Receiver Systems via Right-Half-Plane Zero Elimination

Authors: Kerui Li, Siew-Chong Tan, Ron Shu Yuen Hui

Abstract: The series-series compensation topology is widely adopted in many wireless power transfer applications. For such systems, their wireless power receiver part typically involves a DC-DC converter with front-stage full-bridge diode rectifier, to process the high-frequency transmitted AC power into a DC output voltage for the load. It is recently reported that the current source nature of the series-series compensation will introduce right-half-plane (RHP) zeros into the small-signal transfer functions of the DC-DC converter of the wireless power receiver, which will severely affect the stability and dynamic response of the system. To resolve this issue, in this paper, it is proposed to adopt a different rectifier configuration for the system such that the input current to the DC-DC converter becomes controllable to eliminate the presence of RHP zeros of the small-signal transfer functions of the system. This rectifier can be applied to different wireless power receivers using the buck, buck-boost, or boost converters. As compared with the original wireless power receivers, the modified ones feature minimum-phase characteristics and hence ease the design of compensator. Theoretical and experimental results are provided. The comparative experimental results verify the elimination of the RHP zero, improved dynamic responses of reference tracking and against load disturbances, and a larger stability margin.

Submitted 17 April, 2021; originally announced June 2021.

Comments: IEEE Transactions on Power Electronics, 2021

arXiv:2104.04641 [pdf, other]

CodedStereo: Learned Phase Masks for Large Depth-of-field Stereo

Authors: Shiyu Tan, Yicheng Wu, Shoou-I Yu, Ashok Veeraraghavan

Abstract: Conventional stereo suffers from a fundamental trade-off between imaging volume and signal-to-noise ratio (SNR) -- due to the conflicting impact of aperture size on both these variables. Inspired by the extended depth of field cameras, we propose a novel end-to-end learning-based technique to overcome this limitation, by introducing a phase mask at the aperture plane of the cameras in a stereo imaging system. The phase mask creates a depth-dependent point spread function, allowing us to recover sharp image texture and stereo correspondence over a significantly extended depth of field (EDOF) than conventional stereo. The phase mask pattern, the EDOF image reconstruction, and the stereo disparity estimation are all trained together using an end-to-end learned deep neural network. We perform theoretical analysis and characterization of the proposed approach and show a 6x increase in volume that can be imaged in simulation. We also build an experimental prototype and validate the approach using real-world results acquired using this prototype system.

Submitted 9 April, 2021; originally announced April 2021.

Comments: Accepted to CVPR 2021 as an oral presentation

arXiv:2011.14961 [pdf]

doi 10.1109/TPEL.2020.3041006

On Effect of Right-Half-Plane Zero Present in Buck Converters with Input Current Source in Wireless Power Receiver Systems

Authors: Kerui Li, Siew-Chong Tan, Ron Shu Yuen Hui

Abstract: In wireless power receiver systems, the buck converter is widely used to step down the higher rectified voltage derived from the wireless receiver coil, to a lower output voltage for the immediate battery charging process. In this work, the presence and effect of the right-half-plane (RHP) zeros found in the small-signal inductor-current-to-duty-ratio and output-voltage-to-duty ratio transfer functions of the buck converter in the wireless power receiver system on the control performance, are investigated. It is found and mathematically proved that the RHP zeros are introduced by the current source nature of the system attributed to the series-series compensation and finite DC-link capacitance. The RHP zero not only results in non-monotonic open-loop dynamic response but also complicates the design of feedback control and causes potential closed-loop instability. Theoretical and experimental results are provided to validate the presence of the RHP zeros and their effect on open-loop and closed-loop dynamic responses.

Submitted 25 November, 2020; originally announced November 2020.

Comments: 11 pages. IEEE Transactions on Power Electronics 2020 (Early access)

Journal ref: IEEE Trans. Power Electron., vol. 36, no. 6, pp. 6364-6374, June 2021

arXiv:2008.03715 [pdf, other]

doi 10.1145/3394171.3413697

A Modular Approach for Synchronized Wireless Multimodal Multisensor Data Acquisition in Highly Dynamic Social Settings

Authors: Chirag Raman, Stephanie Tan, Hayley Hung

Abstract: Existing data acquisition literature for human behavior research provides wired solutions, mainly for controlled laboratory setups. In uncontrolled free-standing conversation settings, where participants are free to walk around, these solutions are unsuitable. While wireless solutions are employed in the broadcasting industry, they can be prohibitively expensive. In this work, we propose a modular and cost-effective wireless approach for synchronized multisensor data acquisition of social human behavior. Our core idea involves a cost-accuracy trade-off by using Network Time Protocol (NTP) as a source reference for all sensors. While commonly used as a reference in ubiquitous computing, NTP is widely considered to be insufficiently accurate as a reference for video applications, where Precision Time Protocol (PTP) or Global Positioning System (GPS) based references are preferred. We argue and show, however, that the latency introduced by using NTP as a source reference is adequate for human behavior research, and the subsequent cost and modularity benefits are a desirable trade-off for applications in this domain. We also describe one instantiation of the approach deployed in a real-world experiment to demonstrate the practicality of our setup in-the-wild.

Submitted 9 August, 2020; originally announced August 2020.

Comments: 9 pages, 8 figures, Proceedings of the 28th ACM International Conference on Multimedia (MM '20), October 12--16, 2020, Seattle, WA, USA. First two authors contributed equally

arXiv:2005.01125 [pdf, other]

doi 10.1007/978-981-15-8155-7_423

Implementation of UAV Coordination Based on a Hierarchical Multi-UAV Simulation Platform

Authors: Kun Xiao, Lan Ma, Shaochang Tan, Yirui Cong, Xiangke Wang

Abstract: In this paper, a hierarchical multi-UAV simulation platform, called XTDrone, is designed for UAV swarms, which is completely open-source. There are six layers in XTDrone: communication, simulator, low-level control, high-level control, coordination, and human interaction layers. XTDrone has three advantages. Firstly, the simulation speed can be adjusted to match the computer performance, based on the lock-step mode. Thus, the simulations can be conducted on a work station or on a personal laptop, for different purposes. Secondly, a simplified simulator is also developed which enables quick algorithm designing so that the approximated behavior of UAV swarms can be observed in advance. Thirdly, XTDrone is based on ROS, Gazebo, and PX4, and hence the codes in simulations can be easily transplanted to embedded systems. Note that XTDrone can support various types of multi-UAV missions, and we provide two important demos in this paper: one is a ground-station-based multi-UAV cooperative search, and the other is a distributed UAV formation flight, including consensus-based formation control, task assignment, and obstacle avoidance.

Submitted 30 May, 2022; v1 submitted 3 May, 2020; originally announced May 2020.

Comments: 12 pages, 10 figures. And for the, see https://gitee.com/robin_shaun/XTDrone

Journal ref: Proceedings of 2020 International Conference on Guidance Navigation and Control, ICGNC 2020, Tianjin, China, October 23-25, 2020

arXiv:2004.13420 [pdf]

doi 10.1109/TPEL.2020.2991297

On Beat Frequency Oscillation of Two-Stage Wireless Power Receivers

Authors: Kerui Li, Siew-Chong Tan, Ron Shu Yuen Hui

Abstract: Two-stage wireless power receivers, which typically include an AC-DC diode rectifier and a DC-DC regulator, are popular solutions in low-power wireless power transfer applications. However, the interaction between the rectifier and the regulator may introduce beat frequency oscillation on both the DC-link and output capacitors. In this paper, the cause of the beat frequency oscillation and its related issues are investigated with the corresponding design solution on alleviating the oscillation discussed. Theoretical and experimental results verifying the presence of beat frequency oscillation in the two-stage wireless receiver system are provided. Our study shows that the beat frequency oscillation can be significantly alleviated if appropriate design solutions are applied.

Submitted 5 October, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

Journal ref: in IEEE Transactions on Power Electronics, vol. 35, no. 12, pp. 12741-12751, Dec. 2020

arXiv:2004.13181 [pdf, ps, other]

EM-GAN: Fast Stress Analysis for Multi-Segment Interconnect Using Generative Adversarial Networks

Authors: Wentian Jin, Sheriff Sadiqbatcha, Jinwei Zhang, Sheldon X. -D. Tan

Abstract: In this paper, we propose a fast transient hydrostatic stress analysis for electromigration (EM) failure assessment for multi-segment interconnects using generative adversarial networks (GANs). Our work leverages the image synthesis feature of GAN-based generative deep neural networks. The stress evaluation of multi-segment interconnects, modeled by partial differential equations, can be viewed as time-varying 2D-images-to-image problem where the input is the multi-segment interconnects topology with current densities and the output is the EM stress distribution in those wire segments at the given aging time. Based on this observation, we train conditional GAN model using the images of many self-generated multi-segment wires and wire current densities and aging time (as conditions) against the COMSOL simulation results. Different hyperparameters of GAN were studied and compared. The proposed algorithm, called EM-GAN, can quickly give accurate stress distribution of a general multi-segment wire tree for a given aging time, which is important for full-chip fast EM failure assessment. Our experimental results show that the EM-GAN shows 6.6% averaged error compared to COMSOL simulation results with orders of magnitude speedup. It also delivers 8.3X speedup over state-of-the-art analytic based EM analysis solver.

Submitted 27 April, 2020; originally announced April 2020.

arXiv:2004.06048 [pdf]

doi 10.1109/JESTPE.2020.2986372

Highly-Efficient Single-Switch-Regulated Resonant Wireless Power Receiver with Hybrid Modulation

Authors: Kerui Li, Albert Ting Leung Lee, Siew-Chong Tan, Ron Shu Yuen Hui

Abstract: In this paper, a highly-efficient single-switch-regulated resonant wireless power receiver with hybrid modulation is proposed. To achieve both high efficiency and good output voltage regulation, phase shift and pulse width hybrid modulation are simultaneously applied. The soft switching operation in this topology is achieved by the cycle-by-cycle phase shift adjustment between the input current and the gate drive signal and also attributed to the reactive components such as the series-compensated secondary coil and the parasitic capacitor of the active switch. The soft switching operation also leads to high efficiency and low EMI. By adjusting the duty ratio of the switch, tight regulation of the output voltage can be attained. The steady-state and dynamic models of the resonant receiver with hybrid modulation are analytically derived in order to properly design the feedback controller. An experimental setup of a two-coil wireless power transfer system, including the hardware prototype of the proposed receiver, is constructed for experimental verification. The experimental results show the effectiveness of the soft-switching operation in the receiver with high efficiency while maintaining good regulation of the output voltage, regardless of line and load variations.

Submitted 5 January, 2021; v1 submitted 9 April, 2020; originally announced April 2020.

Comments: in IEEE Journal of Emerging and Selected Topics in Power Electronics. 2020

arXiv:2002.12588 [pdf, other]

Regional Registration of Whole Slide Image Stacks Containing Highly Deformed Artefacts

Authors: Mahsa Paknezhad, Sheng Yang Michael Loh, Yukti Choudhury, Valerie Koh Cui Koh, TimothyTay Kwang Yong, Hui Shan Tan, Ravindran Kanesvaran, Puay Hoon Tan, John Yuen Shyi Peng, Weimiao Yu, Yongcheng Benjamin Tan, Yong Zhen Loy, Min-Han Tan, Hwee Kuan Lee

Abstract: Motivation: High resolution 2D whole slide imaging provides rich information about the tissue structure. This information can be a lot richer if these 2D images can be stacked into a 3D tissue volume. A 3D analysis, however, requires accurate reconstruction of the tissue volume from the 2D image stack. This task is not trivial due to the distortions that each individual tissue slice experiences while cutting and mounting the tissue on the glass slide. Performing registration for the whole tissue slices may be adversely affected by the deformed tissue regions. Consequently, regional registration is found to be more effective. In this paper, we propose an accurate and robust regional registration algorithm for whole slide images which incrementally focuses registration on the area around the region of interest. Results: Using mean similarity index as the metric, the proposed algorithm (mean $\pm$ std: $0.84 \pm 0.11$) followed by a fine registration algorithm ($0.86 \pm 0.08$) outperformed the state-of-the-art linear whole tissue registration algorithm ($0.74 \pm 0.19$) and the regional version of this algorithm ($0.81 \pm 0.15$). The proposed algorithm also outperforms the state-of-the-art nonlinear registration algorithm (original : $0.82 \pm 0.12$, regional : $0.77 \pm 0.22$) for whole slide images and a recently proposed patch-based registration algorithm (patch size 256: $0.79 \pm 0.16$ , patch size 512: $0.77 \pm 0.16$) for medical images.
Availability: The C++ implementation code is available online at the github repository: https://github.com/MahsaPaknezhad/WSIRegistration

Submitted 28 February, 2020; originally announced February 2020.

arXiv:1912.05814 [pdf]

doi 10.1109/TPEL.2019.2917552

Single-Switch-Regulated Resonant WPT Receiver

Authors: Kerui Li, Siew Chong Tan, Ron Shu Yuen Hui

Abstract: A single-switch-regulated wireless power transfer (WPT) receiver is presented in this letter. Aiming at low-cost applications, the system involves only a single-switch class-E resonant rectifier, a frequency synchronization circuit, and a microcontroller. The number of power semiconductor devices required in this circuit is minimal. Only one active switch is used and no diode is required. As a single-switch solution, this simplifies circuit implementation, improves reliability, and lowers hardware cost. The single-switch resonant rectifier provides a relatively constant quasi-sinusoidal voltage waveform to pick up the wireless power from the receiver coil. Due to the resonant nature of the rectifier, ZVS turn on and turn off are achieved. The steady-state analysis and discussions on the component sizing and the control design are provided. A prototype is built and experimental works are performed to verify the features.

Submitted 18 December, 2019; v1 submitted 12 December, 2019; originally announced December 2019.

Journal ref: in IEEE Transactions on Power Electronics, vol. 34, no. 11, pp. 10386-10391, Nov. 2019

arXiv:1912.05809 [pdf]

doi 10.1109/TPEL.2019.2959057

Single-Stage Regulated Resonant WPT Receiver with Low Input Harmonic Distortion

Authors: Kerui Li, Siew Chong Tan, Ron Shu Yuen Hui

Abstract: Resonant rectifier topologies would be a promising candidate for achieving simple, compact, and reliable single-stage wireless power transfer (WPT) receiver if not for the lack of good DC regulation capability. This paper investigates the problems that prevent the feasibility of single-stage DC regulation in resonant rectifier topologies. A possible solution is the proposed differential resonant rectifier topology, of which the rectifier is designed to have a relatively constant AC voltage, and that phase shift control is used to achieve relatively good output regulation. Design considerations on the reactive component sizing, magnetic component design, frequency and phase synchronization, small signal modelling, and closed-loop feedback control design, are discussed. Experimental results verified that the proposed WPT receiver system can achieve single-stage AC rectification and DC regulation while attaining the key features of low harmonic distortion in its AC output voltage, continuous DC current, and zero-voltage-switching (ZVS) operation over a wide operating range.

Submitted 6 January, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

Journal ref: in IEEE Transactions on Power Electronics, vol. 35, no. 7, pp. 6820-6829, July 2020

arXiv:1911.12796 [pdf, other]

Light-weight Calibrator: a Separable Component for Unsupervised Domain Adaptation

Authors: Shaokai Ye, Kailu Wu, Mu Zhou, Yunfei Yang, Sia huat Tan, Kaidi Xu, Jiebo Song, Chenglong Bao, Kaisheng Ma

Abstract: Existing domain adaptation methods aim at learning features that can be generalized among domains. These methods commonly require to update source classifier to adapt to the target domain and do not properly handle the trade off between the source domain and the target domain. In this work, instead of training a classifier to adapt to the target domain, we use a separable component called data calibrator to help the fixed source classifier recover discrimination power in the target domain, while preserving the source domain's performance. When the difference between two domains is small, the source classifier's representation is sufficient to perform well in the target domain and outperforms GAN-based methods in digits. Otherwise, the proposed method can leverage synthetic images generated by GANs to boost performance and achieve state-of-the-art performance in digits datasets and driving scene semantic segmentation. Our method empirically reveals that certain intriguing hints, which can be mitigated by adversarial attack to domain discriminators, are one of the sources for performance degradation under the domain shift.

Submitted 28 February, 2020; v1 submitted 28 November, 2019; originally announced November 2019.

Comments: Accepted by CVPR2020

arXiv:1911.04657 [pdf, other]

CALPA-NET: Channel-pruning-assisted Deep Residual Network for Steganalysis of Digital Images

Authors: Shunquan Tan, Weilong Wu, Zilong Shao, Qiushi Li, Bin Li, Jiwu Huang

Abstract: Over the past few years, detection performance improvements of deep-learning based steganalyzers have been usually achieved through structure expansion. However, excessive expanded structure results in huge computational cost, storage overheads, and consequently difficulty in training and deployment. In this paper we propose CALPA-NET, a ChAnneL-Pruning-Assisted deep residual network architecture search approach to shrink the network structure of existing vast, over-parameterized deep-learning based steganalyzers. We observe that the broad inverted-pyramid structure of existing deep-learning based steganalyzers might contradict the well-established model diversity oriented philosophy, and therefore is not suitable for steganalysis. Then a hybrid criterion combined with two network pruning schemes is introduced to adaptively shrink every involved convolutional layer in a data-driven manner. The resulting network architecture presents a slender bottleneck-like structure. We have conducted extensive experiments on BOSSBase+BOWS2 dataset, more diverse ALASKA dataset and even a large-scale subset extracted from ImageNet CLS-LOC dataset. The experimental results show that the model structure generated by our proposed CALPA-NET can achieve comparative performance with less than two percent of parameters and about one third FLOPs compared to the original steganalytic model. The new model possesses even better adaptivity, transferability, and scalability.

Submitted 23 June, 2020; v1 submitted 11 November, 2019; originally announced November 2019.

Comments: Accepted by IEEE Transactions on Information Forensics & Security

arXiv:1910.09570 [pdf, other]

Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

Authors: Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

Abstract: We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billion labelled beats. Our goal is to enable semi-supervised ECG models to be made as well as to discover unknown subtypes of arrhythmia and anomalous ECG signal events. To this end, we propose an unsupervised representation learning task, evaluated in a semi-supervised fashion. We provide a set of baselines for different feature extractors that can be built upon. Additionally, we perform qualitative evaluations on results from PCA embeddings, where we identify some clustering of known subtypes indicating the potential for representation learning in arrhythmia sub-type discovery.

Submitted 21 October, 2019; originally announced October 2019.

Comments: Under Review

arXiv:1908.07406 [pdf, ps, other]

Multi-Objective Optimization for Drone Delivery

Authors: Suttinee Sawadsitang, Dusit Niyato, Puay Siew Tan, Sarana Nutanong

Abstract: Recently, the unmanned aerial vehicle (UAV), also known as a drone, has become an alternative means of package delivery. Although drone delivery scheduling has been studied in recent years, most existing models are formulated as a single-objective optimization problem. However, in practice, drone delivery scheduling has multiple objectives that the shipper has to achieve. Moreover, drone delivery typically faces unexpected events, e.g., breakdown or inability to take off, that can significantly affect the scheduling problem. Therefore, in this paper, we propose a multi-objective and three-stage stochastic optimization model for drone delivery scheduling, called the multi-objective optimization for drone delivery (MODD) system. To handle the multi-objective optimization in the MODD system, we apply the $\varepsilon$-constraint method. The performance evaluation is performed using a real dataset from Singapore delivery services.

Submitted 24 July, 2019; originally announced August 2019.

Comments: 5 pages, 4 figures

Journal ref: 2019 IEEE 90th Vehicular Technology Conference: VTC2019-Fall

Showing 1–50 of 50 results for author: Tan, S