-
Multi-segment Soft Robot Control via Deep Koopman-based Model Predictive Control
Authors:
Lei Lv,
Lei Liu,
Lei Bao,
Fuchun Sun,
Jiahong Dong,
Jianwei Zhang,
Xuemei Shan,
Kai Sun,
Hao Huang,
Yu Luo
Abstract:
Compared with conventional rigid robots, multi-segment soft robots offer safe interaction and dexterous operation in the environment, because their soft materials bring flexibility and compliance. However, their high dimensionality, nonlinearity, time-varying nature, and infinite degrees of freedom make precise and dynamic control, such as trajectory tracking and position reaching, challenging. To address these challenges, we propose a framework of Deep Koopman-based Model Predictive Control (DK-MPC) for handling multi-segment soft robots. First, we employ a deep learning approach on sampled data to approximate the Koopman operator, which linearizes the high-dimensional nonlinear dynamics of the soft robots into a finite-dimensional linear representation. Second, this linearized model is used within a model predictive control framework to compute optimal control inputs that minimize the tracking error between the desired and actual state trajectories. Real-world experiments on the soft robot "Chordata" demonstrate that DK-MPC can achieve high-precision control, showing its potential for future applications to soft robots.
Submitted 1 May, 2025;
originally announced May 2025.
-
Analysis and Mitigation of Cascading Failures Using a Stochastic Interaction Graph with Eigen-analysis
Authors:
Zhenping Guo,
Xiaowen Su,
Kai Sun,
Byungkwon Park,
Srdjan Simunovic
Abstract:
In studies on complex network systems using graph theory, eigen-analysis is typically performed on an undirected graph model of the network. However, when analyzing cascading failures in a power system, the interactions among failures suggest the need for a directed graph, beyond the topology of the power system, to model the directions of failure propagation. To accurately quantify failure interactions for effective mitigation strategies, this paper proposes a stochastic interaction graph model and associated eigen-analysis. Different types of failure propagation modes are defined and characterized by the eigenvalues of a stochastic interaction matrix, whose absolute values are unity, zero, or in between. Finding and interpreting these modes helps identify the probable patterns of failure propagation, either local or widespread, and the participating components based on eigenvectors. Then, by lowering the failure probabilities of critical components that participate strongly in a mode of widespread failures, cascading can be mitigated. The validity of the proposed stochastic interaction graph model, eigen-analysis, and the resulting mitigation strategies is demonstrated using simulated cascading failure data on an NPCC 140-bus system.
Submitted 12 March, 2025;
originally announced March 2025.
-
PI-Controlled Variable Time-Step Power System Simulation Using an Adaptive Order Differential Transformation Method
Authors:
Kaiyang Huang,
Yang Liu,
Kai Sun,
Feng Qiu
Abstract:
Dynamic simulation plays a crucial role in power system transient stability analysis, but traditional numerical integration-based methods are time-consuming due to the small time step sizes. Other semi-analytical solution methods, such as the Differential Transformation method, often struggle to select proper orders and steps, leading to slow performance and numerical instability. To address these challenges, this paper proposes a novel adaptive dynamic simulation approach for power system transient stability analysis. The approach adds feedback control and optimization to the selection of the step and order, utilizing the Differential Transformation method and a proportional-integral control strategy to control truncation errors. Order selection is formulated as an optimization problem, resulting in a variable-step-optimal-order method that achieves significantly larger time step sizes without violating numerical stability. It is applied to three systems: the IEEE 9-bus, 3-generator system; the IEEE 39-bus, 10-generator system; and a Polish 2383-bus, 327-generator system. Comprehensive case studies demonstrate promising computational efficiency and numerical robustness for large-scale power systems.
Submitted 12 March, 2025;
originally announced March 2025.
-
A Heterogeneous Multiscale Method for Efficient Simulation of Power Systems with Inverter-Based Resources
Authors:
Kaiyang Huang,
Min Xiong,
Yang Liu,
Kai Sun
Abstract:
As inverter-based resources (IBRs) penetrate power systems, the dynamics become more complex, exhibiting multiple timescales, including electromagnetic transient (EMT) dynamics of power electronic controllers and electromechanical dynamics of synchronous generators. Consequently, the power system model becomes highly stiff, posing a challenge for efficient simulation using existing methods that focus on dynamics within a single timescale. This paper proposes a Heterogeneous Multiscale Method for highly efficient multi-timescale simulation of a power system represented by its EMT model. The new method alternates between the microscopic EMT model of the system and an automatically reduced macroscopic model, varying the step size accordingly to achieve significant acceleration while maintaining accuracy in both the fast and slow dynamics of interest. It also incorporates a semi-analytical solution method to enable a more adaptive variable-step mechanism. The new simulation method is illustrated using a two-area system and is then tested on a detailed EMT model of the IEEE 39-bus system.
Submitted 14 March, 2025; v1 submitted 12 March, 2025;
originally announced March 2025.
-
Pseudo-Measurement Enhancement in Power Distribution Systems
Authors:
Tao Xu,
Kaiqi Wang,
Jiadong Zhang,
Ji Qiao,
Zixuan Zhao,
Hong Zhu,
Kai Sun
Abstract:
With the rapid development of smart distribution networks (DNs), the integrity and accuracy of grid measurement data are crucial to the safety and stability of the entire system. However, the quality of user power consumption data cannot be guaranteed during collection and transmission. To this end, this paper proposes a low-rank tensor completion model based on CANDECOMP/PARAFAC decomposition (CPD-LRTC) to enhance the quality of DN measurement data. First, the causes and associated characteristics of the missing data are analyzed, and a third-order standard tensor is constructed as a mathematical model of the DN measurement data. Then, a completion model is established based on the characteristics of the measurement data and the low rank of the completion tensor, and the alternating direction method of multipliers (ADMM) is used to solve it iteratively. Finally, the proposed model is verified through two case studies, in which its completion accuracy, computational efficiency, and memory usage are compared with those of traditional methods.
Submitted 22 February, 2025;
originally announced February 2025.
-
Enabling Cardiac Monitoring using In-ear Ballistocardiogram on COTS Wireless Earbuds
Authors:
Yongjian Fu,
Ke Sun,
Ruyao Wang,
Xinyi Li,
Ju Ren,
Yaoxue Zhang,
Xinyu Zhang
Abstract:
The human ear offers a unique opportunity for cardiac monitoring due to its physiological and practical advantages. However, existing earable solutions require additional hardware and complex processing, posing challenges for commercial True Wireless Stereo (TWS) earbuds which are limited by their form factor and resources. In this paper, we propose TWSCardio, a novel system that repurposes the IMU sensors in TWS earbuds for cardiac monitoring. Our key finding is that these sensors can capture in-ear ballistocardiogram (BCG) signals. TWSCardio reuses the unstable Bluetooth channel to stream the IMU data to a smartphone for BCG processing. It incorporates a signal enhancement framework to address issues related to missing data and low sampling rate, while mitigating motion artifacts by fusing multi-axis information. Furthermore, it employs a region-focused signal reconstruction method to translate the multi-axis in-ear BCG signals into fine-grained seismocardiogram (SCG) signals. We have implemented TWSCardio as an efficient real-time app. Our experiments on 100 subjects verify that TWSCardio can accurately reconstruct cardiac signals while showing resilience to motion artifacts, missing data, and low sampling rates. Our case studies further demonstrate that TWSCardio can support diverse cardiac monitoring applications.
Submitted 12 January, 2025;
originally announced January 2025.
-
A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis
Authors:
Xiao Zhou,
Luoyi Sun,
Dexuan He,
Wenbin Guan,
Ruifen Wang,
Lifeng Wang,
Xin Sun,
Kun Sun,
Ya Zhang,
Yanfeng Wang,
Weidi Xie
Abstract:
Deep learning has enabled the development of highly robust foundation models for various pathological tasks across diverse diseases and patient cohorts. Among these models, vision-language pre-training leverages large-scale paired data to align pathology image and text embedding spaces and provides a novel zero-shot paradigm for downstream tasks. However, existing models have been primarily data-driven and lack the incorporation of domain-specific knowledge, which limits their performance in cancer diagnosis, especially for rare tumor subtypes. To address this limitation, we establish a Knowledge-enhanced Pathology (KEEP) foundation model that harnesses disease knowledge to facilitate vision-language pre-training. Specifically, we first construct a disease knowledge graph (KG) that covers 11,454 human diseases with 139,143 disease attributes, including synonyms, definitions, and hypernym relations. We then systematically reorganize the millions of publicly available noisy pathology image-text pairs into 143K well-structured semantic groups linked through the hierarchical relations of the disease KG. To derive more nuanced image and text representations, we propose a novel knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups instead of unstructured image-text pairs. Validated on 18 diverse benchmarks with more than 14,000 whole slide images (WSIs), KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks. Notably, for cancer detection, KEEP demonstrates an average sensitivity of 89.8% at a specificity of 95.0% across 7 cancer types. For cancer subtyping, KEEP achieves a median balanced accuracy of 0.456 in subtyping 30 rare brain cancers, indicating strong generalizability for diagnosing rare tumors.
Submitted 17 December, 2024;
originally announced December 2024.
-
3D MedDiffusion: A 3D Medical Diffusion Model for Controllable and High-quality Medical Image Generation
Authors:
Haoshen Wang,
Zhentao Liu,
Kaicong Sun,
Xiaodong Wang,
Dinggang Shen,
Zhiming Cui
Abstract:
The generation of medical images presents significant challenges due to their high-resolution and three-dimensional nature. Existing methods often yield suboptimal performance in generating high-quality 3D medical images, and there is currently no universal generative framework for medical imaging. In this paper, we introduce the 3D Medical Diffusion (3D MedDiffusion) model for controllable, high-quality 3D medical image generation. 3D MedDiffusion incorporates a novel, highly efficient Patch-Volume Autoencoder that compresses medical images into latent space through patch-wise encoding and recovers them in image space through volume-wise decoding. Additionally, we design a new noise estimator to capture both local details and global structure information during the diffusion denoising process. 3D MedDiffusion can generate finely detailed, high-resolution images (up to 512x512x512) and adapt effectively to various downstream tasks, as it is trained on large-scale datasets covering CT and MRI modalities and different anatomical regions (from head to leg). Experimental results demonstrate that 3D MedDiffusion surpasses state-of-the-art methods in generative quality and exhibits strong generalizability across tasks such as sparse-view CT reconstruction, fast MRI reconstruction, and data augmentation.
Submitted 17 December, 2024;
originally announced December 2024.
-
Reinforcement Learning for Freeway Lane-Change Regulation via Connected Vehicles
Authors:
Ke Sun,
Huan Yu
Abstract:
Lane change decision-making is a complex task due to intricate vehicle-vehicle and vehicle-infrastructure interactions. Existing algorithms for lane-change control often depend on vehicles with a certain level of autonomy (e.g., autonomous or connected autonomous vehicles). To address the challenges posed by low penetration rates of autonomous vehicles and the high costs of precise data collection, this study proposes a dynamic lane change regulation design based on multi-agent reinforcement learning (MARL) to enhance freeway traffic efficiency. The proposed framework leverages multi-lane macroscopic traffic models that describe spatial-temporal dynamics of the density and speed for each lane. Lateral traffic flow between adjacent lanes, resulting from aggregated lane-changing behaviors, is modeled as source terms exchanged between the partial differential equations (PDEs). We propose a lane change regulation strategy using MARL, where one agent is placed at each discretized lane grid. The state of each agent is represented by aggregated vehicle attributes within its grid, generated from the SUMO microscopic simulation environment. The agent's actions are lane-change regulations for vehicles in its grid. Specifically, lane-change regulation signals are computed at a centralized traffic management center and then broadcast to connected vehicles in the corresponding lane grids. Compared to vehicle-level maneuver control, this approach achieves a higher regulation rate by leveraging vehicle connectivity while introducing no critical safety concerns, and accommodating varying levels of connectivity and autonomy within the traffic system. The proposed model is simulated and evaluated in varied traffic scenarios and demand conditions. Experimental results demonstrate that the method improves overall traffic efficiency with minimal additional energy consumption while maintaining driving safety.
Submitted 6 December, 2024; v1 submitted 5 December, 2024;
originally announced December 2024.
-
End-to-end Triple-domain PET Enhancement: A Hybrid Denoising-and-reconstruction Framework for Reconstructing Standard-dose PET Images from Low-dose PET Sinograms
Authors:
Caiwen Jiang,
Mianxin Liu,
Kaicong Sun,
Dinggang Shen
Abstract:
As a sensitive functional imaging technique, positron emission tomography (PET) plays a critical role in early disease diagnosis. However, obtaining a high-quality PET image requires injecting a sufficient dose (standard dose) of radionuclides into the body, which inevitably poses radiation hazards to patients. To mitigate radiation hazards, the reconstruction of standard-dose PET (SPET) from low-dose PET (LPET) is desired. According to imaging theory, the PET reconstruction process involves multiple domains (e.g., projection domain and image domain), and a significant portion of the difference between SPET and LPET arises from variations in the noise levels introduced during the sampling of raw data as sinograms. In light of these two facts, we propose an end-to-end TriPle-domain LPET EnhancemenT (TriPLET) framework that leverages the advantages of a hybrid denoising-and-reconstruction process and a triple-domain representation (i.e., sinograms, frequency spectrum maps, and images) to reconstruct SPET images from LPET sinograms. Specifically, TriPLET consists of three sequentially coupled components: 1) a Transformer-assisted denoising network that denoises the input LPET sinograms in the projection domain, 2) a discrete-wavelet-transform-based reconstruction network that further reconstructs SPET from LPET in the wavelet domain, and 3) a pair-based adversarial network that evaluates the reconstructed SPET images in the image domain. Extensive experiments on a real PET dataset demonstrate that, compared with state-of-the-art methods, the proposed TriPLET reconstructs SPET images with the highest similarity and signal-to-noise ratio relative to real data.
Submitted 4 December, 2024;
originally announced December 2024.
-
DVasMesh: Deep Structured Mesh Reconstruction from Vascular Images for Dynamics Modeling of Vessels
Authors:
Dengqiang Jia,
Xinnian Yang,
Xiaosong Xiong,
Shijie Huang,
Feiyu Hou,
Li Qin,
Kaicong Sun,
Kannie Wai Yan Chan,
Dinggang Shen
Abstract:
Vessel dynamics simulation is vital in studying the relationship between geometry and vascular disease progression. Reliable dynamics simulation relies on high-quality vascular meshes. Most existing mesh generation methods depend heavily on manual annotation, which is time-consuming and laborious, and they usually face challenges such as branch merging and vessel disconnection. This hinders vessel dynamics simulation, especially for population studies. To address this issue, we propose a deep learning-based method, dubbed DVasMesh, to directly generate structured hexahedral vascular meshes from vascular images. Our contributions are threefold. First, we propose to formulate each vertex of the vascular graph as a four-element vector comprising the coordinates of the centerline point and the radius. Second, a vectorized graph template is employed to guide DVasMesh in estimating the vascular graph. Specifically, we introduce a sampling operator, which samples the features extracted from the vascular image (by a segmentation network) according to the vertices in the template graph. Third, we employ a graph convolution network (GCN) that takes the sampled features as nodes to estimate the deformation between vertices of the template graph and the target graph, and the deformed graph template is used to build the mesh. Taking advantage of end-to-end learning and discarding direct dependency on annotated labels, DVasMesh demonstrates outstanding performance in generating structured vascular meshes on cardiac and cerebral vascular images. It shows great potential for clinical applications by reducing mesh generation time from 2 hours (manual) to 30 seconds (automatic).
Submitted 1 December, 2024;
originally announced December 2024.
-
Stealth Attacks Against Moving Target Defense for Smart Grid
Authors:
Ke Sun,
Iñaki Esnaola,
H. Vincent Poor
Abstract:
Data injection attacks (DIAs) pose a significant cybersecurity threat to the Smart Grid by enabling an attacker to compromise the integrity of data acquisition and manipulate estimated states without triggering bad data detection procedures. To mitigate this vulnerability, the moving target defense (MTD) alters branch admittances to mismatch the system information that is available to an attacker, thereby inducing an imperfect DIA construction that results in degradation of attack performance. In this paper, we first analyze the existence of stealth attacks for the case in which the MTD strategy only changes the admittance of a single branch. Equipped with this initial insight, we then extend the results to the case in which multiple branches are protected by the MTD strategy. Remarkably, we show that stealth attacks can be constructed with information only about which branches are protected, without knowledge about the particular admittance value changes. Furthermore, we provide a sufficient protection condition for the MTD strategy via graph-theoretic tools that guarantee that the system is not vulnerable to DIAs. Numerical simulations are implemented on IEEE test systems to validate the obtained results.
Submitted 24 November, 2024;
originally announced November 2024.
-
EveGuard: Defeating Vibration-based Side-Channel Eavesdropping with Audio Adversarial Perturbations
Authors:
Jung-Woo Chang,
Ke Sun,
David Xia,
Xinyu Zhang,
Farinaz Koushanfar
Abstract:
Vibrometry-based side channels pose a significant privacy risk, exploiting sensors like mmWave radars, light sensors, and accelerometers to detect vibrations from sound sources or proximate objects, enabling speech eavesdropping. Despite various proposed defenses, these involve costly hardware solutions with inherent physical limitations. This paper presents EveGuard, a software-driven defense framework that creates adversarial audio, protecting voice privacy from side channels without compromising human perception. We leverage the distinct sensing capabilities of side channels and traditional microphones, where side channels capture vibrations and microphones record changes in air pressure, resulting in different frequency responses. EveGuard first proposes a perturbation generator model (PGM) that effectively suppresses sensor-based eavesdropping while maintaining high audio quality. Second, to enable end-to-end training of PGM, we introduce a new domain translation task called Eve-GAN for inferring an eavesdropped signal from a given audio. We further apply few-shot learning to mitigate the data collection overhead for Eve-GAN training. Our extensive experiments show that EveGuard achieves a protection rate of more than 97 percent from audio classifiers and significantly hinders eavesdropped audio reconstruction. We further validate the performance of EveGuard across three adaptive attack mechanisms. We have conducted a user study to verify the perceptual quality of our perturbed audio.
Submitted 9 April, 2025; v1 submitted 15 November, 2024;
originally announced November 2024.
-
A Generalist Audio Foundation Model for Comprehensive Body Sound Auscultation
Authors:
Pingjie Wang,
Liudan Zhao,
Zihan Zhao,
Miao He,
Xin Sun,
Ya Zhang,
Kun Sun,
Yanfeng Wang,
Yu Wang
Abstract:
Accurate and efficient auscultation-based diagnostics are vital for early disease detection, especially in resource-limited settings where specialized clinical expertise is scarce. Traditional auscultation, which heavily depends on clinician experience, suffers from significant inter-observer variability, while existing AI models often falter due to the limitations of non-representative training data. In this study, we introduce AuscultaBase, a novel AI-driven diagnostic framework that harnesses self-supervised and contrastive learning techniques alongside large-scale, multi-source data integration to advance body sound analysis. By generating robust feature representations, AuscultaBase markedly enhances performance in abnormality detection, disease classification, and activity recognition tasks. Comprehensive evaluations on our newly established benchmark, AuscultaBench, demonstrate that AuscultaBase consistently outperforms state-of-the-art methods across key performance metrics, underscoring its potential as a scalable and cost-effective tool for clinical screening and early disease intervention. The code and model checkpoint have been released at https://github.com/applewpj/AuscultaBase.
Submitted 25 March, 2025; v1 submitted 11 November, 2024;
originally announced November 2024.
-
HCDN: A Change Detection Network for Construction Housekeeping Using Feature Fusion and Large Vision Models
Authors:
Kailai Sun,
Zherui Shao,
Yang Miang Goh,
Jing Tian,
Vincent J. L. Gan
Abstract:
Workplace safety has received increasing attention as millions of workers worldwide suffer from work-related accidents. Although poor housekeeping is a significant contributor to construction accidents, there remains a significant lack of technological research focused on improving housekeeping practices on construction sites. Recognizing and locating poor housekeeping on a dynamic construction site is an important task that can be improved through computer vision approaches. Despite advances in AI and computer vision, existing methods for detecting poor housekeeping conditions face many challenges, including limited explanations, an inability to locate poor housekeeping, and a lack of annotated datasets. On the other hand, change detection, which aims to detect changed environmental conditions (e.g., a change from good to poor housekeeping) and 'where' the change has occurred (e.g., the location of objects causing poor housekeeping), has not been explored for the problem of housekeeping management. To address these challenges, we propose the Housekeeping Change Detection Network (HCDN), an advanced change detection neural network that integrates a feature fusion module and a large vision model, achieving state-of-the-art performance. Additionally, we introduce an approach to establish a novel change detection dataset (named Housekeeping-CCD) focused on housekeeping on construction sites, along with a housekeeping segmentation dataset. Our contributions include significant performance improvements compared to existing methods, providing an effective tool for enhancing construction housekeeping and safety. To promote further development, we share our source code and trained models with researchers worldwide: https://github.com/NUS-DBE/Housekeeping-CD.
Submitted 22 October, 2024;
originally announced October 2024.
-
Digital Twin for O-RAN Towards 6G
Authors:
Huan X. Nguyen,
Kexuan Sun,
Duc To,
Quoc-Tuan Vien,
Tuan Anh Le
Abstract:
In future wireless systems of beyond 5G and 6G, addressing diverse applications with varying quality requirements is essential. Open Radio Access Network (O-RAN) architectures offer the potential for dynamic resource adaptation based on traffic demands. However, achieving real-time resource orchestration remains a challenge. Simultaneously, Digital Twin (DT) technology holds promise for testing and analysing complex systems, offering a unique platform for addressing dynamic operation and automation in O-RAN architectures. Yet, developing DTs for complex 5G/6G networks poses challenges, including data exchanges, ML model training data availability, network dynamics, processing power limitations, interdisciplinary collaboration needs, and a lack of standardized methodologies. This paper provides an overview of the Open RAN architecture, its trends and challenges, and proposes DT concepts for O-RAN, with solution examples showcasing their integration into the framework.
Submitted 3 October, 2024;
originally announced October 2024.
-
Towards General Text-guided Image Synthesis for Customized Multimodal Brain MRI Generation
Authors:
Yulin Wang,
Honglin Xiong,
Kaicong Sun,
Shuwei Bai,
Ling Dai,
Zhongxiang Ding,
Jiameng Liu,
Qian Wang,
Qian Liu,
Dinggang Shen
Abstract:
Multimodal brain magnetic resonance (MR) imaging is indispensable in neuroscience and neurology. However, due to the limited accessibility of MRI scanners and their lengthy acquisition time, multimodal MR images are not commonly available. Current MR image synthesis approaches are typically trained on independent datasets for specific tasks, leading to suboptimal performance when applied to novel datasets and tasks. Here, we present TUMSyn, a Text-guided Universal MR image Synthesis generalist model, which can flexibly generate brain MR images with demanded imaging metadata from routinely acquired scans guided by text prompts. To ensure TUMSyn's image synthesis precision, versatility, and generalizability, we first construct a brain MR database comprising 31,407 3D images with 7 MRI modalities from 13 centers. We then pre-train an MRI-specific text encoder using contrastive learning to effectively control MR image synthesis based on text prompts. Extensive experiments on diverse datasets and physician assessments indicate that TUMSyn can generate clinically meaningful MR images with specified imaging metadata in supervised and zero-shot scenarios. Therefore, TUMSyn can be utilized along with acquired MR scan(s) to facilitate large-scale MRI-based screening and diagnosis of brain diseases.
Submitted 25 September, 2024;
originally announced September 2024.
-
Electrically Reconfigurable Non-Volatile On-Chip Bragg Filter with Multilevel Operation
Authors:
Amged Alquliah,
Jay Ke-Chieh Sun,
Christopher Mekhiel,
Chengkuan Gao,
Guli Gulinihali,
Yeshaiahu Fainman,
Abdoulaye Ndao
Abstract:
Photonic integrated circuits (PICs) demand tailored spectral responses for various applications. On-chip Bragg filters offer a promising solution, yet their static nature hampers scalability. Current tunable filters rely on volatile switching mechanisms plagued by high static power consumption and thermal crosstalk. Here, we introduce, for the first time, a non-volatile, electrically programmable on-chip Bragg filter. This device incorporates a nanoscale layer of wide-bandgap phase change material (Sb2S3) atop a periodically structured silicon waveguide. The reversible phase transitions and drastic refractive index modulation of Sb2S3 enable dynamic spectral tuning via foundry-compatible microheaters. Our design surpasses traditional passive Bragg gratings and active volatile filters by offering electrically controlled, reconfigurable spectral responses in a non-volatile manner. The proposed filter achieves a peak reflectivity exceeding 99% and a high tuning range ($Δλ$=20 nm) when transitioning between the amorphous and crystalline states of Sb2S3. Additionally, we demonstrate quasi-continuous spectral control of the filter stopband by modulating the amorphous/crystalline distribution within Sb2S3. Our approach offers substantial benefits for low-power, programmable PICs, thereby laying the groundwork for prospective applications in optical communications, optical interconnects, microwave photonics, optical signal processing, and adaptive multi-parameter sensing.
Submitted 19 August, 2024;
originally announced August 2024.
-
HSDreport: Heart Sound Diagnosis with Echocardiography Reports
Authors:
Zihan Zhao,
Pingjie Wang,
Liudan Zhao,
Yuchen Yang,
Ya Zhang,
Kun Sun,
Xin Sun,
Xin Zhou,
Yu Wang,
Yanfeng Wang
Abstract:
Heart sound auscultation holds significant importance in the diagnosis of congenital heart disease. However, existing methods for Heart Sound Diagnosis (HSD) tasks are predominantly limited to a few fixed categories, framing the HSD task as a rigid classification problem that does not fully align with medical practice and offers only limited information to physicians. Besides, such methods do not utilize echocardiography reports, the gold standard in the diagnosis of related diseases. To tackle this challenge, we introduce HSDreport, a new benchmark for HSD, which mandates the direct utilization of heart sounds obtained from auscultation to predict echocardiography reports. This benchmark aims to merge the convenience of auscultation with the comprehensive nature of echocardiography reports. First, we collect a new dataset for this benchmark, comprising 2,275 heart sound samples along with their corresponding reports. Subsequently, we develop a knowledge-aware query-based transformer to handle this task. The intent is to leverage the capabilities of medically pre-trained models and the internal knowledge of large language models (LLMs) to address the task's inherent complexity and variability, thereby enhancing the robustness and scientific validity of the method. Furthermore, our experimental results indicate that our method significantly outperforms traditional HSD approaches and existing multimodal LLMs in detecting key abnormalities in heart sounds.
Submitted 16 August, 2024;
originally announced August 2024.
-
Toward Pedestrian Head Tracking: A Benchmark Dataset and an Information Fusion Network
Authors:
Kailai Sun,
Xinwei Wang,
Shaobo Liu,
Qianchuan Zhao,
Gao Huang,
Chang Liu
Abstract:
Pedestrian detection and tracking in crowded video sequences have a wide range of applications, including autonomous driving, robot navigation and pedestrian flow surveillance. However, detecting and tracking pedestrians in high-density crowds face many challenges, including intra-class occlusions, complex motions, and diverse poses. Although deep learning models have achieved remarkable progress in head detection, head tracking datasets and methods are extremely lacking. Existing head datasets have limited coverage of complex pedestrian flows and scenes (e.g., pedestrian interactions, occlusions, and object interference). It is of great importance to develop new head tracking datasets and methods. To address these challenges, we present a Chinese Large-scale Cross-scene Pedestrian Head Tracking dataset (Cchead) and a Multi-Source Information Fusion Network (MIFN). Our dataset has features that are of considerable interest, including 10 diverse scenes of 50,528 frames with over 2,366,249 heads and 2,358 tracks annotated. Our dataset contains diverse human moving speeds, directions, and complex crowd pedestrian flows with collision avoidance behaviors. We provide a comprehensive analysis and comparison with existing state-of-the-art (SOTA) algorithms. Moreover, our MIFN is the first end-to-end CNN-based head detection and tracking network that jointly trains RGB frames, pixel-level motion information (optical flow and frame difference maps), depth maps, and density maps in videos. Compared with SOTA pedestrian detection and tracking methods, MIFN achieves superior performance on our Cchead dataset. We believe our datasets and baseline will become valuable resources towards developing pedestrian tracking in dense crowds.
Submitted 11 August, 2024;
originally announced August 2024.
-
Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI
Authors:
Lei Zhou,
Yuzhong Zhang,
Jiadong Zhang,
Xuejun Qian,
Chen Gong,
Kun Sun,
Zhongxiang Ding,
Xing Wang,
Zhenhui Li,
Zaiyi Liu,
Dinggang Shen
Abstract:
Automated breast tumor segmentation on the basis of dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) has shown great promise in clinical practice, particularly for identifying the presence of breast disease. However, accurate segmentation of breast tumors is a challenging task, often necessitating the development of complex networks. To strike an optimal trade-off between computational costs and segmentation performance, we propose a hybrid network that combines convolutional neural network (CNN) and transformer layers. Specifically, the hybrid network consists of an encoder-decoder architecture built by stacking convolution and deconvolution layers. Effective 3D transformer layers are then implemented after the encoder subnetworks to capture global dependencies between the bottleneck features. To improve the efficiency of the hybrid network, two parallel encoder subnetworks are designed for the decoder and the transformer layers, respectively. To further enhance the discriminative capability of the hybrid network, a prototype learning guided prediction module is proposed, where the category-specific prototypical features are calculated through online clustering. All learned prototypical features are finally combined with the features from the decoder for tumor mask prediction. The experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network outperforms state-of-the-art (SOTA) methods while maintaining a balance between segmentation accuracy and computation cost. Moreover, we demonstrate that automatically generated tumor masks can be effectively applied to distinguish the HER2-positive subtype from the HER2-negative subtype with accuracy similar to that of the analysis based on manual tumor segmentation. The source code is available at https://github.com/ZhouL-lab/PLHN.
Submitted 11 August, 2024;
originally announced August 2024.
-
Estimation of Participation Factors for Power System Oscillation from Measurements
Authors:
Tianwei Xia,
Zhe Yu,
Kai Sun,
Di Shi,
Kaiyang Huang
Abstract:
In a power system, when the participation factors of generators are computed to rank their participation in an oscillatory mode, a model-based approach is conventionally applied to the linearized system model by means of the corresponding right and left eigenvectors. This paper proposes a new approach for estimating participation factors directly from measurement data on generator responses under selected disturbances. The approach computes extended participation factors that coincide with accurate model-based participation factors when the measured responses satisfy an ideally symmetric condition. This paper relaxes this symmetric condition in the original measurement space by identifying and utilizing a coordinate transformation to a new space that optimally recovers the symmetry. Thus, optimal estimates of participation factors are obtained solely from measurements, and the accuracy and influencing factors are discussed. The proposed approach is first demonstrated in detail on a two-area system and then tested on an NPCC 48-machine power system. The penetration of inverter-based resources is also considered.
Submitted 14 May, 2024;
originally announced May 2024.
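For orientation, the conventional model-based computation that the abstract above contrasts with can be sketched as follows. This is a minimal illustration of the textbook definition (participation of state k in mode i as the product of the matching right- and left-eigenvector entries), not the measurement-based estimator proposed in the paper. The function name `participation_factors` and the 2x2 example matrix are assumptions chosen for illustration.

```python
import numpy as np

def participation_factors(A):
    """Model-based participation factors of a linearized system x' = A x.

    Returns the eigenvalues and a matrix P with
    P[k, i] = Phi[k, i] * Psi[i, k], where the columns of Phi are right
    eigenvectors and the rows of Psi are left eigenvectors scaled so
    that Psi @ Phi = I.
    """
    eigvals, Phi = np.linalg.eig(A)   # right eigenvectors as columns
    Psi = np.linalg.inv(Phi)          # left eigenvectors as rows
    P = Phi * Psi.T                   # element-wise product
    return eigvals, P

# Hypothetical lightly damped oscillator used only as a toy example.
A = np.array([[0.0, 1.0],
              [-2.0, -0.3]])
eigvals, P = participation_factors(A)

# With this normalization, the factors of each mode sum to one.
col_sums = P.sum(axis=0)
```

Because `Psi` is taken as the inverse of `Phi`, rescaling any right eigenvector rescales the matching left eigenvector inversely, so `P` does not depend on how the eigenvectors are scaled.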
-
TIMIT Speaker Profiling: A Comparison of Multi-task learning and Single-task learning Approaches
Authors:
Rong Wang,
Kun Sun
Abstract:
This study employs deep learning techniques to explore four speaker profiling tasks on the TIMIT dataset, namely gender classification, accent classification, age estimation, and speaker identification, highlighting the potential and challenges of multi-task learning versus single-task models. The motivation for this research is twofold: firstly, to empirically assess the advantages and drawbacks of multi-task learning over single-task models in the context of speaker profiling; secondly, to emphasize the undiminished significance of skillful feature engineering for speaker recognition tasks. The findings reveal challenges in accent classification, and multi-task learning is found advantageous for tasks of similar complexity. Non-sequential features are favored for speaker recognition, but sequential ones can serve as starting points for complex models. The study underscores the necessity of meticulous experimentation and parameter tuning for deep learning models.
Submitted 18 April, 2024;
originally announced April 2024.
-
On the Uniqueness of Participation Factors in Nonlinear Dynamical Systems
Authors:
Tianwei Xia,
Kai Sun
Abstract:
In the modal analysis and control of nonlinear dynamical systems, the participation factors of state variables with respect to a critical or selected mode serve as a pivotal tool for simplifying stability studies by focusing on a subset of highly influential state variables. For linear systems, the participation factors of state variables regarding a mode are uniquely determined by the mode's composition and shape, defined by the system's left and right eigenvectors, respectively. However, the uniqueness of other types of participation factors necessitates further investigation. This paper establishes a sufficient condition for the uniqueness of nonlinear participation factors and five other variants of participation factors, accounting for uncertain scaling factors in a mode's shape and composition. These scaling factors arise from variations in the selection of physical units or the value ranges of state variables when analyzing and controlling real-world dynamical systems. Understanding the sufficient condition of the uniqueness is therefore crucial for the correct application of participation factors in practical scenarios. Additionally, the paper explores the relationship between perturbation magnitudes in state variables and the selection of optimal scaling factors.
Submitted 28 March, 2025; v1 submitted 11 March, 2024;
originally announced March 2024.
-
A Semi-Analytical Approach for State-Space Electromagnetic Transient Simulation Using the Differential Transformation
Authors:
Min Xiong,
Kaiyang Huang,
Yang Liu,
Rui Yao,
Kai Sun,
Feng Qiu
Abstract:
Electromagnetic transient (EMT) simulation is a crucial tool for power system dynamic analysis because of its detailed component modeling and high simulation accuracy. However, it suffers from computational burdens for large power grids since a tiny time step is typically required for accuracy. This paper proposes an efficient and accurate semi-analytical approach for state-space EMT simulations of power grids. It employs high-order semi-analytical solutions derived using the differential transformation from the state-space EMT grid model. The approach incorporates a proposed variable time step strategy based on equation imbalance, leveraging structural information of the grid model, to enlarge the time step and accelerate simulations, while high resolution is maintained by reconstructing detailed fast EMT dynamics through an efficient dense output mechanism. It also addresses limit-induced switches during large time steps by using a binary search-enhanced quadratic interpolation algorithm. Case studies are conducted on EMT models of the IEEE 39-bus system and a synthetic 390-bus system to demonstrate the merits of the new simulation approach against traditional methods.
Submitted 19 December, 2023;
originally announced December 2023.
-
Compensating Removed Frequency Components: Thwarting Voice Spectrum Reduction Attacks
Authors:
Shu Wang,
Kun Sun,
Qi Li
Abstract:
Automatic speech recognition (ASR) provides diverse audio-to-text services for humans to communicate with machines. However, recent research reveals ASR systems are vulnerable to various malicious audio attacks. In particular, by removing the non-essential frequency components, a new spectrum reduction attack can generate adversarial audios that can be perceived by humans but cannot be correctly interpreted by ASR systems. It raises a new challenge for content moderation solutions to detect harmful content in audio and video available on social media platforms. In this paper, we propose an acoustic compensation system named ACE to counter the spectrum reduction attacks over ASR systems. Our system design is based on two observations, namely, frequency component dependencies and perturbation sensitivity. First, since the Discrete Fourier Transform computation inevitably introduces spectral leakage and aliasing effects to the audio frequency spectrum, the frequency components with similar frequencies will have a high correlation. Thus, considering the intrinsic dependencies between neighboring frequency components, it is possible to recover more of the original audio by compensating for the removed components based on the remaining ones. Second, since the removed components in the spectrum reduction attacks can be regarded as an inverse of adversarial noise, the attack success rate will decrease when the adversarial audio is replayed in an over-the-air scenario. Hence, we can model the acoustic propagation process to add over-the-air perturbations into the attacked audio. We implement a prototype of ACE and the experiments show ACE can effectively reduce up to 87.9% of ASR inference errors caused by spectrum reduction attacks. Also, by analyzing residual errors, we summarize six general types of ASR inference errors and investigate the error causes and potential mitigation solutions.
Submitted 18 August, 2023;
originally announced August 2023.
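The spectrum reduction idea described in the abstract above (removing frequency components regarded as non-essential) can be illustrated with a short sketch. It assumes, purely for illustration, that "non-essential" means the DFT bins with the smallest magnitudes; the function name `spectrum_reduction`, the `n_keep` parameter, and the two-tone test signal are hypothetical and are not taken from the paper.

```python
import numpy as np

def spectrum_reduction(signal, n_keep):
    """Keep only the n_keep strongest DFT bins and zero out the rest."""
    spectrum = np.fft.rfft(signal)
    magnitudes = np.abs(spectrum)
    # Magnitude of the n_keep-th strongest bin serves as the cut-off.
    threshold = np.sort(magnitudes)[-n_keep]
    reduced = np.where(magnitudes >= threshold, spectrum, 0.0)
    return np.fft.irfft(reduced, n=len(signal))

# Toy signal: a strong 50 Hz tone plus a weak 120 Hz tone, 1 s at 1 kHz.
t = np.arange(1000) / 1000.0
strong = np.sin(2 * np.pi * 50 * t)
weak = 0.05 * np.sin(2 * np.pi * 120 * t)
attacked = spectrum_reduction(strong + weak, n_keep=1)
```

With `n_keep=1` only the dominant tone survives, which mirrors how removing weak components leaves audio that still sounds similar while carrying less spectral detail.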
-
Three-dimensional echo-shifted EPI with simultaneous blip-up and blip-down acquisitions for correcting geometric distortion
Authors:
Kaibao Sun,
Zhifeng Chen,
Guangyu Dan,
Qingfei Luo,
Lirong Yan,
Feng Liu,
Xiaohong Joe Zhou
Abstract:
Purpose: Echo-planar imaging (EPI) with blip-up/down acquisition (BUDA) can provide high-quality images with minimal distortions by using two readout trains with opposing phase-encoding gradients. Because of the need for two separate acquisitions, BUDA doubles the scan time and degrades the temporal resolution when compared to single-shot EPI, presenting a major challenge for many applications, particularly functional MRI (fMRI). This study aims at overcoming this challenge by developing an echo-shifted EPI BUDA (esEPI-BUDA) technique to acquire both blip-up and blip-down datasets in a single shot. Methods: A three-dimensional (3D) esEPI-BUDA pulse sequence was designed by using an echo-shifting strategy to produce two EPI readout trains. These readout trains produced a pair of k-space datasets whose k-space trajectories were interleaved with opposite phase-encoding gradient directions. The two k-space datasets were separately reconstructed using a 3D SENSE algorithm, from which time-resolved B0-field maps were derived using TOPUP in FSL and then input into a forward model of joint parallel imaging reconstruction to correct for geometric distortion. In addition, Hankel structured low-rank constraint was incorporated into the reconstruction framework to improve image quality by mitigating the phase errors between the two interleaved k-space datasets. Results: The 3D esEPI-BUDA technique was demonstrated in a phantom and an fMRI study on healthy human subjects. Geometric distortions were effectively corrected in both phantom and human brain images. In the fMRI study, the visual activation volumes and their BOLD responses were comparable to those from conventional 3D echo-planar images. Conclusion: The improved imaging efficiency and dynamic distortion correction capability afforded by 3D esEPI-BUDA are expected to benefit many EPI applications.
Submitted 12 August, 2023;
originally announced August 2023.
-
Trend-Based SAC Beam Control Method with Zero-Shot in Superconducting Linear Accelerator
Authors:
Xiaolong Chen,
Xin Qi,
Chunguang Su,
Yuan He,
Zhijun Wang,
Kunxiang Sun,
Chao Jin,
Weilong Chen,
Shuhui Liu,
Xiaoying Zhao,
Duanyang Jia,
Man Yi
Abstract:
The superconducting linear accelerator is a highly flexible facility for modern scientific discoveries, necessitating weekly reconfiguration and tuning. Accordingly, minimizing setup time is essential for affording users ample experimental time. We propose a trend-based soft actor-critic (TBSAC) beam control method with strong robustness, allowing the agents to be trained in a simulated environment and applied directly to the real accelerator in a zero-shot manner. To validate the effectiveness of our method, two typical beam control tasks were performed on the China Accelerator Facility for Superheavy Elements (CAFe II) and a light particle injector (LPI), respectively. The orbit correction tasks were performed separately in three cryomodules in CAFe II; the time required for tuning was reduced to one-tenth of that needed by human experts, and the RMS values of the corrected orbit were all less than 1 mm. The transmission efficiency optimization task was conducted in the LPI, where our agent successfully optimized the transmission efficiency of the radio-frequency quadrupole (RFQ) to over $85\%$ within 2 minutes. The outcomes of these two experiments show that our proposed TBSAC approach can efficiently and effectively accomplish beam commissioning tasks while upholding the same standard as skilled human experts. As such, our method exhibits potential for future applications in other accelerator commissioning fields.
Submitted 25 May, 2023; v1 submitted 23 May, 2023;
originally announced May 2023.
-
Semi-Analytical Electromagnetic Transient Simulation Using Differential Transformation
Authors:
Min Xiong,
Rui Yao,
Yang Liu,
Kai Sun,
Feng Qiu
Abstract:
For electromagnetic transient (EMT) simulation of a power system, a state-space-based approach needs to solve state-space EMT equations by using numerical integration methods, e.g., the Euler method, Runge-Kutta methods, and trapezoidal-rule method, at small time steps. The simulation can be slow on a power system having multiple generators. To speed up state-space-based EMT simulations, this paper proposes a Differential Transformation based semi-analytical method that repeatedly utilizes a high-order semi-analytical solution of the EMT equations at longer time steps. The proposed semi-analytical method is tested on the detailed EMT model of a four-generator two-area system. Simulation results show the significant potential of the proposed method to accelerate EMT simulations of power systems compared with traditional numerical methods.
Submitted 18 February, 2023;
originally announced February 2023.
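The general idea behind a differential-transformation time step, as described in the abstract above, can be sketched on a toy problem. The example assumes a linear system x' = A x, for which the transformed coefficients follow the recurrence X(k+1) = A X(k) / (k+1) and the step is the truncated power series in the step size h. The function name `dt_step`, the series order, and the harmonic-oscillator test case are assumptions for illustration; the paper's EMT equations are nonlinear and are not reproduced here.

```python
import numpy as np

def dt_step(A, x0, h, order=8):
    """Advance x' = A x by one step h using a truncated power series.

    The k-th coefficient X(k) equals the k-th time derivative of x at
    the start of the step divided by k!, so X(k+1) = A @ X(k) / (k+1).
    """
    coeff = x0.astype(float)   # X(0) is the current state
    x_next = coeff.copy()
    h_pow = 1.0
    for k in range(order):
        coeff = A @ coeff / (k + 1)   # X(k+1) from X(k)
        h_pow *= h
        x_next = x_next + coeff * h_pow
    return x_next

# Harmonic oscillator with exact solution [cos t, -sin t].
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
x = np.array([1.0, 0.0])
h = 0.5                        # much larger than a typical Euler step
for _ in range(4):
    x = dt_step(A, x, h)
exact = np.array([np.cos(2.0), -np.sin(2.0)])
```

The high series order is what allows the comparatively long step here; a first-order method would need a far smaller h for similar accuracy.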
-
A Heterogeneous Multiscale Method for Power System Simulation Considering Electromagnetic Transients
Authors:
Kaiyang Huang,
Min Xiong,
Yang Liu,
Kai Sun,
Feng Qiu
Abstract:
Traditional dynamic security assessment faces challenges as power systems are experiencing a transformation to inverter-based-resource (IBR) dominated systems, for which electromagnetic transient (EMT) dynamics have to be considered. However, EMT simulation is time-consuming, especially for a large power grid, because the mathematical model based on detailed component modeling is highly stiff and needs to be integrated at tiny time steps for numerical stability. This paper proposes a heterogeneous multiscale method (HMM) to address the simulation of a power system considering EMT dynamics as a multiscale problem. The method aims to accurately simulate the macroscopic dynamics of the system even when EMT dynamics are dominating. By force estimation using a kernel function, the proposed method automatically generates a macro model on the fly during simulation based on the micro model of EMT dynamics. It can flexibly switch between the micro- and macro-models to capture important EMT dynamics during some time intervals while skipping over other time intervals of less interest to achieve a superior simulation speed. The method is illustrated by a case study on a two-machine EMT model to demonstrate its potential for power system simulation.
Submitted 18 February, 2023;
originally announced February 2023.
-
Optimal Distributed Voltage Control via Primal Dual Gradient Dynamics
Authors:
Mohammed N. Khamees,
Yang Liu,
Kai Sun
Abstract:
The rapidly increasing penetration of inverter-based resources into a power transmission network requires more sophisticated voltage control strategies considering their inherent output variabilities. In addition, faults and load variations affect the voltage profile over the power network. This paper proposes a Primal Dual Gradient Dynamics based optimal distributed voltage control approach that optimizes outputs of distributed reactive power sources to maintain an acceptable voltage profile while preserving operational limits. Case studies of this new approach on IEEE test systems have verified its effectiveness.
Submitted 15 February, 2023;
originally announced February 2023.
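Primal-dual gradient dynamics, the optimization technique named in the abstract above, can be sketched on a small equality-constrained problem. The example minimizes 0.5 * ||x - x_ref||^2 subject to a . x = b by descending in the primal variable and ascending in the multiplier, integrated with forward Euler. All names and numbers (`pdgd`, `x_ref`, `a`, `b`, the step size) are hypothetical; the paper's voltage-control formulation, its distributed implementation, and its inequality limits are not reproduced.

```python
import numpy as np

def pdgd(x_ref, a, b, step=0.01, iters=5000):
    """Forward-Euler primal-dual gradient dynamics for
    min 0.5 * ||x - x_ref||^2  subject to  a . x = b.
    """
    x = np.zeros_like(x_ref, dtype=float)
    lam = 0.0
    for _ in range(iters):
        grad_x = (x - x_ref) + lam * a   # gradient of the Lagrangian in x
        grad_lam = a @ x - b             # constraint residual
        x = x - step * grad_x            # primal descent
        lam = lam + step * grad_lam      # dual ascent
    return x, lam

x_ref = np.array([1.0, 2.0])
a = np.array([1.0, 1.0])
b = 2.0
x_opt, lam_opt = pdgd(x_ref, a, b)

# Closed-form solution of the same problem, used as a check.
x_star = x_ref - a * (a @ x_ref - b) / (a @ a)
```

Each update uses only gradients of the Lagrangian, which is the property that makes this family of methods attractive for distributed implementations.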
-
A GPU-Accelerated Light-field Super-resolution Framework Based on Mixed Noise Model and Weighted Regularization
Authors:
Trung-Hieu Tran,
Kaicong Sun,
Sven Simon
Abstract:
This paper presents a GPU-accelerated computational framework for reconstructing high resolution (HR) LF images under a mixed Gaussian-Impulse noise condition. The main focus is on developing a high-performance approach considering processing speed and reconstruction quality. From a statistical perspective, we derive a joint $\ell^1$-$\ell^2$ data fidelity term for penalizing the HR reconstruction error taking into account the mixed noise situation. For regularization, we employ the weighted non-local total variation approach, which allows us to effectively realize LF image prior through a proper weighting scheme. We show that the alternating direction method of multipliers algorithm (ADMM) can be used to simplify the computation complexity and results in a high-performance parallel computation on the GPU Platform. An extensive experiment is conducted on both synthetic 4D LF dataset and natural image dataset to validate the proposed SR model's robustness and evaluate the accelerated optimizer's performance. The experimental results show that our approach achieves better reconstruction quality under severe mixed-noise conditions as compared to the state-of-the-art approaches. In addition, the proposed approach overcomes the limitation of the previous work in handling large-scale SR tasks. While fitting within a single off-the-shelf GPU, the proposed accelerator provides an average speedup of 2.46$\times$ and 1.57$\times$ for $\times 2$ and $\times 3$ SR tasks, respectively. In addition, a speedup of $77\times$ is achieved as compared to CPU execution.
Submitted 9 June, 2022;
originally announced June 2022.
-
Preparing data for pathological artificial intelligence with clinical-grade performance
Authors:
Yuanqing Yang,
Kai Sun,
Yanhua Gao,
Kuangsong Wang,
Gang Yu
Abstract:
[Purpose] Pathology is decisive for disease diagnosis but relies heavily on experienced pathologists. Recently, pathological artificial intelligence (PAI) has been expected to improve diagnostic accuracy and efficiency. However, the high performance of deep-learning-based PAI in the laboratory generally cannot be reproduced in the clinic. [Methods] Because data preparation is important for PAI, this paper reviews PAI-related studies in the PubMed database published from January 2017 to February 2022; 118 studies were included. Methods for preparing data are analyzed in depth, including obtaining slides of pathological tissue, cleaning, screening, and digitizing. Expert review, image annotation, and dataset division for model training and validation are also discussed. We further discuss why the high performance of PAI is not reproducible in clinical practice and show some effective ways to improve the clinical performance of PAI. [Results] The robustness of PAI depends on randomized collection of representative disease slides, including rigorous quality control and screening, correction of digital discrepancies, reasonable annotation, and the amount of data. Digital pathology is fundamental to clinical-grade PAI, and data standardization techniques and weakly supervised learning methods based on whole slide images (WSI) are effective ways to overcome obstacles to performance reproduction. [Conclusion] Representative data, the amount of labeling, and consistency across multiple centers are the keys to performance reproduction. Digital pathology for clinical diagnosis, data standardization, and WSI-based weakly supervised learning can hopefully build clinical-grade PAI. Keywords: pathological artificial intelligence; data preparation; clinical-grade; deep learning
Submitted 22 May, 2022;
originally announced May 2022.
-
AI-enabled Automatic Multimodal Fusion of Cone-Beam CT and Intraoral Scans for Intelligent 3D Tooth-Bone Reconstruction and Clinical Applications
Authors:
Jin Hao,
Jiaxiang Liu,
Jin Li,
Wei Pan,
Ruizhe Chen,
Huimin Xiong,
Kaiwei Sun,
Hangzheng Lin,
Wanlu Liu,
Wanghui Ding,
Jianfei Yang,
Haoji Hu,
Yueling Zhang,
Yang Feng,
Zeyu Zhao,
Huikai Wu,
Youyi Zheng,
Bing Fang,
Zuozhu Liu,
Zhihe Zhao
Abstract:
A critical step in virtual dental treatment planning is to accurately delineate all tooth-bone structures from CBCT with high fidelity and accurate anatomical information. Previous studies have established several methods for CBCT segmentation using deep learning. However, the inherent resolution discrepancy of CBCT and the loss of occlusal and dentition information have largely limited its clinical applicability. Here, we present a Deep Dental Multimodal Analysis (DDMA) framework consisting of a CBCT segmentation model, an intraoral scan (IOS) segmentation model (the most accurate digital dental model), and a fusion model to generate 3D fused crown-root-bone structures with high fidelity and accurate occlusal and dentition information. Our model was trained with a large-scale dataset of 503 CBCT and 28,559 IOS meshes manually annotated by experienced human experts. For CBCT segmentation, we use a five-fold cross validation test, each with 50 CBCT, and our model achieves an average Dice coefficient and IoU of 93.99% and 88.68%, respectively, significantly outperforming the baselines. For IOS segmentation, our model achieves an mIoU of 93.07% and 95.70% on the maxilla and mandible on a test set of 200 IOS meshes, which are 1.77% and 3.52% higher than the state-of-the-art method. Our DDMA framework takes about 20 to 25 minutes to generate the fused 3D mesh model following the sequential processing order, compared to over 5 hours for human experts. Notably, our framework has been incorporated into software by a clear aligner manufacturer, and real-world clinical cases demonstrate that our model can visualize crown-root-bone structures during the entire orthodontic treatment and can predict risks like dehiscence and fenestration. These findings demonstrate the potential of multi-modal deep learning to improve the quality of digital dental models and help dentists make better clinical decisions.
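The Dice coefficient and IoU quoted in this abstract are standard overlap measures for segmentation masks. The sketch below computes both for a pair of flattened binary masks. The function name `dice_and_iou` and the sample masks are made up for illustration, and the convention for two empty masks is an assumption; this is not the authors' evaluation code.

```python
def dice_and_iou(pred, truth):
    """Overlap scores for two equal-length sequences of 0/1 labels."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)  # |P ∩ T|
    size_p = sum(1 for p in pred if p)                      # |P|
    size_t = sum(1 for t in truth if t)                     # |T|
    union = size_p + size_t - inter                         # |P ∪ T|
    if union == 0:
        # Both masks empty: treated here as perfect agreement (an assumed convention)
        return 1.0, 1.0
    dice = 2.0 * inter / (size_p + size_t)
    iou = inter / union
    return dice, iou

# Hypothetical flattened masks, for illustration only
pred = [1, 1, 1, 0, 0, 1]
truth = [1, 1, 0, 0, 1, 1]
dice, iou = dice_and_iou(pred, truth)
print(dice, iou)
```

For a single pair of masks the two scores are tied together by IoU = Dice / (2 - Dice). Plugging the reported average Dice of 93.99% into that identity gives roughly 88.7%, close to the reported 88.68%; averages taken over many cases need not satisfy the identity exactly.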
Submitted 11 March, 2022;
originally announced March 2022.
-
Participation Factor-Based Adaptive Model Reduction for Fast Power System Simulation
Authors:
Mahsa Sajjadi,
Kaiyang Huang,
Kai Sun
Abstract:
This paper describes an adaptive method to reduce a nonlinear power system model for fast and accurate transient stability simulation. It presents an approach to analyze and rank participation factors of each system state variable into dominant system modes excited by a disturbance so as to determine which regions or generators can be reduced without impacting the accuracy of simulation for a study area. In this approach, the generator models located in an external area with large participation factors are nonlinearly reduced and the rest of the generators will be linearized. The simulation results confirm that the assessment of the level of interaction between generators and system modes by participation factors is effective in enhancing the accuracy and speed of power system models. The proposed method is applied to the Northeastern Power Coordinating Council region system with 48-machine, 140-bus power system model and the results are compared with two cases including fully linearized model reduction and model reduction using the rotor angle deviation criteria.
Submitted 9 March, 2022;
originally announced March 2022.
-
Machine Learning based Optimal Feedback Control for Microgrid Stabilization
Authors:
Tianwei Xia,
Kai Sun,
Wei Kang
Abstract:
Microgrids have more operational flexibilities as well as uncertainties than conventional power grids, especially when renewable energy resources are utilized. An energy storage based feedback controller can compensate undesired dynamics of a microgrid to improve its stability. However, the optimal feedback control of a microgrid subject to a large disturbance needs to solve a Hamilton-Jacobi-Bellman problem. This paper proposes a machine learning-based optimal feedback control scheme. Its training dataset is generated from a linear-quadratic regulator and a brute-force method respectively addressing small and large disturbances. Then, a three-layer neural network is constructed from the data for the purpose of optimal feedback control. A case study is carried out for a microgrid model based on a modified Kundur two-area system to test the real-time performance of the proposed control scheme.
Submitted 9 March, 2022;
originally announced March 2022.
-
Time-variant Nonlinear Participation Factors Considering Resonances in Power Systems
Authors:
Tianwei Xia,
Kai Sun
Abstract:
The participation factor (PF), as an important modal property for small-signal stability, evaluates the linkage between a state variable and a mode. Applying normal form theory, a nonlinear PF can be defined to evaluate the participation of a state variable in modal dynamics following a large disturbance, which accounts for resonances and nonlinearities up to a desired order. However, existing nonlinear PFs are inconsistent with the conventional linear PF when nonlinear dynamics following a large disturbance attenuate and linear modal dynamics become dominant. This paper proposes a time-variant nonlinear PF by introducing a time-decaying factor and the definition of a nonlinear mode. The new PFs consider resonance modes, and their values naturally transition to a linear PF when the system state approaches its equilibrium. The case study on a two-area four-generator system shows that the new PF can correctly rank generators by their participation in natural and resonance modes of nonlinear oscillation subject to a large disturbance.
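For reference, the conventional linear PF that the abstract compares against is usually written as follows. The symbols ($A$, $\phi_i$, $\psi_i$, $p_{ki}$) are the common textbook notation and are an assumption here, since the listing does not show the paper's own notation or its nonlinear, time-variant definition.

```latex
% Linearized system \dot{x} = A x, with right eigenvectors \phi_i and
% left eigenvectors \psi_i of A for mode \lambda_i:
%   A \phi_i = \lambda_i \phi_i, \qquad \psi_i A = \lambda_i \psi_i, \qquad \psi_i \phi_i = 1.
% Participation of state variable k in mode i:
p_{ki} = \psi_{ik}\,\phi_{ki}, \qquad \sum_{k} p_{ki} = \psi_i \phi_i = 1 .
```

Under this normalization the participation factors of each mode sum to one, which is what allows state variables (and hence generators) to be ranked by their share in a mode.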
Submitted 9 March, 2022;
originally announced March 2022.
-
Multi-Material Blind Beam Hardening Correction Based on Non-Linearity Adjustment of Projections
Authors:
Ammar Alsaffar,
Kaicong Sun,
Sven Simon
Abstract:
Beam hardening (BH) is one of the major artifacts that severely reduce the quality of Computed Tomography (CT) imaging. In a polychromatic X-ray beam, since low-energy photons are preferentially absorbed, the attenuation of the beam is no longer a linear function of the absorber thickness. The existing BH correction methods either require a given material, which might be infeasible in reality, or require a long computation time. This work proposes a fast and accurate BH correction method that requires no prior knowledge of the materials and corrects first and higher-order BH artifacts. In the first step, a wide sweep of the material is performed based on an experimentally measured look-up table to obtain the closest estimate of the material. Then the non-linearity effect of the BH is corrected by adding the difference between the estimated monochromatic and the polychromatic simulated projections of the segmented image. The estimated monochromatic projection is simulated by selecting the energy from the polychromatic spectrum which produces the lowest mean square error (MSE) with the acquired projection from the scanner. The polychromatic projection is estimated by minimizing the difference between the acquired projection and the weighted sum of the simulated polychromatic projections using different spectra of different filtration. To evaluate the proposed BH correction method, we have conducted extensive experiments on real-world CT data. Compared to the state-of-the-art empirical BH correction method, the experiments show that the proposed method can substantially reduce the BH artifacts without prior knowledge of the materials.
Submitted 9 March, 2022;
originally announced March 2022.
-
Computational Scatter Correction for High-Resolution Flat-Panel CT Based on a Fast Monte Carlo Photon Transport Model
Authors:
Ammar Alsaffar,
Steffen Kieß,
Kaicong Sun,
Sven Simon
Abstract:
In computed tomography (CT) reconstruction, scattering causes severe quality degradation of the reconstructed CT images by introducing streaks and cupping artifacts which reduce the detectability of low contrast objects. Monte Carlo (MC) simulation is considered the most accurate approach for scatter estimation. However, the existing MC estimators are computationally expensive, especially for the considered high-resolution flat-panel CT. In this paper, we propose a fast and accurate photon transport model which describes the physics within the 1 keV to 1 MeV range using multiple controllable key parameters. Based on this model, scatter computation for a single projection can be completed within a few seconds under well-defined model parameters. Smoothing and interpolation are performed on the estimated scatter to accelerate the scatter calculation without compromising accuracy too much compared to measured near scatter-free projection images. Combining the scatter estimation with the filtered backprojection (FBP), scatter correction is performed effectively in an iterative manner. In order to evaluate the proposed MC model, we have conducted extensive experiments on simulated data and real-world high-resolution flat-panel CT. On a four-GPU system, our photon transport model achieved a 202$\times$ speed-up compared to the multi-threaded state-of-the-art EGSnrc MC simulator. Besides, it is shown that for real-world high-resolution flat-panel CT, scatter correction with sufficient accuracy is accomplished within one to three iterations using an FBP and a forward projection computed with the proposed fast MC photon transport model.
Submitted 31 January, 2022;
originally announced January 2022.
-
SGUIE-Net: Semantic Attention Guided Underwater Image Enhancement with Multi-Scale Perception
Authors:
Qi Qi,
Kunqian Li,
Haiyong Zheng,
Xiang Gao,
Guojia Hou,
Kun Sun
Abstract:
Due to wavelength-dependent light attenuation, refraction and scattering, underwater images usually suffer from color distortion and blurred details. However, due to the limited number of paired underwater images with undistorted images as reference, training deep enhancement models for diverse degradation types is quite difficult. To boost the performance of data-driven approaches, it is essential to establish more effective learning mechanisms that mine richer supervised information from limited training sample resources. In this paper, we propose a novel underwater image enhancement network, called SGUIE-Net, in which we introduce semantic information as high-level guidance across different images that share common semantic regions. Accordingly, we propose a semantic region-wise enhancement module to perceive the degradation of different semantic regions at multiple scales and feed it back to the global attention features extracted from the original scale. This strategy helps to achieve robust and visually pleasant enhancement of different semantic objects, thanks to the guidance of semantic information for differentiated enhancement. More importantly, for those degradation types that are not common in the training sample distribution, the guidance connects them with the already well-learned types according to their semantic relevance. Extensive experiments on the publicly available datasets and our proposed dataset demonstrate the impressive performance of SGUIE-Net. The code and proposed dataset are available at: https://trentqq.github.io/SGUIE-Net.html
Submitted 8 January, 2022;
originally announced January 2022.
-
A Resolution Enhancement Plug-in for Deformable Registration of Medical Images
Authors:
Kaicong Sun,
Sven Simon
Abstract:
Image registration is a fundamental task for medical imaging. Resampling of the intensity values is required during registration, and better spatial resolution with finer and sharper structures can improve the resampling performance and hence the registration accuracy. Super-resolution (SR) is an algorithmic technique targeting spatial resolution enhancement which can achieve an image resolution beyond the hardware limitation. In this work, we consider SR as a preprocessing technique and present a CNN-based resolution enhancement module (REM) which can be easily plugged into the registration network in a cascaded manner. Different residual schemes and network configurations of REM are investigated to obtain an effective architecture design of REM. In fact, REM is not confined to image registration; it can also be straightforwardly integrated into other vision tasks for enhanced resolution. The proposed REM is thoroughly evaluated for deformable registration on medical images quantitatively and qualitatively at different upscaling factors. Experiments on the LPBA40 brain MRI dataset demonstrate that REM not only improves the registration accuracy, especially when the input images suffer from degraded spatial resolution, but also generates resolution-enhanced images which can be exploited for subsequent diagnosis.
Submitted 30 December, 2021;
originally announced December 2021.
-
Asymptotic Learning Requirements for Stealth Attacks on Linearized State Estimation
Authors:
Ke Sun,
Iñaki Esnaola,
Antonia M. Tulino,
H. Vincent Poor
Abstract:
Information-theoretic stealth attacks are data injection attacks that minimize the amount of information acquired by the operator about the state variables, while simultaneously limiting the Kullback-Leibler divergence between the distribution of the measurements under attack and the distribution under normal operation with the aim of controlling the probability of detection. For Gaussian distributed state variables, attack construction requires knowledge of the second order statistics of the state variables, which is estimated from a finite number of past realizations using a sample covariance matrix. Within this framework, the attack performance is studied for the attack construction with the sample covariance matrix. This results in an analysis of the amount of data required to learn the covariance matrix of the state variables used in the attack construction. The ergodic attack performance is characterized using asymptotic random matrix theory tools, and the variance of the attack performance is bounded. The ergodic performance and the variance bounds are assessed with simulations on IEEE test systems.
Submitted 11 January, 2023; v1 submitted 22 December, 2021;
originally announced December 2021.
-
Data-Driven Computational Methods for the Domain of Attraction and Zubov's Equation
Authors:
Wei Kang,
Kai Sun,
Liang Xu
Abstract:
This paper deals with a special type of Lyapunov functions, namely the solution of Zubov's equation. Such a function can be used to characterize the domain of attraction for systems of ordinary differential equations. We derive and prove an integral form solution to Zubov's equation. For numerical computation, we develop two data-driven methods. One is based on the integration of an augmented system of differential equations; and the other one is based on deep learning. The former is effective for systems with a relatively low state space dimension and the latter is developed for high dimensional problems. The deep learning method is applied to a New England 10-generator power system model. We prove that a neural network approximation exists for the Lyapunov function of power systems such that the approximation error is a cubic polynomial of the number of generators. The error convergence rate as a function of n, the number of neurons, is proved.
Submitted 29 December, 2021;
originally announced December 2021.
-
Partial Symbol Recovery for Interference Resilience in Low-Power Wide Area Networks
Authors:
Kai Sun,
Zhimeng Yin,
Weiwei Chen,
Shuai Wang,
Zeyu Zhang,
Tian He
Abstract:
Recent years have witnessed the proliferation of Low-power Wide Area Networks (LPWANs) in the unlicensed band for various Internet-of-Things (IoT) applications. Due to the ultra-low transmission power and long transmission duration, LPWAN devices inevitably suffer from high power Cross Technology Interference (CTI), such as interference from Wi-Fi, coexisting in the same spectrum. To alleviate this issue, this paper introduces the Partial Symbol Recovery (PSR) scheme for improving the CTI resilience of LPWAN. We verify our idea on LoRa, a widely adopted LPWAN technique, as a proof of concept. At the PHY layer, although CTI has much higher power, its duration is relatively shorter compared with LoRa symbols, leaving part of a LoRa symbol uncorrupted. Moreover, due to its high redundancy, LoRa chips within a symbol are highly correlated. This opens the possibility of detecting a LoRa symbol with only part of the chips. By examining the unique frequency patterns in LoRa symbols with time-frequency analysis, our design effectively detects the clean LoRa chips that are free of CTI. This enables PSR to only rely on clean LoRa chips for successfully recovering from communication failures. We evaluate our PSR design with real-world testbeds, including SX1280 LoRa chips and USRP B210, under Wi-Fi interference in various scenarios. Extensive experiments demonstrate that our design offers reliable packet recovery performance, successfully boosting the LoRa packet reception ratio from 45.2% to 82.2% with a performance gain of 1.8 times.
Submitted 8 September, 2021;
originally announced September 2021.
-
FL-MISR: Fast Large-Scale Multi-Image Super-Resolution for Computed Tomography Based on Multi-GPU Acceleration
Authors:
Kaicong Sun,
Trung-Hieu Tran,
Jajnabalkya Guhathakurta,
Sven Simon
Abstract:
Multi-image super-resolution (MISR) usually outperforms single-image super-resolution (SISR) under a proper inter-image alignment by explicitly exploiting the inter-image correlation. However, the large computational demand encumbers the deployment of MISR in practice. In this work, we propose FL-MISR, a distributed optimization framework based on data parallelism for fast large-scale MISR using multi-GPU acceleration. The scaled conjugate gradient (SCG) algorithm is applied to the distributed subfunctions and the local SCG variables are communicated to synchronize the convergence rate over multi-GPU systems towards a consistent convergence. Furthermore, an inner-outer border exchange scheme is performed to obviate the border effect between neighboring GPUs. The proposed FL-MISR is applied to the computed tomography (CT) system by super-resolving the projections acquired by subpixel detector shift. The SR reconstruction is performed on the fly during the CT acquisition such that no additional computation time is introduced. FL-MISR is extensively evaluated from different aspects and experimental results demonstrate that FL-MISR effectively improves the spatial resolution of CT systems in modulation transfer function (MTF) and visual perception. Compared to a multi-core CPU implementation, FL-MISR achieves a more than 50x speedup on an off-the-shelf 4-GPU system.
Submitted 5 October, 2021; v1 submitted 9 August, 2021;
originally announced August 2021.
-
AOSLO-net: A deep learning-based method for automatic segmentation of retinal microaneurysms from adaptive optics scanning laser ophthalmoscope images
Authors:
Qian Zhang,
Konstantina Sampani,
Mengjia Xu,
Shengze Cai,
Yixiang Deng,
He Li,
Jennifer K. Sun,
George Em Karniadakis
Abstract:
Microaneurysms (MAs) are one of the earliest signs of diabetic retinopathy (DR), a frequent complication of diabetes that can lead to visual impairment and blindness. Adaptive optics scanning laser ophthalmoscopy (AOSLO) provides real-time retinal images with resolution down to 2 $\mu$m and thus allows detection of the morphologies of individual MAs, a potential marker that might dictate MA pathology and affect the progression of DR. In contrast to the numerous automatic models developed for assessing the number of MAs on fundus photographs, there is currently no high-throughput image protocol available for automatic analysis of AOSLO photographs. To address this need, we introduce AOSLO-net, a deep neural network framework with customized training policies to automatically segment MAs from AOSLO images. We evaluate the performance of AOSLO-net using 87 DR AOSLO images and our results demonstrate that the proposed model outperforms the state-of-the-art segmentation model both in accuracy and cost and enables correct MA morphological classification.
Submitted 25 June, 2021; v1 submitted 5 June, 2021;
originally announced June 2021.
-
Bilateral Spectrum Weighted Total Variation for Noisy-Image Super-Resolution and Image Denoising
Authors:
Kaicong Sun,
Sven Simon
Abstract:
In this paper, we propose a regularization technique for noisy-image super-resolution and image denoising. Total variation (TV) regularization is adopted in many image processing applications to preserve the local smoothness. However, the TV prior is prone to oversmoothing, staircasing effects, and contrast losses. Nonlocal TV (NLTV) mitigates the contrast losses by adaptively weighting the smoothness based on the similarity measure of image patches. Although it suppresses the noise effectively in the flat regions, it might leave residual noise surrounding the edges, especially when the image is not oversmoothed. To address this problem, we propose the bilateral spectrum weighted total variation (BSWTV). Specifically, we apply a locally adaptive shrink coefficient to the image gradients and employ the eigenvalues of the covariance matrix of the weighted image gradients to effectively refine the weighting map and suppress the residual noise. In conjunction with the data fidelity term derived from a mixed Poisson-Gaussian noise model, the objective function is decomposed and solved by the alternating direction method of multipliers (ADMM) algorithm. In order to remove outliers and improve the convergence stability, the weighting map is smoothed by a Gaussian filter with an iteratively decreased kernel width and updated in a momentum-based manner in each ADMM iteration. We benchmark our method against the state-of-the-art approaches on public real-world datasets for super-resolution and image denoising. Experiments show that the proposed method obtains outstanding performance for super-resolution and achieves promising results for denoising on real-world images.
Submitted 5 June, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
Multi-scale super-resolution generation of low-resolution scanned pathological images
Authors:
Kai Sun,
Yanhua Gao,
Ting Xie,
Xun Wang,
Qingqing Yang,
Le Chen,
Kuansong Wang,
Gang Yu
Abstract:
Background. Digital pathology has aroused widespread interest in modern pathology. The key to digitalization is to scan the whole slide image (WSI) at high magnification. The larger the magnification, the richer the details the WSI provides, but the scanning time is longer and the resulting file size is larger. Methods. We design a strategy to scan slides at low resolution (5X) and propose a super-resolution method to restore the image details at diagnosis time. The method is based on a multi-scale generative adversarial network, which sequentially generates three high-resolution images at 10X, 20X and 40X. Results. The peak signal-to-noise ratios of the generated 10X, 20X and 40X images are 24.16, 22.27 and 20.44, and the structural similarity indices are 0.845, 0.680 and 0.512, which are better than those of other super-resolution networks. The mean and standard deviation of the visual scores from three pathologists are 3.63 ± 0.52, 3.70 ± 0.57 and 3.74 ± 0.56, and the p value of the analysis of variance is 0.37, indicating that the generated images include sufficient information for diagnosis. The average value of the Kappa test is 0.99, meaning the diagnosis from generated images is highly consistent with that from the real images. Conclusion. The proposed method can generate high-quality 10X, 20X and 40X images from 5X images at the same time, so that the time and storage costs of digitalization can be reduced to as little as 1/64 of the previous costs. The proposed method provides a better alternative for low-cost storage and faster image sharing in digital pathology. Keywords. Digital pathology; Super-resolution; Low resolution scanning; Low cost
Submitted 30 July, 2021; v1 submitted 15 May, 2021;
originally announced May 2021.
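The 1/64 figure quoted in the abstract above is consistent with simple pixel-count arithmetic, sketched below. This is an illustration only, not the authors' cost model: it assumes the raw pixel count of a scan grows with the square of the magnification and ignores compression and scanner overhead. The function name `pixel_ratio` is hypothetical.

```python
def pixel_ratio(low_mag: float, high_mag: float) -> float:
    """Ratio of raw pixel counts for the same slide area at two magnifications.

    Assumption: pixel count scales with the square of the linear magnification.
    """
    linear = high_mag / low_mag  # linear scale factor, e.g. 40 / 5 = 8
    return linear ** 2           # area (pixel count) scale factor


if __name__ == "__main__":
    for target in (10, 20, 40):
        ratio = pixel_ratio(5, target)
        print(f"5X -> {target}X: {ratio:.0f}x the pixels, so a 5X scan is 1/{ratio:.0f} of the raw data")
```

Under this assumption a 5X scan holds 1/4, 1/16 and 1/64 of the pixels of a 10X, 20X and 40X scan respectively, which matches the "up to" bound for the 40X case.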
-
A Soft-Aided Staircase Decoder Using Three-Level Channel Reliabilities
Authors:
Yi Lei,
Bin Chen,
Gabriele Liga,
Alexios Balatsoukas-Stimming,
Kaixuan Sun,
Alex Alvarado
Abstract:
The soft-aided bit-marking (SABM) algorithm is based on the idea of marking bits as highly reliable bits (HRBs), highly unreliable bits (HUBs), and uncertain bits to improve the performance of hard-decision (HD) decoders. The HRBs and HUBs are used to assist the HD decoders to prevent miscorrections and to decode those originally uncorrectable cases via bit flipping (BF), respectively. In this paper, an improved SABM algorithm (called iSABM) is proposed for staircase codes (SCCs). Similar to SABM, iSABM marks bits with the help of channel reliabilities, i.e., using the absolute values of the log-likelihood ratios. The improvements offered by iSABM include: (i) HUBs being classified using a reliability threshold, (ii) BF randomly selecting HUBs, and (iii) soft-aided decoding over multiple SCC blocks. The decoding complexity of iSABM is comparable to that of SABM. This is because, on the one hand, no sorting is required (lower complexity) thanks to the use of a threshold for HUBs, while on the other hand, multiple SCC blocks use soft information (higher complexity). Additional gains of up to 0.53 dB with respect to SABM and 0.91 dB with respect to standard SCC decoding at a bit error rate of $10^{-6}$ are reported. Furthermore, it is shown that using 1-bit reliability marking, i.e., only having HRBs and HUBs, only causes a gain penalty of up to 0.25 dB with a significantly reduced memory requirement.
Submitted 17 March, 2021;
originally announced March 2021.
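The three-level marking described in the abstract above can be illustrated with a minimal sketch. It assumes two fixed thresholds on the absolute log-likelihood ratio; the threshold values, the names `Mark` and `mark_bits`, and the exact comparison rules are hypothetical and are not taken from the paper. It shows only the marking step, not the decoder or the bit-flipping logic.

```python
from enum import Enum


class Mark(Enum):
    HRB = "highly reliable bit"
    UNCERTAIN = "uncertain bit"
    HUB = "highly unreliable bit"


def mark_bits(llrs, hub_threshold=1.0, hrb_threshold=5.0):
    """Label each bit by the absolute value of its log-likelihood ratio (LLR).

    Assumption: both thresholds are illustrative placeholders.
    Using thresholds means no sorting of reliabilities is needed.
    """
    marks = []
    for llr in llrs:
        reliability = abs(llr)  # channel reliability = |LLR|
        if reliability >= hrb_threshold:
            marks.append(Mark.HRB)
        elif reliability <= hub_threshold:
            marks.append(Mark.HUB)
        else:
            marks.append(Mark.UNCERTAIN)
    return marks


if __name__ == "__main__":
    print(mark_bits([-7.2, 0.3, 2.5]))
```

Dropping the `UNCERTAIN` level by setting both thresholds to the same value would give a two-level marking in the spirit of the 1-bit variant the abstract mentions.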
-
Deep Convolutional Sparse Coding Network for Pansharpening with Guidance of Side Information
Authors:
Shuang Xu,
Jiangshe Zhang,
Kai Sun,
Zixiang Zhao,
Lu Huang,
Junmin Liu,
Chunxia Zhang
Abstract:
Pansharpening is a fundamental problem in the remote sensing field. This paper proposes a side information partially guided convolutional sparse coding (SCSC) model for pansharpening. The key idea is to split the low-resolution multispectral image into a feature map related to the panchromatic image and a feature map unrelated to it, where the former is regularized by the side information from panchromatic images. Following the principle of algorithm unrolling, the proposed model is generalized as a deep neural network, called the SCSC pansharpening neural network (SCSC-PNN). Numerical experiments comparing it with 13 classic and state-of-the-art methods on three satellites show that SCSC-PNN is superior to the others. The codes are available at https://github.com/xsxjtu/SCSC-PNN.
Submitted 10 March, 2021;
originally announced March 2021.