-
Pretrained LLMs as Real-Time Controllers for Robot Operated Serial Production Line
Authors:
Muhammad Waseem,
Kshitij Bhatta,
Chen Li,
Qing Chang
Abstract:
The manufacturing industry is undergoing a transformative shift, driven by cutting-edge technologies like 5G, AI, and cloud computing. Despite these advancements, effective system control, which is crucial for optimizing production efficiency, remains a complex challenge due to the intricate, knowledge-dependent nature of manufacturing processes and the reliance on domain-specific expertise. Conve…
▽ More
The manufacturing industry is undergoing a transformative shift, driven by cutting-edge technologies like 5G, AI, and cloud computing. Despite these advancements, effective system control, which is crucial for optimizing production efficiency, remains a complex challenge due to the intricate, knowledge-dependent nature of manufacturing processes and the reliance on domain-specific expertise. Conventional control methods often demand heavy customization, considerable computational resources, and lack transparency in decision-making. In this work, we investigate the feasibility of using Large Language Models (LLMs), particularly GPT-4, as a straightforward, adaptable solution for controlling manufacturing systems, specifically, mobile robot scheduling. We introduce an LLM-based control framework to assign mobile robots to different machines in robot assisted serial production lines, evaluating its performance in terms of system throughput. Our proposed framework outperforms traditional scheduling approaches such as First-Come-First-Served (FCFS), Shortest Processing Time (SPT), and Longest Processing Time (LPT). While it achieves performance that is on par with state-of-the-art methods like Multi-Agent Reinforcement Learning (MARL), it offers a distinct advantage by delivering comparable throughput without the need for extensive retraining. These results suggest that the proposed LLM-based solution is well-suited for scenarios where technical expertise, computational resources, and financial investment are limited, while decision transparency and system scalability are critical concerns.
△ Less
Submitted 5 March, 2025;
originally announced March 2025.
-
Rapid Bone Scintigraphy Enhancement via Semantic Prior Distillation from Segment Anything Model
Authors:
Pengchen Liang,
Leijun Shi,
Huiping Yao,
Bin Pu,
Jianguo Chen,
Lei Zhao,
Haishan Huang,
Zhuangzhuang Chen,
Zhaozhao Xu,
Lite Xu,
Qing Chang,
Yiwei Li
Abstract:
Rapid bone scintigraphy is crucial for diagnosing skeletal disorders and detecting tumor metastases in children, as it shortens scan duration and reduces discomfort. However, accelerated acquisition often degrades image quality, impairing the visibility of fine anatomical details and potentially compromising diagnosis. To overcome this limitation, we introduce the first application of SAM-based se…
▽ More
Rapid bone scintigraphy is crucial for diagnosing skeletal disorders and detecting tumor metastases in children, as it shortens scan duration and reduces discomfort. However, accelerated acquisition often degrades image quality, impairing the visibility of fine anatomical details and potentially compromising diagnosis. To overcome this limitation, we introduce the first application of SAM-based semantic priors for medical image restoration, utilizing the Segment Anything Model (SAM) to enhance pediatric rapid bone scintigraphy. Our approach employs two cascaded networks, $f^{IR1}$ and $f^{IR2}$, supported by three specialized modules: a Semantic Prior Integration (SPI) module, a Semantic Knowledge Distillation (SKD) module, and a Semantic Consistency Module (SCM). The SPI and SKD modules inject domain-specific semantic cues from a fine-tuned SAM, while the SCM preserves coherent semantic feature representations across both cascaded stages. Moreover, we present RBS, a novel Rapid Bone Scintigraphy dataset comprising paired standard (20 cm/min) and rapid (40 cm/min) scans from 137 pediatric patients aged 0.5 - 16 years, making it the first dataset tailored for pediatric rapid bone scintigraphy restoration. Extensive experiments on both a public endoscopic dataset and our RBS dataset demonstrate that our method consistently surpasses existing techniques in PSNR, SSIM, FID, and LPIPS metrics.
△ Less
Submitted 4 June, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
Vision Foundation Models in Medical Image Analysis: Advances and Challenges
Authors:
Pengchen Liang,
Bin Pu,
Haishan Huang,
Yiwei Li,
Hualiang Wang,
Weibo Ma,
Qing Chang
Abstract:
The rapid development of Vision Foundation Models (VFMs), particularly Vision Transformers (ViT) and Segment Anything Model (SAM), has sparked significant advances in the field of medical image analysis. These models have demonstrated exceptional capabilities in capturing long-range dependencies and achieving high generalization in segmentation tasks. However, adapting these large models to medica…
▽ More
The rapid development of Vision Foundation Models (VFMs), particularly Vision Transformers (ViT) and Segment Anything Model (SAM), has sparked significant advances in the field of medical image analysis. These models have demonstrated exceptional capabilities in capturing long-range dependencies and achieving high generalization in segmentation tasks. However, adapting these large models to medical image analysis presents several challenges, including domain differences between medical and natural images, the need for efficient model adaptation strategies, and the limitations of small-scale medical datasets. This paper reviews the state-of-the-art research on the adaptation of VFMs to medical image segmentation, focusing on the challenges of domain adaptation, model compression, and federated learning. We discuss the latest developments in adapter-based improvements, knowledge distillation techniques, and multi-scale contextual feature modeling, and propose future directions to overcome these bottlenecks. Our analysis highlights the potential of VFMs, along with emerging methodologies such as federated learning and model compression, to revolutionize medical image analysis and enhance clinical applications. The goal of this work is to provide a comprehensive overview of current approaches and suggest key areas for future research that can drive the next wave of innovation in medical image segmentation.
△ Less
Submitted 20 February, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
Topology-Aware Wavelet Mamba for Airway Structure Segmentation in Postoperative Recurrent Nasopharyngeal Carcinoma CT Scans
Authors:
Haishan Huang,
Pengchen Liang,
Naier Lin,
Luxi Wang,
Bin Pu,
Jianguo Chen,
Qing Chang,
Xia Shen,
Guo Ran
Abstract:
Nasopharyngeal carcinoma (NPC) patients often undergo radiotherapy and chemotherapy, which can lead to postoperative complications such as limited mouth opening and joint stiffness, particularly in recurrent cases that require re-surgery. These complications can affect airway function, making accurate postoperative airway risk assessment essential for managing patient care. Accurate segmentation o…
▽ More
Nasopharyngeal carcinoma (NPC) patients often undergo radiotherapy and chemotherapy, which can lead to postoperative complications such as limited mouth opening and joint stiffness, particularly in recurrent cases that require re-surgery. These complications can affect airway function, making accurate postoperative airway risk assessment essential for managing patient care. Accurate segmentation of airway-related structures in postoperative CT scans is crucial for assessing these risks. This study introduces TopoWMamba (Topology-aware Wavelet Mamba), a novel segmentation model specifically designed to address the challenges of postoperative airway risk evaluation in recurrent NPC patients. TopoWMamba combines wavelet-based multi-scale feature extraction, state-space sequence modeling, and topology-aware modules to segment airway-related structures in CT scans robustly. By leveraging the Wavelet-based Mamba Block (WMB) for hierarchical frequency decomposition and the Snake Conv VSS (SCVSS) module to preserve anatomical continuity, TopoWMamba effectively captures both fine-grained boundaries and global structural context, crucial for accurate segmentation in complex postoperative scenarios. Through extensive testing on the NPCSegCT dataset, TopoWMamba achieves an average Dice score of 88.02%, outperforming existing models such as UNet, Attention UNet, and SwinUNet. Additionally, TopoWMamba is tested on the SegRap 2023 Challenge dataset, where it shows a significant improvement in trachea segmentation with a Dice score of 95.26%. The proposed model provides a strong foundation for automated segmentation, enabling more accurate postoperative airway risk evaluation.
△ Less
Submitted 20 February, 2025;
originally announced February 2025.
-
Toward Zero-Shot Learning for Visual Dehazing of Urological Surgical Robots
Authors:
Renkai Wu,
Xianjin Wang,
Pengchen Liang,
Zhenyu Zhang,
Qing Chang,
Hao Tang
Abstract:
Robot-assisted surgery has profoundly influenced current forms of minimally invasive surgery. However, in transurethral suburethral urological surgical robots, they need to work in a liquid environment. This causes vaporization of the liquid when shearing and heating is performed, resulting in bubble atomization that affects the visual perception of the robot. This can lead to the need for uninter…
▽ More
Robot-assisted surgery has profoundly influenced current forms of minimally invasive surgery. However, in transurethral suburethral urological surgical robots, they need to work in a liquid environment. This causes vaporization of the liquid when shearing and heating is performed, resulting in bubble atomization that affects the visual perception of the robot. This can lead to the need for uninterrupted pauses in the surgical procedure, which makes the surgery take longer. To address the atomization characteristics of liquids under urological surgical robotic vision, we propose an unsupervised zero-shot dehaze method (RSF-Dehaze) for urological surgical robotic vision. Specifically, the proposed Region Similarity Filling Module (RSFM) of RSF-Dehaze significantly improves the recovery of blurred region tissues. In addition, we organize and propose a dehaze dataset for robotic vision in urological surgery (USRobot-Dehaze dataset). In particular, this dataset contains the three most common urological surgical robot operation scenarios. To the best of our knowledge, we are the first to organize and propose a publicly available dehaze dataset for urological surgical robot vision. The proposed RSF-Dehaze proves the effectiveness of our method in three urological surgical robot operation scenarios with extensive comparative experiments with 20 most classical and advanced dehazing and image recovery algorithms. The proposed source code and dataset are available at https://github.com/wurenkai/RSF-Dehaze .
△ Less
Submitted 2 October, 2024;
originally announced October 2024.
-
RICAU-Net: Residual-block Inspired Coordinate Attention U-Net for Segmentation of Small and Sparse Calcium Lesions in Cardiac CT
Authors:
Doyoung Park,
Jinsoo Kim,
Qi Chang,
Shuang Leng,
Liang Zhong,
Lohendran Baskaran
Abstract:
The Agatston score, which is the sum of the calcification in the four main coronary arteries, has been widely used in the diagnosis of coronary artery disease (CAD). However, many studies have emphasized the importance of the vessel-specific Agatston score, as calcification in a specific vessel is significantly correlated with the occurrence of coronary heart disease (CHD). In this paper, we propo…
▽ More
The Agatston score, which is the sum of the calcification in the four main coronary arteries, has been widely used in the diagnosis of coronary artery disease (CAD). However, many studies have emphasized the importance of the vessel-specific Agatston score, as calcification in a specific vessel is significantly correlated with the occurrence of coronary heart disease (CHD). In this paper, we propose the Residual-block Inspired Coordinate Attention U-Net (RICAU-Net), which incorporates coordinate attention in two distinct manners and a customized combo loss function for lesion-specific coronary artery calcium (CAC) segmentation. This approach aims to tackle the high class-imbalance issue associated with small and sparse CAC lesions. Experimental results and the ablation study demonstrate that the proposed method outperforms the five other U-Net based methods used in medical applications, by achieving the highest per-lesion Dice scores across all four lesions.
△ Less
Submitted 19 January, 2025; v1 submitted 10 September, 2024;
originally announced September 2024.
-
MNeRV: A Multilayer Neural Representation for Videos
Authors:
Qingling Chang,
Haohui Yu,
Shuxuan Fu,
Zhiqiang Zeng,
Chuangquan Chen
Abstract:
As a novel video representation method, Neural Representations for Videos (NeRV) has shown great potential in the fields of video compression, video restoration, and video interpolation. In the process of representing videos using NeRV, each frame corresponds to an embedding, which is then reconstructed into a video frame sequence after passing through a small number of decoding layers (E-NeRV, HN…
▽ More
As a novel video representation method, Neural Representations for Videos (NeRV) has shown great potential in the fields of video compression, video restoration, and video interpolation. In the process of representing videos using NeRV, each frame corresponds to an embedding, which is then reconstructed into a video frame sequence after passing through a small number of decoding layers (E-NeRV, HNeRV, etc.). However, this small number of decoding layers can easily lead to the problem of redundant model parameters due to the large proportion of parameters in a single decoding layer, which greatly restricts the video regression ability of neural network models. In this paper, we propose a multilayer neural representation for videos (MNeRV) and design a new decoder M-Decoder and its matching encoder M-Encoder. MNeRV has more encoding and decoding layers, which effectively alleviates the problem of redundant model parameters caused by too few layers. In addition, we design MNeRV blocks to perform more uniform and effective parameter allocation between decoding layers. In the field of video regression reconstruction, we achieve better reconstruction quality (+4.06 PSNR) with fewer parameters. Finally, we showcase MNeRV performance in downstream tasks such as video restoration and video interpolation. The source code of MNeRV is available at https://github.com/Aaronbtb/MNeRV.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation
Authors:
Renkai Wu,
Yinghao Liu,
Pengchen Liang,
Qing Chang
Abstract:
Traditionally for improving the segmentation performance of models, most approaches prefer to use adding more complex modules. And this is not suitable for the medical field, especially for mobile medical devices, where computationally loaded models are not suitable for real clinical environments due to computational resource constraints. Recently, state-space models (SSMs), represented by Mamba,…
▽ More
Traditionally for improving the segmentation performance of models, most approaches prefer to use adding more complex modules. And this is not suitable for the medical field, especially for mobile medical devices, where computationally loaded models are not suitable for real clinical environments due to computational resource constraints. Recently, state-space models (SSMs), represented by Mamba, have become a strong competitor to traditional CNNs and Transformers. In this paper, we deeply explore the key elements of parameter influence in Mamba and propose an UltraLight Vision Mamba UNet (UltraLight VM-UNet) based on this. Specifically, we propose a method for processing features in parallel Vision Mamba, named PVM Layer, which achieves excellent performance with the lowest computational load while keeping the overall number of processing channels constant. We conducted comparisons and ablation experiments with several state-of-the-art lightweight models on three skin lesion public datasets and demonstrated that the UltraLight VM-UNet exhibits the same strong performance competitiveness with parameters of only 0.049M and GFLOPs of 0.060. In addition, this study deeply explores the key elements of parameter influence in Mamba, which will lay a theoretical foundation for Mamba to possibly become a new mainstream module for lightweighting in the future. The code is available from https://github.com/wurenkai/UltraLight-VM-UNet .
△ Less
Submitted 24 April, 2024; v1 submitted 29 March, 2024;
originally announced March 2024.
-
Dealing With Heterogeneous 3D MR Knee Images: A Federated Few-Shot Learning Method With Dual Knowledge Distillation
Authors:
Xiaoxiao He,
Chaowei Tan,
Bo Liu,
Liping Si,
Weiwu Yao,
Liang Zhao,
Di Liu,
Qilong Zhangli,
Qi Chang,
Kang Li,
Dimitris N. Metaxas
Abstract:
Federated Learning has gained popularity among medical institutions since it enables collaborative training between clients (e.g., hospitals) without aggregating data. However, due to the high cost associated with creating annotations, especially for large 3D image datasets, clinical institutions do not have enough supervised data for training locally. Thus, the performance of the collaborative mo…
▽ More
Federated Learning has gained popularity among medical institutions since it enables collaborative training between clients (e.g., hospitals) without aggregating data. However, due to the high cost associated with creating annotations, especially for large 3D image datasets, clinical institutions do not have enough supervised data for training locally. Thus, the performance of the collaborative model is subpar under limited supervision. On the other hand, large institutions have the resources to compile data repositories with high-resolution images and labels. Therefore, individual clients can utilize the knowledge acquired in the public data repositories to mitigate the shortage of private annotated images. In this paper, we propose a federated few-shot learning method with dual knowledge distillation. This method allows joint training with limited annotations across clients without jeopardizing privacy. The supervised learning of the proposed method extracts features from limited labeled data in each client, while the unsupervised data is used to distill both feature and response-based knowledge from a national data repository to further improve the accuracy of the collaborative model and reduce the communication cost. Extensive evaluations are conducted on 3D magnetic resonance knee images from a private clinical dataset. Our proposed method shows superior performance and less training time than other semi-supervised federated learning methods. Codes and additional visualization results are available at https://github.com/hexiaoxiao-cs/fedml-knee.
△ Less
Submitted 17 April, 2023; v1 submitted 25 March, 2023;
originally announced March 2023.
-
Autofluorescence Bronchoscopy Video Analysis for Lesion Frame Detection
Authors:
Qi Chang,
Rebecca Bascom,
Jennifer Toth,
Danish Ahmad,
William E. Higgins
Abstract:
Because of the significance of bronchial lesions as indicators of early lung cancer and squamous cell carcinoma, a critical need exists for early detection of bronchial lesions. Autofluorescence bronchoscopy (AFB) is a primary modality used for bronchial lesion detection, as it shows high sensitivity to suspicious lesions. The physician, however, must interactively browse a long video stream to lo…
▽ More
Because of the significance of bronchial lesions as indicators of early lung cancer and squamous cell carcinoma, a critical need exists for early detection of bronchial lesions. Autofluorescence bronchoscopy (AFB) is a primary modality used for bronchial lesion detection, as it shows high sensitivity to suspicious lesions. The physician, however, must interactively browse a long video stream to locate lesions, making the search exceedingly tedious and error prone. Unfortunately, limited research has explored the use of automated AFB video analysis for efficient lesion detection. We propose a robust automatic AFB analysis approach that distinguishes informative and uninformative AFB video frames in a video. In addition, for the informative frames, we determine the frames containing potential lesions and delineate candidate lesion regions. Our approach draws upon a combination of computer-based image analysis, machine learning, and deep learning. Thus, the analysis of an AFB video stream becomes more tractable. Tests with patient AFB video indicate that $\ge$97\% of frames were correctly labeled as informative or uninformative. In addition, $\ge$97\% of lesion frames were correctly identified, with false positive and false negative rates $\le$3\%.
△ Less
Submitted 21 March, 2023;
originally announced March 2023.
-
Bronchoscopic video synchronization for interactive multimodal inspection of bronchial lesions
Authors:
Qi Chang,
Patrick D. Byrnes,
Danish Ahmad,
Jennifer Toth,
Rebecca Bascom,
William E. Higgins
Abstract:
With lung cancer being the most fatal cancer worldwide, it is important to detect the disease early. A potentially effective way of detecting early cancer lesions developing along the airway walls (epithelium) is bronchoscopy. To this end, developments in bronchoscopy offer three promising noninvasive modalities for imaging bronchial lesions: white-light bronchoscopy (WLB), autofluorescence bronch…
▽ More
With lung cancer being the most fatal cancer worldwide, it is important to detect the disease early. A potentially effective way of detecting early cancer lesions developing along the airway walls (epithelium) is bronchoscopy. To this end, developments in bronchoscopy offer three promising noninvasive modalities for imaging bronchial lesions: white-light bronchoscopy (WLB), autofluorescence bronchoscopy (AFB), and narrow-band imaging (NBI). While these modalities give complementary views of the airway epithelium, the physician must manually inspect each video stream produced by a given modality to locate the suspect cancer lesions. Unfortunately, no effort has been made to rectify this situation by providing efficient quantitative and visual tools for analyzing these video streams. This makes the lesion search process extremely time-consuming and error-prone, thereby making it impractical to utilize these rich data sources effectively. We propose a framework for synchronizing multiple bronchoscopic videos to enable an interactive multimodal analysis of bronchial lesions. Our methods first register the video streams to a reference 3D chest computed-tomography (CT) scan to produce multimodal linkages to the airway tree. Our methods then temporally correlate the videos to one another to enable synchronous visualization of the resulting multimodal data set. Pictorial and quantitative results illustrate the potential of the methods.
△ Less
Submitted 20 March, 2023;
originally announced March 2023.
-
ESFPNet: efficient deep learning architecture for real-time lesion segmentation in autofluorescence bronchoscopic video
Authors:
Qi Chang,
Danish Ahmad,
Jennifer Toth,
Rebecca Bascom,
William E. Higgins
Abstract:
Lung cancer tends to be detected at an advanced stage, resulting in a high patient mortality rate. Thus, much recent research has focused on early disease detection Bronchoscopy is the procedure of choice for an effective noninvasive way of detecting early manifestations (bronchial lesions) of lung cancer. In particular, autofluorescence bronchoscopy (AFB) discriminates the autofluorescence proper…
▽ More
Lung cancer tends to be detected at an advanced stage, resulting in a high patient mortality rate. Thus, much recent research has focused on early disease detection Bronchoscopy is the procedure of choice for an effective noninvasive way of detecting early manifestations (bronchial lesions) of lung cancer. In particular, autofluorescence bronchoscopy (AFB) discriminates the autofluorescence properties of normal (green) and diseased tissue (reddish brown) with different colors. Because recent studies show AFB's high sensitivity in searching lesions, it has become a potentially pivotal method in bronchoscopic airway exams. Unfortunately, manual inspection of AFB video is extremely tedious and error prone, while limited effort has been expended toward potentially more robust automatic AFB lesion analysis. We propose a real-time (processing throughput of 27 frames/sec) deep-learning architecture dubbed ESFPNet for accurate segmentation and robust detection of bronchial lesions in AFB video streams. The architecture features an encoder structure that exploits pretrained Mix Transformer (MiT) encoders and an efficient stage-wise feature pyramid (ESFP) decoder structure. Segmentation results from the AFB airway-exam videos of 20 lung cancer patients indicate that our approach gives a mean Dice index = 0.756 and an average Intersection of Union = 0.624, results that are superior to those generated by other recent architectures. Thus, ESFPNet gives the physician a potential tool for confident real-time lesion segmentation and detection during a live bronchoscopic airway exam. Moreover, our model shows promising potential applicability to other domains, as evidenced by its state-of-the-art (SOTA) performance on the CVC-ClinicDB, ETIS-LaribPolypDB datasets, and superior performance on the Kvasir, CVC-ColonDB datasets.
△ Less
Submitted 8 December, 2022; v1 submitted 15 July, 2022;
originally announced July 2022.
-
Reinforcement Learning Based User-Guided Motion Planning for Human-Robot Collaboration
Authors:
Tian Yu,
Qing Chang
Abstract:
Robots are good at performing repetitive tasks in modern manufacturing industries. However, robot motions are mostly planned and preprogrammed with a notable lack of adaptivity to task changes. Even for slightly changed tasks, the whole system must be reprogrammed by robotics experts. Therefore, it is highly desirable to have a flexible motion planning method, with which robots can adapt to specif…
▽ More
Robots are good at performing repetitive tasks in modern manufacturing industries. However, robot motions are mostly planned and preprogrammed with a notable lack of adaptivity to task changes. Even for slightly changed tasks, the whole system must be reprogrammed by robotics experts. Therefore, it is highly desirable to have a flexible motion planning method, with which robots can adapt to specific task changes in unstructured environments, such as production systems or warehouses, with little or no intervention from non-expert personnel. In this paper, we propose a user-guided motion planning algorithm in combination with the reinforcement learning (RL) method to enable robots automatically generate their motion plans for new tasks by learning from a few kinesthetic human demonstrations. To achieve adaptive motion plans for a specific application environment, e.g., desk assembly or warehouse loading/unloading, a library is built by abstracting features of common human demonstrated tasks. The definition of semantical similarity between features in the library and features of a new task is proposed and further used to construct the reward function in RL. The RL policy can automatically generate motion plans for a new task if it determines that new task constraints can be satisfied with the current library and request additional human demonstrations. Multiple experiments conducted on common tasks and scenarios demonstrate that the proposed user-guided RL-assisted motion planning method is effective.
△ Less
Submitted 1 July, 2022;
originally announced July 2022.
-
Paint shop vehicle sequencing based on quantum computing considering color changeover and painting quality
Authors:
Jing Huang,
Hua-Tzu Fan,
Guoxian Xiao,
Qing Chang
Abstract:
As customer demands become increasingly diverse, the colors and styles of vehicles offered by automotive companies have also grown substantially. It poses great challenges to design and management of automotive manufacturing system, among which is the proper sequencing of vehicles in everyday operation of the paint shop. With typically hundreds of vehicles in one shift, the paint shop sequencing p…
▽ More
As customer demands become increasingly diverse, the colors and styles of vehicles offered by automotive companies have also grown substantially. It poses great challenges to design and management of automotive manufacturing system, among which is the proper sequencing of vehicles in everyday operation of the paint shop. With typically hundreds of vehicles in one shift, the paint shop sequencing problem is intractable in classical computing. In this paper, we propose to solve a general paint shop sequencing problem using state-of-the-art quantum computing algorithms. Most existing works are solely focused on reducing color changeover costs, i.e., costs incurred by different colors between consecutive vehicles. This work reveals that different sequencing of vehicles also significantly affects the quality performance of the painting process. We use a machine learning model pretrained on historical data to predict the probability of painting defect. The problem is formulated as a combinational optimization problem with two cost components, i.e., color changeover cost and repair cost. The problem is further converted to a quantum optimization problem and solved with Quantum Approximation Optimization Algorithm (QAOA). As a matter of fact, current quantum computers are still limited in accuracy and scalability. However, with a simplified case study, we demonstrate how the classic sequencing problem in paint shop can be formulated and solved using quantum computing and demonstrate the potential of quantum computing in solving real problems in manufacturing systems.
△ Less
Submitted 22 June, 2022;
originally announced June 2022.
-
Atrial Fibrillation Detection Using Weight-Pruned, Log-Quantised Convolutional Neural Networks
Authors:
Xiu Qi Chang,
Ann Feng Chew,
Benjamin Chen Ming Choong,
Shuhui Wang,
Rui Han,
Wang He,
Li Xiaolin,
Rajesh C. Panicker,
Deepu John
Abstract:
Deep neural networks (DNN) are a promising tool in medical applications. However, the implementation of complex DNNs on battery-powered devices is challenging due to high energy costs for communication. In this work, a convolutional neural network model is developed for detecting atrial fibrillation from electrocardiogram (ECG) signals. The model demonstrates high performance despite being trained…
▽ More
Deep neural networks (DNN) are a promising tool in medical applications. However, the implementation of complex DNNs on battery-powered devices is challenging due to high energy costs for communication. In this work, a convolutional neural network model is developed for detecting atrial fibrillation from electrocardiogram (ECG) signals. The model demonstrates high performance despite being trained on limited, variable-length input data. Weight pruning and logarithmic quantisation are combined to introduce sparsity and reduce model size, which can be exploited for reduced data movement and lower computational complexity. The final model achieved a 91.1% model compression ratio while maintaining high model accuracy of 91.7% and less than 1% loss.
△ Less
Submitted 14 June, 2022;
originally announced June 2022.
-
DeepRecon: Joint 2D Cardiac Segmentation and 3D Volume Reconstruction via A Structure-Specific Generative Method
Authors:
Qi Chang,
Zhennan Yan,
Mu Zhou,
Di Liu,
Khalid Sawalha,
Meng Ye,
Qilong Zhangli,
Mikael Kanski,
Subhi Al Aref,
Leon Axel,
Dimitris Metaxas
Abstract:
Joint 2D cardiac segmentation and 3D volume reconstruction are fundamental to building statistical cardiac anatomy models and understanding functional mechanisms from motion patterns. However, due to the low through-plane resolution of cine MR and high inter-subject variance, accurately segmenting cardiac images and reconstructing the 3D volume are challenging. In this study, we propose an end-to-…
▽ More
Joint 2D cardiac segmentation and 3D volume reconstruction are fundamental to building statistical cardiac anatomy models and understanding functional mechanisms from motion patterns. However, due to the low through-plane resolution of cine MR and high inter-subject variance, accurately segmenting cardiac images and reconstructing the 3D volume are challenging. In this study, we propose an end-to-end latent-space-based framework, DeepRecon, that generates multiple clinically essential outcomes, including accurate image segmentation, synthetic high-resolution 3D image, and 3D reconstructed volume. Our method identifies the optimal latent representation of the cine image that contains accurate semantic information for cardiac structures. In particular, our model jointly generates synthetic images with accurate semantic information and segmentation of the cardiac structures using the optimal latent representation. We further explore downstream applications of 3D shape reconstruction and 4D motion pattern adaptation by the different latent-space manipulation strategies.The simultaneously generated high-resolution images present a high interpretable value to assess the cardiac shape and motion.Experimental results demonstrate the effectiveness of our approach on multiple fronts including 2D segmentation, 3D reconstruction, downstream 4D motion pattern adaption performance.
△ Less
Submitted 14 June, 2022;
originally announced June 2022.
-
TransFusion: Multi-view Divergent Fusion for Medical Image Segmentation with Transformers
Authors:
Di Liu,
Yunhe Gao,
Qilong Zhangli,
Ligong Han,
Xiaoxiao He,
Zhaoyang Xia,
Song Wen,
Qi Chang,
Zhennan Yan,
Mu Zhou,
Dimitris Metaxas
Abstract:
Combining information from multi-view images is crucial to improve the performance and robustness of automated methods for disease diagnosis. However, due to the non-alignment characteristics of multi-view images, building correlation and data fusion across views largely remain an open problem. In this study, we present TransFusion, a Transformer-based architecture to merge divergent multi-view im…
▽ More
Combining information from multi-view images is crucial to improve the performance and robustness of automated methods for disease diagnosis. However, due to the non-alignment characteristics of multi-view images, building correlation and data fusion across views largely remain an open problem. In this study, we present TransFusion, a Transformer-based architecture to merge divergent multi-view imaging information using convolutional layers and powerful attention mechanisms. In particular, the Divergent Fusion Attention (DiFA) module is proposed for rich cross-view context modeling and semantic dependency mining, addressing the critical issue of capturing long-range correlations between unaligned data from different image views. We further propose the Multi-Scale Attention (MSA) to collect global correspondence of multi-scale feature representations. We evaluate TransFusion on the Multi-Disease, Multi-View \& Multi-Center Right Ventricular Segmentation in Cardiac MRI (M\&Ms-2) challenge cohort. TransFusion demonstrates leading performance against the state-of-the-art methods and opens up new perspectives for multi-view imaging integration towards robust medical image segmentation.
△ Less
Submitted 5 September, 2022; v1 submitted 21 March, 2022;
originally announced March 2022.
-
Modality Bank: Learn multi-modality images across data centers without sharing medical data
Authors:
Qi Chang,
Hui Qu,
Zhennan Yan,
Yunhe Gao,
Lohendran Baskaran,
Dimitris Metaxas
Abstract:
Multi-modality images have been widely used and provide comprehensive information for medical image analysis. However, acquiring all modalities among all institutes is costly and often impossible in clinical settings. To leverage more comprehensive multi-modality information, we propose a privacy secured decentralized multi-modality adaptive learning architecture named ModalityBank. Our method cou…
▽ More
Multi-modality images have been widely used and provide comprehensive information for medical image analysis. However, acquiring all modalities among all institutes is costly and often impossible in clinical settings. To leverage more comprehensive multi-modality information, we propose a privacy secured decentralized multi-modality adaptive learning architecture named ModalityBank. Our method could learn a set of effective domain-specific modulation parameters plugged into a common domain-agnostic network. We demonstrate by switching different sets of configurations, the generator could output high-quality images for a specific modality. Our method could also complete the missing modalities across all data centers, thus could be used for modality completion purposes. The downstream task trained from the synthesized multi-modality samples could achieve higher performance than learning from one real data center and achieve close-to-real performance compare with all real images.
△ Less
Submitted 21 January, 2022;
originally announced January 2022.
-
Learn distributed GAN with Temporary Discriminators
Authors:
Hui Qu,
Yikai Zhang,
Qi Chang,
Zhennan Yan,
Chao Chen,
Dimitris Metaxas
Abstract:
In this work, we propose a method for training distributed GAN with sequential temporary discriminators. Our proposed method tackles the challenge of training GAN in the federated learning manner: How to update the generator with a flow of temporary discriminators? We apply our proposed method to learn a self-adaptive generator with a series of local discriminators from multiple data centers. We s…
▽ More
In this work, we propose a method for training distributed GAN with sequential temporary discriminators. Our proposed method tackles the challenge of training GAN in the federated learning manner: How to update the generator with a flow of temporary discriminators? We apply our proposed method to learn a self-adaptive generator with a series of local discriminators from multiple data centers. We show our design of loss function indeed learns the correct distribution with provable guarantees. The empirical experiments show that our approach is capable of generating synthetic data which is practical for real-world applications such as training a segmentation model.
△ Less
Submitted 17 July, 2020;
originally announced July 2020.
-
Mastering the working sequence in human-robot collaborative assembly based on reinforcement learning
Authors:
Tian Yu,
Jing Huang,
Qing Chang
Abstract:
A long-standing goal of the Human-Robot Collaboration (HRC) in manufacturing systems is to increase the collaborative working efficiency. In line with the trend of Industry 4.0 to build up the smart manufacturing system, the Co-robot in the HRC system deserves better designing to be more self-organized and to find the superhuman proficiency by self-learning. Inspired by the impressive machine lear…
▽ More
A long-standing goal of the Human-Robot Collaboration (HRC) in manufacturing systems is to increase the collaborative working efficiency. In line with the trend of Industry 4.0 to build up the smart manufacturing system, the Co-robot in the HRC system deserves better designing to be more self-organized and to find the superhuman proficiency by self-learning. Inspired by the impressive machine learning algorithms developed by Google Deep Mind like Alphago Zero, in this paper, the human-robot collaborative assembly working process is formatted into a chessboard and the selection of moves in the chessboard is used to analogize the decision making by both human and robot in the HRC assembly working process. To obtain the optimal policy of working sequence to maximize the working efficiency, the robot is trained with a self-play algorithm based on reinforcement learning, without guidance or domain knowledge beyond game rules. A neural network is also trained to predict the distribution of the priority of move selections and whether a working sequence is the one resulting in the maximum of the HRC efficiency. An adjustable desk assembly is used to demonstrate the proposed HRC assembly algorithm and its efficiency.
△ Less
Submitted 8 July, 2020;
originally announced July 2020.
-
Synthetic Learning: Learn From Distributed Asynchronized Discriminator GAN Without Sharing Medical Image Data
Authors:
Qi Chang,
Hui Qu,
Yikai Zhang,
Mert Sabuncu,
Chao Chen,
Tong Zhang,
Dimitris Metaxas
Abstract:
In this paper, we propose a data privacy-preserving and communication efficient distributed GAN learning framework named Distributed Asynchronized Discriminator GAN (AsynDGAN). Our proposed framework aims to train a central generator learns from distributed discriminator, and use the generated synthetic image solely to train the segmentation model.We validate the proposed framework on the applicat…
▽ More
In this paper, we propose a data privacy-preserving and communication efficient distributed GAN learning framework named Distributed Asynchronized Discriminator GAN (AsynDGAN). Our proposed framework aims to train a central generator learns from distributed discriminator, and use the generated synthetic image solely to train the segmentation model.We validate the proposed framework on the application of health entities learning problem which is known to be privacy sensitive. Our experiments show that our approach: 1) could learn the real image's distribution from multiple datasets without sharing the patient's raw data. 2) is more efficient and requires lower bandwidth than other distributed deep learning methods. 3) achieves higher performance compared to the model trained by one real dataset, and almost the same performance compared to the model trained by all real datasets. 4) has provable guarantees that the generator could learn the distributed distribution in an all important fashion thus is unbiased.
△ Less
Submitted 14 June, 2020; v1 submitted 29 May, 2020;
originally announced June 2020.
-
Autonomous Voltage Control for Grid Operation Using Deep Reinforcement Learning
Authors:
Ruisheng Diao,
Zhiwei Wang,
Di Shi,
Qianyun Chang,
Jiajun Duan,
Xiaohu Zhang
Abstract:
Modern power grids are experiencing grand challenges caused by the stochastic and dynamic nature of growing renewable energy and demand response. Traditional theoretical assumptions and operational rules may be violated, which are difficult to be adapted by existing control systems due to the lack of computational power and accurate grid models for use in real time, leading to growing concerns in…
▽ More
Modern power grids are experiencing grand challenges caused by the stochastic and dynamic nature of growing renewable energy and demand response. Traditional theoretical assumptions and operational rules may be violated, which are difficult to be adapted by existing control systems due to the lack of computational power and accurate grid models for use in real time, leading to growing concerns in the secure and economic operation of the power grid. Existing operational control actions are typically determined offline, which are less optimized. This paper presents a novel paradigm, Grid Mind, for autonomous grid operational controls using deep reinforcement learning. The proposed AI agent for voltage control can learn its control policy through interactions with massive offline simulations, and adapts its behavior to new changes including not only load/generation variations but also topological changes. A properly trained agent is tested on the IEEE 14-bus system with tens of thousands of scenarios, and promising performance is demonstrated in applying autonomous voltage controls for secure grid operation.
△ Less
Submitted 23 April, 2019;
originally announced April 2019.
-
Non-Local Graph-Based Prediction For Reversible Data Hiding In Images
Authors:
Qi Chang,
Gene Cheung,
Yao Zhao,
Xiaolong Li,
Rongrong Ni
Abstract:
Reversible data hiding (RDH) is desirable in applications where both the hidden message and the cover medium need to be recovered without loss. Among many RDH approaches is prediction-error expansion (PEE), containing two steps: i) prediction of a target pixel value, and ii) embedding according to the value of prediction-error. In general, higher prediction performance leads to larger embedding ca…
▽ More
Reversible data hiding (RDH) is desirable in applications where both the hidden message and the cover medium need to be recovered without loss. Among many RDH approaches is prediction-error expansion (PEE), containing two steps: i) prediction of a target pixel value, and ii) embedding according to the value of prediction-error. In general, higher prediction performance leads to larger embedding capacity and/or lower signal distortion. Leveraging on recent advances in graph signal processing (GSP), we pose pixel prediction as a graph-signal restoration problem, where the appropriate edge weights of the underlying graph are computed using a similar patch searched in a semi-local neighborhood. Specifically, for each candidate patch, we first examine eigenvalues of its structure tensor to estimate its local smoothness. If sufficiently smooth, we pose a maximum a posteriori (MAP) problem using either a quadratic Laplacian regularizer or a graph total variation (GTV) term as signal prior. While the MAP problem using the first prior has a closed-form solution, we design an efficient algorithm for the second prior using alternating direction method of multipliers (ADMM) with nested proximal gradient descent. Experimental results show that with better quality GSP-based prediction, at low capacity the visual quality of the embedded image exceeds state-of-the-art methods noticeably.
△ Less
Submitted 19 February, 2018;
originally announced February 2018.