-
Prompt Mechanisms in Medical Imaging: A Comprehensive Survey
Authors:
Hao Yang,
Xinlong Liang,
Zhang Li,
Yue Sun,
Zheyu Hu,
Xinghe Xie,
Behdad Dashtbozorg,
Jincheng Huang,
Shiwei Zhu,
Luyi Han,
Jiong Zhang,
Shanshan Wang,
Ritse Mann,
Qifeng Yu,
Tao Tan
Abstract:
Deep learning offers transformative potential in medical imaging, yet its clinical adoption is frequently hampered by challenges such as data scarcity, distribution shifts, and the need for robust task generalization. Prompt-based methodologies have emerged as a pivotal strategy to guide deep learning models, providing flexible, domain-specific adaptations that significantly enhance model performance and adaptability without extensive retraining. This systematic review critically examines the burgeoning landscape of prompt engineering in medical imaging. We dissect diverse prompt modalities, including textual instructions, visual prompts, and learnable embeddings, and analyze their integration for core tasks such as image generation, segmentation, and classification. Our synthesis shows how these mechanisms improve task-specific outcomes by enhancing accuracy, robustness, and data efficiency, reducing reliance on manual feature engineering, and fostering greater model interpretability by making the model's guidance explicit. Despite substantial advancements, we identify persistent challenges, particularly in prompt design optimization, data heterogeneity, and ensuring scalability for clinical deployment. Finally, this review outlines promising future trajectories, including advanced multimodal prompting and robust clinical integration, underscoring the critical role of prompt-driven AI in accelerating advances in diagnostics and personalized treatment planning.
Submitted 27 June, 2025;
originally announced July 2025.
-
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
Authors:
Zijian Lin,
Yang Zhang,
Yougen Yuan,
Yuming Yan,
Jinjiang Liu,
Zhiyong Wu,
Pengfei Hu,
Qun Yu
Abstract:
Modern autoregressive speech synthesis models leveraging language models have demonstrated remarkable performance. However, the sequential nature of next token prediction in these models leads to significant latency, hindering their deployment in scenarios where inference speed is critical. In this work, we propose Speech Speculative Decoding (SSD), a novel framework for autoregressive speech synthesis acceleration. Specifically, our method employs a lightweight draft model to generate candidate token sequences, which are subsequently verified in parallel by the target model using the proposed SSD framework. Experimental results demonstrate that SSD achieves a significant speedup of 1.4x compared with conventional autoregressive decoding, while maintaining high fidelity and naturalness. Subjective evaluations further validate the effectiveness of SSD in preserving the perceptual quality of the target model while accelerating inference.
Submitted 2 June, 2025; v1 submitted 21 May, 2025;
originally announced May 2025.
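The draft-then-verify loop described in the SSD abstract can be illustrated with a minimal greedy sketch. This is generic greedy speculative decoding, not the SSD framework itself: the toy `target` and `draft` functions, the name `speculative_decode`, and the draft length `k` are hypothetical, and a real system scores all drafted positions in one parallel forward pass (and may use probabilistic acceptance when sampling).

```python
from typing import Callable, List

# A "model" here is any function that maps a token prefix to its greedy next token.
NextToken = Callable[[List[int]], int]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy draft-then-verify decoding; the output equals greedy decoding with `target`."""
    seq = list(prompt)
    goal = len(prompt) + max_new
    while len(seq) < goal:
        # 1) The lightweight draft model proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(seq))):
            proposal.append(draft(seq + proposal))
        # 2) The target model checks each proposed position. A real system scores
        #    all positions in one parallel pass; a loop keeps this sketch readable.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if expected != tok:
                accepted.append(expected)  # first mismatch: keep the target's token, drop the rest
                break
            accepted.append(tok)
        else:
            # Every drafted token was accepted, so the target contributes one extra token.
            if len(seq) + len(accepted) < goal:
                accepted.append(target(seq + accepted))
        seq.extend(accepted)
    return seq


# Toy models (hypothetical): the draft is deliberately wrong on some steps.
target = lambda s: (3 * s[-1] + 1) % 10
draft = lambda s: target(s) if len(s) % 3 else 0
print(speculative_decode(target, draft, [1], max_new=8))  # [1, 4, 3, 0, 1, 4, 3, 0, 1]
```

Each round emits at least one token, and several when the draft agrees with the target, which is where the speedup comes from; the output itself never depends on draft quality.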
-
Can We Ignore Labels In Out of Distribution Detection?
Authors:
Hong Yang,
Qi Yu,
Travis Desel
Abstract:
Out-of-distribution (OOD) detection methods have recently become more prominent, serving as a core element in safety-critical autonomous systems. One major purpose of OOD detection is to reject invalid inputs that could lead to unpredictable errors and compromise safety. Due to the cost of labeled data, recent works have investigated the feasibility of self-supervised learning (SSL) OOD detection, unlabeled OOD detection, and zero-shot OOD detection. In this work, we identify a set of conditions for a theoretical guarantee of failure in unlabeled OOD detection algorithms from an information-theoretic perspective. These conditions are present in all OOD tasks dealing with real-world data: I) we provide theoretical proof of unlabeled OOD detection failure when there exists zero mutual information between the learning objective and the in-distribution labels, a.k.a. 'label blindness'; II) we define a new OOD task, Adjacent OOD detection, that tests for label blindness and accounts for a previously ignored safety gap in all OOD detection benchmarks; and III) we perform experiments demonstrating that existing unlabeled OOD methods fail under conditions suggested by our label blindness theory, and analyze the implications for future research in unlabeled OOD methods.
Submitted 20 April, 2025;
originally announced April 2025.
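The "label blindness" condition in the abstract above hinges on zero mutual information between the learning objective and the in-distribution labels. A minimal plug-in estimator for paired discrete samples makes the quantity concrete; it assumes both the (hypothetical) pretext targets and the labels are categorical, and it illustrates the condition only, not the paper's proof or experiments.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed (x, y) of p(x,y) * log(p(x,y) / (p(x) p(y))), with empirical counts.
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


labels = [0, 0, 1, 1]    # in-distribution class labels
pretext = [0, 1, 0, 1]   # hypothetical self-supervised targets, independent of the labels
print(mutual_information(pretext, labels))  # 0.0: the objective carries no label information
print(mutual_information(labels, labels))   # log 2 ≈ 0.693: fully label-informative
```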
-
F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization
Authors:
Xiaohui Sun,
Ruitong Xiao,
Jianye Mo,
Bowen Wu,
Qun Yu,
Baoxun Wang
Abstract:
We present F5R-TTS, a novel text-to-speech (TTS) system that integrates Group Relative Policy Optimization (GRPO) into a flow-matching based architecture. By reformulating the deterministic outputs of flow-matching TTS into probabilistic Gaussian distributions, our approach enables seamless integration of reinforcement learning algorithms. During pretraining, we train a probabilistically reformulated flow-matching model, derived from F5-TTS, on an open-source dataset. In the subsequent reinforcement learning (RL) phase, we employ a GRPO-driven enhancement stage that leverages dual reward metrics: word error rate (WER) computed via automatic speech recognition and speaker similarity (SIM) assessed by verification models. Experimental results on zero-shot voice cloning demonstrate that F5R-TTS achieves significant improvements in both speech intelligibility (a 29.5% relative reduction in WER) and speaker similarity (a 4.6% relative increase in SIM score) compared to conventional flow-matching based TTS systems. Audio samples are available at https://frontierlabs.github.io/F5R.
Submitted 22 April, 2025; v1 submitted 3 April, 2025;
originally announced April 2025.
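In GRPO as commonly described, several outputs are sampled for the same input and each output's reward is standardized against its group, so no learned value function is needed. The sketch below shows only that advantage step. It assumes the population standard deviation and a hypothetical linear combination of WER and SIM (`combined_reward`); the F5R-TTS abstract does not say how the two rewards are weighted or normalized.

```python
import statistics
from typing import List


def combined_reward(wer: float, sim: float, w_wer: float = 1.0, w_sim: float = 1.0) -> float:
    """Hypothetical reward: lower WER and higher speaker similarity both raise it."""
    return -w_wer * wer + w_sim * sim


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize each sample's reward against the group sampled for the same input."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption, not from the abstract
    return [(r - mean) / (std + eps) for r in rewards]


# Four hypothetical syntheses of the same text: (WER, SIM) per sample.
group = [(0.10, 0.60), (0.05, 0.65), (0.20, 0.55), (0.05, 0.70)]
rewards = [combined_reward(w, s) for w, s in group]
advantages = group_relative_advantages(rewards)
```

Samples better than their group average get positive advantages and are reinforced; a group with identical rewards yields all-zero advantages and contributes no gradient.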
-
A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT
Authors:
Dazhou Guo,
Zhanghexuan Ji,
Yanzhou Su,
Dandan Zheng,
Heng Guo,
Puyang Wang,
Ke Yan,
Yirui Wang,
Qinji Yu,
Zi Li,
Minfeng Xu,
Jianfeng Zhang,
Haoshen Li,
Jia Ge,
Tsung-Ying Ho,
Bing-Shen Huang,
Tashan Ai,
Kuaile Zhao,
Na Shen,
Qifeng Wang,
Yun Bian,
Tingyu Wu,
Peng Du,
Hua Zhang,
Feng-Ming Kong
, et al. (9 additional authors not shown)
Abstract:
Precision medicine in the quantitative management of chronic diseases and oncology would be greatly improved if the Computed Tomography (CT) scan of any patient could be segmented, parsed and analyzed in a precise and detailed way. However, no fully annotated CT dataset with all anatomies delineated exists for training, because of the exceptionally high manual cost, the need for specialized clinical expertise, and the time required to finish the task. To this end, we propose a novel continual learning-driven CT model that can segment complete anatomies using dozens of previously partially labeled datasets, dynamically expanding its capacity to segment new anatomies without compromising previously learned organ knowledge. Existing multi-dataset approaches cannot dynamically segment new anatomies without catastrophic forgetting, and they encounter optimization difficulty or infeasibility when segmenting hundreds of anatomies across all body regions. Our single unified CT segmentation model, CL-Net, can accurately segment a clinically comprehensive set of 235 fine-grained whole-body anatomies. Composed of a universal encoder and multiple optimized and pruned decoders, CL-Net is developed using 13,952 CT scans from 20 public and 16 private high-quality partially labeled CT datasets spanning various vendors, contrast phases, and pathologies. Extensive evaluation demonstrates that CL-Net consistently outperforms the upper limit of an ensemble of 36 specialist nnUNets trained per dataset, at only 5% of the model size, and surpasses the segmentation accuracy of recent leading Segment Anything-style medical image foundation models by large margins. Our continual learning-driven CL-Net model could lay a solid foundation for many downstream oncology and chronic-disease tasks using CT, the most widely adopted imaging modality.
Submitted 16 March, 2025;
originally announced March 2025.
-
High-precision visual navigation device calibration method based on collimator
Authors:
Shunkun Liang,
Dongcai Tan,
Banglei Guan,
Zhang Li,
Guangcheng Dai,
Nianpeng Pan,
Liang Shen,
Yang Shang,
Qifeng Yu
Abstract:
Visual navigation devices require precise calibration, covering both camera calibration and attitude calibration, to achieve high-precision localization and navigation. To address the limitations of time-consuming camera calibration and complex attitude adjustment processes, this study presents a collimator-based calibration method and system. Based on the optical characteristics of the collimator, a single-image camera calibration algorithm is introduced. In addition, integrated with the precision adjustment mechanism of the calibration frame, a rotation transfer model between coordinate systems enables efficient attitude calibration. Experimental results demonstrate that the proposed method achieves accuracy and stability comparable to traditional multi-image calibration techniques. Specifically, the re-projection errors are less than 0.1463 pixels, and average attitude angle errors are less than 0.0586 degrees with a standard deviation less than 0.0257 degrees, demonstrating high precision and robustness.
Submitted 25 February, 2025;
originally announced February 2025.
-
Near-Field Wideband Beamforming for RIS Based on Fresnel Zone
Authors:
Qiumo Yu,
Linglong Dai
Abstract:
Reconfigurable intelligent surface (RIS) has emerged as a promising solution to overcome the challenges of high path loss and easy signal blockage in millimeter-wave (mmWave) and terahertz (THz) communication systems. As the RIS aperture and system bandwidth increase, the near-field beam split effect emerges, causing beams at different frequencies to focus on distinct physical locations and leading to a significant loss of beamforming gain. To address this problem, we leverage the property that beam split disappears for RIS elements lying along a single Fresnel zone, and propose a beamforming design in two dimensions: along and across the Fresnel zones. The phase shifts of RIS elements along the same Fresnel zone are designed to be aligned, so that the signals reflected by these elements add up in phase at the receiver regardless of frequency. The equivalent channel then simplifies to the Fourier transform of the reflective intensity across Fresnel zones, modulated by the designed phase. Based on this relationship, we prove that a uniformly distributed in-band gain, with phases aligned along each Fresnel zone, attains the upper bound on the achievable rate. Finally, we design the RIS phase shifts to approach this upper bound using the stationary phase method and the Gerchberg-Saxton (GS) algorithm. Simulation results validate the effectiveness of our proposed Fresnel zone-based method in mitigating the near-field beam split effect.
Submitted 27 November, 2024;
originally announced November 2024.
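The Gerchberg-Saxton algorithm named in the RIS abstract alternates between two domains linked by a Fourier transform, enforcing a known magnitude in each domain while keeping the current phase. A minimal 1-D sketch follows. It is the textbook GS iteration, not the paper's combined stationary-phase and GS design, and it assumes a hypothetical nonnegative reflective-intensity profile as the fixed source magnitude and a flat (uniform in-band) target magnitude.

```python
import numpy as np


def gerchberg_saxton(source_amp: np.ndarray, target_amp: np.ndarray,
                     iters: int = 100, seed: int = 0):
    """Find a phase for `source_amp` whose FFT magnitude approaches `target_amp`."""
    rng = np.random.default_rng(seed)
    field = source_amp * np.exp(1j * rng.uniform(0.0, 2 * np.pi, source_amp.shape))
    errors = []
    for _ in range(iters):
        far = np.fft.fft(field)
        errors.append(float(np.linalg.norm(np.abs(far) - target_amp)))  # magnitude mismatch
        far = target_amp * np.exp(1j * np.angle(far))       # impose the target magnitude
        near = np.fft.ifft(far)
        field = source_amp * np.exp(1j * np.angle(near))    # impose the source magnitude
    return np.angle(field), errors


n = 64
source = np.hanning(n) + 0.05                    # hypothetical reflective-intensity profile
target = np.full(n, np.linalg.norm(source))      # flat magnitude with Parseval-consistent energy
phase, errors = gerchberg_saxton(source, target)
```

Because each step is a projection onto one of the two magnitude constraints, the recorded mismatch never increases from one iteration to the next, although GS may stall at a local solution rather than reach zero error.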
-
Confocal structured illumination microscopy
Authors:
Weishuai Zhou,
Manhong Yao,
Xi Lin,
Quan Yu,
Junzheng Peng,
Jingang Zhong
Abstract:
Confocal microscopy, a critical advancement in optical imaging, is widely applied because of its excellent noise rejection. However, it has low imaging efficiency and can cause phototoxicity. Optical-sectioning structured illumination microscopy (OS-SIM) can overcome these limitations of confocal microscopy but still faces challenges in imaging depth and signal-to-noise ratio (SNR). We introduce the concept of confocal imaging into OS-SIM and propose confocal structured illumination microscopy (CSIM) to enhance the imaging performance of OS-SIM. CSIM exploits the principle of dual photography to reconstruct a dual image from each pixel of the camera. The reconstructed dual image is equivalent to the image obtained by using the spatial light modulator (SLM) as a virtual camera, enabling the separation of the conjugate and non-conjugate signals recorded by the camera pixel. Once the conjugate relationship between the camera and the SLM is established, the non-conjugate signals can be rejected by extracting the conjugate signal from each dual image, yielding a confocal image. We establish the theoretical framework of CSIM. Optical-sectioning experimental results demonstrate that CSIM can reconstruct images with superior SNR and greater imaging depth compared with existing OS-SIM. CSIM is expected to expand the application scope of OS-SIM.
Submitted 24 May, 2024;
originally announced May 2024.
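Under the usual linear light-transport model, camera readings for a set of SLM patterns are C = T P, and the dual image associated with camera pixel i is row i of the transport matrix T. The sketch below recovers T from the captures and keeps only each dual image's conjugate entry. It is not the CSIM reconstruction pipeline: the full-row-rank random pattern set, the noise-free toy scene, and the identity camera-to-SLM conjugate mapping are all hypothetical assumptions.

```python
import numpy as np


def recover_transport(captures: np.ndarray, patterns: np.ndarray) -> np.ndarray:
    """Estimate T from C = T @ P, with P of shape (n_slm, n_patterns) and full row rank."""
    return captures @ np.linalg.pinv(patterns)


def confocal_image(T: np.ndarray, conjugate_slm_index: np.ndarray) -> np.ndarray:
    """Row i of T is the dual image for camera pixel i; keep only its conjugate SLM entry."""
    rows = np.arange(T.shape[0])
    return T[rows, conjugate_slm_index]


# Hypothetical 1-D toy scene: in-focus signal on the diagonal plus weak
# out-of-focus (non-conjugate) crosstalk everywhere else.
rng = np.random.default_rng(0)
n = 16
true_T = np.diag(rng.uniform(1.0, 2.0, n)) + 0.05 * rng.uniform(size=(n, n))
patterns = rng.uniform(size=(n, 2 * n))        # 2n random SLM patterns
captures = true_T @ patterns                   # one noise-free camera frame per pattern
T_hat = recover_transport(captures, patterns)
confocal = confocal_image(T_hat, np.arange(n))  # identity conjugate mapping assumed
```

A widefield image under uniform illumination would be the row sums of T, which mix in every non-conjugate term; selecting the conjugate entry discards them.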
-
Coded Beam Training
Authors:
Tianyue Zheng,
Jieao Zhu,
Qiumo Yu,
Yongli Yan,
Linglong Dai
Abstract:
In extremely large-scale multiple input multiple output (XL-MIMO) systems for future sixth-generation (6G) communications, codebook-based beam training stands out as a promising technology to acquire channel state information (CSI). Despite their effectiveness, when the pilot overhead is limited, existing beam training methods suffer from significant achievable rate degradation for remote users with low signal-to-noise ratio (SNR). To tackle this challenge, we leverage the error-correcting capability of channel codes and introduce channel coding theory into hierarchical beam training to extend the coverage area. Specifically, we establish the duality between hierarchical beam training and channel coding, and the proposed coded beam training scheme serves as a general framework. We then present two specific implementations, based on Hamming codes and convolutional codes respectively, in which the beam encoding and decoding processes are refined to better suit the beam training problem. Simulation results demonstrate that the proposed coded beam training method enables reliable beam training for remote users with low SNR while keeping the training overhead low.
Submitted 6 March, 2024; v1 submitted 3 January, 2024;
originally announced January 2024.
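The duality between hierarchical beam training and channel coding can be pictured by treating the bits of a beam index as information bits and each coded bit as one binary training decision (for example, which of two wide beams returns more power). The sketch below is a plain Hamming(7,4) encoder and syndrome decoder under that hypothetical mapping, with a hypothetical 16-beam codebook; it shows how one wrong decision out of seven can be corrected, but does not model the paper's refined beam encoding and decoding.

```python
from typing import List


def index_to_bits(i: int) -> List[int]:
    """4-bit beam index, most significant bit first (hypothetical codebook of 16 beams)."""
    return [(i >> s) & 1 for s in (3, 2, 1, 0)]


def bits_to_index(bits: List[int]) -> int:
    return (bits[0] << 3) | (bits[1] << 2) | (bits[2] << 1) | bits[3]


def hamming74_encode(d: List[int]) -> List[int]:
    """Codeword layout by position 1..7: p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: List[int]) -> List[int]:
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos  # XOR of the positions holding a 1
    if syndrome:
        c[syndrome - 1] ^= 1  # a nonzero syndrome is the position of the single error
    return [c[2], c[4], c[5], c[6]]


# Beam 11 -> 7 training decisions; suppose the 5th decision is wrong at low SNR.
decisions = hamming74_encode(index_to_bits(11))
decisions[4] ^= 1
print(bits_to_index(hamming74_decode(decisions)))  # 11
```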
-
How AI-driven Digital Twins Can Empower Mobile Networks
Authors:
Tong Li,
Fenyu Jiang,
Qiaohong Yu,
Wenzhen Huang,
Tao Jiang,
Depeng Jin
Abstract:
The growing complexity of next-generation networks exacerbates the modeling and algorithmic flaws of conventional network optimization methodology. In this paper, we propose a mobile network digital twin (MNDT) architecture for 6G networks. To address the modeling and algorithmic shortcomings, the MNDT uses a simulation-optimization structure. The feedback from the network simulation engine, which serves as validation for the optimizer's decision outcomes, is used explicitly to train artificial intelligence (AI) empowered optimizers iteratively. In practice, we develop a network digital twin prototype system leveraging data-driven technology to accurately model the behaviors of mobile network elements (e.g., mobile users and base stations), wireless environments, and network performance. An AI-powered network optimizer has been developed based on the deployed MNDT prototype system for providing reliable and optimized network configurations. Experimental results demonstrate that the proposed MNDT infrastructure can provide practical network optimization solutions while adapting to more complex environments.
Submitted 20 November, 2023;
originally announced November 2023.
-
Energy-efficient Beamforming for RISs-aided Communications: Gradient Based Meta Learning
Authors:
Xinquan Wang,
Fenghao Zhu,
Qianyun Zhou,
Qihao Yu,
Chongwen Huang,
Ahmed Alhammadi,
Zhaoyang Zhang,
Chau Yuen,
Mérouane Debbah
Abstract:
Reconfigurable intelligent surfaces (RISs) have become a promising technology to meet the energy-efficiency and scalability requirements of future sixth-generation (6G) communications. However, a significant challenge in RIS-aided communications is the joint optimization of active and passive beamforming at base stations (BSs) and RISs, respectively. Specifically, the main difficulty is attributed to the highly non-convex optimization space of beamforming matrices at both BSs and RISs, as well as the diversity and mobility of communication scenarios. To address this, we present a green gradient-based meta learning beamforming (GMLB) approach. Unlike traditional deep learning based methods, which take channel information directly as input, GMLB feeds the gradient of the sum rate into neural networks. Correspondingly, we design a differential regulator to address the phase shift optimization of RISs. Moreover, we use meta learning to iteratively optimize the beamforming matrices of BSs and RISs. These techniques allow the proposed method to work well without requiring energy-consuming pre-training. Simulations show that GMLB can achieve a higher sum rate than typical alternating optimization algorithms, with two orders of magnitude lower energy consumption.
Submitted 16 February, 2024; v1 submitted 12 November, 2023;
originally announced November 2023.
-
Ternary Stochastic Geometry Theory for Performance Analysis of RIS-Assisted UDN
Authors:
Hongchi Lin,
Qiyue Yu
Abstract:
Network topology is becoming increasingly complex as the number and variety of network nodes grow, which challenges network design and analysis. Most current studies rely on binary-system stochastic geometry, overlooking the coupling and collaboration among nodes. This limitation makes it difficult to accurately analyze network systems such as the reconfigurable intelligent surface (RIS) assisted ultra-dense network (UDN). To address this issue, we propose a dual coordinate system analysis method that uses dual observation points and their established coordinates. A typical triangle consisting of a base station (BS), a RIS, and a user equipment (UE) is defined as the fundamental unit of analysis for ternary stochastic geometry. Furthermore, we extend Campbell's theorem and propose an approximate probability generating function for ternary stochastic geometry. Utilizing this theoretical framework, we derive and analyze performance metrics of a RIS-assisted UDN system, such as coverage probability, area spectral efficiency, area energy efficiency, and energy coverage efficiency. Simulation results show that RIS can significantly enhance system performance, particularly for UEs with high signal-to-interference-plus-noise ratios, exhibiting a phenomenon similar to the Matthew effect.
Submitted 7 May, 2024; v1 submitted 16 July, 2023;
originally announced July 2023.
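Campbell's theorem, which the abstract above extends to the ternary setting, states that for a homogeneous Poisson point process $\Phi$ with intensity $\lambda$, $\mathbb{E}[\sum_{x\in\Phi} f(x)] = \lambda\int f(x)\,dx$. A Monte Carlo check of the standard single-process theorem on a disk is sketched below, with a hypothetical exponential-decay $f$ standing in for a path-loss function; it does not implement the paper's ternary extension or its approximate probability generating function.

```python
import numpy as np


def campbell_monte_carlo(intensity, radius, f, trials=4000, seed=0):
    """Average of sum_{x in PPP} f(|x|) over independent PPP realizations on a disk."""
    rng = np.random.default_rng(seed)
    area = np.pi * radius ** 2
    totals = np.empty(trials)
    for t in range(trials):
        n_points = rng.poisson(intensity * area)             # Poisson number of points
        r = radius * np.sqrt(rng.uniform(size=n_points))     # distances of uniform points in the disk
        totals[t] = f(r).sum()
    return float(totals.mean())


def campbell_exact_exp(intensity, radius):
    """lambda * integral over the disk of exp(-|x|) dx = lambda * 2*pi * (1 - (1 + R) exp(-R))."""
    return intensity * 2 * np.pi * (1 - (1 + radius) * np.exp(-radius))


mc = campbell_monte_carlo(5.0, 3.0, lambda r: np.exp(-r))
exact = campbell_exact_exp(5.0, 3.0)
print(mc, exact)  # the two should agree to within Monte Carlo error
```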
-
SynthMix: Mixing up Aligned Synthesis for Medical Cross-Modality Domain Adaptation
Authors:
Xinwen Zhang,
Chaoyi Zhang,
Dongnan Liu,
Qianbi Yu,
Weidong Cai
Abstract:
Adversarial methods have shown strong performance in mitigating domain shift by producing synthetic images; domain shift is a common problem in the medical field, where labeled data are hard to acquire. Most existing studies focus on modifying the network architecture, but little work has addressed the GAN training strategy. In this work, we propose SynthMix, an add-on module with a natural yet effective training policy that can improve synthetic quality without altering the network architecture. Following the adversarial philosophy of GAN, we design a mix-up synthesis scheme termed SynthMix. It coherently mixes aligned images of real and synthetic samples to stimulate the generation of fine-grained features, which are examined by an associated Inspector for domain-specific details. We evaluate our method on two segmentation benchmarks across three publicly available datasets, where it shows a significant performance gain over existing state-of-the-art approaches.
Submitted 6 May, 2023;
originally announced May 2023.
-
Model-free Quantum Gate Design and Calibration using Deep Reinforcement Learning
Authors:
Omar Shindi,
Qi Yu,
Parth Girdhar,
Daoyi Dong
Abstract:
High-fidelity quantum gate design is important for various quantum technologies, such as quantum computation and quantum communication. Numerous control policies for quantum gate design have been proposed given a dynamical model of the quantum system of interest. However, a quantum system is often highly sensitive to noise, and obtaining an accurate model of it can be difficult in many practical applications. Thus, control policies based on a quantum system model may be impractical for quantum gate design. Also, quantum measurements collapse quantum states, which makes it challenging to obtain information through measurements during the control process. In this paper, we propose a novel training framework using deep reinforcement learning for model-free quantum control. The proposed framework relies only on the measurement at the end of the control process and can find the optimal control policy without access to quantum systems during the learning process. The effectiveness of the proposed technique is numerically demonstrated for model-free quantum gate design and quantum gate calibration using off-policy reinforcement learning algorithms.
Submitted 7 February, 2023; v1 submitted 5 February, 2023;
originally announced February 2023.
-
CancerUniT: Towards a Single Unified Model for Effective Detection, Segmentation, and Diagnosis of Eight Major Cancers Using a Large Collection of CT Scans
Authors:
Jieneng Chen,
Yingda Xia,
Jiawen Yao,
Ke Yan,
Jianpeng Zhang,
Le Lu,
Fakai Wang,
Bo Zhou,
Mingyan Qiu,
Qihang Yu,
Mingze Yuan,
Wei Fang,
Yuxing Tang,
Minfeng Xu,
Jian Zhou,
Yuqian Zhao,
Qifeng Wang,
Xianghua Ye,
Xiaoli Yin,
Yu Shi,
Xin Chen,
Jingren Zhou,
Alan Yuille,
Zaiyi Liu,
Ling Zhang
Abstract:
Human readers or radiologists routinely perform full-body multi-organ multi-disease detection and diagnosis in clinical practice, while most medical AI systems are built to focus on single organs with a narrow list of a few diseases. This might severely limit AI's clinical adoption: a certain number of AI models would need to be assembled non-trivially to match the diagnostic process of a human reading a CT scan. In this paper, we construct a Unified Tumor Transformer (CancerUniT) model to jointly detect tumor existence and location and diagnose tumor characteristics for eight major cancers in CT scans. CancerUniT is a query-based Mask Transformer model that outputs multi-tumor predictions. We decouple the object queries into organ queries, tumor detection queries, and tumor diagnosis queries, and further establish hierarchical relationships among the three groups. This clinically inspired architecture effectively assists inter- and intra-organ representation learning of tumors and facilitates the resolution of these complex, anatomically related multi-organ cancer image reading tasks. CancerUniT is trained end-to-end on a curated large-scale CT dataset of 10,042 patients, including eight major types of cancers and occurring non-cancer tumors (all pathology-confirmed, with 3D tumor masks annotated by radiologists). On the test set of 631 patients, CancerUniT demonstrates strong performance under a set of clinically relevant evaluation metrics, substantially outperforming both multi-disease methods and an assembly of eight single-organ expert models in tumor detection, segmentation, and diagnosis. This moves one step closer to a universal high-performance cancer screening tool.
Submitted 6 October, 2023; v1 submitted 28 January, 2023;
originally announced January 2023.
-
Wake Word Detection Based on Res2Net
Authors:
Qiuchen Yu,
Ruohua Zhou
Abstract:
This letter proposes a new wake word detection system based on Res2Net. As a variant of ResNet, Res2Net was first applied to object detection. Res2Net realizes multiple feature scales by increasing the range of possible receptive fields. This multi-scale mechanism significantly improves the detection of wake words with different durations. Compared with the ResNet-based model, Res2Net also significantly reduces the model size and is more suitable for detecting wake words. The proposed system can determine the positions of wake words in the audio stream without any additional assistance. The proposed method is verified on the Mobvoi dataset, which contains two wake words. At a false alarm rate of 0.5 per hour, the system reduces the false rejection of the two wake words by more than 12% compared with prior work.
Submitted 30 September, 2022;
originally announced September 2022.
-
Doubly-Iterative Sparsified MMSE Turbo Equalization for OTFS Modulation
Authors:
Haotian Li,
Qiyue Yu
Abstract:
Currently, orthogonal time frequency space (OTFS) modulation has drawn much attention for reliable communications in high-mobility scenarios. This paper proposes a doubly-iterative sparsified minimum mean square error (DI-S-MMSE) turbo equalizer, which iteratively exchanges the extrinsic information between a soft-input soft-output (SISO) MMSE estimator and a SISO decoder. Our proposed equalizer does not suffer from short loops and approaches the performance of the near-optimal symbol-wise maximum a posteriori (MAP) algorithm. To exploit the inherent sparsity of the OTFS system, we resort to graph theory to investigate the sparsity pattern of the channel matrix, and propose two sparsification guidelines to reduce the complexity of calculating the matrix inverse at the MMSE estimator. Then, we apply two iterative algorithms to MMSE estimation, i.e., the Generalized Minimal Residual (GMRES) and Factorized Sparse Approximate Inverse (FSPAI) algorithms. The former is used at the initial turbo iteration, whose global convergence is proven in our equalizer, while the latter is used at the subsequent turbo iterations with the help of our proposed guidelines. Simulation results demonstrate that our equalizer has a linear order of complexity while the performance loss incurred by the sparsification is only 0.2 dB at $10^{-4}$ bit error rate. Simulation codes are available to reproduce the results presented in this paper: https://github.com/Alga53/DISMMSE-Turbo-Equalizer-for-OTFS.
Submitted 7 September, 2024; v1 submitted 2 July, 2022;
originally announced July 2022.
-
A Joint Beamforming Design and Integrated CPM-LFM Signal for Dual-functional Radar-communication Systems
Authors:
Yu Cao,
QiYue Yu
Abstract:
The dual-functional radar-communication (DFRC) system is an attractive technique, since it can support both wireless communications and radar on a unified hardware platform with real-time cooperation. Exploiting the appealing feature of multiple beams, this paper proposes a precoding scheme that simultaneously supports multiuser transmission and target detection, with an integrated continuous phase modulation (CPM) and linear frequency modulation (LFM) signal, based on the designed dual-mode framework. Analogous to the concept of communication rate, this paper defines a radar rate to unify the DFRC system. The objective is then to maximize the sum-rate, which includes both the communication and radar rates. Since this optimization problem is non-convex, it is divided into two subproblems: user selection, and joint beamforming design and power allocation. A successive maximum iteration (SMI) algorithm is presented for the former, which balances sum-rate performance against complexity, and a maximum minimization Lagrange multiplier (MMLM) iteration algorithm is used to solve the latter. Moreover, we derive the spectrum characteristics, bit error rate (BER), and ambiguity function (AF) of the proposed system. Simulation results show that the proposed system provides a higher sum-rate than the classical schemes, validating its efficiency.
Submitted 17 December, 2021;
originally announced December 2021.
-
Simultaneous estimation of parameters and the state of an optical parametric oscillator system
Authors:
Qi Yu,
Shota Yokoyama,
Daoyi Dong,
David McManus,
Hidehiro Yonezawa
Abstract:
In this paper, we consider the filtering problem of an optical parametric oscillator (OPO). The OPO pump power may fluctuate due to environmental disturbances, resulting in uncertainty in the system modeling. Thus, both the state and the unknown parameter may need to be estimated simultaneously. We formulate this problem using a state-space representation of the OPO dynamics. Under the assumption of Gaussianity and proper constraints, the dual Kalman filter method and the joint extended Kalman filter method are employed to simultaneously estimate the system state and the pump power. Numerical examples demonstrate the effectiveness of the proposed algorithms.
Submitted 14 November, 2021;
originally announced November 2021.
-
Spatio-Temporal Urban Knowledge Graph Enabled Mobility Prediction
Authors:
Huandong Wang,
Qiaohong Yu,
Yu Liu,
Depeng Jin,
Yong Li
Abstract:
With the rapid development of mobile communication technology, mobile trajectories of humans are massively collected by Internet service providers (ISPs) and application service providers (ASPs). On the other hand, the rising paradigm of knowledge graph (KG) provides us a promising solution to extract structured "knowledge" from massive trajectory data. In this paper, we focus on modeling users' spatio-temporal mobility patterns based on knowledge graph techniques, and predicting users' future movement based on the "knowledge" extracted from multiple sources in a cohesive manner. Specifically, we propose a new type of knowledge graph, i.e., spatio-temporal urban knowledge graph (STKG), where mobility trajectories, category information of venues, and temporal information are jointly modeled by the facts with different relation types in STKG. The mobility prediction problem is converted to the knowledge graph completion problem in STKG. Further, a complex embedding model with elaborately designed scoring functions is proposed to measure the plausibility of facts in STKG to solve the knowledge graph completion problem, which considers temporal dynamics of the mobility patterns and utilizes PoI categories as the auxiliary information and background knowledge. Extensive evaluations confirm the high accuracy of our model in predicting users' mobility, i.e., improving the accuracy by 5.04% compared with the state-of-the-art algorithms. In addition, PoI categories as the background knowledge and auxiliary information are confirmed to be helpful by improving the performance by 3.85% in terms of accuracy. Experiments also show that our proposed method is time-efficient, reducing the computational time by over 43.12% compared with existing methods.
Submitted 10 November, 2021; v1 submitted 1 November, 2021;
originally announced November 2021.
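The abstract recasts mobility prediction as knowledge graph completion: given a query (user, relation, ?), candidate venues are ranked by a plausibility score. The paper's own scoring functions are not reproduced here. As a stand-in, the sketch below uses a ComplEx-style score, Re(Σ_k h_k · r_k · conj(t_k)), with hand-picked toy embeddings; the entity names, the `visits_at_evening` relation, and all vectors are hypothetical.

```python
from typing import Dict, List, Sequence


def complex_score(h: Sequence[complex], r: Sequence[complex], t: Sequence[complex]) -> float:
    """ComplEx-style plausibility of the fact (h, r, t): Re(sum_k h_k * r_k * conj(t_k))."""
    return sum((a * b * c.conjugate()).real for a, b, c in zip(h, r, t))


def rank_tails(h: Sequence[complex], r: Sequence[complex],
               candidates: Dict[str, Sequence[complex]]) -> List[str]:
    """Answer the completion query (h, r, ?) by sorting candidate tails by score, best first."""
    return sorted(candidates, key=lambda name: complex_score(h, r, candidates[name]), reverse=True)


# Hypothetical 2-dimensional embeddings for a user, a relation, and three venues.
user = [1 + 1j, 0.5 + 0j]
visits_at_evening = [1 + 0j, 0 + 1j]
venues = {
    "gym": [1 + 1j, 0 + 0j],
    "office": [0 + 0j, 1 + 0j],
    "cafe": [-1 + 0j, 0 + 0.5j],
}
print(rank_tails(user, visits_at_evening, venues))  # -> ['gym', 'office', 'cafe']
```

The scores here are 2.0, 0.0, and -0.75 respectively. Because the conjugate makes the score asymmetric in h and t, this family of models can represent directed relations such as "user visits venue", which is one reason complex-valued embeddings are popular for completion tasks; in practice the embeddings are learned rather than hand-set.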
-
Uniquely Decodable Multi-Amplitude Sequence for Grant-Free Multiple-Access Adder Channels
Authors:
Qi-Yue Yu,
Ke-Xun Song
Abstract:
Grant-free multiple-access (GFMA) is a valuable research topic, since it can support multiuser transmission with low latency. This paper constructs novel uniquely-decodable multi-amplitude sequence (UDAS) sets for GFMA systems, which can provide high spectrum efficiency (SE) with a low-complexity active user detection (AUD) algorithm. First, we propose a UDAS-based multi-dimensional bit interleaving coded modulation (MD-BICM) transmitter; we then introduce the definition of UDAS and construct two kinds of UDAS sets based on cyclic and quasi-cyclic matrix modes. Besides, we present an AUD algorithm based on the statistics of UDAS features (SoF-AUD), and a joint multiuser detection and improved message passing algorithm for the proposed system. Finally, the active user error rate (AUER) and Shannon limits of the proposed system are derived in detail. Simulation results show that our proposed system can simultaneously support four users without additional redundancy, and the AUER can reach an extremely low value of $10^{-5}$ when $E_b/N_0$ is $0$ dB and the length of the transmit block is larger than a given value, i.e., 784, verifying the validity and flexibility of the proposed UDAS sets.
Submitted 8 April, 2022; v1 submitted 22 October, 2021;
originally announced October 2021.
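Unique decodability over a multiple-access adder channel means that every combination of one codeword per user produces a distinct received sum, so the receiver can recover each user's codeword from the superposition. The checker below is a brute-force sketch of that general definition only; it does not implement the paper's cyclic or quasi-cyclic UDAS constructions, and the small binary example sets are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

Codeword = Tuple[int, ...]


def is_uniquely_decodable(user_sets: Sequence[Sequence[Codeword]]) -> bool:
    """Return True if every choice of one codeword per user yields a distinct element-wise sum."""
    seen = set()
    for combo in product(*user_sets):
        received = tuple(sum(chips) for chips in zip(*combo))  # adder channel: element-wise sum
        if received in seen:
            return False  # two different user combinations collide at the receiver
        seen.add(received)
    return True


# Hypothetical two-user example with length-2 codewords.
user_a: List[Codeword] = [(0, 0), (1, 1)]
user_b: List[Codeword] = [(0, 0), (1, 0)]
print(is_uniquely_decodable([user_a, user_b]))  # sums (0,0), (1,0), (1,1), (2,1) -> True

# Giving user_b the codeword (1, 1) creates a collision: (0,0)+(1,1) == (1,1)+(0,0).
print(is_uniquely_decodable([user_a, [(0, 0), (1, 1)]]))  # -> False
```

The exhaustive check costs the product of the set sizes, so it is only practical for verifying small constructions; structured designs such as the cyclic and quasi-cyclic sets in the abstract exist precisely to guarantee the property without enumeration.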
-
Bilateral-ViT for Robust Fovea Localization
Authors:
Sifan Song,
Kang Dang,
Qinji Yu,
Zilong Wang,
Frans Coenen,
Jionglong Su,
Xiaowei Ding
Abstract:
The fovea is an important anatomical landmark of the retina. Detecting the location of the fovea is essential for the analysis of many retinal diseases. However, robust fovea localization remains a challenging problem, as the fovea region often appears fuzzy, and retina diseases may further obscure its appearance. This paper proposes a novel Vision Transformer (ViT) approach that integrates information both inside and outside the fovea region to achieve robust fovea localization. Our proposed network, named Bilateral-Vision-Transformer (Bilateral-ViT), consists of two network branches: a transformer-based main network branch for integrating global context across the entire fundus image and a vessel branch for explicitly incorporating the structure of blood vessels. The encoded features from both network branches are subsequently merged with a customized Multi-scale Feature Fusion (MFF) module. Our comprehensive experiments demonstrate that the proposed approach is significantly more robust for diseased images and establishes a new state of the art on the Messidor and PALM datasets.
Submitted 3 March, 2022; v1 submitted 19 October, 2021;
originally announced October 2021.
-
Resource-constrained Federated Edge Learning with Heterogeneous Data: Formulation and Analysis
Authors:
Yi Liu,
Yuanshao Zhu,
James J. Q. Yu
Abstract:
Efficient collaboration between collaborative machine learning and wireless communication technology, forming a Federated Edge Learning (FEEL), has spawned a series of next-generation intelligent applications. However, due to the openness of network connections, the FEEL framework generally involves hundreds of remote devices (or clients), resulting in expensive communication costs, which is not friendly to resource-constrained FEEL. To address this issue, we propose a distributed approximate Newton-type algorithm with fast convergence speed to alleviate the problem of FEEL resource (in terms of communication resources) constraints. Specifically, the proposed algorithm is improved based on distributed L-BFGS algorithm and allows each client to approximate the high-cost Hessian matrix by computing the low-cost Fisher matrix in a distributed manner to find a "better" descent direction, thereby speeding up convergence. Second, we prove that the proposed algorithm has linear convergence in strongly convex and non-convex cases and analyze its computational and communication complexity. Similarly, due to the heterogeneity of the connected remote devices, FEEL faces the challenge of heterogeneous data and non-IID (Independent and Identically Distributed) data. To this end, we design a simple but elegant training scheme, namely FedOVA, to solve the heterogeneous statistical challenge brought by heterogeneous data. In this way, FedOVA first decomposes a multi-class classification problem into more straightforward binary classification problems and then combines their respective outputs using ensemble learning. In particular, the scheme can be well integrated with our communication efficient algorithm to serve FEEL. Numerical results verify the effectiveness and superiority of the proposed algorithm.
Submitted 14 October, 2021;
originally announced October 2021.
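FedOVA's first step, decomposing a multi-class problem into one-vs-all binary problems and then recombining their outputs, can be illustrated without any federated machinery. In the sketch below each binary "classifier" is a hypothetical nearest-centroid scorer, and recombination simply picks the class whose binary model is most confident; the paper's actual ensemble rule, its client-side training, and its communication-efficient optimizer are not reproduced.

```python
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
Model = Tuple[Point, Point]  # (centroid of the positive class, centroid of all other classes)


def centroid(ps: Sequence[Point]) -> Point:
    return (sum(p[0] for p in ps) / len(ps), sum(p[1] for p in ps) / len(ps))


def train_binary(points: Sequence[Point], labels: Sequence[int], positive: int) -> Model:
    """Fit a toy one-vs-all model for class `positive` versus the rest."""
    pos = [p for p, y in zip(points, labels) if y == positive]
    neg = [p for p, y in zip(points, labels) if y != positive]
    return centroid(pos), centroid(neg)


def binary_score(model: Model, x: Point) -> float:
    """Positive when x is closer to the 'positive' centroid than to the 'rest' centroid."""
    (px, py), (nx, ny) = model
    d_pos = (x[0] - px) ** 2 + (x[1] - py) ** 2
    d_neg = (x[0] - nx) ** 2 + (x[1] - ny) ** 2
    return d_neg - d_pos


def train_ova(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, Model]:
    """Decompose the K-class problem into K binary problems, one per class."""
    return {c: train_binary(points, labels, c) for c in sorted(set(labels))}


def predict_ova(models: Dict[int, Model], x: Point) -> int:
    """Combine the binary outputs: choose the class whose binary model scores highest."""
    return max(models, key=lambda c: binary_score(models[c], x))


points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.2, 4.9), (0.0, 5.0), (0.1, 5.2)]
labels = [0, 0, 1, 1, 2, 2]
models = train_ova(points, labels)
print([predict_ova(models, p) for p in [(0.1, 0.0), (5.1, 5.0), (0.0, 5.1)]])  # -> [0, 1, 2]
```

One plausible motivation for this decomposition under non-IID data is that a client holding only a few classes can still contribute useful positive and negative examples to every binary subproblem, whereas a single softmax head trained on skewed labels tends to drift toward the locally dominant classes.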
-
Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions
Authors:
Qian Yu,
Lei Qi,
Luping Zhou,
Lei Wang,
Yilong Yin,
Yinghuan Shi,
Wuzhang Wang,
Yang Gao
Abstract:
Accurate image segmentation plays a crucial role in medical image analysis, yet it faces great challenges of various shapes, diverse sizes, and blurry boundaries. To address these difficulties, square kernel-based encoder-decoder architecture has been proposed and widely used, but its performance remains unsatisfactory. To further cope with these challenges, we present a novel double-branch encoder architecture. Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels in the double-branch encoder, so features learned by the two branches can be expected to complement each other. 2) Considering that spatial attention can help models to better focus on the target region in a large-sized image, we develop an attention loss to further emphasize the segmentation on small-sized targets. Together, the above two schemes give rise to a novel double-branch encoder segmentation framework for medical image segmentation, namely Crosslink-Net. The experiments validate the effectiveness of our model on four datasets. The code is released at https://github.com/Qianyu1226/Crosslink-Net.
Submitted 23 July, 2021;
originally announced July 2021.
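The two encoder branches differ mainly in kernel shape. A framework-free sketch, assuming a 3×1 (vertical) and a 1×3 (horizontal) kernel with hand-set unit weights and zero "same" padding, shows how each branch responds to differently oriented structure. The paper's actual kernel sizes, learned weights, and fusion are not given here; summing the two maps at the end is a hypothetical simplification, and the released repository above is the authoritative implementation.

```python
from typing import List

Image = List[List[float]]


def conv2d_same(img: Image, kernel: Image) -> Image:
    """Zero-padded 'same' 2D cross-correlation with an odd-sized, possibly non-square kernel."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    r, c = i + di - ph, j + dj - pw
                    if 0 <= r < h and 0 <= c < w:
                        acc += kernel[di][dj] * img[r][c]
            out[i][j] = acc
    return out


VERTICAL = [[1.0], [1.0], [1.0]]  # 3x1 kernel: aggregates along each column
HORIZONTAL = [[1.0, 1.0, 1.0]]    # 1x3 kernel: aggregates along each row

# A 5x5 image containing a single vertical line in column 2.
img = [[1.0 if c == 2 else 0.0 for c in range(5)] for _ in range(5)]
v = conv2d_same(img, VERTICAL)
hz = conv2d_same(img, HORIZONTAL)
print(v[2])   # -> [0.0, 0.0, 3.0, 0.0, 0.0]  (strong, narrow response along the line)
print(hz[2])  # -> [0.0, 1.0, 1.0, 1.0, 0.0]  (weak, spread-out response across it)
fused = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(v, hz)]  # hypothetical fusion
```

The vertical branch concentrates its response on the vertically oriented edge while the horizontal branch smears it sideways, which illustrates why features from the two branches can be expected to complement each other rather than duplicate what a square kernel learns.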
-
Toward Interactive Modulation for Photo-Realistic Image Restoration
Authors:
Haoming Cai,
Jingwen He,
Qiao Yu,
Chao Dong
Abstract:
Modulating image restoration level aims to generate a restored image by altering a factor that represents the restoration strength. Previous works mainly focused on optimizing the mean squared reconstruction error, which brings high reconstruction accuracy but lacks finer texture details. This paper presents a Controllable Unet Generative Adversarial Network (CUGAN) to generate high-frequency textures in the modulation tasks. CUGAN consists of two modules -- base networks and condition networks. The base networks comprise a generator and a discriminator. In the generator, we realize the interactive control of restoration levels by tuning the weights of different features from different scales in the Unet architecture. Moreover, we adaptively modulate the intermediate features in the discriminator according to the severity of degradations. The condition networks accept the condition vector (encoded degradation information) as input, then generate modulation parameters for both the generator and the discriminator. During testing, users can control the output effects by tweaking the condition vector. We also provide a smooth transition between GAN and MSE effects by a simple transition method. Extensive experiments demonstrate that the proposed CUGAN achieves excellent performance on image restoration modulation tasks.
Submitted 7 May, 2021;
originally announced May 2021.
-
The impact of data volume on performance of deep learning based building rooftop extraction using very high spatial resolution aerial images
Authors:
Hongjie He,
Ke Yang,
Yuwei Cai,
Zijian Jiang,
Qiutong Yu,
Kun Zhao,
Junbo Wang,
Sarah Narges Fatholahi,
Yan Liu,
Hasti Andon Petrosians,
Bingxu Hu,
Liyuan Qing,
Zhehan Zhang,
Hongzhang Xu,
Siyu Li,
Kyle Gao,
Linlin Xu,
Jonathan Li
Abstract:
Building rooftop data are of importance in several urban applications and in natural disaster management. In contrast to traditional surveying and mapping, deep learning-based building rooftop extraction methods that use high spatial resolution aerial images are efficient and accurate. Although more training data is preferred in deep learning-based tasks, the effect of data volume on building extraction models is underexplored. Therefore, the paper explores the impact of data volume on the performance of building rooftop extraction from very-high-spatial-resolution (VHSR) images using deep learning-based methods. To do so, we manually labelled aerial images at 0.12 m spatial resolution and performed a comparative analysis of models trained on datasets of different sizes using popular deep learning architectures for segmentation tasks, including Fully Convolutional Networks (FCN)-8s, U-Net and DeepLabv3+. The experiments showed that with more training data, algorithms converged faster and achieved higher accuracy, while better algorithms were able to better mitigate the lack of training data.
Submitted 4 October, 2021; v1 submitted 16 March, 2021;
originally announced March 2021.
-
Deep Symmetric Adaptation Network for Cross-modality Medical Image Segmentation
Authors:
Xiaoting Han,
Lei Qi,
Qian Yu,
Ziqi Zhou,
Yefeng Zheng,
Yinghuan Shi,
Yang Gao
Abstract:
Unsupervised domain adaptation (UDA) methods have shown promising performance in cross-modality medical image segmentation tasks. These typical methods usually utilize a translation network to transform images from the source domain to the target domain or train the pixel-level classifier merely using translated source images and original target images. However, when there exists a large domain shift between source and target domains, we argue that this asymmetric structure cannot fully eliminate the domain gap. In this paper, we present a novel deep symmetric architecture of UDA for medical image segmentation, which consists of a segmentation sub-network, and two symmetric source and target domain translation sub-networks. To be specific, based on two translation sub-networks, we introduce a bidirectional alignment scheme via a shared encoder and private decoders to simultaneously align features 1) from source to target domain and 2) from target to source domain, which helps effectively mitigate the discrepancy between domains. Furthermore, for the segmentation sub-network, we train a pixel-level classifier using not only original target images and translated source images, but also original source images and translated target images, which helps sufficiently leverage the semantic information from the images with different styles. Extensive experiments demonstrate that our method has remarkable advantages compared to the state-of-the-art methods in both cross-modality Cardiac and BraTS segmentation tasks.
Submitted 17 January, 2021;
originally announced January 2021.
-
Deep Learning Methods for Lung Cancer Segmentation in Whole-slide Histopathology Images -- the ACDC@LungHP Challenge 2019
Authors:
Zhang Li,
Jiehua Zhang,
Tao Tan,
Xichao Teng,
Xiaoliang Sun,
Yang Li,
Lihong Liu,
Yang Xiao,
Byungjae Lee,
Yilong Li,
Qianni Zhang,
Shujiao Sun,
Yushan Zheng,
Junyu Yan,
Ni Li,
Yiyu Hong,
Junsu Ko,
Hyun Jung,
Yanling Liu,
Yu-cheng Chen,
Ching-wei Wang,
Vladimir Yurovskiy,
Pavel Maevskikh,
Vahid Khanagha,
Yi Jiang
, et al. (8 additional authors not shown)
Abstract:
Accurate segmentation of lung cancer in pathology slides is a critical step in improving patient care. We proposed the ACDC@LungHP (Automatic Cancer Detection and Classification in Whole-slide Lung Histopathology) challenge for evaluating different computer-aided diagnosis (CAD) methods on the automatic diagnosis of lung cancer. The ACDC@LungHP 2019 focused on segmentation (pixel-wise detection) of cancer tissue in whole slide imaging (WSI), using an annotated dataset of 150 training images and 50 test images from 200 patients. This paper reviews this challenge and summarizes the top 10 submitted methods for lung cancer segmentation. All methods were evaluated using the false positive rate, false negative rate, and Dice coefficient (DC). The DC ranged from 0.7354$\pm$0.1149 to 0.8372$\pm$0.0858. The DC of the best method was close to the inter-observer agreement (0.8398$\pm$0.0890). All methods were based on deep learning and categorized into two groups: multi-model methods and single-model methods. In general, multi-model methods were significantly better ($p < 0.01$) than single-model methods, with mean DC of 0.7966 and 0.7544, respectively. Deep learning based methods could potentially help pathologists find suspicious regions for further analysis of lung cancer in WSI.
Submitted 21 August, 2020;
originally announced August 2020.
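The three challenge metrics can be computed from a binary prediction mask and a reference mask. The sketch below uses the common definitions DC = 2|P∩G| / (|P| + |G|), FPR = FP / (FP + TN), and FNR = FN / (FN + TP) on flattened pixel lists. Whether the challenge computed them per slide or pooled over slides, and how empty masks were handled, is not stated in the abstract, so those choices here are assumptions.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Dice coefficient, false positive rate and false negative rate for binary masks (1 = cancer)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, g in zip(pred, truth) if p and g)
    fp = sum(1 for p, g in zip(pred, truth) if p and not g)
    fn = sum(1 for p, g in zip(pred, truth) if not p and g)
    tn = len(pred) - tp - fp - fn
    # 2|P∩G| / (|P| + |G|) equals 2TP / (2TP + FP + FN).
    # Assumption: Dice is defined as 1.0 when both masks are empty.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return {"dice": dice, "fpr": fpr, "fnr": fnr}


pred = [1, 1, 1, 0, 0, 0, 0, 0]
truth = [1, 1, 0, 1, 0, 0, 0, 0]
print(segmentation_metrics(pred, truth))  # dice ≈ 0.667, fpr = 0.2, fnr ≈ 0.333
```

On whole-slide images the masks contain billions of pixels, so a practical version would stream over tiles and accumulate the four counts rather than materialize flat lists; the formulas stay the same.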
-
Generation of accessible sets in the dynamical modelling of quantum network systems
Authors:
Qi Yu,
Yuanlong Wang,
Daoyi Dong,
Ian R. Petersen,
Guo-Yong Xiang
Abstract:
In this paper, we consider the dynamical modeling of a class of quantum network systems consisting of qubits. Qubit probes are employed to measure a set of selected nodes of the quantum network systems. For a variety of applications, a state space model is a useful way to model the system dynamics. To construct a state space model for a quantum network system, the major task is to find an accessible set containing all of the operators coupled to the measurement operators. This paper focuses on the generation of a proper accessible set for a given system and measurement scheme. We provide analytic results on simplifying the process of generating accessible sets for systems with a time-independent Hamiltonian. Since the order of elements in the accessible set determines the form of state space matrices, guidance is provided to effectively arrange the ordering of elements in the state vector. Defining a system state according to the accessible set, one can develop a state space model with a special pattern inherited from the system structure. As a demonstration, we specifically consider a typical 1D-chain system with several common measurements, and employ the proposed method to determine its accessible set.
Submitted 30 April, 2020;
originally announced April 2020.
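For a closed system with a time-independent Hamiltonian (taking ħ = 1), the Heisenberg equation gives dO/dt = i[H, O], so a set of operators coupled to the measured ones can be grown by repeatedly commuting them with the Hamiltonian until no new operators appear. The sketch below does this for Hamiltonians written as sums of Pauli strings, tracking only which Pauli strings appear and dropping coefficients and phases. It ignores dissipation and measurement back-action, and it is an illustrative closure procedure under those assumptions, not the paper's analytic results or its ordering guidance; the XX + YY two-qubit example is hypothetical.

```python
from collections import deque
from typing import List, Sequence

# Product of two distinct non-identity single-qubit Pauli labels, ignoring the phase (±1, ±i).
_PAULI_PRODUCT = {
    ("X", "Y"): "Z", ("Y", "X"): "Z",
    ("Y", "Z"): "X", ("Z", "Y"): "X",
    ("Z", "X"): "Y", ("X", "Z"): "Y",
}


def anticommute(p: str, q: str) -> bool:
    """Pauli strings anticommute iff they differ (both non-identity) on an odd number of qubits."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 1


def product_label(p: str, q: str) -> str:
    """Pauli string proportional to p*q (phase dropped)."""
    out = []
    for a, b in zip(p, q):
        if a == "I":
            out.append(b)
        elif b == "I":
            out.append(a)
        elif a == b:
            out.append("I")
        else:
            out.append(_PAULI_PRODUCT[(a, b)])
    return "".join(out)


def accessible_set(hamiltonian_terms: Sequence[str], measured: Sequence[str]) -> List[str]:
    """Breadth-first closure of the measured operators under commutation with each Hamiltonian term.

    [P, Q] = 0 when P and Q commute and 2PQ when they anticommute, so only anticommuting
    pairs generate a new operator. The result is in discovery order, which is one possible
    ordering of the state vector.
    """
    found = list(dict.fromkeys(measured))
    queue = deque(found)
    while queue:
        op = queue.popleft()
        for term in hamiltonian_terms:
            if anticommute(term, op):
                new = product_label(term, op)
                if new not in found:
                    found.append(new)
                    queue.append(new)
    return found


# Hypothetical two-qubit exchange Hamiltonian H = XX + YY, measuring Z on the first qubit.
print(accessible_set(["XX", "YY"], ["ZI"]))  # -> ['ZI', 'YX', 'XY', 'IZ']
```

Only four of the fifteen non-identity two-qubit Pauli strings are reached here, which illustrates how the structure of H and the choice of measured nodes keep the resulting state space model small.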
-
Hybrid filtering for a class of nonlinear quantum systems subject to classical stochastic disturbances
Authors:
Qi Yu,
Daoyi Dong,
Ian R. Petersen
Abstract:
A hybrid quantum-classical filtering problem, where a qubit system is disturbed by a classical stochastic process, is investigated. The strategy is to model the classical disturbance by using an optical cavity. Relations between classical disturbances and the cavity analog system are analyzed. The dynamics of the enlarged quantum network system, which includes a qubit system and a cavity system, are derived. A stochastic master equation for the qubit-cavity hybrid system is given, based on which estimates for the state of the cavity system and the classical signal are obtained. The quantum extended Kalman filter is employed to achieve efficient computation. Numerical results are presented to illustrate the effectiveness of our methods.
Submitted 6 April, 2020;
originally announced April 2020.
-
Crossover-Net: Leveraging the Vertical-Horizontal Crossover Relation for Robust Segmentation
Authors:
Qian Yu,
Yinghuan Shi,
Yefeng Zheng,
Yang Gao,
Jianbing Zhu,
Yakang Dai
Abstract:
Robust segmentation for non-elongated tissues in medical images is hard to realize due to the large variation of the shape, size, and appearance of these tissues in different patients. In this paper, we present an end-to-end trainable deep segmentation model termed Crossover-Net for robust segmentation in medical images. Our proposed model is inspired by an insightful observation: during segmentation, the representation from the horizontal and vertical directions can provide different local appearance and orthogonality context information, which helps enhance the discrimination between different tissues by simultaneously learning from these two directions. Specifically, by converting the segmentation task into a pixel/voxel-wise prediction problem, we first propose a cross-shaped patch, namely crossover-patch, which consists of a pair of (orthogonal and overlapped) vertical and horizontal patches, to capture the orthogonal vertical and horizontal relation. Then, we develop the Crossover-Net to learn the vertical-horizontal crossover relation captured by our crossover-patches. To achieve this goal, for learning the representation on a typical crossover-patch, we design a novel loss function to (1) impose the consistency on the overlap region of the vertical and horizontal patches and (2) preserve the diversity on their non-overlap regions. We have extensively evaluated our method on CT kidney tumor, MR cardiac, and X-ray breast mass segmentation tasks. Promising results are achieved according to our extensive evaluation and comparison with the state-of-the-art segmentation models.
Submitted 3 April, 2020;
originally announced April 2020.
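A crossover-patch pairs a vertical and a horizontal patch centered on the same pixel, so that they overlap only in a small square around it. The sketch below extracts such a pair from a 2D image, assuming hypothetical odd patch sizes (a long side of 5 and a short side of 3 pixels) and zero padding at the border, and returns the two views of the overlap region on which the loss described above imposes consistency. Only the patch geometry is illustrated, not the network or the loss.

```python
from typing import List, Tuple

Image = List[List[float]]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Crop a height x width window, filling out-of-bounds pixels with 0.0."""
    h, w = len(img), len(img[0])
    return [[img[r][c] if 0 <= r < h and 0 <= c < w else 0.0
             for c in range(left, left + width)]
            for r in range(top, top + height)]


def crossover_patch(img: Image, row: int, col: int,
                    long_side: int = 5, short_side: int = 3) -> Tuple[Image, Image]:
    """Return (vertical, horizontal) patches centred on (row, col); both sides should be odd.

    The vertical patch is long_side x short_side and the horizontal patch is
    short_side x long_side, so they overlap in a short_side x short_side square.
    """
    vertical = crop(img, row - long_side // 2, col - short_side // 2, long_side, short_side)
    horizontal = crop(img, row - short_side // 2, col - long_side // 2, short_side, long_side)
    return vertical, horizontal


def overlap_views(vertical: Image, horizontal: Image,
                  long_side: int = 5, short_side: int = 3) -> Tuple[Image, Image]:
    """The central overlap square as seen from each patch; the two views should match."""
    start = long_side // 2 - short_side // 2
    from_vertical = vertical[start:start + short_side]
    from_horizontal = [r[start:start + short_side] for r in horizontal]
    return from_vertical, from_horizontal


img = [[float(10 * r + c) for c in range(7)] for r in range(7)]
v, hz = crossover_patch(img, 3, 3)
print(len(v), len(v[0]), len(hz), len(hz[0]))  # -> 5 3 3 5
ov, oh = overlap_views(v, hz)
print(ov == oh)  # -> True
```

Each pixel to be classified gets its own crossover-patch, which is what turns segmentation into the pixel/voxel-wise prediction problem mentioned in the abstract; the non-overlapping arms of the cross are where the two views are meant to stay diverse.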
-
Detecting Pancreatic Ductal Adenocarcinoma in Multi-phase CT Scans via Alignment Ensemble
Authors:
Yingda Xia,
Qihang Yu,
Wei Shen,
Yuyin Zhou,
Elliot K. Fishman,
Alan L. Yuille
Abstract:
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancers among the population. Screening for PDACs in dynamic contrast-enhanced CT is beneficial for early diagnosis. In this paper, we investigate the problem of automated detection of PDACs in multi-phase (arterial and venous) CT scans. Multiple phases provide more information than a single phase, but they are unaligned and inhomogeneous in texture, making it difficult to combine cross-phase information seamlessly. We study multiple phase alignment strategies, i.e., early alignment (image registration), late alignment (high-level feature registration), and slow alignment (multi-level feature registration), and suggest an ensemble of all these alignments as a promising way to boost the performance of PDAC detection. We provide an extensive empirical evaluation on two PDAC datasets and show that the proposed alignment ensemble significantly outperforms previous state-of-the-art approaches, illustrating the strong potential for clinical use.
Submitted 1 July, 2020; v1 submitted 18 March, 2020;
originally announced March 2020.
-
WiEps: Measurement of Dielectric Property with Commodity WiFi Device -- An application to Ethanol/Water Mixture
Authors:
Hang Song,
Bo Wei,
Qun Yu,
Xia Xiao,
Takamaro Kikkawa
Abstract:
WiFi signals have become accessible everywhere, providing a high-speed data transmission experience. Besides the communication service, channel state information (CSI) of the WiFi signals is widely employed for numerous Internet of Things (IoT) applications. Most of these applications are based on analyzing the microwave reflections caused by the physical movement of an object. In this paper, a novel contactless wireless sensing technique named WiEps is developed to measure the dielectric properties of the material, exploiting the transmission characteristics of the WiFi signals. In WiEps, the material under test is placed between the transmitter antenna and receiver antenna. A theoretical model is proposed to quantitatively describe the relationship between CSI data and dielectric properties of the material. During the experiment, the phase and amplitude of the transmitted WiFi signals are extracted from the measured CSI data. The parameters of the theoretical model are calculated using measured data from the known materials. Then, WiEps is utilized to estimate the dielectric properties of unknown materials. The proposed technique is first applied to the ethanol/water mixtures. Then, additional liquids are measured for further verification. The estimated permittivities and conductivities show good agreement with the actual values, with average errors of 4.0% and 8.9%, respectively, indicating the efficacy of WiEps. By measuring the dielectric property, this technique is promising for new IoT applications using ubiquitous WiFi signals, such as food engineering, material manufacturing process monitoring, and security check.
Submitted 5 June, 2020; v1 submitted 3 March, 2020;
originally announced March 2020.
-
When Radiology Report Generation Meets Knowledge Graph
Authors:
Yixiao Zhang,
Xiaosong Wang,
Ziyue Xu,
Qihang Yu,
Alan Yuille,
Daguang Xu
Abstract:
Automatic radiology report generation has become an attractive research problem in computer-aided diagnosis in recent years, as it can alleviate doctors' workload. Deep learning techniques for natural image captioning have been successfully adapted to generating radiology reports. However, radiology image reporting differs from natural image captioning in two aspects: 1) the accuracy of positive disease keyword mentions is critical in radiology image reporting, whereas every word in a natural image caption is of equal importance; 2) the evaluation of reporting quality should focus on matching disease keywords and their associated attributes rather than counting N-gram occurrences. Based on these concerns, in this work we propose to utilize a pre-constructed graph embedding module (modeled with a graph convolutional neural network) on multiple disease findings to assist report generation. Incorporating the knowledge graph allows dedicated feature learning for each disease finding and modeling of the relationships between them. In addition, we propose a new evaluation metric for radiology image reporting with the assistance of the same composed graph. Experimental results on a publicly accessible dataset of chest radiographs (IU-RR) demonstrate the superior performance of methods integrated with the proposed graph embedding module compared with previous approaches, using both the conventional evaluation metrics commonly adopted for image captioning and our proposed ones.
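The abstract models disease findings as nodes of a pre-constructed graph processed by a graph convolutional network. A minimal NumPy sketch of one graph-convolution step, assuming the widely used symmetric-normalized rule H' = ReLU(D^-1/2 (A + I) D^-1/2 H W); the three-node graph, feature sizes, and weights are hypothetical placeholders, not the paper's graph or architecture.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Add self-loops, then apply symmetric degree normalization D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One propagation step: each finding aggregates its neighbours' features, then ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


# Hypothetical 3-finding graph: finding 0 is related to findings 1 and 2.
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
h = np.ones((3, 5))          # placeholder per-finding input features
w = np.full((5, 4), 0.1)     # placeholder learnable weights
out = gcn_layer(normalized_adjacency(adj), h, w)  # one 4-d embedding per finding
```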
Submitted 19 February, 2020;
originally announced February 2020.
-
Fully Dense Neural Network for the Automatic Modulation Recognition
Authors:
Miao Du,
Qin Yu,
Shaomin Fei,
Chen Wang,
Xiaofeng Gong,
Ruisen Luo
Abstract:
Currently, various convolutional neural network (CNN) structures are mainly used to extract features from radio data or spectrograms in automatic modulation recognition (AMR). Approaches that rely on expert experience and spectrograms not only complicate preprocessing but also consume a lot of memory. To use the in-phase and quadrature (IQ) data obtained by the receiver directly, and to extract features more efficiently so as to improve the modulation recognition rate, this paper proposes a new network structure called the Fully Dense Neural Network (FDNN). The network uses residual blocks to extract features, dense connections to reduce model size, and an attention mechanism to recalibrate features. Experiments on RML2016.10a show that this network achieves a higher recognition rate with lower model complexity. The results also show that the FDNN model with dense connections can not only extract features effectively but also greatly reduce the number of model parameters, which makes a significant contribution to the application of deep learning in intelligent radio systems.
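The abstract names three ingredients: residual blocks, dense connections, and an attention mechanism for recalibration. The sketch below assumes the attention is a squeeze-and-excitation style channel gate (the abstract does not specify its form) and shows it alongside a residual connection and dense concatenation on a (channels, time) feature map; all shapes and weights are illustrative, and the lambda stands in for a learned convolution.

```python
import numpy as np


def residual(x: np.ndarray, transform) -> np.ndarray:
    """Residual block: output = input + learned transform of the input."""
    return x + transform(x)


def dense_concat(features) -> np.ndarray:
    """Dense connectivity: stack all earlier feature maps along the channel axis."""
    return np.concatenate(features, axis=0)


def channel_attention(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Assumed squeeze-and-excitation style recalibration of a (channels, time) map."""
    s = x.mean(axis=1)                     # squeeze: one summary value per channel
    z = np.maximum(w1 @ s, 0.0)            # bottleneck fully connected layer + ReLU
    g = 1.0 / (1.0 + np.exp(-(w2 @ z)))    # fully connected + sigmoid: gate in (0, 1)
    return x * g[:, None]                  # rescale each channel by its gate


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 128))          # hypothetical 8-channel map over 128 samples
h = residual(x, lambda t: 0.1 * t)         # stand-in for a convolutional transform
y = dense_concat([x, h])                   # 16 channels after dense concatenation
out = channel_attention(y, rng.standard_normal((4, 16)), rng.standard_normal((16, 4)))
```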
Submitted 7 December, 2019;
originally announced December 2019.
-
Dictionary Learning with BLOTLESS Update
Authors:
Qi Yu,
Wei Dai,
Zoran Cvetkovic,
Jubo Zhu
Abstract:
Algorithms for learning a dictionary to sparsely represent a given dataset typically alternate between sparse coding and dictionary update stages. Methods for dictionary update aim to minimise the expansion error by updating dictionary vectors and expansion coefficients given the patterns of non-zero coefficients obtained in the sparse coding stage. We propose a block total least squares (BLOTLESS) algorithm for dictionary update. BLOTLESS updates a block of dictionary elements and the corresponding sparse coefficients simultaneously. In the error-free case, three necessary conditions for exact recovery are identified. Lower bounds on the number of training samples are established so that the necessary conditions hold with high probability. Numerical simulations show that the bounds closely approximate the number of training samples needed for exact dictionary recovery. Numerical experiments further demonstrate several benefits of dictionary learning with BLOTLESS update compared with state-of-the-art algorithms, especially when the amount of training data is small.
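BLOTLESS builds on total least squares (TLS), which, unlike ordinary least squares, allows errors in both the matrix and the right-hand side. The block update itself is not spelled out in the abstract, so the sketch below shows only the classical single-system TLS solution via the SVD, as an illustration of the underlying tool rather than the BLOTLESS algorithm.

```python
import numpy as np


def total_least_squares(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Classical TLS: solve a @ x ~= b allowing perturbations in both a and b.

    The solution comes from the right singular vector of [a | b] associated with
    the smallest singular value, rescaled so that its last entry equals -1.
    """
    n = a.shape[1]
    augmented = np.column_stack([a, b])
    _, _, vt = np.linalg.svd(augmented)
    v = vt[-1]                       # singular values are sorted in descending order
    if abs(v[n]) < 1e-12:
        raise ValueError("TLS solution does not exist (nongeneric case)")
    return -v[:n] / v[n]
```

In the error-free case, [a | b] @ [x; -1] = 0, so the null vector recovers x exactly; with noisy data the same formula gives the TLS estimate.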
Submitted 1 February, 2020; v1 submitted 24 June, 2019;
originally announced June 2019.
-
A Unified Framework for Wide Area Measurement System Planning
Authors:
James J. Q. Yu,
Albert Y. S. Lam,
David J. Hill,
Victor O. K. Li
Abstract:
The wide area measurement system (WAMS) is an essential component of future power systems. To make WAMS construction plans, practical models of power network observability, reliability, and the underlying communication infrastructure need to be considered. To address this challenging problem, in this paper we propose a unified framework for WAMS planning that covers most realistic concerns in the construction process. The framework jointly optimizes system construction cost, measurement reliability, and the volume of synchrophasor data traffic, resulting in a multi-objective optimization problem that provides multiple Pareto-optimal solutions to suit the different requirements of utilities. The framework is verified on two IEEE test systems. The simulation results demonstrate the trade-off relationships among the proposed objectives. Moreover, the proposed framework can develop optimal WAMS plans for full observability with minimal cost. This work develops a comprehensive framework for most practical WAMS construction designs.
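Because the framework is multi-objective, it yields a set of Pareto-optimal plans rather than a single optimum. A minimal sketch of the non-dominance filter, assuming each candidate plan has already been scored on the three objectives in the abstract (cost, reliability, and synchrophasor traffic), with reliability converted to unreliability so that every objective is minimised; the plans and their numbers are hypothetical.

```python
from typing import List, Tuple

# (plan name, construction cost, unreliability = 1 - reliability, data traffic); all minimised
Plan = Tuple[str, float, float, float]


def dominates(p: Plan, q: Plan) -> bool:
    """p dominates q if it is no worse on every objective and strictly better on at least one."""
    pairs = list(zip(p[1:], q[1:]))
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Keep only the plans that no other plan dominates."""
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


plans = [("A", 10.0, 0.05, 300.0), ("B", 12.0, 0.02, 280.0),
         ("C", 13.0, 0.05, 310.0), ("D", 9.0, 0.10, 350.0)]
front = pareto_front(plans)  # plan C is dominated by A (and by B) and is dropped
```

A utility would then pick from the surviving plans according to how it weighs cost against reliability and traffic.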
Submitted 21 November, 2017;
originally announced November 2017.
-
Delay Aware Intelligent Transient Stability Assessment System
Authors:
James J. Q. Yu,
Albert Y. S. Lam,
David J. Hill,
Victor O. K. Li
Abstract:
Transient stability assessment is a critical tool for power system design and operation. With emerging advanced synchrophasor measurement techniques, machine learning methods are playing an increasingly important role in power system stability assessment. However, most existing research makes the strong assumption that measurement data transmission delay is negligible. In this paper, we investigate the influence of communication delay on synchrophasor-based transient stability assessment. In particular, we develop a delay-aware intelligent system to address this issue. By utilizing an ensemble of multiple long short-term memory networks, the proposed system can make early assessments from incomplete system variable measurements, achieving a much shorter response time. Compared with existing work, our system makes accurate assessments with significantly improved efficiency. We perform numerous case studies to demonstrate the superiority of the proposed intelligent system, in which accurate assessments can be made in one third less time than with state-of-the-art methodologies. Moreover, the simulations indicate that measurement noise has a negligible impact on assessment performance, demonstrating the robustness of the proposed system.
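The abstract describes an ensemble of LSTMs that issues early assessments from incomplete measurements, but does not give the combination rule. The sketch below assumes, hypothetically, that each member is trained on a fixed number of measurement frames and that only members whose input has fully arrived vote, with their instability probabilities averaged. The lambdas are stand-ins for trained LSTMs, and `ensemble_assess` is an illustrative name.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (number of measurement frames the member needs, model returning P(unstable))
Member = Tuple[int, Callable[[Sequence[float]], float]]


def ensemble_assess(members: List[Member], received: Sequence[float],
                    threshold: float = 0.5) -> Optional[bool]:
    """Return True (unstable) or False (stable), or None if no member has enough data yet."""
    probs = [model(received[:need]) for need, model in members if need <= len(received)]
    if not probs:
        return None  # keep waiting for delayed measurements
    return sum(probs) / len(probs) >= threshold


# Hypothetical members that need 2, 4, and 6 frames respectively.
members = [(2, lambda xs: 0.9), (4, lambda xs: 0.2), (6, lambda xs: 0.1)]
```

With this rule an assessment is available as soon as the shortest-window member has its data, and later frames refine it by bringing more members into the vote.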
Submitted 21 November, 2017;
originally announced November 2017.
-
Hybrid Filtering for a Class of Quantum Systems with Classical Disturbances
Authors:
Qi Yu,
Daoyi Dong,
Ian R. Petersen,
Qing Gao
Abstract:
A filtering problem for a class of quantum systems disturbed by a classical stochastic process is investigated in this paper. The classical disturbance process, which is assumed to be described by a linear stochastic differential equation, is modeled by a quantum cavity model. Then the hybrid quantum-classical system is described by a combined quantum system consisting of two quantum cavity subsystems. Quantum filtering theory and a quantum extended Kalman filter method are employed to estimate the states of the combined quantum system. An estimate of the classical stochastic process is derived from the estimate of the combined quantum system. The effectiveness and performance of the proposed methods are illustrated by numerical results.
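The paper's quantum filtering and quantum extended Kalman filter cannot be reproduced from the abstract. For intuition about the classical side of the problem only, the sketch below swaps in an ordinary scalar Kalman filter that estimates a disturbance obeying the linear SDE dx = -θx dt + σ dW (an Ornstein-Uhlenbeck process, chosen as an example) from noisy direct measurements y_k = x_k + v_k. The observation model and parameter names are assumptions.

```python
import math


def ou_kalman(measurements, theta, sigma, r, dt, x0=0.0, p0=1.0):
    """Classical scalar Kalman filter for dx = -theta*x dt + sigma dW, observed as y = x + v.

    r is the measurement-noise variance; dt is the sampling interval.
    """
    a = math.exp(-theta * dt)                                         # exact discretisation of the drift
    q = sigma ** 2 * (1.0 - math.exp(-2.0 * theta * dt)) / (2.0 * theta)  # process-noise variance per step
    x, p = x0, p0
    estimates = []
    for y in measurements:
        x, p = a * x, a * a * p + q            # predict
        k = p / (p + r)                        # Kalman gain
        x, p = x + k * (y - x), (1.0 - k) * p  # update with the new measurement
        estimates.append(x)
    return estimates
```

When the measurements are nearly noise-free (small r), the estimate tracks them closely; when they are very noisy (large r), the filter relies mostly on the mean-reverting prior.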
Submitted 27 March, 2017;
originally announced March 2017.
-
Coordinated Autonomous Vehicle Parking for Vehicle-to-Grid Services: Formulation and Distributed Algorithm
Authors:
Albert Y. S. Lam,
James J. Q. Yu,
Yunhe Hou,
Victor O. K. Li
Abstract:
Autonomous vehicles (AVs) will revolutionize ground transport and take a substantial role in the future transportation system. Most AVs are likely to be electric vehicles (EVs), and they can participate in the vehicle-to-grid (V2G) system to support various V2G services. Although it is generally infeasible to dictate the routes of EVs, we can design AV travel plans to fulfill certain system-wide objectives. In this paper, we focus on AVs looking for parking and study how they can be led to appropriate parking facilities to support V2G services. We formulate the Coordinated Parking Problem (CPP), which can be solved by a standard integer linear program solver but requires long computational time. To make it more practical, we develop a distributed algorithm based on dual decomposition to address CPP. We carry out a series of simulations to evaluate the proposed solution methods. Our results show that the distributed algorithm can produce nearly optimal solutions with substantially less computational time. A coarser time scale can reduce computational time but degrades solution quality and may result in infeasible solutions. Even with communication loss, the distributed algorithm still performs well and converges with only slight degradation in speed.
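Dual decomposition relaxes the coupling constraints of an integer program with Lagrange multipliers, so that each agent solves a small subproblem independently. A toy sketch, assuming the only coupling is facility capacity, each AV's cost of using each facility is known, and the multipliers (prices) are updated by projected subgradient steps. This simplified model and the name `dual_decomposition_parking` are illustrative; it is not the paper's CPP formulation, which also accounts for V2G services.

```python
from typing import List, Tuple


def dual_decomposition_parking(cost: List[List[float]], capacity: List[int],
                               step: float = 0.5, iters: int = 100) -> Tuple[List[int], List[float]]:
    """Assign AVs to facilities by pricing facility capacity with Lagrange multipliers."""
    n_fac = len(capacity)
    prices = [0.0] * n_fac
    choice: List[int] = []
    for _ in range(iters):
        # Each AV independently picks the facility with the lowest cost plus price.
        choice = [min(range(n_fac), key=lambda j: row[j] + prices[j]) for row in cost]
        load = [choice.count(j) for j in range(n_fac)]
        # Projected subgradient step: raise prices where demand exceeds capacity.
        prices = [max(0.0, p + step * (l - c)) for p, l, c in zip(prices, load, capacity)]
    return choice, prices


# Hypothetical example: 3 AVs, 2 facilities with capacity 2 each.
choice, prices = dual_decomposition_parking([[1, 4], [2, 3], [1, 6]], [2, 2])
```

With integer choices the prices can oscillate instead of settling, so practical schemes usually add a step that repairs capacity violations; in this small example the price of facility 0 rises until the second AV moves to facility 1, after which the assignment is stable.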
Submitted 5 January, 2017;
originally announced January 2017.
-
Intelligent Time-Adaptive Transient Stability Assessment System
Authors:
James J. Q. Yu,
David J. Hill,
Albert Y. S. Lam,
Jiatao Gu,
Victor O. K. Li
Abstract:
Online identification of post-contingency transient stability is essential in power system control, as it helps the grid operator decide on and coordinate corrective control actions after system failures. Utilizing machine learning methods with synchrophasor measurements for transient stability assessment has received much attention recently with the gradual deployment of wide-area protection and control systems. In this paper, we develop a transient stability assessment system based on the long short-term memory network. Through a temporal self-adaptive scheme, the proposed system aims to balance the trade-off between assessment accuracy and response time, both of which may be crucial in real-world scenarios. Compared with previous work, the most significant enhancement is that our system learns the temporal dependencies of the input data, which contributes to better assessment accuracy. In addition, the model structure of our system is relatively simple, which speeds up model training. Case studies on three power systems demonstrate the efficacy of the proposed transient stability assessment system.
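The temporal self-adaptive scheme trades assessment accuracy against response time. A hypothetical sketch of such a stopping rule: after each new measurement frame, the classifier's probability of instability is checked against a confidence threshold, and an assessment is issued as soon as the model is confident either way, otherwise at the end of the observation window. The list of probabilities stands in for successive LSTM outputs, and the threshold value is an assumption.

```python
from typing import Iterable, Tuple


def time_adaptive_assess(probs: Iterable[float], confidence: float = 0.9) -> Tuple[str, int]:
    """Return (decision, number of frames used), stopping at the first confident output."""
    t = 0
    p = 0.0
    for t, p in enumerate(probs, start=1):
        if p >= confidence:
            return "unstable", t
        if p <= 1.0 - confidence:
            return "stable", t
    if t == 0:
        raise ValueError("no predictions supplied")
    # No confident call inside the window: fall back to the final estimate.
    return ("unstable" if p >= 0.5 else "stable"), t
```

Raising `confidence` makes assessments more reliable but, on average, later, which is the accuracy versus response-time balance the abstract describes.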
Submitted 21 May, 2017; v1 submitted 26 October, 2016;
originally announced October 2016.
-
Coordinated Electric Vehicle Charging Control with Aggregator Power Trading and Indirect Load Control
Authors:
James J. Q. Yu,
Junhao Lin,
Albert Y. S. Lam,
Victor O. K. Li
Abstract:
Due to increasing concern over greenhouse gas emissions and fossil fuel security, electric vehicles (EVs) have attracted much attention in recent years. EVs can be aggregated to constitute a vehicle-to-grid system, and coordinating them benefits the power system in many ways. In this paper, we formulate a novel large-scale EV charging problem with energy trading in order to maximize aggregator profit. This problem is non-convex and can be solved with a centralized iterative approach. To overcome the computational complexity brought by the non-convexity, we develop a distributed optimization-based heuristic. To evaluate the proposed approach, a modified IEEE 118-bus test system is employed with 10 aggregators serving 30 000 EVs. The simulation results indicate that the proposed distributed heuristic with energy trading can effectively increase the total profit of aggregators. In addition, the distributed optimization-based heuristic strategy achieves near-optimal performance.
Submitted 25 April, 2017; v1 submitted 4 August, 2015;
originally announced August 2015.