Search | arXiv e-print repository

An Interpretable Two-Stage Feature Decomposition Method for Deep Learning-based SAR ATR

Authors: Chenwei Wang, Renjie Xu, Congwen Wu, Cunyi Yin, Ziyun Liao, Deqing Mao, Sitong Zhang, Hong Yan

Abstract: Synthetic aperture radar automatic target recognition (SAR ATR) has seen significant performance improvements with deep learning. However, the black-box nature of deep SAR ATR introduces low confidence and high risks in decision-critical SAR applications, hindering practical deployment. To address this issue, deep SAR ATR should provide an interpretable reasoning basis $r_b$ and logic $\lambda_w$, forming the reasoning logic $\sum_{i} r_b^i \times \lambda_w^i = \mathrm{pred}$ behind the decisions. Therefore, this paper proposes a physics-based two-stage feature decomposition method for interpretable deep SAR ATR, which transforms uninterpretable deep features into attribute scattering center components (ASCCs) with clear physical meanings. First, ASCCs are obtained through a clustering algorithm. To extract independent physical components from deep features, we propose a two-stage decomposition method. In the first stage, a feature decoupling and discrimination module separates deep features into approximate ASCCs with global discriminability. In the second stage, a multilayer orthogonal non-negative matrix tri-factorization (MLO-NMTF) further decomposes the ASCCs into independent components with distinct physical meanings. The MLO-NMTF aligns naturally with the clustering algorithm used to obtain the ASCCs. Finally, this method ensures both an interpretable reasoning process and accurate recognition results.
Extensive experiments on four benchmark datasets confirm its effectiveness, showcasing the method's interpretability, robust recognition performance, and strong generalization capability.
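To make the reasoning logic $\sum_i r_b^i \times \lambda_w^i = \mathrm{pred}$ concrete, here is a minimal sketch. It assumes (hypothetically) that $r_b^i$ is a scalar activation of the $i$-th decomposed component and that $\lambda_w^i$ is a per-class weight row, so each class score is a weighted sum of component activations. The function name and array shapes are illustrative, not taken from the paper.

```python
import numpy as np

def reasoning_logic(r_b, lam_w):
    """Compute class scores pred_c = sum_i r_b[i] * lam_w[i, c].

    r_b:   shape [K], activation of each (hypothetical) physical component
    lam_w: shape [K, C], weight of each component for each class
    Returns (scores [C], contributions [K, C]) so that every class score
    can be traced back to the components that produced it.
    """
    r_b = np.asarray(r_b, dtype=float)
    lam_w = np.asarray(lam_w, dtype=float)
    contributions = r_b[:, None] * lam_w   # per-component, per-class terms
    scores = contributions.sum(axis=0)     # the weighted sum over components
    return scores, contributions

scores, contrib = reasoning_logic([0.5, 0.2, 0.3], [[1, 0], [0, 1], [1, 1]])
predicted_class = int(np.argmax(scores))
```

Because the score is linear in the component activations, the `contributions` matrix directly shows which component drove the decision, which is the kind of interpretability the abstract argues for.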

Submitted 11 June, 2025; originally announced June 2025.

arXiv:2506.01737 [pdf, ps, other]

The Promise of Spiking Neural Networks for Ubiquitous Computing: A Survey and New Perspectives

Authors: Hemanth Sabbella, Archit Mukherjee, Thivya Kandappu, Sounak Dey, Arpan Pal, Archan Misra, Dong Ma

Abstract: Spiking neural networks (SNNs) have emerged as a class of bio-inspired networks that leverage sparse, event-driven signaling to achieve low-power computation while inherently modeling temporal dynamics. Such characteristics align closely with the demands of ubiquitous computing systems, which often operate on resource-constrained devices while continuously monitoring and processing time-series sensor data. Despite their unique and promising features, SNNs have received limited attention and remain underexplored (or at least under-adopted) within the ubiquitous computing community. To address this gap, this paper first introduces the core components of SNNs, in terms of both models and training mechanisms. It then presents a systematic survey of 76 SNN-based studies focused on time-series data analysis, categorizing them into six key application domains. For each domain, we summarize relevant works and subsequent advancements, distill core insights, and highlight key takeaways for researchers and practitioners. To facilitate hands-on experimentation, we also provide a comprehensive review of current software frameworks and neuromorphic hardware platforms, detailing their capabilities and specifications, and offer tailored recommendations for selecting development tools based on specific application needs. Finally, we identify prevailing challenges within each application domain and propose future research directions that need to be explored by the ubiquitous computing community.
Our survey highlights the transformative potential of SNNs in enabling energy-efficient ubiquitous sensing across diverse application domains, while also serving as an essential introduction for researchers looking to enter this emerging field.
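As an illustration of the sparse, event-driven signaling described above, the following sketch simulates a discrete-time leaky integrate-and-fire (LIF) neuron, a widely used SNN neuron model. The survey's own treatment of neuron models may differ; the exponential leak, hard reset, and parameter values here are assumptions chosen for clarity.

```python
import numpy as np

def lif_neuron(input_current, tau=20.0, v_th=1.0, v_reset=0.0, dt=1.0):
    """Simulate a discrete-time leaky integrate-and-fire neuron.

    Each step: v <- beta * v + I[t], with leak factor beta = exp(-dt / tau).
    When v reaches v_th the neuron emits a spike (1) and v is reset to v_reset.
    Returns (spike train of 0/1 ints, membrane potential trace).
    """
    beta = np.exp(-dt / tau)          # fraction of potential kept per step
    v = v_reset
    spikes = np.zeros(len(input_current), dtype=int)
    trace = np.zeros(len(input_current))
    for t, i_t in enumerate(input_current):
        v = beta * v + i_t            # leaky integration of the input
        if v >= v_th:                 # threshold crossing -> spike event
            spikes[t] = 1
            v = v_reset               # hard reset after spiking
        trace[t] = v
    return spikes, trace

spikes, trace = lif_neuron(np.full(12, 0.3))
```

With a constant input of 0.3, the membrane needs four steps to cross threshold, so the output is a sparse spike train; downstream computation only happens on those few spike events, which is the source of the energy savings.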

Submitted 2 June, 2025; originally announced June 2025.

Comments: 50 pages

ACM Class: I.2

arXiv:2505.06296 [pdf, other]

Q-Heart: ECG Question Answering via Knowledge-Informed Multimodal LLMs

Authors: Hung Manh Pham, Jialu Tang, Aaqib Saeed, Dong Ma

Abstract: Electrocardiography (ECG) offers critical cardiovascular insights, such as identifying arrhythmias and myocardial ischemia, but enabling automated systems to answer complex clinical questions directly from ECG signals (ECG-QA) remains a significant challenge. Current approaches often lack robust multimodal reasoning capabilities or rely on generic architectures ill-suited to the nuances of physiological signals. We introduce Q-Heart, a novel multimodal framework designed to bridge this gap. Q-Heart leverages a powerful, adapted ECG encoder and integrates its representations with textual information via a specialized ECG-aware transformer-based mapping layer. Furthermore, Q-Heart uses dynamic prompting and retrieval of relevant historical clinical reports to guide the tuning of the language model toward knowledge-aware ECG reasoning. Extensive evaluations on the benchmark ECG-QA dataset show that Q-Heart achieves state-of-the-art performance, outperforming existing methods by 4% in exact match accuracy. Our work demonstrates the effectiveness of combining domain-specific architectural adaptations with knowledge-augmented LLM instruction tuning for complex physiological ECG analysis, paving the way for more capable and potentially interpretable clinical patient care systems.
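Exact match accuracy, the metric used for the reported gain, counts an answer as correct only if it equals the reference answer. A minimal sketch follows; the light normalization (lowercasing and collapsing whitespace) is an assumption, and the ECG-QA benchmark's official scoring may normalize differently or compare answer sets.

```python
def normalize_answer(answer):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(answer.lower().split())

def exact_match_accuracy(predictions, references):
    """Fraction of predictions that exactly match their reference answer."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        normalize_answer(p) == normalize_answer(r)
        for p, r in zip(predictions, references)
    )
    return hits / len(predictions)

acc = exact_match_accuracy(["Yes", " atrial  fibrillation"], ["yes", "sinus rhythm"])
```

Exact match is deliberately strict: a clinically close but differently worded answer scores zero, which is why normalization choices matter when comparing reported numbers.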

Submitted 7 May, 2025; originally announced May 2025.

arXiv:2503.12864 [pdf, other]

Robust Co-Optimization of Distribution Network Hardening and Mobile Resource Scheduling with Decision-Dependent Uncertainty

Authors: Donglai Ma, Xiaoyu Cao, Bo Zeng, Chen Chen, Qiaozhu Zhai, Qing-Shan Jia, Xiaohong Guan

Abstract: This paper studies the robust co-planning of proactive network hardening and the scheduling of mobile hydrogen energy resources (MHERs) to enhance the resilience of the power distribution network (PDN) against disastrous events. A decision-dependent robust optimization model is formulated with a min-max resilience constraint and a discrete recourse structure, which helps achieve the load survivability target under endogenous uncertainties. Unlike the traditional model with a fixed uncertainty set, we adopt a dynamic representation that explicitly captures the endogenous uncertainties of network contingencies as well as the available hydrogen storage levels of MHERs, which induces a decision-dependent uncertainty (DDU) set. Also, the multi-period adaptive routing and energy scheduling of MHERs are modeled as a mixed-integer recourse problem to further decrease the resilience cost. Then, a nested parametric column-and-constraint generation (N-PC&CG) algorithm is customized and developed to solve this challenging formulation. By leveraging the structural properties of the DDU set as well as the combination of discrete recourse decisions and the corresponding extreme points, we derive a strengthened solution scheme with nontrivial enhancement strategies to realize efficient and exact computation. Numerical results on a 14-bus test system and a 56-bus real-world distribution network demonstrate the resilience benefits and economic feasibility of the proposed method under different damage severity levels.
Moreover, the enhanced N-PC&CG shows superior solution capability, supporting prompt decisions for resilient planning with DDU models.
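The key difference between a fixed and a decision-dependent uncertainty set can be shown by brute force on a toy network. In this sketch (not the N-PC&CG algorithm), the uncertainty set contains every outage of at most `budget` lines, except that hardened lines can no longer fail. The additive `shed_fn` is a hypothetical stand-in for the mixed-integer recourse problem, and the hydrogen storage uncertainty is omitted.

```python
from itertools import combinations

def ddu_outage_sets(lines, hardened, budget):
    """Yield every outage set in the decision-dependent uncertainty set:
    at most `budget` simultaneous line failures, and hardened lines never fail."""
    vulnerable = [ln for ln in lines if ln not in hardened]
    for k in range(min(budget, len(vulnerable)) + 1):
        for outage in combinations(vulnerable, k):
            yield frozenset(outage)

def worst_case_load_shed(lines, hardened, budget, shed_fn):
    """Maximum load shedding over the DDU set induced by `hardened`.

    `shed_fn` maps an outage set to shed load; it is a hypothetical
    placeholder for the recourse (re-dispatch / MHER routing) problem."""
    return max(shed_fn(outage) for outage in ddu_outage_sets(lines, hardened, budget))

# Toy example: shed load is additive over failed lines (an assumption).
lines = ["L1", "L2", "L3"]
shed = {"L1": 5.0, "L2": 3.0, "L3": 4.0}
shed_fn = lambda outage: sum(shed[ln] for ln in outage)
no_hardening = worst_case_load_shed(lines, set(), 2, shed_fn)
harden_l1 = worst_case_load_shed(lines, {"L1"}, 2, shed_fn)
```

Hardening `L1` removes every contingency involving it, so the worst case drops from losing `L1` and `L3` (9.0) to losing `L2` and `L3` (7.0). Full enumeration grows combinatorially, which is why decomposition methods such as column-and-constraint generation are used at realistic scale.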

Submitted 17 March, 2025; originally announced March 2025.

Comments: 15 pages, 3 figures

arXiv:2503.04375 [pdf, other]

Proactive Robust Hardening of Resilient Power Distribution Network: Decision-Dependent Uncertainty Modeling and Fast Solution Strategy

Authors: Donglai Ma, Xiaoyu Cao, Bo Zeng, Qing-Shan Jia, Chen Chen, Qiaozhu Zhai, Xiaohong Guan

Abstract: To address the power system hardening problem, traditional approaches often adopt robust optimization (RO) that considers a fixed set of concerned contingencies, disregarding the fact that hardening some components actually renders the relevant contingencies impractical. In this paper, we directly adopt a dynamic uncertainty set that explicitly incorporates the impact of hardening decisions on the worst-case contingencies, which leads to a decision-dependent uncertainty (DDU) set. Then, a DDU-based robust-stochastic optimization (DDU-RSO) model is proposed to support hardening decisions on distribution lines and distributed generators (DGs). Also, the randomness of load variations and available storage levels is considered through stochastic programming (SP) in the innermost-level problem. Various corrective measures (e.g., the joint scheduling of DGs and energy storage) are included, coupled with a finite support of stochastic scenarios, for resilience enhancement. To relieve the computational burden of this new hardening formulation, an enhanced customization of the parametric column-and-constraint generation (P-C&CG) algorithm is developed. By leveraging network structural information, enhancement strategies based on resilience importance indices are designed to improve convergence performance. Numerical results on 33-bus and 118-bus test distribution networks demonstrate the effectiveness of the DDU-RSO-aided hardening scheme.
Furthermore, compared with existing solution methods, the enhanced P-C&CG achieves superior performance, reducing the solution time by a few orders of magnitude.

Submitted 6 March, 2025; originally announced March 2025.

arXiv:2503.01248 [pdf, other]

Comprehensive Evaluation of OCT-based Automated Segmentation of Retinal Layer, Fluid and Hyper-Reflective Foci: Impact on Diabetic Retinopathy Severity Assessment

Authors: S. Chen, D. Ma, M. Raviselvan, S. Sundaramoorthy, K. Popuri, M. J. Ju, M. V. Sarunic, D. Ratra, M. F. Beg

Abstract: Diabetic retinopathy (DR) is a leading cause of vision loss, requiring early and accurate assessment to prevent irreversible damage. Spectral Domain Optical Coherence Tomography (SD-OCT) enables high-resolution retinal imaging, but automated segmentation performance varies, especially in cases with complex fluid and hyperreflective foci (HRF) patterns. This study proposes an active-learning-based deep learning pipeline for automated segmentation of retinal layers, fluid, and HRF, using four state-of-the-art models: U-Net, SegFormer, SwinUNETR, and VM-UNet, trained on expert-annotated SD-OCT volumes. Segmentation accuracy was evaluated with five-fold cross-validation, and retinal thickness was quantified using a K-nearest neighbors algorithm and visualized with Early Treatment Diabetic Retinopathy Study (ETDRS) maps. SwinUNETR achieved the highest overall accuracy (DSC = 0.7719; NSD = 0.8149), while VM-UNet excelled in specific layers. Structural differences were observed between non-proliferative and proliferative DR, with layer-specific thickening correlating with visual acuity impairment. The proposed framework enables robust, clinically relevant DR assessment while reducing the need for manual annotation, supporting improved disease monitoring and treatment planning.
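The Dice similarity coefficient (DSC) reported above measures overlap between a predicted mask $P$ and a ground-truth mask $G$ as $2|P \cap G| / (|P| + |G|)$. Here is a minimal sketch for binary masks. Returning 1.0 when both masks are empty is a common convention, assumed here; the study's handling of empty classes and its NSD computation are not covered.

```python
import numpy as np

def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (assumed convention)
    return 2.0 * intersection / denom

pred = [[1, 1, 0], [0, 0, 0]]
target = [[1, 0, 0], [0, 0, 1]]
dsc = dice_coefficient(pred, target)
```

For multi-class segmentation (layers, fluid, HRF), the score is typically computed per class on one-vs-rest masks and then averaged.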

Submitted 10 April, 2025; v1 submitted 3 March, 2025; originally announced March 2025.

Comments: 20 pages, 11 figures

arXiv:2501.02000 [pdf, other]

Multi-Center Study on Deep Learning-Assisted Detection and Classification of Fetal Central Nervous System Anomalies Using Ultrasound Imaging

Authors: Yang Qi, Jiaxin Cai, Jing Lu, Runqing Xiong, Rongshang Chen, Liping Zheng, Duo Ma

Abstract: Prenatal ultrasound evaluates fetal growth and detects congenital abnormalities during pregnancy, but examining ultrasound images requires radiologist expertise and sophisticated equipment; without them, the identification rate for specific types of fetal central nervous system (CNS) abnormalities cannot be improved, and unnecessary patient examinations may result. We construct a deep learning (DL) model to improve the overall accuracy of fetal cranial anomaly diagnosis and aid prenatal diagnosis. On our collected multi-center dataset of fetal craniocerebral anomalies, covering four typical fetal CNS anomalies (anencephaly, encephalocele including meningocele, holoprosencephaly, and rachischisis), patient-level prediction accuracy reaches 94.5%, with an AUROC of 99.3%. In the subgroup analyses, our model is applicable across the entire gestational period, identifying fetal anomaly types well at any gestational stage. Heatmaps superimposed on the ultrasound images not only provide a visual interpretation of the algorithm but also serve as an intuitive visual aid for physicians by highlighting key areas that need review, helping them quickly identify and validate those areas.
Finally, the retrospective reader study demonstrates that combining the automatic predictions of the DL system with the professional judgment of radiologists can effectively improve diagnostic accuracy and efficiency and reduce the misdiagnosis rate, which holds important promise for clinical application.
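The AUROC reported above equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half (the normalized Mann-Whitney U statistic). This is a minimal binary sketch. How the study handles its four anomaly classes (e.g., one-vs-rest averaging) is not stated in the abstract and is not assumed here.

```python
def auroc(scores, labels):
    """Binary AUROC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))

auc = auroc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise loop is O(P x N) and intended for illustration; rank-based formulas give the same value in O(n log n).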

Submitted 1 January, 2025; originally announced January 2025.

arXiv:2412.03959 [pdf, other]

Is FISHER All You Need in The Multi-AUV Underwater Target Tracking Task?

Authors: Jingzehua Xu, Guanwen Xie, Ziqi Zhang, Xiangwang Hou, Dongfang Ma, Shuai Zhang, Yong Ren, Dusit Niyato

Abstract: Employing multiple autonomous underwater vehicles (AUVs) to execute the underwater target tracking task collaboratively is of significant importance. However, it is quite challenging to meet various prerequisites using traditional control methods. Therefore, we propose FISHER, an effective two-stage learning-from-demonstrations training framework, to highlight the adaptability of reinforcement learning (RL) methods in the multi-AUV underwater target tracking task, while addressing their limitations, such as extensive requirements for environmental interactions and the challenges in designing reward functions. The first stage utilizes imitation learning (IL) to realize policy improvement and generate offline datasets. Specifically, we introduce multi-agent discriminator-actor-critic, based on improvements to the generative adversarial IL algorithm, and a multi-agent IL optimization objective derived from the Nash equilibrium condition. Then, in the second stage, we develop multi-agent independent generalized decision transformer, which analyzes the latent representation to match the future states of high-quality samples rather than a reward function, attaining further enhanced policies capable of handling various scenarios. In addition, we propose a simulation-to-simulation demonstration generation procedure to facilitate the generation of expert demonstrations in underwater environments, which capitalizes on traditional control methods and can easily accomplish domain transfer to obtain demonstrations.
Extensive simulation experiments across multiple scenarios show that FISHER possesses strong stability, multi-task performance, and generalization capability.

Submitted 5 December, 2024; originally announced December 2024.

Journal ref: IEEE Transactions on Mobile Computing 2025

arXiv:2409.19585 [pdf, other]

Two-stage Framework for Robust Speech Emotion Recognition Using Target Speaker Extraction in Human Speech Noise Conditions

Authors: Jinyi Mi, Xiaohan Shi, Ding Ma, Jiajun He, Takuya Fujimura, Tomoki Toda

Abstract: Developing a robust speech emotion recognition (SER) system in noisy conditions faces challenges posed by different noise properties. Most previous studies have not considered the impact of human speech noise, thus limiting the application scope of SER. In this paper, we propose a novel two-stage framework for the problem by cascading a target speaker extraction (TSE) method with SER. We first train a TSE model to extract the speech of the target speaker from a mixture. Then, in the second stage, we utilize the extracted speech for SER training. Additionally, we explore joint training of the TSE and SER models in the second stage. Our developed system achieves a 14.33% improvement in unweighted accuracy (UA) compared with a baseline without the TSE method, demonstrating the effectiveness of our framework in mitigating the impact of human speech noise. Moreover, we conduct experiments considering speaker gender, showing that our framework performs particularly well in different-gender mixtures.
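In SER work, unweighted accuracy (UA) is commonly defined as the mean of per-class recalls, so that minority emotions count as much as frequent ones. This contrasts with weighted accuracy, which is the overall fraction correct. The sketch below assumes that common definition; the paper's exact protocol is not given in the abstract.

```python
from collections import defaultdict

def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

y_true = ["angry", "angry", "angry", "neutral"]
y_pred = ["angry", "angry", "neutral", "neutral"]
ua = unweighted_accuracy(y_true, y_pred)
weighted = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here UA is (2/3 + 1) / 2, about 0.833, while weighted accuracy is 0.75. The gap widens as class imbalance grows.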

Submitted 17 December, 2024; v1 submitted 29 September, 2024; originally announced September 2024.

Comments: This is the preprint version of the paper accepted at APSIPA ASC 2024

arXiv:2409.11796 [pdf, other]

Communication, Sensing and Control integrated Closed-loop System: Modeling, Control Design and Resource Allocation

Authors: Zeyang Meng, Dingyou Ma, Zhiqing Wei, Ying Zhou, Zhiyong Feng

Abstract: Wireless communication technologies have fundamentally revolutionized industrial operations. Automated equipment operates in a closed-loop manner: the status of devices is collected and sent to the control center through the uplink channel, and the control center sends the calculated control commands back to the devices via downlink communication. However, existing studies neglect the interdependent relationship between uplink and downlink communications, and there is no unified approach to modeling communication, sensing, and control within the loop. This can lead to inaccurate performance assessments, ultimately hindering the ability to provide guidance for the design of practical systems. Therefore, this paper introduces an integrated closed-loop model that encompasses sensing, communication, and control functionalities, while addressing the coupling effects between uplink and downlink communications. Through the analysis of system convergence, an inequality pertaining to the performances of sensing, communication, and control is derived. Additionally, a joint optimization algorithm for control and resource allocation is proposed. Simulation results are presented to offer an intuitive understanding of the impact of system parameters. The findings of this paper unveil the intricate correlation among sensing, communication, and control, providing insights for the optimal design of industrial closed-loop systems.
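Why uplink and downlink cannot be assessed separately can be illustrated with a textbook toy model. This is not the paper's model or its derived inequality. Assume a scalar plant $x_{k+1} = a x_k + b u_k$ and a deadbeat controller $u_k = -(a/b) x_k$ that acts only when both the uplink status packet and the downlink command packet arrive, each lost independently with probabilities $p_u$ and $p_d$. Then $E[x_{k+1}^2] = q a^2 E[x_k^2]$ with $q = 1 - (1-p_u)(1-p_d)$, so the loop is mean-square stable iff $q a^2 < 1$.

```python
def loop_failure_prob(p_up, p_down):
    """Probability that a control cycle fails because either link drops its packet."""
    return 1.0 - (1.0 - p_up) * (1.0 - p_down)

def is_mean_square_stable(a, p_up, p_down):
    """Stability test for the toy loop: q * a^2 < 1 (assumed deadbeat control)."""
    return loop_failure_prob(p_up, p_down) * a * a < 1.0

def second_moment_trajectory(a, p_up, p_down, x0_sq, steps):
    """Expected squared state E[x_k^2] for k = 0..steps under the toy model."""
    q = loop_failure_prob(p_up, p_down)
    moments = [x0_sq]
    for _ in range(steps):
        moments.append(q * a * a * moments[-1])
    return moments

uplink_only = is_mean_square_stable(2.0, 0.2, 0.0)
both_links = is_mean_square_stable(2.0, 0.2, 0.2)
```

With an open-loop gain of $a = 2$, a 20% loss on either link alone keeps the loop stable ($0.2 \cdot 4 = 0.8$). The same loss on both links makes it unstable ($0.36 \cdot 4 = 1.44$), which is the kind of uplink-downlink coupling the abstract says isolated analyses miss.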

Submitted 18 September, 2024; originally announced September 2024.

Comments: 12 pages, 6 figures

MSC Class: 60G99; 93D05 ACM Class: H.1.1; I.6.4

arXiv:2408.16415 [pdf, other]

UAV's Rotor Micro-Doppler Feature Extraction Using Integrated Sensing and Communication Signal: Algorithm Design and Testbed Evaluation

Authors: Jiachen Wei, Dingyou Ma, Feiyang He, Qixun Zhang, Zhiyong Feng, Zhengfeng Liu, Taohong Liang

Abstract: With the rapid application of unmanned aerial vehicles (UAVs) in urban areas, the identification and tracking of hovering UAVs have become critical challenges, significantly impacting the safety of aircraft take-off and landing operations. As a promising technology for 6G mobile systems, integrated sensing and communication (ISAC) can be used to detect high-mobility UAVs at a low deployment cost. The micro-Doppler signals from UAV rotors can be leveraged to detect low-mobility and hovering UAVs using ISAC signals. However, two challenging problems remain: determining whether the frame structure of the ISAC system can be used to identify UAVs, and accurately capturing the weak rotor micro-Doppler signals of UAVs in complex environments. This paper first proposes a novel frame structure for UAV micro-Doppler extraction and the representation of UAV micro-Doppler signals within the channel state information (CSI). Furthermore, to address complex environments and the interference caused by UAV body vibrations, the rotor micro-Doppler null space pursuit (rmD-NSP) algorithm and the synchroextracting transform (SET) feature extraction algorithm are designed to effectively separate UAV rotor micro-Doppler signals and enhance their features in the spectrogram. Finally, both simulations and hardware testbed experiments demonstrate that the proposed rmD-NSP algorithm enables the ISAC base station (BS) to accurately and completely extract UAV rotor micro-Doppler signals.
Within a 0.1 s observation period, the ISAC BS successfully captures eight rotations of the DJI M300 RTK UAV's rotor in urban environments. Compared with the existing AM-FM NSP and NSP signal decomposition algorithms, the integrity of the rotor micro-Doppler features is improved by 60%.
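The rotor micro-Doppler signature that the extraction targets follows from simple kinematics. For a blade-tip point scatterer at radius $R$ spinning at angular rate $\Omega$, seen by a far-field radar whose line of sight lies in the rotor plane, the Doppler shift oscillates sinusoidally with peak $2 R \Omega / \lambda$. The sketch below encodes that standard model. The point-scatterer and in-plane geometry, as well as the example radius, rotation rate, and wavelength, are assumptions and not parameters from the paper.

```python
import numpy as np

def blade_tip_doppler(t, blade_radius, rev_per_sec, wavelength, phase=0.0):
    """Doppler shift (Hz) of a rotating blade-tip point scatterer.

    f_D(t) = (2 * R * Omega / wavelength) * cos(Omega * t + phase), where
    Omega = 2 * pi * rev_per_sec; `phase` absorbs the initial blade angle.
    Assumes a far-field radar with line of sight in the rotor plane.
    """
    omega = 2.0 * np.pi * rev_per_sec
    return 2.0 * blade_radius * omega / wavelength * np.cos(omega * t + phase)

# Hypothetical parameters: 0.25 m blade, 10 rev/s, 1 cm wavelength.
t = np.linspace(0.0, 0.1, 1001)           # a 0.1 s observation window
f_d = blade_tip_doppler(t, 0.25, 10.0, 0.01)
peak_shift = 2.0 * 0.25 * (2.0 * np.pi * 10.0) / 0.01
```

The peak shift scales with tip speed over wavelength, so fast rotors produce wide sinusoidal traces in the spectrogram, while the oscillation period directly reveals the rotation rate.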

Submitted 29 August, 2024; originally announced August 2024.

arXiv:2407.15903 [pdf, other]

Semantics Guided Disentangled GAN for Chest X-ray Image Rib Segmentation

Authors: Lili Huang, Dexin Ma, Xiaowei Zhao, Chenglong Li, Haifeng Zhao, Jin Tang, Chuanfu Li

Abstract: Label annotation for chest X-ray image rib segmentation is time-consuming and laborious, and labeling quality relies heavily on the medical knowledge of annotators. To reduce the dependency on annotated data, existing works often utilize generative adversarial networks (GANs) to generate training data. However, GAN-based methods overlook the nuanced information specific to individual organs, which degrades the generation quality of chest X-ray images. Hence, we propose a novel Semantics guided Disentangled GAN (SD-GAN), which can generate high-quality training data for chest X-ray image rib segmentation by fully utilizing the semantic information of different organs. In particular, we use three ResNet50 branches to disentangle the features of different organs, then use a decoder to combine the features and generate corresponding images. To ensure that the generated images semantically correspond to the input organ labels, we employ a semantics guidance module to perform semantic guidance on the generated images. To evaluate the efficacy of SD-GAN in generating high-quality samples, we introduce modified TransUNet (MTUNet), a specialized segmentation network designed for multi-scale contextual information extraction and multi-branch decoding, effectively tackling the challenge of organ overlap. We also propose a new chest X-ray image dataset (CXRS). It includes 1250 samples from various medical institutions. Lungs, clavicles, and 24 ribs are simultaneously annotated on each chest X-ray image.
The visualization and quantitative results demonstrate the efficacy of SD-GAN in generating high-quality chest X-ray image-mask pairs. Using generated data, our trained MTUNet overcomes the limitations of the data scale and outperforms other segmentation networks.

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.06901 [pdf, other]

RespEar: Earable-Based Robust Respiratory Rate Monitoring

Authors: Yang Liu, Kayla-Jade Butkow, Jake Stuchbury-Wass, Adam Pullin, Dong Ma, Cecilia Mascolo

Abstract: Respiratory rate (RR) monitoring is integral to understanding physical and mental health and tracking fitness. Existing studies have demonstrated the feasibility of RR monitoring under specific user conditions (e.g., while remaining still or while breathing heavily). Yet, performing accurate, continuous, and non-obtrusive RR monitoring across diverse daily routines and activities remains challenging. In this work, we present RespEar, an earable-based system for robust RR monitoring. By leveraging the unique properties of in-ear microphones in earbuds, RespEar enables the use of Respiratory Sinus Arrhythmia (RSA) and Locomotor Respiratory Coupling (LRC), physiological couplings between cardiovascular activity, gait, and respiration, to indirectly determine RR. This effectively addresses the challenges posed by the almost imperceptible breathing signals during daily activities. We further propose a suite of meticulously crafted signal processing schemes to improve RR estimation accuracy and robustness. With data collected from 18 subjects over 8 activities, RespEar measures RR with a mean absolute error (MAE) of 1.48 breaths per minute (BPM) and a mean absolute percentage error (MAPE) of 9.12% in sedentary conditions, and an MAE of 2.28 BPM and a MAPE of 11.04% in active conditions, which is unprecedented for a method capable of generalizing across conditions with a single modality.
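The two error metrics reported for RespEar have standard definitions: MAE is the mean of $|y - \hat{y}|$ (here in BPM), and MAPE is the mean of $|y - \hat{y}| / |y|$ expressed as a percentage. The sketch below implements both. How the study aggregates errors (per window, per subject, or per activity) is not stated in the abstract, and the sample values are made up.

```python
def mae(y_true, y_pred):
    """Mean absolute error, in the units of the inputs (e.g. breaths per minute)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean absolute percentage error; reference values must be non-zero."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

reference_rr = [15.0, 20.0]   # hypothetical ground-truth RR values (BPM)
estimated_rr = [16.5, 18.0]   # hypothetical estimates (BPM)
err_bpm = mae(reference_rr, estimated_rr)
err_pct = mape(reference_rr, estimated_rr)
```

MAPE normalizes by the true rate, so the same 2 BPM error weighs more at a slow resting rate than during exercise. Reporting both metrics helps compare sedentary and active conditions.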

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.05391 [pdf, other]

Interference Management in MIMO-ISAC Systems: A Transceiver Design Approach

Authors: Yangyang Niu, Zhiqing Wei, Dingyou Ma, Xiaoyu Yang, Huici Wu, Zhiyong Feng, Jianhua Yuan

Abstract: The integrated sensing and communication (ISAC) system under a multi-input multi-output (MIMO) architecture achieves the dual functionalities of sensing and communication on the same platform by exploiting spatial gain, which provides a feasible paradigm in the face of spectrum congestion. However, operating the sensing and communication functionalities simultaneously on the same platform causes severe interference in ISAC systems. Facing this challenge, we propose a joint optimization framework for transmit beamforming and receive filter design for ISAC systems with MIMO architecture. We aim to maximize the signal-to-clutter-plus-noise ratio (SCNR) at the receiver while considering various constraints, such as waveform similarity, power budget, and communication performance requirements, to ensure the integration of the dual functionalities. In particular, the overall transmit beamforming is decomposed into sensing beamforming and communication beamforming, and a quadratic transformation (QT) is introduced to relax and convert the complex non-convex optimization objective. An efficient algorithm based on covariance matrix tapers (CMT) is proposed to restructure the clutter covariance matrix considering the mismatched steering vector, thereby improving the robustness of the ISAC transceiver design. Numerical simulations are provided to demonstrate the effectiveness of the proposed algorithm.
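For the receive side alone, the SCNR objective has a well-known closed-form maximizer. With target steering vector $\mathbf{a}$ and clutter-plus-noise covariance $\mathbf{R}$, $\mathrm{SCNR}(\mathbf{w}) = |\alpha|^2 |\mathbf{w}^H \mathbf{a}|^2 / (\mathbf{w}^H \mathbf{R} \mathbf{w})$ is maximized by $\mathbf{w} \propto \mathbf{R}^{-1}\mathbf{a}$ (the MVDR filter), which yields $|\alpha|^2\,\mathbf{a}^H \mathbf{R}^{-1} \mathbf{a}$. The sketch below covers only that step. It assumes a half-wavelength ULA and a single hypothetical clutter direction, and omits the joint transmit design, the QT relaxation, and the CMT robustification described above.

```python
import numpy as np

def steering_vector(n, theta):
    """Half-wavelength ULA steering vector toward angle theta (radians)."""
    return np.exp(1j * np.pi * np.arange(n) * np.sin(theta))

def scnr(w, a, R, alpha=1.0):
    """SCNR of receive filter w: |alpha|^2 |w^H a|^2 / (w^H R w)."""
    num = abs(alpha) ** 2 * abs(np.vdot(w, a)) ** 2   # np.vdot conjugates w
    den = np.real(np.vdot(w, R @ w))
    return num / den

def mvdr_filter(a, R):
    """SCNR-maximizing receive filter, up to scaling: w = R^{-1} a."""
    return np.linalg.solve(R, a)

n = 8
a = steering_vector(n, np.deg2rad(10.0))        # hypothetical target direction
c = steering_vector(n, np.deg2rad(-30.0))       # hypothetical clutter direction
R = 100.0 * np.outer(c, c.conj()) + np.eye(n)   # strong clutter plus unit noise
w_opt = mvdr_filter(a, R)
```

The MVDR filter places a deep null toward the clutter direction while keeping gain on the target. A plain matched filter (`w = a`) leaks clutter power and therefore scores a lower SCNR.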

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2406.12323 [pdf, other]

doi 10.1109/TCOMM.2025.3573460

Hybrid Beamforming Design for Near-Field ISAC with Modular XL-MIMO

Authors: Chunwei Meng, Dingyou Ma, Zhaolin Wang, Yuanwei Liu, Zhiqing Wei, Zhiyong Feng

Abstract: A novel modular extremely large-scale multiple-input-multiple-output (XL-MIMO) integrated sensing and communication (ISAC) framework is proposed in this paper. We consider a downlink ISAC scenario and exploit the modular array architecture to enhance the communication spectral efficiency and sensing resolution while reducing the channel modeling complexity by employing the hybrid spherical and planar wavefront model. Considering the hybrid digital-analog structure inherent to modular arrays, we formulate a joint analog-digital beamforming design problem based on the communication spectral efficiency and sensing signal-to-clutter-plus-noise ratio (SCNR). By exploiting the structural similarity of the communication and sensing channels, it is proved that the optimal transmit covariance matrix lies in the subspace spanned by the subarray response vectors, yielding a closed-form solution for the optimal analog beamformer. Consequently, the joint design problem is transformed into a low-dimensional rank-constrained digital beamformer optimization. We first propose a manifold optimization method that directly optimizes the digital beamformer on the rank-constrained Stiefel manifold. Additionally, we develop a semidefinite relaxation (SDR)-based approach that relaxes the rank constraint and employs the randomization technique to obtain a near-optimal solution. Simulation results demonstrate the effectiveness of the proposed modular XL-MIMO ISAC framework and algorithms.

Submitted 20 February, 2025; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2405.19338 [pdf, other]

doi 10.1038/s43856-024-00672-y

Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images

Authors: Yuzhen Ding, Jason M. Holmes, Hongying Feng, Baoxin Li, Lisa A. McGee, Jean-Claude M. Rwigema, Sujay A. Vora, Daniel J. Ma, Robert L. Foote, Samir H. Patel, Wei Liu

Abstract: In radiotherapy, 2D orthogonally projected kV images are used for patient alignment when 3D on-board imaging (OBI) is unavailable. However, tumor visibility is limited because the patient's anatomy is projected onto a 2D plane, potentially leading to substantial setup errors. In treatment rooms with 3D OBI such as cone beam CT (CBCT), the field of view (FOV) of CBCT is limited and the imaging dose is unnecessarily high, which is unfavorable for pediatric patients. A solution to this dilemma is to reconstruct 3D CT from kV images obtained at the treatment position. Here, we propose a dual-model framework built with hierarchical ViT blocks. Unlike a proof-of-concept approach, our framework takes kV images as the sole input and can synthesize accurate, full-size 3D CT in real time (within milliseconds). We demonstrate the feasibility of the proposed approach on 10 patients with head and neck (H&N) cancer using image quality (MAE: <45 HU), dosimetric accuracy (gamma passing rate (2%/2mm/10%) >97%), and patient position uncertainty (shift error: <0.4 mm). The proposed framework can generate accurate 3D CT that faithfully mirrors the real-time patient position, thus significantly improving patient setup accuracy, keeping the imaging dose to a minimum, and maintaining treatment veracity.

Submitted 1 April, 2024; originally announced May 2024.

Comments: 17 pages, 8 figures and tables

Journal ref: Communications Medicine 4, Article number: 241 (2024)

arXiv:2405.09022 [pdf, other]

doi 10.1109/JIOT.2024.3413687

Multi-Objective Optimization-based Transmit Beamforming for Multi-Target and Multi-User MIMO-ISAC Systems

Authors: Chunwei Meng, Zhiqing Wei, Dingyou Ma, Wanli Ni, Liyan Su, Zhiyong Feng

Abstract: Integrated sensing and communication (ISAC) is an enabling technology for sixth-generation mobile communications, which equips wireless communication networks with sensing capabilities. In this paper, we investigate transmit beamforming design for multiple-input and multiple-output (MIMO)-ISAC systems in scenarios with multiple radar targets and communication users. A general form of multi-target sensing mutual information (MI) is derived, along with its upper bound, which can be interpreted as the sum of the individual single-target sensing MIs. Additionally, this upper bound can be achieved by suppressing the cross-correlation among reflected signals from different targets, which aligns with the principles of adaptive MIMO radar. Then, we propose a multi-objective optimization framework based on the signal-to-interference-plus-noise ratio of each user and the tight upper bound of sensing MI, introducing the Pareto boundary to characterize the achievable communication-sensing performance boundary of the proposed ISAC system. To achieve the Pareto boundary, the max-min system utility function method is employed while considering fairness between communication users and radar targets. Subsequently, the bisection search method is employed to find a specific Pareto optimal solution by solving a series of convex feasibility problems. Finally, simulation results validate that the proposed method achieves a better tradeoff between multi-user communication and multi-target sensing performance. Additionally, using the tight upper bound of sensing MI as a performance metric can enhance the multi-target resolution capability and angle estimation accuracy.
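The bisection step can be illustrated generically: for a max-min utility t, each candidate level is checked with a feasibility test, and the search interval is halved until it is narrow enough. The sketch below assumes a deliberately simple toy oracle (interference-free links with gains g_i, utility g_i p_i, and a total power budget), not the paper's SINR and sensing-MI constraints; `bisection_max_min` and `power_budget_feasibility` are hypothetical names.

```python
from typing import Callable, Sequence


def bisection_max_min(feasible: Callable[[float], bool], lo: float, hi: float,
                      tol: float = 1e-9) -> float:
    """Return (approximately) the largest t in [lo, hi] for which feasible(t) holds.

    Assumes feasibility is monotone (True for t <= t*, False above) and feasible(lo) is True.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid  # utility level mid is achievable: search higher
        else:
            hi = mid  # not achievable: search lower
    return lo


def power_budget_feasibility(gains: Sequence[float], budget: float) -> Callable[[float], bool]:
    """Toy oracle: can every link reach g_i * p_i >= t with sum(p_i) <= budget?

    The cheapest allocation is p_i = t / g_i, so feasibility reduces to sum(t / g_i) <= budget.
    """
    def feasible(t: float) -> bool:
        return sum(t / g for g in gains) <= budget
    return feasible


if __name__ == "__main__":
    gains = [1.0, 2.0, 4.0]
    t_star = bisection_max_min(power_budget_feasibility(gains, budget=7.0), lo=0.0, hi=100.0)
    # Closed form for this toy problem: 7 / (1 + 0.5 + 0.25) = 4.
    print(f"max-min utility ~ {t_star:.6f}")
```

In a real ISAC design, each `feasible(t)` call would instead solve a convex feasibility problem; the outer loop stays the same.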

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2404.07472 [pdf, other]

doi 10.1109/LWC.2024.3406577

Cramer-Rao Bounds for Near-Field Sensing: A Generic Modular Architecture

Authors: Chunwei Meng, Dingyou Ma, Xu Chen, Zhiyong Feng, Yuanwei Liu

Abstract: A generic modular array architecture is proposed, featuring uniform/non-uniform subarray layouts that allow for flexible deployment. A bistatic near-field sensing system is considered, where the target is located in the near field of the whole modular array and the far field of each subarray. Then, closed-form expressions of the Cramer-Rao bounds (CRBs) for range and angle estimation are derived based on the hybrid spherical and planar wave model (HSPM). Simulation results validate the accuracy of the derived closed-form CRBs and demonstrate that: i) the HSPM with varying angles of arrival (AoAs) between subarrays can reduce the CRB for range estimation compared to the traditional HSPM with a shared AoA; and ii) when the array aperture is fixed, the proposed generic modular architecture with subarrays positioned closer to the edges can significantly reduce the CRBs compared to the traditional modular architecture with a uniform subarray layout.

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2402.15725 [pdf, other]

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

Authors: Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

Abstract: Human language can be expressed in either written or spoken form, i.e., text or speech. Humans can acquire knowledge from text to improve speaking and listening. However, efforts to let speech pre-trained models leverage unpaired text have only just begun. In this paper, we investigate a new way to pre-train a joint speech-text model that learns enhanced speech representations and benefits various speech-related downstream tasks. Specifically, we propose a novel pre-training method, text-guided HuBERT (T-HuBERT), which performs self-supervised learning over speech to derive phoneme-like discrete representations. These phoneme-like pseudo-label sequences are derived from speech via generative adversarial networks (GANs) so that they are statistically similar to those from additional unpaired textual data. In this way, we build a bridge between unpaired speech and text in an unsupervised manner. Extensive experiments demonstrate the significant superiority of our proposed method over various strong baselines, achieving up to a 15.3% relative word error rate (WER) reduction on the LibriSpeech dataset.
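A "relative" WER reduction is measured against the baseline WER, not in absolute percentage points. The sketch below shows the usual definition; the WER values in the example are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(wer_baseline: float, wer_new: float) -> float:
    """Relative WER reduction (WER_base - WER_new) / WER_base, returned as a fraction."""
    if wer_baseline <= 0:
        raise ValueError("baseline WER must be positive")
    return (wer_baseline - wer_new) / wer_baseline


if __name__ == "__main__":
    # Hypothetical WERs (in percent), chosen only to illustrate a 15.3% relative reduction.
    print(f"{relative_wer_reduction(10.0, 8.47):.1%}")
```

Under these assumed numbers, a 15.3% relative reduction corresponds to an absolute drop of only 1.53 WER points, which is why the two kinds of improvement should not be compared directly.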

Submitted 3 August, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

Comments: 5 pages, 1 figure, 5 tables, accepted by IEEE Signal Processing Letters (SPL)

arXiv:2311.10416 [pdf, ps, other]

Meta-DSP: A Meta-Learning Approach for Data-Driven Nonlinear Compensation in High-Speed Optical Fiber Systems

Authors: Xinyu Xiao, Zhennan Zhou, Bin Dong, Dingjiong Ma, Li Zhou, Jie Sun

Abstract: Nonlinear effects in high-speed optical fiber systems fundamentally limit channel capacity. While traditional Digital Backward Propagation (DBP) with adaptive filters addresses these effects, its computational complexity remains impractical. Data-driven solutions such as Filtered DBP (FDBP) reduce complexity but critically lack inherent generalization: their nonlinear compensation capability cannot be naturally extended to new transmission rates or WDM channel counts without retraining on newly collected data. We propose Meta-DSP, a novel signal processing pipeline combining (1) Meta-DBP, a meta-learning-based DBP model that generalizes across transmission parameters without retraining, and (2) XPM-ADF, a carefully engineered adaptive filter designed to address multi-channel nonlinear distortions. The system demonstrates strong generalization, learning from 40 Gbaud single-channel data and successfully applying this knowledge to higher rates (80/160 Gbaud) and multi-channel configurations (up to 21 channels). Experimental results show that Meta-DSP improves the Q-factor by 0.55 dB over CDC in challenging scenarios while reducing computational complexity by 10× relative to DBP. This work provides a scalable solution for nonlinear compensation in dynamic optical networks, balancing performance with practical computational constraints.

Submitted 10 June, 2025; v1 submitted 17 November, 2023; originally announced November 2023.

arXiv:2309.09627 [pdf, other]

Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders

Authors: Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda

Abstract: We propose a novel framework for electrolaryngeal (EL) speech intelligibility enhancement through the use of robust linguistic encoders. Pretraining and fine-tuning approaches have proven to work well in this task, but in most cases, various mismatches, such as the speech type mismatch (electrolaryngeal vs. typical) or a speaker mismatch between the datasets used in each stage, can deteriorate the conversion performance of this framework. To resolve this issue, we propose a linguistic encoder robust enough to project both EL and typical speech into the same latent space while still extracting accurate linguistic information, creating a unified representation that reduces the speech type mismatch. Furthermore, we introduce HuBERT output features into the proposed framework to reduce the speaker mismatch, making it possible to effectively use a large-scale parallel dataset during pretraining. We show that, compared to the conventional framework using mel-spectrogram input and output features, the proposed framework enables the model to synthesize more intelligible and natural-sounding speech, as shown by a significant 16% improvement in character error rate and a 0.83 improvement in naturalness score.

Submitted 20 January, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: Accepted to ICASSP 2024. Demo page: lesterphillip.github.io/icassp2024_el_sie

arXiv:2308.08313 [pdf, other]

ECPC-IDS: A benchmark endometrial cancer PET/CT image dataset for evaluation of semantic segmentation and detection of hypermetabolic regions

Authors: Dechao Tang, Tianming Du, Deguo Ma, Zhiyu Ma, Hongzan Sun, Marcin Grzegorzek, Huiyan Jiang, Chen Li

Abstract: Endometrial cancer is one of the most common tumors of the female reproductive system and is the third most common cause of death among gynecological malignancies, after ovarian and cervical cancer. Early diagnosis can significantly improve the 5-year survival rate of patients. With the development of artificial intelligence, computer-assisted diagnosis plays an increasingly important role in improving the accuracy and objectivity of diagnosis, as well as reducing the workload of doctors. However, the absence of publicly available endometrial cancer image datasets restricts the application of computer-assisted diagnostic techniques. In this paper, a publicly available Endometrial Cancer PET/CT Image Dataset for Evaluation of Semantic Segmentation and Detection of Hypermetabolic Regions (ECPC-IDS) is published. Specifically, the segmentation section includes PET and CT images, with a total of 7159 images in multiple formats. To demonstrate the effectiveness of segmentation methods on ECPC-IDS, five classical deep learning semantic segmentation methods are selected to test the image segmentation task. The object detection section also includes PET and CT images, with a total of 3579 images and XML files with annotation information. Six deep learning methods are selected for experiments on the detection task. This study conducts extensive experiments using deep learning-based semantic segmentation and object detection methods to demonstrate the differences between various methods on ECPC-IDS.
To the best of our knowledge, this is the first publicly available endometrial cancer dataset with a large number of images of multiple types, including much of the information required for image segmentation and target detection. ECPC-IDS can help researchers explore new algorithms to enhance computer-assisted technology, greatly benefiting both clinicians and patients.

Submitted 11 October, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

Comments: 14 pages, 6 figures

arXiv:2308.08172 [pdf, other]

AATCT-IDS: A Benchmark Abdominal Adipose Tissue CT Image Dataset for Image Denoising, Semantic Segmentation, and Radiomics Evaluation

Authors: Zhiyu Ma, Chen Li, Tianming Du, Le Zhang, Dechao Tang, Deguo Ma, Shanchuan Huang, Yan Liu, Yihao Sun, Zhihao Chen, Jin Yuan, Qianqing Nie, Marcin Grzegorzek, Hongzan Sun

Abstract: Methods: In this study, a benchmark Abdominal Adipose Tissue CT Image Dataset (AATTCT-IDS) containing 300 subjects is prepared and published. AATTCT-IDS publishes 13,732 raw CT slices, and the researchers individually annotate the subcutaneous and visceral adipose tissue regions of 3,213 of those slices, which share the same slice distance, to validate denoising methods, train semantic segmentation models, and study radiomics. For each task, this paper compares and analyzes the performance of various methods on AATTCT-IDS by combining visualization results and evaluation data, thereby verifying the research potential of this dataset for the above three types of tasks. Results: In the comparative study of image denoising, algorithms using a smoothing strategy suppress mixed noise at the expense of image details and obtain better evaluation data. Methods such as BM3D preserve the original image structure better, although their evaluation data are slightly lower. The results show significant differences among them. In the comparative study of semantic segmentation of abdominal adipose tissue, the segmentation results of each model show different structural characteristics. Among them, BiSeNet obtains segmentation results only slightly inferior to U-Net with the shortest training time and effectively separates small and isolated adipose tissue. In addition, the radiomics study based on AATTCT-IDS reveals three adipose distributions in the subject population.
Conclusion: AATTCT-IDS contains the ground truth of adipose tissue regions in abdominal CT slices. This open-source dataset can encourage researchers to explore the multi-dimensional characteristics of abdominal adipose tissue and thus help physicians and patients in clinical practice. AATCT-IDS is freely published for non-commercial purposes at https://figshare.com/articles/dataset/AATTCT-IDS/23807256

Submitted 16 August, 2023; originally announced August 2023.

Comments: 17 pages, 7 figures

arXiv:2308.05489 [pdf, other]

doi 10.1109/JSTARS.2022.3218369

SAR Target Image Generation Method Using Azimuth-Controllable Generative Adversarial Network

Authors: Chenwei Wang, Jifang Pei, Xiaoyu Liu, Yulin Huang, Deqing Mao, Yin Zhang, Jianyu Yang

Abstract: Sufficient synthetic aperture radar (SAR) target images are very important for the development of research. However, available SAR target images are often limited in practice, which hinders the progress of SAR applications. In this paper, we propose an azimuth-controllable generative adversarial network to generate precise SAR target images with an intermediate azimuth between the azimuths of two given SAR images. This network mainly contains three parts: a generator, a discriminator, and a predictor. Through the proposed network structure, the generator can extract and fuse the optimal target features from two input SAR target images to generate a SAR target image. Then a similarity discriminator and an azimuth predictor are designed. The similarity discriminator differentiates the generated SAR target images from real SAR images to ensure the accuracy of the generated images, while the azimuth predictor measures the azimuth difference between the generated and desired images to ensure the azimuth controllability of the generated images. Therefore, the proposed network can generate precise SAR images whose azimuths are well controlled by the inputs of the deep network; it can generate target images at different azimuths to alleviate the small-sample problem to some degree and benefit research on SAR images. Extensive experimental results show the superiority of the proposed method in azimuth controllability and accuracy of SAR target image generation.

Submitted 10 August, 2023; originally announced August 2023.

arXiv:2305.15636 [pdf]

Channelized analog microwave short-time Fourier transform in the optical domain with improved measurement performance

Authors: Xiaowei Li, Taixia Shi, Dong Ma, Yang Chen

Abstract: In this article, analog microwave short-time Fourier transform (STFT) with improved measurement performance is implemented in the optical domain by employing stimulated Brillouin scattering (SBS) and channelization. By jointly using three optical frequency combs and filter- and SBS-based frequency-to-time mapping (FTTM), the time-frequency information of the signal under test (SUT) in different frequency intervals is measured in different channels. Then, using the channel label introduced through subcarriers after photodetection, the low-speed electrical pulses from different channels, which are mixed in the time domain, are distinguished, and the time-frequency information of the SUT in each channel is obtained and spliced to implement the STFT. For the first time, channelization measurement technology is introduced into an STFT system based on frequency sweeping and FTTM, greatly reducing the frequency-sweep range of the required frequency-sweep signal to the analysis bandwidth divided by the number of channels. In addition, channelization can also be used to improve the time and frequency resolution of the STFT system. A proof-of-concept experiment is performed: analysis bandwidths of 12 GHz and 10 GHz are implemented using a 4-GHz frequency-sweep signal with 3 channels and a 2-GHz frequency-sweep signal with 5 channels, respectively. Measurement performance improvement is also demonstrated.
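The bandwidth arithmetic stated in the abstract (required sweep range equals analysis bandwidth divided by the number of channels) can be checked against the two reported configurations. This is an idealized sketch assuming equal, contiguous channels with no guard bands or overlap; the function names are hypothetical.

```python
def analysis_bandwidth_ghz(sweep_range_ghz: float, n_channels: int) -> float:
    """Idealized total analysis bandwidth when each channel covers one sweep range."""
    if n_channels < 1:
        raise ValueError("need at least one channel")
    return sweep_range_ghz * n_channels


def required_sweep_range_ghz(analysis_bw_ghz: float, n_channels: int) -> float:
    """Sweep range needed: analysis bandwidth divided by the channel count."""
    if n_channels < 1:
        raise ValueError("need at least one channel")
    return analysis_bw_ghz / n_channels


if __name__ == "__main__":
    for sweep, channels in [(4.0, 3), (2.0, 5)]:  # configurations reported in the abstract
        bw = analysis_bandwidth_ghz(sweep, channels)
        print(f"{sweep} GHz sweep x {channels} channels -> {bw} GHz analysis bandwidth")
```

Without channelization, covering the same 12 GHz band would, under these assumptions, require a 12-GHz sweep rather than a 4-GHz one.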

Submitted 24 May, 2023; originally announced May 2023.

Comments: 18 pages, 9 figures, 1 table

arXiv:2301.00504 [pdf]

Spectral Bandwidth Recovery of Optical Coherence Tomography Images using Deep Learning

Authors: Timothy T. Yu, Da Ma, Jayden Cole, Myeong Jin Ju, Mirza F. Beg, Marinko V. Sarunic

Abstract: Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase acquisition speed often result in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been used to reconstruct subsampled OCT data, and more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods that better aid clinicians in their decision-making and improve patient outcomes, by reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture.
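Gaussian windowing in the spectral domain lowers axial resolution because a narrower effective spectrum produces a wider point-spread function after the Fourier transform. The sketch below illustrates this on a single synthetic reflector; the complex-valued toy interferogram, the FWHM-as-fraction-of-band parametrization, and the crude width measure are all assumptions, not the paper's simulation pipeline.

```python
import numpy as np


def gaussian_spectral_window(n_samples: int, fwhm_fraction: float) -> np.ndarray:
    """Gaussian centred on the band; FWHM given as a fraction of the band (assumed parametrization)."""
    k = np.arange(n_samples) - (n_samples - 1) / 2
    sigma = fwhm_fraction * n_samples / (2 * np.sqrt(2 * np.log(2)))
    return np.exp(-0.5 * (k / sigma) ** 2)


def a_scan(spectrum: np.ndarray) -> np.ndarray:
    """Depth profile: magnitude of the FFT over wavenumber samples."""
    return np.abs(np.fft.fft(spectrum))


def width_above_half_max(profile: np.ndarray) -> int:
    """Crude axial width: number of samples at or above half the peak value."""
    return int(np.sum(profile >= 0.5 * profile.max()))


if __name__ == "__main__":
    n, depth = 1024, 200
    k = np.arange(n)
    # Toy complex interferogram of one reflector at sample `depth`, with a broad source envelope.
    full = gaussian_spectral_window(n, 0.4) * np.exp(2j * np.pi * depth * k / n)
    reduced = full * gaussian_spectral_window(n, 0.1)  # simulate a narrower spectral bandwidth
    print("full-band width:", width_above_half_max(a_scan(full)))
    print("windowed width: ", width_above_half_max(a_scan(reduced)))
```

Pairs of (windowed, full-band) A-scans produced this way are the kind of low/high-resolution training pairs a super-resolution network could learn from.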

Submitted 1 January, 2023; originally announced January 2023.

arXiv:2212.00532 [pdf, other]

EBHI-Seg: A Novel Enteroscope Biopsy Histopathological Haematoxylin and Eosin Image Dataset for Image Segmentation Tasks

Authors: Liyu Shi, Xiaoyan Li, Weiming Hu, Haoyuan Chen, Jing Chen, Zizhen Fan, Minghe Gao, Yujie Jing, Guotao Lu, Deguo Ma, Zhiyu Ma, Qingtao Meng, Dechao Tang, Hongzan Sun, Marcin Grzegorzek, Shouliang Qi, Yueyang Teng, Chen Li

Abstract: Background and Purpose: Colorectal cancer is a common fatal malignancy, the fourth most common cancer in men, and the third most common cancer in women worldwide. Timely detection of cancer in its early stages is essential for treating the disease. Currently, there is a lack of datasets for histopathological image segmentation of rectal cancer, which often hampers assessment accuracy when computer technology is used to aid diagnosis. Methods: This study provides a new publicly available Enteroscope Biopsy Histopathological Hematoxylin and Eosin Image Dataset for Image Segmentation Tasks (EBHI-Seg). To demonstrate the validity and extensiveness of EBHI-Seg, experimental results on EBHI-Seg are evaluated using classical machine learning methods and deep learning methods. Results: The experimental results showed that deep learning methods achieved better image segmentation performance on EBHI-Seg. The maximum Dice score for the classical machine learning methods is 0.948, while that for the deep learning methods is 0.965. Conclusion: This publicly available dataset contains 5,170 images of six types of tumor differentiation stages and the corresponding ground truth images. The dataset can help researchers develop new segmentation algorithms for the medical diagnosis of colorectal cancer, which can be used in the clinical setting to help doctors and patients.
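The Dice scores quoted above follow the standard overlap definition, 2|A∩B| / (|A| + |B|) for a predicted mask A and a ground-truth mask B. A minimal sketch for binary masks is shown below; treating two empty masks as a perfect match is an assumed convention, not something the abstract specifies.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * intersection / total)


if __name__ == "__main__":
    pred = np.array([[1, 1, 0], [0, 1, 0]])
    target = np.array([[1, 0, 0], [0, 1, 1]])
    print(f"Dice = {dice_coefficient(pred, target):.3f}")
```

Dice ranges from 0 (no overlap) to 1 (identical masks), so the reported 0.948 and 0.965 both indicate close agreement with the ground truth.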

Submitted 6 December, 2022; v1 submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.01079 [pdf, other]

Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition

Authors: Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda

Abstract: Automatic speech recognition (ASR) for electrolaryngeal speakers has been relatively unexplored due to small datasets. When training data is lacking in ASR, a large-scale pretraining and fine-tuning framework is often sufficient to achieve high recognition rates; however, for electrolaryngeal speech, the domain shift between the pretraining and fine-tuning data is too large to overcome, limiting the maximum improvement in recognition rates. To resolve this, we propose an intermediate fine-tuning step that uses imperfect synthetic speech to close the domain shift gap between the pretraining and target data. Despite the imperfect synthetic data, we show the effectiveness of this approach on electrolaryngeal speech datasets, with improvements of 6.1% over the baseline that did not use imperfect synthetic speech. Results show that the intermediate fine-tuning stage focuses on learning the high-level inherent features of the imperfect synthetic data rather than low-level features such as intelligibility.

Submitted 30 May, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: Accepted to ICASSP 2023

arXiv:2210.10314 [pdf, other]

Two-stage training method for Japanese electrolaryngeal speech enhancement based on sequence-to-sequence voice conversion

Authors: Ding Ma, Lester Phillip Violeta, Kazuhiro Kobayashi, Tomoki Toda

Abstract: Sequence-to-sequence (seq2seq) voice conversion (VC) models have greater potential than conventional VC models for converting electrolaryngeal (EL) speech to normal speech (EL2SP). However, EL2SP based on seq2seq VC requires a sufficiently large amount of parallel data for model training, and it suffers from significant performance degradation when training data is insufficient. To address this issue, we propose a novel two-stage strategy to optimize EL2SP performance based on seq2seq VC when only a small parallel dataset is available. In contrast to the high-quality data augmentations used in previous studies, we first combine a large amount of imperfect synthetic parallel data of EL and normal speech with the original dataset for VC training. Then, a second stage of training is conducted with the original parallel dataset only. The results show that the proposed method progressively improves the performance of EL2SP based on seq2seq VC.

Submitted 19 October, 2022; originally announced October 2022.

Comments: Accepted to SLT 2022

arXiv:2208.14635 [pdf, other]

Segmentation-guided Domain Adaptation and Data Harmonization of Multi-device Retinal Optical Coherence Tomography using Cycle-Consistent Generative Adversarial Networks

Authors: Shuo Chen, Da Ma, Sieun Lee, Timothy T. L. Yu, Gavin Xu, Donghuan Lu, Karteek Popuri, Myeong Jin Ju, Marinko V. Sarunic, Mirza Faisal Beg

Abstract: Optical Coherence Tomography (OCT) is a non-invasive technique that captures cross-sectional images of the retina at micrometer resolution. It has been widely used as an auxiliary imaging reference to detect eye-related pathology and predict the longitudinal progression of disease characteristics. Retinal layer segmentation is one of the crucial feature extraction techniques, as variations in retinal layer thickness and retinal layer deformation due to the presence of fluid are highly correlated with multiple prevalent eye diseases such as Diabetic Retinopathy (DR) and Age-related Macular Degeneration (AMD). However, these images are acquired from different devices with different intensity distributions; in other words, they belong to different imaging domains. This paper proposes a segmentation-guided domain-adaptation method to adapt images from multiple devices into a single image domain, for which a state-of-the-art pre-trained segmentation model is available. It avoids time-consuming manual labelling of each new dataset and re-training of the existing network. The semantic consistency and global feature consistency of the network minimize the hallucination effect that many researchers have reported for the Cycle-Consistent Generative Adversarial Network (CycleGAN) architecture.
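The CycleGAN architecture mentioned above relies on a cycle-consistency term: translating an image to the other domain and back should return the original image. The sketch below computes that L1 term in NumPy. It is an illustration only, not the paper's segmentation-guided or feature-consistency losses; the "generators" are hypothetical affine intensity maps standing in for trained networks, and the random arrays stand in for B-scans.

```python
import numpy as np


def cycle_consistency_loss(x, y, g, f):
    """L1 cycle-consistency loss.

    g maps domain X -> Y and f maps Y -> X. The loss penalizes f(g(x))
    drifting away from x and g(f(y)) drifting away from y.
    """
    forward = np.mean(np.abs(f(g(x)) - x))
    backward = np.mean(np.abs(g(f(y)) - y))
    return float(forward + backward)


def g_a_to_b(img):
    # Hypothetical intensity remapping from device A to device B.
    return 0.8 * img + 0.1


def f_exact(img):
    # Exact inverse of g_a_to_b.
    return (img - 0.1) / 0.8


def f_rough(img):
    # Only an approximate inverse of g_a_to_b.
    return 1.2 * img - 0.1


rng = np.random.default_rng(0)
x = rng.random((4, 64, 64))  # stand-in images from device A
y = rng.random((4, 64, 64))  # stand-in images from device B

print(cycle_consistency_loss(x, y, g_a_to_b, f_exact))  # near zero
print(cycle_consistency_loss(x, y, g_a_to_b, f_rough))  # clearly positive
```

Note that a low cycle loss alone does not rule out hallucinated structure, because a pair of generators can invent and then remove content consistently; this is the gap that extra constraints such as segmentation guidance are meant to close.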

Submitted 31 August, 2022; originally announced August 2022.

Comments: 16 pages, 10 figures

arXiv:2208.09143

Photonics-enabled wavelet-like transform via nonlinear optical frequency sweeping and stimulated Brillouin scattering-based frequency-to-time mapping

Authors: Pengcheng Zuo, Dong Ma, Yang Chen

Abstract: A photonics-enabled wavelet-like transform system, characterized by multi-resolution time-frequency analysis, is proposed based on a typical stimulated Brillouin scattering (SBS) pump-probe setup using an optical nonlinear frequency-sweep signal. In the pump path, a continuous-wave optical signal is injected into an SBS medium to generate an SBS gain. In the probe path, a periodic nonlinear frequency-sweep optical signal with a time-varying chirp rate is generated and then modulated at a Mach-Zehnder modulator (MZM) by the electrical signal under test (SUT). The optical signal from the MZM is selectively amplified by the SBS gain and converted back to the electrical domain using a low-speed photodetector, implementing periodic SBS-based frequency-to-time mapping (FTTM). The frequency-domain information corresponding to different periods is mapped to the time domain via the FTTM in the form of low-speed electrical pulses, which are then spliced to analyze the time-frequency relationship of the SUT in real time. Because the chirp rate varies within each sweep period, signals at different frequencies are mapped with different frequency resolutions in the FTTM process; this closely resembles the behavior of the wavelet transform, hence the name wavelet-like transform. In an experiment, multi-resolution time-frequency analysis of a variety of RF signals is carried out over a 4-GHz bandwidth, limited only by the equipment.
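Why a time-varying chirp rate gives frequency-dependent resolution can be seen from the mapping itself: in FTTM, an output time interval dt corresponds to a frequency interval k(t)·dt, where k(t) is the local chirp rate. The abstract does not specify the sweep profile, so this sketch assumes a hypothetical quadratic sweep f(t) = f_start + (f_stop - f_start)(t/T)^2 over 0 to 4 GHz in 1 µs, and a hypothetical 5-ns timing granularity; all function names are illustrative.

```python
def quadratic_sweep_time(f_hz, f_start_hz, f_stop_hz, period_s):
    """Time within one period at which the assumed sweep
    f(t) = f_start + (f_stop - f_start) * (t / T)**2 reaches f_hz."""
    frac = (f_hz - f_start_hz) / (f_stop_hz - f_start_hz)
    return period_s * frac ** 0.5


def local_chirp_rate(t_s, f_start_hz, f_stop_hz, period_s):
    """Instantaneous chirp rate df/dt (Hz/s) of the assumed quadratic sweep."""
    return 2.0 * (f_stop_hz - f_start_hz) * t_s / period_s ** 2


# Hypothetical sweep parameters (not taken from the paper).
F_START, F_STOP, T = 0.0, 4e9, 1e-6
dt = 5e-9  # hypothetical timing granularity of the detected pulses, s

for f in (0.5e9, 1e9, 2e9, 4e9):
    k = local_chirp_rate(quadratic_sweep_time(f, F_START, F_STOP, T), F_START, F_STOP, T)
    # A fixed time interval dt covers k * dt in frequency: coarser at higher f.
    print(f"{f / 1e9:.1f} GHz: k = {k:.2e} Hz/s, k*dt = {k * dt / 1e6:.1f} MHz")
```

With this profile the chirp rate grows with frequency, so higher frequencies are resolved more coarsely, which is the qualitative trade-off a wavelet transform makes; a different sweep shape would redistribute resolution differently.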

Submitted 19 August, 2022; originally announced August 2022.

Comments: 9 pages, 6 figures

arXiv:2208.04871

Breaking the accuracy and resolution limitation of filter- and frequency-to-time mapping-based time and frequency acquisition methods by broadening the filter bandwidth

Authors: Pengcheng Zuo, Dong Ma, Xiaowei Li, Yang Chen

Abstract: In this paper, filter- and frequency-to-time mapping (FTTM)-based photonics-assisted time and frequency acquisition methods are comprehensively analyzed, and their accuracy and resolution limitations in fast-sweep scenarios are overcome by broadening the filter bandwidth. It is found that when the sweep is very fast, the width of the pulse generated via FTTM is mainly determined by the impulse response of the filter. In this case, appropriately increasing the filter bandwidth can significantly reduce the pulse width and thereby improve the measurement accuracy and resolution. FTTM-based short-time Fourier transform (STFT) and microwave frequency measurement using the stimulated Brillouin scattering (SBS) effect are demonstrated, and comparing the results with and without SBS gain spectrum broadening confirms the improvement in measurement accuracy and frequency resolution. The frequency measurement accuracy of the system is improved by around 25 times compared with previous work using a similar sweep speed, and the frequency resolution of the STFT is also much improved compared with our previous results.
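The trade-off behind broadening the filter can be made quantitative under a simplifying assumption: a Gaussian filter instead of the actual, roughly Lorentzian, SBS gain. For a linear chirp of rate k passing a filter with amplitude response exp(-f²/(2σ_f²)), the output envelope is Gaussian with σ_t² = σ_f²/k² + 1/(4π²σ_f²). Multiplying by k gives a frequency-equivalent width

$$\Delta f_{\mathrm{eff}} = \sqrt{\sigma_f^2 + \left(\frac{k}{2\pi\sigma_f}\right)^2},$$

which is minimized at σ_f = √(k/(2π)). The first term is the quasi-static mapping of the filter shape; the second is the impulse-response term that dominates in fast sweeps. The sketch below evaluates this approximation; the chirp rate of 10 GHz/µs and the function names are hypothetical.

```python
import math


def fttm_resolution_hz(sigma_f_hz: float, chirp_rate_hz_per_s: float) -> float:
    """Frequency-equivalent RMS width of an FTTM pulse (Gaussian-filter approximation).

    sigma_f_hz is the RMS width of the filter's amplitude response. The first
    term is the quasi-static image of the filter; the second comes from the
    filter's impulse response and grows with the chirp rate.
    """
    dynamic_hz = chirp_rate_hz_per_s / (2.0 * math.pi * sigma_f_hz)
    return math.hypot(sigma_f_hz, dynamic_hz)


def best_filter_width_hz(chirp_rate_hz_per_s: float) -> float:
    """Filter width that minimizes fttm_resolution_hz for a given chirp rate."""
    return math.sqrt(chirp_rate_hz_per_s / (2.0 * math.pi))


k = 1e16  # hypothetical fast sweep: 10 GHz per microsecond
for sigma in (5e6, 10e6, best_filter_width_hz(k), 100e6):
    res = fttm_resolution_hz(sigma, k)
    print(f"sigma_f = {sigma / 1e6:6.1f} MHz -> resolution {res / 1e6:6.1f} MHz")
```

In a slow sweep the second term is negligible and the resolution is simply the filter width, so broadening would hurt; only when k/(2πσ_f) exceeds σ_f does a wider filter help, consistent with the fast-sweep condition stated in the abstract.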

Submitted 9 August, 2022; originally announced August 2022.

Comments: 18 pages, 11 figures

arXiv:2207.01175

doi 10.1109/LPT.2022.3225547

Photonics-based short-time Fourier transform without high-frequency electronic devices and equipment

Authors: Pengcheng Zuo, Dong Ma, Yang Chen

Abstract: A photonics-based short-time Fourier transform (STFT) system based on stimulated Brillouin scattering (SBS) is proposed and experimentally demonstrated without using high-frequency electronic devices and equipment. The wavelength of a distributed feedback laser diode is periodically swept using a low-speed periodic sawtooth/triangular driving current. The periodic frequency-sweep optical signal is modulated by the signal under test (SUT) and then injected into a section of SBS medium. The optical signal from a second laser diode, serving as the pump wave, is injected into the SBS medium from the opposite direction. The STFT of the SUT is then obtained simply by detecting the forward-transmitted optical signal with a low-speed photodetector. The system is characterized by the absence of any high-frequency electronic devices or equipment. In an experiment, the STFT of a variety of RF signals is carried out over a 4-GHz bandwidth, and the dynamic frequency resolution is demonstrated to be around 60 MHz.

Submitted 3 July, 2022; originally announced July 2022.

Comments: 8 pages, 5 figures

arXiv:2204.04579

doi 10.1121/10.0015792

Inferring Pitch from Coarse Spectral Features

Authors: Danni Ma, Neville Ryant, Mark Liberman

Abstract: Fundamental frequency (F0) has long been treated as the physical definition of "pitch" in phonetic analysis. But there have been many demonstrations that F0 is at best an approximation to pitch, both in production and in perception: pitch is not F0, and F0 is not pitch. Changes in pitch involve many articulatory and acoustic covariates; pitch perception often deviates from what F0 analysis predicts; and in fact, quasi-periodic signals from a single voice source are often incompletely characterized by an attempt to define a single time-varying F0. In this paper, we find strong support for the existence of covariates for pitch in aspects of relatively coarse spectra, in which an overtone series is not available. Thus, linear regression can predict the pitch of simple vocalizations, produced by an articulatory synthesizer or by humans, from single frames of such coarse spectra. Across speakers, and in more complex vocalizations, our experiments indicate that the covariates are not quite so simple, though apparently still available for more sophisticated modeling. On this basis, we propose that the field needs a better way of thinking about speech pitch, just as celestial mechanics requires us to go beyond Newton's point-mass approximations of heavenly bodies.

Submitted 26 August, 2022; v1 submitted 9 April, 2022; originally announced April 2022.

arXiv:2203.05707

doi 10.3233/JAD-220021

Machine Learning Based Multimodal Neuroimaging Genomics Dementia Score for Predicting Future Conversion to Alzheimer's Disease

Authors: Ghazal Mirabnahrazam, Da Ma, Sieun Lee, Karteek Popuri, Hyunwoo Lee, Jiguo Cao, Lei Wang, James E Galvin, Mirza Faisal Beg, the Alzheimer's Disease Neuroimaging Initiative

Abstract: Background: The increasing availability of databases containing both magnetic resonance imaging (MRI) and genetic data allows researchers to use multimodal data to better understand the characteristics of dementia of Alzheimer's type (DAT). Objective: The goal of this study was to develop and analyze novel biomarkers that can help predict the development and progression of DAT. Methods: We used feature selection and an ensemble learning classifier to develop an image/genotype-based DAT score that represents a subject's likelihood of developing DAT in the future. Three feature types were used: MRI only, genetic only, and combined multimodal data. We used a novel data stratification method to better represent different stages of DAT. Using a pre-defined threshold of 0.5 on DAT scores, we predicted whether or not a subject would develop DAT in the future. Results: Our results on the Alzheimer's Disease Neuroimaging Initiative (ADNI) database showed that dementia scores using genetic data could better predict future DAT progression for currently normal control subjects (Accuracy=0.857) than MRI (Accuracy=0.143), while MRI could better characterize subjects with stable mild cognitive impairment (Accuracy=0.614) than genetics (Accuracy=0.356). Combining MRI and genetic data improved classification performance in the remaining stratified groups. Conclusion: MRI and genetic data can contribute to DAT prediction in different ways. MRI data reflect anatomical changes in the brain, while genetic data can detect the risk of DAT progression prior to symptomatic onset. Combining information from multimodal data in the right way can improve prediction performance.
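The thresholding step and the per-group accuracies reported above can be illustrated in a few lines. This is a minimal sketch with hypothetical toy scores and group labels (not ADNI data); treating a score exactly equal to 0.5 as a predicted conversion is an assumption, since the abstract does not say which side of the threshold ties fall on.

```python
from typing import Dict, List, Sequence


def predict_conversion(scores: Sequence[float], threshold: float = 0.5) -> List[bool]:
    """Predict future DAT (True) when the dementia score reaches the threshold."""
    return [s >= threshold for s in scores]


def accuracy(pred: Sequence[bool], truth: Sequence[bool]) -> float:
    """Fraction of predictions that match the ground truth."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def accuracy_by_group(
    scores: Sequence[float], truth: Sequence[bool], groups: Sequence[str]
) -> Dict[str, float]:
    """Accuracy computed separately within each stratified group."""
    result = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        pred = predict_conversion([scores[i] for i in idx])
        result[g] = accuracy(pred, [truth[i] for i in idx])
    return result


# Hypothetical toy values for illustration only.
scores = [0.2, 0.7, 0.55, 0.4, 0.9, 0.1]
truth = [False, True, False, False, True, False]
groups = ["group_A", "group_A", "group_A", "group_B", "group_B", "group_B"]
print(accuracy_by_group(scores, truth, groups))
```

Reporting accuracy per stratum rather than pooled is what lets a comparison like "genetics better for normal controls, MRI better for stable MCI" show up at all.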

Submitted 10 March, 2022; originally announced March 2022.

Journal ref: J Alzheimers Dis 1 Jan. (2022) 1-21

arXiv:2202.09954

doi 10.1109/TCOMM.2022.3201931

Theoretical Analysis of Deep Neural Networks in Physical Layer Communication

Authors: Jun Liu, Haitao Zhao, Dongtang Ma, Kai Mei, Jibo Wei

Abstract: Recently, deep neural network (DNN)-based physical layer communication techniques have attracted considerable interest. Although their potential to enhance communication systems and their superb performance have been validated by simulation experiments, little attention has been paid to theoretical analysis. Specifically, most studies in the physical layer have tended to focus on applying DNN models to wireless communication problems rather than on theoretically understanding how a DNN works in a communication system. In this paper, we aim to quantitatively analyze why DNNs can achieve performance comparable to traditional techniques in the physical layer, and also to derive their cost in terms of computational complexity. To achieve this goal, we first analyze the encoding performance of a DNN-based transmitter and compare it with a traditional one. Second, we theoretically analyze the performance of a DNN-based estimator and compare it with traditional estimators. Third, we investigate and validate how information flows in a DNN-based communication system using information-theoretic concepts. Our analysis develops a concise way to open the "black box" of DNNs in physical layer communication, which can be applied to support the design of DNN-based intelligent communication techniques and help provide explainable performance assessment.

Submitted 26 August, 2022; v1 submitted 20 February, 2022; originally announced February 2022.

Comments: 15 pages, 13 figures, has been accepted for publication in IEEE Transactions on Communications. arXiv admin note: substantial text overlap with arXiv:2106.01124

Journal ref: IEEE Transactions on Communications, 2022

arXiv:2201.11285

doi 10.1364/OL.455019

Time-varying microwave photonic filter for arbitrary waveform signal-to-noise ratio improvement

Authors: Dong Ma, Yang Chen

Abstract: A time-varying microwave photonic filter (TV-MPF) based on stimulated Brillouin scattering (SBS) is proposed and used to suppress the in-band noise of broadband arbitrary microwave waveforms, thereby improving the signal-to-noise ratio (SNR). The filter-controlling signal is designed according to the signal to be filtered and drives the TV-MPF so that the passband of the filter is always aligned with the frequencies of that signal. By continuously tracking the signal's spectral components, the TV-MPF retains only the spectral components of the signal at the current time and filters out the noise outside them, thereby improving the in-band SNR of the signal. In an experiment, a variety of signals with different formats and in-band SNRs are used to test the noise suppression capability of the TV-MPF, and the waveform mean-square error is calculated to quantify the improvement, demonstrating the excellent adaptability of the proposed TV-MPF to different kinds of signals.

Submitted 26 January, 2022; originally announced January 2022.

Comments: 8 pages, 5 figures

arXiv:2201.08741

doi 10.3389/fnimg.2022.1023481

Improving Across-Dataset Brain Tissue Segmentation Using Transformer

Authors: Vishwanatha M. Rao, Zihan Wan, Soroush Arabshahi, David J. Ma, Pin-Yu Lee, Ye Tian, Xuzhe Zhang, Andrew F. Laine, Jia Guo

Abstract: Brain tissue segmentation has demonstrated great utility in quantifying MRI data through Voxel-Based Morphometry and highlighting subtle structural changes associated with various conditions within the brain. However, manual segmentation is highly labor-intensive, and automated approaches have struggled due to properties inherent to MRI acquisition, leaving a great need for an effective segmentation tool. Despite the recent success of deep convolutional neural networks (CNNs) for brain tissue segmentation, many such solutions do not generalize well to new datasets, a capability that is critical for a reliable solution. Transformers have demonstrated success in natural image segmentation and have recently been applied to 3D medical image segmentation tasks due to their ability to capture long-distance relationships in the input, where the local receptive fields of CNNs struggle. This study introduces a novel CNN-Transformer hybrid architecture designed for brain tissue segmentation. We validate our model's performance across four multi-site T1w MRI datasets, covering different vendors, field strengths, scan parameters, time points, and neuropsychiatric conditions. In all situations, our model achieved the greatest generality and reliability. Our method is inherently robust and can serve as a valuable tool for brain-related T1w MRI studies. The code for the TABS network is available at: https://github.com/raovish6/TABS.

Submitted 31 January, 2023; v1 submitted 21 January, 2022; originally announced January 2022.

ACM Class: I.4.6

arXiv:2201.07438

MHTTS: Fast multi-head text-to-speech for spontaneous speech with imperfect transcription

Authors: Dabiao Ma, Yitong Zhang, Meng Li, Feng Ye

Abstract: Neural network-based end-to-end Text-to-Speech (TTS) has greatly improved the quality of synthesized speech. However, how to efficiently use massive amounts of spontaneous speech without transcription remains an open problem. In this paper, we propose MHTTS, a fast multi-speaker TTS system that is robust to transcription errors and to spontaneous-style speech data. Specifically, we introduce a multi-head model and transfer text information from a high-quality corpus with manual transcription to spontaneous speech with imperfectly recognized transcription by training on both jointly. MHTTS has three advantages: 1) it synthesizes better-quality multi-speaker speech with faster inference; 2) it can transfer correct text information to data with imperfect transcription, whether simulated through corruption or produced by an Automatic Speech Recognizer (ASR); 3) it can use massive amounts of real spontaneous speech with imperfect transcription and synthesize expressive speech.

Submitted 4 February, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

arXiv:2111.13438

doi 10.1109/JLT.2022.3174552

Short-time Fourier transform based on stimulated Brillouin scattering

Authors: Pengcheng Zuo, Dong Ma, Yang Chen

Abstract: In this paper, an all-optical short-time Fourier transform (STFT) based on stimulated Brillouin scattering (SBS) is proposed and used for real-time time-frequency analysis of different radio frequency (RF) signals. In the proposed all-optical STFT system, SBS not only provides a band-pass filter that, in conjunction with a periodic frequency-sweep optical signal, implements the window function, but also obtains the frequency-domain information in different time windows through the waveform generated via frequency-to-time mapping (FTTM). A periodic frequency-sweep optical signal is generated and then modulated at a Mach-Zehnder modulator by the electrical signal under test (SUT). During different sweep periods, the fixed Brillouin gain functions as a bandpass filter that selects a specific range of the spectrum, which, with the help of the sweep optical signal, is equivalent to applying a sliding window function to the corresponding section of the temporal signal. At the same time, after the optical signal is selectively amplified by the SBS gain and converted back to the electrical domain, SBS also implements real-time FTTM, which yields the frequency-domain information corresponding to different time windows through the generated waveforms. This information is then assembled and spliced to analyze the time-frequency relationship of the SUT in real time. In an experiment, STFTs of a variety of RF signals are carried out over a 12-GHz bandwidth, limited only by the equipment, and the dynamic frequency resolution is better than 60 MHz.
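For comparison, the quantity such a photonic system delivers in real time is conceptually an STFT magnitude: one spectrum per sliding time window. The NumPy sketch below computes a conventional digital STFT with a Hann window; it is not the SBS/FTTM implementation. The test chirp (1 to 3 GHz over 1 µs), the 10-GS/s sampling rate, the 100-ns window, and the 50-ns hop are hypothetical values chosen for illustration.

```python
import numpy as np


def stft_magnitude(x, fs, win_len, hop):
    """Magnitude STFT with a Hann window: one spectrum per sliding window."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop: i * hop + win_len] * window for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, win_len // 2 + 1)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # window centres, s
    return times, freqs, spec


# Hypothetical test signal: linear chirp from 1 GHz to 3 GHz over 1 us at 10 GS/s.
fs = 10e9
n = 10_000
duration = n / fs
t = np.arange(n) / fs
f0, k = 1e9, 2e9 / duration  # start frequency (Hz) and chirp rate (Hz/s)
x = np.cos(2 * np.pi * (f0 * t + 0.5 * k * t**2))

times, freqs, spec = stft_magnitude(x, fs, win_len=1000, hop=500)
peak_freqs = freqs[np.argmax(spec, axis=1)]  # should track f0 + k * times
print(np.round(peak_freqs / 1e9, 2))
```

The window length sets the usual trade-off: a 100-ns window gives 10-MHz bin spacing, but a chirp that moves 200 MHz within one window smears each spectrum, which is the digital analogue of the dynamic frequency resolution quoted for fast sweeps.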

Submitted 26 November, 2021; originally announced November 2021.

Comments: 18 pages, 9 figures, 1 table

arXiv:2111.02667

doi 10.1109/TAP.2022.3177533

Physics Assisted Deep Learning for Indoor Imaging using Phaseless Wi-Fi Measurements

Authors: Samruddhi Deshmukh, Amartansh Dubey, Dingfei Ma, Qifeng Chen, Ross Murch

Abstract: A physics-assisted deep learning framework to perform accurate indoor imaging using phaseless Wi-Fi measurements is proposed. It can image objects that are large (compared to the wavelength) and have high permittivity values, which existing radio frequency (RF) inverse scattering techniques find very challenging, making it suitable for indoor RF imaging. The technique combines a Rytov-based inverse scattering model with a deep learning framework. The inverse scattering model is based on an extended Rytov approximation (xRA) that pre-reconstructs the RF measurements. Under strong scattering conditions, this pre-reconstruction is related to the actual permittivity profile by a non-linear function, which is learned by a modified U-Net model to obtain the permittivity profile of the object. Thus, our proposed approach not only reconstructs the shape of objects but also estimates their permittivity values accurately. We demonstrate its imaging performance using simulations as well as experimental results in an actual indoor environment using 2.4 GHz Wi-Fi phaseless measurements. For incident wavelength $\lambda_0$, the proposed framework can reconstruct objects with relative permittivity as high as 77 and electrical size as large as $40\lambda$, where $\lambda = \lambda_0/\sqrt{77}$. This is in contrast to existing phaseless imaging techniques, which cannot reconstruct permittivity values beyond 3 or 4. Thus, our proposed method is the first inverse scattering-based deep learning framework that can image large scatterers with high permittivity and achieve accurate indoor RF imaging using phaseless Wi-Fi measurements.
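To put the electrical size $40\lambda$ in physical units, the arithmetic from the definitions above is short. This sketch assumes a lossless, non-magnetic scatterer (so the wavelength inside it is $\lambda_0/\sqrt{\varepsilon_r}$) and uses the 2.4 GHz Wi-Fi frequency stated in the abstract; the function name is illustrative.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def wavelengths_m(freq_hz: float, eps_r: float):
    """Free-space wavelength and wavelength inside a lossless, non-magnetic
    medium of relative permittivity eps_r (assumption: mu_r = 1)."""
    lam0 = C / freq_hz
    return lam0, lam0 / math.sqrt(eps_r)


lam0, lam = wavelengths_m(2.4e9, 77)
print(f"lambda_0 = {lam0 * 100:.2f} cm, lambda = {lam * 100:.2f} cm, 40*lambda = {40 * lam:.2f} m")
```

At 2.4 GHz this gives a free-space wavelength of about 12.5 cm, an in-medium wavelength of about 1.4 cm, and hence $40\lambda$ of roughly 0.57 m: an object of ordinary indoor size that spans dozens of wavelengths inside itself.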

Submitted 4 November, 2021; originally announced November 2021.

Comments: 14 pages, 10 figures. This work has been submitted to IEEE for possible publication

arXiv:2110.12857

doi 10.1364/AO.450247

Photonics-assisted microwave pulse detection and frequency measurement based on pulse replication and frequency-to-time mapping

Authors: Pengcheng Zuo, Dong Ma, Qingbo Liu, Lizhong Jiang, Yang Chen

Abstract: A photonics-assisted microwave pulse detection and frequency measurement scheme is proposed. The unknown microwave pulse is converted to the optical domain and then injected into a fiber loop for pulse replication, which makes it easier to identify microwave pulses with a large pulse repetition interval (PRI), while stimulated Brillouin scattering-based frequency-to-time mapping (FTTM) is used to measure the carrier frequency of the microwave pulse. A sweep optical carrier is generated and modulated by the unknown microwave pulse and a continuous-wave single-frequency reference, generating two different frequency-sweep optical signals, which are combined and used as the probe wave to detect a fixed Brillouin gain spectrum. When the optical signal is detected in a photodetector, FTTM is realized and the frequency of the microwave pulse can be determined. In an experiment with a fiber loop containing a 210-m fiber, pulse replication and FTTM are realized for pulses with a PRI of 20 μs and pulse widths of 1.20, 1.00, 0.85, and 0.65 μs. At a sweep chirp rate of 0.978 THz/s, the measurement errors are below ±12 MHz and ±5 MHz using one pair of pulses and multiple pairs of pulses, respectively. The influence of the sweep chirp rate and pulse width on the measurement error has also been studied. To a certain extent, the faster the frequency sweep, the greater the frequency measurement error. For a specific sweep chirp rate, the measurement error is almost unaffected by the width of the pulse being measured.
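Two back-of-the-envelope relations help read these numbers. Under linear FTTM, a time offset Δt between the reference pulse and the unknown pulse corresponds to a frequency offset kΔt, so a frequency error maps to a timing error of δf/k; and the replica spacing of the fiber loop is set by its per-circulation delay n_g·L/c. In this sketch the sign convention (unknown frequency above the reference), the 2 GHz reference, the 1 ms offset, and the group index of 1.468 (a typical value for standard single-mode fiber near 1550 nm) are assumptions; only the fiber delay is counted, not other loop components.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def frequency_from_delay(f_ref_hz, chirp_rate_hz_per_s, delta_t_s):
    """Linear FTTM: a time offset delta_t between reference and pulse peaks
    corresponds to a frequency offset chirp_rate * delta_t (sign assumed)."""
    return f_ref_hz + chirp_rate_hz_per_s * delta_t_s


def timing_for_frequency_error(freq_err_hz, chirp_rate_hz_per_s):
    """Time shift of the FTTM pulse equivalent to a given frequency error."""
    return freq_err_hz / chirp_rate_hz_per_s


def loop_delay_s(fiber_length_m, group_index=1.468):
    """Per-circulation delay of a fiber loop, counting only the fiber itself."""
    return fiber_length_m * group_index / C


k = 0.978e12  # sweep chirp rate quoted in the abstract, Hz/s
print(frequency_from_delay(2e9, k, 1e-3))  # hypothetical 2 GHz reference, 1 ms offset
print(timing_for_frequency_error(5e6, k))  # timing equivalent of a 5 MHz error
print(loop_delay_s(210.0))                 # replica spacing for the 210-m loop
```

At this chirp rate a 5 MHz error corresponds to roughly 5 µs of pulse-timing shift, and the 210-m loop yields replicas about 1 µs apart, so a 20-µs PRI is filled with many copies; both figures follow from the assumptions above rather than from reported measurements.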

Submitted 25 September, 2021; originally announced October 2021.

Comments: 13 pages, 8 figures

arXiv:2109.05627

Differential Diagnosis of Frontotemporal Dementia and Alzheimer's Disease using Generative Adversarial Network

Authors: Da Ma, Donghuan Lu, Karteek Popuri, Mirza Faisal Beg

Abstract: Frontotemporal dementia (FTD) and Alzheimer's disease (AD) are two common forms of dementia and are easily misdiagnosed as each other due to their similar clinical symptoms. Differentiating between the two dementia types is crucial for determining disease-specific intervention and treatment. Recent deep-learning-based approaches in medical image computing deliver some of the best performance on many binary classification tasks, although their application to differential diagnosis, such as neuroimage-based differentiation of multiple types of dementia, has not been explored. In this study, a novel framework is proposed that uses the Generative Adversarial Network technique to distinguish FTD, AD, and normal control subjects, using volumetric features extracted at coarse-to-fine structural scales from Magnetic Resonance Imaging scans. Ten-fold cross-validation experiments on 1,954 images achieved high accuracy. With the proposed framework, we have demonstrated that combining multi-scale structural features with synthetic data augmentation based on generative adversarial networks can improve performance on challenging tasks such as differentiating dementia subtypes.
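Ten-fold cross-validation means every image is held out for testing exactly once while the other nine folds are used for training. A minimal NumPy sketch of such a split over 1,954 items follows; the shuffling seed and the plain (unstratified, image-level) split are assumptions, since the abstract does not say whether folds were stratified by diagnosis or grouped by subject. Grouping by subject is generally advisable when one subject contributes several scans, to keep train and test sets independent.

```python
import numpy as np


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)  # sizes differ by at most one
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx


splits = list(k_fold_indices(1954, k=10))
print([len(test) for _, test in splits])  # [196, 196, 196, 196, 195, 195, 195, 195, 195, 195]
```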

Submitted 29 September, 2021; v1 submitted 12 September, 2021; originally announced September 2021.

arXiv:2109.03904

doi 10.1016/j.optcom.2022.128228

Time-frequency analysis of microwave signals based on stimulated Brillouin scattering

Authors: Dong Ma, Pengcheng Zuo, Yang Chen

Abstract: A novel photonic approach to the time-frequency analysis of microwave signals is proposed based on stimulated Brillouin scattering (SBS)-assisted frequency-to-time mapping (FTTM). Two types of time-frequency analysis links, namely a parallel SBS link and a time-division SBS link, are proposed. The parallel SBS link can perform real-time time-frequency analysis of microwave signals, which provides a promising solution for real-time time-frequency analysis, especially when combined with photonic integration techniques. A simulation analyzing signals in multiple formats verifies its feasibility. The time-division SBS link has a simpler and reconfigurable structure, which can realize ultra-high-resolution time-frequency analysis of periodic signals using a time segmentation and accumulation technique. An experiment is performed for the time-division SBS link, and the multi-dimensional reconfigurability of the system is studied experimentally. An analysis bandwidth of 3.9 GHz, an analysis frequency of up to 20 GHz, and a frequency resolution of 15 MHz are demonstrated.

Submitted 7 September, 2021; originally announced September 2021.

Comments: 17 pages, 10 figures, 1 table

arXiv:2107.10701

Multitask-Based Joint Learning Approach To Robust ASR For Radio Communication Speech

Authors: Duo Ma, Nana Hou, Van Tung Pham, Haihua Xu, Eng Siong Chng

Abstract: To realize robust end-to-end Automatic Speech Recognition (E2E ASR) under radio communication conditions, we propose in this paper a multitask-based method to jointly train a Speech Enhancement (SE) module as the front-end and an E2E ASR model as the back-end. One advantage of the proposed method is that the entire system can be trained from scratch. Unlike prior works, neither component needs separate pre-training and fine-tuning. Through analysis, we found that the success of the proposed method lies in the following aspects. First, multitask learning is essential; that is, the SE network learns not only to produce more intelligible speech but also to generate speech that benefits recognition. Second, preserving the speech phase from the noisy speech is critical for improving ASR performance. Third, we propose a dual-channel data augmentation training method to obtain further improvement; specifically, we combine the clean and enhanced speech to train the whole system. We evaluate the proposed method on the RATS English data set, achieving a relative WER reduction of 4.6% with the joint training method and up to 11.2% with the proposed data augmentation method.
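The reported gains are relative WER reductions, (baseline − new) / baseline, not absolute percentage-point drops. The sketch below shows the conversion; the 30% baseline WER is a hypothetical value chosen for illustration, not a figure from the paper, and the function names are illustrative.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction: (baseline - new) / baseline."""
    return (baseline_wer - new_wer) / baseline_wer


def wer_after_relative_reduction(baseline_wer: float, rel: float) -> float:
    """Absolute WER implied by a relative reduction rel (e.g. 0.046 for 4.6%)."""
    return baseline_wer * (1.0 - rel)


baseline = 30.0  # hypothetical baseline WER in percent; not from the paper
for rel in (0.046, 0.112):
    print(f"{rel:.1%} relative -> {wer_after_relative_reduction(baseline, rel):.2f}% WER")
```

With a 30% baseline, a 4.6% relative reduction is only about 1.4 percentage points, which is why relative and absolute figures should not be compared directly across papers with different baselines.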

Submitted 22 July, 2021; originally announced July 2021.

Comments: 7 pages, 3 figures, submitted to APSIPA 2021

arXiv:2107.02345 [pdf, other]

Domain Adaptation via CycleGAN for Retina Segmentation in Optical Coherence Tomography

Authors: Ricky Chen, Timothy T. Yu, Gavin Xu, Da Ma, Marinko V. Sarunic, Mirza Faisal Beg

Abstract: With the FDA approval of Artificial Intelligence (AI) for point-of-care clinical diagnoses, model generalizability is of the utmost importance, as clinical decision-making must be domain-agnostic. One way to tackle the problem is to expand the dataset to include images from many domains; while this technique is ideal, the security requirements of medical data are a major limitation. Additionally, researchers with developed tools benefit from the addition of open-source data but are limited by differences between domains. Here, we investigated the implementation of a Cycle-Consistent Generative Adversarial Network (CycleGAN) for the domain adaptation of Optical Coherence Tomography (OCT) volumes. This study was done in collaboration with the Biomedical Optics Research Group and the Functional & Anatomical Imaging & Shape Analysis Lab at Simon Fraser University. We investigated a learning-based approach to adapting the domain of a publicly available dataset, the UK Biobank (UKB) dataset. To evaluate the performance of domain adaptation, we used pre-existing retinal layer segmentation tools developed on a different set of RETOUCH OCT data. This study provides insight into state-of-the-art tools for domain adaptation compared with traditional processing techniques, as well as a pipeline for adapting publicly available retinal data to the domains previously used by our collaborators.
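What distinguishes a CycleGAN from a plain GAN is the cycle-consistency term: mapping an image to the other domain and back should reproduce the original. The sketch below computes only that L1 term in its standard form, with toy callables standing in for the two generators and flattened lists standing in for OCT B-scans; the adversarial losses and any network details are omitted, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

Image = Sequence[float]                   # toy stand-in: a flattened image
Mapper = Callable[[Image], List[float]]   # a "generator" from one domain to the other


def l1(a: Image, b: Image) -> float:
    """Mean absolute difference between two equally sized flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(g_ab: Mapper, g_ba: Mapper,
                           xs: Sequence[Image], ys: Sequence[Image]) -> float:
    """mean ||g_ba(g_ab(x)) - x||_1 over domain A plus mean ||g_ab(g_ba(y)) - y||_1 over domain B."""
    forward = sum(l1(g_ba(g_ab(x)), x) for x in xs) / len(xs)
    backward = sum(l1(g_ab(g_ba(y)), y) for y in ys) / len(ys)
    return forward + backward


if __name__ == "__main__":
    # Toy "domain shift": an intensity rescale, and its exact inverse.
    g_ab = lambda img: [0.8 * v + 0.1 for v in img]
    g_ba = lambda img: [(v - 0.1) / 0.8 for v in img]
    print(cycle_consistency_loss(g_ab, g_ba, [[0.0, 1.0]], [[0.2, 0.9]]))  # ~0
```

A generator pair that inverts each other drives the term to (numerically) zero, while a pair that does not, for example `g_ba` left as the identity, is penalized; this is what discourages the generators from changing anatomy while restyling intensities.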

Submitted 5 July, 2021; originally announced July 2021.

Comments: 10 pages, 6 figures, 1 table

ACM Class: I.4.0

arXiv:2106.14671 [pdf, other]

doi 10.1109/JSTSP.2021.3118219

FRaC: FMCW-Based Joint Radar-Communications System via Index Modulation

Authors: Dingyou Ma, Nir Shlezinger, Tianyao Huang, Yimin Liu, Yonina C. Eldar

Abstract: Dual function radar communications (DFRC) systems are attractive technologies for autonomous vehicles, which utilize electromagnetic waves to constantly sense the environment while simultaneously communicating with neighbouring devices. An emerging approach to implement DFRC systems is to embed information in radar waveforms via index modulation (IM). Implementation of DFRC schemes in vehicular systems gives rise to strict constraints in terms of cost, power efficiency, and hardware complexity. In this paper, we extend IM-based DFRC systems to utilize sparse arrays and frequency modulated continuous waveforms (FMCWs), which are popular in automotive radar for their simplicity and low hardware complexity. The proposed FMCW-based radar-communications system (FRaC) operates at reduced cost and complexity by transmitting with a reduced number of radio frequency modules, combined with narrowband FMCW signalling. This is achieved via array sparsification in transmission, formulating a virtual multiple-input multiple-output array by combining the signals in one coherent processing interval, in which the narrowband waveforms are transmitted in a randomized manner. Performance analysis and numerical results show that the proposed radar scheme achieves similar resolution performance compared with a wideband radar system operating with a large receive aperture, while requiring less hardware overhead. For the communications subsystem, FRaC achieves higher rates and improved error rates compared to dual-function signalling based on conventional phase modulation.
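Index modulation conveys data through which resources are used rather than through the waveform itself. As a generic illustration, assume each group of bits selects which K of N transmit elements are active in a given pulse; this is not FRaC's actual mapping, which the abstract does not specify. Choosing a K-subset can carry floor(log2 C(N, K)) bits, and the combinatorial number system gives a one-to-one map from integers to subsets. All function names below are hypothetical.

```python
from math import comb
from typing import List


def bits_per_selection(n: int, k: int) -> int:
    """floor(log2(C(n, k))): bits carried by choosing which k of n elements are active."""
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    return comb(n, k).bit_length() - 1  # exact integer floor(log2), no float rounding


def index_to_subset(index: int, n: int, k: int) -> List[int]:
    """Map an integer in [0, C(n, k)) to a distinct sorted k-subset of range(n).

    Combinatorial number system: index = sum_c C(e_c, c) with e_k > ... > e_1 >= 0,
    found greedily from c = k down to 1.
    """
    if not 0 <= index < comb(n, k):
        raise ValueError("index out of range")
    subset = []
    for c in range(k, 0, -1):
        e = c - 1  # C(c - 1, c) == 0, so this start is always admissible
        while comb(e + 1, c) <= index:
            e += 1
        subset.append(e)
        index -= comb(e, c)
    return sorted(subset)


def bits_to_active_elements(bits: str, n: int, k: int) -> List[int]:
    """Hypothetical IM mapper: a bit string selects the active transmit elements."""
    if len(bits) != bits_per_selection(n, k):
        raise ValueError("bit string length must equal bits_per_selection(n, k)")
    value = int(bits, 2) if bits else 0
    return index_to_subset(value, n, k)


if __name__ == "__main__":
    print(bits_per_selection(16, 4))            # 10 bits per pulse
    print(bits_to_active_elements("10", 4, 2))  # [1, 2]
```

With 16 elements and 4 active per pulse, C(16, 4) = 1820 subsets, so each selection carries 10 bits on top of whatever the waveform itself encodes; only 1024 of the 1820 subsets are used so that the mapping stays a whole number of bits.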

Submitted 28 June, 2021; originally announced June 2021.

Comments: 16 pages

arXiv:2106.08147 [pdf, other]

doi 10.1117/12.2530688

Perceptually-inspired super-resolution of compressed videos

Authors: Di Ma, Mariana Afonso, Fan Zhang, David R. Bull

Abstract: Spatial resolution adaptation is a technique that has often been employed in video compression to enhance coding efficiency. This approach encodes a lower-resolution version of the input video and reconstructs the original resolution during decoding. Instead of using conventional up-sampling filters, recent work has employed advanced super-resolution methods based on convolutional neural networks (CNNs) to further improve reconstruction quality. These approaches are usually trained to minimise pixel-based losses such as Mean-Squared Error (MSE), even though this type of loss metric does not correlate well with subjective opinions. In this paper, a perceptually-inspired super-resolution approach (M-SRGAN) is proposed for spatial up-sampling of compressed video using a modified CNN model, which has been trained using a generative adversarial network (GAN) on compressed content with perceptual loss functions. The proposed method was integrated with HEVC HM 16.20 and has been evaluated on the JVET Common Test Conditions (UHD test sequences) using the Random Access configuration. The results show a clear perceptual quality improvement over the original HM 16.20, with an average bitrate saving of 35.6% (Bjøntegaard Delta measurement) based on a perceptual quality metric, VMAF.
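The 35.6% figure is a Bjøntegaard Delta rate (BD-rate) computed with VMAF, rather than PSNR, as the quality axis. The sketch below is a minimal textbook-style BD-rate calculation under stated assumptions: exactly four rate-quality points per codec, a cubic through log10(bitrate) as a function of quality, and integration over the overlapping quality range. It is not the reference tool used in the paper, and the helper names are hypothetical.

```python
from math import log10
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _cubic_through(q: Sequence[float], y: Sequence[float]) -> List[float]:
    """Coefficients [c0, c1, c2, c3] of the cubic passing through four (q, y) points."""
    if len(q) != 4 or len(y) != 4:
        raise ValueError("this sketch expects exactly four rate-quality points per codec")
    return _solve([[1.0, qi, qi ** 2, qi ** 3] for qi in q], list(y))


def _integrate(c: List[float], lo: float, hi: float) -> float:
    """Exact integral of c0 + c1 q + c2 q^2 + c3 q^3 over [lo, hi]."""
    def antiderivative(x: float) -> float:
        return sum(ck * x ** (k + 1) / (k + 1) for k, ck in enumerate(c))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(rate_ref: Sequence[float], qual_ref: Sequence[float],
            rate_test: Sequence[float], qual_test: Sequence[float]) -> float:
    """Average bitrate difference (%) of the test codec versus the reference at equal quality.

    Negative values mean the test codec needs fewer bits (a saving).
    """
    lo = max(min(qual_ref), min(qual_test))
    hi = min(max(qual_ref), max(qual_test))
    if hi <= lo:
        raise ValueError("quality ranges do not overlap")
    mid = (lo + hi) / 2  # centre the quality axis so the cubic fit stays well conditioned
    c_ref = _cubic_through([q - mid for q in qual_ref], [log10(r) for r in rate_ref])
    c_test = _cubic_through([q - mid for q in qual_test], [log10(r) for r in rate_test])
    diff = _integrate(c_test, lo - mid, hi - mid) - _integrate(c_ref, lo - mid, hi - mid)
    return (10 ** (diff / (hi - lo)) - 1) * 100


if __name__ == "__main__":
    vmaf = [60.0, 70.0, 80.0, 90.0]  # illustrative quality points, not the paper's data
    print(bd_rate([1000, 2000, 4000, 8000], vmaf, [500, 1000, 2000, 4000], vmaf))  # ~ -50.0
```

In this convention, the abstract's "bitrate saving of 35.6%" corresponds to a BD-rate of -35.6%: averaged over the common VMAF range, the proposed codec needs about 35.6% fewer bits for the same perceptual quality.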

Submitted 15 June, 2021; originally announced June 2021.

arXiv:2106.01124 [pdf, other]

Opening the Black Box of Deep Neural Networks in Physical Layer Communication

Authors: Jun Liu, Haitao Zhao, Dongtang Ma, Kai Mei, Jibo Wei

Abstract: Deep Neural Network (DNN)-based physical layer techniques are attracting considerable interest due to their potential to enhance communication systems. However, most studies in the physical layer have focused on applying DNN models to wireless communication problems rather than on theoretically understanding how a DNN works in a communication system. In this paper, we aim to quantitatively analyze why DNNs can achieve performance in the physical layer comparable to that of traditional techniques, and at what cost in terms of computational complexity. We further investigate, and experimentally validate, how information flows in a DNN-based communication system using information-theoretic concepts.
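The abstract analyzes information flow with information-theoretic concepts but does not name an estimator. A common starting point is a plug-in (histogram) estimate of the mutual information between discretized layer activations and the transmitted symbols. The sketch below assumes that approach; `quantize` and `mutual_information` are hypothetical helpers, not the authors' code.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def quantize(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal-width binning of continuous activations into bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X;Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need equally long, non-empty sample sequences")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    symbols = [0, 1, 0, 1, 1, 0]                      # transmitted bits
    activations = [-0.9, 1.1, -1.2, 0.8, 1.0, -0.7]   # one neuron's outputs (toy data)
    print(mutual_information(symbols, quantize(activations, 2)))  # 1.0 bit
```

Tracking this quantity layer by layer gives a rough picture of how much symbol information survives each stage. Plug-in estimates are biased upward when the number of bins is large relative to the sample count, so the bin count matters as much as the network.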

Submitted 18 February, 2022; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: 6 pages, 5 figures, to be presented in the IEEE Wireless Communications and Networking Conference (WCNC) 2022 Workshop on Machine Learning for Communications: Future Large Scale MIMO and AI-Native Air-Interface

arXiv:2105.11594 [pdf]

A Fast MR Fingerprinting Simulator for Direct Error Estimation and Sequence Optimization

Authors: Siyuan Hu, Stephen Jordan, Rasim Boyacioglu, Ignacio Rozada, Matthias Troyer, Mark Griswold, Debra McGivney, Dan Ma

Abstract: MR Fingerprinting (MRF) is a novel quantitative MR technique that can simultaneously provide multiple tissue property maps. When optimizing MRF scans, modeling undersampling errors and field imperfections in the cost function makes the optimization results more practical and robust. However, this process is computationally expensive and impractical for sequence optimization algorithms when MRF signal evolutions must be generated at every optimization iteration. Here, we introduce a fast MRF simulator that simulates aliased images from actual scan scenarios, including undersampling and system imperfections, which substantially reduces computational time and allows for direct error estimation and efficient sequence optimization. By constraining the total number of tissues present in a brain phantom, MRF signals from highly undersampled scans can be simulated as the product of spatial response functions, which depend on the sampling pattern, and sequence-dependent temporal functions. During optimization, the spatial response function is independent of the sequence design and does not need to be recalculated. We evaluate the performance and computational speed of the proposed approach with simulations and in vivo experiments, and we demonstrate the power of applying the simulator to MRF sequence optimization. The simulation results from the proposed method closely approximate the signals and MRF maps from in vivo scans, with a processing time 158 times shorter than that of the conventional simulation method based on the non-uniform Fourier transform.
Incorporating the proposed simulator into the MRF optimization framework makes direct estimation of undersampling errors during optimization feasible and provides optimized MRF sequences that are robust against undersampling factors and system inhomogeneity.
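The speed-up rests on separability: with a limited number of tissues, each aliased voxel signal is a sum over tissues of a sampling-pattern-dependent spatial weight times a sequence-dependent temporal signal, i.e. X = S T. The sketch below assumes S is already available (the abstract says it is derived from the sampling pattern, but how is not reproduced here) and uses real-valued lists for readability, although MRF signals are complex in practice; the same arithmetic works unchanged with Python `complex` values. The function name is hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def simulate_aliased_signals(spatial_response: Sequence[Sequence[float]],
                             temporal_signals: Sequence[Sequence[float]]) -> Matrix:
    """Aliased signal evolutions X = S @ T.

    spatial_response: voxels x tissues; S[v][k] is the contribution of tissue k to
        voxel v under a given undersampling pattern (sequence-independent, so it is
        computed once and reused).
    temporal_signals: tissues x timepoints; T[k][t] is the signal evolution of
        tissue k for the candidate sequence (recomputed for each candidate).
    """
    n_tissues = len(temporal_signals)
    if any(len(row) != n_tissues for row in spatial_response):
        raise ValueError("spatial_response must have one column per tissue")
    n_time = len(temporal_signals[0])
    return [
        [sum(row[k] * temporal_signals[k][t] for k in range(n_tissues)) for t in range(n_time)]
        for row in spatial_response
    ]


if __name__ == "__main__":
    S = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # third voxel mixes both tissues (aliasing)
    T = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # toy fingerprints for two tissues
    print(simulate_aliased_signals(S, T))     # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [2.5, 3.5, 4.5]]
```

Because S does not depend on the sequence, an optimizer only needs to regenerate the few tissue fingerprints in T (for example by Bloch simulation) at each iteration and repeat this cheap product, instead of re-simulating undersampled k-space acquisitions for every voxel.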

Submitted 24 May, 2021; originally announced May 2021.

Comments: 10 pages, 7 figures

Showing 1–50 of 80 results for author: Mao, D