-
Imitation Learning for Satellite Attitude Control under Unknown Perturbations
Authors:
Zhizhuo Zhang,
Hao Peng,
Xiaoli Bai
Abstract:
This paper presents a novel satellite attitude control framework that integrates Soft Actor-Critic (SAC) reinforcement learning with Generative Adversarial Imitation Learning (GAIL) to achieve robust performance under various unknown perturbations. Traditional control techniques often rely on precise system models and are sensitive to parameter uncertainties and external perturbations. To overcome these limitations, we first develop a SAC-based expert controller that demonstrates improved resilience against actuator failures, sensor noise, and attitude misalignments, outperforming our previous results in several challenging scenarios. We then use GAIL to train a learner policy that imitates the expert's trajectories, thereby reducing training costs and improving generalization through expert demonstrations. Preliminary experiments under single and combined perturbations show that the SAC expert can rotate the antenna to a specified direction and keep the antenna orientation reliably stable in most of the listed perturbations. Additionally, the GAIL learner can imitate most of the features from the trajectories generated by the SAC expert. Comparative evaluations and ablation studies confirm the effectiveness of the SAC algorithm and reward shaping. The integration of GAIL further reduces sample complexity and demonstrates promising imitation capabilities, paving the way for more intelligent and autonomous spacecraft control systems.
Submitted 1 July, 2025;
originally announced July 2025.
-
Learning Multimodal AI Algorithms for Amplifying Limited User Input into High-dimensional Control Space
Authors:
Ali Rabiee,
Sima Ghafoori,
MH Farhadi,
Robert Beyer,
Xiangyu Bai,
David J Lin,
Sarah Ostadabbas,
Reza Abiri
Abstract:
Current invasive assistive technologies are designed to infer high-dimensional motor control signals from severely paralyzed patients. However, they face significant challenges, including public acceptance, limited longevity, and barriers to commercialization. Meanwhile, noninvasive alternatives often rely on artifact-prone signals, require lengthy user training, and struggle to deliver robust high-dimensional control for dexterous tasks. To address these issues, this study introduces a novel human-centered multimodal AI approach as intelligent compensatory mechanisms for lost motor functions that could potentially enable patients with severe paralysis to control high-dimensional assistive devices, such as dexterous robotic arms, using limited and noninvasive inputs. In contrast to the current state-of-the-art (SoTA) noninvasive approaches, our context-aware, multimodal shared-autonomy framework integrates deep reinforcement learning algorithms to blend limited low-dimensional user input with real-time environmental perception, enabling adaptive, dynamic, and intelligent interpretation of human intent for complex dexterous manipulation tasks, such as pick-and-place. The results from our ARAS (Adaptive Reinforcement learning for Amplification of limited inputs in Shared autonomy) trained with synthetic users over 50,000 computer simulation episodes demonstrated the first successful implementation of the proposed closed-loop human-in-the-loop paradigm, outperforming the SoTA shared autonomy algorithms. Following a zero-shot sim-to-real transfer, ARAS was evaluated on 23 human subjects, demonstrating high accuracy in dynamic intent detection and smooth, stable 3D trajectory control for dexterous pick-and-place tasks. ARAS user study achieved a high task success rate of 92.88%, with short completion times comparable to those of SoTA invasive assistive technologies.
Submitted 16 May, 2025;
originally announced May 2025.
-
Tetrahedron-Net for Medical Image Registration
Authors:
Jinhai Xiang,
Shuai Guo,
Qianru Han,
Dantong Shi,
Xinwei He,
Xiang Bai
Abstract:
Medical image registration plays a vital role in medical image processing. Extracting expressive representations for medical images is crucial for improving the registration quality. One common practice to this end is constructing a convolutional backbone to enable interactions with skip connections among feature extraction layers. The de facto structure, U-Net-like networks, has attempted to design skip connections such as nested or full-scale ones to connect one single encoder and one single decoder to improve its representation capacity. Despite being effective, it still does not fully explore interactions within single-encoder, single-decoder architectures. In this paper, we embrace this observation and introduce a simple yet effective alternative strategy to enhance the representations for registration by appending one additional decoder. The new decoder is designed to interact with both the original encoder and decoder. In this way, it not only reuses feature representations from corresponding layers in the encoder but also interacts with the original decoder to cooperatively give more accurate registration results. The new architecture is concise yet generalized, with only one encoder and two decoders forming a "Tetrahedron" structure, thereby dubbed Tetrahedron-Net. Three instantiations of Tetrahedron-Net are further constructed regarding the different structures of the appended decoder. Our extensive experiments show that superior performance can be obtained on several representative benchmarks of medical image registration. Finally, such a "Tetrahedron" design can also be easily integrated into popular U-Net-like architectures including VoxelMorph, ViT-V-Net, and TransMorph, leading to consistent performance gains.
Submitted 7 May, 2025;
originally announced May 2025.
-
Adaptive Inner Speech-Text Alignment for LLM-based Speech Translation
Authors:
Henglyu Liu,
Andong Chen,
Kehai Chen,
Xuefeng Bai,
Meizhi Zhong,
Yuan Qiu,
Min Zhang
Abstract:
Recent advancement of large language models (LLMs) has led to significant breakthroughs across various tasks, laying the foundation for the development of LLM-based speech translation systems. Existing methods primarily focus on aligning inputs and outputs across modalities while overlooking deeper semantic alignment within model representations. To address this limitation, we propose an Adaptive Inner Speech-Text Alignment (AI-STA) method to bridge the modality gap by explicitly aligning speech and text representations at selected layers within LLMs. To achieve this, we leverage the optimal transport (OT) theory to quantify fine-grained representation discrepancies between speech and text. Furthermore, we utilize the cross-modal retrieval technique to identify the layers that are best suited for alignment and perform joint training on these layers. Experimental results on speech translation (ST) tasks demonstrate that AI-STA significantly improves the translation performance of large speech-text models (LSMs), outperforming previous state-of-the-art approaches. Our findings highlight the importance of inner-layer speech-text alignment in LLMs and provide new insights into enhancing cross-modal learning.
Submitted 13 March, 2025;
originally announced March 2025.
-
Tumor Detection, Segmentation and Classification Challenge on Automated 3D Breast Ultrasound: The TDSC-ABUS Challenge
Authors:
Gongning Luo,
Mingwang Xu,
Hongyu Chen,
Xinjie Liang,
Xing Tao,
Dong Ni,
Hyunsu Jeong,
Chulhong Kim,
Raphael Stock,
Michael Baumgartner,
Yannick Kirchhoff,
Maximilian Rokuss,
Klaus Maier-Hein,
Zhikai Yang,
Tianyu Fan,
Nicolas Boutry,
Dmitry Tereshchenko,
Arthur Moine,
Maximilien Charmetant,
Jan Sauer,
Hao Du,
Xiang-Hui Bai,
Vipul Pai Raikar,
Ricardo Montoya-del-Angel,
Robert Marti
, et al. (12 additional authors not shown)
Abstract:
Breast cancer is one of the most common causes of death among women worldwide. Early detection helps in reducing the number of deaths. Automated 3D Breast Ultrasound (ABUS) is a newer approach for breast screening, which has many advantages over handheld mammography such as safety, speed, and higher detection rate of breast cancer. Tumor detection, segmentation, and classification are key components in the analysis of medical images, especially challenging in the context of 3D ABUS due to the significant variability in tumor size and shape, unclear tumor boundaries, and a low signal-to-noise ratio. The lack of publicly accessible, well-labeled ABUS datasets further hinders the advancement of systems for breast tumor analysis. Addressing this gap, we have organized the inaugural Tumor Detection, Segmentation, and Classification Challenge on Automated 3D Breast Ultrasound 2023 (TDSC-ABUS2023). This initiative aims to spearhead research in this field and create a definitive benchmark for tasks associated with 3D ABUS image analysis. In this paper, we summarize the top-performing algorithms from the challenge and provide critical analysis for ABUS image examination. We offer the TDSC-ABUS challenge as an open-access platform at https://tdsc-abus2023.grand-challenge.org/ to benchmark and inspire future developments in algorithmic research.
Submitted 26 January, 2025;
originally announced January 2025.
-
Optimizing Prompt Strategies for SAM: Advancing Lesion Segmentation Across Diverse Medical Imaging Modalities
Authors:
Yuli Wang,
Victoria Shi,
Wen-Chi Hsu,
Yuwei Dai,
Sophie Yao,
Zhusi Zhong,
Zishu Zhang,
Jing Wu,
Aaron Maxwell,
Scott Collins,
Zhicheng Jiao,
Harrison X. Bai
Abstract:
Purpose: To evaluate various Segment Anything Model (SAM) prompt strategies across four lesion datasets and to subsequently develop a reinforcement learning (RL) agent to optimize SAM prompt placement. Materials and Methods: This retrospective study included patients from four independent datasets of ovarian, lung, renal, and breast tumors. Manual segmentation and SAM-assisted segmentation were performed for all lesions. An RL model was developed to predict and select SAM points to maximize segmentation performance. Statistical analysis of segmentation was conducted using pairwise t-tests. Results: Increasing the number of prompt points significantly improved segmentation accuracy, with Dice coefficients rising from 0.272 for a single point to 0.806 for five or more points in ovarian tumors. The prompt location also influenced performance, with surface and union-based prompts outperforming center-based prompts, achieving mean Dice coefficients of 0.604 and 0.724 for ovarian and breast tumors, respectively. The RL agent achieved a peak Dice coefficient of 0.595 for ovarian tumors, outperforming random and alternative RL strategies. Additionally, it significantly reduced segmentation time, achieving a nearly 10-fold improvement compared to manual methods using SAM. Conclusion: While increased SAM prompts and non-centered prompts generally improved segmentation accuracy, each pathology and modality has specific optimal thresholds and placement strategies. Our RL agent achieved superior performance compared to other agents while achieving a significant reduction in segmentation time.
Submitted 28 December, 2024; v1 submitted 23 December, 2024;
originally announced December 2024.
-
Space-Time-Modulated Wideband Radiation-Type Programmable Metasurface for Low Sidelobe Beamforming
Authors:
Xudong Bai,
Longpan Wang,
Yuhua Chen,
Xilong Lu,
Fuli Zhang,
Jingfeng Chen,
Wen Chen,
He-Xiu Xu
Abstract:
Programmable metasurfaces hold great promise for constructing low-cost phased array systems due to their capability of elaborate modulation over electromagnetic (EM) waves. However, they operate in either reflective or transmissive mode, and usually possess a relatively high profile as a result of the external feed source. Besides, it is difficult to realize multibit phase shifts in metasurfaces, compared with conventional phased arrays. Here, we propose a strategy of space-time-modulated wideband radiation-type programmable metasurface for low side-lobe beamforming. The wideband programmable metasurface avoids the space-feed external source required by its traditional counterpart, thus achieving a significant reduction of profile through integration of a high-efficiency microwave-fed excitation network and metasurface. Furthermore, by introducing a space-time-modulated strategy, the high-accuracy amplitude-phase weight algorithm can also be synchronously carried out on the first harmonic component for low side-lobe beam-scanning. Most importantly, adaptive beamforming and generation of interference nulls can further be achieved after analyzing the harmonic component characteristics of received signals.
Submitted 6 December, 2024;
originally announced December 2024.
-
Real-time Vehicle-to-Vehicle Communication Based Network Cooperative Control System through Distributed Database and Multimodal Perception: Demonstrated in Crossroads
Authors:
Xinwen Zhu,
Zihao Li,
Yuxuan Jiang,
Jiazhen Xu,
Jie Wang,
Xuyang Bai
Abstract:
The autonomous driving industry is rapidly advancing, with Vehicle-to-Vehicle (V2V) communication systems emerging as a key component of enhanced road safety and traffic efficiency. This paper introduces a novel Real-time Vehicle-to-Vehicle Communication Based Network Cooperative Control System (VVCCS), designed to revolutionize macro-scope traffic planning and collision avoidance in autonomous driving. Implemented on the Quanser Car (Qcar) hardware platform, our system integrates distributed databases into individual autonomous vehicles and an optional central server. We also developed a comprehensive multi-modal perception system with multi-objective tracking and radar sensing. Through a demonstration within a physical crossroad environment, our system showcases its potential to be applied in congested and complex urban environments.
Submitted 23 October, 2024;
originally announced October 2024.
-
Joint Optimization of Data- and Model-Driven Probing Beams and Beam Predictor
Authors:
Tianheng Lu,
Fan Meng,
Zhilei Zhang,
Yongming Huang,
Cheng Zhang,
Xiaoyu Bai
Abstract:
Hierarchical search in millimeter-wave (mmWave) communications incurs significant beam training overhead and delay, especially in a dynamic environment. Deep learning-enabled beam prediction is promising to significantly mitigate the overhead and delay, efficiently utilizing the site-specific channel prior. In this work, we propose to jointly optimize a data- and model-driven probe beam module and a cascaded data-driven beam predictor, subject to the constraint that the probe and communication beams are restricted to the manifold space of a uniform planar array and the quantization of the phase modulator. First, the probe beam module senses the mmWave channel with a complex-valued neural network and outputs the counterpart RSRPs of probe beams. Second, the beam predictor estimates the RSRPs in the entire beamspace to minimize the prediction cross entropy and selects the optimal beam with the maximum RSRP value for data transmission. Additionally, we propose to add noise to the phase variables in the probe beam module to improve robustness against quantization error. Simulation results show the effectiveness of our proposed scheme.
Submitted 26 September, 2024;
originally announced September 2024.
-
HSIGene: A Foundation Model For Hyperspectral Image Generation
Authors:
Li Pang,
Xiangyong Cao,
Datao Tang,
Shuang Xu,
Xueru Bai,
Feng Zhou,
Deyu Meng
Abstract:
Hyperspectral image (HSI) plays a vital role in various fields such as agriculture and environmental monitoring. However, due to the expensive acquisition cost, the number of hyperspectral images is limited, degenerating the performance of downstream tasks. Although some recent studies have attempted to employ diffusion models to synthesize HSIs, they still struggle with the scarcity of HSIs, affecting the reliability and diversity of the generated images. Some studies propose to incorporate multi-modal data to enhance spatial diversity, but the spectral fidelity cannot be ensured. In addition, existing HSI synthesis models are typically uncontrollable or only support single-condition control, limiting their ability to generate accurate and reliable HSIs. To alleviate these issues, we propose HSIGene, a novel HSI generation foundation model which is based on latent diffusion and supports multi-condition control, allowing for more precise and reliable HSI generation. To enhance the spatial diversity of the training data while preserving spectral fidelity, we propose a new data augmentation method based on spatial super-resolution, in which HSIs are upscaled first, and thus abundant training patches could be obtained by cropping the high-resolution HSIs. In addition, to improve the perceptual quality of the augmented data, we introduce a novel two-stage HSI super-resolution framework, which first applies RGB bands super-resolution and then utilizes our proposed Rectangular Guided Attention Network (RGAN) for guided HSI super-resolution. Experiments demonstrate that the proposed model is capable of generating a vast quantity of realistic HSIs for downstream tasks such as denoising and super-resolution. The code and models are available at https://github.com/LiPang/HSIGene.
Submitted 1 November, 2024; v1 submitted 19 September, 2024;
originally announced September 2024.
-
Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm
Authors:
Yuning Wu,
Jiatong Shi,
Yifeng Yu,
Yuxun Tang,
Tao Qian,
Yueqian Lin,
Jionghao Han,
Xinyi Bai,
Shinji Watanabe,
Qin Jin
Abstract:
This research presents Muskits-ESPnet, a versatile toolkit that introduces new paradigms to Singing Voice Synthesis (SVS) through the application of pretrained audio models in both continuous and discrete approaches. Specifically, we explore discrete representations derived from SSL models and audio codecs and offer significant advantages in versatility and intelligence, supporting multi-format inputs and adaptable data processing workflows for various SVS models. The toolkit features automatic music score error detection and correction, as well as a perception auto-evaluation module to imitate human subjective evaluation scores. Muskits-ESPnet is available at https://github.com/espnet/espnet.
Submitted 10 October, 2024; v1 submitted 11 September, 2024;
originally announced September 2024.
-
Transmissive RIS Enabled Transceiver Systems: Architecture, Design Issues and Opportunities
Authors:
Zhendong Li,
Wen Chen,
Qingqing Wu,
Ziwei Liu,
Chong He,
Xudong Bai,
Jun Li
Abstract:
Reconfigurable intelligent surface (RIS) is anticipated to augment the performance of beyond fifth-generation (B5G) and sixth-generation (6G) networks by intelligently manipulating the state of its components. Rather than employing reflective RIS for aided communications, this paper proposes an innovative transmissive RIS-enabled transceiver (TRTC) architecture that can accomplish the functions of traditional multi-antenna systems in a cost-effective and energy-efficient manner. First, the proposed network architecture and its corresponding transmission scheme are elaborated from the perspectives of downlink (DL) and uplink (UL) transmissions. Then, we illustrate several significant advantages and differences of TRTC compared to other multi-antenna systems. Furthermore, the downlink modulation and extraction principle based on time-modulation array (TMA) is introduced in detail to tackle multi-stream communications. Moreover, a near-far field channel model appropriate for this architecture is proposed. Based on the channel model, we summarize some state-of-the-art channel estimation schemes, and the channel estimation scheme of TRTC is also provided. Considering the optimization for DL and UL communications, we present numerical simulations that confirm the superiority of the proposed optimization algorithm. Lastly, numerous prospective research avenues for TRTC systems are delineated to inspire further exploration.
Submitted 24 August, 2024;
originally announced August 2024.
-
Stochastic Real-Time Economic Dispatch for Integrated Electric and Gas Systems Considering Uncertainty Propagation and Pipeline Leakage
Authors:
eiyao Zhao,
Zhengshuo Li,
Jiahui Zhang,
Xiang Bai,
Jia Su
Abstract:
Gas-fired units (GFUs) with rapid regulation capabilities are considered an effective tool to mitigate fluctuations in the generation of renewable energy sources and have coupled electricity power systems (EPSs) and natural gas systems (NGSs) more tightly. However, this tight coupling leads to uncertainty propagation, a challenge for the real-time dispatch of such integrated electric and gas systems (IEGSs). Moreover, pipeline leakage failures in the NGS may threaten the electricity supply reliability of the EPS through GFUs. To address these problems, this paper first establishes an operational model considering gas pipeline dynamic characteristics under uncertain leakage failures for the NGS and then presents a stochastic IEGS real-time economic dispatch (RTED) model considering both uncertainty propagation and pipeline leakage uncertainty. To quickly solve this complicated large-scale stochastic optimization problem, a novel notion of the coupling boundary dynamic adjustment region considering pipeline leakage failure (LCBDAR) is proposed to characterize the dynamic characteristics of the NGS boundary connecting GFUs. Based on the LCBDAR, a noniterative decentralized solution is proposed to decompose the original stochastic RTED model into two subproblems that are solved separately by the EPS and NGS operators, thus preserving their data privacy. In particular, only one-time data interaction from the NGS to the EPS is required. Case studies on several IEGSs at different scales demonstrate the effectiveness of the proposed method.
Submitted 15 August, 2024;
originally announced August 2024.
-
SAR to Optical Image Translation with Color Supervised Diffusion Model
Authors:
Xinyu Bai,
Feng Xu
Abstract:
Synthetic Aperture Radar (SAR) offers all-weather, high-resolution imaging capabilities, but its complex imaging mechanism often poses challenges for interpretation. In response to these limitations, this paper introduces an innovative generative model designed to transform SAR images into more intelligible optical images, thereby enhancing the interpretability of SAR images. Specifically, our model backbone is based on recent diffusion models, which have powerful generative capabilities. We employ SAR images as conditional guides in the sampling process and integrate color supervision to counteract color shift issues effectively. We conducted experiments on the SEN12 dataset and employed quantitative evaluations using peak signal-to-noise ratio, structural similarity, and Fréchet inception distance. The results demonstrate that our model not only surpasses previous methods in quantitative assessments but also significantly enhances the visual quality of the generated images.
Submitted 23 July, 2024;
originally announced July 2024.
-
Accelerating Diffusion for SAR-to-Optical Image Translation via Adversarial Consistency Distillation
Authors:
Xinyu Bai,
Feng Xu
Abstract:
Synthetic Aperture Radar (SAR) provides all-weather, high-resolution imaging capabilities, but its unique imaging mechanism often requires expert interpretation, limiting its widespread applicability. Translating SAR images into more easily recognizable optical images using diffusion models helps address this challenge. However, diffusion models suffer from high latency due to numerous iterative inferences, while Generative Adversarial Networks (GANs) can achieve image translation with just a single iteration but often at the cost of image quality. To overcome these issues, we propose a new training framework for SAR-to-optical image translation that combines the strengths of both approaches. Our method employs consistency distillation to reduce iterative inference steps and integrates adversarial learning to ensure image clarity and minimize color shifts. Additionally, our approach allows for a trade-off between quality and speed, providing flexibility based on application requirements. We conducted experiments on SEN12 and GF3 datasets, performing quantitative evaluations using Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Frechet Inception Distance (FID), as well as calculating the inference latency. The results demonstrate that our approach significantly improves inference speed by 131 times while maintaining the visual quality of the generated images, thus offering a robust and efficient solution for SAR-to-optical image translation.
Submitted 8 July, 2024;
originally announced July 2024.
-
Unraveling Radiomics Complexity: Strategies for Optimal Simplicity in Predictive Modeling
Authors:
Mahdi Ait Lhaj Loutfi,
Teodora Boblea Podasca,
Alex Zwanenburg,
Taman Upadhaya,
Jorge Barrios,
David R. Raleigh,
William C. Chen,
Dante P. I. Capaldi,
Hong Zheng,
Olivier Gevaert,
Jing Wu,
Alvin C. Silva,
Paul J. Zhang,
Harrison X. Bai,
Jan Seuntjens,
Steffen Löck,
Patrick O. Richard,
Olivier Morin,
Caroline Reinhold,
Martin Lepage,
Martin Vallières
Abstract:
Background: The high dimensionality of radiomic feature sets, the variability in radiomic feature types and potentially high computational requirements all underscore the need for an effective method to identify the smallest set of predictive features for a given clinical problem. Purpose: Develop a methodology and tools to identify and explain the smallest set of predictive radiomic features. Materials and Methods: 89,714 radiomic features were extracted from five cancer datasets: low-grade glioma, meningioma, non-small cell lung cancer (NSCLC), and two renal cell carcinoma cohorts (n=2104). Features were categorized by computational complexity into morphological, intensity, texture, linear filters, and nonlinear filters. Models were trained and evaluated on each complexity level using the area under the curve (AUC). The most informative features were identified, and their importance was explained. The optimal complexity level and associated most informative features were identified using systematic statistical significance analyses and a false discovery avoidance procedure, respectively. Their predictive importance was explained using a novel tree-based method. Results: MEDimage, a new open-source tool, was developed to facilitate radiomic studies. Morphological features were optimal for MRI-based meningioma (AUC: 0.65) and low-grade glioma (AUC: 0.68). Intensity features were optimal for CECT-based renal cell carcinoma (AUC: 0.82) and CT-based NSCLC (AUC: 0.76). Texture features were optimal for MRI-based renal cell carcinoma (AUC: 0.72). Tuning the Hounsfield unit range improved results for CECT-based renal cell carcinoma (AUC: 0.86). Conclusion: Our proposed methodology and software can estimate the optimal radiomics complexity level for specific medical outcomes, potentially simplifying the use of radiomics in predictive modeling across various contexts.
Submitted 5 July, 2024;
originally announced July 2024.
-
Research on OPF control of three-phase four-wire low-voltage distribution network considering uncertainty
Authors:
Rui Wang,
Xiaoqing Bai,
Shengquan Huang,
Shoupu Wei
Abstract:
As power systems become more complex and uncertain, low-voltage distribution networks face numerous challenges, including three-phase imbalances caused by asymmetrical loads and distributed energy resources. To address these issues, we propose a robust stochastic optimization (RSO) based optimal power flow (OPF) control method for three-phase, four-wire low-voltage distribution networks that accounts for uncertainty. Using historical data and deep learning classification methods, the proposed method simulates optimal system behaviour without requiring communication infrastructure. The simulation results verify that the proposed method effectively controls the voltage and current amplitude while minimizing the operational cost and three-phase imbalance within acceptable limits. The proposed method shows promise for managing uncertainties and optimizing performance in low-voltage distribution networks.
Submitted 23 April, 2024;
originally announced April 2024.
-
Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing
Authors:
Jiatong Shi,
Yueqian Lin,
Xinyi Bai,
Keyi Zhang,
Yuning Wu,
Yuxun Tang,
Yifeng Yu,
Qin Jin,
Shinji Watanabe
Abstract:
In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes a unique strategy to address the data scarcity in SVS. We employ an existing singing voice synthesizer for data augmentation, complemented by detailed manual tuning, an approach not previously explored in data curation, to reduce instances of unnatural voice synthesis. This innovative method has led to the creation of two expansive singing voice datasets, ACE-Opencpop and ACE-KiSing, which are instrumental for large-scale, multi-singer voice synthesis. Through thorough experimentation, we establish that these datasets not only serve as new benchmarks for SVS but also enhance SVS performance on other singing voice datasets when used as supplementary resources. The corpora, pre-trained models, and their related training recipes are publicly available at ESPnet-Muskits (https://github.com/espnet/espnet).
Submitted 12 June, 2024; v1 submitted 31 January, 2024;
originally announced January 2024.
-
Entropy-based Probing Beam Selection and Beam Prediction via Deep Learning
Authors:
Fan Meng,
Cheng Zhang,
Yongming Huang,
Zhilei Zhang,
Xiaoyu Bai,
Zhaohua Lu
Abstract:
Hierarchical beam search in mmWave communications incurs substantial training overhead, necessitating deep learning-enabled beam predictions to effectively leverage channel priors and mitigate this overhead. In this study, we introduce a comprehensive probabilistic model of power distribution in beamspace, and formulate the joint optimization problem of probing beam selection and probabilistic beam prediction as an entropy minimization problem. Then, we propose a greedy scheme to iteratively and alternately solve this problem, where a transformer-based beam predictor is trained to estimate the conditional power distribution based on the probing beams and user location within each iteration, and the trained predictor selects an unmeasured beam that minimizes the entropy of remaining beams. To further reduce the number of interactions and the computational complexity of the iterative scheme, we propose a two-stage probing beam selection scheme. Firstly, probing beams are selected from a location-specific codebook designed by an entropy-based criterion, and predictions are made with corresponding feedback. Secondly, the optimal beam is identified using additional probing beams with the highest predicted power values. Simulation results demonstrate the superiority of the proposed schemes compared to hierarchical beam search and beam prediction with uniform probing beams.
Submitted 3 January, 2024;
originally announced January 2024.
-
SAMIHS: Adaptation of Segment Anything Model for Intracranial Hemorrhage Segmentation
Authors:
Yinuo Wang,
Kai Chen,
Weimin Yuan,
Cai Meng,
XiangZhi Bai
Abstract:
Segment Anything Model (SAM), a vision foundation model trained on large-scale annotations, has recently attracted growing attention in medical image segmentation. Despite its impressive capabilities on natural scenes, SAM suffers a performance decline when confronted with medical images, especially those involving blurry boundaries and highly irregular regions of low contrast. In this paper, a SAM-based parameter-efficient fine-tuning method, called SAMIHS, is proposed for intracranial hemorrhage segmentation, which is a crucial and challenging step in stroke diagnosis and surgical planning. Unlike SAM and previous SAM-based methods, SAMIHS incorporates parameter-refactoring adapters into SAM's image encoder and considers the efficient and flexible utilization of adapters' parameters. Additionally, we employ a combo loss that combines binary cross-entropy loss and boundary-sensitive loss to enhance SAMIHS's ability to recognize the boundary regions. Our experimental results on two public datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/mileswyn/SAMIHS.
Submitted 14 November, 2023;
originally announced November 2023.
-
Classification-Aided Robust Multiple Target Tracking Using Neural Enhanced Message Passing
Authors:
Xianglong Bai,
Zengfu Wang,
Quan Pan,
Tao Yun,
Hua Lan
Abstract:
We address the challenge of tracking an unknown number of targets in strong clutter environments using measurements from a radar sensor. Leveraging the range-Doppler spectra information, we identify the measurement classes, which serve as additional information to enhance clutter rejection and data association, thus bolstering the robustness of target tracking. We first introduce a novel neural enhanced message passing approach, where the beliefs obtained by the unified message passing are fed into the neural network as additional information. The output beliefs are then utilized to refine the original beliefs. Then, we propose a classification-aided robust multiple target tracking algorithm, employing the neural enhanced message passing technique. This algorithm is comprised of three modules: a message-passing module, a neural network module, and a Dempster-Shafer module. The message-passing module is used to represent the statistical model by the factor graph and infers target kinematic states, visibility states, and data associations based on the spatial measurement information. The neural network module is employed to extract features from range-Doppler spectra and derive beliefs on whether a measurement is target-generated or clutter-generated. The Dempster-Shafer module is used to fuse the beliefs obtained from both the factor graph and the neural network. As a result, our proposed algorithm adopts a model-and-data-driven framework, effectively enhancing clutter suppression and data association, leading to significant improvements in multiple target tracking performance. We validate the effectiveness of our approach using both simulated and real data scenarios, demonstrating its capability to handle challenging tracking scenarios in practical radar applications.
Submitted 18 October, 2023;
originally announced October 2023.
-
Weakly Supervised YOLO Network for Surgical Instrument Localization in Endoscopic Videos
Authors:
Rongfeng Wei,
Jinlin Wu,
Xuexue Bai,
Ming Feng,
Zhen Lei,
Hongbin Liu,
Zhen Chen
Abstract:
In minimally invasive surgery, surgical instrument localization is a crucial task for endoscopic videos, which enables various applications for improving surgical outcomes. However, annotating the instrument localization in endoscopic videos is tedious and labor-intensive. In contrast, obtaining the category information is easy and efficient in real-world applications. To fully utilize the category information and address the localization problem, we propose a weakly supervised localization framework named WS-YOLO for surgical instruments. By leveraging the instrument category information as the weak supervision, our WS-YOLO framework adopts an unsupervised multi-round training strategy to train its localization capability. We validate our WS-YOLO framework on the Endoscopic Vision Challenge 2023 dataset, where it achieves remarkable performance in weakly supervised surgical instrument localization. The source code is available at https://github.com/Breezewrf/WS-YOLO.
Submitted 20 June, 2024; v1 submitted 23 September, 2023;
originally announced September 2023.
-
Hierarchical Uncertainty Estimation for Medical Image Segmentation Networks
Authors:
Xinyu Bai,
Wenjia Bai
Abstract:
Learning a medical image segmentation model is an inherently ambiguous task, as uncertainties exist in both images (noise) and manual annotations (human errors and bias) used for model training. To build a trustworthy image segmentation model, it is important to not just evaluate its performance but also estimate the uncertainty of the model prediction. Most state-of-the-art image segmentation networks adopt a hierarchical encoder architecture, extracting image features at multiple resolution levels from fine to coarse. In this work, we leverage this hierarchical image representation and propose a simple yet effective method for estimating uncertainties at multiple levels. The multi-level uncertainties are modelled via the skip-connection module and then sampled to generate an uncertainty map for the predicted image segmentation. We demonstrate that a deep learning segmentation network such as U-net, when implemented with such a hierarchical uncertainty estimation module, can achieve high segmentation performance while at the same time providing meaningful uncertainty maps that can be used for out-of-distribution detection.
Submitted 16 August, 2023;
originally announced August 2023.
-
Liver Tumor Screening and Diagnosis in CT with Pixel-Lesion-Patient Network
Authors:
Ke Yan,
Xiaoli Yin,
Yingda Xia,
Fakai Wang,
Shu Wang,
Yuan Gao,
Jiawen Yao,
Chunli Li,
Xiaoyu Bai,
Jingren Zhou,
Ling Zhang,
Le Lu,
Yu Shi
Abstract:
Liver tumor segmentation and classification are important tasks in computer-aided diagnosis. We aim to address three problems: liver tumor screening and preliminary diagnosis in non-contrast computed tomography (CT), and differential diagnosis in dynamic contrast-enhanced CT. A novel framework named Pixel-Lesion-pAtient Network (PLAN) is proposed. It uses a mask transformer to jointly segment and classify each lesion with improved anchor queries and a foreground-enhanced sampling loss. It also has an image-wise classifier to effectively aggregate global information and predict patient-level diagnosis. A large-scale multi-phase dataset is collected containing 939 tumor patients and 810 normal subjects. 4010 tumor instances of eight types are extensively annotated. On the non-contrast tumor screening task, PLAN achieves 95% and 96% in patient-level sensitivity and specificity. On contrast-enhanced CT, our lesion-level detection precision, recall, and classification accuracy are 92%, 89%, and 86%, outperforming widely used CNNs and transformers for lesion segmentation. We also conduct a reader study on a holdout set of 250 cases. PLAN is on par with a senior human radiologist, showing the clinical significance of our results.
Submitted 21 October, 2023; v1 submitted 17 July, 2023;
originally announced July 2023.
-
Combinatorial-restless-bandit-based Transmitter-Receiver Online Selection for Distributed MIMO Radars With Non-Stationary Channels
Authors:
Yuhang Hao,
Zengfu Wang,
Jing Fu,
Xianglong Bai,
Can Li,
Quan Pan
Abstract:
We track moving targets with a distributed multiple-input multiple-output (MIMO) radar, for which the transmitters and receivers are appropriately paired and selected with a limited number of radar stations. We aim to maximize the sum of the signal-to-interference-plus-noise ratios (SINRs) of all the targets by sensibly selecting the transmitter-receiver pairs during the tracking period. The key is to model the optimization problem of selecting the transmitter-receiver pairs by a restless multi-armed bandit (RMAB) model that can capture the time-varying signals of the transceiver channels whether or not the channels are being probed. We regard the estimated mean reward (i.e., SINR) as the state of an arm. If an arm is probed, the estimated mean reward of the arm is the weighted sum of the observed reward and the predicted mean reward; otherwise, it is the predicted mean reward. We associate the predicted mean reward with the estimated mean reward at the previous time slot and the state of the target, which is estimated via the interacting multiple model-unscented Kalman filter (IMM-UKF). The optimized selection of transmitter-receiver pairs at each time is accomplished by using Binary Particle Swarm Optimization (BPSO) based on indexes of arms, each of which is designed by the upper confidence bound (UCB1) algorithm. Overall, a multi-group combinatorial-restless-bandit technique, namely MG-CRB-CL, that takes into account different combinations of transmitters and receivers and the closed-loop scheme between transmitter-receiver pair selection and target state estimation is developed to achieve a near-optimal selection strategy and improve multi-target tracking performance. Simulation results for different scenarios are provided to verify the effectiveness and superior performance of our MG-CRB-CL algorithm.
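The arm-state update and the UCB1-based index mentioned in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `weight` parameter, and the textbook UCB1 bonus sqrt(2 ln t / n) are hypothetical choices and are not taken from the paper, which additionally uses IMM-UKF prediction and BPSO for the pair selection.

```python
import math


def update_estimate(predicted, observed=None, weight=0.5):
    """Estimated mean reward (SINR) of one arm.

    Probed arm: weighted sum of the observed and predicted rewards.
    Unprobed arm: the predicted reward is used unchanged.
    `weight` is a hypothetical tuning parameter, not a value from the paper.
    """
    if observed is None:
        return predicted
    return weight * observed + (1.0 - weight) * predicted


def ucb1_index(mean_reward, pulls, total_pulls):
    """Textbook UCB1 index: estimated mean plus an exploration bonus."""
    if pulls == 0:
        return math.inf  # arms that were never probed are tried first
    return mean_reward + math.sqrt(2.0 * math.log(total_pulls) / pulls)
```

An arm that has been probed rarely receives a larger bonus, so it is more likely to be selected even when its estimated SINR is modest.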
Submitted 16 June, 2023;
originally announced June 2023.
-
SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
Authors:
Dingyuan Zhang,
Dingkang Liang,
Hongcheng Yang,
Zhikang Zou,
Xiaoqing Ye,
Zhe Liu,
Xiang Bai
Abstract:
With the development of large language models, many remarkable linguistic systems like ChatGPT have thrived and achieved astonishing success on many tasks, showing the incredible power of foundation models. In the spirit of unleashing the capability of foundation models on vision tasks, the Segment Anything Model (SAM), a vision foundation model for image segmentation, has been proposed recently and presents strong zero-shot ability on many downstream 2D tasks. However, whether SAM can be adapted to 3D vision tasks has yet to be explored, especially 3D object detection. With this inspiration, we explore adapting the zero-shot ability of SAM to 3D object detection in this paper. We propose a SAM-powered BEV processing pipeline to detect objects and get promising results on the large-scale Waymo open dataset. As an early attempt, our method takes a step toward 3D object detection with vision foundation models and presents the opportunity to unleash their power on 3D vision tasks. The code is released at https://github.com/DYZhang09/SAM3D.
Submitted 29 January, 2024; v1 submitted 3 June, 2023;
originally announced June 2023.
-
Infrared Image Deturbulence Restoration Using Degradation Parameter-Assisted Wide & Deep Learning
Authors:
Yi Lu,
Yadong Wang,
Xingbo Jiang,
Xiangzhi Bai
Abstract:
Infrared images captured under turbulent conditions are degraded by complex geometric distortions and blur. We address infrared deturbulence as an image restoration task, proposing DparNet, a parameter-assisted multi-frame network with a wide & deep architecture. DparNet learns a degradation prior (key parameter matrix) directly from degraded images without external knowledge. Its wide & deep architecture uses these learned parameters to directly modulate restoration, achieving spatially and intensity adaptive results. Evaluated on dedicated infrared deturbulence (49,744 images) and visible image denoising (109,536 images) datasets, DparNet significantly outperforms State-of-the-Art (SOTA) methods in restoration performance and efficiency. Notably, leveraging these parameters improves PSNR by 0.6-1.1 dB with less than 2% increase in model parameters and computational complexity. Our work demonstrates that degraded images hide key degradation information that can be learned and utilized to boost adaptive image restoration.
Submitted 6 May, 2025; v1 submitted 29 May, 2023;
originally announced May 2023.
-
Robust Multitarget Tracking in Interference Environments: A Message-Passing Approach
Authors:
Xianglong Bai,
Hua Lan,
Zengfu Wang,
Quan Pan,
Yuhang Hao,
Can Li
Abstract:
Multitarget tracking in interference environments suffers from nonuniform, unknown, and time-varying clutter, resulting in dramatic performance deterioration. We address this challenge by proposing a robust multitarget tracking algorithm, which estimates the states of clutter and targets simultaneously by the message-passing (MP) approach. We define the non-homogeneous clutter with a finite mixture model containing a uniform component and multiple nonuniform components. The measured signal strength is utilized to estimate the mean signal-to-noise ratio (SNR) of targets and the mean clutter-to-noise ratio (CNR) of clutter, which are then used as additional feature information of targets and clutter to improve the discrimination of targets from clutter. We also present a hybrid data association which can reason over correspondence between targets, clutter, and measurements. Then, a unified MP algorithm is used to infer the marginal posterior probability distributions of targets, clutter, and data association by splitting the joint probability distribution into a mean-field approximate part and a belief propagation part. As a result, a closed-loop iterative optimization of the posterior probability distribution can be obtained, which can effectively deal with the coupling between target tracking, clutter estimation and data association. Simulation results demonstrate the performance superiority and robustness of the proposed multitarget tracking algorithm compared with the probability hypothesis density (PHD) filter and the cardinalized PHD (CPHD) filter.
Submitted 14 December, 2022;
originally announced December 2022.
-
Fast Quasi-Optimal Power Flow of Flexible DC Traction Power Systems
Authors:
Zhanhe Li,
Xiaoqian Li,
Yingdong Wei,
Chao Lu,
Xuelian Bai
Abstract:
This paper proposes a quasi-optimal power flow (quasi-OPF) algorithm for flexible DC traction power systems (TPSs). The proposed quasi-OPF obtains near-optimal solutions with high computational efficiency. Unlike conventional optimal power flow (OPF) utilizing mathematical optimization algorithms, the proposed quasi-OPF adopts analytical mapping from load information to near-optimal solutions, hence considerably accelerating the computation. First, we study conventional OPF using a new modeling method and successfully interpret its mechanism and physical meaning in flexible DC TPSs. Then, inspired by this mechanism, we obtain the analytical mapping from load information to near-optimal solutions and design the quasi-OPF algorithm based on the mapping. Since the mapping is based on simple arithmetic, the quasi-OPF algorithm can solve OPF with much less execution time, achieving subsecond level calculation and a speed-up of 57 times compared to conventional OPF. The effectiveness is verified by mathematical proofs and a case study with Beijing Metro Line 13. The quasi-OPF provides an insight into the mechanism and physical meaning of OPF, and is a powerful tool for flexible DC TPSs to analyze the effects of coordinated control, design real-time control strategies, and solve operational problems in planning.
Submitted 5 November, 2022;
originally announced November 2022.
-
A Holistic Robust Motion Controller Framework for Autonomous Platooning
Authors:
Hong Wang,
Li-Ming Peng,
Zi-Chun Wei,
Kai Yang,
Xian-Xu Bai,
Luo Jiang,
Ehsan Hashemi
Abstract:
Safety is the foremost concern for autonomous platooning. Vehicle-to-vehicle (V2V) communication delay and the sudden appearance of obstacles can trigger safety of the intended functionality (SOTIF) issues for autonomous platooning. This research proposes a holistic robust motion controller framework (MCF) for an intelligent and connected vehicle platoon system. The MCF utilizes a hierarchical structure to resolve the longitudinal string stability and the lateral control problem under the complex driving environment and time-varying communication delay. First, an H-infinity feedback controller is developed to ensure the robustness of the platoon under time-varying communication delay in the upper-level coordination layer (UCL). The output from the UCL is delivered to the lower-level motion-planning layer (LML) as reference signals. Second, the model predictive control (MPC) algorithm is implemented in the LML to achieve multi-objective control, which comprehensively considers the reference signals, the artificial potential field, and multiple vehicle dynamics constraints. Furthermore, three critical scenarios are co-simulated for case studies, including platooning under time-varying communication delay, merging, and obstacle avoidance scenarios. The simulation results indicate that, compared with single-structure MPC, the proposed MCF offers better suppression of position error propagation and improves the maximum position error in the three scenarios by 19.2%, 59.8%, and 15.3%, respectively. Finally, the practicability and effectiveness of the proposed MCF are verified via a hardware-in-the-loop experiment. The average execution time of the proposed method on a Speedgoat real-time target machine is 1.1 milliseconds, which meets the real-time requirements.
Submitted 10 June, 2022;
originally announced June 2022.
-
Multi-Antenna Systems by Transmissive Reconfigurable Meta-Surface
Authors:
Zhendong Li,
Wen Chen,
Chong He,
Xudong Bai,
Jianmin Lu
Abstract:
Reconfigurable meta-surface (RMS) is proposed as a very promising and novel technology, which is composed of a large number of low-cost passive elements, and can achieve passive beamforming by controlling the amplitude and phase of incident electromagnetic (EM) waves. Therefore, in order to solve the challenges of high power consumption and high cost of existing base stations (BSs), we propose a low-cost and low-power consumption transmissive RMS multi-antenna system in this paper. Specifically, we first provide an overview of the transmissive RMS multi-antenna system, including its advantages, network architecture, transmission mechanism, modulation principle, channel model and channel estimation technique. Then, we address transceiver design and optimization for downlink (DL) and uplink (UL), and some numerical results are also given to verify the effectiveness of the proposed algorithm. Finally, several potential research directions of the transmissive RMS multi-antenna system are given to inspire further investigation in future work.
Submitted 20 February, 2022; v1 submitted 12 September, 2021;
originally announced September 2021.
-
An Optimal Resource Allocator of Elastic Training for Deep Learning Jobs on Cloud
Authors:
Liang Hu,
Jiangcheng Zhu,
Zirui Zhou,
Ruiqing Cheng,
Xiaolong Bai,
Yong Zhang
Abstract:
Cloud training platforms, such as Amazon Web Services and Huawei Cloud, provide users with computational resources to train their deep learning jobs. Elastic training is a service embedded in cloud training platforms that dynamically scales up or down the resources allocated to a job. The core technique of an elastic training system is to best allocate limited resources among heterogeneous jobs in terms of shorter queueing delay and higher training efficiency. This paper presents an optimal resource allocator for elastic training systems that leverages a mixed-integer programming (MIP) model to maximize the training progress of deep learning jobs. We take advantage of real-world job data obtained from ModelArts, the deep learning training platform of Huawei Cloud, and conduct simulation experiments to compare the optimal resource allocator with a greedy one as a benchmark. Numerical results show that the proposed allocator can reduce queueing time by up to 32% and improve training efficiency by up to 24% relative to the greedy resource allocator, thereby greatly improving user experience with Huawei ModelArts and potentially enabling the realization of higher profits for the product. Also, the optimal resource allocator is fast in decision-making, taking merely 0.4 seconds on average.
Submitted 7 September, 2021;
originally announced September 2021.
-
Time-correlated Window Carrier-phase Aided GNSS Positioning Using Factor Graph Optimization for Urban Positioning
Authors:
Xiwei Bai,
Weisong Wen,
Li-Ta Hsu
Abstract:
This paper proposes an improved global navigation satellite system (GNSS) positioning method that explores the time correlation between consecutive epochs of the code and carrier-phase measurements, which significantly increases the robustness against outlier measurements. Instead of relying on the time difference carrier phase (TDCP), which only considers two neighboring epochs using an extended Kalman filter (EKF) estimator, this paper proposes employing the carrier-phase measurements inside a window, the so-called window carrier-phase (WCP), to constrain the states inside a factor graph. A left null space matrix is employed to eliminate the shared unknown ambiguity variables and thereby correlate the associated states inside the WCP. Then the pseudorange, Doppler, and the constructed WCP measurements are integrated simultaneously using factor graph optimization (FGO) to estimate the state of the GNSS receiver. We evaluated the performance of the proposed method in two typical urban canyons in Hong Kong, achieving mean positioning errors of 1.76 meters and 2.96 meters, respectively, using an automobile-level GNSS receiver. The effectiveness of the proposed method is further evaluated using a low-cost smartphone-level GNSS receiver, and a similar improvement is obtained compared with several existing GNSS positioning methods.
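The idea of eliminating a shared ambiguity with a left null space matrix can be illustrated with a small sketch. It assumes a simplified scalar model in which every carrier-phase measurement in the window equals a geometric term plus one common unknown ambiguity; the differencing matrix below is just one convenient matrix whose rows are orthogonal to the all-ones vector, and it is not claimed to be the matrix used in the paper. All names and numbers are hypothetical.

```python
def difference_matrix(m):
    """Return an (m-1) x m matrix D with rows [..., -1, 1, ...].

    Each row sums to zero, so D applied to the all-ones vector is zero:
    D lies in the left null space of the ambiguity's design column.
    """
    rows = []
    for i in range(m - 1):
        row = [0.0] * m
        row[i] = -1.0
        row[i + 1] = 1.0
        rows.append(row)
    return rows


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Hypothetical window of four epochs: geometric terms plus a shared ambiguity.
geometry = [100.0, 101.5, 103.0, 102.0]
ambiguity = 7.0
phases = [g + ambiguity for g in geometry]

D = difference_matrix(len(phases))
reduced = matvec(D, phases)  # the ambiguity no longer appears in these values
```

Because the ambiguity is multiplied by zero, `reduced` depends only on the geometric terms, which is what allows the states inside the window to be constrained jointly without estimating the ambiguity.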
Submitted 1 September, 2021;
originally announced September 2021.
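
The ambiguity-elimination step mentioned in the abstract above can be illustrated with a small numerical sketch. It assumes a single satellite whose carrier-phase ambiguity stays constant over a window of k epochs, so the ambiguity enters the stacked measurements through a column of ones. The function name `left_null_space` and all numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def left_null_space(a: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Return a matrix D whose rows span the left null space of `a` (D @ a = 0)."""
    # Columns of U beyond the rank of `a` are orthogonal to its column space.
    u, s, _ = np.linalg.svd(a, full_matrices=True)
    rank = int(np.sum(s > tol))
    return u[:, rank:].T

# Assumption: one satellite, k epochs in the window, one shared ambiguity.
k = 4
a = np.ones((k, 1))            # ambiguity design matrix (column of ones)
D = left_null_space(a)         # shape (k - 1, k)

# Toy carrier-phase values: geometry term plus the unknown constant ambiguity.
geometry = np.array([10.0, 10.5, 11.2, 12.0])
ambiguity = 37.0
phase = geometry + ambiguity * a[:, 0]

# Projecting with D removes the ambiguity and couples all epochs in the window.
projected = D @ phase
```

Because `D @ a` is zero, `projected` depends only on the geometry terms, and each of its rows mixes several epochs, which is one way to see how the window constrains the associated states jointly.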
-
WDNet: Watermark-Decomposition Network for Visible Watermark Removal
Authors:
Yang Liu,
Zhen Zhu,
Xiang Bai
Abstract:
Visible watermarks are widely used in images to protect copyright ownership. Analyzing watermark removal helps to reinforce the anti-attack techniques in an adversarial way. Current removal methods normally leverage image-to-image translation techniques. Nevertheless, the uncertainty of the size, shape, color and transparency of the watermarks sets a huge barrier for these methods. To combat this, we combine traditional watermarked image decomposition into a two-stage generator, called Watermark-Decomposition Network (WDNet), where the first stage predicts a rough decomposition from the whole watermarked image and the second stage specifically centers on the watermarked area to refine the removal results. The decomposition formulation enables WDNet to separate watermarks from the images rather than simply removing them. We further show that these separated watermarks can serve as extra nutrients for building a larger training dataset and further improving removal performance. Besides, we construct a large-scale dataset named CLWD, which mainly contains colored watermarks, to fill the gap in colored watermark removal datasets. Extensive experiments on the public gray-scale dataset LVW and CLWD consistently show that the proposed WDNet outperforms the state-of-the-art approaches both in accuracy and efficiency. The code and CLWD dataset are publicly available at https://github.com/MRUIL/WDNet.
Submitted 14 December, 2020; v1 submitted 14 December, 2020;
originally announced December 2020.
-
A Two-Stage Approach to Device-Robust Acoustic Scene Classification
Authors:
Hu Hu,
Chao-Han Huck Yang,
Xianjun Xia,
Xue Bai,
Xin Tang,
Yajian Wang,
Shutong Niu,
Li Chai,
Juanjuan Li,
Hongning Zhu,
Feng Bao,
Yuanjun Zhao,
Sabato Marco Siniscalchi,
Yannan Wang,
Jun Du,
Chin-Hui Lee
Abstract:
To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our two-stage system leverages an ad-hoc score combination based on two CNN classifiers: (i) the first CNN classifies acoustic inputs into one of three broad classes, and (ii) the second CNN classifies the same inputs into one of ten finer-grained classes. Three different CNN architectures are explored to implement the two-stage classifiers, and a frequency sub-sampling scheme is investigated. Moreover, novel data augmentation schemes for ASC are also investigated. Evaluated on DCASE 2020 Task 1a, our results show that the proposed ASC system attains a state-of-the-art accuracy on the development set, where our best system, a two-stage fusion of CNN ensembles, delivers an 81.9% average accuracy among multi-device test data, and it obtains a significant improvement on unseen devices. Finally, neural saliency analysis with class activation mapping (CAM) gives new insights on the patterns learnt by our models.
Submitted 2 November, 2020;
originally announced November 2020.
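
As a rough illustration of how scores from a three-class and a ten-class classifier might be combined, the sketch below weights each fine-class probability by the probability of its parent broad class and renormalizes. This is only one plausible reading of "ad-hoc score combination"; the abstract does not specify the rule. The mapping `FINE_TO_BROAD`, the function `fuse_scores`, and the example probabilities are all hypothetical.

```python
import numpy as np

# Hypothetical assignment of each of the 10 fine classes to one of 3 broad classes.
FINE_TO_BROAD = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])

def fuse_scores(p_broad, p_fine, fine_to_broad=FINE_TO_BROAD):
    """Weight each fine-class score by its parent broad-class score, then renormalize."""
    p_broad = np.asarray(p_broad, dtype=float)
    p_fine = np.asarray(p_fine, dtype=float)
    fused = p_fine * p_broad[..., fine_to_broad]
    return fused / fused.sum(axis=-1, keepdims=True)

# Toy posteriors: the broad classifier strongly prefers class 0,
# while the fine classifier alone would pick fine class 3 (a child of broad class 1).
p_broad = [0.7, 0.2, 0.1]
p_fine = [0.15, 0.05, 0.05, 0.30, 0.10, 0.05, 0.05, 0.10, 0.10, 0.05]

fused = fuse_scores(p_broad, p_fine)
fine_only_choice = int(np.argmax(p_fine))
fused_choice = int(np.argmax(fused))
```

In this toy case the fused decision moves to a fine class under the broad class that the first classifier favors, which shows how the coarse stage can correct the fine stage.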
-
Wind Power Transmission System Integration -- a Case Study of China Wind Power Base
Authors:
Jianxue Wang,
Shutang You,
Xingzhong Bai,
Mingqiao Peng
Abstract:
Due to a series of supporting policies in recent years, China's wind power has developed rapidly through a large-scale and centralized mode. This paper analyzes the two major concerns faced by wind power development in China: wind generation reliability and wind energy balancing. More specifically, wind farm tripping-off-grid incidents and wind power curtailment issues, which caused huge economic losses, are investigated in detail. Based on operation experience of large wind power bases, technical recommendations and economic incentives are proposed to improve wind power integration and power grid reliability. As a summary and outlook of wind power development in China, this paper provides a reference on future wind power development for other countries.
Submitted 10 April, 2021; v1 submitted 22 October, 2020;
originally announced October 2020.
-
An Acoustic Segment Model Based Segment Unit Selection Approach to Acoustic Scene Classification with Partial Utterances
Authors:
Hu Hu,
Sabato Marco Siniscalchi,
Yannan Wang,
Xue Bai,
Jun Du,
Chin-Hui Lee
Abstract:
In this paper, we propose a sub-utterance unit selection framework to remove acoustic segments in audio recordings that carry little information for acoustic scene classification (ASC). Our approach is built upon a universal set of acoustic segment units covering the overall acoustic scene space. First, those units are modeled with acoustic segment models (ASMs) used to tokenize acoustic scene utterances into sequences of acoustic segment units. Next, paralleling the idea of stop words in information retrieval, stop ASMs are automatically detected. Finally, acoustic segments associated with the stop ASMs are blocked because of their low indexing power in retrieval of most acoustic scenes. In contrast to building scene models with whole utterances, the ASM-removed sub-utterances, i.e., acoustic utterances without stop acoustic segments, are then used as inputs to the AlexNet-L back-end for final classification. On the DCASE 2018 dataset, scene classification accuracy increases from 68%, with whole utterances, to 72.1%, with segment selection. This represents a competitive accuracy without any data augmentation or ensemble strategy. Moreover, our approach compares favourably to AlexNet-L with attention.
Submitted 31 July, 2020;
originally announced August 2020.
-
Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation
Authors:
Hu Hu,
Chao-Han Huck Yang,
Xianjun Xia,
Xue Bai,
Xin Tang,
Yajian Wang,
Shutong Niu,
Li Chai,
Juanjuan Li,
Hongning Zhu,
Feng Bao,
Yuanjun Zhao,
Sabato Marco Siniscalchi,
Yannan Wang,
Jun Du,
Chin-Hui Lee
Abstract:
In this technical report, we present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge. Task 1 comprises two different sub-tasks: (i) Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes, and (ii) Task 1b concerns classification of data into three higher-level classes using low-complexity solutions. For Task 1a, we propose a novel two-stage ASC system leveraging an ad-hoc score combination of two convolutional neural networks (CNNs), classifying the acoustic input according to three classes, and then ten classes, respectively. Four different CNN-based architectures are explored to implement the two-stage classifiers, and several data augmentation techniques are also investigated. For Task 1b, we leverage a quantization method to reduce the complexity of two of our top-accuracy three-class CNN-based architectures. On the Task 1a development data set, an ASC accuracy of 76.9% is attained using our best single classifier and data augmentation. An accuracy of 81.9% is then attained by a final model fusion of our two-stage ASC classifiers. On the Task 1b development data set, we achieve an accuracy of 96.7% with a model size smaller than 500KB. Code is available at https://github.com/MihawkHu/DCASE2020_task1.
Submitted 26 August, 2020; v1 submitted 16 July, 2020;
originally announced July 2020.
-
Measurement-Level Fusion for OTHR Network Using Message Passing
Authors:
Hua Lan,
Zengfu Wang,
Xianglong Bai,
Quan Pan,
Kun Lu
Abstract:
Tracking an unknown number of targets based on multipath measurements provided by an over-the-horizon radar (OTHR) network with a statistical ionospheric model is complicated and requires solving four subproblems: target detection, target tracking, multipath data association and ionospheric height identification. A joint solution is desired since the four subproblems are highly correlated, but it suffers from the intractable inference problem of high-dimensional latent variables. In this paper, a unified message passing approach, combining belief propagation (BP) and mean-field (MF) approximation, is developed for simplifying the intractable inference. Based upon the factor graph corresponding to a factorization of the joint probability distribution function (PDF) of the latent variables and a choice for a separation of this factorization into BP region and MF region, the posterior PDFs of continuous latent variables including target kinematic state, target visibility state, and ionospheric height, are approximated by MF due to its simple MP update rules for conjugate-exponential models. With regard to discrete multipath data association, which contains one-to-one frame (hard) constraints, its PDF is approximated by loopy BP. Finally, the approximated posterior PDFs are updated iteratively in a closed-loop manner, which is effective for dealing with the coupling issue among target detection, target tracking, multipath data association, and ionospheric height identification. Meanwhile, the proposed approach has a measurement-level fusion architecture due to the direct processing of the raw multipath measurements from an OTHR network, which benefits target tracking performance. Its performance is demonstrated on a simulated OTHR network multitarget tracking scenario.
Submitted 3 April, 2020; v1 submitted 22 March, 2020;
originally announced March 2020.
-
Deep Prototypical Networks Based Domain Adaptation for Fault Diagnosis
Authors:
Huanjie Wang,
Jie Tan,
Xiwei Bai,
Jiechao Yang
Abstract:
Due to the existence of dataset shifts, the distributions of data acquired from different working conditions show significant differences in real-world industrial applications, which leads to performance degradation of traditional machine learning methods. This work provides a framework that combines supervised domain adaptation with prototype learning for fault diagnosis. The main idea of domain adaptation is to apply the Siamese architecture to learn a latent space where the mapped features are inter-class separable and intra-class similar for both source and target domains. Moreover, the prototypical layer utilizes the features from Siamese architecture to learn prototype representations of each class. This supervised method is attractive because it needs very few labeled target samples. Moreover, it can be further extended to address the problem when the classes from the source and target domains are not completely overlapping. The model must generalize to unseen classes in the source dataset, given only a few examples of each new target class. Experimental results, on the Case Western Reserve University bearing dataset, show the effectiveness of the proposed framework. With increasing target samples in training, the model quickly converges with high classification accuracy.
Submitted 11 December, 2019; v1 submitted 8 December, 2019;
originally announced December 2019.
-
Modeling and Simulation of UAV Carrier Landings
Authors:
Gaurav Misra,
Tianyu Gao,
Xiaoli Bai
Abstract:
With UAVs promising capabilities to increase operation flexibility and reduce mission cost, we are exploring the advancement in automated carrier-landing performance that can be achieved by fixed-wing UAVs. To demonstrate such potential, in this paper we investigate two key metrics, namely flight path control performance and reduced approach speeds, for UAVs based on the F/A-18 High Angle of Attack (HARV) model. The landing control architecture consists of an auto-throttle, a stability augmentation system, and glideslope and approach track controllers. The performance of the control model is tested using Monte Carlo simulations under a range of environmental uncertainties including atmospheric turbulence consisting of wind shear, discrete and continuous wind gusts, and carrier airwakes. Realistic deck motion is considered, using the standard deck motion time histories under the Systematic Characterization of the Naval Environment (SCONE) program released by the Office of Naval Research (ONR). We numerically demonstrate the limiting approach conditions that allow for successful carrier landings and the factors affecting landing performance.
Submitted 23 January, 2019;
originally announced January 2019.
-
A balanced energy consumption clustering algorithm for heterogeneous energy wireless sensor networks
Authors:
Xiaofu Ma,
Yu Fang,
Xingzhen Bai
Abstract:
In this paper, a balanced energy consumption clustering algorithm (BECC) is proposed. This new scheme is a cluster-based algorithm designed for heterogeneous energy wireless sensor networks. A polarized energy factor is introduced to adjust the probability with which each node may become a cluster head in the election of the new clustering scheme. Under the condition that the expected number of cluster heads in the network preserves the theoretical optimal number, BECC ensures that nodes with higher residual energy become cluster heads with higher probabilities, while nodes with lower residual energy do not become cluster heads. Simulation results show that this new scheme provides a longer lifetime than the classical clustering algorithms, including LEACH and other improved algorithms, in heterogeneous networks, and that BECC also delivers a larger number of messages to the sink.
Submitted 4 December, 2018;
originally announced January 2019.
-
A multifeature fusion approach for power system transient stability assessment using PMU data
Authors:
Yang Li,
Guoqing Li,
Zhenhao Wang,
Zijiao Han,
Xue Bai
Abstract:
Taking full advantage of synchrophasors provided by a GPS-based wide-area measurement system (WAMS), a novel VBpMKL-based transient stability assessment (TSA) method through multifeature fusion is proposed in this paper. First, a group of classification features reflecting the transient stability characteristics of power systems is extracted from synchrophasors and, according to the different stages of the disturbance process, divided into three nonoverlapping subsets; then a VBpMKL-based TSA model is built using multifeature fusion by combining the feature spaces corresponding to each feature subset; and finally application of the proposed model to the IEEE 39-bus system and a real-world power system is demonstrated. The novelty of the proposed approach is that it improves the classification accuracy and reliability of TSA using multifeature fusion with synchrophasors. The application results on the test systems verify the effectiveness of the proposal.
Submitted 8 September, 2018;
originally announced September 2018.