-
Spatio-Temporal Representation Decoupling and Enhancement for Federated Instrument Segmentation in Surgical Videos
Authors:
Zheng Fang,
Xiaoming Qi,
Chun-Mei Feng,
Jialun Pei,
Weixin Si,
Yueming Jin
Abstract:
Surgical instrument segmentation under Federated Learning (FL) is a promising direction, which enables multiple surgical sites to collaboratively train the model without centralizing datasets. However, there are very few FL works in surgical data science, and FL methods for other modalities do not consider inherent characteristics of the surgical domain: i) different scenarios show diverse anatomical backgrounds but highly similar instrument representations; ii) there exist surgical simulators which promote large-scale synthetic data generation with minimal effort. In this paper, we propose a novel Personalized FL scheme, Spatio-Temporal Representation Decoupling and Enhancement (FedST), which wisely leverages surgical domain knowledge during both local-site and global-server training to boost segmentation. Concretely, our model embraces a Representation Separation and Cooperation (RSC) mechanism in local-site training, which decouples the query embedding layer to be trained privately, to encode respective backgrounds. Meanwhile, other parameters are optimized globally to capture the consistent representations of instruments, including the temporal layer to capture similar motion patterns. A textual-guided channel selection is further designed to highlight site-specific features, facilitating model adaptation to each site. Moreover, in global-server training, we propose Synthesis-based Explicit Representation Quantification (SERQ), which defines an explicit representation target based on synthetic data to synchronize the model convergence during fusion for improving model generalization.
Submitted 30 June, 2025;
originally announced June 2025.
-
TOAST: Task-Oriented Adaptive Semantic Transmission over Dynamic Wireless Environments
Authors:
Sheng Yun,
Jianhua Pei,
Ping Wang
Abstract:
The evolution toward 6G networks demands a fundamental shift from bit-centric transmission to semantic-aware communication that emphasizes task-relevant information. This work introduces TOAST (Task-Oriented Adaptive Semantic Transmission), a unified framework designed to address the core challenge of multi-task optimization in dynamic wireless environments through three complementary components. First, we formulate adaptive task balancing as a Markov decision process, employing deep reinforcement learning to dynamically adjust the trade-off between image reconstruction fidelity and semantic classification accuracy based on real-time channel conditions. Second, we integrate module-specific Low-Rank Adaptation (LoRA) mechanisms throughout our Swin Transformer-based joint source-channel coding architecture, enabling parameter-efficient fine-tuning that dramatically reduces adaptation overhead while maintaining full performance across diverse channel impairments including Additive White Gaussian Noise (AWGN), fading, phase noise, and impulse interference. Third, we incorporate an Elucidating diffusion model that operates in the latent space to restore features corrupted by channel noises, providing substantial quality improvements compared to baseline approaches. Extensive experiments across multiple datasets demonstrate that TOAST achieves superior performance compared to baseline approaches, with significant improvements in both classification accuracy and reconstruction quality at low Signal-to-Noise Ratio (SNR) conditions while maintaining robust performance across all tested scenarios.
Submitted 27 June, 2025;
originally announced June 2025.
-
Hierarchical and Collaborative LLM-Based Control for Multi-UAV Motion and Communication in Integrated Terrestrial and Non-Terrestrial Networks
Authors:
Zijiang Yan,
Hao Zhou,
Jianhua Pei,
Hina Tabassum
Abstract:
Unmanned aerial vehicles (UAVs) have been widely adopted in various real-world applications. However, the control and optimization of multi-UAV systems remain a significant challenge, particularly in dynamic and constrained environments. This work explores the joint motion and communication control of multiple UAVs operating within integrated terrestrial and non-terrestrial networks that include high-altitude platform stations (HAPS). Specifically, we consider an aerial highway scenario in which UAVs must accelerate, decelerate, and change lanes to avoid collisions and maintain overall traffic flow. Different from existing studies, we propose a novel hierarchical and collaborative method based on large language models (LLMs). In our approach, an LLM deployed on the HAPS performs UAV access control, while another LLM onboard each UAV handles motion planning and control. This LLM-based framework leverages the rich knowledge embedded in pre-trained models to enable both high-level strategic planning and low-level tactical decisions. This knowledge-driven paradigm holds great potential for the development of next-generation 3D aerial highway systems. Experimental results demonstrate that our proposed collaborative LLM-based method achieves higher system rewards, lower operational costs, and significantly reduced UAV collision rates compared to baseline approaches.
Submitted 6 June, 2025;
originally announced June 2025.
-
NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results
Authors:
Xin Li,
Yeying Jin,
Xin Jin,
Zongwei Wu,
Bingchen Li,
Yufei Wang,
Wenhan Yang,
Yu Li,
Zhibo Chen,
Bihan Wen,
Robby T. Tan,
Radu Timofte,
Qiyu Rong,
Hongyuan Jing,
Mengmeng Zhang,
Jinglong Li,
Xiangyu Lu,
Yi Ren,
Yuting Liu,
Meng Zhang,
Xiang Chen,
Qiyuan Guan,
Jiangxin Dong,
Jinshan Pan,
Conglin Gou
, et al. (112 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which were developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, and includes day raindrop-focused, day background-focused, night raindrop-focused, and night background-focused degradations. This dataset is divided into three subsets for competition: 14,139 images for training, 240 images for validation, and 731 images for testing. The primary objective of this challenge is to establish a new and powerful benchmark for the task of removing raindrops under varying lighting and focus conditions. A total of 361 participants registered for the competition, and 32 teams submitted valid solutions and fact sheets for the final testing phase. These submissions achieved state-of-the-art (SOTA) performance on the Raindrop Clarity dataset. The project can be found at https://lixinustc.github.io/CVPR-NTIRE2025-RainDrop-Competition.github.io/.
Submitted 19 April, 2025; v1 submitted 17 April, 2025;
originally announced April 2025.
-
Deep Learning-Based Quantitative Assessment of Renal Chronicity Indices in Lupus Nephritis
Authors:
Tianqi Tu,
Hui Wang,
Jiangbo Pei,
Xiaojuan Yu,
Aidong Men,
Suxia Wang,
Qingchao Chen,
Ying Tan,
Feng Yu,
Minghui Zhao
Abstract:
Background: Renal chronicity indices (CI) have been identified as strong predictors of long-term outcomes in lupus nephritis (LN) patients. However, assessment by pathologists is hindered by challenges such as substantial time requirements, high interobserver variation, and susceptibility to fatigue. This study aims to develop an effective deep learning (DL) pipeline that automates the assessment of CI and provides valuable prognostic insights from a disease-specific perspective. Methods: We curated a dataset comprising 282 slides obtained from 141 patients across two independent cohorts with a complete 10-year follow-up. Our DL pipeline was developed on 60 slides (22,410 patch images) from 30 patients in the training cohort and evaluated on both an internal testing set (148 slides, 77,605 patch images) and an external testing set (74 slides, 27,522 patch images). Results: The study included two cohorts with slight demographic differences, particularly in age and hemoglobin levels. The DL pipeline showed high segmentation performance across tissue compartments and histopathologic lesions, outperforming state-of-the-art methods. The DL pipeline also demonstrated a strong correlation with pathologists in assessing CI, significantly improving interobserver agreement. Additionally, the DL pipeline enhanced prognostic accuracy, particularly in outcome prediction, when combined with clinical parameters and pathologist-assessed CIs. Conclusions: The DL pipeline demonstrated accuracy and efficiency in assessing CI in LN, showing promise in improving interobserver agreement among pathologists. It also exhibited significant value in prognostic analysis and enhancing outcome prediction in LN patients, offering a valuable tool for clinical decision-making.
Submitted 26 March, 2025;
originally announced March 2025.
-
Probabilistic Net Load Forecasting for High-Penetration RES Grids Utilizing Enhanced Conditional Diffusion Model
Authors:
Yixiang Huang,
Jianhua Pei,
Luocheng Chen,
Zhenchang Du,
Jinfu Chen,
Zirui Peng
Abstract:
The proliferation of intermittent distributed renewable energy sources (RES) in modern power systems has fundamentally compromised the reliability and accuracy of deterministic net load forecasting. Generative models, particularly diffusion models, demonstrate exceptional potential in uncertainty quantification for scenario forecasting. Nevertheless, their probabilistic predictive capabilities and conditional bootstrapping mechanisms remain underexplored. In this paper, a day-ahead probabilistic net load forecasting framework is developed by systematically quantifying epistemic uncertainty and aleatoric variability using the feature-informed enhanced conditional diffusion model (ECDM). The ECDM architecture implements the net load distribution generation process using an imputation-based conditional diffusion model, where multi-modal conditional inputs, such as weather and calendar data, are fused via cross-attention mechanisms. Specifically, historical net load profiles are utilized to guide the reverse diffusion trajectory through non-parametric imputation operators preserving spatial-temporal integrity. To capture periodic characteristics, a novel weekly arrangement method is also introduced, while an unconditional model is integrated to ensure diversity in the generated scenarios. Subsequently, the maximum probabilistic points and probability intervals of predicted net load are obtained by the adaptive kernel density estimation under RES intermittency. Moreover, ECDM is extended to a multi-energy forecast framework, attempting to increase interpretability of the net load predictions. Numerical experiments on a publicly available dataset demonstrate the superior forecasting performance of the proposed method compared to existing state-of-the-art approaches.
Submitted 3 June, 2025; v1 submitted 22 March, 2025;
originally announced March 2025.
-
Semantic-Aware Adaptive Video Streaming Using Latent Diffusion Models for Wireless Networks
Authors:
Zijiang Yan,
Jianhua Pei,
Hongda Wu,
Hina Tabassum,
Ping Wang
Abstract:
This paper proposes a novel Semantic Communication (SemCom) framework for real-time adaptive-bitrate video streaming by integrating Latent Diffusion Models (LDMs) within the FFmpeg techniques. This solution addresses the challenges of high bandwidth usage, storage inefficiencies, and quality of experience (QoE) degradation associated with traditional Constant Bitrate Streaming (CBS) and Adaptive Bitrate Streaming (ABS). The proposed approach leverages LDMs to compress I-frames into a latent space, offering significant storage and semantic transmission savings without sacrificing high visual quality. While retaining B-frames and P-frames as adjustment metadata to support efficient refinement of video reconstruction at the user side, the proposed framework further incorporates state-of-the-art denoising and Video Frame Interpolation (VFI) techniques. These techniques mitigate semantic ambiguity and restore temporal coherence between frames, even in noisy wireless communication environments. Experimental results demonstrate the proposed method achieves high-quality video streaming with optimized bandwidth usage, outperforming state-of-the-art solutions in terms of QoE and resource efficiency. This work opens new possibilities for scalable real-time video streaming in 5G and future post-5G networks.
Submitted 29 June, 2025; v1 submitted 8 February, 2025;
originally announced February 2025.
-
CVaR-Based Variational Quantum Optimization for User Association in Handoff-Aware Vehicular Networks
Authors:
Zijiang Yan,
Hao Zhou,
Jianhua Pei,
Aryan Kaushik,
Hina Tabassum,
Ping Wang
Abstract:
Efficient resource allocation is essential for optimizing various tasks in wireless networks, which are usually formulated as generalized assignment problems (GAP). GAP, as a generalized version of the linear sum assignment problem, involves both equality and inequality constraints that add computational challenges. In this work, we present a novel Conditional Value at Risk (CVaR)-based Variational Quantum Eigensolver (VQE) framework to address GAP in vehicular networks (VNets). Our approach leverages a hybrid quantum-classical structure, integrating a tailored cost function that balances both objective and constraint-specific penalties to improve solution quality and stability. Using the CVaR-VQE model, we handle the GAP efficiently by focusing optimization on the lower tail of the solution space, enhancing both convergence and resilience on noisy intermediate-scale quantum (NISQ) devices. We apply this framework to a user-association problem in VNets, where our method achieves 23.5% improvement compared to the deep neural network (DNN) approach.
Submitted 4 February, 2025; v1 submitted 14 January, 2025;
originally announced January 2025.
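The abstract above describes focusing optimization on the lower tail of the sampled solutions. As a minimal sketch of that general idea, and not the authors' implementation, the generic CVaR aggregation of measured cost values can be written as the mean of the lowest fraction of samples. The function name `cvar_lower_tail`, the parameter `alpha`, and the sample values are all hypothetical and chosen only for illustration.

```python
import math


def cvar_lower_tail(energies, alpha):
    """Mean of the lowest ceil(alpha * n) sampled cost values (0 < alpha <= 1).

    With alpha = 1 this is the ordinary sample mean; smaller alpha keeps
    only the best (lowest-cost) samples.
    """
    if not energies:
        raise ValueError("energies must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = math.ceil(alpha * len(energies))
    lowest = sorted(energies)[:k]  # the k lowest-cost samples
    return sum(lowest) / k


# Hypothetical cost values, standing in for measurement outcomes.
samples = [3.0, -1.0, 0.5, -2.0, 4.0, 1.0, -0.5, 2.0]
print(cvar_lower_tail(samples, 0.25))  # mean of the two lowest samples
print(cvar_lower_tail(samples, 1.0))   # plain mean of all samples
```

In a variational loop, a classical optimizer would minimize this aggregate instead of the plain expectation; how the cost values are obtained and penalized is specific to the paper and is not shown here.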
-
Deep Generative Model-Aided Power System Dynamic State Estimation and Reconstruction with Unknown Control Inputs or Data Distributions
Authors:
Jianhua Pei,
Ping Wang,
Jingyu Wang,
Dongyuan Shi
Abstract:
Fast and robust dynamic state estimation (DSE) is essential for accurately capturing the internal dynamic processes of power systems, and it serves as the foundation for reliably implementing real-time dynamic modeling, monitoring, and control applications. Nonetheless, on one hand, traditional DSE methods based on Kalman filtering or particle filtering have high accuracy requirements for system parameters, control inputs, phasor measurement unit (PMU) data, and centralized DSE communication. Consequently, these methods often face accuracy bottlenecks when dealing with structural or system process errors, unknown control vectors, PMU anomalies, and communication contingencies. On the other hand, deep learning-aided DSE, while parameter-free, often suffers from generalization issues under unforeseen operating conditions. To address these challenges, this paper proposes an effective approach that leverages deep generative models from AI-generated content (AIGC) to assist DSE. The proposed approach employs an encoder-decoder architecture to estimate unknown control input variables, a robust encoder to mitigate the impact of bad PMU data, and a latent diffusion model to address communication issues in centralized DSE. Additionally, a lightweight adaptor is designed to quickly adjust the latent vector distribution. Extensive experimental results on the IEEE 39-bus system and the NPCC 140-bus system demonstrate the effectiveness and superiority of the proposed method in addressing DSE modeling imperfection, measurement uncertainties, communication contingencies, and unknown distribution challenges, while also proving its ability to reduce data storage and communication resource requirements.
Submitted 6 January, 2025;
originally announced January 2025.
-
GEM: A GEneral Memristive Transistor Model
Authors:
Shengbo Wang,
Jingfang Pei,
Cong Li,
Xuemeng Li,
Li Tao,
Arokia Nathan,
Guohua Hu,
Shuo Gao
Abstract:
Neuromorphic devices, with their distinct advantages in energy efficiency and parallel processing, are pivotal in advancing artificial intelligence applications. Among these devices, memristive transistors have attracted significant attention due to their superior stability and operation flexibility compared to two-terminal memristors. However, the lack of a robust model that accurately captures their complex electrical behavior has hindered further exploration of their potential. In this work, we introduce the GEneral Memristive transistor (GEM) model to address this challenge. The GEM model incorporates a time-dependent differential equation, a voltage-controlled moving window function, and a nonlinear current output function, enabling precise representation of both switching and output characteristics in memristive transistors. Compared to previous models, the GEM model demonstrates a 300% improvement in modeling the switching behavior, while effectively capturing the inherent nonlinearities and physical limits of these devices. This advancement significantly enhances the realistic simulation of memristive transistors, thereby facilitating further exploration and application development.
Submitted 7 November, 2024; v1 submitted 27 August, 2024;
originally announced August 2024.
-
Epicardium Prompt-guided Real-time Cardiac Ultrasound Frame-to-volume Registration
Authors:
Long Lei,
Jun Zhou,
Jialun Pei,
Baoliang Zhao,
Yueming Jin,
Yuen-Chun Jeremy Teoh,
Jing Qin,
Pheng-Ann Heng
Abstract:
A comprehensive guidance view for cardiac interventional surgery can be provided by the real-time fusion of the intraoperative 2D images and preoperative 3D volume based on the ultrasound frame-to-volume registration. However, cardiac ultrasound images are characterized by a low signal-to-noise ratio and small differences between adjacent frames, coupled with significant dimension variations between 2D frames and 3D volumes to be registered, resulting in real-time and accurate cardiac ultrasound frame-to-volume registration being a very challenging task. This paper introduces a lightweight end-to-end Cardiac Ultrasound frame-to-volume Registration network, termed CU-Reg. Specifically, the proposed model leverages epicardium prompt-guided anatomical clues to reinforce the interaction of 2D sparse and 3D dense features, followed by a voxel-wise local-global aggregation of enhanced features, thereby boosting the cross-dimensional matching effectiveness of low-quality ultrasound modalities. We further embed an inter-frame discriminative regularization term within the hybrid supervised learning to increase the distinction between adjacent slices in the same ultrasound volume to ensure registration stability. Experimental results on the reprocessed CAMUS dataset demonstrate that our CU-Reg surpasses existing methods in terms of registration accuracy and efficiency, meeting the guidance requirements of clinical cardiac interventional surgery.
Submitted 17 January, 2025; v1 submitted 20 June, 2024;
originally announced June 2024.
-
Lightweight, error-tolerant edge detection using memristor-enabled stochastic logics
Authors:
Lekai Song,
Pengyu Liu,
Jingfang Pei,
Yang Liu,
Songwei Liu,
Shengbo Wang,
Leonard W. T. Ng,
Tawfique Hasan,
Kong-Pang Pun,
Shuo Gao,
Guohua Hu
Abstract:
The demand for efficient edge vision has spurred the interest in developing stochastic computing approaches for performing image processing tasks. Memristors with inherent stochasticity readily introduce probability into the computations and thus enable stochastic image processing computations. Here, we present a stochastic computing approach for edge detection, a fundamental image processing technique, facilitated with memristor-enabled stochastic logics. Specifically, we integrate the memristors with logic circuits and harness the stochasticity from the memristors to realize compact stochastic logics for stochastic number encoding and processing. The stochastic numbers, exhibiting well-regulated probabilities and correlations, can be processed to perform logic operations with statistical probabilities. This can facilitate lightweight stochastic edge detection for edge visual scenarios characterized with high-level noise errors. As a practical demonstration, we implement a hardware stochastic Roberts cross operator using the stochastic logics, and prove its exceptional edge detection performance, remarkably, with 95% less computational cost while withstanding 50% bit-flip errors. The results underscore the great potential of our stochastic edge detection approach in developing lightweight, error-tolerant edge vision hardware and systems for autonomous driving, virtual/augmented reality, medical imaging diagnosis, industrial automation, and beyond.
Submitted 20 March, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
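For readers unfamiliar with the Roberts cross operator named in the abstract above, the following is a minimal sketch of its conventional, deterministic software form. It is not the memristor-based stochastic hardware described by the authors. The function name `roberts_cross`, the use of the common |Gx| + |Gy| magnitude approximation, and the test image are assumptions made for illustration.

```python
def roberts_cross(image):
    """Apply the Roberts cross operator to a 2D list of pixel intensities.

    Each output pixel is |Gx| + |Gy|, where Gx and Gy are the two diagonal
    differences over a 2x2 neighbourhood. The output is one row and one
    column smaller than the input.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * (cols - 1) for _ in range(rows - 1)]
    for r in range(rows - 1):
        for c in range(cols - 1):
            gx = image[r][c] - image[r + 1][c + 1]      # main diagonal
            gy = image[r][c + 1] - image[r + 1][c]      # anti-diagonal
            out[r][c] = abs(gx) + abs(gy)
    return out


# Hypothetical 3x4 image with a vertical edge between columns 1 and 2.
img = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
edges = roberts_cross(img)
print(edges)
```

The response is nonzero only in the column where the intensity step occurs, which is the behaviour an edge detector is expected to show on this input.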
-
Transfer the linguistic representations from TTS to accent conversion with non-parallel data
Authors:
Xi Chen,
Jiakun Pei,
Liumeng Xue,
Mingyang Zhang
Abstract:
Accent conversion aims to convert the accent of a source speech to a target accent, meanwhile preserving the speaker's identity. This paper introduces a novel non-autoregressive framework for accent conversion that learns accent-agnostic linguistic representations and employs them to convert the accent in the source speech. Specifically, the proposed system aligns speech representations with linguistic representations obtained from Text-to-Speech (TTS) systems, enabling training of the accent voice conversion model on non-parallel data. Furthermore, we investigate the effectiveness of a pretraining strategy on native data and different acoustic features within our proposed framework. We conduct a comprehensive evaluation using both subjective and objective metrics to assess the performance of our approach. The evaluation results highlight the benefits of the pretraining strategy and the incorporation of richer semantic features, resulting in significantly enhanced audio quality and intelligibility.
Submitted 7 January, 2024;
originally announced January 2024.
-
Crucial Feature Capture and Discrimination for Limited Training Data SAR ATR
Authors:
Chenwei Wang,
Siyi Luo,
Jifang Pei,
Yulin Huang,
Yin Zhang,
Jianyu Yang
Abstract:
Although deep learning-based methods have achieved excellent performance on SAR ATR, the difficulty of acquiring and labeling large numbers of SAR images degrades the performance of these otherwise well-performing methods. This may be because most of them take the whole target image as input, whereas research has found that, under limited training data, a deep learning model cannot capture the discriminative image regions in the whole image and instead focuses on useless or even harmful image regions for recognition. Therefore, the results are not satisfactory. In this paper, we design a SAR ATR framework under limited training samples, which mainly consists of two branches and two modules: a global assisted branch and a local enhanced branch, and a feature capture module and a feature discrimination module. In every training process, the global assisted branch first finishes the initial recognition based on the whole image. Based on the initial recognition results, the feature capture module automatically searches for and locks onto the crucial image regions for correct recognition, which we name the golden key of the image. Then the local enhanced branch extracts the local features from the captured crucial image regions. Finally, the overall features and local features are input into the classifier and dynamically weighted using the learnable voting parameters to collaboratively complete the final recognition under limited training samples. The model soundness experiments demonstrate the effectiveness of our method through the improvement of feature distribution and recognition probability. The experimental results and comparisons on MSTAR and OPENSAR show that our method has achieved superior recognition performance.
Submitted 20 August, 2023;
originally announced August 2023.
-
An Entropy-Awareness Meta-Learning Method for SAR Open-Set ATR
Authors:
Chenwei Wang,
Siyi Luo,
Jifang Pei,
Xiaoyu Liu,
Yulin Huang,
Yin Zhang,
Jianyu Yang
Abstract:
Existing synthetic aperture radar automatic target recognition (SAR ATR) methods have been effective for the classification of seen target classes. However, it is more meaningful and challenging to distinguish the unseen target classes, i.e., open set recognition (OSR) problem, which is an urgent problem for the practical SAR ATR. The key solution of OSR is to effectively establish the exclusiveness of feature distribution of known classes. In this letter, we propose an entropy-awareness meta-learning method that improves the exclusiveness of feature distribution of known classes which means our method is effective for not only classifying the seen classes but also encountering the unseen other classes. Through meta-learning tasks, the proposed method learns to construct a feature space of the dynamic-assigned known classes. This feature space is required by the tasks to reject all other classes not belonging to the known classes. At the same time, the proposed entropy-awareness loss helps the model to enhance the feature space with effective and robust discrimination between the known and unknown classes. Therefore, our method can construct a dynamic feature space with discrimination between the known and unknown classes to simultaneously classify the dynamic-assigned known classes and reject the unknown classes. Experiments conducted on the moving and stationary target acquisition and recognition (MSTAR) dataset have shown the effectiveness of our method for SAR OSR.
Submitted 20 August, 2023;
originally announced August 2023.
-
SAR Ship Target Recognition via Selective Feature Discrimination and Multifeature Center Classifier
Authors:
Chenwei Wang,
Siyi Luo,
Jifang Pei,
Yulin Huang,
Yin Zhang,
Jianyu Yang
Abstract:
Maritime surveillance is not only necessary for every country, such as in maritime safeguarding and fishing controls, but also plays an essential role in international fields, such as in rescue support and illegal immigration control. Most of the existing automatic target recognition (ATR) methods directly send the extracted whole features of SAR ships into one classifier. The classifiers of most methods only assign one feature center to each class. However, the characteristics of SAR ship images, namely large inner-class variance and small interclass difference, lead to the whole features containing useless partial features and to a single feature center per class failing under large inner-class variance. We propose a SAR ship target recognition method via selective feature discrimination and multifeature center classifier. The selective feature discrimination automatically finds the similar partial features from the most similar interclass image pairs and the dissimilar partial features from the most dissimilar inner-class image pairs. It then provides a loss to enhance these partial features with more interclass separability. Motivated by divide and conquer, the multifeature center classifier assigns multiple learnable feature centers for each ship class. In this way, the multifeature centers divide the large inner-class variance into several smaller variances, which are then conquered by combining all feature centers of one ship class. Finally, the probability distribution over all feature centers is considered comprehensively to achieve an accurate recognition of SAR ship images. The ablation experiments and experimental results on OpenSARShip and FUSAR-Ship datasets show that our method has achieved superior recognition performance under decreasing training SAR ship samples.
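The multi-center idea described above can be illustrated with a minimal sketch. It assumes fixed (not learned) centers, uses negative squared Euclidean distance as the similarity, and sums the softmax mass of all centers belonging to a class; these choices and the function name are assumptions for illustration, not the paper's exact classifier.

```python
import math

def class_probabilities(feature, centers_per_class):
    """centers_per_class[c] is a list of center vectors for class c.
    Returns one probability per class, obtained by summing the softmax
    probabilities of all centers that belong to that class."""
    scores, owners = [], []
    for cls, centers in enumerate(centers_per_class):
        for center in centers:
            sq_dist = sum((f - c) ** 2 for f, c in zip(feature, center))
            scores.append(-sq_dist)  # closer center -> higher score
            owners.append(cls)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [0.0] * len(centers_per_class)
    for e, cls in zip(exps, owners):
        probs[cls] += e / total
    return probs
```

With two centers for class 0 at `[0, 0]` and `[10, 10]` and one center for class 1 at `[5, 5]`, a feature near either class-0 center is assigned to class 0 even though the class mean would sit on top of the class-1 center.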
Submitted 8 November, 2023; v1 submitted 20 August, 2023;
originally announced August 2023.
-
SAR Ship Target Recognition Via Multi-Scale Feature Attention and Adaptive-Weighed Classifier
Authors:
Chenwei Wang,
Jifang Pei,
Siyi Luo,
Weibo Huo,
Yulin Huang,
Yin Zhang,
Jianyu Yang
Abstract:
Maritime surveillance is indispensable for civilian fields, including national maritime safeguarding, channel monitoring, and so on, in which synthetic aperture radar (SAR) ship target recognition is a crucial research field. The core problem in realizing accurate SAR ship target recognition is the large inner-class variance and inter-class overlap of SAR ship features, which limits the recognition performance. Most existing methods plainly extract multi-scale features of the network and use each feature scale equally in the classification stage. However, the shallow multi-scale features are not discriminative enough, and each scale feature is not equally effective for recognition. These factors limit recognition performance. Therefore, we propose a SAR ship recognition method via multi-scale feature attention and adaptive-weighted classifier to enhance features in each scale, and adaptively choose the effective feature scale for accurate recognition. We first construct an in-network feature pyramid to extract multi-scale features from SAR ship images. Then, the multi-scale feature attention can extract and enhance the principal components from the multi-scale features with more inner-class compactness and inter-class separability. Finally, the adaptive weighted classifier chooses the effective feature scales in the feature pyramid to achieve the final precise recognition. Through experiments and comparisons on the OpenSARShip dataset, the proposed method is validated to achieve state-of-the-art performance for SAR ship recognition.
Submitted 20 August, 2023;
originally announced August 2023.
-
SAR ATR Method with Limited Training Data via an Embedded Feature Augmenter and Dynamic Hierarchical-Feature Refiner
Authors:
Chenwei Wang,
Siyi Luo,
Yulin Huang,
Jifang Pei,
Yin Zhang,
Jianyu Yang
Abstract:
Without sufficient data, the quantity of information available for supervised training is constrained, as obtaining sufficient synthetic aperture radar (SAR) training data in practice is frequently challenging. Therefore, current SAR automatic target recognition (ATR) algorithms perform poorly with limited training data availability, resulting in a critical need to increase SAR ATR performance. In this study, a new method to improve SAR ATR when training data are limited is proposed. First, an embedded feature augmenter is designed to enhance the extracted virtual features located far away from the class center. Based on the relative distribution of the features, the algorithm pulls the corresponding virtual features with different strengths toward the corresponding class center. The designed augmenter increases the amount of information available for supervised training and improves the separability of the extracted features. Second, a dynamic hierarchical-feature refiner is proposed to capture the discriminative local features of the samples. Through dynamically generated kernels, the proposed refiner integrates the discriminative local features of different dimensions into the global features, further enhancing the inner-class compactness and inter-class separability of the extracted features. The proposed method not only increases the amount of information available for supervised training but also extracts the discriminative features from the samples, resulting in superior ATR performance in problems with limited SAR training data. Experimental results on the moving and stationary target acquisition and recognition (MSTAR), OpenSARShip, and FUSAR-Ship benchmark datasets demonstrate the robustness and outstanding ATR performance of the proposed method in response to limited SAR training data.
Submitted 1 September, 2023; v1 submitted 20 August, 2023;
originally announced August 2023.
-
Causal SAR ATR with Limited Data via Dual Invariance
Authors:
Chenwei Wang,
You Qin,
Li Li,
Siyi Luo,
Yulin Huang,
Jifang Pei,
Yin Zhang,
Jianyu Yang
Abstract:
Synthetic aperture radar automatic target recognition (SAR ATR) with limited data has recently been a hot research topic to enhance weak generalization. Despite many excellent methods being proposed, a fundamental theory is lacking to explain what problem the limited SAR data causes, leading to weak generalization of ATR. In this paper, we establish a causal ATR model demonstrating that noise $N$, which could be blocked with ample SAR data, becomes a confounder with limited data for recognition. As a result, it has a detrimental causal effect damaging the efficacy of feature $X$ extracted from SAR images, leading to weak generalization of SAR ATR with limited data. The effect of $N$ on the feature can be estimated and eliminated by using backdoor adjustment to pursue the direct causality between $X$ and the predicted class $Y$. However, it is difficult to precisely estimate and eliminate the effect of $N$ on $X$ for SAR images. The limited SAR data scarcely powers the majority of existing optimization losses based on empirical risk minimization (ERM), thus making it difficult to effectively eliminate $N$'s effect. To tackle the difficult estimation and elimination of $N$'s effect, we propose a dual invariance comprising the inner-class invariant proxy and the noise-invariance loss. Motivated by tackling change with invariance, the inner-class invariant proxy facilitates precise estimation of $N$'s effect on $X$ by obtaining accurate invariant features for each class with the limited data. The noise-invariance loss transitions the ERM's data quantity necessity into a need for noise environment annotations, effectively eliminating $N$'s effect on $X$ by cleverly applying the previous $N$'s estimation as the noise environment annotations. Experiments on three benchmark datasets indicate that the proposed method achieves superior performance.
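For reference, the backdoor adjustment mentioned above has the following textbook form, written here with the abstract's symbols. It assumes that $N$ takes discrete values $n$ and satisfies the backdoor criterion relative to $X$ and $Y$; the estimator actually used in the paper may differ.

$$P(Y \mid do(X = x)) = \sum_{n} P(Y \mid X = x, N = n)\, P(N = n)$$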
Submitted 10 November, 2023; v1 submitted 18 August, 2023;
originally announced August 2023.
-
Unveiling Causalities in SAR ATR: A Causal Interventional Approach for Limited Data
Authors:
Chenwei Wang,
Xin Chen,
You Qin,
Siyi Luo,
Yulin Huang,
Jifang Pei,
Jianyu Yang
Abstract:
Synthetic aperture radar automatic target recognition (SAR ATR) methods fall short with limited training data. In this letter, we propose a causal interventional ATR method (CIATR) to formulate the problem of limited SAR data, which helps us uncover the ever-elusive causalities among the key factors in ATR and thus pursue the desired causal effect without changing the imaging conditions. A structural causal model (SCM) is constructed using causal inference to help understand how imaging conditions act as a confounder, introducing spurious correlation when SAR data is limited. This spurious correlation between SAR images and the predicted classes can be fundamentally tackled with conventional backdoor adjustments. An effective implementation of backdoor adjustments is proposed by first using data augmentation with a spatial-frequency domain hybrid transformation to estimate the potential effect of varying imaging conditions on SAR images. Then, a feature discrimination approach with hybrid similarity measurement is introduced to measure and mitigate the structural and vector angle impacts of varying imaging conditions on the extracted features from SAR images. Thus, our CIATR can pursue the true causality between SAR images and the corresponding classes even with limited SAR data. Experiments and comparisons conducted on the moving and stationary target acquisition and recognition (MSTAR) and OpenSARShip datasets have shown the effectiveness of our method with limited SAR data.
Submitted 18 August, 2023;
originally announced August 2023.
-
A deep deformable residual learning network for SAR images segmentation
Authors:
Chenwei Wang,
Jifang Pei,
Yulin Huang,
Jianyu Yang
Abstract:
Reliable automatic target segmentation in Synthetic Aperture Radar (SAR) imagery plays an important role in the SAR field. In contrast to traditional methods such as Spectral Residual (SR) and the CFAR detector, recent advances in machine learning theory have given rise to a novel approach to SAR target segmentation based on deep learning networks. In this paper, we propose a deep deformable residual learning network for target segmentation that attempts to preserve the precise contour of the target. To this end, deformable convolutional layers and residual learning blocks are applied, which can extract and preserve the geometric information of the targets as much as possible. Based on the Moving and Stationary Target Acquisition and Recognition (MSTAR) data set, experimental results have shown the superiority of the proposed network for precise target segmentation.
Submitted 15 August, 2023;
originally announced August 2023.
-
When Deep Learning Meets Multi-Task Learning in SAR ATR: Simultaneous Target Recognition and Segmentation
Authors:
Chenwei Wang,
Jifang Pei,
Zhiyong Wang,
Yulin Huang,
Junjie Wu,
Haiguang Yang,
Jianyu Yang
Abstract:
With the recent advances of deep learning, automatic target recognition (ATR) of synthetic aperture radar (SAR) has achieved superior performance. By not being limited to the target category, the SAR ATR system could benefit from the simultaneous extraction of multifarious target attributes. In this paper, we propose a new multi-task learning approach for SAR ATR, which could obtain the accurate category and precise shape of the targets simultaneously. By introducing deep learning theory into multi-task learning, we first propose a novel multi-task deep learning framework with two main structures: encoder and decoder. The encoder is constructed to extract sufficient image features in different scales for the decoder, while the decoder is a tasks-specific structure which employs these extracted features adaptively and optimally to meet the different feature demands of the recognition and segmentation. Therefore, the proposed framework has the ability to achieve superior recognition and segmentation performance. Based on the Moving and Stationary Target Acquisition and Recognition (MSTAR) dataset, experimental results show the superiority of the proposed framework in terms of recognition and segmentation.
Submitted 14 August, 2023;
originally announced August 2023.
-
SAR Target Image Generation Method Using Azimuth-Controllable Generative Adversarial Network
Authors:
Chenwei Wang,
Jifang Pei,
Xiaoyu Liu,
Yulin Huang,
Deqing Mao,
Yin Zhang,
Jianyu Yang
Abstract:
Sufficient synthetic aperture radar (SAR) target images are very important for research. However, available SAR target images are often limited in practice, which hinders the progress of SAR applications. In this paper, we propose an azimuth-controllable generative adversarial network to generate precise SAR target images with an intermediate azimuth between two given SAR images' azimuths. This network mainly contains three parts: generator, discriminator, and predictor. Through the proposed specific network structure, the generator can extract and fuse the optimal target features from two input SAR target images to generate a SAR target image. Then a similarity discriminator and an azimuth predictor are designed. The similarity discriminator can differentiate the generated SAR target images from the real SAR images to ensure the accuracy of the generated images, while the azimuth predictor measures the difference in azimuth between the generated and the desired images to ensure the azimuth controllability of the generated images. Therefore, the proposed network can generate precise SAR images, and their azimuths can be controlled well by the inputs of the deep network, which can generate the target images in different azimuths to solve the small sample problem to some degree and benefit research on SAR images. Extensive experimental results show the superiority of the proposed method in azimuth controllability and accuracy of SAR target image generation.
Submitted 10 August, 2023;
originally announced August 2023.
-
Global in Local: A Convolutional Transformer for SAR ATR FSL
Authors:
Chenwei Wang,
Yulin Huang,
Xiaoyu Liu,
Jifang Pei,
Yin Zhang,
Jianyu Yang
Abstract:
Convolutional neural networks (CNNs) have dominated synthetic aperture radar (SAR) automatic target recognition (ATR) for years. However, with limited SAR images, the width and depth of the CNN-based models are limited, and the widening of the receptive field for global features in images is hindered, which finally leads to low recognition performance. To address these challenges, we propose a Convolutional Transformer (ConvT) for SAR ATR few-shot learning (FSL). The proposed method focuses on constructing a hierarchical feature representation and capturing global dependencies of local features in each layer, named global in local. A novel hybrid loss is proposed to interpret the few SAR images in the forms of recognition labels and contrastive image pairs, construct abundant anchor-positive and anchor-negative image pairs in one batch and provide sufficient loss for the optimization of the ConvT to overcome the few sample effect. An auto augmentation is proposed to enhance and enrich the diversity and amount of the few training samples to explore the hidden feature in a few SAR images and avoid over-fitting in SAR ATR FSL. Experiments conducted on the Moving and Stationary Target Acquisition and Recognition dataset (MSTAR) have shown the effectiveness of our proposed ConvT for SAR ATR FSL. Different from existing SAR ATR FSL methods employing additional training datasets, our method achieved pioneering performance without other SAR target images in training.
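Building anchor-positive and anchor-negative pairs inside one batch, as mentioned above, can be sketched as follows. This only enumerates index pairs from the batch labels; the function name is hypothetical and the hybrid loss itself is not reproduced here.

```python
from itertools import permutations

def build_pairs(labels):
    """Enumerate ordered (anchor, other) index pairs within one batch.
    Pairs with equal labels are anchor-positive, all others anchor-negative."""
    positives, negatives = [], []
    for i, j in permutations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            positives.append((i, j))
        else:
            negatives.append((i, j))
    return positives, negatives
```

A batch of B samples yields B(B-1) ordered pairs, which is why even a few labeled images can supply many contrastive training signals.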
Submitted 10 August, 2023;
originally announced August 2023.
-
SAR ATR under Limited Training Data Via MobileNetV3
Authors:
Chenwei Wang,
Siyi Luo,
Lin Liu,
Yin Zhang,
Jifang Pei,
Yulin Huang,
Jianyu Yang
Abstract:
In recent years, deep learning has been widely used to solve the bottleneck problem of synthetic aperture radar (SAR) automatic target recognition (ATR). However, most current methods rely heavily on a large number of training samples and have many parameters, which leads to failure under limited training samples. In practical applications, the SAR ATR method needs not only superior performance under limited training data but also real-time performance. Therefore, we try to use a lightweight network for SAR ATR under limited training samples, which has fewer parameters, less computational effort, and shorter inference time than normal networks. At the same time, the lightweight network combines the advantages of existing lightweight networks and uses a combination of MnasNet and NetAdapt algorithms to find the optimal neural network architecture for a given problem. Through experiments and comparisons on the moving and stationary target acquisition and recognition (MSTAR) dataset, the lightweight network is validated to have excellent recognition performance for SAR ATR on limited training samples and a very small computational cost, reflecting the great potential of this network structure for practical applications.
Submitted 10 August, 2023; v1 submitted 27 June, 2023;
originally announced June 2023.
-
Momentum-Based Policy Gradient Methods
Authors:
Feihu Huang,
Shangqian Gao,
Jian Pei,
Heng Huang
Abstract:
In this paper, we propose a class of efficient momentum-based policy gradient methods for model-free reinforcement learning, which use adaptive learning rates and do not require any large batches. Specifically, we propose a fast importance-sampling momentum-based policy gradient (IS-MBPG) method based on a new momentum-based variance reduced technique and the importance sampling technique. We also propose a fast Hessian-aided momentum-based policy gradient (HA-MBPG) method based on the momentum-based variance reduced technique and the Hessian-aided technique. Moreover, we prove that both the IS-MBPG and HA-MBPG methods reach the best known sample complexity of $O(ε^{-3})$ for finding an $ε$-stationary point of the non-concave performance function, while requiring only one trajectory at each iteration. In particular, we present a non-adaptive version of the IS-MBPG method, i.e., IS-MBPG*, which also reaches the best known sample complexity of $O(ε^{-3})$ without any large batches. In the experiments, we apply four benchmark tasks to demonstrate the effectiveness of our algorithms.
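The abstract does not spell out the update rule. As a rough illustration of what a momentum-based variance reduced estimator can look like, the sketch below uses a STORM-style recursion, d_t = g(x_t; ξ_t) + (1 - a)(d_{t-1} - g(x_{t-1}; ξ_t)), on a toy one-dimensional minimization problem. Whether this matches the paper's exact estimator is an assumption; the constant step size, the constant momentum parameter `a`, the toy objective, and all function names are hypothetical, and the importance-sampling weights needed in the policy gradient setting are omitted.

```python
import random

def noisy_grad(x, noise):
    """Stochastic gradient of the toy objective f(x) = 0.5 * x**2."""
    return x + noise

def momentum_vr_descent(x0, steps, lr=0.1, a=0.1, sigma=0.1, seed=0):
    """Gradient descent driven by a STORM-style momentum estimator (assumed form)."""
    rng = random.Random(seed)
    x_prev = x0
    d = noisy_grad(x_prev, rng.gauss(0.0, sigma))  # plain stochastic gradient to start
    x = x_prev - lr * d
    for _ in range(steps):
        noise = rng.gauss(0.0, sigma)  # one fresh sample, evaluated at both iterates
        d = noisy_grad(x, noise) + (1.0 - a) * (d - noisy_grad(x_prev, noise))
        x_prev, x = x, x - lr * d
    return x
```

Evaluating the same sample at the current and previous iterate is what cancels most of the noise, so a single sample per step suffices instead of a large batch.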
Submitted 6 August, 2020; v1 submitted 13 July, 2020;
originally announced July 2020.
-
AM-GCN: Adaptive Multi-channel Graph Convolutional Networks
Authors:
Xiao Wang,
Meiqi Zhu,
Deyu Bo,
Peng Cui,
Chuan Shi,
Jian Pei
Abstract:
Graph Convolutional Networks (GCNs) have gained great popularity in tackling various analytics tasks on graph and network data. However, some recent studies raise concerns about whether GCNs can optimally integrate node features and topological structures in a complex graph with rich information. In this paper, we first present an experimental investigation. Surprisingly, our experimental results clearly show that the capability of the state-of-the-art GCNs in fusing node features and topological structures is distant from optimal or even satisfactory. The weakness may severely hinder the capability of GCNs in some classification tasks, since GCNs may not be able to adaptively learn some deep correlation information between topological structures and node features. Can we remedy the weakness and design a new type of GCNs that can retain the advantages of the state-of-the-art GCNs and, at the same time, enhance the capability of fusing topological structures and node features substantially? We tackle the challenge and propose adaptive multi-channel graph convolutional networks for semi-supervised classification (AM-GCN). The central idea is that we extract the specific and common embeddings from node features, topological structures, and their combinations simultaneously, and use the attention mechanism to learn adaptive importance weights of the embeddings. Our extensive experiments on benchmark data sets clearly show that AM-GCN extracts the most correlated information from both node features and topological structures, and improves the classification accuracy by a clear margin.
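The attention step described above, learning importance weights over several embeddings of the same node, can be sketched in a simplified form. The scoring function `q · tanh(z)` with a fixed vector `q` is an assumption for illustration (a trained model would learn `q` and typically a projection as well); it is not claimed to be AM-GCN's exact attention layer.

```python
import math

def attention_fuse(embeddings, q):
    """Fuse several embeddings of one node into a single vector.
    embeddings: list of equal-length vectors (e.g. topology, feature, common).
    q: scoring vector of the same length (assumed given, normally learned)."""
    scores = [sum(qj * math.tanh(zj) for qj, zj in zip(q, z)) for z in embeddings]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax importance weights
    fused = [sum(w * z[j] for w, z in zip(weights, embeddings))
             for j in range(len(q))]
    return weights, fused
```

The returned weights sum to one, so the fused vector is a convex combination of the input embeddings, and the weights themselves show which channel the model relies on for that node.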
Submitted 10 July, 2020; v1 submitted 5 July, 2020;
originally announced July 2020.
-
Simulation-Based Digital Twin Development for Blockchain Enabled End-to-End Industrial Hemp Supply Chain Risk Management
Authors:
Keqi Wang,
Wei Xie,
Wencen Wu,
Bo Wang,
Jinxiang Pei,
Mike Baker,
Qi Zhou
Abstract:
With the passage of the 2018 U.S. Farm Bill, Industrial Hemp production has moved from limited pilot programs to a regulated agriculture production system. However, the Industrial Hemp Supply Chain (IHSC) faces critical challenges, including high complexity and variability, very limited production knowledge, and a lack of data and information tracking. In this paper, we propose a blockchain-enabled IHSC and develop a preliminary simulation-based digital twin for this distributed cyber-physical system (CPS) to support process learning and risk management. Specifically, we develop a two-layer blockchain with a proof-of-authority smart contract, which can track the data and key information, improve the supply chain transparency, and leverage local authorities and state regulators to ensure the quality control verification. Then, we introduce a stochastic simulation-based digital twin for IHSC risk management, which can characterize the spatial-temporal causal interdependencies and dynamic evolution of the process to guide risk control and decision making. Our empirical study demonstrates the promising performance of the proposed platform.
Submitted 18 June, 2020;
originally announced June 2020.
-
Transfer Learning in General Lensless Imaging through Scattering Media
Authors:
Yukuan Yang,
Lei Deng,
Peng Jiao,
Yansong Chua,
Jing Pei,
Cheng Ma,
Guoqi Li
Abstract:
Recently deep neural networks (DNNs) have been successfully introduced to the field of lensless imaging through scattering media. By solving an inverse problem in computational imaging, DNNs can overcome several shortcomings in the conventional lensless imaging through scattering media methods, namely, high cost, poor quality, complex control, and poor anti-interference. However, for training, a large number of training samples on various datasets have to be collected, with a DNN trained on one dataset generally performing poorly for recovering images from another dataset. The underlying reason is that lensless imaging through scattering media is a high dimensional regression problem and it is difficult to obtain an analytical solution. In this work, transfer learning is proposed to address this issue. Our main idea is to train a DNN on a relatively complex dataset using a large number of training samples and fine-tune the last few layers using very few samples from other datasets. Instead of the thousands of samples required to train from scratch, transfer learning alleviates the problem of costly data acquisition. Specifically, considering the difference in sample sizes and similarity among datasets, we propose two DNN architectures, namely LISMU-FCN and LISMU-OCN, and a balance loss function designed for balancing smoothness and sharpness. LISMU-FCN, with much fewer parameters, can achieve imaging across similar datasets while LISMU-OCN can achieve imaging across significantly different datasets. What's more, we establish a set of simulation algorithms which are close to the real experiment, and it is of great significance and practical value in the research on lensless scattering imaging. In summary, this work provides a new solution for lensless imaging through scattering media using transfer learning in DNNs.
Submitted 28 December, 2019;
originally announced December 2019.