-
HOTA: Hierarchical Overlap-Tiling Aggregation for Large-Area 3D Flood Mapping
Authors:
Wenfeng Jia,
Bin Liang,
Yuxi Lu,
Attavit Wilaiwongsakul,
Muhammad Arif Khan,
Lihong Zheng
Abstract:
Floods are among the most frequent natural hazards and cause significant social and economic damage. Timely, large-scale information on flood extent and depth is essential for disaster response; however, existing products often trade spatial detail for coverage or ignore flood depth altogether. To bridge this gap, this work presents HOTA: Hierarchical Overlap-Tiling Aggregation, a plug-and-play, multi-scale inference strategy. When combined with SegFormer and a dual-constraint depth estimation module, this approach forms a complete 3D flood-mapping pipeline. HOTA applies overlapping tiles of different sizes to multispectral Sentinel-2 images only during inference, enabling the SegFormer model to capture both local features and kilometre-scale inundation without changing the network weights or retraining. The subsequent depth module is based on a digital elevation model (DEM) differencing method, which refines the 2D mask and estimates flood depth by enforcing (i) zero depth along the flood boundary and (ii) near-constant flood volume with respect to the DEM. A case study on the March 2021 Kempsey (Australia) flood shows that HOTA, when coupled with SegFormer, improves IoU from 73% (U-Net baseline) to 84%. The resulting 3D surface achieves a mean absolute boundary error of less than 0.5 m. These results demonstrate that HOTA can produce accurate, large-area 3D flood maps suitable for rapid disaster response.
Submitted 10 July, 2025;
originally announced July 2025.
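The inference-time tiling described in the HOTA abstract can be illustrated with a minimal sketch. The abstract does not state how tile predictions are merged, so the per-pixel averaging used here is an assumption, as are the tile sizes, the 50% overlap, and every function name; `toy_predict` is a hypothetical stand-in for a trained segmentation model such as SegFormer.

```python
import numpy as np

def tile_starts(length, tile, stride):
    """Start offsets of overlapping tiles that together cover [0, length)."""
    if tile >= length:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # last tile sits flush with the edge
    return starts

def multiscale_tiled_inference(image, predict, tile_sizes=(64, 128), overlap=0.5):
    """Average per-pixel probabilities over overlapping tiles of several sizes.

    image:   (H, W, C) array.
    predict: callable mapping an (h, w, C) tile to an (h, w) probability map.
    Averaging is an assumed aggregation rule, not taken from the paper.
    Expects 0 <= overlap < 1 so that the tiles leave no gaps.
    """
    h, w = image.shape[:2]
    prob_sum = np.zeros((h, w), dtype=np.float64)
    hits = np.zeros((h, w), dtype=np.float64)
    for tile in tile_sizes:
        th, tw = min(tile, h), min(tile, w)
        sy = max(1, int(th * (1 - overlap)))
        sx = max(1, int(tw * (1 - overlap)))
        for y in tile_starts(h, th, sy):
            for x in tile_starts(w, tw, sx):
                prob_sum[y:y + th, x:x + tw] += predict(image[y:y + th, x:x + tw])
                hits[y:y + th, x:x + tw] += 1
    return prob_sum / hits  # every pixel is covered at least once

def toy_predict(tile):
    # Hypothetical stand-in for a segmentation model: thresholds the first band.
    return (tile[..., 0] > 0.5).astype(np.float64)
```

Because the model is only called on tiles, the same weights serve every tile size, which matches the abstract's statement that no retraining is needed; a real model would give different answers at different scales, and the averaging step is where those answers are reconciled.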
-
Cross-domain Hyperspectral Image Classification based on Bi-directional Domain Adaptation
Authors:
Yuxiang Zhang,
Wei Li,
Wen Jia,
Mengmeng Zhang,
Ran Tao,
Shunlin Liang
Abstract:
Utilizing hyperspectral remote sensing technology enables the extraction of fine-grained land cover classes. Typically, satellite or airborne images used for training and testing are acquired from different regions or times, where the same class has significant spectral shifts in different scenes. In this paper, we propose a Bi-directional Domain Adaptation (BiDA) framework for cross-domain hyperspectral image (HSI) classification, which focuses on extracting both domain-invariant features and domain-specific information in the independent adaptive space, thereby enhancing the adaptability and separability to the target scene. In the proposed BiDA, a triple-branch transformer architecture (the source branch, target branch, and coupled branch) with a semantic tokenizer is designed as the backbone. Specifically, the source branch and target branch independently learn the adaptive space of the source and target domains, while a Coupled Multi-head Cross-attention (CMCA) mechanism is developed in the coupled branch for feature interaction and inter-domain correlation mining. Furthermore, a bi-directional distillation loss is designed to guide adaptive space learning using inter-domain correlation. Finally, we propose an Adaptive Reinforcement Strategy (ARS) to encourage the model to focus on specific generalized feature extraction within both source and target scenes under noisy conditions. Experimental results on cross-temporal/scene airborne and satellite datasets demonstrate that the proposed BiDA performs significantly better than some state-of-the-art domain adaptation approaches. In the cross-temporal tree species classification task, the proposed BiDA is 3% to 5% higher than the most advanced method. The codes will be available from the website: https://github.com/YuxiangZhang-BIT/IEEE_TCSVT_BiDA.
Submitted 2 July, 2025;
originally announced July 2025.
-
Efficient RAW Image Deblurring with Adaptive Frequency Modulation
Authors:
Wenlong Jiao,
Binglong Li,
Wei Shang,
Ping Wang,
Dongwei Ren
Abstract:
Image deblurring plays a crucial role in enhancing visual clarity across various applications. Although most deep learning approaches primarily focus on sRGB images, which inherently lose critical information during the image signal processing pipeline, RAW images, being unprocessed and linear, possess superior restoration potential but remain underexplored. Deblurring RAW images presents unique challenges, particularly in handling frequency-dependent blur while maintaining computational efficiency. To address these issues, we propose Frequency Enhanced Network (FrENet), a framework specifically designed for RAW-to-RAW deblurring that operates directly in the frequency domain. We introduce a novel Adaptive Frequency Positional Modulation module, which dynamically adjusts frequency components according to their spectral positions, thereby enabling precise control over the deblurring process. Additionally, frequency domain skip connections are adopted to further preserve high-frequency details. Experimental results demonstrate that FrENet surpasses state-of-the-art deblurring methods in RAW image deblurring, achieving significantly better restoration quality while maintaining high efficiency in terms of reduced MACs. Furthermore, FrENet's adaptability enables it to be extended to sRGB images, where it delivers comparable or superior performance compared to methods specifically designed for sRGB data. The code will be available at https://github.com/WenlongJiao/FrENet .
Submitted 3 June, 2025; v1 submitted 30 May, 2025;
originally announced May 2025.
-
Hybrid Learning for Cold-Start-Aware Microservice Scheduling in Dynamic Edge Environments
Authors:
Jingxi Lu,
Wenhao Li,
Jianxiong Guo,
Xingjian Ding,
Zhiqing Tang,
Tian Wang,
Weijia Jia
Abstract:
With the rapid growth of IoT devices and their diverse workloads, container-based microservices deployed at edge nodes have become a lightweight and scalable solution. However, existing microservice scheduling algorithms often assume static resource availability, which is unrealistic when multiple containers are assigned to an edge node. Besides, containers suffer from cold-start inefficiencies during early-stage training in currently popular reinforcement learning (RL) algorithms. In this paper, we propose a hybrid learning framework that combines offline imitation learning (IL) with online Soft Actor-Critic (SAC) optimization to enable cold-start-aware microservice scheduling with dynamic allocation of computing resources. We first formulate a delay-and-energy-aware scheduling problem and construct a rule-based expert to generate demonstration data for behavior cloning. Then, a GRU-enhanced policy network is designed to extract the correlation among multiple decisions by separately encoding slow-evolving node states and fast-changing microservice features, and an action selection mechanism is given to speed up the convergence. Extensive experiments show that our method significantly accelerates convergence and achieves superior final performance. Compared with baselines, our algorithm improves the total objective by 50% and convergence speed by 70%, and demonstrates the highest stability and robustness across various edge configurations.
Submitted 28 May, 2025;
originally announced May 2025.
-
Mutual Evidential Deep Learning for Medical Image Segmentation
Authors:
Yuanpeng He,
Yali Bi,
Lijian Li,
Chi-Man Pun,
Wenpin Jiao,
Zhi Jin
Abstract:
Existing semi-supervised medical segmentation co-learning frameworks have realized that model performance can be diminished by the biases in model recognition caused by low-quality pseudo-labels. Due to the averaging nature of their pseudo-label integration strategy, they fail to explore the reliability of pseudo-labels from different sources. In this paper, we propose a mutual evidential deep learning (MEDL) framework that offers a potentially viable solution for pseudo-label generation in semi-supervised learning from two perspectives. First, we introduce networks with different architectures to generate complementary evidence for unlabeled samples and adopt an improved class-aware evidential fusion to guide the confident synthesis of evidential predictions sourced from diverse architectural networks. Second, utilizing the uncertainty in the fused evidence, we design an asymptotic Fisher information-based evidential learning strategy. This strategy enables the model to initially focus on unlabeled samples with more reliable pseudo-labels, gradually shifting attention to samples with lower-quality pseudo-labels while avoiding over-penalization of mislabeled classes in high data uncertainty samples. Additionally, for labeled data, we continue to adopt an uncertainty-driven asymptotic learning strategy, gradually guiding the model to focus on challenging voxels. Extensive experiments on five mainstream datasets have demonstrated that MEDL achieves state-of-the-art performance.
Submitted 18 May, 2025;
originally announced May 2025.
-
Transferable Deployment of Semantic Edge Inference Systems via Unsupervised Domain Adaption
Authors:
Weiqiang Jiao,
Suzhi Bi,
Xian Li,
Cheng Guo,
Hao Chen,
Zhi Quan
Abstract:
This paper investigates deploying semantic edge inference systems for performing a common image clarification task. In particular, each system consists of multiple Internet of Things (IoT) devices that first locally encode the sensing data into semantic features and then transmit them to an edge server for subsequent data fusion and task inference. The inference accuracy is determined by efficient training of the feature encoder/decoder using labeled data samples. Due to the difference in sensing data and communication channel distributions, deploying the system in a new environment may induce high costs in annotating data labels and re-training the encoder/decoder models. To achieve cost-effective transferable system deployment, we propose an efficient Domain Adaptation method for Semantic Edge INference systems (DASEIN) that can maintain high inference accuracy in a new environment without the need for labeled samples. Specifically, DASEIN exploits the task-relevant data correlation between different deployment scenarios by leveraging the techniques of unsupervised domain adaptation and knowledge distillation. It devises an efficient two-step adaptation procedure that sequentially aligns the data distributions and adapts to the channel variations. Numerical results show that, under a substantial change in sensing data distributions, the proposed DASEIN outperforms the best-performing benchmark method by 7.09% and 21.33% in inference accuracy when the new environment has similar or 25 dB lower channel signal to noise power ratios (SNRs), respectively. This verifies the effectiveness of the proposed method in adapting both data and channel distributions in practical transfer deployment applications.
Submitted 16 April, 2025;
originally announced April 2025.
-
One-Shot Affordance Grounding of Deformable Objects in Egocentric Organizing Scenes
Authors:
Wanjun Jia,
Fan Yang,
Mengfei Duan,
Xianchi Chen,
Yinxi Wang,
Yiming Jiang,
Wenrui Chen,
Kailun Yang,
Zhiyong Li
Abstract:
Deformable object manipulation in robotics presents significant challenges due to uncertainties in component properties, diverse configurations, visual interference, and ambiguous prompts. These factors complicate both perception and control tasks. To address these challenges, we propose a novel method for One-Shot Affordance Grounding of Deformable Objects (OS-AGDO) in egocentric organizing scenes, enabling robots to recognize previously unseen deformable objects with varying colors and shapes using minimal samples. Specifically, we first introduce the Deformable Object Semantic Enhancement Module (DefoSEM), which enhances hierarchical understanding of the internal structure and improves the ability to accurately identify local features, even under conditions of weak component information. Next, we propose the ORB-Enhanced Keypoint Fusion Module (OEKFM), which optimizes feature extraction of key components by leveraging geometric constraints and improves adaptability to diversity and visual interference. Additionally, we propose an instance-conditional prompt based on image data and task context, which effectively mitigates the region ambiguity caused by prompt words. To validate these methods, we construct a diverse real-world dataset, AGDDO15, which includes 15 common types of deformable objects and their associated organizational actions. Experimental results demonstrate that our approach significantly outperforms state-of-the-art methods, achieving improvements of 6.2%, 3.2%, and 2.9% in KLD, SIM, and NSS metrics, respectively, while exhibiting high generalization performance. Source code and benchmark dataset will be publicly available at https://github.com/Dikay1/OS-AGDO.
Submitted 2 March, 2025;
originally announced March 2025.
-
PIGUIQA: A Physical Imaging Guided Perceptual Framework for Underwater Image Quality Assessment
Authors:
Weizhi Xian,
Mingliang Zhou,
Leong Hou U,
Lang Shujun,
Bin Fang,
Tao Xiang,
Zhaowei Shang,
Weijia Jia
Abstract:
In this paper, we propose a Physical Imaging Guided perceptual framework for Underwater Image Quality Assessment (UIQA), termed PIGUIQA. First, we formulate UIQA as a comprehensive problem that considers the combined effects of direct transmission attenuation and backward scattering on image perception. By leveraging underwater radiative transfer theory, we systematically integrate physics-based imaging estimations to establish quantitative metrics for these distortions. Second, recognizing spatial variations in image content significance and human perceptual sensitivity to distortions, we design a module built upon a neighborhood attention mechanism for local perception of images. This module effectively captures subtle features in images, thereby enhancing the adaptive perception of distortions on the basis of local information. Third, by employing a global perceptual aggregator that further integrates holistic image scene with underwater distortion information, the proposed model accurately predicts image quality scores. Extensive experiments across multiple benchmarks demonstrate that PIGUIQA achieves state-of-the-art performance while maintaining robust cross-dataset generalizability. The implementation is publicly available at https://anonymous.4open.science/r/PIGUIQA-A465/
Submitted 5 March, 2025; v1 submitted 19 December, 2024;
originally announced December 2024.
-
Mechanisms of Generative Image-to-Image Translation Networks
Authors:
Guangzong Chen,
Mingui Sun,
Zhi-Hong Mao,
Kangni Liu,
Wenyan Jia
Abstract:
Generative Adversarial Networks (GANs) are a class of neural networks that have been widely used in the field of image-to-image translation. In this paper, we propose a streamlined image-to-image translation network with a simpler architecture compared to existing models. We investigate the relationship between GANs and autoencoders and provide an explanation for the efficacy of employing only the GAN component for tasks involving image translation. We show that adversarial training alone yields results comparable to those of existing methods without additional complex loss penalties. Subsequently, we elucidate the rationale behind this phenomenon. We also incorporate experimental results to demonstrate the validity of our findings.
Submitted 15 November, 2024;
originally announced November 2024.
-
FDA-MIMO-Based Integrated Multi-Target Sensing and Communication System with Complex Coefficients Information Embedding
Authors:
Jiangwei Jian,
Bang Huang,
Wenkai Jia,
Mingcheng Fu,
Wen-Qin Wang,
Qimao Huang
Abstract:
The echo signals of frequency diverse array multiple-input multiple-output (FDA-MIMO) feature angle-range coupling, enabling simultaneous discrimination and estimation of multiple targets at different locations. In light of this, based on FDA-MIMO, this paper explores a sensing-centric integrated sensing and communication (ISAC) system for multi-target sensing. At the base station, we propose the FDA-MIMO-based spatial spectrum multi-target estimation (SSMTE) method, which first jointly estimates the angle and distance of targets and then estimates the velocities. To reduce the sensing computational complexity, the low-complexity spatial spectrum estimation (LCSSE) algorithm is proposed. LCSSE reduces the complexity without degrading the sensing performance by converting the joint angle-range search into two one-dimensional searches. To address the range ambiguity caused by frequency offset, a frequency offset design criterion (FODC) is proposed. It designs the integer and fractional components of the frequency offset to ensure the ambiguity distance exceeds the maximum sensing range, thereby alleviating parameter pairing errors. Moreover, the complex coefficients information embedding (CCIE) scheme is designed to improve system communication rates; it carries extra bits by selecting complex coefficients from the coefficient vector. The closed-form expressions for the bit error rate (BER) tight upper bound and the Cramér-Rao bound (CRB) are derived. Simulation results show that the proposed system excels in multi-target sensing and communications.
Submitted 4 December, 2024; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Shape-Preserving Generation of Food Images for Automatic Dietary Assessment
Authors:
Guangzong Chen,
Zhi-Hong Mao,
Mingui Sun,
Kangni Liu,
Wenyan Jia
Abstract:
Traditional dietary assessment methods heavily rely on self-reporting, which is time-consuming and prone to bias. Recent advancements in Artificial Intelligence (AI) have revealed new possibilities for dietary assessment, particularly through analysis of food images. Recognizing foods and estimating food volumes from images are known as the key procedures for automatic dietary assessment. However, both procedures require large amounts of training images labeled with food names and volumes, which are currently unavailable. Alternatively, recent studies have indicated that training images can be artificially generated using Generative Adversarial Networks (GANs). Nonetheless, convenient generation of large amounts of food images with known volumes remains a challenge with the existing techniques. In this work, we present a simple GAN-based neural network architecture for conditional food image generation. The shapes of the food and container in the generated images closely resemble those in the reference input image. Our experiments demonstrate the realism of the generated images and the shape-preserving capabilities of the proposed framework.
Submitted 23 August, 2024;
originally announced August 2024.
-
Millimeter Wave Radar-based Human Activity Recognition for Healthcare Monitoring Robot
Authors:
Zhanzhong Gu,
Xiangjian He,
Gengfa Fang,
Chengpei Xu,
Feng Xia,
Wenjing Jia
Abstract:
Healthcare monitoring is crucial, especially for the daily care of elderly individuals living alone. It can detect dangerous occurrences, such as falls, and provide timely alerts to save lives. Non-invasive millimeter wave (mmWave) radar-based healthcare monitoring systems using advanced human activity recognition (HAR) models have recently gained significant attention. However, they encounter challenges in handling sparse point clouds, achieving real-time continuous classification, and coping with limited monitoring ranges when statically mounted. To overcome these limitations, we propose RobHAR, a movable robot-mounted mmWave radar system with lightweight deep neural networks for real-time monitoring of human activities. Specifically, we first propose a sparse point cloud-based global embedding to learn the features of point clouds using the light-PointNet (LPN) backbone. Then, we learn the temporal pattern with a bidirectional lightweight LSTM model (BiLiLSTM). In addition, we implement a transition optimization strategy, integrating the Hidden Markov Model (HMM) with Connectionist Temporal Classification (CTC) to improve the accuracy and robustness of the continuous HAR. Our experiments on three datasets indicate that our method significantly outperforms the previous studies in both discrete and continuous HAR tasks. Finally, we deploy our system on a movable robot-mounted edge computing platform, achieving flexible healthcare monitoring in real-world scenarios.
Submitted 3 May, 2024;
originally announced May 2024.
-
The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Yawei Li,
Nancy Mehta,
Radu Timofte,
Hongyuan Yu,
Cheng Wan,
Yuxin Hong,
Bingnan Han,
Zhuoyuan Wu,
Yajun Zou,
Yuqing Liu,
Jizhe Li,
Keji He,
Chao Fan,
Heng Zhang,
Xiaolin Zhang,
Xuanwu Yin,
Kunlong Zuo,
Bohao Liao,
Peizhe Xia,
Long Peng,
Zhibo Du,
Xin Di,
Wangkai Li,
Yang Wang, et al. (109 additional authors not shown)
Abstract:
This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of ×4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such as runtime, parameters, and FLOPs, while still maintaining a peak signal-to-noise ratio (PSNR) of approximately 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. In addition, this challenge has 4 tracks including the main track (overall performance), sub-track 1 (runtime), sub-track 2 (FLOPs), and sub-track 3 (parameters). In the main track, all three metrics (i.e., runtime, FLOPs, and parameter count) were considered. The ranking of the main track is calculated based on a weighted sum of the scores of all the sub-tracks. In sub-track 1, the practical runtime performance of the submissions was evaluated, and the corresponding score was used to determine the ranking. In sub-track 2, the number of FLOPs was considered. The score calculated based on the corresponding FLOPs was used to determine the ranking. In sub-track 3, the number of parameters was considered. The score calculated based on the corresponding parameters was used to determine the ranking. RLFN is set as the baseline for efficiency measurement. The challenge had 262 registered participants, and 34 teams made valid submissions. They gauge the state-of-the-art in efficient single-image super-resolution. To facilitate the reproducibility of the challenge and enable other researchers to build upon these findings, the code and the pre-trained model of validated solutions are made publicly available at https://github.com/Amazingren/NTIRE2024_ESR/.
Submitted 25 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Coherent FDA Receiver and Joint Range-Space-Time Processing
Authors:
Wenkai Jia,
Andreas Jakobsson,
Wen-Qin Wang
Abstract:
When a target is masked by mainlobe clutter with the same Doppler frequency, it is difficult for conventional airborne radars to determine whether a target is present in a given observation using regular space-time adaptive processing techniques. Different from phased-array and multiple-input multiple-output (MIMO) arrays, frequency diverse arrays (FDAs) employ frequency offsets across the array elements, delivering additional range-controllable degrees of freedom, potentially enabling suppression for this kind of clutter. However, the reception of coherent FDA systems employing small frequency offsets and achieving high transmit gain can be further improved. To this end, this work proposes a coherent airborne FDA radar receiver that explores the orthogonality of echo signals in the Doppler domain, allowing a joint space-time processing module to be deployed to separate the aliased returns. The resulting range-space-time adaptive processing allows for a preferable detection performance for coherent airborne FDA radars as compared to current alternative techniques.
Submitted 1 June, 2023;
originally announced June 2023.
-
DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images
Authors:
Bo Qian,
Hao Chen,
Xiangning Wang,
Haoxuan Che,
Gitaek Kwon,
Jaeyoung Kim,
Sungjin Choi,
Seoyoung Shin,
Felix Krause,
Markus Unterdechler,
Junlin Hou,
Rui Feng,
Yihao Li,
Mostafa El Habib Daho,
Qiang Wu,
Ping Zhang,
Xiaokang Yang,
Yiyu Cai,
Weiping Jia,
Huating Li,
Bin Sheng
Abstract:
Computer-assisted automatic analysis of diabetic retinopathy (DR) is of great importance in reducing the risks of vision loss and even blindness. Ultra-wide optical coherence tomography angiography (UW-OCTA) is a non-invasive and safe imaging modality in DR diagnosis system, but there is a lack of publicly available benchmarks for model development and evaluation. To promote further research and scientific benchmarking for diabetic retinopathy analysis using UW-OCTA images, we organized a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022). The challenge consists of three tasks: segmentation of DR lesions, image quality assessment and DR grading. The scientific community responded positively to the challenge, with 11, 12, and 13 teams from geographically diverse institutes submitting different solutions in these three tasks, respectively. This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge. The obtained results from top algorithms indicate the importance of data augmentation, model architecture and ensemble of networks in improving the performance of deep learning models. These findings have the potential to enable new developments in diabetic retinopathy analysis. The challenge remains open for post-challenge registrations and submissions for benchmarking future methodology developments.
Submitted 5 April, 2023;
originally announced April 2023.
-
Convolutional Long Short-Term Memory (convLSTM) for Spatio-Temporal Forecastings of Saturations and Pressure in the SACROC Field
Authors:
Palash Panja,
Wei Jia,
Alec Nelson,
Brian McPherson
Abstract:
A machine learning architecture composed of convolutional long short-term memory (convLSTM) is developed to predict spatio-temporal parameters in the SACROC oil field, Texas, USA. The spatial parameters are recorded at the end of each month for 30 years (360 months); approximately 83% (300 months) is used for training and the remaining 17% (60 months) is kept for testing. The samples for the convLSTM models are prepared by choosing ten consecutive frames as input and ten consecutive frames shifted forward by one frame as output. Individual models are trained for oil, gas, and water saturations, and pressure using the Nesterov accelerated adaptive moment estimation (Nadam) optimization algorithm. A workflow is provided to comprehend the entire process of data extraction, preprocessing, sample preparation, training, testing of machine learning models, and error analysis. Overall, the convLSTM for spatio-temporal prediction shows promising results in predicting spatio-temporal parameters in porous media.
Submitted 15 October, 2022;
originally announced December 2022.
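The sample-preparation scheme described in the abstract above (ten consecutive frames as input, the same window shifted forward by one frame as output) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_samples`, the use of integers as stand-in frames, and the choice to split into training and test months before windowing are all assumptions.

```python
def make_samples(frames, window=10, shift=1):
    """Pair each run of `window` consecutive frames with the run that starts `shift` frames later."""
    samples = []
    last_start = len(frames) - window - shift  # last index where both windows still fit
    for start in range(last_start + 1):
        inputs = frames[start:start + window]
        targets = frames[start + shift:start + shift + window]
        samples.append((inputs, targets))
    return samples


# Stand-in data (assumption): integers 0..359 represent the 360 monthly frames.
months = list(range(360))
train_months, test_months = months[:300], months[300:]  # 300 / 60 split from the abstract

train_samples = make_samples(train_months)
test_samples = make_samples(test_months)
```

In a real pipeline each frame would be a 2D or 3D array of saturation or pressure values rather than an integer, but the indexing logic is the same.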
-
A Scope Sensitive and Result Attentive Model for Multi-Intent Spoken Language Understanding
Authors:
Lizhi Cheng,
Wenmian Yang,
Weijia Jia
Abstract:
Multi-Intent Spoken Language Understanding (SLU), a novel and more complex scenario of SLU, is attracting increasing attention. Unlike traditional SLU, each intent in this scenario has its specific scope. Semantic information outside the scope even hinders the prediction, which tremendously increases the difficulty of intent detection. More seriously, guiding slot filling with these inaccurate intent labels suffers from error propagation, resulting in unsatisfactory overall performance. To solve these challenges, in this paper, we propose a novel Scope-Sensitive Result Attention Network (SSRAN) based on Transformer, which contains a Scope Recognizer (SR) and a Result Attention Network (RAN). The Scope Recognizer assigns scope information to each token, reducing the distraction of out-of-scope tokens. The Result Attention Network effectively utilizes the bidirectional interaction between the results of slot filling and intent detection, mitigating the error propagation problem. Experiments on two public datasets indicate that our model significantly improves SLU performance (5.4% and 2.1% in overall accuracy) over the state-of-the-art baseline.
Submitted 22 November, 2022;
originally announced November 2022.
-
Time-Range FDA Beampattern Characteristics
Authors:
Wenkai Jia,
Andreas Jakobsson,
Wen-Qin Wang
Abstract:
Current literature shows that frequency diverse arrays (FDAs) are capable of producing range-angle-dependent and time-variant transmit beampatterns, but the resulting time and range dependencies and their characteristics are still not well understood. This paper examines the FDA transmission model and the model for the FDA array factor, considering their time-range relationship. We develop two FDA transmit beampatterns, both yielding the auto-scanning capability of the FDA transmit beams. The scan speed, scan volume, and initial mainlobe direction of the beams are also analyzed. In addition, the equivalent conditions for the FDA integral transmit beampattern and the multiple-input multiple-output (MIMO) beampattern are investigated. Various numerical simulations illustrate the auto-scanning property of the FDA beampattern and the proposed equivalent relationship with the MIMO beampattern, providing the basis for an improved understanding and design of the FDA transmit beampattern.
Submitted 21 December, 2022; v1 submitted 14 April, 2022;
originally announced April 2022.
-
Joint Design of the Transmit and Receive Weights for Coherent FDA Radar
Authors:
Wenkai Jia,
Wen-Qin Wang,
Shungsheng Zhang
Abstract:
Frequency diverse array (FDA) differs from conventional array techniques in that it imposes an additional frequency offset (FO) across the array elements. The use of FO provides the FDA with a controllable degree of freedom in the range dimension, offering preferable performance in joint angle and range localization, range-ambiguous clutter suppression, and low probability of intercept, as compared to its phased-array or multiple-input multiple-output (MIMO) counterparts. In particular, the FO of the coherent FDA is much smaller than the bandwidth of the baseband waveform, making it capable of obtaining higher transmit gain and output signal-to-interference-plus-noise ratio (SINR). In this paper, we investigate the problem of joint design of the transmit and receive weights for coherent FDA radar systems. The design problem is formulated as the maximization of the ratio of the power in the desired two-dimensional range-angle space to the power in the entire area, subject to an energy constraint that limits the emitted energy of each transmit antenna and a similarity constraint such that a good transmit beampattern can be guaranteed. Because the resulting problem is NP-hard, a sequential optimization method based on the semidefinite relaxation (SDR) technique is developed. Numerical simulations are provided to demonstrate the effectiveness of the proposed scheme.
Submitted 20 December, 2022; v1 submitted 14 April, 2022;
originally announced April 2022.
-
Designing FDA Radars Robust to Contaminated Shared Spectra
Authors:
Wenkai Jia,
Andreas Jakobsson,
Wen-Qin Wang
Abstract:
This paper considers the problem of jointly designing the transmit waveforms and weights for a frequency diverse array (FDA) in a spectrally congested environment in which unintentional spectral interferences exist. Exploiting the properties of the interference signal induced by the processing of the multi-channel mixing and low-pass filtering FDA receiver, the interference covariance matrix structure is derived. With this, the receive weights are formed using the minimum variance distortionless response (MVDR) method for interference cancellation. Owing to the fact that the resulting output signal-to-interference-plus-noise ratio (SINR) is a function of the transmit waveforms and weights, as well as due to the ever-greater competition for the finite available spectrum, a joint design scheme for the FDA transmit weights and the spectrally compatible waveforms is proposed to efficiently use the available spectrum while maintaining a sufficient receive SINR. The performance of the proposed technique is verified using numerical simulations in terms of the achievable SINR, spectral compatibility, as well as several aspects of the synthesized waveforms.
Submitted 20 December, 2022; v1 submitted 14 April, 2022;
originally announced April 2022.
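For reference, the MVDR receive weights mentioned in the abstract above take the following standard textbook form. The symbols are notation assumed here rather than taken from the paper: R denotes the interference-plus-noise covariance matrix and a the steering vector of the desired range-angle cell. The specific covariance structure the authors derive is not given in the abstract and is not reproduced.

```latex
\mathbf{w}_{\mathrm{MVDR}}
  = \frac{\mathbf{R}^{-1}\mathbf{a}}{\mathbf{a}^{H}\mathbf{R}^{-1}\mathbf{a}},
\qquad \text{the solution of} \quad
\min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
\quad \text{s.t.} \quad \mathbf{w}^{H}\mathbf{a} = 1 .
```

The constraint keeps unit gain toward the desired cell (the "distortionless response"), while the objective minimizes the output power contributed by interference and noise.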
-
Waveform Optimization with SINR Criteria for FDA Radar in the Presence of Signal-Dependent Mainlobe Interference
Authors:
Wenkai Jia,
Andreas Jakobsson,
Wen-Qin Wang
Abstract:
In this paper, we focus on the design of the transmit waveforms of a frequency diverse array (FDA) in order to improve the output signal-to-interference-plus-noise ratio (SINR) in the presence of signal-dependent mainlobe interference. Since the classical multi-carrier matched filtering-based FDA receiver cannot effectively utilize the waveform diversity of FDA, a novel FDA receiver framework based on multi-channel mixing and low-pass filtering is developed to keep the separation of the transmit waveform at the receiver side, while preserving the FDA range-controllable degrees of freedom. Furthermore, a range-angle minimum variance distortionless response beamforming technique is introduced to synthesize receiver filter weights with the ability to suppress a possible signal-dependent mainlobe interference. The resulting FDA transmit waveform design problem is initially formulated as an optimization problem consisting of a non-convex objective function and multiple non-convex constraints. To efficiently solve this, we introduce two algorithms, one based on a signal relaxation technique, and the other based on the majorization minimization technique. The preferable performance of the proposed multi-channel low-pass filtering receiver and the optimized transmit waveforms is illustrated using numerical simulations, indicating that the resulting FDA system is not only able to effectively suppress mainlobe interference, but also to yield estimates with a higher SINR than the FDA system without waveform optimization.
Submitted 20 December, 2022; v1 submitted 14 April, 2022;
originally announced April 2022.
-
Detecting Soil Moisture Levels Using Battery-Free Wi-Fi Tag
Authors:
Wenli Jiao,
Ju Wang,
Yelu He,
Xiangdong Xi,
Xiaojiang Chen
Abstract:
Soil sensing plays an important role in increasing agricultural output and protecting soil sites. Existing soil sensing methods fail to achieve both high accuracy and low cost. In this paper, we design and implement a high-accuracy, low-cost chipless soil moisture sensing system called SoilTAG. We propose a general chipless sensor design methodology that allows us to customize the signal feature for sensing soil moisture, instead of blindly capturing the disturbance law of the soil. Based on this principle, we design a battery-free passive tag that responds to different soil moisture levels. Further, we optimize the hardware and algorithm design of SoilTAG to locate the passive tag and extract its reflection signal feature to identify soil moisture using WiFi signals. Extensive experimental results reveal that it can identify 2% absolute soil water content with a sensing distance up to 3 m in an open field. When the sensing distance is up to 13 m, it can also achieve 5% absolute soil moisture sensing resolution.
Submitted 4 February, 2022;
originally announced February 2022.
-
Multi-frame Joint Enhancement for Early Interlaced Videos
Authors:
Yang Zhao,
Yanbo Ma,
Yuan Chen,
Wei Jia,
Ronggang Wang,
Xiaoping Liu
Abstract:
Early interlaced videos usually contain multiple interlacing artifacts and complex compression artifacts, which significantly reduce the visual quality. Although high-definition reconstruction technology for early videos has made great progress in recent years, related research on deinterlacing is still lacking. Traditional methods mainly focus on simple interlacing mechanisms and cannot deal with the complex artifacts in real-world early videos. Recent deep deinterlacing models for interlaced video reconstruction focus only on a single frame, neglecting important temporal information. Therefore, this paper proposes a multi-frame deinterlacing and joint enhancement network for early interlaced videos that consists of three modules: a spatial vertical interpolation module, a temporal alignment and fusion module, and a final refinement module. The proposed method can effectively remove the complex artifacts in early videos by using the temporal redundancy of multiple fields. Experimental results demonstrate that the proposed method can recover high-quality results for both synthetic datasets and real-world early interlaced videos.
Submitted 28 September, 2021;
originally announced September 2021.
-
Generative and Discriminative Learning for Distorted Image Restoration
Authors:
Yi Gu,
Yuting Gao,
Jie Li,
Chentao Wu,
Weijia Jia
Abstract:
Liquify is a common technique for image editing, which can be used for image distortion. Due to the uncertainty in the distortion variation, restoring distorted images caused by liquify filter is a challenging task. To edit images in an efficient way, distorted images are expected to be restored automatically. This paper aims at the distorted image restoration, which is characterized by seeking the appropriate warping and completion of a distorted image. Existing methods focus on the hardware assistance or the geometric principle to solve the specific regular deformation caused by natural phenomena, but they cannot handle the irregularity and uncertainty of artificial distortion in this task. To address this issue, we propose a novel generative and discriminative learning method based on deep neural networks, which can learn various reconstruction mappings and represent complex and high-dimensional data. This method decomposes the task into a rectification stage and a refinement stage. The first stage generative network predicts the mapping from the distorted images to the rectified ones. The second stage generative network then further optimizes the perceptual quality. Since there is no available dataset or benchmark to explore this task, we create a Distorted Face Dataset (DFD) by forward distortion mapping based on CelebA dataset. Extensive experimental evaluation on the proposed benchmark and the application demonstrates that our method is an effective way for distorted image restoration.
Submitted 27 November, 2020; v1 submitted 11 November, 2020;
originally announced November 2020.
-
Interpretable Crowd Flow Prediction with Spatial-Temporal Self-Attention
Authors:
Haoxing Lin,
Weijia Jia,
Yongjian You,
Yiping Sun
Abstract:
Crowd flow prediction has been increasingly investigated in intelligent urban computing field as a fundamental component of urban management system. The most challenging part of predicting crowd flow is to measure the complicated spatial-temporal dependencies. A prevalent solution employed in current methods is to divide and conquer the spatial and temporal information by various architectures (e.g., CNN/GCN, LSTM). However, this strategy has two disadvantages: (1) the sophisticated dependencies are also divided and therefore partially isolated; (2) the spatial-temporal features are transformed into latent representations when passing through different architectures, making it hard to interpret the predicted crowd flow. To address these issues, we propose a Spatial-Temporal Self-Attention Network (STSAN) with an ST encoding gate that calculates the entire spatial-temporal representation with positional and time encodings and therefore avoids dividing the dependencies. Furthermore, we develop a Multi-aspect attention mechanism that applies scaled dot-product attention over spatial-temporal information and measures the attention weights that explicitly indicate the dependencies. Experimental results on traffic and mobile data demonstrate that the proposed method reduces inflow and outflow RMSE by 16% and 8% on the Taxi-NYC dataset compared to the SOTA baselines.
Submitted 22 February, 2020;
originally announced February 2020.
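The scaled dot-product attention that the abstract above builds on can be sketched in plain Python as follows. This shows only the generic operation, softmax(QK^T / sqrt(d_k)) V; it is not the STSAN multi-aspect mechanism itself, and the function names and toy vectors are assumptions for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of equal-length query/key vectors."""
    d_k = len(keys[0])
    weights = []
    for q in queries:
        # Dot product of the query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights.append(softmax(scores))
    # Each output is the attention-weighted sum of the value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights


# Toy example (assumption): two positions with 2-dimensional features.
queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
outputs, weights = scaled_dot_product_attention(queries, keys, values)
```

Each row of `weights` sums to one, so the weights can be read directly as how strongly one position attends to each of the others, which is the sense in which the abstract says they explicitly indicate dependencies.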
-
Residual-Guided In-Loop Filter Using Convolution Neural Network
Authors:
Wei Jia,
Li Li,
Zhu Li,
Xiang Zhang,
Shan Liu
Abstract:
The block-based coding structure in the hybrid video coding framework inevitably introduces compression artifacts such as blocking and ringing. To compensate for those artifacts, extensive filtering techniques were proposed in the loop of video codecs, which are capable of boosting the subjective and objective qualities of reconstructed videos. Recently, neural network based filters were presented, drawing on the power of deep learning from large amounts of data. Though the coding efficiency has been improved over traditional methods in High-Efficiency Video Coding (HEVC), the rich features and information generated by the compression pipeline have not been fully utilized in the design of neural networks. Therefore, in this paper, we propose the Residual-Reconstruction-based Convolutional Neural Network (RRNet) to further improve the coding efficiency, where the compression features derived from the bitstream in the form of the prediction residual are fed into the network as an additional input alongside the reconstructed frame. In essence, the residual signal can provide valuable information about block partitions and can aid reconstruction of edge and texture regions in a picture. Thus, more adaptive parameters can be trained to handle different texture characteristics. The experimental results show that our proposed RRNet approach presents significant BD-rate savings compared to HEVC and the state-of-the-art CNN-based schemes, indicating that the residual signal plays a significant role in enhancing video frame reconstruction.
Submitted 3 May, 2021; v1 submitted 29 July, 2019;
originally announced July 2019.
-
Time Varying Channel Tracking with Spatial and Temporal BEM for Massive MIMO Systems
Authors:
Jianwei Zhao,
Hongxiang Xie,
Feifei Gao,
Weimin Jia,
Shi Jin,
Hai Lin
Abstract:
In this paper, we propose a channel tracking method for massive multiple-input multiple-output systems under both time-varying and spatially varying conditions. Exploiting the characteristics of the massive antenna array, a spatial-temporal basis expansion model is designed to reduce the effective dimensions of the up-link and down-link channels, which decomposes channel state information into time-varying spatial information and gain information. We first model the users' movements as a first-order unknown Markov process, which is blindly learned by the expectation-maximization (EM) approach. Then, the up-link time-varying spatial information can be blindly tracked by Taylor series expansion of the steering vector, while the remaining up-link channel gain information can be trained with only a few pilot symbols. Due to angle reciprocity (spatial reciprocity), the spatial information of the down-link channel can be immediately obtained from the up-link counterpart, which greatly reduces the complexity of down-link channel tracking. Various numerical results are provided to demonstrate the effectiveness of the proposed method.
Submitted 27 February, 2018;
originally announced February 2018.
-
Beam Tracking for UAV Mounted SatCom on-the-Move with Massive Antenna Array
Authors:
Jianwei Zhao,
Feifei Gao,
Qihui Wu,
Shi Jin,
Yi Wu,
Weimin Jia
Abstract:
Unmanned aerial vehicle (UAV)-satellite communication has drawn considerable attention for its potential to build an integrated space-air-ground network with seamless wide-area coverage. The key challenge to UAV-satellite communication is its unstable beam pointing due to UAV navigation, which is a typical SatCom on-the-move scenario. In this paper, we propose a blind beam tracking approach for a Ka-band UAV-satellite communication system, where the UAV is equipped with a large-scale antenna array. The effects of UAV navigation are first alleviated through mechanical adjustment, which can approximately point the beam towards the target satellite through beam stabilization and dynamic isolation. Specifically, the attitude information can be derived in real time from data fusion of low-cost sensors. Then, the precision of the beam pointing is blindly refined by electrically adjusting the weights of the massive antenna array, where an array-structure-based simultaneous perturbation algorithm is designed. Simulation results are provided to demonstrate the superiority of the proposed method over existing ones.
Submitted 22 September, 2017;
originally announced September 2017.