-
CiUAV: A Multi-Task 3D Indoor Localization System for UAVs based on Channel State Information
Authors:
Cunyi Yin,
Chenwei Wang,
Jing Chen,
Hao Jiang,
Xiren Miao,
Shaocong Zheng Zhenghua Chen Senior,
Hong Yan
Abstract:
Accurate indoor positioning for unmanned aerial vehicles (UAVs) is critical for logistics, surveillance, and emergency response applications, particularly in GPS-denied environments. Existing indoor localization methods, including optical tracking, ultra-wideband, and Bluetooth-based systems, face cost, accuracy, and robustness trade-offs, limiting their practicality for UAV navigation. This paper…
▽ More
Accurate indoor positioning for unmanned aerial vehicles (UAVs) is critical for logistics, surveillance, and emergency response applications, particularly in GPS-denied environments. Existing indoor localization methods, including optical tracking, ultra-wideband, and Bluetooth-based systems, face cost, accuracy, and robustness trade-offs, limiting their practicality for UAV navigation. This paper proposes CiUAV, a novel 3D indoor localization system designed for UAVs, leveraging channel state information (CSI) obtained from low-cost ESP32 IoT-based sensors. The system incorporates a dynamic automatic gain control (AGC) compensation algorithm to mitigate noise and stabilize CSI signals, significantly enhancing the robustness of the measurement. Additionally, a multi-task 3D localization model, Sensor-in-Sample (SiS), is introduced to enhance system robustness by addressing challenges related to incomplete sensor data and limited training samples. SiS achieves this by joint training with varying sensor configurations and sample sizes, ensuring reliable performance even in resource-constrained scenarios. Experiment results demonstrate that CiUAV achieves a LMSE localization error of 0.2629 m in a 3D space, achieving good accuracy and robustness. The proposed system provides a cost-effective and scalable solution, demonstrating its usefulness for UAV applications in resource-constrained indoor environments.
△ Less
Submitted 27 May, 2025;
originally announced May 2025.
-
MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation
Authors:
Anzhe Cheng,
Chenzhong Yin,
Yu Chang,
Heng Ping,
Shixuan Li,
Shahin Nazarian,
Paul Bogdan
Abstract:
Low-resolution image segmentation is crucial in real-world applications such as robotics, augmented reality, and large-scale scene understanding, where high-resolution data is often unavailable due to computational constraints. To address this challenge, we propose MaskAttn-UNet, a novel segmentation framework that enhances the traditional U-Net architecture via a mask attention mechanism. Our mod…
▽ More
Low-resolution image segmentation is crucial in real-world applications such as robotics, augmented reality, and large-scale scene understanding, where high-resolution data is often unavailable due to computational constraints. To address this challenge, we propose MaskAttn-UNet, a novel segmentation framework that enhances the traditional U-Net architecture via a mask attention mechanism. Our model selectively emphasizes important regions while suppressing irrelevant backgrounds, thereby improving segmentation accuracy in cluttered and complex scenes. Unlike conventional U-Net variants, MaskAttn-UNet effectively balances local feature extraction with broader contextual awareness, making it particularly well-suited for low-resolution inputs. We evaluate our approach on three benchmark datasets with input images rescaled to 128x128 and demonstrate competitive performance across semantic, instance, and panoptic segmentation tasks. Our results show that MaskAttn-UNet achieves accuracy comparable to state-of-the-art methods at significantly lower computational cost than transformer-based models, making it an efficient and scalable solution for low-resolution segmentation in resource-constrained scenarios.
△ Less
Submitted 7 May, 2025; v1 submitted 11 March, 2025;
originally announced March 2025.
-
Joint ML-Bayesian Approach to Adaptive Radar Detection in the presence of Gaussian Interference
Authors:
Chaoran Yin,
Tianqi Wang,
Linjie Yan,
Chengpeng Hao,
Alfonso Farina,
Danilo Orlando
Abstract:
This paper addresses the adaptive radar target detection problem in the presence of Gaussian interference with unknown statistical properties. To this end, the problem is first formulated as a binary hypothesis test, and then we derive a detection architecture grounded on the hybrid of Maximum Likelihood (ML) and Maximum A Posterior (MAP) approach. Specifically, we resort to the hidden discrete la…
▽ More
This paper addresses the adaptive radar target detection problem in the presence of Gaussian interference with unknown statistical properties. To this end, the problem is first formulated as a binary hypothesis test, and then we derive a detection architecture grounded on the hybrid of Maximum Likelihood (ML) and Maximum A Posterior (MAP) approach. Specifically, we resort to the hidden discrete latent variables in conjunction with the Expectation-Maximization (EM) algorithms which cyclically updates the estimates of the unknowns. In this framework, the estimates of the a posteriori probabilities under each hypothesis are representative of the inherent nature of data and used to decide for the presence of a potential target. In addition, we prove that the developed detection scheme ensures the desired Constant False Alarm Rate property with respect to the unknown interference covariance matrix. Numerical examples obtained through synthetic and real recorded data corroborate the effectiveness of the proposed architecture and show that the MAP-based approach ensures evident improvement with respect to the conventional generalized likelihood ratio test at least for the considered scenarios and parameter setting.
△ Less
Submitted 3 March, 2025;
originally announced March 2025.
-
VidMusician: Video-to-Music Generation with Semantic-Rhythmic Alignment via Hierarchical Visual Features
Authors:
Sifei Li,
Binxin Yang,
Chunji Yin,
Chong Sun,
Yuxin Zhang,
Weiming Dong,
Chen Li
Abstract:
Video-to-music generation presents significant potential in video production, requiring the generated music to be both semantically and rhythmically aligned with the video. Achieving this alignment demands advanced music generation capabilities, sophisticated video understanding, and an efficient mechanism to learn the correspondence between the two modalities. In this paper, we propose VidMusicia…
▽ More
Video-to-music generation presents significant potential in video production, requiring the generated music to be both semantically and rhythmically aligned with the video. Achieving this alignment demands advanced music generation capabilities, sophisticated video understanding, and an efficient mechanism to learn the correspondence between the two modalities. In this paper, we propose VidMusician, a parameter-efficient video-to-music generation framework built upon text-to-music models. VidMusician leverages hierarchical visual features to ensure semantic and rhythmic alignment between video and music. Specifically, our approach utilizes global visual features as semantic conditions and local visual features as rhythmic cues. These features are integrated into the generative backbone via cross-attention and in-attention mechanisms, respectively. Through a two-stage training process, we incrementally incorporate semantic and rhythmic features, utilizing zero initialization and identity initialization to maintain the inherent music-generative capabilities of the backbone. Additionally, we construct a diverse video-music dataset, DVMSet, encompassing various scenarios, such as promo videos, commercials, and compilations. Experiments demonstrate that VidMusician outperforms state-of-the-art methods across multiple evaluation metrics and exhibits robust performance on AI-generated videos. Samples are available at \url{https://youtu.be/EPOSXwtl1jw}.
△ Less
Submitted 9 December, 2024;
originally announced December 2024.
-
Deep Learning, Machine Learning -- Digital Signal and Image Processing: From Theory to Application
Authors:
Weiche Hsieh,
Ziqian Bi,
Junyu Liu,
Benji Peng,
Sen Zhang,
Xuanhe Pan,
Jiawei Xu,
Jinlang Wang,
Keyu Chen,
Caitlyn Heqi Yin,
Pohsun Feng,
Yizhu Wen,
Tianyang Wang,
Ming Li,
Jintao Ren,
Qian Niu,
Silin Chen,
Ming Liu
Abstract:
Digital Signal Processing (DSP) and Digital Image Processing (DIP) with Machine Learning (ML) and Deep Learning (DL) are popular research areas in Computer Vision and related fields. We highlight transformative applications in image enhancement, filtering techniques, and pattern recognition. By integrating frameworks like the Discrete Fourier Transform (DFT), Z-Transform, and Fourier Transform met…
▽ More
Digital Signal Processing (DSP) and Digital Image Processing (DIP) with Machine Learning (ML) and Deep Learning (DL) are popular research areas in Computer Vision and related fields. We highlight transformative applications in image enhancement, filtering techniques, and pattern recognition. By integrating frameworks like the Discrete Fourier Transform (DFT), Z-Transform, and Fourier Transform methods, we enable robust data manipulation and feature extraction essential for AI-driven tasks. Using Python, we implement algorithms that optimize real-time data processing, forming a foundation for scalable, high-performance solutions in computer vision. This work illustrates the potential of ML and DL to advance DSP and DIP methodologies, contributing to artificial intelligence, automated feature extraction, and applications across diverse domains.
△ Less
Submitted 26 October, 2024;
originally announced October 2024.
-
A Recursion-Based SNR Determination Method for Short Packet Transmission: Analysis and Applications
Authors:
Chengzhe Yin,
Rui Zhang,
Yongzhao Li,
Yuhan Ruan,
Tao Li,
Jiaheng Lu
Abstract:
The short packet transmission (SPT) has gained much attention in recent years. In SPT, the most significant characteristic is that the finite blocklength code (FBC) is adopted. With FBC, the signal-to-noise ratio (SNR) cannot be expressed as an explicit function with respect to the other transmission parameters. This raises the following two problems for the resource allocation in SPTs: (i) The ex…
▽ More
The short packet transmission (SPT) has gained much attention in recent years. In SPT, the most significant characteristic is that the finite blocklength code (FBC) is adopted. With FBC, the signal-to-noise ratio (SNR) cannot be expressed as an explicit function with respect to the other transmission parameters. This raises the following two problems for the resource allocation in SPTs: (i) The exact value of the SNR is hard to determine, and (ii) The property of SNR w.r.t. the other parameters is hard to analyze, which hinders the efficient optimization of them. To simultaneously tackle these problems, we have developed a recursion method in our prior work. To emphasize the significance of this method, we further analyze the convergence rate of the recursion method and investigate the property of the recursion function in this paper. Specifically, we first analyze the convergence rate of the recursion method, which indicates it can determine the SNR with low complexity. Then, we analyze the property of the recursion function, which facilitates the optimization of the other parameters during the recursion. Finally, we also enumerate some applications for the recursion method. Simulation results indicate that the recursion method converges faster than the other SNR determination methods. Besides, the results also show that the recursion-based methods can almost achieve the optimal solution of the application cases.
△ Less
Submitted 23 August, 2024;
originally announced August 2024.
-
SVSNet+: Enhancing Speaker Voice Similarity Assessment Models with Representations from Speech Foundation Models
Authors:
Chun Yin,
Tai-Shih Chi,
Yu Tsao,
Hsin-Min Wang
Abstract:
Representations from pre-trained speech foundation models (SFMs) have shown impressive performance in many downstream tasks. However, the potential benefits of incorporating pre-trained SFM representations into speaker voice similarity assessment have not been thoroughly investigated. In this paper, we propose SVSNet+, a model that integrates pre-trained SFM representations to improve performance…
▽ More
Representations from pre-trained speech foundation models (SFMs) have shown impressive performance in many downstream tasks. However, the potential benefits of incorporating pre-trained SFM representations into speaker voice similarity assessment have not been thoroughly investigated. In this paper, we propose SVSNet+, a model that integrates pre-trained SFM representations to improve performance in assessing speaker voice similarity. Experimental results on the Voice Conversion Challenge 2018 and 2020 datasets show that SVSNet+ incorporating WavLM representations shows significant improvements compared to baseline models. In addition, while fine-tuning WavLM with a small dataset of the downstream task does not improve performance, using the same dataset to learn a weighted-sum representation of WavLM can substantially improve performance. Furthermore, when WavLM is replaced by other SFMs, SVSNet+ still outperforms the baseline models and exhibits strong generalization ability.
△ Less
Submitted 12 June, 2024;
originally announced June 2024.
-
Does Optimal Control Always Benefit from Better Prediction? An Analysis Framework for Predictive Optimal Control
Authors:
Xiangrui Zeng,
Cheng Yin,
Zhouping Yin
Abstract:
The ``prediction + optimal control'' scheme has shown good performance in many applications of automotive, traffic, robot, and building control. In practice, the prediction results are simply considered correct in the optimal control design process. However, in reality, these predictions may never be perfect. Under a conventional stochastic optimal control formulation, it is difficult to answer qu…
▽ More
The ``prediction + optimal control'' scheme has shown good performance in many applications of automotive, traffic, robot, and building control. In practice, the prediction results are simply considered correct in the optimal control design process. However, in reality, these predictions may never be perfect. Under a conventional stochastic optimal control formulation, it is difficult to answer questions like ``what if the predictions are wrong''. This paper presents an analysis framework for predictive optimal control where the subjective belief about the future is no longer considered perfect. A novel concept called the hidden prediction state is proposed to establish connections among the predictors, the subjective beliefs, the control policies and the objective control performance. Based on this framework, the predictor evaluation problem is analyzed. Three commonly-used predictor evaluation measures, including the mean squared error, the regret and the log-likelihood, are considered. It is shown that neither using the mean square error nor using the likelihood can guarantee a monotonic relationship between the predictor error and the optimal control cost. To guarantee control cost improvement, it is suggested the predictor should be evaluated with the control performance, e.g., using the optimal control cost or the regret to evaluate predictors. Numerical examples and examples from automotive applications with real-world driving data are provided to illustrate the ideas and the results.
△ Less
Submitted 5 May, 2024;
originally announced May 2024.
-
Human Activity Recognition with Low-Resolution Infrared Array Sensor Using Semi-supervised Cross-domain Neural Networks for Indoor Environment
Authors:
Cunyi Yin,
Xiren Miao,
Jing Chen,
Hao Jiang,
Deying Chen,
Yixuan Tong,
Shaocong Zheng
Abstract:
Low-resolution infrared-based human activity recognition (HAR) attracted enormous interests due to its low-cost and private. In this paper, a novel semi-supervised crossdomain neural network (SCDNN) based on 8 $\times$ 8 low-resolution infrared sensor is proposed for accurately identifying human activity despite changes in the environment at a low-cost. The SCDNN consists of feature extractor, dom…
▽ More
Low-resolution infrared-based human activity recognition (HAR) attracted enormous interests due to its low-cost and private. In this paper, a novel semi-supervised crossdomain neural network (SCDNN) based on 8 $\times$ 8 low-resolution infrared sensor is proposed for accurately identifying human activity despite changes in the environment at a low-cost. The SCDNN consists of feature extractor, domain discriminator and label classifier. In the feature extractor, the unlabeled and minimal labeled target domain data are trained for domain adaptation to achieve a mapping of the source domain and target domain data. The domain discriminator employs the unsupervised learning to migrate data from the source domain to the target domain. The label classifier obtained from training the source domain data improves the recognition of target domain activities due to the semi-supervised learning utilized in training the target domain data. Experimental results show that the proposed method achieves 92.12\% accuracy for recognition of activities in the target domain by migrating the source and target domains. The proposed approach adapts superior to cross-domain scenarios compared to the existing deep learning methods, and it provides a low-cost yet highly adaptable solution for cross-domain scenarios.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station
Authors:
Cunyi Yin,
Xiren Miao,
Jing Chen,
Hao Jiang,
Jianfei Yang,
Yunjiao Zhou,
Min Wu,
Zhenghua Chen
Abstract:
Safety monitoring of power operations in power stations is crucial for preventing accidents and ensuring stable power supply. However, conventional methods such as wearable devices and video surveillance have limitations such as high cost, dependence on light, and visual blind spots. WiFi-based human pose estimation is a suitable method for monitoring power operations due to its low cost, device-f…
▽ More
Safety monitoring of power operations in power stations is crucial for preventing accidents and ensuring stable power supply. However, conventional methods such as wearable devices and video surveillance have limitations such as high cost, dependence on light, and visual blind spots. WiFi-based human pose estimation is a suitable method for monitoring power operations due to its low cost, device-free, and robustness to various illumination conditions.In this paper, a novel Channel State Information (CSI)-based pose estimation framework, namely PowerSkel, is developed to address these challenges. PowerSkel utilizes self-developed CSI sensors to form a mutual sensing network and constructs a CSI acquisition scheme specialized for power scenarios. It significantly reduces the deployment cost and complexity compared to the existing solutions. To reduce interference with CSI in the electricity scenario, a sparse adaptive filtering algorithm is designed to preprocess the CSI. CKDformer, a knowledge distillation network based on collaborative learning and self-attention, is proposed to extract the features from CSI and establish the mapping relationship between CSI and keypoints. The experiments are conducted in a real-world power station, and the results show that the PowerSkel achieves high performance with a PCK@50 of 96.27%, and realizes a significant visualization on pose estimation, even in dark environments. Our work provides a novel low-cost and high-precision pose estimation solution for power operation.
△ Less
Submitted 4 March, 2024;
originally announced March 2024.
-
Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks
Authors:
Anzhe Cheng,
Zhenkun Wang,
Chenzhong Yin,
Mingxi Cheng,
Heng Ping,
Xiongye Xiao,
Shahin Nazarian,
Paul Bogdan
Abstract:
Backpropagation (BP) has been a successful optimization technique for deep learning models. However, its limitations, such as backward- and update-locking, and its biological implausibility, hinder the concurrent updating of layers and do not mimic the local learning processes observed in the human brain. To address these issues, recent research has suggested using local error signals to asynchron…
▽ More
Backpropagation (BP) has been a successful optimization technique for deep learning models. However, its limitations, such as backward- and update-locking, and its biological implausibility, hinder the concurrent updating of layers and do not mimic the local learning processes observed in the human brain. To address these issues, recent research has suggested using local error signals to asynchronously train network blocks. However, this approach often involves extensive trial-and-error iterations to determine the best configuration for local training. This includes decisions on how to decouple network blocks and which auxiliary networks to use for each block. In our work, we introduce a novel BP-free approach: a block-wise BP-free (BWBPF) neural network that leverages local error signals to optimize distinct sub-neural networks separately, where the global loss is only responsible for updating the output layer. The local error signals used in the BP-free model can be computed in parallel, enabling a potential speed-up in the weight update process through parallel implementation. Our experimental results consistently show that this approach can identify transferable decoupled architectures for VGG and ResNet variations, outperforming models trained with end-to-end backpropagation and other state-of-the-art block-wise learning techniques on datasets such as CIFAR-10 and Tiny-ImageNet. The code is released at https://github.com/Belis0811/BWBPF.
△ Less
Submitted 20 December, 2023;
originally announced December 2023.
-
Navigating Open Set Scenarios for Skeleton-based Action Recognition
Authors:
Kunyu Peng,
Cheng Yin,
Junwei Zheng,
Ruiping Liu,
David Schneider,
Jiaming Zhang,
Kailun Yang,
M. Saquib Sarfraz,
Rainer Stiefelhagen,
Alina Roitberg
Abstract:
In real-world scenarios, human actions often fall outside the distribution of training data, making it crucial for models to recognize known actions and reject unknown ones. However, using pure skeleton data in such open-set conditions poses challenges due to the lack of visual background cues and the distinct sparse structure of body pose sequences. In this paper, we tackle the unexplored Open-Se…
▽ More
In real-world scenarios, human actions often fall outside the distribution of training data, making it crucial for models to recognize known actions and reject unknown ones. However, using pure skeleton data in such open-set conditions poses challenges due to the lack of visual background cues and the distinct sparse structure of body pose sequences. In this paper, we tackle the unexplored Open-Set Skeleton-based Action Recognition (OS-SAR) task and formalize the benchmark on three skeleton-based datasets. We assess the performance of seven established open-set approaches on our task and identify their limits and critical generalization issues when dealing with skeleton information. To address these challenges, we propose a distance-based cross-modality ensemble method that leverages the cross-modal alignment of skeleton joints, bones, and velocities to achieve superior open-set recognition performance. We refer to the key idea as CrossMax - an approach that utilizes a novel cross-modality mean max discrepancy suppression mechanism to align latent spaces during training and a cross-modality distance-based logits refinement method during testing. CrossMax outperforms existing approaches and consistently yields state-of-the-art results across all datasets and backbones. The benchmark, code, and models will be released at https://github.com/KPeng9510/OS-SAR.
△ Less
Submitted 11 December, 2023;
originally announced December 2023.
-
Base Station Beamforming Design for Near-field XL-IRS Beam Training
Authors:
Tao Wang,
Changsheng You,
Changchuan Yin
Abstract:
Existing research on extremely large-scale intelligent reflecting surface (XL-IRS) beam training has assumed the far-field channel model for base station (BS)-IRS link. However, this approach may cause degraded beam training performance in practice due to the near-field channel model of the BS-IRS link. To address this issue, we propose two efficient schemes to optimize BS beamforming for improvin…
▽ More
Existing research on extremely large-scale intelligent reflecting surface (XL-IRS) beam training has assumed the far-field channel model for base station (BS)-IRS link. However, this approach may cause degraded beam training performance in practice due to the near-field channel model of the BS-IRS link. To address this issue, we propose two efficient schemes to optimize BS beamforming for improving the XL-IRS beam training performance. Specifically, the first scheme aims to maximize total received signal power on the XL-IRS, which generalizes the existing angle based BS beamforming design and can be resolved using the singular value decomposition (SVD) method. The second scheme aims to maximize the $\ell_1$-norm of incident signals on the XL-IRS, which is shown to achieve the maximum received power at the user. To solve the non-convex $\ell_1$-norm maximization problem, we propose an eficient algorithm by using the alternating optimization (AO) technique. Numerical results show that the proposed AO based BS beamforming design outperforms the SVD/angle based BS beamforming in terms of training accuracy and achievable received signal-to-noise ratio (SNR).
△ Less
Submitted 12 September, 2023;
originally announced September 2023.
-
A Novel Two-Layer Codebook Based Near-Field Beam Training for Intelligent Reflecting Surface
Authors:
Tao Wang,
Jie Lv,
Haonan Tong,
Changsheng You,
Changchuan Yin
Abstract:
In this paper, we study the codebook-based near-field beam training for intelligent reflecting surfaces (IRSs) aided wireless system. In the considered model, the near-field beam training is critical to focus signals at the location of user equipment (UE) to obtain prominent IRS array gain. However, existing codebook schemes cannot achieve low training overhead and high receiving power simultaneou…
▽ More
In this paper, we study the codebook-based near-field beam training for intelligent reflecting surfaces (IRSs) aided wireless system. In the considered model, the near-field beam training is critical to focus signals at the location of user equipment (UE) to obtain prominent IRS array gain. However, existing codebook schemes cannot achieve low training overhead and high receiving power simultaneously. To tackle this issue, a novel two-layer codebook based beam training scheme is proposed. The layer-1 codebook is designed based on the omnidirectionality of a random-phase beam pattern, which estimates the UE distance with training overhead equivalent to that of one DFT codeword. Then, based on the estimated UE distance, the layer-2 codebook is generated to scan candidate UE locations and obtain the optimal codeword for IRS beamforming. Numerical results show that compared with benchmarks, the proposed two-layer beam training scheme achieves more accurate UE distance and angle estimation, higher data rate, and smaller training overhead.
△ Less
Submitted 18 April, 2023; v1 submitted 13 March, 2023;
originally announced March 2023.
-
Raising The Limit Of Image Rescaling Using Auxiliary Encoding
Authors:
Chenzhong Yin,
Zhihong Pan,
Xin Zhou,
Le Kang,
Paul Bogdan
Abstract:
Normalizing flow models using invertible neural networks (INN) have been widely investigated for successful generative image super-resolution (SR) by learning the transformation between the normal distribution of latent variable $z$ and the conditional distribution of high-resolution (HR) images gave a low-resolution (LR) input. Recently, image rescaling models like IRN utilize the bidirectional n…
▽ More
Normalizing flow models using invertible neural networks (INN) have been widely investigated for successful generative image super-resolution (SR) by learning the transformation between the normal distribution of latent variable $z$ and the conditional distribution of high-resolution (HR) images gave a low-resolution (LR) input. Recently, image rescaling models like IRN utilize the bidirectional nature of INN to push the performance limit of image upscaling by optimizing the downscaling and upscaling steps jointly. While the random sampling of latent variable $z$ is useful in generating diverse photo-realistic images, it is not desirable for image rescaling when accurate restoration of the HR image is more important. Hence, in places of random sampling of $z$, we propose auxiliary encoding modules to further push the limit of image rescaling performance. Two options to store the encoded latent variables in downscaled LR images, both readily supported in existing image file format, are proposed. One is saved as the alpha-channel, the other is saved as meta-data in the image header, and the corresponding modules are denoted as suffixes -A and -M respectively. Optimal network architectural changes are investigated for both options to demonstrate their effectiveness in raising the rescaling performance limit on different baseline models including IRN and DLV-IRN.
△ Less
Submitted 12 March, 2023;
originally announced March 2023.
-
Classification Schemes for the Radar Reference Window: Design and Comparisons
Authors:
Chaoran Yin,
Linjie Yan,
Chengpeng Hao,
Silvia Liberata Ullo,
Gaetano Giunta,
Alfonso Farina,
Danilo Orlando
Abstract:
In this paper, we address the problem of classifying data within the radar reference window in terms of statistical properties. Specifically, we partition these data into statistically homogeneous subsets by identifying possible clutter power variations with respect to the cells under test (accounting for possible range-spread targets) and/or clutter edges. To this end, we consider different situa…
▽ More
In this paper, we address the problem of classifying data within the radar reference window in terms of statistical properties. Specifically, we partition these data into statistically homogeneous subsets by identifying possible clutter power variations with respect to the cells under test (accounting for possible range-spread targets) and/or clutter edges. To this end, we consider different situations of practical interest and formulate the classification problem as multiple hypothesis tests comprising several models for the operating scenario. Then, we solve the hypothesis testing problems by resorting to suitable approximations of the model order selection rules due to the intractable mathematics associated with the maximum likelihood estimation of some parameters. Remarkably, the classification results provided by the proposed architectures represent an advanced clutter map since, besides the estimation of the clutter parameters, they contain a clustering of the range bins in terms of homogeneous subsets. In fact, such information can drive the conventional detectors towards more reliable estimates of the clutter covariance matrix according to the position of the cells under test. The performance analysis confirms that the conceived architectures represent a viable means to recognize the scenario wherein the radar is operating at least for the considered simulation parameters.
△ Less
Submitted 16 February, 2023;
originally announced February 2023.
-
cRedAnno+: Annotation Exploitation in Self-Explanatory Lung Nodule Diagnosis
Authors:
Jiahao Lu,
Chong Yin,
Kenny Erleben,
Michael Bachmann Nielsen,
Sune Darkner
Abstract:
Recently, attempts have been made to reduce annotation requirements in feature-based self-explanatory models for lung nodule diagnosis. As a representative, cRedAnno achieves competitive performance with considerably reduced annotation needs by introducing self-supervised contrastive learning to do unsupervised feature extraction. However, it exhibits unstable performance under scarce annotation c…
▽ More
Recently, attempts have been made to reduce annotation requirements in feature-based self-explanatory models for lung nodule diagnosis. As a representative, cRedAnno achieves competitive performance with considerably reduced annotation needs by introducing self-supervised contrastive learning to do unsupervised feature extraction. However, it exhibits unstable performance under scarce annotation conditions. To improve the accuracy and robustness of cRedAnno, we propose an annotation exploitation mechanism by conducting semi-supervised active learning with sparse seeding and training quenching in the learned semantically meaningful reasoning space to jointly utilise the extracted features, annotations, and unlabelled data. The proposed approach achieves comparable or even higher malignancy prediction accuracy with 10x fewer annotations, meanwhile showing better robustness and nodule attribute prediction accuracy under the condition of 1% annotations. Our complete code is open-source available: https://github.com/diku-dk/credanno.
△ Less
Submitted 7 November, 2022; v1 submitted 28 October, 2022;
originally announced October 2022.
-
Robust Control For Spacecraft Attitude Tracking Under Multiple Physical Limitations with Guaranteed Performance
Authors:
Jiakun Lei,
Tao Meng,
Weijia Wang,
Chengjin Yin,
Zhonghe Jin
Abstract:
This paper considers the prescribed performance control (PPC) of spacecraft attitude tracking under multiple physical constraints, focusing on the robust issues. A novel Barrier Lyapunov function is proposed to realize the guaranteed-performance control under angular velocity constraint without singularity. Additionally, an adaptive strategy for the performance function is presented to soften the…
▽ More
This paper considers the prescribed performance control (PPC) of spacecraft attitude tracking under multiple physical constraints, focusing on the robust issues. A novel Barrier Lyapunov function is proposed to realize the guaranteed-performance control under angular velocity constraint without singularity. Additionally, an adaptive strategy for the performance function is presented to soften the constraint and quickly re-stabilize the system after severe disturbances occur, providing strong robustness. Further, an auxiliary system is designed to handle the input saturation issue, incorporating the actuator limitation into the system. Based on the proposed structure, a backstepping controller is developed accordingly using a double-layer PPC framework. Numerical simulation results are presented to validate the proposed controller framework's efficiency and robustness.8 pa
△ Less
Submitted 13 September, 2022;
originally announced September 2022.
-
Reducing Annotation Need in Self-Explanatory Models for Lung Nodule Diagnosis
Authors:
Jiahao Lu,
Chong Yin,
Oswin Krause,
Kenny Erleben,
Michael Bachmann Nielsen,
Sune Darkner
Abstract:
Feature-based self-explanatory methods explain their classification in terms of human-understandable features. In the medical imaging community, this semantic matching of clinical knowledge adds significantly to the trustworthiness of the AI. However, the cost of additional annotation of features remains a pressing issue. We address this problem by proposing cRedAnno, a data-/annotation-efficient…
▽ More
Feature-based self-explanatory methods explain their classification in terms of human-understandable features. In the medical imaging community, this semantic matching of clinical knowledge adds significantly to the trustworthiness of the AI. However, the cost of additional annotation of features remains a pressing issue. We address this problem by proposing cRedAnno, a data-/annotation-efficient self-explanatory approach for lung nodule diagnosis. cRedAnno considerably reduces the annotation need by introducing self-supervised contrastive learning to alleviate the burden of learning most parameters from annotation, replacing end-to-end training with two-stage training. When training with hundreds of nodule samples and only 1% of their annotations, cRedAnno achieves competitive accuracy in predicting malignancy, meanwhile significantly surpassing most previous works in predicting nodule attributes. Visualisation of the learned space further indicates that the correlation between the clustering of malignancy and nodule attributes coincides with clinical knowledge. Our complete code is open-source available: https://github.com/diku-dk/credanno.
△ Less
Submitted 25 July, 2022; v1 submitted 27 June, 2022;
originally announced June 2022.
-
Reinforcement Learning from Demonstrations by Novel Interactive Expert and Application to Automatic Berthing Control Systems for Unmanned Surface Vessel
Authors:
Haoran Zhang,
Chenkun Yin,
Yanxin Zhang,
Shangtai Jin,
Zhenxuan Li
Abstract:
In this paper, two novel practical methods of Reinforcement Learning from Demonstration (RLfD) are developed and applied to automatic berthing control systems for Unmanned Surface Vessel. A new expert data generation method, called Model Predictive Based Expert (MPBE) which combines Model Predictive Control and Deep Deterministic Policy Gradient, is developed to provide high quality supervision da…
▽ More
In this paper, two novel practical methods of Reinforcement Learning from Demonstration (RLfD) are developed and applied to automatic berthing control systems for Unmanned Surface Vessel. A new expert data generation method, called Model Predictive Based Expert (MPBE) which combines Model Predictive Control and Deep Deterministic Policy Gradient, is developed to provide high quality supervision data for RLfD algorithms. A straightforward RLfD method, model predictive Deep Deterministic Policy Gradient (MP-DDPG), is firstly introduced by replacing the RL agent with MPBE to directly interact with the environment. Then distribution mismatch problem is analyzed for MP-DDPG, and two techniques that alleviate distribution mismatch are proposed. Furthermore, another novel RLfD algorithm based on the MP-DDPG, called Self-Guided Actor-Critic (SGAC) is present, which can effectively leverage MPBE by continuously querying it to generate high quality expert data online. The distribution mismatch problem leading to unstable learning process is addressed by SGAC in a DAgger manner. In addition, theoretical analysis is given to prove that SGAC algorithm can converge with guaranteed monotonic improvement. Simulation results verify the effectiveness of MP-DDPG and SGAC to accomplish the ship berthing control task, and show advantages of SGAC comparing with other typical reinforcement learning algorithms and MP-DDPG.
△ Less
Submitted 23 February, 2022;
originally announced February 2022.
-
Deep Reinforcement Learning for Intelligent Reflecting Surface-assisted D2D Communications
Authors:
Khoi Khac Nguyen,
Antonino Masaracchia,
Cheng Yin,
Long D. Nguyen,
Octavia A. Dobre,
Trung Q. Duong
Abstract:
In this paper, we propose a deep reinforcement learning (DRL) approach for solving the optimisation problem of the network's sum-rate in device-to-device (D2D) communications supported by an intelligent reflecting surface (IRS). The IRS is deployed to mitigate the interference and enhance the signal between the D2D transmitter and the associated D2D receiver. Our objective is to jointly optimise t…
▽ More
In this paper, we propose a deep reinforcement learning (DRL) approach for solving the optimisation problem of the network's sum-rate in device-to-device (D2D) communications supported by an intelligent reflecting surface (IRS). The IRS is deployed to mitigate the interference and enhance the signal between the D2D transmitter and the associated D2D receiver. Our objective is to jointly optimise the transmit power at the D2D transmitter and the phase shift matrix at the IRS to maximise the network sum-rate. We formulate a Markov decision process and then propose the proximal policy optimisation for solving the maximisation game. Simulation results show impressive performance in terms of the achievable rate and processing time.
△ Less
Submitted 5 August, 2021;
originally announced August 2021.
-
Brain Atlas Guided Attention U-Net for White Matter Hyperintensity Segmentation
Authors:
Zicong Zhang,
Kimerly Powell,
Changchang Yin,
Shilei Cao,
Dani Gonzalez,
Yousef Hannawi,
Ping Zhang
Abstract:
White Matter Hyperintensities (WMH) are the most common manifestation of cerebral small vessel disease (cSVD) on the brain MRI. Accurate WMH segmentation algorithms are important to determine cSVD burden and its clinical consequences. Most of existing WMH segmentation algorithms require both fluid attenuated inversion recovery (FLAIR) images and T1-weighted images as inputs. However, T1-weighted i…
▽ More
White Matter Hyperintensities (WMH) are the most common manifestation of cerebral small vessel disease (cSVD) on the brain MRI. Accurate WMH segmentation algorithms are important to determine cSVD burden and its clinical consequences. Most of existing WMH segmentation algorithms require both fluid attenuated inversion recovery (FLAIR) images and T1-weighted images as inputs. However, T1-weighted images are typically not part of standard clinicalscans which are acquired for patients with acute stroke. In this paper, we propose a novel brain atlas guided attention U-Net (BAGAU-Net) that leverages only FLAIR images with a spatially-registered white matter (WM) brain atlas to yield competitive WMH segmentation performance. Specifically, we designed a dual-path segmentation model with two novel connecting mechanisms, namely multi-input attention module (MAM) and attention fusion module (AFM) to fuse the information from two paths for accurate results. Experiments on two publicly available datasets show the effectiveness of the proposed BAGAU-Net. With only FLAIR images and WM brain atlas, BAGAU-Net outperforms the state-of-the-art method with T1-weighted images, paving the way for effective development of WMH segmentation. Availability:https://github.com/Ericzhang1/BAGAU-Net
△ Less
Submitted 21 December, 2020; v1 submitted 19 October, 2020;
originally announced October 2020.
-
A Machine Learning Approach for Task and Resource Allocation in Mobile Edge Computing Based Networks
Authors:
Sihua Wang,
Mingzhe Chen,
Xuanlin Liu,
Changchuan Yin,
Shuguang Cui,
H. Vincent Poor
Abstract:
In this paper, a joint task, spectrum, and transmit power allocation problem is investigated for a wireless network in which the base stations (BSs) are equipped with mobile edge computing (MEC) servers to jointly provide computational and communication services to users. Each user can request one computational task from three types of computational tasks. Since the data size of each computational…
▽ More
In this paper, a joint task, spectrum, and transmit power allocation problem is investigated for a wireless network in which the base stations (BSs) are equipped with mobile edge computing (MEC) servers to jointly provide computational and communication services to users. Each user can request one computational task from three types of computational tasks. Since the data size of each computational task is different, as the requested computational task varies, the BSs must adjust their resource (subcarrier and transmit power) and task allocation schemes to effectively serve the users. This problem is formulated as an optimization problem whose goal is to minimize the maximal computational and transmission delay among all users. A multi-stack reinforcement learning (RL) algorithm is developed to solve this problem. Using the proposed algorithm, each BS can record the historical resource allocation schemes and users' information in its multiple stacks to avoid learning the same resource allocation scheme and users' states, thus improving the convergence speed and learning efficiency. Simulation results illustrate that the proposed algorithm can reduce the number of iterations needed for convergence and the maximal delay among all users by up to 18% and 11.1% compared to the standard Q-learning algorithm.
△ Less
Submitted 20 July, 2020;
originally announced July 2020.
-
AirRL: A Reinforcement Learning Approach to Urban Air Quality Inference
Authors:
Huiqiang Zhong,
Cunxiang Yin,
Xiaohui Wu,
Jinchang Luo,
JiaWei He
Abstract:
Urban air pollution has become a major environmental problem that threatens public health. It has become increasingly important to infer fine-grained urban air quality based on existing monitoring stations. One of the challenges is how to effectively select some relevant stations for air quality inference. In this paper, we propose a novel model based on reinforcement learning for urban air qualit…
▽ More
Urban air pollution has become a major environmental problem that threatens public health. It has become increasingly important to infer fine-grained urban air quality based on existing monitoring stations. One of the challenges is how to effectively select some relevant stations for air quality inference. In this paper, we propose a novel model based on reinforcement learning for urban air quality inference. The model consists of two modules: a station selector and an air quality regressor. The station selector dynamically selects the most relevant monitoring stations when inferring air quality. The air quality regressor takes in the selected stations and makes air quality inference with deep neural network. We conduct experiments on a real-world air quality dataset and our approach achieves the highest performance compared with several popular solutions, and the experiments show significant effectiveness of proposed model in tackling problems of air quality inference.
△ Less
Submitted 26 March, 2020;
originally announced March 2020.
-
Federated Learning for Task and Resource Allocation in Wireless High Altitude Balloon Networks
Authors:
Sihua Wang,
Mingzhe Chen,
Changchuan Yin,
Walid Saad,
Choong Seon Hong,
Shuguang Cui,
H. Vincent Poor
Abstract:
In this paper, the problem of minimizing energy and time consumption for task computation and transmission is studied in a mobile edge computing (MEC)-enabled balloon network. In the considered network, each user needs to process a computational task in each time instant, where high-altitude balloons (HABs), acting as flying wireless base stations, can use their powerful computational abilities to…
▽ More
In this paper, the problem of minimizing energy and time consumption for task computation and transmission is studied in a mobile edge computing (MEC)-enabled balloon network. In the considered network, each user needs to process a computational task in each time instant, where high-altitude balloons (HABs), acting as flying wireless base stations, can use their powerful computational abilities to process the tasks offloaded from their associated users. Since the data size of each user's computational task varies over time, the HABs must dynamically adjust the user association, service sequence, and task partition scheme to meet the users' needs. This problem is posed as an optimization problem whose goal is to minimize the energy and time consumption for task computing and transmission by adjusting the user association, service sequence, and task allocation scheme. To solve this problem, a support vector machine (SVM)-based federated learning (FL) algorithm is proposed to determine the user association proactively. The proposed SVM-based FL method enables each HAB to cooperatively build an SVM model that can determine all user associations without any transmissions of either user historical associations or computational tasks to other HABs. Given the prediction of the optimal user association, the service sequence and task allocation of each user can be optimized so as to minimize the weighted sum of the energy and time consumption. Simulations with real data of city cellular traffic from the OMNILab at Shanghai Jiao Tong University show that the proposed algorithm can reduce the weighted sum of the energy and time consumption of all users by up to 16.1% compared to a conventional centralized method.
△ Less
Submitted 19 March, 2020;
originally announced March 2020.
-
A Multi-Modal States based Vehicle Descriptor and Dilated Convolutional Social Pooling for Vehicle Trajectory Prediction
Authors:
Huimin Zhang,
Yafei Wang,
Junjia Liu,
Chengwei Li,
Taiyuan Ma,
Chengliang Yin
Abstract:
Precise trajectory prediction of surrounding vehicles is critical for decision-making of autonomous vehicles and learning-based approaches are well recognized for the robustness. However, state-of-the-art learning-based methods ignore 1) the feasibility of the vehicle's multi-modal state information for prediction and 2) the mutual exclusive relationship between the global traffic scene receptive…
▽ More
Precise trajectory prediction of surrounding vehicles is critical for decision-making of autonomous vehicles and learning-based approaches are well recognized for the robustness. However, state-of-the-art learning-based methods ignore 1) the feasibility of the vehicle's multi-modal state information for prediction and 2) the mutual exclusive relationship between the global traffic scene receptive fields and the local position resolution when modeling vehicles' interactions, which may influence prediction accuracy. Therefore, we propose a vehicle-descriptor based LSTM model with the dilated convolutional social pooling (VD+DCS-LSTM) to cope with the above issues. First, each vehicle's multi-modal state information is employed as our model's input and a new vehicle descriptor encoded by stacked sparse auto-encoders is proposed to reflect the deep interactive relationships between various states, achieving the optimal feature extraction and effective use of multi-modal inputs. Secondly, the LSTM encoder is used to encode the historical sequences composed of the vehicle descriptor and a novel dilated convolutional social pooling is proposed to improve modeling vehicles' spatial interactions. Thirdly, the LSTM decoder is used to predict the probability distribution of future trajectories based on maneuvers. The validity of the overall model was verified over the NGSIM US-101 and I-80 datasets and our method outperforms the latest benchmark.
△ Less
Submitted 6 March, 2020;
originally announced March 2020.
-
A novel algorithm to get the Fourier power spectra of a real sequence
Authors:
Jiasong Wang,
Changchuan Yin
Abstract:
For a real sequence of length of m = nl, we may deduce its congruence derivative sequence with length of l. The discrete Fourier transform of original sequence can be calculated by the discrete Fourier transform of the congruence derivative sequence. Based on the relation of discrete Fourier transforms between the two sequences, the features of Fourier power spectra of the integer and fractional p…
▽ More
For a real sequence of length of m = nl, we may deduce its congruence derivative sequence with length of l. The discrete Fourier transform of original sequence can be calculated by the discrete Fourier transform of the congruence derivative sequence. Based on the relation of discrete Fourier transforms between the two sequences, the features of Fourier power spectra of the integer and fractional periods for a real sequence have been investigated. It has proved mathematically that after calculating the Fourier power spectrum at an integer period, the Fourier power spectra of the fractional periods associated this integer period can be easily represented by the computational result of the Fourier power spectrum at the integer period for the sequence. A computational experience using a protein sequence shows that some of the computed results are a kind of Fourier power spectra corresponding to new frequencies which can't be obtained from the traditional discrete Fourier transform. Therefore, the algorithm would be a new realization method for discrete Fourier transform of the real sequence.
△ Less
Submitted 12 April, 2019;
originally announced April 2019.