Search | arXiv e-print repository

Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical Systems

Authors: Jinwei Hu, Zezhi Tang, Xin Jin, Benyuan Zhang, Yi Dong, Xiaowei Huang

Abstract: This paper presents HERO (Hierarchical Testing with Rabbit Optimization), a novel black-box adversarial testing framework for evaluating the robustness of deep learning-based Prognostics and Health Management (PHM) systems in Industrial Cyber-Physical Systems (ICPS). Leveraging Artificial Rabbit Optimization, HERO generates physically constrained adversarial examples that align with real-world data distributions from both global and local perspectives. Its generalizability ensures applicability across diverse ICPS scenarios. This study specifically focuses on the Proton Exchange Membrane Fuel Cell system, chosen for its highly dynamic operational conditions, complex degradation mechanisms, and increasing integration into ICPS as a sustainable and efficient energy solution. Experimental results highlight HERO's ability to uncover vulnerabilities in even state-of-the-art PHM models, underscoring the critical need for enhanced robustness in real-world applications. By addressing these challenges, HERO demonstrates its potential to advance more resilient PHM systems across a wide range of ICPS domains.

Submitted 5 July, 2025; originally announced July 2025.

Comments: Preprint accepted by IEEE Transactions on Industrial Cyber Physical Systems

arXiv:2507.00755 [pdf]

doi 10.1109/TCSI.2025.3578606

LearnAFE: Circuit-Algorithm Co-design Framework for Learnable Audio Analog Front-End

Authors: Jinhai Hu, Zhongyi Zhang, Cong Sheng Leow, Wang Ling Goh, Yuan Gao

Abstract: This paper presents a circuit-algorithm co-design framework for a learnable analog front-end (AFE) in audio signal classification. Designing the AFE and backend classifier separately is common practice but non-ideal, as shown in this paper. Instead, this paper proposes a joint optimization of the backend classifier with the AFE's transfer function to achieve a system-level optimum. More specifically, the transfer function parameters of an analog bandpass filter (BPF) bank are tuned in a signal-to-noise ratio (SNR)-aware training loop for the classifier. Using a co-design loss function LBPF, this work shows superior optimization of both the filter bank and the classifier. Implemented in the open-source SKY130 130 nm CMOS process, the optimized design achieved 90.5%-94.2% accuracy for a 10-keyword classification task across a wide range of input signal SNR from 5 dB to 20 dB, with only 22k classifier parameters. Compared to the conventional approach, the proposed audio AFE achieves 8.7% and 12.9% reductions in power and capacitor area, respectively.

Submitted 1 July, 2025; originally announced July 2025.

Comments: 11 pages, 15 figures, accepted for publication in IEEE Transactions on Circuits and Systems I: Regular Papers

arXiv:2506.19893 [pdf, ps, other]

Distillation-Enabled Knowledge Alignment for Generative Semantic Communications in AIGC Provisioning Tasks

Authors: Jingzhi Hu, Geoffrey Ye Li

Abstract: Due to the surging amount of AI-generated content (AIGC), its provisioning to edges and mobile users from the cloud incurs substantial traffic on networks. Generative semantic communication (GSC) offers a promising solution by transmitting highly compact information, i.e., prompt text and latent representations, instead of high-dimensional AIGC data. However, GSC relies on the alignment between the knowledge in the cloud generative AI (GAI) and that possessed by the edges and users, and between the knowledge for wireless transmission and that of actual channels, which remains challenging. In this paper, we propose DeKA-g, a distillation-enabled knowledge alignment algorithm for GSC systems. The core idea is to distill the generation knowledge from the cloud-GAI into low-rank matrices, which can be incorporated by the edge and used to adapt the transmission knowledge to diverse wireless channel conditions. DeKA-g comprises two novel methods: metaword-aided knowledge distillation (MAKD) and variable-rate grouped SNR adaptation (VGSA). For MAKD, an optimized metaword is employed to enhance the efficiency of knowledge distillation, while VGSA enables efficient adaptation to diverse compression rates and SNR ranges. From simulation results, DeKA-g improves the alignment between the edge-generated images and the cloud-generated ones by 44%. Moreover, it adapts to compression rates with 116% higher efficiency than the baseline and enhances the performance in low-SNR conditions by 28%.

Submitted 24 June, 2025; originally announced June 2025.

arXiv:2506.12712 [pdf, ps, other]

Combining Self-attention and Dilation Convolutional for Semantic Segmentation of Coal Maceral Groups

Authors: Zhenghao Xi, Zhengnan Lv, Yang Zheng, Xiang Liu, Zhuang Yu, Junran Chen, Jing Hu, Yaqi Liu

Abstract: The segmentation of coal maceral groups can be described as a semantic segmentation process of coal maceral group images, which is of great significance for studying the chemical properties of coal. Generally, existing semantic segmentation models of coal maceral groups stack parameters to achieve higher accuracy. This leads to increased computational requirements and impacts model training efficiency. At the same time, because sampling coal maceral group images is specialized and diverse, obtaining enough samples for model training takes a long time and requires professional personnel. To address these issues, we have developed an IoT-based DA-VIT parallel network model. By utilizing this model, we can continuously broaden the dataset through IoT and achieve sustained improvement in the accuracy of coal maceral group segmentation. In addition, we decouple the parallel network from the backbone network to ensure normal use of the backbone network during model data updates. Second, the DCSA mechanism of DA-VIT is introduced to enhance the local feature information of coal microscopic images. DCSA can decompose the large kernels of convolutional attention into multiple scales and reduce the number of parameters by 81.18%. Finally, we performed comparison and ablation experiments between DA-VIT and state-of-the-art methods across many evaluation metrics. Experimental results show that DA-VIT-Base achieves 92.14% pixel accuracy and 63.18% mIoU. The parameter count and FLOPs of DA-VIT-Tiny are 4.95M and 8.99G, respectively. All of the evaluation metrics of the proposed DA-VIT are better than those of other state-of-the-art methods.

Submitted 15 June, 2025; originally announced June 2025.

arXiv:2506.11496 [pdf, ps, other]

Taming Stable Diffusion for Computed Tomography Blind Super-Resolution

Authors: Chunlei Li, Yilei Shi, Haoxi Hu, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou

Abstract: High-resolution computed tomography (CT) imaging is essential for medical diagnosis but requires increased radiation exposure, creating a critical trade-off between image quality and patient safety. While deep learning methods have shown promise in CT super-resolution, they face challenges with complex degradations and limited medical training data. Meanwhile, large-scale pre-trained diffusion models, particularly Stable Diffusion, have demonstrated remarkable capabilities in synthesizing fine details across various vision tasks. Motivated by this, we propose a novel framework that adapts Stable Diffusion for CT blind super-resolution. We employ a practical degradation model to synthesize realistic low-quality images and leverage a pre-trained vision-language model to generate corresponding descriptions. Subsequently, we perform super-resolution using Stable Diffusion with a specialized controlling strategy, conditioned on both low-resolution inputs and the generated text descriptions. Extensive experiments show that our method outperforms existing approaches, demonstrating its potential for achieving high-quality CT imaging at reduced radiation doses. Our code will be made publicly available.

Submitted 13 June, 2025; originally announced June 2025.

arXiv:2506.11339 [pdf, ps, other]

WIP: Exploring the Value of a Debugging Cheat Sheet and Mini Lecture in Improving Undergraduate Debugging Skills and Mindset

Authors: Andrew Ash, John Hu

Abstract: This work-in-progress research paper explores the efficacy of a small-scale microelectronics debugging education intervention utilizing quasi-experimental design in an introductory microelectronics course for third-year electrical and computer engineering (ECE) students. In the first semester of research, the experimental group attended a debugging "mini lecture" covering two common sources of circuit error and received a debugging cheat sheet with recommendations for testing and hypothesis formation. Across three debugging problems, students in the experimental group were faster by an average of 1:43 and had a 7 percent higher success rate than the control group. Both groups demonstrated a strong general growth mindset while the experimental group also displayed a shift in their debugging mindset by perceiving a greater value towards debugging. Though these differences are not yet statistically significant, the pilot results indicate that a mini-lecture and debugging cheat sheet are steps in the right direction toward improving students' readiness for debugging in the workplace.

Submitted 12 June, 2025; originally announced June 2025.

Comments: This is the accepted version of a paper accepted for presentation at the 2025 IEEE Frontiers in Education Conference (FIE). The final version will be available via IEEE Xplore at: https://ieeexplore.ieee.org

arXiv:2506.00397 [pdf]

A Family of Robust Generalized Adaptive Filters and Application for Time-series Prediction

Authors: Yi Peng, Haiquan Zhao, Jinhui Hu

Abstract: The continuous development of new adaptive filters (AFs) based on novel cost functions (CFs) is driven by the demands of various application scenarios and noise environments. However, these algorithms typically demonstrate optimal performance only in specific conditions. When the noise changes, the performance of these AFs often declines, and simple parameter adjustments are ineffective. Instead, a modification of the CF is necessary. To address this issue, the robust generalized adaptive AF (RGA-AF) with strong adaptability and flexibility is proposed in this paper. The flexibility of the RGA-AF's CF allows for smooth adaptation to varying noise environments through parameter adjustments, ensuring optimal filtering performance in diverse scenarios. Moreover, we introduce several fundamental properties of negative RGA (NRGA) entropy and present the negative asymmetric RGA-AF (NAR-GA-AF) and kernel recursive NRGA-AF (KRNRGA-AF). These AFs address asymmetric noise distribution and nonlinear filtering issues, respectively. Simulations of linear system identification and time-series prediction for Chua's circuit under different noise environments demonstrate the superiority of the proposed algorithms in comparison to existing techniques.

Submitted 31 May, 2025; originally announced June 2025.

arXiv:2505.23743 [pdf, ps, other]

DarkDiff: Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP

Authors: Amber Yijia Zheng, Yu Zhang, Jun Hu, Raymond A. Yeh, Chen Chen

Abstract: High-quality photography in extreme low-light conditions is challenging but impactful for digital cameras. With advanced computing hardware, traditional camera image signal processor (ISP) algorithms are gradually being replaced by efficient deep networks that enhance noisy raw images more intelligently. However, existing regression-based models often minimize pixel errors and result in oversmoothing of low-light photos or deep shadows. Recent work has attempted to address this limitation by training a diffusion model from scratch, yet those models still struggle to recover sharp image details and accurate colors. We introduce a novel framework to enhance low-light raw images by retasking pre-trained generative diffusion models with the camera ISP. Extensive experiments demonstrate that our method outperforms the state-of-the-art in perceptual quality across three challenging low-light raw image benchmarks.

Submitted 29 May, 2025; originally announced May 2025.

arXiv:2505.17030 [pdf, ps, other]

Distillation-Enabled Knowledge Alignment Protocol for Semantic Communication in AI Agent Networks

Authors: Jingzhi Hu, Geoffrey Ye Li

Abstract: Future networks are envisioned to connect massive artificial intelligence (AI) agents, enabling their extensive collaboration on diverse tasks. Compared to traditional entities, these agents naturally suit the semantic communication (SC), which can significantly enhance the bandwidth efficiency. Nevertheless, SC requires the knowledge among agents to be aligned, while agents have distinct expert knowledge for their individual tasks in practice. In this paper, we propose a distillation-enabled knowledge alignment protocol (DeKAP), which distills the expert knowledge of each agent into parameter-efficient low-rank matrices, allocates them across the network, and allows agents to simultaneously maintain aligned knowledge for multiple tasks. We formulate the joint minimization of alignment loss, communication overhead, and storage cost as a large-scale integer linear programming problem and develop a highly efficient greedy algorithm. From computer simulation, the DeKAP establishes knowledge alignment with the lowest communication and computation resources compared to conventional approaches.

Submitted 7 May, 2025; originally announced May 2025.

arXiv:2505.15868 [pdf]

An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology

Authors: Changchun Yang, Weiqian Dai, Yilan Zhang, Siyuan Chen, Jingdong Hu, Junkai Su, Yuxuan Chen, Ao Xu, Na Li, Xin Gao, Yongguo Yu

Abstract: Chromosome analysis is vital for diagnosing genetic disorders and guiding cancer therapy decisions through the identification of somatic clonal aberrations. However, developing an AI model is hindered by the overwhelming complexity and diversity of chromosomal abnormalities, requiring extensive annotation efforts, while automated methods remain task-specific and lack generalizability due to the scarcity of comprehensive datasets spanning diverse resource conditions. Here, we introduce CHROMA, a foundation model for cytogenomics, designed to overcome these challenges by learning generalizable representations of chromosomal abnormalities. Pre-trained on over 84,000 specimens (~4 million chromosomal images) via self-supervised learning, CHROMA outperforms other methods across all types of abnormalities, even when trained on fewer labelled data and more imbalanced datasets. By facilitating comprehensive mapping of instability and clonal lesions across various aberration types, CHROMA offers a scalable and generalizable solution for reliable and automated clinical analysis, reducing the annotation workload for experts and advancing precision oncology through the early detection of rare genomic abnormalities, enabling broad clinical AI applications and making advanced genomic analysis more accessible.

Submitted 21 May, 2025; originally announced May 2025.

Comments: These authors contributed equally to this work: Changchun Yang, Weiqian Dai, Yilan Zhang

arXiv:2505.11793 [pdf, other]

doi 10.1109/TGRS.2024.3388426

CL-CaGAN: Capsule differential adversarial continuous learning for cross-domain hyperspectral anomaly detection

Authors: Jianing Wang, Siying Guo, Zheng Hua, Runhu Huang, Jinyu Hu, Maoguo Gong

Abstract: Anomaly detection (AD) has attracted remarkable attention in hyperspectral image (HSI) processing, and most existing deep learning (DL)-based algorithms show strong potential for detecting anomaly samples through a specific training process under the current scenario. However, limited prior information and the catastrophic forgetting problem pose crucial challenges for existing DL structures in open-scenario cross-domain detection. To improve the detection performance, a novel continual learning-based capsule differential generative adversarial network (CL-CaGAN) is proposed to elevate the cross-scenario learning performance and facilitate the real application of DL-based structures in the hyperspectral AD (HAD) task. First, a modified capsule structure with an adversarial learning network is constructed to estimate the background distribution and overcome the deficiency of prior information. To mitigate the catastrophic forgetting phenomenon, a clustering-based sample replay strategy and a designed extra self-distillation regularization are integrated to merge the history and future knowledge in the continual AD task, while the discriminative learning ability from the previous detection scenario to the current scenario is retained by the elaborately designed structure with a continual learning (CL) strategy. In addition, differentiable enhancement is enforced to augment the generation performance of the training data. This further stabilizes the training process with better convergence and efficiently consolidates the reconstruction ability of background samples. To verify the effectiveness of the proposed CL-CaGAN, we conduct experiments on several real HSIs, and the results indicate that CL-CaGAN demonstrates higher detection performance and continuous learning capacity for mitigating catastrophic forgetting under cross-domain scenarios.

Submitted 16 May, 2025; originally announced May 2025.

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 62, pp. 1-15, 2024

arXiv:2505.00237 [pdf, ps, other]

Future-Oriented Navigation: Dynamic Obstacle Avoidance with One-Shot Energy-Based Multimodal Motion Prediction

Authors: Ze Zhang, Georg Hess, Junjie Hu, Emmanuel Dean, Lennart Svensson, Knut Åkesson

Abstract: This paper proposes an integrated approach for the safe and efficient control of mobile robots in dynamic and uncertain environments. The approach consists of two key steps: one-shot multimodal motion prediction to anticipate motions of dynamic obstacles and model predictive control to incorporate these predictions into the motion planning process. Motion prediction is driven by an energy-based neural network that generates high-resolution, multi-step predictions in a single operation. The prediction outcomes are further utilized to create geometric shapes formulated as mathematical constraints. Instead of treating each dynamic obstacle individually, predicted obstacles are grouped by proximity in an unsupervised way to improve performance and efficiency. The overall collision-free navigation is handled by model predictive control with a specific design for proactive dynamic obstacle avoidance. The proposed approach allows mobile robots to navigate effectively in dynamic environments. Its performance is assessed across various scenarios that represent typical warehouse settings. The results demonstrate that the proposed approach outperforms other existing dynamic obstacle avoidance methods.

Submitted 4 June, 2025; v1 submitted 30 April, 2025; originally announced May 2025.

Comments: Published in IEEE Robotics and Automation Letters (RA-L)

arXiv:2504.15178 [pdf]

Time-Series Analysis on Edge-AI Hardware for Healthcare Monitoring

Authors: Jinhai Hu

Abstract: This project addresses the need for efficient, real-time analysis of biomedical signals such as electrocardiograms (ECG) and electroencephalograms (EEG) for continuous health monitoring. Traditional methods rely on long-duration data recording followed by offline analysis, which is power-intensive and delays responses to critical symptoms such as arrhythmia. To overcome these limitations, a time-domain ECG analysis model based on a novel dynamically-biased Long Short-Term Memory (DB-LSTM) neural network is proposed. This model supports simultaneous ECG forecasting and classification with high performance, achieving over 98% accuracy and a normalized mean square error below 1e-3 for forecasting, and over 97% accuracy with faster convergence and fewer training parameters for classification. To enable edge deployment, the model is hardware-optimized by quantizing weights to INT4 or INT3 formats, resulting in only a 2% and 6% drop in classification accuracy during training and inference, respectively, while maintaining full accuracy for forecasting. Extensive simulations using multiple ECG datasets confirm the model's robustness. Future work includes implementing the algorithm on FPGA and CMOS circuits for practical cardiac monitoring, as well as developing a digital hardware platform that supports flexible neural network configurations and on-chip online training for personalized healthcare applications.

Submitted 21 April, 2025; originally announced April 2025.

Comments: 38 pages, 20 figures, Progress report for qualification cum PhD confirmation exercise

arXiv:2504.14952 [pdf, other]

PIV-FlowDiffuser: Transfer-learning-based denoising diffusion models for PIV

Authors: Qianyu Zhu, Junjie Wang, Jeremiah Hu, Jia Ai, Yong Lee

Abstract: Deep learning algorithms have significantly reduced the computational time and improved the spatial resolution of particle image velocimetry (PIV). However, models trained on synthetic datasets might show degraded performance on practical particle images due to domain gaps. As a result, special residual patterns are often observed in the vector fields of deep learning-based estimators. To reduce this special noise step by step, we employ a denoising diffusion model (FlowDiffuser) for PIV analysis. The data-hungry iterative denoising diffusion model is trained via a transfer learning strategy, resulting in our PIV-FlowDiffuser method. Specifically, the strategy consists of (1) pre-training a FlowDiffuser model with multiple optical flow datasets from the computer vision community, such as Sintel and KITTI; and (2) fine-tuning the pre-trained model on synthetic PIV datasets. Note that the PIV images are upsampled by a factor of two to resolve the small-scale turbulent flow structures. The visualized results indicate that our PIV-FlowDiffuser effectively suppresses the noise patterns. The denoising diffusion model reduces the average end-point error (AEE) by 59.4% over the RAFT256-PIV baseline on the classic Cai's dataset. In addition, PIV-FlowDiffuser exhibits enhanced generalization performance on unseen particle images due to transfer learning. Overall, this study highlights transfer-learning-based denoising diffusion models for PIV. A detailed implementation is available to interested readers in the repository https://github.com/Zhu-Qianyu/PIV-FlowDiffuser.

Submitted 21 April, 2025; originally announced April 2025.

arXiv:2504.12711 [pdf, other]

NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results

Authors: Xin Li, Yeying Jin, Xin Jin, Zongwei Wu, Bingchen Li, Yufei Wang, Wenhan Yang, Yu Li, Zhibo Chen, Bihan Wen, Robby T. Tan, Radu Timofte, Qiyu Rong, Hongyuan Jing, Mengmeng Zhang, Jinglong Li, Xiangyu Lu, Yi Ren, Yuting Liu, Meng Zhang, Xiang Chen, Qiyuan Guan, Jiangxin Dong, Jinshan Pan, Conglin Gou, et al. (112 additional authors not shown)

Abstract: This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which were developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which include day raindrop-focused, day background-focused, night raindrop-focused, and night background-focused degradations. This dataset is divided into three subsets for competition: 14,139 images for training, 240 images for validation, and 731 images for testing. The primary objective of this challenge is to establish a new and powerful benchmark for the task of removing raindrops under varying lighting and focus conditions. A total of 361 participants took part in the competition, with 32 teams submitting valid solutions and fact sheets for the final testing phase. These submissions achieved state-of-the-art (SOTA) performance on the Raindrop Clarity dataset. The project can be found at https://lixinustc.github.io/CVPR-NTIRE2025-RainDrop-Competition.github.io/.

Submitted 19 April, 2025; v1 submitted 17 April, 2025; originally announced April 2025.

Comments: Challenge Report of CVPR NTIRE 2025; 26 pages; Methods from 32 teams

arXiv:2504.10686 [pdf, other]

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

Authors: Bin Ren, Hang Guo, Lei Sun, Zongwei Wu, Radu Timofte, Yawei Li, Yao Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Li Song, Hongyuan Yu, Pufan Xu, Cheng Wan, Zhijuan Huang, Peng Guo, Shuyuan Cui, Chenjun Li, Xuehai Hu, Pan Pan, Xin Zhang, Heng Zhang, Qing Luo, Linyan Jiang, et al. (122 additional authors not shown)

Abstract: This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the DIV2K_LSDIR_valid dataset and 26.99 dB on the DIV2K_LSDIR_test dataset. A robust participation saw 244 registered entrants, with 43 teams submitting valid entries. This report meticulously analyzes these methods and results, emphasizing groundbreaking advancements in state-of-the-art single-image ESR techniques. The analysis highlights innovative approaches and establishes benchmarks for future research in the field.

Submitted 14 April, 2025; originally announced April 2025.

Comments: Accepted by CVPR2025 NTIRE Workshop, Efficient Super-Resolution Challenge Report. 50 pages

arXiv:2504.07731 [pdf]

Adaptive Robust Unscented Kalman Filter for Dynamic State Estimation of Power System

Authors: Duc Viet Nguyen, Haiquan Zhao, Jinhui Hu, Le Ngoc Giang

Abstract: Non-Gaussian noise and the uncertainty of noise distribution are common factors that reduce accuracy in dynamic state estimation of power systems (PS). In addition, the optimal value of the free coefficients in the unscented Kalman filter (UKF) based on information theoretic criteria is also an urgent problem. In this paper, a robust adaptive UKF (AUKF) under generalized minimum mixture error entropy with fiducial points (GMMEEF) over an improved Snow Geese algorithm (ISGA) (ISGA-GMMEEF-AUKF) is proposed to overcome the above difficulties. The estimation process of the proposed algorithm is based on several key steps including augmented regression error model (AREM) construction, adaptive state estimation, and free coefficients optimization. Specifically, an AREM consisting of state prediction and measurement errors is established at the first step. Then, GMMEEF-AUKF is developed by solving the optimization problem based on GMMEEF, which uses a generalized Gaussian kernel combined with mixture correntropy to further enhance flexibility and handle data with complex attributes, and which updates the noise covariance matrix according to the AREM framework. Finally, the ISGA is designed to automatically calculate the optimal value of coefficients such as the shape coefficients of the kernel in the GMMEEF criterion, the coefficients for selecting sigma points in the unscented transform, and the update coefficient of the noise covariance matrices to fit the PS model. Simulation results on the IEEE 14, 30, and 57-bus test systems in complex scenarios have confirmed that the proposed algorithm outperforms the MEEF-UKF and UKF by an average efficiency of 26% and 65%, respectively.

Submitted 10 April, 2025; originally announced April 2025.

Comments: 11 pages, 10 figures

MSC Class: 94-10; 94-05

ACM Class: H.1.1; H.4.3

arXiv:2504.07365 [pdf, ps, other]

Diffusion Augmented Complex Maximum Total Correntropy Algorithm for Power System Frequency Estimation

Authors: Haiquan Zhao, Yi Peng, Jinsong Chen, Jinhui Hu

Abstract: Currently, adaptive filtering algorithms have been widely applied in frequency estimation for power systems. However, research on diffusion tasks remains insufficient. Existing diffusion adaptive frequency estimation algorithms exhibit certain limitations in handling input noise and lack robustness against impulsive noise. Moreover, traditional adaptive filtering algorithms designed based on the strictly-linear (SL) model fail to effectively address frequency estimation challenges in unbalanced three-phase power systems. To address these issues, this letter proposes an improved diffusion augmented complex maximum total correntropy (DAMTCC) algorithm based on the widely linear (WL) model. The proposed algorithm not only significantly enhances the capability to handle input noise but also demonstrates superior robustness to impulsive noise. Furthermore, it successfully resolves the critical challenge of frequency estimation in unbalanced three-phase power systems, offering an efficient and reliable solution for diffusion power system frequency estimation. Finally, we analyze the stability of the algorithm, and computer simulations verify its excellent performance.

Submitted 9 April, 2025; originally announced April 2025.

arXiv:2503.23883 [pdf, ps, other]

Algorithm Design and Prototype Validation for Reconfigurable Intelligent Sensing Surface: Forward-Only Transmission

Authors: Cheng Luo, Luping Xiang, Jie Hu, Kun Yang

Abstract: Sensing-assisted communication schemes have recently garnered significant research attention. In this work, we design a dual-function reconfigurable intelligent surface (RIS), integrating both active and passive elements, referred to as the reconfigurable intelligent sensing surface (RISS), to enhance communication. By leveraging sensing results from the active elements, we propose communication enhancement and robust interference suppression schemes for both near-field and far-field models, implemented through the passive elements. These schemes remove the need for base station (BS) feedback for RISS control, simplifying the communication process by replacing traditional channel state information (CSI) feedback with real-time sensing from the active elements. The proposed schemes are theoretically analyzed and then validated using software-defined radio (SDR). Experimental results demonstrate the effectiveness of the sensing algorithms in real-world scenarios, such as direction of arrival (DOA) estimation and radio frequency (RF) identification recognition. Moreover, the RISS-assisted communication system shows strong performance in communication enhancement and interference suppression, particularly in near-field models.

Submitted 19 June, 2025; v1 submitted 31 March, 2025; originally announced March 2025.

arXiv:2503.14966 [pdf, other]

Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models

Authors: Tingxiu Chen, Yilei Shi, Zixuan Zheng, Bingcong Yan, Jingliang Hu, Xiao Xiang Zhu, Lichao Mou

Abstract: Ultrasound video classification enables automated diagnosis and has emerged as an important research area. However, publicly available ultrasound video datasets remain scarce, hindering progress in developing effective video classification models. We propose addressing this shortage by synthesizing plausible ultrasound videos from readily available, abundant ultrasound images. To this end, we introduce a latent dynamic diffusion model (LDDM) to efficiently translate static images to dynamic sequences with realistic video characteristics. We demonstrate strong quantitative results and visually appealing synthesized videos on the BUSV benchmark. Notably, training video classification models on combinations of real and LDDM-synthesized videos substantially improves performance over using real data alone, indicating our method successfully emulates dynamics critical for discrimination. Our image-to-video approach provides an effective data augmentation solution to advance ultrasound video analysis. Code is available at https://github.com/MedAITech/U_I2V.

Submitted 19 March, 2025; originally announced March 2025.

Comments: MICCAI 2024

arXiv:2503.13987 [pdf, other]

Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation

Authors: Yaxiong Chen, Yujie Wang, Zixuan Zheng, Jingliang Hu, Yilei Shi, Shengwu Xiong, Xiao Xiang Zhu, Lichao Mou

Abstract: Medical ultrasound imaging is ubiquitous, but manual analysis struggles to keep pace. Automated segmentation can help but requires large labeled datasets, which are scarce. Semi-supervised learning leveraging both unlabeled and limited labeled data is a promising approach. State-of-the-art methods use consistency regularization or pseudo-labeling but grow increasingly complex. Without sufficient labels, these models often latch onto artifacts or allow anatomically implausible segmentations. In this paper, we present a simple yet effective pseudo-labeling method with an adversarially learned shape prior to regularize segmentations. Specifically, we devise an encoder-twin-decoder network where the shape prior acts as an implicit shape model, penalizing anatomically implausible but not ground-truth-deviating predictions. Without bells and whistles, our simple approach achieves state-of-the-art performance on two benchmarks under different partition protocols. We provide a strong baseline for future semi-supervised medical image segmentation. Code is available at https://github.com/WUTCM-Lab/Shape-Prior-Semi-Seg.

Submitted 18 March, 2025; originally announced March 2025.

Comments: MICCAI 2024

arXiv:2503.08220 [pdf, other]

Bedrock Models in Communication and Sensing: Advancing Generalization, Transferability, and Performance

Authors: Cheng Luo, Luping Xiang, Jie Hu, Kun Yang

Abstract: Deep learning (DL) has emerged as a powerful tool for addressing the intricate challenges inherent in communication and sensing systems, significantly enhancing the intelligence of future sixth-generation (6G) networks. A substantial body of research has highlighted the promise of DL-based techniques in these domains. However, in addition to improving accuracy, new challenges must be addressed regarding the generalization and transferability of DL-based systems. To tackle these issues, this paper introduces a series of mathematically grounded and modularized models, referred to as bedrock models, specifically designed for integration into both communication and sensing systems. Due to their modular architecture, these models can be seamlessly incorporated into existing communication and sensing frameworks. For communication systems, the proposed models demonstrate substantial performance improvements while also exhibiting strong transferability, enabling direct parameter sharing across different tasks, which greatly facilitates practical deployment. In sensing applications, the integration of the bedrock models into existing systems results in superior performance, reducing delay and Doppler estimation errors by an order of magnitude compared to traditional methods. Additionally, a pre-equalization strategy based on the bedrock models is proposed for the transmitter. By leveraging sensing information, the transmitted communication signal is dynamically adjusted without altering the communication model pre-trained in AWGN channels. This adaptation enables the system to effectively cope with doubly dispersive channels, restoring the received signal to an AWGN-like condition and achieving near-optimal performance. Simulation results substantiate the effectiveness and transferability of the proposed bedrock models, underscoring their potential to advance both communication and sensing systems.

Submitted 11 March, 2025; originally announced March 2025.

arXiv:2503.08202 [pdf, other]

Low-Complexity Beamforming Design for Null Space-based Simultaneous Wireless Information and Power Transfer Systems

Authors: Cheng Luo, Jie Hu, Luping Xiang, Kun Yang

Abstract: Simultaneous wireless information and power transfer (SWIPT) is a promising technology for the upcoming sixth-generation (6G) communication networks, enabling internet of things (IoT) devices and sensors to extend their operational lifetimes. In this paper, we propose a SWIPT scheme by projecting the interference signals from both intra-wireless information transfer (WIT) and inter-wireless energy transfer (WET) into the null space, simplifying the system into a point-to-point WIT and WET problem. Upon further analysis, we confirm that dedicated energy beamforming is unnecessary. In addition, we develop a low-complexity algorithm to solve the problem efficiently, further reducing computational overhead. Numerical results validate our analysis, showing that the computational complexity is reduced by 97.5% and 99.96% for the cases of $K^I = K^E = 2$, $M = 4$ and $K^I = K^E = 16$, $M = 64$, respectively.

Submitted 11 March, 2025; originally announced March 2025.

arXiv:2503.08198 [pdf, other]

Reconfigurable Intelligent Sensing Surface enables Wireless Powered Communication Networks: Interference Suppression and Massive Wireless Energy Transfer

Authors: Cheng Luo, Jie Hu, Luping Xiang, Kun Yang

Abstract: Recently, a novel structure of reconfigurable intelligent surface (RIS) integrating both passive and active elements, termed reconfigurable intelligent sensing surface (RISS), has efficiently addressed challenges in RIS channel estimation and mitigated issues related to multiplicative path loss by processing the signal at the RISS. In this paper, we propose a sensing-assisted wirelessly powered communication network (WPCN) that utilizes the RISS's sensing capabilities to maximize the channel capacity in uplink wireless information transfer (WIT) and assist in massive wireless energy transmission (WET) for the downlink. For the WIT in the uplink, the sensing information is utilized to design an interference suppression passive reflection phase shift for the RISS, and, taking the imperfect sensing results and sharp nulls into consideration, we also propose a robust scheme. For the WET in the downlink, the massive WET scheme is adopted and benefits from a period of sensing results. The massive WET scheme includes beam selection and rotation order optimization to enhance the lower bound of energy harvest for massive users and optimize waiting costs. Numerical results demonstrate the optimal interference suppression threshold for uplink WIT and underscore the achieved fairness in downlink WET. Collectively, by utilizing sensing information, the uplink channel capacity is improved by 20%, and the worst energy performance and waiting costs for massive WET are effectively optimized, with improvements ranging from 19% to 59% and 27% to 29%, respectively.

Submitted 11 March, 2025; originally announced March 2025.

arXiv:2503.04147 [pdf, other]

Energy-Efficient Port Selection and Beamforming Design for Integrated Data and Energy Transfer Assisted by Fluid Antennas

Authors: Long Zhang, Yizhe Zhao, Halvin Yang, Guangming Liang, Jie Hu

Abstract: Integrated data and energy transfer (IDET) is considered a key enabler of 6G, as it can provide both wireless energy transfer (WET) and wireless data transfer (WDT) services to low-power devices. Thanks to the extra degree of freedom provided by fluid antennas (FAs), incorporating FAs into IDET systems presents a promising approach to enhance energy efficiency performance. This paper investigates an FA-assisted IDET system, where the transmitter is equipped with multiple FAs and transmits wireless signals to the data receiver (DR) and the energy receiver (ER), which are both equipped with a single traditional antenna. The switching delay and energy consumption induced by port selection are taken into account in an IDET system for the first time. We aim to obtain the optimal beamforming vector and the port selection strategy at the transmitter, in order to maximize the short-term and long-term WET efficiency, respectively. The instant sub-optimal solution is obtained by alternately optimizing the beamforming vector and port selection in each transmission frame, while a novel constrained soft actor critic (C-SAC) algorithm is proposed to find the feasible policy of port selection from the long-term perspective. Simulation results demonstrate that our scheme is able to achieve greater gain in terms of both the short-term and long-term WET efficiency compared to other benchmarks, while not degrading WDT performance.

Submitted 6 March, 2025; originally announced March 2025.

Comments: Submitted to an IEEE journal

arXiv:2503.02410 [pdf, ps, other]

Neuroverse3D: Developing In-Context Learning Universal Model for Neuroimaging in 3D

Authors: Jiesi Hu, Chenfei Ye, Yanwu Yang, Xutao Guo, Yang Shang, Pengcheng Shi, Hanyang Peng, Ting Ma

Abstract: In-context learning (ICL), a type of universal model, demonstrates exceptional generalization across a wide range of tasks without retraining by leveraging task-specific guidance from context, making it particularly effective for the intricate demands of neuroimaging. However, current ICL models, limited to 2D inputs and thus exhibiting suboptimal performance, struggle to extend to 3D inputs due to the high memory demands of ICL. In this regard, we introduce Neuroverse3D, an ICL model capable of performing multiple neuroimaging tasks in 3D (e.g., segmentation, denoising, inpainting). Neuroverse3D overcomes the large memory consumption associated with 3D inputs through adaptive parallel-sequential context processing and a U-shaped fusion strategy, allowing it to handle an unlimited number of context images. Additionally, we propose an optimized loss function to balance multi-task training and enhance focus on anatomical boundaries. Our study incorporates 43,674 3D multi-modal scans from 19 neuroimaging datasets and evaluates Neuroverse3D on 14 diverse tasks using held-out test sets. The results demonstrate that Neuroverse3D significantly outperforms existing ICL models and closely matches task-specific models, enabling flexible adaptation to medical center variations without retraining. The code and model weights are publicly available at https://github.com/jiesihu/Neuroverse3D.

Submitted 4 July, 2025; v1 submitted 4 March, 2025; originally announced March 2025.

arXiv:2502.20986 [pdf, other]

Target Tracking using Robust Sensor Motion Control

Authors: Jingwei Hu, Dave Zachariah, Petre Stoica

Abstract: We consider the problem of tracking moving targets using mobile wireless sensors (of possibly different types). This is a joint estimation and control problem in which a tracking system must take into account both target and sensor dynamics. We make minimal assumptions about the target dynamics, namely only that their accelerations are bounded. We develop a control law that determines the sensor motion control signals so as to maximize target resolvability as the target dynamics evolve. The method is given a tractable formulation that is amenable to an efficient search method and is evaluated in a series of experiments involving both round-trip time based ranging and Doppler frequency shift measurements.

Submitted 28 February, 2025; originally announced February 2025.

arXiv:2502.20941 [pdf, other]

Adaptive Input Design for Nonlinear System Identification with Operational Constraints

Authors: Jingwei Hu, Dave Zachariah, Torbjörn Wigren, Petre Stoica

Abstract: We consider the problem of joint input design and parameter estimation for identifying nonlinear system models through the sequential acquisition of measurements while adhering to system constraints. We utilize a receding horizon approach and propose a new scale-invariant input design criterion, which is tailored to continuously updated parameter estimates, along with a new sequential parameter estimator. We demonstrate the ability of the method to design informative experiments online, while steering the system within operational constraints.

Submitted 23 May, 2025; v1 submitted 28 February, 2025; originally announced February 2025.

arXiv:2502.18200 [pdf, ps, other]

Zero-Shot Semantic Communication with Multimodal Foundation Models

Authors: Jiangjing Hu, Haotian Wu, Wenjing Zhang, Fengyu Wang, Wenjun Xu, Hui Gao, Deniz Gündüz

Abstract: Most existing semantic communication (SemCom) systems use deep joint source-channel coding (DeepJSCC) to encode task-specific semantics in a goal-oriented manner. However, their reliance on predefined tasks and datasets significantly limits their flexibility and generalizability in practical deployments. Multi-modal foundation models provide a promising solution by generating universal semantic tokens. Inspired by this, we introduce SemCLIP, a zero-shot SemCom framework leveraging the contrastive language-image pre-training (CLIP) model. By transmitting CLIP-generated image tokens instead of raw images, SemCLIP enables efficient SemCom under low bandwidth and challenging channel conditions, facilitating diverse downstream tasks and zero-shot applications. Specifically, we propose a DeepJSCC scheme for efficient CLIP token encoding. To mitigate potential degradation caused by compression and channel noise, a multi-modal transmission-aware prompt learning mechanism is designed at the receiver, which adapts prompts based on transmission quality, enhancing system robustness and channel adaptability. Simulation results demonstrate that SemCLIP outperforms the baselines, achieving a 41% improvement in zero-shot performance at low signal-to-noise ratios. Meanwhile, SemCLIP reduces bandwidth usage by more than 50-fold compared to alternative image transmission methods, demonstrating the potential of foundation models towards a generalized, task-agnostic SemCom solution.

Submitted 29 May, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

arXiv:2502.18008 [pdf, other]

NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms

Authors: Yashan Wang, Shangda Wu, Jianhuai Hu, Xingjian Du, Yueqi Peng, Yongxin Huang, Shuai Fan, Xiaobing Li, Feng Yu, Maosong Sun

Abstract: We introduce NotaGen, a symbolic music generation model aiming to explore the potential of producing high-quality classical sheet music. Inspired by the success of Large Language Models (LLMs), NotaGen adopts pre-training, fine-tuning, and reinforcement learning paradigms (henceforth referred to as the LLM training paradigms). It is pre-trained on 1.6M pieces of music in ABC notation, and then fine-tuned on approximately 9K high-quality classical compositions conditioned on "period-composer-instrumentation" prompts. For reinforcement learning, we propose the CLaMP-DPO method, which further enhances generation quality and controllability without requiring human annotations or predefined rewards. Our experiments demonstrate the efficacy of CLaMP-DPO in symbolic music generation models with different architectures and encoding schemes. Furthermore, subjective A/B tests show that NotaGen outperforms baseline models against human compositions, greatly advancing musical aesthetics in symbolic music generation.

Submitted 21 March, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

arXiv:2502.12736 [pdf, other]

Cross-Domain Continual Learning for Edge Intelligence in Wireless ISAC Networks

Authors: Jingzhi Hu, Xin Li, Zhou Su, Jun Luo

Abstract: In wireless networks with integrated sensing and communications (ISAC), edge intelligence (EI) is expected to be developed at edge devices (ED) for sensing user activities based on channel state information (CSI). However, due to the CSI being highly specific to users' characteristics, the CSI-activity relationship is notoriously domain dependent, essentially demanding EI to learn sufficient datasets from various domains in order to gain cross-domain sensing capability. This poses a crucial challenge owing to the EDs' limited resources, for which storing datasets across all domains will be a significant burden. In this paper, we propose the EdgeCL framework, enabling the EI to continually learn-then-discard each incoming dataset, while remaining resilient to catastrophic forgetting. We design a transformer-based discriminator for handling sequences of noisy and nonequispaced CSI samples. Besides, we propose a distilled core-set based knowledge retention method with robustness-enhanced optimization to train the discriminator, preserving its performance for previous domains while preventing future forgetting. Experimental evaluations show that EdgeCL achieves 89% of performance compared to cumulative training while consuming only 3% of its memory, mitigating forgetting by 79%.

Submitted 14 April, 2025; v1 submitted 18 February, 2025; originally announced February 2025.

arXiv:2502.10932 [pdf, other]

PPAC Driven Multi-die and Multi-technology Floorplanning

Authors: Cristhian Roman-Vicharra, Yiran Chen, Jiang Hu

Abstract: In heterogeneous integration, where different dies may utilize distinct technologies, floorplanning across multiple dies inherently requires simultaneous technology selection. This work presents the first systematic study of multi-die and multi-technology floorplanning. Unlike many conventional approaches, which are primarily driven by area and wirelength, this study additionally considers performance, power, and cost, highlighting the impact of technology selection. A simulated annealing method and a reinforcement learning technique are developed. Experimental results show that the proposed techniques significantly outperform a naive baseline approach.

Submitted 15 February, 2025; originally announced February 2025.

arXiv:2502.06100 [pdf, other]

Col-OLHTR: A Novel Framework for Multimodal Online Handwritten Text Recognition

Authors: Chenyu Liu, Jinshui Hu, Baocai Yin, Jia Pan, Bing Yin, Jun Du, Qingfeng Liu

Abstract: Online Handwritten Text Recognition (OLHTR) has gained considerable attention for its diverse range of applications. Current approaches usually treat OLHTR as a sequence recognition task, employing either a single trajectory or image encoder, or multi-stream encoders, combined with a CTC or attention-based recognition decoder. However, these approaches face several drawbacks: 1) single encoders typically focus on either local trajectories or visual regions, lacking the ability to dynamically capture relevant global features in challenging cases; 2) multi-stream encoders, while more comprehensive, suffer from complex structures and increased inference costs. To tackle this, we propose a Collaborative learning-based OLHTR framework, called Col-OLHTR, that learns multimodal features during training while maintaining a single-stream inference process. Col-OLHTR consists of a trajectory encoder, a Point-to-Spatial Alignment (P2SA) module, and an attention-based decoder. The P2SA module is designed to learn image-level spatial features through trajectory-encoded features and 2D rotary position embeddings. During training, an additional image-stream encoder-decoder is collaboratively trained to provide supervision for P2SA features. At inference, the extra streams are discarded, and only the P2SA module is used and merged before the decoder, simplifying the process while preserving high performance. Extensive experimental results on several OLHTR benchmarks demonstrate the state-of-the-art (SOTA) performance, proving the effectiveness and robustness of our design.

Submitted 9 February, 2025; originally announced February 2025.

Comments: ICASSP 2025

arXiv:2502.04711 [pdf, other]

Dynamic Frequency-Adaptive Knowledge Distillation for Speech Enhancement

Authors: Xihao Yuan, Siqi Liu, Hanting Chen, Lu Zhou, Jian Li, Jie Hu

Abstract: Deep learning-based speech enhancement (SE) models have recently outperformed traditional techniques, yet their deployment on resource-constrained devices remains challenging due to high computational and memory demands. This paper introduces a novel dynamic frequency-adaptive knowledge distillation (DFKD) approach to effectively compress SE models. Our method dynamically assesses the model's output, distinguishing between high and low-frequency components, and adapts the learning objectives to meet the unique requirements of different frequency bands, capitalizing on the SE task's inherent characteristics. To evaluate the DFKD's efficacy, we conducted experiments on three state-of-the-art models: DCCRN, ConTasNet, and DPTNet. The results demonstrate that our method not only significantly enhances the performance of the compressed model (student model) but also surpasses other logit-based knowledge distillation methods specifically for SE tasks.

Submitted 7 February, 2025; originally announced February 2025.

Comments: 5 pages, 2 figures, accepted by ICASSP 2025

arXiv:2501.18201 [pdf, other]

Neural Operator based Reinforcement Learning for Control of first-order PDEs with Spatially-Varying State Delay

Authors: Jiaqi Hu, Jie Qi, Jing Zhang

Abstract: Control of distributed parameter systems affected by delays is a challenging task, particularly when the delays depend on spatial variables. The idea of integrating analytical control theory with learning-based control within a unified control scheme is becoming increasingly promising and advantageous. In this paper, we address the problem of controlling an unstable first-order hyperbolic PDE with spatially-varying delays by combining PDE backstepping control strategies and deep reinforcement learning (RL). To eliminate the assumption on the delay function required for the backstepping design, we propose a soft actor-critic (SAC) architecture incorporating a DeepONet to approximate the backstepping controller. The DeepONet extracts features from the backstepping controller and feeds them into the policy network. In simulations, our algorithm outperforms the baseline SAC without prior backstepping knowledge and the analytical controller.

Submitted 30 January, 2025; originally announced January 2025.

Comments: 6 Pages, 7 Figures

arXiv:2501.16951 [pdf, other]

Federated Learning Strategies for Coordinated Beamforming in Multicell ISAC

Authors: Lai Jiang, Kaitao Meng, Murat Temiz, Jiaming Hu, Christos Masouros

Abstract: We propose two cooperative beamforming frameworks based on federated learning (FL) for multi-cell integrated sensing and communications (ISAC) systems. Our objective is to address the following dilemma in multicell ISAC: 1) Beamforming strategies that rely solely on local channel information risk generating significant inter-cell interference (ICI), which degrades network performance for both communication users and sensing receivers in neighboring cells; 2) conversely centralized beamforming strategies can mitigate ICI by leveraging global channel information, but they come with substantial transmission overhead and latency that can be prohibitive for latency-sensitive and source-constrained applications. To tackle these challenges, we first propose a partially decentralized training framework motivated by the vertical federated learning (VFL) paradigm. In this framework, the participating base stations (BSs) collaboratively design beamforming matrices under the guidance of a central server. The central server aggregates local information from the BSs and provides feedback, allowing BSs to implicitly manage ICI without accessing the global channel information. To make the solution scalable for densely deployed wireless networks, we take further steps to reduce communication overhead by presenting a fully decentralized design based on the horizontal federated learning (HFL). Specifically, we develop a novel loss function to control the interference leakage power, enabling a more efficient training process by entirely eliminating local channel information exchange. Numerical results show that the proposed solutions can achieve significant performance improvements comparable to the benchmarks in terms of both communication and radar information rates.

Submitted 28 January, 2025; originally announced January 2025.

arXiv:2501.13405 [pdf, other]

Performance Analysis of Fluid Antenna Multiple Access Assisted Wireless Powered Communication Network

Authors: Xiao Lin, Yizhe Zhao, Halvin Yang, Jie Hu

Abstract: This paper investigates a novel fluid antenna multiple access (FAMA)-assisted wireless powered communication network (WPCN), in which a hybrid access point (HAP) equipped with multiple fixed position antennas (FPAs) provides integrated data and energy transfer (IDET) services towards low-power devices that are equipped with a single fluid antenna (FA), while the low-power devices use harvested energy to power their own uplink transmission. Using the block correlation channel model, both the downlink and uplink wireless data transfer (WDT) outage probabilities are analyzed under specific port selection strategies, including downlink signal-to-interference ratio-based port selection (DSPS) strategy, downlink energy harvesting power-based port selection (DEPS) strategy, uplink signal-to-noise ratio-based port selection (USPS) strategy, and uplink channel-based port selection (UCPS) strategy. A step function approximation (SFA) approach is also relied upon to derive closed-form expressions for the outage probabilities, while the lower bounds for uplink WDT outage probabilities are also formulated. Numerical results demonstrate the validity of our theoretical analysis, which also provide useful guidelines for the system design through the analytical framework.

Submitted 10 February, 2025; v1 submitted 23 January, 2025; originally announced January 2025.

Comments: Submitted to an IEEE journal

arXiv:2501.07329 [pdf, other]

Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding

Authors: Jiliang Hu, Zuchao Li, Mengjia Shen, Haojun Ai, Sheng Li, Jun Zhang

Abstract: Spoken language understanding (SLU) is a structure prediction task in the field of speech. Recently, many works on SLU that treat it as a sequence-to-sequence task have achieved great success. However, this method is not suitable for simultaneous speech recognition and understanding. In this paper, we propose a joint speech recognition and structure learning framework (JSRSL), an end-to-end SLU model based on span, which can accurately transcribe speech and extract structured content simultaneously. We conduct experiments on named entity recognition and intent classification using the Chinese dataset AISHELL-NER and the English dataset SLURP. The results show that our proposed method not only outperforms the traditional sequence-to-sequence method in both transcription and extraction capabilities but also achieves state-of-the-art performance on the two datasets.

Submitted 17 January, 2025; v1 submitted 13 January, 2025; originally announced January 2025.

Comments: 5 pages, 2 figures, accepted by ICASSP 2025

arXiv:2501.04644 [pdf, other]

FleSpeech: Flexibly Controllable Speech Generation with Various Prompts

Authors: Hanzhao Li, Yuke Li, Xinsheng Wang, Jingbin Hu, Qicong Xie, Shan Yang, Lei Xie

Abstract: Controllable speech generation methods typically rely on single or fixed prompts, hindering creativity and flexibility. These limitations make it difficult to meet specific user needs in certain scenarios, such as adjusting the style while preserving a selected speaker's timbre, or choosing a style and generating a voice that matches a character's visual appearance. To overcome these challenges, we propose FleSpeech, a novel multi-stage speech generation framework that allows for more flexible manipulation of speech attributes by integrating various forms of control. FleSpeech employs a multimodal prompt encoder that processes and unifies different text, audio, and visual prompts into a cohesive representation. This approach enhances the adaptability of speech synthesis and supports creative and precise control over the generated speech. Additionally, we develop a data collection pipeline for multimodal datasets to facilitate further research and applications in this field. Comprehensive subjective and objective experiments demonstrate the effectiveness of FleSpeech. Audio samples are available at https://kkksuper.github.io/FleSpeech/

Submitted 30 April, 2025; v1 submitted 8 January, 2025; originally announced January 2025.

Comments: 14 pages, 3 figures

arXiv:2412.12197 [pdf]

Anti-bullying Adaptive Cruise Control: A proactive right-of-way protection approach

Authors: Jia Hu, Zhexi Lian, Haoran Wang, Zihan Zhang, Ruoxi Qian, Duo Li, Jaehyun So, Junnian Zheng

Abstract: The current Adaptive Cruise Control (ACC) systems are vulnerable to "road bully" such as cut-ins. This paper proposed an Anti-bullying Adaptive Cruise Control (AACC) approach with proactive right-of-way protection ability. It bears the following features: i) with the enhanced capability of preventing bullying from cut-ins; ii) optimal but not unsafe; iii) adaptive to various driving styles of cut-in vehicles; iv) with real-time field implementation capability. The proposed approach can identify other road users' driving styles online and conduct game-based motion planning for right-of-way protection. A detailed investigation of the simulation results shows that the proposed approach can prevent bullying from cut-ins and be adaptive to different cut-in vehicles' driving styles. The proposed approach is capable of enhancing travel efficiency by up to 29.55% under different cut-in gaps and can strengthen driving safety compared with the current ACC controller. The proposed approach is flexible and robust against traffic congestion levels. It can improve mobility by up to 11.93% and robustness by 8.74% in traffic flow. Furthermore, the proposed approach can support real-time field implementation by ensuring less than 50 milliseconds computation time.

Submitted 14 December, 2024; originally announced December 2024.

Comments: 12 pages, 15 figures

arXiv:2412.12126 [pdf]

Seamless Optical Cloud Computing across Edge-Metro Network for Generative AI

Authors: Sizhe Xing, Aolong Sun, Chengxi Wang, Yizhi Wang, Boyu Dong, Junhui Hu, Xuyu Deng, An Yan, Yingjun Liu, Fangchen Hu, Zhongya Li, Ouhan Huang, Junhao Zhao, Yingjun Zhou, Ziwei Li, Jianyang Shi, Xi Xiao, Richard Penty, Qixiang Cheng, Nan Chi, Junwen Zhang

Abstract: The rapid advancement of generative artificial intelligence (AI) in recent years has profoundly reshaped modern lifestyles, necessitating a revolutionary architecture to support the growing demands for computational power. Cloud computing has become the driving force behind this transformation. However, it consumes significant power and faces computation security risks due to the reliance on extensive data centers and servers in the cloud. Reducing power consumption while enhancing computational scale remains a persistent challenge in cloud computing. Here, we propose and experimentally demonstrate an optical cloud computing system that can be seamlessly deployed across edge-metro network. By modulating inputs and models into light, a wide range of edge nodes can directly access the optical computing center via the edge-metro network. The experimental validations show an energy efficiency of 118.6 mW/TOPs (tera operations per second), reducing energy consumption by two orders of magnitude compared to traditional electronic-based cloud computing solutions. Furthermore, it is experimentally validated that this architecture can perform various complex generative AI models through parallel computing to achieve image generation tasks.

Submitted 1 May, 2025; v1 submitted 4 December, 2024; originally announced December 2024.

arXiv:2412.10822 [pdf]

Automated Driving with Evolution Capability: A Reinforcement Learning Method with Monotonic Performance Enhancement

Authors: Jia Hu, Xuerun Yan, Tian Xu, Haoran Wang

Abstract: Reinforcement Learning (RL) offers a promising solution to enable evolutionary automated driving. However, the conventional RL method is always concerned with risk performance. The updated policy may not obtain a performance enhancement, even leading to performance deterioration. To address this challenge, this research proposes a High Confidence Policy Improvement Reinforcement Learning-based (HCPI-RL) planner. It is intended to achieve the monotonic evolution of automated driving. A novel RL policy update paradigm is designed so that the performance of the newly learned policy consistently surpasses that of previous policies, which is deemed monotonic performance enhancement. Hence, the proposed HCPI-RL planner has the following features: i) Evolutionary automated driving with monotonic performance enhancement; ii) With the capability of handling scenarios with emergency; iii) With enhanced decision-making optimality. Results demonstrate that the proposed HCPI-RL planner enhances the policy return by 44.7% in emergent cut-in scenarios, 108.2% in emergent braking scenarios, and 64.4% in daily cruising scenarios, compared to the PPO planner. Adopting the proposed planner, automated driving efficiency is enhanced by 19.2% compared to the PPO planner, and by 30.7% compared to the rule-based planner.

Submitted 14 December, 2024; originally announced December 2024.

Comments: 24 pages, 16 figures

arXiv:2412.08219 [pdf, other]

Neural Operator Feedback for a First-Order PIDE with Spatially-Varying State Delay

Authors: Jie Qi, Jiaqi Hu, Jing Zhang, Miroslav Krstic

Abstract: A transport PDE with a spatial integral and recirculation with constant delay has been a benchmark for neural operator approximations of PDE backstepping controllers. Introducing a spatially-varying delay into the model gives rise to a gain operator defined through integral equations which the operator's input -- the varying delay function -- enters in previously unencountered manners, including in the limits of integration and as the inverse of the 'delayed time' function. This, in turn, introduces novel mathematical challenges in estimating the operator's Lipschitz constant. The backstepping kernel function having two branches endows the feedback law with a two-branch structure, where only one of the two feedback branches depends on both of the kernel branches. For this rich feedback structure, we propose a neural operator approximation of such a two-branch feedback law and prove the approximator to be semiglobally practically stabilizing. With numerical results we illustrate the training of the neural operator and its stabilizing capability.

Submitted 14 December, 2024; v1 submitted 11 December, 2024; originally announced December 2024.

Comments: This 14 page paper contains 1 table and 20 figures

arXiv:2412.06507 [pdf, other]

BATseg: Boundary-aware Multiclass Spinal Cord Tumor Segmentation on 3D MRI Scans

Authors: Hongkang Song, Zihui Zhang, Yanpeng Zhou, Jie Hu, Zishuo Wang, Hou Him Chan, Chon Lok Lei, Chen Xu, Yu Xin, Bo Yang

Abstract: Spinal cord tumors significantly contribute to neurological morbidity and mortality. Precise morphometric quantification, encompassing the size, location, and type of such tumors, holds promise for optimizing treatment planning strategies. Although recent methods have demonstrated excellent performance in medical image segmentation, they primarily focus on discerning shapes with relatively large morphology such as brain tumors, ignoring the challenging problem of identifying spinal cord tumors which tend to have tiny sizes, diverse locations, and shapes. To tackle this hard problem of multiclass spinal cord tumor segmentation, we propose a new method, called BATseg, to learn a tumor surface distance field by applying our new multiclass boundary-aware loss function. To verify the effectiveness of our approach, we also introduce the first and large-scale spinal cord tumor dataset. It comprises gadolinium-enhanced T1-weighted 3D MRI scans from 653 patients and contains the four most common spinal cord tumor types: astrocytomas, ependymomas, hemangioblastomas, and spinal meningiomas. Extensive experiments on our dataset and another public kidney tumor segmentation dataset show that our proposed method achieves superior performance for multiclass tumor segmentation.

Submitted 9 December, 2024; originally announced December 2024.

Comments: ECCV 2024 Workshop on BioImage Computing. Code and data are available at: https://github.com/vLAR-group/BATseg

arXiv:2411.19385 [pdf, other]

Zero-Forget Preservation of Semantic Communication Alignment in Distributed AI Networks

Authors: Jingzhi Hu, Geoffrey Ye Li

Abstract: Future communication networks are expected to connect massive distributed artificial intelligence (AI). Exploiting aligned priori knowledge of AI pairs, it is promising to convert high-dimensional data transmission into highly-compressed semantic communications (SC). However, to accommodate the local data distribution and user preferences, AIs generally adapt to different domains, which fundamentally distorts the SC alignment. In this paper, we propose a zero-forget domain adaptation (ZFDA) framework to preserve SC alignment. To prevent the DA from changing substantial neural parameters of AI, we design sparse additive modifications (SAM) to the parameters, which can be efficiently stored and switched-off to restore the SC alignment. To optimize the SAM, we decouple it into tractable continuous variables and a binary mask, and then handle the binary mask by a score-based optimization. Experimental evaluations on a SC system for image transmissions validate that the proposed framework perfectly preserves the SC alignment with almost no loss of DA performance, even improved in some cases, at a cost of less than 1% of additional memory.

Submitted 28 November, 2024; originally announced November 2024.

arXiv:2411.17705 [pdf, other]

EEG-DCNet: A Fast and Accurate MI-EEG Dilated CNN Classification Method

Authors: Wei Peng, Kang Liu, Jiaxi Shi, Jianchen Hu

Abstract: The electroencephalography (EEG)-based motor imagery (MI) classification is a critical and challenging task in brain-computer interface (BCI) technology, which plays a significant role in assisting patients with functional impairments to regain mobility. We present a novel multi-scale atrous convolutional neural network (CNN) model called EEG-dilated convolution network (DCNet) to enhance the accuracy and efficiency of the EEG-based MI classification tasks. We incorporate the 1×1 convolutional layer and utilize the multi-branch parallel atrous convolutional architecture in EEG-DCNet to capture the highly nonlinear characteristics and multi-scale features of the EEG signals. Moreover, we utilize the sliding window to enhance the temporal consistency and utilize the attention mechanism to improve the accuracy of recognizing user intentions. The experimental results (via the BCI-IV-2a, BCI-IV-2b and the High-Gamma datasets) show that EEG-DCNet outperforms existing state-of-the-art (SOTA) approaches in terms of classification accuracy and Kappa scores. Furthermore, since EEG-DCNet requires fewer parameters, the training efficiency and memory consumption are also improved. The experiment code is open-sourced at https://github.com/Kanyooo/EEG-DCNet.

Submitted 12 November, 2024; originally announced November 2024.

arXiv:2411.15211 [pdf, other]

LightLLM: A Versatile Large Language Model for Predictive Light Sensing

Authors: Jiawei Hu, Hong Jia, Mahbub Hassan, Lina Yao, Brano Kusy, Wen Hu

Abstract: We propose LightLLM, a model that fine tunes pre-trained large language models (LLMs) for light-based sensing tasks. It integrates a sensor data encoder to extract key features, a contextual prompt to provide environmental information, and a fusion layer to combine these inputs into a unified representation. This combined input is then processed by the pre-trained LLM, which remains frozen while being fine-tuned through the addition of lightweight, trainable components, allowing the model to adapt to new tasks without altering its original parameters. This approach enables flexible adaptation of LLM to specialized light sensing tasks with minimal computational overhead and retraining effort. We have implemented LightLLM for three light sensing tasks: light-based localization, outdoor solar forecasting, and indoor solar estimation. Using real-world experimental datasets, we demonstrate that LightLLM significantly outperforms state-of-the-art methods, achieving 4.4x improvement in localization accuracy and 3.4x improvement in indoor solar estimation when tested in previously unseen environments. We further demonstrate that LightLLM outperforms ChatGPT-4 with direct prompting, highlighting the advantages of LightLLM's specialized architecture for sensor data fusion with textual prompts.

Submitted 20 November, 2024; originally announced November 2024.

Comments: 15 pages, 14 figures, 5 tables

arXiv:2411.14353

Enhancing Medical Image Segmentation with Deep Learning and Diffusion Models

Authors: Houze Liu, Tong Zhou, Yanlin Xiang, Aoran Shen, Jiacheng Hu, Junliang Du

Abstract: Medical image segmentation is crucial for accurate clinical diagnoses, yet it faces challenges such as low contrast between lesions and normal tissues, unclear boundaries, and high variability across patients. Deep learning has improved segmentation accuracy and efficiency, but it still relies heavily on expert annotations and struggles with the complexities of medical images. The small size of medical image datasets and the high cost of data acquisition further limit the performance of segmentation networks. Diffusion models, with their iterative denoising process, offer a promising alternative for better detail capture in segmentation. However, they face difficulties in accurately segmenting small targets and maintaining the precision of boundary details. This article discusses the importance of medical image segmentation, the limitations of current deep learning approaches, and the potential of diffusion models to address these challenges.

Submitted 5 December, 2024; v1 submitted 21 November, 2024; originally announced November 2024.

Comments: After a peer review process for a journal submission, we have been told the main conclusions presented in this paper have been proven previously by others. I believe the paper should be withdrawn

arXiv:2411.08178 [pdf, other]

On Adapting Randomized Nyström Preconditioners to Accelerate Variational Image Reconstruction

Authors: Tao Hong, Zhaoyi Xu, Jason Hu, Jeffrey A. Fessler

Abstract: Model-based iterative reconstruction plays a key role in solving inverse problems. However, the associated minimization problems are generally large-scale, ill-posed, nonsmooth, and sometimes even nonconvex, which present challenges in designing efficient iterative solvers and often prevent their practical use. Preconditioning methods can significantly accelerate the convergence of iterative methods. In some applications, computing preconditioners on-the-fly is beneficial. Moreover, forward models in image reconstruction are typically represented as operators, and the corresponding explicit matrices are often unavailable, which brings additional challenges in designing preconditioners. Therefore, for practical use, computing and applying preconditioners should be computationally inexpensive. This paper adapts the randomized Nyström approximation to compute effective preconditioners that accelerate image reconstruction without requiring an explicit matrix for the forward model. We leverage modern GPU computational platforms to compute the preconditioner on-the-fly. Moreover, we propose efficient approaches for applying the preconditioner to problems with nonsmooth regularizers. Our numerical results on image deblurring, super-resolution with impulsive noise, and computed tomography reconstruction demonstrate the efficiency and effectiveness of the proposed preconditioner.

Submitted 12 November, 2024; originally announced November 2024.

Comments: 13 pages, 11 figures, 4 tables

arXiv:2411.08014 [pdf]

Artistic Neural Style Transfer Algorithms with Activation Smoothing

Authors: Xiangtian Li, Han Cao, Zhaoyang Zhang, Jiacheng Hu, Yuhui Jin, Zihao Zhao

Abstract: The works of Gatys et al. demonstrated the capability of Convolutional Neural Networks (CNNs) in creating artistic style images. This process of transferring content images in different styles is called Neural Style Transfer (NST). In this paper, we re-implement image-based NST, fast NST, and arbitrary NST. We also explore utilizing ResNet with activation smoothing in NST. Extensive experimental results demonstrate that smoothing transformation can greatly improve the quality of stylization results.

Submitted 12 November, 2024; originally announced November 2024.

Comments: 8 pages, 7 figures

Showing 1–50 of 226 results for author: Hu, J