-
Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical Systems
Authors:
Jinwei Hu,
Zezhi Tang,
Xin Jin,
Benyuan Zhang,
Yi Dong,
Xiaowei Huang
Abstract:
This paper presents HERO (Hierarchical Testing with Rabbit Optimization), a novel black-box adversarial testing framework for evaluating the robustness of deep learning-based Prognostics and Health Management systems in Industrial Cyber-Physical Systems. Leveraging Artificial Rabbit Optimization, HERO generates physically constrained adversarial examples that align with real-world data distributio…
▽ More
This paper presents HERO (Hierarchical Testing with Rabbit Optimization), a novel black-box adversarial testing framework for evaluating the robustness of deep learning-based Prognostics and Health Management systems in Industrial Cyber-Physical Systems. Leveraging Artificial Rabbit Optimization, HERO generates physically constrained adversarial examples that align with real-world data distributions via global and local perspective. Its generalizability ensures applicability across diverse ICPS scenarios. This study specifically focuses on the Proton Exchange Membrane Fuel Cell system, chosen for its highly dynamic operational conditions, complex degradation mechanisms, and increasing integration into ICPS as a sustainable and efficient energy solution. Experimental results highlight HERO's ability to uncover vulnerabilities in even state-of-the-art PHM models, underscoring the critical need for enhanced robustness in real-world applications. By addressing these challenges, HERO demonstrates its potential to advance more resilient PHM systems across a wide range of ICPS domains.
△ Less
Submitted 5 July, 2025;
originally announced July 2025.
-
LearnAFE: Circuit-Algorithm Co-design Framework for Learnable Audio Analog Front-End
Authors:
Jinhai Hu,
Zhongyi Zhang,
Cong Sheng Leow,
Wang Ling Goh,
Yuan Gao
Abstract:
This paper presents a circuit-algorithm co-design framework for learnable analog front-end (AFE) in audio signal classification. Designing AFE and backend classifiers separately is a common practice but non-ideal, as shown in this paper. Instead, this paper proposes a joint optimization of the backend classifier with the AFE's transfer function to achieve system-level optimum. More specifically, t…
▽ More
This paper presents a circuit-algorithm co-design framework for learnable analog front-end (AFE) in audio signal classification. Designing AFE and backend classifiers separately is a common practice but non-ideal, as shown in this paper. Instead, this paper proposes a joint optimization of the backend classifier with the AFE's transfer function to achieve system-level optimum. More specifically, the transfer function parameters of an analog bandpass filter (BPF) bank are tuned in a signal-to-noise ratio (SNR)-aware training loop for the classifier. Using a co-design loss function LBPF, this work shows superior optimization of both the filter bank and the classifier. Implemented in open-source SKY130 130nm CMOS process, the optimized design achieved 90.5%-94.2% accuracy for 10-keyword classification task across a wide range of input signal SNR from 5 dB to 20 dB, with only 22k classifier parameters. Compared to conventional approach, the proposed audio AFE achieves 8.7% and 12.9% reduction in power and capacitor area respectively.
△ Less
Submitted 1 July, 2025;
originally announced July 2025.
-
Distillation-Enabled Knowledge Alignment for Generative Semantic Communications in AIGC Provisioning Tasks
Authors:
Jingzhi Hu,
Geoffrey Ye Li
Abstract:
Due to the surging amount of AI-generated content (AIGC), its provisioning to edges and mobile users from the cloud incurs substantial traffic on networks. Generative semantic communication (GSC) offers a promising solution by transmitting highly compact information, i.e., prompt text and latent representations, instead of high-dimensional AIGC data. However, GSC relies on the alignment between th…
▽ More
Due to the surging amount of AI-generated content (AIGC), its provisioning to edges and mobile users from the cloud incurs substantial traffic on networks. Generative semantic communication (GSC) offers a promising solution by transmitting highly compact information, i.e., prompt text and latent representations, instead of high-dimensional AIGC data. However, GSC relies on the alignment between the knowledge in the cloud generative AI (GAI) and that possessed by the edges and users, and between the knowledge for wireless transmission and that of actual channels, which remains challenging. In this paper, we propose DeKA-g, a distillation-enabled knowledge alignment algorithm for GSC systems. The core idea is to distill the generation knowledge from the cloud-GAI into low-rank matrices, which can be incorporated by the edge and used to adapt the transmission knowledge to diverse wireless channel conditions. DeKA-g comprises two novel methods: metaword-aided knowledge distillation (MAKD) and variable-rate grouped SNR adaptation (VGSA). For MAKD, an optimized metaword is employed to enhance the efficiency of knowledge distillation, while VGSA enables efficient adaptation to diverse compression rates and SNR ranges. From simulation results, DeKA-g improves the alignment between the edge-generated images and the cloud-generated ones by 44%. Moreover, it adapts to compression rates with 116% higher efficiency than the baseline and enhances the performance in low-SNR conditions by 28%.
△ Less
Submitted 24 June, 2025;
originally announced June 2025.
-
Combining Self-attention and Dilation Convolutional for Semantic Segmentation of Coal Maceral Groups
Authors:
Zhenghao Xi,
Zhengnan Lv,
Yang Zheng,
Xiang Liu,
Zhuang Yu,
Junran Chen,
Jing Hu,
Yaqi Liu
Abstract:
The segmentation of coal maceral groups can be described as a semantic segmentation process of coal maceral group images, which is of great significance for studying the chemical properties of coal. Generally, existing semantic segmentation models of coal maceral groups use the method of stacking parameters to achieve higher accuracy. It leads to increased computational requirements and impacts mo…
▽ More
The segmentation of coal maceral groups can be described as a semantic segmentation process of coal maceral group images, which is of great significance for studying the chemical properties of coal. Generally, existing semantic segmentation models of coal maceral groups use the method of stacking parameters to achieve higher accuracy. It leads to increased computational requirements and impacts model training efficiency. At the same time, due to the professionalism and diversity of coal maceral group images sampling, obtaining the number of samples for model training requires a long time and professional personnel operation. To address these issues, We have innovatively developed an IoT-based DA-VIT parallel network model. By utilizing this model, we can continuously broaden the dataset through IoT and achieving sustained improvement in the accuracy of coal maceral groups segmentation. Besides, we decouple the parallel network from the backbone network to ensure the normal using of the backbone network during model data updates. Secondly, DCSA mechanism of DA-VIT is introduced to enhance the local feature information of coal microscopic images. This DCSA can decompose the large kernels of convolutional attention into multiple scales and reduce 81.18% of parameters.Finally, we performed the contrast experiment and ablation experiment between DA-VIT and state-of-the-art methods at lots of evaluation metrics. Experimental results show that DA-VIT-Base achieves 92.14% pixel accuracy and 63.18% mIoU. Params and FLOPs of DA-VIT-Tiny are 4.95M and 8.99G, respectively. All of the evaluation metrics of the proposed DA-VIT are better than other state-of-the-art methods.
△ Less
Submitted 15 June, 2025;
originally announced June 2025.
-
Taming Stable Diffusion for Computed Tomography Blind Super-Resolution
Authors:
Chunlei Li,
Yilei Shi,
Haoxi Hu,
Jingliang Hu,
Xiao Xiang Zhu,
Lichao Mou
Abstract:
High-resolution computed tomography (CT) imaging is essential for medical diagnosis but requires increased radiation exposure, creating a critical trade-off between image quality and patient safety. While deep learning methods have shown promise in CT super-resolution, they face challenges with complex degradations and limited medical training data. Meanwhile, large-scale pre-trained diffusion mod…
▽ More
High-resolution computed tomography (CT) imaging is essential for medical diagnosis but requires increased radiation exposure, creating a critical trade-off between image quality and patient safety. While deep learning methods have shown promise in CT super-resolution, they face challenges with complex degradations and limited medical training data. Meanwhile, large-scale pre-trained diffusion models, particularly Stable Diffusion, have demonstrated remarkable capabilities in synthesizing fine details across various vision tasks. Motivated by this, we propose a novel framework that adapts Stable Diffusion for CT blind super-resolution. We employ a practical degradation model to synthesize realistic low-quality images and leverage a pre-trained vision-language model to generate corresponding descriptions. Subsequently, we perform super-resolution using Stable Diffusion with a specialized controlling strategy, conditioned on both low-resolution inputs and the generated text descriptions. Extensive experiments show that our method outperforms existing approaches, demonstrating its potential for achieving high-quality CT imaging at reduced radiation doses. Our code will be made publicly available.
△ Less
Submitted 13 June, 2025;
originally announced June 2025.
-
WIP: Exploring the Value of a Debugging Cheat Sheet and Mini Lecture in Improving Undergraduate Debugging Skills and Mindset
Authors:
Andrew Ash,
John Hu
Abstract:
This work-in-progress research paper explores the efficacy of a small-scale microelectronics debugging education intervention utilizing quasi-experimental design in an introductory microelectronics course for third-year electrical and computer engineering (ECE) students. In the first semester of research, the experimental group attended a debugging "mini lecture" covering two common sources of cir…
▽ More
This work-in-progress research paper explores the efficacy of a small-scale microelectronics debugging education intervention utilizing quasi-experimental design in an introductory microelectronics course for third-year electrical and computer engineering (ECE) students. In the first semester of research, the experimental group attended a debugging "mini lecture" covering two common sources of circuit error and received a debugging cheat sheet with recommendations for testing and hypothesis formation. Across three debugging problems, students in the experimental group were faster by an average of 1:43 and had a 7 percent higher success rate than the control group. Both groups demonstrated a strong general growth mindset while the experimental group also displayed a shift in their debugging mindset by perceiving a greater value towards debugging. Though these differences are not yet statistically significant, the pilot results indicate that a mini-lecture and debugging cheat sheet are steps in the right direction toward improving students' readiness for debugging in the workplace.
△ Less
Submitted 12 June, 2025;
originally announced June 2025.
-
A Family of Robust Generalized Adaptive Filters and Application for Time-series Prediction
Authors:
Yi Peng,
Haiquan Zhao,
Jinhui Hu
Abstract:
The continuous development of new adaptive filters (AFs) based on novel cost functions (CFs) is driven by the demands of various application scenarios and noise environments. However, these algorithms typically demonstrate optimal performance only in specific conditions. In the event of the noise change, the performance of these AFs often declines, rendering simple parameter adjustments ineffectiv…
▽ More
The continuous development of new adaptive filters (AFs) based on novel cost functions (CFs) is driven by the demands of various application scenarios and noise environments. However, these algorithms typically demonstrate optimal performance only in specific conditions. In the event of the noise change, the performance of these AFs often declines, rendering simple parameter adjustments ineffective. Instead, a modification of the CF is necessary. To address this issue, the robust generalized adaptive AF (RGA-AF) with strong adaptability and flexibility is proposed in this paper. The flexibility of the RGA-AF's CF allows for smooth adaptation to varying noise environments through parameter adjustments, ensuring optimal filtering performance in diverse scenarios. Moreover, we introduce several fundamental properties of negative RGA (NRGA) entropy and present the negative asymmetric RGA-AF (NAR-GA-AF) and kernel recursive NRGA-AF (KRNRGA-AF). These AFs address asymmetric noise distribution and nonlinear filtering issues, respectively. Simulations of linear system identification and time-series prediction for Chua's circuit under different noise environments demonstrate the superiority of the proposed algorithms in comparison to existing techniques.
△ Less
Submitted 31 May, 2025;
originally announced June 2025.
-
DarkDiff: Advancing Low-Light Raw Enhancement by Retasking Diffusion Models for Camera ISP
Authors:
Amber Yijia Zheng,
Yu Zhang,
Jun Hu,
Raymond A. Yeh,
Chen Chen
Abstract:
High-quality photography in extreme low-light conditions is challenging but impactful for digital cameras. With advanced computing hardware, traditional camera image signal processor (ISP) algorithms are gradually being replaced by efficient deep networks that enhance noisy raw images more intelligently. However, existing regression-based models often minimize pixel errors and result in oversmooth…
▽ More
High-quality photography in extreme low-light conditions is challenging but impactful for digital cameras. With advanced computing hardware, traditional camera image signal processor (ISP) algorithms are gradually being replaced by efficient deep networks that enhance noisy raw images more intelligently. However, existing regression-based models often minimize pixel errors and result in oversmoothing of low-light photos or deep shadows. Recent work has attempted to address this limitation by training a diffusion model from scratch, yet those models still struggle to recover sharp image details and accurate colors. We introduce a novel framework to enhance low-light raw images by retasking pre-trained generative diffusion models with the camera ISP. Extensive experiments demonstrate that our method outperforms the state-of-the-art in perceptual quality across three challenging low-light raw image benchmarks.
△ Less
Submitted 29 May, 2025;
originally announced May 2025.
-
Distillation-Enabled Knowledge Alignment Protocol for Semantic Communication in AI Agent Networks
Authors:
Jingzhi Hu,
Geoffrey Ye Li
Abstract:
Future networks are envisioned to connect massive artificial intelligence (AI) agents, enabling their extensive collaboration on diverse tasks. Compared to traditional entities, these agents naturally suit the semantic communication (SC), which can significantly enhance the bandwidth efficiency. Nevertheless, SC requires the knowledge among agents to be aligned, while agents have distinct expert k…
▽ More
Future networks are envisioned to connect massive artificial intelligence (AI) agents, enabling their extensive collaboration on diverse tasks. Compared to traditional entities, these agents naturally suit the semantic communication (SC), which can significantly enhance the bandwidth efficiency. Nevertheless, SC requires the knowledge among agents to be aligned, while agents have distinct expert knowledge for their individual tasks in practice. In this paper, we propose a distillation-enabled knowledge alignment protocol (DeKAP), which distills the expert knowledge of each agent into parameter-efficient low-rank matrices, allocates them across the network, and allows agents to simultaneously maintain aligned knowledge for multiple tasks. We formulate the joint minimization of alignment loss, communication overhead, and storage cost as a large-scale integer linear programming problem and develop a highly efficient greedy algorithm. From computer simulation, the DeKAP establishes knowledge alignment with the lowest communication and computation resources compared to conventional approaches.
△ Less
Submitted 7 May, 2025;
originally announced May 2025.
-
An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology
Authors:
Changchun Yang,
Weiqian Dai,
Yilan Zhang,
Siyuan Chen,
Jingdong Hu,
Junkai Su,
Yuxuan Chen,
Ao Xu,
Na Li,
Xin Gao,
Yongguo Yu
Abstract:
Chromosome analysis is vital for diagnosing genetic disorders and guiding cancer therapy decisions through the identification of somatic clonal aberrations. However, developing an AI model are hindered by the overwhelming complexity and diversity of chromosomal abnormalities, requiring extensive annotation efforts, while automated methods remain task-specific and lack generalizability due to the s…
▽ More
Chromosome analysis is vital for diagnosing genetic disorders and guiding cancer therapy decisions through the identification of somatic clonal aberrations. However, developing an AI model are hindered by the overwhelming complexity and diversity of chromosomal abnormalities, requiring extensive annotation efforts, while automated methods remain task-specific and lack generalizability due to the scarcity of comprehensive datasets spanning diverse resource conditions. Here, we introduce CHROMA, a foundation model for cytogenomics, designed to overcome these challenges by learning generalizable representations of chromosomal abnormalities. Pre-trained on over 84,000 specimens (~4 million chromosomal images) via self-supervised learning, CHROMA outperforms other methods across all types of abnormalities, even when trained on fewer labelled data and more imbalanced datasets. By facilitating comprehensive mapping of instability and clonal leisons across various aberration types, CHROMA offers a scalable and generalizable solution for reliable and automated clinical analysis, reducing the annotation workload for experts and advancing precision oncology through the early detection of rare genomic abnormalities, enabling broad clinical AI applications and making advanced genomic analysis more accessible.
△ Less
Submitted 21 May, 2025;
originally announced May 2025.
-
CL-CaGAN: Capsule differential adversarial continuous learning for cross-domain hyperspectral anomaly detection
Authors:
Jianing Wang,
Siying Guo,
Zheng Hua,
Runhu Huang,
Jinyu Hu,
Maoguo Gong
Abstract:
Anomaly detection (AD) has attracted remarkable attention in hyperspectral image (HSI) processing fields, and most existing deep learning (DL)-based algorithms indicate dramatic potential for detecting anomaly samples through specific training process under current scenario. However, the limited prior information and the catastrophic forgetting problem indicate crucial challenges for existing DL s…
▽ More
Anomaly detection (AD) has attracted remarkable attention in hyperspectral image (HSI) processing fields, and most existing deep learning (DL)-based algorithms indicate dramatic potential for detecting anomaly samples through specific training process under current scenario. However, the limited prior information and the catastrophic forgetting problem indicate crucial challenges for existing DL structure in open scenarios cross-domain detection. In order to improve the detection performance, a novel continual learning-based capsule differential generative adversarial network (CL-CaGAN) is proposed to elevate the cross-scenario learning performance for facilitating the real application of DL-based structure in hyperspectral AD (HAD) task. First, a modified capsule structure with adversarial learning network is constructed to estimate the background distribution for surmounting the deficiency of prior information. To mitigate the catastrophic forgetting phenomenon, clustering-based sample replay strategy and a designed extra self-distillation regularization are integrated for merging the history and future knowledge in continual AD task, while the discriminative learning ability from previous detection scenario to current scenario is retained by the elaborately designed structure with continual learning (CL) strategy. In addition, the differentiable enhancement is enforced to augment the generation performance of the training data. This further stabilizes the training process with better convergence and efficiently consolidates the reconstruction ability of background samples. To verify the effectiveness of our proposed CL-CaGAN, we conduct experiments on several real HSIs, and the results indicate that the proposed CL-CaGAN demonstrates higher detection performance and continuous learning capacity for mitigating the catastrophic forgetting under cross-domain scenarios.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
Future-Oriented Navigation: Dynamic Obstacle Avoidance with One-Shot Energy-Based Multimodal Motion Prediction
Authors:
Ze Zhang,
Georg Hess,
Junjie Hu,
Emmanuel Dean,
Lennart Svensson,
Knut Åkesson
Abstract:
This paper proposes an integrated approach for the safe and efficient control of mobile robots in dynamic and uncertain environments. The approach consists of two key steps: one-shot multimodal motion prediction to anticipate motions of dynamic obstacles and model predictive control to incorporate these predictions into the motion planning process. Motion prediction is driven by an energy-based ne…
▽ More
This paper proposes an integrated approach for the safe and efficient control of mobile robots in dynamic and uncertain environments. The approach consists of two key steps: one-shot multimodal motion prediction to anticipate motions of dynamic obstacles and model predictive control to incorporate these predictions into the motion planning process. Motion prediction is driven by an energy-based neural network that generates high-resolution, multi-step predictions in a single operation. The prediction outcomes are further utilized to create geometric shapes formulated as mathematical constraints. Instead of treating each dynamic obstacle individually, predicted obstacles are grouped by proximity in an unsupervised way to improve performance and efficiency. The overall collision-free navigation is handled by model predictive control with a specific design for proactive dynamic obstacle avoidance. The proposed approach allows mobile robots to navigate effectively in dynamic environments. Its performance is accessed across various scenarios that represent typical warehouse settings. The results demonstrate that the proposed approach outperforms other existing dynamic obstacle avoidance methods.
△ Less
Submitted 4 June, 2025; v1 submitted 30 April, 2025;
originally announced May 2025.
-
Time-Series Analysis on Edge-AI Hardware for Healthcare Monitoring
Authors:
Jinhai Hu
Abstract:
This project addresses the need for efficient, real-time analysis of biomedical signals such as electrocardiograms (ECG) and electroencephalograms (EEG) for continuous health monitoring. Traditional methods rely on long-duration data recording followed by offline analysis, which is power-intensive and delays responses to critical symptoms such as arrhythmia. To overcome these limitations, a time-d…
▽ More
This project addresses the need for efficient, real-time analysis of biomedical signals such as electrocardiograms (ECG) and electroencephalograms (EEG) for continuous health monitoring. Traditional methods rely on long-duration data recording followed by offline analysis, which is power-intensive and delays responses to critical symptoms such as arrhythmia. To overcome these limitations, a time-domain ECG analysis model based on a novel dynamically-biased Long Short-Term Memory (DB-LSTM) neural network is proposed. This model supports simultaneous ECG forecasting and classification with high performance-achieving over 98% accuracy and a normalized mean square error below 1e-3 for forecasting, and over 97% accuracy with faster convergence and fewer training parameters for classification. To enable edge deployment, the model is hardware-optimized by quantizing weights to INT4 or INT3 formats, resulting in only a 2% and 6% drop in classification accuracy during training and inference, respectively, while maintaining full accuracy for forecasting. Extensive simulations using multiple ECG datasets confirm the model's robustness. Future work includes implementing the algorithm on FPGA and CMOS circuits for practical cardiac monitoring, as well as developing a digital hardware platform that supports flexible neural network configurations and on-chip online training for personalized healthcare applications.
△ Less
Submitted 21 April, 2025;
originally announced April 2025.
-
PIV-FlowDiffuser:Transfer-learning-based denoising diffusion models for PIV
Authors:
Qianyu Zhu,
Junjie Wang,
Jeremiah Hu,
Jia Ai,
Yong Lee
Abstract:
Deep learning algorithms have significantly reduced the computational time and improved the spatial resolution of particle image velocimetry~(PIV). However, the models trained on synthetic datasets might have a degraded performance on practical particle images due to domain gaps. As a result, special residual patterns are often observed for the vector fields of deep learning-based estimators. To r…
▽ More
Deep learning algorithms have significantly reduced the computational time and improved the spatial resolution of particle image velocimetry~(PIV). However, the models trained on synthetic datasets might have a degraded performance on practical particle images due to domain gaps. As a result, special residual patterns are often observed for the vector fields of deep learning-based estimators. To reduce the special noise step-by-step, we employ a denoising diffusion model~(FlowDiffuser) for PIV analysis. And the data-hungry iterative denoising diffusion model is trained via a transfer learning strategy, resulting in our PIV-FlowDiffuser method. Specifically, (1) pre-training a FlowDiffuser model with multiple optical flow datasets of the computer vision community, such as Sintel, KITTI, etc; (2) fine-tuning the pre-trained model on synthetic PIV datasets. Note that the PIV images are upsampled by a factor of two to resolve the small-scale turbulent flow structures. The visualized results indicate that our PIV-FlowDiffuser effectively suppresses the noise patterns. Therefore, the denoising diffusion model reduces the average end-point error~($AEE$) by 59.4% over RAFT256-PIV baseline on the classic Cai's dataset. Besides, PIV-FlowDiffuser exhibits enhanced generalization performance on unseen particle images due to transfer learning. Overall, this study highlights the transfer-learning-based denoising diffusion models for PIV. And a detailed implementation is recommended for interested readers in the repository https://github.com/Zhu-Qianyu/PIV-FlowDiffuser.
△ Less
Submitted 21 April, 2025;
originally announced April 2025.
-
NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images: Methods and Results
Authors:
Xin Li,
Yeying Jin,
Xin Jin,
Zongwei Wu,
Bingchen Li,
Yufei Wang,
Wenhan Yang,
Yu Li,
Zhibo Chen,
Bihan Wen,
Robby T. Tan,
Radu Timofte,
Qiyu Rong,
Hongyuan Jing,
Mengmeng Zhang,
Jinglong Li,
Xiangyu Lu,
Yi Ren,
Yuting Liu,
Meng Zhang,
Xiang Chen,
Qiyuan Guan,
Jiangxin Dong,
Jinshan Pan,
Conglin Gou
, et al. (112 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includ…
▽ More
This paper reviews the NTIRE 2025 Challenge on Day and Night Raindrop Removal for Dual-Focused Images. This challenge received a wide range of impressive solutions, which are developed and evaluated using our collected real-world Raindrop Clarity dataset. Unlike existing deraining datasets, our Raindrop Clarity dataset is more diverse and challenging in degradation types and contents, which includes day raindrop-focused, day background-focused, night raindrop-focused, and night background-focused degradations. This dataset is divided into three subsets for competition: 14,139 images for training, 240 images for validation, and 731 images for testing. The primary objective of this challenge is to establish a new and powerful benchmark for the task of removing raindrops under varying lighting and focus conditions. There are a total of 361 participants in the competition, and 32 teams submitting valid solutions and fact sheets for the final testing phase. These submissions achieved state-of-the-art (SOTA) performance on the Raindrop Clarity dataset. The project can be found at https://lixinustc.github.io/CVPR-NTIRE2025-RainDrop-Competition.github.io/.
△ Less
Submitted 19 April, 2025; v1 submitted 17 April, 2025;
originally announced April 2025.
-
The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report
Authors:
Bin Ren,
Hang Guo,
Lei Sun,
Zongwei Wu,
Radu Timofte,
Yawei Li,
Yao Zhang,
Xinning Chai,
Zhengxue Cheng,
Yingsheng Qin,
Yucai Yang,
Li Song,
Hongyuan Yu,
Pufan Xu,
Cheng Wan,
Zhijuan Huang,
Peng Guo,
Shuyuan Cui,
Chenjun Li,
Xuehai Hu,
Pan Pan,
Xin Zhang,
Heng Zhang,
Qing Luo,
Linyan Jiang
, et al. (122 additional authors not shown)
Abstract:
This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the…
▽ More
This paper presents a comprehensive review of the NTIRE 2025 Challenge on Single-Image Efficient Super-Resolution (ESR). The challenge aimed to advance the development of deep models that optimize key computational metrics, i.e., runtime, parameters, and FLOPs, while achieving a PSNR of at least 26.90 dB on the $\operatorname{DIV2K\_LSDIR\_valid}$ dataset and 26.99 dB on the $\operatorname{DIV2K\_LSDIR\_test}$ dataset. A robust participation saw \textbf{244} registered entrants, with \textbf{43} teams submitting valid entries. This report meticulously analyzes these methods and results, emphasizing groundbreaking advancements in state-of-the-art single-image ESR techniques. The analysis highlights innovative approaches and establishes benchmarks for future research in the field.
△ Less
Submitted 14 April, 2025;
originally announced April 2025.
-
Adaptive Robust Unscented Kalman Filter for Dynamic State Estimation of Power System
Authors:
Duc Viet Nguyen,
Haiquan Zhao,
Jinhui Hu,
Le Ngoc Giang
Abstract:
Non-Gaussian noise and the uncertainty of noise distribution are the common factors that reduce accuracy in dynamic state estimation of power systems (PS). In addition, the optimal value of the free coefficients in the unscented Kalman filter (UKF) based on information theoretic criteria is also an urgent problem. In this paper, a robust adaptive UKF (AUKF) under generalized minimum mixture error…
▽ More
Non-Gaussian noise and the uncertainty of noise distribution are the common factors that reduce accuracy in dynamic state estimation of power systems (PS). In addition, the optimal value of the free coefficients in the unscented Kalman filter (UKF) based on information theoretic criteria is also an urgent problem. In this paper, a robust adaptive UKF (AUKF) under generalized minimum mixture error entropy with fiducial points (GMMEEF) over improve Snow Geese algorithm (ISGA) (ISGA-GMMEEF-AUKF) is proposed to overcome the above difficulties. The estimation process of the proposed algorithm is based on several key steps including augmented regression error model (AREM) construction, adaptive state estimation, and free coefficients optimization. Specifically, an AREM consisting of state prediction and measurement errors is established at the first step. Then, GMMEEF-AUKF is developed by solving the optimization problem based on GMMEEF, which uses a generalized Gaussian kernel combined with mixture correntropy to enhance the flexibility further and resolve the data problem with complex attributes and update the noise covariance matrix according to the AREM framework. Finally, the ISGA is designed to automatically calculate the optimal value of coefficients such as the shape coefficients of the kernel in the GMMEEF criterion, the coefficients selection sigma points in unscented transform, and the update coefficient of the noise covariance matrices fit with the PS model. Simulation results on the IEEE 14, 30, and 57-bus test systems in complex scenarios have confirmed that the proposed algorithm outperforms the MEEF-UKF and UKF by an average efficiency of 26% and 65%, respectively.
△ Less
Submitted 10 April, 2025;
originally announced April 2025.
-
Diffusion Augmented Complex Maximum Total Correntropy Algorithm for Power System Frequency Estimation
Authors:
Haiquan Zhao,
Yi Peng,
Jinsong Chen,
Jinhui Hu
Abstract:
Currently, adaptive filtering algorithms have been widely applied in frequency estimation for power systems. However, research on diffusion tasks remains insufficient. Existing diffusion adaptive frequency estimation algorithms exhibit certain limitations in handling input noise and lack robustness against impulsive noise. Moreover, traditional adaptive filtering algorithms designed based on the s…
▽ More
Currently, adaptive filtering algorithms have been widely applied in frequency estimation for power systems. However, research on diffusion tasks remains insufficient. Existing diffusion adaptive frequency estimation algorithms exhibit certain limitations in handling input noise and lack robustness against impulsive noise. Moreover, traditional adaptive filtering algorithms designed based on the strictly-linear (SL) model fail to effectively address frequency estimation challenges in unbalanced three-phase power systems. To address these issues, this letter proposes an improved diffusion augmented complex maximum total correntropy (DAMTCC) algorithm based on the widely linear (WL) model. The proposed algorithm not only significantly enhances the capability to handle input noise but also demonstrates superior robustness to impulsive noise. Furthermore, it successfully resolves the critical challenge of frequency estimation in unbalanced three-phase power systems, offering an efficient and reliable solution for diffusion power system frequency estimation. Finally, we analyze the stability of the algorithm and computer simulations verify the excellent performance of the algorithm.
△ Less
Submitted 9 April, 2025;
originally announced April 2025.
-
Algorithm Design and Prototype Validation for Reconfigurable Intelligent Sensing Surface: Forward-Only Transmission
Authors:
Cheng Luo,
Luping Xiang,
Jie Hu,
Kun Yang
Abstract:
Sensing-assisted communication schemes have recently garnered significant research attention. In this work, we design a dual-function reconfigurable intelligent surface (RIS), integrating both active and passive elements, referred to as the reconfigurable intelligent sensing surface (RISS), to enhance communication. By leveraging sensing results from the active elements, we propose communication e…
▽ More
Sensing-assisted communication schemes have recently garnered significant research attention. In this work, we design a dual-function reconfigurable intelligent surface (RIS), integrating both active and passive elements, referred to as the reconfigurable intelligent sensing surface (RISS), to enhance communication. By leveraging sensing results from the active elements, we propose communication enhancement and robust interference suppression schemes for both near-field and far-field models, implemented through the passive elements. These schemes remove the need for base station (BS) feedback for RISS control, simplifying the communication process by replacing traditional channel state information (CSI) feedback with real-time sensing from the active elements. The proposed schemes are theoretically analyzed and then validated using software-defined radio (SDR). Experimental results demonstrate the effectiveness of the sensing algorithms in real-world scenarios, such as direction of arrival (DOA) estimation and radio frequency (RF) identification recognition. Moreover, the RISS-assisted communication system shows strong performance in communication enhancement and interference suppression, particularly in near-field models.
△ Less
Submitted 19 June, 2025; v1 submitted 31 March, 2025;
originally announced March 2025.
-
Ultrasound Image-to-Video Synthesis via Latent Dynamic Diffusion Models
Authors:
Tingxiu Chen,
Yilei Shi,
Zixuan Zheng,
Bingcong Yan,
Jingliang Hu,
Xiao Xiang Zhu,
Lichao Mou
Abstract:
Ultrasound video classification enables automated diagnosis and has emerged as an important research area. However, publicly available ultrasound video datasets remain scarce, hindering progress in developing effective video classification models. We propose addressing this shortage by synthesizing plausible ultrasound videos from readily available, abundant ultrasound images. To this end, we intr…
▽ More
Ultrasound video classification enables automated diagnosis and has emerged as an important research area. However, publicly available ultrasound video datasets remain scarce, hindering progress in developing effective video classification models. We propose addressing this shortage by synthesizing plausible ultrasound videos from readily available, abundant ultrasound images. To this end, we introduce a latent dynamic diffusion model (LDDM) to efficiently translate static images to dynamic sequences with realistic video characteristics. We demonstrate strong quantitative results and visually appealing synthesized videos on the BUSV benchmark. Notably, training video classification models on combinations of real and LDDM-synthesized videos substantially improves performance over using real data alone, indicating our method successfully emulates dynamics critical for discrimination. Our image-to-video approach provides an effective data augmentation solution to advance ultrasound video analysis. Code is available at https://github.com/MedAITech/U_I2V.
△ Less
Submitted 19 March, 2025;
originally announced March 2025.
-
Striving for Simplicity: Simple Yet Effective Prior-Aware Pseudo-Labeling for Semi-Supervised Ultrasound Image Segmentation
Authors:
Yaxiong Chen,
Yujie Wang,
Zixuan Zheng,
Jingliang Hu,
Yilei Shi,
Shengwu Xiong,
Xiao Xiang Zhu,
Lichao Mou
Abstract:
Medical ultrasound imaging is ubiquitous, but manual analysis struggles to keep pace. Automated segmentation can help but requires large labeled datasets, which are scarce. Semi-supervised learning leveraging both unlabeled and limited labeled data is a promising approach. State-of-the-art methods use consistency regularization or pseudo-labeling but grow increasingly complex. Without sufficient l…
▽ More
Medical ultrasound imaging is ubiquitous, but manual analysis struggles to keep pace. Automated segmentation can help but requires large labeled datasets, which are scarce. Semi-supervised learning leveraging both unlabeled and limited labeled data is a promising approach. State-of-the-art methods use consistency regularization or pseudo-labeling but grow increasingly complex. Without sufficient labels, these models often latch onto artifacts or allow anatomically implausible segmentations. In this paper, we present a simple yet effective pseudo-labeling method with an adversarially learned shape prior to regularize segmentations. Specifically, we devise an encoder-twin-decoder network where the shape prior acts as an implicit shape model, penalizing anatomically implausible but not ground-truth-deviating predictions. Without bells and whistles, our simple approach achieves state-of-the-art performance on two benchmarks under different partition protocols. We provide a strong baseline for future semi-supervised medical image segmentation. Code is available at https://github.com/WUTCM-Lab/Shape-Prior-Semi-Seg.
△ Less
Submitted 18 March, 2025;
originally announced March 2025.
-
Bedrock Models in Communication and Sensing: Advancing Generalization, Transferability, and Performance
Authors:
Cheng Luo,
Luping Xiang,
Jie Hu,
Kun Yang
Abstract:
Deep learning (DL) has emerged as a powerful tool for addressing the intricate challenges inherent in communication and sensing systems, significantly enhancing the intelligence of future sixth-generation (6G) networks. A substantial body of research has highlighted the promise of DL-based techniques in these domains. However, in addition to improving accuracy, new challenges must be addressed reg…
▽ More
Deep learning (DL) has emerged as a powerful tool for addressing the intricate challenges inherent in communication and sensing systems, significantly enhancing the intelligence of future sixth-generation (6G) networks. A substantial body of research has highlighted the promise of DL-based techniques in these domains. However, in addition to improving accuracy, new challenges must be addressed regarding the generalization and transferability of DL-based systems. To tackle these issues, this paper introduces a series of mathematically grounded and modularized models, referred to as bedrock models, specifically designed for integration into both communication and sensing systems. Due to their modular architecture, these models can be seamlessly incorporated into existing communication and sensing frameworks. For communication systems, the proposed models demonstrate substantial performance improvements while also exhibit strong transferability, enabling direct parameter sharing across different tasks, which greatly facilitates practical deployment. In sensing applications, the integration of the bedrock models into existing systems results in superior performance, reducing delay and Doppler estimation errors by an order of magnitude compared to traditional methods. Additionally, a pre-equalization strategy based on the bedrock models is proposed for the transmitter. By leveraging sensing information, the transmitted communication signal is dynamically adjusted without altering the communication model pre-trained in AWGN channels. This adaptation enables the system to effectively cope with doubly dispersive channels, restoring the received signal to an AWGN-like condition and achieving near-optimal performance. Simulation results substantiate the effectiveness and transferability of the proposed bedrock models, underscoring their potential to advance both communication and sensing systems.
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
Low-Complexity Beamforming Design for Null Space-based Simultaneous Wireless Information and Power Transfer Systems
Authors:
Cheng Luo,
Jie Hu,
Luping Xiang,
Kun Yang
Abstract:
Simultaneous wireless information and power transfer (SWIPT) is a promising technology for the upcoming sixth-generation (6G) communication networks, enabling internet of things (IoT) devices and sensors to extend their operational lifetimes. In this paper, we propose a SWIPT scheme by projecting the interference signals from both intra-wireless information transfer (WIT) and inter-wireless energy…
▽ More
Simultaneous wireless information and power transfer (SWIPT) is a promising technology for the upcoming sixth-generation (6G) communication networks, enabling internet of things (IoT) devices and sensors to extend their operational lifetimes. In this paper, we propose a SWIPT scheme by projecting the interference signals from both intra-wireless information transfer (WIT) and inter-wireless energy transfer (WET) into the null space, simplifying the system into a point-to-point WIT and WET problem. Upon further analysis, we confirm that dedicated energy beamforming is unnecessary. In addition, we develop a low-complexity algorithm to solve the problem efficiently, further reducing computational overhead. Numerical results validate our analysis, showing that the computational complexity is reduced by 97.5\% and 99.96\% for the cases of $K^I = K^E = 2$, $M = 4$ and $K^I = K^E = 16$, $M = 64$, respectively.
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
Reconfigurable Intelligent Sensing Surface enables Wireless Powered Communication Networks: Interference Suppression and Massive Wireless Energy Transfer
Authors:
Cheng Luo,
Jie Hu,
Luping Xiang,
Kun Yang
Abstract:
Recently, a novel structures of reconfigurable intelligent surface (RIS) integrating both passive and active elements, termed reconfigurable intelligent sensing surface (RISS), efficiently addresses challenges in RIS channel estimation and mitigates issues related to multiplicative path loss by processing the signal at the RISS. In this paper, we propose a sensing-assisted wirelessly powered commu…
▽ More
Recently, a novel structures of reconfigurable intelligent surface (RIS) integrating both passive and active elements, termed reconfigurable intelligent sensing surface (RISS), efficiently addresses challenges in RIS channel estimation and mitigates issues related to multiplicative path loss by processing the signal at the RISS. In this paper, we propose a sensing-assisted wirelessly powered communication network (WPCN) that utilizes RISS's sensing capabilities to maximize the channel capacity in uplink wireless information transfer (WIT) and assist in massive wireless energy transmission (WET) for downlink. For the WIT in the uplink, the sensing information is utilized to design an interference suppression passive reflection phase shift for the RISS, and take the imperfect sensing results and sharp null into consideration, we also propose a robust scheme. For the WET in the downlink, the massive WET scheme is adopted and benefits from a period of sensing results. The massive WET scheme including beam selection and rotation order optimization to enhance the lower bound of energy harvest for massive users and optimize waiting costs. Numerical results demonstrate the optimal interference suppression threshold for uplink WIT and underscore the achieved fairness in downlink WET. Collectively, by utilizing sensing information, the uplink channel capacity is improved by 20\%, and the worst energy performance and waiting costs for massive WET are effectively optimized, with improvements ranging from 19\% to 59\% and 27\% to 29\%, respectively.
△ Less
Submitted 11 March, 2025;
originally announced March 2025.
-
Energy-Efficient Port Selection and Beamforming Design for Integrated Data and Energy Transfer Assisted by Fluid Antennas
Authors:
Long Zhang,
Yizhe Zhao,
Halvin Yang,
Guangming Liang,
Jie Hu
Abstract:
Integrated data and energy transfer (IDET) is considered as a key enabler of 6G, as it can provide both wireless energy transfer (WET) and wireless data transfer (WDT) services towards low power devices. Thanks to the extra degree of freedom provided by fluid antenna (FA), incorporating FA into IDET systems presents a promising approach to enhance energy efficiency performance. This paper investig…
▽ More
Integrated data and energy transfer (IDET) is considered as a key enabler of 6G, as it can provide both wireless energy transfer (WET) and wireless data transfer (WDT) services towards low power devices. Thanks to the extra degree of freedom provided by fluid antenna (FA), incorporating FA into IDET systems presents a promising approach to enhance energy efficiency performance. This paper investigates a FA assisted IDET system, where the transmitter is equipped with multiple FAs and transmits wireless signals to the data receiver (DR) and the energy receiver (ER), which are both equipped with a single traditional antenna. The switching delay and energy consumption induced by port selection are taken into account in IDET system for the first time. We aim to obtain the optimal beamforming vector and the port selection strategy at the transmitter, in order to maximize the short-term and long-term WET efficiency, respectively. The instant sub-optimal solution is obtained by alternatively optimizing the beamforming vector and port selection in each transmission frame, while a novel constrained soft actor critic (C-SAC) algorithm is proposed to find the feasible policy of port selection from the long-term perspective. Simulation results demonstrate that our scheme is able to achieve greater gain in terms of both the short-term and long-term WET efficiency compared to other benchmarks, while not degrading WDT performance.
△ Less
Submitted 6 March, 2025;
originally announced March 2025.
-
Neuroverse3D: Developing In-Context Learning Universal Model for Neuroimaging in 3D
Authors:
Jiesi Hu,
Chenfei Ye,
Yanwu Yang,
Xutao Guo,
Yang Shang,
Pengcheng Shi,
Hanyang Peng,
Ting Ma
Abstract:
In-context learning (ICL), a type of universal model, demonstrates exceptional generalization across a wide range of tasks without retraining by leveraging task-specific guidance from context, making it particularly effective for the intricate demands of neuroimaging. However, current ICL models, limited to 2D inputs and thus exhibiting suboptimal performance, struggle to extend to 3D inputs due t…
▽ More
In-context learning (ICL), a type of universal model, demonstrates exceptional generalization across a wide range of tasks without retraining by leveraging task-specific guidance from context, making it particularly effective for the intricate demands of neuroimaging. However, current ICL models, limited to 2D inputs and thus exhibiting suboptimal performance, struggle to extend to 3D inputs due to the high memory demands of ICL. In this regard, we introduce Neuroverse3D, an ICL model capable of performing multiple neuroimaging tasks in 3D (e.g., segmentation, denoising, inpainting). Neuroverse3D overcomes the large memory consumption associated with 3D inputs through adaptive parallel-sequential context processing and a U-shaped fusion strategy, allowing it to handle an unlimited number of context images. Additionally, we propose an optimized loss function to balance multi-task training and enhance focus on anatomical boundaries. Our study incorporates 43,674 3D multi-modal scans from 19 neuroimaging datasets and evaluates Neuroverse3D on 14 diverse tasks using held-out test sets. The results demonstrate that Neuroverse3D significantly outperforms existing ICL models and closely matches task-specific models, enabling flexible adaptation to medical center variations without retraining. The code and model weights are publicly available at https://github.com/jiesihu/Neuroverse3D.
△ Less
Submitted 4 July, 2025; v1 submitted 4 March, 2025;
originally announced March 2025.
-
Target Tracking using Robust Sensor Motion Control
Authors:
Jingwei Hu,
Dave Zachariah,
Petre Stoica
Abstract:
We consider the problem of tracking moving targets using mobile wireless sensors (of possibly different types). This is a joint estimation and control problem in which a tracking system must take into account both target and sensor dynamics. We make minimal assumptions about the target dynamics, namely only that their accelerations are bounded. We develop a control law that determines the sensor m…
▽ More
We consider the problem of tracking moving targets using mobile wireless sensors (of possibly different types). This is a joint estimation and control problem in which a tracking system must take into account both target and sensor dynamics. We make minimal assumptions about the target dynamics, namely only that their accelerations are bounded. We develop a control law that determines the sensor motion control signals so as to maximize target resolvability as the target dynamics evolve. The method is given a tractable formulation that is amenable to an efficient search method and is evaluated in a series of experiments involving both round-trip time based ranging and Doppler frequency shift measurements
△ Less
Submitted 28 February, 2025;
originally announced February 2025.
-
Adaptive Input Design for Nonlinear System Identification with Operational Constraints
Authors:
Jingwei Hu,
Dave Zachariah,
Torbjörn Wigren,
Petre Stoica
Abstract:
We consider the problem of joint input design and parameter estimation for identifying nonlinear system models through the sequential acquisition of measurements while adhering to system constraints. We utilize a receding horizon approach and propose a new scale-invariant input design criterion, which is tailored to continuously updated parameter estimates, along with a new sequential parameter es…
▽ More
We consider the problem of joint input design and parameter estimation for identifying nonlinear system models through the sequential acquisition of measurements while adhering to system constraints. We utilize a receding horizon approach and propose a new scale-invariant input design criterion, which is tailored to continuously updated parameter estimates, along with a new sequential parameter estimator. We demonstrate the ability of the method to design informative experiments online, while steering the system within operational constraints.
△ Less
Submitted 23 May, 2025; v1 submitted 28 February, 2025;
originally announced February 2025.
-
Zero-Shot Semantic Communication with Multimodal Foundation Models
Authors:
Jiangjing Hu,
Haotian Wu,
Wenjing Zhang,
Fengyu Wang,
Wenjun Xu,
Hui Gao,
Deniz Gündüz
Abstract:
Most existing semantic communication (SemCom) systems use deep joint source-channel coding (DeepJSCC) to encode task-specific semantics in a goal-oriented manner. However, their reliance on predefined tasks and datasets significantly limits their flexibility and generalizability in practical deployments. Multi-modal foundation models provide a promising solution by generating universal semantic to…
▽ More
Most existing semantic communication (SemCom) systems use deep joint source-channel coding (DeepJSCC) to encode task-specific semantics in a goal-oriented manner. However, their reliance on predefined tasks and datasets significantly limits their flexibility and generalizability in practical deployments. Multi-modal foundation models provide a promising solution by generating universal semantic tokens. Inspired by this, we introduce SemCLIP, a zero-shot SemCom framework leveraging the contrastive language-image pre-training (CLIP) model. By transmitting CLIP-generated image tokens instead of raw images, SemCLIP enables efficient SemCom under low bandwidth and challenging channel conditions, facilitating diverse downstream tasks and zero-shot applications. Specifically, we propose a DeepJSCC scheme for efficient CLIP token encoding. To mitigate potential degradation caused by compression and channel noise, a multi-modal transmission-aware prompt learning mechanism is designed at the receiver, which adapts prompts based on transmission quality, enhancing system robustness and channel adaptability. Simulation results demonstrate that SemCLIP outperforms the baselines, achieving a $41\%$ improvement in zero-shot performance at low signal-to-noise ratios. Meanwhile, SemCLIP reduces bandwidth usage by more than $50$-fold compared to alternative image transmission methods, demonstrating the potential of foundation models towards a generalized, task-agnostic SemCom solution.
△ Less
Submitted 29 May, 2025; v1 submitted 25 February, 2025;
originally announced February 2025.
-
NotaGen: Advancing Musicality in Symbolic Music Generation with Large Language Model Training Paradigms
Authors:
Yashan Wang,
Shangda Wu,
Jianhuai Hu,
Xingjian Du,
Yueqi Peng,
Yongxin Huang,
Shuai Fan,
Xiaobing Li,
Feng Yu,
Maosong Sun
Abstract:
We introduce NotaGen, a symbolic music generation model aiming to explore the potential of producing high-quality classical sheet music. Inspired by the success of Large Language Models (LLMs), NotaGen adopts pre-training, fine-tuning, and reinforcement learning paradigms (henceforth referred to as the LLM training paradigms). It is pre-trained on 1.6M pieces of music in ABC notation, and then fin…
▽ More
We introduce NotaGen, a symbolic music generation model aiming to explore the potential of producing high-quality classical sheet music. Inspired by the success of Large Language Models (LLMs), NotaGen adopts pre-training, fine-tuning, and reinforcement learning paradigms (henceforth referred to as the LLM training paradigms). It is pre-trained on 1.6M pieces of music in ABC notation, and then fine-tuned on approximately 9K high-quality classical compositions conditioned on "period-composer-instrumentation" prompts. For reinforcement learning, we propose the CLaMP-DPO method, which further enhances generation quality and controllability without requiring human annotations or predefined rewards. Our experiments demonstrate the efficacy of CLaMP-DPO in symbolic music generation models with different architectures and encoding schemes. Furthermore, subjective A/B tests show that NotaGen outperforms baseline models against human compositions, greatly advancing musical aesthetics in symbolic music generation.
△ Less
Submitted 21 March, 2025; v1 submitted 25 February, 2025;
originally announced February 2025.
-
Cross-Domain Continual Learning for Edge Intelligence in Wireless ISAC Networks
Authors:
Jingzhi Hu,
Xin Li,
Zhou Su,
Jun Luo
Abstract:
In wireless networks with integrated sensing and communications (ISAC), edge intelligence (EI) is expected to be developed at edge devices (ED) for sensing user activities based on channel state information (CSI). However, due to the CSI being highly specific to users' characteristics, the CSI-activity relationship is notoriously domain dependent, essentially demanding EI to learn sufficient datas…
▽ More
In wireless networks with integrated sensing and communications (ISAC), edge intelligence (EI) is expected to be developed at edge devices (ED) for sensing user activities based on channel state information (CSI). However, due to the CSI being highly specific to users' characteristics, the CSI-activity relationship is notoriously domain dependent, essentially demanding EI to learn sufficient datasets from various domains in order to gain cross-domain sensing capability. This poses a crucial challenge owing to the EDs' limited resources, for which storing datasets across all domains will be a significant burden. In this paper, we propose the EdgeCL framework, enabling the EI to continually learn-then-discard each incoming dataset, while remaining resilient to catastrophic forgetting. We design a transformer-based discriminator for handling sequences of noisy and nonequispaced CSI samples. Besides, we propose a distilled core-set based knowledge retention method with robustness-enhanced optimization to train the discriminator, preserving its performance for previous domains while preventing future forgetting. Experimental evaluations show that EdgeCL achieves 89% of performance compared to cumulative training while consuming only 3% of its memory, mitigating forgetting by 79%.
△ Less
Submitted 14 April, 2025; v1 submitted 18 February, 2025;
originally announced February 2025.
-
PPAC Driven Multi-die and Multi-technology Floorplanning
Authors:
Cristhian Roman-Vicharra,
Yiran Chen,
Jiang Hu
Abstract:
In heterogeneous integration, where different dies may utilize distinct technologies, floorplanning across multiple dies inherently requires simultaneous technology selection. This work presents the first systematic study of multi-die and multi-technology floorplanning. Unlike many conventional approaches, which are primarily driven by area and wirelength, this study additionally considers perform…
▽ More
In heterogeneous integration, where different dies may utilize distinct technologies, floorplanning across multiple dies inherently requires simultaneous technology selection. This work presents the first systematic study of multi-die and multi-technology floorplanning. Unlike many conventional approaches, which are primarily driven by area and wirelength, this study additionally considers performance, power, and cost, highlighting the impact of technology selection. A simulated annealing method and a reinforcement learning techniques are developed. Experimental results show that the proposed techniques significantly outperform a naive baseline approach.
△ Less
Submitted 15 February, 2025;
originally announced February 2025.
-
Col-OLHTR: A Novel Framework for Multimodal Online Handwritten Text Recognition
Authors:
Chenyu Liu,
Jinshui Hu,
Baocai Yin,
Jia Pan,
Bing Yin,
Jun Du,
Qingfeng Liu
Abstract:
Online Handwritten Text Recognition (OLHTR) has gained considerable attention for its diverse range of applications. Current approaches usually treat OLHTR as a sequence recognition task, employing either a single trajectory or image encoder, or multi-stream encoders, combined with a CTC or attention-based recognition decoder. However, these approaches face several drawbacks: 1) single encoders ty…
▽ More
Online Handwritten Text Recognition (OLHTR) has gained considerable attention for its diverse range of applications. Current approaches usually treat OLHTR as a sequence recognition task, employing either a single trajectory or image encoder, or multi-stream encoders, combined with a CTC or attention-based recognition decoder. However, these approaches face several drawbacks: 1) single encoders typically focus on either local trajectories or visual regions, lacking the ability to dynamically capture relevant global features in challenging cases; 2) multi-stream encoders, while more comprehensive, suffer from complex structures and increased inference costs. To tackle this, we propose a Collaborative learning-based OLHTR framework, called Col-OLHTR, that learns multimodal features during training while maintaining a single-stream inference process. Col-OLHTR consists of a trajectory encoder, a Point-to-Spatial Alignment (P2SA) module, and an attention-based decoder. The P2SA module is designed to learn image-level spatial features through trajectory-encoded features and 2D rotary position embeddings. During training, an additional image-stream encoder-decoder is collaboratively trained to provide supervision for P2SA features. At inference, the extra streams are discarded, and only the P2SA module is used and merged before the decoder, simplifying the process while preserving high performance. Extensive experimental results on several OLHTR benchmarks demonstrate the state-of-the-art (SOTA) performance, proving the effectiveness and robustness of our design.
△ Less
Submitted 9 February, 2025;
originally announced February 2025.
-
Dynamic Frequency-Adaptive Knowledge Distillation for Speech Enhancement
Authors:
Xihao Yuan,
Siqi Liu,
Hanting Chen,
Lu Zhou,
Jian Li,
Jie Hu
Abstract:
Deep learning-based speech enhancement (SE) models have recently outperformed traditional techniques, yet their deployment on resource-constrained devices remains challenging due to high computational and memory demands. This paper introduces a novel dynamic frequency-adaptive knowledge distillation (DFKD) approach to effectively compress SE models. Our method dynamically assesses the model's outp…
▽ More
Deep learning-based speech enhancement (SE) models have recently outperformed traditional techniques, yet their deployment on resource-constrained devices remains challenging due to high computational and memory demands. This paper introduces a novel dynamic frequency-adaptive knowledge distillation (DFKD) approach to effectively compress SE models. Our method dynamically assesses the model's output, distinguishing between high and low-frequency components, and adapts the learning objectives to meet the unique requirements of different frequency bands, capitalizing on the SE task's inherent characteristics. To evaluate the DFKD's efficacy, we conducted experiments on three state-of-the-art models: DCCRN, ConTasNet, and DPTNet. The results demonstrate that our method not only significantly enhances the performance of the compressed model (student model) but also surpasses other logit-based knowledge distillation methods specifically for SE tasks.
△ Less
Submitted 7 February, 2025;
originally announced February 2025.
-
Neural Operator based Reinforcement Learning for Control of first-order PDEs with Spatially-Varying State Delay
Authors:
Jiaqi Hu,
Jie Qi,
Jing Zhang
Abstract:
Control of distributed parameter systems affected by delays is a challenging task, particularly when the delays depend on spatial variables. The idea of integrating analytical control theory with learning-based control within a unified control scheme is becoming increasingly promising and advantageous. In this paper, we address the problem of controlling an unstable first-order hyperbolic PDE with…
▽ More
Control of distributed parameter systems affected by delays is a challenging task, particularly when the delays depend on spatial variables. The idea of integrating analytical control theory with learning-based control within a unified control scheme is becoming increasingly promising and advantageous. In this paper, we address the problem of controlling an unstable first-order hyperbolic PDE with spatially-varying delays by combining PDE backstepping control strategies and deep reinforcement learning (RL). To eliminate the assumption on the delay function required for the backstepping design, we propose a soft actor-critic (SAC) architecture incorporating a DeepONet to approximate the backstepping controller. The DeepONet extracts features from the backstepping controller and feeds them into the policy network. In simulations, our algorithm outperforms the baseline SAC without prior backstepping knowledge and the analytical controller.
△ Less
Submitted 30 January, 2025;
originally announced January 2025.
-
Federated Learning Strategies for Coordinated Beamforming in Multicell ISAC
Authors:
Lai Jiang,
Kaitao Meng,
Murat Temiz,
Jiaming Hu,
Christos Masouros
Abstract:
We propose two cooperative beamforming frameworks based on federated learning (FL) for multi-cell integrated sensing and communications (ISAC) systems. Our objective is to address the following dilemma in multicell ISAC: 1) Beamforming strategies that rely solely on local channel information risk generating significant inter-cell interference (ICI), which degrades network performance for both comm…
▽ More
We propose two cooperative beamforming frameworks based on federated learning (FL) for multi-cell integrated sensing and communications (ISAC) systems. Our objective is to address the following dilemma in multicell ISAC: 1) Beamforming strategies that rely solely on local channel information risk generating significant inter-cell interference (ICI), which degrades network performance for both communication users and sensing receivers in neighboring cells; 2) conversely centralized beamforming strategies can mitigate ICI by leveraging global channel information, but they come with substantial transmission overhead and latency that can be prohibitive for latency-sensitive and source-constrained applications. To tackle these challenges, we first propose a partially decentralized training framework motivated by the vertical federated learning (VFL) paradigm. In this framework, the participating base stations (BSs) collaboratively design beamforming matrices under the guidance of a central server. The central server aggregates local information from the BSs and provides feedback, allowing BSs to implicitly manage ICI without accessing the global channel information. To make the solution scalable for densely deployed wireless networks, we take further steps to reduce communication overhead by presenting a fully decentralized design based on the horizontal federated learning (HFL). Specifically, we develop a novel loss function to control the interference leakage power, enabling a more efficient training process by entirely eliminating local channel information exchange. Numerical results show that the proposed solutions can achieve significant performance improvements comparable to the benchmarks in terms of both communication and radar information rates.
△ Less
Submitted 28 January, 2025;
originally announced January 2025.
-
Performance Analysis of Fluid Antenna Multiple Access Assisted Wireless Powered Communication Network
Authors:
Xiao Lin,
Yizhe Zhao,
Halvin Yang,
Jie Hu
Abstract:
This paper investigates a novel fluid antenna multiple access (FAMA)-assisted wireless powered communication network (WPCN), in which a hybrid access point (HAP) equipped with multiple fixed position antennas (FPAs) provides integrated data and energy transfer (IDET) services towards low-power devices that are equipped with a single fluid antenna (FA), while the low-power devices use harvested ene…
▽ More
This paper investigates a novel fluid antenna multiple access (FAMA)-assisted wireless powered communication network (WPCN), in which a hybrid access point (HAP) equipped with multiple fixed position antennas (FPAs) provides integrated data and energy transfer (IDET) services towards low-power devices that are equipped with a single fluid antenna (FA), while the low-power devices use harvested energy to power their own uplink transmission. Using the block correlation channel model, both the downlink and uplink wireless data transfer (WDT) outage probabilities are analyzed under specific port selection strategies, including downlink signal-to-interference ratio-based port selection (DSPS) strategy, downlink energy harvesting power-based port selection (DEPS) strategy, uplink signal-to-noise ratio-based port selection (USPS) strategy, and uplink channel-based port selection (UCPS) strategy. A step function approximation (SFA) approach is also relied upon to derive closed-form expressions for the outage probabilities, while the lower bounds for uplink WDT outage probabilities are also formulated. Numerical results demonstrate the validity of our theoretical analysis, which also provide useful guidelines for the system design through the analytical framework.
△ Less
Submitted 10 February, 2025; v1 submitted 23 January, 2025;
originally announced January 2025.
-
Joint Automatic Speech Recognition And Structure Learning For Better Speech Understanding
Authors:
Jiliang Hu,
Zuchao Li,
Mengjia Shen,
Haojun Ai,
Sheng Li,
Jun Zhang
Abstract:
Spoken language understanding (SLU) is a structure prediction task in the field of speech. Recently, many works on SLU that treat it as a sequence-to-sequence task have achieved great success. However, This method is not suitable for simultaneous speech recognition and understanding. In this paper, we propose a joint speech recognition and structure learning framework (JSRSL), an end-to-end SLU mo…
▽ More
Spoken language understanding (SLU) is a structure prediction task in the field of speech. Recently, many works on SLU that treat it as a sequence-to-sequence task have achieved great success. However, This method is not suitable for simultaneous speech recognition and understanding. In this paper, we propose a joint speech recognition and structure learning framework (JSRSL), an end-to-end SLU model based on span, which can accurately transcribe speech and extract structured content simultaneously. We conduct experiments on name entity recognition and intent classification using the Chinese dataset AISHELL-NER and the English dataset SLURP. The results show that our proposed method not only outperforms the traditional sequence-to-sequence method in both transcription and extraction capabilities but also achieves state-of-the-art performance on the two datasets.
△ Less
Submitted 17 January, 2025; v1 submitted 13 January, 2025;
originally announced January 2025.
-
FleSpeech: Flexibly Controllable Speech Generation with Various Prompts
Authors:
Hanzhao Li,
Yuke Li,
Xinsheng Wang,
Jingbin Hu,
Qicong Xie,
Shan Yang,
Lei Xie
Abstract:
Controllable speech generation methods typically rely on single or fixed prompts, hindering creativity and flexibility. These limitations make it difficult to meet specific user needs in certain scenarios, such as adjusting the style while preserving a selected speaker's timbre, or choosing a style and generating a voice that matches a character's visual appearance. To overcome these challenges, w…
▽ More
Controllable speech generation methods typically rely on single or fixed prompts, hindering creativity and flexibility. These limitations make it difficult to meet specific user needs in certain scenarios, such as adjusting the style while preserving a selected speaker's timbre, or choosing a style and generating a voice that matches a character's visual appearance. To overcome these challenges, we propose \textit{FleSpeech}, a novel multi-stage speech generation framework that allows for more flexible manipulation of speech attributes by integrating various forms of control. FleSpeech employs a multimodal prompt encoder that processes and unifies different text, audio, and visual prompts into a cohesive representation. This approach enhances the adaptability of speech synthesis and supports creative and precise control over the generated speech. Additionally, we develop a data collection pipeline for multimodal datasets to facilitate further research and applications in this field. Comprehensive subjective and objective experiments demonstrate the effectiveness of FleSpeech. Audio samples are available at https://kkksuper.github.io/FleSpeech/
△ Less
Submitted 30 April, 2025; v1 submitted 8 January, 2025;
originally announced January 2025.
-
Anti-bullying Adaptive Cruise Control: A proactive right-of-way protection approach
Authors:
Jia Hu,
Zhexi Lian,
Haoran Wang,
Zihan Zhang,
Ruoxi Qian,
Duo Li,
Jaehyun,
So,
Junnian Zheng
Abstract:
The current Adaptive Cruise Control (ACC) systems are vulnerable to "road bully" such as cut-ins. This paper proposed an Anti-bullying Adaptive Cruise Control (AACC) approach with proactive right-of-way protection ability. It bears the following features: i) with the enhanced capability of preventing bullying from cut-ins; ii) optimal but not unsafe; iii) adaptive to various driving styles of cut-…
▽ More
The current Adaptive Cruise Control (ACC) systems are vulnerable to "road bully" such as cut-ins. This paper proposed an Anti-bullying Adaptive Cruise Control (AACC) approach with proactive right-of-way protection ability. It bears the following features: i) with the enhanced capability of preventing bullying from cut-ins; ii) optimal but not unsafe; iii) adaptive to various driving styles of cut-in vehicles; iv) with real-time field implementation capability. The proposed approach can identify other road users' driving styles online and conduct game-based motion planning for right-of-way protection. A detailed investigation of the simulation results shows that the proposed approach can prevent bullying from cut-ins and be adaptive to different cut-in vehicles' driving styles. The proposed approach is capable of enhancing travel efficiency by up to 29.55% under different cut-in gaps and can strengthen driving safety compared with the current ACC controller. The proposed approach is flexible and robust against traffic congestion levels. It can improve mobility by up to 11.93% and robustness by 8.74% in traffic flow. Furthermore, the proposed approach can support real-time field implementation by ensuring less than 50 milliseconds computation time.
△ Less
Submitted 14 December, 2024;
originally announced December 2024.
-
Seamless Optical Cloud Computing across Edge-Metro Network for Generative AI
Authors:
Sizhe Xing,
Aolong Sun,
Chengxi Wang,
Yizhi Wang,
Boyu Dong,
Junhui Hu,
Xuyu Deng,
An Yan,
Yingjun Liu,
Fangchen Hu,
Zhongya Li,
Ouhan Huang,
Junhao Zhao,
Yingjun Zhou,
Ziwei Li,
Jianyang Shi,
Xi Xiao,
Richard Penty,
Qixiang Cheng,
Nan Chi,
Junwen Zhang
Abstract:
The rapid advancement of generative artificial intelligence (AI) in recent years has profoundly reshaped modern lifestyles, necessitating a revolutionary architecture to support the growing demands for computational power. Cloud computing has become the driving force behind this transformation. However, it consumes significant power and faces computation security risks due to the reliance on exten…
▽ More
The rapid advancement of generative artificial intelligence (AI) in recent years has profoundly reshaped modern lifestyles, necessitating a revolutionary architecture to support the growing demands for computational power. Cloud computing has become the driving force behind this transformation. However, it consumes significant power and faces computation security risks due to the reliance on extensive data centers and servers in the cloud. Reducing power consumption while enhancing computational scale remains persistent challenges in cloud computing. Here, we propose and experimentally demonstrate an optical cloud computing system that can be seamlessly deployed across edge-metro network. By modulating inputs and models into light, a wide range of edge nodes can directly access the optical computing center via the edge-metro network. The experimental validations show an energy efficiency of 118.6 mW/TOPs (tera operations per second), reducing energy consumption by two orders of magnitude compared to traditional electronic-based cloud computing solutions. Furthermore, it is experimentally validated that this architecture can perform various complex generative AI models through parallel computing to achieve image generation tasks.
△ Less
Submitted 1 May, 2025; v1 submitted 4 December, 2024;
originally announced December 2024.
-
Automated Driving with Evolution Capability: A Reinforcement Learning Method with Monotonic Performance Enhancement
Authors:
Jia Hu,
Xuerun Yan,
Tian Xu,
Haoran Wang
Abstract:
Reinforcement Learning (RL) offers a promising solution to enable evolutionary automated driving. However, the conventional RL method is always concerned with risk performance. The updated policy may not obtain a performance enhancement, even leading to performance deterioration. To address this challenge, this research proposes a High Confidence Policy Improvement Reinforcement Learning-based (HC…
▽ More
Reinforcement Learning (RL) offers a promising solution to enable evolutionary automated driving. However, the conventional RL method is always concerned with risk performance. The updated policy may not obtain a performance enhancement, even leading to performance deterioration. To address this challenge, this research proposes a High Confidence Policy Improvement Reinforcement Learning-based (HCPI-RL) planner. It is intended to achieve the monotonic evolution of automated driving. A novel RL policy update paradigm is designed to enable the newly learned policy performance consistently surpass that of previous policies, which is deemed as monotonic performance enhancement. Hence, the proposed HCPI-RL planner has the following features: i) Evolutionary automated driving with monotonic performance enhancement; ii) With the capability of handling scenarios with emergency; iii) With enhanced decision-making optimality. Results demonstrate that the proposed HCPI-RL planner enhances the policy return by 44.7% in emergent cut-in scenarios, 108.2% in emergent braking scenarios, and 64.4% in daily cruising scenarios, compared to the PPO planner. Adopting the proposed planner, automated driving efficiency is enhanced by 19.2% compared to the PPO planner, and by 30.7% compared to the rule-based planner.
△ Less
Submitted 14 December, 2024;
originally announced December 2024.
-
Neural Operator Feedback for a First-Order PIDE with Spatially-Varying State Delay
Authors:
Jie Qi,
Jiaqi Hu,
Jing Zhang,
Miroslav Krstic
Abstract:
A transport PDE with a spatial integral and recirculation with constant delay has been a benchmark for neural operator approximations of PDE backstepping controllers. Introducing a spatially-varying delay into the model gives rise to a gain operator defined through integral equations which the operator's input -- the varying delay function -- enters in previously unencountered manners, including i…
▽ More
A transport PDE with a spatial integral and recirculation with constant delay has been a benchmark for neural operator approximations of PDE backstepping controllers. Introducing a spatially-varying delay into the model gives rise to a gain operator defined through integral equations which the operator's input -- the varying delay function -- enters in previously unencountered manners, including in the limits of integration and as the inverse of the `delayED time' function. This, in turn, introduces novel mathematical challenges in estimating the operator's Lipschitz constant. The backstepping kernel function having two branches endows the feedback law with a two-branch structure, where only one of the two feedback branches depends on both of the kernel branches. For this rich feedback structure, we propose a neural operator approximation of such a two-branch feedback law and prove the approximator to be semiglobally practically stabilizing. With numerical results we illustrate the training of the neural operator and its stabilizing capability.
△ Less
Submitted 14 December, 2024; v1 submitted 11 December, 2024;
originally announced December 2024.
-
BATseg: Boundary-aware Multiclass Spinal Cord Tumor Segmentation on 3D MRI Scans
Authors:
Hongkang Song,
Zihui Zhang,
Yanpeng Zhou,
Jie Hu,
Zishuo Wang,
Hou Him Chan,
Chon Lok Lei,
Chen Xu,
Yu Xin,
Bo Yang
Abstract:
Spinal cord tumors significantly contribute to neurological morbidity and mortality. Precise morphometric quantification, encompassing the size, location, and type of such tumors, holds promise for optimizing treatment planning strategies. Although recent methods have demonstrated excellent performance in medical image segmentation, they primarily focus on discerning shapes with relatively large m…
▽ More
Spinal cord tumors significantly contribute to neurological morbidity and mortality. Precise morphometric quantification, encompassing the size, location, and type of such tumors, holds promise for optimizing treatment planning strategies. Although recent methods have demonstrated excellent performance in medical image segmentation, they primarily focus on discerning shapes with relatively large morphology such as brain tumors, ignoring the challenging problem of identifying spinal cord tumors which tend to have tiny sizes, diverse locations, and shapes. To tackle this hard problem of multiclass spinal cord tumor segmentation, we propose a new method, called BATseg, to learn a tumor surface distance field by applying our new multiclass boundary-aware loss function. To verify the effectiveness of our approach, we also introduce the first and large-scale spinal cord tumor dataset. It comprises gadolinium-enhanced T1-weighted 3D MRI scans from 653 patients and contains the four most common spinal cord tumor types: astrocytomas, ependymomas, hemangioblastomas, and spinal meningiomas. Extensive experiments on our dataset and another public kidney tumor segmentation dataset show that our proposed method achieves superior performance for multiclass tumor segmentation.
△ Less
Submitted 9 December, 2024;
originally announced December 2024.
-
Zero-Forget Preservation of Semantic Communication Alignment in Distributed AI Networks
Authors:
Jingzhi Hu,
Geoffrey Ye Li
Abstract:
Future communication networks are expected to connect massive distributed artificial intelligence (AI). Exploiting aligned priori knowledge of AI pairs, it is promising to convert high-dimensional data transmission into highly-compressed semantic communications (SC). However, to accommodate the local data distribution and user preferences, AIs generally adapt to different domains, which fundamenta…
▽ More
Future communication networks are expected to connect massive distributed artificial intelligence (AI). Exploiting aligned priori knowledge of AI pairs, it is promising to convert high-dimensional data transmission into highly-compressed semantic communications (SC). However, to accommodate the local data distribution and user preferences, AIs generally adapt to different domains, which fundamentally distorts the SC alignment. In this paper, we propose a zero-forget domain adaptation (ZFDA) framework to preserve SC alignment. To prevent the DA from changing substantial neural parameters of AI, we design sparse additive modifications (SAM) to the parameters, which can be efficiently stored and switched-off to restore the SC alignment. To optimize the SAM, we decouple it into tractable continuous variables and a binary mask, and then handle the binary mask by a score-based optimization. Experimental evaluations on a SC system for image transmissions validate that the proposed framework perfectly preserves the SC alignment with almost no loss of DA performance, even improved in some cases, at a cost of less than 1% of additional memory.
△ Less
Submitted 28 November, 2024;
originally announced November 2024.
-
EEG-DCNet: A Fast and Accurate MI-EEG Dilated CNN Classification Method
Authors:
Wei Peng,
Kang Liu,
Jiaxi Shi,
Jianchen Hu
Abstract:
The electroencephalography (EEG)-based motor imagery (MI) classification is a critical and challenging task in brain-computer interface (BCI) technology, which plays a significant role in assisting patients with functional impairments to regain mobility. We present a novel multi-scale atrous convolutional neural network (CNN) model called EEG-dilated convolution network (DCNet) to enhance the accu…
▽ More
The electroencephalography (EEG)-based motor imagery (MI) classification is a critical and challenging task in brain-computer interface (BCI) technology, which plays a significant role in assisting patients with functional impairments to regain mobility. We present a novel multi-scale atrous convolutional neural network (CNN) model called EEG-dilated convolution network (DCNet) to enhance the accuracy and efficiency of the EEG-based MI classification tasks. We incorporate the $1\times1$ convolutional layer and utilize the multi-branch parallel atrous convolutional architecture in EEG-DCNet to capture the highly nonlinear characteristics and multi-scale features of the EEG signals. Moreover, we utilize the sliding window to enhance the temporal consistency and utilize the attension mechanism to improve the accuracy of recognizing user intentions. The experimental results (via the BCI-IV-2a ,BCI-IV-2b and the High-Gamma datasets) show that EEG-DCNet outperforms existing state-of-the-art (SOTA) approaches in terms of classification accuracy and Kappa scores. Furthermore, since EEG-DCNet requires less number of parameters, the training efficiency and memory consumption are also improved. The experiment code is open-sourced at \href{https://github.com/Kanyooo/EEG-DCNet}{here}.
△ Less
Submitted 12 November, 2024;
originally announced November 2024.
-
LightLLM: A Versatile Large Language Model for Predictive Light Sensing
Authors:
Jiawei Hu,
Hong Jia,
Mahbub Hassan,
Lina Yao,
Brano Kusy,
Wen Hu
Abstract:
We propose LightLLM, a model that fine tunes pre-trained large language models (LLMs) for light-based sensing tasks. It integrates a sensor data encoder to extract key features, a contextual prompt to provide environmental information, and a fusion layer to combine these inputs into a unified representation. This combined input is then processed by the pre-trained LLM, which remains frozen while b…
▽ More
We propose LightLLM, a model that fine tunes pre-trained large language models (LLMs) for light-based sensing tasks. It integrates a sensor data encoder to extract key features, a contextual prompt to provide environmental information, and a fusion layer to combine these inputs into a unified representation. This combined input is then processed by the pre-trained LLM, which remains frozen while being fine-tuned through the addition of lightweight, trainable components, allowing the model to adapt to new tasks without altering its original parameters. This approach enables flexible adaptation of LLM to specialized light sensing tasks with minimal computational overhead and retraining effort. We have implemented LightLLM for three light sensing tasks: light-based localization, outdoor solar forecasting, and indoor solar estimation. Using real-world experimental datasets, we demonstrate that LightLLM significantly outperforms state-of-the-art methods, achieving 4.4x improvement in localization accuracy and 3.4x improvement in indoor solar estimation when tested in previously unseen environments. We further demonstrate that LightLLM outperforms ChatGPT-4 with direct prompting, highlighting the advantages of LightLLM's specialized architecture for sensor data fusion with textual prompts.
△ Less
Submitted 20 November, 2024;
originally announced November 2024.
-
Enhancing Medical Image Segmentation with Deep Learning and Diffusion Models
Authors:
Houze Liu,
Tong Zhou,
Yanlin Xiang,
Aoran Shen,
Jiacheng Hu,
Junliang Du
Abstract:
Medical image segmentation is crucial for accurate clinical diagnoses, yet it faces challenges such as low contrast between lesions and normal tissues, unclear boundaries, and high variability across patients. Deep learning has improved segmentation accuracy and efficiency, but it still relies heavily on expert annotations and struggles with the complexities of medical images. The small size of me…
▽ More
Medical image segmentation is crucial for accurate clinical diagnoses, yet it faces challenges such as low contrast between lesions and normal tissues, unclear boundaries, and high variability across patients. Deep learning has improved segmentation accuracy and efficiency, but it still relies heavily on expert annotations and struggles with the complexities of medical images. The small size of medical image datasets and the high cost of data acquisition further limit the performance of segmentation networks. Diffusion models, with their iterative denoising process, offer a promising alternative for better detail capture in segmentation. However, they face difficulties in accurately segmenting small targets and maintaining the precision of boundary details. This article discusses the importance of medical image segmentation, the limitations of current deep learning approaches, and the potential of diffusion models to address these challenges.
△ Less
Submitted 5 December, 2024; v1 submitted 21 November, 2024;
originally announced November 2024.
-
On Adapting Randomized Nyström Preconditioners to Accelerate Variational Image Reconstruction
Authors:
Tao Hong,
Zhaoyi Xu,
Jason Hu,
Jeffrey A. Fessler
Abstract:
Model-based iterative reconstruction plays a key role in solving inverse problems. However, the associated minimization problems are generally large-scale, ill-posed, nonsmooth, and sometimes even nonconvex, which present challenges in designing efficient iterative solvers and often prevent their practical use. Preconditioning methods can significantly accelerate the convergence of iterative metho…
▽ More
Model-based iterative reconstruction plays a key role in solving inverse problems. However, the associated minimization problems are generally large-scale, ill-posed, nonsmooth, and sometimes even nonconvex, which present challenges in designing efficient iterative solvers and often prevent their practical use. Preconditioning methods can significantly accelerate the convergence of iterative methods. In some applications, computing preconditioners on-the-fly is beneficial. Moreover, forward models in image reconstruction are typically represented as operators, and the corresponding explicit matrices are often unavailable, which brings additional challenges in designing preconditioners. Therefore, for practical use, computing and applying preconditioners should be computationally inexpensive. This paper adapts the randomized Nyström approximation to compute effective preconditioners that accelerate image reconstruction without requiring an explicit matrix for the forward model. We leverage modern GPU computational platforms to compute the preconditioner on-the-fly. Moreover, we propose efficient approaches for applying the preconditioner to problems with nonsmooth regularizers. Our numerical results on image deblurring, super-resolution with impulsive noise, and computed tomography reconstruction demonstrate the efficiency and effectiveness of the proposed preconditioner.
△ Less
Submitted 12 November, 2024;
originally announced November 2024.
-
Artistic Neural Style Transfer Algorithms with Activation Smoothing
Authors:
Xiangtian Li,
Han Cao,
Zhaoyang Zhang,
Jiacheng Hu,
Yuhui Jin,
Zihao Zhao
Abstract:
The works of Gatys et al. demonstrated the capability of Convolutional Neural Networks (CNNs) in creating artistic style images. This process of transferring content images in different styles is called Neural Style Transfer (NST). In this paper, we re-implement image-based NST, fast NST, and arbitrary NST. We also explore to utilize ResNet with activation smoothing in NST. Extensive experimental…
▽ More
The works of Gatys et al. demonstrated the capability of Convolutional Neural Networks (CNNs) in creating artistic style images. This process of transferring content images in different styles is called Neural Style Transfer (NST). In this paper, we re-implement image-based NST, fast NST, and arbitrary NST. We also explore to utilize ResNet with activation smoothing in NST. Extensive experimental results demonstrate that smoothing transformation can greatly improve the quality of stylization results.
△ Less
Submitted 12 November, 2024;
originally announced November 2024.