-
On secure UAV-aided ISCC systems
Authors:
Hongjiang Lei,
Congke Jiang,
Ki-Hong Park,
Mohamed A. Aboulhassan,
Sen Zhou,
Gaofeng Pan
Abstract:
Integrated communication and sensing, which can make full use of the limited spectrum resources to perform communication and sensing tasks simultaneously, is an up-and-coming technology in wireless communication networks. In this work, we investigate the secrecy performance of an uncrewed aerial vehicle (UAV)-assisted secure integrated communication, sensing, and computing system, where the UAV se…
▽ More
Integrated communication and sensing, which can make full use of the limited spectrum resources to perform communication and sensing tasks simultaneously, is an up-and-coming technology in wireless communication networks. In this work, we investigate the secrecy performance of an uncrewed aerial vehicle (UAV)-assisted secure integrated communication, sensing, and computing system, where the UAV sends radar signals to locate and disrupt potential eavesdroppers while providing offload services to ground users (GUs). Considering the constraints of UAV maximum speed, transmit power, and propulsion energy, as well as secure offloading, data transmission, and computation time, the total energy consumption of GUs is minimized by jointly optimizing user offloading ratio, user scheduling strategy, transmit beamforming, and UAV trajectory. An efficient iterative optimization algorithm is proposed to solve the non-convex optimization problem caused by tightly coupled dependent variables. In particular, the original optimization problem is decomposed into four sub-optimization problems, and the non-convex sub-problems are transformed into approximately convex forms via successive convex approximation. Then, all sub-problems are solved successively by using the block coordinate descent technique. Numerical results demonstrate the convergence and validate the effectiveness of the proposed algorithm.
△ Less
Submitted 27 June, 2025; v1 submitted 16 June, 2025;
originally announced June 2025.
-
Beamforming for Secure RSMA-Aided ISAC Systems
Authors:
Qian Dan,
Hongjiang Lei,
Ki-Hong Park,
Gaofeng Pan
Abstract:
This work investigates the physical layer security of rate-splitting multiple access (RSMA)-aided integrated communication and sensing (ISAC) systems. The ISAC base station (BS) transmits signals to communicate with users in an eavesdropped scenario and to estimate the parameters of the sensed targets. The research considers different sensing signals under RSMA technology and the Cram{é}r-Rao boun…
▽ More
This work investigates the physical layer security of rate-splitting multiple access (RSMA)-aided integrated communication and sensing (ISAC) systems. The ISAC base station (BS) transmits signals to communicate with users in an eavesdropped scenario and to estimate the parameters of the sensed targets. The research considers different sensing signals under RSMA technology and the Cram{é}r-Rao bound of the parameter estimation is utilized as the sensing metric. With the channel state information (CSI) of eavesdroppers known, the transmitting beam of the BS is optimized to maximize the energy efficiency in terms of the minimum user rate and secrecy capacity, considering the fairness among users and ensuring the sensing performance and communication security. With the CSI of eavesdroppers unknown, the transmitting beam of the BS is designed to minimize the energy consumption for sensing and communication, and the residual power is utilized for artificial noise, which is isotropically emitted to achieve interference with potential eavesdroppers. To solve the non-convex problems, three iterative algorithms based on successive convex approximation and penalty function are proposed. The simulation results illustrate the effectiveness of the proposed schemes.
△ Less
Submitted 4 June, 2025;
originally announced June 2025.
-
Efficient ANN-SNN Conversion with Error Compensation Learning
Authors:
Chang Liu,
Jiangrong Shen,
Xuming Ran,
Mingkun Xu,
Qi Xu,
Yi Xu,
Gang Pan
Abstract:
Artificial neural networks (ANNs) have demonstrated outstanding performance in numerous tasks, but deployment in resource-constrained environments remains a challenge due to their high computational and memory requirements. Spiking neural networks (SNNs) operate through discrete spike events and offer superior energy efficiency, providing a bio-inspired alternative. However, current ANN-to-SNN con…
▽ More
Artificial neural networks (ANNs) have demonstrated outstanding performance in numerous tasks, but deployment in resource-constrained environments remains a challenge due to their high computational and memory requirements. Spiking neural networks (SNNs) operate through discrete spike events and offer superior energy efficiency, providing a bio-inspired alternative. However, current ANN-to-SNN conversion often results in significant accuracy loss and increased inference time due to conversion errors such as clipping, quantization, and uneven activation. This paper proposes a novel ANN-to-SNN conversion framework based on error compensation learning. We introduce a learnable threshold clipping function, dual-threshold neurons, and an optimized membrane potential initialization strategy to mitigate the conversion error. Together, these techniques address the clipping error through adaptive thresholds, dynamically reduce the quantization error through dual-threshold neurons, and minimize the non-uniformity error by effectively managing the membrane potential. Experimental results on CIFAR-10, CIFAR-100, ImageNet datasets show that our method achieves high-precision and ultra-low latency among existing conversion methods. Using only two time steps, our method significantly reduces the inference time while maintains competitive accuracy of 94.75% on CIFAR-10 dataset under ResNet-18 structure. This research promotes the practical application of SNNs on low-power hardware, making efficient real-time processing possible.
△ Less
Submitted 12 May, 2025;
originally announced June 2025.
-
Spiking Neural Networks with Temporal Attention-Guided Adaptive Fusion for imbalanced Multi-modal Learning
Authors:
Jiangrong Shen,
Yulin Xie,
Qi Xu,
Gang Pan,
Huajin Tang,
Badong Chen
Abstract:
Multimodal spiking neural networks (SNNs) hold significant potential for energy-efficient sensory processing but face critical challenges in modality imbalance and temporal misalignment. Current approaches suffer from uncoordinated convergence speeds across modalities and static fusion mechanisms that ignore time-varying cross-modal interactions. We propose the temporal attention-guided adaptive f…
▽ More
Multimodal spiking neural networks (SNNs) hold significant potential for energy-efficient sensory processing but face critical challenges in modality imbalance and temporal misalignment. Current approaches suffer from uncoordinated convergence speeds across modalities and static fusion mechanisms that ignore time-varying cross-modal interactions. We propose the temporal attention-guided adaptive fusion framework for multimodal SNNs with two synergistic innovations: 1) The Temporal Attention-guided Adaptive Fusion (TAAF) module that dynamically assigns importance scores to fused spiking features at each timestep, enabling hierarchical integration of temporally heterogeneous spike-based features; 2) The temporal adaptive balanced fusion loss that modulates learning rates per modality based on the above attention scores, preventing dominant modalities from monopolizing optimization. The proposed framework implements adaptive fusion, especially in the temporal dimension, and alleviates the modality imbalance during multimodal learning, mimicking cortical multisensory integration principles. Evaluations on CREMA-D, AVE, and EAD datasets demonstrate state-of-the-art performance (77.55\%, 70.65\% and 97.5\%accuracy, respectively) with energy efficiency. The system resolves temporal misalignment through learnable time-warping operations and faster modality convergence coordination than baseline SNNs. This work establishes a new paradigm for temporally coherent multimodal learning in neuromorphic systems, bridging the gap between biological sensory processing and efficient machine intelligence.
△ Less
Submitted 20 May, 2025;
originally announced May 2025.
-
Bridging Quantized Artificial Neural Networks and Neuromorphic Hardware
Authors:
Zhenhui Chen,
Haoran Xu,
Yangfan Hu,
Xiaofei Jin,
Xinyu Li,
Ziyang Kang,
Gang Pan,
De Ma
Abstract:
Neuromorphic hardware aims to leverage distributed computing and event-driven circuit design to achieve an energy-efficient AI system. The name "neuromorphic" is derived from its spiking and local computing nature, which mimics the fundamental activity of an animal's nervous system. In neuromorphic hardware, neurons, i.e., computing cores use single-bit, event-driven data (called spikes) for inter…
▽ More
Neuromorphic hardware aims to leverage distributed computing and event-driven circuit design to achieve an energy-efficient AI system. The name "neuromorphic" is derived from its spiking and local computing nature, which mimics the fundamental activity of an animal's nervous system. In neuromorphic hardware, neurons, i.e., computing cores use single-bit, event-driven data (called spikes) for inter-communication, which differs substantially from conventional hardware. To leverage the advantages of neuromorphic hardware and implement a computing model, the conventional approach is to build spiking neural networks (SNNs). SNNs replace the nonlinearity part of artificial neural networks (ANNs) in the realm of deep learning with spiking neurons, where the spiking neuron mimics the basic behavior of bio-neurons. However, there is still a performance gap between SNNs and their ANN counterparts. In this paper, we explore a new way to map computing models onto neuromorphic hardware. We propose a Spiking-Driven ANN (SDANN) framework that directly implements quantized ANN on hardware, eliminating the need for tuning the trainable parameters or any performance degradation. With the power of quantized ANN, our SDANN ensures a lower bound of implementation performance on neuromorphic hardware. To address the limitation of bit width support on hardware, we propose bias calibration and scaled integration methods. Experiments on various tasks demonstrate that our SDANN achieves exactly the same accuracy as the quantized ANN. Beyond toy examples and software implementation, we successfully deployed and validated our spiking models on real neuromorphic hardware, demonstrating the feasibility of the SDANN framework.
△ Less
Submitted 22 June, 2025; v1 submitted 17 May, 2025;
originally announced May 2025.
-
NTIRE 2025 Challenge on Efficient Burst HDR and Restoration: Datasets, Methods, and Results
Authors:
Sangmin Lee,
Eunpil Park,
Angel Canelo,
Hyunhee Park,
Youngjo Kim,
Hyung-Ju Chun,
Xin Jin,
Chongyi Li,
Chun-Le Guo,
Radu Timofte,
Qi Wu,
Tianheng Qiu,
Yuchun Dong,
Shenglin Ding,
Guanghua Pan,
Weiyu Zhou,
Tao Hu,
Yixu Feng,
Duwei Dai,
Yu Cao,
Peng Wu,
Wei Dong,
Yanning Zhang,
Qingsen Yan,
Simon J. Larsen
, et al. (11 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2025 Efficient Burst HDR and Restoration Challenge, which aims to advance efficient multi-frame high dynamic range (HDR) and restoration techniques. The challenge is based on a novel RAW multi-frame fusion dataset, comprising nine noisy and misaligned RAW frames with various exposure levels per scene. Participants were tasked with developing solutions capable of effect…
▽ More
This paper reviews the NTIRE 2025 Efficient Burst HDR and Restoration Challenge, which aims to advance efficient multi-frame high dynamic range (HDR) and restoration techniques. The challenge is based on a novel RAW multi-frame fusion dataset, comprising nine noisy and misaligned RAW frames with various exposure levels per scene. Participants were tasked with developing solutions capable of effectively fusing these frames while adhering to strict efficiency constraints: fewer than 30 million model parameters and a computational budget under 4.0 trillion FLOPs. A total of 217 participants registered, with six teams finally submitting valid solutions. The top-performing approach achieved a PSNR of 43.22 dB, showcasing the potential of novel methods in this domain. This paper provides a comprehensive overview of the challenge, compares the proposed solutions, and serves as a valuable reference for researchers and practitioners in efficient burst HDR and restoration.
△ Less
Submitted 17 May, 2025;
originally announced May 2025.
-
Bidirectional Distillation: A Mixed-Play Framework for Multi-Agent Generalizable Behaviors
Authors:
Lang Feng,
Jiahao Lin,
Dong Xing,
Li Zhang,
De Ma,
Gang Pan
Abstract:
Population-population generalization is a challenging problem in multi-agent reinforcement learning (MARL), particularly when agents encounter unseen co-players. However, existing self-play-based methods are constrained by the limitation of inside-space generalization. In this study, we propose Bidirectional Distillation (BiDist), a novel mixed-play framework, to overcome this limitation in MARL.…
▽ More
Population-population generalization is a challenging problem in multi-agent reinforcement learning (MARL), particularly when agents encounter unseen co-players. However, existing self-play-based methods are constrained by the limitation of inside-space generalization. In this study, we propose Bidirectional Distillation (BiDist), a novel mixed-play framework, to overcome this limitation in MARL. BiDist leverages knowledge distillation in two alternating directions: forward distillation, which emulates the historical policies' space and creates an implicit self-play, and reverse distillation, which systematically drives agents towards novel distributions outside the known policy space in a non-self-play manner. In addition, BiDist operates as a concise and efficient solution without the need for the complex and costly storage of past policies. We provide both theoretical analysis and empirical evidence to support BiDist's effectiveness. Our results highlight its remarkable generalization ability across a variety of cooperative, competitive, and social dilemma tasks, and reveal that BiDist significantly diversifies the policy distribution space. We also present comprehensive ablation studies to reinforce BiDist's effectiveness and key success factors. Source codes are available in the supplementary material.
△ Less
Submitted 16 May, 2025;
originally announced May 2025.
-
Large Wireless Localization Model (LWLM): A Foundation Model for Positioning in 6G Networks
Authors:
Guangjin Pan,
Kaixuan Huang,
Hui Chen,
Shunqing Zhang,
Christian Häger,
Henk Wymeersch
Abstract:
Accurate and robust localization is a critical enabler for emerging 5G and 6G applications, including autonomous driving, extended reality (XR), and smart manufacturing. While data-driven approaches have shown promise, most existing models require large amounts of labeled data and struggle to generalize across deployment scenarios and wireless configurations. To address these limitations, we propo…
▽ More
Accurate and robust localization is a critical enabler for emerging 5G and 6G applications, including autonomous driving, extended reality (XR), and smart manufacturing. While data-driven approaches have shown promise, most existing models require large amounts of labeled data and struggle to generalize across deployment scenarios and wireless configurations. To address these limitations, we propose a foundation-model-based solution tailored for wireless localization. We first analyze how different self-supervised learning (SSL) tasks acquire general-purpose and task-specific semantic features based on information bottleneck (IB) theory. Building on this foundation, we design a pretraining methodology for the proposed Large Wireless Localization Model (LWLM). Specifically, we propose an SSL framework that jointly optimizes three complementary objectives: (i) spatial-frequency masked channel modeling (SF-MCM), (ii) domain-transformation invariance (DTI), and (iii) position-invariant contrastive learning (PICL). These objectives jointly capture the underlying semantics of wireless channel from multiple perspectives. We further design lightweight decoders for key downstream tasks, including time-of-arrival (ToA) estimation, angle-of-arrival (AoA) estimation, single base station (BS) localization, and multiple BS localization. Comprehensive experimental results confirm that LWLM consistently surpasses both model-based and supervised learning baselines across all localization tasks. In particular, LWLM achieves 26.0%--87.5% improvement over transformer models without pretraining, and exhibits strong generalization under label-limited fine-tuning and unseen BS configurations, confirming its potential as a foundation model for wireless localization.
△ Less
Submitted 15 May, 2025;
originally announced May 2025.
-
Human-like Cognitive Generalization for Large Models via Brain-in-the-loop Supervision
Authors:
Jiaxuan Chen,
Yu Qi,
Yueming Wang,
Gang Pan
Abstract:
Recent advancements in deep neural networks (DNNs), particularly large-scale language models, have demonstrated remarkable capabilities in image and natural language understanding. Although scaling up model parameters with increasing volume of training data has progressively improved DNN capabilities, achieving complex cognitive abilities - such as understanding abstract concepts, reasoning, and a…
▽ More
Recent advancements in deep neural networks (DNNs), particularly large-scale language models, have demonstrated remarkable capabilities in image and natural language understanding. Although scaling up model parameters with increasing volume of training data has progressively improved DNN capabilities, achieving complex cognitive abilities - such as understanding abstract concepts, reasoning, and adapting to novel scenarios, which are intrinsic to human cognition - remains a major challenge. In this study, we show that brain-in-the-loop supervised learning, utilizing a small set of brain signals, can effectively transfer human conceptual structures to DNNs, significantly enhancing their comprehension of abstract and even unseen concepts. Experimental results further indicate that the enhanced cognitive capabilities lead to substantial performance gains in challenging tasks, including few-shot/zero-shot learning and out-of-distribution recognition, while also yielding highly interpretable concept representations. These findings highlight that human-in-the-loop supervision can effectively augment the complex cognitive abilities of large models, offering a promising pathway toward developing more human-like cognitive abilities in artificial systems.
△ Less
Submitted 13 May, 2025;
originally announced May 2025.
-
Dual-UAV-Enabled Secure Communication and Sensing for A2G-ISAC Systems with Maneuverable Jamming
Authors:
Libiao Lou,
Yuan Liu,
Fotis Foukalas,
Hongjiang Lei,
Gaofeng Pan,
Theodoros A. Tsiftsis,
Hongwu Liu
Abstract:
In this paper, we propose a dual-unmanned aerial vehicle (UAV)-enabled secure communication and sensing (SCS) scheme for an air-to-ground integrated sensing and communication (ISAC) system, in which a dual-functional source UAV and jamming UAV collaborate to enhance both the secure communication and target sensing performance. From a perspective of hybrid monostatitc-bistatic radar, the jamming UA…
▽ More
In this paper, we propose a dual-unmanned aerial vehicle (UAV)-enabled secure communication and sensing (SCS) scheme for an air-to-ground integrated sensing and communication (ISAC) system, in which a dual-functional source UAV and jamming UAV collaborate to enhance both the secure communication and target sensing performance. From a perspective of hybrid monostatitc-bistatic radar, the jamming UAV maneuvers to aid the source UAV to detect multiple ground targets by emitting artificial noise, meanwhile interfering with the ground eavesdropper. Residual interference is considered to reflect the effects of imperfect successive interference cancellation (SIC) on the receive signal-plus-interference-to-noise ratios, which results in a degraded system performance. To maximize the average secrecy rate (ASR), the dual-UAV trajectory and dual-UAV beamforming are jointly optimized subject to the transmit power budget, UAV maneuvering constraint, and sensing requirements. To tackle the highly complicated non-convex ASR maximization problem, the dual-UAV trajectory and dual-UAV beamforming are optimized for the secure communication (SC) purpose and the SCS purpose, sequentially. In the SC phase, a block coordinate descent algorithm is proposed to optimize the dual-UAV trajectory and dual-UAV beamforming iteratively, using the trust-region successive convex approximation (SCA) and semidefinite relaxation (SDR) techniques. Then, a weighted distance minimization problem is formulated to determine the dual-UAV maneuvering positions suitable for the SCS purpose, which is solved by a heuristic greedy algorithm, followed by the joint optimization of source beamforming and jamming beamforming.
△ Less
Submitted 18 May, 2025; v1 submitted 13 May, 2025;
originally announced May 2025.
-
Hybrid Spiking Vision Transformer for Object Detection with Event Cameras
Authors:
Qi Xu,
Jie Deng,
Jiangrong Shen,
Biwu Chen,
Huajin Tang,
Gang Pan
Abstract:
Event-based object detection has gained increasing attention due to its advantages such as high temporal resolution, wide dynamic range, and asynchronous address-event representation. Leveraging these advantages, Spiking Neural Networks (SNNs) have emerged as a promising approach, offering low energy consumption and rich spatiotemporal dynamics. To further enhance the performance of event-based ob…
▽ More
Event-based object detection has gained increasing attention due to its advantages such as high temporal resolution, wide dynamic range, and asynchronous address-event representation. Leveraging these advantages, Spiking Neural Networks (SNNs) have emerged as a promising approach, offering low energy consumption and rich spatiotemporal dynamics. To further enhance the performance of event-based object detection, this study proposes a novel hybrid spike vision Transformer (HsVT) model. The HsVT model integrates a spatial feature extraction module to capture local and global features, and a temporal feature extraction module to model time dependencies and long-term patterns in event sequences. This combination enables HsVT to capture spatiotemporal features, improving its capability to handle complex event-based object detection tasks. To support research in this area, we developed and publicly released The Fall Detection Dataset as a benchmark for event-based object detection tasks. This dataset, captured using an event-based camera, ensures facial privacy protection and reduces memory usage due to the event representation format. We evaluated the HsVT model on GEN1 and Fall Detection datasets across various model sizes. Experimental results demonstrate that HsVT achieves significant performance improvements in event detection with fewer parameters.
△ Less
Submitted 12 May, 2025;
originally announced May 2025.
-
TS-SNN: Temporal Shift Module for Spiking Neural Networks
Authors:
Kairong Yu,
Tianqing Zhang,
Qi Xu,
Gang Pan,
Hongwei Wang
Abstract:
Spiking Neural Networks (SNNs) are increasingly recognized for their biological plausibility and energy efficiency, positioning them as strong alternatives to Artificial Neural Networks (ANNs) in neuromorphic computing applications. SNNs inherently process temporal information by leveraging the precise timing of spikes, but balancing temporal feature utilization with low energy consumption remains…
▽ More
Spiking Neural Networks (SNNs) are increasingly recognized for their biological plausibility and energy efficiency, positioning them as strong alternatives to Artificial Neural Networks (ANNs) in neuromorphic computing applications. SNNs inherently process temporal information by leveraging the precise timing of spikes, but balancing temporal feature utilization with low energy consumption remains a challenge. In this work, we introduce Temporal Shift module for Spiking Neural Networks (TS-SNN), which incorporates a novel Temporal Shift (TS) module to integrate past, present, and future spike features within a single timestep via a simple yet effective shift operation. A residual combination method prevents information loss by integrating shifted and original features. The TS module is lightweight, requiring only one additional learnable parameter, and can be seamlessly integrated into existing architectures with minimal additional computational cost. TS-SNN achieves state-of-the-art performance on benchmarks like CIFAR-10 (96.72\%), CIFAR-100 (80.28\%), and ImageNet (70.61\%) with fewer timesteps, while maintaining low energy consumption. This work marks a significant step forward in developing efficient and accurate SNN architectures.
△ Less
Submitted 16 May, 2025; v1 submitted 7 May, 2025;
originally announced May 2025.
-
Rate-Limited Closed-Loop Distributed ISAC Systems: An Autoencoder Approach
Authors:
Guangjin Pan,
Zhixing Li,
Ayça Özçelikkale,
Christian Häger,
Musa Furkan Keskin,
Henk Wymeersch
Abstract:
In closed-loop distributed multi-sensor integrated sensing and communication (ISAC) systems, performance often hinges on transmitting high-dimensional sensor observations over rate-limited networks. In this paper, we first present a general framework for rate-limited closed-loop distributed ISAC systems, and then propose an autoencoder-based observation compression method to overcome the constrain…
▽ More
In closed-loop distributed multi-sensor integrated sensing and communication (ISAC) systems, performance often hinges on transmitting high-dimensional sensor observations over rate-limited networks. In this paper, we first present a general framework for rate-limited closed-loop distributed ISAC systems, and then propose an autoencoder-based observation compression method to overcome the constraints imposed by limited transmission capacity. Building on this framework, we conduct a case study using a closed-loop linear quadratic regulator (LQR) system to analyze how the interplay among observation, compression, and state dimensions affects reconstruction accuracy, state estimation error, and control performance. In multi-sensor scenarios, our results further show that optimal resource allocation initially prioritizes low-noise sensors until the compression becomes lossless, after which resources are reallocated to high-noise sensors.
△ Less
Submitted 3 May, 2025;
originally announced May 2025.
-
A Unified QoS-Aware Multiplexing Framework for Next Generation Immersive Communication with Legacy Wireless Applications
Authors:
Jihong Li,
Shunqing Zhang,
Tao Yu,
Guangjin Pan,
Kaixuan Huang,
Xiaojing Chen,
Yanzan Sun,
Junyu Liu,
Jiandong Li,
Derrick Wing Kwan Ng
Abstract:
Immersive communication, including emerging augmented reality, virtual reality, and holographic telepresence, has been identified as a key service for enabling next-generation wireless applications. To align with legacy wireless applications, such as enhanced mobile broadband or ultra-reliable low-latency communication, network slicing has been widely adopted. However, attempting to statistically…
▽ More
Immersive communication, including emerging augmented reality, virtual reality, and holographic telepresence, has been identified as a key service for enabling next-generation wireless applications. To align with legacy wireless applications, such as enhanced mobile broadband or ultra-reliable low-latency communication, network slicing has been widely adopted. However, attempting to statistically isolate the above types of wireless applications through different network slices may lead to throughput degradation and increased queue backlog. To address these challenges, we establish a unified QoS-aware framework that supports immersive communication and legacy wireless applications simultaneously. Based on the Lyapunov drift theorem, we transform the original long-term throughput maximization problem into an equivalent short-term throughput maximization weighted by virtual queue length. Moreover, to cope with the challenges introduced by the interaction between large-timescale network slicing and short-timescale resource allocation, we propose an adaptive adversarial slicing (Ad2S) scheme for networks with invarying channel statistics. To track the network channel variations, we also propose a measurement extrapolation-Kalman filter (ME-KF)-based method and refine our scheme into Ad2S-non-stationary refinement (Ad2S-NR). Through extended numerical examples, we demonstrate that our proposed schemes achieve 3.86 Mbps throughput improvement and 63.96% latency reduction with 24.36% convergence time reduction. Within our framework, the trade-off between total throughput and user service experience can be achieved by tuning systematic parameters.
△ Less
Submitted 2 May, 2025; v1 submitted 30 April, 2025;
originally announced April 2025.
-
UNILoc: Unified Localization Combining Model-Based Geometry and Unsupervised Learning
Authors:
Yuhao Zhang,
Guangjin Pan,
Musa Furkan Keskin,
Ossi Kaltiokallio,
Mikko Valkama,
Henk Wymeersch
Abstract:
Accurate mobile device localization is critical for emerging 5G/6G applications such as autonomous vehicles and augmented reality. In this paper, we propose a unified localization method that integrates model-based and machine learning (ML)-based methods to reap their respective advantages by exploiting available map information. In order to avoid supervised learning, we generate training labels a…
▽ More
Accurate mobile device localization is critical for emerging 5G/6G applications such as autonomous vehicles and augmented reality. In this paper, we propose a unified localization method that integrates model-based and machine learning (ML)-based methods to reap their respective advantages by exploiting available map information. In order to avoid supervised learning, we generate training labels automatically via optimal transport (OT) by fusing geometric estimates with building layouts. Ray-tracing based simulations are carried out to demonstrate that the proposed method significantly improves positioning accuracy for both line-of-sight (LoS) users (compared to ML-based methods) and non-line-of-sight (NLoS) users (compared to model-based methods). Remarkably, the unified method is able to achieve competitive overall performance with the fully-supervised fingerprinting, while eliminating the need for cumbersome labeled data measurement and collection.
△ Less
Submitted 27 April, 2025; v1 submitted 24 April, 2025;
originally announced April 2025.
-
GNN-ACLP: Graph Neural Networks based Analog Circuit Link Prediction
Authors:
Guanyuan Pan,
Tiansheng Zhou,
Bingtao Ma,
Yaqi Wang,
Jianxiang Zhao,
Zhi Li,
Yugui Lin,
Pietro Lio,
Shuai Wang
Abstract:
Circuit link prediction identifying missing component connections from incomplete netlists is crucial in automating analog circuit design. However, existing methods face three main challenges: 1) Insufficient use of topological patterns in circuit graphs reduces prediction accuracy; 2) Data scarcity due to the complexity of annotations hinders model generalization; 3) Limited adaptability to vario…
▽ More
Circuit link prediction identifying missing component connections from incomplete netlists is crucial in automating analog circuit design. However, existing methods face three main challenges: 1) Insufficient use of topological patterns in circuit graphs reduces prediction accuracy; 2) Data scarcity due to the complexity of annotations hinders model generalization; 3) Limited adaptability to various netlist formats. We propose GNN-ACLP, a Graph Neural Networks (GNNs) based framework featuring three innovations to tackle these challenges. First, we introduce the SEAL (Subgraphs, Embeddings, and Attributes for Link Prediction) framework and achieve port-level accuracy in circuit link prediction. Second, we propose Netlist Babel Fish, a netlist format conversion tool leveraging retrieval-augmented generation (RAG) with a large language model (LLM) to enhance the compatibility of netlist formats. Finally, we construct SpiceNetlist, a comprehensive dataset that contains 775 annotated circuits across 10 different component classes. Experimental results achieve accuracy improvements of 16.08% on SpiceNetlist, 11.38% on Image2Net, and 16.01% on Masala-CHAI in intra-dataset evaluation, while maintaining accuracy from 92.05% to 99.07% in cross-dataset evaluation, exhibiting robust feature transfer capabilities.
△ Less
Submitted 18 May, 2025; v1 submitted 14 April, 2025;
originally announced April 2025.
-
Potential Indicator for Continuous Emotion Arousal by Dynamic Neural Synchrony
Authors:
Guandong Pan,
Zhaobang Wu,
Yaqian Yang,
Xin Wang,
Longzhao Liu,
Zhiming Zheng,
Shaoting Tang
Abstract:
The need for automatic and high-quality emotion annotation is paramount in applications such as continuous emotion recognition and video highlight detection, yet achieving this through manual human annotations is challenging. Inspired by inter-subject correlation (ISC) utilized in neuroscience, this study introduces a novel Electroencephalography (EEG) based ISC methodology that leverages a single…
▽ More
The need for automatic and high-quality emotion annotation is paramount in applications such as continuous emotion recognition and video highlight detection, yet achieving this through manual human annotations is challenging. Inspired by inter-subject correlation (ISC) utilized in neuroscience, this study introduces a novel Electroencephalography (EEG) based ISC methodology that leverages a single-electrode and feature-based dynamic approach. Our contributions are three folds. Firstly, we reidentify two potent emotion features suitable for classifying emotions-first-order difference (FD) an differential entropy (DE). Secondly, through the use of overall correlation analysis, we demonstrate the heterogeneous synchronized performance of electrodes. This performance aligns with neural emotion patterns established in prior studies, thus validating the effectiveness of our approach. Thirdly, by employing a sliding window correlation technique, we showcase the significant consistency of dynamic ISCs across various features or key electrodes in each analyzed film clip. Our findings indicate the method's reliability in capturing consistent, dynamic shared neural synchrony among individuals, triggered by evocative film stimuli. This underscores the potential of our approach to serve as an indicator of continuous human emotion arousal. The implications of this research are significant for advancements in affective computing and the broader neuroscience field, suggesting a streamlined and effective tool for emotion analysis in real-world applications.
△ Less
Submitted 23 January, 2025;
originally announced April 2025.
-
PP-FormulaNet: Bridging Accuracy and Efficiency in Advanced Formula Recognition
Authors:
Hongen Liu,
Cheng Cui,
Yuning Du,
Yi Liu,
Gang Pan
Abstract:
Formula recognition is an important task in document intelligence. It involves converting mathematical expressions from document images into structured symbolic formats that computers can easily work with. LaTeX is the most common format used for this purpose. In this work, we present PP-FormulaNet, a state-of-the-art formula recognition model that excels in both accuracy and efficiency. To meet t…
▽ More
Formula recognition is an important task in document intelligence. It involves converting mathematical expressions from document images into structured symbolic formats that computers can easily work with. LaTeX is the most common format used for this purpose. In this work, we present PP-FormulaNet, a state-of-the-art formula recognition model that excels in both accuracy and efficiency. To meet the diverse needs of applications, we have developed two specialized models: PP-FormulaNet-L, tailored for high-accuracy scenarios, and PP-FormulaNet-S, optimized for high-efficiency contexts. Our extensive evaluations reveal that PP-FormulaNet-L attains accuracy levels that surpass those of prominent models such as UniMERNet by a significant 6%. Conversely, PP-FormulaNet-S operates at speeds that are over 16 times faster. These advancements facilitate seamless integration of PP-FormulaNet into a broad spectrum of document processing environments that involve intricate mathematical formulas. Furthermore, we introduce a Formula Mining System, which is capable of extracting a vast amount of high-quality formula data. This system further enhances the robustness and applicability of our formula recognition model. Code and models are publicly available at PaddleOCR(https://github.com/PaddlePaddle/PaddleOCR) and PaddleX(https://github.com/PaddlePaddle/PaddleX).
△ Less
Submitted 24 March, 2025;
originally announced March 2025.
-
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
Authors:
Juntao Dai,
Taiye Chen,
Yaodong Yang,
Qian Zheng,
Gang Pan
Abstract:
Reinforcement learning from human feedback (RLHF) is an effective method for aligning large language models (LLMs) with human values. However, reward over-optimization remains an open challenge leading to discrepancies between the performance of LLMs under the reward model and the true human objectives. A primary contributor to reward over-optimization is the extrapolation error that arises when t…
▽ More
Reinforcement learning from human feedback (RLHF) is an effective method for aligning large language models (LLMs) with human values. However, reward over-optimization remains an open challenge leading to discrepancies between the performance of LLMs under the reward model and the true human objectives. A primary contributor to reward over-optimization is the extrapolation error that arises when the reward model evaluates out-of-distribution (OOD) responses. However, current methods still fail to prevent the increasing frequency of OOD response generation during the reinforcement learning (RL) process and are not effective at handling extrapolation errors from OOD responses. In this work, we propose the Behavior-Supported Policy Optimization (BSPO) method to mitigate the reward over-optimization issue. Specifically, we define behavior policy as the next token distribution of the reward training dataset to model the in-distribution (ID) region of the reward model. Building on this, we introduce the behavior-supported Bellman operator to regularize the value function, penalizing all OOD values without impacting the ID ones. Consequently, BSPO reduces the generation of OOD responses during the RL process, thereby avoiding overestimation caused by the reward model's extrapolation errors. Theoretically, we prove that BSPO guarantees a monotonic improvement of the supported policy until convergence to the optimal behavior-supported policy. Empirical results from extensive experiments show that BSPO outperforms baselines in preventing reward over-optimization due to OOD evaluation and finding the optimal ID policy.
△ Less
Submitted 23 March, 2025;
originally announced March 2025.
-
FpgaHub: Fpga-centric Hyper-heterogeneous Computing Platform for Big Data Analytics
Authors:
Zeke Wang,
Jie Zhang,
Hongjing Huang,
Yingtao Li,
Xueying Zhu,
Mo Sun,
Zihan Yang,
De Ma,
Huajing Tang,
Gang Pan,
Fei Wu,
Bingsheng He,
Gustavo Alonso
Abstract:
Modern data analytics requires a huge amount of computing power and processes a massive amount of data. At the same time, the underlying computing platform is becoming much more heterogeneous on both hardware and software. Even though specialized hardware, e.g., FPGA- or GPU- or TPU-based systems, often achieves better performance than a CPU-only system due to the slowing of Moore's law, such syst…
▽ More
Modern data analytics requires a huge amount of computing power and processes a massive amount of data. At the same time, the underlying computing platform is becoming much more heterogeneous on both hardware and software. Even though specialized hardware, e.g., FPGA- or GPU- or TPU-based systems, often achieves better performance than a CPU-only system due to the slowing of Moore's law, such systems are limited in what they can do. For example, GPU-only approaches suffer from severe IO limitations. To truly exploit the potential of hardware heterogeneity, we present FpgaHub, an FPGA-centric hyper-heterogeneous computing platform for big data analytics. The key idea of FpgaHub is to use reconfigurable computing to implement a versatile hub complementing other processors (CPUs, GPUs, DPUs, programmable switches, computational storage, etc.). Using an FPGA as the basis, we can take advantage of its highly reconfigurable nature and rich IO interfaces such as PCIe, networking, and on-board memory, to place it at the center of the architecture and use it as a data and control plane for data movement, scheduling, pre-processing, etc. FpgaHub enables architectural flexibility to allow exploring the rich design space of heterogeneous computing platforms.
△ Less
Submitted 12 March, 2025;
originally announced March 2025.
-
Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency
Authors:
Jiangrong Shen,
Qi Xu,
Gang Pan,
Badong Chen
Abstract:
The human brain utilizes spikes for information transmission and dynamically reorganizes its network structure to boost energy efficiency and cognitive capabilities throughout its lifespan. Drawing inspiration from this spike-based computation, Spiking Neural Networks (SNNs) have been developed to construct event-driven models that emulate this efficiency. Despite these advances, deep SNNs continu…
▽ More
The human brain utilizes spikes for information transmission and dynamically reorganizes its network structure to boost energy efficiency and cognitive capabilities throughout its lifespan. Drawing inspiration from this spike-based computation, Spiking Neural Networks (SNNs) have been developed to construct event-driven models that emulate this efficiency. Despite these advances, deep SNNs continue to suffer from over-parameterization during training and inference, a stark contrast to the brain's ability to self-organize. Furthermore, existing sparse SNNs are challenged by maintaining optimal pruning levels due to a static pruning ratio, resulting in either under- or over-pruning. In this paper, we propose a novel two-stage dynamic structure learning approach for deep SNNs, aimed at maintaining effective sparse training from scratch while optimizing compression efficiency. The first stage evaluates the compressibility of existing sparse subnetworks within SNNs using the PQ index, which facilitates an adaptive determination of the rewiring ratio for synaptic connections based on data compression insights. In the second stage, this rewiring ratio critically informs the dynamic synaptic connection rewiring process, including both pruning and regrowth. This approach significantly improves the exploration of sparse structure training in deep SNNs, adapting sparsity dynamically from the point view of compression efficiency. Our experiments demonstrate that this sparse training approach not only aligns with the performance of current deep SNNs models but also significantly improves the efficiency of compressing sparse SNNs. Crucially, it preserves the advantages of initiating training with sparse models and offers a promising solution for implementing edge AI on neuromorphic hardware.
△ Less
Submitted 19 February, 2025;
originally announced February 2025.
-
Spiking Neural Networks for Temporal Processing: Status Quo and Future Prospects
Authors:
Chenxiang Ma,
Xinyi Chen,
Yanchen Li,
Qu Yang,
Yujie Wu,
Guoqi Li,
Gang Pan,
Huajin Tang,
Kay Chen Tan,
Jibin Wu
Abstract:
Temporal processing is fundamental for both biological and artificial intelligence systems, as it enables the comprehension of dynamic environments and facilitates timely responses. Spiking Neural Networks (SNNs) excel in handling such data with high efficiency, owing to their rich neuronal dynamics and sparse activity patterns. Given the recent surge in the development of SNNs, there is an urgent…
▽ More
Temporal processing is fundamental for both biological and artificial intelligence systems, as it enables the comprehension of dynamic environments and facilitates timely responses. Spiking Neural Networks (SNNs) excel in handling such data with high efficiency, owing to their rich neuronal dynamics and sparse activity patterns. Given the recent surge in the development of SNNs, there is an urgent need for a comprehensive evaluation of their temporal processing capabilities. In this paper, we first conduct an in-depth assessment of commonly used neuromorphic benchmarks, revealing critical limitations in their ability to evaluate the temporal processing capabilities of SNNs. To bridge this gap, we further introduce a benchmark suite consisting of three temporal processing tasks characterized by rich temporal dynamics across multiple timescales. Utilizing this benchmark suite, we perform a thorough evaluation of recently introduced SNN approaches to elucidate the current status of SNNs in temporal processing. Our findings indicate significant advancements in recently developed spiking neuron models and neural architectures regarding their temporal processing capabilities, while also highlighting a performance gap in handling long-range dependencies when compared to state-of-the-art non-spiking models. Finally, we discuss the key challenges and outline potential avenues for future research.
△ Less
Submitted 13 February, 2025;
originally announced February 2025.
-
CausalCOMRL: Context-Based Offline Meta-Reinforcement Learning with Causal Representation
Authors:
Zhengzhe Zhang,
Wenjia Meng,
Haoliang Sun,
Gang Pan
Abstract:
Context-based offline meta-reinforcement learning (OMRL) methods have achieved appealing success by leveraging pre-collected offline datasets to develop task representations that guide policy learning. However, current context-based OMRL methods often introduce spurious correlations, where task components are incorrectly correlated due to confounders. These correlations can degrade policy performa…
▽ More
Context-based offline meta-reinforcement learning (OMRL) methods have achieved appealing success by leveraging pre-collected offline datasets to develop task representations that guide policy learning. However, current context-based OMRL methods often introduce spurious correlations, where task components are incorrectly correlated due to confounders. These correlations can degrade policy performance when the confounders in the test task differ from those in the training task. To address this problem, we propose CausalCOMRL, a context-based OMRL method that integrates causal representation learning. This approach uncovers causal relationships among the task components and incorporates the causal relationships into task representations, enhancing the generalizability of RL agents. We further improve the distinction of task representations from different tasks by using mutual information optimization and contrastive learning. Utilizing these causal task representations, we employ SAC to optimize policies on meta-RL benchmarks. Experimental results show that CausalCOMRL achieves better performance than other methods on most benchmarks.
△ Less
Submitted 2 February, 2025;
originally announced February 2025.
-
The Composite Task Challenge for Cooperative Multi-Agent Reinforcement Learning
Authors:
Yurui Li,
Yuxuan Chen,
Li Zhang,
Shijian Li,
Gang Pan
Abstract:
The significant role of division of labor (DOL) in promoting cooperation is widely recognized in real-world applications.Many cooperative multi-agent reinforcement learning (MARL) methods have incorporated the concept of DOL to improve cooperation among agents.However, the tasks used in existing testbeds typically correspond to tasks where DOL is often not a necessary feature for achieving optimal…
▽ More
The significant role of division of labor (DOL) in promoting cooperation is widely recognized in real-world applications.Many cooperative multi-agent reinforcement learning (MARL) methods have incorporated the concept of DOL to improve cooperation among agents.However, the tasks used in existing testbeds typically correspond to tasks where DOL is often not a necessary feature for achieving optimal policies.Additionally, the full utilize of DOL concept in MARL methods remains unrealized due to the absence of appropriate tasks.To enhance the generality and applicability of MARL methods in real-world scenarios, there is a necessary to develop tasks that demand multi-agent DOL and cooperation.In this paper, we propose a series of tasks designed to meet these requirements, drawing on real-world rules as the guidance for their design.We guarantee that DOL and cooperation are necessary condition for completing tasks and introduce three factors to expand the diversity of proposed tasks to cover more realistic situations.We evaluate 10 cooperative MARL methods on the proposed tasks.The results indicate that all baselines perform poorly on these tasks.To further validate the solvability of these tasks, we also propose simplified variants of proposed tasks.Experimental results show that baselines are able to handle these simplified variants, providing evidence of the solvability of the proposed tasks.The source files is available at https://github.com/Yurui-Li/CTC.
△ Less
Submitted 1 February, 2025;
originally announced February 2025.
-
AI-driven Wireless Positioning: Fundamentals, Standards, State-of-the-art, and Challenges
Authors:
Guangjin Pan,
Yuan Gao,
Yilin Gao,
Zhiyong Zhong,
Xiaoyu Yang,
Xinyu Guo,
Shugong Xu
Abstract:
Wireless positioning technologies hold significant value for applications in autonomous driving, extended reality (XR), unmanned aerial vehicles (UAVs), and more. With the advancement of artificial intelligence (AI), leveraging AI to enhance positioning accuracy and robustness has emerged as a field full of potential. Driven by the requirements and functionalities defined in the 3rd Generation Par…
▽ More
Wireless positioning technologies hold significant value for applications in autonomous driving, extended reality (XR), unmanned aerial vehicles (UAVs), and more. With the advancement of artificial intelligence (AI), leveraging AI to enhance positioning accuracy and robustness has emerged as a field full of potential. Driven by the requirements and functionalities defined in the 3rd Generation Partnership Project (3GPP) standards, AI/machine learning (ML)-based positioning is becoming a key technology to overcome the limitations of traditional methods. This paper begins with an introduction to the fundamentals of AI and wireless positioning, covering AI models, algorithms, positioning applications, emerging wireless technologies, and the basics of positioning techniques. Subsequently, focusing on standardization progress, we provide a comprehensive review of the evolution of 3GPP positioning standards, with an emphasis on the integration of AI/ML technologies in recent and upcoming releases. Based on the AI/ML-assisted positioning and direct AI/ML positioning schemes outlined in the standards, we conduct an in-depth investigation of related research. we focus on state-of-the-art (SOTA) research in AI-based line-of-sight (LOS)/non-line-of-sight (NLOS) detection, time of arrival (TOA)/time difference of arrival (TDOA) estimation, and angle estimation techniques. For Direct AI/ML Positioning, we explore SOTA advancements in fingerprint-based positioning, knowledge-assisted AI positioning, and channel charting-based positioning. Furthermore, we introduce publicly available datasets for wireless positioning and conclude by summarizing the challenges and opportunities of AI-driven wireless positioning.
△ Less
Submitted 24 January, 2025;
originally announced January 2025.
-
Energy Optimization of Multi-task DNN Inference in MEC-assisted XR Devices: A Lyapunov-Guided Reinforcement Learning Approach
Authors:
Yanzan Sun,
Jiacheng Qiu,
Guangjin Pan,
Shugong Xu,
Shunqing Zhang,
Xiaoyun Wang,
Shuangfeng Han
Abstract:
Extended reality (XR), blending virtual and real worlds, is a key application of future networks. While AI advancements enhance XR capabilities, they also impose significant computational and energy challenges on lightweight XR devices. In this paper, we developed a distributed queue model for multi-task DNN inference, addressing issues of resource competition and queue coupling. In response to th…
▽ More
Extended reality (XR), blending virtual and real worlds, is a key application of future networks. While AI advancements enhance XR capabilities, they also impose significant computational and energy challenges on lightweight XR devices. In this paper, we developed a distributed queue model for multi-task DNN inference, addressing issues of resource competition and queue coupling. In response to the challenges posed by the high energy consumption and limited resources of XR devices, we designed a dual time-scale joint optimization strategy for model partitioning and resource allocation, formulated as a bi-level optimization problem. This strategy aims to minimize the total energy consumption of XR devices while ensuring queue stability and adhering to computational and communication resource constraints. To tackle this problem, we devised a Lyapunov-guided Proximal Policy Optimization algorithm, named LyaPPO. Numerical results demonstrate that the LyaPPO algorithm outperforms the baselines, achieving energy conservation of 24.79% to 46.14% under varying resource capacities. Specifically, the proposed algorithm reduces the energy consumption of XR devices by 24.29% to 56.62% compared to baseline algorithms.
△ Less
Submitted 5 January, 2025;
originally announced January 2025.
-
How Breakable Is Privacy: Probing and Resisting Model Inversion Attacks in Collaborative Inference
Authors:
Rongke Liu,
Youwen Zhu,
Dong Wang,
Gaoning Pan,
Xingyu He,
Weizhi Meng
Abstract:
Collaborative inference (CI) improves computational efficiency for edge devices by transmitting intermediate features to cloud models. However, this process inevitably exposes feature representations to model inversion attacks (MIAs), enabling unauthorized data reconstruction. Despite extensive research, there is no established criterion for assessing the difficulty of MIA implementation, leaving…
▽ More
Collaborative inference (CI) improves computational efficiency for edge devices by transmitting intermediate features to cloud models. However, this process inevitably exposes feature representations to model inversion attacks (MIAs), enabling unauthorized data reconstruction. Despite extensive research, there is no established criterion for assessing the difficulty of MIA implementation, leaving a fundamental question unanswered: \textit{What factors truly and verifiably determine the attack's success in CI?} Moreover, existing defenses lack the theoretical foundation described above, making it challenging to regulate feature information effectively while ensuring privacy and minimizing computational overhead. These shortcomings introduce three key challenges: theoretical gap, methodological limitation, and practical constraint.
To overcome these challenges, we propose the first theoretical criterion to assess MIA difficulty in CI, identifying mutual information, entropy, and effective information volume as key influencing factors. The validity of this criterion is demonstrated by using the mutual information neural estimator. Building on this insight, we propose SiftFunnel, a privacy-preserving framework to resist MIA while maintaining usability. Specifically, we incorporate linear and non-linear correlation constraints alongside label smoothing to suppress redundant information transmission, effectively balancing privacy and usability. To enhance deployability, the edge model adopts a funnel-shaped structure with attention mechanisms, strengthening privacy while reducing computational and storage burdens. Experiments show that, compared to state-of-the-art defense, SiftFunnel increases reconstruction error by $\sim$30\%, lowers mutual and effective information metrics by $\geq$50\%, and reduces edge burdens by almost $20\times$, while maintaining comparable usability.
△ Less
Submitted 20 June, 2025; v1 submitted 1 January, 2025;
originally announced January 2025.
-
SpectralKD: A Unified Framework for Interpreting and Distilling Vision Transformers via Spectral Analysis
Authors:
Huiyuan Tian,
Bonan Xu,
Shijian Li,
Gang Pan
Abstract:
Knowledge Distillation (KD) has achieved widespread success in compressing large Vision Transformers (ViTs), but a unified theoretical framework for both ViTs and KD is still lacking. In this paper, we propose SpectralKD, a novel unified analytical framework that offers deeper insights into ViTs and optimizes KD via spectral analysis. Our model-wise analysis reveals that CaiT concentrates informat…
▽ More
Knowledge Distillation (KD) has achieved widespread success in compressing large Vision Transformers (ViTs), but a unified theoretical framework for both ViTs and KD is still lacking. In this paper, we propose SpectralKD, a novel unified analytical framework that offers deeper insights into ViTs and optimizes KD via spectral analysis. Our model-wise analysis reveals that CaiT concentrates information in their first and last few layers, informing optimal layer selection for KD. Surprisingly, our layer-wise analysis discovers that Swin Transformer and CaiT exhibit similar spectral encoding patterns despite their architectural differences, leading to feature map alignment guideline. Building on these insights, we propose a simple yet effective spectral alignment method for KD. Benefiting from the deeper understanding by above analysis results, even such a simple strategy achieves state-of-the-art performance on ImageNet-1K without introducing any trainable parameters, improving DeiT-Tiny by $+5.2\%$ and Swin-Tiny by $+1.4\%$ in top-1 accuracy. Furthermore, our post-training analysis reveals that distilled students can reproduce spectral patterns similar to their teachers, opening a new area we term ``distillation dynamics". Code and experimental logs are available in https://github.com/thy960112/SpectralKD.
△ Less
Submitted 30 January, 2025; v1 submitted 25 December, 2024;
originally announced December 2024.
-
Darkit: A User-Friendly Software Toolkit for Spiking Large Language Model
Authors:
Xin Du,
Shifan Ye,
Qian Zheng,
Yangfan Hu,
Rui Yan,
Shunyu Qi,
Shuyang Chen,
Huajin Tang,
Gang Pan,
Shuiguang Deng
Abstract:
Large language models (LLMs) have been widely applied in various practical applications, typically comprising billions of parameters, with inference processes requiring substantial energy and computational resources. In contrast, the human brain, employing bio-plausible spiking mechanisms, can accomplish the same tasks while significantly reducing energy consumption, even with a similar number of…
▽ More
Large language models (LLMs) have been widely applied in various practical applications, typically comprising billions of parameters, with inference processes requiring substantial energy and computational resources. In contrast, the human brain, employing bio-plausible spiking mechanisms, can accomplish the same tasks while significantly reducing energy consumption, even with a similar number of parameters. Based on this, several pioneering researchers have proposed and implemented various large language models that leverage spiking neural networks. They have demonstrated the feasibility of these models, validated their performance, and open-sourced their frameworks and partial source code. To accelerate the adoption of brain-inspired large language models and facilitate secondary development for researchers, we are releasing a software toolkit named DarwinKit (Darkit). The toolkit is designed specifically for learners, researchers, and developers working on spiking large models, offering a suite of highly user-friendly features that greatly simplify the learning, deployment, and development processes.
△ Less
Submitted 20 December, 2024;
originally announced December 2024.
-
Personalized Sleep Staging Leveraging Source-free Unsupervised Domain Adaptation
Authors:
Yangxuan Zhou,
Sha Zhao,
Jiquan Wang,
Haiteng Jiang,
hijian Li,
Benyan Luo,
Tao Li,
Gang Pan
Abstract:
Sleep staging is crucial for assessing sleep quality and diagnosing related disorders. Recent deep learning models for automatic sleep staging using polysomnography often suffer from poor generalization to new subjects because they are trained and tested on the same labeled datasets, overlooking individual differences. To tackle this issue, we propose a novel Source-Free Unsupervised Individual Do…
▽ More
Sleep staging is crucial for assessing sleep quality and diagnosing related disorders. Recent deep learning models for automatic sleep staging using polysomnography often suffer from poor generalization to new subjects because they are trained and tested on the same labeled datasets, overlooking individual differences. To tackle this issue, we propose a novel Source-Free Unsupervised Individual Domain Adaptation (SF-UIDA) framework. This two-step adaptation scheme allows the model to effectively adjust to new unlabeled individuals without needing source data, facilitating personalized customization in clinical settings. Our framework has been applied to three established sleep staging models and tested on three public datasets, achieving state-of-the-art performance.
△ Less
Submitted 11 December, 2024;
originally announced December 2024.
-
CLDA-YOLO: Visual Contrastive Learning Based Domain Adaptive YOLO Detector
Authors:
Tianheng Qiu,
Ka Lung Law,
Guanghua Pan,
Jufei Wang,
Xin Gao,
Xuan Huang,
Hu Wei
Abstract:
Unsupervised domain adaptive (UDA) algorithms can markedly enhance the performance of object detectors under conditions of domain shifts, thereby reducing the necessity for extensive labeling and retraining. Current domain adaptive object detection algorithms primarily cater to two-stage detectors, which tend to offer minimal improvements when directly applied to single-stage detectors such as YOL…
▽ More
Unsupervised domain adaptive (UDA) algorithms can markedly enhance the performance of object detectors under conditions of domain shifts, thereby reducing the necessity for extensive labeling and retraining. Current domain adaptive object detection algorithms primarily cater to two-stage detectors, which tend to offer minimal improvements when directly applied to single-stage detectors such as YOLO. Intending to benefit the YOLO detector from UDA, we build a comprehensive domain adaptive architecture using a teacher-student cooperative system for the YOLO detector. In this process, we propose uncertainty learning to cope with pseudo-labeling generated by the teacher model with extreme uncertainty and leverage dynamic data augmentation to asymptotically adapt the teacher-student system to the environment. To address the inability of single-stage object detectors to align at multiple stages, we utilize a unified visual contrastive learning paradigm that aligns instance at backbone and head respectively, which steadily improves the robustness of the detectors in cross-domain tasks. In summary, we present an unsupervised domain adaptive YOLO detector based on visual contrastive learning (CLDA-YOLO), which achieves highly competitive results across multiple domain adaptive datasets without any reduction in inference speed.
△ Less
Submitted 16 December, 2024;
originally announced December 2024.
-
Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Authors:
Juntao Dai,
Yaodong Yang,
Qian Zheng,
Gang Pan
Abstract:
A key aspect of Safe Reinforcement Learning (Safe RL) involves estimating the constraint condition for the next policy, which is crucial for guiding the optimization of safe policy updates. However, the existing Advantage-based Estimation (ABE) method relies on the infinite-horizon discounted advantage function. This dependence leads to catastrophic errors in finite-horizon scenarios with non-disc…
▽ More
A key aspect of Safe Reinforcement Learning (Safe RL) involves estimating the constraint condition for the next policy, which is crucial for guiding the optimization of safe policy updates. However, the existing Advantage-based Estimation (ABE) method relies on the infinite-horizon discounted advantage function. This dependence leads to catastrophic errors in finite-horizon scenarios with non-discounted constraints, resulting in safety-violation updates. In response, we propose the first estimation method for finite-horizon non-discounted constraints in deep Safe RL, termed Gradient-based Estimation (GBE), which relies on the analytic gradient derived along trajectories. Our theoretical and empirical analyses demonstrate that GBE can effectively estimate constraint changes over a finite horizon. Constructing a surrogate optimization problem with GBE, we developed a novel Safe RL algorithm called Constrained Gradient-based Policy Optimization (CGPO). CGPO identifies feasible optimal policies by iteratively resolving sub-problems within trust regions. Our empirical results reveal that CGPO, unlike baseline algorithms, successfully estimates the constraint functions of subsequent policies, thereby ensuring the efficiency and feasibility of each update.
△ Less
Submitted 15 December, 2024;
originally announced December 2024.
-
Deep Learning for Spectrum Prediction in Cognitive Radio Networks: State-of-the-Art, New Opportunities, and Challenges
Authors:
Guangliang Pan,
David K. Y. Yau,
Bo Zhou,
Qihui Wu
Abstract:
Spectrum prediction is considered to be a promising technology that enhances spectrum efficiency by assisting dynamic spectrum access (DSA) in cognitive radio networks (CRN). Nonetheless, the highly nonlinear nature of spectrum data across time, frequency, and space domains, coupled with the intricate spectrum usage patterns, poses challenges for accurate spectrum prediction. Deep learning (DL), r…
▽ More
Spectrum prediction is considered to be a promising technology that enhances spectrum efficiency by assisting dynamic spectrum access (DSA) in cognitive radio networks (CRN). Nonetheless, the highly nonlinear nature of spectrum data across time, frequency, and space domains, coupled with the intricate spectrum usage patterns, poses challenges for accurate spectrum prediction. Deep learning (DL), recognized for its capacity to extract nonlinear features, has been applied to solve these challenges. This paper first shows the advantages of applying DL by comparing with traditional prediction methods. Then, the current state-of-the-art DL-based spectrum prediction techniques are reviewed and summarized in terms of intra-band and crossband prediction. Notably, this paper uses a real-world spectrum dataset to prove the advancements of DL-based methods. Then, this paper proposes a novel intra-band spatiotemporal spectrum prediction framework named ViTransLSTM. This framework integrates visual self-attention and long short-term memory to capture both local and global long-term spatiotemporal dependencies of spectrum usage patterns. Similarly, the effectiveness of the proposed framework is validated on the aforementioned real-world dataset. Finally, the paper presents new related challenges and potential opportunities for future research.
△ Less
Submitted 12 December, 2024;
originally announced December 2024.
-
CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding
Authors:
Jiquan Wang,
Sha Zhao,
Zhiling Luo,
Yangxuan Zhou,
Haiteng Jiang,
Shijian Li,
Tao Li,
Gang Pan
Abstract:
Electroencephalography (EEG) is a non-invasive technique to measure and record brain electrical activity, widely used in various BCI and healthcare applications. Early EEG decoding methods rely on supervised learning, limited by specific tasks and datasets, hindering model performance and generalizability. With the success of large language models, there is a growing body of studies focusing on EE…
▽ More
Electroencephalography (EEG) is a non-invasive technique to measure and record brain electrical activity, widely used in various BCI and healthcare applications. Early EEG decoding methods rely on supervised learning, limited by specific tasks and datasets, hindering model performance and generalizability. With the success of large language models, there is a growing body of studies focusing on EEG foundation models. However, these studies still leave challenges: Firstly, most of existing EEG foundation models employ full EEG modeling strategy. It models the spatial and temporal dependencies between all EEG patches together, but ignores that the spatial and temporal dependencies are heterogeneous due to the unique structural characteristics of EEG signals. Secondly, existing EEG foundation models have limited generalizability on a wide range of downstream BCI tasks due to varying formats of EEG data, making it challenging to adapt to. To address these challenges, we propose a novel foundation model called CBraMod. Specifically, we devise a criss-cross transformer as the backbone to thoroughly leverage the structural characteristics of EEG signals, which can model spatial and temporal dependencies separately through two parallel attention mechanisms. And we utilize an asymmetric conditional positional encoding scheme which can encode positional information of EEG patches and be easily adapted to the EEG with diverse formats. CBraMod is pre-trained on a very large corpus of EEG through patch-based masked EEG reconstruction. We evaluate CBraMod on up to 10 downstream BCI tasks (12 public datasets). CBraMod achieves the state-of-the-art performance across the wide range of tasks, proving its strong capability and generalizability. The source code is publicly available at https://github.com/wjq-learning/CBraMod.
△ Less
Submitted 13 April, 2025; v1 submitted 10 December, 2024;
originally announced December 2024.
-
VP-MEL: Visual Prompts Guided Multimodal Entity Linking
Authors:
Hongze Mi,
Jinyuan Li,
Xuying Zhang,
Haoran Cheng,
Jiahao Wang,
Di Sun,
Gang Pan
Abstract:
Multimodal entity linking (MEL), a task aimed at linking mentions within multimodal contexts to their corresponding entities in a knowledge base (KB), has attracted much attention due to its wide applications in recent years. However, existing MEL methods often rely on mention words as retrieval cues, which limits their ability to effectively utilize information from both images and text. This rel…
▽ More
Multimodal entity linking (MEL), a task aimed at linking mentions within multimodal contexts to their corresponding entities in a knowledge base (KB), has attracted much attention due to its wide applications in recent years. However, existing MEL methods often rely on mention words as retrieval cues, which limits their ability to effectively utilize information from both images and text. This reliance causes MEL to struggle with accurately retrieving entities in certain scenarios, especially when the focus is on image objects or mention words are missing from the text. To solve these issues, we introduce a Visual Prompts guided Multimodal Entity Linking (VP-MEL) task. Given a text-image pair, VP-MEL aims to link a marked region (i.e., visual prompt) in an image to its corresponding entities in the knowledge base. To facilitate this task, we present a new dataset, VPWiki, specifically designed for VP-MEL. Furthermore, we propose a framework named IIER, which enhances visual feature extraction using visual prompts and leverages the pretrained Detective-VLM model to capture latent information. Experimental results on the VPWiki dataset demonstrate that IIER outperforms baseline methods across multiple benchmarks for the VP-MEL task.
△ Less
Submitted 15 February, 2025; v1 submitted 9 December, 2024;
originally announced December 2024.
-
A Framework For Image Synthesis Using Supervised Contrastive Learning
Authors:
Yibin Liu,
Jianyu Zhang,
Li Zhang,
Shijian Li,
Gang Pan
Abstract:
Text-to-image (T2I) generation aims at producing realistic images corresponding to text descriptions. Generative Adversarial Network (GAN) has proven to be successful in this task. Typical T2I GANs are 2 phase methods that first pretrain an inter-modal representation from aligned image-text pairs and then use GAN to train image generator on that basis. However, such representation ignores the inne…
▽ More
Text-to-image (T2I) generation aims at producing realistic images corresponding to text descriptions. Generative Adversarial Network (GAN) has proven to be successful in this task. Typical T2I GANs are 2 phase methods that first pretrain an inter-modal representation from aligned image-text pairs and then use GAN to train image generator on that basis. However, such representation ignores the inner-modal semantic correspondence, e.g. the images with same label. The semantic label in priory describes the inherent distribution pattern with underlying cross-image relationships, which is supplement to the text description for understanding the full characteristics of image. In this paper, we propose a framework leveraging both inter- and inner-modal correspondence by label guided supervised contrastive learning. We extend the T2I GANs to two parameter-sharing contrast branches in both pretraining and generation phases. This integration effectively clusters the semantically similar image-text pair representations, thereby fostering the generation of higher-quality images. We demonstrate our framework on four novel T2I GANs by both single-object dataset CUB and multi-object dataset COCO, achieving significant improvements in the Inception Score (IS) and Frechet Inception Distance (FID) metrics of imagegeneration evaluation. Notably, on more complex multi-object COCO, our framework improves FID by 30.1%, 27.3%, 16.2% and 17.1% for AttnGAN, DM-GAN, SSA-GAN and GALIP, respectively. We also validate our superiority by comparing with other label guided T2I GANs. The results affirm the effectiveness and competitiveness of our approach in advancing the state-of-the-art GAN for T2I generation
△ Less
Submitted 5 December, 2024;
originally announced December 2024.
-
Redefining Crowdsourced Test Report Prioritization: An Innovative Approach with Large Language Model
Authors:
Yuchen Ling,
Shengcheng Yu,
Chunrong Fang,
Guobin Pan,
Jun Wang,
Jia Liu
Abstract:
Context: Crowdsourced testing has gained popularity in software testing, especially for mobile app testing, due to its ability to bring diversity and tackle fragmentation issues. However, the openness of crowdsourced testing presents challenges, particularly in the manual review of numerous test reports, which is time-consuming and labor-intensive. Objective: The primary goal of this research is t…
▽ More
Context: Crowdsourced testing has gained popularity in software testing, especially for mobile app testing, due to its ability to bring diversity and tackle fragmentation issues. However, the openness of crowdsourced testing presents challenges, particularly in the manual review of numerous test reports, which is time-consuming and labor-intensive. Objective: The primary goal of this research is to improve the efficiency of review processes in crowdsourced testing. Traditional approaches to test report prioritization lack a deep understanding of semantic information in textual descriptions of these reports. This paper introduces LLMPrior, a novel approach for prioritizing crowdsourced test reports using large language models (LLMs). Method: LLMPrior leverages LLMs for the analysis and clustering of crowdsourced test reports based on the types of bugs revealed in their textual descriptions. This involves using prompt engineering techniques to enhance the performance of LLMs. Following the clustering, a recurrent selection algorithm is applied to prioritize the reports. Results: Empirical experiments are conducted to evaluate the effectiveness of LLMPrior. The findings indicate that LLMPrior not only surpasses current state-of-the-art approaches in terms of performance but also proves to be more feasible, efficient, and reliable. This success is attributed to the use of prompt engineering techniques and the cluster-based prioritization strategy. Conclusion: LLMPrior represents a significant advancement in crowdsourced test report prioritization. By effectively utilizing large language models and a cluster-based strategy, it addresses the challenges in traditional prioritization approaches, offering a more efficient and reliable solution for app developers dealing with crowdsourced test reports.
△ Less
Submitted 25 November, 2024;
originally announced November 2024.
-
How to Build a Pre-trained Multimodal model for Simultaneously Chatting and Decision-making?
Authors:
Zuojin Tang,
Bin Hu,
Chenyang Zhao,
De Ma,
Gang Pan,
Bin Liu
Abstract:
Existing large pre-trained models typically map text input to text output in an end-to-end manner, such as ChatGPT, or map a segment of text input to a hierarchy of action decisions, such as OpenVLA. However, humans can simultaneously generate text and actions when receiving specific input signals. For example, a driver can make precise driving decisions while conversing with a friend in the passe…
▽ More
Existing large pre-trained models typically map text input to text output in an end-to-end manner, such as ChatGPT, or map a segment of text input to a hierarchy of action decisions, such as OpenVLA. However, humans can simultaneously generate text and actions when receiving specific input signals. For example, a driver can make precise driving decisions while conversing with a friend in the passenger seat. Motivated by this observation, we consider the following question in this work: is it possible to construct a pre-trained model that can provide both language interaction and precise decision-making capabilities in dynamic open scenarios. We provide a definitive answer to this question by developing a new model architecture termed Visual Language Action model for Chatting and Decision Making (VLA4CD), and further demonstrating its performance in challenging autonomous driving tasks. Specifically, we leverage LoRA to fine-tune a pre-trained LLM with data of multiple modalities covering language, visual, and action. Unlike the existing LoRA operations used for LLM fine-tuning, we have designed new computational modules and training cost functions for VLA4CD. These designs enable VLA4CD to provide continuous-valued action decisions while outputting text responses. In contrast, existing LLMs can only output text responses, and current VLA models can only output action decisions. Moreover, these VLA models handle action data by discretizing and then tokenizing the discretized actions, a method unsuitable for complex decision-making tasks involving high-dimensional continuous-valued action vectors, such as autonomous driving. The experimental results on CARLA validate that: (1) our proposed model construction method is effective; (2) compared to the SOTA VLA model, VLA4CD can provide more accurate real-time decision-making while retaining the text interaction capability inherent to LLMs.
△ Less
Submitted 21 October, 2024;
originally announced October 2024.
-
Enhancing SNN-based Spatio-Temporal Learning: A Benchmark Dataset and Cross-Modality Attention Model
Authors:
Shibo Zhou,
Bo Yang,
Mengwen Yuan,
Runhao Jiang,
Rui Yan,
Gang Pan,
Huajin Tang
Abstract:
Spiking Neural Networks (SNNs), renowned for their low power consumption, brain-inspired architecture, and spatio-temporal representation capabilities, have garnered considerable attention in recent years. Similar to Artificial Neural Networks (ANNs), high-quality benchmark datasets are of great importance to the advances of SNNs. However, our analysis indicates that many prevalent neuromorphic da…
▽ More
Spiking Neural Networks (SNNs), renowned for their low power consumption, brain-inspired architecture, and spatio-temporal representation capabilities, have garnered considerable attention in recent years. Similar to Artificial Neural Networks (ANNs), high-quality benchmark datasets are of great importance to the advances of SNNs. However, our analysis indicates that many prevalent neuromorphic datasets lack strong temporal correlation, preventing SNNs from fully exploiting their spatio-temporal representation capabilities. Meanwhile, the integration of event and frame modalities offers more comprehensive visual spatio-temporal information. Yet, the SNN-based cross-modality fusion remains underexplored.
In this work, we present a neuromorphic dataset called DVS-SLR that can better exploit the inherent spatio-temporal properties of SNNs. Compared to existing datasets, it offers advantages in terms of higher temporal correlation, larger scale, and more varied scenarios. In addition, our neuromorphic dataset contains corresponding frame data, which can be used for developing SNN-based fusion methods. By virtue of the dual-modal feature of the dataset, we propose a Cross-Modality Attention (CMA) based fusion method. The CMA model efficiently utilizes the unique advantages of each modality, allowing for SNNs to learn both temporal and spatial attention scores from the spatio-temporal features of event and frame modalities, subsequently allocating these scores across modalities to enhance their synergy. Experimental results demonstrate that our method not only improves recognition accuracy but also ensures robustness across diverse scenarios.
△ Less
Submitted 21 October, 2024;
originally announced October 2024.
-
Spiking GS: Towards High-Accuracy and Low-Cost Surface Reconstruction via Spiking Neuron-based Gaussian Splatting
Authors:
Weixing Zhang,
Zongrui Li,
De Ma,
Huajin Tang,
Xudong Jiang,
Qian Zheng,
Gang Pan
Abstract:
3D Gaussian Splatting is capable of reconstructing 3D scenes in minutes. Despite recent advances in improving surface reconstruction accuracy, the reconstructed results still exhibit bias and suffer from inefficiency in storage and training. This paper provides a different observation on the cause of the inefficiency and the reconstruction bias, which is attributed to the integration of the low-op…
▽ More
3D Gaussian Splatting is capable of reconstructing 3D scenes in minutes. Despite recent advances in improving surface reconstruction accuracy, the reconstructed results still exhibit bias and suffer from inefficiency in storage and training. This paper provides a different observation on the cause of the inefficiency and the reconstruction bias, which is attributed to the integration of the low-opacity parts (LOPs) of the generated Gaussians. We show that LOPs consist of Gaussians with overall low-opacity (LOGs) and the low-opacity tails (LOTs) of Gaussians. We propose Spiking GS to reduce such two types of LOPs by integrating spiking neurons into the Gaussian Splatting pipeline. Specifically, we introduce global and local full-precision integrate-and-fire spiking neurons to the opacity and representation function of flattened 3D Gaussians, respectively. Furthermore, we enhance the density control strategy with spiking neurons' thresholds and a new criterion on the scale of Gaussians. Our method can represent more accurate reconstructed surfaces at a lower cost. The supplementary material and code are available at https://github.com/zju-bmi-lab/SpikingGS.
△ Less
Submitted 3 December, 2024; v1 submitted 8 October, 2024;
originally announced October 2024.
-
Copiloting Diagnosis of Autism in Real Clinical Scenarios via LLMs
Authors:
Yi Jiang,
Qingyang Shen,
Shuzhong Lai,
Shunyu Qi,
Qian Zheng,
Lin Yao,
Yueming Wang,
Gang Pan
Abstract:
Autism spectrum disorder(ASD) is a pervasive developmental disorder that significantly impacts the daily functioning and social participation of individuals. Despite the abundance of research focused on supporting the clinical diagnosis of ASD, there is still a lack of systematic and comprehensive exploration in the field of methods based on Large Language Models (LLMs), particularly regarding the…
▽ More
Autism spectrum disorder(ASD) is a pervasive developmental disorder that significantly impacts the daily functioning and social participation of individuals. Despite the abundance of research focused on supporting the clinical diagnosis of ASD, there is still a lack of systematic and comprehensive exploration in the field of methods based on Large Language Models (LLMs), particularly regarding the real-world clinical diagnostic scenarios based on Autism Diagnostic Observation Schedule, Second Edition (ADOS-2). Therefore, we have proposed a framework called ADOS-Copilot, which strikes a balance between scoring and explanation and explored the factors that influence the performance of LLMs in this task. The experimental results indicate that our proposed framework is competitive with the diagnostic results of clinicians, with a minimum MAE of 0.4643, binary classification F1-score of 81.79\%, and ternary classification F1-score of 78.37\%. Furthermore, we have systematically elucidated the strengths and limitations of current LLMs in this task from the perspectives of ADOS-2, LLMs' capabilities, language, and model scale aiming to inspire and guide the future application of LLMs in a broader fields of mental health disorders. We hope for more research to be transferred into real clinical practice, opening a window of kindness to the world for eccentric children.
△ Less
Submitted 9 October, 2024; v1 submitted 8 October, 2024;
originally announced October 2024.
-
Secure Offloading in NOMA-Aided Aerial MEC Systems Based on Deep Reinforcement Learning
Authors:
Hongjiang Lei,
Mingxu Yang,
Ki-Hong Park,
Gaofeng Pan
Abstract:
Mobile edge computing (MEC) technology can reduce user latency and energy consumption by offloading computationally intensive tasks to the edge servers. Unmanned aerial vehicles (UAVs) and non-orthogonal multiple access (NOMA) technology enable the MEC networks to provide offloaded computing services for massively accessed terrestrial users conveniently. However, the broadcast nature of signal pro…
▽ More
Mobile edge computing (MEC) technology can reduce user latency and energy consumption by offloading computationally intensive tasks to the edge servers. Unmanned aerial vehicles (UAVs) and non-orthogonal multiple access (NOMA) technology enable the MEC networks to provide offloaded computing services for massively accessed terrestrial users conveniently. However, the broadcast nature of signal propagation in NOMA-based UAV-MEC networks makes it vulnerable to eavesdropping by malicious eavesdroppers. In this work, a secure offload scheme is proposed for NOMA-based UAV-MEC systems with the existence of an aerial eavesdropper. The long-term average network computational cost is minimized by jointly designing the UAV's trajectory, the terrestrial users' transmit power, and computational frequency while ensuring the security of users' offloaded data. Due to the eavesdropper's location uncertainty, the worst-case security scenario is considered through the estimated eavesdropping range. Due to the high-dimensional continuous action space, the deep deterministic policy gradient algorithm is utilized to solve the non-convex optimization problem. Simulation results validate the effectiveness of the proposed scheme.
△ Less
Submitted 11 October, 2024; v1 submitted 13 September, 2024;
originally announced September 2024.
-
Toward Large-scale Spiking Neural Networks: A Comprehensive Survey and Future Directions
Authors:
Yangfan Hu,
Qian Zheng,
Guoqi Li,
Huajin Tang,
Gang Pan
Abstract:
Deep learning has revolutionized artificial intelligence (AI), achieving remarkable progress in fields such as computer vision, speech recognition, and natural language processing. Moreover, the recent success of large language models (LLMs) has fueled a surge in research on large-scale neural networks. However, the escalating demand for computing resources and energy consumption has prompted the…
▽ More
Deep learning has revolutionized artificial intelligence (AI), achieving remarkable progress in fields such as computer vision, speech recognition, and natural language processing. Moreover, the recent success of large language models (LLMs) has fueled a surge in research on large-scale neural networks. However, the escalating demand for computing resources and energy consumption has prompted the search for energy-efficient alternatives. Inspired by the human brain, spiking neural networks (SNNs) promise energy-efficient computation with event-driven spikes. To provide future directions toward building energy-efficient large SNN models, we present a survey of existing methods for developing deep spiking neural networks, with a focus on emerging Spiking Transformers. Our main contributions are as follows: (1) an overview of learning methods for deep spiking neural networks, categorized by ANN-to-SNN conversion and direct training with surrogate gradients; (2) an overview of network architectures for deep spiking neural networks, categorized by deep convolutional neural networks (DCNNs) and Transformer architecture; and (3) a comprehensive comparison of state-of-the-art deep SNNs with a focus on emerging Spiking Transformers. We then further discuss and outline future directions toward large-scale SNNs.
△ Less
Submitted 19 August, 2024;
originally announced September 2024.
-
Human-Inspired Audio-Visual Speech Recognition: Spike Activity, Cueing Interaction and Causal Processing
Authors:
Qianhui Liu,
Jiadong Wang,
Yang Wang,
Xin Yang,
Gang Pan,
Haizhou Li
Abstract:
Humans naturally perform audiovisual speech recognition (AVSR), enhancing the accuracy and robustness by integrating auditory and visual information. Spiking neural networks (SNNs), which mimic the brain's information-processing mechanisms, are well-suited for emulating the human capability of AVSR. Despite their potential, research on SNNs for AVSR is scarce, with most existing audio-visual multi…
▽ More
Humans naturally perform audiovisual speech recognition (AVSR), enhancing the accuracy and robustness by integrating auditory and visual information. Spiking neural networks (SNNs), which mimic the brain's information-processing mechanisms, are well-suited for emulating the human capability of AVSR. Despite their potential, research on SNNs for AVSR is scarce, with most existing audio-visual multimodal methods focused on object or digit recognition. These models simply integrate features from both modalities, neglecting their unique characteristics and interactions. Additionally, they often rely on future information for current processing, which increases recognition latency and limits real-time applicability. Inspired by human speech perception, this paper proposes a novel human-inspired SNN named HI-AVSNN for AVSR, incorporating three key characteristics: cueing interaction, causal processing and spike activity. For cueing interaction, we propose a visual-cued auditory attention module (VCA2M) that leverages visual cues to guide attention to auditory features. We achieve causal processing by aligning the SNN's temporal dimension with that of visual and auditory features and applying temporal masking to utilize only past and current information. To implement spike activity, in addition to using SNNs, we leverage the event camera to capture lip movement as spikes, mimicking the human retina and providing efficient visual data. We evaluate HI-AVSNN on an audiovisual speech recognition dataset combining the DVS-Lip dataset with its corresponding audio samples. Experimental results demonstrate the superiority of our proposed fusion method, outperforming existing audio-visual SNN fusion methods and achieving a 2.27% improvement in accuracy over the only existing SNN-based AVSR method.
△ Less
Submitted 29 August, 2024;
originally announced August 2024.
-
An Asynchronous Multi-core Accelerator for SNN inference
Authors:
Zhuo Chen,
De Ma,
Xiaofei Jin,
Qinghui Xing,
Ouwen Jin,
Xin Du,
Shuibing He,
Gang Pan
Abstract:
Spiking Neural Networks (SNNs) are extensively utilized in brain-inspired computing and neuroscience research. To enhance the speed and energy efficiency of SNNs, several many-core accelerators have been developed. However, maintaining the accuracy of SNNs often necessitates frequent explicit synchronization among all cores, which presents a challenge to overall efficiency. In this paper, we propo…
▽ More
Spiking Neural Networks (SNNs) are extensively utilized in brain-inspired computing and neuroscience research. To enhance the speed and energy efficiency of SNNs, several many-core accelerators have been developed. However, maintaining the accuracy of SNNs often necessitates frequent explicit synchronization among all cores, which presents a challenge to overall efficiency. In this paper, we propose an asynchronous architecture for Spiking Neural Networks (SNNs) that eliminates the need for inter-core synchronization, thus enhancing speed and energy efficiency. This approach leverages the pre-determined dependencies of neuromorphic cores established during compilation. Each core is equipped with a scheduler that monitors the status of its dependencies, allowing it to safely advance to the next timestep without waiting for other cores. This eliminates the necessity for global synchronization and minimizes core waiting time despite inherent workload imbalances. Comprehensive evaluations using five different SNN workloads show that our architecture achieves a 1.86x speedup and a 1.55x increase in energy efficiency compared to state-of-the-art synchronization architectures.
△ Less
Submitted 30 July, 2024;
originally announced July 2024.
-
Optimizing 5G-Advanced Networks for Time-critical Applications: The Role of L4S
Authors:
Guangjin Pan,
Shugong Xu,
Pin Jiang
Abstract:
As 5G networks strive to support advanced time-critical applications, such as immersive Extended Reality (XR), cloud gaming, and autonomous driving, the demand for Real-time Broadband Communication (RTBC) grows. In this article, we present the main mechanisms of Low Latency, Low Loss, and Scalable Throughput (L4S). Subsequently, we investigate the support and challenges of L4S technology in the la…
▽ More
As 5G networks strive to support advanced time-critical applications, such as immersive Extended Reality (XR), cloud gaming, and autonomous driving, the demand for Real-time Broadband Communication (RTBC) grows. In this article, we present the main mechanisms of Low Latency, Low Loss, and Scalable Throughput (L4S). Subsequently, we investigate the support and challenges of L4S technology in the latest 3GPP 5G-Advanced Release 18 (R18) standard. Our case study, using a prototype system for a real-time communication (RTC) application, demonstrates the superiority of L4S technology. The experimental results show that, compared with the GCC algorithm, the proposed L4S-GCC algorithm can reduce the stalling rate by 1.51%-2.80% and increase the bandwidth utilization by 11.4%-31.4%. The results emphasize the immense potential of L4S technology in enhancing transmission performance in time-critical applications.
△ Less
Submitted 30 July, 2024;
originally announced July 2024.
-
Sewer Image Super-Resolution with Depth Priors and Its Lightweight Network
Authors:
Gang Pan,
Chen Wang,
Zhijie Sui,
Shuai Guo,
Yaozhi Lv,
Honglie Li,
Di Sun,
Zixia Xia
Abstract:
The Quick-view (QV) technique serves as a primary method for detecting defects within sewerage systems. However, the effectiveness of QV is impeded by the limited visual range of its hardware, resulting in suboptimal image quality for distant portions of the sewer network. Image super-resolution is an effective way to improve image quality and has been applied in a variety of scenes. However, rese…
▽ More
The Quick-view (QV) technique serves as a primary method for detecting defects within sewerage systems. However, the effectiveness of QV is impeded by the limited visual range of its hardware, resulting in suboptimal image quality for distant portions of the sewer network. Image super-resolution is an effective way to improve image quality and has been applied in a variety of scenes. However, research on super-resolution for sewer images remains considerably unexplored. In response, this study leverages the inherent depth relationships present within QV images and introduces a novel Depth-guided, Reference-based Super-Resolution framework denoted as DSRNet. It comprises two core components: a depth extraction module and a depth information matching module (DMM). DSRNet utilizes the adjacent frames of the low-resolution image as reference images and helps them recover texture information based on the correlation. By combining these modules, the integration of depth priors significantly enhances both visual quality and performance benchmarks. Besides, in pursuit of computational efficiency and compactness, a super-resolution knowledge distillation model based on an attention mechanism is introduced. This mechanism facilitates the acquisition of feature similarity between a more complex teacher model and a streamlined student model, with the latter being a lightweight version of DSRNet. Experimental results demonstrate that DSRNet significantly improves PSNR and SSIM compared with other methods. This study also conducts experiments on sewer defect semantic segmentation, object detection, and classification on the Pipe dataset and Sewer-ML dataset. Experiments show that the method can improve the performance of low-resolution sewer images in these tasks.
△ Less
Submitted 25 February, 2025; v1 submitted 27 July, 2024;
originally announced July 2024.
-
Semi-Supervised Pipe Video Temporal Defect Interval Localization
Authors:
Zhu Huang,
Gang Pan,
Chao Kang,
YaoZhi Lv
Abstract:
In sewer pipe Closed-Circuit Television (CCTV) inspection, accurate temporal defect localization is essential for effective defect classification, detection, segmentation and quantification. Industry standards typically do not require time-interval annotations, even though they are more informative than time-point annotations for defect localization, resulting in additional annotation costs when f…
▽ More
In sewer pipe Closed-Circuit Television (CCTV) inspection, accurate temporal defect localization is essential for effective defect classification, detection, segmentation and quantification. Industry standards typically do not require time-interval annotations, even though they are more informative than time-point annotations for defect localization, resulting in additional annotation costs when fully supervised methods are used. Additionally, differences in scene types and camera motion patterns between pipe inspections and Temporal Action Localization (TAL) hinder the effective transfer of point-supervised TAL methods. Therefore, this study introduces a Semi-supervised multi-Prototype-based method incorporating visual Odometry for enhanced attention guidance (PipeSPO). PipeSPO fully leverages unlabeled data through unsupervised pretext tasks and utilizes time-point annotated data with a weakly supervised multi-prototype-based method, relying on visual odometry features to capture camera pose information. Experiments on real-world datasets demonstrate that PipeSPO achieves 41.89% average precision across Intersection over Union (IoU) thresholds of 0.1-0.7, improving by 8.14% over current state-of-the-art methods.
△ Less
Submitted 21 July, 2024;
originally announced July 2024.
-
Proactive Eavesdropping in Relay Systems via Trajectory and Power Optimization
Authors:
Qian Dan,
Hongjiang Lei,
Ki-Hong Park,
Weijia Lei,
Gaofeng Pan
Abstract:
Wireless relays can effectively extend the transmission range of information. However, if relay technology is utilized unlawfully, it can amplify potential harm. Effectively surveilling illegitimate relay links poses a challenging problem. Unmanned aerial vehicles (UAVs) can proactively surveil wireless relay systems due to their flexible mobility. This work focuses on maximizing the eavesdropping…
▽ More
Wireless relays can effectively extend the transmission range of information. However, if relay technology is utilized unlawfully, it can amplify potential harm. Effectively surveilling illegitimate relay links poses a challenging problem. Unmanned aerial vehicles (UAVs) can proactively surveil wireless relay systems due to their flexible mobility. This work focuses on maximizing the eavesdropping rate (ER) of UAVs by jointly optimizing the trajectory and jamming power. To address this challenge, we propose a new iterative algorithm based on block coordinate descent and successive convex approximation technologies. Simulation results demonstrate that the proposed algorithm significantly enhances the ER through trajectory and jamming power optimization.
△ Less
Submitted 9 July, 2024;
originally announced July 2024.
-
Beamforming Design for Joint Target Sensing and Proactive Eavesdropping
Authors:
Qian Dan,
Hongjiang Lei,
Ki-Hong Park,
Gaofeng Pan,
Mohamed-Slim Alouini
Abstract:
This work studies the beamforming design in the joint target sensing and proactive eavesdropping (JTSAPE) system. The JTSAPE base station (BS) receives the information transmitted by the illegal transmitter and transmits the waveform for target sensing. The shared waveform also serves as artificial noise to interfere with the illegal receiver, thereby achieving proactive eavesdropping. We firstly…
▽ More
This work studies the beamforming design in the joint target sensing and proactive eavesdropping (JTSAPE) system. The JTSAPE base station (BS) receives the information transmitted by the illegal transmitter and transmits the waveform for target sensing. The shared waveform also serves as artificial noise to interfere with the illegal receiver, thereby achieving proactive eavesdropping. We firstly optimize the transmitting beam of the BS to maximize the eavesdropping signal-to-interference-plus-noise ratio or minimize the target estimation parameter Cram{é}r-Rao bound, respectively. Then, the joint optimization of proactive eavesdropping and target sensing is investigated, and the normalized weighted optimization problem is formulated. To address the complexity of the original problem, the formulated problem is decomposed into two subproblems: proactive eavesdropping and target sensing, which are solved by the semi-definite relaxation technique. Furthermore, the scenario in which the quality of the eavesdropping channel is stronger than that of the illegal channel is considered. We utilize the sequential rank-one constraint relaxation method and iteration technique to obtain the high-quality suboptimal solution of the beam transmit covariance matrix. Numerical simulation shows the effectiveness of our proposed algorithm.
△ Less
Submitted 2 May, 2025; v1 submitted 8 July, 2024;
originally announced July 2024.