-
Force-IMU Fusion-Based Sensing Acupuncture Needle and Quantitative Analysis System for Acupuncture Manipulations
Authors:
Peng Tian,
Kang Yu,
Tianyun Jiang,
Yuqi Wang,
Haiying Zhang,
Hao Yang,
Yunfeng Wang,
Jun Zhang,
Shuo Gao,
Junhong Gao
Abstract:
Acupuncture, one of the key therapeutic methods in Traditional Chinese Medicine (TCM), has been widely adopted in various clinical fields. Quantitative research on acupuncture manipulation parameters is critical to achieve standardized techniques. However, quantitative mechanical detection of acupuncture parameters remains limited. This study establishes a kinematic and dynamic model of acupuncture, identifying key parameters such as lifting-thrusting force, acceleration, velocity, displacement, as well as twirling-rotating angular velocity and angle. To measure these critical parameters, we propose a quantitative system comprising a sensing needle equipped with a force sensor and an inertial measurement unit (IMU), as well as an external camera module to capture image information. By fusing visual and IMU data, we accurately identify the stationary or motion states of the needle, enabling segmented computation of lifting-thrusting velocity and displacement. The experimental results demonstrate that the sensing needle achieves comprehensive detection with high precision, featuring a nonlinearity error of 0.45% in force measurement and an RMSE of 1.2 mm in displacement. The extracted parameters provide an objective description of the operational characteristics and motion patterns of the four basic acupuncture manipulations. These findings provide valuable tools and methods for research in acupuncture standardization.
Submitted 7 July, 2025;
originally announced July 2025.
-
Synesthesia of Machines (SoM)-Enhanced Sub-THz ISAC Transmission for Air-Ground Network
Authors:
Zonghui Yang,
Shijian Gao,
Xiang Cheng,
Liuqing Yang
Abstract:
Integrated sensing and communication (ISAC) within sub-THz frequencies is crucial for future air-ground networks, but unique propagation characteristics and hardware limitations present challenges in optimizing ISAC performance while increasing operational latency. This paper introduces a multi-modal sensing fusion framework inspired by synesthesia of machines (SoM) to enhance sub-THz ISAC transmission. By exploiting inherent degrees of freedom in sub-THz hardware and channels, the framework optimizes the radio-frequency environment. Squint-aware beam management is developed to improve air-ground network adaptability, enabling three-dimensional dynamic ISAC links. Leveraging multi-modal information, the framework enhances ISAC performance and reduces latency. Visual data rapidly localizes users and targets, while a customized multi-modal learning algorithm optimizes the hybrid precoder. A new metric provides comprehensive performance evaluation, and extensive experiments demonstrate that the proposed scheme significantly improves ISAC efficiency.
Submitted 15 June, 2025;
originally announced June 2025.
-
Ming-Omni: A Unified Multimodal Model for Perception and Generation
Authors:
Inclusion AI,
Biao Gong,
Cheng Zou,
Chuanyang Zheng,
Chunluan Zhou,
Canxiang Yan,
Chunxiang Jin,
Chunjie Shen,
Dandan Zheng,
Fudong Wang,
Furong Xu,
GuangMing Yao,
Jun Zhou,
Jingdong Chen,
Jianxin Sun,
Jiajia Liu,
Jianjiang Zhu,
Jun Peng,
Kaixiang Ji,
Kaiyou Song,
Kaimeng Ren,
Libin Wang,
Lixiang Ru,
Lele Xie,
Longhua Tan
, et al. (33 additional authors not shown)
Abstract:
We propose Ming-Omni, a unified multimodal model capable of processing images, text, audio, and video, while demonstrating strong proficiency in both speech and image generation. Ming-Omni employs dedicated encoders to extract tokens from different modalities, which are then processed by Ling, an MoE architecture equipped with newly proposed modality-specific routers. This design enables a single model to efficiently process and fuse multimodal inputs within a unified framework, thereby facilitating diverse tasks without requiring separate models, task-specific fine-tuning, or structural redesign. Importantly, Ming-Omni extends beyond conventional multimodal models by supporting audio and image generation. This is achieved through the integration of an advanced audio decoder for natural-sounding speech and Ming-Lite-Uni for high-quality image generation, which also allow the model to engage in context-aware chatting, perform text-to-speech conversion, and conduct versatile image editing. Our experimental results show that Ming-Omni offers a powerful solution for unified perception and generation across all modalities. Notably, our proposed Ming-Omni is the first open-source model we are aware of to match GPT-4o in modality support, and we release all code and model weights to encourage further research and development in the community.
Submitted 10 June, 2025;
originally announced June 2025.
-
Synesthesia of Machines (SoM)-Aided Online FDD Precoding via Heterogeneous Multi-Modal Sensing: A Vertical Federated Learning Approach
Authors:
Haotian Zhang,
Shijian Gao,
Weibo Wen,
Xiang Cheng,
Liuqing Yang
Abstract:
This paper investigates a heterogeneous multi-vehicle, multi-modal sensing (H-MVMM) aided online precoding problem. The proposed H-MVMM scheme utilizes a vertical federated learning (VFL) framework to minimize pilot sequence length and optimize the sum rate. This offers a promising solution for reducing latency in frequency division duplexing systems. To achieve this, three preprocessing modules are designed to transform raw sensory data into informative representations relevant to precoding. The approach effectively addresses local data heterogeneity arising from diverse on-board sensor configurations through a well-structured VFL training procedure. Additionally, a label-free online model updating strategy is introduced, enabling the H-MVMM scheme to adapt its weights flexibly. This strategy features a pseudo downlink channel state information label simulator (PCSI-Simulator), which is trained using a semi-supervised learning (SSL) approach alongside an online loss function. Numerical results show that the proposed method can closely approximate the performance of traditional optimization techniques with perfect channel state information, achieving a significant 90.6% reduction in pilot sequence length.
Submitted 9 June, 2025;
originally announced June 2025.
-
CIM-NET: A Video Denoising Deep Neural Network Model Optimized for Computing-in-Memory Architectures
Authors:
Shan Gao,
Zhiqiang Wu,
Yawen Niu,
Xiaotao Li,
Qingqing Xu
Abstract:
While deep neural network (DNN)-based video denoising has demonstrated significant performance, deploying state-of-the-art models on edge devices remains challenging due to stringent real-time and energy efficiency requirements. Computing-in-Memory (CIM) chips offer a promising solution by integrating computation within memory cells, enabling rapid matrix-vector multiplication (MVM). However, existing DNN models are often designed without considering CIM architectural constraints, thus limiting their acceleration potential during inference. To address this, we propose a hardware-algorithm co-design framework incorporating two innovations: (1) a CIM-Aware Architecture, CIM-NET, optimized for large receptive field operation and CIM's crossbar-based MVM acceleration; and (2) a pseudo-convolutional operator, CIM-CONV, used within CIM-NET to integrate slide-based processing with fully connected transformations for high-quality feature extraction and reconstruction. This framework significantly reduces the number of MVM operations, improving inference speed on CIM chips while maintaining competitive performance. Experimental results indicate that, compared to the conventional lightweight model FastDVDnet, CIM-NET substantially reduces MVM operations with a slight decrease in denoising performance. With a stride value of 8, CIM-NET reduces MVM operations to 1/77th of the original, while maintaining competitive PSNR (35.11 dB vs. 35.56 dB
Submitted 22 May, 2025;
originally announced May 2025.
-
Transmission Neural Networks: Approximation and Optimal Control
Authors:
Shuang Gao,
Peter E. Caines
Abstract:
Transmission Neural Networks (TransNNs) introduced by Gao and Caines (2022) connect virus spread models over networks and neural networks with tuneable activation functions. This paper presents the approximation technique and the underlying assumptions employed by TransNNs in relation to the corresponding Markovian Susceptible-Infected-Susceptible (SIS) model with 2^n states, where n is the number of nodes in the network. The underlying infection paths are assumed to be stochastic with heterogeneous and time-varying transmission probabilities. We obtain the conditional probability of infection in the stochastic 2^n-state SIS epidemic model corresponding to each state configuration under mild assumptions, which enables control solutions based on Markov decision processes (MDP). Finally, MDP control with 2^n-state SIS epidemic models and optimal control with TransNNs are compared in terms of mitigating virus spread over networks through vaccination, and it is shown that TransNNs enable the generation of control laws with significant computational savings, albeit with more conservative control actions.
Submitted 18 May, 2025;
originally announced May 2025.
-
A High-Efficiency Reconfigurable Bidirectional Array Antenna Based on Transmit-Reflect Switchable Metasurface
Authors:
Fan Qin,
Jinyang Bi,
Jiao Ma,
Chao Gu,
Hailin Zhang,
Wenchi Cheng,
Steven Gao
Abstract:
This paper proposes a reconfigurable bidirectional array antenna with high-efficiency radiation and flexible beam-switching capability by employing a novel transmit-reflect switchable metasurface (TRSM). To switch the electromagnetic (EM) wave between transmission and reflection, a dedicated transmit-reflect switch layer (TRSL) with periodically soldered PIN diodes is introduced between two transmitted metasurfaces. By switching ON/OFF the embedded diodes, the TRSL performs as a mesh-type ground layer or polarization-grid layer, exhibiting a reflect or transmit property to the incident wave respectively. Further, utilizing the above TRSM configuration in conjunction with a microstrip feed antenna, bidirectional radiations are obtained at the same frequency and polarization. To further reduce the number of PIN diodes and control complexity, an enhanced TRSM using a single diode to control two unit cells is also investigated, halving the number of PIN diodes. Since the bidirectional beam-switching is achieved by only controlling PIN diodes integrated in the ground plane instead of directly acting on the radiation element, which reduces insertion loss and avoids phase quantization errors, the proposed antenna can maintain a high aperture efficiency. To verify this concept, a prototype was designed, fabricated, and measured, demonstrating a successful realization of backward and forward patterns with peak gains of 22.3 and 22.1 dBi, and aperture efficiencies of 47.2% and 43.8%. The 3-dB gain bandwidths of reflected and transmitted modes are 13.7% and 12.3%. This antenna has the advantages of high gain, high aperture efficiency, simple configuration, cost-effectiveness, and flexible and digital beam control.
Submitted 13 May, 2025;
originally announced May 2025.
-
Image Steganography For Securing Intellicise Wireless Networks: "Invisible Encryption" Against Eavesdroppers
Authors:
Bizhu Wang,
Song Gao,
Rui Meng,
Haixiao Gao,
Xiaodong Xu,
Mengying Sun,
Chen Dong,
Ping Zhang,
Dusit Niyato
Abstract:
As one of the most promising technologies for intellicise (intelligent and concise) wireless networks, Semantic Communication (SemCom) significantly improves communication efficiency by extracting, transmitting, and recovering semantic information, while reducing transmission delay. However, an integration of communication and artificial intelligence (AI) also exposes SemCom to security and privacy threats posed by intelligent eavesdroppers. To address this challenge, image steganography in SemCom embeds secret semantic features within cover semantic features, allowing intelligent eavesdroppers to decode only the cover image. This technique offers a form of "invisible encryption" for SemCom. Motivated by these advancements, this paper conducts a comprehensive exploration of integrating image steganography into SemCom. Firstly, we review existing encryption techniques in SemCom and assess the potential of image steganography in enhancing its security. Secondly, we delve into various image steganographic paradigms designed to secure SemCom, encompassing three categories of joint source-channel coding (JSCC) models tailored for image steganography SemCom, along with multiple training strategies. Thirdly, we present a case study to illustrate the effectiveness of coverless steganography SemCom. Finally, we propose future research directions for image steganography SemCom.
Submitted 7 May, 2025;
originally announced May 2025.
-
Generalised Label-free Artefact Cleaning for Real-time Medical Pulsatile Time Series
Authors:
Xuhang Chen,
Ihsane Olakorede,
Stefan Yu Bögli,
Wenhao Xu,
Erta Beqiri,
Xuemeng Li,
Chenyu Tang,
Zeyu Gao,
Shuo Gao,
Ari Ercole,
Peter Smielewski
Abstract:
Artefacts compromise clinical decision-making in the use of medical time series. Pulsatile waveforms offer probabilities for accurate artefact detection, yet most approaches rely on supervised methods and overlook patient-level distribution shifts. To address these issues, we introduce a generalised label-free framework, GenClean, for real-time artefact cleaning and leverage an in-house dataset of 180,000 ten-second arterial blood pressure (ABP) samples for training. We first investigate patient-level generalisation, demonstrating robust performance under both intra- and inter-patient distribution shifts. We further validate its effectiveness through challenging cross-disease cohort experiments on the MIMIC-III database. Additionally, we extend our method to photoplethysmography (PPG), highlighting its applicability to diverse medical pulsatile signals. Finally, its integration into ICM+, a clinical research monitoring software, confirms the real-time feasibility of our framework, emphasising its practical utility in continuous physiological monitoring. This work provides a foundational step toward precision medicine in improving the reliability of high-resolution medical time series analysis.
Submitted 29 April, 2025;
originally announced April 2025.
-
Generative AI for Physical-Layer Authentication
Authors:
Rui Meng,
Xiqi Cheng,
Song Gao,
Xiaodong Xu,
Chen Dong,
Guoshun Nan,
Xiaofeng Tao,
Ping Zhang,
Tony Q. S. Quek
Abstract:
In recent years, Artificial Intelligence (AI)-driven Physical-Layer Authentication (PLA), which focuses on achieving endogenous security and intelligent identity authentication, has attracted considerable interest. When compared with Discriminative AI (DAI), Generative AI (GAI) offers several advantages, such as fingerprint data augmentation, fingerprint denoising and reconstruction, and protection against adversarial attacks. Inspired by these innovations, this paper provides a systematic exploration of GAI's integration into PLA frameworks. We commence with a review of representative authentication techniques, emphasizing PLA's inherent strengths. Following this, we revisit four typical GAI models and contrast the limitations of DAI with the potential of GAI in addressing PLA challenges, including insufficient fingerprint data, environmental noise and interference, perturbations in fingerprint data, and complex tasks. Specifically, we detail GAI-enhanced methods for PLA across the data, model, and application layers. Moreover, we present a case study that combines fingerprint extrapolation, generative diffusion models, and cooperative nodes to illustrate the superiority of GAI in bolstering the reliability of PLA compared to DAI. Additionally, we outline potential future research directions for GAI-based PLA.
Submitted 25 April, 2025;
originally announced April 2025.
-
RepNet-VSR: Reparameterizable Architecture for High-Fidelity Video Super-Resolution
Authors:
Biao Wu,
Diankai Zhang,
Shaoli Liu,
Si Gao,
Chengjian Zheng,
Ning Wang
Abstract:
As a fundamental challenge in visual computing, video super-resolution (VSR) focuses on reconstructing high-definition video sequences from their degraded low-resolution counterparts. While deep convolutional neural networks have demonstrated state-of-the-art performance in spatial-temporal super-resolution tasks, their computationally intensive nature poses significant deployment challenges for resource-constrained edge devices, particularly in real-time mobile video processing scenarios where power efficiency and latency constraints coexist. In this work, we propose a Reparameterizable Architecture for High Fidelity Video Super Resolution method, named RepNet-VSR, for real-time 4x video super-resolution. On the REDS validation set, the proposed model achieves 27.79 dB PSNR when processing 180p to 720p frames in 103 ms per 10 frames on a MediaTek Dimensity NPU. The competition results demonstrate an excellent balance between restoration quality and deployment efficiency. The proposed method scores higher than the previous champion algorithm of the MAI video super-resolution challenge.
Submitted 22 April, 2025;
originally announced April 2025.
-
Aligning Beam with Imbalanced Multi-modality: A Generative Federated Learning Approach
Authors:
Jiahui Liang,
Miaowen Wen,
Shuoyao Wang,
Yuxuan Liang,
Shijian Gao
Abstract:
As vehicle intelligence advances, multi-modal sensing-aided communication emerges as a key enabler for reliable Vehicle-to-Everything (V2X) connectivity through precise environmental characterization. As centralized learning may suffer from data privacy, model heterogeneity and communication overhead issues, federated learning (FL) has been introduced to support V2X. However, the practical deployment of FL faces critical challenges: model performance degradation from label imbalance across vehicles and training instability induced by modality disparities in sensor-equipped agents. To overcome these limitations, we propose a generative FL approach for beam selection (GFL4BS). Our solution features two core innovations: 1) An adaptive zero-shot multi-modal generator coupled with spectral-regularized loss functions to enhance the expressiveness of synthetic data compensating for both label scarcity and missing modalities; 2) A hybrid training paradigm integrating feature fusion with decentralized optimization to ensure training resilience while minimizing communication costs. Experimental evaluations demonstrate significant improvements over baselines, achieving 16.2% higher accuracy than the current state-of-the-art under severe label imbalance conditions while maintaining a success rate of over 70% even when two agents lack both LiDAR and RGB camera inputs.
Submitted 1 May, 2025; v1 submitted 20 April, 2025;
originally announced April 2025.
-
Sequential Task Assignment and Resource Allocation in V2X-Enabled Mobile Edge Computing
Authors:
Yufei Ye,
Shijian Gao,
Xinhu Zheng,
Liuqing Yang
Abstract:
Nowadays, the convergence of Mobile Edge Computing (MEC) and vehicular networks has emerged as a vital facilitator for the ever-increasing intelligent onboard applications. This paper introduces a multi-tier task offloading mechanism for MEC-enabled vehicular networks leveraging vehicle-to-everything (V2X) communications. The study focuses on applications with sequential subtasks and explores two tiers of collaboration. In the vehicle tier, we design a needing vehicle (NV)-helping vehicle (HV) matching scheme and inter-vehicle collaborative computation is studied, with joint optimization of task offloading decision, communication, and computation resource allocation to minimize energy consumption and meet latency requirements. In the roadside unit (RSU) tier, collaboration among RSUs is investigated to address multi-access issues of bandwidth and computation resources for multiple vehicles. A two-step method is proposed to solve the subchannel allocation problem. Detailed experiments are conducted to demonstrate the effectiveness of the proposed method and assess the impact of different parameters on system energy consumption.
Submitted 26 March, 2025;
originally announced March 2025.
-
Estimating Control Barriers from Offline Data
Authors:
Hongzhan Yu,
Seth Farrell,
Ryo Yoshimitsu,
Zhizhen Qin,
Henrik I. Christensen,
Sicun Gao
Abstract:
Learning-based methods for constructing control barrier functions (CBFs) are gaining popularity for ensuring safe robot control. A major limitation of existing methods is their reliance on extensive sampling over the state space or online system interaction in simulation. In this work we propose a novel framework for learning neural CBFs through a fixed, sparsely-labeled dataset collected prior to training. Our approach introduces new annotation techniques based on out-of-distribution analysis, enabling efficient knowledge propagation from the limited labeled data to the unlabeled data. We also eliminate the dependency on a high-performance expert controller, and allow multiple sub-optimal policies or even manual control during data collection. We evaluate the proposed method on real-world platforms. With a limited amount of offline data, it achieves state-of-the-art performance for dynamic obstacle avoidance, demonstrating statistically safer and less conservative maneuvers compared to existing methods.
Submitted 20 February, 2025;
originally announced March 2025.
-
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
Authors:
Ailin Huang,
Boyong Wu,
Bruce Wang,
Chao Yan,
Chen Hu,
Chengli Feng,
Fei Tian,
Feiyu Shen,
Jingbei Li,
Mingrui Chen,
Peng Liu,
Ruihang Miao,
Wang You,
Xi Chen,
Xuerui Yang,
Yechang Huang,
Yuxiang Zhang,
Zheng Gong,
Zixin Zhang,
Hongyu Zhou,
Jianjian Sun,
Brian Li,
Chengting Feng,
Changyi Wan,
Hanpeng Hu
, et al. (120 additional authors not shown)
Abstract:
Real-time speech interaction, serving as a fundamental interface for human-machine collaboration, holds immense potential. However, current open-source models face limitations such as high costs in voice data collection, weakness in dynamic control, and limited intelligence. To address these challenges, this paper introduces Step-Audio, the first production-ready open-source solution. Key contributions include: 1) a 130B-parameter unified speech-text multi-modal model that achieves unified understanding and generation, with the Step-Audio-Chat version open-sourced; 2) a generative speech data engine that establishes an affordable voice cloning framework and produces the open-sourced lightweight Step-Audio-TTS-3B model through distillation; 3) an instruction-driven fine control system enabling dynamic adjustments across dialects, emotions, singing, and RAP; 4) an enhanced cognitive architecture augmented with tool calling and role-playing abilities to manage complex tasks effectively. Based on our new StepEval-Audio-360 evaluation benchmark, Step-Audio achieves state-of-the-art performance in human evaluations, especially in terms of instruction following. On open-source benchmarks like LLaMA Question, it shows a 9.3% average performance improvement, demonstrating our commitment to advancing the development of open-source multi-modal language technologies. Our code and models are available at https://github.com/stepfun-ai/Step-Audio.
Submitted 18 February, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
LLM4WM: Adapting LLM for Wireless Multi-Tasking
Authors:
Xuanyu Liu,
Shijian Gao,
Boxun Liu,
Xiang Cheng,
Liuqing Yang
Abstract:
The wireless channel is fundamental to communication, encompassing numerous tasks collectively referred to as channel-associated tasks. These tasks can leverage joint learning based on channel characteristics to share representations and enhance system design. To capitalize on this advantage, LLM4WM is proposed--a large language model (LLM) multi-task fine-tuning framework specifically tailored for channel-associated tasks. This framework utilizes a Mixture of Experts with Low-Rank Adaptation (MoE-LoRA) approach for multi-task fine-tuning, enabling the transfer of the pre-trained LLM's general knowledge to these tasks. Given the unique characteristics of wireless channel data, preprocessing modules, adapter modules, and multi-task output layers are designed to align the channel data with the LLM's semantic feature space. Experiments on a channel-associated multi-task dataset demonstrate that LLM4WM outperforms existing methodologies in both full-sample and few-shot evaluations, owing to its robust multi-task joint modeling and transfer learning capabilities.
Submitted 7 February, 2025; v1 submitted 22 January, 2025;
originally announced January 2025.
-
Synesthesia of Machines (SoM)-Aided FDD Precoding with Sensing Heterogeneity: A Vertical Federated Learning Approach
Authors:
Haotian Zhang,
Shijian Gao,
Weibo Wen,
Xiang Cheng
Abstract:
High complexity in precoding design for frequency division duplex systems necessitates streamlined solutions. Guided by Synesthesia of Machines (SoM), this paper introduces a heterogeneous multi-vehicle, multi-modal sensing aided precoding scheme within a vertical federated learning (VFL) framework, which significantly minimizes pilot sequence length while optimizing the system's sum rate. We address the challenges posed by local data heterogeneity due to varying on-board sensor configurations through a meticulously designed VFL training procedure. To extract valuable channel features from multi-modal sensing, we employ three distinct data preprocessing methods that convert raw data into informative representations relevant for precoding. Additionally, we propose an online training strategy based on VFL framework, enabling the scheme to adapt dynamically to fluctuations in user numbers. Numerical results indicate that our approach, utilizing short pilot sequences, closely approximates the performance of traditional optimization methods with perfect channel state information.
Submitted 13 March, 2025; v1 submitted 18 January, 2025;
originally announced January 2025.
-
Predictive Target-to-User Association in Complex Scenarios via Hybrid-Field ISAC Signaling
Authors:
Yifeng Yuan,
Miaowen Wen,
Xinhu Zheng,
Shuoyao Wang,
Shijian Gao
Abstract:
This paper presents a novel and robust target-to-user (T2U) association framework to support reliable vehicle-to-infrastructure (V2I) networks that potentially operate within the hybrid field (near-field and far-field). To address the challenges posed by complex vehicle maneuvers and user association ambiguity, an interacting multiple-model filtering scheme is developed, which combines coordinated turn and constant velocity models for predictive beamforming. Building upon this foundation, a lightweight association scheme leverages user-specific integrated sensing and communication (ISAC) signaling while employing probabilistic data association to manage clutter measurements in dense traffic. Numerical results validate that the proposed framework significantly outperforms conventional methods in terms of both tracking accuracy and association reliability.
Submitted 15 April, 2025; v1 submitted 18 January, 2025;
originally announced January 2025.
-
A Survey of Secure Semantic Communications
Authors:
Rui Meng,
Song Gao,
Dayu Fan,
Haixiao Gao,
Yining Wang,
Xiaodong Xu,
Bizhu Wang,
Suyu Lv,
Zhidi Zhang,
Mengying Sun,
Shujun Han,
Chen Dong,
Xiaofeng Tao,
Ping Zhang
Abstract:
Semantic communication (SemCom) is regarded as a promising and revolutionary technology in 6G, aiming to transcend the constraints of "Shannon's trap" by filtering out redundant information and extracting the core of effective data. Compared to traditional communication paradigms, SemCom offers several notable advantages, such as reducing the burden on data transmission, enhancing network management efficiency, and optimizing resource allocation. Numerous researchers have extensively explored SemCom from various perspectives, including network architecture, theoretical analysis, potential technologies, and future applications. However, as SemCom continues to evolve, a multitude of security and privacy concerns have arisen, posing threats to the confidentiality, integrity, and availability of SemCom systems. This paper presents a comprehensive survey of the technologies that can be utilized to secure SemCom. Firstly, we elaborate on the entire life cycle of SemCom, which includes the model training, model transfer, and semantic information transmission phases. Then, we identify the security and privacy issues that emerge during these three stages. Furthermore, we summarize the techniques available to mitigate these security and privacy threats, including data cleaning, robust learning, defensive strategies against backdoor attacks, adversarial training, differential privacy, cryptography, blockchain technology, model compression, and physical-layer security. Lastly, this paper outlines future research directions to guide researchers in related fields.
Submitted 26 March, 2025; v1 submitted 1 January, 2025;
originally announced January 2025.
-
SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models
Authors:
Linqin Wang,
Yaping Liu,
Zhengtao Yu,
Shengxiang Gao,
Cunli Mao,
Yuxin Huang,
Wenjun Wang,
Ling Dong
Abstract:
With the rapid advancement of large language models (LLMs), discrete speech representations have become crucial for integrating speech into LLMs. Existing methods for speech representation discretization rely on a predefined codebook size and Euclidean distance-based quantization. However, 1) the size of the codebook is a critical parameter that affects both codec performance and downstream task training efficiency, and 2) Euclidean distance-based quantization may lead to audio distortion when the size of the codebook is controlled within a reasonable range. In fact, in the field of information compression, structural information and entropy guidance are crucial, but previous methods have largely overlooked these factors. Therefore, to address the above issues from an information-theoretic perspective, we present SECodec, a novel speech representation codec based on structural entropy (SE) for building speech language models. Specifically, we first model speech as a graph, clustering the speech feature nodes within the graph and extracting the corresponding codebook by hierarchically and disentangledly minimizing 2D SE. Then, to address the issue of audio distortion, we propose a new quantization method. This method still adheres to the 2D SE minimization principle, adaptively selecting the most suitable token corresponding to the cluster for each incoming original speech node. Furthermore, we develop a Structural Entropy-based Speech Language Model (SESLM) that leverages SECodec. Experimental results demonstrate that SECodec performs comparably to EnCodec in speech reconstruction, and SESLM surpasses VALL-E in zero-shot text-to-speech tasks. Code, demo speeches, speech feature graph, SE codebook, and models are available at https://github.com/wlq2019/SECodec.
Submitted 15 December, 2024;
originally announced January 2025.
-
Synesthesia of Machine (SoM)-Driven Analog Precoder Optimization for Enhanced ISAC Performance in Sub-THz Systems
Authors:
Zonghui Yang,
Shijian Gao,
Xiang Cheng
Abstract:
Integrated sensing and communication (ISAC) is anticipated to be widely used in future sub-terahertz (sub-THz) systems. With the line-of-sight (LoS) propagation characteristics of sub-THz channels, ISAC transmitter design largely parallels analog precoder optimization. However, balancing both sensing and communication functionalities is challenging due to the beam squint effect in sub-THz systems, limiting ISAC performance gains. To overcome this, the unique design flexibility of sub-THz analog hardware is explored to better adapt to the electromagnetic characteristics of sub-THz channels. It is demonstrated that adjusting the equivalent channel through the analog precoder enhances dual-functional gains. Based on this, a near-optimal benchmark for analog precoder optimization is proposed. To address excessive algorithmic complexity, inspiration is drawn from the synesthesia of machine (SoM) to develop a lightweight complex-valued squint-aware network (CSP-Net). This network reduces complexity by utilizing both communication and sensing channel data, with an architecture tailored to specific data and task characteristics. The effectiveness of the proposed schemes is validated through simulations.
Submitted 2 March, 2025; v1 submitted 18 December, 2024;
originally announced December 2024.
-
WiFo: Wireless Foundation Model for Channel Prediction
Authors:
Boxun Liu,
Shijian Gao,
Xuanyu Liu,
Xiang Cheng,
Liuqing Yang
Abstract:
Channel prediction makes it possible to acquire channel state information (CSI) without signaling overhead. However, almost all existing channel prediction methods necessitate the deployment of a dedicated model to accommodate a specific configuration. Leveraging the powerful modeling and multi-task learning capabilities of foundation models, we propose the first space-time-frequency (STF) wireless foundation model (WiFo) to address time-frequency channel prediction tasks in a one-for-all manner. Specifically, WiFo is initially pre-trained over massive and extensively diverse CSI datasets. The model can then be used immediately for channel prediction under various CSI configurations without any fine-tuning. We propose a masked autoencoder (MAE)-based network structure for WiFo to handle heterogeneous STF CSI data, and design several mask reconstruction tasks for self-supervised pre-training to capture the inherent 3D variations of CSI. To fully unleash its predictive power, we build a large-scale heterogeneous simulated CSI dataset consisting of 160K CSI samples for pre-training. Simulations validate its superior unified learning performance across multiple datasets and demonstrate its state-of-the-art (SOTA) zero-shot generalization performance via comparisons with other full-shot baselines.
Submitted 19 March, 2025; v1 submitted 11 December, 2024;
originally announced December 2024.
-
Pruned Convolutional Attention Network Based Wideband Spectrum Sensing with Sub-Nyquist Sampling
Authors:
Peihao Dong,
Jibin Jia,
Shen Gao,
Fuhui Zhou,
Qihui Wu
Abstract:
Wideband spectrum sensing (WSS) is critical for orchestrating multitudinous wireless transmissions via spectrum sharing, but may incur excessive costs of hardware, power and computation due to the high sampling rate. In this article, a deep learning based WSS framework embedding the multicoset preprocessing is proposed to enable the low-cost sub-Nyquist sampling. A pruned convolutional attention WSS network (PCA-WSSNet) is designed to organically integrate the multicoset preprocessing and the convolutional attention mechanism as well as to reduce the model complexity remarkably via the selective weight pruning without the performance loss. Furthermore, a transfer learning (TL) strategy benefiting from the model pruning is developed to improve the robustness of PCA-WSSNet with few adaptation samples of new scenarios. Simulation results show the performance superiority of PCA-WSSNet over the state of the art. Compared with direct TL, the pruned TL strategy can simultaneously improve the prediction accuracy in unseen scenarios, reduce the model size, and accelerate the model inference.
Submitted 30 November, 2024;
originally announced December 2024.
-
An AI-driven multimodal smart home platform for continuous monitoring and intelligent assistance in post-stroke patients
Authors:
Chenyu Tang,
Ruizhi Zhang,
Shuo Gao,
Zihe Zhao,
Zibo Zhang,
Jiaqi Wang,
Cong Li,
Junliang Chen,
Yanning Dai,
Shengbo Wang,
Ruoyu Juan,
Qiaoying Li,
Ruimou Xie,
Xuhang Chen,
Xinkai Zhou,
Yunjia Xia,
Jianan Chen,
Fanghao Lu,
Xin Li,
Ningli Wang,
Peter Smielewski,
Yu Pan,
Hubin Zhao,
Luigi G. Occhipinti
Abstract:
At-home rehabilitation for post-stroke patients presents significant challenges, as continuous, personalized care is often limited outside clinical settings. Additionally, the absence of comprehensive solutions addressing diverse monitoring and assistance needs in home environments complicates recovery efforts. Here, we present a multimodal smart home platform designed for continuous, at-home rehabilitation of post-stroke patients, integrating wearable sensing, ambient monitoring, and adaptive automation. A plantar pressure insole equipped with a machine learning pipeline classifies users into motor recovery stages with up to 94% accuracy, enabling quantitative tracking of walking patterns. A head-mounted eye-tracking module supports cognitive assessments and hands-free control of household devices, while ambient sensors ensure sub-second response times for interaction. These data streams are fused locally via a hierarchical Internet of Things (IoT) architecture, protecting privacy and minimizing latency. An embedded large language model (LLM) agent, Auto-Care, continuously interprets multimodal data to provide real-time interventions: issuing personalized reminders, adjusting environmental conditions, and notifying caregivers. Implemented in a post-stroke context, this integrated smart home platform increases overall user satisfaction by an average of 115% (p<0.01) compared to a traditional home environment. Beyond stroke, the system offers a scalable framework for patient-centered, long-term care in broader neurorehabilitation and aging-in-place applications.
Submitted 15 April, 2025; v1 submitted 28 November, 2024;
originally announced November 2024.
-
Wearable intelligent throat enables natural speech in stroke patients with dysarthria
Authors:
Chenyu Tang,
Shuo Gao,
Cong Li,
Wentian Yi,
Yuxuan Jin,
Xiaoxue Zhai,
Sixuan Lei,
Hongbei Meng,
Zibo Zhang,
Muzi Xu,
Shengbo Wang,
Xuhang Chen,
Chenxi Wang,
Hongyun Yang,
Ningli Wang,
Wenyu Wang,
Jin Cao,
Xiaodong Feng,
Peter Smielewski,
Yu Pan,
Wenhui Song,
Martin Birchall,
Luigi G. Occhipinti
Abstract:
Wearable silent speech systems hold significant potential for restoring communication in patients with speech impairments. However, seamless, coherent speech remains elusive, and clinical efficacy is still unproven. Here, we present an AI-driven intelligent throat (IT) system that integrates throat muscle vibrations and carotid pulse signal sensors with large language model (LLM) processing to enable fluent, emotionally expressive communication. The system utilizes ultrasensitive textile strain sensors to capture high-quality signals from the neck area and supports token-level processing for real-time, continuous speech decoding, enabling seamless, delay-free communication. In tests with five stroke patients with dysarthria, IT's LLM agents intelligently corrected token errors and enriched sentence-level emotional and logical coherence, achieving low error rates (4.2% word error rate, 2.9% sentence error rate) and a 55% increase in user satisfaction. This work establishes a portable, intuitive communication platform for patients with dysarthria with the potential to be applied broadly across different neurological conditions and in multi-language support systems.
Submitted 14 March, 2025; v1 submitted 27 November, 2024;
originally announced November 2024.
-
Sensing Capacity for Integrated Sensing and Communication Systems in Low-Altitude Economy
Authors:
Jiahua Wan,
Hong Ren,
Cunhua Pan,
Zhenkun Zhang,
Songtao Gao,
Yiming Yu,
Chengzhong Wang
Abstract:
The burgeoning significance of the low-altitude economy (LAE) has garnered considerable interest, largely fuelled by the widespread deployment of unmanned aerial vehicles (UAVs). To tackle the challenges associated with the detection of unauthorized UAVs and the efficient scheduling of authorized UAVs, this letter introduces a novel performance metric, termed sensing capacity, for integrated sensing and communication (ISAC) systems. This metric, which quantifies the capability of a base station (BS) to detect multiple UAVs simultaneously, leverages signal-to-noise ratio (SNR) and probability of detection (PD) as key intermediate variables. Through mathematical derivations, we can derive a closed-form solution for the maximum number of UAVs that can be detected by the BS while adhering to a specific SNR constraint. Furthermore, an approximate solution based on PD constraints is proposed to facilitate the efficient determination of the threshold for the maximum number of detectable UAVs. The accuracy of this analytical approach is verified through extensive simulation results.
Submitted 11 November, 2024;
originally announced November 2024.
-
Linear Quadratic Mean Field Games with Quantile-Dependent Cost Coefficients
Authors:
Shuang Gao,
Roland P. Malhamé
Abstract:
This paper studies a class of linear quadratic mean field games where the coefficients of quadratic cost functions depend on both the mean and the variance of the population's state distribution through its quantile function. Such a formulation allows for modelling agents that are sensitive to not only the population average but also the population variance. The corresponding mean field game equilibrium is identified, which involves solving two coupled differential equations: one is a Riccati equation and the other the variance evolution equation. Furthermore, the conditions for the existence and uniqueness of the mean field equilibrium are established. Finally, numerical results are presented to illustrate the behavior of two coupled differential equations and the performance of the mean field game solution.
Submitted 3 November, 2024;
originally announced November 2024.
-
SEEV: Synthesis with Efficient Exact Verification for ReLU Neural Barrier Functions
Authors:
Hongchao Zhang,
Zhizhen Qin,
Sicun Gao,
Andrew Clark
Abstract:
Neural Control Barrier Functions (NCBFs) have shown significant promise in enforcing safety constraints on nonlinear autonomous systems. State-of-the-art exact approaches to verifying safety of NCBF-based controllers exploit the piecewise-linear structure of ReLU neural networks; however, such approaches still rely on enumerating all of the activation regions of the network near the safety boundary, thus incurring high computation cost. In this paper, we propose a framework for Synthesis with Efficient Exact Verification (SEEV). Our framework consists of two components, namely (i) an NCBF synthesis algorithm that introduces a novel regularizer to reduce the number of activation regions at the safety boundary, and (ii) a verification algorithm that exploits tight over-approximations of the safety conditions to reduce the cost of verifying each piecewise-linear segment. Our simulations show that SEEV significantly improves verification efficiency while maintaining the CBF quality across various benchmark systems and neural network structures. Our code is available at https://github.com/HongchaoZhang-HZ/SEEV.
Submitted 26 October, 2024;
originally announced October 2024.
-
UbiHR: Resource-efficient Long-range Heart Rate Sensing on Ubiquitous Devices
Authors:
Haoyu Bian,
Bin Guo,
Sicong Liu,
Yasan Ding,
Shanshan Gao,
Zhiwen Yu
Abstract:
Ubiquitous on-device heart rate sensing is vital for high-stress individuals and chronic patients. Non-contact sensing, compared to contact-based tools, allows for natural user monitoring, potentially enabling more accurate and holistic data collection. However, in open and uncontrolled mobile environments, user movement and lighting introduce. Existing methods, such as curve-based or short-range deep learning recognition based on adjacent frames, strike the optimal balance between real-time performance and accuracy, especially under limited device resources. In this paper, we present UbiHR, a ubiquitous device-based heart rate sensing system. Key to UbiHR is a real-time long-range spatio-temporal model enabling noise-independent heart rate recognition and display on commodity mobile devices, along with a set of mechanisms for prompt and energy-efficient sampling and preprocessing. Diverse experiments and user studies involving four devices, four tasks, and 80 participants demonstrate UbiHR's superior performance, enhancing accuracy by up to 74.2% and reducing latency by 51.2%.
Submitted 24 October, 2024;
originally announced October 2024.
-
High-Order Associative Learning Based on Memristive Circuits for Efficient Learning
Authors:
Shengbo Wang,
Xuemeng Li,
Jialin Ding,
Weihao Ma,
Ying Wang,
Luigi Occhipinti,
Arokia Nathan,
Shuo Gao
Abstract:
Memristive associative learning has gained significant attention for its ability to mimic fundamental biological learning mechanisms while maintaining system simplicity. In this work, we introduce a high-order memristive associative learning framework with a biologically realistic structure. By utilizing memristors as synaptic modules and their state information to bridge different orders of associative learning, our design effectively establishes associations between multiple stimuli and replicates the transient nature of high-order associative learning. In Pavlov's classical conditioning experiments, our design achieves a 230% improvement in learning efficiency compared to previous works, with memristor power consumption in the synaptic modules remaining below 11 μW. In large-scale image recognition tasks, we utilize a 20×20 memristor array to represent images, enabling the system to recognize and label test images with semantic information at 100% accuracy. This scalability across different tasks highlights the framework's potential for a wide range of applications, offering enhanced learning efficiency for current memristor-based neuromorphic systems.
Submitted 22 October, 2024;
originally announced October 2024.
-
Towards Single-Lens Controllable Depth-of-Field Imaging via Depth-Aware Point Spread Functions
Authors:
Xiaolong Qian,
Qi Jiang,
Yao Gao,
Shaohua Gao,
Zhonghua Yi,
Lei Sun,
Kai Wei,
Haifeng Li,
Kailun Yang,
Kaiwei Wang,
Jian Bai
Abstract:
Controllable Depth-of-Field (DoF) imaging commonly produces amazing visual effects based on heavy and expensive high-end lenses. However, confronted with the increasing demand for mobile scenarios, it is desirable to achieve a lightweight solution with Minimalist Optical Systems (MOS). This work centers around two major limitations of MOS, i.e., the severe optical aberrations and uncontrollable DoF, for achieving single-lens controllable DoF imaging via computational methods. A Depth-aware Controllable DoF Imaging (DCDI) framework is proposed equipped with All-in-Focus (AiF) aberration correction and monocular depth estimation, where the recovered image and corresponding depth map are utilized to produce imaging results under diverse DoFs of any high-end lens via patch-wise convolution. To address the depth-varying optical degradation, we introduce a Depth-aware Degradation-adaptive Training (DA2T) scheme. At the dataset level, a Depth-aware Aberration MOS (DAMOS) dataset is established based on the simulation of Point Spread Functions (PSFs) under different object distances. Additionally, we design two plug-and-play depth-aware mechanisms to embed depth information into the aberration image recovery for better tackling depth-aware degradation. Furthermore, we propose a storage-efficient Omni-Lens-Field model to represent the 4D PSF library of various lenses. With the predicted depth map, recovered image, and depth-aware PSF map inferred by Omni-Lens-Field, single-lens controllable DoF imaging is achieved. Comprehensive experimental results demonstrate that the proposed framework enhances the recovery performance, and attains impressive single-lens controllable DoF imaging results, providing a seminal baseline for this field. The source code and the established dataset will be publicly available at https://github.com/XiaolongQian/DCDI.
Submitted 11 February, 2025; v1 submitted 15 September, 2024;
originally announced September 2024.
-
An End-to-End Approach for Chord-Conditioned Song Generation
Authors:
Shuochen Gao,
Shun Lei,
Fan Zhuo,
Hangyu Liu,
Feng Liu,
Boshi Tang,
Qiaochu Huang,
Shiyin Kang,
Zhiyong Wu
Abstract:
The Song Generation task aims to synthesize music composed of vocals and accompaniment from given lyrics. While the existing method, Jukebox, has explored this task, its constrained control over the generations often leads to deficiency in music performance. To mitigate the issue, we introduce an important concept from music composition, namely chords, to song generation networks. Chords form the foundation of accompaniment and provide vocal melody with associated harmony. Given the inaccuracy of automatic chord extractors, we devise a robust cross-attention mechanism augmented with dynamic weight sequence to integrate extracted chord information into song generations and reduce frame-level flaws, and propose a novel model termed Chord-Conditioned Song Generator (CSG) based on it. Experimental evidence demonstrates our proposed method outperforms other approaches in terms of musical performance and control precision of generated songs.
Submitted 10 September, 2024;
originally announced September 2024.
-
A Flexible Framework for Universal Computational Aberration Correction via Automatic Lens Library Generation and Domain Adaptation
Authors:
Qi Jiang,
Yao Gao,
Shaohua Gao,
Zhonghua Yi,
Lei Sun,
Hao Shi,
Kailun Yang,
Kaiwei Wang,
Jian Bai
Abstract:
Emerging universal Computational Aberration Correction (CAC) paradigms provide an inspiring solution to light-weight and high-quality imaging without repeated data preparation and model training to accommodate new lens designs. However, the training databases in these approaches, i.e., the lens libraries (LensLibs), suffer from their limited coverage of real-world aberration behaviors. In this work, we set up an OmniLens framework for universal CAC, considering both the generalization ability and flexibility. OmniLens extends the idea of universal CAC to a broader concept, where a base model is trained for three cases, including zero-shot CAC with the pre-trained model, few-shot CAC with a little lens-specific data for fine-tuning, and domain adaptive CAC using domain adaptation for lens-descriptions-unknown lens. In terms of OmniLens's data foundation, we first propose an Evolution-based Automatic Optical Design (EAOD) pipeline to construct LensLib automatically, coined AODLib, whose diversity is enriched by an evolution framework, with comprehensive constraints and a hybrid optimization strategy for achieving realistic aberration behaviors. For network design, we introduce the guidance of high-quality codebook priors to facilitate zero-shot CAC and few-shot CAC, which enhances the model's generalization ability, while also boosting its convergence in a few-shot case. Furthermore, based on the statistical observation of dark channel priors in optical degradation, we design an unsupervised regularization term to adapt the base model to the target descriptions-unknown lens using its aberration images without ground truth. We validate OmniLens on 4 manually designed low-end lenses with various structures and aberration behaviors. Remarkably, the base model trained on AODLib exhibits strong generalization capabilities, achieving 97% of the lens-specific performance in a zero-shot setting.
Submitted 9 September, 2024;
originally announced September 2024.
-
GEM: A GEneral Memristive Transistor Model
Authors:
Shengbo Wang,
Jingfang Pei,
Cong Li,
Xuemeng Li,
Li Tao,
Arokia Nathan,
Guohua Hu,
Shuo Gao
Abstract:
Neuromorphic devices, with their distinct advantages in energy efficiency and parallel processing, are pivotal in advancing artificial intelligence applications. Among these devices, memristive transistors have attracted significant attention due to their superior stability and operation flexibility compared to two-terminal memristors. However, the lack of a robust model that accurately captures their complex electrical behavior has hindered further exploration of their potential. In this work, we introduce the GEneral Memristive transistor (GEM) model to address this challenge. The GEM model incorporates a time-dependent differential equation, a voltage-controlled moving window function, and a nonlinear current output function, enabling precise representation of both switching and output characteristics in memristive transistors. Compared to previous models, the GEM model demonstrates a 300% improvement in modeling the switching behavior, while effectively capturing the inherent nonlinearities and physical limits of these devices. This advancement significantly enhances the realistic simulation of memristive transistors, thereby facilitating further exploration and application development.
Submitted 7 November, 2024; v1 submitted 27 August, 2024;
originally announced August 2024.
-
Synesthesia of Machines (SoM)-Enhanced ISAC Precoding for Vehicular Networks with Double Dynamics
Authors:
Zonghui Yang,
Shijian Gao,
Xiang Cheng,
Liuqing Yang
Abstract:
Integrated sensing and communication (ISAC) technology is vital for vehicular networks, yet the time-varying communication channels and rapid movement of targets present significant challenges for real-time precoding design. Traditional optimization-based methods are computationally complex and depend on perfect prior information, which is often unavailable in double-dynamic scenarios. In this paper, we propose a synesthesia of machine (SoM)-enhanced precoding paradigm that leverages modalities such as positioning and channel information to adapt to these dynamics. Utilizing a deep reinforcement learning (DRL) framework, our approach pushes ISAC performance boundaries. We also introduce a parameter-shared actor-critic architecture to accelerate training in complex state and action spaces. Extensive experiments validate the superiority of our method over existing approaches.
Submitted 3 December, 2024; v1 submitted 24 August, 2024;
originally announced August 2024.
-
Synesthesia of Machines (SoM)-Enhanced Wideband Multi-User CSI Learning With LiDAR Sensing
Authors:
Haotian Zhang,
Shijian Gao,
Xiang Cheng,
Liuqing Yang
Abstract:
Light detection and ranging (LiDAR) has been utilized for optimizing wireless communications due to its ability to detect the environment. This paper explores the use of LiDAR in channel estimation for wideband multi-user multiple-input-multiple-output orthogonal frequency division multiplexing systems and introduces a LiDAR-enhanced Channel State Information (CSI) learning network (LE-CLN). By utilizing user positioning information, LE-CLN first calculates user-localized over-complete angular measurements. It then investigates the correlation between LiDAR and CSI, transforming raw LiDAR data into a low-complexity format embedded with signal propagation characteristics. LE-CLN also adapts the use of LiDAR based on channel conditions through attention mechanisms. Thanks to the unique wireless features offered by LiDAR, LE-CLN achieves higher estimation accuracy and spectrum efficiency compared to benchmarks, particularly in latency-sensitive applications where pilot transmissions are expected to be reduced.
Submitted 18 March, 2025; v1 submitted 22 August, 2024;
originally announced August 2024.
-
RL-ADN: A High-Performance Deep Reinforcement Learning Environment for Optimal Energy Storage Systems Dispatch in Active Distribution Networks
Authors:
Shengren Hou,
Shuyi Gao,
Weijie Xia,
Edgar Mauricio Salazar Duque,
Peter Palensky,
Pedro P. Vergara
Abstract:
Deep Reinforcement Learning (DRL) presents a promising avenue for optimizing Energy Storage Systems (ESSs) dispatch in distribution networks. This paper introduces RL-ADN, an innovative open-source library specifically designed for solving the optimal ESSs dispatch in active distribution networks. RL-ADN offers unparalleled flexibility in modeling distribution networks and ESSs, accommodating a wide range of research goals. A standout feature of RL-ADN is its data augmentation module, based on Gaussian Mixture Model and Copula (GMC) functions, which elevates the performance ceiling of DRL agents. Additionally, RL-ADN incorporates the Laurent power flow solver, significantly reducing the computational burden of power flow calculations during training without sacrificing accuracy. The effectiveness of RL-ADN is demonstrated on distribution networks of different sizes, showing marked performance improvements in the adaptability of DRL algorithms for ESS dispatch tasks. This enhancement benefits particularly from the increased diversity of training scenarios. Furthermore, RL-ADN achieves a tenfold increase in computational efficiency during training, making it highly suitable for large-scale network applications. The library sets a new benchmark in DRL-based ESSs dispatch in distribution networks and is poised to advance DRL applications in distribution network operations significantly. RL-ADN is available at: https://github.com/ShengrenHou/RL-ADN and https://github.com/distributionnetworksTUDelft/RL-ADN.
Submitted 8 August, 2024; v1 submitted 7 August, 2024;
originally announced August 2024.
-
Multibeam Hybrid Transmitarray Based on Polarization Rotating Metasurface With Reconfigurable Bidirectional Radiation
Authors:
Fan Qin,
Yifei Liu,
Chao Gu,
Linfeng Zeng,
Wenchi Cheng,
Hailin Zhang,
Steven Gao
Abstract:
This paper proposes a bidirectional multibeam hybrid transmitarray (HTA) employing a transmission polarization-rotating metasurface (TPRM). A novel configuration is introduced to facilitate bidirectional beam scanning by combining the transmitarray (TA) and folded-transmitarray (FTA). To accomplish the reconfiguration of both unidirectional and bidirectional radiation states in the +z, -z, and +/-z directions, a polarization switchable multi-feed array (MFA) is placed at the focal plane between the TA and FTA, radiating x-polarization, y-polarization, and 45-degree oblique polarization waves, respectively. Meanwhile, the proposed antenna can achieve multibeam radiation in the three aforementioned states by switching the polarization of the MFA. To demonstrate the operating principle, a prototype has been designed, simulated, and fabricated. The measured results agree well with the simulated results. The simulated and measured results indicate that the proposed design can generate reconfigurable multibeam in both forward and backward directions, either separately or simultaneously. In the unidirectional states, forward and backward beam scanning is achieved within an angular range of +/-30° and +/-22°, respectively, with peak gains of 23.6 dBi and 23.1 dBi. A simultaneous forward and backward beam scanning of +/-40° and +/-22° is achieved in the hybrid radiation state, with peak gains of 19.4 dBi and 19.3 dBi, respectively. The proposed antenna array design offers several advantages, including bidirectional low-loss beam scanning, a simple structure, low power consumption, and a low profile.
Submitted 2 August, 2024;
originally announced August 2024.
-
Augmenting Channel Simulator and Semi-Supervised Learning for Efficient Indoor Positioning
Authors:
Yupeng Li,
Xinyu Ning,
Shijian Gao,
Yitong Liu,
Zhi Sun,
Qixing Wang,
Jiangzhou Wang
Abstract:
This work aims to tackle the labor-intensive and resource-consuming task of indoor positioning by proposing an efficient approach. The proposed approach involves the introduction of a semi-supervised learning (SSL) with a biased teacher (SSLB) algorithm, which effectively utilizes both labeled and unlabeled channel data. To reduce measurement expenses, unlabeled data is generated using an updated channel simulator (UCHS), and then weighted by adaptive confidence values to simplify the tuning of hyperparameters. Simulation results demonstrate that the proposed strategy achieves superior performance while minimizing measurement overhead and training expense compared to existing benchmarks, offering a valuable and practical solution for indoor positioning.
Submitted 1 August, 2024;
originally announced August 2024.
-
Edge Learning Based Collaborative Automatic Modulation Classification for Hierarchical Cognitive Radio Networks
Authors:
Peihao Dong,
Chaowei He,
Shen Gao,
Fuhui Zhou,
Qihui Wu
Abstract:
In hierarchical cognitive radio networks, edge or cloud servers utilize the data collected by edge devices for modulation classification, which, however, faces problems of computation load, transmission overhead, and data privacy. In this article, an edge learning (EL) based framework jointly mobilizing the edge device and the edge server for intelligent co-inference is proposed to realize the collaborative automatic modulation classification (C-AMC) between them. A spectrum semantic compression neural network is designed for the edge device to compress the collected raw data into a compact semantic embedding that is then sent to the edge server via the wireless channel. On the edge server side, a modulation classification neural network combining the bidirectional long short-term memory and attention structures is elaborated to determine the modulation type from the noisy semantic embedding. The C-AMC framework balances the computation resources of both sides while avoiding high transmission overhead and data privacy leakage. Both the offline and online training procedures of the C-AMC framework are elaborated. The compression strategy of the C-AMC framework is also developed to further facilitate deployment, especially for the resource-constrained edge device. Simulation results show the superiority of the EL-based C-AMC framework in terms of classification accuracy, computational complexity, and data compression rate, and reveal useful insights that pave the way for practical implementation.
Submitted 30 July, 2024;
originally announced July 2024.
-
An Interface Method for Co-simulation of EMT Model and Shifted Frequency EMT Model Based on Rotational Invariance Techniques
Authors:
Shilin Gao,
Ying Chen,
Zhitong Yu,
Wensheng Chen,
Yankan Song
Abstract:
The shifted frequency-based electromagnetic transient (SFEMT) simulation has greatly improved the computational efficiency of traditional electromagnetic transient (EMT) simulation for the ac grid. This letter proposes a novel interface for the co-simulation of the SFEMT model and the traditional EMT model. The general form of SFEMT modeling and the principle of analytical signal construction are first derived. Then, an interface for the co-simulation of EMT and SFEMT simulation is proposed based on rotational invariance techniques. Theoretical analyses and test results demonstrate the effectiveness of the proposed method.
Submitted 27 August, 2024; v1 submitted 21 July, 2024;
originally announced July 2024.
-
Hamilton-Jacobi Reachability in Reinforcement Learning: A Survey
Authors:
Milan Ganai,
Sicun Gao,
Sylvia Herbert
Abstract:
Recent literature has proposed approaches that learn control policies with high performance while maintaining safety guarantees. Synthesizing Hamilton-Jacobi (HJ) reachable sets has become an effective tool for verifying safety and supervising the training of reinforcement learning-based control policies for complex, high-dimensional systems. Previously, HJ reachability was restricted to verifying low-dimensional dynamical systems primarily because the computational complexity of the dynamic programming approach it relied on grows exponentially with the number of system states. In recent years, a litany of proposed methods addresses this limitation by computing the reachability value function simultaneously with learning control policies to scale HJ reachability analysis while still maintaining a reliable estimate of the true reachable set. These HJ reachability approximations are used to improve the safety, and even reward performance, of learned control policies and can solve challenging tasks such as those with dynamic obstacles and/or with lidar-based or vision-based observations. In this survey paper, we review the recent developments in the field of HJ reachability estimation in reinforcement learning that would provide a foundational basis for further research into reliability in high-dimensional systems.
Submitted 21 August, 2024; v1 submitted 12 July, 2024;
originally announced July 2024.
-
Multi-modal MRI Translation via Evidential Regression and Distribution Calibration
Authors:
Jiyao Liu,
Shangqi Gao,
Yuxin Li,
Lihao Liu,
Xin Gao,
Zhaohu Xing,
Junzhi Ning,
Yanzhou Su,
Xiao-Yong Zhang,
Junjun He,
Ningsheng Xu,
Xiahai Zhuang
Abstract:
Multi-modal Magnetic Resonance Imaging (MRI) translation leverages information from source MRI sequences to generate target modalities, enabling comprehensive diagnosis while overcoming the limitations of acquiring all sequences. While existing deep-learning-based multi-modal MRI translation methods have shown promising potential, they still face two key challenges: 1) lack of reliable uncertainty quantification for synthesized images, and 2) limited robustness when deployed across different medical centers. To address these challenges, we propose a novel framework that reformulates multi-modal MRI translation as a multi-modal evidential regression problem with distribution calibration. Our approach incorporates two key components: 1) an evidential regression module that estimates uncertainties from different source modalities and an explicit distribution mixture strategy for transparent multi-modal fusion, and 2) a distribution calibration mechanism that adapts to source-target mapping shifts to ensure consistent performance across different medical centers. Extensive experiments on three datasets from the BraTS2023 challenge demonstrate that our framework achieves superior performance and robustness across domains.
Submitted 18 May, 2025; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Channel Modeling Aided Dataset Generation for AI-Enabled CSI Feedback: Advances, Challenges, and Solutions
Authors:
Yupeng Li,
Gang Li,
Zirui Wen,
Shuangfeng Han,
Shijian Gao,
Guangyi Liu,
Jiangzhou Wang
Abstract:
The AI-enabled autoencoder has demonstrated great potential in channel state information (CSI) feedback in frequency division duplex (FDD) multiple input multiple output (MIMO) systems. However, this method completely changes the existing feedback strategies, making it impractical to deploy in recent years. To address this issue, this paper proposes a channel modeling aided data augmentation method based on a limited number of field channel data. Specifically, the user equipment (UE) extracts the primary stochastic parameters of the field channel data and transmits them to the base station (BS). The BS then updates the typical TR 38.901 model parameters with the extracted parameters. In this way, the updated channel model is used to generate the dataset. This strategy comprehensively considers the dataset collection, model generalization, model monitoring, and so on. Simulations verify that our proposed strategy can significantly improve performance compared to the benchmarks.
Submitted 30 June, 2024;
originally announced July 2024.
-
LLM4CP: Adapting Large Language Models for Channel Prediction
Authors:
Boxun Liu,
Xuanyu Liu,
Shijian Gao,
Xiang Cheng,
Liuqing Yang
Abstract:
Channel prediction is an effective approach for reducing the feedback or estimation overhead in massive multi-input multi-output (m-MIMO) systems. However, existing channel prediction methods lack precision due to model mismatch errors or network generalization issues. Large language models (LLMs) have demonstrated powerful modeling and generalization abilities, and have been successfully applied to cross-modal tasks, including the time series analysis. Leveraging the expressive power of LLMs, we propose a pre-trained LLM-empowered channel prediction method (LLM4CP) to predict the future downlink channel state information (CSI) sequence based on the historical uplink CSI sequence. We fine-tune the network while freezing most of the parameters of the pre-trained LLM for better cross-modality knowledge transfer. To bridge the gap between the channel data and the feature space of the LLM, preprocessor, embedding, and output modules are specifically tailored by taking into account unique channel characteristics. Simulations validate that the proposed method achieves SOTA prediction performance on full-sample, few-shot, and generalization tests with low training and inference costs.
Submitted 20 June, 2024;
originally announced June 2024.
-
Self-reconfigurable Multifunctional Memristive Nociceptor for Intelligent Robotics
Authors:
Shengbo Wang,
Mingchao Fang,
Lekai Song,
Cong Li,
Jian Zhang,
Arokia Nathan,
Guohua Hu,
Shuo Gao
Abstract:
Artificial nociceptors, mimicking human-like stimuli perception, are of significance for intelligent robotics to work in hazardous and dynamic scenarios. One of the most essential characteristics of the human nociceptor is its self-adjustable attribute, which indicates that the threshold of determination of a potentially hazardous stimulus relies on environmental knowledge. This critical attribute has so far been omitted, but it is highly desired for artificial nociceptors. Motivated by this shortcoming, this article presents, for the first time, a Self-Directed Channel (SDC) memristor-based self-reconfigurable nociceptor that is capable of perceiving hazardous pressure stimuli under different temperatures and demonstrates key features of tactile nociceptors, including 'threshold,' 'no-adaptation,' and 'sensitization.' The maximum amplification of hazardous external stimuli is 1000%, and its response characteristics dynamically adapt to current temperature conditions by automatically altering the generated modulation schemes for the memristor. The maximum difference ratio of the response of memristors at different temperatures is 500%, and this adaptability closely mimics the functions of biological tactile nociceptors, resulting in accurate danger perception in various conditions. Beyond temperature adaptation, this memristor-based nociceptor has the potential to integrate different sensory modalities by applying various sensors, thereby achieving human-like perception capabilities in real-world environments.
Submitted 13 June, 2024;
originally announced June 2024.
-
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge
Authors:
Hongwei Bran Li,
Fernando Navarro,
Ivan Ezhov,
Amirhossein Bayat,
Dhritiman Das,
Florian Kofler,
Suprosanna Shit,
Diana Waldmannstetter,
Johannes C. Paetzold,
Xiaobin Hu,
Benedikt Wiestler,
Lucas Zimmer,
Tamaz Amiranashvili,
Chinmay Prabhakar,
Christoph Berger,
Jonas Weidner,
Michelle Alonso-Basant,
Arif Rashid,
Ujjwal Baid,
Wesam Adel,
Deniz Ali,
Bhakti Baheti,
Yingbin Bai,
Ishaan Bhatt,
Sabri Can Cetindag
, et al. (55 additional authors not shown)
Abstract:
Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the development and evaluation of automated segmentation algorithms. Accurately modeling and quantifying this variability is essential for enhancing the robustness and clinical applicability of these algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ), which was organized in conjunction with International Conferences on Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2020 and 2021. The challenge focuses on the uncertainty quantification of medical image segmentation, which considers the omnipresence of inter-rater variability in imaging datasets. The large collection of images with multi-rater annotations features various modalities such as MRI and CT; various organs such as the brain, prostate, kidney, and pancreas; and different image dimensions (2D vs. 3D). A total of 24 teams submitted different solutions to the problem, combining various baseline models, Bayesian neural networks, and ensemble model techniques. The obtained results indicate the importance of the ensemble models, as well as the need for further research to develop efficient methods for uncertainty quantification in 3D segmentation tasks.
Submitted 24 June, 2024; v1 submitted 19 March, 2024;
originally announced May 2024.
-
Doubly-Dynamic ISAC Precoding for Vehicular Networks: A Constrained Deep Reinforcement Learning (CDRL) Approach
Authors:
Zonghui Yang,
Shijian Gao,
Xiang Cheng
Abstract:
Integrated sensing and communication (ISAC) technology is essential for supporting vehicular networks. However, the communication channel in this scenario exhibits time variations, and the potential targets may move rapidly, resulting in double dynamics. This nature poses a challenge for real-time precoder design. While optimization-based solutions are widely researched, they are complex and heavily rely on perfect channel-related information, which is impractical in double dynamics. To address this challenge, we propose using constrained deep reinforcement learning to facilitate dynamic updates to the ISAC precoder. Additionally, the primal dual-deep deterministic policy gradient and Wolpertinger architecture are tailored to efficiently train the algorithm under complex constraints and varying numbers of users. The proposed scheme not only adapts to the dynamics based on observations but also leverages environmental information to enhance performance and reduce complexity. Its superiority over existing candidates has been validated through experiments.
Submitted 23 August, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Beam Pattern Modulation Embedded Hybrid Transceiver Optimization for Integrated Sensing and Communication
Authors:
Boxun Liu,
Shijian Gao,
Zonghui Yang,
Xiang Cheng,
Liuqing Yang
Abstract:
Integrated sensing and communication (ISAC) emerges as a promising technology for B5G/6G, particularly in the millimeter-wave (mmWave) band. However, the widely utilized hybrid architecture in mmWave systems compromises multiplexing gain due to the constraints of limited radio frequency chains. Moreover, additional sensing functionalities exacerbate the impairment of spectrum efficiency (SE). In this paper, we present an optimized beam pattern modulation-embedded ISAC (BPM-ISAC) transceiver design, which spares one RF chain for sensing and the others for communication. To compensate for the reduced SE, index modulation across communication beams is applied. We formulate an optimization problem aimed at minimizing the mean squared error (MSE) of the sensing beampattern, subject to a symbol MSE constraint. This problem is then solved by sequentially optimizing the analog and digital parts. Both the multi-aperture structure (MAS) and the multi-beam structure (MBS) are considered for the design of the analog part. We conduct theoretical analysis on the asymptotic pairwise error probability (APEP) and the Cramér-Rao bound (CRB) of direction of arrival (DoA) estimation. Numerical simulations validate the overall enhanced ISAC performance over existing alternatives.
Submitted 18 February, 2025; v1 submitted 15 May, 2024;
originally announced May 2024.
-
Design and Implementation of mmWave Surface Wave Enabled Fluid Antennas and Experimental Results for Fluid Antenna Multiple Access
Authors:
Yuanjun Shen,
Boyi Tang,
Shuai Gao,
Kin-Fai Tong,
Hang Wong,
Kai-Kit Wong,
Yangyang Zhang
Abstract:
While multiple-input multiple-output (MIMO) technologies continue to advance, concerns arise as to how MIMO can remain scalable if more users are to be accommodated with an increasing number of antennas at the base station (BS) in the upcoming sixth generation (6G). Recently, the concept of fluid antenna system (FAS) has emerged, which promotes position flexibility to enable transmitter channel state information (CSI) free spatial multiple access on one radio frequency (RF) chain. On the theoretical side, the fluid antenna multiple access (FAMA) approach offers a scalable alternative to massive MIMO spatial multiplexing. However, FAMA lacks experimental validation, and the hardware implementation of FAS remains largely unexplored. The aim of this paper is to provide a novel hardware design for FAS and evaluate the performance of FAMA using experimental data. Our FAS design is based on a dynamically reconfigurable "fluid" radiator which is capable of adjusting its position within a predefined space. One single-channel fluid antenna (SCFA) and one double-channel fluid antenna (DCFA) are designed, electromagnetically simulated, fabricated, and measured. The measured radiation patterns of the prototypes are imported into channel and network models for evaluating their performance in FAMA. The experimental results demonstrate that in the 5G millimeter-wave (mmWave) bands (24-30 GHz), the FAS prototypes can vary their gain up to an averaged value of 11 dBi. In the case of 4-user FAMA, the double-channel FAS reduces outage probability by 57% and increases the multiplexing gain to 2.27 when compared to a static omnidirectional antenna.
Submitted 15 May, 2024;
originally announced May 2024.