-
Multimodal LLM Integrated Semantic Communications for 6G Immersive Experiences
Authors:
Yusong Zhang,
Yuxuan Sun,
Lei Guo,
Wei Chen,
Bo Ai,
Deniz Gunduz
Abstract:
6G networks promise revolutionary immersive communication experiences including augmented reality (AR), virtual reality (VR), and holographic communications. These applications demand high-dimensional multimodal data transmission and intelligent data processing in real time, which is extremely challenging over resource-limited wireless communication systems. Moreover, a joint understanding of the environment, context, and user intent is essential to deliver task-relevant content effectively. This article presents a novel multimodal large language model (MLLM) integrated semantic communications framework, termed MLLM-SC, which fully leverages the reasoning and generative capabilities of pre-trained foundation models for context-aware and task-oriented wireless communication. The MLLM-SC framework adopts a device-edge collaborative architecture. At the edge, an MLLM-empowered semantic guidance module analyzes multimodal inputs, user intents, and channel conditions to generate importance-aware attention maps that prioritize semantically critical information. An importance-aware semantic encoder and a resource-adaptive semantic decoder are jointly designed and optimized to use this semantic guidance for adaptive bandwidth allocation and high-quality content reconstruction or generation. Extensive case studies on visual question answering for AR/VR applications and diffusion-driven image generation validate the effectiveness of MLLM-SC.
Submitted 6 July, 2025;
originally announced July 2025.
-
Joint Power Control and Precoding for Cell-Free Massive MIMO Systems With Sparse Multi-Dimensional Graph Neural Networks
Authors:
Yukun Ma,
Jiayi Zhang,
Ziheng Liu,
Guowei Shi,
Bo Ai
Abstract:
Cell-free massive multiple-input multiple-output (CF mMIMO) has emerged as a prominent candidate for future networks due to its ability to significantly enhance spectral efficiency by eliminating inter-cell interference. However, its practical deployment faces considerable challenges, such as high computational complexity and the optimization of its complex processing. To address these challenges, this correspondence proposes a framework based on a sparse multi-dimensional graph neural network (SP-MDGNN), which sparsifies the connections between access points (APs) and user equipments (UEs) to significantly reduce computational complexity while maintaining high performance. In addition, the weighted minimum mean square error (WMMSE) algorithm is introduced as a comparative method to further analyze the trade-off between performance and complexity. Simulation results demonstrate that the sparse method achieves an optimal balance between performance and complexity, significantly reducing the computational complexity of the original MDGNN method while incurring only a slight performance degradation, providing insights for the practical deployment of CF mMIMO systems in large-scale networks.
Submitted 2 July, 2025;
originally announced July 2025.
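As a rough illustration of what sparsifying AP-UE connections can mean, the sketch below keeps, for each UE, only the k APs with the largest gain. The top-k rule, the function name `sparsify_links`, and the toy gain matrix are all assumptions made for illustration; the abstract does not state which sparsification rule SP-MDGNN uses.

```python
def sparsify_links(gains, k):
    """Keep, for each UE (column), the k APs (rows) with the largest gain.

    gains[a][u] is a hypothetical large-scale fading gain between AP a and UE u.
    Returns a 0/1 mask of the same shape (1 = link kept).
    """
    num_aps = len(gains)
    num_ues = len(gains[0])
    mask = [[0] * num_ues for _ in range(num_aps)]
    for u in range(num_ues):
        # Rank AP indices by descending gain toward UE u and keep the first k.
        ranked = sorted(range(num_aps), key=lambda a: gains[a][u], reverse=True)
        for a in ranked[:k]:
            mask[a][u] = 1
    return mask


# Toy example: 4 APs, 3 UEs (values are made up).
gains = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.4],
    [0.5, 0.7, 0.1],
    [0.1, 0.2, 0.6],
]
mask = sparsify_links(gains, k=2)
kept = sum(sum(row) for row in mask)  # number of AP-UE links that survive
```

In this toy case 6 of the 12 AP-UE links remain, so any per-link computation in a graph layer would run over half as many edges.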
-
Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation
Authors:
Qiyue Gao,
Xinyu Pi,
Kevin Liu,
Junrong Chen,
Ruolan Yang,
Xinqi Huang,
Xinyu Fang,
Lu Sun,
Gautham Kishore,
Bo Ai,
Stone Tao,
Mengyang Liu,
Jiaxi Yang,
Chao-Jung Lai,
Chuanyang Jin,
Jiannan Xiang,
Benhao Huang,
Zeming Chen,
David Danks,
Hao Su,
Tianmin Shu,
Ziqiao Ma,
Lianhui Qin,
Zhiting Hu
Abstract:
Internal world models (WMs) enable agents to understand the world's state and predict transitions, serving as the basis for advanced deliberative reasoning. Recent large Vision-Language Models (VLMs), such as OpenAI o3, GPT-4o and Gemini, exhibit potential as general-purpose WMs. While the latest studies have evaluated and shown limitations in specific capabilities such as visual understanding, a systematic evaluation of VLMs' fundamental WM abilities remains absent. Drawing on comparative psychology and cognitive science, we propose a two-stage framework that assesses Perception (visual, spatial, temporal, quantitative, and motion) and Prediction (mechanistic simulation, transitive inference, compositional inference) to provide an atomic evaluation of VLMs as WMs. Guided by this framework, we introduce WM-ABench, a large-scale benchmark comprising 23 fine-grained evaluation dimensions across 6 diverse simulated environments with controlled counterfactual simulations. Through 660 experiments on 15 latest commercial and open-source VLMs, we find that these models exhibit striking limitations in basic world modeling abilities. For instance, almost all models perform at near-random accuracy when distinguishing motion trajectories. Additionally, they lack disentangled understanding; for example, some models tend to believe blue objects move faster than green ones. Further results and analyses reveal significant gaps between VLMs and human-level world modeling.
Submitted 26 June, 2025;
originally announced June 2025.
-
Flexible MIMO for Future Wireless Communications: Which Flexibilities are Possible?
Authors:
Zhe Wang,
Jiayi Zhang,
Bokai Xu,
Wenhui Yi,
Emil Björnson,
Bo Ai
Abstract:
To enable next-generation wireless communication networks with modest spectrum availability, multiple-input multiple-output (MIMO) technology needs to undergo further evolution. In this paper, we introduce a promising next-generation wireless communication concept: flexible MIMO technology. This technology represents a MIMO technology with flexible physical configurations and integrated applications. We categorize twelve representative flexible MIMO technologies into three major classifications: flexible deployment characteristics-based, flexible geometry characteristics-based, and flexible real-time modifications-based. Then, we provide a comprehensive overview of their fundamental characteristics, potential, and challenges. Furthermore, we demonstrate three vital enablers for the flexible MIMO technology, including efficient channel state information (CSI) acquisition schemes, low-complexity beamforming design, and explainable artificial intelligence (AI)-enabled optimization. Within these areas, eight critical sub-enabling technologies are discussed in detail. Finally, we present two case studies, pre-optimized irregular arrays and cell-free movable antennas, which showcase the significant potential of flexible MIMO technologies to enhance system capacity.
Submitted 9 June, 2025;
originally announced June 2025.
-
SAVOR: Skill Affordance Learning from Visuo-Haptic Perception for Robot-Assisted Bite Acquisition
Authors:
Zhanxin Wu,
Bo Ai,
Tom Silver,
Tapomayukh Bhattacharjee
Abstract:
Robot-assisted feeding requires reliable bite acquisition, a challenging task due to the complex interactions between utensils and food with diverse physical properties. These interactions are further complicated by the temporal variability of food properties; for example, steak becomes firm as it cools even during a meal. To address this, we propose SAVOR, a novel approach for learning skill affordances for bite acquisition, that is, how suitable a manipulation skill (e.g., skewering, scooping) is for a given utensil-food interaction. In our formulation, skill affordances arise from the combination of tool affordances (what a utensil can do) and food affordances (what the food allows). Tool affordances are learned offline through calibration, where different utensils interact with a variety of foods to model their functional capabilities. Food affordances are characterized by physical properties such as softness, moisture, and viscosity, initially inferred through commonsense reasoning using a visually-conditioned language model and then dynamically refined through online multi-modal visuo-haptic perception using SAVOR-Net during interaction. Our method integrates these offline and online estimates to predict skill affordances in real time, enabling the robot to select the most appropriate skill for each food item. Evaluated on 20 single-item foods and 10 in-the-wild meals, our approach improves bite acquisition success by 13% over state-of-the-art (SOTA) category-based methods (e.g., use a skewer for fruits). These results highlight the importance of modeling interaction-driven skill affordances for generalizable and effective robot-assisted bite acquisition. Website: https://emprise.cs.cornell.edu/savor/
Submitted 2 June, 2025;
originally announced June 2025.
-
Multi-Waveguide Pinching Antennas for ISAC
Authors:
Weihao Mao,
Yang Lu,
Yanqing Xu,
Bo Ai,
Octavia A. Dobre,
Dusit Niyato
Abstract:
Recently, a novel flexible-antenna technology, called pinching antennas, has attracted growing academic interest. By inserting discrete dielectric materials, pinching antennas can be activated at arbitrary points along waveguides, allowing for flexible customization of large-scale path loss. This paper investigates a multi-waveguide pinching-antenna integrated sensing and communications (ISAC) system, where transmit pinching antennas (TPAs) and receive pinching antennas (RPAs) coordinate to simultaneously detect one potential target and serve one downlink user. We formulate a communication rate maximization problem subject to a radar signal-to-noise ratio (SNR) requirement, a transmit power budget, and the allowable movement region of the TPAs, by jointly optimizing TPA locations and transmit beamforming design. To address the non-convexity of the problem, we propose a novel fine-tuning approximation method to reformulate it into a tractable form, followed by a successive convex approximation (SCA)-based algorithm to obtain the solution efficiently. Extensive simulations validate both the system design and the proposed algorithm. Results show that the proposed method achieves near-optimal performance compared with the computationally intensive exhaustive search-based benchmark, and pinching-antenna ISAC systems exhibit a distinct communication-sensing trade-off compared with conventional systems.
Submitted 30 May, 2025;
originally announced May 2025.
-
Sensing-Enhanced Handover Criterion for Low-Altitude Wireless Networks (LAWNs)
Authors:
Jingli Li,
Yiyan Ma,
Bo Ai,
Qingqing Cheng,
Guoyu Ma,
Mi Yang,
Yunlong Lu,
Wenwei Yue,
Zhangdui Zhong
Abstract:
With the rapid growth of the low-altitude economy, the demand for cellular-enabled low-altitude wireless networks (LAWNs) is rising significantly. The three-dimensional mobility of unmanned aerial vehicles (UAVs) will lead to frequent handovers (HOs) in cellular networks, while traditional reference signal received power (RSRP)-based criteria may fail to capture the dynamic environment, causing redundant HOs or HO failures. To address this issue and motivated by the underutilization of sensing information in conventional HO mechanisms, we propose a novel HO activation criterion for UAV systems that integrates both sensing parameters provided by integrated sensing and communication (ISAC) signals and RSRP. First, we construct an ISAC signal model tailored for low-altitude scenarios and derive the Cramér-Rao lower bound for sensing distance estimation. Subsequently, we propose a novel joint HO criterion that extends the conventional RSRP-based method by integrating sensing information from ISAC signals, enabling more reliable HOs in dynamic UAV environments. Simulation results show that the joint HO criterion outperforms the baseline RSRP-based criterion under different signal-to-noise ratio (SNR) and sensing pilot ratio conditions. In particular, when the SNR is greater than 0 dB and the sensing pilot ratio is 20%, the proposed joint HO criterion reduces the average HO region length by 49.97% and improves the activation probability by 76.31%.
Submitted 22 May, 2025;
originally announced May 2025.
-
Towards Embodiment Scaling Laws in Robot Locomotion
Authors:
Bo Ai,
Liu Dai,
Nico Bohlinger,
Dichen Li,
Tongzhou Mu,
Zhanxin Wu,
K. Fay,
Henrik I. Christensen,
Jan Peters,
Hao Su
Abstract:
Developing generalist agents that can operate across diverse tasks, environments, and physical embodiments is a grand challenge in robotics and artificial intelligence. In this work, we focus on the axis of embodiment and investigate embodiment scaling laws, that is, the hypothesis that increasing the number of training embodiments improves generalization to unseen ones. Using robot locomotion as a test bed, we procedurally generate a dataset of ~1,000 varied embodiments, spanning humanoids, quadrupeds, and hexapods, and train generalist policies capable of handling diverse observation and action spaces on random subsets. We find that increasing the number of training embodiments improves generalization to unseen ones, and scaling embodiments is more effective in enabling embodiment-level generalization than scaling data on small, fixed sets of embodiments. Notably, our best policy, trained on the full dataset, zero-shot transfers to novel embodiments in the real world, such as Unitree Go2 and H1. These results represent a step toward general embodied intelligence, with potential relevance to adaptive control for configurable robots, co-design of morphology and control, and beyond.
Submitted 8 May, 2025;
originally announced May 2025.
-
Energy-Efficient SIM-assisted Communications: How Many Layers Do We Need?
Authors:
Enyu Shi,
Jiayi Zhang,
Jiancheng An,
Marco Di Renzo,
Bo Ai,
Chau Yuen
Abstract:
The stacked intelligent metasurface (SIM), comprising multiple layers of reconfigurable transmissive metasurfaces, is becoming an increasingly viable solution for future wireless communication systems. In this paper, we explore the integration of SIM in a multi-antenna base station for application to downlink multi-user communications, and a realistic power consumption model for SIM-assisted systems is presented. Specifically, we focus on maximizing the energy efficiency (EE) for hybrid precoding design, i.e., the base station digital precoding and SIM wave-based beamforming. Due to the non-convexity and high complexity of the formulated problem, we employ the quadratic transformation method to reformulate the optimization problem and propose an alternating optimization (AO)-based joint precoding framework. Specifically, a successive convex approximation (SCA) algorithm is adopted for the base station precoding design. For the SIM wave-based beamforming, two algorithms are employed: the high-performance semidefinite programming (SDP) method and the low-complexity projected gradient ascent (PGA) algorithm. In particular, the results indicate that while the optimal number of SIM layers for maximizing the EE and spectral efficiency differs, a design of 2 to 5 layers can achieve satisfactory performance for both. Finally, numerical results are illustrated to evaluate the effectiveness of the proposed hybrid precoding framework and to showcase the performance enhancement achieved by the algorithm in comparison to benchmark schemes.
Submitted 22 April, 2025;
originally announced April 2025.
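To make the projected gradient ascent (PGA) idea concrete, here is a minimal sketch on a toy problem: choosing unit-modulus phase shifts `theta[n]` that maximize `|sum_n h[n] * theta[n]|^2`. This is not the paper's energy-efficiency objective or its SIM model; the objective, the unit-modulus constraint, the step size, and the coefficients `h` are assumptions chosen only to show the ascent-then-project pattern.

```python
def project_unit_modulus(theta):
    """Project each entry onto the unit circle (a pure phase shift)."""
    return [t / abs(t) if abs(t) > 0 else 1 + 0j for t in theta]


def pga_phase_alignment(h, step=0.5, iters=200):
    """Maximize |sum_n h[n] * theta[n]|^2 subject to |theta[n]| = 1."""
    theta = [1 + 0j] * len(h)
    for _ in range(iters):
        s = sum(hn * tn for hn, tn in zip(h, theta))
        # Wirtinger gradient of |s|^2 with respect to conj(theta[n]).
        grad = [s * hn.conjugate() for hn in h]
        theta = [tn + step * gn for tn, gn in zip(theta, grad)]  # ascent step
        theta = project_unit_modulus(theta)                      # projection step
    return theta


h = [1 + 1j, -0.5 + 2j, 0.3 - 0.7j]  # made-up channel coefficients
theta = pga_phase_alignment(h)
achieved = abs(sum(hn * tn for hn, tn in zip(h, theta))) ** 2
upper_bound = sum(abs(hn) for hn in h) ** 2  # attained when all terms are co-phased
```

For this toy objective the optimum is known in closed form (all terms co-phased), which makes it easy to check that the iteration reaches `upper_bound`; realistic objectives generally have no such closed form, which is where an iterative method like PGA becomes useful.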
-
Uplink Assisted Joint Channel Estimation and CSI Feedback: An Approach Based on Deep Joint Source-Channel Coding
Authors:
Yiran Guo,
Wei Chen,
Bo Ai
Abstract:
In frequency division duplex (FDD) multiple-input multiple-output (MIMO) wireless communication systems, the acquisition of downlink channel state information (CSI) is essential for maximizing spatial resource utilization and improving system spectral efficiency. The separate design of modules in AI-based CSI feedback architectures under traditional modular communication frameworks, including channel estimation (CE), CSI compression and feedback, leads to sub-optimal performance. In this paper, we propose an uplink assisted joint CE and CSI feedback approach via deep learning for downlink CSI acquisition, which mitigates performance degradation caused by distribution bias across separately trained modules in traditional modular communication frameworks. The proposed network adopts a deep joint source-channel coding (DJSCC) architecture to mitigate the cliff effect encountered in the conventional separate source-channel coding. Furthermore, we exploit the uplink CSI as auxiliary information to enhance CSI reconstruction accuracy by leveraging the partial reciprocity between the uplink and downlink channels in FDD systems, without introducing additional overhead. The effectiveness of uplink CSI as assisted information and the necessity of an end-to-end multi-module joint training architecture are validated through comprehensive ablation and scalability experiments.
Submitted 14 April, 2025;
originally announced April 2025.
-
White-Box AI Model: Next Frontier of Wireless Communications
Authors:
Jiayao Yang,
Jiayi Zhang,
Bokai Xu,
Jiakang Zheng,
Zhilong Liu,
Ziheng Liu,
Dusit Niyato,
Mérouane Debbah,
Zhu Han,
Bo Ai
Abstract:
The white-box AI (WAI) model, also called the explainable AI (XAI) model, is a novel tool that exposes the reasoning behind the decisions and predictions made by AI algorithms, making them more understandable and transparent. It offers a new approach to address key challenges of interpretability and mathematical validation in traditional black-box models. In this paper, WAI-aided wireless communication systems are proposed and investigated thoroughly to exploit these promising capabilities. First, we introduce the fundamental principles of WAI. Then, a detailed comparison between WAI and traditional black-box models is conducted in terms of optimization objectives and architecture design, with a focus on deep neural networks (DNNs) and transformer networks. Furthermore, in contrast to the traditional black-box methods, WAI leverages theory-driven causal modeling and verifiable optimization paths, thereby demonstrating potential advantages in areas such as signal processing and resource allocation. Finally, we outline future research directions for the integration of WAI in wireless communication systems.
Submitted 12 April, 2025;
originally announced April 2025.
-
Wan: Open and Advanced Large-Scale Video Generative Models
Authors:
Team Wan,
Ang Wang,
Baole Ai,
Bin Wen,
Chaojie Mao,
Chen-Wei Xie,
Di Chen,
Feiwu Yu,
Haiming Zhao,
Jianxiao Yang,
Jianyuan Zeng,
Jiayu Wang,
Jingfeng Zhang,
Jingren Zhou,
Jinkai Wang,
Jixuan Chen,
Kai Zhu,
Kang Zhao,
Keyu Yan,
Lianghua Huang,
Mengyang Feng,
Ningyi Zhang,
Pandeng Li,
Pingyu Wu,
Ruihang Chu
, et al. (37 additional authors not shown)
Abstract:
This report presents Wan, a comprehensive and open suite of video foundation models designed to push the boundaries of video generation. Built upon the mainstream diffusion transformer paradigm, Wan achieves significant advancements in generative capabilities through a series of innovations, including our novel VAE, scalable pre-training strategies, large-scale data curation, and automated evaluation metrics. These contributions collectively enhance the model's performance and versatility. Specifically, Wan is characterized by four key features: Leading Performance: The 14B model of Wan, trained on a vast dataset comprising billions of images and videos, demonstrates the scaling laws of video generation with respect to both data and model size. It consistently outperforms the existing open-source models as well as state-of-the-art commercial solutions across multiple internal and external benchmarks, demonstrating a clear and significant performance superiority. Comprehensiveness: Wan offers two capable models, i.e., 1.3B and 14B parameters, for efficiency and effectiveness respectively. It also covers multiple downstream applications, including image-to-video, instruction-guided video editing, and personal video generation, encompassing up to eight tasks. Consumer-Grade Efficiency: The 1.3B model demonstrates exceptional resource efficiency, requiring only 8.19 GB VRAM, making it compatible with a wide range of consumer-grade GPUs. Openness: We open-source the entire series of Wan, including source code and all models, with the goal of fostering the growth of the video generation community. This openness seeks to significantly expand the creative possibilities of video production in the industry and provide academia with high-quality video foundation models. All the code and models are available at https://github.com/Wan-Video/Wan2.1.
Submitted 18 April, 2025; v1 submitted 26 March, 2025;
originally announced March 2025.
-
Learning Adaptive Dexterous Grasping from Single Demonstrations
Authors:
Liangzhi Shi,
Yulin Liu,
Lingqi Zeng,
Bo Ai,
Zhengdong Hong,
Hao Su
Abstract:
How can robots learn dexterous grasping skills efficiently and apply them adaptively based on user instructions? This work tackles two key challenges: efficient skill acquisition from limited human demonstrations and context-driven skill selection. We introduce AdaDexGrasp, a framework that learns a library of grasping skills from a single human demonstration per skill and selects the most suitable one using a vision-language model (VLM). To improve sample efficiency, we propose a trajectory following reward that guides reinforcement learning (RL) toward states close to a human demonstration while allowing flexibility in exploration. To learn beyond the single demonstration, we employ curriculum learning, progressively increasing object pose variations to enhance robustness. At deployment, a VLM retrieves the appropriate skill based on user instructions, bridging low-level learned skills with high-level intent. We evaluate AdaDexGrasp in both simulation and real-world settings, showing that our approach significantly improves RL efficiency and enables learning human-like grasp strategies across varied object configurations. Finally, we demonstrate zero-shot transfer of our learned policies to a real-world PSYONIC Ability Hand, with a 90% success rate across objects, significantly outperforming the baseline.
Submitted 26 March, 2025;
originally announced March 2025.
-
Hierarchy-Aware and Channel-Adaptive Semantic Communication for Bandwidth-Limited Data Fusion
Authors:
Lei Guo,
Wei Chen,
Yuxuan Sun,
Bo Ai,
Nikolaos Pappas,
Tony Quek
Abstract:
Obtaining high-resolution hyperspectral images (HR-HSI) is costly and data-intensive, making it necessary to fuse low-resolution hyperspectral images (LR-HSI) with high-resolution RGB images (HR-RGB) for practical applications. However, traditional fusion techniques, which integrate detailed information into the reconstruction, significantly increase bandwidth consumption compared to directly transmitting raw data. To overcome these challenges, we propose a hierarchy-aware and channel-adaptive semantic communication approach for bandwidth-limited data fusion. A hierarchical correlation module is proposed to preserve both the overall structural information and the details of the image required for super-resolution. This module efficiently combines deep semantic and shallow features from LR-HSI and HR-RGB. To further reduce bandwidth usage while preserving reconstruction quality, a channel-adaptive attention mechanism based on Transformer is proposed to dynamically integrate and transmit the deep and shallow features, enabling efficient data transmission and high-quality HR-HSI reconstruction. Experimental results on the CAVE and Washington DC Mall datasets demonstrate that our method outperforms single-source transmission, achieving up to a 2 dB improvement in peak signal-to-noise ratio (PSNR). Additionally, it reduces bandwidth consumption by two-thirds, confirming its effectiveness in bandwidth-constrained environments for HR-HSI reconstruction tasks.
Submitted 22 March, 2025;
originally announced March 2025.
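For scale, the reported figures can be read through the standard PSNR definition. The short derivation below is only a reader's aid: it assumes both methods are scored against the same peak value $\mathrm{MAX}$, and the symbols $\mathrm{MSE}$ and $B$ are introduced here rather than taken from the abstract.

```latex
\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)
\quad\Longrightarrow\quad
\Delta\mathrm{PSNR} = 10 \log_{10}\!\left(\frac{\mathrm{MSE}_{\text{old}}}{\mathrm{MSE}_{\text{new}}}\right)

\Delta\mathrm{PSNR} = 2~\text{dB}
\quad\Longrightarrow\quad
\frac{\mathrm{MSE}_{\text{new}}}{\mathrm{MSE}_{\text{old}}} = 10^{-0.2} \approx 0.63

\text{Bandwidth reduced by two-thirds}
\quad\Longrightarrow\quad
B_{\text{new}} = \tfrac{1}{3}\, B_{\text{old}}
```

Under these assumptions, a gain of up to 2 dB corresponds to roughly a 37% lower mean squared error, while the bandwidth figure corresponds to one-third of the baseline bandwidth.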
-
A CGAN-LSTM-Based Framework for Time-Varying Non-Stationary Channel Modeling
Authors:
Keying Guo,
Ruisi He,
Mi Yang,
Yuxin Zhang,
Bo Ai,
Haoxiang Zhang,
Jiahui Han,
Ruifeng Chen
Abstract:
Time-varying non-stationary channels, with complex dynamic variations and temporal evolution characteristics, pose significant challenges for channel modeling and communication system performance evaluation. Most existing methods of time-varying channel modeling focus on predicting the channel state at a given moment or simulating short-term channel fluctuations, and are unable to capture the long-term evolution of the channel. This paper emphasizes the generation of long-term dynamic channels to fully capture the evolution of non-stationary channel properties. The generated channel not only reflects temporal dynamics but also ensures consistent stationarity. We propose a hybrid deep learning framework that combines conditional generative adversarial networks (CGAN) with long short-term memory (LSTM) networks. A stationarity-constrained approach is designed to ensure temporal correlation of the generated time-series channel. This method can generate channels with the required temporal non-stationarity. The model is validated by comparing channel statistical features, and the results show that the generated channel is in good agreement with the raw channel and provides good performance in terms of non-stationarity.
Submitted 2 March, 2025;
originally announced March 2025.
-
Diffusion Dynamics Models with Generative State Estimation for Cloth Manipulation
Authors:
Tongxuan Tian,
Haoyang Li,
Bo Ai,
Xiaodi Yuan,
Zhiao Huang,
Hao Su
Abstract:
Manipulating deformable objects like cloth is challenging due to their complex dynamics, near-infinite degrees of freedom, and frequent self-occlusions, which complicate state estimation and dynamics modeling. Prior work has struggled with robust cloth state estimation, while dynamics models, primarily based on Graph Neural Networks (GNNs), are limited by their locality. Inspired by recent advances in generative models, we hypothesize that these expressive models can effectively capture intricate cloth configurations and deformation patterns from data. Building on this insight, we propose a diffusion-based generative approach for both perception and dynamics modeling. Specifically, we formulate state estimation as reconstructing the full cloth state from sparse RGB-D observations conditioned on a canonical cloth mesh and dynamics modeling as predicting future states given the current state and robot actions. Leveraging a transformer-based diffusion model, our method achieves high-fidelity state reconstruction while reducing long-horizon dynamics prediction errors by an order of magnitude compared to GNN-based approaches. Integrated with model-predictive control (MPC), our framework successfully executes cloth folding on a real robotic system, demonstrating the potential of generative models for manipulation tasks with partial observability and complex dynamics.
Submitted 15 March, 2025;
originally announced March 2025.
-
Channel Estimation for Rydberg Atomic Receivers
Authors:
Bokai Xu,
Jiayi Zhang,
Zhongtao Chen,
Bingyang Cheng,
Ziheng Liu,
Yik-Chung Wu,
Bo Ai
Abstract:
The rapid development of quantum technology presents huge opportunities for 6G communications. Leveraging the quantum properties of highly excited Rydberg atoms, Rydberg atom-based antennas present distinct advantages, such as high sensitivity, broad frequency range, and compact size, over traditional antennas. To realize efficient precoding, accurate channel state information is essential. However, due to the distinct characteristics of atomic receivers, traditional channel estimation algorithms developed for conventional receivers are no longer applicable. To this end, we propose a novel channel estimation algorithm based on projection gradient descent (PGD), which is applicable to both one-dimensional (1D) and two-dimensional (2D) arrays. Simulation results are provided to show the effectiveness of our proposed channel estimation method.
Submitted 9 June, 2025; v1 submitted 11 March, 2025;
originally announced March 2025.
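The abstract above names projection gradient descent (PGD) as the basis of the estimator. The Python sketch below shows only the generic projected gradient descent pattern (a gradient step followed by projection onto a feasible set). The quadratic objective, the unit-ball constraint, and all function names are assumptions chosen for illustration; they are not the paper's channel model or algorithm.

```python
import math

def project_onto_ball(x, radius=1.0):
    """Project x onto the Euclidean ball of the given radius (assumed constraint set)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]

def projected_gradient_descent(grad, project, x0, step=0.1, n_iters=200):
    """Repeat: take a gradient step, then project back onto the feasible set."""
    x = list(x0)
    for _ in range(n_iters):
        g = grad(x)
        x = project([xi - step * gi for xi, gi in zip(x, g)])
    return x

# Hypothetical toy problem: minimize ||x - target||^2 subject to ||x|| <= 1.
target = [3.0, 4.0]

def grad(x):
    # Gradient of the squared distance to the target.
    return [2.0 * (xi - ti) for xi, ti in zip(x, target)]

x_hat = projected_gradient_descent(grad, project_onto_ball, [0.0, 0.0])
```

For this toy problem the iterate settles at the point of the unit ball closest to the target, which lies along the direction of `target`.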
-
Beamforming Design for Beyond Diagonal RIS-Aided Cell-Free Massive MIMO Systems
Authors:
Yizhuo Li,
Jiakang Zheng,
Bokai Xu,
Yiyang Zhu,
Jiayi Zhang,
Bo Ai
Abstract:
Reconfigurable intelligent surface (RIS)-aided cell-free (CF) massive multiple-input multiple-output (mMIMO) is a promising architecture for further improving spectral efficiency (SE) with low cost and power consumption. However, conventional RIS has inevitable limitations due to its capability of only reflecting signals. In contrast, beyond-diagonal RIS (BD-RIS), with its ability to both reflect and transmit signals, has gained great attention. This correspondence focuses on using BD-RIS to improve the sum SE of CF mMIMO systems. This requires completing the beamforming design under the transmit power constraints and unitary constraints of the BD-RIS, by optimizing the active and passive beamformers simultaneously. To tackle this issue, we introduce an alternating optimization algorithm that decomposes it using fractional programming and solves the subproblems alternately. Moreover, to address the challenge introduced by the unitary constraint on the beamforming matrix of the BD-RIS, a manifold optimization algorithm is proposed to solve the problem optimally. Simulation results show that BD-RISs outperform RISs comprehensively, especially in the case of the fully connected architecture, which achieves the best performance, enhancing the sum SE by around 40% compared to ideal RISs.
Submitted 10 March, 2025;
originally announced March 2025.
-
Optimal Bilinear Equalizer Beamforming Design for Cell-Free Massive MIMO Networks with Arbitrary Channel Estimators
Authors:
Zhe Wang,
Jiayi Zhang,
Hao Lei,
Dusit Niyato,
Bo Ai
Abstract:
This paper studies the distributed optimal bilinear equalizer (OBE) beamforming design for both the uplink and downlink of cell-free massive multiple-input multiple-output networks. We consider arbitrary statistics-based channel estimators over spatially correlated Rician fading channels. In the uplink, we derive the achievable spectral efficiency (SE) performance and OBE combining schemes with arbitrary statistics-based channel estimators and compute their respective closed-form expressions. Notably, the achievable SE performance does not depend on the choice of channel estimator when OBE combining schemes are applied over Rayleigh channels. In the downlink, we derive the achievable SE performance expressions with BE precoding schemes and arbitrary statistics-based channel estimators and compute them in closed form. Then, we obtain the OBE precoding scheme leveraging insights from uplink OBE combining schemes.
Submitted 2 March, 2025;
originally announced March 2025.
-
Joint Power Allocation and Phase Shift Design for Stacked Intelligent Metasurfaces-aided Cell-Free Massive MIMO Systems with MARL
Authors:
Yiyang Zhu,
Jiayi Zhang,
Enyu Shi,
Ziheng Liu,
Chau Yuen,
Bo Ai
Abstract:
Cell-free (CF) massive multiple-input multiple-output (mMIMO) systems offer high spectral efficiency (SE) through multiple distributed access points (APs). However, the large number of antennas increases power consumption. We propose incorporating stacked intelligent metasurfaces (SIM) into CF mMIMO systems as a cost-effective, energy-efficient solution. This paper focuses on optimizing the joint power allocation of APs and the phase shift of SIMs to maximize the sum SE. To address this complex problem, we introduce a fully distributed multi-agent reinforcement learning (MARL) algorithm. Our novel algorithm, the noisy value method with a recurrent policy in multi-agent policy optimization (NVR-MAPPO), enhances performance by encouraging diverse exploration under centralized training and decentralized execution. Simulations demonstrate that NVR-MAPPO significantly improves sum SE and robustness across various scenarios.
Submitted 26 February, 2025;
originally announced February 2025.
-
Multi-Agent Reinforcement Learning in Wireless Distributed Networks for 6G
Authors:
Jiayi Zhang,
Ziheng Liu,
Yiyang Zhu,
Enyu Shi,
Bokai Xu,
Chau Yuen,
Dusit Niyato,
Mérouane Debbah,
Shi Jin,
Bo Ai,
Xuemin Shen
Abstract:
The introduction of intelligent interconnectivity between the physical and human worlds has attracted great attention for future sixth-generation (6G) networks, emphasizing massive capacity, ultra-low latency, and unparalleled reliability. Wireless distributed networks and multi-agent reinforcement learning (MARL), both of which have evolved from centralized paradigms, are two promising solutions for meeting these requirements. Given their distinct capabilities, such as decentralization and collaborative mechanisms, integrating these two paradigms holds great promise for unleashing the full power of 6G, attracting significant research and development attention. This paper provides a comprehensive study on MARL-assisted wireless distributed networks for 6G. In particular, we introduce the basic mathematical background and evolution of wireless distributed networks and MARL, and demonstrate their interrelationships. Subsequently, we analyze different structures of wireless distributed networks from homogeneous and heterogeneous perspectives. Furthermore, we introduce the basic concepts of MARL and discuss two typical categories: model-based and model-free. We then present critical challenges faced by MARL-assisted wireless distributed networks, providing important guidance and insights for actual implementation. We also explore the interplay between MARL-assisted wireless distributed networks and emerging techniques, such as information bottleneck and mirror learning, delivering in-depth analyses and application scenarios. Finally, we outline several compelling research directions for future MARL-assisted wireless distributed networks.
Submitted 9 February, 2025;
originally announced February 2025.
-
Vision Aided Channel Prediction for Vehicular Communications: A Case Study of Received Power Prediction Using RGB Images
Authors:
Xuejian Zhang,
Ruisi He,
Mi Yang,
Zhengyu Zhang,
Ziyi Qi,
Bo Ai
Abstract:
The communication scenarios and channel characteristics of 6G will be more complex and difficult to characterize. Conventional methods for channel prediction face challenges in achieving an optimal balance between accuracy, practicality, and generalizability. Additionally, they often fail to effectively leverage environmental features. With the integration of communication and artificial intelligence as a pivotal development vision for 6G, it is imperative to achieve intelligent prediction of channel characteristics. Vision-aided methods have been employed in various wireless communication tasks, excluding channel prediction, and have demonstrated enhanced efficiency and performance. In this paper, we propose a vision-aided two-stage model for channel prediction in millimeter wave vehicular communication scenarios, realizing accurate received power prediction utilizing solely RGB images. Firstly, we obtain original images of the propagation environment through an RGB camera. Secondly, three typical computer vision methods, including object detection, instance segmentation, and binary mask, are employed for environmental information extraction from original images in stage 1, and prediction of received power based on processed images is implemented in stage 2. Pre-trained YOLOv8 and ResNets are used in stages 1 and 2, respectively, and fine-tuned on datasets. Finally, we conduct five experiments to evaluate the performance of the proposed model, demonstrating its feasibility, accuracy, and generalization capabilities. The model proposed in this paper offers novel solutions for achieving intelligent channel prediction in vehicular communications.
Submitted 25 January, 2025;
originally announced January 2025.
-
Measurement-Based Modeling and Analysis of UAV Air-Ground Channels at 1 and 4 GHz
Authors:
Zhuangzhuang Cui,
Cesar Briso-Rodriguez,
Ke Guan,
Cesar Calvo-Ramirez,
Bo Ai,
Zhangdui Zhong
Abstract:
In the design of unmanned aerial vehicle (UAV) wireless communications, a better understanding of propagation characteristics and an accurate channel model are required. Measurements and comprehensive analysis for the UAV-based air-ground (AG) propagation channel in the vertical dimension are presented in this letter. Based on the measurement data at 1 and 4 GHz, the large-scale and small-scale channel parameters are extracted in the line-of-sight (LOS) and non-LOS cases, respectively. An altitude-dependent path loss model is proposed herein. Furthermore, shadow fading and fast fading are statistically analyzed to comprehensively describe the fading behavior. Our results will be useful in the modeling of AG channels and the performance analysis for UAV-enabled wireless communication systems.
Submitted 28 January, 2025;
originally announced January 2025.
-
Measurement-Based Non-Stationary Markov Tapped Delay Line Channel Model for 5G-Railways
Authors:
Xuejian Zhang,
Ruisi He,
Mi Yang,
Jianwen Ding,
Ruifeng Chen,
Shuaiqi Gao,
Ziyi Qi,
Zhengyu Zhang,
Bo Ai,
Zhangdui Zhong
Abstract:
5G for Railways (5G-R) is globally recognized as a promising next-generation railway communication system designed to meet increasing demands. Channel modeling serves as the foundation for communication system design, with tapped delay line (TDL) models widely utilized in system simulations due to their simplicity and practicality and serving as a crucial component of various standards like 3GPP. However, existing TDL models applicable to 5G-R systems are limited. Most fail to capture non-stationarity, a critical characteristic of railway communications, while others are unsuitable for the specific frequency bands and bandwidths of 5G-R. In this paper, a channel measurement campaign for a 5G-R dedicated network is carried out, resulting in a measurement-based 5-tap TDL model utilizing a first-order two-state Markov chain to represent channel non-stationarity. Key model parameters, including the number of taps, the statistical distributions of amplitude, phase, and Doppler shift, and the state transition probability matrix, are extracted. The correlation between tap amplitudes is also obtained. Finally, the accuracy of the model is validated through comparisons with measurement data and the 3GPP model. These findings are expected to offer valuable insights for design, optimization, and link-level simulation and validation of 5G-R systems.
Submitted 26 January, 2025;
originally announced January 2025.
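The abstract above mentions a first-order two-state Markov chain for representing channel non-stationarity. As a rough illustration only, the Python sketch below simulates such a chain, with state 0 and state 1 standing for, say, a tap being absent or present. The function names and any transition probabilities passed in are hypothetical placeholders, not values or code from the paper.

```python
import random

def simulate_two_state_chain(p01, p10, n_steps, start=0, seed=None):
    """Simulate a first-order two-state Markov chain.

    p01: probability of moving from state 0 to state 1 (assumed value)
    p10: probability of moving from state 1 to state 0 (assumed value)
    """
    rng = random.Random(seed)
    state = start
    states = [state]
    for _ in range(n_steps - 1):
        # The next state depends only on the current state (first-order property).
        switch_prob = p01 if state == 0 else p10
        if rng.random() < switch_prob:
            state = 1 - state
        states.append(state)
    return states

def stationary_probability_of_state_1(p01, p10):
    """Long-run fraction of time spent in state 1, valid when p01 + p10 > 0."""
    return p01 / (p01 + p10)
```

Calling `simulate_two_state_chain(0.1, 0.3, 1000, seed=1)` yields a 0/1 sequence of length 1000; over long runs the share of 1s should approach `stationary_probability_of_state_1(0.1, 0.3)`.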
-
Vision-Aided Channel Prediction Based on Image Segmentation at Street Intersection Scenarios
Authors:
Xuejian Zhang,
Ruisi He,
Mi Yang,
Ziyi Qi,
Zhengyu Zhang,
Bo Ai,
Zhangdui Zhong
Abstract:
Intelligent vehicular communication with vehicle-road collaboration capability is a key technology enabled by 6G, and the integration of various visual sensors on vehicles and infrastructure plays a crucial role. Moreover, accurate channel prediction is foundational to realizing intelligent vehicular communication. Traditional methods are still limited by the inability to balance accuracy and operability, as they rely on substantial spectrum resource consumption and a highly refined description of the environment. Therefore, leveraging out-of-band information introduced by visual sensors provides a new solution and is increasingly applied across various communication tasks. In this paper, we propose a computer vision (CV)-based prediction model for vehicular communications, realizing accurate prediction of channel characteristics, including path loss, Rice K-factor, and delay spread, based on image segmentation. First, we conduct extensive vehicle-to-infrastructure measurement campaigns, collecting channel and visual data from various street intersection scenarios. The image-channel dataset is generated after a series of data post-processing steps. The image data consist of individual segmentations of the target user obtained using a YOLOv8 network. Subsequently, the established dataset is used to train and test the prediction network ResNet-32, where segmented images serve as the network input, and various channel characteristics are treated as labels or target outputs of the network. Finally, self-validation and cross-validation experiments are performed. The results indicate that models trained with segmented images achieve high prediction accuracy and remarkable generalization performance across different streets and target users. The model proposed in this paper offers novel solutions for achieving intelligent channel prediction in vehicular communications.
Submitted 26 January, 2025;
originally announced January 2025.
-
Deep Reinforcement Learning for Energy Efficiency Maximization in RSMA-IRS-Assisted ISAC System
Authors:
Zhangfeng Ma,
Ruichen Zhang,
Bo Ai,
Zhuxian Lian,
Linzhou Zeng,
Dusit Niyato
Abstract:
This paper proposes a three-dimensional (3D) geometry-based channel model to accurately represent intelligent reflecting surfaces (IRS)-enhanced integrated sensing and communication (ISAC) networks using rate-splitting multiple access (RSMA) in practical urban environments. Based on this model, we formulate an energy efficiency (EE) maximization problem that incorporates transceiver beamforming constraints, IRS phase adjustments, and quality-of-service (QoS) requirements to optimize communication and sensing functions. To solve this problem, we use the proximal policy optimization (PPO) algorithm within a deep reinforcement learning (DRL) framework. Our numerical results confirm the effectiveness of the proposed method in improving EE and satisfying QoS requirements. Additionally, we observe that system EE drops at higher frequencies, especially under double-Rayleigh fading.
Submitted 25 January, 2025;
originally announced January 2025.
-
ROMA: ROtary and Movable Antenna
Authors:
Jiayi Zhang,
Wenhui Yi,
Bokai Xu,
Zhe Wang,
Huahua Xiao,
Bo Ai
Abstract:
The rotary and movable antenna (ROMA) architecture represents a next-generation multi-antenna technology that enables flexible adjustment of antenna position and array rotation angles of the transceiver. In this letter, we propose a ROMA-aided multi-user MIMO communication system to fully enhance the efficiency and reliability of system transmissions. By deploying ROMA panels at both the transmitter and receiver sides, and jointly optimizing the three-dimensional (3D) rotation angles of each ROMA panel and the relative positions of antenna elements based on the spatial distribution of users and channel state information (CSI), we can achieve the objective of maximizing the average spectral efficiency (SE). Subsequently, we conduct a detailed analysis of the average SE performance of the system under the consideration of maximum ratio (MR) precoding. Due to the non-convexity of the optimization problem in the ROMA multi-user MIMO system, we propose an efficient solution based on an alternating optimization (AO) algorithm. Finally, simulation results demonstrate that the AO-based ROMA architecture can significantly improve the average SE. Furthermore, the performance improvement becomes more pronounced as the size of the movable region and the transmission power increase.
Submitted 23 April, 2025; v1 submitted 23 January, 2025;
originally announced January 2025.
-
Cluster-Based Time-Variant Channel Characterization and Modeling for 5G-Railways
Authors:
Xuejian Zhang,
Ruisi He,
Bo Ai,
Mi Yang,
Jianwen Ding,
Shuaiqi Gao,
Ziyi Qi,
Zhengyu Zhang,
Zhangdui Zhong
Abstract:
With the development of high-speed railways, 5G for Railways (5G-R) is gradually replacing Global System for Mobile Communications for Railway (GSM-R) worldwide to meet increasing demands. The large bandwidth, array antennas, and non-stationarity caused by high mobility have made 5G-R channel characterization more complex. Therefore, it is essential to develop an accurate channel model for 5G-R. However, research on channel characterization and time-variant models specific to 5G-R frequency bands and scenarios is scarce. There are virtually no cluster-based time-variant channel models that capture the statistical properties of the 5G-R channel. In this paper, we propose a cluster-based time-variant channel model for 5G-R within an enhanced 3GPP framework, which incorporates time evolution features. Extensive channel measurements are conducted on a 5G-R private network test line in China. We then extract and analyze typical channel fading characteristics and multipath cluster characteristics. Furthermore, the birth-death process of the clusters is modeled using a four-state Markov chain. Finally, a generalized clustered delay line (CDL) model is established in accordance with the 3GPP standard and validated by comparing the results of measurements and simulations. This work enhances the understanding of 5G-R channels and presents a flexible cluster-based time-variant channel model. The results can be used in the design, deployment, and optimization of 5G-R networks.
Submitted 30 December, 2024;
originally announced December 2024.
-
Deep Unfolding Beamforming and Power Control Designs for Multi-Port Matching Networks
Authors:
Bokai Xu,
Jiayi Zhang,
Qingfeng Lin,
Huahua Xiao,
Yik-Chung Wu,
Bo Ai
Abstract:
The key technologies of sixth generation (6G), such as ultra-massive multiple-input multiple-output (MIMO), enable intricate interactions between antennas and wireless propagation environments. As a result, it becomes necessary to develop joint models that encompass both antennas and wireless propagation channels. To achieve this, we utilize multi-port communication theory, which considers impedance matching among the source, transmission medium, and load to facilitate efficient power transfer. Specifically, we first investigate the impact of insertion loss, mutual coupling, and other factors on the performance of multi-port matching networks. Next, to further improve system performance, we explore two important deep unfolding designs for the multi-port matching networks: beamforming and power control. For the hybrid beamforming, we develop a deep unfolding framework, i.e., projected gradient descent (PGD)-Net, based on unfolding projected gradient descent. For the power control, we design a deep unfolding network, graph neural network (GNN) aided alternating optimization (AO)-Net, which considers the interaction between different ports in optimizing power allocation. Numerical results verify the necessity of considering insertion loss in the dynamic metasurface antenna (DMA) performance analysis. Besides, the proposed PGD-Net based hybrid beamforming approaches approximate the conventional model-based algorithm with very low complexity. Moreover, our proposed power control scheme has a fast run time compared to the traditional weighted minimum mean squared error (WMMSE) method.
Submitted 8 December, 2024;
originally announced December 2024.
-
Performance Analysis of XL-MIMO with Rotary and Movable Antennas for High-speed Railway
Authors:
Wenhui Yi,
Jiayi Zhang,
Zhe Wang,
Huahua Xiao,
Bo Ai
Abstract:
The rotary and movable antennas (ROMA) technology is efficient in enhancing wireless network capacity by adjusting both the antenna spacing and three-dimensional (3D) rotation of antenna surfaces, based on the spatial distribution of users and channel statistics. Applying ROMA to high-speed rail (HSR) wireless communications can significantly improve system performance in terms of array gain and spatial multiplexing. However, the rapidly changing channel conditions in HSR scenarios present challenges for ROMA configuration. In this correspondence, we propose an analytical framework for configuring ROMA-based extremely large-scale multiple-input-multiple-output (XL-MIMO) systems in HSR scenarios based on spatial correlation. First, we develop a localization model based on a mobility-aware near-field beam training algorithm to determine the real-time position of the train relay antennas. Next, we derive the expression for channel orthogonality and antenna spacing based on the spatial correlation matrix, and obtain the optimal antenna spacing when the transceiver panels are aligned in parallel. Moreover, we propose an optimization algorithm for the rotation angle of the transceiver panels, leveraging the differential evolution method, to determine the optimal angle. Finally, numerical results are provided to validate the computational results and optimization algorithm.
Submitted 5 December, 2024;
originally announced December 2024.
-
Mobile Cell-Free Massive MIMO with Multi-Agent Reinforcement Learning: A Scalable Framework
Authors:
Ziheng Liu,
Jiayi Zhang,
Yiyang Zhu,
Enyu Shi,
Bo Ai
Abstract:
Cell-free massive multiple-input multiple-output (mMIMO) offers significant advantages in mobility scenarios, mainly due to the elimination of cell boundaries and strong macro diversity. In this paper, we examine the downlink performance of cell-free mMIMO systems equipped with mobile-APs utilizing the concept of unmanned aerial vehicles, where mobility and power control are jointly considered to effectively enhance coverage and suppress interference. However, the high computational complexity, poor collaboration, limited scalability, and uneven reward distribution of conventional optimization schemes lead to serious performance degradation and instability. These factors complicate the provision of consistent and high-quality service across all user equipments in downlink cell-free mMIMO systems. Consequently, we propose a novel scalable framework enhanced by multi-agent reinforcement learning (MARL) to tackle these challenges. The established framework incorporates a graph neural network (GNN)-aided communication mechanism to facilitate effective collaboration among agents, a permutation architecture to improve scalability, and a directional decoupling architecture to accurately distinguish contributions. In the numerical results, we present comparisons of different optimization schemes and network architectures, which reveal that the proposed scheme can effectively enhance system performance compared to conventional schemes due to the adoption of advanced technologies. In particular, appropriately compressing the observation space of agents is beneficial for achieving a better balance between performance and convergence.
Submitted 3 December, 2024;
originally announced December 2024.
-
Deep Learning Based Near-Field User Localization with Beam Squint in Wideband XL-MIMO Systems
Authors:
Hao Lei,
Jiayi Zhang,
Huahua Xiao,
Derrick Wing Kwan Ng,
Bo Ai
Abstract:
Extremely large-scale multiple-input multiple-output (XL-MIMO) is gaining attention as a prominent technology for enabling the sixth-generation (6G) wireless networks. However, the vast antenna array and the huge bandwidth introduce a non-negligible beam squint effect, causing beams of different frequencies to focus at different locations. One approach to cope with this is to employ true-time-delay lines (TTDs)-based beamforming to control the range and trajectory of near-field beam squint, known as the near-field controllable beam squint (CBS) effect. In this paper, we investigate the user localization in near-field wideband XL-MIMO systems under the beam squint effect and spatial non-stationary properties. Firstly, we derive the expressions for Cramér-Rao Bounds (CRBs) for characterizing the performance of estimating both angle and distance. This analysis aims to assess the potential of leveraging CBS for precise user localization. Secondly, a user localization scheme combining CBS and beam training is proposed. Specifically, we organize multiple subcarriers into groups, directing beams from different groups to distinct angles or distances through the CBS to obtain the estimates of users' angles and distances. Furthermore, we design a user localization scheme based on a convolutional neural network model, namely ConvNeXt. This scheme utilizes the inputs and outputs of the CBS-based scheme to generate high-precision estimates of angle and distance. More importantly, our proposed ConvNeXt-based user localization scheme achieves centimeter-level accuracy in localization estimates.
Submitted 1 December, 2024;
originally announced December 2024.
-
COST CA20120 INTERACT Framework of Artificial Intelligence Based Channel Modeling
Authors:
Ruisi He,
Nicola D. Cicco,
Bo Ai,
Mi Yang,
Yang Miao,
Mate Boban
Abstract:
Accurate channel models are the prerequisite for communication-theoretic investigations as well as system design. Channel modeling generally relies on statistical and deterministic approaches. However, there are still significant limits for the traditional modeling methods in terms of accuracy, generalization ability, and computational complexity. The fundamental reason is that establishing a quantified and accurate mapping between physical environment and channel characteristics becomes increasingly challenging for modern communication systems. Here, in the context of the COST CA20120 Action, we evaluate and discuss the feasibility and implementation of using artificial intelligence (AI) for channel modeling, and explore where the future of this field lies. Firstly, we present a framework of AI-based channel modeling to characterize complex wireless channels. Then, we highlight in detail some major challenges and present the possible solutions: i) estimating the uncertainty of AI-based channel predictions, ii) integrating prior knowledge of propagation to improve generalization capabilities, and iii) interpretable AI for channel modeling. We present and discuss illustrative numerical results to showcase the capabilities of AI-based channel modeling.
Submitted 31 October, 2024;
originally announced November 2024.
-
Joint Precoding and AP Selection for Energy Efficient RIS-aided Cell-Free Massive MIMO Using Multi-agent Reinforcement Learning
Authors:
Enyu Shi,
Jiayi Zhang,
Ziheng Liu,
Yiyang Zhu,
Chau Yuen,
Derrick Wing Kwan Ng,
Marco Di Renzo,
Bo Ai
Abstract:
Cell-free (CF) massive multiple-input multiple-output (mMIMO) and reconfigurable intelligent surface (RIS) are two advanced transceiver technologies for realizing future sixth-generation (6G) networks. In this paper, we investigate the joint precoding and access point (AP) selection for energy-efficient RIS-aided CF mMIMO systems. To address the associated computational complexity and communication power consumption, we advocate for user-centric dynamic networks in which each user is served by a subset of APs rather than by all of them. Based on the user-centric network, we formulate a joint precoding and AP selection problem to maximize the energy efficiency (EE) of the considered system. To solve this complex nonconvex problem, we propose an innovative double-layer multi-agent reinforcement learning (MARL)-based scheme. Moreover, we propose an adaptive power threshold-based AP selection scheme to further enhance the EE of the considered system. To reduce the computational complexity of the RIS-aided CF mMIMO system, we introduce a fuzzy logic (FL) strategy into the MARL scheme to accelerate convergence. The simulation results show that the proposed FL-based MARL cooperative architecture effectively improves EE performance, offering an 85% enhancement over the zero-forcing (ZF) method, and achieves faster convergence speed compared with MARL. It is important to note that increasing the transmission power of the APs or the number of RIS elements can effectively enhance the spectral efficiency (SE) performance, which also leads to an increase in power consumption, resulting in a non-trivial trade-off between the quality of service and EE performance.
Submitted 17 November, 2024;
originally announced November 2024.
-
Transmission Scheduling of Millimeter Wave Communication for High-Speed Railway in Space-Air-Ground Integrated Network
Authors:
Lei Liu,
Bo Ai,
Yong Niu,
Zhu Han,
Ning Wang,
Lei Xiong,
Ruisi He
Abstract:
The space-air-ground integrated network (SAGIN) greatly improves coverage and reliability for millimeter-wave (mmWave) communication in high-speed railway (HSR) scenarios. However, a significant challenge arises in the transmission scheduling due to the rapid changes in channel state, link selection for train mobile relays (MRs), and order of the flow scheduling. To tackle this challenge, we introduce an optimization problem focused on maximizing the weighted sum completed flows that satisfy the quality of service (QoS) requirements for HSR mmWave communication in SAGIN. To facilitate the simultaneous scheduling of flows by base station-MR (BS-MR), satellite-airship-MR, and satellite-MR links, we propose a link selection algorithm, which can help each flow choose a suitable set of links in every frame and determine whether the BS networks need the assistance of the satellite and airship. Furthermore, taking into account the priority and occupied time slots (TSs) resource of different flows, we propose a multi-link weighted flow scheduling (MWFS) algorithm. This algorithm not only prioritizes scheduling high-priority flows but also aims to maximize the weighted sum completed flows for MRs. Our simulation results confirm that the proposed algorithm significantly increases the weighted sum completed flows and the total transmitted bits. Additionally, the proposed algorithm can achieve the optimal flow transmission in different link switching periods and enhance the scheduling of high-priority flows compared to other algorithms.
Submitted 16 October, 2024;
originally announced October 2024.
-
Cooperative Multi-Target Positioning for Cell-Free Massive MIMO with Multi-Agent Reinforcement Learning
Authors:
Ziheng Liu,
Jiayi Zhang,
Enyu Shi,
Yiyang Zhu,
Derrick Wing Kwan Ng,
Bo Ai
Abstract:
Cell-free massive multiple-input multiple-output (mMIMO) is a promising technology to empower next-generation mobile communication networks. In this paper, to address the computational complexity associated with conventional fingerprint positioning, we consider a novel cooperative positioning architecture that involves certain relevant access points (APs) to establish positioning similarity coefficients. Then, we propose an innovative joint positioning and correction framework employing multi-agent reinforcement learning (MARL) to tackle the challenges of high-dimensional sophisticated signal processing, which mainly leverages the received signal strength information for preliminary positioning, supplemented by the angle of arrival information to refine the initial position estimation. Moreover, to mitigate the bias effects originating from remote APs, we design a cooperative weighted K-nearest neighbor (Co-WKNN)-based estimation scheme to select APs with a high correlation to participate in user positioning. In the numerical results, we present comparisons of various user positioning schemes, which reveal that the proposed MARL-based positioning scheme with Co-WKNN can effectively improve positioning performance. It is important to note that the cooperative positioning architecture is a critical element in striking a balance between positioning performance and computational complexity.
Submitted 8 October, 2024;
originally announced October 2024.
-
Distributed Collaborative User Positioning for Cell-Free Massive MIMO with Multi-Agent Reinforcement Learning
Authors:
Ziheng Liu,
Jiayi Zhang,
Enyu Shi,
Yiyang Zhu,
Derrick Wing Kwan Ng,
Bo Ai
Abstract:
In this paper, we investigate a cell-free massive multiple-input multiple-output system, which exhibits great potential in enhancing the capabilities of next-generation mobile communication networks. We first study the distributed positioning problem to lay the groundwork for solving resource allocation and interference management issues. Instead of relying on computationally and spatially complex fingerprint positioning methods, we propose a novel two-stage distributed collaborative positioning architecture with multi-agent reinforcement learning (MARL) network, consisting of a received signal strength-based preliminary positioning network and an angle of arrival-based auxiliary correction network. Our experimental results demonstrate that the two-stage distributed collaborative user positioning architecture can outperform conventional fingerprint positioning methods in terms of positioning accuracy.
Submitted 7 October, 2024;
originally announced October 2024.
-
Rate-Splitting for Cell-Free Massive MIMO: Performance Analysis and Generative AI Approach
Authors:
Jiakang Zheng,
Jiayi Zhang,
Hongyang Du,
Ruichen Zhang,
Dusit Niyato,
Octavia A. Dobre,
Bo Ai
Abstract:
Cell-free (CF) massive multiple-input multiple-output (MIMO) provides ubiquitous coverage to user equipments (UEs) but it is also susceptible to interference. Rate-splitting (RS) effectively extracts data by decoding interference, yet its effectiveness is limited by the weakest UE. In this paper, we investigate an RS-based CF massive MIMO system, which combines strengths and mitigates weaknesses of both approaches. Considering imperfect channel state information (CSI) resulting from both pilot contamination and noise, we derive a closed-form expression for the sum spectral efficiency (SE) of the RS-based CF massive MIMO system under a spatially correlated Rician channel. Moreover, we propose low-complexity heuristic algorithms based on statistical CSI for power-splitting of common messages and power-control of private messages, and a genetic algorithm is adopted as a solution for upper bound performance. Furthermore, we formulate a joint optimization problem, aiming to maximize the sum SE of the RS-based CF massive MIMO system by optimizing the power-splitting factor and power-control coefficient. Importantly, we improve a generative AI (GAI) algorithm to address this complex and nonconvex problem by using a diffusion model to obtain solutions. Simulation results demonstrate its effectiveness and practicality in mitigating interference, especially in dynamic environments.
Submitted 24 September, 2024; v1 submitted 23 September, 2024;
originally announced September 2024.
-
Joint AP-UE Association and Precoding for SIM-Aided Cell-Free Massive MIMO Systems
Authors:
Enyu Shi,
Jiayi Zhang,
Jiancheng An,
Guangyang Zhang,
Ziheng Liu,
Chau Yuen,
Bo Ai
Abstract:
Cell-free (CF) massive multiple-input multiple-output (mMIMO) systems are emerging as promising alternatives to cellular networks, especially in ultra-dense environments. However, further capacity enhancement requires the deployment of more access points (APs), which will lead to high costs and high energy consumption. To address this issue, in this paper, we explore the integration of low-power, low-cost stacked intelligent metasurfaces (SIM) into CF mMIMO systems to enhance AP capabilities. The key point is that SIM performs precoding-related matrix operations in the wave domain. As a consequence, each AP antenna only needs to transmit data streams for a single user equipment (UE), eliminating the need for complex baseband digital precoding. Then, we formulate the problem of joint AP-UE association and precoding at APs and SIMs to maximize the system sum rate. Due to the non-convexity and high complexity of the formulated problem, we propose a two-stage signal processing framework to solve it. In particular, in the first stage, we propose an AP antenna greedy association (AGA) algorithm to minimize UE interference. In the second stage, we introduce an alternating optimization (AO)-based algorithm that separates the joint power and wave-based precoding optimization problem into two distinct sub-problems: the complex quadratic transform method is used for AP antenna power control, and the projection gradient ascent (PGA) algorithm is employed to find suboptimal solutions for the SIM wave-based precoding. Finally, the numerical results validate the effectiveness of the proposed framework and assess the performance enhancement achieved by the algorithm in comparison to various benchmark schemes. The results show that, with the same number of SIM meta-atoms, the proposed algorithm improves the sum rate by approximately 275% compared to the benchmark scheme.
Submitted 19 September, 2024;
originally announced September 2024.
-
Harnessing Stacked Intelligent Metasurface for Enhanced Cell-Free Massive MIMO Systems: A Low-Power and Cost Approach
Authors:
Enyu Shi,
Jiayi Zhang,
Yiyang Zhu,
Jiancheng An,
Chau Yuen,
Bo Ai
Abstract:
In this paper, we explore the integration of low-power, low-cost stacked intelligent metasurfaces (SIM) into cell-free (CF) massive multiple-input multiple-output (mMIMO) systems to enhance access point (AP) capabilities and address high power consumption and cost challenges. Specifically, we investigate the uplink performance of a SIM-enhanced CF mMIMO system and propose a novel system framework. First, the closed-form expressions of the spectral efficiency (SE) are obtained using the unique two-layer signal processing framework of CF mMIMO systems. Second, to mitigate inter-user interference, an interference-based greedy algorithm for pilot allocation is introduced. Third, a wave-based beamforming algorithm for SIM is proposed, based only on statistical channel state information, which effectively reduces the fronthaul costs. Finally, a max-min SE power control algorithm is proposed to improve the performance of user equipments (UEs) with inferior channel conditions. The results indicate that increasing the number of SIM layers and meta-atoms leads to significant performance improvements and allows for a reduction in the number of APs and AP antennas, thus lowering the costs. In particular, the best SE performance is achieved with the deployment of 20 APs plus 1200 SIM meta-atoms. Moreover, the proposed wave-based beamforming algorithm can enhance the SE performance of SIM-enhanced CF mMIMO systems by 57%, significantly outperforming traditional CF mMIMO systems.
Submitted 19 September, 2024;
originally announced September 2024.
-
Refracting Reconfigurable Intelligent Surface Assisted URLLC for Millimeter Wave High-Speed Train Communication Coverage Enhancement
Authors:
Changzhu Liu,
Ruisi He,
Yong Niu,
Shiwen Mao,
Bo Ai,
Ruifeng Chen
Abstract:
High-speed train (HST) has garnered significant attention from both academia and industry due to the rapid development of railways worldwide. Millimeter wave (mmWave) communication, known for its large bandwidth, is an effective way to address performance bottlenecks in cellular network based HST wireless communication systems. However, mmWave signals suffer from significant path loss when traversing the carriage, posing substantial challenges to cellular networks. To address this issue, reconfigurable intelligent surfaces (RIS) have gained considerable interest for their ability to enhance cell coverage by reflecting signals toward the receiver. Ensuring communication reliability, a core performance indicator of ultra-reliable and low-latency communications (URLLC) in fifth-generation systems, is crucial for providing steady and reliable data transmissions along railways, particularly for delivering safety and control messages and monitoring HST signaling information. In this paper, we investigate a refracting RIS-assisted multi-user multiple-input single-output URLLC system in mmWave HST communications. We propose a sum rate maximization problem, subject to a base station beamforming constraint, as well as refracting RIS discrete phase shift and reliability constraints. To solve this optimization problem, we design a joint optimization algorithm based on the alternating optimization method. This involves decoupling the original optimization problem into an active beamforming design and packet error probability optimization subproblem and a discrete phase shift design subproblem. These subproblems are addressed using the Lagrangian dual method and the local search method, respectively. Simulation results demonstrate the fast convergence of the proposed algorithm and highlight the benefits of refracting RIS adoption for sum rate improvement in mmWave HST networks.
Submitted 10 September, 2024;
originally announced September 2024.
-
LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments
Authors:
Ruirui Chen,
Weifeng Jiang,
Chengwei Qin,
Ishaan Singh Rawal,
Cheston Tan,
Dongkyu Choi,
Bo Xiong,
Bo Ai
Abstract:
The important challenge of keeping knowledge in Large Language Models (LLMs) up-to-date has led to the development of various methods for incorporating new facts. However, existing methods for such knowledge editing still face difficulties with multi-hop questions that require accurate fact identification and sequential logical reasoning, particularly among numerous fact updates. To tackle these challenges, this paper introduces Graph Memory-based Editing for Large Language Models (GMeLLo), a straightforward and effective method that merges the explicit knowledge representation of Knowledge Graphs (KGs) with the linguistic flexibility of LLMs. Beyond merely leveraging LLMs for question answering, GMeLLo employs these models to convert free-form language into structured queries and fact triples, facilitating seamless interaction with KGs for rapid updates and precise multi-hop reasoning. Our results show that GMeLLo significantly surpasses current state-of-the-art (SOTA) knowledge editing methods in the multi-hop question answering benchmark, MQuAKE, especially in scenarios with extensive knowledge edits.
Submitted 4 December, 2024; v1 submitted 28 August, 2024;
originally announced August 2024.
-
SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning
Authors:
Yuze Zhao,
Jintao Huang,
Jinghan Hu,
Xingjun Wang,
Yunlin Mao,
Daoze Zhang,
Hong Zhang,
Zeyinzi Jiang,
Zhikai Wu,
Baole Ai,
Ang Wang,
Wenmeng Zhou,
Yingda Chen
Abstract:
Recent developments in Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs) have leveraged Attention-based Transformer architectures and achieved superior performance and generalization capabilities. They have since covered extensive areas of traditional learning tasks. For instance, text-based tasks such as text-classification and sequence-labeling, as well as multi-modal tasks like Visual Question Answering (VQA) and Optical Character Recognition (OCR), which were previously addressed using different models, can now be tackled based on one foundation model. Consequently, the training and lightweight fine-tuning of LLMs and MLLMs, especially those based on Transformer architecture, has become particularly important. In recognition of these overwhelming needs, we develop SWIFT, a customizable one-stop infrastructure for large models. With support for 300+ LLMs and 50+ MLLMs, SWIFT stands as the open-source framework that provides the most comprehensive support for fine-tuning large models. In particular, it is the first training framework that provides systematic support for MLLMs. In addition to the core functionalities of fine-tuning, SWIFT also integrates post-training processes such as inference, evaluation, and model quantization, to facilitate fast adoption of large models in various application scenarios. With a systematic integration of various training techniques, SWIFT offers helpful utilities such as benchmark comparisons among different training techniques for large models. For fine-tuning models specialized in agent framework, we show that notable improvements on the ToolBench leader-board can be achieved by training with customized datasets on SWIFT, with an increase of 5.2%-21.8% in the Act.EM metric over various baseline models, a reduction in hallucination by 1.6%-14.1%, and an average performance improvement of 8%-17%.
Submitted 19 May, 2025; v1 submitted 10 August, 2024;
originally announced August 2024.
-
Optimal Bilinear Equalizer for Cell-Free Massive MIMO Systems over Correlated Rician Channels
Authors:
Zhe Wang,
Jiayi Zhang,
Emil Björnson,
Dusit Niyato,
Bo Ai
Abstract:
In this paper, we explore the low-complexity optimal bilinear equalizer (OBE) combining scheme design for cell-free massive multiple-input multiple-output networks with spatially correlated Rician fading channels. We provide a spectral efficiency (SE) performance analysis framework for both the centralized and distributed processing schemes with bilinear equalizer (BE)-structure combining schemes applied. The BE-structured combining is a set of schemes that are constructed by the multiplications of channel statistics-based BE matrices and instantaneous channel estimates. Notably, we derive closed-form achievable SE expressions for centralized and distributed BE-structured combining schemes. We propose one centralized and two distributed OBE schemes: Centralized OBE (C-OBE), Distributed OBE based on Global channel statistics (DG-OBE), and Distributed OBE based on Local channel statistics (DL-OBE), which maximize their respective SE expressions. OBE matrices in these schemes are tailored based on varying levels of channel statistics. Notably, we obtain new and insightful closed-form results for the C-OBE, DG-OBE, and DL-OBE combining schemes. Numerical results demonstrate that the proposed OBE schemes can achieve excellent SE, even in scenarios with severe pilot contamination.
Submitted 2 March, 2025; v1 submitted 26 July, 2024;
originally announced July 2024.
-
Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints
Authors:
Lei Guo,
Wei Chen,
Yuxuan Sun,
Bo Ai,
Nikolaos Pappas,
Tony Q. S. Quek
Abstract:
Diffusion models have been extensively utilized in AI-generated content (AIGC) in recent years, thanks to their superior generation capabilities. Combined with semantic communications, diffusion models are used for tasks such as denoising, data reconstruction, and content generation. However, existing diffusion-based generative models do not consider the stringent bandwidth limitation, which limits their application in wireless communication. This paper introduces a diffusion-driven semantic communication framework with advanced VAE-based compression for bandwidth-constrained generative models. Our designed architecture utilizes the diffusion model, where the signal transmission process through the wireless channel acts as the forward process in diffusion. To reduce bandwidth requirements, we incorporate a downsampling module and a paired upsampling module based on a variational auto-encoder with reparameterization at the receiver to ensure that the recovered features conform to the Gaussian distribution. Furthermore, we derive the loss function for our proposed system and evaluate its performance through comprehensive experiments. Our experimental results demonstrate significant improvements in pixel-level metrics such as peak signal to noise ratio (PSNR) and semantic metrics like learned perceptual image patch similarity (LPIPS). These enhancements are more profound regarding the compression rates and SNR compared to deep joint source-channel coding (DJSCC). We release the code at https://github.com/import-sudo/Diffusion-Driven-Semantic-Communication.
Submitted 9 July, 2025; v1 submitted 25 July, 2024;
originally announced July 2024.
-
Near-Field User Localization and Channel Estimation for XL-MIMO Systems: Fundamentals, Recent Advances, and Outlooks
Authors:
Hao Lei,
Jiayi Zhang,
Zhe Wang,
Huahua Xiao,
Bo Ai,
Emil Björnson
Abstract:
Extremely large-scale multiple-input multiple-output (XL-MIMO) is believed to be a cornerstone of sixth-generation (6G) wireless networks. XL-MIMO uses more antennas to both achieve unprecedented spatial degrees of freedom (DoFs) and exploit new electromagnetic (EM) phenomena occurring in the radiative near-field. The near-field effects provide the XL-MIMO array with depth perception, enabling precise localization and spatial multiplexing jointly in the angle and distance domains. This article delineates the distinctions between near-field and far-field propagation, highlighting the unique EM characteristics introduced by having large antenna arrays. It thoroughly examines the challenges these new near-field characteristics pose for user localization and channel estimation and provides a comprehensive review of new algorithms developed to address them. The article concludes by identifying critical future research directions.
Submitted 14 July, 2024;
originally announced July 2024.
-
AI-Driven Mobility Management for High-Speed Railway Communications: Compressed Measurements and Proactive Handover
Authors:
Wen Li,
Wei Chen,
Shiyue Wang,
Yuanyuan Zhang,
Michail Matthaiou,
Bo Ai
Abstract:
High-speed railway (HSR) communications are pivotal for ensuring rail safety, operations, maintenance, and delivering passenger information services. The high speed of trains creates rapidly time-varying wireless channels, increases the signaling overhead, and reduces the system throughput, making it difficult to meet the growing and stringent needs of HSR applications. In this article, we explore artificial intelligence (AI)-based beam-level and cell-level mobility management suitable for HSR communications. Particularly, we propose a compressed spatial multi-beam measurements scheme via compressive sensing for beam-level mobility management in HSR communications. In comparison to traditional down-sampling spatial beam measurements, this method leads to improved spatial-temporal beam prediction accuracy with the same measurement overhead. Moreover, we propose a novel AI-based proactive handover scheme to predict handover events and reduce radio link failure (RLF) rates in HSR communications. Compared with the traditional event A3-based handover mechanism, the proposed approach significantly reduces the RLF rates and saves 50% of the beam measurement overhead.
Submitted 5 July, 2025; v1 submitted 5 July, 2024;
originally announced July 2024.
-
IntentionNet: Map-Lite Visual Navigation at the Kilometre Scale
Authors:
Wei Gao,
Bo Ai,
Joel Loo,
Vinay,
David Hsu
Abstract:
This work explores the challenges of creating a scalable and robust robot navigation system that can traverse both indoor and outdoor environments to reach distant goals. We propose a navigation system architecture called IntentionNet that employs a monolithic neural network as the low-level planner/controller, and uses a general interface that we call intentions to steer the controller. The paper proposes two types of intentions, Local Path and Environment (LPE) and Discretised Local Move (DLM), and shows that DLM is robust to significant metric positioning and mapping errors. The paper also presents Kilo-IntentionNet, an instance of the IntentionNet system using the DLM intention that is deployed on a Boston Dynamics Spot robot, and which successfully navigates through complex indoor and outdoor environments over distances of up to a kilometre with only noisy odometry.
Submitted 3 July, 2024;
originally announced July 2024.
-
RoboPack: Learning Tactile-Informed Dynamics Models for Dense Packing
Authors:
Bo Ai,
Stephen Tian,
Haochen Shi,
Yixuan Wang,
Cheston Tan,
Yunzhu Li,
Jiajun Wu
Abstract:
Tactile feedback is critical for understanding the dynamics of both rigid and deformable objects in many manipulation tasks, such as non-prehensile manipulation and dense packing. We introduce an approach that combines visual and tactile sensing for robotic manipulation by learning a neural, tactile-informed dynamics model. Our proposed framework, RoboPack, employs a recurrent graph neural network to estimate object states, including particles and object-level latent physics information, from historical visuo-tactile observations and to perform future state predictions. Our tactile-informed dynamics model, learned from real-world data, can solve downstream robotics tasks with model-predictive control. We demonstrate our approach on a real robot equipped with a compliant Soft-Bubble tactile sensor on non-prehensile manipulation and dense packing tasks, where the robot must infer the physics properties of objects from direct and indirect interactions. Trained on only an average of 30 minutes of real-world interaction data per task, our model can perform online adaptation and make touch-informed predictions. Through extensive evaluations in both long-horizon dynamics prediction and real-world manipulation, our method demonstrates superior effectiveness compared to previous learning-based and physics-based simulation systems.
Submitted 1 July, 2024;
originally announced July 2024.
-
VideoQA-SC: Adaptive Semantic Communication for Video Question Answering
Authors:
Jiangyuan Guo,
Wei Chen,
Yuxuan Sun,
Jialong Xu,
Bo Ai
Abstract:
Although semantic communication (SC) has shown its potential in efficiently transmitting multimodal data such as text, speech, and images, SC for videos has focused primarily on pixel-level reconstruction. However, these SC systems may be suboptimal for downstream intelligent tasks. Moreover, SC systems without pixel-level video reconstruction present advantages by achieving higher bandwidth efficiency and real-time performance of various intelligent tasks. The difficulty in such system design lies in the extraction of task-related compact semantic representations and their accurate delivery over noisy channels. In this paper, we propose an end-to-end SC system, named VideoQA-SC, for video question answering (VideoQA) tasks. Our goal is to accomplish VideoQA tasks directly based on video semantics over noisy or fading wireless channels, bypassing the need for video reconstruction at the receiver. To this end, we develop a spatiotemporal semantic encoder for effective video semantic extraction, and a learning-based bandwidth-adaptive deep joint source-channel coding (DJSCC) scheme for efficient and robust video semantic transmission. Experiments demonstrate that VideoQA-SC outperforms traditional and advanced DJSCC-based SC systems that rely on video reconstruction at the receiver under a wide range of channel conditions and bandwidth constraints. In particular, when the signal-to-noise ratio is low, VideoQA-SC can improve the answer accuracy by 5.17% while saving almost 99.5% of the bandwidth at the same time, compared with the advanced DJSCC-based SC system. Our results show the great potential of SC system design for video applications.
Submitted 11 February, 2025; v1 submitted 17 May, 2024;
originally announced June 2024.