-
Subspace-Based Super-Resolution Sensing for Bi-Static ISAC with Clock Asynchronism
Authors:
Jingbo Zhao,
Zhaoming Lu,
J. Andrew Zhang,
Jiaxi Zhou,
Weicai Li,
Tao Gu
Abstract:
Bi-static sensing is an attractive configuration for integrated sensing and communications (ISAC) systems; however, clock asynchronism between widely separated transmitters and receivers introduces time-varying time offsets (TO) and phase offsets (PO), posing significant challenges. This paper introduces a signal-subspace-based framework that estimates decoupled angles, delays, and complex gain sequences (CGS), i.e., the target-reflected signals, for multiple dynamic target paths. The proposed framework begins with a novel TO alignment algorithm, leveraging signal subspace or covariance, to mitigate TO variations across temporal snapshots, enabling coherent delay-domain analysis. Subsequently, subspace-based methods are developed to compensate for TO residuals and to perform joint angle-delay estimation. Finally, leveraging the high resolution in the joint angle-delay domain, the framework compensates for the PO and estimates the CGS for each target. The framework can be applied to both single-antenna and multi-antenna systems. Extensive simulations and experiments using commercial Wi-Fi devices demonstrate that the proposed framework significantly surpasses existing solutions in parameter estimation accuracy and delay resolution. Notably, it uniquely achieves super-resolution in the delay domain, with a probability-of-resolution curve tightly approaching that in synchronized systems.
Submitted 15 May, 2025;
originally announced May 2025.
-
MAISY: Motion-Aware Image SYnthesis for Medical Image Motion Correction
Authors:
Andrew Zhang,
Hao Wang,
Shuchang Ye,
Michael Fulham,
Jinman Kim
Abstract:
Patient motion during medical image acquisition causes blurring, ghosting, and organ distortion, which makes image interpretation challenging. Current state-of-the-art algorithms use Generative Adversarial Network (GAN)-based methods, which learn the mappings between corrupted images and their ground truth via Structural Similarity Index Measure (SSIM) loss, to effectively generate motion-free images. However, we identified the following limitations: (i) they mainly focus on global structural characteristics and therefore overlook localized features that often carry critical pathological information, and (ii) the SSIM loss function struggles to handle images with varying pixel intensities, luminance factors, and variance. In this study, we propose Motion-Aware Image SYnthesis (MAISY), which first characterizes motion and then uses it for correction by: (a) leveraging the foundation model Segment Anything Model (SAM) to dynamically learn spatial patterns along anatomical boundaries where motion artifacts are most pronounced, and (b) introducing the Variance-Selective SSIM (VS-SSIM) loss, which adaptively emphasizes spatial regions with high pixel variance to preserve essential anatomical details during artifact correction. Experiments on chest and head CT datasets demonstrate that our model outperformed the state-of-the-art counterparts, with Peak Signal-to-Noise Ratio (PSNR) increasing by 40%, SSIM by 10%, and Dice by 16%.
Submitted 8 May, 2025; v1 submitted 6 May, 2025;
originally announced May 2025.
-
Zoomer: Adaptive Image Focus Optimization for Black-box MLLM
Authors:
Jiaxu Qian,
Chendong Wang,
Yifan Yang,
Chaoyun Zhang,
Huiqiang Jiang,
Xufang Luo,
Yu Kang,
Qingwei Lin,
Anlan Zhang,
Shiqi Jiang,
Ting Cao,
Tianjun Mao,
Suman Banerjee,
Guyue Liu,
Saravan Rajmohan,
Dongmei Zhang,
Yuqing Yang,
Qi Zhang,
Lili Qiu
Abstract:
Recent advancements in multimodal large language models (MLLMs) have broadened the scope of vision-language tasks, excelling in applications like image captioning and interactive question-answering. However, these models struggle with accurately processing visual data, particularly in tasks requiring precise object recognition and fine visual details. Stringent token limits often result in the omission of critical information, hampering performance. To address these limitations, we introduce Zoomer, a novel visual prompting mechanism designed to enhance MLLM performance while preserving essential visual details within token limits. Zoomer features three key innovations: a prompt-aware strategy that dynamically highlights relevant image regions, a spatial-preserving orchestration schema that maintains object integrity, and a budget-aware prompting method that balances global context with crucial visual details. Comprehensive evaluations across multiple datasets demonstrate that Zoomer consistently outperforms baseline methods, achieving up to a 26.9% improvement in accuracy while significantly reducing token consumption.
Submitted 29 April, 2025;
originally announced May 2025.
-
Research on Navigation Methods Based on LLMs
Authors:
Anlong Zhang,
Jianmin Ji
Abstract:
In recent years, the field of indoor navigation has witnessed groundbreaking advancements through the integration of Large Language Models (LLMs). Traditional navigation approaches relying on pre-built maps or reinforcement learning exhibit limitations such as poor generalization and limited adaptability to dynamic environments. In contrast, LLMs offer a novel paradigm for complex indoor navigation tasks by leveraging their exceptional semantic comprehension, reasoning capabilities, and zero-shot generalization properties. We propose an LLM-based navigation framework that leverages function calling capabilities, positioning the LLM as the central controller. Our methodology involves modular decomposition of conventional navigation functions into reusable LLM tools with expandable configurations. This is complemented by a systematically designed, transferable system prompt template and interaction workflow that can be easily adapted across different implementations. Experimental validation in PyBullet simulation environments across diverse scenarios demonstrates the substantial potential and effectiveness of our approach, particularly in achieving context-aware navigation through dynamic tool composition.
Submitted 22 April, 2025;
originally announced April 2025.
-
Bayesian Sensing for Time-Varying Channels in ISAC Systems
Authors:
Xueyang Wang,
Kai Wu,
J. Andrew Zhang,
Shiqi Gong,
Chengwen Xing
Abstract:
Future mobile networks are projected to support integrated sensing and communications in high-speed communication scenarios. Nevertheless, large Doppler shifts induced by time-varying channels may cause severe inter-carrier interference (ICI). Frequency-domain processing shows potential for reducing ISAC complexity compared with other domains. However, the parameter mismatch issue still exists for such sensing. In this paper, we develop a novel sensing scheme based on a sparse Bayesian framework, where the delay and Doppler estimation problem in time-varying channels is formulated as a 3D multiple measurement-sparse signal recovery (MM-SSR) problem. We then propose a novel two-layer variational Bayesian inference (VBI) method to decompose the 3D MM-SSR problem into two layers and estimate the Doppler in the first layer and the delay in the second layer alternately. Subsequently, benefiting from a newly unveiled signal construction, a simplified two-stage multiple signal classification (MUSIC)-based VBI method is proposed, where the delay and the Doppler are estimated by MUSIC and VBI, respectively. Additionally, the Cramér-Rao bound (CRB) of the considered sensing parameters is derived to characterize the lower bound for the proposed estimators. Extensive simulation results show that our proposed method achieves a lower mean square error (MSE) than its conventional counterparts and is robust to the number and speed of targets, thereby validating its wide applicability and advantages over prior art.
Submitted 21 April, 2025;
originally announced April 2025.
-
Transformation of audio embeddings into interpretable, concept-based representations
Authors:
Alice Zhang,
Edison Thomaz,
Lie Lu
Abstract:
Advancements in audio neural networks have established state-of-the-art results on downstream audio tasks. However, the black-box structure of these models makes it difficult to interpret the information encoded in their internal audio representations. In this work, we explore the semantic interpretability of audio embeddings extracted from these neural networks by leveraging CLAP, a contrastive learning model that brings audio and text into a shared embedding space. We implement a post-hoc method to transform CLAP embeddings into concept-based, sparse representations with semantic interpretability. Qualitative and quantitative evaluations show that the concept-based representations outperform or match the performance of original audio embeddings on downstream tasks while providing interpretability. Additionally, we demonstrate that fine-tuning the concept-based representations can further improve their performance on downstream tasks. Lastly, we publish three audio-specific vocabularies for concept-based interpretability of audio embeddings.
Submitted 18 April, 2025;
originally announced April 2025.
-
NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement: Methods and Results
Authors:
Xin Li,
Kun Yuan,
Bingchen Li,
Fengbin Guan,
Yizhen Shao,
Zihao Yu,
Xijun Wang,
Yiting Lu,
Wei Luo,
Suhang Yao,
Ming Sun,
Chao Zhou,
Zhibo Chen,
Radu Timofte,
Yabin Zhang,
Ao-Xiang Zhang,
Tianwu Zhi,
Jianzhao Liu,
Yang Li,
Jingwen Xu,
Yiting Liao,
Yushen Zuo,
Mingyang Wu,
Renjie Li,
Shengyun Zhong
, et al. (88 additional authors not shown)
Abstract:
This paper presents a review for the NTIRE 2025 Challenge on Short-form UGC Video Quality Assessment and Enhancement. The challenge comprises two tracks: (i) Efficient Video Quality Assessment (KVQ), and (ii) Diffusion-based Image Super-Resolution (KwaiSR). Track 1 aims to advance the development of lightweight and efficient video quality assessment (VQA) models, with an emphasis on eliminating reliance on model ensembles, redundant weights, and other computationally expensive components in the previous IQA/VQA competitions. Track 2 introduces a new short-form UGC dataset tailored for single image super-resolution, i.e., the KwaiSR dataset. It consists of 1,800 synthetically generated S-UGC image pairs and 1,900 real-world S-UGC images, which are split into training, validation, and test sets using a ratio of 8:1:1. The primary objective of the challenge is to drive research that benefits the user experience of short-form UGC platforms such as Kwai and TikTok. This challenge attracted 266 participants and received 18 valid final submissions with corresponding fact sheets, significantly contributing to the progress of short-form UGC VQA and image super-resolution. The project is publicly available at https://github.com/lixinustc/KVQE-ChallengeCVPR-NTIRE2025.
Submitted 17 April, 2025;
originally announced April 2025.
-
Integrated Sensing and Communications Over the Years: An Evolution Perspective
Authors:
Di Zhang,
Yuanhao Cui,
Xiaowen Cao,
Nanchi Su,
Fan Liu,
Xiaojun Jing,
J. Andrew Zhang,
Jie Xu,
Christos Masouros,
Dusit Niyato,
Marco Di Renzo
Abstract:
Integrated Sensing and Communications (ISAC) enables efficient spectrum utilization and reduces hardware costs for beyond 5G (B5G) and 6G networks, facilitating intelligent applications that require both high-performance communication and precise sensing capabilities. This survey provides a comprehensive review of the evolution of ISAC over the years. We examine the expansion of the spectrum across RF and optical ISAC, highlighting the role of advanced technologies, along with key challenges and synergies. We further discuss the advancements in network architecture from single-cell to multi-cell systems, emphasizing the integration of collaborative sensing and interference mitigation strategies. Moreover, we analyze the progress from single-modal to multi-modal sensing, with a focus on the integration of edge intelligence to enable real-time data processing, reduce latency, and enhance decision-making. Finally, we extensively review standardization efforts by 3GPP, IEEE, and ITU, examining the transition of ISAC-related technologies and their implications for the deployment of 6G networks.
Submitted 9 April, 2025;
originally announced April 2025.
-
Score-Based Turbo Message Passing for Plug-and-Play Compressive Image Recovery
Authors:
Chang Cai,
Xiaojun Yuan,
Ying-Jun Angela Zhang
Abstract:
Message passing algorithms have been tailored for compressive imaging applications by plugging in different types of off-the-shelf image denoisers. These off-the-shelf denoisers mostly rely on some generic or hand-crafted priors for denoising. Due to their insufficient accuracy in capturing the true image prior, these methods often fail to produce satisfactory results, especially in largely underdetermined scenarios. On the other hand, score-based generative modeling offers a promising way to accurately characterize the sophisticated image distribution. In this paper, by exploiting the close relation between score-based modeling and empirical Bayes-optimal denoising, we devise a message passing framework that integrates a score-based minimum mean squared error (MMSE) denoiser for compressive image recovery. This framework is firmly rooted in Bayesian formalism, in which state evolution (SE) equations accurately predict its asymptotic performance. Experiments on the FFHQ dataset demonstrate that our method strikes a significantly better performance-complexity tradeoff than conventional message passing, regularized linear regression, and score-based posterior sampling baselines. Remarkably, our method typically requires less than 20 neural function evaluations (NFEs) to converge.
Submitted 28 March, 2025;
originally announced March 2025.
-
Deep Learning-based OTFS Channel Estimation and Symbol Detection with Plug and Play Framework
Authors:
Xiaoqi Zhang,
Zhitong Ni,
Weijie Yuan,
J. Andrew Zhang
Abstract:
Orthogonal Time Frequency Space (OTFS) modulation has recently attracted significant interest due to its potential for enabling reliable communication in high-mobility environments. One of the challenges for OTFS receivers is the fractional Doppler that occurs in practical systems, resulting in decreased channel sparsity, and then inaccurate channel estimation and high-complexity equalization. In this paper, we propose a novel unsupervised deep learning (DL)-based OTFS channel estimation and symbol detection scheme, capable of handling different channel conditions, even in the presence of fractional Doppler. In particular, we design a unified plug-and-play (PnP) framework, which can jointly exploit the flexibility of optimization-based methods and utilize the powerful data-driven capability of DL. A lightweight Unet is integrated into the framework as a powerful implicit channel prior for channel estimation, leading to better exploitation of the channel sparsity and the characteristic of the noise simultaneously. Furthermore, to mitigate the channel estimation errors, we realize the PnP framework with a fully connected (FC) network for symbol detection at different noise levels, thereby enhancing robustness. Finally, numerical results demonstrate the effectiveness and robustness of the algorithm.
Submitted 14 March, 2025;
originally announced March 2025.
-
High-Resolution Uplink Sensing in Millimeter-Wave ISAC Systems
Authors:
Liangbin Zhao,
Zhitong Ni,
Yimeng Feng,
Jianguo Li,
Xiangyuan Bu,
J. Andrew Zhang
Abstract:
Perceptive mobile networks (PMNs), integrating ubiquitous sensing capabilities into mobile networks, represent an important application of integrated sensing and communication (ISAC) in 6G. In this paper, we propose a practical framework for uplink sensing of angle-of-arrival (AoA), Doppler, and delay in millimeter-wave (mmWave) communication systems, which addresses challenges posed by clock asynchrony and hybrid arrays, while being compatible with existing communication protocols. We first introduce a beam scanning method and a corresponding AoA estimation algorithm, which utilizes frequency smoothing to effectively estimate AoAs for both static and dynamic paths. We then propose several methods for constructing a "clean" reference signal, which is subsequently used to cancel the effect caused by the clock asynchrony. We further develop a signal ratio-based joint AoA-Doppler-delay estimator and propose an AoA-based 2D-FFT-MUSIC (AB2FM) algorithm that applies 2D-FFT operations on the signal subspace, which accelerates the computation process with low complexity. Our proposed framework can estimate parameters in pairs, removing the complicated parameter association process. Simulation results validate the effectiveness of our proposed framework and demonstrate its robustness in both low and high signal-to-noise ratio (SNR) conditions.
Submitted 13 March, 2025;
originally announced March 2025.
-
VoLUT: Efficient Volumetric streaming enhanced by LUT-based super-resolution
Authors:
Chendong Wang,
Anlan Zhang,
Yifan Yang,
Lili Qiu,
Yuqing Yang,
Xinyang Jiang,
Feng Qian,
Suman Banerjee
Abstract:
3D volumetric video provides immersive experience and is gaining traction in digital media. Despite its rising popularity, the streaming of volumetric video content poses significant challenges due to the high data bandwidth requirement. A natural approach to mitigate the bandwidth issue is to reduce the volumetric video's data rate by downsampling the content prior to transmission. The video can then be upsampled at the receiver's end using a super-resolution (SR) algorithm to reconstruct the high-resolution details. While super-resolution techniques have been extensively explored and advanced for 2D video content, there is limited work on SR algorithms tailored for volumetric videos.
To address this gap and the growing need for efficient volumetric video streaming, we have developed VoLUT with a new SR algorithm specifically designed for volumetric content. Our algorithm uniquely harnesses the power of lookup tables (LUTs) to facilitate the efficient and accurate upscaling of low-resolution volumetric data. The use of LUTs enables our algorithm to quickly reference precomputed high-resolution values, thereby significantly reducing the computational complexity and time required for upscaling. We further apply an adaptive bitrate (ABR) algorithm to dynamically determine the downsampling rate according to the network condition and stream the selected video rate to the receiver. Compared to related work, VoLUT is the first to enable high-quality 3D SR on commodity mobile devices at line-rate. Our evaluation shows VoLUT can reduce bandwidth usage by 70%, boost QoE by 36.7% for volumetric video streaming and achieve
3D SR speed-up with no quality compromise.
Submitted 17 February, 2025;
originally announced February 2025.
-
A Privacy-Preserving Domain Adversarial Federated learning for multi-site brain functional connectivity analysis
Authors:
Yipu Zhang,
Likai Wang,
Kuan-Jui Su,
Aiying Zhang,
Hao Zhu,
Xiaowen Liu,
Hui Shen,
Vince D. Calhoun,
Yuping Wang,
Hongwen Deng
Abstract:
Resting-state functional magnetic resonance imaging (rs-fMRI) and its derived functional connectivity networks (FCNs) have become critical for understanding neurological disorders. However, collaborative analyses and the generalizability of models still face significant challenges due to privacy regulations and the non-IID (non-independent and identically distributed) property of multiple data sources. To mitigate these difficulties, we propose Domain Adversarial Federated Learning (DAFed), a novel federated deep learning framework specifically designed for non-IID fMRI data analysis in multi-site settings. DAFed addresses these challenges through feature disentanglement, decomposing the latent feature space into domain-invariant and domain-specific components, to ensure robust global learning while preserving local data specificity. Furthermore, adversarial training facilitates effective knowledge transfer between labeled and unlabeled datasets, while a contrastive learning module enhances the global representation of domain-invariant features. We evaluated DAFed on the diagnosis of ASD and further validated its generalizability in the classification of AD, demonstrating its superior classification accuracy compared to state-of-the-art methods. Additionally, an enhanced Score-CAM module identifies key brain regions and functional connectivity significantly associated with ASD and MCI, respectively, uncovering shared neurobiological patterns across sites. These findings highlight the potential of DAFed to advance multi-site collaborative research in neuroimaging while protecting data confidentiality.
Submitted 3 February, 2025;
originally announced February 2025.
-
Multi-Carrier Faster-Than-Nyquist Signaling for OTFS Systems
Authors:
Xueyang Wang,
Shiqi Gong,
Wenqian Shen,
Chengwen Xing,
J. Andrew Zhang
Abstract:
Orthogonal time frequency space (OTFS) modulation is a promising technique for achieving reliable communications in high-mobility applications. However, the capacity of OTFS systems is generally limited by the Nyquist criterion, requiring orthogonal pulses in both time and frequency domains. In this paper, we propose a novel multi-carrier faster-than-Nyquist (MC-FTN) signaling scheme for OTFS systems. By adopting non-orthogonal pulses in both time and frequency domains, our scheme significantly improves the capacity of OTFS systems. Specifically, we first develop the signal models for both single-input single-output (SISO) and multiple-input multiple-output (MIMO) OTFS systems. Then, we optimize the delay-Doppler (DD) domain precoding matrix at the transmitter to suppress both the inter-symbol interference (ISI) and inter-carrier interference (ICI) introduced by the MC-FTN signaling. For SISO systems, we develop an eigenvalue decomposition (EVD) precoding scheme with optimal power allocation (PA) for achieving the maximum capacity. For MIMO systems, we develop a successive interference cancellation (SIC)-based precoding scheme by decomposing the capacity maximization problem into multiple sub-capacity maximization problems with greatly reduced dimensions of optimization variables. Numerical results demonstrate that our proposed MC-FTN-OTFS signaling scheme achieves significantly higher capacity than traditional Nyquist-criterion-based OTFS systems. Moreover, the SIC-based precoding scheme can effectively reduce the complexity of MIMO capacity maximization, while attaining performance close to the optimal EVD-based precoding scheme.
Submitted 12 January, 2025;
originally announced January 2025.
-
Joint Coverage and Electromagnetic Field Exposure Analysis in Downlink and Uplink for RIS-assisted Networks
Authors:
Lin Chen,
Ahmed Elzanaty,
Mustafa A Kishk,
Ying-Jun Angela Zhang
Abstract:
Reconfigurable intelligent surfaces (RISs) have shown the potential to improve signal-to-interference-plus-noise ratio (SINR) related coverage, especially in high-frequency communications. However, assessing electromagnetic field exposure (EMFE) and establishing EMFE regulations in RIS-assisted large-scale networks are still open issues. This paper proposes a framework to characterize SINR and EMFE in such networks for downlink and uplink scenarios. In particular, we carefully consider the association rule in the presence of RISs, accurate antenna pattern at base stations (BSs), fading model, and power control mechanism at mobile devices in the system model. Under the proposed framework, we derive the marginal and joint distributions of SINR and EMFE in downlink and uplink, respectively. The first moment of EMFE is also provided. Additionally, we design the compliance distance (CD) between a BS/RIS and a user to comply with the EMFE regulations. To facilitate efficient identification, we further provide approximate closed-form expressions for CDs. From numerical results of the marginal distributions, we find that in the downlink scenario, deploying RISs may not always be beneficial, as the improved SINR comes at the cost of increased EMFE. However, in the uplink scenario, RIS deployment is promising to enhance coverage while still maintaining EMFE compliance. By simultaneously evaluating coverage and compliance metrics through joint distributions, we demonstrate the feasibility of RISs in improving uplink and downlink performance. Insights from this framework can contribute to establishing EMFE guidelines and achieving a balance between coverage and compliance when deploying RISs.
Submitted 29 November, 2024;
originally announced December 2024.
-
Multimodal Whole Slide Foundation Model for Pathology
Authors:
Tong Ding,
Sophia J. Wagner,
Andrew H. Song,
Richard J. Chen,
Ming Y. Lu,
Andrew Zhang,
Anurag J. Vaidya,
Guillaume Jaume,
Muhammad Shaban,
Ahrong Kim,
Drew F. K. Williamson,
Bowen Chen,
Cristina Almagro-Perez,
Paul Doucet,
Sharifa Sahai,
Chengkuan Chen,
Daisuke Komura,
Akihiro Kawabe,
Shumpei Ishikawa,
Georg Gerber,
Tingying Peng,
Long Phi Le,
Faisal Mahmood
Abstract:
The field of computational pathology has been transformed by recent advances in foundation models that encode histopathology regions of interest (ROIs) into versatile and transferable feature representations via self-supervised learning (SSL). However, translating these advancements to address complex clinical challenges at the patient and slide level remains constrained by limited clinical data in disease-specific cohorts, especially for rare clinical conditions. We propose TITAN, a multimodal whole slide foundation model pretrained using 335,645 WSIs via visual self-supervised learning and vision-language alignment with corresponding pathology reports and 423,122 synthetic captions generated from a multimodal generative AI copilot for pathology. Without any finetuning or requiring clinical labels, TITAN can extract general-purpose slide representations and generate pathology reports that generalize to resource-limited clinical scenarios such as rare disease retrieval and cancer prognosis. We evaluate TITAN on diverse clinical tasks and find that TITAN outperforms both ROI and slide foundation models across machine learning settings such as linear probing, few-shot and zero-shot classification, rare cancer retrieval and cross-modal retrieval, and pathology report generation.
Submitted 29 November, 2024;
originally announced November 2024.
-
Optimizing Fingerprint-Spectrum-Based Synchronization in Integrated Sensing and Communications
Authors:
Xiao-Yang Wang,
Shaoshi Yang,
Hou-Yu Zhai,
Christos Masouros,
J. Andrew Zhang
Abstract:
Asynchronous radio transceivers often lead to significant range and velocity ambiguity, posing challenges for precise positioning and velocity estimation in passive-sensing perceptive mobile networks (PMNs). To address this issue, carrier frequency offset (CFO) and time offset (TO) synchronization algorithms have been studied in the literature. However, their performance can be significantly affected by the specific choice of the utilized window functions. Hence, we set out to find superior window functions capable of improving the performance of CFO and TO estimation algorithms. We first derive a near-optimal window, and the theoretical synchronization mean square error (MSE) when utilizing this window. However, since this window is not practically achievable, we then develop a practical window selection criterion and test a special window generated by the super-resolution algorithm. Numerical simulation has verified our analysis.
Submitted 13 October, 2024;
originally announced October 2024.
-
Secure Video Quality Assessment Resisting Adversarial Attacks
Authors:
Ao-Xiang Zhang,
Yu Ran,
Weixuan Tang,
Yuan-Gen Wang,
Qingxiao Guan,
Chunsheng Yang
Abstract:
The exponential surge in video traffic has intensified the imperative for Video Quality Assessment (VQA). Leveraging cutting-edge architectures, current VQA models have achieved human-comparable accuracy. However, recent studies have revealed the vulnerability of existing VQA models against adversarial attacks. To establish a reliable and practical assessment system, a secure VQA model capable of resisting such malicious attacks is urgently demanded. Unfortunately, no attempt has been made to explore this issue. This paper first attempts to investigate general adversarial defense principles, aiming at endowing existing VQA models with security. Specifically, we first introduce random spatial grid sampling on the video frame for intra-frame defense. Then, we design pixel-wise randomization through a guardian map, globally neutralizing adversarial perturbations. Meanwhile, we extract temporal information from the video sequence as compensation for inter-frame defense. Building upon these principles, we present a novel VQA framework from the security-oriented perspective, termed SecureVQA. Extensive experiments indicate that SecureVQA sets a new benchmark in security while achieving competitive VQA performance compared with state-of-the-art models. Ablation studies delve deeper into analyzing the principles of SecureVQA, demonstrating their generalization and contributions to the security of leading VQA models.
Submitted 9 October, 2024;
originally announced October 2024.
-
AIM 2024 Challenge on Video Super-Resolution Quality Assessment: Methods and Results
Authors:
Ivan Molodetskikh,
Artem Borisov,
Dmitriy Vatolin,
Radu Timofte,
Jianzhao Liu,
Tianwu Zhi,
Yabin Zhang,
Yang Li,
Jingwen Xu,
Yiting Liao,
Qing Luo,
Ao-Xiang Zhang,
Peng Zhang,
Haibo Lei,
Linyan Jiang,
Yaqing Li,
Yuqin Cao,
Wei Sun,
Weixia Zhang,
Yinan Sun,
Ziheng Jia,
Yuxin Zhu,
Xiongkuo Min,
Guangtao Zhai,
Weihua Luo
, et al. (2 additional authors not shown)
Abstract:
This paper presents the Video Super-Resolution (SR) Quality Assessment (QA) Challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2024. The task of this challenge was to develop an objective QA method for videos upscaled 2x and 4x by modern image- and video-SR algorithms. QA methods were evaluated by comparing their output with aggregate subjective scores collected from >150,000 pairwise votes obtained through crowd-sourced comparisons across 52 SR methods and 1124 upscaled videos. The goal was to advance the state-of-the-art in SR QA, which had proven to be a challenging problem with limited applicability of traditional QA methods. The challenge had 29 registered participants, and 5 teams had submitted their final results, all outperforming the current state-of-the-art. All data, including the private test subset, has been made publicly available on the challenge homepage at https://challenges.videoprocessing.ai/challenges/super-resolution-metrics-challenge.html
Submitted 5 October, 2024;
originally announced October 2024.
-
Windowing Optimization for Fingerprint-Spectrum-Based Passive Sensing in Perceptive Mobile Networks
Authors:
Xiao-Yang Wang,
Shaoshi Yang,
Hou-Yu Zhai,
Christos Masouros,
J. Andrew Zhang
Abstract:
Perceptive mobile networks (PMN) have been widely recognized as a pivotal pillar for the sixth generation (6G) mobile communication systems. However, the asynchronicity between transmitters and receivers results in velocity and range ambiguity, which seriously degrades the sensing performance. To mitigate the ambiguity, carrier frequency offset (CFO) and time offset (TO) synchronizations have been studied in the literature. However, their performance can be significantly affected by the specific choice of the window functions harnessed. Hence, we set out to find superior window functions capable of improving the performance of CFO and TO estimation algorithms. We firstly derive a near-optimal window, and the theoretical synchronization mean square error (MSE) when utilizing this window. However, since this window is not practically achievable, we then test a practical "window function" by utilizing the multiple signal classification (MUSIC) algorithm, which may lead to excellent synchronization performance.
Submitted 2 September, 2024;
originally announced September 2024.
-
End-to-End Learning for Task-Oriented Semantic Communications Over MIMO Channels: An Information-Theoretic Framework
Authors:
Chang Cai,
Xiaojun Yuan,
Ying-Jun Angela Zhang
Abstract:
This paper addresses the problem of end-to-end (E2E) design of learning and communication in a task-oriented semantic communication system. In particular, we consider a multi-device cooperative edge inference system over a wireless multiple-input multiple-output (MIMO) multiple access channel, where multiple devices transmit extracted features to a server to perform a classification task. We formulate the E2E design of feature encoding, MIMO precoding, and classification as a conditional mutual information maximization problem. However, it is notoriously difficult to design and train an E2E network that can be adaptive to both the task dataset and different channel realizations. Regarding network training, we propose a decoupled pretraining framework that separately trains the feature encoder and the MIMO precoder, with a maximum a posteriori (MAP) classifier employed at the server to generate the inference result. The feature encoder is pretrained exclusively using the task dataset, while the MIMO precoder is pretrained solely based on the channel and noise distributions. Nevertheless, we manage to align the pretraining objectives of each individual component with the E2E learning objective, so as to approach the performance bound of E2E learning. By leveraging the decoupled pretraining results for initialization, the E2E learning can be conducted with minimal training overhead. Regarding network architecture design, we develop two deep unfolded precoding networks that effectively incorporate the domain knowledge of the solution to the decoupled precoding problem. Simulation results on both the CIFAR-10 and ModelNet10 datasets verify that the proposed method achieves significantly higher classification accuracy compared to various baselines.
Submitted 30 August, 2024;
originally announced August 2024.
-
Joint Offloading and Beamforming Design in Integrating Sensing, Communication, and Computing Systems: A Distributed Approach
Authors:
Peng Liu,
Zesong Fei,
Xinyi Wang,
Jingxuan Huang,
Jie Hu,
J. Andrew Zhang
Abstract:
When applying integrated sensing and communications (ISAC) in future mobile networks, many sensing tasks have low latency requirements, preferably being implemented at terminals. However, terminals often have limited computing capabilities and energy supply. In this paper, we investigate the effectiveness of leveraging the advanced computing capabilities of mobile edge computing (MEC) servers and the cloud server to address the sensing tasks of ISAC terminals. Specifically, we propose a novel three-tier integrated sensing, communication, and computing (ISCC) framework composed of one cloud server, multiple MEC servers, and multiple terminals, where the terminals can optionally offload sensing data to the MEC server or the cloud server. The offload message is sent via the ISAC waveform, whose echo is used for sensing. We jointly optimize the computation offloading and beamforming strategies to minimize the average execution latency while satisfying sensing requirements. In particular, we propose a low-complexity distributed algorithm to solve the problem. Firstly, we use the alternating direction method of multipliers (ADMM) and derive the closed-form solution for offloading decision variables. Subsequently, we convert the beamforming optimization sub-problem into a weighted minimum mean-square error (WMMSE) problem and propose a fractional programming based algorithm. Numerical results demonstrate that the proposed ISCC framework and distributed algorithm significantly reduce the execution latency and the energy consumption of sensing tasks at a lower computational complexity compared to existing schemes.
Submitted 27 August, 2024;
originally announced August 2024.
-
Learning Multi-Rate Task-Oriented Communications Over Symmetric Discrete Memoryless Channels
Authors:
Anbang Zhang,
Shuaishuai Guo
Abstract:
This letter introduces a multi-rate task-oriented communication (MR-ToC) framework. This framework dynamically adapts to variations in affordable data rate within the communication pipeline. It conceptualizes communication pipelines as symmetric, discrete, memoryless channels. We employ a progressive learning strategy to train the system, comprising a nested codebook for encoding and task inference. This configuration allows for the adjustment of multiple rate levels in response to evolving channel conditions. The results from our experiments show that this system not only supports edge inference across various coding levels but also excels in adapting to variable communication environments.
Submitted 24 August, 2024;
originally announced August 2024.
-
AIM 2024 Challenge on Compressed Video Quality Assessment: Methods and Results
Authors:
Maksim Smirnov,
Aleksandr Gushchin,
Anastasia Antsiferova,
Dmitry Vatolin,
Radu Timofte,
Ziheng Jia,
Zicheng Zhang,
Wei Sun,
Jiaying Qian,
Yuqin Cao,
Yinan Sun,
Yuxin Zhu,
Xiongkuo Min,
Guangtao Zhai,
Kanjar De,
Qing Luo,
Ao-Xiang Zhang,
Peng Zhang,
Haibo Lei,
Linyan Jiang,
Yaqing Li,
Wenhui Meng,
Zhenzhong Chen,
Zhengxue Cheng,
Jiahao Xiao
, et al. (7 additional authors not shown)
Abstract:
Video quality assessment (VQA) is a crucial task in the development of video compression standards, as it directly impacts the viewer experience. This paper presents the results of the Compressed Video Quality Assessment challenge, held in conjunction with the Advances in Image Manipulation (AIM) workshop at ECCV 2024. The challenge aimed to evaluate the performance of VQA methods on a diverse dataset of 459 videos, encoded with 14 codecs of various compression standards (AVC/H.264, HEVC/H.265, AV1, and VVC/H.266) and containing a comprehensive collection of compression artifacts. To measure the methods' performance, we employed traditional correlation coefficients between their predictions and subjective scores, which were collected via large-scale crowdsourced pairwise human comparisons. For training purposes, participants were provided with the Compressed Video Quality Assessment Dataset (CVQAD), a previously developed dataset of 1022 videos. Up to 30 participating teams registered for the challenge, while we report the results of 6 teams, which submitted valid final solutions and code for reproducing the results. Moreover, we calculated and present the performance of state-of-the-art VQA methods on the developed dataset, providing a comprehensive benchmark for future research. The dataset, results, and online leaderboard are publicly available at https://challenges.videoprocessing.ai/challenges/compressedvideo-quality-assessment.html.
Submitted 22 October, 2024; v1 submitted 21 August, 2024;
originally announced August 2024.
-
Semantic-Enabled 6G Communication: A Task-oriented and Privacy-preserving Perspective
Authors:
Shuaishuai Guo,
Anbang Zhang,
Yanhu Wang,
Chenyuan Feng,
Tony Q. S. Quek
Abstract:
Task-oriented semantic communication (ToSC) emerges as an innovative approach in the 6G landscape, characterized by the transmission of only vital information that is directly pertinent to a specific task. While ToSC offers an efficient mode of communication, it concurrently raises concerns regarding privacy, as sophisticated adversaries might possess the capability to reconstruct the original data from the transmitted features. This paper provides an in-depth analysis of privacy-preserving strategies specifically designed for ToSC relying on deep neural network-based joint source and channel coding (DeepJSCC). Our study encompasses a detailed comparative assessment of trustworthy feature perturbation methods such as differential privacy (DP) and encryption, alongside intrinsic security incorporation approaches like adversarial learning to train the JSCC and learning-based vector quantization (LBVQ). Our comparative analysis underscores the integration of advanced explainable learning algorithms into communication systems, positing a new benchmark for privacy standards in the forthcoming 6G era.
Submitted 2 April, 2025; v1 submitted 7 August, 2024;
originally announced August 2024.
-
Multistain Pretraining for Slide Representation Learning in Pathology
Authors:
Guillaume Jaume,
Anurag Vaidya,
Andrew Zhang,
Andrew H. Song,
Richard J. Chen,
Sharifa Sahai,
Dandan Mo,
Emilio Madrigal,
Long Phi Le,
Faisal Mahmood
Abstract:
Developing self-supervised learning (SSL) models that can learn universal and transferable representations of H&E gigapixel whole-slide images (WSIs) is becoming increasingly valuable in computational pathology. These models hold the potential to advance critical tasks such as few-shot classification, slide retrieval, and patient stratification. Existing approaches for slide representation learning extend the principles of SSL from small images (e.g., 224 x 224 patches) to entire slides, usually by aligning two different augmentations (or views) of the slide. Yet the resulting representation remains constrained by the limited clinical and biological diversity of the views. Instead, we postulate that slides stained with multiple markers, such as immunohistochemistry, can be used as different views to form a rich task-agnostic training signal. To this end, we introduce Madeleine, a multimodal pretraining strategy for slide representation learning. Madeleine is trained with a dual global-local cross-stain alignment objective on large cohorts of breast cancer samples (N=4,211 WSIs across five stains) and kidney transplant samples (N=12,070 WSIs across four stains). We demonstrate the quality of slide representations learned by Madeleine on various downstream evaluations, ranging from morphological and molecular classification to prognostic prediction, comprising 21 tasks using 7,299 WSIs from multiple medical centers. Code is available at https://github.com/mahmoodlab/MADELEINE.
Submitted 5 August, 2024;
originally announced August 2024.
-
Wireless Communications in Doubly Selective Channels with Domain Adaptivity
Authors:
J. Andrew Zhang,
Hongyang Zhang,
Kai Wu,
Xiaojing Huang,
Jinhong Yuan,
Y. Jay Guo
Abstract:
Wireless communications are significantly impacted by the propagation environment, particularly in doubly selective channels with variations in both time and frequency domains. Orthogonal Time Frequency Space (OTFS) modulation has emerged as a promising solution; however, its high equalization complexity, if performed in the delay-Doppler domain, limits its universal application. This article explores domain-adaptive system design, with an emphasis on adaptive equalization, while also discussing modulation and pilot placement strategies. It investigates the dynamic selection of best-fit domains based on channel conditions to enhance performance across diverse environments. We examine channel domain connections, signal designs, and equalization techniques with domain adaptivity, and highlight future research opportunities.
Submitted 30 October, 2024; v1 submitted 31 July, 2024;
originally announced July 2024.
-
Efficient Sensing Parameter Estimation with Direct Clutter Mitigation in Perceptive Mobile Networks
Authors:
Hang Li,
Hongming Yang,
Qinghua Guo,
J. Andrew Zhang,
Yang Xiang,
Yashan Pang
Abstract:
In this work, we investigate sensing parameter estimation in the presence of clutter in perceptive mobile networks (PMNs) that integrate radar sensing into mobile communications. Performing clutter suppression before sensing parameter estimation is generally desirable as the number of sensing parameters can be significantly reduced. However, existing methods require high-complexity clutter mitigation and sensing parameter estimation, where clutter is first identified and then removed. In this correspondence, we propose a much simpler but more effective method by incorporating a clutter cancellation mechanism in formulating a sparse signal model for sensing parameter estimation. In particular, clutter mitigation is performed directly on the received signals and the unitary approximate message passing (UAMP) is leveraged to exploit the common support for sensing parameter estimation in the formulated sparse signal recovery problem. Simulation results show that, compared to state-of-the-art methods, the proposed method delivers significantly better performance with substantially reduced complexity.
Submitted 24 July, 2024;
originally announced July 2024.
-
Performance Analysis and Low-Complexity Beamforming Design for Near-Field Physical Layer Security
Authors:
Yunpu Zhang,
Yuan Fang,
Changsheng You,
Ying-Jun Angela Zhang,
Hing Cheung So
Abstract:
Extremely large-scale arrays (XL-arrays) have emerged as a key enabler in achieving the unprecedented performance requirements of future wireless networks, leading to a significant increase in the range of the near-field region. This transition necessitates the spherical wavefront model for characterizing the wireless propagation rather than the far-field planar counterpart, thereby introducing extra degrees of freedom (DoFs) to wireless system design. In this paper, we explore the beam focusing-based physical layer security (PLS) in the near field, where multiple legitimate users and one eavesdropper are situated in the near-field region of the XL-array base station (BS). First, we consider a special case with one legitimate user and one eavesdropper to shed useful insights into near-field PLS. In particular, it is shown that 1) Artificial noise (AN) is crucial to near-field security provisioning, transforming an insecure system to a secure one; 2) AN can yield numerous security gains, which considerably enhances PLS in the near field as compared to the case without AN taken into account. Next, for the general case with multiple legitimate users, we propose an efficient low-complexity approach to design the beamforming with AN to guarantee near-field secure transmission. Specifically, the low-complexity approach is conceived starting by introducing the concept of interference domain to capture the inter-user interference level, followed by a three-step identification framework for designing the beamforming. Finally, numerical results reveal that 1) the PLS enhancement in the near field is pronounced thanks to the additional spatial DoFs; 2) the proposed approach can achieve close performance to that of the computationally-extensive conventional approach yet with a significantly lower computational complexity.
Submitted 8 April, 2025; v1 submitted 18 July, 2024;
originally announced July 2024.
-
Multi-Active-IRS-Assisted Cooperative Sensing: Cramér-Rao Bound and Joint Beamforming Design
Authors:
Yuan Fang,
Xianghao Yu,
Jie Xu,
Ying-Jun Angela Zhang
Abstract:
This paper studies the multi-intelligent reflecting surface (IRS)-assisted cooperative sensing, in which multiple active IRSs are deployed in a distributed manner to facilitate multi-view target sensing at the non-line-of-sight (NLoS) area of the base station (BS). Different from prior works employing passive IRSs, we leverage active IRSs with the capability of amplifying the reflected signals to overcome the severe multi-hop-reflection path loss in NLoS sensing. In particular, we consider two sensing setups without and with dedicated sensors equipped at active IRSs. In the first case without dedicated sensors at IRSs, we investigate the cooperative sensing at the BS, where the target's direction-of-arrival (DoA) with respect to each IRS is estimated based on the echo signals received at the BS. In the other case with dedicated sensors at IRSs, we consider that each IRS is able to receive echo signals and estimate the target's DoA with respect to itself. For both sensing setups, we first derive the closed-form Cramér-Rao bound (CRB) for estimating target DoA. Then, the (maximum) CRB is minimized by jointly optimizing the transmit beamforming at the BS and the reflective beamforming at the multiple IRSs, subject to the constraints on the maximum transmit power at the BS, as well as the maximum amplification power and the maximum power amplification gain constraints at individual active IRSs. To tackle the resulting highly non-convex (max-)CRB minimization problems, we propose two efficient algorithms to obtain high-quality solutions for the two cases with sensing at the BS and at the IRSs, respectively, based on alternating optimization, successive convex approximation, and semi-definite relaxation.
Submitted 18 July, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
Rethinking Waveform for 6G: Harnessing Delay-Doppler Alignment Modulation
Authors:
Zhiqiang Xiao,
Xianda Liu,
Yong Zeng,
J. Andrew Zhang,
Shi Jin,
Rui Zhang
Abstract:
Waveform design has served as a cornerstone for each generation of mobile communication systems. The future sixth-generation (6G) mobile communication networks are expected to employ larger-scale antenna arrays and exploit higher-frequency bands for further boosting data transmission rate and providing ubiquitous wireless sensing. This brings new opportunities and challenges for 6G waveform design. In this article, by leveraging the super spatial resolution of large antenna arrays and the multi-path spatial sparsity of high-frequency wireless channels, we introduce a new approach for waveform design based on the recently proposed delay-Doppler alignment modulation (DDAM). In particular, DDAM makes a paradigm shift of waveform design from the conventional manner of tolerating channel delay and Doppler spreads to actively manipulating them. First, we review the fundamental constraints and performance limitations of orthogonal frequency division multiplexing (OFDM) and introduce new opportunities for 6G waveform design. Next, the motivations and basic principles of DDAM are presented, followed by its various extensions to different wireless system setups. Finally, the main design considerations for DDAM are discussed and the new opportunities for future research are highlighted.
Submitted 13 June, 2024;
originally announced June 2024.
-
HDMba: Hyperspectral Remote Sensing Imagery Dehazing with State Space Model
Authors:
Hang Fu,
Genyun Sun,
Yinhe Li,
Jinchang Ren,
Aizhu Zhang,
Cheng Jing,
Pedram Ghamisi
Abstract:
Haze contamination in hyperspectral remote sensing images (HSI) can lead to spatial visibility degradation and spectral distortion. Haze in HSI exhibits spatial irregularity and inhomogeneous spectral distribution, with few dehazing networks available. Current CNN and Transformer-based dehazing methods fail to balance global scene recovery, local detail retention, and computational efficiency. Inspired by the ability of Mamba to model long-range dependencies with linear complexity, we explore its potential for HSI dehazing and propose the first HSI Dehazing Mamba (HDMba) network. Specifically, we design a novel window selective scan module (WSSM) that captures local dependencies within windows and global correlations between windows by partitioning them. This approach improves the ability of conventional Mamba in local feature extraction. By modeling the local and global spectral-spatial information flow, we achieve a comprehensive analysis of hazy regions. The DehazeMamba layer (DML), constructed by WSSM, and residual DehazeMamba (RDM) blocks, composed of DMLs, are the core components of the HDMba framework. These components effectively characterize the complex distribution of haze in HSIs, aiding in scene reconstruction and dehazing. Experimental results on the Gaofen-5 HSI dataset demonstrate that HDMba outperforms other state-of-the-art methods in dehazing performance. The code will be available at https://github.com/RsAI-lab/HDMba.
Submitted 9 June, 2024;
originally announced June 2024.
-
Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models
Authors:
Shuaishuai Guo,
Yanhu Wang,
Jia Ye,
Anbang Zhang,
Kun Xu
Abstract:
Semantic communications, a promising approach for agent-human and agent-agent interactions, typically operate at a feature level, lacking true semantic understanding. This paper explores understanding-level semantic communications (ULSC), transforming visual data into human-intelligible semantic content. We employ an image caption neural network (ICNN) to derive semantic representations from visual data, expressed as natural language descriptions. These are further refined using a pre-trained large language model (LLM) for importance quantification and semantic error correction. The subsequent semantic importance-aware communications (SIAC) aim to minimize semantic loss while respecting transmission delay constraints, exemplified through adaptive modulation and coding strategies. At the receiving end, LLM-based semantic error correction is utilized. If visual data recreation is desired, a pre-trained generative artificial intelligence (AI) model can regenerate it using the corrected descriptions. We assess semantic similarities between transmitted and recovered content, demonstrating ULSC's superior ability to convey semantic understanding compared to feature-level semantic communications (FLSC). ULSC's conversion of visual data to natural language facilitates various cognitive tasks, leveraging human knowledge bases. Additionally, this method enhances privacy, as neither original data nor features are directly transmitted.
Submitted 24 May, 2024;
originally announced May 2024.
-
Revealing the Trade-off in ISAC Systems: The KL Divergence Perspective
Authors:
Zesong Fei,
Shuntian Tang,
Xinyi Wang,
Fanghao Xia,
Fan Liu,
J. Andrew Zhang
Abstract:
Integrated sensing and communication (ISAC) is regarded as a promising technique for 6G communication networks. In this letter, we investigate the Pareto bound of the ISAC system in terms of a unified Kullback-Leibler (KL) divergence performance metric. We first present the relationship between KL divergence and explicit ISAC performance metrics, i.e., demodulation error and probability of detection. Thereafter, we investigate the impact of constellation and beamforming design on the Pareto bound via deep learning and semi-definite relaxation (SDR) techniques. Simulation results show the trade-off between sensing and communication performance in terms of bit error rate (BER) and probability of detection under different parameter set-ups.
Submitted 17 May, 2024;
originally announced May 2024.
-
Heuristic Solution to Joint Deployment and Beamforming Design for STAR-RIS Aided Networks
Authors:
Bai Yan,
Qi Zhao,
Jin Zhang,
J. Andrew Zhang
Abstract:
This paper tackles the deployment challenges of Simultaneous Transmitting and Reflecting Reconfigurable Intelligent Surface (STAR-RIS) in communication systems. Unlike existing works that use fixed deployment setups or solely optimize the location, this paper emphasizes the joint optimization of the location and orientation of STAR-RIS. This enables searching across all user grouping possibilities and fully boosting the system's performance. We consider a sum rate maximization problem with joint optimization and hybrid beamforming design. An offline heuristic solution is proposed for the problem, developed based on differential evolution and semi-definite programming methods. In particular, a point-point representation is proposed for characterizing and exploiting the user-grouping. A balanced grouping method is designed to achieve a desired user grouping with low complexity. Numerical results demonstrate the substantial performance gains achievable through optimal deployment design.
Submitted 14 April, 2024;
originally announced April 2024.
-
Interference Management for Full-Duplex ISAC in B5G/6G Networks: Architectures, Challenges, and Solutions
Authors:
Aimin Tang,
Xudong Wang,
J. Andrew Zhang
Abstract:
Integrated sensing and communications (ISAC) has been envisioned as a key technique for B5G/6G networks. To support monostatic sensing, a full-duplex radio is indispensable to extract echo signals from targets. Such a radio can also greatly improve network capacity via full-duplex communications. However, full-duplex radios in existing ISAC designs are mainly focused on wireless sensing, while the ability of full-duplex communications is usually ignored. In this article, we provide an overview of full-duplex ISAC (FD-ISAC), where a full-duplex radio is used for both wireless sensing and full-duplex communications in B5G/6G networks, with a focus on the fundamental interference management problem in such networks. First, different ISAC architectures are introduced, considering different full-duplex communication modes and wireless sensing modes. Next, the challenging issues of link-level interference and network-level interference are analyzed, illustrating a critical demand for interference management in FD-ISAC. Potential solutions to interference management are then reviewed from the perspective of radio architecture design, beamforming, mode selection, and resource allocation. The corresponding open problems are also highlighted.
Submitted 8 April, 2024;
originally announced April 2024.
-
Reproducing the Acoustic Velocity Vectors in a Circular Listening Area
Authors:
Jiarui Wang,
Thushara Abhayapala,
Jihui Aimee Zhang,
Prasanga Samarasinghe
Abstract:
Acoustic velocity vectors are important for human localization of sound at low frequencies. This paper proposes a sound field reproduction algorithm, which matches the acoustic velocity vectors in a circular listening area. In previous work, acoustic velocity vectors are matched either at sweet spots or on the boundary of the listening area. Methods based on sweet spots experience performance degradation when the listener moves away from sweet spots, whereas measuring the acoustic velocity vectors on the boundary requires a complicated measurement setup. This paper proposes the radial independent cylindrical harmonic coefficients of the acoustic velocity vectors (CHV-indR coefficients) in the circular listening area, which are calculated from the cylindrical harmonic coefficients of the pressure in the circular listening area by using the sound field translation formula. The cylindrical harmonic coefficients of the pressure can be measured by a circular microphone array, which can be bought off-the-shelf. By matching the CHV-indR coefficients, the acoustic velocity vectors are reproduced throughout the listening area. Simulations show that at low frequencies, where the acoustic velocity vectors are the dominant factor for localization, the proposed reproduction method based on matching the CHV-indR coefficients results in higher accuracy in reproduced acoustic velocity vectors when compared with the traditional method based on matching the cylindrical harmonic coefficients of the pressure.
Submitted 4 September, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Multistep Inverse Is Not All You Need
Authors:
Alexander Levine,
Peter Stone,
Amy Zhang
Abstract:
In real-world control settings, the observation space is often unnecessarily high-dimensional and subject to time-correlated noise. However, the controllable dynamics of the system are often far simpler than the dynamics of the raw observations. It is therefore desirable to learn an encoder to map the observation space to a simpler space of control-relevant variables. In this work, we consider the Ex-BMDP model, first proposed by Efroni et al. (2022), which formalizes control problems where observations can be factorized into an action-dependent latent state which evolves deterministically, and action-independent time-correlated noise. Lamb et al. (2022) propose the "AC-State" method for learning an encoder to extract a complete action-dependent latent state representation from the observations in such problems. AC-State is a multistep-inverse method, in that it uses the encoding of the first and last state in a path to predict the first action in the path. However, we identify cases where AC-State will fail to learn a correct latent representation of the agent-controllable factor of the state. We therefore propose a new algorithm, ACDF, which combines multistep-inverse prediction with a latent forward model. ACDF is guaranteed to correctly infer an action-dependent latent state encoder for a large class of Ex-BMDP models. We demonstrate the effectiveness of ACDF on tabular Ex-BMDPs through numerical simulations, as well as on high-dimensional environments using neural-network-based encoders. Code is available at https://github.com/midi-lab/acdf.
Submitted 6 September, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
Performance Bounds for Passive Sensing in Asynchronous ISAC Systems -- Appendices
Authors:
Jingbo Zhao,
Zhaoming Lu,
J. Andrew Zhang,
Weicai Li,
Yifeng Xiong,
Zijun Han,
Xiangming Wen,
Tao Gu
Abstract:
This document contains the appendices for our paper titled "Performance Bounds for Passive Sensing in Asynchronous ISAC Systems." The appendices include rigorous derivations of key formulas, detailed proofs of the theorems and propositions introduced in the paper, and details of the algorithm tested in the numerical simulation for validation. These appendices aim to support and elaborate on the findings and methodologies presented in the main text. All external references to equations, theorems, and so forth, are directed towards the corresponding elements within the main paper.
Submitted 29 March, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Black-box Adversarial Attacks Against Image Quality Assessment Models
Authors:
Yu Ran,
Ao-Xiang Zhang,
Mingjie Li,
Weixuan Tang,
Yuan-Gen Wang
Abstract:
The goal of No-Reference Image Quality Assessment (NR-IQA) is to predict the perceptual quality of an image in line with its subjective evaluation. To put NR-IQA models into practice, it is essential to study their potential loopholes for model refinement. This paper makes the first attempt to explore black-box adversarial attacks on NR-IQA models. Specifically, we first formulate the attack problem as maximizing the deviation between the estimated quality scores of original and perturbed images, while restricting the perturbed image distortions for visual quality preservation. Under this formulation, we then design a Bi-directional loss function to mislead the estimated quality scores of adversarial examples towards an opposite direction with maximum deviation. On this basis, we finally develop an efficient and effective black-box attack method against NR-IQA models. Extensive experiments reveal that all the evaluated NR-IQA models are vulnerable to the proposed attack method. Moreover, the generated perturbations are not transferable, enabling them to serve the investigation of specialities of disparate IQA models.
Submitted 28 February, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Sensing in Bi-Static ISAC Systems with Clock Asynchronism: A Signal Processing Perspective
Authors:
Kai Wu,
Jacopo Pegoraro,
Francesca Meneghello,
J. Andrew Zhang,
Jesus O. Lacruz,
Joerg Widmer,
Francesco Restuccia,
Michele Rossi,
Xiaojing Huang,
Daqing Zhang,
Giuseppe Caire,
Y. Jay Guo
Abstract:
Integrated Sensing and Communication (ISAC) has been identified as a pillar usage scenario for the impending 6G era. Bi-static sensing, a major type of sensing in ISAC, is promising to expedite ISAC in the near future, as it requires minimal changes to the existing network infrastructure. However, a critical challenge for bi-static sensing is clock asynchronism due to the use of different clocks at far-separated transmitters and receivers. This causes the received signal to be affected by time-varying random phase offsets, severely degrading, or even failing, direct sensing. Hence, to effectively enable ISAC, considerable research has been directed toward addressing the clock asynchronism issue in bi-static sensing. This paper provides an overview of the issue and existing techniques developed in an ISAC background. Based on the review and comparison, we also draw insights into the future research directions and open problems, aiming to nurture the maturation of bi-static sensing in ISAC.
Submitted 24 June, 2024; v1 submitted 14 February, 2024;
originally announced February 2024.
-
Moment-based metrics for molecules computable from cryo-EM images
Authors:
Andy Zhang,
Oscar Mickelin,
Joe Kileel,
Eric J. Verbeke,
Nicholas F. Marshall,
Marc Aurèle Gilles,
Amit Singer
Abstract:
Single particle cryogenic electron microscopy (cryo-EM) is an imaging technique capable of recovering the high-resolution 3-D structure of biological macromolecules from many noisy and randomly oriented projection images. One notable approach to 3-D reconstruction, known as Kam's method, relies on the moments of the 2-D images. Inspired by Kam's method, we introduce a rotationally invariant metric between two molecular structures, which does not require 3-D alignment. Further, we introduce a metric between a stack of projection images and a molecular structure, which is invariant to rotations and reflections and does not require performing 3-D reconstruction. Additionally, the latter metric does not assume a uniform distribution of viewing angles. We demonstrate uses of the new metrics on synthetic and experimental datasets, highlighting their ability to measure structural similarity.
Submitted 26 January, 2024;
originally announced January 2024.
-
Anchor-points Assisted Uplink Sensing in Perceptive Mobile Networks
Authors:
Yanmo Hu,
J. Andrew Zhang,
Weibo Deng,
Y. Jay Guo
Abstract:
Uplink sensing in integrated sensing and communications (ISAC) systems, such as Perceptive Mobile Networks, is challenging due to the clock asynchronism between transmitter and receiver. Existing solutions typically require the presence of a dominating line-of-sight path and the knowledge of transmitter location at the receiver. In this paper, relaxing these requirements, we propose a novel and effective uplink sensing scheme with the assistance of static anchor points. Two major algorithms are proposed in the scheme. The first algorithm estimates the relative timing and carrier frequency offsets due to clock asynchronism, with respect to those at a randomly selected reference snapshot. Theoretical performance analysis is provided for the algorithm. The estimates from the first algorithm are then used to compensate for the offsets and generate the angle-Doppler maps. Using the maps, the second algorithm identifies the anchor points, and then locates the UE and dynamic targets. Feasibility of UE localization is also analyzed. Simulation results are provided and demonstrate the effectiveness of the proposed algorithms.
Submitted 17 January, 2024;
originally announced January 2024.
-
Performance Bounds and Optimization for CSI-Ratio based Bi-static Doppler Sensing in ISAC Systems
Authors:
Yanmo Hu,
Kai Wu,
J. Andrew Zhang,
Weibo Deng,
Y. Jay Guo
Abstract:
Bi-static sensing is crucial for exploring the potential of networked sensing capabilities in integrated sensing and communications (ISAC). However, it suffers from the challenging clock asynchronism issue. CSI ratio-based sensing is an effective means to address the issue. Its performance bounds, particularly for Doppler sensing, have not been fully understood yet. This work endeavors to fill the research gap. Focusing on a single dynamic path in high-SNR scenarios, we derive the closed-form CRB. Then, through analyzing the mutual interference between dynamic and static paths, we simplify the CRB results by deriving close approximations, further unveiling new insights into the impact of numerous physical parameters on Doppler sensing. Moreover, utilizing the new CRB and analyses, we propose novel waveform optimization strategies for noise- and interference-limited sensing scenarios, which are also empowered by closed-form and efficient solutions. Extensive simulation results are provided to validate the preciseness of the derived CRB results and analyses, with the aid of the maximum-likelihood estimator. The results also demonstrate the substantially enhanced Doppler sensing accuracy and the sensing capabilities for low-speed targets achieved by the proposed waveform design.
Submitted 17 January, 2024;
originally announced January 2024.
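The abstract above names CSI ratio-based sensing as a way around clock asynchronism. The following is a minimal sketch of the underlying idea, not the authors' estimator or CRB derivation. It assumes two receive antennas that share one oscillator, so every snapshot carries the same random phase offset on both antennas. All numerical values (path gains, Doppler frequency, sampling interval) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_snapshots = 64
t = np.arange(num_snapshots) * 1e-3  # hypothetical 1 ms snapshot interval

# Hypothetical channel at two antennas: one static path plus one dynamic
# path whose phase rotates at the Doppler frequency.
doppler_hz = 50.0
static = np.array([1.0 + 0.5j, 0.8 - 0.3j])
dynamic_gain = np.array([0.2 + 0.1j, 0.1 - 0.2j])
rotation = np.exp(2j * np.pi * doppler_hz * t)
clean = static[:, None] + dynamic_gain[:, None] * rotation[None, :]

# Clock asynchronism modeled as a random phase per snapshot, common to
# both antennas because they are assumed to share the same oscillator.
random_phase = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, num_snapshots))
measured = clean * random_phase[None, :]

# The common phase term cancels in the antenna-to-antenna ratio.
ratio_measured = measured[0] / measured[1]
ratio_clean = clean[0] / clean[1]

print(np.allclose(ratio_measured, ratio_clean))
```

The ratio is a nonlinear function of the dynamic-path term, which is one reason its performance bounds are less straightforward than those of a synchronized system.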
-
ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge
Authors:
He Wang,
Pengcheng Guo,
Yue Li,
Ao Zhang,
Jiayao Sun,
Lei Xie,
Wei Chen,
Pan Zhou,
Hui Bu,
Xin Xu,
Binbin Zhang,
Zhuo Chen,
Jian Wu,
Longbiao Wang,
Eng Siong Chng,
Sun Li
Abstract:
To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge. This challenge collects over 100 hours of multi-channel speech data recorded inside a new energy vehicle and 40 hours of noise for data augmentation. Two tracks, automatic speech recognition (ASR) and automatic speech diarization and recognition (ASDR), are set up, using character error rate (CER) and concatenated minimum permutation character error rate (cpCER) as evaluation metrics, respectively. Overall, the ICMC-ASR Challenge attracts 98 participating teams and receives 53 valid results in both tracks. In the end, first-place team USTCiflytek achieves a CER of 13.16% in the ASR track and a cpCER of 21.48% in the ASDR track, showing an absolute improvement of 13.08% and 51.4% compared to our challenge baseline, respectively.
Submitted 20 February, 2024; v1 submitted 7 January, 2024;
originally announced January 2024.
-
U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias
Authors:
Ao Zhang,
Pan Zhou,
Kaixun Huang,
Yong Zou,
Ming Liu,
Lei Xie
Abstract:
Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasingly more interest. However, existing methods based on acoustic models and post-processing train the acoustic model with ASR training criteria to model all phonemes, making the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first stage model to detect potential keyword candidates and the decoder branch as the second stage model to validate candidates. In order to enhance any customized keywords, we redesign the U2 training procedure for U2-KWS and add keyword information by audio and text cross-attention into both branches. We perform experiments on our internal dataset and Aishell-1. The results show that U2-KWS can achieve a significant relative wake-up rate improvement of 41% compared to the traditional customized KWS systems when the false alarm rate is fixed to 0.5 times per hour.
Submitted 15 December, 2023;
originally announced December 2023.
-
Densifying MIMO: Channel Modeling, Physical Constraints, and Performance Evaluation for Holographic Communications
Authors:
Y. Liu,
M. Zhang,
T. Wang,
A. Zhang,
M. Debbah
Abstract:
As the backbone of the fifth-generation (5G) cellular network, massive multiple-input multiple-output (MIMO) encounters a significant challenge in practical applications: how to deploy a large number of antenna elements within limited spaces. Recently, holographic communication has emerged as a potential solution to this issue. It employs dense antenna arrays and provides a tractable model. Nevertheless, some challenges must be addressed to actualize this innovative concept. One is the mutual coupling among antenna elements within an array. When the element spacing is small, near-field coupling becomes the dominant factor that strongly restricts the array performance. Another is the polarization of electromagnetic waves. As an intrinsic property, it was not fully considered in the previous channel modeling of holographic communication. The third is the lack of real-world experiments to show the potential and possible defects of a holographic communication system. In this paper, we propose an electromagnetic channel model based on the characteristics of electromagnetic waves. This model encompasses the impact of mutual coupling in the transceiver sides and the depolarization in the propagation environment. Furthermore, by approximating an infinite array, the performance restrictions of large-scale dense antenna arrays are also studied theoretically to exploit the potential of the proposed channel. In addition, numerical simulations and a channel measurement experiment are conducted. The findings reveal that within limited spaces, the coupling effect, particularly for element spacing smaller than half of the wavelength, is the primary factor leading to the inflection point for the performance of holographic communications.
Submitted 5 December, 2023;
originally announced December 2023.
-
Time and Frequency Offset Estimation and Intercarrier Interference Cancellation for AFDM Systems
Authors:
Yuankun Tang,
Anjie Zhang,
Miaowen Wen,
Yu Huang,
Fei Ji,
Jinming Wen
Abstract:
Affine frequency division multiplexing (AFDM) is an emerging multicarrier waveform that offers a potential solution for achieving reliable communications over time-varying channels. This paper proposes two maximum-likelihood (ML) estimators of symbol time offset and carrier frequency offset for AFDM systems. One is called joint ML estimator, which evaluates the arrival time and carrier frequency offset by comparing the correlations of samples. Moreover, we propose the other so-called stepwise ML estimator to reduce the complexity. Both proposed estimators exploit the redundant information contained within the chirp-periodic prefix inherent in AFDM symbols, thus dispensing with any additional pilots. To further mitigate the intercarrier interference resulting from the residual frequency offset, we design a mirror-mapping-based scheme for AFDM systems. Numerical results verify the effectiveness of the proposed time and carrier frequency offset estimation criteria and the mirror-mapping-based modulation for AFDM systems.
Submitted 28 December, 2023; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Waveform Design for MIMO-OFDM Integrated Sensing and Communication System: An Information Theoretical Approach
Authors:
Zhiqing Wei,
Jinghui Piao,
Xin Yuan,
Huici Wu,
J. Andrew Zhang,
Zhiyong Feng,
Lin Wang,
Ping Zhang
Abstract:
Integrated sensing and communication (ISAC) is regarded as the enabling technology in future 5th-Generation-Advanced (5G-A) and 6th-Generation (6G) mobile communication systems. ISAC waveform design is critical in ISAC systems. However, the difference in performance metrics between sensing and communication brings challenges for ISAC waveform design. This paper applies a unified performance metric from information theory, namely mutual information (MI), to measure the communication and sensing performance in multicarrier ISAC systems. In multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) ISAC systems, we first derive the sensing and communication MI with subcarrier correlation and spatial correlation. Then, we propose optimal waveform designs for maximizing the sensing MI, communication MI and the weighted sum of sensing and communication MI, respectively. The optimization results are validated by Monte Carlo simulations. Our work provides effective closed-form expressions for waveform design, enabling the realization of MIMO-OFDM ISAC systems with balanced performance in communication and sensing.
Submitted 9 October, 2023;
originally announced October 2023.
-
Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition
Authors:
Kaixun Huang,
Ao Zhang,
Binbin Zhang,
Tianyi Xu,
Xingchen Song,
Lei Xie
Abstract:
The attention-based deep contextual biasing method has been demonstrated to effectively improve the recognition performance of end-to-end automatic speech recognition (ASR) systems on given contextual phrases. However, unlike shallow fusion methods that directly bias the posterior of the ASR model, deep biasing methods implicitly integrate contextual information, making it challenging to control the degree of bias. In this study, we introduce a spike-triggered deep biasing method that simultaneously supports both explicit and implicit bias. Moreover, both bias approaches exhibit significant improvements and can be cascaded with shallow fusion methods for better results. Furthermore, we propose a context sampling enhancement strategy and improve the contextual phrase filtering algorithm. Experiments on the public WenetSpeech Mandarin biased-word dataset show a 32.0% relative CER reduction compared to the baseline model, with an impressive 68.6% relative CER reduction on contextual phrases.
Submitted 6 October, 2023;
originally announced October 2023.