Search | arXiv e-print repository

arXiv:2410.20690

KANsformer for Scalable Beamforming

Authors: Xinke Xie, Yang Lu, Chong-Yung Chi, Wei Chen, Bo Ai, Dusit Niyato

Abstract: This paper proposes an unsupervised deep-learning (DL) approach, termed KANsformer, that integrates a transformer with Kolmogorov-Arnold networks (KAN) to realize scalable beamforming for mobile communication systems. Specifically, we consider a classic multi-input single-output (MISO) energy efficiency maximization problem subject to a total power budget. The proposed KANsformer first extracts hidden features via a multi-head self-attention mechanism and then reads out the desired beamforming design via KAN. Numerical results are provided to evaluate the KANsformer in terms of generalization performance, transfer learning, and ablation experiments. Overall, the KANsformer outperforms existing benchmark DL approaches and adapts to changes in the number of mobile users with real-time, near-optimal inference.

Submitted 27 October, 2024; originally announced October 2024.
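For readers unfamiliar with the problem class named in the KANsformer abstract, the sketch below scores a given set of beamformers by the energy efficiency (EE) of a multi-user MISO downlink. It assumes a common textbook model that the abstract does not spell out: single-antenna users, SINR-based rates, and consumed power equal to transmit power divided by a power-amplifier efficiency plus a fixed circuit power. All function names and parameters are hypothetical, and the code only evaluates a design; it does not reproduce the KANsformer or any optimizer.

```python
import math

def inner(h, w):
    """Return the complex inner product h^H w for vectors given as lists."""
    return sum(hi.conjugate() * wi for hi, wi in zip(h, w))

def total_power(W):
    """Total transmit power: sum over users of ||w_k||^2."""
    return sum(sum(abs(x) ** 2 for x in w) for w in W)

def within_budget(W, p_max):
    """Check the total power budget constraint sum_k ||w_k||^2 <= p_max."""
    return total_power(W) <= p_max

def energy_efficiency(H, W, noise_power, circuit_power, pa_efficiency=1.0):
    """Sum rate (bit/s/Hz) divided by consumed power (W) under an assumed model.

    H[k] is user k's channel vector and W[k] its beamformer (length-N lists).
    """
    K = len(H)
    sum_rate = 0.0
    for k in range(K):
        signal = abs(inner(H[k], W[k])) ** 2
        # Multi-user interference from the other users' beamformers.
        interference = sum(abs(inner(H[k], W[j])) ** 2 for j in range(K) if j != k)
        sinr = signal / (interference + noise_power)
        sum_rate += math.log2(1.0 + sinr)
    return sum_rate / (total_power(W) / pa_efficiency + circuit_power)
```

With two orthogonal unit-gain channels, matched beamformers w_k = h_k, unit noise power, and 2 W of circuit power, each user sees an SINR of 1 (1 bit/s/Hz), so the EE is 2 / (2 + 2) = 0.5 bit/s/Hz/W.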

arXiv:2410.05739

End-to-end multi-channel speaker extraction and binaural speech synthesis

Authors: Cheng Chi, Xiaoyu Li, Yuxuan Ke, Qunping Ni, Yao Ge, Xiaodong Li, Chengshi Zheng

Abstract: Speech clarity and spatial audio immersion are the two most critical factors in enhancing remote conferencing experiences. Existing methods are often limited, either by the lack of spatial information when only one microphone is used, or by a strong dependence on the accuracy of direction-of-arrival estimation when a microphone array is used. To overcome these issues, we introduce an end-to-end deep learning framework that directly maps multi-channel noisy and reverberant signals to clean, spatialized binaural speech. This framework unifies source extraction, noise suppression, and binaural rendering into one network. Within this framework, a novel magnitude-weighted interaural level difference loss function is proposed to improve the accuracy of spatial rendering. Extensive evaluations show that our method outperforms established baselines in terms of both speech quality and spatial fidelity.

Submitted 11 July, 2025; v1 submitted 8 October, 2024; originally announced October 2024.
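The interaural level difference (ILD) named in the loss function above is, in its usual definition, the level ratio in dB between the left-ear and right-ear signals in each time-frequency bin. The sketch below computes per-bin ILDs from magnitude spectra and a weighted mean absolute ILD error between an estimate and a reference. The abstract does not define the paper's magnitude weighting, so `weights` is left to the caller; the weighting in the usage example (reference left-plus-right magnitude) is a hypothetical illustration, not the authors' loss.

```python
import math

EPS = 1e-8  # guards against log of zero and division by zero in silent bins

def ild_db(left_mag, right_mag):
    """Interaural level difference in dB for each time-frequency bin."""
    return [20.0 * math.log10((l + EPS) / (r + EPS))
            for l, r in zip(left_mag, right_mag)]

def weighted_ild_error(est_l, est_r, ref_l, ref_r, weights):
    """Weighted mean absolute ILD error (dB) between estimate and reference.

    `weights` is caller-supplied (hypothetical); larger weights make a bin count more.
    """
    est = ild_db(est_l, est_r)
    ref = ild_db(ref_l, ref_r)
    total_w = sum(weights)
    return sum(w * abs(e - r) for w, e, r in zip(weights, est, ref)) / total_w
```

For a reference with equal left and right levels in two bins (magnitudes 1.0 and 0.1) and an estimate whose first bin is 10 times louder on the left, the first bin has an ILD error of about 20 dB and the second none; weighting by the reference magnitudes (2.0 and 0.2) gives an overall error of a little over 18 dB.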

arXiv:2408.05776

Convergence of Symbiotic Communications and Blockchain for Sustainable and Trustworthy 6G Wireless Networks

Authors: Haoxiang Luo, Gang Sun, Cheng Chi, Hongfang Yu, Mohsen Guizani

Abstract: Symbiotic communication (SC) is a new wireless communication paradigm that, analogous to populations in a natural ecosystem, enables multiple communication systems to cooperate and benefit mutually through service exchange and resource sharing. As a result, SC is regarded as an important potential technology for future sixth-generation (6G) communications, addressing spectrum scarcity and energy inefficiency. Symbiotic relationships among communication systems can complement radio resources in 6G. However, the absence of established trust relationships among diverse communication systems presents a formidable hurdle to efficient and trusted resource and service exchange within SC frameworks. To better realize trusted SC services in 6G, in this paper we propose a solution that converges SC and blockchain, called a symbiotic blockchain network (SBN). Specifically, we first use cognitive backscatter communication to adapt blockchain consensus to wireless networks, yielding the symbiotic blockchain consensus (SBC). Then, for SBC, we propose a highly energy-efficient sharding scheme to meet the extremely low power consumption requirements of 6G. Finally, this blockchain scheme guarantees trusted transactions of communication services in SC. Ablation experiments show that the proposed SBN significantly reduces energy consumption and processing latency in adversarial networks, and it is expected to help achieve a sustainable and trusted 6G wireless network.

Submitted 11 August, 2024; originally announced August 2024.

arXiv:2406.19464

ManiWAV: Learning Robot Manipulation from In-the-Wild Audio-Visual Data

Authors: Zeyi Liu, Cheng Chi, Eric Cousineau, Naveen Kuppuswamy, Benjamin Burchfiel, Shuran Song

Abstract: Through contact, audio signals provide rich information about robot interactions and object properties. This information can surprisingly ease the learning of contact-rich robot manipulation skills, especially when visual information alone is ambiguous or incomplete. However, the use of audio data in robot manipulation has been limited to teleoperated demonstrations collected by attaching a microphone to either the robot or the object, which significantly limits its use in robot learning pipelines. In this work, we introduce ManiWAV: an 'ear-in-hand' data collection device for collecting in-the-wild human demonstrations with synchronous audio and visual feedback, together with a corresponding policy interface for learning robot manipulation policies directly from the demonstrations. We demonstrate the capabilities of our system through four contact-rich manipulation tasks that require either passively sensing contact events and modes, or actively sensing object surface materials and states. In addition, we show that our system can generalize to unseen in-the-wild environments by learning from diverse in-the-wild human demonstrations.

Submitted 3 November, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: Conference on Robot Learning (CoRL) 2024; Project website: https://maniwav.github.io/

arXiv:2403.09096

Deep unfolding Network for Hyperspectral Image Super-Resolution with Automatic Exposure Correction

Authors: Yuan Fang, Yipeng Liu, Jie Chen, Zhen Long, Ao Li, Chong-Yung Chi, Ce Zhu

Abstract: In recent years, the fusion of a high spatial resolution multispectral image (HR-MSI) and a low spatial resolution hyperspectral image (LR-HSI) has been recognized as an effective method for HSI super-resolution (HSI-SR). However, both the HSI and the MSI may be acquired under extreme conditions, such as at night or in poorly illuminated scenes, which can cause different exposure levels and thereby seriously degrade the resulting HSI-SR. In contrast to most existing methods, which apply low-light enhancement (LLIE) to the MSI and HSI separately and then fuse them, we propose a deep Unfolding HSI Super-Resolution with Automatic Exposure Correction (UHSR-AEC) network. Because it takes the correlation between LLIE and HSI-SR into account, UHSR-AEC can effectively generate a high-quality fused HSI-SR (in texture and features) even under highly imbalanced exposures. Extensive experiments, including comparisons with benchmark peer methods, demonstrate the state-of-the-art overall performance of the proposed UHSR-AEC.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2307.16259

Communication-Sensing Region for Cell-Free Massive MIMO ISAC Systems

Authors: Weihao Mao, Yang Lu, Chong-Yung Chi, Bo Ai, Zhangdui Zhong, Zhiguo Ding

Abstract: This paper investigates the system model and the transmit beamforming design for the Cell-Free massive multi-input multi-output (MIMO) integrated sensing and communication (ISAC) system. The impact of uncertainty in the target locations on the propagation of wireless signals is considered during both the uplink and downlink phases; in particular, the main statistics of the MIMO channel estimation error are derived in closed form. A fundamental performance metric, termed the communication-sensing (C-S) region, is defined for the considered system via three cases, i.e., the sensing-only case, the communication-only case, and the ISAC case. The transmit beamforming design problems for the three cases are solved through different reformulations, e.g., the Lagrangian dual transform and the quadratic fractional transform, combined with the block coordinate descent and successive convex approximation methods. Numerical results present a 3-dimensional C-S region with a varying number of access points to illustrate the trade-off between communication and radar sensing. The radar-sensing advantage of the Cell-Free massive MIMO system is also studied via a comparison with the traditional cellular system. Finally, the efficacy of the proposed beamforming scheme is validated against zero-forcing and maximum ratio transmission schemes.

Submitted 30 July, 2023; originally announced July 2023.

arXiv:2303.09858

Preventing Unauthorized AI Over-Analysis by Medical Image Adversarial Watermarking

Authors: Xingxing Wei, Bangzheng Pu, Shiji Zhao, Chen Chi, Huazhu Fu

Abstract: The advancement of deep learning has facilitated the integration of artificial intelligence (AI) into clinical practice, particularly in computer-aided diagnosis. Given the pivotal role of medical images in various diagnostic procedures, it is imperative to ensure the responsible and secure use of AI techniques. However, the unauthorized use of AI for image analysis raises significant concerns regarding patient privacy and potential infringement of the proprietary rights of data custodians. Consequently, practical and cost-effective strategies that safeguard patient privacy and uphold medical image copyrights are critically needed. In direct response to this demand, we present a pioneering solution named Medical Image Adversarial watermarking (MIAD-MARK). Our approach introduces watermarks that strategically mislead unauthorized AI diagnostic models, inducing erroneous predictions without compromising the integrity of the visual content. Importantly, our method integrates an authorization protocol tailored for legitimate users, enabling the removal of the MIAD-MARK through encryption-generated keys. Through extensive experiments, we validate the efficacy of MIAD-MARK across three prominent medical image datasets. The empirical results show a substantial impact: our approach reduces the accuracy of standard AI diagnostic models to a mere 8.57% under white-box conditions and 45.83% in the more challenging black-box scenario. Additionally, our solution effectively mitigates unauthorized exploitation of medical images even in the presence of sophisticated watermark removal networks: these AI diagnostic networks achieve an average accuracy of only 38.59% on images protected by MIAD-MARK, underscoring the robustness of our safeguarding mechanism.

Submitted 13 September, 2023; v1 submitted 17 March, 2023; originally announced March 2023.

arXiv:2102.12726

Design and Control of a Highly Redundant Rigid-Flexible Coupling Robot to Assist the COVID-19 Oropharyngeal-Swab Sampling

Authors: Yingbai Hu, Jian Li, Yongquan Chen, Qiwen Wang, Chuliang Chi, Heng Zhang, Qing Gao, Yuanmin Lan, Zheng Li, Zonggao Mu, Zhenglong Sun, Alois Knoll

Abstract: The outbreak of novel coronavirus pneumonia (COVID-19) has caused mortality and morbidity worldwide. Oropharyngeal-swab (OP-swab) sampling is widely used worldwide for the diagnosis of COVID-19. To protect clinical staff from infection by the virus, we developed a 9-degree-of-freedom (DOF) rigid-flexible coupling (RFC) robot to assist COVID-19 OP-swab sampling. The robot is composed of a visual system, a UR5 robot arm, a micro-pneumatic actuator, and a force-sensing system. It is expected to reduce risk and free clinical staff from long-term, repetitive sampling work. Compared with a rigid sampling robot, the developed force-sensing RFC robot can perform OP-swab sampling more safely and gently. In addition, a varying-parameter zeroing neural network-based optimization method is proposed for motion planning of the 9-DOF redundant manipulator. The developed robot system is validated by OP-swab sampling on both oral cavity phantoms and volunteers.

Submitted 25 February, 2021; originally announced February 2021.

Comments: 8 pages, 11 figures

arXiv:2007.00041

doi 10.1109/MSP.2020.3013555

Multi-way Graph Signal Processing on Tensors: Integrative analysis of irregular geometries

Authors: Jay S. Stanley III, Eric C. Chi, Gal Mishne

Abstract: Graph signal processing (GSP) is an important methodology for studying data residing on irregular structures. As acquired data increasingly takes the form of multi-way tensors, new signal processing tools are needed to fully exploit the multi-way structure within the data. In this paper, we review modern signal processing frameworks generalizing GSP to multi-way data, starting from graph signals coupled to familiar regular axes such as time in sensor networks, and then extending to general graphs across all tensor modes. This widely applicable paradigm motivates reformulating and improving upon classical problems and approaches to creatively address the challenges in tensor-based data. We synthesize common themes arising from current efforts to combine GSP with tensor analysis and highlight future directions in extending GSP to the multi-way paradigm.

Submitted 27 July, 2020; v1 submitted 30 June, 2020; originally announced July 2020.

Comments: In review for IEEE Signal Processing Magazine

arXiv:2004.00298

Stationarity of Time-Series on Graph via Bivariate Translation Invariance

Authors: Amin Jalili, Chong-Yung Chi

Abstract: Stationarity is a cornerstone of classical signal processing (CSP) for modeling and characterizing various stochastic signals for subsequent analysis. However, in many complex real-world scenarios in which the stochastic process lies on an irregular graph structure, CSP discards the underlying structure when analyzing such data. It is therefore essential to establish a new framework for analyzing high-dimensional graph-structured stochastic signals that takes the underlying structure into account. To this end, looking through the lens of operator theory, we first propose a new bivariate isometric joint translation operator (JTO) consistent with the structural characteristics of translation operators in other signal domains. Moreover, we characterize time-vertex filtering based on the proposed JTO. We then put forth a new definition of joint wide-sense stationary (JWSS) signals in the time-vertex domain using the proposed isometric JTO, together with its spectral characterization. Next, a new joint power spectral density (JPSD) estimator, called the generalized Welch method (GWM), is presented. Simulation results are provided to show the efficacy of this JPSD estimator. Furthermore, to show the usefulness of JWSS modeling, we focus on the classification of time-series on graphs. To that end, by modeling brain electroencephalography (EEG) signals as JWSS processes, we use the JPSD as the feature for emotion and Alzheimer's disease (AD) recognition. Experimental results demonstrate that the JPSD yields superior emotion and AD recognition accuracy compared with the classical power spectral density (PSD) and graph PSD (GPSD) as the feature set for both applications. Finally, we provide some concluding remarks.

Submitted 24 November, 2021; v1 submitted 1 April, 2020; originally announced April 2020.
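The generalized Welch method (GWM) in the abstract above extends the classical Welch estimator. As background only, here is a minimal sketch of the classical one-dimensional Welch method: split the series into overlapping segments, apply a window, and average the resulting periodograms. The Hann window, the naive O(N^2) DFT, and the per-bin scaling are illustrative choices; the joint time-vertex generalization proposed in the paper is not reproduced here.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform (adequate for short segments)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def hann(L):
    """Periodic Hann window of length L."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / L) for n in range(L)]

def welch_psd(x, seg_len, hop):
    """Classical Welch PSD estimate, two-sided, one value per DFT bin.

    Each periodogram is normalized by the window energy; divide the result
    by the sampling rate to obtain a density per Hz.
    """
    w = hann(seg_len)
    energy = sum(v * v for v in w)
    psd = [0.0] * seg_len
    count = 0
    for start in range(0, len(x) - seg_len + 1, hop):
        # Window one segment and accumulate its periodogram.
        X = dft([x[start + n] * w[n] for n in range(seg_len)])
        for k in range(seg_len):
            psd[k] += abs(X[k]) ** 2 / energy
        count += 1
    if count == 0:
        raise ValueError("signal shorter than one segment")
    return [p / count for p in psd]
```

For a unit-amplitude cosine at 2 cycles per 16 samples, analyzed with 16-sample segments and 50% overlap (a hop of 8), the estimate peaks at bins 2 and 14, the positive- and negative-frequency images of the tone. Averaging over several segments trades frequency resolution for lower variance, which is the point of the method.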

arXiv:1909.04178

Translation Operator in Graph Signal Processing: A Generalized Approach

Authors: Amin Jalili, Sadid Sahami, Chong-Yung Chi

Abstract: The notion of translation (shift) is straightforward in classical signal processing, but it is challenging on an irregular graph structure. In this work, we present an approach to characterize the translation operator in various signal domains. By a natural generalization from classical domains, one can characterize an abstract representation of the graph translation operator. We then propose an isometric translation operator in the joint time-vertex domain consistent with the abstract form of translation operators in other domains. We also demonstrate the connection between this notion and the Schrödinger equation for a dynamic system, which intriguingly describes the idea behind translation on graphs.

Submitted 20 January, 2020; v1 submitted 9 September, 2019; originally announced September 2019.
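In the classical domain that the abstract above generalizes from, translation has a simple spectral form: circularly shifting a length-N signal by n0 samples multiplies its k-th DFT coefficient by exp(-2πj·k·n0/N). Every such factor has unit magnitude, which is why classical translation is isometric (it preserves signal energy). The minimal sketch below checks both facts numerically. Graph-domain generalizations replace the complex exponentials with the eigenvectors of a graph operator; the paper's specific isometric joint time-vertex operator is not reproduced here.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse of dft() above."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def spectral_shift(x, n0):
    """Circularly translate x by n0 samples by modulating its DFT coefficients."""
    N = len(x)
    X = dft(x)
    # Each factor has unit magnitude, so the operation preserves energy.
    return idft([X[k] * cmath.exp(-2j * math.pi * k * n0 / N) for k in range(N)])

def energy(x):
    """Signal energy sum |x[n]|^2."""
    return sum(abs(v) ** 2 for v in x)
```

Shifting [1, 2, 3, 4, 0, 0, 0, 0] by 2 yields [0, 0, 1, 2, 3, 4, 0, 0] up to floating-point round-off (returned as complex numbers with negligible imaginary parts), and the energy stays at 30.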

Showing 1–11 of 11 results for author: Chi, C