Search | arXiv e-print repository

FollowSpot: Enhancing Wireless Communications via Movable Ceiling-Mounted Metasurfaces

Authors: Wenhai Lai, Kaiming Shen, Rui Zhang

Abstract: This paper studies the optimal placement of ceiling-mounted metasurfaces (MTSs) to help focus the wireless signal beam onto the target receiver, as inspired by the theatre spotlight. We assume that a total of $M$ MTSs are deployed, and that there are $L$ possible positions for each MTS. The resulting signal-to-noise (SNR) maximization problem is difficult to tackle directly because of the coupling… ▽ More This paper studies the optimal placement of ceiling-mounted metasurfaces (MTSs) to help focus the wireless signal beam onto the target receiver, as inspired by the theatre spotlight. We assume that a total of $M$ MTSs are deployed, and that there are $L$ possible positions for each MTS. The resulting signal-to-noise (SNR) maximization problem is difficult to tackle directly because of the coupling between the placement decisions of the different MTSs. Mathematically, we are faced with a nonlinear discrete optimization problem with $L^M$ possible solutions. A remarkable result shown in this paper is that the above challenging problem can be efficiently solved within $O(ML^2\log(ML))$ time. There are two key steps in developing the proposed algorithm. First, we successfully decouple the placement variables of different MTSs by introducing a continuous auxiliary variable $μ$; the discrete primal variables are now easy to optimize when $μ$ is held fixed, but the optimization problem of $μ$ is nonconvex. Second, we show that the optimization of continuous $μ$ can be recast into a discrete optimization problem with only $LM$ possible solutions, so the optimal $μ$ can now be readily obtained. Numerical results show that the proposed algorithm can not only guarantee a global optimum but also reach the optimal solution efficiently. △ Less

Submitted 5 July, 2025; originally announced July 2025.

Comments: 11 pages

arXiv:2506.00987 [pdf, ps, other]

doi 10.1109/LWC.2025.3576288

Blind Passive Beamforming for MIMO System

Authors: Wenhai Lai, Jiawei Yao, Kaiming Shen

Abstract: Passive beamforming for the intelligent surface (IS)-aided multiple-input multiple-output (MIMO) communication is a difficult nonconvex problem. It becomes even more challenging under the practical discrete constraints on phase shifts. Unlike most of the existing approaches that rely on the channel state information (CSI), this work advocates a blind beamforming strategy without any CSI. Simply pu… ▽ More Passive beamforming for the intelligent surface (IS)-aided multiple-input multiple-output (MIMO) communication is a difficult nonconvex problem. It becomes even more challenging under the practical discrete constraints on phase shifts. Unlike most of the existing approaches that rely on the channel state information (CSI), this work advocates a blind beamforming strategy without any CSI. Simply put, we propose a statistical method that learns the main feature of the wireless environment from the random samples of received signal power. Field tests in the 5G commercial network demonstrate the superiority of the proposed blind passive beamforming method. △ Less

Submitted 1 June, 2025; originally announced June 2025.

Comments: 6 pages

Journal ref: IEEE Wireless Communications Letters 2025

arXiv:2504.18425 [pdf, other]

Kimi-Audio Technical Report

Authors: KimiTeam, Ding Ding, Zeqian Ju, Yichong Leng, Songxiang Liu, Tong Liu, Zeyu Shang, Kai Shen, Wei Song, Xu Tan, Heyi Tang, Zhengtao Wang, Chu Wei, Yifei Xin, Xinran Xu, Jianwei Yu, Yutao Zhang, Xinyu Zhou, Y. Charles, Jun Chen, Yanru Chen, Yulun Du, Weiran He, Zhenxing Hu, Guokun Lai , et al. (15 additional authors not shown)

Abstract: We present Kimi-Audio, an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference deployment, and evaluation. Specifically, we leverage a 12.5Hz audio tokenizer, design a novel LLM-based architecture with continuous features as input a… ▽ More We present Kimi-Audio, an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference deployment, and evaluation. Specifically, we leverage a 12.5Hz audio tokenizer, design a novel LLM-based architecture with continuous features as input and discrete tokens as output, and develop a chunk-wise streaming detokenizer based on flow matching. We curate a pre-training dataset that consists of more than 13 million hours of audio data covering a wide range of modalities including speech, sound, and music, and build a pipeline to construct high-quality and diverse post-training data. Initialized from a pre-trained LLM, Kimi-Audio is continual pre-trained on both audio and text data with several carefully designed tasks, and then fine-tuned to support a diverse of audio-related tasks. Extensive evaluation shows that Kimi-Audio achieves state-of-the-art performance on a range of audio benchmarks including speech recognition, audio understanding, audio question answering, and speech conversation. We release the codes, model checkpoints, as well as the evaluation toolkits in https://github.com/MoonshotAI/Kimi-Audio. △ Less

Submitted 25 April, 2025; originally announced April 2025.

arXiv:2503.17005 [pdf]

Autonomous Exploration-Based Precise Mapping for Mobile Robots through Stepwise and Consistent Motions

Authors: Muhua Zhang, Lei Ma, Ying Wu, Kai Shen, Yongkui Sun, Henry Leung

Abstract: This paper presents an autonomous exploration framework. It is designed for indoor ground mobile robots that utilize laser Simultaneous Localization and Mapping (SLAM), ensuring process completeness and precise mapping results. For frontier search, the local-global sampling architecture based on multiple Rapidly Exploring Random Trees (RRTs) is employed. Traversability checks during RRT expansion… ▽ More This paper presents an autonomous exploration framework. It is designed for indoor ground mobile robots that utilize laser Simultaneous Localization and Mapping (SLAM), ensuring process completeness and precise mapping results. For frontier search, the local-global sampling architecture based on multiple Rapidly Exploring Random Trees (RRTs) is employed. Traversability checks during RRT expansion and global RRT pruning upon map updates eliminate unreachable frontiers, reducing potential collisions and deadlocks. Adaptive sampling density adjustments, informed by obstacle distribution, enhance exploration coverage potential. For frontier point navigation, a stepwise consistent motion strategy is adopted, wherein the robot strictly drives straight on approximately equidistant line segments in the polyline path and rotates in place at segment junctions. This simplified, decoupled motion pattern improves scan-matching stability and mitigates map drift. For process control, the framework serializes frontier point selection and navigation, avoiding oscillation caused by frequent goal changes in conventional parallelized processes. The waypoint retracing mechanism is introduced to generate repeated observations, triggering loop closure detection and backend optimization in graph-based SLAM, thereby improving map consistency and precision. Experiments in both simulation and real-world scenarios validate the effectiveness of the framework. It achieves improved mapping coverage and precision in more challenging environments compared to baseline 2D exploration algorithms. It also shows robustness in supporting resource-constrained robot platforms and maintaining mapping consistency across various LiDAR field-of-view (FoV) configurations. △ Less

Submitted 21 March, 2025; originally announced March 2025.

Comments: 8 pages, 11 figures. This work has been submitted to the IEEE for possible publication

arXiv:2503.14345 [pdf, other]

MoonCast: High-Quality Zero-Shot Podcast Generation

Authors: Zeqian Ju, Dongchao Yang, Jianwei Yu, Kai Shen, Yichong Leng, Zhengtao Wang, Xu Tan, Xinyu Zhou, Tao Qin, Xiangyang Li

Abstract: Recent advances in text-to-speech synthesis have achieved notable success in generating high-quality short utterances for individual speakers. However, these systems still face challenges when extending their capabilities to long, multi-speaker, and spontaneous dialogues, typical of real-world scenarios such as podcasts. These limitations arise from two primary challenges: 1) long speech: podcasts… ▽ More Recent advances in text-to-speech synthesis have achieved notable success in generating high-quality short utterances for individual speakers. However, these systems still face challenges when extending their capabilities to long, multi-speaker, and spontaneous dialogues, typical of real-world scenarios such as podcasts. These limitations arise from two primary challenges: 1) long speech: podcasts typically span several minutes, exceeding the upper limit of most existing work; 2) spontaneity: podcasts are marked by their spontaneous, oral nature, which sharply contrasts with formal, written contexts; existing works often fall short in capturing this spontaneity. In this paper, we propose MoonCast, a solution for high-quality zero-shot podcast generation, aiming to synthesize natural podcast-style speech from text-only sources (e.g., stories, technical reports, news in TXT, PDF, or Web URL formats) using the voices of unseen speakers. To generate long audio, we adopt a long-context language model-based audio modeling approach utilizing large-scale long-context speech data. To enhance spontaneity, we utilize a podcast generation module to generate scripts with spontaneous details, which have been empirically shown to be as crucial as the text-to-speech modeling itself. Experiments demonstrate that MoonCast outperforms baselines, with particularly notable improvements in spontaneity and coherence. △ Less

Submitted 19 March, 2025; v1 submitted 18 March, 2025; originally announced March 2025.

arXiv:2502.13429 [pdf, other]

Uplink Coordinated Pilot Design for 1-bit Massive MIMO in Correlated Channel

Authors: Hyeongtak Yun, Juntaek Han, Kaiming Shen, Jeonghun Park

Abstract: In this paper, we propose a coordinated pilot design method to minimize the channel estimation mean squared error (MSE) in 1-bit analog-to-digital converters (ADCs) massive multiple-input multiple-output (MIMO). Under the assumption that the well-known Bussgang linear minimum mean square error (BLMMSE) estimator is used for channel estimation, we first observe that the resulting MSE leads to an in… ▽ More In this paper, we propose a coordinated pilot design method to minimize the channel estimation mean squared error (MSE) in 1-bit analog-to-digital converters (ADCs) massive multiple-input multiple-output (MIMO). Under the assumption that the well-known Bussgang linear minimum mean square error (BLMMSE) estimator is used for channel estimation, we first observe that the resulting MSE leads to an intractable optimization problem, as it involves the arcsin function and a complex multiple matrix ratio form. To resolve this, we derive the approximate MSE by assuming the low signal-to-noise ratio (SNR) regime, by which we develop an efficient coordinated pilot design based on a fractional programming technique. The proposed pilot design is distinguishable from the existing work in that it is applicable in general system environments, including correlated channel and multi-cell environments. We demonstrate that the proposed method outperforms the channel estimation accuracy performance compared to the conventional approaches. △ Less

Submitted 19 February, 2025; originally announced February 2025.

Comments: 5 pages, 2 figures

arXiv:2501.15536 [pdf, ps, other]

Intelligent Surface Assisted Radar Stealth Against Unauthorized ISAC

Authors: Fan Xu, Wenhai Lai, Kaiming Shen

Abstract: The integration of radar sensors and communication networks as envisioned for the 6G wireless networks poses significant security risks, e.g., the user position information can be released to an unauthorized dual-functional base station (DFBS). To address this issue, we propose an intelligent surface (IS)-assisted radar stealth technology that prevents adversarial sensing. Specifically, we modify… ▽ More The integration of radar sensors and communication networks as envisioned for the 6G wireless networks poses significant security risks, e.g., the user position information can be released to an unauthorized dual-functional base station (DFBS). To address this issue, we propose an intelligent surface (IS)-assisted radar stealth technology that prevents adversarial sensing. Specifically, we modify the wireless channels by tuning the phase shifts of IS in order to protect the target user from unauthorized sensing without jeopardizing the wireless communication link. In principle, we wish to maximize the distortion between the estimated angle-of-arrival (AoA) by the DFBS and the ground truth given the minimum signal-to-noise-radio (SNR) constraint for communication. Toward this end, we propose characterizing the problem as a game played by the DFBS and the IS, in which the DFBS aims to maximize a particular utility while the IS aims to minimize the utility. Although the problem is nonconvex, this paper shows that it can be optimally solved in closed form from a geometric perspective. According to the simulations, the proposed closed-form algorithm outperforms the baseline methods significantly in combating unauthorized sensing while limiting the impacts on wireless communications. △ Less

Submitted 26 January, 2025; originally announced January 2025.

Comments: 5 pages, 6 figures

arXiv:2407.20914 [pdf, ps, other]

doi 10.1109/LSP.2024.3436669

An Efficient Convex-Hull Relaxation Based Algorithm for Multi-User Discrete Passive Beamforming

Authors: Wenhai Lai, Zheyu Wu, Yi Feng, Kaiming Shen, Ya-Feng Liu

Abstract: Intelligent reflecting surface (IRS) is an emerging technology to enhance spatial multiplexing in wireless networks. This letter considers the discrete passive beamforming design for IRS in order to maximize the minimum signal-to-interference-plus-noise ratio (SINR) among multiple users in an IRS-assisted downlink network. The main design difficulty lies in the discrete phase-shift constraint. Dif… ▽ More Intelligent reflecting surface (IRS) is an emerging technology to enhance spatial multiplexing in wireless networks. This letter considers the discrete passive beamforming design for IRS in order to maximize the minimum signal-to-interference-plus-noise ratio (SINR) among multiple users in an IRS-assisted downlink network. The main design difficulty lies in the discrete phase-shift constraint. Differing from most existing works, this letter advocates a convex-hull relaxation of the discrete constraints which leads to a continuous reformulated problem equivalent to the original discrete problem. This letter further proposes an efficient alternating projection/proximal gradient descent and ascent algorithm for solving the reformulated problem. Simulation results show that the proposed algorithm outperforms the state-of-the-art methods significantly. △ Less

Submitted 28 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

Comments: 5 pages

Journal ref: IEEE Signal Processing Letters 2024

arXiv:2407.12648 [pdf, ps, other]

Blind Beamforming for Coverage Enhancement with Intelligent Reflecting Surface

Authors: Fan Xu, Jiawei Yao, Wenhai Lai, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

Abstract: Conventional policy for configuring an intelligent reflecting surface (IRS) typically requires channel state information (CSI), thus incurring substantial overhead costs and facing incompatibility with the current network protocols. This paper proposes a blind beamforming strategy in the absence of CSI, aiming to boost the minimum signal-to-noise ratio (SNR) among all the receiver positions, namel… ▽ More Conventional policy for configuring an intelligent reflecting surface (IRS) typically requires channel state information (CSI), thus incurring substantial overhead costs and facing incompatibility with the current network protocols. This paper proposes a blind beamforming strategy in the absence of CSI, aiming to boost the minimum signal-to-noise ratio (SNR) among all the receiver positions, namely the coverage enhancement. Although some existing works already consider the IRS-assisted coverage enhancement without CSI, they assume certain position-channel models through which the channels can be recovered from the geographic locations. In contrast, our approach solely relies on the received signal power data, not assuming any position-channel model. We examine the achievability and converse of the proposed blind beamforming method. If the IRS has $N$ reflective elements and there are $U$ receiver positions, then our method guarantees the minimum SNR of $Ω(N^2/U)$ -- which is fairly close to the upper bound $O(N+N^2\sqrt{\ln (NU)}/\sqrt[4]{U})$. Aside from the simulation results, we justify the practical use of blind beamforming in a field test at 2.6 GHz. According to the real-world experiment, the proposed blind beamforming method boosts the minimum SNR across seven random positions in a conference room by 18.22 dB, while the position-based method yields a boost of 12.08 dB. △ Less

Submitted 17 July, 2024; originally announced July 2024.

Comments: 17 pages

arXiv:2406.10910 [pdf, other]

doi 10.1109/TWC.2025.3556301

Fast Fractional Programming for Multi-Cell Integrated Sensing and Communications

Authors: Yannan Chen, Yi Feng, Xiaoyang Li, Licheng Zhao, Kaiming Shen

Abstract: This paper concerns the coordinate multi-cell beamforming design for integrated sensing and communications (ISAC). In particular, we assume that each base station (BS) has massive antennas. The optimization objective is to maximize a weighted sum of the data rates (for communications) and the Fisher information (for sensing). We first show that the conventional beamforming method for the multiple-… ▽ More This paper concerns the coordinate multi-cell beamforming design for integrated sensing and communications (ISAC). In particular, we assume that each base station (BS) has massive antennas. The optimization objective is to maximize a weighted sum of the data rates (for communications) and the Fisher information (for sensing). We first show that the conventional beamforming method for the multiple-input multiple-output (MIMO) transmission, i.e., the weighted minimum mean square error (WMMSE) algorithm, works for the ISAC problem case from a fractional programming (FP) perspective. However, the WMMSE algorithm frequently requires computing the $N\times N$ matrix inverse, where $N$ is the number of transmit or receive antennas, so the algorithm becomes quite costly when antennas are massively deployed. To address this issue, we develop a nonhomogeneous bound and use it in conjunction with the FP technique to solve the ISAC beamforming problem without the need to invert any large matrices. It is further shown that the resulting new FP algorithm has an intimate connection with gradient projection, based on which we can accelerate the convergence via Nesterov's gradient extrapolation. △ Less

Submitted 27 March, 2025; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: 17 pages

Journal ref: IEEE Transactions on Wireless Communications 2025

arXiv:2404.03204 [pdf, other]

RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis

Authors: Detai Xin, Xu Tan, Kai Shen, Zeqian Ju, Dongchao Yang, Yuancheng Wang, Shinnosuke Takamichi, Hiroshi Saruwatari, Shujie Liu, Jinyu Li, Sheng Zhao

Abstract: We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (weird pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. Th… ▽ More We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (weird pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. The core idea behind RALL-E is chain-of-thought (CoT) prompting, which decomposes the task into simpler steps to enhance the robustness of LLM-based TTS. To accomplish this idea, RALL-E first predicts prosody features (pitch and duration) of the input text and uses them as intermediate conditions to predict speech tokens in a CoT style. Second, RALL-E utilizes the predicted duration prompt to guide the computing of self-attention weights in Transformer to enforce the model to focus on the corresponding phonemes and prosody features when predicting speech tokens. Results of comprehensive objective and subjective evaluations demonstrate that, compared to a powerful baseline method VALL-E, RALL-E significantly improves the WER of zero-shot TTS from $5.6\%$ (without reranking) and $1.7\%$ (with reranking) to $2.5\%$ and $1.0\%$, respectively. Furthermore, we demonstrate that RALL-E correctly synthesizes sentences that are hard for VALL-E and reduces the error rate from $68\%$ to $4\%$. △ Less

Submitted 19 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.03100 [pdf, other]

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Authors: Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

Abstract: While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing di… ▽ More While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Considering speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by it, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion models to generate natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate attributes in each subspace following its corresponding prompt. With this factorization design, NaturalSpeech 3 can effectively and efficiently model intricate speech with disentangled subspaces in a divide-and-conquer way. Experiments show that NaturalSpeech 3 outperforms the state-of-the-art TTS systems on quality, similarity, prosody, and intelligibility, and achieves on-par quality with human recordings. Furthermore, we achieve better performance by scaling to 1B parameters and 200K hours of training data. △ Less

Submitted 23 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: Achieving human-level quality and naturalness on multi-speaker datasets (e.g., LibriSpeech) in a zero-shot way

arXiv:2312.16918 [pdf, other]

Intelligent Surfaces Empowered Wireless Network: Recent Advances and The Road to 6G

Authors: Qingqing Wu, Beixiong Zheng, Changsheng You, Lipeng Zhu, Kaiming Shen, Xiaodan Shao, Weidong Mei, Boya Di, Hongliang Zhang, Ertugrul Basar, Lingyang Song, Marco Di Renzo, Zhi-Quan Luo, Rui Zhang

Abstract: Intelligent surfaces (ISs) have emerged as a key technology to empower a wide range of appealing applications for wireless networks, due to their low cost, high energy efficiency, flexibility of deployment and capability of constructing favorable wireless channels/radio environments. Moreover, the recent advent of several new IS architectures further expanded their electromagnetic functionalities… ▽ More Intelligent surfaces (ISs) have emerged as a key technology to empower a wide range of appealing applications for wireless networks, due to their low cost, high energy efficiency, flexibility of deployment and capability of constructing favorable wireless channels/radio environments. Moreover, the recent advent of several new IS architectures further expanded their electromagnetic functionalities from passive reflection to active amplification, simultaneous reflection and refraction, as well as holographic beamforming. However, the research on ISs is still in rapid progress and there have been recent technological advances in ISs and their emerging applications that are worthy of a timely review. Thus, we provide in this paper a comprehensive survey on the recent development and advances of ISs aided wireless networks. Specifically, we start with an overview on the anticipated use cases of ISs in future wireless networks such as 6G, followed by a summary of the recent standardization activities related to ISs. Then, the main design issues of the commonly adopted reflection-based IS and their state-of-the-art solutions are presented in detail, including reflection optimization, deployment, signal modulation, wireless sensing, and integrated sensing and communications. Finally, recent progress and new challenges in advanced IS architectures are discussed to inspire futrue research. △ Less

Submitted 24 March, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

arXiv:2311.04546 [pdf, ps, other]

Discerning and Enhancing the Weighted Sum-Rate Maximization Algorithms in Communications

Authors: Zepeng Zhang, Ziping Zhao, Kaiming Shen, Daniel P. Palomar, Wei Yu

Abstract: Weighted sum-rate (WSR) maximization plays a critical role in communication system design. This paper examines three optimization methods for WSR maximization, which ensure convergence to stationary points: two block coordinate ascent (BCA) algorithms, namely, weighted sum-minimum mean-square error (WMMSE) and WSR maximization via fractional programming (WSR-FP), along with a minorization-maximiza… ▽ More Weighted sum-rate (WSR) maximization plays a critical role in communication system design. This paper examines three optimization methods for WSR maximization, which ensure convergence to stationary points: two block coordinate ascent (BCA) algorithms, namely, weighted sum-minimum mean-square error (WMMSE) and WSR maximization via fractional programming (WSR-FP), along with a minorization-maximization (MM) algorithm, WSR maximization via MM (WSR-MM). Our contributions are threefold. Firstly, we delineate the exact relationships among WMMSE, WSR-FP, and WSR-MM, which, despite their extensive use in the literature, lack a comprehensive comparative study. By probing the theoretical underpinnings linking the BCA and MM algorithmic frameworks, we reveal the direct correlations between the equivalent transformation techniques, essential to the development of WMMSE and WSR-FP, and the surrogate functions pivotal to WSR-MM. Secondly, we propose a novel algorithm, WSR-MM+, harnessing the flexibility of selecting surrogate functions in MM framework. By circumventing the repeated matrix inversions in the search for optimal Lagrange multipliers in existing algorithms, WSR-MM+ significantly reduces the computational load per iteration and accelerates convergence. Thirdly, we reconceptualize WSR-MM+ within the BCA framework, introducing a new equivalent transform, which gives rise to an enhanced version of WSR-FP, named as WSR-FP+. We further demonstrate that WSR-MM+ can be construed as the basic gradient projection method. This perspective yields a deeper understanding into its computational intricacies. Numerical simulations corroborate the connections between WMMSE, WSR-FP, and WSR-MM and confirm the efficacy of the proposed WSR-MM+ and WSR-FP+ algorithms. △ Less

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2310.08705 [pdf, other]

A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches

Authors: Kangqing Shen, Gemine Vivone, Xiaoyuan Yang, Simone Lolli, Michael Schmitt

Abstract: Synthetic aperture radar (SAR) images are widely used in remote sensing. Interpreting SAR images can be challenging due to their intrinsic speckle noise and grayscale nature. To address this issue, SAR colorization has emerged as a research direction to colorize gray scale SAR images while preserving the original spatial information and radiometric information. However, this research field is stil… ▽ More Synthetic aperture radar (SAR) images are widely used in remote sensing. Interpreting SAR images can be challenging due to their intrinsic speckle noise and grayscale nature. To address this issue, SAR colorization has emerged as a research direction to colorize gray scale SAR images while preserving the original spatial information and radiometric information. However, this research field is still in its early stages, and many limitations can be highlighted. In this paper, we propose a full research line for supervised learning-based approaches to SAR colorization. Our approach includes a protocol for generating synthetic color SAR images, several baselines, and an effective method based on the conditional generative adversarial network (cGAN) for SAR colorization. We also propose numerical assessment metrics for the problem at hand. To our knowledge, this is the first attempt to propose a research line for SAR colorization that includes a protocol, a benchmark, and a complete performance evaluation. Our extensive tests demonstrate the effectiveness of our proposed cGAN-based network for SAR colorization. The code will be made publicly available. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: 16 pages, 16 figures, 6 tables

arXiv:2309.02285 [pdf, other]

PromptTTS 2: Describing and Generating Voices with Text Prompt

Authors: Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian

Abstract: Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text… ▽ More Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, where vendors and large cost of data labeling are required to write text prompts for speech. In this work, we introduce PromptTTS 2 to address these challenges with a variation network to provide variability information of voice not captured by text prompts, and a prompt generation pipeline to utilize the large language models (LLM) to compose high quality text prompts. Specifically, the variation network predicts the representation extracted from the reference speech (which contains full information about voice variability) based on the text prompt representation. For the prompt generation pipeline, it generates text prompts for speech with a speech language understanding model to recognize voice attributes (e.g., gender, speed) from speech and a large language model to formulate text prompts based on the recognition results. Experiments on a large-scale (44K hours) speech dataset demonstrate that compared to the previous works, PromptTTS 2 generates voices more consistent with text prompts and supports the sampling of diverse voice variability, thereby offering users more choices on voice generation. Additionally, the prompt generation pipeline produces high-quality text prompts, eliminating the large labeling cost. The demo page of PromptTTS 2 is available online. △ Less

Submitted 11 October, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Demo page: https://speechresearch.github.io/prompttts2

arXiv:2309.01480 [pdf, ps, other]

EventTrojan: Manipulating Non-Intrusive Speech Quality Assessment via Imperceptible Events

Authors: Ying Ren, Kailai Shen, Zhe Ye, Diqun Yan

Abstract: Non-Intrusive speech quality assessment (NISQA) has gained significant attention for predicting speech's mean opinion score (MOS) without requiring the reference speech. Researchers have gradually started to apply NISQA to various practical scenarios. However, little attention has been paid to the security of NISQA models. Backdoor attacks represent the most serious threat to deep neural networks… ▽ More Non-Intrusive speech quality assessment (NISQA) has gained significant attention for predicting speech's mean opinion score (MOS) without requiring the reference speech. Researchers have gradually started to apply NISQA to various practical scenarios. However, little attention has been paid to the security of NISQA models. Backdoor attacks represent the most serious threat to deep neural networks (DNNs) due to the fact that backdoors possess a very high attack success rate once embedded. However, existing backdoor attacks assume that the attacker actively feeds samples containing triggers into the model during the inference phase. This is not adapted to the specific scenario of NISQA. And current backdoor attacks on regression tasks lack an objective metric to measure the attack performance. To address these issues, we propose a novel backdoor triggering approach (EventTrojan) that utilizes an event during the usage of the NISQA model as a trigger. Moreover, we innovatively provide an objective metric for backdoor attacks on regression tasks. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the EventTrojan attack. Besides, it also has good resistance to several defense methods. △ Less

Submitted 11 September, 2024; v1 submitted 4 September, 2023; originally announced September 2023.

Comments: Accepted by ICME2024

arXiv:2308.04179 [pdf, other]

Breaking Speaker Recognition with PaddingBack

Authors: Zhe Ye, Diqun Yan, Li Dong, Kailai Shen

Abstract: Machine Learning as a Service (MLaaS) has gained popularity due to advancements in Deep Neural Networks (DNNs). However, untrusted third-party platforms have raised concerns about AI security, particularly in backdoor attacks. Recent research has shown that speech backdoors can utilize transformations as triggers, similar to image backdoors. However, human ears can easily be aware of these transfo… ▽ More Machine Learning as a Service (MLaaS) has gained popularity due to advancements in Deep Neural Networks (DNNs). However, untrusted third-party platforms have raised concerns about AI security, particularly in backdoor attacks. Recent research has shown that speech backdoors can utilize transformations as triggers, similar to image backdoors. However, human ears can easily be aware of these transformations, leading to suspicion. In this paper, we propose PaddingBack, an inaudible backdoor attack that utilizes malicious operations to generate poisoned samples, rendering them indistinguishable from clean ones. Instead of using external perturbations as triggers, we exploit the widely-used speech signal operation, padding, to break speaker recognition systems. Experimental results demonstrate the effectiveness of our method, achieving a significant attack success rate while retaining benign accuracy. Furthermore, PaddingBack demonstrates the ability to resist defense methods and maintain its stealthiness against human perception. △ Less

Submitted 11 March, 2024; v1 submitted 8 August, 2023; originally announced August 2023.

arXiv:2305.18998 [pdf, other]

Adaptive Blind Beamforming for Intelligent Surface

Authors: Wenhai Lai, Wenyu Wang, Fan Xu, Xin Li, Shaobo Niu, Kaiming Shen

Abstract: Configuring intelligent surface (IS) or passive antenna array without any channel knowledge, namely blind beamforming, is a frontier research topic in the wireless communication field. Existing methods in the previous literature for blind beamforming include the RFocus and the CSM, the effectiveness of which has been demonstrated on hardware prototypes. However, this paper points out a subtle issu… ▽ More Configuring intelligent surface (IS) or passive antenna array without any channel knowledge, namely blind beamforming, is a frontier research topic in the wireless communication field. Existing methods in the previous literature for blind beamforming include the RFocus and the CSM, the effectiveness of which has been demonstrated on hardware prototypes. However, this paper points out a subtle issue with these blind beamforming algorithms: the RFocus and the CSM may fail to work in the non-line-of-sight (NLoS) channel case. To address this issue, we suggest a grouping strategy that enables adaptive blind beamforming. Specifically, the reflective elements (REs) of the IS are divided into three groups; each group is configured randomly to obtain a dataset of random samples. We then extract the statistical feature of the wireless environment from the random samples, thereby coordinating phase shifts of the IS without channel acquisition. The RE grouping plays a critical role in guaranteeing performance gain in the NLoS case. In particular, if we place all the REs in the same group, the proposed algorithm would reduce to the RFocus and the CSM. We validate the advantage of the proposed blind beamforming algorithm in the real-world networks at 3.5 GHz aside from simulations. △ Less

Submitted 24 September, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: 14 pages, 14 figures

arXiv:2304.09116 [pdf, other]

NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers

Authors: Kai Shen, Zeqian Ju, Xu Tan, Yanqing Liu, Yichong Leng, Lei He, Tao Qin, Sheng Zhao, Jiang Bian

Abstract: Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize speech into discrete tokens and use language models to generate these tokens one by one, which suffer from unstable prosody, word skipping/repeating is… ▽ More Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important to capture the diversity in human speech such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize speech into discrete tokens and use language models to generate these tokens one by one, which suffer from unstable prosody, word skipping/repeating issue, and poor voice quality. In this paper, we develop NaturalSpeech 2, a TTS system that leverages a neural audio codec with residual vector quantizers to get the quantized latent vectors and uses a diffusion model to generate these latent vectors conditioned on text input. To enhance the zero-shot capability that is important to achieve diverse speech synthesis, we design a speech prompting mechanism to facilitate in-context learning in the diffusion model and the duration/pitch predictor. We scale NaturalSpeech 2 to large-scale datasets with 44K hours of speech and singing data and evaluate its voice quality on unseen speakers. NaturalSpeech 2 outperforms previous TTS systems by a large margin in terms of prosody/timbre similarity, robustness, and voice quality in a zero-shot setting, and performs novel zero-shot singing synthesis with only a speech prompt. Audio samples are available at https://speechresearch.github.io/naturalspeech2. △ Less

Submitted 30 May, 2023; v1 submitted 18 April, 2023; originally announced April 2023.

Comments: A large-scale text-to-speech and singing voice synthesis system with latent diffusion models. Update: NaturalSpeech 2 extension to voice conversion and speech enhancement

arXiv:2302.09717 [pdf, ps, other]

doi 10.1109/TSP.2023.3334818

Coordinating Multiple Intelligent Reflecting Surfaces without Channel Information

Authors: Fan Xu, Jiawei Yao, Wenhai Lai, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

Abstract: Conventional beamforming methods for intelligent reflecting surfaces (IRSs) or reconfigurable intelligent surfaces (RISs) typically entail the full channel state information (CSI). However, the computational cost of channel acquisition soars exponentially with the number of IRSs. To bypass this difficulty, we propose a novel strategy called blind beamforming that coordinates multiple IRSs by means… ▽ More Conventional beamforming methods for intelligent reflecting surfaces (IRSs) or reconfigurable intelligent surfaces (RISs) typically entail the full channel state information (CSI). However, the computational cost of channel acquisition soars exponentially with the number of IRSs. To bypass this difficulty, we propose a novel strategy called blind beamforming that coordinates multiple IRSs by means of statistics without knowing CSI. Blind beamforming only requires measuring the received signal power at the user terminal for a sequence of randomly generated phase shifts across all IRSs. The main idea is to extract the key statistical quantity for beamforming by exploring only a small portion of the whole solution space of phase shifts. We show that blind beamforming guarantees a signal-to-noise ratio (SNR) boost of Theta(N^{2L}) under certain conditions, where L is the number of IRSs and N is the number of reflecting elements per IRS. The proposed conditions for achieving the optimal SNR boost of Theta(N^{4}) in a double-IRS system are much easier to satisfy than the existing ones in the literature. Most importantly, the proposed conditions can be extended to a fully general L-IRS system. The above result significantly improves upon the state of the art in the area of multi-IRS-assisted communication. Moreover, blind beamforming is justified via field tests and simulations. In particular, as shown in our field tests at 2.6 GHz, our method yields up to 17 dB SNR boost; to the best of our knowledge, this is the first time that the use of multiple IRSs gets verified in the real world. △ Less

Submitted 8 January, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: 16 pages

Journal ref: IEEE Transactions on Signal Processing 2024

arXiv:2302.06727 [pdf, other]

doi 10.1038/s41598-024-54251-1

Deep Learning Predicts Prevalent and Incident Parkinson's Disease From UK Biobank Fundus Imaging

Authors: Charlie Tran, Kai Shen, Kang Liu, Akshay Ashok, Adolfo Ramirez-Zamora, Jinghua Chen, Yulin Li, Ruogu Fang

Abstract: Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnostic methods are expensive and have limited availability. Considering the insidious and preclinical onset and progression of the disease, a desirable scr… ▽ More Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnostic methods are expensive and have limited availability. Considering the insidious and preclinical onset and progression of the disease, a desirable screening should be diagnostically accurate even before the onset of symptoms to allow medical interventions. We highlight retinal fundus imaging, often termed a window to the brain, as a diagnostic screening modality for Parkinson's disease. We conducted a systematic evaluation of conventional machine learning and deep learning techniques to classify Parkinson's disease from UK Biobank fundus imaging. Our results show that Parkinson's disease individuals can be differentiated from age and gender-matched healthy subjects with an Area Under the Curve (AUC) of 0.77. This accuracy is maintained when predicting either prevalent or incident Parkinson's disease. Explainability and trustworthiness are enhanced by visual attribution maps of localized biomarkers and quantified metrics of model robustness to data perturbations. △ Less

Submitted 18 February, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

Comments: 17 pages, 4 figures, 2 tables, 4 supplementary tables

arXiv:2211.04704 [pdf, other]

doi 10.1109/LWC.2022.3232146

A Linear Time Algorithm for the Optimal Discrete IRS Beamforming

Authors: Shuyi Ren, Kaiming Shen, Xin Li, Xin Chen, Zhi-Quan Luo

Abstract: It remains an open problem to find the optimal configuration of phase shifts under the discrete constraint for intelligent reflecting surface (IRS) in polynomial time. The above problem is widely believed to be difficult because it is not linked to any known combinatorial problems that can be solved efficiently. The branch-and-bound algorithms and the approximation algorithms constitute the best r… ▽ More It remains an open problem to find the optimal configuration of phase shifts under the discrete constraint for intelligent reflecting surface (IRS) in polynomial time. The above problem is widely believed to be difficult because it is not linked to any known combinatorial problems that can be solved efficiently. The branch-and-bound algorithms and the approximation algorithms constitute the best results in this area. Nevertheless, this work shows that the global optimum can actually be reached in linear time on average in terms of the number of reflective elements (REs) of IRS. The main idea is to geometrically interpret the discrete beamforming problem as choosing the optimal point on the unit circle. Although the number of possible combinations of phase shifts grows exponentially with the number of REs, it turns out that there are only a linear number of circular arcs that possibly contain the optimal point. Furthermore, the proposed algorithm can be viewed as a novel approach to a special case of the discrete quadratic program (QP). △ Less

Submitted 7 September, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

Comments: 5 pages

arXiv:2205.09306 [pdf, other]

Joint Device Selection and Power Control for Wireless Federated Learning

Authors: Wei Guo, Ran Li, Chuan Huang, Xiaoqi Qin, Kaiming Shen, Wei Zhang

Abstract: This paper studies the joint device selection and power control scheme for wireless federated learning (FL), considering both the downlink and uplink communications between the parameter server (PS) and the terminal devices. In each round of model training, the PS first broadcasts the global model to the terminal devices in an analog fashion, and then the terminal devices perform local training an… ▽ More This paper studies the joint device selection and power control scheme for wireless federated learning (FL), considering both the downlink and uplink communications between the parameter server (PS) and the terminal devices. In each round of model training, the PS first broadcasts the global model to the terminal devices in an analog fashion, and then the terminal devices perform local training and upload the updated model parameters to the PS via over-the-air computation (AirComp). First, we propose an AirComp-based adaptive reweighing scheme for the aggregation of local updated models, where the model aggregation weights are directly determined by the uplink transmit power values of the selected devices and which enables the joint learning and communication optimization simply by the device selection and power control. Furthermore, we provide a convergence analysis for the proposed wireless FL algorithm and the upper bound on the expected optimality gap between the expected and optimal global loss values is derived. With instantaneous channel state information (CSI), we formulate the optimality gap minimization problems under both the individual and sum uplink transmit power constraints, respectively, which are shown to be solved by the semidefinite programming (SDR) technique. Numerical results reveal that our proposed wireless FL algorithm achieves close to the best performance by using the ideal FedAvg scheme with error-free model exchange and full device participation. △ Less

Submitted 18 May, 2022; originally announced May 2022.

arXiv:2112.02260 [pdf, ps, other]

doi 10.1109/JSTSP.2022.3176479

Configuring Intelligent Reflecting Surface with Performance Guarantees: Optimal Beamforming

Authors: Yaowen Zhang, Kaiming Shen, Shuyi Ren, Xin Li, Xin Chen, Zhi-Quan Luo

Abstract: This work proposes linear time strategies to optimally configure the phase shifts for the reflective elements of an intelligent reflecting surface (IRS). Specifically, we show that the binary phase beamforming can be optimally solved in linear time to maximize the received signal-to-noise ratio (SNR). For the general K-ary phase beamforming, we develop a linear time approximation algorithm that gu… ▽ More This work proposes linear time strategies to optimally configure the phase shifts for the reflective elements of an intelligent reflecting surface (IRS). Specifically, we show that the binary phase beamforming can be optimally solved in linear time to maximize the received signal-to-noise ratio (SNR). For the general K-ary phase beamforming, we develop a linear time approximation algorithm that guarantees performance within a constant fraction (1+\cos(π/K))/2 of the global optimum, e.g., it can attain over 85% of the optimal performance for the quadrature beamforming with K=4. According to the numerical results, the proposed approximation algorithm for discrete IRS beamforming outperforms the existing algorithms significantly in boosting the received SNR. △ Less

Submitted 4 December, 2021; originally announced December 2021.

Comments: 9 pages, 10 figures

arXiv:2104.06189 [pdf]

Numerical Energy Analysis of In-wheel Motor Driven Autonomous Electric Vehicles

Authors: Kang Shen, Fan Yang, Xinyou Ke, Cheng Zhang, Chris Yuan

Abstract: Autonomous electric vehicles are being widely studied nowadays as the future technology of ground transportation, while the autonomous electric vehicles based on conventional powertrain system limit their energy and power transmission efficiencies and may hinder their broad applications in future. Here we report a study on the energy consumption and efficiency improvement of a mid-size autonomous… ▽ More Autonomous electric vehicles are being widely studied nowadays as the future technology of ground transportation, while the autonomous electric vehicles based on conventional powertrain system limit their energy and power transmission efficiencies and may hinder their broad applications in future. Here we report a study on the energy consumption and efficiency improvement of a mid-size autonomous electric vehicle driven by in-wheel motors, through the development of a numerical energy model, validated with the actual driving data and implemented in a case study. The energy analysis was conducted under three driving conditions: flat road, upslope, and downslope driving to examine the energy consumption, with the energy-saving potential of the in-wheel-motor driven powertrain system systematically explored and discussed. Considering the energy recovery from the regenerative braking, energy consumption and regenerated energy were calculated in specific driving cycles based on vehicle dynamics and autonomous driving patterns. A case study was conducted using the baseline electric vehicle driving data in West Los Angeles. It was found that an in-wheel motor driven autonomous electric vehicle can save up to 17.5% of energy compared with a conventional electric vehicle during the slope driving. Using the efficiency maps of a commercial in-wheel motor, the numerical energy model and validated results obtained from this study are in line with actual situations, and can be used to support sustainable development of more energy-efficient autonomous electric vehicles in the future. △ Less

Submitted 10 April, 2021; originally announced April 2021.

arXiv:2010.05382 [pdf]

doi 10.1038/s41377-020-00403-7

Miniscope3D: optimized single-shot miniature 3D fluorescence microscopy

Authors: Kyrollos Yanny, Nick Antipa, William Liberti, Sam Dehaeck, Kristina Monakhova, Fanglin Linda Liu, Konlin Shen, Ren Ng, Laura Waller

Abstract: Miniature fluorescence microscopes are a standard tool in systems biology. However, widefield miniature microscopes capture only 2D information, and modifications that enable 3D capabilities increase the size and weight and have poor resolution outside a narrow depth range. Here, we achieve the 3D capability by replacing the tube lens of a conventional 2D Miniscope with an optimized multifocal pha… ▽ More Miniature fluorescence microscopes are a standard tool in systems biology. However, widefield miniature microscopes capture only 2D information, and modifications that enable 3D capabilities increase the size and weight and have poor resolution outside a narrow depth range. Here, we achieve the 3D capability by replacing the tube lens of a conventional 2D Miniscope with an optimized multifocal phase mask at the objective's aperture stop. Placing the phase mask at the aperture stop significantly reduces the size of the device, and varying the focal lengths enables a uniform resolution across a wide depth range. The phase mask encodes the 3D fluorescence intensity into a single 2D measurement, and the 3D volume is recovered by solving a sparsity-constrained inverse problem. We provide methods for designing and fabricating the phase mask and an efficient forward model that accounts for the field-varying aberrations in miniature objectives. We demonstrate a prototype that is 17 mm tall and weighs 2.5 grams, achieving 2.76 $μ$m lateral, and 15 $μ$m axial resolution across most of the 900x700x390 $μm^3$ volume at 40 volumes per second. The performance is validated experimentally on resolution targets, dynamic biological samples, and mouse brain tissue. Compared with existing miniature single-shot volume-capture implementations, our system is smaller and lighter and achieves a more than 2x better lateral and axial resolution throughout a 10x larger usable depth range. Our microscope design provides single-shot 3D imaging for applications where a compact platform matters, such as volumetric neural imaging in freely moving animals and 3D motion studies of dynamic samples in incubators and lab-on-a-chip devices. △ Less

Submitted 11 October, 2020; originally announced October 2020.

Comments: Published with Nature Springer in Light: Science and Applications

Journal ref: Light: Science & Applications 9.1 (2020): 1-13

arXiv:2006.13668 [pdf, ps, other]

Stochastic Transceiver Optimization in Multi-Tags Symbiotic Radio Systems

Authors: Xihan Chen, Hei Victor Cheng, Kaiming Shen, An Liu, Min-Jian Zhao

Abstract: Symbiotic radio (SR) is emerging as a spectrum- and energy-efficient communication paradigm for future passive Internet-of-things (IoT), where some single-antenna backscatter devices, referred to as Tags, are parasitic in an active primary transmission. The primary transceiver is designed to assist both direct-link (DL) and backscatter-link (BL) communication. In multi-tags SR systems, the transce… ▽ More Symbiotic radio (SR) is emerging as a spectrum- and energy-efficient communication paradigm for future passive Internet-of-things (IoT), where some single-antenna backscatter devices, referred to as Tags, are parasitic in an active primary transmission. The primary transceiver is designed to assist both direct-link (DL) and backscatter-link (BL) communication. In multi-tags SR systems, the transceiver designs become much more complicated due to the presence of DL and inter-Tag interference, which further poses new challenges to the availability and reliability of DL and BL transmission. To overcome these challenges, we formulate the stochastic optimization of transceiver design as the general network utility maximization problem (GUMP). The resultant problem is a stochastic multiple-ratio fractional non-convex problem, and consequently challenging to solve. By leveraging some fractional programming techniques, we tailor a surrogate function with the specific structure and subsequently develop a batch stochastic parallel decomposition (BSPD) algorithm, which is shown to converge to stationary solutions of the GNUMP. Simulation results verify the effectiveness of the proposed algorithm by numerical examples in terms of the achieved system throughput. △ Less

Submitted 24 June, 2020; originally announced June 2020.

Comments: Accepted by IEEE Internet Things J

arXiv:1912.11678 [pdf, other]

Joint Annotator-and-Spectrum Allocation in Wireless Networks for Crowd Labelling

Authors: Xiaoyang Li, Guangxu Zhu, Kaiming Shen, Wei Yu, Yi Gong, Kaibin Huang

Abstract: The massive sensing data generated by Internet-of-Things will provide fuel for ubiquitous artificial intelligence (AI), automating the operations of our society ranging from transportation to healthcare. The realistic adoption of this technique however entails labelling of the enormous data prior to the training of AI models via supervised learning. To tackle this challenge, we explore a new persp… ▽ More The massive sensing data generated by Internet-of-Things will provide fuel for ubiquitous artificial intelligence (AI), automating the operations of our society ranging from transportation to healthcare. The realistic adoption of this technique however entails labelling of the enormous data prior to the training of AI models via supervised learning. To tackle this challenge, we explore a new perspective of wireless crowd labelling that is capable of downloading data to many imperfect mobile annotators for repetition labelling by exploiting multicasting in wireless networks. In this cross-disciplinary area, the integration of the rate-distortion theory and the principle of repetition labelling for accuracy improvement gives rise to a new tradeoff between radio-and-annotator resources under a constraint on labelling accuracy. Building on the tradeoff and aiming at maximizing the labelling throughput, this work focuses on the joint optimization of encoding rate, annotator clustering, and sub-channel allocation, which results in an NP-hard integer programming problem. To devise an efficient solution approach, we establish an optimal sequential annotator-clustering scheme based on the order of decreasing signal-to-noise ratios. Thereby, the optimal solution can be found by an efficient tree search. Next, the solution is simplified by applying truncated channel inversion. Alternatively, the optimization problem can be recognized as a knapsack problem, which can be efficiently solved in pseudo-polynomial time by means of dynamic programming. In addition, exact polices are derived for the annotators constrained and spectrum constrained cases. Last, simulation results demonstrate the significant throughput gains based on the optimal solution compared with decoupled allocation of the two types of resources. △ Less

Submitted 25 December, 2019; originally announced December 2019.

arXiv:1910.01150 [pdf]

Fault Detection Using Nonlinear Low-Dimensional Representation of Sensor Data

Authors: Kai Shen, Anya Mcguirk, Yuwei Liao, Arin Chaudhuri, Deovrat Kakde

Abstract: Sensor data analysis plays a key role in health assessment of critical equipment. Such data are multivariate and exhibit nonlinear relationships. This paper describes how one can exploit nonlinear dimension reduction techniques, such as the t-distributed stochastic neighbor embedding (t-SNE) and kernel principal component analysis (KPCA) for fault detection. We show that using anomaly detection wi… ▽ More Sensor data analysis plays a key role in health assessment of critical equipment. Such data are multivariate and exhibit nonlinear relationships. This paper describes how one can exploit nonlinear dimension reduction techniques, such as the t-distributed stochastic neighbor embedding (t-SNE) and kernel principal component analysis (KPCA) for fault detection. We show that using anomaly detection with low dimensional representations provides better interpretability and is conducive to edge processing in IoT applications. △ Less

Submitted 2 October, 2019; originally announced October 2019.

arXiv:1908.07408 [pdf, ps, other]

Mixed-Timescale Beamforming and Power Splitting for Massive MIMO Aided SWIPT IoT Network

Authors: Xihan Chen, Hei Victor Cheng, An Liu, Kaiming Shen, Min-Jian Zhao

Abstract: Traditional simultaneous wireless information and power transfer (SWIPT) with power splitting assumes perfect channel state information (CSI), which is difficult to obtain especially in the massive multiple-input-multiple-output (MIMO) regime. In this letter, we consider a mixed-timescale joint beamforming and power splitting (MJBP) scheme to maximize general utility functions under a power constr… ▽ More Traditional simultaneous wireless information and power transfer (SWIPT) with power splitting assumes perfect channel state information (CSI), which is difficult to obtain especially in the massive multiple-input-multiple-output (MIMO) regime. In this letter, we consider a mixed-timescale joint beamforming and power splitting (MJBP) scheme to maximize general utility functions under a power constraint in the downlink of a massive MIMO SWIPT IoT network. In this scheme, the transmit digital beamformer is adapted to the imperfect CSI, while the receive power splitters are adapted to the long-term channel statistics only due to the consideration of hardware limit and signaling overhead. The formulated optimization problem is solved using a mixed-timescale online stochastic successive convex approximation (MO-SSCA) algorithm. Simulation results reveal significant gain over the baselines. △ Less

Submitted 20 August, 2019; originally announced August 2019.

Comments: An extended version of a manuscript submitted to IEEE WCL

arXiv:1905.09386 [pdf, other]

A Sub-mm$^3$ Ultrasonic Free-floating Implant for Multi-mote Neural Recording

Authors: Mohammad Meraj Ghanbari, David K. Piech, Konlin Shen, Sina Faraji Alamouti, Cem Yalcin, Benjamin C. Johnson, Jose M. Carmena, Michel M. Maharbiz, Rikky Muller

Abstract: A 0.8 mm$^3$ wireless, ultrasonically powered, free-floating neural recording implant is presented. The device is comprised only of a 0.25 mm$^2$ recording IC and a single piezoceramic resonator that is used for both power harvesting and data transmission. Uplink data transmission is performed by analog amplitude modulation of the ultrasound echo. Using a 1.78 MHz main carrier, >35 kbps/mote equiv… ▽ More A 0.8 mm$^3$ wireless, ultrasonically powered, free-floating neural recording implant is presented. The device is comprised only of a 0.25 mm$^2$ recording IC and a single piezoceramic resonator that is used for both power harvesting and data transmission. Uplink data transmission is performed by analog amplitude modulation of the ultrasound echo. Using a 1.78 MHz main carrier, >35 kbps/mote equivalent uplink data rate is achieved. A technique to linearize the echo amplitude modulation is introduced, resulting in <1.2\% static nonlinearity of the received signal over a $\pm$10 mV input range. The IC dissipates 37.7 $μ$W, while the neural recording front-end consumes 4 $μ$W and achieves a noise floor of 5.3 $μ$V$_{rms}$ in a 5 kHz bandwidth. This work improves sub-mm recording mote depth by >2.5x, resulting in the highest measured depth/volume ratio by $\sim$3x. Orthogonal subcarrier modulation enables simultaneous operation of multiple implants, using a single-element ultrasound external transducer. Dual-mote simultaneous power up and data transmission is demonstrated at a rate of 7 kS/s at the depth of 50 mm. △ Less

Submitted 16 July, 2019; v1 submitted 18 May, 2019; originally announced May 2019.

Comments: 11 pages, 22 figures, Submitted to Journal of Solid-State Circuits

arXiv:1808.01486 [pdf, other]

doi 10.1109/JSAC.2019.2904352

Spatial Deep Learning for Wireless Scheduling

Authors: Wei Cui, Kaiming Shen, Wei Yu

Abstract: The optimal scheduling of interfering links in a dense wireless network with full frequency reuse is a challenging task. The traditional method involves first estimating all the interfering channel strengths then optimizing the scheduling based on the model. This model-based method is however resource intensive and computationally hard because channel estimation is expensive in dense networks; fur… ▽ More The optimal scheduling of interfering links in a dense wireless network with full frequency reuse is a challenging task. The traditional method involves first estimating all the interfering channel strengths then optimizing the scheduling based on the model. This model-based method is however resource intensive and computationally hard because channel estimation is expensive in dense networks; furthermore, finding even a locally optimal solution of the resulting optimization problem may be computationally complex. This paper shows that by using a deep learning approach, it is possible to bypass the channel estimation and to schedule links efficiently based solely on the geographic locations of the transmitters and the receivers, due to the fact that in many propagation environments, the wireless channel strength is largely a function of the distance dependent path-loss. This is accomplished by unsupervised training over randomly deployed networks, and by using a novel neural network architecture that computes the geographic spatial convolutions of the interfering or interfered neighboring nodes along with subsequent multiple feedback stages to learn the optimum solution. The resulting neural network gives near-optimal performance for sum-rate maximization and is capable of generalizing to larger deployment areas and to deployments of different link densities. Moreover, to provide fairness, this paper proposes a novel scheduling approach that utilizes the sum-rate optimal scheduling algorithm over judiciously chosen subsets of links for maximizing a proportional fairness objective over the network. The proposed approach shows highly competitive and generalizable network utility maximization results. △ Less

Submitted 4 February, 2021; v1 submitted 4 August, 2018; originally announced August 2018.

Comments: This paper is the full version of the paper presented at IEEE Global Communications Conference 2018. It includes 15 pages and 12 figures

Journal ref: IEEE J. Sel. Areas in Commun. 37 (2019) 1248-1261

Showing 1–33 of 33 results for author: Shen, K