-
FollowSpot: Enhancing Wireless Communications via Movable Ceiling-Mounted Metasurfaces
Authors:
Wenhai Lai,
Kaiming Shen,
Rui Zhang
Abstract:
This paper studies the optimal placement of ceiling-mounted metasurfaces (MTSs) that help focus the wireless signal beam onto the target receiver, inspired by the theatre spotlight. We assume that a total of $M$ MTSs are deployed and that there are $L$ possible positions for each MTS. The resulting signal-to-noise ratio (SNR) maximization problem is difficult to tackle directly because the placement decisions of the different MTSs are coupled. Mathematically, we face a nonlinear discrete optimization problem with $L^M$ possible solutions. A remarkable result of this paper is that this challenging problem can be solved efficiently in $O(ML^2\log(ML))$ time. The proposed algorithm rests on two key steps. First, we decouple the placement variables of the different MTSs by introducing a continuous auxiliary variable $\mu$; the discrete primal variables are easy to optimize when $\mu$ is held fixed, but the optimization problem in $\mu$ is nonconvex. Second, we show that the optimization of the continuous $\mu$ can be recast as a discrete optimization problem with only $LM$ possible solutions, so the optimal $\mu$ can be readily obtained. Numerical results show that the proposed algorithm not only guarantees a global optimum but also reaches it efficiently.
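The decoupling idea can be illustrated with a simplified sketch. We assume, for illustration only, that each MTS m placed at candidate position l adds a fixed complex amplitude c[m][l] at the receiver, on top of a direct-path amplitude h0, so that the SNR is proportional to |h0 + sum_m c[m][l_m]|^2. Because |s| = max over angles mu of Re(s e^{-j mu}), fixing mu lets every MTS choose its position independently, and the per-MTS choices can only change at finitely many breakpoint angles. Here mu is an angle of our own choosing; the paper's auxiliary variable may be defined differently, and this naive sweep costs O(M^2 L^3) rather than the paper's O(ML^2 log(ML)).

```python
import cmath
import itertools
import math


def best_placement(h0, c):
    """Return (choice, gain) maximizing |h0 + sum_m c[m][choice[m]]|**2.

    Hypothetical model: c[m][l] is the complex amplitude MTS m adds at the
    receiver when mounted at candidate position l; under that assumption the
    gain is proportional to the SNR.
    """
    # The best position of MTS m can only change at angles mu where
    # Re((a - b) e^{-j mu}) = 0 for two candidates a, b of that MTS.
    breaks = set()
    for row in c:
        for a, b in itertools.combinations(row, 2):
            phi = cmath.phase(a - b)
            breaks.add((phi + math.pi / 2) % (2 * math.pi))
            breaks.add((phi - math.pi / 2) % (2 * math.pi))
    breaks = sorted(breaks) or [0.0]

    # One test angle strictly inside every arc between consecutive
    # breakpoints, including the arc that wraps around 2*pi.
    tests = []
    for i, lo in enumerate(breaks):
        hi = breaks[i + 1] if i + 1 < len(breaks) else breaks[0] + 2 * math.pi
        tests.append((lo + hi) / 2)

    best_gain, best_choice = -1.0, None
    for mu in tests:
        rot = cmath.exp(-1j * mu)
        # For a fixed mu the problem decouples: each MTS independently picks
        # the position whose contribution projects furthest onto direction mu.
        choice = [max(range(len(row)), key=lambda l: (row[l] * rot).real)
                  for row in c]
        gain = abs(h0 + sum(row[l] for row, l in zip(c, choice))) ** 2
        if gain > best_gain:
            best_gain, best_choice = gain, choice
    return best_choice, best_gain
```

The optimal placement is always the per-MTS argmax vector of some arc, so evaluating one angle per arc and keeping the best true gain recovers the global optimum; with a single candidate per MTS there are no breakpoints and one test angle suffices.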
Submitted 5 July, 2025;
originally announced July 2025.
-
Blind Passive Beamforming for MIMO System
Authors:
Wenhai Lai,
Jiawei Yao,
Kaiming Shen
Abstract:
Passive beamforming for intelligent surface (IS)-aided multiple-input multiple-output (MIMO) communication is a difficult nonconvex problem, and it becomes even more challenging under practical discrete phase-shift constraints. Unlike most existing approaches, which rely on channel state information (CSI), this work advocates a blind beamforming strategy that requires no CSI. Simply put, we propose a statistical method that learns the main features of the wireless environment from random samples of the received signal power. Field tests in a commercial 5G network demonstrate the superiority of the proposed blind passive beamforming method.
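The abstract does not specify the statistical method. One blind approach from related work is the conditional sample mean (CSM): draw random discrete phase configurations, record the received power for each, and set every element to the phase whose samples have the highest average power. The sketch below assumes CSM and a hypothetical single-antenna channel (direct path h0 plus per-element reflected paths h) that is used only to generate synthetic power samples; the MIMO method in the paper may differ.

```python
import cmath
import math
import random


def conditional_sample_mean(samples, n_elements, n_phases):
    """For each element, pick the phase index whose samples have the highest
    average received power. samples: list of (config, power) pairs, where
    config[n] is the phase index used by element n."""
    choice = []
    for n in range(n_elements):
        sums = [0.0] * n_phases
        counts = [0] * n_phases
        for config, power in samples:
            sums[config[n]] += power
            counts[config[n]] += 1
        means = [s / c if c else -math.inf for s, c in zip(sums, counts)]
        choice.append(max(range(n_phases), key=means.__getitem__))
    return choice


def received_power(h0, h, config, phases):
    # Hypothetical channel model, used only to synthesize power samples;
    # the CSM step above never sees h0 or h.
    s = h0 + sum(hn * cmath.exp(1j * phases[k]) for hn, k in zip(h, config))
    return abs(s) ** 2


def demo(n_elements=16, n_phases=4, n_samples=3000, seed=0):
    """Return (power with CSM configuration, average power of random samples)."""
    rng = random.Random(seed)
    phases = [2 * math.pi * k / n_phases for k in range(n_phases)]
    h0 = 3.0 + 0j  # hypothetical direct path
    h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_elements)]
    samples = []
    for _ in range(n_samples):
        config = [rng.randrange(n_phases) for _ in range(n_elements)]
        samples.append((config, received_power(h0, h, config, phases)))
    choice = conditional_sample_mean(samples, n_elements, n_phases)
    mean_power = sum(p for _, p in samples) / n_samples
    return received_power(h0, h, choice, phases), mean_power
```

Only power measurements enter `conditional_sample_mean`, which is what makes the scheme blind: averaging over random configurations of the other elements isolates how each element's phase correlates with received power.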
Submitted 1 June, 2025;
originally announced June 2025.
-
Kimi-Audio Technical Report
Authors:
KimiTeam,
Ding Ding,
Zeqian Ju,
Yichong Leng,
Songxiang Liu,
Tong Liu,
Zeyu Shang,
Kai Shen,
Wei Song,
Xu Tan,
Heyi Tang,
Zhengtao Wang,
Chu Wei,
Yifei Xin,
Xinran Xu,
Jianwei Yu,
Yutao Zhang,
Xinyu Zhou,
Y. Charles,
Jun Chen,
Yanru Chen,
Yulun Du,
Weiran He,
Zhenxing Hu,
Guokun Lai
, et al. (15 additional authors not shown)
Abstract:
We present Kimi-Audio, an open-source audio foundation model that excels in audio understanding, generation, and conversation. We detail the practices in building Kimi-Audio, including model architecture, data curation, training recipe, inference deployment, and evaluation. Specifically, we leverage a 12.5 Hz audio tokenizer, design a novel LLM-based architecture with continuous features as input and discrete tokens as output, and develop a chunk-wise streaming detokenizer based on flow matching. We curate a pre-training dataset that consists of more than 13 million hours of audio data covering a wide range of modalities, including speech, sound, and music, and build a pipeline to construct high-quality and diverse post-training data. Initialized from a pre-trained LLM, Kimi-Audio is continually pre-trained on both audio and text data with several carefully designed tasks and then fine-tuned to support a diverse range of audio-related tasks. Extensive evaluation shows that Kimi-Audio achieves state-of-the-art performance on a range of audio benchmarks, including speech recognition, audio understanding, audio question answering, and speech conversation. We release the code, model checkpoints, and evaluation toolkits at https://github.com/MoonshotAI/Kimi-Audio.
Submitted 25 April, 2025;
originally announced April 2025.
-
Autonomous Exploration-Based Precise Mapping for Mobile Robots through Stepwise and Consistent Motions
Authors:
Muhua Zhang,
Lei Ma,
Ying Wu,
Kai Shen,
Yongkui Sun,
Henry Leung
Abstract:
This paper presents an autonomous exploration framework designed for indoor ground mobile robots that use laser-based Simultaneous Localization and Mapping (SLAM); it ensures process completeness and precise mapping results. For frontier search, a local-global sampling architecture based on multiple Rapidly-exploring Random Trees (RRTs) is employed. Traversability checks during RRT expansion and global RRT pruning upon map updates eliminate unreachable frontiers, reducing potential collisions and deadlocks. Adaptive adjustment of the sampling density, informed by the obstacle distribution, enhances the potential exploration coverage. For frontier point navigation, a stepwise consistent motion strategy is adopted, wherein the robot strictly drives straight along approximately equidistant line segments of the polyline path and rotates in place at segment junctions. This simplified, decoupled motion pattern improves scan-matching stability and mitigates map drift. For process control, the framework serializes frontier point selection and navigation, avoiding the oscillation caused by frequent goal changes in conventional parallelized processes. A waypoint retracing mechanism is introduced to generate repeated observations, triggering loop closure detection and backend optimization in graph-based SLAM, thereby improving map consistency and precision. Experiments in both simulation and real-world scenarios validate the effectiveness of the framework. Compared to baseline 2D exploration algorithms, it achieves improved mapping coverage and precision in more challenging environments. It also shows robustness in supporting resource-constrained robot platforms and maintains mapping consistency across various LiDAR field-of-view (FoV) configurations.
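The stepwise consistent motion strategy can be sketched as a conversion from a polyline path to motion primitives. Assumptions (ours, not from the paper): the path is a list of (x, y) waypoints, each polyline edge is split into equal pieces whose length is close to a hypothetical `step` parameter, and the output uses two hypothetical primitives, `rotate_to` (in-place rotation to an absolute heading in radians) and `drive` (straight-line distance). A rotation is emitted only where the heading actually changes, so the robot never turns while translating.

```python
import math


def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def stepwise_motions(path, step):
    """Turn a polyline into decoupled rotate-in-place / drive-straight commands."""
    motions = []
    heading = None
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0.0:
            continue  # skip duplicate waypoints
        seg_heading = math.atan2(y1 - y0, x1 - x0)
        # Rotate in place only at junctions where the direction changes.
        if heading is None or abs(wrap_angle(seg_heading - heading)) > 1e-9:
            motions.append(("rotate_to", seg_heading))
            heading = seg_heading
        # Split the edge into near-equidistant straight pieces of about `step`.
        pieces = max(1, round(length / step))
        motions.extend([("drive", length / pieces)] * pieces)
    return motions
```

For example, `stepwise_motions([(0, 0), (2, 0), (2, 1)], 1.0)` yields a rotation to heading 0, two 1.0 m drives, a rotation to pi/2, and one final 1.0 m drive.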
Submitted 21 March, 2025;
originally announced March 2025.
-
MoonCast: High-Quality Zero-Shot Podcast Generation
Authors:
Zeqian Ju,
Dongchao Yang,
Jianwei Yu,
Kai Shen,
Yichong Leng,
Zhengtao Wang,
Xu Tan,
Xinyu Zhou,
Tao Qin,
Xiangyang Li
Abstract:
Recent advances in text-to-speech synthesis have achieved notable success in generating high-quality short utterances for individual speakers. However, these systems still face challenges when extending their capabilities to long, multi-speaker, and spontaneous dialogues, typical of real-world scenarios such as podcasts. These limitations arise from two primary challenges: 1) long speech: podcasts typically span several minutes, exceeding the upper limit of most existing work; 2) spontaneity: podcasts are marked by their spontaneous, oral nature, which sharply contrasts with formal, written contexts; existing works often fall short in capturing this spontaneity. In this paper, we propose MoonCast, a solution for high-quality zero-shot podcast generation, aiming to synthesize natural podcast-style speech from text-only sources (e.g., stories, technical reports, news in TXT, PDF, or Web URL formats) using the voices of unseen speakers. To generate long audio, we adopt a long-context language model-based audio modeling approach utilizing large-scale long-context speech data. To enhance spontaneity, we utilize a podcast generation module to generate scripts with spontaneous details, which have been empirically shown to be as crucial as the text-to-speech modeling itself. Experiments demonstrate that MoonCast outperforms baselines, with particularly notable improvements in spontaneity and coherence.
Submitted 19 March, 2025; v1 submitted 18 March, 2025;
originally announced March 2025.
-
Uplink Coordinated Pilot Design for 1-bit Massive MIMO in Correlated Channel
Authors:
Hyeongtak Yun,
Juntaek Han,
Kaiming Shen,
Jeonghun Park
Abstract:
In this paper, we propose a coordinated pilot design method that minimizes the channel estimation mean squared error (MSE) in massive multiple-input multiple-output (MIMO) systems with 1-bit analog-to-digital converters (ADCs). Assuming that the well-known Bussgang linear minimum mean square error (BLMMSE) estimator is used for channel estimation, we first observe that the resulting MSE leads to an intractable optimization problem, as it involves the arcsin function and a complicated form with multiple matrix ratios. To resolve this, we derive an approximate MSE under the low signal-to-noise ratio (SNR) assumption, based on which we develop an efficient coordinated pilot design using a fractional programming technique. The proposed pilot design differs from existing work in that it applies to general system environments, including correlated channels and multi-cell environments. We demonstrate that the proposed method achieves higher channel estimation accuracy than conventional approaches.
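The arcsin term mentioned above stems from the classical arcsine law for hard limiters: for zero-mean, unit-variance, jointly Gaussian real signals with correlation rho, E[sign(y1) sign(y2)] = (2/pi) arcsin(rho). The sketch below uses real-valued signals for simplicity (the complex case applies the law to real and imaginary parts separately); it checks the law by Monte Carlo and shows that for small rho, as happens when noise dominates at low SNR, arcsin(rho) is close to rho, which linearizes the expression. This illustrates one plausible reason a low-SNR approximation helps; it is not taken from the paper.

```python
import math
import random


def one_bit_correlation(rho, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[sign(y1) * sign(y2)] for unit-variance,
    jointly Gaussian y1, y2 with correlation rho."""
    rng = random.Random(seed)
    acc = 0
    for _ in range(n_samples):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        y1 = a
        y2 = rho * a + math.sqrt(1.0 - rho * rho) * b  # correlation rho with y1
        acc += (1 if y1 >= 0 else -1) * (1 if y2 >= 0 else -1)
    return acc / n_samples


def arcsine_law(rho):
    """Exact correlation after 1-bit quantization."""
    return 2.0 / math.pi * math.asin(rho)


def low_snr_approx(rho):
    """First-order approximation asin(rho) ~ rho, accurate for small rho."""
    return 2.0 / math.pi * rho
```

At rho = 0.05 the exact and linearized values differ by about 1e-5, whereas at rho = 0.9 they differ by roughly 0.14, which is why such a linearization only makes sense in the low-SNR regime.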
Submitted 19 February, 2025;
originally announced February 2025.
-
Intelligent Surface Assisted Radar Stealth Against Unauthorized ISAC
Authors:
Fan Xu,
Wenhai Lai,
Kaiming Shen
Abstract:
The integration of radar sensors and communication networks, as envisioned for 6G wireless networks, poses significant security risks; e.g., user position information can be exposed to an unauthorized dual-functional base station (DFBS). To address this issue, we propose an intelligent surface (IS)-assisted radar stealth technology that prevents adversarial sensing. Specifically, we modify the wireless channels by tuning the phase shifts of the IS in order to protect the target user from unauthorized sensing without jeopardizing the wireless communication link. In principle, we wish to maximize the distortion between the angle-of-arrival (AoA) estimated by the DFBS and the ground truth, subject to a minimum signal-to-noise ratio (SNR) constraint for communication. To this end, we characterize the problem as a game played by the DFBS and the IS, in which the DFBS aims to maximize a particular utility while the IS aims to minimize it. Although the problem is nonconvex, this paper shows that it can be solved optimally in closed form from a geometric perspective. According to the simulations, the proposed closed-form algorithm significantly outperforms the baseline methods in combating unauthorized sensing while limiting the impact on wireless communications.
Submitted 26 January, 2025;
originally announced January 2025.
-
An Efficient Convex-Hull Relaxation Based Algorithm for Multi-User Discrete Passive Beamforming
Authors:
Wenhai Lai,
Zheyu Wu,
Yi Feng,
Kaiming Shen,
Ya-Feng Liu
Abstract:
The intelligent reflecting surface (IRS) is an emerging technology for enhancing spatial multiplexing in wireless networks. This letter considers the discrete passive beamforming design for the IRS in order to maximize the minimum signal-to-interference-plus-noise ratio (SINR) among multiple users in an IRS-assisted downlink network. The main design difficulty lies in the discrete phase-shift constraint. Unlike most existing works, this letter advocates a convex-hull relaxation of the discrete constraints, which leads to a continuous reformulated problem that is equivalent to the original discrete problem. This letter further proposes an efficient alternating projection/proximal gradient descent-ascent algorithm for solving the reformulated problem. Simulation results show that the proposed algorithm significantly outperforms state-of-the-art methods.
Submitted 28 August, 2024; v1 submitted 30 July, 2024;
originally announced July 2024.
-
Blind Beamforming for Coverage Enhancement with Intelligent Reflecting Surface
Authors:
Fan Xu,
Jiawei Yao,
Wenhai Lai,
Kaiming Shen,
Xin Li,
Xin Chen,
Zhi-Quan Luo
Abstract:
Conventional policies for configuring an intelligent reflecting surface (IRS) typically require channel state information (CSI), thus incurring substantial overhead and facing incompatibility with current network protocols. This paper proposes a blind beamforming strategy that requires no CSI, aiming to boost the minimum signal-to-noise ratio (SNR) among all the receiver positions, i.e., to enhance coverage. Although some existing works already consider IRS-assisted coverage enhancement without CSI, they assume certain position-channel models through which the channels can be recovered from the geographic locations. In contrast, our approach relies solely on received signal power data and does not assume any position-channel model. We examine the achievability and converse of the proposed blind beamforming method. If the IRS has $N$ reflective elements and there are $U$ receiver positions, then our method guarantees a minimum SNR of $\Omega(N^2/U)$, which is fairly close to the upper bound $O(N+N^2\sqrt{\ln (NU)}/\sqrt[4]{U})$. In addition to simulation results, we justify the practical use of blind beamforming in a field test at 2.6 GHz. In this real-world experiment, the proposed blind beamforming method boosts the minimum SNR across seven random positions in a conference room by 18.22 dB, while the position-based method yields a boost of 12.08 dB.
Submitted 17 July, 2024;
originally announced July 2024.
-
Fast Fractional Programming for Multi-Cell Integrated Sensing and Communications
Authors:
Yannan Chen,
Yi Feng,
Xiaoyang Li,
Licheng Zhao,
Kaiming Shen
Abstract:
This paper concerns coordinated multi-cell beamforming design for integrated sensing and communications (ISAC). In particular, we assume that each base station (BS) is equipped with a massive number of antennas. The optimization objective is to maximize a weighted sum of the data rates (for communications) and the Fisher information (for sensing). We first show, from a fractional programming (FP) perspective, that the conventional beamforming method for multiple-input multiple-output (MIMO) transmission, i.e., the weighted minimum mean square error (WMMSE) algorithm, also works for the ISAC problem. However, the WMMSE algorithm repeatedly requires computing the inverse of an $N\times N$ matrix, where $N$ is the number of transmit or receive antennas, so the algorithm becomes quite costly when antennas are massively deployed. To address this issue, we develop a nonhomogeneous bound and use it in conjunction with the FP technique to solve the ISAC beamforming problem without inverting any large matrices. It is further shown that the resulting new FP algorithm is intimately connected to gradient projection, which allows us to accelerate convergence via Nesterov's gradient extrapolation.
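For reference, a standard nonhomogeneous quadratic bound from the minorization-maximization literature is stated below; we assume, without confirmation from the abstract, that the paper's bound is of this type, and the direction in which the paper applies it is not stated there. For Hermitian $\mathbf{A}$, any $\lambda$ with $\lambda\mathbf{I}\succeq\mathbf{A}$ (e.g., $\lambda=\lambda_{\max}(\mathbf{A})$), and a fixed point $\mathbf{z}$, expanding the nonnegative quantity $(\mathbf{x}-\mathbf{z})^{\mathrm{H}}(\lambda\mathbf{I}-\mathbf{A})(\mathbf{x}-\mathbf{z})$ gives the first line, with equality at $\mathbf{x}=\mathbf{z}$. Completing the square in the right-hand side gives the second line, where $\Pi_{\mathcal{X}}$ denotes projection onto a convex feasible set $\mathcal{X}$:

```latex
\begin{aligned}
\mathbf{x}^{\mathrm{H}}\mathbf{A}\mathbf{x}
&\le \lambda\,\mathbf{x}^{\mathrm{H}}\mathbf{x}
 + 2\,\mathrm{Re}\{\mathbf{x}^{\mathrm{H}}(\mathbf{A}-\lambda\mathbf{I})\mathbf{z}\}
 + \mathbf{z}^{\mathrm{H}}(\lambda\mathbf{I}-\mathbf{A})\mathbf{z},\\
\arg\min_{\mathbf{x}\in\mathcal{X}}\ \text{(right-hand side)}
&= \Pi_{\mathcal{X}}\!\left(\mathbf{z}-\tfrac{1}{\lambda}\mathbf{A}\mathbf{z}\right).
\end{aligned}
```

Because the quadratic term on the right is a scaled identity, the surrogate is optimized without inverting $\mathbf{A}$, and the update is a projected gradient step on $\mathbf{x}^{\mathrm{H}}\mathbf{A}\mathbf{x}$ with step size proportional to $1/\lambda$. This is consistent with the connection to gradient projection described above.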
Submitted 27 March, 2025; v1 submitted 16 June, 2024;
originally announced June 2024.
-
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis
Authors:
Detai Xin,
Xu Tan,
Kai Shen,
Zeqian Ju,
Dongchao Yang,
Yuancheng Wang,
Shinnosuke Takamichi,
Hiroshi Saruwatari,
Shujie Liu,
Jinyu Li,
Sheng Zhao
Abstract:
We present RALL-E, a robust language modeling method for text-to-speech (TTS) synthesis. While previous work based on large language models (LLMs) shows impressive performance on zero-shot TTS, such methods often suffer from poor robustness, such as unstable prosody (unnatural pitch and rhythm/duration) and a high word error rate (WER), due to the autoregressive prediction style of language models. The core idea behind RALL-E is chain-of-thought (CoT) prompting, which decomposes the task into simpler steps to enhance the robustness of LLM-based TTS. To realize this idea, RALL-E first predicts prosody features (pitch and duration) of the input text and uses them as intermediate conditions to predict speech tokens in a CoT style. Second, RALL-E uses the predicted duration prompt to guide the computation of self-attention weights in the Transformer, forcing the model to focus on the corresponding phonemes and prosody features when predicting speech tokens. Comprehensive objective and subjective evaluations demonstrate that, compared to the powerful baseline method VALL-E, RALL-E significantly improves the WER of zero-shot TTS from $5.6\%$ (without reranking) and $1.7\%$ (with reranking) to $2.5\%$ and $1.0\%$, respectively. Furthermore, we demonstrate that RALL-E correctly synthesizes sentences that are hard for VALL-E, reducing the error rate from $68\%$ to $4\%$.
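The duration-guided attention idea can be illustrated with a boolean attention mask: using the predicted (integer) durations, each speech frame is aligned to one phoneme p, and it may only attend to phonemes within a hypothetical `window` of p. This is a simplified sketch of the idea, not RALL-E's exact formulation, which also conditions on prosody features.

```python
def duration_guided_mask(durations, window):
    """Return mask[t][q]: True if speech frame t may attend to phoneme q.

    durations[p] is the predicted number of speech frames for phoneme p;
    `window` is a hypothetical neighborhood size in phonemes.
    """
    n_phonemes = len(durations)
    mask = []
    for p, d in enumerate(durations):
        # Every frame aligned to phoneme p shares the same allowed neighborhood.
        row = [abs(q - p) <= window for q in range(n_phonemes)]
        mask.extend([list(row) for _ in range(d)])
    return mask
```

In practice such a mask would be converted to large negative values at the disallowed positions and added to the attention logits before the softmax.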
Submitted 19 May, 2024; v1 submitted 4 April, 2024;
originally announced April 2024.
-
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Authors:
Zeqian Ju,
Yuancheng Wang,
Kai Shen,
Xu Tan,
Detai Xin,
Dongchao Yang,
Yanqing Liu,
Yichong Leng,
Kaitao Song,
Siliang Tang,
Zhizheng Wu,
Tao Qin,
Xiang-Yang Li,
Wei Ye,
Shikun Zhang,
Jiang Bian,
Lei He,
Jinyu Li,
Sheng Zhao
Abstract:
While recent large-scale text-to-speech (TTS) models have achieved significant progress, they still fall short in speech quality, similarity, and prosody. Since speech intricately encompasses various attributes (e.g., content, prosody, timbre, and acoustic details) that pose significant challenges for generation, a natural idea is to factorize speech into individual subspaces representing different attributes and generate them individually. Motivated by this, we propose NaturalSpeech 3, a TTS system with novel factorized diffusion models that generates natural speech in a zero-shot way. Specifically, 1) we design a neural codec with factorized vector quantization (FVQ) to disentangle the speech waveform into subspaces of content, prosody, timbre, and acoustic details; 2) we propose a factorized diffusion model to generate the attributes in each subspace following its corresponding prompt. With this factorization design, NaturalSpeech 3 can effectively and efficiently model intricate speech with disentangled subspaces in a divide-and-conquer way. Experiments show that NaturalSpeech 3 outperforms state-of-the-art TTS systems in quality, similarity, prosody, and intelligibility, and achieves quality on par with human recordings. Furthermore, we achieve better performance by scaling to 1B parameters and 200K hours of training data.
Submitted 23 April, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
-
Intelligent Surfaces Empowered Wireless Network: Recent Advances and The Road to 6G
Authors:
Qingqing Wu,
Beixiong Zheng,
Changsheng You,
Lipeng Zhu,
Kaiming Shen,
Xiaodan Shao,
Weidong Mei,
Boya Di,
Hongliang Zhang,
Ertugrul Basar,
Lingyang Song,
Marco Di Renzo,
Zhi-Quan Luo,
Rui Zhang
Abstract:
Intelligent surfaces (ISs) have emerged as a key technology to empower a wide range of appealing applications for wireless networks, owing to their low cost, high energy efficiency, flexible deployment, and capability of constructing favorable wireless channels/radio environments. Moreover, the recent advent of several new IS architectures has further expanded their electromagnetic functionalities from passive reflection to active amplification, simultaneous reflection and refraction, and holographic beamforming. However, research on ISs is still progressing rapidly, and there have been recent technological advances in ISs and their emerging applications that merit a timely review. Thus, we provide in this paper a comprehensive survey of the recent development and advances of IS-aided wireless networks. Specifically, we start with an overview of the anticipated use cases of ISs in future wireless networks such as 6G, followed by a summary of recent standardization activities related to ISs. Then, the main design issues of the commonly adopted reflection-based IS and their state-of-the-art solutions are presented in detail, including reflection optimization, deployment, signal modulation, wireless sensing, and integrated sensing and communications. Finally, recent progress and new challenges in advanced IS architectures are discussed to inspire future research.
Submitted 24 March, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Discerning and Enhancing the Weighted Sum-Rate Maximization Algorithms in Communications
Authors:
Zepeng Zhang,
Ziping Zhao,
Kaiming Shen,
Daniel P. Palomar,
Wei Yu
Abstract:
Weighted sum-rate (WSR) maximization plays a critical role in communication system design. This paper examines three optimization methods for WSR maximization that ensure convergence to stationary points: two block coordinate ascent (BCA) algorithms, namely weighted sum-minimum mean-square error (WMMSE) and WSR maximization via fractional programming (WSR-FP), along with a minorization-maximization (MM) algorithm, WSR maximization via MM (WSR-MM). Our contributions are threefold. Firstly, we delineate the exact relationships among WMMSE, WSR-FP, and WSR-MM, which, despite their extensive use in the literature, lack a comprehensive comparative study. By probing the theoretical underpinnings linking the BCA and MM algorithmic frameworks, we reveal the direct correlations between the equivalent transformation techniques, essential to the development of WMMSE and WSR-FP, and the surrogate functions pivotal to WSR-MM. Secondly, we propose a novel algorithm, WSR-MM+, harnessing the flexibility of selecting surrogate functions in the MM framework. By circumventing the repeated matrix inversions in the search for optimal Lagrange multipliers in existing algorithms, WSR-MM+ significantly reduces the computational load per iteration and accelerates convergence. Thirdly, we reconceptualize WSR-MM+ within the BCA framework, introducing a new equivalent transform, which gives rise to an enhanced version of WSR-FP, named WSR-FP+. We further demonstrate that WSR-MM+ can be construed as the basic gradient projection method. This perspective yields a deeper understanding of its computational intricacies. Numerical simulations corroborate the connections among WMMSE, WSR-FP, and WSR-MM and confirm the efficacy of the proposed WSR-MM+ and WSR-FP+ algorithms.
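The WMMSE algorithm discussed above can be illustrated in its simplest textbook setting: a K-user single-antenna (SISO) interference channel with real, nonnegative channel gains, where h[k][j] is the gain from transmitter j to receiver k and each transmit amplitude is limited to sqrt(p_max). This is our own simplified sketch, not the WMMSE, WSR-FP, or WSR-MM variants analyzed in the paper. It alternates the MMSE receiver, the MSE-weight, and the closed-form transmit-amplitude updates, and the weighted sum rate it records should be nondecreasing.

```python
import math


def weighted_sum_rate(h, v, alpha, sigma2):
    """Weighted sum rate in bits/s/Hz; h[k][j] is the gain from tx j to rx k."""
    total = 0.0
    for k in range(len(v)):
        interference = sum(h[k][j] ** 2 * v[j] ** 2 for j in range(len(v)) if j != k)
        sinr = h[k][k] ** 2 * v[k] ** 2 / (interference + sigma2)
        total += alpha[k] * math.log2(1.0 + sinr)
    return total


def wmmse_siso(h, alpha, p_max, sigma2, iters=50):
    """Return the final transmit amplitudes and the WSR after each iteration."""
    K = len(h)
    v = [math.sqrt(p_max)] * K  # start at full power
    history = [weighted_sum_rate(h, v, alpha, sigma2)]
    for _ in range(iters):
        # MMSE receive coefficient of each user, given the transmit amplitudes.
        u = [h[k][k] * v[k] / (sum(h[k][j] ** 2 * v[j] ** 2 for j in range(K)) + sigma2)
             for k in range(K)]
        # MSE weight = inverse of the resulting MMSE (equals 1 + SINR_k).
        w = [1.0 / (1.0 - u[k] * h[k][k] * v[k]) for k in range(K)]
        # Closed-form transmit amplitude, clipped to the power budget.
        v = [min(max(alpha[k] * w[k] * u[k] * h[k][k]
                     / sum(alpha[j] * w[j] * u[j] ** 2 * h[j][k] ** 2 for j in range(K)),
                     0.0), math.sqrt(p_max))
             for k in range(K)]
        history.append(weighted_sum_rate(h, v, alpha, sigma2))
    return v, history
```

Each block update exactly minimizes the weighted-MSE objective in one variable group, and minimizing over the receiver and weight recovers the negative WSR plus a constant, so the recorded WSR cannot decrease from one iteration to the next.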
Submitted 8 November, 2023;
originally announced November 2023.
-
A Benchmarking Protocol for SAR Colorization: From Regression to Deep Learning Approaches
Authors:
Kangqing Shen,
Gemine Vivone,
Xiaoyuan Yang,
Simone Lolli,
Michael Schmitt
Abstract:
Synthetic aperture radar (SAR) images are widely used in remote sensing. Interpreting SAR images can be challenging due to their intrinsic speckle noise and grayscale nature. To address this issue, SAR colorization has emerged as a research direction that colorizes grayscale SAR images while preserving the original spatial and radiometric information. However, this research field is still in its early stages, and many limitations remain. In this paper, we propose a full research line for supervised learning-based approaches to SAR colorization. Our approach includes a protocol for generating synthetic color SAR images, several baselines, and an effective method based on the conditional generative adversarial network (cGAN) for SAR colorization. We also propose numerical assessment metrics for the problem at hand. To our knowledge, this is the first attempt to propose a research line for SAR colorization that includes a protocol, a benchmark, and a complete performance evaluation. Our extensive tests demonstrate the effectiveness of the proposed cGAN-based network for SAR colorization. The code will be made publicly available.
Submitted 12 October, 2023;
originally announced October 2023.
-
PromptTTS 2: Describing and Generating Voices with Text Prompt
Authors:
Yichong Leng,
Zhifang Guo,
Kai Shen,
Xu Tan,
Zeqian Ju,
Yanqing Liu,
Yufei Liu,
Dongchao Yang,
Leying Zhang,
Kaitao Song,
Lei He,
Xiang-Yang Li,
Sheng Zhao,
Tao Qin,
Jiang Bian
Abstract:
Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods that rely on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly, since speech prompts can be hard to find or may not exist at all. TTS approaches based on text prompts face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, since writing text prompts for speech requires vendors and incurs a large data-labeling cost. In this work, we introduce PromptTTS 2 to address these challenges with a variation network that provides the variability information of voice not captured by text prompts, and a prompt generation pipeline that uses large language models (LLMs) to compose high-quality text prompts. Specifically, the variation network predicts the representation extracted from the reference speech (which contains full information about voice variability) based on the text prompt representation. The prompt generation pipeline generates text prompts for speech with a speech language understanding model that recognizes voice attributes (e.g., gender, speed) from speech and a large language model that formulates text prompts based on the recognition results. Experiments on a large-scale (44K hours) speech dataset demonstrate that, compared to previous works, PromptTTS 2 generates voices more consistent with text prompts and supports the sampling of diverse voice variability, thereby offering users more choices for voice generation. Additionally, the prompt generation pipeline produces high-quality text prompts, eliminating the large labeling cost. The demo page of PromptTTS 2 is available online.
Submitted 11 October, 2023; v1 submitted 5 September, 2023;
originally announced September 2023.
-
EventTrojan: Manipulating Non-Intrusive Speech Quality Assessment via Imperceptible Events
Authors:
Ying Ren,
Kailai Shen,
Zhe Ye,
Diqun Yan
Abstract:
Non-intrusive speech quality assessment (NISQA) has gained significant attention for predicting the mean opinion score (MOS) of speech without requiring reference speech. Researchers have gradually started to apply NISQA to various practical scenarios. However, little attention has been paid to the security of NISQA models. Backdoor attacks represent the most serious threat to deep neural networks (DNNs) because, once embedded, backdoors achieve a very high attack success rate. However, existing backdoor attacks assume that the attacker actively feeds samples containing triggers into the model during the inference phase, an assumption that does not fit the specific scenario of NISQA. Moreover, current backdoor attacks on regression tasks lack an objective metric for measuring attack performance. To address these issues, we propose a novel backdoor triggering approach (EventTrojan) that uses an event occurring during the use of the NISQA model as a trigger. We also provide an objective metric for backdoor attacks on regression tasks. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the EventTrojan attack. The attack also shows good resistance to several defense methods.
Submitted 11 September, 2024; v1 submitted 4 September, 2023;
originally announced September 2023.
-
Breaking Speaker Recognition with PaddingBack
Authors:
Zhe Ye,
Diqun Yan,
Li Dong,
Kailai Shen
Abstract:
Machine Learning as a Service (MLaaS) has gained popularity due to advances in Deep Neural Networks (DNNs). However, untrusted third-party platforms have raised concerns about AI security, particularly regarding backdoor attacks. Recent research has shown that speech backdoors, like image backdoors, can use transformations as triggers. However, human listeners can easily notice these transformations, which arouses suspicion. In this paper, we propose PaddingBack, an inaudible backdoor attack that uses malicious operations to generate poisoned samples that are indistinguishable from clean ones. Instead of using external perturbations as triggers, we exploit padding, a widely used speech signal operation, to break speaker recognition systems. Experimental results demonstrate the effectiveness of our method, which achieves a high attack success rate while retaining benign accuracy. Furthermore, PaddingBack resists defense methods and remains stealthy to human perception.
Submitted 11 March, 2024; v1 submitted 8 August, 2023;
originally announced August 2023.
-
Adaptive Blind Beamforming for Intelligent Surface
Authors:
Wenhai Lai,
Wenyu Wang,
Fan Xu,
Xin Li,
Shaobo Niu,
Kaiming Shen
Abstract:
Configuring an intelligent surface (IS) or a passive antenna array without any channel knowledge, namely blind beamforming, is a frontier research topic in wireless communications. Existing blind beamforming methods in the literature include RFocus and CSM, whose effectiveness has been demonstrated on hardware prototypes. However, this paper points out a subtle issue with these blind beamforming algorithms: RFocus and CSM may fail to work in the non-line-of-sight (NLoS) channel case. To address this issue, we propose a grouping strategy that enables adaptive blind beamforming. Specifically, the reflective elements (REs) of the IS are divided into three groups, and each group is configured randomly to obtain a dataset of random samples. We then extract the statistical features of the wireless environment from the random samples, thereby coordinating the phase shifts of the IS without channel acquisition. The RE grouping plays a critical role in guaranteeing a performance gain in the NLoS case. In particular, if all the REs are placed in the same group, the proposed algorithm reduces to RFocus and CSM. In addition to simulations, we validate the advantage of the proposed blind beamforming algorithm in real-world networks at 3.5 GHz.
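The abstract notes that placing all REs in one group reduces the proposed algorithm to CSM. A minimal sketch of that single-group special case follows, assuming that CSM denotes a conditional-sample-mean rule (our reading, not stated in the abstract): for each element, average the measured power over the random samples in which that element used each phase level, then keep the level with the largest average. The noise-free model $|h_0 + \sum_n h_n e^{j2\pi k_n/K}|^2$ and the channels `h0` and `h` are hypothetical and serve only to simulate the power readings; the selection step itself sees nothing but configurations and powers. The paper's three-group strategy is not reproduced.

```python
import cmath
import math
import random


def received_power(h0, h, theta_idx, K):
    """Noise-free |h0 + sum_n h_n e^{j 2 pi k_n / K}|^2 for phase indices k_n."""
    s = h0 + sum(hn * cmath.exp(2j * math.pi * k / K) for hn, k in zip(h, theta_idx))
    return abs(s) ** 2


def conditional_sample_mean(samples, powers, N, K):
    """For each element, choose the phase index with the largest conditional mean power."""
    choice = []
    for n in range(N):
        totals, counts = [0.0] * K, [0] * K
        for theta_idx, p in zip(samples, powers):
            totals[theta_idx[n]] += p
            counts[theta_idx[n]] += 1
        means = [t / c if c else -math.inf for t, c in zip(totals, counts)]
        choice.append(max(range(K), key=lambda k: means[k]))
    return choice


# Hypothetical setup: 16 elements, 4 phase levels, 2000 random configurations.
rng = random.Random(1)
N, K, T = 16, 4, 2000
h0 = 0.5 + 0j
h = [cmath.rect(0.1, rng.uniform(0, 2 * math.pi)) for _ in range(N)]
samples = [[rng.randrange(K) for _ in range(N)] for _ in range(T)]
powers = [received_power(h0, h, s, K) for s in samples]  # the only "measurements"
best = conditional_sample_mean(samples, powers, N, K)
print(received_power(h0, h, best, K), sum(powers) / T)
```

Note that `h` is used only inside `received_power` to generate the simulated readings; `conditional_sample_mean` never touches channel information.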
Submitted 24 September, 2024; v1 submitted 30 May, 2023;
originally announced May 2023.
-
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Authors:
Kai Shen,
Zeqian Ju,
Xu Tan,
Yanqing Liu,
Yichong Leng,
Lei He,
Tao Qin,
Sheng Zhao,
Jiang Bian
Abstract:
Scaling text-to-speech (TTS) to large-scale, multi-speaker, and in-the-wild datasets is important for capturing the diversity of human speech, such as speaker identities, prosodies, and styles (e.g., singing). Current large TTS systems usually quantize speech into discrete tokens and use language models to generate these tokens one by one, and thus suffer from unstable prosody, word skipping/repeating issues, and poor voice quality. In this paper, we develop NaturalSpeech 2, a TTS system that leverages a neural audio codec with residual vector quantizers to obtain quantized latent vectors and uses a diffusion model to generate these latent vectors conditioned on text input. To enhance the zero-shot capability, which is important for diverse speech synthesis, we design a speech prompting mechanism to facilitate in-context learning in the diffusion model and the duration/pitch predictor. We scale NaturalSpeech 2 to large-scale datasets with 44K hours of speech and singing data and evaluate its voice quality on unseen speakers. NaturalSpeech 2 outperforms previous TTS systems by a large margin in terms of prosody/timbre similarity, robustness, and voice quality in a zero-shot setting, and performs novel zero-shot singing synthesis with only a speech prompt. Audio samples are available at https://speechresearch.github.io/naturalspeech2.
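The residual vector quantizers mentioned above can be illustrated with a toy sketch: each stage quantizes whatever residual the previous stages left behind, and decoding sums the chosen codewords. The 2-D codebooks are hypothetical; a real codec learns its codebooks and operates on high-dimensional latents, and the diffusion model that generates those latents is not modelled here.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))


def rvq_encode(x, codebooks):
    """Residual vector quantization: one index per stage."""
    residual, indices = list(x), []
    for cb in codebooks:
        i = nearest(cb, residual)
        indices.append(i)
        residual = [r - c for r, c in zip(residual, cb[i])]  # pass the leftover to the next stage
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for i, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[i])]
    return out


# Hypothetical codebooks: a coarse stage followed by a finer residual stage.
codebooks = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]],
]
indices = rvq_encode([1.2, 0.3], codebooks)
print(indices, rvq_decode(indices, codebooks))  # [1, 3] [1.25, 0.25]
```

Adding stages refines the approximation, which is why a few small codebooks can represent a latent far more precisely than a single codebook of the same total size.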
Submitted 30 May, 2023; v1 submitted 18 April, 2023;
originally announced April 2023.
-
Coordinating Multiple Intelligent Reflecting Surfaces without Channel Information
Authors:
Fan Xu,
Jiawei Yao,
Wenhai Lai,
Kaiming Shen,
Xin Li,
Xin Chen,
Zhi-Quan Luo
Abstract:
Conventional beamforming methods for intelligent reflecting surfaces (IRSs) or reconfigurable intelligent surfaces (RISs) typically require full channel state information (CSI). However, the computational cost of channel acquisition soars exponentially with the number of IRSs. To bypass this difficulty, we propose a novel strategy called blind beamforming that coordinates multiple IRSs by means of statistics without knowing the CSI. Blind beamforming only requires measuring the received signal power at the user terminal for a sequence of randomly generated phase shifts across all IRSs. The main idea is to extract the key statistical quantity for beamforming by exploring only a small portion of the whole solution space of phase shifts. We show that blind beamforming guarantees a signal-to-noise ratio (SNR) boost of $\Theta(N^{2L})$ under certain conditions, where $L$ is the number of IRSs and $N$ is the number of reflecting elements per IRS. The proposed conditions for achieving the optimal SNR boost of $\Theta(N^{4})$ in a double-IRS system are much easier to satisfy than the existing ones in the literature. Most importantly, the proposed conditions can be extended to a fully general $L$-IRS system. This result significantly improves upon the state of the art in multi-IRS-assisted communication. Moreover, blind beamforming is validated via field tests and simulations. In particular, as shown in our field tests at 2.6 GHz, our method yields up to a 17 dB SNR boost; to the best of our knowledge, this is the first time that the use of multiple IRSs has been verified in the real world.
Submitted 8 January, 2024; v1 submitted 19 February, 2023;
originally announced February 2023.
-
Deep Learning Predicts Prevalent and Incident Parkinson's Disease From UK Biobank Fundus Imaging
Authors:
Charlie Tran,
Kai Shen,
Kang Liu,
Akshay Ashok,
Adolfo Ramirez-Zamora,
Jinghua Chen,
Yulin Li,
Ruogu Fang
Abstract:
Parkinson's disease is the world's fastest-growing neurological disorder. Research to elucidate the mechanisms of Parkinson's disease and automate diagnostics would greatly improve the treatment of patients with Parkinson's disease. Current diagnostic methods are expensive and have limited availability. Considering the insidious and preclinical onset and progression of the disease, a desirable screening method should be diagnostically accurate even before the onset of symptoms to allow medical interventions. We highlight retinal fundus imaging, often termed a window to the brain, as a diagnostic screening modality for Parkinson's disease. We conducted a systematic evaluation of conventional machine learning and deep learning techniques to classify Parkinson's disease from UK Biobank fundus imaging. Our results show that individuals with Parkinson's disease can be differentiated from age- and gender-matched healthy subjects with an Area Under the Curve (AUC) of 0.77. This accuracy is maintained when predicting either prevalent or incident Parkinson's disease. Explainability and trustworthiness are enhanced by visual attribution maps of localized biomarkers and quantified metrics of model robustness to data perturbations.
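An AUC of 0.77 can be read as the probability that a randomly chosen Parkinson's case receives a higher classifier score than a randomly chosen control. A minimal pairwise sketch of that definition follows; the scores are made up for illustration and are not from the study.

```python
def roc_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up classifier scores for 3 cases and 4 controls.
print(roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.5]))  # 10 of 12 pairs correct -> 0.8333...
```

This counts all pairs in O(nm) time; a rank-based (Mann-Whitney) formulation gives the same value in O((n+m) log(n+m)) time.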
Submitted 18 February, 2024; v1 submitted 13 February, 2023;
originally announced February 2023.
-
A Linear Time Algorithm for the Optimal Discrete IRS Beamforming
Authors:
Shuyi Ren,
Kaiming Shen,
Xin Li,
Xin Chen,
Zhi-Quan Luo
Abstract:
It remains an open problem to find the optimal configuration of phase shifts under the discrete constraint for an intelligent reflecting surface (IRS) in polynomial time. The problem is widely believed to be difficult because it is not linked to any known combinatorial problem that can be solved efficiently. Branch-and-bound algorithms and approximation algorithms constitute the best results in this area. Nevertheless, this work shows that the global optimum can actually be reached in linear time on average in terms of the number of reflective elements (REs) of the IRS. The main idea is to geometrically interpret the discrete beamforming problem as choosing the optimal point on the unit circle. Although the number of possible combinations of phase shifts grows exponentially with the number of REs, it turns out that only a linear number of circular arcs can possibly contain the optimal point. Furthermore, the proposed algorithm can be viewed as a novel approach to a special case of the discrete quadratic program (QP).
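The unit-circle idea can be made concrete with a small sketch, assuming the common objective $|h_0 + \sum_n h_n e^{j2\pi k_n/K}|$ with a direct-path term $h_0$ (an assumption; the abstract does not state the objective). For a point $\mu$ on the unit circle, each element independently picks the phase whose rotated channel has the largest projection onto $\mu$. An optimal configuration must be of this form for $\mu$ equal to the angle of its own sum: if some element were not best aligned, switching it would increase the projection and hence the magnitude. Each element's choice changes only at $K$ breakpoints, so evaluating one interior point of each of the $NK$ arcs finds the optimum. This sketch recomputes every choice per arc, giving $O(N^2K^2)$ time rather than the linear average time claimed in the paper; the channels are hypothetical, and a brute-force check is included.

```python
import cmath
import itertools
import math


def objective(h0, h, ks, K):
    """|h0 + sum_n h_n * exp(j*2*pi*k_n/K)| for phase indices ks."""
    return abs(h0 + sum(hn * cmath.exp(2j * math.pi * k / K) for hn, k in zip(h, ks)))


def aligned_index(hn, mu, K):
    """Phase index whose rotated copy of h_n has the largest projection onto direction mu."""
    return max(range(K), key=lambda k: (hn * cmath.exp(1j * (2 * math.pi * k / K - mu))).real)


def arc_search(h, K, h0=0j):
    """Evaluate one configuration per circular arc; returns (phase indices, objective)."""
    two_pi = 2 * math.pi
    # Element n switches its aligned phase when mu crosses arg(h_n) + 2*pi*k/K + pi/K,
    # the bisector between two adjacent rotated copies of h_n: N*K breakpoints in total.
    breaks = sorted((cmath.phase(hn) + two_pi * k / K + math.pi / K) % two_pi
                    for hn in h for k in range(K))
    best_ks, best_val = None, -1.0
    for i, start in enumerate(breaks):
        end = breaks[i + 1] if i + 1 < len(breaks) else breaks[0] + two_pi
        mu = 0.5 * (start + end)  # interior point of the arc (start, end)
        ks = [aligned_index(hn, mu, K) for hn in h]
        val = objective(h0, h, ks, K)
        if val > best_val:
            best_ks, best_val = ks, val
    return best_ks, best_val


def brute_force(h, K, h0=0j):
    """Exhaustive search over all K**N configurations (small N only)."""
    return max(objective(h0, h, ks, K) for ks in itertools.product(range(K), repeat=len(h)))


# Hypothetical channels for a 5-element IRS with quadrature (K = 4) phases.
h = [cmath.rect(1 + 0.1 * n, 0.7 * n + 0.3) for n in range(5)]
h0 = 0.4 - 0.2j
ks, val = arc_search(h, 4, h0)
print(ks, val, brute_force(h, 4, h0))
```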
Submitted 7 September, 2023; v1 submitted 9 November, 2022;
originally announced November 2022.
-
Joint Device Selection and Power Control for Wireless Federated Learning
Authors:
Wei Guo,
Ran Li,
Chuan Huang,
Xiaoqi Qin,
Kaiming Shen,
Wei Zhang
Abstract:
This paper studies joint device selection and power control for wireless federated learning (FL), considering both the downlink and uplink communications between the parameter server (PS) and the terminal devices. In each round of model training, the PS first broadcasts the global model to the terminal devices in an analog fashion, and then the terminal devices perform local training and upload the updated model parameters to the PS via over-the-air computation (AirComp). First, we propose an AirComp-based adaptive reweighting scheme for aggregating the locally updated models, in which the model aggregation weights are directly determined by the uplink transmit power values of the selected devices; this enables joint learning and communication optimization simply through device selection and power control. Furthermore, we provide a convergence analysis for the proposed wireless FL algorithm and derive an upper bound on the optimality gap between the expected and optimal global loss values. With instantaneous channel state information (CSI), we formulate the optimality gap minimization problems under individual and sum uplink transmit power constraints, respectively, and show that they can be solved by the semidefinite relaxation (SDR) technique. Numerical results reveal that the proposed wireless FL algorithm achieves performance close to that of the ideal FedAvg scheme with error-free model exchange and full device participation.
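To illustrate how device selection and transmit power can jointly set the aggregation weights, here is a minimal sketch of weighted model averaging. The rule "a selected device's weight equals its uplink transmit power, an unselected device's weight is zero" is a hypothetical stand-in; the paper's actual AirComp reweighting rule is not given in the abstract. With weights proportional to local dataset sizes and all devices selected, the same function reduces to FedAvg-style averaging.

```python
def aggregate(models, weights):
    """Weighted average of local model vectors, with weights normalized to sum to one."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one selected device needs a positive weight")
    dim = len(models[0])
    return [sum(w * m[i] for w, m in zip(weights, models)) / total for i in range(dim)]


def power_weights(powers, selected):
    """Hypothetical rule: a selected device's weight equals its uplink transmit power."""
    return [p if s else 0.0 for p, s in zip(powers, selected)]


models = [[1.0, 2.0], [3.0, 0.0], [5.0, 4.0]]  # local updates of three devices
powers = [0.5, 1.5, 0.8]                       # hypothetical transmit powers (W)
selected = [True, True, False]                 # device 3 is not scheduled this round
global_model = aggregate(models, power_weights(powers, selected))
print(global_model)  # [2.5, 0.5]
```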
Submitted 18 May, 2022;
originally announced May 2022.
-
Configuring Intelligent Reflecting Surface with Performance Guarantees: Optimal Beamforming
Authors:
Yaowen Zhang,
Kaiming Shen,
Shuyi Ren,
Xin Li,
Xin Chen,
Zhi-Quan Luo
Abstract:
This work proposes linear-time strategies to optimally configure the phase shifts of the reflective elements of an intelligent reflecting surface (IRS). Specifically, we show that binary phase beamforming can be solved optimally in linear time to maximize the received signal-to-noise ratio (SNR). For general $K$-ary phase beamforming, we develop a linear-time approximation algorithm that guarantees at least a constant fraction $(1+\cos(\pi/K))/2$ of the global optimum; e.g., it attains over 85% of the optimal performance for quadrature beamforming with $K=4$. According to the numerical results, the proposed approximation algorithm for discrete IRS beamforming significantly outperforms existing algorithms in boosting the received SNR.
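The 85% figure follows directly from the stated guarantee: with $K=4$, $(1+\cos(\pi/4))/2 = (1+\sqrt{2}/2)/2 \approx 0.854$. The short computation below evaluates the bound for a few phase resolutions; it only evaluates the formula and does not implement the approximation algorithm itself.

```python
import math


def approx_ratio(K):
    """Guaranteed fraction (1 + cos(pi/K)) / 2 of the optimum for K-ary phase beamforming."""
    return (1 + math.cos(math.pi / K)) / 2


for K in (2, 4, 8):
    print(K, round(approx_ratio(K), 4))
# 2 0.5
# 4 0.8536
# 8 0.9619
```

The bound is weak for $K=2$, but that case does not need it, since the abstract states that binary beamforming is solved exactly; the guarantee approaches 1 as the phase resolution grows.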
Submitted 4 December, 2021;
originally announced December 2021.
-
Numerical Energy Analysis of In-wheel Motor Driven Autonomous Electric Vehicles
Authors:
Kang Shen,
Fan Yang,
Xinyou Ke,
Cheng Zhang,
Chris Yuan
Abstract:
Autonomous electric vehicles are widely studied as the future technology of ground transportation; however, autonomous electric vehicles based on conventional powertrain systems have limited energy and power transmission efficiencies, which may hinder their broad application in the future. Here we report a study on the energy consumption and efficiency improvement of a mid-size autonomous electric vehicle driven by in-wheel motors, through the development of a numerical energy model that is validated with actual driving data and implemented in a case study. The energy analysis was conducted under three driving conditions (flat road, upslope, and downslope driving) to examine energy consumption, and the energy-saving potential of the in-wheel-motor-driven powertrain system was systematically explored and discussed. Accounting for energy recovery from regenerative braking, the energy consumption and regenerated energy were calculated for specific driving cycles based on vehicle dynamics and autonomous driving patterns. A case study was conducted using baseline electric vehicle driving data from West Los Angeles. It was found that an in-wheel-motor-driven autonomous electric vehicle can save up to 17.5% of energy compared with a conventional electric vehicle during slope driving. Built on the efficiency maps of a commercial in-wheel motor, the numerical energy model and the validated results obtained in this study are consistent with actual driving conditions, and can be used to support the sustainable development of more energy-efficient autonomous electric vehicles in the future.
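The energy calculation from vehicle dynamics can be sketched with the standard road-load equation (inertia, rolling resistance, grade, and aerodynamic drag) and constant drive and regeneration efficiencies. This is a minimal sketch under those assumptions: every parameter value is hypothetical, and the paper's validated model and motor efficiency maps are not reproduced. A lower drivetrain loss, as claimed for in-wheel motors, would enter here only through `eta_drive`.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def battery_energy(speeds, dt, grade, mass=1800.0, c_rr=0.01, c_d=0.3, area=2.3,
                   rho=1.2, eta_drive=0.9, eta_regen=0.6):
    """Net battery energy (J) over a speed trace; negative means net energy recovered."""
    theta = math.atan(grade)  # road angle from fractional grade (e.g. 0.03 = 3%)
    energy = 0.0
    for i in range(len(speeds) - 1):
        v = speeds[i]
        a = (speeds[i + 1] - speeds[i]) / dt
        force = (mass * a                                  # inertia
                 + mass * G * c_rr * math.cos(theta)       # rolling resistance
                 + mass * G * math.sin(theta)              # grade
                 + 0.5 * rho * c_d * area * v ** 2)        # aerodynamic drag
        power = force * v  # wheel power, W
        if power >= 0:
            energy += power * dt / eta_drive   # battery supplies more than the wheels need
        else:
            energy += power * dt * eta_regen   # regenerative braking recovers a fraction
    return energy


trace = [15.0] * 61  # hypothetical cycle: 60 s at a constant 15 m/s, 1 s steps
for grade in (0.0, 0.03, -0.03):
    print(grade, round(battery_energy(trace, 1.0, grade) / 1000, 1), "kJ")
```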
Submitted 10 April, 2021;
originally announced April 2021.
-
Miniscope3D: optimized single-shot miniature 3D fluorescence microscopy
Authors:
Kyrollos Yanny,
Nick Antipa,
William Liberti,
Sam Dehaeck,
Kristina Monakhova,
Fanglin Linda Liu,
Konlin Shen,
Ren Ng,
Laura Waller
Abstract:
Miniature fluorescence microscopes are a standard tool in systems biology. However, widefield miniature microscopes capture only 2D information, and modifications that enable 3D capabilities increase the size and weight and have poor resolution outside a narrow depth range. Here, we achieve 3D capability by replacing the tube lens of a conventional 2D Miniscope with an optimized multifocal phase mask at the objective's aperture stop. Placing the phase mask at the aperture stop significantly reduces the size of the device, and varying the focal lengths enables uniform resolution across a wide depth range. The phase mask encodes the 3D fluorescence intensity into a single 2D measurement, and the 3D volume is recovered by solving a sparsity-constrained inverse problem. We provide methods for designing and fabricating the phase mask and an efficient forward model that accounts for the field-varying aberrations in miniature objectives. We demonstrate a prototype that is 17 mm tall and weighs 2.5 grams, achieving 2.76 $μ$m lateral and 15 $μ$m axial resolution across most of the 900x700x390 $μ$m$^3$ volume at 40 volumes per second. The performance is validated experimentally on resolution targets, dynamic biological samples, and mouse brain tissue. Compared with existing miniature single-shot volume-capture implementations, our system is smaller and lighter and achieves more than 2x better lateral and axial resolution throughout a 10x larger usable depth range. Our microscope design provides single-shot 3D imaging for applications where a compact platform matters, such as volumetric neural imaging in freely moving animals and 3D motion studies of dynamic samples in incubators and lab-on-a-chip devices.
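Sparsity-constrained recovery from a single measurement is commonly posed as $\ell_1$-regularized least squares, $\min_x \tfrac12\|Ax-b\|^2 + \lambda\|x\|_1$. A toy sketch using the iterative shrinkage-thresholding algorithm (ISTA) on a small dense matrix follows. The matrix `A` stands in for the paper's field-varying forward model, all values are hypothetical, and ISTA is one standard solver for this class of problem rather than necessarily the one the authors use.

```python
import math


def matvec(A, x):
    """Compute A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def soft_threshold(v, t):
    """Proximal operator of t*||.||_1: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(vi) - t, 0.0), vi) for vi in v]


def ista(A, b, lam, iters=500):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 by iterative shrinkage-thresholding."""
    # 1/||A||_F^2 <= 1/||A||_2^2, so this step size is small enough to converge.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * len(A[0])
    for _ in range(iters):
        residual = [ax - bi for ax, bi in zip(matvec(A, x), b)]
        grad = rmatvec(A, residual)  # gradient of the least-squares term
        x = soft_threshold([xi - step * gi for xi, gi in zip(x, grad)], step * lam)
    return x


# Hypothetical 4x3 forward model and a sparse ground truth.
A = [[1.0, 0.2, 0.0], [0.1, 1.0, 0.1], [0.0, 0.2, 1.0], [0.3, 0.0, 0.1]]
x_true = [1.0, 0.0, -0.5]
b = matvec(A, x_true)
x_hat = ista(A, b, lam=0.01)
print([round(v, 3) for v in x_hat])
```

The small bias of the estimate toward zero is the usual price of the $\ell_1$ penalty; larger `lam` gives sparser but more shrunken reconstructions.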
Submitted 11 October, 2020;
originally announced October 2020.
-
Stochastic Transceiver Optimization in Multi-Tags Symbiotic Radio Systems
Authors:
Xihan Chen,
Hei Victor Cheng,
Kaiming Shen,
An Liu,
Min-Jian Zhao
Abstract:
Symbiotic radio (SR) is emerging as a spectrum- and energy-efficient communication paradigm for the future passive Internet of Things (IoT), where single-antenna backscatter devices, referred to as Tags, are parasitic on an active primary transmission. The primary transceiver is designed to assist both direct-link (DL) and backscatter-link (BL) communication. In multi-Tag SR systems, the transceiver design becomes much more complicated due to the presence of DL and inter-Tag interference, which poses new challenges to the availability and reliability of DL and BL transmission. To overcome these challenges, we formulate the stochastic optimization of the transceiver design as a general network utility maximization problem (GNUMP). The resulting problem is a stochastic, multiple-ratio, fractional, nonconvex problem, and is consequently challenging to solve. By leveraging fractional programming techniques, we tailor a surrogate function with a specific structure and subsequently develop a batch stochastic parallel decomposition (BSPD) algorithm, which is shown to converge to stationary solutions of the GNUMP. Simulation results verify the effectiveness of the proposed algorithm in terms of the achieved system throughput.
Submitted 24 June, 2020;
originally announced June 2020.
-
Joint Annotator-and-Spectrum Allocation in Wireless Networks for Crowd Labelling
Authors:
Xiaoyang Li,
Guangxu Zhu,
Kaiming Shen,
Wei Yu,
Yi Gong,
Kaibin Huang
Abstract:
The massive sensing data generated by the Internet of Things will provide fuel for ubiquitous artificial intelligence (AI), automating the operations of our society ranging from transportation to healthcare. The realistic adoption of this technique, however, requires labelling the enormous data prior to training AI models via supervised learning. To tackle this challenge, we explore a new perspective of wireless crowd labelling, which downloads data to many imperfect mobile annotators for repetition labelling by exploiting multicasting in wireless networks. In this cross-disciplinary area, integrating rate-distortion theory with the principle of repetition labelling for accuracy improvement gives rise to a new tradeoff between radio and annotator resources under a constraint on labelling accuracy. Building on this tradeoff and aiming to maximize the labelling throughput, this work focuses on the joint optimization of encoding rate, annotator clustering, and sub-channel allocation, which results in an NP-hard integer programming problem. To devise an efficient solution approach, we establish an optimal sequential annotator-clustering scheme based on the order of decreasing signal-to-noise ratios. Thereby, the optimal solution can be found by an efficient tree search. Next, the solution is simplified by applying truncated channel inversion. Alternatively, the optimization problem can be recognized as a knapsack problem, which can be efficiently solved in pseudo-polynomial time by means of dynamic programming. In addition, exact policies are derived for the annotator-constrained and spectrum-constrained cases. Last, simulation results demonstrate significant throughput gains of the optimal solution compared with decoupled allocation of the two types of resources.
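The knapsack reformulation mentioned above is solvable in pseudo-polynomial time because the dynamic-programming table is indexed by the integer capacity $C$, giving $O(nC)$ work for $n$ items. A generic 0/1 knapsack sketch follows. Reading items as candidate annotator groups, weights as sub-channels, and values as labelling throughput is a hypothetical mapping; the paper's exact formulation is not given in the abstract.

```python
def knapsack(values, weights, capacity):
    """Max total value of items with total (integer) weight <= capacity; O(n * capacity) time."""
    best = [0] * (capacity + 1)  # best[c]: max value achievable with capacity c so far
    for v, w in zip(values, weights):
        # Sweep capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


# Hypothetical: three annotator groups needing 1, 2, 3 sub-channels, 5 sub-channels available.
print(knapsack([6, 10, 12], [1, 2, 3], 5))  # 22
```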
Submitted 25 December, 2019;
originally announced December 2019.
-
Fault Detection Using Nonlinear Low-Dimensional Representation of Sensor Data
Authors:
Kai Shen,
Anya Mcguirk,
Yuwei Liao,
Arin Chaudhuri,
Deovrat Kakde
Abstract:
Sensor data analysis plays a key role in the health assessment of critical equipment. Such data are multivariate and exhibit nonlinear relationships. This paper describes how one can exploit nonlinear dimension reduction techniques, such as t-distributed stochastic neighbor embedding (t-SNE) and kernel principal component analysis (KPCA), for fault detection. We show that performing anomaly detection on low-dimensional representations provides better interpretability and is conducive to edge processing in IoT applications.
Submitted 2 October, 2019;
originally announced October 2019.
-
Mixed-Timescale Beamforming and Power Splitting for Massive MIMO Aided SWIPT IoT Network
Authors:
Xihan Chen,
Hei Victor Cheng,
An Liu,
Kaiming Shen,
Min-Jian Zhao
Abstract:
Traditional simultaneous wireless information and power transfer (SWIPT) with power splitting assumes perfect channel state information (CSI), which is difficult to obtain, especially in the massive multiple-input multiple-output (MIMO) regime. In this letter, we consider a mixed-timescale joint beamforming and power splitting (MJBP) scheme to maximize general utility functions under a power constraint in the downlink of a massive MIMO SWIPT IoT network. In this scheme, the transmit digital beamformer is adapted to the imperfect CSI, while the receive power splitters are adapted only to the long-term channel statistics, owing to hardware limitations and signaling overhead. The formulated optimization problem is solved using a mixed-timescale online stochastic successive convex approximation (MO-SSCA) algorithm. Simulation results reveal significant gains over the baselines.
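The power-splitting tradeoff at a single receiver can be sketched under a common textbook model (an assumption; the letter's exact model is not given in the abstract): a fraction $\rho$ of the received power $P_r$ goes to energy harvesting with efficiency $\eta$, and the remainder is decoded with SNR $(1-\rho)P_r/\sigma^2$. Because the rate falls as $\rho$ grows, the smallest $\rho$ that meets an energy target maximizes the rate. The sketch fixes $P_r$, whereas the MJBP scheme adapts the splitter to long-term channel statistics; all numbers are hypothetical.

```python
import math


def split_tradeoff(rho, p_rx, noise, eta):
    """(achievable rate in bit/s/Hz, harvested power in W) for splitting ratio rho."""
    rate = math.log2(1 + (1 - rho) * p_rx / noise)
    harvested = eta * rho * p_rx
    return rate, harvested


def min_rho_for_energy(p_rx, eta, e_min):
    """Smallest rho with eta * rho * p_rx >= e_min, i.e. the rate-maximizing feasible choice."""
    rho = e_min / (eta * p_rx)
    if rho > 1:
        raise ValueError("energy target unreachable even with rho = 1")
    return rho


p_rx, noise, eta = 1e-3, 1e-6, 0.5  # hypothetical received power, noise power, EH efficiency
rho = min_rho_for_energy(p_rx, eta, e_min=2e-4)
print(round(rho, 3), split_tradeoff(rho, p_rx, noise, eta))
```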
Submitted 20 August, 2019;
originally announced August 2019.
-
A Sub-mm$^3$ Ultrasonic Free-floating Implant for Multi-mote Neural Recording
Authors:
Mohammad Meraj Ghanbari,
David K. Piech,
Konlin Shen,
Sina Faraji Alamouti,
Cem Yalcin,
Benjamin C. Johnson,
Jose M. Carmena,
Michel M. Maharbiz,
Rikky Muller
Abstract:
A 0.8 mm$^3$ wireless, ultrasonically powered, free-floating neural recording implant is presented. The device consists only of a 0.25 mm$^2$ recording IC and a single piezoceramic resonator that is used for both power harvesting and data transmission. Uplink data transmission is performed by analog amplitude modulation of the ultrasound echo. Using a 1.78 MHz main carrier, an equivalent uplink data rate of >35 kbps/mote is achieved. A technique to linearize the echo amplitude modulation is introduced, resulting in <1.2% static nonlinearity of the received signal over a $\pm$10 mV input range. The IC dissipates 37.7 $μ$W, while the neural recording front-end consumes 4 $μ$W and achieves a noise floor of 5.3 $μ$V$_{rms}$ in a 5 kHz bandwidth. This work improves sub-mm recording mote depth by >2.5x, resulting in the highest measured depth/volume ratio by $\sim$3x. Orthogonal subcarrier modulation enables simultaneous operation of multiple implants using a single-element external ultrasound transducer. Dual-mote simultaneous power-up and data transmission is demonstrated at a rate of 7 kS/s at a depth of 50 mm.
Submitted 16 July, 2019; v1 submitted 18 May, 2019;
originally announced May 2019.
-
Spatial Deep Learning for Wireless Scheduling
Authors:
Wei Cui,
Kaiming Shen,
Wei Yu
Abstract:
The optimal scheduling of interfering links in a dense wireless network with full frequency reuse is a challenging task. The traditional method involves first estimating all the interfering channel strengths and then optimizing the scheduling based on the model. This model-based method is, however, resource-intensive and computationally hard because channel estimation is expensive in dense networks; furthermore, finding even a locally optimal solution of the resulting optimization problem may be computationally complex. This paper shows that by using a deep learning approach, it is possible to bypass channel estimation and to schedule links efficiently based solely on the geographic locations of the transmitters and the receivers, because in many propagation environments the wireless channel strength is largely a function of the distance-dependent path loss. This is accomplished by unsupervised training over randomly deployed networks and by using a novel neural network architecture that computes the geographic spatial convolutions of the interfering or interfered neighboring nodes, followed by multiple feedback stages to learn the optimum solution. The resulting neural network gives near-optimal performance for sum-rate maximization and is capable of generalizing to larger deployment areas and to deployments of different link densities. Moreover, to provide fairness, this paper proposes a novel scheduling approach that applies the sum-rate optimal scheduling algorithm over judiciously chosen subsets of links to maximize a proportional fairness objective over the network. The proposed approach shows highly competitive and generalizable network utility maximization results.
Submitted 4 February, 2021; v1 submitted 4 August, 2018;
originally announced August 2018.