-
Spectral Efficiency Maximization for mmWave MIMO-Aided Integrated Sensing and Communication Under Practical Constraints
Authors:
Jitendra Singh,
Anand Mehrotra,
Suraj Srivastava,
Aditya K. Jagannatham,
Lajos Hanzo
Abstract:
A hybrid transmit precoder (TPC) and receive combiner (RC) pair is conceived for millimeter wave (mmWave) multiple-input multiple-output (MIMO) integrated sensing and communication (ISAC) systems. The proposed design considers a practical mean squared error (MSE) constraint between the desired and the achieved beampatterns constructed for identifying radar targets (RTs). To achieve optimal performance, we formulate an optimization problem relying on sum spectral efficiency (SE) maximization of the communication users (CUs), while satisfying certain radar beampattern similarity (RBPS), total transmit power, and constant modulus constraints, where the latter are attributed to the hybrid mmWave MIMO architecture. Since the aforementioned problem is non-convex and intractable, a sequential approach is proposed wherein the TPCs are designed first, followed by the RCs. To deal with the non-convex MSE and constant modulus constraints in the TPC design problem, we propose a majorization and minimization (MM) based Riemannian conjugate gradient (RCG) method, which restricts the tolerable MSE of the beampattern to within a predefined limit. Moreover, the least squares and the zero-forcing methods are adopted for maximizing the sum-SE and for mitigating the multiuser interference (MUI), respectively. Furthermore, to design the RC at each CU, we propose a linear MM-based blind combiner (LMBC) scheme that has a low complexity and does not rely on knowledge of the TPC at the CUs. To achieve user fairness, we further extend the proposed sequential approach for maximizing the geometric mean (GM) of the CUs' rates. Simulation results are presented, which show the superior performance of the proposed hybrid TPC and RC in comparison to the state-of-the-art designs in the mmWave MIMO ISAC systems under consideration.
Submitted 5 June, 2025;
originally announced June 2025.
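The abstract above contrasts sum-SE maximization with geometric-mean (GM) rate maximization for user fairness. The following is a minimal sketch of why the two objectives can rank allocations differently. It is not the authors' algorithm: the SINR values and the function names (`rates_from_sinr`, `sum_se`, `geometric_mean`) are hypothetical and chosen only for illustration.

```python
import math


def rates_from_sinr(sinrs):
    """Per-user spectral efficiency in bit/s/Hz: log2(1 + SINR)."""
    return [math.log2(1.0 + s) for s in sinrs]


def sum_se(rates):
    """Sum spectral efficiency over all users."""
    return sum(rates)


def geometric_mean(rates):
    """Geometric mean of strictly positive rates, computed in the log domain."""
    return math.exp(sum(math.log(r) for r in rates) / len(rates))


# Hypothetical linear-scale SINRs for three CUs (illustrative numbers only).
unfair = rates_from_sinr([255.0, 1.0, 1.0])  # one strong user, two weak users
fair = rates_from_sinr([15.0, 7.0, 7.0])     # more balanced users

print("unfair: sum-SE =", sum_se(unfair), "GM =", geometric_mean(unfair))
print("fair:   sum-SE =", sum_se(fair), "GM =", geometric_mean(fair))
```

In this toy case both allocations have the same sum-SE, but the balanced one has the larger GM, which is the sense in which a GM objective favours fairness.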
-
Data-Aided CSI Estimation Using Affine-Precoded Superimposed Pilots in Orthogonal Time Frequency Space Modulated MIMO Systems
Authors:
Anand Mehrotra,
Suraj Srivastava,
Aditya K. Jagannatham,
Lajos Hanzo
Abstract:
An orthogonal affine-precoded superimposed pilot (AP-SIP)-based architecture is developed for the cyclic prefix (CP)-aided SISO and MIMO orthogonal time frequency space (OTFS) systems relying on arbitrary transmitter-receiver pulse shaping. The data and pilot symbol matrices are affine-precoded and superimposed in the delay-Doppler (DD) domain, followed by the development of an end-to-end DD-domain relationship for the input-output symbols. At the receiver, the decoupled pilot and data symbols are extracted by employing orthogonal precoder matrices, which eliminates the mutual interference. Furthermore, a novel pilot-aided Bayesian learning (PA-BL) technique is conceived for the channel state information (CSI) estimation of SISO OTFS systems based on the expectation-maximization (EM) technique. Subsequently, a data-aided Bayesian learning (DA-BL)-based joint CSI estimation and data detection technique is proposed, which beneficially harnesses the estimated data symbols for improved CSI estimation. In this scenario, our sophisticated data detection rule also integrates the CSI uncertainty of channel estimation into our linear minimum mean square error (LMMSE) detectors. The AP-SIP framework is also extended to MIMO OTFS systems, wherein the DD-domain input matrix is affine-precoded for each transmit antenna (TA). Then an EM algorithm-based PA-BL scheme is derived for simultaneous row-group sparse CSI estimation for this system, followed by our DA-BL scheme that performs joint CSI estimation and data detection. Moreover, the Bayesian Cramer-Rao bounds (BCRBs) are derived for both SISO and MIMO OTFS systems. Finally, simulation results are presented for characterizing the performance of the proposed CSI estimation techniques in a range of typical settings, along with their bit error rate (BER) performance in comparison to an ideal system having perfect CSI.
Submitted 25 May, 2023;
originally announced May 2023.
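The abstract above states that orthogonal precoder matrices let the receiver separate superimposed pilot and data symbols without mutual interference. The toy sketch below illustrates only that orthogonality idea. It assumes an ideal, noiseless identity channel and real-valued precoders built from scaled Hadamard rows; the names `P_d`, `P_p` and the helper functions are hypothetical and do not reproduce the paper's DD-domain system model.

```python
def row_times_matrix(row, mat):
    """Multiply a length-m row vector by an m x n matrix (plain lists)."""
    return [sum(row[i] * mat[i][j] for i in range(len(row)))
            for j in range(len(mat[0]))]


def transpose(mat):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*mat)]


# Hypothetical precoders: scaled rows of a 4x4 Hadamard matrix, chosen so
# that P_d P_d^T = I, P_p P_p^T = I and P_d P_p^T = 0 (mutually orthogonal).
P_d = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
P_p = [[0.5, 0.5, -0.5, -0.5], [0.5, -0.5, -0.5, 0.5]]

data = [1.0, -1.0]   # illustrative data symbols
pilot = [1.0, 1.0]   # illustrative pilot symbols

# Superimpose the precoded data and the precoded pilot in the same resources.
tx = [d + p for d, p in zip(row_times_matrix(data, P_d),
                            row_times_matrix(pilot, P_p))]

# Assumption: identity channel, no noise. Projecting onto each precoder
# recovers its own component and cancels the other one.
pilot_hat = row_times_matrix(tx, transpose(P_p))
data_hat = row_times_matrix(tx, transpose(P_d))

print("pilot_hat =", pilot_hat)
print("data_hat  =", data_hat)
```

With a real channel the projections would return channel-distorted versions of each component, which is where the CSI estimation and detection stages described in the abstract come in.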
-
Federated Self-supervised Speech Representations: Are We There Yet?
Authors:
Yan Gao,
Javier Fernandez-Marques,
Titouan Parcollet,
Abhinav Mehrotra,
Nicholas D. Lane
Abstract:
The ubiquity of microphone-enabled devices has led to large amounts of unlabelled audio data being produced at the edge. The integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees while also advancing the quality and robustness of speech representations. In this paper, we provide a first-of-its-kind systematic study of the feasibility and complexities of training speech SSL models under FL scenarios from the perspective of algorithms, hardware, and systems limits. Despite the high potential of their combination, we find existing system constraints and algorithmic behaviour make SSL and FL systems nearly impossible to build today. Yet critically, our results indicate specific performance bottlenecks and research opportunities that would allow this situation to be reversed. While our analysis suggests that, given existing trends in hardware, hybrid SSL and FL speech systems will not be viable until 2027, we believe this study can act as a roadmap to accelerate work towards reaching this milestone much earlier.
Submitted 19 July, 2022; v1 submitted 6 April, 2022;
originally announced April 2022.
-
Inferring Facing Direction from Voice Signals
Authors:
Yu-Lin Wei,
Rui Li,
Abhinav Mehrotra,
Romit Roy Choudhury,
Nic Lane
Abstract:
Consider a home or office where multiple devices are running voice assistants (e.g., TVs, lights, ovens, refrigerators, etc.). A human user turns to a particular device and gives a voice command, such as "Alexa, can you ...". This paper focuses on the problem of detecting which device the user was facing, and therefore, enabling only that device to respond to the command. Our core intuition emerges from the fact that human voice exhibits a directional radiation pattern, and the orientation of this pattern should influence the signal received at each device. Unfortunately, indoor multipath, unknown user location, and unknown voice signals pose critical hurdles. Through a new algorithm that estimates the line-of-sight (LoS) power from a given signal, combined with beamforming and triangulation, we design a functional solution called CoDIR. Results from 500+ configurations, across 5 rooms and 9 different users, are encouraging. While improvements are necessary, we believe this is an important step forward in a challenging but urgent problem space.
Submitted 28 September, 2021; v1 submitted 27 September, 2021;
originally announced September 2021.
-
Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems
Authors:
Ravichander Vipperla,
Sangjun Park,
Kihyun Choo,
Samin Ishtiaq,
Kyoungbo Min,
Sourav Bhattacharya,
Abhinav Mehrotra,
Alberto Gil C. P. Ramos,
Nicholas D. Lane
Abstract:
LPCNet is an efficient vocoder that combines linear prediction and deep neural network modules to keep the computational complexity low. In this work, we present two techniques to further reduce its complexity, aiming for a low-cost LPCNet vocoder-based neural Text-to-Speech (TTS) System. These techniques are: 1) Sample-bunching, which allows LPCNet to generate more than one audio sample per inference; and 2) Bit-bunching, which reduces the computations in the final layer of LPCNet. With the proposed bunching techniques, LPCNet, in conjunction with a Deep Convolutional TTS (DCTTS) acoustic model, shows a 2.19x improvement over the baseline run-time when running on a mobile device, with a less than 0.1 decrease in TTS mean opinion score (MOS).
Submitted 11 August, 2020;
originally announced August 2020.