-
Performance Evaluation of Scheduling Scheme in O-RAN 5G Network using NS-3
Authors:
A. K. Subudhi,
A. Piccioni,
V. Gudepu,
A. Marotta,
F. Graziosi,
R. M. Hegde,
K. Kondepu
Abstract:
The integration of Open Radio Access Network (O-RAN) principles into 5G networks introduces a paradigm shift in how radio resources are managed and optimized. O-RAN's open architecture enables the deployment of intelligent applications (xApps) that can dynamically adapt to varying network conditions and user demands. In this paper, we present radio resource scheduling schemes around which an O-RAN-compliant xApp can be designed. This xApp facilitates the implementation of customized scheduling strategies, tailored to meet the diverse Quality-of-Service (QoS) requirements of emerging 5G use cases, such as enhanced mobile broadband (eMBB), massive machine-type communications (mMTC), and ultra-reliable low-latency communications (URLLC).
We have tested the implemented scheduling schemes within an ns-3 simulation environment integrated with the O-RAN framework. The evaluation covers the Max-Throughput (MT) scheduling policy, which prioritizes resource allocation based on optimal channel conditions, and the Proportional-Fair (PF) scheduling policy, which balances fairness with throughput; both are compared with the default Round Robin (RR) scheduler. In addition, the implemented scheduling schemes support dynamic Time Division Duplex (TDD), allowing flexible configuration of Downlink (DL) and Uplink (UL) switching for bidirectional transmissions and ensuring efficient resource utilization across various scenarios. The results demonstrate the effectiveness of resource allocation under the MT and PF scheduling policies. To assess the efficiency of this resource allocation, we analyzed the Modulation Coding Scheme (MCS), the number of symbols, and the Transmission Time Intervals (TTIs) allocated per user, and compared them with the throughput achieved. The analysis revealed a consistent relationship between these factors and the observed throughput.
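As a rough illustration of how the three policies named in this abstract differ, the following is a minimal sketch in plain Python. It is not the paper's ns-3 or xApp code: the function names, the one-user-per-TTI simplification, the fixed per-user rates, and the smoothing factor `alpha` are all assumptions made for the example.

```python
from typing import Dict, List


def round_robin(users: List[str], tti: int) -> str:
    """Serve users in a fixed cyclic order, ignoring channel quality."""
    return users[tti % len(users)]


def max_throughput(rates: Dict[str, float]) -> str:
    """Serve the user with the highest instantaneous achievable rate."""
    return max(rates, key=lambda u: rates[u])


def proportional_fair(rates: Dict[str, float], avg: Dict[str, float]) -> str:
    """Serve the user with the highest instantaneous-to-average rate ratio."""
    return max(rates, key=lambda u: rates[u] / max(avg[u], 1e-9))


def simulate(policy: str, rates: Dict[str, float], ttis: int,
             alpha: float = 0.1) -> List[str]:
    """Return the user served in each TTI under 'rr', 'mt', or 'pf'.

    Hypothetical toy model: one user per TTI, constant per-user rates.
    """
    users = sorted(rates)
    avg = {u: 1.0 for u in users}  # assumed initial average throughput
    served_log = []
    for tti in range(ttis):
        if policy == "rr":
            served = round_robin(users, tti)
        elif policy == "mt":
            served = max_throughput(rates)
        else:
            served = proportional_fair(rates, avg)
        # Exponential moving average of the throughput each user received.
        avg = {u: (1 - alpha) * avg[u]
               + alpha * (rates[u] if u == served else 0.0)
               for u in users}
        served_log.append(served)
    return served_log
```

With two users whose rates differ widely, MT in this toy model never serves the weaker user, whereas PF eventually does because that user's average throughput decays, which is the fairness and throughput trade-off the abstract refers to.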
Submitted 1 April, 2025;
originally announced April 2025.
-
Learning Speaker-specific Lip-to-Speech Generation
Authors:
Munender Varshney,
Ravindra Yadav,
Vinay P. Namboodiri,
Rajesh M Hegde
Abstract:
Understanding lip movement and inferring speech from it is notoriously difficult for the common person. Accurate lip-reading is helped by various cues from the speaker and their contextual or environmental setting. Every speaker has a different accent and speaking style, which can be inferred from their visual and speech features. This work aims to understand the correlation/mapping between speech and the sequence of lip movements of individual speakers in an unconstrained, large-vocabulary setting. We model the frame sequence as a prior to the transformer in an auto-encoder setting and learn a joint embedding that exploits temporal properties of both audio and video. We learn temporal synchronization using deep metric learning, which guides the decoder to generate speech in sync with input lip movements. The predictive posterior thus gives us the generated speech in the speaker's speaking style. We have trained our model on the GRID and Lip2Wav Chemistry lecture datasets to evaluate single-speaker natural speech generation from lip movement in an unconstrained natural setting. Extensive evaluation using various qualitative and quantitative metrics, together with human evaluation, shows that our method outperforms prior work on the Lip2Wav Chemistry dataset (large vocabulary in an unconstrained setting) by a good margin across almost all evaluation metrics and marginally outperforms the state of the art on the GRID dataset.
Submitted 20 August, 2022; v1 submitted 4 June, 2022;
originally announced June 2022.
-
Stochastic Talking Face Generation Using Latent Distribution Matching
Authors:
Ravindra Yadav,
Ashish Sardana,
Vinay P Namboodiri,
Rajesh M Hegde
Abstract:
The ability to envisage the visual of a talking face based just on hearing a voice is a unique human capability. A number of recent works have attempted to reproduce this ability. We differ from these approaches by enabling a variety of talking face generations based on a single audio input. Indeed, just having the ability to generate a single talking face would make a system almost robotic in nature. In contrast, our unsupervised stochastic audio-to-video generation model allows for diverse generations from a single audio input. In particular, the model can capture multiple modes of the video distribution. We ensure that all the diverse generations are plausible through a principled multi-modal variational autoencoder framework. We demonstrate its efficacy on the challenging LRW and GRID datasets, showing performance better than the baseline while retaining the ability to generate multiple diverse lip-synchronized videos.
Submitted 21 November, 2020;
originally announced November 2020.
-
Speech Prediction in Silent Videos using Variational Autoencoders
Authors:
Ravindra Yadav,
Ashish Sardana,
Vinay P Namboodiri,
Rajesh M Hegde
Abstract:
Understanding the relationship between the auditory and visual signals is crucial for many different applications ranging from computer-generated imagery (CGI) and video editing automation to assisting people with hearing or visual impairments. However, this is challenging since the distributions of both the audio and visual modalities are inherently multimodal. As a result, most of the existing methods ignore the multimodal aspect and assume that there only exists a deterministic one-to-one mapping between the two modalities. This can lead to low-quality predictions, as the model collapses to optimizing the average behavior rather than learning the full data distributions. In this paper, we present a stochastic model for generating speech in a silent video. The proposed model combines recurrent neural networks and variational deep generative models to learn the auditory signal's conditional distribution given the visual signal. We demonstrate the performance of our model on the GRID dataset based on standard benchmarks.
Submitted 14 November, 2020;
originally announced November 2020.
-
A Generalized Framework for Autonomous Calibration of Wheeled Mobile Robots
Authors:
Mohan Krishna Nutalapati,
Lavish Arora,
Anway Bose,
Ketan Rajawat,
Rajesh M Hegde
Abstract:
Robotic calibration allows for the fusion of data from multiple sensors such as odometers, cameras, etc., by providing appropriate transformational relationships between the corresponding reference frames. For wheeled robots equipped with exteroceptive sensors, calibration entails learning the motion model of the sensor or the robot in terms of the odometric data, and must generally be performed prior to performing tasks such as simultaneous localization and mapping (SLAM). Within this context, the current trend is to carry out simultaneous calibration of odometry and sensor without the use of any additional hardware. Building upon the existing simultaneous calibration algorithms, we put forth a generalized calibration framework that can not only handle robots operating in 2D with arbitrary or unknown motion models but also handle outliers in an automated manner. We first propose an algorithm based on the alternating minimization framework applicable to two-wheel differential drive robots. Subsequently, for arbitrary but known drive configurations, we put forth an iteratively re-weighted least squares methodology leveraging an intelligent weighting scheme. Unlike existing works, the proposed algorithms require no manual intervention and seamlessly handle outliers that arise due to both systematic and non-systematic errors. Finally, we put forward a novel Gaussian Process-based non-parametric approach for calibrating wheeled robots with arbitrary or unknown drive configurations. Detailed experiments are performed to demonstrate the accuracy, usefulness, and flexibility of the proposed algorithms.
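To make the idea of iteratively re-weighted least squares concrete, here is a minimal sketch on a toy one-parameter problem (fitting a slope through the origin). The abstract does not specify the paper's weighting scheme or calibration model, so the Huber-style weights, the threshold `delta`, and the function name `irls_slope` are assumptions chosen purely for illustration.

```python
from typing import List


def irls_slope(xs: List[float], ys: List[float],
               delta: float = 1.0, iters: int = 20) -> float:
    """Fit y ~ a * x by iteratively re-weighted least squares.

    Assumed weighting (Huber-style): residuals within `delta` keep weight 1,
    larger residuals are down-weighted in proportion to 1 / |residual|.
    """
    weights = [1.0] * len(xs)
    a = 0.0
    for _ in range(iters):
        # Weighted least-squares solution for the single parameter a.
        num = sum(w * x * y for w, x, y in zip(weights, xs, ys))
        den = sum(w * x * x for w, x in zip(weights, xs))
        a = num / den
        # Recompute weights from the current residuals.
        weights = []
        for x, y in zip(xs, ys):
            r = abs(y - a * x)
            weights.append(1.0 if r <= delta else delta / r)
    return a
```

With `iters=1` the function returns the ordinary least-squares estimate, so the two can be compared on data that contain a gross outlier: the re-weighted estimate stays close to the slope of the inliers without any manual outlier removal.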
Submitted 6 January, 2020;
originally announced January 2020.
-
Model Free Calibration of Wheeled Robots Using Gaussian Process
Authors:
Mohan Krishna Nutalapati,
Lavish Arora,
Anway Bose,
Ketan Rajawat,
Rajesh M Hegde
Abstract:
Robotic calibration allows for the fusion of data from multiple sensors such as odometers, cameras, etc., by providing appropriate relationships between the corresponding reference frames. For wheeled robots equipped with a camera or lidar along with wheel encoders, calibration entails learning the motion model of the sensor or the robot in terms of the data from the encoders and is generally carried out before performing tasks such as simultaneous localization and mapping (SLAM). This work puts forward a novel Gaussian Process-based non-parametric approach for calibrating wheeled robots with arbitrary or unknown drive configurations. The procedure is more general as it learns the entire sensor/robot motion model in terms of odometry measurements. Unlike existing non-parametric approaches, our method relies on measurements from the onboard sensors and hence does not require ground truth information from external motion capture systems. As an alternative, we propose a computationally efficient approach that relies on a linear approximation of the sensor motion model. Finally, we perform experiments to calibrate robots with un-modelled effects to demonstrate the accuracy, usefulness, and flexibility of the proposed approach.
Submitted 25 October, 2019;
originally announced October 2019.
-
Minimum-Phase HRTF Modeling of Pinna Spectral Notches using Group Delay Decomposition
Authors:
Sandeep Reddy C,
Rajesh M Hegde
Abstract:
Accurate reconstruction of HRTFs is important in the development of high-quality binaural sound synthesis systems. Conventionally, minimum-phase HRTF model development for reconstruction of HRTFs has been limited to minimum-phase pure-delay models, which ignore the all-pass component of the HRTF. In this paper, a novel method for minimum-phase HRTF modelling of Pinna Spectral Notches (PSNs) using group delay decomposition is proposed. The proposed model captures the PSNs contributed by both the minimum-phase and all-pass components of the HRTF, thus facilitating an accurate reconstruction of HRTFs. The purely minimum-phase HRTF components and their corresponding spatial angles are first identified using a Fourier Bessel series method that ensures a continuous evolution of the PSNs. The minimum-phase pure-delay model is used to reconstruct the HRTF for these spatial angles. Subsequently, the spatial angles which require both the minimum-phase and all-pass components are modelled using an all-pass filter cascaded with the minimum-phase pure-delay model. Performance of the proposed model is evaluated by conducting experiments on PSN extraction, cross coherence analysis, and binaural synthesis. Both objective and subjective evaluation results are used to indicate the significance of the proposed model in binaural sound synthesis.
Submitted 3 April, 2018; v1 submitted 6 November, 2017;
originally announced November 2017.
-
A Review of Localization and Tracking Algorithms in Wireless Sensor Networks
Authors:
Sudhir Kumar,
Rajesh M. Hegde
Abstract:
In this paper, a comprehensive survey of the pioneering as well as the state-of-the-art localization and tracking methods in wireless sensor networks is presented. Localization is mostly applicable to static sensor nodes, whereas tracking applies to mobile sensor nodes. The localization algorithms are broadly classified as range-based and range-free methods. The estimated range (distance) between an anchor and an unknown node is highly erroneous in an indoor scenario. This limitation can be handled to a large extent by employing a large number of existing access points (APs) in the range-free localization method. Recent works emphasize the use of multi-sensor data such as magnetic, inertial, compass, gyroscope, ultrasound, infrared, visual and/or odometer measurements to improve the localization accuracy further. Additionally, tracking methods predict the future location based on the past location history. A smooth trajectory is obtained even if some of the received measurements are erroneous. Real experimental set-ups such as National Instruments (NI) wireless sensor nodes, Crossbow motes and hand-held devices for carrying out the localization and tracking are also highlighted herein.
Submitted 9 January, 2017;
originally announced January 2017.
-
A Bayesian Approach to Estimation of Speaker Normalization Parameters
Authors:
Dhananjay Ram,
Debasis Kundu,
Rajesh M. Hegde
Abstract:
In this work, a Bayesian approach to speaker normalization is proposed to compensate for the degradation in performance of a speaker-independent speech recognition system. The speaker normalization method proposed herein uses the technique of vocal tract length normalization (VTLN). The VTLN parameters are estimated using a novel Bayesian approach which utilizes the Gibbs sampler, a special type of Markov Chain Monte Carlo method. Additionally, the hyperparameters are estimated using a maximum likelihood approach. This model is used assuming that the human vocal tract can be modeled as a tube of uniform cross section. It captures the variation in length of the vocal tract of different speakers more effectively than the linear model used in the literature. The work has also investigated different methods, such as minimization of Mean Square Error (MSE) and Mean Absolute Error (MAE), for the estimation of VTLN parameters. Both single-pass and two-pass approaches are then used to build a VTLN-based speech recognizer. Experimental results on recognition of vowels and Hindi phrases from a medium vocabulary indicate that the Bayesian method improves the performance by a considerable margin.
Submitted 19 October, 2016;
originally announced October 2016.
-
Second Order Cone Programming for Sensor Node Localization in Mixed LOS/NLOS Conditions
Authors:
Sudhir Kumar,
Rishabh Dixit,
Rajesh M. Hegde
Abstract:
In this paper, a novel method for sensor node localization under mixed line-of-sight/non-line-of-sight (LOS/NLOS) conditions based on second order cone programming (SOCP) is presented. SOCP methods have, hitherto, not been utilized in node localization under mixed LOS/NLOS conditions. Unlike the semidefinite programming (SDP) formulation, SOCP is computationally efficient for resource-constrained ad-hoc sensor networks. The proposed method can work seamlessly in mixed LOS/NLOS conditions. The robustness of the method is due to the fair utilization of all measurements obtained under LOS and NLOS conditions. The computational complexity of this method is quadratic in the number of nearest neighbours of the unknown node. Extensive simulations and real field deployments are used to evaluate the performance of the proposed method. The experimental results of the proposed method are reasonably better when compared to similar methods in the literature.
Submitted 12 August, 2015;
originally announced August 2015.
-
A Complex Matrix Factorization approach to Joint Modeling of Magnitude and Phase for Source Separation
Authors:
Chaitanya Ahuja,
Karan Nathwani,
Rajesh M. Hegde
Abstract:
Conventional NMF methods for source separation factorize the matrix of spectral magnitudes. Spectral phase is not included in the decomposition process of these methods. However, the phase of the speech mixture is generally used in reconstructing the target speech signal. This results in undesired traces of interfering sources in the target signal. In this paper, the spectral phase is incorporated in the decomposition process itself. Additionally, the complex matrix factorization problem is reduced to an NMF problem using simple transformations. This results in effective separation of speech mixtures since both magnitude and phase are utilized jointly in the separation process. Improvements in source separation results are demonstrated using objective quality evaluations on the GRID corpus.
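For orientation, the conventional magnitude-only NMF that this abstract contrasts against can be sketched as follows. This is not the paper's complex matrix factorization: it is a plain-Python version of the standard multiplicative-update rule for the squared Frobenius error, and the helper names, initialisation range, and `eps` safeguard are assumptions made for the example.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def frob_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v: Matrix, rank: int, iters: int = 100,
        seed: int = 0, eps: float = 1e-9) -> Tuple[Matrix, Matrix]:
    """Factorize a non-negative matrix V ~ W H with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h
```

In a source separation setting, V would hold spectrogram magnitudes, the columns of W would act as spectral templates, and the rows of H as their activations over time. Because the updates are multiplicative, the factors stay non-negative, which is why phase cannot be represented directly in this conventional form.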
Submitted 25 November, 2014;
originally announced November 2014.