Showing 1–50 of 125 results for author: Singh, R

Searching in archive eess.
  1. arXiv:2506.20609  [pdf, ps, other]

    cs.SD cs.AI cs.MM eess.AS

    Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings

    Authors: Ankit Shah, Rita Singh, Bhiksha Raj, Alexander Hauptmann

    Abstract: The escalating rates of gun-related violence and mass shootings represent a significant threat to public safety. Timely and accurate information for law enforcement agencies is crucial in mitigating these incidents. Current commercial gunshot detection systems, while effective, often come with prohibitive costs. This research explores a cost-effective alternative by leveraging acoustic analysis of…

    Submitted 25 June, 2025; originally announced June 2025.

    Comments: 4 pages + 1 References

  2. arXiv:2506.18182  [pdf, ps, other]

    cs.SD eess.AS

    Human Voice is Unique

    Authors: Rita Singh, Bhiksha Raj

    Abstract: Voice is increasingly being used as a biometric entity in many applications. These range from speaker identification and verification systems to human profiling technologies that attempt to estimate myriad aspects of the speaker's persona from their voice. However, for an entity to be a true biometric identifier, it must be unique. This paper establishes a first framework for calculating the uniqu…

    Submitted 22 June, 2025; originally announced June 2025.

    Comments: 15 pages, 1 figure, 2 tables

  3. arXiv:2506.16070  [pdf, ps, other]

    eess.SP

    Towards AI-Driven RANs for 6G and Beyond: Architectural Advancements and Future Horizons

    Authors: Mathushaharan Rathakrishnan, Samiru Gayan, Rohit Singh, Amandeep Kaur, Hazer Inaltekin, Sampath Edirisinghe, H. Vincent Poor

    Abstract: It is envisioned that 6G networks will be supported by key architectural principles, including intelligence, decentralization, interoperability, and digitalization. With the advances in artificial intelligence (AI) and machine learning (ML), embedding intelligence into the foundation of wireless communication systems is recognized as essential for 6G and beyond. Existing radio access network (RAN)…

    Submitted 19 June, 2025; originally announced June 2025.

  4. arXiv:2506.09375  [pdf, ps, other]

    cs.CL cs.SD eess.AS

    CoLMbo: Speaker Language Model for Descriptive Profiling

    Authors: Massa Baali, Shuo Han, Syed Abdul Hannan, Purusottam Samal, Karanveer Singh, Soham Deshmukh, Rita Singh, Bhiksha Raj

    Abstract: Speaker recognition systems are often limited to classification tasks and struggle to generate detailed speaker characteristics or provide context-rich descriptions. These models primarily extract embeddings for speaker identification but fail to capture demographic attributes such as dialect, gender, and age in a structured manner. This paper introduces CoLMbo, a Speaker Language Model (SLM) that…

    Submitted 10 June, 2025; originally announced June 2025.

  5. arXiv:2506.08372  [pdf, ps, other]

    cs.SD eess.AS

    Multimodal Zero-Shot Framework for Deepfake Hate Speech Detection in Low-Resource Languages

    Authors: Rishabh Ranjan, Likhith Ayinala, Mayank Vatsa, Richa Singh

    Abstract: This paper introduces a novel multimodal framework for hate speech detection in deepfake audio, excelling even in zero-shot scenarios. Unlike previous approaches, our method uses contrastive learning to jointly align audio and text representations across languages. We present the first benchmark dataset with 127,290 paired text and synthesized speech samples in six languages: English and five low-…

    Submitted 9 June, 2025; originally announced June 2025.

    Comments: Accepted in Interspeech 2025

  6. arXiv:2506.06772  [pdf, ps, other]

    cs.SD eess.AS

    SynHate: Detecting Hate Speech in Synthetic Deepfake Audio

    Authors: Rishabh Ranjan, Kishan Pipariya, Mayank Vatsa, Richa Singh

    Abstract: The rise of deepfake audio and hate speech, powered by advanced text-to-speech, threatens online safety. We present SynHate, the first multilingual dataset for detecting hate speech in synthetic audio, spanning 37 languages. SynHate uses a novel four-class scheme: Real-normal, Real-hate, Fake-normal, and Fake-hate. Built from MuTox and ADIMA datasets, it captures diverse hate speech patterns globa…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Accepted in Interspeech 2025

  7. arXiv:2506.06756  [pdf, ps, other]

    cs.SD eess.AS

    Can Quantized Audio Language Models Perform Zero-Shot Spoofing Detection?

    Authors: Bikash Dutta, Rishabh Ranjan, Shyam Sathvik, Mayank Vatsa, Richa Singh

    Abstract: Quantization is essential for deploying large audio language models (LALMs) efficiently in resource-constrained environments. However, its impact on complex tasks, such as zero-shot audio spoofing detection, remains underexplored. This study evaluates the zero-shot capabilities of five LALMs, GAMA, LTU-AS, MERaLiON, Qwen-Audio, and SALMONN, across three distinct datasets: ASVspoof2019, In-the-Wild…

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: Accepted in Interspeech 2025

  8. arXiv:2505.10048  [pdf, ps, other]

    eess.SY

    Planar Herding of Multiple Evaders with a Single Herder

    Authors: Rishabh Kumar Singh, Debraj Chakraborty

    Abstract: A planar herding problem is considered, where a superior pursuer herds a flock of non-cooperative, inferior evaders around a predefined target point. An inverse square law of repulsion is assumed between the pursuer and each evader. Two classes of pursuer trajectories are proposed: (i) a constant angular-velocity spiral, and (ii) a constant angular-velocity circle, both centered around the target…

    Submitted 15 May, 2025; originally announced May 2025.

    Comments: 12 pages, 16 figures

  9. arXiv:2505.00750  [pdf]

    cs.SD eess.AS

    GVPT -- A software for guided visual pitch tracking

    Authors: Hyunjin Cho, Farhad Tabasi, Jeremy D. Greenlee, Rahul Singh

    Abstract: GVPT (Guided visual pitch tracking) is a publicly available, real-time pitch tracking software designed to guide and evaluate vocal pitch control using visual feedback. Developed for clinical and research applications, the system presents various visual target pitch contour and overlays the subject's pitch in real-time to promote accurate vocal reproduction. GVPT supports difficulty modification,…

    Submitted 30 April, 2025; originally announced May 2025.

  10. arXiv:2504.00276  [pdf, other]

    eess.SY

    On-the-fly Surrogation for Complex Nonlinear Dynamics

    Authors: E. Javier Olucha, Rajiv Singh, Amritam Das, Roland Tóth

    Abstract: High-fidelity models are essential for accurately capturing nonlinear system dynamics. However, simulation of these models is often computationally too expensive and, due to their complexity, they are not directly suitable for analysis, control design or real-time applications. Surrogate modelling techniques seek to construct simplified representations of these systems with minimal complexity, but…

    Submitted 3 April, 2025; v1 submitted 31 March, 2025; originally announced April 2025.

    Comments: Preprint submitted to the 2025 64th IEEE Conference on Decision and Control (CDC)

  11. arXiv:2503.08540  [pdf, other]

    cs.SD cs.AI eess.AS

    Mellow: a small audio language model for reasoning

    Authors: Soham Deshmukh, Satvik Dixit, Rita Singh, Bhiksha Raj

    Abstract: Multimodal Audio-Language Models (ALMs) can understand and reason over both audio and text. Typically, reasoning performance correlates with model size, with the best results achieved by models exceeding 8 billion parameters. However, no prior work has explored enabling small audio-language models to perform reasoning tasks, despite the potential applications for edge devices. To address this gap,…

    Submitted 11 March, 2025; originally announced March 2025.

    Comments: Checkpoint and dataset available at: https://github.com/soham97/mellow

  12. arXiv:2502.04476  [pdf, other]

    cs.SD cs.AI eess.AS

    ADIFF: Explaining audio difference using natural language

    Authors: Soham Deshmukh, Shuo Han, Rita Singh, Bhiksha Raj

    Abstract: Understanding and explaining differences between audio recordings is crucial for fields like audio forensics, quality assessment, and audio generation. This involves identifying and describing audio events, acoustic scenes, signal characteristics, and their emotional impact on listeners. This paper stands out as the first work to comprehensively study the task of explaining audio differences and t…

    Submitted 6 February, 2025; originally announced February 2025.

    Comments: Accepted at ICLR 2025. Dataset and checkpoints are available at: https://github.com/soham97/ADIFF

  13. arXiv:2501.09416  [pdf, other]

    eess.SP

    Physical Layer Design for Ambient IoT

    Authors: Rohit Singh, Anil Kumar Yerrapragada, Radha Krishna Ganti

    Abstract: There is a growing demand for ultra low power and ultra low complexity devices for applications which require maintenance-free and battery-less operation. One way to serve such applications is through backscatter devices, which communicate using energy harvested from ambient sources such as radio waves transmitted by a reader. Traditional backscatter devices, such as RFID, are limited by range, in…

    Submitted 16 January, 2025; originally announced January 2025.

    Comments: 8 pages, 5 figures, 1 table

  14. arXiv:2501.09229  [pdf, other]

    cs.LG cs.SD eess.AS

    Tessellated Linear Model for Age Prediction from Voice

    Authors: Dareen Alharthi, Mahsa Zamani, Bhiksha Raj, Rita Singh

    Abstract: Voice biometric tasks, such as age estimation require modeling the often complex relationship between voice features and the biometric variable. While deep learning models can handle such complexity, they typically require large amounts of accurately labeled data to perform well. Such data are often scarce for biometric tasks such as voice-based age prediction. On the other hand, simpler models li…

    Submitted 27 January, 2025; v1 submitted 15 January, 2025; originally announced January 2025.

    Comments: Accepted at ICASSP 2025

  15. arXiv:2501.06881  [pdf, ps, other]

    eess.SP

    Gaussian Integral based Bayesian Smoother

    Authors: Rohit Kumar Singh, Kundan Kumar, Shovan Bhaumik

    Abstract: This work introduces the Gaussian integration to address a smoothing problem of a nonlinear stochastic state space model. The probability densities of states at each time instant are assumed to be Gaussian, and their means and covariances are evaluated by utilizing the odd-even properties of Gaussian integral, which are further utilized to realize Rauch-Tung-Striebel (RTS) smoothing expressions. G…

    Submitted 12 January, 2025; originally announced January 2025.

  16. arXiv:2411.08919  [pdf, other]

    eess.SP cs.AI cs.IT cs.LG

    A Machine Learning based Hybrid Receiver for 5G NR PRACH

    Authors: Rohit Singh, Anil Kumar Yerrapragada, Radha Krishna Ganti

    Abstract: Random Access is a critical procedure using which a User Equipment (UE) identifies itself to a Base Station (BS). Random Access starts with the UE transmitting a random preamble on the Physical Random Access Channel (PRACH). In a conventional BS receiver, the UE's specific preamble is identified by correlation with all the possible preambles. The PRACH signal is also used to estimate the timing ad…

    Submitted 3 November, 2024; originally announced November 2024.

    Comments: 6 pages, 9 figures

  17. arXiv:2410.12948  [pdf, other]

    cs.CL cs.SD eess.AS

    What Do Speech Foundation Models Not Learn About Speech?

    Authors: Abdul Waheed, Hanin Atwany, Bhiksha Raj, Rita Singh

    Abstract: Understanding how speech foundation models capture non-verbal cues is crucial for improving their interpretability and adaptability across diverse tasks. In our work, we analyze several prominent models such as Whisper, Seamless, Wav2Vec, HuBERT, and Qwen2-Audio focusing on their learned representations in both paralinguistic and non-paralinguistic tasks from the Dynamic-SUPERB benchmark. Our stud…

    Submitted 16 October, 2024; originally announced October 2024.

    Comments: 20 Pages

  18. arXiv:2410.09578  [pdf, ps, other]

    cs.SD eess.AS

    Objective Measurements of Voice Quality

    Authors: Hira Dhamyal, Rita Singh

    Abstract: The quality of human voice plays an important role across various fields like music, speech therapy, and communication, yet it lacks a universally accepted, objective definition. Instead, voice quality is referred to using subjective descriptors like "rough", "breathy" etc. Despite this subjectivity, extensive research across disciplines has linked these voice qualities to specific information abo…

    Submitted 12 October, 2024; originally announced October 2024.

  19. arXiv:2410.08084  [pdf, other]

    eess.IV

    Color-Guided Flying Pixel Correction in Depth Images

    Authors: Ekamresh Vasudevan, Shashank N. Sridhara, Eduardo Pavez, Antonio Ortega, Raghavendra Singh, Srinath Kalluri

    Abstract: We present a novel method to correct flying pixels within data captured by Time-of-flight (ToF) sensors. Flying pixel (FP) artifacts occur when signals from foreground and background objects reach the same sensor pixel, leading to a confident yet incorrect depth estimation in space - floating between two objects. Commercial RGB-D cameras have a complementary setup consisting of ToF sensors to capt…

    Submitted 10 October, 2024; originally announced October 2024.

    Comments: 6 pages, 7 figures, Presented at IEEE 26th International Workshop on Multimedia Signal Processing (MMSP)

  20. arXiv:2410.05037  [pdf, other]

    cs.SD eess.AS

    Improving Speaker Representations Using Contrastive Losses on Multi-scale Features

    Authors: Satvik Dixit, Massa Baali, Rita Singh, Bhiksha Raj

    Abstract: Speaker verification systems have seen significant advancements with the introduction of Multi-scale Feature Aggregation (MFA) architectures, such as MFA-Conformer and ECAPA-TDNN. These models leverage information from various network depths by concatenating intermediate feature maps before the pooling and projection layers, demonstrating that even shallower feature maps encode valuable speaker-sp…

    Submitted 7 October, 2024; originally announced October 2024.

  21. arXiv:2410.03904  [pdf, other]

    cs.SD cs.AI eess.AS

    Did You Hear That? Introducing AADG: A Framework for Generating Benchmark Data in Audio Anomaly Detection

    Authors: Ksheeraja Raghavan, Samiran Gode, Ankit Shah, Surabhi Raghavan, Wolfram Burgard, Bhiksha Raj, Rita Singh

    Abstract: We introduce a novel, general-purpose audio generation framework specifically designed for anomaly detection and localization. Unlike existing datasets that predominantly focus on industrial and machine-related sounds, our framework focuses a broader range of environments, particularly useful in real-world scenarios where only audio data are available, such as in video-derived or telephonic audio.…

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: 9 pages, under review

  22. arXiv:2410.00047  [pdf, other]

    eess.IV cs.LG q-bio.NC

    Looking through the mind's eye via multimodal encoder-decoder networks

    Authors: Arman Afrasiyabi, Erica Busch, Rahul Singh, Dhananjay Bhaskar, Laurent Caplette, Nicholas Turk-Browne, Smita Krishnaswamy

    Abstract: In this work, we explore the decoding of mental imagery from subjects using their fMRI measurements. In order to achieve this decoding, we first created a mapping between a subject's fMRI signals elicited by the videos the subjects watched. This mapping associates the high dimensional fMRI activation states with visual imagery. Next, we prompted the subjects textually, primarily with emotion label…

    Submitted 27 September, 2024; originally announced October 2024.

  23. arXiv:2409.07642  [pdf, other]

    cs.LG eess.SY

    Deep Learning of Dynamic Systems using System Identification Toolbox(TM)

    Authors: Tianyu Dai, Khaled Aljanaideh, Rong Chen, Rajiv Singh, Alec Stothert, Lennart Ljung

    Abstract: MATLAB(R) releases over the last 3 years have witnessed a continuing growth in the dynamic modeling capabilities offered by the System Identification Toolbox(TM). The emphasis has been on integrating deep learning architectures and training techniques that facilitate the use of deep neural networks as building blocks of nonlinear models. The toolbox offers neural state-space models which can be ex…

    Submitted 11 September, 2024; originally announced September 2024.

    Journal ref: IFAC-PapersOnLine, July 2024, 20th IFAC Symposium on System Identification SYSID 2024

  24. arXiv:2409.01066  [pdf, other]

    cs.AI eess.SY

    Learning in Hybrid Active Inference Models

    Authors: Poppy Collis, Ryan Singh, Paul F Kinghorn, Christopher L Buckley

    Abstract: An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work in computational neuroscience has considered this functional integration of discrete and continuous variables during decision-making under the formalism of active inference (Parr, Friston & de Vries, 2017; Parr & Friston, 2018)…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 11 pages (+ appendix). Accepted to the International Workshop on Active Inference 2024. arXiv admin note: substantial text overlap with arXiv:2408.10970

  25. arXiv:2408.14927  [pdf, other]

    eess.IV cs.CV

    Automatic Detection of COVID-19 from Chest X-ray Images Using Deep Learning Model

    Authors: Alloy Das, Rohit Agarwal, Rituparna Singh, Arindam Chowdhury, Debashis Nandi

    Abstract: The infectious disease caused by novel corona virus (2019-nCoV) has been widely spreading since last year and has shaken the entire world. It has caused an unprecedented effect on daily life, global economy and public health. Hence this disease detection has life-saving importance for both patients as well as doctors. Due to limited test kits, it is also a daunting task to test every patient with…

    Submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted in AIP Conference Proceedings (Vol. 2424, No. 1)

  26. arXiv:2408.10970  [pdf, other]

    cs.AI eess.SY

    Hybrid Recurrent Models Support Emergent Descriptions for Hierarchical Planning and Control

    Authors: Poppy Collis, Ryan Singh, Paul F Kinghorn, Christopher L Buckley

    Abstract: An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work has demonstrated that a class of hybrid state-space model known as recurrent switching linear dynamical systems (rSLDS) discover meaningful behavioural units via the piecewise linear decomposition of complex continuous dynamics…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 4 pages, 3 figures

  27. arXiv:2408.07277  [pdf, other]

    cs.CL cs.HC cs.SD eess.AS

    Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?

    Authors: Roshan Sharma, Suwon Shon, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

    Abstract: Reference summaries for abstractive speech summarization require human annotation, which can be performed by listening to an audio recording or by reading textual transcripts of the recording. In this paper, we examine whether summaries based on annotators listening to the recordings differ from those based on annotators reading transcripts. Using existing intrinsic evaluation based on human evalu…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to ACL 2024 Main Conference

  28. arXiv:2407.19265  [pdf, other]

    cs.SD cs.LG eess.AS

    Towards Robust Few-shot Class Incremental Learning in Audio Classification using Contrastive Representation

    Authors: Riyansha Singh, Parinita Nema, Vinod K Kurmi

    Abstract: In machine learning applications, gradual data ingress is common, especially in audio processing where incremental learning is vital for real-time analytics. Few-shot class-incremental learning addresses challenges arising from limited incoming data. Existing methods often integrate additional trainable components or rely on a fixed embedding extractor post-training on base sessions to mitigate co…

    Submitted 7 August, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

    Comments: INTERSPEECH 2024 accepted

  29. arXiv:2407.18062  [pdf, other]

    cs.SD eess.AS

    Audio Entailment: Assessing Deductive Reasoning for Audio Understanding

    Authors: Soham Deshmukh, Shuo Han, Hazim Bukhari, Benjamin Elizalde, Hannes Gamper, Rita Singh, Bhiksha Raj

    Abstract: Recent literature uses language to build foundation models for audio. These Audio-Language Models (ALMs) are trained on a vast number of audio-text pairs and show remarkable performance in tasks including Text-to-Audio Retrieval, Captioning, and Question Answering. However, their ability to engage in more complex open-ended tasks, like Interactive Question-Answering, requires proficiency in logica…

    Submitted 25 July, 2024; originally announced July 2024.

  30. arXiv:2407.15300  [pdf, other]

    cs.SD eess.AS

    SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios

    Authors: Hazim Bukhari, Soham Deshmukh, Hira Dhamyal, Bhiksha Raj, Rita Singh

    Abstract: Speech Emotion Recognition (SER) has been traditionally formulated as a classification task. However, emotions are generally a spectrum whose distribution varies from situation to situation leading to poor Out-of-Domain (OOD) performance. We take inspiration from statistical formulation of Automatic Speech Recognition (ASR) and formulate the SER task as generating the most likely sequence of text…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted at INTERSPEECH 2024

  31. arXiv:2405.13370  [pdf, other]

    eess.IV cs.CV cs.LG

    Low-Resolution Chest X-ray Classification via Knowledge Distillation and Multi-task Learning

    Authors: Yasmeena Akhter, Rishabh Ranjan, Richa Singh, Mayank Vatsa

    Abstract: This research addresses the challenges of diagnosing chest X-rays (CXRs) at low resolutions, a common limitation in resource-constrained healthcare settings. High-resolution CXR imaging is crucial for identifying small but critical anomalies, such as nodules or opacities. However, when images are downsized for processing in Computer-Aided Diagnosis (CAD) systems, vital spatial details and receptiv…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: IEEE ISBI 2024

  32. arXiv:2405.09101  [pdf, other]

    cs.RO eess.SY

    Adaptive Koopman Embedding for Robust Control of Complex Nonlinear Dynamical Systems

    Authors: Rajpal Singh, Chandan Kumar Sah, Jishnu Keshavan

    Abstract: The discovery of linear embedding is the key to the synthesis of linear control techniques for nonlinear systems. In recent years, while Koopman operator theory has become a prominent approach for learning these linear embeddings through data-driven methods, these algorithms often exhibit limitations in generalizability beyond the distribution captured by training data and are not robust to change…

    Submitted 20 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

    Comments: Corrected the title

  33. arXiv:2405.05937  [pdf, other]

    eess.SP eess.SY

    Dynamics of a Towed Cable with Sensor-Array for Underwater Target Motion Analysis

    Authors: Rohit Kumar Singh, Subrata Kumar, Shovan Bhaumik

    Abstract: During a war situation, many times an underwater target motion analysis (TMA) is performed using bearing-only measurements, obtained from a sensor array, which is towed by an own-ship with the help of a connected cable. It is well known that the own-ship is required to perform a manoeuvre in order to make the system observable and localise the target successfully. During the maneuver, it is import…

    Submitted 9 May, 2024; originally announced May 2024.

  34. arXiv:2405.05676  [pdf, other]

    eess.SP

    Maximum Correntropy Polynomial Chaos Kalman Filter for Underwater Navigation

    Authors: Rohit Kumar Singh, Joydeb Saha, Shovan Bhaumik

    Abstract: This paper develops an underwater navigation solution that utilizes a strapdown inertial navigation system (SINS) and fuses a set of auxiliary sensors such as an acoustic positioning system, Doppler velocity log, depth meter, attitude meter, and magnetometer to accurately estimate an underwater vessel's position and orientation. The conventional integrated navigation system assumes Gaussian measur…

    Submitted 9 May, 2024; originally announced May 2024.

  35. arXiv:2403.15248  [pdf, other]

    cs.CV cs.AI eess.IV

    Self-Supervised Backbone Framework for Diverse Agricultural Vision Tasks

    Authors: Sudhir Sornapudi, Rajhans Singh

    Abstract: Computer vision in agriculture is game-changing with its ability to transform farming into a data-driven, precise, and sustainable industry. Deep learning has empowered agriculture vision to analyze vast, complex visual data, but heavily rely on the availability of large annotated datasets. This remains a bottleneck as manual labeling is error-prone, time-consuming, and expensive. The lack of effi…

    Submitted 22 March, 2024; originally announced March 2024.

  36. arXiv:2402.15707  [pdf, other]

    eess.SP quant-ph

    A Quick Guide to Quantum Communication

    Authors: Rohit Singh, Roshan M. Bodile

    Abstract: This article provides a quick overview of quantum communication, bringing together several innovative aspects of quantum enabled transmission. We first take a neutral look at the role of quantum communication, presenting its importance for the forthcoming wireless. Then, we summarise the principles and basic mechanisms involved in quantum communication, including quantum entanglement, quantum supe…

    Submitted 23 February, 2024; originally announced February 2024.

  37. arXiv:2402.09585  [pdf, other]

    cs.SD eess.AS

    Domain Adaptation for Contrastive Audio-Language Models

    Authors: Soham Deshmukh, Rita Singh, Bhiksha Raj

    Abstract: Audio-Language Models (ALM) aim to be general-purpose audio models by providing zero-shot capabilities at test time. The zero-shot performance of ALM improves by using suitable text prompts for each domain. The text prompts are usually hand-crafted through an ad-hoc process and lead to a drop in ALM generalization and out-of-distribution performance. Existing approaches to improve domain performan…

    Submitted 21 July, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted at INTERSPEECH 2024

  38. arXiv:2402.09244  [pdf, other]

    eess.SP

    Zero-energy Devices for 6G: Technical Enablers at a Glance

    Authors: Onel López, Ritesh Kumar Singh, Dinh-Thuy Phan-Huy, Efstathios Katranaras, Nafiseh Mazloum, Riku Jäntti, Hamza Khan, Osmel Rosabal, Pavlos Alexias, Prasoon Raghuwanshi, David Ruiz-Guirola, Bikramjit Singh, Andreas Höglund, Dung Pham Van, Amirhossein Azarbahram, Jeroen Famaey

    Abstract: Low-cost, resource-constrained, maintenance-free, and energy-harvesting (EH) Internet of Things (IoT) devices, referred to as zero-energy devices (ZEDs), are rapidly attracting attention from industry and academia due to their myriad of applications. To date, such devices remain primarily unsupported by modern IoT connectivity solutions due to their intrinsic fabrication, hardware, deployment, and…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: 8 pages, 4 Figures

  39. arXiv:2402.00282  [pdf, other]

    eess.AS cs.SD

    PAM: Prompting Audio-Language Models for Audio Quality Assessment

    Authors: Soham Deshmukh, Dareen Alharthi, Benjamin Elizalde, Hannes Gamper, Mahmoud Al Ismail, Rita Singh, Bhiksha Raj, Huaming Wang

    Abstract: While audio quality is a key performance metric for various audio processing tasks, including generative modeling, its objective measurement remains a challenge. Audio-Language Models (ALMs) are pre-trained on audio-text pairs that may contain information about audio quality, the presence of artifacts, or noise. Given an audio input and a text prompt related to quality, an ALM can be used to calcu…

    Submitted 31 January, 2024; originally announced February 2024.

  40. arXiv:2401.12803  [pdf, other]

    cs.IT cs.AI cs.LG eess.SP

    Enhancements for 5G NR PRACH Reception: An AI/ML Approach

    Authors: Rohit Singh, Anil Kumar Yerrapragada, Jeeva Keshav S, Radha Krishna Ganti

    Abstract: Random Access is an important step in enabling the initial attachment of a User Equipment (UE) to a Base Station (gNB). The UE identifies itself by embedding a Preamble Index (RAPID) in the phase rotation of a known base sequence, which it transmits on the Physical Random Access Channel (PRACH). The signal on the PRACH also enables the estimation of propagation delay, often known as Timing Advance…

    Submitted 12 January, 2024; originally announced January 2024.

  41. arXiv:2310.13817  [pdf, other]

    eess.SY

    Deep Learning Based Forecasting-Aided State Estimation in Active Distribution Networks

    Authors: Malek Alduhaymi, Ravindra Singh, Firdous Ul Nazir, Bikash C. Pal

    Abstract: Operating an active distribution network (ADN) in the absence of enough measurements, the presence of distributed energy resources, and poor knowledge of responsive demand behaviour is a huge challenge. This paper introduces systematic modelling of demand response behaviour which is then included in Forecasting Aided State Estimation (FASE) for better control of the network. There are several inno…

    Submitted 20 October, 2023; originally announced October 2023.

  42. arXiv:2310.02298  [pdf, other]

    cs.SD cs.AI eess.AS

    Prompting Audios Using Acoustic Properties For Emotion Representation

    Authors: Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

    Abstract: Emotions lie on a continuum, but current models treat emotions as a finite valued discrete variable. This representation does not capture the diversity in the expression of emotion. To better represent emotions we propose the use of natural language descriptions (or prompts). In this work, we address the challenge of automatically generating these prompts and training a model to better learn emoti…

    Submitted 6 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.07737

  43. arXiv:2310.00706  [pdf, other]

    cs.CL cs.SD eess.AS

    Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

    Authors: Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

    Abstract: Modern speech synthesis systems have improved significantly, with synthetic speech being indistinguishable from real speech. However, efficient and holistic evaluation of synthetic speech still remains a significant challenge. Human evaluation using Mean Opinion Score (MOS) is ideal, but inefficient due to high costs. Therefore, researchers have developed auxiliary automatic metrics like Word Erro…

    Submitted 1 October, 2023; originally announced October 2023.

  44. arXiv:2309.13544  [pdf]

    cs.IR cs.AI cs.LG cs.SD eess.AS

    Related Rhythms: Recommendation System To Discover Music You May Like

    Authors: Rahul Singh, Pranav Kanuparthi

    Abstract: Machine Learning models are being utilized extensively to drive recommender systems, which is a widely explored topic today. This is especially true of the music industry, where we are witnessing a surge in growth. Besides a large chunk of active users, these systems are fueled by massive amounts of data. These large-scale systems yield applications that aim to provide a better user experience and…

    Submitted 24 September, 2023; originally announced September 2023.

    ACM Class: I.2.6; H.3.3

  45. arXiv:2309.13227  [pdf, other]

    cs.LG cs.SD eess.AS

    Importance of negative sampling in weak label learning

    Authors: Ankit Shah, Fuyu Tang, Zelin Ye, Rita Singh, Bhiksha Raj

    Abstract: Weak-label learning is a challenging task that requires learning from data "bags" containing positive and negative instances, but only the bag labels are known. The pool of negative instances is usually larger than positive instances, thus making selecting the most informative negative instance critical for performance. Such a selection strategy for negative instances from each bag is an open prob…

    Submitted 22 September, 2023; originally announced September 2023.

  46. arXiv:2309.07372  [pdf, other]

    eess.AS cs.SD

    Training Audio Captioning Models without Audio

    Authors: Soham Deshmukh, Benjamin Elizalde, Dimitra Emmanouilidou, Bhiksha Raj, Rita Singh, Huaming Wang

    Abstract: Automated Audio Captioning (AAC) is the task of generating natural language descriptions given an audio stream. A typical AAC system requires manually curated training data of audio segments and corresponding text caption annotations. The creation of these audio-caption pairs is costly, resulting in general data scarcity for the task. In this work, we address this major limitation and propose an a…

    Submitted 13 September, 2023; originally announced September 2023.

  47. arXiv:2308.14190  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Score-Based Generative Models for PET Image Reconstruction

    Authors: Imraj RD Singh, Alexander Denker, Riccardo Barbano, Željko Kereta, Bangti Jin, Kris Thielemans, Peter Maass, Simon Arridge

    Abstract: Score-based generative models have demonstrated highly promising results for medical image reconstruction tasks in magnetic resonance imaging or computed tomography. However, their application to Positron Emission Tomography (PET) is still largely unexplored. PET image reconstruction involves a variety of challenges, including Poisson noise with high variance and a wide dynamic range. To address t…

    Submitted 23 January, 2024; v1 submitted 27 August, 2023; originally announced August 2023.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:001

    MSC Class: 15A29; 45Q05 ACM Class: I.4.9; J.2; I.2.1

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)

  48. arXiv:2307.13953  [pdf, other]

    cs.CV cs.SD eess.AS

    The Hidden Dance of Phonemes and Visage: Unveiling the Enigmatic Link between Phonemes and Facial Features

    Authors: Liao Qu, Xianwei Zou, Xiang Li, Yandong Wen, Rita Singh, Bhiksha Raj

    Abstract: This work unveils the enigmatic link between phonemes and facial features. Traditional studies on voice-face correlations typically involve using a long period of voice input, including generating face images from voices and reconstructing 3D face meshes from voices. However, in situations like voice-based crimes, the available voice evidence may be short and limited. Additionally, from a physiolo…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Interspeech 2023

  49. arXiv:2307.13948  [pdf, other]

    cs.CV cs.SD eess.AS

    Rethinking Voice-Face Correlation: A Geometry View

    Authors: Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj

    Abstract: Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion. In this paper, we aim to investigate the capability of reconstructing the 3D facial shape from voice from a geometry perspective without any semantic information. We propose a voice-anthropometric mea…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: ACM Multimedia 2023

  50. arXiv:2307.08217  [pdf, other]

    cs.CL cs.SD eess.AS

    BASS: Block-wise Adaptation for Speech Summarization

    Authors: Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj

    Abstract: End-to-end speech summarization has been shown to improve performance over cascade baselines. However, such models are difficult to train on very large inputs (dozens of minutes or hours) owing to compute restrictions and are hence trained with truncated model inputs. Truncation leads to poorer models, and a solution to this problem rests in block-wise modeling, i.e., processing a portion of the i…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: Accepted at Interspeech 2023