Showing 1–16 of 16 results for author: Dhamyal, H

  1. arXiv:2410.09578  [pdf, ps, other]

    cs.SD eess.AS

    Objective Measurements of Voice Quality

    Authors: Hira Dhamyal, Rita Singh

    Abstract: The quality of human voice plays an important role across various fields like music, speech therapy, and communication, yet it lacks a universally accepted, objective definition. Instead, voice quality is referred to using subjective descriptors like "rough", "breathy" etc. Despite this subjectivity, extensive research across disciplines has linked these voice qualities to specific information abo…

    Submitted 12 October, 2024; originally announced October 2024.

  2. arXiv:2409.05799  [pdf, other]

    cs.SD cs.CL

    PDAF: A Phonetic Debiasing Attention Framework For Speaker Verification

    Authors: Massa Baali, Abdulhamid Aldoobi, Hira Dhamyal, Rita Singh, Bhiksha Raj

    Abstract: Speaker verification systems are crucial for authenticating identity through voice. Traditionally, these systems focus on comparing feature vectors, overlooking the speech's content. However, this paper challenges this by highlighting the importance of phonetic dominance, a measure of the frequency or duration of phonemes, as a crucial cue in speaker verification. A novel Phoneme Debiasing Attenti…

    Submitted 9 September, 2024; originally announced September 2024.

    Comments: Accepted to SLT

  3. arXiv:2408.07277  [pdf, other]

    cs.CL cs.HC cs.SD eess.AS

    Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?

    Authors: Roshan Sharma, Suwon Shon, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

    Abstract: Reference summaries for abstractive speech summarization require human annotation, which can be performed by listening to an audio recording or by reading textual transcripts of the recording. In this paper, we examine whether summaries based on annotators listening to the recordings differ from those based on annotators reading transcripts. Using existing intrinsic evaluation based on human evalu…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to ACL 2024 Main Conference

  4. arXiv:2407.15300  [pdf, other]

    cs.SD eess.AS

    SELM: Enhancing Speech Emotion Recognition for Out-of-Domain Scenarios

    Authors: Hazim Bukhari, Soham Deshmukh, Hira Dhamyal, Bhiksha Raj, Rita Singh

    Abstract: Speech Emotion Recognition (SER) has been traditionally formulated as a classification task. However, emotions are generally a spectrum whose distribution varies from situation to situation leading to poor Out-of-Domain (OOD) performance. We take inspiration from statistical formulation of Automatic Speech Recognition (ASR) and formulate the SER task as generating the most likely sequence of text…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted at INTERSPEECH 2024

  5. arXiv:2406.10083  [pdf, other]

    cs.CL cs.SD eess.AS

    On the Evaluation of Speech Foundation Models for Spoken Language Understanding

    Authors: Siddhant Arora, Ankita Pasad, Chung-Ming Chien, Jionghao Han, Roshan Sharma, Jee-weon Jung, Hira Dhamyal, William Chen, Suwon Shon, Hung-yi Lee, Karen Livescu, Shinji Watanabe

    Abstract: The Spoken Language Understanding Evaluation (SLUE) suite of benchmark tasks was recently introduced to address the need for open resources and benchmarking of complex spoken language understanding (SLU) tasks, including both classification and sequence generation tasks, on natural speech. The benchmark has demonstrated preliminary success in using pre-trained speech foundation models (SFM) for th…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL Findings 2024

  6. arXiv:2310.04445  [pdf, other]

    cs.CL cs.AI cs.LG

    LoFT: Local Proxy Fine-tuning For Improving Transferability Of Adversarial Attacks Against Large Language Model

    Authors: Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Joseph Konan, Dareen Alharthi, Hazim T Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, Rita Singh

    Abstract: It has been shown that Large Language Model (LLM) alignments can be circumvented by appending specially crafted attack suffixes with harmful queries to elicit harmful responses. To conduct attacks against private target models whose characterization is unknown, public models can be used as proxies to fashion the attack, with successful attacks being transferred from public proxies to private targe…

    Submitted 21 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

  7. arXiv:2310.02298  [pdf, other]

    cs.SD cs.AI eess.AS

    Prompting Audios Using Acoustic Properties For Emotion Representation

    Authors: Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

    Abstract: Emotions lie on a continuum, but current models treat emotions as a finite valued discrete variable. This representation does not capture the diversity in the expression of emotion. To better represent emotions we propose the use of natural language descriptions (or prompts). In this work, we address the challenge of automatically generating these prompts and training a model to better learn emoti…

    Submitted 6 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: arXiv admin note: substantial text overlap with arXiv:2211.07737

  8. arXiv:2310.00706  [pdf, other]

    cs.CL cs.SD eess.AS

    Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

    Authors: Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

    Abstract: Modern speech synthesis systems have improved significantly, with synthetic speech being indistinguishable from real speech. However, efficient and holistic evaluation of synthetic speech still remains a significant challenge. Human evaluation using Mean Opinion Score (MOS) is ideal, but inefficient due to high costs. Therefore, researchers have developed auxiliary automatic metrics like Word Erro…

    Submitted 1 October, 2023; originally announced October 2023.

  9. arXiv:2211.07737  [pdf, other]

    cs.SD cs.LG eess.AS

    Describing emotions with acoustic property prompts for speech emotion recognition

    Authors: Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

    Abstract: Emotions lie on a broad continuum and treating emotions as a discrete number of classes limits the ability of a model to capture the nuances in the continuum. The challenge is how to describe the nuances of emotions and how to enable a model to learn the descriptions. In this work, we devise a method to automatically create a description (or prompt) for a given audio by computing acoustic properti…

    Submitted 14 November, 2022; originally announced November 2022.

  10. arXiv:2210.16642  [pdf, other]

    cs.SD cs.AI cs.CL eess.AS

    Unifying the Discrete and Continuous Emotion labels for Speech Emotion Recognition

    Authors: Roshan Sharma, Hira Dhamyal, Bhiksha Raj, Rita Singh

    Abstract: Traditionally, in paralinguistic analysis for emotion detection from speech, emotions have been identified with discrete or dimensional (continuous-valued) labels. Accordingly, models that have been proposed for emotion detection use one or the other of these label types. However, psychologists like Russell and Plutchik have proposed theories and models that unite these views, maintaining that the…

    Submitted 29 October, 2022; originally announced October 2022.

    Comments: Under Review at ICASSP 2023

  11. arXiv:2206.12568  [pdf, other]

    cs.SD cs.AI eess.AS

    Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

    Authors: Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

    Abstract: This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track. The method of choice utilized a combination of spectro-temporal modulation and self-supervised features, followed by an encoder-decoder network organized in a multitask paradigm. We evaluate…

    Submitted 25 June, 2022; originally announced June 2022.

    Journal ref: Proceedings of the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 2022

  12. arXiv:2204.04802  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice

    Authors: Ankit Shah, Hira Dhamyal, Yang Gao, Daniel Arancibia, Mario Arancibia, Bhiksha Raj, Rita Singh

    Abstract: Lately, there has been a global effort by multiple research groups to detect COVID-19 from voice. Different researchers use different kinds of information from the voice signal to achieve this. Various types of phonated sounds and the sound of cough and breath have all been used with varying degree of success in automated voice-based COVID-19 detection apps. In this paper, we show that detecting C…

    Submitted 25 October, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: Submitted to ICASSP 2022

  13. arXiv:2110.04678  [pdf, other]

    cs.SD cs.AI eess.AS

    An Overview of Techniques for Biomarker Discovery in Voice Signal

    Authors: Rita Singh, Ankit Shah, Hira Dhamyal

    Abstract: This paper reflects on the effect of several categories of medical conditions on human voice, focusing on those that may be hypothesized to have effects on voice, but for which the changes themselves may be subtle enough to have eluded observation in standard analytical examinations of the voice signal. It presents three categories of techniques that can potentially uncover such elusive biomarkers…

    Submitted 9 October, 2021; originally announced October 2021.

    Comments: Last two authors contributed equally to the paper

  14. Masked Proxy Loss For Text-Independent Speaker Verification

    Authors: Jiachen Lian, Aiswarya Vinod Kumar, Hira Dhamyal, Bhiksha Raj, Rita Singh

    Abstract: Open-set speaker recognition can be regarded as a metric learning problem, which is to maximize inter-class variance and minimize intra-class variance. Supervised metric learning can be categorized into entity-based learning and proxy-based learning. Most of the existing metric learning objectives like Contrastive, Triplet, Prototypical, GE2E, etc all belong to the former division, the performance…

    Submitted 24 June, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: Accepted at Interspeech 2021

  15. arXiv:1911.05733  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    The phonetic bases of vocal expressed emotion: natural versus acted

    Authors: Hira Dhamyal, Shahan Ali Memon, Bhiksha Raj, Rita Singh

    Abstract: Can vocal emotions be emulated? This question has been a recurrent concern of the speech community, and has also been vigorously investigated. It has been fueled further by its link to the issue of validity of acted emotion databases. Much of the speech and vocal emotion research has relied on acted emotion databases as valid proxies for studying natural emotions. To create models that generalize…

    Submitted 24 July, 2020; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: 5 pages, 6 figures

  16. arXiv:1910.11386  [pdf, other]

    cs.CL cs.DB cs.HC

    Detecting gender differences in perception of emotion in crowdsourced data

    Authors: Shahan Ali Memon, Hira Dhamyal, Oren Wright, Daniel Justice, Vijaykumar Palat, William Boler, Bhiksha Raj, Rita Singh

    Abstract: Do men and women perceive emotions differently? Popular convictions place women as more emotionally perceptive than men. Empirical findings, however, remain inconclusive. Most prior studies focus on visual modalities. In addition, almost all of the studies are limited to experiments within controlled environments. Generalizability and scalability of these studies has not been sufficiently establis…

    Submitted 4 November, 2019; v1 submitted 24 October, 2019; originally announced October 2019.