Showing 1–18 of 18 results for author: Berisha, V

Searching in archive eess.

  1. arXiv:2503.15627

    eess.AS eess.SP

    A Speech Production Model for Radar: Connecting Speech Acoustics with Radar-Measured Vibrations

    Authors: Isabella Lenz, Yu Rong, Daniel Bliss, Julie Liss, Visar Berisha

    Abstract: Millimeter Wave (mmWave) radar has emerged as a promising modality for speech sensing, offering advantages over traditional microphones. Prior works have demonstrated that radar captures motion signals related to vocal vibrations, but there is a gap in the understanding of the analytical connection between radar-measured vibrations and acoustic speech signals. We establish a mathematical framework…

    Submitted 19 March, 2025; originally announced March 2025.

    Comments: 5 pages, 6 figures, Interspeech Conference

  2. arXiv:2502.01685

    cs.AI cs.CL cs.CV cs.SD eess.AS

    Automated Extraction of Spatio-Semantic Graphs for Identifying Cognitive Impairment

    Authors: Si-Ioi Ng, Pranav S. Ambadi, Kimberly D. Mueller, Julie Liss, Visar Berisha

    Abstract: Existing methods for analyzing linguistic content from picture descriptions for assessment of cognitive-linguistic impairment often overlook the participant's visual narrative path, which typically requires eye tracking to assess. Spatio-semantic graphs are a useful tool for analyzing this narrative path from transcripts alone, however they are limited by the need for manual tagging of content inf…

    Submitted 2 February, 2025; originally announced February 2025.

    Comments: To appear in ICASSP 2025

  3. arXiv:2501.15858

    cs.CL cs.SD eess.AS

    Applications of Artificial Intelligence for Cross-language Intelligibility Assessment of Dysarthric Speech

    Authors: Eunjung Yeo, Julie Liss, Visar Berisha, David Mortensen

    Abstract: Purpose: Speech intelligibility is a critical outcome in the assessment and management of dysarthria, yet most research and clinical practices have focused on English, limiting their applicability across languages. This commentary introduces a conceptual framework--and a demonstration of how it can be implemented--leveraging artificial intelligence (AI) to advance cross-language intelligibility as…

    Submitted 8 May, 2025; v1 submitted 27 January, 2025; originally announced January 2025.

    Comments: 14 pages, 2 figures, 2 tables

  4. arXiv:2410.21640

    eess.AS cs.AI cs.SD

    A Tutorial on Clinical Speech AI Development: From Data Collection to Model Validation

    Authors: Si-Ioi Ng, Lingfeng Xu, Ingo Siegert, Nicholas Cummins, Nina R. Benway, Julie Liss, Visar Berisha

    Abstract: There has been a surge of interest in leveraging speech as a marker of health for a wide spectrum of conditions. The underlying premise is that any neurological, mental, or physical deficits that impact speech production can be objectively assessed via automated analysis of speech. Recent advances in speech-based Artificial Intelligence (AI) models for diagnosing and tracking mental health, cognit…

    Submitted 28 October, 2024; originally announced October 2024.

    Comments: 76 pages, 24 figures

  5. arXiv:2310.17049

    cs.SD cs.AI eess.AS

    Learning Repeatable Speech Embeddings Using An Intra-class Correlation Regularizer

    Authors: Jianwei Zhang, Suren Jayasuriya, Visar Berisha

    Abstract: A good supervised embedding for a specific machine learning task is only sensitive to changes in the label of interest and is invariant to other confounding factors. We leverage the concept of repeatability from measurement theory to describe this property and propose to use the intra-class correlation coefficient (ICC) to evaluate the repeatability of embeddings. We then propose a novel regulariz…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  6. Requirements for Mass Adoption of Assistive Listening Technology by the General Public

    Authors: Thomas B. Kaufmann, Mehdi Foroogozar, Julie Liss, Visar Berisha

    Abstract: Assistive listening systems (ALSs) dramatically increase speech intelligibility and reduce listening effort. It is very likely that essentially everyone, not only individuals with hearing loss, would benefit from the increased signal-to-noise ratio an ALS provides in almost any listening scenario. However, ALSs are rarely used by anyone other than people with severe to profound hearing losses. To…

    Submitted 3 May, 2023; v1 submitted 4 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023

    Journal ref: 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW), Rhodes Island, Greece, 2023, pp. 1-5

  7. arXiv:2211.09858

    cs.SD cs.AI cs.LG eess.AS

    Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection

    Authors: Jianwei Zhang, Julie Liss, Suren Jayasuriya, Visar Berisha

    Abstract: Approximately 1.2% of the world's population has impaired voice production. As a result, automatic dysphonic voice detection has attracted considerable academic and clinical interest. However, existing methods for automated voice assessment often fail to generalize outside the training conditions or to other related applications. In this paper, we propose a deep learning framework for generating a…

    Submitted 26 January, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: This manuscript was submitted on July 6, 2022, to IEEE/ACM Transactions on Audio, Speech, and Language Processing for peer review

  8. arXiv:2210.09334

    eess.AS cs.LG cs.SD

    TorchDIVA: An Extensible Computational Model of Speech Production built on an Open-Source Machine Learning Library

    Authors: Sean Kinahan, Julie Liss, Visar Berisha

    Abstract: The DIVA model is a computational model of speech motor control that combines a simulation of the brain regions responsible for speech production with a model of the human vocal tract. The model is currently implemented in Matlab Simulink; however, this is less than ideal as most of the development in speech technology research is done in Python. This means there is a wealth of machine learning to…

    Submitted 17 October, 2022; originally announced October 2022.

  9. arXiv:2203.10054

    eess.AS

    Consonant-Vowel Transition Models Based on Deep Learning for Objective Evaluation of Articulation

    Authors: Vikram C. Mathad, Julie M. Liss, Kathy Chapman, Nancy Scherer, Visar Berisha

    Abstract: Spectro-temporal dynamics of consonant-vowel (CV) transition regions are considered to provide robust cues related to articulation. In this work, we propose an objective measure of precise articulation, dubbed the objective articulation measure (OAM), by analyzing the CV transitions segmented around vowel onsets. The OAM is derived based on the posteriors of a convolutional neural network pre-trai…

    Submitted 18 March, 2022; originally announced March 2022.

  10. Restoring degraded speech via a modified diffusion model

    Authors: Jianwei Zhang, Suren Jayasuriya, Visar Berisha

    Abstract: There are many deterministic mathematical operations (e.g. compression, clipping, downsampling) that degrade speech quality considerably. In this paper we introduce a neural network architecture, based on a modification of the DiffWave model, that aims to restore the original speech signal. DiffWave, a recently published diffusion-based vocoder, has shown state-of-the-art synthesized speech qualit…

    Submitted 2 September, 2021; v1 submitted 22 April, 2021; originally announced April 2021.

    Journal ref: Proc. Interspeech 2021, 221-225, 2021

  11. arXiv:2009.11354

    eess.AS

    A Deep Learning Algorithm for Objective Assessment of Hypernasality in Children with Cleft Palate

    Authors: Vikram C. Mathad, Nancy Scherer, Kathy Chapman, Julie M. Liss, Visar Berisha

    Abstract: Objectives: Evaluation of hypernasality requires extensive perceptual training by clinicians and extending this training on a large scale internationally is untenable; this compounds the health disparities that already exist among children with cleft. In this work, we present the objective hypernasality measure (OHM), a speech analytics algorithm that automatically measures hypernasality in speech…

    Submitted 23 September, 2020; originally announced September 2020.

  12. arXiv:1911.11360

    eess.AS cs.SD eess.SP

    Robust Estimation of Hypernasality in Dysarthria with Acoustic Model Likelihood Features

    Authors: Michael Saxon, Ayush Tripathi, Yishan Jiao, Julie Liss, Visar Berisha

    Abstract: Hypernasality is a common characteristic symptom across many motor-speech disorders. For voiced sounds, hypernasality introduces an additional resonance in the lower frequencies and, for unvoiced sounds, there is reduced articulatory precision due to air escaping through the nasal cavity. However, the acoustic manifestation of these symptoms is highly variable, making hypernasality estimation very…

    Submitted 5 August, 2020; v1 submitted 26 November, 2019; originally announced November 2019.

    Comments: 12 pages, 9 figures, 2 tables

    Journal ref: IEEE/ACM Trans. on Audio, Speech, and Language Proc. 28 (2020) 2511-2522

  13. arXiv:1906.01157

    cs.CL cs.SD eess.AS eess.SP

    A Review of Automated Speech and Language Features for Assessment of Cognitive and Thought Disorders

    Authors: Rohit Voleti, Julie M. Liss, Visar Berisha

    Abstract: It is widely accepted that information derived from analyzing speech (the acoustic signal) and language production (words and sentences) serves as a useful window into the health of an individual's cognitive ability. In fact, most neuropsychological testing batteries have a component related to speech and language where clinicians elicit speech from patients for subjective evaluation across a broa…

    Submitted 4 November, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

    Report number: J-STSP-AAHD-00183-2019

  14. arXiv:1811.07021

    cs.CL cs.SD eess.AS

    Investigating the Effects of Word Substitution Errors on Sentence Embeddings

    Authors: Rohit Voleti, Julie M. Liss, Visar Berisha

    Abstract: A key initial step in several natural language processing (NLP) tasks involves embedding phrases of text to vectors of real numbers that preserve semantic meaning. To that end, several methods have been recently proposed with impressive results on semantic similarity tasks. However, all of these approaches assume that perfect transcripts are available when generating the embeddings. While this is…

    Submitted 24 April, 2019; v1 submitted 16 November, 2018; originally announced November 2018.

    Comments: 4 Pages, 2 figures. Copyright IEEE 2019. Accepted and to appear in the Proceedings of the 44th International Conference on Acoustics, Speech, and Signal Processing 2019 (IEEE-ICASSP-2019), May 12-17 in Brighton, U.K. Personal use of this material is permitted. However, permission to reprint/republish this material must be obtained from the IEEE

  15. arXiv:1808.01535

    eess.AS cs.CL cs.LG stat.ML

    Triplet Network with Attention for Speaker Diarization

    Authors: Huan Song, Megan Willi, Jayaraman J. Thiagarajan, Visar Berisha, Andreas Spanias

    Abstract: In automatic speech processing systems, speaker diarization is a crucial front-end component to separate segments from different speakers. Inspired by the recent success of deep neural networks (DNNs) in semantic inferencing, triplet loss-based architectures have been successfully used for this problem. However, existing work utilizes conventional i-vectors as the input representation and builds s…

    Submitted 4 August, 2018; originally announced August 2018.

    Comments: Interspeech 2018

  16. arXiv:1807.01738

    eess.AS cs.SD

    Investigating the role of L1 in automatic pronunciation evaluation of L2 speech

    Authors: Ming Tu, Anna Grabek, Julie Liss, Visar Berisha

    Abstract: Automatic pronunciation evaluation plays an important role in pronunciation training and second language education. This field draws heavily on concepts from automatic speech recognition (ASR) to quantify how close the pronunciation of non-native speech is to native-like pronunciation. However, it is known that the formation of accent is related to pronunciation patterns of both the target languag…

    Submitted 4 July, 2018; originally announced July 2018.

    Comments: To appear in Interspeech 2018

  17. arXiv:1804.10325

    eess.AS

    Simulating dysarthric speech for training data augmentation in clinical speech applications

    Authors: Yishan Jiao, Ming Tu, Visar Berisha, Julie Liss

    Abstract: Training machine learning algorithms for speech applications requires large, labeled training data sets. This is problematic for clinical applications where obtaining such data is prohibitively expensive because of privacy concerns or lack of access. As a result, clinical speech applications are typically developed using small data sets with only tens of speakers. In this paper, we propose a metho…

    Submitted 26 April, 2018; originally announced April 2018.

    Comments: Will appear in Proc. of ICASSP 2018

  18. arXiv:1804.08663

    eess.AS cs.SD

    A Discriminative Acoustic-Prosodic Approach for Measuring Local Entrainment

    Authors: Megan M. Willi, Stephanie A. Borrie, Tyson S. Barrett, Ming Tu, Visar Berisha

    Abstract: Acoustic-prosodic entrainment describes the tendency of humans to align or adapt their speech acoustics to each other in conversation. This alignment of spoken behavior has important implications for conversational success. However, modeling the subtle nature of entrainment in spoken dialogue continues to pose a challenge. In this paper, we propose a straightforward definition for local entrainmen…

    Submitted 12 July, 2018; v1 submitted 23 April, 2018; originally announced April 2018.