
Showing 1–12 of 12 results for author: Siriwardena, Y M

  1. A Multimodal Framework for the Assessment of the Schizophrenia Spectrum

    Authors: Gowtham Premananth, Yashish M. Siriwardena, Philip Resnik, Sonia Bansal, Deanna L. Kelly, Carol Espy-Wilson

    Abstract: This paper presents a novel multimodal framework to distinguish between different symptom classes of subjects in the schizophrenia spectrum and healthy controls using audio, video, and text modalities. We implemented Convolutional Neural Network and Long Short-Term Memory based unimodal models and experimented with various multimodal fusion approaches to arrive at the proposed framework. We utilize…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted to be presented at Interspeech 2024

  2. arXiv:2406.05947  [pdf, other]

    eess.AS

    Accent Conversion with Articulatory Representations

    Authors: Yashish M. Siriwardena, Nathan Swedlow, Audrey Howard, Evan Gitterman, Dan Darcy, Carol Espy-Wilson, Andrea Fanelli

    Abstract: Conversion of non-native accented speech to native (American) English has a wide range of applications such as improving intelligibility of non-native speech. Previous work in this domain has used phonetic posteriorgrams as the target speech representation to train an acoustic model which is then used to extract a compact representation of input speech for accent conversion. In this work, we introd…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: Accepted at INTERSPEECH 2024

  3. arXiv:2309.15136  [pdf, other]

    eess.SP cs.MM cs.SD eess.AS eess.IV

    A multi-modal approach for identifying schizophrenia using cross-modal attention

    Authors: Gowtham Premananth, Yashish M. Siriwardena, Philip Resnik, Carol Espy-Wilson

    Abstract: This study focuses on how different modalities of human communication can be used to distinguish between healthy controls and subjects with schizophrenia who exhibit strong positive symptoms. We developed a multi-modal schizophrenia classification system using audio, video, and text. Facial action units and vocal tract variables were extracted as low-level features from video and audio respectivel…

    Submitted 18 April, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

    Comments: Accepted to Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2024

  4. arXiv:2309.09220  [pdf, other]

    eess.AS cs.AI cs.SD

    Improving Speech Inversion Through Self-Supervised Embeddings and Enhanced Tract Variables

    Authors: Ahmed Adel Attia, Yashish M. Siriwardena, Carol Espy-Wilson

    Abstract: The performance of deep learning models depends significantly on their capacity to encode input features efficiently and decode them into meaningful outputs. Better input and output representation has the potential to boost models' performance and generalization. In the context of acoustic-to-articulatory speech inversion (SI) systems, we study the impact of utilizing speech representations acquir…

    Submitted 7 September, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

  5. arXiv:2306.00203  [pdf, ps, other]

    eess.AS

    Speaker-independent Speech Inversion for Estimation of Nasalance

    Authors: Yashish M. Siriwardena, Carol Espy-Wilson, Suzanne Boyce, Mark K. Tiede, Liran Oren

    Abstract: The velopharyngeal (VP) valve regulates the opening between the nasal and oral cavities. This valve opens and closes through a coordinated motion of the velum and pharyngeal walls. Nasalance is an objective measure derived from the oral and nasal acoustic signals that correlate with nasality. In this work, we evaluate the degree to which the nasalance measure reflects fine-grained patterns of VP m…

    Submitted 31 May, 2023; originally announced June 2023.

    Comments: Interspeech 2023

  6. Acoustic-to-Articulatory Speech Inversion Features for Mispronunciation Detection of /r/ in Child Speech Sound Disorders

    Authors: Nina R Benway, Yashish M Siriwardena, Jonathan L Preston, Elaine Hitchcock, Tara McAllister, Carol Espy-Wilson

    Abstract: Acoustic-to-articulatory speech inversion could enhance automated clinical mispronunciation detection to provide detailed articulatory feedback unattainable by formant-based mispronunciation detection algorithms; however, it is unclear to what extent a speech inversion system trained on adult speech performs in the context of (1) child and (2) clinical speech. In the absence of an articulator…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: *denotes equal contribution. To appear in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023

    Journal ref: Proc. INTERSPEECH 2023, 4568-4572

  7. arXiv:2210.16454  [pdf, ps, other]

    eess.AS cs.SD

    Learning to Compute the Articulatory Representations of Speech with the MIRRORNET

    Authors: Yashish M. Siriwardena, Carol Espy-Wilson, Shihab Shamma

    Abstract: Most organisms including humans function by coordinating and integrating sensory signals with motor actions to survive and accomplish desired tasks. Learning these complex sensorimotor mappings proceeds simultaneously and often in an unsupervised or semi-supervised fashion. An autoencoder architecture (MirrorNet) inspired by this sensorimotor learning paradigm is explored in this work to control a…

    Submitted 25 May, 2023; v1 submitted 28 October, 2022; originally announced October 2022.

    Comments: Interspeech 2023

    Journal ref: Interspeech 2023

  8. arXiv:2210.16450  [pdf, ps, other]

    eess.AS cs.SD

    The Secret Source : Incorporating Source Features to Improve Acoustic-to-Articulatory Speech Inversion

    Authors: Yashish M. Siriwardena, Carol Espy-Wilson

    Abstract: In this work, we incorporated acoustically derived source features, aperiodicity, periodicity and pitch as additional targets to an acoustic-to-articulatory speech inversion (SI) system. We also propose a Temporal Convolution based SI system, which uses auditory spectrograms as the input speech representation, to learn long-range dependencies and complex interactions between the source and vocal t…

    Submitted 28 October, 2022; originally announced October 2022.

  9. Acoustic-to-articulatory Speech Inversion with Multi-task Learning

    Authors: Yashish M. Siriwardena, Ganesh Sivaraman, Carol Espy-Wilson

    Abstract: Multi-task learning (MTL) frameworks have proven to be effective in diverse speech-related tasks like automatic speech recognition (ASR) and speech emotion recognition. This paper proposes an MTL framework to perform acoustic-to-articulatory speech inversion by simultaneously learning an acoustic to phoneme mapping as a shared task. We use the Haskins Production Rate Comparison (HPRC) database whic…

    Submitted 26 May, 2022; originally announced May 2022.

    Journal ref: Proc. Interspeech 2022

  10. arXiv:2205.13086  [pdf, ps, other]

    eess.AS

    Audio Data Augmentation for Acoustic-to-articulatory Speech Inversion using Bidirectional Gated RNNs

    Authors: Yashish M. Siriwardena, Ahmed Adel Attia, Ganesh Sivaraman, Carol Espy-Wilson

    Abstract: Data augmentation has proven to be a promising prospect in improving the performance of deep learning models by adding variability to training data. In previous work on developing a noise-robust acoustic-to-articulatory speech inversion system, we have shown the importance of noise augmentation to improve the performance of speech inversion in noisy speech. In this work, we compare and contrast…

    Submitted 31 May, 2023; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: EUSIPCO 2023

  11. The Mirrornet : Learning Audio Synthesizer Controls Inspired by Sensorimotor Interaction

    Authors: Yashish M. Siriwardena, Guilhem Marion, Shihab Shamma

    Abstract: Experiments to understand the sensorimotor neural interactions in the human cortical speech system support the existence of a bidirectional flow of interactions between the auditory and motor regions. Their key function is to enable the brain to 'learn' how to control the vocal tract for speech production. This idea is the impetus for the recently proposed "MirrorNet", a constrained autoencoder ar…

    Submitted 18 February, 2022; v1 submitted 11 October, 2021; originally announced October 2021.

    Journal ref: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  12. arXiv:2110.04440  [pdf, other]

    eess.AS cs.MM cs.SD

    Multimodal Approach for Assessing Neuromotor Coordination in Schizophrenia Using Convolutional Neural Networks

    Authors: Yashish M. Siriwardena, Chris Kitchen, Deanna L. Kelly, Carol Espy-Wilson

    Abstract: This study investigates the speech articulatory coordination in schizophrenia subjects exhibiting strong positive symptoms (e.g. hallucinations and delusions), using two distinct channel-delay correlation methods. We show that the schizophrenic subjects with strong positive symptoms and who are markedly ill exhibit more complex articulatory coordination patterns in facial and speech gestures than what is o…

    Submitted 8 October, 2021; originally announced October 2021.

    Comments: 5 pages. arXiv admin note: text overlap with arXiv:2102.07054

    Journal ref: Proceedings of the 2021 International Conference on Multimodal Interaction