Showing 1–8 of 8 results for author: Ghorbani, S

  1. arXiv:2310.11004

    eess.AS eess.SP

    Advanced accent/dialect identification and accentedness assessment with multi-embedding models and automatic speech recognition

    Authors: Shahram Ghorbani, John H. L. Hansen

    Abstract: Accurately classifying accents and assessing accentedness in non-native speakers are both challenging tasks due to the complexity and diversity of accent and dialect variations. In this study, embeddings from advanced pre-trained language identification (LID) and speaker identification (SID) models are leveraged to improve the accuracy of accent classification and non-native accentedness assessmen…

    Submitted 17 October, 2023; originally announced October 2023.

    Comments: Submitted to The Journal of the Acoustical Society of America

  2. arXiv:2011.04084

    eess.AS cs.SD eess.IV

    Listen, Look and Deliberate: Visual context-aware speech recognition using pre-trained text-video representations

    Authors: Shahram Ghorbani, Yashesh Gaur, Yu Shi, Jinyu Li

    Abstract: In this study, we try to address the problem of leveraging visual signals to improve Automatic Speech Recognition (ASR), also known as visual context-aware ASR (VC-ASR). We explore novel VC-ASR approaches to leverage video and text representations extracted by a self-supervised pre-trained text-video embedding model. Firstly, we propose a multi-stream attention architecture to leverage signals fro…

    Submitted 8 November, 2020; originally announced November 2020.

    Comments: Accepted at SLT 2021

  3. arXiv:2007.09131

    eess.AS cs.SD eess.SP

    SkipConvNet: Skip Convolutional Neural Network for Speech Dereverberation using Optimally Smoothed Spectral Mapping

    Authors: Vinay Kothapally, Wei Xia, Shahram Ghorbani, John H. L. Hansen, Wei Xue, Jing Huang

    Abstract: The reliability of using fully convolutional networks (FCNs) has been successfully demonstrated by recent studies in many speech applications. One of the most popular variants of these FCNs is the 'U-Net', which is an encoder-decoder network with skip connections. In this study, we propose 'SkipConvNet' where we replace each skip connection with multiple convolutional modules to provide decoder wi…

    Submitted 17 July, 2020; originally announced July 2020.

    Comments: Submitted to Interspeech 2020

  4. MoVi: A Large Multipurpose Motion and Video Dataset

    Authors: Saeed Ghorbani, Kimia Mahdaviani, Anne Thaler, Konrad Kording, Douglas James Cook, Gunnar Blohm, Nikolaus F. Troje

    Abstract: Human movements are both an area of intense study and the basis of many applications such as character animation. For many applications, it is crucial to identify movements from videos or analyze datasets of movements. Here we introduce a new human Motion and Video dataset MoVi, which we make available publicly. It contains 60 female and 30 male actors performing a collection of 20 predefined ever…

    Submitted 3 March, 2020; originally announced March 2020.

  5. arXiv:2001.01656

    eess.AS cs.SD

    Audio-visual Recognition of Overlapped speech for the LRS2 dataset

    Authors: Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

    Abstract: Automatic recognition of overlapped speech remains a highly challenging task to date. Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. Three issues associated with the construction of audio-visual speech recognition (AVSR) systems are addressed. First, the basic architecture designs i.e. end-…

    Submitted 6 January, 2020; originally announced January 2020.

    Comments: 5 pages, 5 figures, submitted to ICASSP 2019

  6. arXiv:1910.00565

    eess.AS cs.CL cs.LG

    Domain Expansion in DNN-based Acoustic Models for Robust Speech Recognition

    Authors: Shahram Ghorbani, Soheil Khorram, John H. L. Hansen

    Abstract: Training acoustic models with sequentially incoming data -- while both leveraging new data and avoiding the forgetting effect -- is an essential obstacle to achieving human intelligence level in speech recognition. An obvious approach to leverage data from a new domain (e.g., new accented speech) is to first generate a comprehensive dataset of all domains, by combining all available data, and then…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted at ASRU, 2019

  7. Leveraging native language information for improved accented speech recognition

    Authors: Shahram Ghorbani, John H. L. Hansen

    Abstract: Recognition of accented speech is a long-standing challenge for automatic speech recognition (ASR) systems, given the increasing worldwide population of bi-lingual speakers with English as their second language. If we consider foreign-accented speech as an interpolation of the native language (L1) and English (L2), using a model that can simultaneously address both languages would perform better a…

    Submitted 18 April, 2019; originally announced April 2019.

    Comments: Accepted at Interspeech 2018

  8. arXiv:1809.06833

    eess.AS

    Advancing Multi-Accented LSTM-CTC Speech Recognition using a Domain Specific Student-Teacher Learning Paradigm

    Authors: Shahram Ghorbani, Ahmet E. Bulut, John H. L. Hansen

    Abstract: Non-native speech causes automatic speech recognition systems to degrade in performance. Past strategies to address this challenge have considered model adaptation, accent classification with a model selection, alternate pronunciation lexicon, etc. In this study, we consider a recurrent neural network (RNN) with connectionist temporal classification (CTC) cost function trained on multi-accent Engl…

    Submitted 1 October, 2019; v1 submitted 18 September, 2018; originally announced September 2018.

    Comments: Accepted at SLT 2018