Showing 1–21 of 21 results for author: Khorram, S

  1. arXiv:2408.02582  [pdf, other]

    cs.SD cs.AI eess.AS

    Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition

    Authors: Jaeyoung Kim, Han Lu, Soheil Khorram, Anshuman Tripathi, Qian Zhang, Hasim Sak

    Abstract: Modern automatic speech recognition (ASR) systems are typically trained on more than tens of thousands hours of speech data, which is one of the main factors for their great success. However, the distribution of such data is typically biased towards common accents or typical speech patterns. As a result, those systems often poorly perform on atypical accented speech. In this paper, we present acce…

    Submitted 5 August, 2024; originally announced August 2024.

  2. arXiv:2402.17065  [pdf, other]

    cs.CV cs.AI cs.LG

    Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions

    Authors: Saeed Khorram, Mingqi Jiang, Mohamad Shahbazi, Mohamad H. Danesh, Li Fuxin

    Abstract: Despite extensive research on training generative adversarial networks (GANs) with limited training data, learning to generate images from long-tailed training distributions remains fairly unexplored. In the presence of imbalanced multi-class training data, GANs tend to favor classes with more samples, leading to the generation of low-quality and less diverse samples in tail classes. In this study…

    Submitted 16 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

  3. arXiv:2212.06872  [pdf, other]

    cs.CV

    Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods

    Authors: Mingqi Jiang, Saeed Khorram, Li Fuxin

    Abstract: In order to gain insights about the decision-making of different visual recognition backbones, we propose two methodologies, sub-explanation counting and cross-testing, that systematically applies deep explanation algorithms on a dataset-wide basis, and compares the statistics generated from the amount and nature of the explanations. These methodologies reveal the difference among networks in term…

    Submitted 24 June, 2024; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: 25 pages with 37 figures, to be published in CVPR24. Project Webpage: https://mingqij.github.io/projects/cdmmtc/

  4. arXiv:2205.14054  [pdf, other]

    cs.LG

    Contrastive Siamese Network for Semi-supervised Speech Recognition

    Authors: Soheil Khorram, Jaeyoung Kim, Anshuman Tripathi, Han Lu, Qian Zhang, Hasim Sak

    Abstract: This paper introduces contrastive siamese (c-siam) network, an architecture for leveraging unlabeled acoustic data in speech recognition. c-siam is the first network that extracts high-level linguistic information from speech by matching outputs of two identical transformer encoders. It contains augmented and target branches which are trained by: (1) masking inputs and matching outputs with a cont…

    Submitted 27 May, 2022; originally announced May 2022.

  5. arXiv:2203.15064  [pdf, other]

    cs.CV cs.AI cs.LG

    Cycle-Consistent Counterfactuals by Latent Transformations

    Authors: Saeed Khorram, Li Fuxin

    Abstract: CounterFactual (CF) visual explanations try to find images similar to the query image that change the decision of a vision system to a specified outcome. Existing methods either require inference-time optimization or joint training with a generative adversarial model which makes them time-consuming and difficult to use in practice. We propose a novel approach, Cycle-Consistent Counterfactuals by L…

    Submitted 28 March, 2022; originally announced March 2022.

  6. arXiv:2109.06365  [pdf, other]

    cs.CV cs.LG

    From Heatmaps to Structural Explanations of Image Classifiers

    Authors: Li Fuxin, Zhongang Qi, Saeed Khorram, Vivswan Shitole, Prasad Tadepalli, Minsuk Kahng, Alan Fern

    Abstract: This paper summarizes our endeavors in the past few years in terms of explaining image classifiers, with the aim of including negative results and insights we have gained. The paper starts with describing the explainable neural network (XNN), which attempts to extract and visualize several high-level concepts purely from the deep network, without relying on human linguistic concepts. This helps us…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: Submitted to Applied AI Letters

    Journal ref: Applied AI Letters.2021;2:e46

  7. arXiv:2105.00339  [pdf, other]

    cs.LG cs.AI cs.DC

    Stochastic Block-ADMM for Training Deep Networks

    Authors: Saeed Khorram, Xiao Fu, Mohamad H. Danesh, Zhongang Qi, Li Fuxin

    Abstract: In this paper, we propose Stochastic Block-ADMM as an approach to train deep neural networks in batch and online settings. Our method works by splitting neural networks into an arbitrary number of blocks and utilizes auxiliary variables to connect these blocks while optimizing with stochastic gradient descent. This allows training deep networks with non-differentiable constraints where conventiona…

    Submitted 1 May, 2021; originally announced May 2021.

  8. arXiv:2012.15783  [pdf, other]

    cs.CV cs.LG eess.IV

    iGOS++: Integrated Gradient Optimized Saliency by Bilateral Perturbations

    Authors: Saeed Khorram, Tyler Lawson, Fuxin Li

    Abstract: The black-box nature of the deep networks makes the explanation for "why" they make certain predictions extremely challenging. Saliency maps are one of the most widely-used local explanation tools to alleviate this problem. One of the primary approaches for generating saliency maps is by optimizing a mask over the input dimensions so that the output of the network is influenced the most by the mas…

    Submitted 1 May, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

  9. arXiv:2006.03745  [pdf, other]

    cs.LG stat.ML

    Re-understanding Finite-State Representations of Recurrent Policy Networks

    Authors: Mohamad H. Danesh, Anurag Koul, Alan Fern, Saeed Khorram

    Abstract: We introduce an approach for understanding control policies represented as recurrent neural networks. Recent work has approached this problem by transforming such recurrent policy networks into finite-state machines (FSM) and then analyzing the equivalent minimized FSM. While this led to interesting insights, the minimization process can obscure a deeper understanding of a machine's operation by m…

    Submitted 11 July, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: ICML 2021

  10. arXiv:2004.06044  [pdf]

    eess.SP cs.LG

    Sleep Stage Scoring Using Joint Frequency-Temporal and Unsupervised Features

    Authors: Mohamadreza Jafaryani, Saeed Khorram, Vahid Pourahmadi, Minoo Shahbazi

    Abstract: Patients with sleep disorders can better manage their lifestyle if they know about their special situations. Detection of such sleep disorders is usually possible by analyzing a number of vital signals that have been collected from the patients. To simplify this task, a number of Automatic Sleep Stage Recognition (ASSR) methods have been proposed. Most of these methods use temporal-frequency featu…

    Submitted 9 April, 2020; originally announced April 2020.

  11. arXiv:1910.07047  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Analyzing Large Receptive Field Convolutional Networks for Distant Speech Recognition

    Authors: Salar Jafarlou, Soheil Khorram, Vinay Kothapally, John H. L. Hansen

    Abstract: Despite significant efforts over the last few years to build a robust automatic speech recognition (ASR) system for different acoustic settings, the performance of the current state-of-the-art technologies significantly degrades in noisy reverberant environments. Convolutional Neural Networks (CNNs) have been successfully used to achieve substantial improvements in many speech processing applica…

    Submitted 15 October, 2019; originally announced October 2019.

    Comments: ASRU 2019

  12. arXiv:1910.00565  [pdf, ps, other]

    eess.AS cs.CL cs.LG

    Domain Expansion in DNN-based Acoustic Models for Robust Speech Recognition

    Authors: Shahram Ghorbani, Soheil Khorram, John H. L. Hansen

    Abstract: Training acoustic models with sequentially incoming data -- while both leveraging new data and avoiding the forgetting effect -- is an essential obstacle to achieving human intelligence level in speech recognition. An obvious approach to leverage data from a new domain (e.g., new accented speech) is to first generate a comprehensive dataset of all domains, by combining all available data, and then…

    Submitted 1 October, 2019; originally announced October 2019.

    Comments: Accepted at ASRU, 2019

  13. arXiv:1908.01768  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Probabilistic Permutation Invariant Training for Speech Separation

    Authors: Midia Yousefi, Soheil Khorram, John H. L. Hansen

    Abstract: Single-microphone, speaker-independent speech separation is normally performed through two steps: (i) separating the specific speech sources, and (ii) determining the best output-label assignment to find the separation error. The second step is the main obstacle in training neural networks for speech separation. Recently proposed Permutation Invariant Training (PIT) addresses this problem by deter…

    Submitted 4 August, 2019; originally announced August 2019.

    Comments: Interspeech 2019

  14. arXiv:1907.03050  [pdf, other]

    cs.LG cs.HC eess.AS stat.ML

    Jointly Aligning and Predicting Continuous Emotion Annotations

    Authors: Soheil Khorram, Melvin G McInnis, Emily Mower Provost

    Abstract: Time-continuous dimensional descriptions of emotions (e.g., arousal, valence) allow researchers to characterize short-time changes and to capture long-term trends in emotion expression. However, continuous emotion labels are generally not synchronized with the input speech signal due to delays caused by reaction-time, which is inherent in human evaluations. To deal with this challenge, we introduc…

    Submitted 18 July, 2019; v1 submitted 5 July, 2019; originally announced July 2019.

    Comments: IEEE Transactions on Affective Computing

  15. arXiv:1907.02526  [pdf, other]

    cs.SD cs.LG eess.AS

    Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients

    Authors: Nursadul Mamun, Soheil Khorram, John H. L. Hansen

    Abstract: Attempts to develop speech enhancement algorithms with improved speech intelligibility for cochlear implant (CI) users have met with limited success. To improve speech enhancement methods for CI users, we propose to perform speech enhancement in a cochlear filter-bank feature space, a feature-set specifically designed for CI users based on CI auditory stimuli. We leverage a convolutional neural ne…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: Interspeech 2019

  16. Visualizing Deep Networks by Optimizing with Integrated Gradients

    Authors: Zhongang Qi, Saeed Khorram, Li Fuxin

    Abstract: Understanding and interpreting the decisions made by deep learning models is valuable in many domains. In computer vision, computing heatmaps from a deep network is a popular approach for visualizing and understanding deep networks. However, heatmaps that do not correlate with the network may mislead human, hence the performance of heatmaps in providing a faithful explanation to the underlying dee…

    Submitted 11 December, 2020; v1 submitted 2 May, 2019; originally announced May 2019.

    Journal ref: AAAI 2020

  17. arXiv:1903.09245  [pdf, other]

    cs.LG cs.AI cs.CC stat.ML

    Trainable Time Warping: Aligning Time-Series in the Continuous-Time Domain

    Authors: Soheil Khorram, Melvin G McInnis, Emily Mower Provost

    Abstract: DTW calculates the similarity or alignment between two signals, subject to temporal warping. However, its computational complexity grows exponentially with the number of time-series. Although there have been algorithms developed that are linear in the number of time-series, they are generally quadratic in time-series length. The exception is generalized time warping (GTW), which has linear computa…

    Submitted 21 March, 2019; originally announced March 2019.

    Comments: ICASSP 2019

  18. arXiv:1806.10658  [pdf, other]

    cs.HC

    The PRIORI Emotion Dataset: Linking Mood to Emotion Detected In-the-Wild

    Authors: Soheil Khorram, Mimansa Jaiswal, John Gideon, Melvin McInnis, Emily Mower Provost

    Abstract: Bipolar Disorder is a chronic psychiatric illness characterized by pathological mood swings associated with severe disruptions in emotion regulation. Clinical monitoring of mood is key to the care of these dynamic and incapacitating mood states. Frequent and detailed monitoring improves clinical sensitivity to detect mood state changes, but typically requires costly and limited resources. Speech c…

    Submitted 19 June, 2018; originally announced June 2018.

    Comments: Interspeech 2018

  19. Embedding Deep Networks into Visual Explanations

    Authors: Zhongang Qi, Saeed Khorram, Fuxin Li

    Abstract: In this paper, we propose a novel Explanation Neural Network (XNN) to explain the predictions made by a deep network. The XNN works by learning a nonlinear embedding of a high-dimensional activation vector of a deep network layer into a low-dimensional explanation space while retaining faithfulness i.e., the original deep learning predictions can be constructed from the few concepts extracted by o…

    Submitted 11 December, 2020; v1 submitted 15 September, 2017; originally announced September 2017.

    Journal ref: Artificial Intelligence (2020)

  20. arXiv:1708.07050  [pdf, other]

    cs.SD cs.AI

    Capturing Long-term Temporal Dependencies with Convolutional Networks for Continuous Emotion Recognition

    Authors: Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Melvin McInnis, Emily Mower Provost

    Abstract: The goal of continuous emotion recognition is to assign an emotion value to every frame in a sequence of acoustic features. We show that incorporating long-term temporal dependencies is critical for continuous emotion recognition tasks. To this end, we first investigate architectures that use dilated convolutions. We show that even though such architectures outperform previously reported systems,…

    Submitted 23 August, 2017; originally announced August 2017.

    Comments: 5 pages, 5 figures, 2 tables, Interspeech 2017

  21. arXiv:1706.03256  [pdf, other]

    cs.LG

    Progressive Neural Networks for Transfer Learning in Emotion Recognition

    Authors: John Gideon, Soheil Khorram, Zakaria Aldeneh, Dimitrios Dimitriadis, Emily Mower Provost

    Abstract: Many paralinguistic tasks are closely related and thus representations learned in one domain can be leveraged for another. In this paper, we investigate how knowledge can be transferred between three paralinguistic tasks: speaker, emotion, and gender recognition. Further, we extend this problem to cross-dataset tasks, asking how knowledge captured in one emotion dataset can be transferred to anoth…

    Submitted 10 June, 2017; originally announced June 2017.

    Comments: 5 pages, 4 figures, to appear in the proceedings of Interspeech 2017