Showing 1–4 of 4 results for author: Ghahabi, O

  1. arXiv:2106.11075

    cs.SD cs.AI eess.AS

    EML Online Speech Activity Detection for the Fearless Steps Challenge Phase-III

    Authors: Omid Ghahabi, Volker Fischer

    Abstract: Speech Activity Detection (SAD), locating speech segments within an audio recording, is a main part of most speech technology applications. Robust SAD is usually more difficult in noisy conditions with varying signal-to-noise ratios (SNR). The Fearless Steps challenge has recently provided such data from the NASA Apollo-11 mission for different speech processing tasks including SAD. Most audio rec…

    Submitted 21 June, 2021; originally announced June 2021.

  2. arXiv:2010.12497

    cs.SD cs.CL eess.AS

    EML System Description for VoxCeleb Speaker Diarization Challenge 2020

    Authors: Omid Ghahabi, Volker Fischer

    Abstract: This technical report describes the EML submission to the first VoxCeleb speaker diarization challenge. Although the aim of the challenge has been the offline processing of the signals, the submitted system is basically the EML online algorithm which decides about the speaker labels in runtime approximately every 1.2 sec. For the first phase of the challenge, only VoxCeleb2 dev dataset was used fo…

    Submitted 23 October, 2020; originally announced October 2020.

  3. Deep Learning for Single and Multi-Session i-Vector Speaker Recognition

    Authors: Omid Ghahabi, Javier Hernando

    Abstract: The promising performance of Deep Learning (DL) in speech recognition has motivated the use of DL in other speech technology applications such as speaker recognition. Given i-vectors as inputs, the authors proposed an impostor selection algorithm and a universal model adaptation process in a hybrid system based on Deep Belief Networks (DBN) and Deep Neural Networks (DNN) to discriminatively model…

    Submitted 8 December, 2015; originally announced December 2015.

    Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Volume: 25, Issue: 4, April 2017

  4. Adaptive Variable Degree-k Zero-Trees for Re-Encoding of Perceptually Quantized Wavelet-Packet Transformed Audio and High Quality Speech

    Authors: Omid Ghahabi, Mohammad H. Savoji

    Abstract: A fast, efficient and scalable algorithm is proposed, in this paper, for re-encoding of perceptually quantized wavelet-packet transform (WPT) coefficients of audio and high quality speech and is called "adaptive variable degree-k zero-trees" (AVDZ). The quantization process is carried out by taking into account some basic perceptual considerations, and achieves good subjective quality with low com…

    Submitted 16 January, 2011; originally announced January 2011.

    Comments: 30 pages (Double space), 15 figures, 5 tables, ISRN Signal Processing (in Press)