Showing 1–4 of 4 results for author: Girish, K V V

  1. arXiv:2504.09081  [pdf, other]

    eess.AS cs.AI cs.CL

    SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning

    Authors: Prabhat Pandey, Rupak Vignesh Swaminathan, K V Vijay Girish, Arunasish Sen, Jian Xie, Grant P. Strimel, Andreas Schwarz

    Abstract: We introduce SIFT (Speech Instruction Fine-Tuning), a 50M-example dataset designed for instruction fine-tuning and pre-training of speech-text large language models (LLMs). SIFT-50M is built from publicly available speech corpora, which collectively contain 14K hours of speech, and leverages LLMs along with off-the-shelf expert models. The dataset spans five languages, encompassing a diverse range…

    Submitted 17 April, 2025; v1 submitted 12 April, 2025; originally announced April 2025.

  2. arXiv:1609.09764  [pdf, ps, other]

    cs.SD

    Adaptive dictionary based approach for background noise and speaker classification and subsequent source separation

    Authors: K V Vijay Girish, A G Ramakrishnan, T V Ananthapadmanabha

    Abstract: A judicious combination of dictionary learning methods, block sparsity and a source recovery algorithm is used in a hierarchical manner to identify the noises and the speakers from a noisy conversation between two people. Conversations are simulated using speech from two speakers, each with a different background noise, with varied SNR values, down to -10 dB. Ten each of randomly chosen male and fe…

    Submitted 28 October, 2016; v1 submitted 30 September, 2016; originally announced September 2016.

    Comments: 12 pages

  3. arXiv:1510.07774  [pdf, ps, other]

    cs.SD

    A dictionary learning and source recovery based approach to classify diverse audio sources

    Authors: K V Vijay Girish, T V Ananthapadmanabha, A G Ramakrishnan

    Abstract: A dictionary learning based audio source classification algorithm is proposed to classify a sample audio signal as one amongst a finite set of different audio sources. Cosine similarity measure is used to select the atoms during dictionary learning. Based on three objective measures proposed, namely, signal to distortion ratio (SDR), the number of non-zero weights and the sum of weights, a frame-w…

    Submitted 27 October, 2015; originally announced October 2015.

    Comments: 5 pages, 5 figures

    ACM Class: H.5.1

  4. arXiv:1411.0370  [pdf, ps, other]

    cs.SD

    Detection of transitions between broad phonetic classes in a speech signal

    Authors: T V Ananthapadmanabha, K V Vijay Girish, A G Ramakrishnan

    Abstract: Detection of transitions between broad phonetic classes in a speech signal is an important problem which has applications such as landmark detection and segmentation. The proposed hierarchical method detects silence to non-silence transitions, high amplitude (mostly sonorants) to low amplitude (mostly fricatives/affricates/stop bursts) transitions and vice-versa. A subset of the extremum (minimu…

    Submitted 3 November, 2014; originally announced November 2014.

    Comments: 12 pages, 5 figures