Showing 1–7 of 7 results for author: Palaskar, S

  1. arXiv:2406.09617

    cs.CL cs.HC eess.AS

    Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

    Authors: Shruti Palaskar, Oggi Rudovic, Sameer Dharur, Florian Pesce, Gautam Krishna, Aswin Sivaraman, Jack Berkowitz, Ahmed Hussen Abdelaziz, Saurabh Adya, Ahmed Tewfik

    Abstract: Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimodal LLMs is challenging. To this end, we propose a Fusion Low Rank Adaptation (FLoRA) technique that efficiently adapts a pre-trained unimodal LLM to…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech 2024

  2. arXiv:2110.06263

    cs.CL cs.AI cs.SD eess.AS

    Speech Summarization using Restricted Self-Attention

    Authors: Roshan Sharma, Shruti Palaskar, Alan W Black, Florian Metze

    Abstract: Speech summarization is typically performed by using a cascade of speech recognition and text summarization models. End-to-end modeling of speech summarization models is challenging due to memory and compute constraints arising from long input audio sequences. Recent work in document summarization has inspired methods to reduce the complexity of self-attentions, which enables transformer models to…

    Submitted 24 January, 2022; v1 submitted 12 October, 2021; originally announced October 2021.

    Comments: Accepted at ICASSP 2022

  3. arXiv:2011.11610

    cs.CV cs.LG eess.IV

    Transfer Learning for Oral Cancer Detection using Microscopic Images

    Authors: Rutwik Palaskar, Renu Vyas, Vilas Khedekar, Sangeeta Palaskar, Pranjal Sahu

    Abstract: Oral cancer has more than 83% survival rate if detected in its early stages, however, only 29% of cases are currently detected early. Deep learning techniques can detect patterns of oral cancer cells and can aid in its early detection. In this work, we present the first results of neural networks for oral cancer detection using microscopic images. We compare numerous state-of-the-art models via tr…

    Submitted 9 April, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

  4. arXiv:2003.07692

    eess.AS cs.LG cs.SD stat.ML

    ASR Error Correction and Domain Adaptation Using Machine Translation

    Authors: Anirudh Mani, Shruti Palaskar, Nimshi Venkat Meripo, Sandeep Konam, Florian Metze

    Abstract: Off-the-shelf pre-trained Automatic Speech Recognition (ASR) systems are an increasingly viable service for companies of any size building speech-based products. While these ASR systems are trained on large amounts of data, domain mismatch is still an issue for many such parties that want to use this service as-is leading to not so optimal results for their task. We propose a simple technique to p…

    Submitted 13 March, 2020; originally announced March 2020.

    Comments: Accepted for Oral Presentation at ICASSP 2020

  5. arXiv:1902.06833

    cs.CL cs.SD eess.AS

    Learned In Speech Recognition: Contextual Acoustic Word Embeddings

    Authors: Shruti Palaskar, Vikas Raunak, Florian Metze

    Abstract: End-to-end acoustic-to-word speech recognition models have recently gained popularity because they are easy to train, scale well to large amounts of training data, and do not require a lexicon. In addition, word models may also be easier to integrate with downstream tasks such as spoken language understanding, because inference (search) is much simplified compared to phoneme, character or any othe…

    Submitted 18 February, 2019; originally announced February 2019.

    Comments: Accepted at ICASSP 2019, 5 pages, 1 figure, 3 tables

  6. arXiv:1807.09597

    eess.AS cs.CL cs.LG cs.SD

    Acoustic-to-Word Recognition with Sequence-to-Sequence Models

    Authors: Shruti Palaskar, Florian Metze

    Abstract: Acoustic-to-Word recognition provides a straightforward solution to end-to-end speech recognition without needing external decoding, language model re-scoring or lexicon. While character-based models offer a natural solution to the out-of-vocabulary problem, word models can be simpler to decode and may also be able to directly recognize semantically meaningful units. We present effective methods t…

    Submitted 21 August, 2018; v1 submitted 23 July, 2018; originally announced July 2018.

    Comments: 9 pages, 3 figures, Under Review at SLT 2018

  7. arXiv:1804.09713

    eess.AS cs.CL cs.LG

    End-to-End Multimodal Speech Recognition

    Authors: Shruti Palaskar, Ramon Sanabria, Florian Metze

    Abstract: Transcription or sub-titling of open-domain videos is still a challenging domain for Automatic Speech Recognition (ASR) due to the data's challenging acoustics, variable signal processing and the essentially unrestricted domain of the data. In previous work, we have shown that the visual channel -- specifically object and scene features -- can help to adapt the acoustic model (AM) and language mod…

    Submitted 25 April, 2018; originally announced April 2018.

    Comments: 5 pages, 5 figures, Accepted at IEEE International Conference on Acoustics, Speech and Signal Processing 2018 (ICASSP 2018)