Showing 1–2 of 2 results for author: Lapastora, A

  1. arXiv:2412.16500 [pdf, other]

    eess.AS cs.AI cs.CL

    Speech Retrieval-Augmented Generation without Automatic Speech Recognition

    Authors: Do June Min, Karel Mundnich, Andy Lapastora, Erfan Soltanmohammadi, Srikanth Ronanki, Kyu Han

    Abstract: One common approach for question answering over speech data is to first transcribe speech using automatic speech recognition (ASR) and then employ text-based retrieval-augmented generation (RAG) on the transcriptions. While this cascaded pipeline has proven effective in many practical settings, ASR errors can propagate to the retrieval and generation steps. To overcome this limitation, we introduc…

    Submitted 3 January, 2025; v1 submitted 21 December, 2024; originally announced December 2024.

    Comments: ICASSP 2025

  2. arXiv:2405.15750 [pdf, other]

    cs.CL cs.AI cs.LG

    Filtered Corpus Training (FiCT) Shows that Language Models can Generalize from Indirect Evidence

    Authors: Abhinav Patil, Jaap Jumelet, Yu Ying Chiu, Andy Lapastora, Peter Shen, Lexie Wang, Clevis Willrich, Shane Steinert-Threlkeld

    Abstract: This paper introduces Filtered Corpus Training, a method that trains language models (LMs) on corpora with certain linguistic constructions filtered out from the training data, and uses it to measure the ability of LMs to perform linguistic generalization on the basis of indirect evidence. We apply the method to both LSTM and Transformer LMs (of roughly comparable size), developing filtered corpor…

    Submitted 6 August, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: Forthcoming in Transactions of the Association for Computational Linguistics (TACL). This is a pre-MIT Press publication version. For code and trained models, see http://github.com/CLMBRs/corpus-filtering