Showing 1–8 of 8 results for author: Pandey, L

  1. arXiv:2501.12501

    eess.AS cs.SD

    A Domain Adaptation Framework for Speech Recognition Systems with Only Synthetic data

    Authors: Minh Tran, Yutong Pang, Debjyoti Paul, Laxmi Pandey, Kevin Jiang, Jinxi Guo, Ke Li, Shun Zhang, Xuedong Zhang, Xin Lei

    Abstract: We introduce DAS (Domain Adaptation with Synthetic data), a novel domain adaptation framework for pre-trained ASR model, designed to efficiently adapt to various language-defined domains without requiring any real data. In particular, DAS first prompts large language models (LLMs) to generate domain-specific texts before converting these texts to speech via text-to-speech technology. The synthetic…

    Submitted 21 January, 2025; originally announced January 2025.

    Comments: ICASSP 2025

  2. Implementing An Artificial Quantum Perceptron

    Authors: Ashutosh Hathidara, Lalit Pandey

    Abstract: A Perceptron is a fundamental building block of a neural network. The flexibility and scalability of perceptron make it ubiquitous in building intelligent systems. Studies have shown the efficacy of a single neuron in making intelligent decisions. Here, we examined and compared two perceptrons with distinct mechanisms, and developed a quantum version of one of those perceptrons. As a part of this…

    Submitted 24 March, 2025; v1 submitted 2 December, 2024; originally announced December 2024.

    Journal ref: Ann Comp Phy Material Sci, 2(1), 01-05 (2025)

  3. arXiv:2407.16664

    cs.CL eess.AS

    Towards scalable efficient on-device ASR with transfer learning

    Authors: Laxmi Pandey, Ke Li, Jinxi Guo, Debjyoti Paul, Arthur Guo, Jay Mahadeokar, Xuedong Zhang

    Abstract: Multilingual pretraining for transfer learning significantly boosts the robustness of low-resource monolingual ASR models. This study systematically investigates three main aspects: (a) the impact of transfer learning on model performance during initial training or fine-tuning, (b) the influence of transfer learning across dataset domains and languages, and (c) the effect on rare-word recognition…

    Submitted 23 July, 2024; originally announced July 2024.

  4. arXiv:2312.02843

    cs.CV cs.AI cs.LG cs.NE

    Are Vision Transformers More Data Hungry Than Newborn Visual Systems?

    Authors: Lalit Pandey, Samantha M. W. Wood, Justin N. Wood

    Abstract: Vision transformers (ViTs) are top performing models on many computer vision benchmarks and can accurately predict human behavior on object recognition tasks. However, researchers question the value of using ViTs as models of biological learning because ViTs are thought to be more data hungry than brains, with ViTs requiring more training data to reach similar levels of performance. To test this a…

    Submitted 5 December, 2023; originally announced December 2023.

    Comments: Accepted in Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

  5. arXiv:2307.00653

    cs.AI cs.GT

    Neuro-Symbolic Sudoku Solver

    Authors: Ashutosh Hathidara, Lalit Pandey

    Abstract: Deep Neural Networks have achieved great success in some of the complex tasks that humans can do with ease. These include image recognition/classification, natural language processing, game playing etc. However, modern Neural Networks fail or perform poorly when trained on tasks that can be solved easily using backtracking and traditional algorithms. Therefore, we use the architecture of the Neuro…

    Submitted 2 July, 2023; originally announced July 2023.

    Comments: Published as a conference paper at KDD KiML 2023

  6. arXiv:2212.14776

    cs.LG

    On the Interpretability of Attention Networks

    Authors: Lakshmi Narayan Pandey, Rahul Vashisht, Harish G. Ramaswamy

    Abstract: Attention mechanisms form a core component of several successful deep learning architectures, and are based on one key idea: "The output depends only on a small (but unknown) segment of the input." In several practical applications like image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes…

    Submitted 14 May, 2023; v1 submitted 30 December, 2022; originally announced December 2022.

    Comments: ACML 2022, PMLR, Volume 189, https://proceedings.mlr.press/v189/pandey23a/pandey23a.pdf

    Journal ref: Proceedings of The 14th Asian Conference on Machine Learning, PMLR, Volume 189, 832-847, 2023

  7. arXiv:2207.09674

    cs.CL cs.LG cs.SD eess.AS

    Improving Data Driven Inverse Text Normalization using Data Augmentation

    Authors: Laxmi Pandey, Debjyoti Paul, Pooja Chitkara, Yutong Pang, Xuedong Zhang, Kjell Schubert, Mark Chou, Shu Liu, Yatharth Saraf

    Abstract: Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile neural modeling approaches require quality large-scale spoken-written pair examples in the same or similar domain as the ASR system (in-domain data), to train. Both these…

    Submitted 20 July, 2022; originally announced July 2022.

  8. arXiv:2106.08706

    eess.IV cs.CV cs.HC cs.LG cs.SD eess.AS

    Silent Speech and Emotion Recognition from Vocal Tract Shape Dynamics in Real-Time MRI

    Authors: Laxmi Pandey, Ahmed Sabbir Arif

    Abstract: Speech sounds of spoken language are obtained by varying configuration of the articulators surrounding the vocal tract. They contain abundant information that can be utilized to better understand the underlying mechanism of human speech production. We propose a novel deep neural network-based learning framework that understands acoustic information in the variable-length sequence of vocal tract sh…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 8 pages

    ACM Class: I.4.9; I.2.10