-
Retrieval Augmented Correction of Named Entity Speech Recognition Errors
Authors:
Ernest Pusateri,
Anmol Walia,
Anirudh Kashi,
Bortik Bandyopadhyay,
Nadia Hyder,
Sayantan Mahinder,
Raviteja Anantha,
Daben Liu,
Sashank Gondala
Abstract:
In recent years, end-to-end automatic speech recognition (ASR) systems have proven themselves remarkably accurate and performant, but these systems still have a significant error rate for entity names which appear infrequently in their training data. In parallel to the rise of end-to-end ASR systems, large language models (LLMs) have proven to be a versatile tool for various natural language processing (NLP) tasks. In NLP tasks where a database of relevant knowledge is available, retrieval augmented generation (RAG) has achieved impressive results when used with LLMs. In this work, we propose a RAG-like technique for correcting speech recognition entity name errors. Our approach uses a vector database to index a set of relevant entities. At runtime, database queries are generated from possibly errorful textual ASR hypotheses, and the entities retrieved using these queries are fed, along with the ASR hypotheses, to an LLM which has been adapted to correct ASR errors. Overall, our best system achieves 33%-39% relative word error rate reductions on synthetic test sets focused on voice assistant queries of rare music entities without regressing on the STOP test set, a publicly available voice assistant test set covering many domains.
Submitted 9 September, 2024;
originally announced September 2024.
-
Dual-Camera Joint Deblurring-Denoising
Authors:
Shayan Shekarforoush,
Amanpreet Walia,
Marcus A. Brubaker,
Konstantinos G. Derpanis,
Alex Levinshtein
Abstract:
Recent image enhancement methods have shown the advantages of using a pair of long and short-exposure images for low-light photography. These image modalities offer complementary strengths and weaknesses. The former yields an image that is clean but blurry due to camera or object motion, whereas the latter is sharp but noisy due to low photon count. Motivated by the fact that modern smartphones come equipped with multiple rear-facing camera sensors, we propose a novel dual-camera method for obtaining a high-quality image. Our method uses a synchronized burst of short exposure images captured by one camera and a long exposure image simultaneously captured by another. Having a synchronized short exposure burst alongside the long exposure image enables us to (i) obtain better denoising by using a burst instead of a single image, (ii) recover motion from the burst and use it for motion-aware deblurring of the long exposure image, and (iii) fuse the two results to further enhance quality. Our method is able to achieve state-of-the-art results on synthetic dual-camera images from the GoPro dataset with five times fewer training parameters compared to the next best method. We also show that our method qualitatively outperforms competing approaches on real synchronized dual-camera captures.
Submitted 15 September, 2023;
originally announced September 2023.
-
Gated Stereo: Joint Depth Estimation from Gated and Wide-Baseline Active Stereo Cues
Authors:
Stefanie Walz,
Mario Bijelic,
Andrea Ramazzina,
Amanpreet Walia,
Fahim Mannan,
Felix Heide
Abstract:
We propose Gated Stereo, a high-resolution and long-range depth estimation technique that operates on active gated stereo images. Using active and high dynamic range passive captures, Gated Stereo exploits multi-view cues alongside time-of-flight intensity cues from active gating. To this end, we propose a depth estimation method with a monocular and stereo depth prediction branch which are combined in a final fusion stage. Each block is supervised through a combination of supervised and gated self-supervision losses. To facilitate training and validation, we acquire a long-range synchronized gated stereo dataset for automotive scenarios. We find that the method achieves an improvement of more than 50% MAE compared to the next best RGB stereo method, and 74% MAE to existing monocular gated methods for distances up to 160 m. Our code, models, and datasets are available here.
Submitted 22 May, 2023;
originally announced May 2023.
-
Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices
Authors:
Mingbin Xu,
Congzheng Song,
Ye Tian,
Neha Agrawal,
Filip Granqvist,
Rogier van Dalen,
Xiao Zhang,
Arturo Argueta,
Shiyi Han,
Yaqiao Deng,
Leo Liu,
Anmol Walia,
Alex Jin
Abstract:
Federated Learning (FL) is a technique to train models using data distributed across devices. Differential Privacy (DP) provides a formal privacy guarantee for sensitive data. Our goal is to train a large neural network language model (NNLM) on compute-constrained devices while preserving privacy using FL and DP. However, the DP-noise introduced to the model increases as the model size grows, which often prevents convergence. We propose Partial Embedding Updates (PEU), a novel technique to decrease noise by decreasing payload size. Furthermore, we adopt Low Rank Adaptation (LoRA) and Noise Contrastive Estimation (NCE) to reduce the memory demands of large models on compute-constrained devices. This combination of techniques makes it possible to train large-vocabulary language models while preserving accuracy and privacy.
Submitted 18 July, 2022;
originally announced July 2022.
-
Gated2Gated: Self-Supervised Depth Estimation from Gated Images
Authors:
Amanpreet Walia,
Stefanie Walz,
Mario Bijelic,
Fahim Mannan,
Frank Julca-Aguilar,
Michael Langer,
Werner Ritter,
Felix Heide
Abstract:
Gated cameras hold promise as an alternative to scanning LiDAR sensors with high-resolution 3D depth that is robust to back-scatter in fog, snow, and rain. Instead of sequentially scanning a scene and directly recording depth via the photon time-of-flight, as in pulsed LiDAR sensors, gated imagers encode depth in the relative intensity of a handful of gated slices, captured at megapixel resolution. Although existing methods have shown that it is possible to decode high-resolution depth from such measurements, these methods require synchronized and calibrated LiDAR to supervise the gated depth decoder -- prohibiting fast adoption across geographies, training on large unpaired datasets, and exploring alternative applications outside of automotive use cases. In this work, we fill this gap and propose an entirely self-supervised depth estimation method that uses gated intensity profiles and temporal consistency as a training signal. The proposed model is trained end-to-end from gated video sequences, does not require LiDAR or RGB data, and learns to estimate absolute depth values. We take gated slices as input and disentangle the estimation of the scene albedo, depth, and ambient light, which are then used to learn to reconstruct the input slices through a cyclic loss. We rely on temporal consistency between a given frame and neighboring gated slices to estimate depth in regions with shadows and reflections. We experimentally validate that the proposed approach outperforms existing supervised and self-supervised depth estimation methods based on monocular RGB and stereo images, as well as supervised methods based on gated images.
Submitted 4 December, 2021;
originally announced December 2021.
-
A Novel Advanced Heap Corruption and Security Method
Authors:
Arundhati Walia,
Syed I. Ahson
Abstract:
Heap security has been a major concern for the past two decades. Recently, many methods have been proposed to secure the heap, i.e., to prevent heap overruns and attacks. The paper describes a method to secure the heap at the operating system level. Major emphasis is given to the Solaris operating system's dynamic memory manager. When memory is required dynamically at runtime, the SysVmalloc acts as a memory allocator. Vmalloc allocates chunks of memory in the form of a splay tree structure. A self-adjusting binary tree structure is reviewed in the paper; moreover, a major security issue in securing the heap area is also raised.
Submitted 7 June, 2012;
originally announced June 2012.