-
On Scaling Contrastive Representations for Low-Resource Speech Recognition
Abstract: Recent advances in self-supervised learning through contrastive training have shown that it is possible to learn a competitive speech recognition system with as little as 10 minutes of labeled data. However, these systems are computationally expensive since they require pre-training followed by fine-tuning in a large parameter space. We explore the performance of such systems without fine-tuning b…
Submitted 1 February, 2021; originally announced February 2021.
Comments: © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
-
Utilizing Domain Knowledge in End-to-End Audio Processing
Abstract: End-to-end neural network based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that shows the feasibility of training the first layers of a deep convolutional neural network (CNN) model to learn the commonly-used log-scaled mel-spectrogram transformation. Secondly, we demonstrate that upon ini…
Submitted 1 December, 2017; originally announced December 2017.
Comments: Accepted at the ML4Audio workshop at NIPS 2017
-
Exploiting Nontrivial Connectivity for Automatic Speech Recognition
Abstract: Nontrivial connectivity has allowed the training of very deep networks by addressing the problem of vanishing gradients and offering a more efficient method of reusing parameters. In this paper we make a comparison between residual networks, densely-connected networks and highway networks on an image classification task. Next, we show that these methodologies can easily be deployed into automatic…
Submitted 28 November, 2017; originally announced November 2017.
Comments: Accepted at the ML4Audio workshop at NIPS 2017