Search | arXiv e-print repository

Memory-efficient Stochastic methods for Memory-based Transformers

Authors: Vishwajit Kumar Vishnu, C. Chandra Sekhar

Abstract: Training Memory-based transformers can require a large amount of memory and can be quite inefficient. We propose a novel two-phase training mechanism and a novel regularization technique to improve the training efficiency of memory-based transformers, which are often used for long-range context problems. For our experiments, we consider transformer-XL as our baseline model which is one of memoryba… ▽ More Training Memory-based transformers can require a large amount of memory and can be quite inefficient. We propose a novel two-phase training mechanism and a novel regularization technique to improve the training efficiency of memory-based transformers, which are often used for long-range context problems. For our experiments, we consider transformer-XL as our baseline model which is one of memorybased transformer models. We show that our resultant model, Skip Cross-head TransformerXL, outperforms the baseline on character level language modeling task with similar parameters and outperforms the baseline on word level language modelling task with almost 20% fewer parameters. Our proposed methods do not require any additional memory. We also demonstrate the effectiveness of our regularization mechanism on BERT which shows similar performance with reduction in standard deviation of scores of around 30% on multiple GLUE tasks. △ Less

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2108.02850 [pdf, other]

Unsupervised Domain Adaptation in Speech Recognition using Phonetic Features

Authors: Rupam Ojha, C Chandra Sekhar

Abstract: Automatic speech recognition is a difficult problem in pattern recognition because several sources of variability exist in the speech input like the channel variations, the input might be clean or noisy, the speakers may have different accent and variations in the gender, etc. As a result, domain adaptation is important in speech recognition where we train the model for a particular source domain… ▽ More Automatic speech recognition is a difficult problem in pattern recognition because several sources of variability exist in the speech input like the channel variations, the input might be clean or noisy, the speakers may have different accent and variations in the gender, etc. As a result, domain adaptation is important in speech recognition where we train the model for a particular source domain and test it on a different target domain. In this paper, we propose a technique to perform unsupervised gender-based domain adaptation in speech recognition using phonetic features. The experiments are performed on the TIMIT dataset and there is a considerable decrease in the phoneme error rate using the proposed approach. △ Less

Submitted 4 August, 2021; originally announced August 2021.

Comments: 5 pages, 3 figures

arXiv:2105.05061 [pdf, other]

Semi-Supervised Metric Learning: A Deep Resurrection

Authors: Ujjal Kr Dutta, Mehrtash Harandi, Chellu Chandra Sekhar

Abstract: Distance Metric Learning (DML) seeks to learn a discriminative embedding where similar examples are closer, and dissimilar examples are apart. In this paper, we address the problem of Semi-Supervised DML (SSDML) that tries to learn a metric using a few labeled examples, and abundantly available unlabeled examples. SSDML is important because it is infeasible to manually annotate all the examples pr… ▽ More Distance Metric Learning (DML) seeks to learn a discriminative embedding where similar examples are closer, and dissimilar examples are apart. In this paper, we address the problem of Semi-Supervised DML (SSDML) that tries to learn a metric using a few labeled examples, and abundantly available unlabeled examples. SSDML is important because it is infeasible to manually annotate all the examples present in a large dataset. Surprisingly, with the exception of a few classical approaches that learn a linear Mahalanobis metric, SSDML has not been studied in the recent years, and lacks approaches in the deep SSDML scenario. In this paper, we address this challenging problem, and revamp SSDML with respect to deep learning. In particular, we propose a stochastic, graph-based approach that first propagates the affinities between the pairs of examples from labeled data, to that of the unlabeled pairs. The propagated affinities are used to mine triplet based constraints for metric learning. We impose orthogonality constraint on the metric parameters, as it leads to a better performance by avoiding a model collapse. △ Less

Submitted 10 May, 2021; originally announced May 2021.

Comments: In AAAI-2021

arXiv:2103.03215 [pdf, other]

Front-end Diarization for Percussion Separation in Taniavartanam of Carnatic Music Concerts

Authors: Nauman Dawalatabad, Jilt Sebastian, Jom Kuriakose, C. Chandra Sekhar, Shrikanth Narayanan, Hema A. Murthy

Abstract: Instrument separation in an ensemble is a challenging task. In this work, we address the problem of separating the percussive voices in the taniavartanam segments of Carnatic music. In taniavartanam, a number of percussive instruments play together or in tandem. Separation of instruments in regions where only one percussion is present leads to interference and artifacts at the output, as source se… ▽ More Instrument separation in an ensemble is a challenging task. In this work, we address the problem of separating the percussive voices in the taniavartanam segments of Carnatic music. In taniavartanam, a number of percussive instruments play together or in tandem. Separation of instruments in regions where only one percussion is present leads to interference and artifacts at the output, as source separation algorithms assume the presence of multiple percussive voices throughout the audio segment. We prevent this by first subjecting the taniavartanam to diarization. This process results in homogeneous clusters consisting of segments of either a single voice or multiple voices. A cluster of segments with multiple voices is identified using the Gaussian mixture model (GMM), which is then subjected to source separation. A deep recurrent neural network (DRNN) based approach is used to separate the multiple instrument segments. The effectiveness of the proposed system is evaluated on a standard Carnatic music dataset. The proposed approach provides close-to-oracle performance for non-overlapping segments and a significant improvement over traditional separation schemes. △ Less

Submitted 4 March, 2021; originally announced March 2021.

arXiv:2010.06304 [pdf, other]

doi 10.1109/TASLP.2020.3036231

Novel Architectures for Unsupervised Information Bottleneck based Speaker Diarization of Meetings

Authors: Nauman Dawalatabad, Srikanth Madikeri, C. Chandra Sekhar, Hema A. Murthy

Abstract: Speaker diarization is an important problem that is topical, and is especially useful as a preprocessor for conversational speech related applications. The objective of this paper is two-fold: (i) segment initialization by uniformly distributing speaker information across the initial segments, and (ii) incorporating speaker discriminative features within the unsupervised diarization framework. In… ▽ More Speaker diarization is an important problem that is topical, and is especially useful as a preprocessor for conversational speech related applications. The objective of this paper is two-fold: (i) segment initialization by uniformly distributing speaker information across the initial segments, and (ii) incorporating speaker discriminative features within the unsupervised diarization framework. In the first part of the work, a varying length segment initialization technique for Information Bottleneck (IB) based speaker diarization system using phoneme rate as the side information is proposed. This initialization distributes speaker information uniformly across the segments and provides a better starting point for IB based clustering. In the second part of the work, we present a Two-Pass Information Bottleneck (TPIB) based speaker diarization system that incorporates speaker discriminative features during the process of diarization. The TPIB based speaker diarization system has shown improvement over the baseline IB based system. During the first pass of the TPIB system, a coarse segmentation is performed using IB based clustering. The alignments obtained are used to generate speaker discriminative features using a shallow feed-forward neural network and linear discriminant analysis. The discriminative features obtained are used in the second pass to obtain the final speaker boundaries. In the final part of the paper, variable segment initialization is combined with the TPIB framework. This leverages the advantages of better segment initialization and speaker discriminative features that results in an additional improvement in performance. An evaluation on standard meeting datasets shows that a significant absolute improvement of 3.9% and 4.7% is obtained on the NIST and AMI datasets, respectively. △ Less

Submitted 13 October, 2020; originally announced October 2020.

Comments: Accepted in IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING

Journal ref: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, 2021, pp 14-27

arXiv:2008.09880 [pdf, other]

doi 10.1109/TAI.2020.3026982

Unsupervised Deep Metric Learning via Orthogonality based Probabilistic Loss

Authors: Ujjal Kr Dutta, Mehrtash Harandi, Chellu Chandra Sekhar

Abstract: Metric learning is an important problem in machine learning. It aims to group similar examples together. Existing state-of-the-art metric learning approaches require class labels to learn a metric. As obtaining class labels in all applications is not feasible, we propose an unsupervised approach that learns a metric without making use of class labels. The lack of class labels is compensated by obt… ▽ More Metric learning is an important problem in machine learning. It aims to group similar examples together. Existing state-of-the-art metric learning approaches require class labels to learn a metric. As obtaining class labels in all applications is not feasible, we propose an unsupervised approach that learns a metric without making use of class labels. The lack of class labels is compensated by obtaining pseudo-labels of data using a graph-based clustering approach. The pseudo-labels are used to form triplets of examples, which guide the metric learning. We propose a probabilistic loss that minimizes the chances of each triplet violating an angular constraint. A weight function, and an orthogonality constraint in the objective speeds up the convergence and avoids a model collapse. We also provide a stochastic formulation of our method to scale up to large-scale datasets. Our studies demonstrate the competitiveness of our approach against state-of-the-art methods. We also thoroughly study the effect of the different components of our method. △ Less

Submitted 22 August, 2020; originally announced August 2020.

Comments: In the IEEE Transactions on Artificial Intelligence (IEEE TAI)

arXiv:2002.12394 [pdf, other]

Affinity guided Geometric Semi-Supervised Metric Learning

Authors: Ujjal Kr Dutta, Mehrtash Harandi, Chellu Chandra Sekhar

Abstract: In this paper, we revamp the forgotten classical Semi-Supervised Distance Metric Learning (SSDML) problem from a Riemannian geometric lens, to leverage stochastic optimization within a end-to-end deep framework. The motivation comes from the fact that apart from a few classical SSDML approaches learning a linear Mahalanobis metric, deep SSDML has not been studied. We first extend existing SSDML me… ▽ More In this paper, we revamp the forgotten classical Semi-Supervised Distance Metric Learning (SSDML) problem from a Riemannian geometric lens, to leverage stochastic optimization within a end-to-end deep framework. The motivation comes from the fact that apart from a few classical SSDML approaches learning a linear Mahalanobis metric, deep SSDML has not been studied. We first extend existing SSDML methods to their deep counterparts and then propose a new method to overcome their limitations. Due to the nature of constraints on our metric parameters, we leverage Riemannian optimization. Our deep SSDML method with a novel affinity propagation based triplet mining strategy outperforms its competitors. △ Less

Submitted 6 November, 2020; v1 submitted 27 February, 2020; originally announced February 2020.

Comments: Paper accepted in NeurIPS 2020 workshop on Differential Geometry meets Deep Learning

arXiv:1902.08051 [pdf, other]

doi 10.1109/ICASSP.2019.8683114

Incremental Transfer Learning in Two-pass Information Bottleneck based Speaker Diarization System for Meetings

Authors: Nauman Dawalatabad, Srikanth Madikeri, C Chandra Sekhar, Hema A Murthy

Abstract: The two-pass information bottleneck (TPIB) based speaker diarization system operates independently on different conversational recordings. TPIB system does not consider previously learned speaker discriminative information while diarizing new conversations. Hence, the real time factor (RTF) of TPIB system is high owing to the training time required for the artificial neural network (ANN). This pap… ▽ More The two-pass information bottleneck (TPIB) based speaker diarization system operates independently on different conversational recordings. TPIB system does not consider previously learned speaker discriminative information while diarizing new conversations. Hence, the real time factor (RTF) of TPIB system is high owing to the training time required for the artificial neural network (ANN). This paper attempts to improve the RTF of the TPIB system using an incremental transfer learning approach where the parameters learned by the ANN from other conversations are updated using current conversation rather than learning parameters from scratch. This reduces the RTF significantly. The effectiveness of the proposed approach compared to the baseline IB and the TPIB systems is demonstrated on standard NIST and AMI conversational meeting datasets. With a minor degradation in performance, the proposed system shows a significant improvement of 33.07% and 24.45% in RTF with respect to TPIB system on the NIST RT-04Eval and AMI-1 datasets, respectively. △ Less

Submitted 21 February, 2019; originally announced February 2019.

Comments: 5 pages, 2 figures, To appear in Proc. ICASSP 2019, May 12-17, 2019, Brighton, UK

Showing 1–8 of 8 results for author: Sekhar, C C