Search | arXiv e-print repository

arXiv:2007.09028 [pdf, other]

Sequential Explanations with Mental Model-Based Policies

Authors: Arnold YS Yeung, Shalmali Joshi, Joseph Jay Williams, Frank Rudzicz

Abstract: The act of explaining across two parties is a feedback loop, where one provides information on what needs to be explained and the other provides an explanation relevant to this information. We apply a reinforcement learning framework which emulates this format by providing explanations based on the explainee's current mental model. We conduct novel online human experiments where explanations gener… ▽ More The act of explaining across two parties is a feedback loop, where one provides information on what needs to be explained and the other provides an explanation relevant to this information. We apply a reinforcement learning framework which emulates this format by providing explanations based on the explainee's current mental model. We conduct novel online human experiments where explanations generated by various explanation methods are selected and presented to participants, using policies which observe participants' mental models, in order to optimize an interpretability proxy. Our results suggest that mental model-based policies (anchored in our proposed state representation) may increase interpretability over multiple sequential explanations, when compared to a random selection baseline. This work provides insight into how to select explanations which increase relevant information for users, and into conducting human-grounded experimentation to understand interpretability. △ Less

Submitted 17 July, 2020; originally announced July 2020.

Comments: Accepted into ICML 2020 Workshop on Human Interpretability in Machine Learning (Spotlight)

arXiv:2005.11371 [pdf, other]

Speaker diarization with session-level speaker embedding refinement using graph neural networks

Authors: Jixuan Wang, Xiong Xiao, Jian Wu, Ranjani Ramamurthy, Frank Rudzicz, Michael Brudno

Abstract: Deep speaker embedding models have been commonly used as a building block for speaker diarization systems; however, the speaker embedding model is usually trained according to a global loss defined on the training data, which could be sub-optimal for distinguishing speakers locally in a specific meeting session. In this work we present the first use of graph neural networks (GNNs) for the speaker… ▽ More Deep speaker embedding models have been commonly used as a building block for speaker diarization systems; however, the speaker embedding model is usually trained according to a global loss defined on the training data, which could be sub-optimal for distinguishing speakers locally in a specific meeting session. In this work we present the first use of graph neural networks (GNNs) for the speaker diarization problem, utilizing a GNN to refine speaker embeddings locally using the structural information between speech segments inside each session. The speaker embeddings extracted by a pre-trained model are remapped into a new embedding space, in which the different speakers within a single session are better separated. The model is trained for linkage prediction in a supervised manner by minimizing the difference between the affinity matrix constructed by the refined embeddings and the ground-truth adjacency matrix. Spectral clustering is then applied on top of the refined embeddings. We show that the clustering performance of the refined speaker embeddings outperforms the original embeddings significantly on both simulated and real meeting data, and our system achieves the state-of-the-art result on the NIST SRE 2000 CALLHOME database. △ Less

Submitted 22 May, 2020; originally announced May 2020.

Comments: ICASSP 2020 (45th International Conference on Acoustics, Speech, and Signal Processing)

arXiv:1906.10064 [pdf, other]

Variations on the Chebyshev-Lagrange Activation Function

Authors: Yuchen Li, Frank Rudzicz, Jekaterina Novikova

Abstract: We seek to improve the data efficiency of neural networks and present novel implementations of parameterized piece-wise polynomial activation functions. The parameters are the y-coordinates of n+1 Chebyshev nodes per hidden unit and Lagrangian interpolation between the nodes produces the polynomial on [-1, 1]. We show results for different methods of handling inputs outside [-1, 1] on synthetic da… ▽ More We seek to improve the data efficiency of neural networks and present novel implementations of parameterized piece-wise polynomial activation functions. The parameters are the y-coordinates of n+1 Chebyshev nodes per hidden unit and Lagrangian interpolation between the nodes produces the polynomial on [-1, 1]. We show results for different methods of handling inputs outside [-1, 1] on synthetic datasets, finding significant improvements in capacity of expression and accuracy of interpolation in models that compute some form of linear extrapolation from either ends. We demonstrate competitive or state-of-the-art performance on the classification of images (MNIST and CIFAR-10) and minimally-correlated vectors (DementiaBank) when we replace ReLU or tanh with linearly extrapolated Chebyshev-Lagrange activations in deep residual architectures. △ Less

Submitted 24 June, 2019; originally announced June 2019.

arXiv:1902.02375 [pdf, other]

Centroid-based deep metric learning for speaker recognition

Authors: Jixuan Wang, Kuan-Chieh Wang, Marc Law, Frank Rudzicz, Michael Brudno

Abstract: Speaker embedding models that utilize neural networks to map utterances to a space where distances reflect similarity between speakers have driven recent progress in the speaker recognition task. However, there is still a significant performance gap between recognizing speakers in the training set and unseen speakers. The latter case corresponds to the few-shot learning task, where a trained model… ▽ More Speaker embedding models that utilize neural networks to map utterances to a space where distances reflect similarity between speakers have driven recent progress in the speaker recognition task. However, there is still a significant performance gap between recognizing speakers in the training set and unseen speakers. The latter case corresponds to the few-shot learning task, where a trained model is evaluated on unseen classes. Here, we optimize a speaker embedding model with prototypical network loss (PNL), a state-of-the-art approach for the few-shot image classification task. The resulting embedding model outperforms the state-of-the-art triplet loss based models in both speaker verification and identification tasks, for both seen and unseen speakers. △ Less

Submitted 6 February, 2019; originally announced February 2019.

Comments: ICASSP 2019 (44th International Conference on Acoustics, Speech, and Signal Processing)

arXiv:1811.12254 [pdf, other]

The Effect of Heterogeneous Data for Alzheimer's Disease Detection from Speech

Authors: Aparna Balagopalan, Jekaterina Novikova, Frank Rudzicz, Marzyeh Ghassemi

Abstract: Speech datasets for identifying Alzheimer's disease (AD) are generally restricted to participants performing a single task, e.g. describing an image shown to them. As a result, models trained on linguistic features derived from such datasets may not be generalizable across tasks. Building on prior work demonstrating that same-task data of healthy participants helps improve AD detection on a single… ▽ More Speech datasets for identifying Alzheimer's disease (AD) are generally restricted to participants performing a single task, e.g. describing an image shown to them. As a result, models trained on linguistic features derived from such datasets may not be generalizable across tasks. Building on prior work demonstrating that same-task data of healthy participants helps improve AD detection on a single-task dataset of pathological speech, we augment an AD-specific dataset consisting of subjects describing a picture with multi-task healthy data. We demonstrate that normative data from multiple speech-based tasks helps improve AD detection by up to 9%. Visualization of decision boundaries reveals that models trained on a combination of structured picture descriptions and unstructured conversational speech have the least out-of-task error and show the most potential to generalize to multiple tasks. We analyze the impact of age of the added samples and if they affect fairness in classification. We also provide explanations for a possible inductive bias effect across tasks using model-agnostic feature anchors. This work highlights the need for heterogeneous datasets for encoding changes in multiple facets of cognition and for developing a task-independent AD detection model. △ Less

Submitted 29 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/147

arXiv:1811.10376 [pdf, other]

Robustness against the channel effect in pathological voice detection

Authors: Yi-Te Hsu, Zining Zhu, Chi-Te Wang, Shih-Hau Fang, Frank Rudzicz, Yu Tsao

Abstract: Many people are suffering from voice disorders, which can adversely affect the quality of their lives. In response, some researchers have proposed algorithms for automatic assessment of these disorders, based on voice signals. However, these signals can be sensitive to the recording devices. Indeed, the channel effect is a pervasive problem in machine learning for healthcare. In this study, we pro… ▽ More Many people are suffering from voice disorders, which can adversely affect the quality of their lives. In response, some researchers have proposed algorithms for automatic assessment of these disorders, based on voice signals. However, these signals can be sensitive to the recording devices. Indeed, the channel effect is a pervasive problem in machine learning for healthcare. In this study, we propose a detection system for pathological voice, which is robust against the channel effect. This system is based on a bidirectional LSTM network. To increase the performance robustness against channel mismatch, we integrate domain adversarial training (DAT) to eliminate the differences between the devices. When we train on data recorded on a high-quality microphone and evaluate on smartphone data without labels, our robust detection system increases the PR-AUC from 0.8448 to 0.9455 (and 0.9522 with target sample labels). To the best of our knowledge, this is the first study applying unsupervised domain adaptation to pathological voice detection. Notably, our system does not need target device sample labels, which allows for generalization to many new devices. △ Less

Submitted 2 December, 2018; v1 submitted 26 November, 2018; originally announced November 2018.

Comments: Machine Learning for Health (ML4H) Workshop at NeurIPS 2018 arXiv:1811.07216

Report number: ML4H/2018/200

arXiv:1811.08081 [pdf, other]

ChainGAN: A sequential approach to GANs

Authors: Safwan Hossain, Kiarash Jamali, Yuchen Li, Frank Rudzicz

Abstract: We propose a new architecture and training methodology for generative adversarial networks. Current approaches attempt to learn the transformation from a noise sample to a generated data sample in one shot. Our proposed generator architecture, called $\textit{ChainGAN}$, uses a two-step process. It first attempts to transform a noise vector into a crude sample, similar to a traditional generator.… ▽ More We propose a new architecture and training methodology for generative adversarial networks. Current approaches attempt to learn the transformation from a noise sample to a generated data sample in one shot. Our proposed generator architecture, called $\textit{ChainGAN}$, uses a two-step process. It first attempts to transform a noise vector into a crude sample, similar to a traditional generator. Next, a chain of networks, called $\textit{editors}$, attempt to sequentially enhance this sample. We train each of these units independently, instead of with end-to-end backpropagation on the entire chain. Our model is robust, efficient, and flexible as we can apply it to various network architectures. We provide rationale for our choices and experimentally evaluate our model, achieving competitive results on several datasets. △ Less

Submitted 22 November, 2018; v1 submitted 20 November, 2018; originally announced November 2018.

arXiv:1811.07266 [pdf, other]

DeepConsensus: using the consensus of features from multiple layers to attain robust image classification

Authors: Yuchen Li, Safwan Hossain, Kiarash Jamali, Frank Rudzicz

Abstract: We consider a classifier whose test set is exposed to various perturbations that are not present in the training set. These test samples still contain enough features to map them to the same class as their unperturbed counterpart. Current architectures exhibit rapid degradation of accuracy when trained on standard datasets but then used to classify perturbed samples of that data. To address this,… ▽ More We consider a classifier whose test set is exposed to various perturbations that are not present in the training set. These test samples still contain enough features to map them to the same class as their unperturbed counterpart. Current architectures exhibit rapid degradation of accuracy when trained on standard datasets but then used to classify perturbed samples of that data. To address this, we present a novel architecture named DeepConsensus that significantly improves generalization to these test-time perturbations. Our key insight is that deep neural networks should directly consider summaries of low and high level features when making classifications. Existing convolutional neural networks can be augmented with DeepConsensus, leading to improved resistance against large and small perturbations on MNIST, EMNIST, FashionMNIST, CIFAR10 and SVHN datasets. △ Less

Submitted 2 December, 2018; v1 submitted 17 November, 2018; originally announced November 2018.

arXiv:1807.07217 [pdf, other]

Deconfounding age effects with fair representation learning when assessing dementia

Authors: Zining Zhu, Jekaterina Novikova, Frank Rudzicz

Abstract: One of the most prevalent symptoms among the elderly population, dementia, can be detected by classifiers trained on linguistic features extracted from narrative transcripts. However, these linguistic features are impacted in a similar but different fashion by the normal aging process. Aging is therefore a confounding factor, whose effects have been hard for machine learning classifiers (especiall… ▽ More One of the most prevalent symptoms among the elderly population, dementia, can be detected by classifiers trained on linguistic features extracted from narrative transcripts. However, these linguistic features are impacted in a similar but different fashion by the normal aging process. Aging is therefore a confounding factor, whose effects have been hard for machine learning classifiers (especially deep neural network based models) to ignore. We show DNN models are capable of estimating ages based on linguistic features. Predicting dementia based on this aging bias could lead to potentially non-generalizable accuracies on clinical datasets, if not properly deconfounded. In this paper, we propose to address this deconfounding problem with fair representation learning. We build neural network classifiers that learn low-dimensional representations reflecting the impacts of dementia yet discarding the effects of age. To evaluate these classifiers, we specify a model-agnostic score $Δ_{eo}^{(N)}$ measuring how classifier results are deconfounded from age. Our best models compromise accuracy by only 2.56\% and 1.54\% on two clinical datasets compared to DNNs, and their $Δ_{eo}^{(2)}$ scores are better than statistical (residulization and inverse probability weight) adjustments. △ Less

Submitted 7 September, 2019; v1 submitted 18 July, 2018; originally announced July 2018.

Comments: 9 pages, 2 figures

arXiv:1805.09366 [pdf, other]

Semi-supervised classification by reaching consensus among modalities

Authors: Zining Zhu, Jekaterina Novikova, Frank Rudzicz

Abstract: Deep learning has demonstrated abilities to learn complex structures, but they can be restricted by available data. Recently, Consensus Networks (CNs) were proposed to alleviate data sparsity by utilizing features from multiple modalities, but they too have been limited by the size of labeled data. In this paper, we extend CN to Transductive Consensus Networks (TCNs), suitable for semi-supervised… ▽ More Deep learning has demonstrated abilities to learn complex structures, but they can be restricted by available data. Recently, Consensus Networks (CNs) were proposed to alleviate data sparsity by utilizing features from multiple modalities, but they too have been limited by the size of labeled data. In this paper, we extend CN to Transductive Consensus Networks (TCNs), suitable for semi-supervised learning. In TCNs, different modalities of input are compressed into latent representations, which we encourage to become indistinguishable during iterative adversarial training. To understand TCNs two mechanisms, consensus and classification, we put forward its three variants in ablation studies on these mechanisms. To further investigate TCN models, we treat the latent representations as probability distributions and measure their similarities as the negative relative Jensen-Shannon divergences. We show that a consensus state beneficial for classification desires a stable but imperfect similarity between the representations. Overall, TCNs outperform or align with the best benchmark algorithms given 20 to 200 labeled samples on the Bank Marketing and the DementiaBank datasets. △ Less

Submitted 19 November, 2018; v1 submitted 23 May, 2018; originally announced May 2018.

Comments: NIPS IRASL Workshop 2018

Showing 1–10 of 10 results for author: Rudzicz, F