-
Boosting Open Set Recognition Performance through Modulated Representation Learning
Authors:
Amit Kumar Kundu,
Vaishnavi Patil,
Joseph Jaja
Abstract:
The open set recognition (OSR) problem aims to identify test samples from novel semantic classes that are not part of the training classes, a task that is crucial in many practical scenarios. However, existing OSR methods scale the logits by a constant factor (the temperature) before applying a loss function, which hinders the model from exploring both ends of the spectrum in representation learning, from instance-level to semantic-level features. In this paper, we address this problem by enabling temperature-modulated representation learning using our novel negative cosine scheduling scheme. Our scheduling lets the model form a coarse decision boundary at the beginning of training by focusing on fewer neighbors, and then gradually prioritizes more neighbors to smooth out rough edges. This gradual task switching leads to a richer and more generalizable representation space. While other OSR methods benefit from regularization or auxiliary negative samples (for example, via mix-up) at a significant computational cost, our scheme can be folded into any existing OSR method with no overhead. We implement the proposed scheme on top of a number of baselines, using both cross-entropy and contrastive loss functions as well as a few other OSR methods, and find that our scheme boosts both the OSR performance and the closed set performance in most cases, especially on the tougher semantic shift benchmarks.
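A minimal sketch of a temperature schedule in this spirit, assuming the "negative cosine" schedule is a single monotone rise from a small temperature (sharp softmax, few effective neighbors) to a large one (flat softmax, many neighbors). The function names and the bounds `tau_min` and `tau_max` are hypothetical; the abstract does not give the exact form, the bounds, or whether the schedule repeats.

```python
import math


def negative_cosine_temperature(step: int, total_steps: int,
                                tau_min: float = 0.1, tau_max: float = 1.0) -> float:
    """Temperature rising from tau_min to tau_max along a half cosine (hypothetical form)."""
    t = min(max(step / total_steps, 0.0), 1.0)  # training progress clipped to [0, 1]
    # -cos(pi * t) runs from -1 to 1, so (1 - cos(pi * t)) / 2 runs from 0 to 1
    return tau_min + (tau_max - tau_min) * (1.0 - math.cos(math.pi * t)) / 2.0


def softmax_with_temperature(logits, tau):
    """Softmax of logits / tau; a small tau concentrates mass on the top few entries."""
    m = max(logits)
    exps = [math.exp((z - m) / tau) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


for step in (0, 25, 50, 75, 100):
    tau = negative_cosine_temperature(step, 100)
    probs = softmax_with_temperature([2.0, 1.0, 0.5], tau)
    print(f"step={step:3d} tau={tau:.3f} top_prob={max(probs):.3f}")
```

In a training loop, one would compute `tau` once per step or epoch and divide the logits (or the cosine similarities, for a contrastive loss) by it before the loss, which is why such a schedule adds no computational overhead.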
Submitted 23 May, 2025;
originally announced May 2025.
-
Detecting and Monitoring Bias for Subgroups in Breast Cancer Detection AI
Authors:
Amit Kumar Kundu,
Florence X. Doo,
Vaishnavi Patil,
Amitabh Varshney,
Joseph Jaja
Abstract:
Automated mammography screening plays an important role in early breast cancer detection. However, machine learning models developed on particular training datasets may exhibit performance degradation and bias when deployed in real-world settings. In this paper, we analyze the performance of high-performing AI models on two mammography datasets: the Emory Breast Imaging Dataset (EMBED) and the RSNA 2022 challenge dataset. Specifically, we evaluate how these models perform across different subgroups, defined by six attributes, to detect potential biases using a range of classification metrics. Our analysis identifies certain subgroups that demonstrate notable underperformance, highlighting the need for ongoing monitoring of these subgroups' performance. To address this, we adopt a monitoring method designed to detect performance drifts over time. Upon identifying a drift, this method issues an alert, which can enable timely interventions. This approach not only provides a tool for tracking performance but also helps ensure that AI models continue to perform effectively across diverse populations.
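The abstract does not name the monitoring method. As a stand-in, the sketch below uses a one-sided CUSUM chart, a classic change-detection technique and not necessarily the authors' choice, run independently on a per-period metric (e.g., weekly AUC) for each subgroup. The baseline, `slack`, and `threshold` values and the subgroup series are hypothetical.

```python
from typing import Iterable, Optional


def cusum_drop_alarm(scores: Iterable[float], baseline: float,
                     slack: float = 0.01, threshold: float = 0.05) -> Optional[int]:
    """One-sided CUSUM for a drop in a performance metric.

    Returns the index of the first monitoring period at which the accumulated
    shortfall below `baseline` (minus `slack` per period) exceeds `threshold`,
    or None if no alarm is raised.
    """
    s = 0.0
    for i, x in enumerate(scores):
        # accumulate only shortfalls larger than the tolerated slack; never go negative
        s = max(0.0, s + (baseline - x) - slack)
        if s > threshold:
            return i
    return None


# Hypothetical per-subgroup usage: one independent monitor per subgroup.
weekly_auc = {
    "subgroup_a": [0.85, 0.86, 0.84, 0.85, 0.85, 0.86],
    "subgroup_b": [0.85, 0.86, 0.84, 0.85, 0.80, 0.79, 0.78],
}
for name, series in weekly_auc.items():
    print(name, cusum_drop_alarm(series, baseline=0.85))
```

With these illustrative numbers, `subgroup_b` raises an alarm at index 5, one period after its metric first drops, while `subgroup_a` raises none. Larger `threshold` values trade detection delay for fewer false alarms.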
Submitted 14 February, 2025;
originally announced February 2025.
-
ProtoVAE: Prototypical Networks for Unsupervised Disentanglement
Authors:
Vaishnavi Patil,
Matthew Evanusa,
Joseph JaJa
Abstract:
Generative modeling and self-supervised learning have in recent years made great strides towards learning from data in a completely unsupervised way. However, guiding a neural network to encode the data into representations that are interpretable or explainable remains an open area of investigation. The problem of unsupervised disentanglement is of particular importance because it aims to discover the different latent factors of variation or semantic concepts from the data alone, without labeled examples, and to encode them into structurally disjoint latent representations. Without additional constraints or inductive biases placed on the network, a generative model may learn the data distribution and encode the factors, but not necessarily in a disentangled way. Here, we introduce ProtoVAE, a novel VAE-based deep generative model that leverages a deep metric learning Prototypical network, trained using self-supervision, to impose these constraints. The prototypical network constrains the mapping from the representation space to the data space to ensure that controlled changes in the representation space are mapped to changes in the factors of variation in the data space. Our model is completely unsupervised and requires no a priori knowledge of the dataset, including the number of factors. We evaluate our proposed model on the benchmark dSprites, 3DShapes, and MPI3D disentanglement datasets, showing state-of-the-art results against previous methods via qualitative traversals in the latent space as well as quantitative disentanglement metrics. We further qualitatively demonstrate the effectiveness of our model on the real-world CelebA dataset.
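A minimal NumPy sketch of the prototypical-network loss that ProtoVAE builds on: each class prototype is the mean embedding of its support examples, and queries are classified by negative squared Euclidean distance to the prototypes. How ProtoVAE derives its self-supervised "classes" (for instance, which latent dimension was perturbed) is our assumption and is not shown; the toy embeddings are hypothetical.

```python
import numpy as np


def prototypical_loss(support, support_labels, query, query_labels, n_classes):
    """Prototypical-network loss: classify queries by distance to per-class mean embeddings.

    support: (n_support, dim) embeddings with integer labels in [0, n_classes)
    query:   (n_query, dim) embeddings with integer labels
    Returns (mean cross-entropy loss, predicted labels for the queries).
    """
    # one prototype per class: the mean of that class's support embeddings
    protos = np.stack([support[support_labels == c].mean(axis=0) for c in range(n_classes)])
    # negative squared Euclidean distance acts as the logit, shape (n_query, n_classes)
    logits = -((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    # numerically stable log-softmax
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(query_labels)), query_labels].mean()
    return loss, logits.argmax(axis=1)


support = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
support_labels = np.array([0, 0, 1, 1])
query = np.array([[0.0, 0.5], [10.0, 10.5]])
loss, preds = prototypical_loss(support, support_labels, query, np.array([0, 1]), n_classes=2)
print(preds)  # [0 1]
```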
Submitted 15 May, 2023;
originally announced May 2023.
-
DOT-VAE: Disentangling One Factor at a Time
Authors:
Vaishnavi Patil,
Matthew Evanusa,
Joseph JaJa
Abstract:
As we enter an era of machine learning characterized by an overabundance of data, the discovery, organization, and interpretation of data in an unsupervised manner become a critical need. One promising approach to this endeavour is the problem of disentanglement, which aims to learn the underlying generative latent factors of the data, called the factors of variation, and to encode them in disjoint latent representations. Recent work has addressed this problem for synthetic datasets generated by a fixed set of independent factors of variation. Here, we propose to extend this to real-world datasets with a countable number of factors of variation. We propose a novel framework that augments the latent space of a Variational Autoencoder with a disentangled space and is trained using a Wake-Sleep-inspired two-step algorithm for unsupervised disentanglement. Our network learns to disentangle interpretable, independent factors from the data "one at a time" and to encode each in a different dimension of the disentangled latent space, while making no prior assumptions about the number of factors or their joint distribution. We demonstrate its quantitative and qualitative effectiveness by evaluating the learned latent representations on two synthetic benchmark datasets, dSprites and 3DShapes, and on a real-world dataset, CelebA.
Submitted 20 October, 2022; v1 submitted 19 October, 2022;
originally announced October 2022.
-
TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation
Authors:
Jun Wang,
Mingfei Gao,
Yuqian Hu,
Ramprasaath R. Selvaraju,
Chetan Ramaiah,
Ran Xu,
Joseph F. JaJa,
Larry S. Davis
Abstract:
Text-VQA aims at answering questions that require understanding the textual cues in an image. Despite the great progress of existing Text-VQA methods, their performance suffers from insufficient human-labeled question-answer (QA) pairs. However, we observe that, in general, the scene text is not fully exploited in the existing datasets: only a small portion of the text in each image participates in the annotated QA activities. This results in a huge waste of useful information. To address this deficiency, we develop a new method to generate high-quality and diverse QA pairs by explicitly utilizing the rich text available in the scene context of each image. Specifically, we propose TAG, a text-aware visual question-answer generation architecture that learns to produce meaningful and accurate QA samples using a multimodal transformer. The architecture exploits underexplored scene text information and enhances the scene understanding of Text-VQA models by combining the generated QA pairs with the initial training data. Extensive experimental results on two well-known Text-VQA benchmarks (TextVQA and ST-VQA) demonstrate that our proposed TAG effectively enlarges the training data, which helps improve Text-VQA performance without extra labeling effort. Moreover, our model outperforms state-of-the-art approaches that are pre-trained with extra large-scale data. Code is available at https://github.com/HenryJunW/TAG.
Submitted 7 October, 2022; v1 submitted 2 August, 2022;
originally announced August 2022.
-
FedNet2Net: Saving Communication and Computations in Federated Learning with Model Growing
Authors:
Amit Kumar Kundu,
Joseph Jaja
Abstract:
Federated learning (FL) is a recently developed area of machine learning in which the private data of a large number of distributed clients is used to develop a global model under the coordination of a central server without explicitly exposing the data. The standard FL strategy has a number of significant bottlenecks, including large communication requirements and a high impact on the clients' resources. Several strategies in the literature attempt to address these issues. In this paper, we propose a novel scheme based on the notion of "model growing". Initially, the server deploys a small model of low complexity, which is trained to capture the data complexity during the initial set of rounds. When the performance of this model saturates, the server switches to a larger model with the help of function-preserving transformations. The model complexity increases as more data is processed by the clients, and the process continues until the desired performance is achieved. As a result, the most complex model is broadcast only at the final stage, which substantially reduces communication cost and client computational requirements. The proposed approach is tested extensively on three standard benchmarks and is shown to achieve substantial reductions in communication and client computation while achieving accuracy comparable to the current most effective strategies.
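The function-preserving transformations are not specified in the abstract; the sketch below assumes a Net2Net-style widening (Net2WiderNet) of the hidden layer of a one-hidden-layer ReLU network. New hidden units copy randomly chosen existing units, and the outgoing weights of each replicated unit are divided by its number of copies, so the widened network computes exactly the same function before further federated rounds. All names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer ReLU network."""
    return relu(x @ W1 + b1) @ W2 + b2


def net2wider(W1, b1, W2, new_width, rng):
    """Widen the hidden layer to `new_width` units while computing the same function.

    Each new unit copies a randomly chosen existing unit; outgoing weights of every
    replicated unit are divided by its number of copies so their sum is unchanged.
    """
    old_width = W1.shape[1]
    mapping = np.concatenate([np.arange(old_width),
                              rng.integers(0, old_width, size=new_width - old_width)])
    counts = np.bincount(mapping, minlength=old_width)
    W1_new = W1[:, mapping]                               # copy incoming weights
    b1_new = b1[mapping]                                  # copy biases
    W2_new = W2[mapping, :] / counts[mapping][:, None]    # split outgoing weights
    return W1_new, b1_new, W2_new


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)
W1w, b1w, W2w = net2wider(W1, b1, W2, new_width=7, rng=rng)
print(np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1w, b1w, W2w, b2)))  # True
```

In practice a small amount of noise is often added to the copied weights afterwards to break symmetry between replicas; that step would make the outputs only approximately equal.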
Submitted 19 July, 2022;
originally announced July 2022.
-
Class-Similarity Based Label Smoothing for Confidence Calibration
Authors:
Chihuang Liu,
Joseph JaJa
Abstract:
Generating confidence-calibrated outputs is of utmost importance when deep neural networks are applied in safety-critical decision-making systems. The output of a neural network is a probability distribution whose scores are estimated confidences of the input belonging to the corresponding classes; hence, they represent a complete estimate of the output likelihood relative to all classes. In this paper, we propose a novel form of label smoothing to improve confidence calibration. Since classes differ in their intrinsic similarities to one another, more similar classes should receive closer probability values in the final output. This motivates a new smooth label whose values are based on similarities with the reference class. We adopt different similarity measures, including those that capture feature-based similarity and semantic similarity. We demonstrate through extensive experiments, on various datasets and network architectures, that our approach consistently outperforms state-of-the-art calibration techniques, including uniform label smoothing.
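One plausible construction of a similarity-based smooth label, sketched under our own assumptions: the reference class keeps `1 - epsilon`, and the remaining `epsilon` is split among the other classes in proportion to a nonnegative class-similarity matrix (for example, clipped cosine similarity of class-mean features). The exact formula, the value of `epsilon`, and the uniform fallback are hypothetical and may differ from the paper's.

```python
import numpy as np


def similarity_smoothed_labels(labels, similarity, epsilon=0.1):
    """Soft targets: 1 - epsilon on the true class, epsilon spread by class similarity.

    labels:     (n,) integer class indices
    similarity: (C, C) nonnegative class-similarity matrix (hypothetical input)
    Returns an (n, C) array whose rows sum to 1.
    """
    n_classes = similarity.shape[0]
    rows = np.arange(len(labels))
    sim = similarity[labels].astype(float)     # similarity of each reference class to every class
    sim[rows, labels] = 0.0                    # the reference class gets no smoothing mass
    totals = sim.sum(axis=1, keepdims=True)
    uniform = np.full(sim.shape, 1.0 / (n_classes - 1))
    uniform[rows, labels] = 0.0
    # normalize similarities; fall back to uniform smoothing if a row is all zeros
    weights = np.where(totals > 0, sim / np.where(totals > 0, totals, 1.0), uniform)
    targets = epsilon * weights
    targets[rows, labels] = 1.0 - epsilon
    return targets


S = np.array([[1.0, 0.8, 0.2],
              [0.8, 1.0, 0.1],
              [0.2, 0.1, 1.0]])
print(similarity_smoothed_labels(np.array([0, 2]), S))
```

When every other class has zero similarity to the reference class, the construction falls back to ordinary uniform label smoothing.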
Submitted 15 September, 2021; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Feature Prioritization and Regularization Improve Standard Accuracy and Adversarial Robustness
Authors:
Chihuang Liu,
Joseph JaJa
Abstract:
Adversarial training has been successfully applied to build robust models, but at a cost: while the robustness of a model increases, its standard classification accuracy declines. This phenomenon has been suggested to be an inherent trade-off. We propose a model that employs feature prioritization by a nonlinear attention module and $L_2$ feature regularization to improve both adversarial robustness and standard accuracy relative to adversarial training. The attention module encourages the model to rely heavily on robust features by assigning larger weights to them while suppressing non-robust features. The regularizer encourages the model to extract similar features for natural and adversarial images, effectively ignoring the added perturbation. In addition to evaluating the robustness of our model, we provide justification for the attention module and propose a novel experimental strategy that quantitatively demonstrates that our model is almost ideally aligned with salient data characteristics. Additional experimental results illustrate the power of our model relative to state-of-the-art methods.
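A minimal sketch of a training objective consistent with the regularizer described above: cross-entropy on adversarial inputs plus λ times the squared $L_2$ distance between the features of natural and adversarial images. Which layer supplies the features, whether the classification term also uses natural images, and the weight `lam` are assumptions; the attention module is not sketched.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer labels."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def feature_regularized_loss(logits_adv, labels, feats_nat, feats_adv, lam=0.1):
    """Classification loss on adversarial inputs plus an L2 penalty tying the features
    of each adversarial example to those of its natural counterpart."""
    ce = cross_entropy(logits_adv, labels)
    reg = ((feats_nat - feats_adv) ** 2).sum(axis=1).mean()   # mean squared L2 distance per example
    return ce + lam * reg


labels = np.array([0, 1])
feats_nat = np.zeros((2, 3))
feats_adv = np.ones((2, 3))   # hypothetical features shifted by the perturbation
print(feature_regularized_loss(np.zeros((2, 2)), labels, feats_nat, feats_adv))
```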
Submitted 12 August, 2019; v1 submitted 4 October, 2018;
originally announced October 2018.
-
Learning Graph-Level Representations with Recurrent Neural Networks
Authors:
Yu Jin,
Joseph F. JaJa
Abstract:
Recently, a variety of methods have been developed to encode graphs into low-dimensional vectors that can be easily exploited by machine learning algorithms. The majority of these methods start by embedding the graph nodes into a low-dimensional vector space and then aggregate the node embeddings using some scheme. In this work, we develop a new approach to learn graph-level representations, which combines unsupervised and supervised learning components. We start by learning a set of node representations in an unsupervised fashion. Graph nodes are mapped into node sequences sampled from random walks approximated by the Gumbel-Softmax distribution. Recurrent neural network (RNN) units are modified to accommodate both the node representations and their neighborhood information. Experiments on standard graph classification benchmarks demonstrate that our proposed approach achieves superior or comparable performance relative to state-of-the-art algorithms in terms of convergence speed and classification accuracy. We further illustrate the effectiveness of the different components used by our approach.
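A NumPy sketch of sampling a relaxed random walk with the Gumbel-Softmax trick, so that node sequences stay differentiable. Here the transition logits are fixed uniform neighbor probabilities and the temperature `tau` is a hypothetical choice; how the paper parameterizes the walk and feeds the sequences into its modified RNN units is not reproduced.

```python
import numpy as np


def gumbel_softmax(logits, tau, rng):
    """Differentiable relaxation of sampling one category from softmax(logits)."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))              # standard Gumbel(0, 1) noise
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max())
    return y / y.sum()


def relaxed_random_walk(adj, start, length, tau, rng):
    """Relaxed random walk on a graph with no isolated nodes.

    Returns a (length, n_nodes) array whose rows are soft one-hot node indicators;
    smaller tau makes each row closer to a hard one-hot vector.
    """
    n = adj.shape[0]
    trans = adj / adj.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    current = np.zeros(n)
    current[start] = 1.0
    walk = [current]
    for _ in range(length - 1):
        probs = current @ trans                      # next-node distribution from the soft position
        current = gumbel_softmax(np.log(probs + 1e-12), tau, rng)
        walk.append(current)
    return np.stack(walk)


adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
walk = relaxed_random_walk(adj, start=0, length=6, tau=0.5, rng=np.random.default_rng(0))
print(walk.argmax(axis=1))   # hard node sequence read off the relaxed walk
```

Each row could be multiplied by a node-embedding matrix to obtain the RNN input at that step; lowering `tau` pushes rows toward hard one-hot samples at the cost of higher-variance gradients.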
Submitted 11 September, 2018; v1 submitted 19 May, 2018;
originally announced May 2018.
-
A High Performance Implementation of Spectral Clustering on CPU-GPU Platforms
Authors:
Yu Jin,
Joseph F. JaJa
Abstract:
Spectral clustering is one of the most popular graph clustering algorithms and achieves the best performance in many scientific and engineering applications. However, existing implementations in commonly used software platforms such as Matlab and Python do not scale well for many emerging Big Data applications. In this paper, we present a fast implementation of the spectral clustering algorithm on a CPU-GPU heterogeneous platform. Our implementation takes advantage of the computational power of the multi-core CPU and the massive multithreading and SIMD capabilities of GPUs. Given data points in a high-dimensional space as input, we propose a parallel scheme to build a sparse similarity graph represented in a standard sparse representation format. We then compute the smallest $k$ eigenvectors of the Laplacian matrix by utilizing the reverse communication interfaces of the ARPACK software and the cuSPARSE library, where $k$ is typically very large. Moreover, we implement a very fast parallelized $k$-means algorithm on GPUs. Our implementation is shown to be significantly faster than the best-known Matlab and Python implementations for each step. In addition, our algorithm scales to problems with a very large number of clusters.
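For reference, a small dense CPU sketch of the same pipeline (k-nearest-neighbor similarity graph, smallest $k$ eigenvectors of the normalized Laplacian, then $k$-means), written in NumPy rather than with ARPACK or cuSPARSE and storing the graph densely for brevity. The Gaussian kernel width `sigma`, the neighbor count, the Ng-Jordan-Weiss row normalization, and the farthest-point $k$-means initialization are our choices, not necessarily the paper's; at scale the dense `eigh` call would be replaced by a sparse iterative eigensolver.

```python
import numpy as np


def knn_similarity(X, n_neighbors, sigma):
    """Gaussian similarities restricted to a symmetrized k-nearest-neighbor graph."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    nearest = np.argsort(-W, axis=1)[:, :n_neighbors]     # most similar neighbors per row
    mask = np.zeros(W.shape, dtype=bool)
    mask[np.arange(len(X))[:, None], nearest] = True
    return np.where(mask | mask.T, W, 0.0)                 # keep an edge if either end selected it


def kmeans(Y, k, n_iter=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [Y[0]]
    for _ in range(k - 1):
        d2 = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[d2.argmax()])
    centers = np.stack(centers)
    for _ in range(n_iter):
        labels = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        centers = np.stack([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels


def spectral_clustering(X, k, n_neighbors=5, sigma=1.0):
    W = knn_similarity(X, n_neighbors, sigma)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L)                         # eigenvalues in ascending order
    U = eigvecs[:, :k]                                     # k smallest eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)       # row-normalize (Ng-Jordan-Weiss)
    return kmeans(U, k)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, size=(10, 2)), rng.normal(5.0, 0.1, size=(10, 2))])
print(spectral_clustering(X, k=2))
```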
Submitted 12 February, 2018;
originally announced February 2018.
-
Graph Coarsening with Preserved Spectral Properties
Authors:
Yu Jin,
Andreas Loukas,
Joseph F. JaJa
Abstract:
Large-scale graphs are widely used to represent object relationships in many real-world applications. Such graphs present significant computational challenges for processing, analysis, and information extraction. Graph coarsening techniques are commonly used to reduce the computational load while attempting to maintain the basic structural properties of the original graph. As there is no consensus on the specific graph properties that coarse graphs preserve, how to measure the differences between original and coarse graphs remains a key challenge. In this work, we introduce a new perspective on graph coarsening based on concepts from spectral graph theory. We propose and justify new distance functions that characterize the differences between original and coarse graphs. We show that the proposed spectral distance naturally captures the structural differences in the graph coarsening process. In addition, we provide efficient graph coarsening algorithms to generate graphs that provably preserve the spectral properties of the original graphs. Experiments show that our proposed algorithms consistently achieve better results than previous graph coarsening methods on graph classification and block recovery tasks.
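A simplified illustration, not the paper's definition: contract a graph with a node-to-supernode assignment matrix $P$ via $W_c = P^\top W P$ (contracted edges become self-loops), then sum the absolute differences between the $n'$ smallest normalized-Laplacian eigenvalues of the original graph and all $n'$ eigenvalues of the coarse graph. The paper's spectral distance may align eigenvalues differently; the function names are hypothetical.

```python
import numpy as np


def normalized_laplacian_eigs(W):
    """Ascending eigenvalues of L = I - D^{-1/2} W D^{-1/2} (self-loops count toward degree)."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)                  # assumes no isolated nodes
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(L)


def coarsen(W, assignment, n_coarse):
    """Contract nodes into supernodes: Wc = P^T W P, keeping intra-group weight as self-loops."""
    P = np.zeros((len(W), n_coarse))
    P[np.arange(len(W)), assignment] = 1.0
    return P.T @ W @ P


def spectral_distance(W, Wc):
    """Sum of |differences| between the n' smallest eigenvalues of G and all eigenvalues of Gc."""
    lam = normalized_laplacian_eigs(W)
    lam_c = normalized_laplacian_eigs(Wc)
    return float(np.abs(lam[:len(lam_c)] - lam_c).sum())


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0

good = spectral_distance(W, coarsen(W, np.array([0, 0, 0, 1, 1, 1]), 2))
bad = spectral_distance(W, coarsen(W, np.array([0, 1, 0, 1, 0, 1]), 2))
print(good, bad)
```

For these two triangles, contracting each triangle gives a distance of roughly 0.08, while a partition that mixes the triangles gives roughly 1.22, matching the intuition that the first coarsening better preserves the graph's structure.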
Submitted 10 October, 2019; v1 submitted 12 February, 2018;
originally announced February 2018.
-
A Data-Driven Approach to Extract Connectivity Structures from Diffusion Tensor Imaging Data
Authors:
Yu Jin,
Joseph F. JaJa,
Rong Chen,
Edward H. Herskovits
Abstract:
Diffusion Tensor Imaging (DTI) is an effective tool for the analysis of structural brain connectivity in normal development and in a broad range of brain disorders. However, efforts to derive inherent characteristics of structural brain networks have been hampered by the very high dimensionality of the data, relatively small sample sizes, and the lack of widely accepted connectivity-based regions of interest (ROIs). Typical approaches have focused either on regions defined by standard anatomical atlases that do not incorporate anatomical connectivity, or on voxel-wise analysis, which results in a loss of statistical power relative to structure-wise connectivity analysis. In this work, we propose a novel, computationally efficient iterative clustering method to generate connectivity-based whole-brain parcellations that converge to a stable parcellation in a few iterations. Our algorithm is based on a sparse representation of the whole-brain connectivity matrix, which reduces the number of edges from around half a billion to a few million while incorporating the necessary spatial constraints. We show that the resulting regions in a sense capture the inherent connectivity information present in the data, and are stable with respect to initialization and the randomization scheme within the algorithm. These parcellations provide consistent structural regions across the subjects of population samples that are homogeneous with respect to anatomic connectivity. Our method also derives connectivity structures that can be used to distinguish between population samples with known differences in structural connectivity. In particular, we report new results on structural differences between population samples, such as females vs. males, normal controls vs. schizophrenia patients, and different age groups among normal controls.
Submitted 12 February, 2018;
originally announced February 2018.
-
Scalable Algorithms for Generating and Analyzing Structural Brain Networks with a Varying Number of Nodes
Authors:
Yu Jin,
Joseph F. JaJa,
Rong Chen,
Edward H. Herskovits
Abstract:
Diffusion Magnetic Resonance Imaging (MRI) exploits the anisotropic diffusion of water molecules in the brain to enable the estimation of the brain's anatomical fiber tracts at a relatively high resolution. In particular, tractographic methods can be used to generate a whole-brain anatomical connectivity matrix in which each element provides an estimate of the connectivity strength between the corresponding voxels. Structural brain networks are built using the connectivity information and a predefined brain parcellation, where the nodes of the network represent the brain regions and the edge weights capture the connectivity strengths between the corresponding brain regions. This paper introduces a number of novel scalable methods to generate and analyze structural brain networks with a varying number of nodes. In particular, we introduce a new parallel algorithm to quickly generate large-scale connectivity-based parcellations in which voxels within a region have highly similar connectivity patterns to the rest of the regions. We show that the corresponding regional structural consistency is always superior to that of randomly generated parcellations over a wide range of parcellation sizes. The corresponding brain networks with a varying number of nodes are analyzed using standard graph-theoretic measures as well as new measures derived from spectral graph theory. Our results indicate that brain networks with larger numbers of nodes have increasingly more statistical power, and that the spectral profile of large brain networks has a relatively unique shape compared with other well-known networks.
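The network construction described in this abstract can be sketched as an aggregation of the voxel-level matrix $C$ with a voxel-to-region indicator matrix $P$, giving $W = P^\top C P$ with the diagonal removed. Summing voxel-pair strengths is one common convention (averaging by region size is another), and the paper's exact choice is not stated; a real voxel matrix would be stored sparsely, and the toy matrix is hypothetical.

```python
import numpy as np


def region_network(voxel_conn, parcellation, n_regions):
    """Aggregate a voxel-level connectivity matrix into a region-level network.

    voxel_conn:   (n_voxels, n_voxels) symmetric connectivity strengths
    parcellation: (n_voxels,) region index of each voxel
    Edge weight between regions a and b = sum of voxel_conn[i, j] over i in a, j in b.
    """
    n_voxels = voxel_conn.shape[0]
    P = np.zeros((n_voxels, n_regions))
    P[np.arange(n_voxels), parcellation] = 1.0     # voxel-to-region indicator matrix
    W = P.T @ voxel_conn @ P
    np.fill_diagonal(W, 0.0)                       # drop within-region connectivity (no self-edges)
    return W


C = np.zeros((4, 4))
C[0, 2] = C[2, 0] = 1.0
C[1, 3] = C[3, 1] = 2.0
C[0, 1] = C[1, 0] = 5.0        # within region 0, so it is not a network edge
W = region_network(C, np.array([0, 0, 1, 1]), n_regions=2)
print(W)                  # region-to-region edge weights
print(W.sum(axis=1))      # node strength, a standard graph-theoretic measure
```

Because only the parcellation changes, the same function yields networks with any number of nodes.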
Submitted 13 September, 2016;
originally announced September 2016.
-
From Maxout to Channel-Out: Encoding Information on Sparse Pathways
Authors:
Qi Wang,
Joseph JaJa
Abstract:
Motivated by an important insight from neuroscience, we propose a new framework for understanding the success of the recently proposed "maxout" networks. The framework is based on encoding information on sparse pathways and recognizing the correct pathway at inference time. Elaborating further on this insight, we propose a novel deep network architecture, called the "channel-out" network, which takes much better advantage of sparse pathway encoding. In channel-out networks, pathways are not only formed a posteriori, but are also actively selected according to the inference outputs from the lower layers. From a mathematical perspective, channel-out networks can represent a wider class of piecewise continuous functions, thereby endowing the network with more expressive power than maxout networks. We test our channel-out networks on several well-known image classification benchmarks, setting new state-of-the-art performance on CIFAR-100 and STL-10, which represent some of the "harder" image classification benchmarks.
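A NumPy sketch contrasting the two activations, assuming max is the selection function within each channel group (other selection rules are possible and may be what the paper uses). Maxout returns only the group maxima, discarding which channel won; channel-out keeps the winning value at its original position and zeros the others, so the identity of the winner selects which downstream weights (the pathway) are used.

```python
import numpy as np


def maxout(x, group_size):
    """(batch, channels) -> (batch, channels // group_size): the maximum of each channel group."""
    b, c = x.shape
    return x.reshape(b, c // group_size, group_size).max(axis=2)


def channel_out(x, group_size):
    """Same shape as x: each group's maximal channel keeps its value and position; others are zeroed."""
    b, c = x.shape
    groups = x.reshape(b, c // group_size, group_size)
    winners = groups.argmax(axis=2)                       # index of the selected channel per group
    mask = np.zeros(groups.shape, dtype=bool)
    np.put_along_axis(mask, winners[..., None], True, axis=2)
    return np.where(mask, groups, 0.0).reshape(b, c)


x = np.array([[1.0, 3.0, -2.0, 0.5]])
print(maxout(x, 2))       # group maxima only
print(channel_out(x, 2))  # maxima kept in place; the zero pattern encodes the pathway
```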
Submitted 18 November, 2013;
originally announced December 2013.