Showing 1–50 of 92 results for author: Steeg, G V

Searching in archive cs.
  1. arXiv:2506.11893  [pdf, ps, other]

    cs.LG

    Measurement-aligned Flow for Inverse Problem

    Authors: Shaorong Zhang, Rob Brekelmans, Yunshu Wu, Greg Ver Steeg

    Abstract: Diffusion models provide a powerful way to incorporate complex prior information for solving inverse problems. However, existing methods struggle to correctly incorporate guidance from conflicting signals in the prior and measurement, especially in the challenging setting of non-Gaussian or unknown noise. To bridge these gaps, we propose Measurement-Aligned Sampling (MAS), a novel framework for li…

    Submitted 13 June, 2025; originally announced June 2025.

  2. arXiv:2505.12358  [pdf, ps, other]

    cs.LG cs.AI

    AbFlowNet: Optimizing Antibody-Antigen Binding Energy via Diffusion-GFlowNet Fusion

    Authors: Abrar Rahman Abir, Haz Sameen Shahgir, Md Rownok Zahan Ratul, Md Toki Tahmid, Greg Ver Steeg, Yue Dong

    Abstract: Complementarity Determining Regions (CDRs) are critical segments of an antibody that facilitate binding to specific antigens. Current computational methods for CDR design utilize reconstruction losses and do not jointly optimize binding energy, a crucial metric for antibody efficacy. Rather, binding energy optimization is done through computationally expensive Online Reinforcement Learning (RL) pi…

    Submitted 18 May, 2025; originally announced May 2025.

  3. arXiv:2504.15267  [pdf, other]

    cs.CV

    Diffusion Bridge Models for 3D Medical Image Translation

    Authors: Shaorong Zhang, Tamoghna Chattopadhyay, Sophia I. Thomopoulos, Jose-Luis Ambite, Paul M. Thompson, Greg Ver Steeg

    Abstract: Diffusion tensor imaging (DTI) provides crucial insights into the microstructure of the human brain, but it can be time-consuming to acquire compared to more readily available T1-weighted (T1w) magnetic resonance imaging (MRI). To address this challenge, we propose a diffusion bridge model for 3D brain image translation between T1w MRI and DTI modalities. Our model learns to generate high-quality…

    Submitted 21 April, 2025; originally announced April 2025.

  4. arXiv:2504.07087  [pdf, other]

    cs.CL cs.AI cs.IR

    KG-LLM-Bench: A Scalable Benchmark for Evaluating LLM Reasoning on Textualized Knowledge Graphs

    Authors: Elan Markowitz, Krupa Galiya, Greg Ver Steeg, Aram Galstyan

    Abstract: Knowledge graphs have emerged as a popular method for injecting up-to-date, factual knowledge into large language models (LLMs). This is typically achieved by converting the knowledge graph into text that the LLM can process in context. While multiple methods of encoding knowledge graphs have been proposed, the impact of this textualization process on LLM performance remains under-explored. We int…

    Submitted 9 April, 2025; originally announced April 2025.

    Comments: To be presented at NAACL-HLT, KnowledgeNLP Workshop (2025)

  5. arXiv:2501.15435  [pdf, other]

    cs.LG cs.CV

    Making Sense Of Distributed Representations With Activation Spectroscopy

    Authors: Kyle Reing, Greg Ver Steeg, Aram Galstyan

    Abstract: In the study of neural network interpretability, there is growing evidence to suggest that relevant features are encoded across many neurons in a distributed fashion. Making sense of these distributed representations without knowledge of the network's encoding strategy is a combinatorial task that is not guaranteed to be tractable. This work explores one feasible path to both detecting and tracing…

    Submitted 26 January, 2025; originally announced January 2025.

  6. arXiv:2411.05855  [pdf, other]

    cs.LG cs.CV cs.NE

    Learning Morphisms with Gauss-Newton Approximation for Growing Networks

    Authors: Neal Lawton, Aram Galstyan, Greg Ver Steeg

    Abstract: A popular method for Neural Architecture Search (NAS) is based on growing networks via small local changes to the network's architecture called network morphisms. These methods start with a small seed network and progressively grow the network by adding new neurons in an automated way. However, it remains a challenge to efficiently determine which parts of the network are best to grow. Here we pro…

    Submitted 6 November, 2024; originally announced November 2024.

    Comments: 12 pages, 4 figures

    MSC Class: 68T07; ACM Class: I.2.8

  7. arXiv:2410.21553  [pdf, ps, other]

    cs.LG

    Exploring the Design Space of Diffusion Bridge Models

    Authors: Shaorong Zhang, Yuanbin Cheng, Greg Ver Steeg

    Abstract: Diffusion bridge models and stochastic interpolants enable high-quality image-to-image (I2I) translation by creating paths between distributions in pixel space. However, the proliferation of techniques based on incompatible mathematical assumptions has impeded progress. In this work, we unify and expand the space of bridge models by extending Stochastic Interpolants (SIs) with preconditioning, en…

    Submitted 2 July, 2025; v1 submitted 28 October, 2024; originally announced October 2024.

    Comments: 23 pages, 9 figures

  8. arXiv:2410.14713  [pdf, other]

    cs.LG cs.CL

    QuAILoRA: Quantization-Aware Initialization for LoRA

    Authors: Neal Lawton, Aishwarya Padmakumar, Judith Gaspers, Jack FitzGerald, Anoop Kumar, Greg Ver Steeg, Aram Galstyan

    Abstract: QLoRA reduces the memory cost of fine-tuning a large language model (LLM) with LoRA by quantizing the base LLM. However, quantization introduces quantization errors that negatively impact model performance after fine-tuning. In this paper we introduce QuAILoRA, a quantization-aware initialization for LoRA that mitigates this negative impact by decreasing quantization errors at initialization. Our…

    Submitted 9 October, 2024; originally announced October 2024.

    Comments: 12 pages, 7 figures. Submitted to the 4th NeurIPS Workshop on Efficient Natural Language and Speech Processing (ENLSP-IV)

    MSC Class: 68T50

  9. arXiv:2407.08946  [pdf, other]

    cs.LG

    Your Diffusion Model is Secretly a Noise Classifier and Benefits from Contrastive Training

    Authors: Yunshu Wu, Yingtao Luo, Xianghao Kong, Evangelos E. Papalexakis, Greg Ver Steeg

    Abstract: Diffusion models learn to denoise data and the trained denoiser is then used to generate new samples from the data distribution. In this paper, we revisit the diffusion sampling process and identify a fundamental cause of sample quality degradation: the denoiser is poorly estimated in regions that are far Outside Of the training Distribution (OOD), and the sampling process inevitably evaluates in…

    Submitted 1 November, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  10. arXiv:2402.15833  [pdf, other]

    cs.CL cs.LG

    Prompt Perturbation Consistency Learning for Robust Language Models

    Authors: Yao Qiang, Subhrangshu Nandi, Ninareh Mehrabi, Greg Ver Steeg, Anoop Kumar, Anna Rumshisky, Aram Galstyan

    Abstract: Large language models (LLMs) have demonstrated impressive performance on a number of natural language processing tasks, such as question answering and text summarization. However, their performance on sequence labeling tasks such as intent classification and slot filling (IC-SF), which is a central component in personal assistant systems, lags significantly behind discriminative models. Furthermor…

    Submitted 24 February, 2024; originally announced February 2024.

  11. arXiv:2402.08919  [pdf, other]

    cs.CV cs.LG

    Interpretable Measures of Conceptual Similarity by Complexity-Constrained Descriptive Auto-Encoding

    Authors: Alessandro Achille, Greg Ver Steeg, Tian Yu Liu, Matthew Trager, Carson Klingenberg, Stefano Soatto

    Abstract: Quantifying the degree of similarity between images is a key copyright issue for image-based machine learning. In legal doctrine, however, determining the degree of similarity between works requires subjective analysis, and fact-finders (judges and juries) can demonstrate considerable variability in these subjective judgement calls. Images that are structurally similar can be deemed dissimilar, whe…

    Submitted 13 February, 2024; originally announced February 2024.

  12. arXiv:2312.14440  [pdf, other]

    cs.LG cs.CR

    Asymmetric Bias in Text-to-Image Generation with Adversarial Attacks

    Authors: Haz Sameen Shahgir, Xianghao Kong, Greg Ver Steeg, Yue Dong

    Abstract: The widespread use of Text-to-Image (T2I) models in content generation requires careful examination of their safety, including their robustness to adversarial attacks. Despite extensive research on adversarial attacks, the reasons for their effectiveness remain underexplored. This paper presents an empirical study on adversarial attacks against T2I models, focusing on analyzing factors associated…

    Submitted 17 July, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

    Comments: camera-ready version

  13. arXiv:2310.07972  [pdf, other]

    cs.LG cs.AI cs.IT

    Interpretable Diffusion via Information Decomposition

    Authors: Xianghao Kong, Ollie Liu, Han Li, Dani Yogatama, Greg Ver Steeg

    Abstract: Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque, making it difficult to understand precisely what relationships between words and parts of an image are captured, or to predict the effect of an intervention. We illuminate the fine-grained relationships learned by di…

    Submitted 18 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 32 pages, 18 figures

  14. arXiv:2306.09520  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Ensembled Prediction Intervals for Causal Outcomes Under Hidden Confounding

    Authors: Myrl G. Marmarelis, Greg Ver Steeg, Aram Galstyan, Fred Morstatter

    Abstract: Causal inference of exact individual treatment outcomes in the presence of hidden confounders is rarely possible. Recent work has extended prediction intervals with finite-sample guarantees to partially identifiable causal outcomes, by means of a sensitivity model for hidden confounding. In deep learning, predictors can exploit their inductive biases for better generalization out of sample. We arg…

    Submitted 1 November, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

  15. Knowledge Enhanced Multi-Domain Recommendations in an AI Assistant Application

    Authors: Elan Markowitz, Ziyan Jiang, Fan Yang, Xing Fan, Tony Chen, Greg Ver Steeg, Aram Galstyan

    Abstract: This work explores unifying knowledge enhanced recommendation with multi-domain recommendation systems in a conversational AI assistant application. Multi-domain recommendation leverages users' interactions in previous domains to improve recommendations in a new one. Knowledge graph enhancement seeks to use external knowledge graphs to improve recommendations within a single domain. Both research…

    Submitted 24 March, 2025; v1 submitted 9 June, 2023; originally announced June 2023.

    Journal ref: ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

  16. arXiv:2305.19264  [pdf, other]

    cs.CL cs.LG

    Jointly Reparametrized Multi-Layer Adaptation for Efficient and Private Tuning

    Authors: Umang Gupta, Aram Galstyan, Greg Ver Steeg

    Abstract: Efficient finetuning of pretrained language transformers is becoming increasingly prevalent for solving natural language processing tasks. While effective, it can still require a large number of tunable parameters. This can be a drawback for low-resource applications and training with differential-privacy constraints, where excessive noise may be introduced during finetuning. To this end, we propo…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: To appear in the Findings of ACL 2023. Code available at https://github.com/umgupta/jointly-reparametrized-finetuning

  17. arXiv:2305.16597  [pdf, other]

    cs.CL cs.AI cs.LG

    Neural Architecture Search for Parameter-Efficient Fine-tuning of Large Pre-trained Language Models

    Authors: Neal Lawton, Anoop Kumar, Govind Thattai, Aram Galstyan, Greg Ver Steeg

    Abstract: Parameter-efficient tuning (PET) methods fit pre-trained language models (PLMs) to downstream tasks by either computing a small compressed update for a subset of model parameters, or appending and fine-tuning a small number of new model parameters to the pre-trained network. Hand-designed PET architectures from the literature perform well in practice, but have the potential to be improved via auto…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 8 pages, 3 figures, ACL 2023

    ACM Class: I.2.7

  18. arXiv:2305.10625  [pdf, other]

    cs.LG

    Measuring and Mitigating Local Instability in Deep Neural Networks

    Authors: Arghya Datta, Subhrangshu Nandi, Jingcheng Xu, Greg Ver Steeg, He Xie, Anoop Kumar, Aram Galstyan

    Abstract: Deep Neural Networks (DNNs) are becoming integral components of real-world services relied upon by millions of users. Unfortunately, architects of these systems can find it difficult to ensure reliable performance as irrelevant details like random initialization can unexpectedly change the outputs of a trained system with potentially disastrous consequences. We formulate the model stability proble…

    Submitted 18 May, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: To be published in Findings of the Association for Computational Linguistics (ACL), 2023

  19. arXiv:2303.06992  [pdf, other]

    cs.LG stat.ML

    Improving Mutual Information Estimation with Annealed and Energy-Based Bounds

    Authors: Rob Brekelmans, Sicong Huang, Marzyeh Ghassemi, Greg Ver Steeg, Roger Grosse, Alireza Makhzani

    Abstract: Mutual information (MI) is a fundamental quantity in information theory and machine learning. However, direct estimation of MI is intractable, even if the true joint probability density for the variables of interest is known, as it involves estimating a potentially high-dimensional log partition function. In this work, we present a unifying view of existing MI bounds from the perspective of import…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: A shorter version appeared in the International Conference on Learning Representations (ICLR) 2022

    Journal ref: ICLR 2022 https://openreview.net/forum?id=T0B9AoM_bFg

  20. arXiv:2303.01491  [pdf, other]

    eess.IV cs.LG q-bio.QM

    Transferring Models Trained on Natural Images to 3D MRI via Position Encoded Slice Models

    Authors: Umang Gupta, Tamoghna Chattopadhyay, Nikhil Dhinagar, Paul M. Thompson, Greg Ver Steeg, The Alzheimer's Disease Neuroimaging Initiative

    Abstract: Transfer learning has remarkably improved computer vision. These advances also promise improvements in neuroimaging, where training set sizes are often small. However, various difficulties arise in directly applying models pretrained on natural images to radiologic images, such as MRIs. In particular, a mismatch in the input space (2D images vs. 3D MRIs) restricts the direct transfer of models, of…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: To appear at IEEE International Symposium on Biomedical Imaging 2023 (ISBI 2023). Code is available at https://github.com/umgupta/2d-slice-set-networks

  21. arXiv:2302.03792  [pdf, other]

    cs.LG cs.IT

    Information-Theoretic Diffusion

    Authors: Xianghao Kong, Rob Brekelmans, Greg Ver Steeg

    Abstract: Denoising diffusion models have spurred significant gains in density modeling and image generation, precipitating an industrial revolution in text-guided AI art generation. We introduce a new mathematical foundation for diffusion models inspired by classic results in information theory that connect Information with Minimum Mean Square Error regression, the so-called I-MMSE relations. We generalize…

    Submitted 7 February, 2023; originally announced February 2023.

    Comments: 26 pages, 7 figures, International Conference on Learning Representations (ICLR), 2023. Code is at http://github.com/kxh001/ITdiffusion and http://github.com/gregversteeg/InfoDiffusionSimple

  22. arXiv:2208.11669  [pdf, other]

    cs.LG cs.CR eess.IV q-bio.QM

    Towards Sparsified Federated Neuroimaging Models via Weight Pruning

    Authors: Dimitris Stripelis, Umang Gupta, Nikhil Dhinagar, Greg Ver Steeg, Paul Thompson, José Luis Ambite

    Abstract: Federated training of large deep neural networks can often be restrictive due to the increasing costs of communicating the updates with increasing model sizes. Various model pruning techniques have been designed in centralized settings to reduce inference times. Combining centralized pruning techniques with federated training seems intuitive for reducing communication costs -- by pruning the model…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: Accepted to 3rd MICCAI Workshop on Distributed, Collaborative and Federated Learning (DeCaF, 2022)

  23. Formal limitations of sample-wise information-theoretic generalization bounds

    Authors: Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan

    Abstract: Some of the tightest information-theoretic generalization bounds depend on the average information between the learned hypothesis and a single training example. However, these sample-wise bounds were derived only for expected generalization gap. We show that even for expected squared generalization gap no such sample-wise information-theoretic bounds exist. The same is true for PAC-Bayes and singl…

    Submitted 13 December, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: 2022 IEEE Information Theory Workshop

  24. arXiv:2205.05249  [pdf, other]

    cs.LG cs.CR cs.CV cs.DC

    Secure & Private Federated Neuroimaging

    Authors: Dimitris Stripelis, Umang Gupta, Hamza Saleem, Nikhil Dhinagar, Tanmay Ghai, Rafael Chrysovalantis Anastasiou, Armaghan Asghar, Greg Ver Steeg, Srivatsan Ravi, Muhammad Naveed, Paul M. Thompson, Jose Luis Ambite

    Abstract: The amount of biomedical data continues to grow rapidly. However, collecting data from multiple sites for joint analysis remains challenging due to security, privacy, and regulatory concerns. To overcome this challenge, we use Federated Learning, which enables distributed training of neural network models over multiple data sources without sharing data. Each site trains the neural network over its…

    Submitted 28 August, 2023; v1 submitted 10 May, 2022; originally announced May 2022.

    Comments: 18 pages, 13 figures, 2 tables

    ACM Class: I.2; I.5.1; J.3

  25. arXiv:2204.12430  [pdf, other]

    cs.LG

    Federated Progressive Sparsification (Purge, Merge, Tune)+

    Authors: Dimitris Stripelis, Umang Gupta, Greg Ver Steeg, Jose Luis Ambite

    Abstract: To improve federated training of neural networks, we develop FedSparsify, a sparsification strategy based on progressive weight magnitude pruning. Our method has several benefits. First, since the size of the network becomes increasingly smaller, computation and communication costs during training are reduced. Second, the models are incrementally constrained to a smaller set of parameters, which f…

    Submitted 15 May, 2023; v1 submitted 26 April, 2022; originally announced April 2022.

    Comments: Accepted at the Workshop on Federated Learning: Recent Advances and New Challenges, in Conjunction with NeurIPS 2022 (FL-NeurIPS'22). 23 pages, 12 figures, 1 algorithm, 2 tables

    MSC Class: 68T07; ACM Class: I.2.m

  26. arXiv:2204.11206  [pdf, other]

    stat.ME cs.LG stat.ML

    Partial Identification of Dose Responses with Hidden Confounders

    Authors: Myrl G. Marmarelis, Elizabeth Haddad, Andrew Jesson, Neda Jahanshad, Aram Galstyan, Greg Ver Steeg

    Abstract: Inferring causal effects of continuous-valued treatments from observational data is a crucial task promising to better inform policy- and decision-makers. A critical assumption needed to identify these effects is that all confounding variables -- causal parents of both the treatment and the outcome -- are included as covariates. Unfortunately, given observational data alone, we cannot know with ce…

    Submitted 12 June, 2023; v1 submitted 24 April, 2022; originally announced April 2022.

  27. arXiv:2203.12574  [pdf, other]

    cs.CL cs.LG

    Mitigating Gender Bias in Distilled Language Models via Counterfactual Role Reversal

    Authors: Umang Gupta, Jwala Dhamala, Varun Kumar, Apurv Verma, Yada Pruksachatkun, Satyapriya Krishna, Rahul Gupta, Kai-Wei Chang, Greg Ver Steeg, Aram Galstyan

    Abstract: Language models excel at generating coherent text, and model compression techniques such as knowledge distillation have enabled their use in resource-constrained settings. However, these models can be biased in multiple ways, including the unfounded association of male and female genders with gender-neutral professions. Therefore, knowledge distillation without any fairness constraints may preserv…

    Submitted 23 March, 2022; originally announced March 2022.

    Comments: To appear in the Findings of ACL 2022

  28. arXiv:2203.10204  [pdf, other]

    cond-mat.mtrl-sci cond-mat.dis-nn cs.CV cs.LG

    Inferring topological transitions in pattern-forming processes with self-supervised learning

    Authors: Marcin Abram, Keith Burghardt, Greg Ver Steeg, Aram Galstyan, Remi Dingreville

    Abstract: The identification and classification of transitions in topological and microstructural regimes in pattern-forming processes are critical for understanding and fabricating microstructurally precise novel materials in many application domains. Unfortunately, relevant microstructure transitions may depend on process parameters in subtle and complex ways that are not captured by the classic theory of…

    Submitted 10 August, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

    Comments: 17 pages, 6 figures, 8 pages of supplementary information

    ACM Class: I.2.6; I.4.7; I.5.4; I.6.m; J.2

  29. arXiv:2111.13733  [pdf, other]

    cs.LG

    Failure Modes of Domain Generalization Algorithms

    Authors: Tigran Galstyan, Hrayr Harutyunyan, Hrant Khachatrian, Greg Ver Steeg, Aram Galstyan

    Abstract: Domain generalization algorithms use training data from multiple domains to learn models that generalize well to unseen domains. While recently proposed benchmarks demonstrate that most of the existing algorithms do not outperform simple baselines, the established evaluation methods fail to expose the impact of various factors that contribute to the poor performance. In this paper we propose an ev…

    Submitted 26 November, 2021; originally announced November 2021.

  30. arXiv:2111.06312  [pdf, other]

    cs.LG cs.AI cs.MS cs.SI

    Implicit SVD for Graph Representation Learning

    Authors: Sami Abu-El-Haija, Hesham Mostafa, Marcel Nassar, Valentino Crespi, Greg Ver Steeg, Aram Galstyan

    Abstract: Recent improvements in the performance of state-of-the-art (SOTA) methods for Graph Representational Learning (GRL) have come at the cost of significant computational resource requirements for training, e.g., for calculating gradients via backprop over many data epochs. Meanwhile, Singular Value Decomposition (SVD) can find closed-form solutions to convex problems, using merely a handful of epochs…

    Submitted 11 November, 2021; originally announced November 2021.

    Journal ref: Advances in Neural Information Processing Systems (NeurIPS) 2021

  31. arXiv:2111.02434  [pdf, other]

    cs.LG physics.comp-ph

    Hamiltonian Dynamics with Non-Newtonian Momentum for Rapid Sampling

    Authors: Greg Ver Steeg, Aram Galstyan

    Abstract: Sampling from an unnormalized probability distribution is a fundamental problem in machine learning with applications including Bayesian modeling, latent factor inference, and energy-based model training. After decades of research, variations of MCMC remain the default approach to sampling despite slow convergence. Auxiliary neural models can learn to speed up MCMC, but the overhead for training t…

    Submitted 29 December, 2021; v1 submitted 3 November, 2021; originally announced November 2021.

    Comments: 31 pages, 19 figures. Advances in Neural Information Processing Systems (NeurIPS), 2021. Animations at https://sites.google.com/view/esh-dynamics/home, code at https://github.com/gregversteeg/esh_dynamics

  32. arXiv:2110.01584  [pdf, other]

    cs.LG stat.ML

    Information-theoretic generalization bounds for black-box learning algorithms

    Authors: Hrayr Harutyunyan, Maxim Raginsky, Greg Ver Steeg, Aram Galstyan

    Abstract: We derive information-theoretic generalization bounds for supervised learning algorithms based on the information contained in predictions rather than in the output of the training algorithm. These bounds improve over the existing information-theoretic bounds, are applicable to a wider range of algorithms, and solve two key challenges: (a) they give meaningful results for deterministic algorithms…

    Submitted 5 October, 2021; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  33. arXiv:2109.03952  [pdf, other]

    cs.AI

    Attributing Fair Decisions with Attention Interventions

    Authors: Ninareh Mehrabi, Umang Gupta, Fred Morstatter, Greg Ver Steeg, Aram Galstyan

    Abstract: The widespread use of Artificial Intelligence (AI) in consequential domains, such as healthcare and parole decision-making systems, has drawn intense scrutiny on the fairness of these methods. However, ensuring fairness is often insufficient as the rationale for a contentious decision needs to be audited, understood, and defended. We propose that the attention mechanism can be used to ensure fair…

    Submitted 8 September, 2021; originally announced September 2021.

  34. arXiv:2108.03437  [pdf, other]

    cs.CR cs.LG

    Secure Neuroimaging Analysis using Federated Learning with Homomorphic Encryption

    Authors: Dimitris Stripelis, Hamza Saleem, Tanmay Ghai, Nikhil Dhinagar, Umang Gupta, Chrysovalantis Anastasiou, Greg Ver Steeg, Srivatsan Ravi, Muhammad Naveed, Paul M. Thompson, Jose Luis Ambite

    Abstract: Federated learning (FL) enables distributed computation of machine learning models over various disparate, remote data sources, without requiring the transfer of any individual data to a centralized location. This results in an improved generalizability of models and efficient scaling of computation as more sources and larger datasets are added to the federation. Nevertheless, recent membership attack…

    Submitted 9 November, 2021; v1 submitted 7 August, 2021; originally announced August 2021.

    Comments: 9 pages, 3 figures, 1 algorithm

  35. arXiv:2107.00745  [pdf, other]

    cs.LG cs.AI stat.ML

    q-Paths: Generalizing the Geometric Annealing Path using Power Means

    Authors: Vaden Masrani, Rob Brekelmans, Thang Bui, Frank Nielsen, Aram Galstyan, Greg Ver Steeg, Frank Wood

    Abstract: Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: arXiv admin note: text overlap with arXiv:2012.07823

  36. arXiv:2105.02866  [pdf, other]

    q-bio.QM cs.CR cs.LG eess.IV

    Membership Inference Attacks on Deep Regression Models for Neuroimaging

    Authors: Umang Gupta, Dimitris Stripelis, Pradeep K. Lam, Paul M. Thompson, José Luis Ambite, Greg Ver Steeg

    Abstract: Ensuring the privacy of research participants is vital, even more so in healthcare environments. Deep learning approaches to neuroimaging require large datasets, and this often necessitates sharing data between multiple sites, which is antithetical to the privacy objectives. Federated learning is a commonly proposed solution to this problem. It circumvents the need for data sharing by sharing para…

    Submitted 3 June, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

    Comments: To appear at Medical Imaging with Deep Learning 2021 (MIDL 2021)

  37. arXiv:2102.08530  [pdf, other]

    cs.LG cs.MS cs.SI

    Fast Graph Learning with Unique Optimal Solutions

    Authors: Sami Abu-El-Haija, Valentino Crespi, Greg Ver Steeg, Aram Galstyan

    Abstract: We consider two popular Graph Representation Learning (GRL) methods: message passing for node classification and network embedding for link prediction. For each, we pick a popular model that we: (i) linearize and (ii) switch its training objective to Frobenius norm error minimization. These simplifications can cast the training into finding the optimal parameters in closed-form. We program in…

    Submitted 22 April, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

    Journal ref: ICLR 2021 Workshop on Geometrical and Topological Representation Learning

  38. arXiv:2102.04438  [pdf, other]

    eess.IV cs.LG q-bio.QM

    Improved Brain Age Estimation with Slice-based Set Networks

    Authors: Umang Gupta, Pradeep K. Lam, Greg Ver Steeg, Paul M. Thompson

    Abstract: Deep Learning for neuroimaging data is a promising but challenging direction. The high dimensionality of 3D MRI scans makes this endeavor compute- and data-intensive. Most conventional 3D neuroimaging methods use 3D-CNN-based architectures with a large number of parameters and require more time and data to train. Recently, 2D-slice-based models have received increasing attention as they have fewer…

    Submitted 9 February, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

    Comments: To appear at IEEE International Symposium on Biomedical Imaging 2021 (ISBI 2021). Code is available at https://git.io/JtazG

  39. arXiv:2102.04350  [pdf, other]

    cs.LG

    Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning

    Authors: Elan Markowitz, Keshav Balasubramanian, Mehrnoosh Mirtaheri, Sami Abu-El-Haija, Bryan Perozzi, Greg Ver Steeg, Aram Galstyan

    Abstract: Graph Representation Learning (GRL) methods have impacted fields from chemistry to social science. However, their algorithmic implementations are specialized to specific use-cases, e.g., message passing methods are run differently from node embedding ones. Despite their apparent differences, all these methods utilize the graph structure, and therefore, their learning can be approximated with stochast…

    Submitted 8 February, 2021; originally announced February 2021.

    Comments: To appear in ICLR 2021

  40. arXiv:2101.04108  [pdf, other]

    cs.LG stat.ML

    Controllable Guarantees for Fair Outcomes via Contrastive Information Estimation

    Authors: Umang Gupta, Aaron M Ferber, Bistra Dilkina, Greg Ver Steeg

    Abstract: Controlling bias in training datasets is vital for ensuring equal treatment, or parity, between different groups in downstream applications. A naive solution is to transform the data so that it is statistically independent of group membership, but this may throw away too much information when a reasonable compromise between fairness and accuracy is desired. Another common approach is to limit the…

    Submitted 3 June, 2021; v1 submitted 11 January, 2021; originally announced January 2021.

    Comments: This version fixes an error in Theorem 2 of the original manuscript that appeared at the Proceedings of the 35th AAAI Conference on Artificial Intelligence (AAAI-21). Code is available at https://github.com/umgupta/fairness-via-contrastive-estimation

  41. arXiv:2012.15480  [pdf, other]

    cs.LG cs.IT stat.ML

    Likelihood Ratio Exponential Families

    Authors: Rob Brekelmans, Frank Nielsen, Alireza Makhzani, Aram Galstyan, Greg Ver Steeg

    Abstract: The exponential family is well known in machine learning and statistical physics as the maximum entropy distribution subject to a set of observed constraints, while the geometric mixture path is common in MCMC methods such as annealed importance sampling. Linking these two ideas, recent work has interpreted the geometric mixture path as an exponential family of distributions to analyze the thermod…

    Submitted 15 January, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

    Comments: NeurIPS Workshop on Deep Learning through Information Geometry

  42. arXiv:2012.07823  [pdf, other]

    cs.LG

    Annealed Importance Sampling with q-Paths

    Authors: Rob Brekelmans, Vaden Masrani, Thang Bui, Frank Wood, Aram Galstyan, Greg Ver Steeg, Frank Nielsen

    Abstract: Annealed importance sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target. While AIS yields an unbiased estimator for any path, existing literature has been primarily limited to the geometric mixture or moment-averaged paths associated with…

    Submitted 14 December, 2020; originally announced December 2020.

    Comments: NeurIPS Workshop on Deep Learning through Information Geometry (Best Paper Award)

    Journal ref: Published at UAI 2021 https://arxiv.boxedpaper.com/abs/2107.00745

  43. arXiv:2007.14917  [pdf, other]

    cs.LG stat.ML

    Compressing Deep Neural Networks via Layer Fusion

    Authors: James O' Neill, Greg Ver Steeg, Aram Galstyan

    Abstract: This paper proposes layer fusion - a model compression technique that discovers which weights to combine and then fuses weights of similar fully-connected, convolutional and attention layers. Layer fusion can significantly reduce the number of layers of the original network with little additional computation overhead, while maintaining competitive performance. From experiments on CIFAR-10…

    Submitted 29 July, 2020; originally announced July 2020.

  44. arXiv:2007.05335  [pdf, other]

    cs.LG stat.ML

    Robust Classification under Class-Dependent Domain Shift

    Authors: Tigran Galstyan, Hrant Khachatrian, Greg Ver Steeg, Aram Galstyan

    Abstract: Investigation of machine learning algorithms robust to changes between the training and test distributions is an active area of research. In this paper we explore a special type of dataset shift which we call class-dependent domain shift. It is characterized by the following features: the input data causally depends on the label, the shift in the data is fully explained by a known variable, the va…

    Submitted 10 July, 2020; originally announced July 2020.

    Comments: Accepted at ICML 2020 workshop on Uncertainty and Robustness in Deep Learning

  45. arXiv:2007.00642  [pdf, other]

    cs.LG stat.ML

    All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference

    Authors: Rob Brekelmans, Vaden Masrani, Frank Wood, Greg Ver Steeg, Aram Galstyan

    Abstract: The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model lear…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: ICML 2020

  46. arXiv:2006.00115  [pdf, other]

    q-bio.QM cs.CV cs.LG eess.IV

    Overview of Scanner Invariant Representations

    Authors: Daniel Moyer, Greg Ver Steeg, Paul M. Thompson

    Abstract: Pooled imaging data from multiple sources is subject to bias from each source. Studies that do not correct for these scanner/site biases at best lose statistical power, and at worst leave spurious correlations in their data. Estimation of the bias effects is non-trivial due to the paucity of data with correspondence across sites, so-called "traveling phantom" data, which is expensive to collect. N…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: Accepted as a short paper in MIDL 2020. In accordance with the MIDL 2020 Call for Papers, this short paper is an overview of an already published work arXiv:1904.05375, and was submitted to MIDL in order to allow presentation and discussion at the meeting

    Report number: MIDL/2020/ExtendedAbstract/yqm9RD_XHT

  47. A Metric Space for Point Process Excitations

    Authors: Myrl G. Marmarelis, Greg Ver Steeg, Aram Galstyan

    Abstract: A multivariate Hawkes process enables self- and cross-excitations through a triggering matrix that behaves like an asymmetrical covariance structure, characterizing pairwise interactions between the event types. Full-rank estimation of all interactions is often infeasible in empirical settings. Models that specialize on a spatiotemporal application alleviate this obstacle by exploiting spatial loc…

    Submitted 23 April, 2022; v1 submitted 5 May, 2020; originally announced May 2020.

    Journal ref: Journal of Artificial Intelligence Research 73 (2022) 1323-1353

  48. arXiv:2002.07933  [pdf, other]

    cs.LG stat.ML

    Improving Generalization by Controlling Label-Noise Information in Neural Network Weights

    Authors: Hrayr Harutyunyan, Kyle Reing, Greg Ver Steeg, Aram Galstyan

    Abstract: In the presence of noisy or incorrect labels, neural networks have the undesirable tendency to memorize information about the noise. Standard regularization techniques such as dropout, weight decay or data augmentation sometimes help, but do not prevent this behavior. If one considers neural network weights as random variables that depend on the data and stochasticity of training, the amount of me…

    Submitted 20 November, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

    Comments: ICML, 2020

  49. arXiv:1912.00646  [pdf, other]

    cs.LG stat.ML

    Discovery and Separation of Features for Invariant Representation Learning

    Authors: Ayush Jaiswal, Rob Brekelmans, Daniel Moyer, Greg Ver Steeg, Wael AbdAlmageed, Premkumar Natarajan

    Abstract: Supervised machine learning models often associate irrelevant nuisance factors with the prediction target, which hurts generalization. We propose a framework for training robust neural networks that induces invariance to nuisances through learning to discover and separate predictive and nuisance factors of data. We present an information theoretic formulation of our approach, from which we derive…

    Submitted 2 December, 2019; originally announced December 2019.

    Comments: 10 pages, 3 figures

  50. arXiv:1911.04060  [pdf, other]

    cs.LG stat.ML

    Invariant Representations through Adversarial Forgetting

    Authors: Ayush Jaiswal, Daniel Moyer, Greg Ver Steeg, Wael AbdAlmageed, Premkumar Natarajan

    Abstract: We propose a novel approach to achieving invariance for deep neural networks in the form of inducing amnesia to unwanted factors of data through a new adversarial forgetting mechanism. We show that the forgetting mechanism serves as an information-bottleneck, which is manipulated by the adversarial training to learn invariance to unwanted factors. Empirical results show that the proposed framework…

    Submitted 20 November, 2019; v1 submitted 10 November, 2019; originally announced November 2019.

    Comments: To appear in Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI-20)