-
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
Authors:
Alan N. Amin,
Nate Gruver,
Andrew Gordon Wilson
Abstract:
Discrete diffusion models, like continuous diffusion models, generate high-quality samples by gradually undoing noise applied to datapoints with a Markov process. Gradual generation in theory comes with many conceptual benefits; for example, inductive biases can be incorporated into the noising Markov process, and access to improved sampling algorithms. In practice, however, the consistently best…
▽ More
Discrete diffusion models, like continuous diffusion models, generate high-quality samples by gradually undoing noise applied to datapoints with a Markov process. Gradual generation in theory comes with many conceptual benefits; for example, inductive biases can be incorporated into the noising Markov process, and access to improved sampling algorithms. In practice, however, the consistently best performing discrete diffusion model is, surprisingly, masking diffusion, which does not denoise gradually. Here we explain the superior performance of masking diffusion by noting that it makes use of a fundamental difference between continuous and discrete Markov processes: discrete Markov processes evolve by discontinuous jumps at a fixed rate and, unlike other discrete diffusion models, masking diffusion builds in the known distribution of jump times and only learns where to jump to. We show that we can similarly bake in the known distribution of jump times into any discrete diffusion model. The resulting models - schedule-conditioned discrete diffusion (SCUD) - generalize classical discrete diffusion and masking diffusion. By applying SCUD to models with noising processes that incorporate inductive biases on images, text, and protein data, we build models that outperform masking.
△ Less
Submitted 9 June, 2025;
originally announced June 2025.
-
IGraSS: Learning to Identify Infrastructure Networks from Satellite Imagery by Iterative Graph-constrained Semantic Segmentation
Authors:
Oishee Bintey Hoque,
Abhijin Adiga,
Aniruddha Adiga,
Siddharth Chaudhary,
Madhav V. Marathe,
S. S. Ravi,
Kirti Rajagopalan,
Amanda Wilson,
Samarth Swarup
Abstract:
Accurate canal network mapping is essential for water management, including irrigation planning and infrastructure maintenance. State-of-the-art semantic segmentation models for infrastructure mapping, such as roads, rely on large, well-annotated remote sensing datasets. However, incomplete or inadequate ground truth can hinder these learning approaches. Many infrastructure networks have graph-lev…
▽ More
Accurate canal network mapping is essential for water management, including irrigation planning and infrastructure maintenance. State-of-the-art semantic segmentation models for infrastructure mapping, such as roads, rely on large, well-annotated remote sensing datasets. However, incomplete or inadequate ground truth can hinder these learning approaches. Many infrastructure networks have graph-level properties such as reachability to a source (like canals) or connectivity (roads) that can be leveraged to improve these existing ground truth. This paper develops a novel iterative framework IGraSS, combining a semantic segmentation module-incorporating RGB and additional modalities (NDWI, DEM)-with a graph-based ground-truth refinement module. The segmentation module processes satellite imagery patches, while the refinement module operates on the entire data viewing the infrastructure network as a graph. Experiments show that IGraSS reduces unreachable canal segments from around 18% to 3%, and training with refined ground truth significantly improves canal identification. IGraSS serves as a robust framework for both refining noisy ground truth and mapping canal networks from remote sensing imagery. We also demonstrate the effectiveness and generalizability of IGraSS using road networks as an example, applying a different graph-theoretic constraint to complete road networks.
△ Less
Submitted 10 June, 2025; v1 submitted 9 June, 2025;
originally announced June 2025.
-
The Gaussian Mixing Mechanism: Renyi Differential Privacy via Gaussian Sketches
Authors:
Omri Lev,
Vishwak Srinivasan,
Moshe Shenfeld,
Katrina Ligett,
Ayush Sekhari,
Ashia C. Wilson
Abstract:
Gaussian sketching, which consists of pre-multiplying the data with a random Gaussian matrix, is a widely used technique for multiple problems in data science and machine learning, with applications spanning computationally efficient optimization, coded computing, and federated learning. This operation also provides differential privacy guarantees due to its inherent randomness. In this work, we r…
▽ More
Gaussian sketching, which consists of pre-multiplying the data with a random Gaussian matrix, is a widely used technique for multiple problems in data science and machine learning, with applications spanning computationally efficient optimization, coded computing, and federated learning. This operation also provides differential privacy guarantees due to its inherent randomness. In this work, we revisit this operation through the lens of Renyi Differential Privacy (RDP), providing a refined privacy analysis that yields significantly tighter bounds than prior results. We then demonstrate how this improved analysis leads to performance improvement in different linear regression settings, establishing theoretical utility guarantees. Empirically, our methods improve performance across multiple datasets and, in several cases, reduce runtime.
△ Less
Submitted 4 June, 2025; v1 submitted 30 May, 2025;
originally announced May 2025.
-
Out of Sight, Not Out of Context? Egocentric Spatial Reasoning in VLMs Across Disjoint Frames
Authors:
Sahithya Ravi,
Gabriel Sarch,
Vibhav Vineet,
Andrew D. Wilson,
Balasaravanan Thoravi Kumaravel
Abstract:
An embodied AI assistant operating on egocentric video must integrate spatial cues across time - for instance, determining where an object A, glimpsed a few moments ago lies relative to an object B encountered later. We introduce Disjoint-3DQA , a generative QA benchmark that evaluates this ability of VLMs by posing questions about object pairs that are not co-visible in the same frame. We evaluat…
▽ More
An embodied AI assistant operating on egocentric video must integrate spatial cues across time - for instance, determining where an object A, glimpsed a few moments ago lies relative to an object B encountered later. We introduce Disjoint-3DQA , a generative QA benchmark that evaluates this ability of VLMs by posing questions about object pairs that are not co-visible in the same frame. We evaluated seven state-of-the-art VLMs and found that models lag behind human performance by 28%, with steeper declines in accuracy (60% to 30 %) as the temporal gap widens. Our analysis further reveals that providing trajectories or bird's-eye-view projections to VLMs results in only marginal improvements, whereas providing oracle 3D coordinates leads to a substantial 20% performance increase. This highlights a core bottleneck of multi-frame VLMs in constructing and maintaining 3D scene representations over time from visual signals. Disjoint-3DQA therefore sets a clear, measurable challenge for long-horizon spatial reasoning and aims to catalyze future research at the intersection of vision, language, and embodied AI.
△ Less
Submitted 30 May, 2025;
originally announced May 2025.
-
Physiology-Informed Generative Multi-Task Network for Contrast-Free CT Perfusion
Authors:
Wasif Khan,
Kyle B. See,
Simon Kato,
Ziqian Huang,
Amy Lazarte,
Kyle Douglas,
Xiangyang Lou,
Teng J. Peng,
Dhanashree Rajderkar,
John Rees,
Pina Sanelli,
Amita Singh,
Ibrahim Tuna,
Christina A. Wilson,
Ruogu Fang
Abstract:
Perfusion imaging is extensively utilized to assess hemodynamic status and tissue perfusion in various organs. Computed tomography perfusion (CTP) imaging plays a key role in the early assessment and planning of stroke treatment. While CTP provides essential perfusion parameters to identify abnormal blood flow in the brain, the use of contrast agents in CTP can lead to allergic reactions and adver…
▽ More
Perfusion imaging is extensively utilized to assess hemodynamic status and tissue perfusion in various organs. Computed tomography perfusion (CTP) imaging plays a key role in the early assessment and planning of stroke treatment. While CTP provides essential perfusion parameters to identify abnormal blood flow in the brain, the use of contrast agents in CTP can lead to allergic reactions and adverse side effects, along with costing USD 4.9 billion worldwide in 2022. To address these challenges, we propose a novel deep learning framework called Multitask Automated Generation of Intermodal CT perfusion maps (MAGIC). This framework combines generative artificial intelligence and physiological information to map non-contrast computed tomography (CT) imaging to multiple contrast-free CTP imaging maps. We demonstrate enhanced image fidelity by incorporating physiological characteristics into the loss terms. Our network was trained and validated using CT image data from patients referred for stroke at UF Health and demonstrated robustness to abnormalities in brain perfusion activity. A double-blinded study was conducted involving seven experienced neuroradiologists and vascular neurologists. This study validated MAGIC's visual quality and diagnostic accuracy showing favorable performance compared to clinical perfusion imaging with intravenous contrast injection. Overall, MAGIC holds great promise in revolutionizing healthcare by offering contrast-free, cost-effective, and rapid perfusion imaging.
△ Less
Submitted 12 May, 2025;
originally announced May 2025.
-
Layered Unlearning for Adversarial Relearning
Authors:
Timothy Qian,
Vinith Suriyakumar,
Ashia Wilson,
Dylan Hadfield-Menell
Abstract:
Our goal is to understand how post-training methods, such as fine-tuning, alignment, and unlearning, modify language model behavior and representations. We are particularly interested in the brittle nature of these modifications that makes them easy to bypass through prompt engineering or relearning. Recent results suggest that post-training induces shallow context-dependent ``circuits'' that supp…
▽ More
Our goal is to understand how post-training methods, such as fine-tuning, alignment, and unlearning, modify language model behavior and representations. We are particularly interested in the brittle nature of these modifications that makes them easy to bypass through prompt engineering or relearning. Recent results suggest that post-training induces shallow context-dependent ``circuits'' that suppress specific response patterns. This could be one explanation for the brittleness of post-training. To test this hypothesis, we design an unlearning algorithm, Layered Unlearning (LU), that creates distinct inhibitory mechanisms for a growing subset of the data. By unlearning the first $i$ folds while retaining the remaining $k - i$ at the $i$th of $k$ stages, LU limits the ability of relearning on a subset of data to recover the full dataset. We evaluate LU through a combination of synthetic and large language model (LLM) experiments. We find that LU improves robustness to adversarial relearning for several different unlearning methods. Our results contribute to the state-of-the-art of machine unlearning and provide insight into the effect of post-training updates.
△ Less
Submitted 14 May, 2025;
originally announced May 2025.
-
Knowledge-Informed Deep Learning for Irrigation Type Mapping from Remote Sensing
Authors:
Oishee Bintey Hoque,
Nibir Chandra Mandal,
Abhijin Adiga,
Samarth Swarup,
Sayjro Kossi Nouwakpo,
Amanda Wilson,
Madhav Marathe
Abstract:
Accurate mapping of irrigation methods is crucial for sustainable agricultural practices and food systems. However, existing models that rely solely on spectral features from satellite imagery are ineffective due to the complexity of agricultural landscapes and limited training data, making this a challenging problem. We present Knowledge-Informed Irrigation Mapping (KIIM), a novel Swin-Transforme…
▽ More
Accurate mapping of irrigation methods is crucial for sustainable agricultural practices and food systems. However, existing models that rely solely on spectral features from satellite imagery are ineffective due to the complexity of agricultural landscapes and limited training data, making this a challenging problem. We present Knowledge-Informed Irrigation Mapping (KIIM), a novel Swin-Transformer based approach that uses (i) a specialized projection matrix to encode crop to irrigation probability, (ii) a spatial attention map to identify agricultural lands from non-agricultural lands, (iii) bi-directional cross-attention to focus complementary information from different modalities, and (iv) a weighted ensemble for combining predictions from images and crop information. Our experimentation on five states in the US shows up to 22.9\% (IoU) improvement over baseline with a 71.4% (IoU) improvement for hard-to-classify drip irrigation. In addition, we propose a two-phase transfer learning approach to enhance cross-state irrigation mapping, achieving a 51% IoU boost in a state with limited labeled data. The ability to achieve baseline performance with only 40% of the training data highlights its efficiency, reducing the dependency on extensive manual labeling efforts and making large-scale, automated irrigation mapping more feasible and cost-effective.
△ Less
Submitted 5 June, 2025; v1 submitted 13 May, 2025;
originally announced May 2025.
-
Grounding Task Assistance with Multimodal Cues from a Single Demonstration
Authors:
Gabriel Sarch,
Balasaravanan Thoravi Kumaravel,
Sahithya Ravi,
Vibhav Vineet,
Andrew D. Wilson
Abstract:
A person's demonstration often serves as a key reference for others learning the same task. However, RGB video, the dominant medium for representing these demonstrations, often fails to capture fine-grained contextual cues such as intent, safety-critical environmental factors, and subtle preferences embedded in human behavior. This sensory gap fundamentally limits the ability of Vision Language Mo…
▽ More
A person's demonstration often serves as a key reference for others learning the same task. However, RGB video, the dominant medium for representing these demonstrations, often fails to capture fine-grained contextual cues such as intent, safety-critical environmental factors, and subtle preferences embedded in human behavior. This sensory gap fundamentally limits the ability of Vision Language Models (VLMs) to reason about why actions occur and how they should adapt to individual users. To address this, we introduce MICA (Multimodal Interactive Contextualized Assistance), a framework that improves conversational agents for task assistance by integrating eye gaze and speech cues. MICA segments demonstrations into meaningful sub-tasks and extracts keyframes and captions that capture fine-grained intent and user-specific cues, enabling richer contextual grounding for visual question answering. Evaluations on questions derived from real-time chat-assisted task replication show that multimodal cues significantly improve response quality over frame-based retrieval. Notably, gaze cues alone achieves 93% of speech performance, and their combination yields the highest accuracy. Task type determines the effectiveness of implicit (gaze) vs. explicit (speech) cues, underscoring the need for adaptable multimodal models. These results highlight the limitations of frame-based context and demonstrate the value of multimodal signals for real-world AI task assistance.
△ Less
Submitted 2 May, 2025;
originally announced May 2025.
-
Compute-Optimal LLMs Provably Generalize Better With Scale
Authors:
Marc Finzi,
Sanyam Kapoor,
Diego Granziol,
Anming Gu,
Christopher De Sa,
J. Zico Kolter,
Andrew Gordon Wilson
Abstract:
Why do larger language models generalize better? To investigate this question, we develop generalization bounds on the pretraining objective of large language models (LLMs) in the compute-optimal regime, as described by the Chinchilla scaling laws. We introduce a novel, fully empirical Freedman-type martingale concentration inequality that tightens existing bounds by accounting for the variance of…
▽ More
Why do larger language models generalize better? To investigate this question, we develop generalization bounds on the pretraining objective of large language models (LLMs) in the compute-optimal regime, as described by the Chinchilla scaling laws. We introduce a novel, fully empirical Freedman-type martingale concentration inequality that tightens existing bounds by accounting for the variance of the loss function. This generalization bound can be decomposed into three interpretable components: the number of parameters per token, the loss variance, and the quantization error at a fixed bitrate. As compute-optimal language models are scaled up, the number of parameters per data point remains constant; however, both the loss variance and the quantization error decrease, implying that larger models should have smaller generalization gaps. We examine why larger models tend to be more quantizable from an information theoretic perspective, showing that the rate at which they can integrate new information grows more slowly than their capacity on the compute-optimal frontier. From these findings we produce a scaling law for the generalization gap, with bounds that become predictably stronger with scale.
△ Less
Submitted 21 April, 2025;
originally announced April 2025.
-
A Consequentialist Critique of Binary Classification Evaluation Practices
Authors:
Gerardo Flores,
Abigail Schiff,
Alyssa H. Smith,
Julia A Fukuyama,
Ashia C. Wilson
Abstract:
ML-supported decisions, such as ordering tests or determining preventive custody, often involve binary classification based on probabilistic forecasts. Evaluation frameworks for such forecasts typically consider whether to prioritize independent-decision metrics (e.g., Accuracy) or top-K metrics (e.g., Precision@K), and whether to focus on fixed thresholds or threshold-agnostic measures like AUC-R…
▽ More
ML-supported decisions, such as ordering tests or determining preventive custody, often involve binary classification based on probabilistic forecasts. Evaluation frameworks for such forecasts typically consider whether to prioritize independent-decision metrics (e.g., Accuracy) or top-K metrics (e.g., Precision@K), and whether to focus on fixed thresholds or threshold-agnostic measures like AUC-ROC. We highlight that a consequentialist perspective, long advocated by decision theorists, should naturally favor evaluations that support independent decisions using a mixture of thresholds given their prevalence, such as Brier scores and Log loss. However, our empirical analysis reveals a strong preference for top-K metrics or fixed thresholds in evaluations at major conferences like ICML, FAccT, and CHIL. To address this gap, we use this decision-theoretic framework to map evaluation metrics to their optimal use cases, along with a Python package, briertools, to promote the broader adoption of Brier scores. In doing so, we also uncover new theoretical connections, including a reconciliation between the Brier Score and Decision Curve Analysis, which clarifies and responds to a longstanding critique by (Assel, et al. 2017) regarding the clinical utility of proper scoring rules.
△ Less
Submitted 6 April, 2025;
originally announced April 2025.
-
A Reliable and Efficient Detection Pipeline for Rodent Ultrasonic Vocalizations
Authors:
Sabah Shahnoor Anis,
Devin M. Kellis,
Kris Ford Kaigler,
Marlene A. Wilson,
Christian O'Reilly
Abstract:
Analyzing ultrasonic vocalizations (USVs) is crucial for understanding rodents' affective states and social behaviors, but the manual analysis is time-consuming and prone to errors. Automated USV detection systems have been developed to address these challenges. Yet, these systems often rely on machine learning and fail to generalize effectively to new datasets. To tackle these shortcomings, we in…
▽ More
Analyzing ultrasonic vocalizations (USVs) is crucial for understanding rodents' affective states and social behaviors, but the manual analysis is time-consuming and prone to errors. Automated USV detection systems have been developed to address these challenges. Yet, these systems often rely on machine learning and fail to generalize effectively to new datasets. To tackle these shortcomings, we introduce ContourUSV, an efficient automated system for detecting USVs from audio recordings. Our pipeline includes spectrogram generation, cleaning, pre-processing, contour detection, post-processing, and evaluation against manual annotations. To ensure robustness and reliability, we compared ContourUSV with three state-of-the-art systems using an existing open-access USV dataset (USVSEG) and a second dataset we are releasing publicly along with this paper. On average, across the two datasets, ContourUSV outperformed the other three systems with a 1.51x improvement in precision, 1.17x in recall, 1.80x in F1 score, and 1.49x in specificity while achieving an average speedup of 117.07x.
△ Less
Submitted 24 March, 2025;
originally announced March 2025.
-
Allocation Multiplicity: Evaluating the Promises of the Rashomon Set
Authors:
Shomik Jain,
Margaret Wang,
Kathleen Creel,
Ashia Wilson
Abstract:
The Rashomon set of equally-good models promises less discriminatory algorithms, reduced outcome homogenization, and fairer decisions through model ensembles or reconciliation. However, we argue from the perspective of allocation multiplicity that these promises may remain unfulfilled. When there are more qualified candidates than resources available, many different allocations of scarce resources…
▽ More
The Rashomon set of equally-good models promises less discriminatory algorithms, reduced outcome homogenization, and fairer decisions through model ensembles or reconciliation. However, we argue from the perspective of allocation multiplicity that these promises may remain unfulfilled. When there are more qualified candidates than resources available, many different allocations of scarce resources can achieve the same utility. This space of equal-utility allocations may not be faithfully reflected by the Rashomon set, as we show in a case study of healthcare allocations. We attribute these unfulfilled promises to several factors: limitations in empirical methods for sampling from the Rashomon set, the standard practice of deterministically selecting individuals with the lowest risk, and structural biases that cause all equally-good models to view some qualified individuals as inherently risky.
△ Less
Submitted 25 May, 2025; v1 submitted 20 March, 2025;
originally announced March 2025.
-
Homogeneous Algorithms Can Reduce Competition in Personalized Pricing
Authors:
Nathanael Jo,
Kathleen Creel,
Ashia Wilson,
Manish Raghavan
Abstract:
Firms' algorithm development practices are often homogeneous. Whether firms train algorithms on similar data, aim at similar benchmarks, or rely on similar pre-trained models, the result is correlated predictions. We model the impact of correlated algorithms on competition in the context of personalized pricing. Our analysis reveals that (1) higher correlation diminishes consumer welfare and (2) a…
▽ More
Firms' algorithm development practices are often homogeneous. Whether firms train algorithms on similar data, aim at similar benchmarks, or rely on similar pre-trained models, the result is correlated predictions. We model the impact of correlated algorithms on competition in the context of personalized pricing. Our analysis reveals that (1) higher correlation diminishes consumer welfare and (2) as consumers become more price sensitive, firms are increasingly incentivized to compromise on the accuracy of their predictions in exchange for coordination. We demonstrate our theoretical results in a stylized empirical study where two firms compete using personalized pricing algorithms. Our results underscore the ease with which algorithms facilitate price correlation without overt communication, which raises concerns about a new frontier of anti-competitive behavior. We analyze the implications of our results on the application and interpretation of US antitrust law.
△ Less
Submitted 19 March, 2025;
originally announced March 2025.
-
When Should We Orchestrate Multiple Agents?
Authors:
Umang Bhatt,
Sanyam Kapoor,
Mihir Upadhyay,
Ilia Sucholutsky,
Francesco Quinzan,
Katherine M. Collins,
Adrian Weller,
Andrew Gordon Wilson,
Muhammad Bilal Zafar
Abstract:
Strategies for orchestrating the interactions between multiple agents, both human and artificial, can wildly overestimate performance and underestimate the cost of orchestration. We design a framework to orchestrate agents under realistic conditions, such as inference costs or availability constraints. We show theoretically that orchestration is only effective if there are performance or cost diff…
▽ More
Strategies for orchestrating the interactions between multiple agents, both human and artificial, can wildly overestimate performance and underestimate the cost of orchestration. We design a framework to orchestrate agents under realistic conditions, such as inference costs or availability constraints. We show theoretically that orchestration is only effective if there are performance or cost differentials between agents. We then empirically demonstrate how orchestration between multiple agents can be helpful for selecting agents in a simulated environment, picking a learning strategy in the infamous Rogers' Paradox from social science, and outsourcing tasks to other agents during a question-answer task in a user study.
△ Less
Submitted 17 March, 2025;
originally announced March 2025.
-
Deep Learning is Not So Mysterious or Different
Authors:
Andrew Gordon Wilson
Abstract:
Deep neural networks are often seen as different from other model classes by defying conventional notions of generalization. Popular examples of anomalous generalization behaviour include benign overfitting, double descent, and the success of overparametrization. We argue that these phenomena are not distinct to neural networks, or particularly mysterious. Moreover, this generalization behaviour c…
▽ More
Deep neural networks are often seen as different from other model classes by defying conventional notions of generalization. Popular examples of anomalous generalization behaviour include benign overfitting, double descent, and the success of overparametrization. We argue that these phenomena are not distinct to neural networks, or particularly mysterious. Moreover, this generalization behaviour can be intuitively understood, and rigorously characterized using long-standing generalization frameworks such as PAC-Bayes and countable hypothesis bounds. We present soft inductive biases as a key unifying principle in explaining these phenomena: rather than restricting the hypothesis space to avoid overfitting, embrace a flexible hypothesis space, with a soft preference for simpler solutions that are consistent with the data. This principle can be encoded in many model classes, and thus deep learning is not as mysterious or different from other model classes as it might seem. However, we also highlight how deep learning is relatively distinct in other ways, such as its ability for representation learning, phenomena such as mode connectivity, and its relative universality.
△ Less
Submitted 3 March, 2025;
originally announced March 2025.
-
Spatiotemporal Forecasting in Climate Data Using EOFs and Machine Learning Models: A Case Study in Chile
Authors:
Mauricio Herrera,
Francisca Kleisinger,
Andrés Wilsón
Abstract:
Effective resource management and environmental planning in regions with high climatic variability, such as Chile, demand advanced predictive tools. This study addresses this challenge by employing an innovative and computationally efficient hybrid methodology that integrates machine learning (ML) methods for time series forecasting with established statistical techniques. The spatiotemporal data…
▽ More
Effective resource management and environmental planning in regions with high climatic variability, such as Chile, demand advanced predictive tools. This study addresses this challenge by employing an innovative and computationally efficient hybrid methodology that integrates machine learning (ML) methods for time series forecasting with established statistical techniques. The spatiotemporal data undergo decomposition using time-dependent Empirical Orthogonal Functions (EOFs), denoted as \(φ_{k}(t)\), and their corresponding spatial coefficients, \(α_{k}(s)\), to reduce dimensionality. Wavelet analysis provides high-resolution time and frequency information from the \(φ_{k}(t)\) functions, while neural networks forecast these functions within a medium-range horizon \(h\). By utilizing various ML models, particularly a Wavelet - ANN hybrid model, we forecast \(φ_{k}(t+h)\) up to a time horizon \(h\), and subsequently reconstruct the spatiotemporal data using these extended EOFs. This methodology is applied to a grid of climate data covering the territory of Chile. It transitions from a high-dimensional multivariate spatiotemporal data forecasting problem to a low-dimensional univariate forecasting problem. Additionally, cluster analysis with Dynamic Time Warping for defining similarities between rainfall time series, along with spatial coherence and predictability assessments, has been instrumental in identifying geographic areas where model performance is enhanced. This approach also elucidates the reasons behind poor forecast performance in regions or clusters with low spatial coherence and predictability. By utilizing cluster medoids, the forecasting process becomes more practical and efficient. This compound approach significantly reduces computational complexity while generating forecasts of reasonable accuracy and utility.
△ Less
Submitted 20 February, 2025;
originally announced February 2025.
-
Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes
Authors:
Mayuka Jayawardhana,
Renbo,
Samuel Dooley,
Valeriia Cherepanova,
Andrew Gordon Wilson,
Frank Hutter,
Colin White,
Tom Goldstein,
Micah Goldblum
Abstract:
Large language models (LLMs) perform remarkably well on tabular datasets in zero- and few-shot settings, since they can extract meaning from natural language column headers that describe features and labels. Similarly, TabPFN, a recent non-LLM transformer pretrained on numerous tables for in-context learning, has demonstrated excellent performance for dataset sizes up to a thousand samples. In con…
▽ More
Large language models (LLMs) perform remarkably well on tabular datasets in zero- and few-shot settings, since they can extract meaning from natural language column headers that describe features and labels. Similarly, TabPFN, a recent non-LLM transformer pretrained on numerous tables for in-context learning, has demonstrated excellent performance for dataset sizes up to a thousand samples. In contrast, gradient-boosted decision trees (GBDTs) are typically trained from scratch on each dataset without benefiting from pretraining data and must learn the relationships between columns from their entries alone since they lack natural language understanding. LLMs and TabPFN excel on small tabular datasets where a strong prior is essential, yet they are not competitive with GBDTs on medium or large datasets, since their context lengths are limited. In this paper, we propose a simple and lightweight approach for fusing large language models and TabPFN with gradient-boosted decision trees, which allows scalable GBDTs to benefit from the natural language capabilities and pretraining of transformers. We name our fusion methods LLM-Boost and PFN-Boost, respectively. While matching or surpassing the performance of the transformer at sufficiently small dataset sizes and GBDTs at sufficiently large sizes, LLM-Boost and PFN-Boost outperform both standalone components on a wide range of dataset sizes in between. We demonstrate state-of-the-art performance against numerous baselines and ensembling algorithms. We find that PFN-Boost achieves the best average performance among all methods we test for all but very small dataset sizes. We release our code at http://github.com/MayukaJ/LLM-Boost .
△ Less
Submitted 5 February, 2025; v1 submitted 4 February, 2025;
originally announced February 2025.
-
PearSAN: A Machine Learning Method for Inverse Design using Pearson Correlated Surrogate Annealing
Authors:
Michael Bezick,
Blake A. Wilson,
Vaishnavi Iyer,
Yuheng Chen,
Vladimir M. Shalaev,
Sabre Kais,
Alexander V. Kildishev,
Alexandra Boltasseva,
Brad Lackey
Abstract:
PearSAN is a machine learning-assisted optimization algorithm applicable to inverse design problems with large design spaces, where traditional optimizers struggle. The algorithm leverages the latent space of a generative model for rapid sampling and employs a Pearson correlated surrogate model to predict the figure of merit of the true design metric. As a showcase example, PearSAN is applied to t…
▽ More
PearSAN is a machine learning-assisted optimization algorithm applicable to inverse design problems with large design spaces, where traditional optimizers struggle. The algorithm leverages the latent space of a generative model for rapid sampling and employs a Pearson correlated surrogate model to predict the figure of merit of the true design metric. As a showcase example, PearSAN is applied to thermophotovoltaic (TPV) metasurface design by matching the working bands between a thermal radiator and a photovoltaic cell. PearSAN can work with any pretrained generative model with a discretized latent space, making it easy to integrate with VQ-VAEs and binary autoencoders. Its novel Pearson correlational loss can be used as both a latent regularization method, similar to batch and layer normalization, and as a surrogate training loss. We compare both to previous energy matching losses, which are shown to enforce poor regularization and performance, even with upgraded affine parameters. PearSAN achieves a state-of-the-art maximum design efficiency of 97%, and is at least an order of magnitude faster than previous methods, with an improved maximum figure-of-merit gain.
△ Less
Submitted 26 December, 2024;
originally announced December 2024.
-
Explainable AI for Multivariate Time Series Pattern Exploration: Latent Space Visual Analytics with Temporal Fusion Transformer and Variational Autoencoders in Power Grid Event Diagnosis
Authors:
Haowen Xu,
Ali Boyaci,
Jianming Lian,
Aaron Wilson
Abstract:
Detecting and analyzing complex patterns in multivariate time-series data is crucial for decision-making in urban and environmental system operations. However, challenges arise from the high dimensionality, intricate complexity, and interconnected nature of complex patterns, which hinder the understanding of their underlying physical processes. Existing AI methods often face limitations in interpr…
▽ More
Detecting and analyzing complex patterns in multivariate time-series data is crucial for decision-making in urban and environmental system operations. However, challenges arise from the high dimensionality, intricate complexity, and interconnected nature of complex patterns, which hinder the understanding of their underlying physical processes. Existing AI methods often face limitations in interpretability, computational efficiency, and scalability, reducing their applicability in real-world scenarios. This paper proposes a novel visual analytics framework that integrates two generative AI models, Temporal Fusion Transformer (TFT) and Variational Autoencoders (VAEs), to reduce complex patterns into lower-dimensional latent spaces and visualize them in 2D using dimensionality reduction techniques such as PCA, t-SNE, and UMAP with DBSCAN. These visualizations, presented through coordinated and interactive views and tailored glyphs, enable intuitive exploration of complex multivariate temporal patterns, identifying patterns' similarities and uncover their potential correlations for a better interpretability of the AI outputs. The framework is demonstrated through a case study on power grid signal data, where it identifies multi-label grid event signatures, including faults and anomalies with diverse root causes. Additionally, novel metrics and visualizations are introduced to validate the models and evaluate the performance, efficiency, and consistency of latent maps generated by TFT and VAE under different configurations. These analyses provide actionable insights for model parameter tuning and reliability improvements. Comparative results highlight that TFT achieves shorter run times and superior scalability to diverse time-series data shapes compared to VAE. This work advances fault diagnosis in multivariate time series, fostering explainable AI to support critical system operations.
△ Less
Submitted 24 December, 2024; v1 submitted 20 December, 2024;
originally announced December 2024.
-
Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences
Authors:
Alan Nawzad Amin,
Nate Gruver,
Yilun Kuang,
Lily Li,
Hunter Elliott,
Calvin McCarter,
Aniruddh Raghu,
Peyton Greenside,
Andrew Gordon Wilson
Abstract:
To build effective therapeutics, biologists iteratively mutate antibody sequences to improve binding and stability. Proposed mutations can be informed by previous measurements or by learning from large antibody databases to predict only typical antibodies. Unfortunately, the space of typical antibodies is enormous to search, and experiments often fail to find suitable antibodies on a budget. We in…
▽ More
To build effective therapeutics, biologists iteratively mutate antibody sequences to improve binding and stability. Proposed mutations can be informed by previous measurements or by learning from large antibody databases to predict only typical antibodies. Unfortunately, the space of typical antibodies is enormous to search, and experiments often fail to find suitable antibodies on a budget. We introduce Clone-informed Bayesian Optimization (CloneBO), a Bayesian optimization procedure that efficiently optimizes antibodies in the lab by teaching a generative model how our immune system optimizes antibodies. Our immune system makes antibodies by iteratively evolving specific portions of their sequences to bind their target strongly and stably, resulting in a set of related, evolving sequences known as a clonal family. We train a large language model, CloneLM, on hundreds of thousands of clonal families and use it to design sequences with mutations that are most likely to optimize an antibody within the human immune system. We propose to guide our designs to fit previous measurements with a twisted sequential Monte Carlo procedure. We show that CloneBO optimizes antibodies substantially more efficiently than previous methods in realistic in silico experiments and designs stronger and more stable binders in in vitro wet lab experiments.
△ Less
Submitted 10 December, 2024;
originally announced December 2024.
-
Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
Authors:
Luca Masserano,
Abdul Fatir Ansari,
Boran Han,
Xiyuan Zhang,
Christos Faloutsos,
Michael W. Mahoney,
Andrew Gordon Wilson,
Youngsuk Park,
Syama Rangapuram,
Danielle C. Maddix,
Yuyang Wang
Abstract:
How to best develop foundational models for time series forecasting remains an important open question. Tokenization is a crucial consideration in this effort: what is an effective discrete vocabulary for a real-valued sequential input? To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localiz…
▽ More
How to best develop foundational models for time series forecasting remains an important open question. Tokenization is a crucial consideration in this effort: what is an effective discrete vocabulary for a real-valued sequential input? To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies. Our method first scales and decomposes the input time series, then thresholds and quantizes the wavelet coefficients, and finally pre-trains an autoregressive model to forecast coefficients for the forecast horizon. By decomposing coarse and fine structures in the inputs, wavelets provide an eloquent and compact language for time series forecasting that simplifies learning. Empirical results on a comprehensive benchmark, including 42 datasets for both in-domain and zero-shot settings, show that WaveToken: i) provides better accuracy than recently proposed foundation models for forecasting while using a much smaller vocabulary (1024 tokens), and performs on par or better than modern deep learning models trained specifically on each dataset; and ii) exhibits superior generalization capabilities, achieving the best average rank across all datasets for three complementary metrics. In addition, we show that our method can easily capture complex temporal patterns of practical relevance that are challenging for other recent pre-trained models, including trends, sparse spikes, and non-stationary time series with varying frequencies evolving over time.
△ Less
Submitted 6 December, 2024;
originally announced December 2024.
-
LLMForecaster: Improving Seasonal Event Forecasts with Unstructured Textual Data
Authors:
Hanyu Zhang,
Chuck Arvin,
Dmitry Efimov,
Michael W. Mahoney,
Dominique Perrault-Joncas,
Shankar Ramasubramanian,
Andrew Gordon Wilson,
Malcolm Wolff
Abstract:
Modern time-series forecasting models often fail to make full use of rich unstructured information about the time series themselves. This lack of proper conditioning can lead to obvious model failures; for example, models may be unaware of the details of a particular product, and hence fail to anticipate seasonal surges in customer demand in the lead up to major exogenous events like holidays for…
▽ More
Modern time-series forecasting models often fail to make full use of rich unstructured information about the time series themselves. This lack of proper conditioning can lead to obvious model failures; for example, models may be unaware of the details of a particular product, and hence fail to anticipate seasonal surges in customer demand in the lead up to major exogenous events like holidays for clearly relevant products. To address this shortcoming, this paper introduces a novel forecast post-processor -- which we call LLMForecaster -- that fine-tunes large language models (LLMs) to incorporate unstructured semantic and contextual information and historical data to improve the forecasts from an existing demand forecasting pipeline. In an industry-scale retail application, we demonstrate that our technique yields statistically significantly forecast improvements across several sets of products subject to holiday-driven demand surges.
△ Less
Submitted 3 December, 2024;
originally announced December 2024.
-
Agricultural Landscape Understanding At Country-Scale
Authors:
Radhika Dua,
Nikita Saxena,
Aditi Agarwal,
Alex Wilson,
Gaurav Singh,
Hoang Tran,
Ishan Deshpande,
Amandeep Kaur,
Gaurav Aggarwal,
Chandan Nath,
Arnab Basu,
Vishal Batchu,
Sharath Holla,
Bindiya Kurle,
Olana Missura,
Rahul Aggarwal,
Shubhika Garg,
Nishi Shah,
Avneet Singh,
Dinesh Tewari,
Agata Dondzik,
Bharat Adsul,
Milind Sohoni,
Asim Rama Praveen,
Aaryan Dangi
, et al. (10 additional authors not shown)
Abstract:
Agricultural landscapes are quite complex, especially in the Global South where fields are smaller, and agricultural practices are more varied. In this paper we report on our progress in digitizing the agricultural landscape (natural and man-made) in our study region of India. We use high resolution imagery and a UNet style segmentation model to generate the first of its kind national-scale multi-…
▽ More
Agricultural landscapes are quite complex, especially in the Global South where fields are smaller, and agricultural practices are more varied. In this paper we report on our progress in digitizing the agricultural landscape (natural and man-made) in our study region of India. We use high resolution imagery and a UNet style segmentation model to generate the first of its kind national-scale multi-class panoptic segmentation output. Through this work we have been able to identify individual fields across 151.7M hectares, and delineating key features such as water resources and vegetation. We share how this output was validated by our team and externally by downstream users, including some sample use cases that can lead to targeted data driven decision making. We believe this dataset will contribute towards digitizing agriculture by generating the foundational baselayer.
△ Less
Submitted 8 November, 2024;
originally announced November 2024.
-
Understanding Communication Preferences of Information Workers in Engagement with Text-Based Conversational Agents
Authors:
Ananya Bhattacharjee,
Jina Suh,
Mahsa Ershadi,
Shamsi T. Iqbal,
Andrew D. Wilson,
Javier Hernandez
Abstract:
Communication traits in text-based human-AI conversations play pivotal roles in shaping user experiences and perceptions of systems. With the advancement of large language models (LLMs), it is now feasible to analyze these traits at a more granular level. In this study, we explore the preferences of information workers regarding chatbot communication traits across seven applications. Participants…
▽ More
Communication traits in text-based human-AI conversations play pivotal roles in shaping user experiences and perceptions of systems. With the advancement of large language models (LLMs), it is now feasible to analyze these traits at a more granular level. In this study, we explore the preferences of information workers regarding chatbot communication traits across seven applications. Participants were invited to participate in an interactive survey, which featured adjustable sliders, allowing them to adjust and express their preferences for five key communication traits: formality, personification, empathy, sociability, and humor. Our findings reveal distinct communication preferences across different applications; for instance, there was a preference for relatively high empathy in wellbeing contexts and relatively low personification in coding. Similarities in preferences were also noted between applications such as chatbots for customer service and scheduling. These insights offer crucial design guidelines for future chatbots, emphasizing the need for nuanced trait adjustments for each application.
△ Less
Submitted 1 November, 2024; v1 submitted 27 October, 2024;
originally announced October 2024.
-
UniCoN: Universal Conditional Networks for Multi-Age Embryonic Cartilage Segmentation with Sparsely Annotated Data
Authors:
Nishchal Sapkota,
Yejia Zhang,
Zihao Zhao,
Maria Gomez,
Yuhan Hsi,
Jordan A. Wilson,
Kazuhiko Kawasaki,
Greg Holmes,
Meng Wu,
Ethylin Wang Jabs,
Joan T. Richtsmeier,
Susan M. Motch Perrine,
Danny Z. Chen
Abstract:
Osteochondrodysplasia, affecting 2-3% of newborns globally, is a group of bone and cartilage disorders that often result in head malformations, contributing to childhood morbidity and reduced quality of life. Current research on this disease using mouse models faces challenges since it involves accurately segmenting the developing cartilage in 3D micro-CT images of embryonic mice. Tackling this se…
▽ More
Osteochondrodysplasia, affecting 2-3% of newborns globally, is a group of bone and cartilage disorders that often result in head malformations, contributing to childhood morbidity and reduced quality of life. Current research on this disease using mouse models faces challenges since it involves accurately segmenting the developing cartilage in 3D micro-CT images of embryonic mice. Tackling this segmentation task with deep learning (DL) methods is laborious due to the big burden of manual image annotation, expensive due to the high acquisition costs of 3D micro-CT images, and difficult due to embryonic cartilage's complex and rapidly changing shapes. While DL approaches have been proposed to automate cartilage segmentation, most such models have limited accuracy and generalizability, especially across data from different embryonic age groups. To address these limitations, we propose novel DL methods that can be adopted by any DL architectures -- including CNNs, Transformers, or hybrid models -- which effectively leverage age and spatial information to enhance model performance. Specifically, we propose two new mechanisms, one conditioned on discrete age categories and the other on continuous image crop locations, to enable an accurate representation of cartilage shape changes across ages and local shape details throughout the cranial region. Extensive experiments on multi-age cartilage segmentation datasets show significant and consistent performance improvements when integrating our conditional modules into popular DL segmentation architectures. On average, we achieve a 1.7% Dice score increase with minimal computational overhead and a 7.5% improvement on unseen data. These results highlight the potential of our approach for developing robust, universal models capable of handling diverse datasets with limited annotated data, a key challenge in DL-based medical image analysis.
△ Less
Submitted 16 October, 2024;
originally announced October 2024.
-
Unstable Unlearning: The Hidden Risk of Concept Resurgence in Diffusion Models
Authors:
Vinith M. Suriyakumar,
Rohan Alur,
Ayush Sekhari,
Manish Raghavan,
Ashia C. Wilson
Abstract:
Text-to-image diffusion models rely on massive, web-scale datasets. Training them from scratch is computationally expensive, and as a result, developers often prefer to make incremental updates to existing models. These updates often compose fine-tuning steps (to learn new concepts or improve model performance) with "unlearning" steps (to "forget" existing concepts, such as copyrighted works or ex…
▽ More
Text-to-image diffusion models rely on massive, web-scale datasets. Training them from scratch is computationally expensive, and as a result, developers often prefer to make incremental updates to existing models. These updates often compose fine-tuning steps (to learn new concepts or improve model performance) with "unlearning" steps (to "forget" existing concepts, such as copyrighted works or explicit content). In this work, we demonstrate a critical and previously unknown vulnerability that arises in this paradigm: even under benign, non-adversarial conditions, fine-tuning a text-to-image diffusion model on seemingly unrelated images can cause it to "relearn" concepts that were previously "unlearned." We comprehensively investigate the causes and scope of this phenomenon, which we term concept resurgence, by performing a series of experiments which compose "concept unlearning" with subsequent fine-tuning of Stable Diffusion v1.4 and Stable Diffusion v2.1. Our findings underscore the fragility of composing incremental model updates, and raise serious new concerns about current approaches to ensuring the safety and alignment of text-to-image diffusion models.
△ Less
Submitted 10 February, 2025; v1 submitted 10 October, 2024;
originally announced October 2024.
-
Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices
Authors:
Andres Potapczynski,
Shikai Qiu,
Marc Finzi,
Christopher Ferri,
Zixi Chen,
Micah Goldblum,
Bayan Bruss,
Christopher De Sa,
Andrew Gordon Wilson
Abstract:
Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are op…
▽ More
Dense linear layers are the dominant computational bottleneck in large neural networks, presenting a critical need for more efficient alternatives. Previous efforts focused on a small number of hand-crafted structured matrices and neglected to investigate whether these structures can surpass dense layers in terms of compute-optimal scaling laws when both the model size and training examples are optimally allocated. In this work, we present a unifying framework that enables searching among all linear operators expressible via an Einstein summation. This framework encompasses many previously proposed structures, such as low-rank, Kronecker, Tensor-Train, Block Tensor-Train (BTT), and Monarch, along with many novel structures. To analyze the framework, we develop a taxonomy of all such operators based on their computational and algebraic properties and show that differences in the compute-optimal scaling laws are mostly governed by a small number of variables that we introduce. Namely, a small $ω$ (which measures parameter sharing) and large $ψ$ (which measures the rank) reliably led to better scaling laws. Guided by the insight that full-rank structures that maximize parameters per unit of compute perform the best, we propose BTT-MoE, a novel Mixture-of-Experts (MoE) architecture obtained by sparsifying computation in the BTT structure. In contrast to the standard sparse MoE for each entire feed-forward network, BTT-MoE learns an MoE in every single linear layer of the model, including the projection matrices in the attention blocks. We find BTT-MoE provides a substantial compute-efficiency gain over dense layers and standard MoE.
△ Less
Submitted 4 October, 2024; v1 submitted 2 October, 2024;
originally announced October 2024.
-
SpaceBlender: Creating Context-Rich Collaborative Spaces Through Generative 3D Scene Blending
Authors:
Nels Numan,
Shwetha Rajaram,
Balasaravanan Thoravi Kumaravel,
Nicolai Marquardt,
Andrew D. Wilson
Abstract:
There is increased interest in using generative AI to create 3D spaces for Virtual Reality (VR) applications. However, today's models produce artificial environments, falling short of supporting collaborative tasks that benefit from incorporating the user's physical context. To generate environments that support VR telepresence, we introduce SpaceBlender, a novel pipeline that utilizes generative…
▽ More
There is increased interest in using generative AI to create 3D spaces for Virtual Reality (VR) applications. However, today's models produce artificial environments, falling short of supporting collaborative tasks that benefit from incorporating the user's physical context. To generate environments that support VR telepresence, we introduce SpaceBlender, a novel pipeline that utilizes generative AI techniques to blend users' physical surroundings into unified virtual spaces. This pipeline transforms user-provided 2D images into context-rich 3D environments through an iterative process consisting of depth estimation, mesh alignment, and diffusion-based space completion guided by geometric priors and adaptive text prompts. In a preliminary within-subjects study, where 20 participants performed a collaborative VR affinity diagramming task in pairs, we compared SpaceBlender with a generic virtual environment and a state-of-the-art scene generation framework, evaluating its ability to create virtual spaces suitable for collaboration. Participants appreciated the enhanced familiarity and context provided by SpaceBlender but also noted complexities in the generative environments that could detract from task focus. Drawing on participant feedback, we propose directions for improving the pipeline and discuss the value and design of blended spaces for different scenarios.
△ Less
Submitted 20 September, 2024;
originally announced September 2024.
-
Applying Action Masking and Curriculum Learning Techniques to Improve Data Efficiency and Overall Performance in Operational Technology Cyber Security using Reinforcement Learning
Authors:
Alec Wilson,
William Holmes,
Ryan Menzies,
Kez Smithson Whitehead
Abstract:
In previous work, the IPMSRL environment (Integrated Platform Management System Reinforcement Learning environment) was developed with the aim of training defensive RL agents in a simulator representing a subset of an IPMS on a maritime vessel under a cyber-attack. This paper extends the use of IPMSRL to enhance realism including the additional dynamics of false positive alerts and alert delay. Ap…
▽ More
In previous work, the IPMSRL environment (Integrated Platform Management System Reinforcement Learning environment) was developed with the aim of training defensive RL agents in a simulator representing a subset of an IPMS on a maritime vessel under a cyber-attack. This paper extends the use of IPMSRL to enhance realism including the additional dynamics of false positive alerts and alert delay. Applying curriculum learning, in the most difficult environment tested, resulted in an episode reward mean increasing from a baseline result of -2.791 to -0.569. Applying action masking, in the most difficult environment tested, resulted in an episode reward mean increasing from a baseline result of -2.791 to -0.743. Importantly, this level of performance was reached in less than 1 million timesteps, which was far more data efficient than vanilla PPO which reached a lower level of performance after 2.5 million timesteps. The training method which resulted in the highest level of performance observed in this paper was a combination of the application of curriculum learning and action masking, with a mean episode reward of 0.137. This paper also introduces a basic hardcoded defensive agent encoding a representation of cyber security best practice, which provides context to the episode reward mean figures reached by the RL agents. The hardcoded agent managed an episode reward mean of -1.895. This paper therefore shows that applications of curriculum learning and action masking, both independently and in tandem, present a way to overcome the complex real-world dynamics that are present in operational technology cyber security threat remediation.
△ Less
Submitted 13 September, 2024;
originally announced September 2024.
-
Satellite Sunroof: High-res Digital Surface Models and Roof Segmentation for Global Solar Mapping
Authors:
Vishal Batchu,
Alex Wilson,
Betty Peng,
Carl Elkin,
Umangi Jain,
Christopher Van Arsdale,
Ross Goroshin,
Varun Gulshan
Abstract:
The transition to renewable energy, particularly solar, is key to mitigating climate change. Google's Solar API aids this transition by estimating solar potential from aerial imagery, but its impact is constrained by geographical coverage. This paper proposes expanding the API's reach using satellite imagery, enabling global solar potential assessment. We tackle challenges involved in building a D…
▽ More
The transition to renewable energy, particularly solar, is key to mitigating climate change. Google's Solar API aids this transition by estimating solar potential from aerial imagery, but its impact is constrained by geographical coverage. This paper proposes expanding the API's reach using satellite imagery, enabling global solar potential assessment. We tackle challenges involved in building a Digital Surface Model (DSM) and roof instance segmentation from lower resolution and single oblique views using deep learning models. Our models, trained on aligned satellite and aerial datasets, produce 25cm DSMs and roof segments. With ~1m DSM MAE on buildings, ~5deg roof pitch error and ~56% IOU on roof segmentation, they significantly enhance the Solar API's potential to promote solar adoption.
△ Less
Submitted 29 August, 2024; v1 submitted 26 August, 2024;
originally announced August 2024.
-
Adaptive Backtracking Line Search
Authors:
Joao V. Cavalcanti,
Laurent Lessard,
Ashia C. Wilson
Abstract:
Backtracking line search is foundational in numerical optimization. The basic idea is to adjust the step-size of an algorithm by a constant factor until some chosen criterion (e.g. Armijo, Descent Lemma) is satisfied. We propose a novel way to adjust step-sizes, replacing the constant factor used in regular backtracking with one that takes into account the degree to which the chosen criterion is v…
▽ More
Backtracking line search is foundational in numerical optimization. The basic idea is to adjust the step-size of an algorithm by a constant factor until some chosen criterion (e.g. Armijo, Descent Lemma) is satisfied. We propose a novel way to adjust step-sizes, replacing the constant factor used in regular backtracking with one that takes into account the degree to which the chosen criterion is violated, with no additional computational burden. This light-weight adjustment leads to significantly faster optimization, which we confirm by performing a variety of experiments on over fifteen real world datasets. For convex problems, we prove adaptive backtracking requires no more adjustments to produce a feasible step-size than regular backtracking does. For nonconvex smooth problems, we prove adaptive backtracking enjoys the same guarantees of regular backtracking. Furthermore, we prove adaptive backtracking preserves the convergence rates of gradient descent and its accelerated variant.
△ Less
Submitted 26 May, 2025; v1 submitted 23 August, 2024;
originally announced August 2024.
-
Automating Transparency Mechanisms in the Judicial System Using LLMs: Opportunities and Challenges
Authors:
Ishana Shastri,
Shomik Jain,
Barbara Engelhardt,
Ashia Wilson
Abstract:
Bringing more transparency to the judicial system for the purposes of increasing accountability often demands extensive effort from auditors who must meticulously sift through numerous disorganized legal case files to detect patterns of bias and errors. For example, the high-profile investigation into the Curtis Flowers case took seven reporters a full year to assemble evidence about the prosecuto…
▽ More
Bringing more transparency to the judicial system for the purposes of increasing accountability often demands extensive effort from auditors who must meticulously sift through numerous disorganized legal case files to detect patterns of bias and errors. For example, the high-profile investigation into the Curtis Flowers case took seven reporters a full year to assemble evidence about the prosecutor's history of selecting racially biased juries. LLMs have the potential to automate and scale these transparency pipelines, especially given their demonstrated capabilities to extract information from unstructured documents. We discuss the opportunities and challenges of using LLMs to provide transparency in two important court processes: jury selection in criminal trials and housing eviction cases.
△ Less
Submitted 15 August, 2024;
originally announced August 2024.
-
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models
Authors:
Sanae Lotfi,
Yilun Kuang,
Brandon Amos,
Micah Goldblum,
Marc Finzi,
Andrew Gordon Wilson
Abstract:
Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale. Moreover, these bounds are obtained through restrictive compression techniques, bounding compressed models that generate low-quality…
▽ More
Large language models (LLMs) with billions of parameters excel at predicting the next token in a sequence. Recent work computes non-vacuous compression-based generalization bounds for LLMs, but these bounds are vacuous for large models at the billion-parameter scale. Moreover, these bounds are obtained through restrictive compression techniques, bounding compressed models that generate low-quality text. Additionally, the tightness of these existing bounds depends on the number of IID documents in a training set rather than the much larger number of non-IID constituent tokens, leaving untapped potential for tighter bounds. In this work, we instead use properties of martingales to derive generalization bounds that benefit from the vast number of tokens in LLM training sets. Since a dataset contains far more tokens than documents, our generalization bounds not only tolerate but actually benefit from far less restrictive compression schemes. With Monarch matrices, Kronecker factorizations, and post-training quantization, we achieve non-vacuous generalization bounds for LLMs as large as LLaMA2-70B. Unlike previous approaches, our work achieves the first non-vacuous bounds for models that are deployed in practice and generate high-quality text.
△ Less
Submitted 25 July, 2024;
originally announced July 2024.
-
Non-native Quantum Generative Optimization with Adversarial Autoencoders
Authors:
Blake A. Wilson,
Jonathan Wurtz,
Vahagn Mkhitaryan,
Michael Bezick,
Sheng-Tao Wang,
Sabre Kais,
Vladimir M. Shalaev,
Alexandra Boltasseva
Abstract:
Large-scale optimization problems are prevalent in several fields, including engineering, finance, and logistics. However, most optimization problems cannot be efficiently encoded onto a physical system because the existing quantum samplers have too few qubits. Another typical limiting factor is that the optimization constraints are not compatible with the native cost Hamiltonian. This work presen…
▽ More
Large-scale optimization problems are prevalent in several fields, including engineering, finance, and logistics. However, most optimization problems cannot be efficiently encoded onto a physical system because the existing quantum samplers have too few qubits. Another typical limiting factor is that the optimization constraints are not compatible with the native cost Hamiltonian. This work presents a new approach to address these challenges. We introduce the adversarial quantum autoencoder model (AQAM) that can be used to map large-scale optimization problems onto existing quantum samplers while simultaneously optimizing the problem through latent quantum-enhanced Boltzmann sampling. We demonstrate the AQAM on a neutral atom sampler, and showcase the model by optimizing 64px by 64px unit cells that represent a broad-angle filter metasurface applicable to improving the coherence of neutral atom devices. Using 12-atom simulations, we demonstrate that the AQAM achieves a lower Renyi divergence and a larger spectral gap when compared to classical Markov Chain Monte Carlo samplers. Our work paves the way to more efficient mapping of conventional optimization problems into existing quantum samplers.
△ Less
Submitted 18 July, 2024;
originally announced July 2024.
-
The Approximate Fisher Influence Function: Faster Estimation of Data Influence in Statistical Models
Authors:
Omri Lev,
Ashia C. Wilson
Abstract:
Quantifying the influence of infinitesimal changes in training data on model performance is crucial for understanding and improving machine learning models. In this work, we reformulate this problem as a weighted empirical risk minimization and enhance existing influence function-based methods by using information geometry to derive a new algorithm to estimate influence. Our formulation proves ver…
▽ More
Quantifying the influence of infinitesimal changes in training data on model performance is crucial for understanding and improving machine learning models. In this work, we reformulate this problem as a weighted empirical risk minimization and enhance existing influence function-based methods by using information geometry to derive a new algorithm to estimate influence. Our formulation proves versatile across various applications, and we further demonstrate in simulations how it remains informative even in non-convex cases. Furthermore, we show that our method offers significant computational advantages over current Newton step-based methods.
△ Less
Submitted 9 April, 2025; v1 submitted 11 July, 2024;
originally announced July 2024.
-
Just How Flexible are Neural Networks in Practice?
Authors:
Ravid Shwartz-Ziv,
Micah Goldblum,
Arpit Bansal,
C. Bayan Bruss,
Yann LeCun,
Andrew Gordon Wilson
Abstract:
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function c…
▽ More
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters, underpinning notions of overparameterized and underparameterized models. In practice, however, we only find solutions accessible via our training procedure, including the optimizer and regularizers, limiting flexibility. Moreover, the exact parameterization of the function class, built into an architecture, shapes its loss surface and impacts the minima we find. In this work, we examine the ability of neural networks to fit data in practice. Our findings indicate that: (1) standard optimizers find minima where the model can only fit training sets with significantly fewer samples than it has parameters; (2) convolutional networks are more parameter-efficient than MLPs and ViTs, even on randomly labeled data; (3) while stochastic training is thought to have a regularizing effect, SGD actually finds minima that fit more training data than full-batch gradient descent; (4) the difference in capacity to fit correctly labeled and incorrectly labeled samples can be predictive of generalization; (5) ReLU activation functions result in finding minima that fit more data despite being designed to avoid vanishing and exploding gradients in deep architectures.
△ Less
Submitted 17 June, 2024;
originally announced June 2024.
-
Scalable and Flexible Causal Discovery with an Efficient Test for Adjacency
Authors:
Alan Nawzad Amin,
Andrew Gordon Wilson
Abstract:
To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large scale data. Unfortunately the space of all possible causal graphs is enormous so scalably and accurately searching for the best fit to the data is a challenge. In principle we could substantially decrease the search space, or learn the graph entirely…
▽ More
To make accurate predictions, understand mechanisms, and design interventions in systems of many variables, we wish to learn causal graphs from large scale data. Unfortunately the space of all possible causal graphs is enormous so scalably and accurately searching for the best fit to the data is a challenge. In principle we could substantially decrease the search space, or learn the graph entirely, by testing the conditional independence of variables. However, deciding if two variables are adjacent in a causal graph may require an exponential number of tests. Here we build a scalable and flexible method to evaluate if two variables are adjacent in a causal graph, the Differentiable Adjacency Test (DAT). DAT replaces an exponential number of tests with a provably equivalent relaxed problem. It then solves this problem by training two neural networks. We build a graph learning method based on DAT, DAT-Graph, that can also learn from data with interventions. DAT-Graph can learn graphs of 1000 variables with state of the art accuracy. Using the graph learned by DAT-Graph, we also build models that make much more accurate predictions of the effects of interventions on large scale RNA sequencing data.
△ Less
Submitted 18 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Large Language Models Must Be Taught to Know What They Don't Know
Authors:
Sanyam Kapoor,
Nate Gruver,
Manley Roberts,
Katherine Collins,
Arka Pal,
Umang Bhatt,
Adrian Weller,
Samuel Dooley,
Micah Goldblum,
Andrew Gordon Wilson
Abstract:
When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibrati…
▽ More
When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also the uncertainty of other models. Lastly, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings through a user study.
△ Less
Submitted 5 December, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Transferring Knowledge from Large Foundation Models to Small Downstream Models
Authors:
Shikai Qiu,
Boran Han,
Danielle C. Maddix,
Shuai Zhang,
Yuyang Wang,
Andrew Gordon Wilson
Abstract:
How do we transfer the relevant knowledge from ever larger foundation models into small, task-specific downstream models that can run at much lower costs? Standard transfer learning using pre-trained weights as the initialization transfers limited information and commits us to often massive pre-trained architectures. This procedure also precludes combining multiple pre-trained models that learn co…
▽ More
How do we transfer the relevant knowledge from ever larger foundation models into small, task-specific downstream models that can run at much lower costs? Standard transfer learning using pre-trained weights as the initialization transfers limited information and commits us to often massive pre-trained architectures. This procedure also precludes combining multiple pre-trained models that learn complementary information. To address these shortcomings, we introduce Adaptive Feature Transfer (AFT). Instead of transferring weights, AFT operates purely on features, thereby decoupling the choice of the pre-trained model from the smaller downstream model. Rather than indiscriminately compressing all pre-trained features, AFT adaptively transfers pre-trained features that are most useful for performing the downstream task, using a simple regularization that adds minimal overhead. Across multiple vision, language, and multi-modal datasets, AFT achieves significantly better downstream performance compared to alternatives with a similar computational cost. Furthermore, AFT reliably translates improvement in pre-trained models into improvement in downstream performance, even if the downstream model is over $50\times$ smaller, and can effectively transfer complementary information learned by multiple pre-trained models.
△ Less
Submitted 11 June, 2024;
originally announced June 2024.
-
Compute Better Spent: Replacing Dense Layers with Structured Matrices
Authors:
Shikai Qiu,
Andres Potapczynski,
Marc Finzi,
Micah Goldblum,
Andrew Gordon Wilson
Abstract:
Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolutional networks in the image domain. In this work, we systematically explore structured matrices as replacements for dense matrices. We show that diffe…
▽ More
Dense linear layers are the dominant computational bottleneck in foundation models. Identifying more efficient alternatives to dense matrices has enormous potential for building more compute-efficient models, as exemplified by the success of convolutional networks in the image domain. In this work, we systematically explore structured matrices as replacements for dense matrices. We show that different structures often require drastically different initialization scales and learning rates, which are crucial to performance, especially as models scale. Using insights from the Maximal Update Parameterization, we determine the optimal scaling for initialization and learning rates of these unconventional layers. Finally, we measure the scaling laws of different structures to compare how quickly their performance improves with compute. We propose a novel matrix family containing Monarch matrices, the Block Tensor-Train (BTT), which we show performs better than dense matrices for the same compute on multiple tasks. On CIFAR-10/100 with augmentation, BTT achieves exponentially lower training loss than dense when training MLPs and ViTs. BTT matches dense ViT-S/32 performance on ImageNet-1k with 3.8 times less compute and is more efficient than dense for training small GPT-2 language models.
△ Less
Submitted 10 June, 2024;
originally announced June 2024.
-
As an AI Language Model, "Yes I Would Recommend Calling the Police": Norm Inconsistency in LLM Decision-Making
Authors:
Shomik Jain,
D Calacci,
Ashia Wilson
Abstract:
We investigate the phenomenon of norm inconsistency: where LLMs apply different norms in similar situations. Specifically, we focus on the high-risk application of deciding whether to call the police in Amazon Ring home surveillance videos. We evaluate the decisions of three state-of-the-art LLMs -- GPT-4, Gemini 1.0, and Claude 3 Sonnet -- in relation to the activities portrayed in the videos, th…
▽ More
We investigate the phenomenon of norm inconsistency: where LLMs apply different norms in similar situations. Specifically, we focus on the high-risk application of deciding whether to call the police in Amazon Ring home surveillance videos. We evaluate the decisions of three state-of-the-art LLMs -- GPT-4, Gemini 1.0, and Claude 3 Sonnet -- in relation to the activities portrayed in the videos, the subjects' skin-tone and gender, and the characteristics of the neighborhoods where the videos were recorded. Our analysis reveals significant norm inconsistencies: (1) a discordance between the recommendation to call the police and the actual presence of criminal activity, and (2) biases influenced by the racial demographics of the neighborhoods. These results highlight the arbitrariness of model decisions in the surveillance context and the limitations of current bias detection and mitigation strategies in normative decision-making.
△ Less
Submitted 17 August, 2024; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Authors:
Samuel Lavoie,
Polina Kirichenko,
Mark Ibrahim,
Mahmoud Assran,
Andrew Gordon Wilson,
Aaron Courville,
Nicolas Ballas
Abstract:
There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by mapping an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's v…
▽ More
There are a thousand ways to caption an image. Contrastive Language Pretraining (CLIP) on the other hand, works by mapping an image and its caption to a single vector -- limiting how well CLIP-like models can represent the diverse ways to describe an image. In this work, we introduce Llip, Latent Language Image Pretraining, which models the diversity of captions that could match an image. Llip's vision encoder outputs a set of visual features that are mixed into a final representation by conditioning on information derived from the text. We show that Llip outperforms non-contextualized baselines like CLIP and SigLIP on a variety of tasks even with large-scale encoders. Llip improves zero-shot classification by an average of 2.9% zero-shot classification benchmarks with a ViT-G/14 encoder. Specifically, Llip attains a zero-shot top-1 accuracy of 83.5% on ImageNet outperforming a similarly sized CLIP by 1.4%. We also demonstrate improvement on zero-shot retrieval on MS-COCO by 6.0%. We provide a comprehensive analysis of the components introduced by the method and demonstrate that Llip leads to richer visual representations.
△ Less
Submitted 29 March, 2025; v1 submitted 29 April, 2024;
originally announced May 2024.
-
Leveraging Speech for Gesture Detection in Multimodal Communication
Authors:
Esam Ghaleb,
Ilya Burenko,
Marlou Rasenberg,
Wim Pouw,
Ivan Toni,
Peter Uhrig,
Anna Wilson,
Judith Holler,
Aslı Özyürek,
Raquel Fernández
Abstract:
Gestures are inherent to human interaction and often complement speech in face-to-face communication, forming a multimodal communication system. An important task in gesture analysis is detecting a gesture's beginning and end. Research on automatic gesture detection has primarily focused on visual and kinematic information to detect a limited set of isolated or silent gestures with low variability…
▽ More
Gestures are inherent to human interaction and often complement speech in face-to-face communication, forming a multimodal communication system. An important task in gesture analysis is detecting a gesture's beginning and end. Research on automatic gesture detection has primarily focused on visual and kinematic information to detect a limited set of isolated or silent gestures with low variability, neglecting the integration of speech and vision signals to detect gestures that co-occur with speech. This work addresses this gap by focusing on co-speech gesture detection, emphasising the synchrony between speech and co-speech hand gestures. We address three main challenges: the variability of gesture forms, the temporal misalignment between gesture and speech onsets, and differences in sampling rate between modalities. We investigate extended speech time windows and employ separate backbone models for each modality to address the temporal misalignment and sampling rate differences. We utilize Transformer encoders in cross-modal and early fusion techniques to effectively align and integrate speech and skeletal sequences. The study results show that combining visual and speech information significantly enhances gesture detection performance. Our findings indicate that expanding the speech buffer beyond visual time segments improves performance and that multimodal integration using cross-modal and early fusion techniques outperforms baseline methods using unimodal and late fusion methods. Additionally, we find a correlation between the models' gesture prediction confidence and low-level speech frequency features potentially associated with gestures. Overall, the study provides a better understanding and detection methods for co-speech gestures, facilitating the analysis of multimodal communication.
△ Less
Submitted 23 April, 2024;
originally announced April 2024.
-
Scarce Resource Allocations That Rely On Machine Learning Should Be Randomized
Authors:
Shomik Jain,
Kathleen Creel,
Ashia Wilson
Abstract:
Contrary to traditional deterministic notions of algorithmic fairness, this paper argues that fairly allocating scarce resources using machine learning often requires randomness. We address why, when, and how to randomize by proposing stochastic procedures that more adequately account for all of the claims that individuals have to allocations of social goods or opportunities.
Contrary to traditional deterministic notions of algorithmic fairness, this paper argues that fairly allocating scarce resources using machine learning often requires randomness. We address why, when, and how to randomize by proposing stochastic procedures that more adequately account for all of the claims that individuals have to allocations of social goods or opportunities.
△ Less
Submitted 19 June, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
Generating Potent Poisons and Backdoors from Scratch with Guided Diffusion
Authors:
Hossein Souri,
Arpit Bansal,
Hamid Kazemi,
Liam Fowl,
Aniruddha Saha,
Jonas Geiping,
Andrew Gordon Wilson,
Rama Chellappa,
Tom Goldstein,
Micah Goldblum
Abstract:
Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clea…
▽ More
Modern neural networks are often trained on massive datasets that are web scraped with minimal human inspection. As a result of this insecure curation pipeline, an adversary can poison or backdoor the resulting model by uploading malicious data to the internet and waiting for a victim to scrape and train on it. Existing approaches for creating poisons and backdoors start with randomly sampled clean data, called base samples, and then modify those samples to craft poisons. However, some base samples may be significantly more amenable to poisoning than others. As a result, we may be able to craft more potent poisons by carefully choosing the base samples. In this work, we use guided diffusion to synthesize base samples from scratch that lead to significantly more potent poisons and backdoors than previous state-of-the-art attacks. Our Guided Diffusion Poisoning (GDP) base samples can be combined with any downstream poisoning or backdoor attack to boost its effectiveness. Our implementation code is publicly available at: https://github.com/hsouri/GDP .
△ Less
Submitted 24 March, 2024;
originally announced March 2024.
-
Quadcopter Team Configurable Motion Guided by a Quadruped
Authors:
Mohammad Ghufran,
Sourish Tetakayala,
Jack Hughes,
Aron Wilson,
Hossein Rastgoftar
Abstract:
The paper focuses on modeling and experimental evaluation of a quadcopter team configurable coordination guided by a single quadruped robot. We consider the quadcopter team as particles of a two-dimensional deformable body and propose a two-dimensional affine transformation model for safe and collision-free configurable coordination of this heterogeneous robotic system. The proposed affine transfo…
▽ More
The paper focuses on modeling and experimental evaluation of a quadcopter team configurable coordination guided by a single quadruped robot. We consider the quadcopter team as particles of a two-dimensional deformable body and propose a two-dimensional affine transformation model for safe and collision-free configurable coordination of this heterogeneous robotic system. The proposed affine transformation is decomposed into translation, that is specified by the quadruped global position, and configurable motion of the quadcopters, which is determined by a nonsingular Jacobian matrix so that the quadcopter team can safely navigate a constrained environment while avoiding collision. We propose two methods to experimentally evaluate the proposed heterogeneous robot coordination model. The first method measures real positions of quadcopters, quadruped, and environmental objects all with respect to the global coordinate system. On the other hand, the second method measures position with respect to the local coordinate system fixed on the dog robot which in turn enables safe planning the Jacobian matrix of the quadcopter team while the world is virtually approached the robotic system.
△ Less
Submitted 20 March, 2024;
originally announced March 2024.
-
BlendScape: Enabling End-User Customization of Video-Conferencing Environments through Generative AI
Authors:
Shwetha Rajaram,
Nels Numan,
Balasaravanan Thoravi Kumaravel,
Nicolai Marquardt,
Andrew D. Wilson
Abstract:
Today's video-conferencing tools support a rich range of professional and social activities, but their generic meeting environments cannot be dynamically adapted to align with distributed collaborators' needs. To enable end-user customization, we developed BlendScape, a rendering and composition system for video-conferencing participants to tailor environments to their meeting context by leveragin…
▽ More
Today's video-conferencing tools support a rich range of professional and social activities, but their generic meeting environments cannot be dynamically adapted to align with distributed collaborators' needs. To enable end-user customization, we developed BlendScape, a rendering and composition system for video-conferencing participants to tailor environments to their meeting context by leveraging AI image generation techniques. BlendScape supports flexible representations of task spaces by blending users' physical or digital backgrounds into unified environments and implements multimodal interaction techniques to steer the generation. Through an exploratory study with 15 end-users, we investigated whether and how they would find value in using generative AI to customize video-conferencing environments. Participants envisioned using a system like BlendScape to facilitate collaborative activities in the future, but required further controls to mitigate distracting or unrealistic visual elements. We implemented scenarios to demonstrate BlendScape's expressiveness for supporting environment design strategies from prior work and propose composition techniques to improve the quality of environments.
△ Less
Submitted 1 October, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Mind the GAP: Improving Robustness to Subpopulation Shifts with Group-Aware Priors
Authors:
Tim G. J. Rudner,
Ya Shi Zhang,
Andrew Gordon Wilson,
Julia Kempe
Abstract:
Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well…
▽ More
Machine learning models often perform poorly under subpopulation shifts in the data distribution. Developing methods that allow machine learning models to better generalize to such shifts is crucial for safe deployment in real-world settings. In this paper, we develop a family of group-aware prior (GAP) distributions over neural network parameters that explicitly favor models that generalize well under subpopulation shifts. We design a simple group-aware prior that only requires access to a small set of data with group information and demonstrate that training with this prior yields state-of-the-art performance -- even when only retraining the final layer of a previously trained non-robust model. Group aware-priors are conceptually simple, complementary to existing approaches, such as attribute pseudo labeling and data reweighting, and open up promising new avenues for harnessing Bayesian inference to enable robustness to subpopulation shifts.
△ Less
Submitted 14 March, 2024;
originally announced March 2024.
-
Chronos: Learning the Language of Time Series
Authors:
Abdul Fatir Ansari,
Lorenzo Stella,
Caner Turkmen,
Xiyuan Zhang,
Pedro Mercado,
Huibin Shen,
Oleksandr Shchur,
Syama Sundar Rangapuram,
Sebastian Pineda Arango,
Shubham Kapoor,
Jasper Zschiegner,
Danielle C. Maddix,
Hao Wang,
Michael W. Mahoney,
Kari Torkkola,
Andrew Gordon Wilson,
Michael Bohlke-Schneider,
Yuyang Wang
Abstract:
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M…
▽ More
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.
△ Less
Submitted 4 November, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Controllable Prompt Tuning For Balancing Group Distributional Robustness
Authors:
Hoang Phan,
Andrew Gordon Wilson,
Qi Lei
Abstract:
Models trained on data composed of different groups or domains can suffer from severe performance degradation under distribution shifts. While recent methods have largely focused on optimizing the worst-group objective, this often comes at the expense of good performance on other groups. To address this problem, we introduce an optimization scheme to achieve good performance across groups and find…
▽ More
Models trained on data composed of different groups or domains can suffer from severe performance degradation under distribution shifts. While recent methods have largely focused on optimizing the worst-group objective, this often comes at the expense of good performance on other groups. To address this problem, we introduce an optimization scheme to achieve good performance across groups and find a good solution for all without severely sacrificing performance on any of them. However, directly applying such optimization involves updating the parameters of the entire network, making it both computationally expensive and challenging. Thus, we introduce Controllable Prompt Tuning (CPT), which couples our approach with prompt-tuning techniques. On spurious correlation benchmarks, our procedures achieve state-of-the-art results across both transformer and non-transformer architectures, as well as unimodal and multimodal data, while requiring only 0.4% tunable parameters.
△ Less
Submitted 4 June, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.