-
Principled Out-of-Distribution Generalization via Simplicity
Authors:
Jiawei Ge,
Amanda Wang,
Shange Tang,
Chi Jin
Abstract:
Modern foundation models exhibit remarkable out-of-distribution (OOD) generalization, solving tasks far beyond the support of their training data. However, the theoretical principles underpinning this phenomenon remain elusive. This paper investigates this problem by examining the compositional generalization abilities of diffusion models in image generation. Our analysis reveals that while neural network architectures are expressive enough to represent a wide range of models -- including many with undesirable behavior on OOD inputs -- the true, generalizable model that aligns with human expectations typically corresponds to the simplest among those consistent with the training data.
Motivated by this observation, we develop a theoretical framework for OOD generalization via simplicity, quantified using a predefined simplicity metric. We analyze two key regimes: (1) the constant-gap setting, where the true model is strictly simpler than all spurious alternatives by a fixed gap, and (2) the vanishing-gap setting, where the fixed gap is replaced by a smoothness condition ensuring that models close in simplicity to the true model yield similar predictions. For both regimes, we study the regularized maximum likelihood estimator and establish the first sharp sample complexity guarantees for learning the true, generalizable, simple model.
Submitted 28 May, 2025;
originally announced May 2025.
-
The Leaderboard Illusion
Authors:
Shivalika Singh,
Yiyang Nan,
Alex Wang,
Daniel D'Souza,
Sayash Kapoor,
Ahmet Üstün,
Sanmi Koyejo,
Yuntian Deng,
Shayne Longpre,
Noah A. Smith,
Beyza Ermis,
Marzieh Fadaee,
Sara Hooker
Abstract:
Measuring progress is fundamental to the advancement of any scientific field. As benchmarks play an increasingly central role, they also grow more susceptible to distortion. Chatbot Arena has emerged as the go-to leaderboard for ranking the most capable AI systems. Yet, in this work we identify systematic issues that have resulted in a distorted playing field. We find that undisclosed private testing practices benefit a handful of providers who are able to test multiple variants before public release and retract scores if desired. We establish that the ability of these providers to choose the best score leads to biased Arena scores due to selective disclosure of performance results. At an extreme, we identify 27 private LLM variants tested by Meta in the lead-up to the Llama-4 release. We also establish that proprietary closed models are sampled at higher rates (number of battles) and have fewer models removed from the arena than open-weight and open-source alternatives. Both these policies lead to large data access asymmetries over time. Providers like Google and OpenAI have received an estimated 19.2% and 20.4% of all data on the arena, respectively. In contrast, a combined 83 open-weight models have only received an estimated 29.7% of the total data. We show that access to Chatbot Arena data yields substantial benefits; even limited additional data can result in relative performance gains of up to 112% on the arena distribution, based on our conservative estimates. Together, these dynamics result in overfitting to Arena-specific dynamics rather than general model quality. The Arena builds on the substantial efforts of both the organizers and an open community that maintains this valuable evaluation platform. We offer actionable recommendations to reform the Chatbot Arena's evaluation framework and promote fairer, more transparent benchmarking for the field.
Submitted 12 May, 2025; v1 submitted 29 April, 2025;
originally announced April 2025.
-
Analysis of Multiple-try Metropolis via Poincaré inequalities
Authors:
Rocco Caprio,
Sam Power,
Andi Wang
Abstract:
We study the Multiple-try Metropolis algorithm using the framework of Poincaré inequalities. We describe the Multiple-try Metropolis as an auxiliary variable implementation of a resampling approximation to an ideal Metropolis--Hastings algorithm. Under suitable moment conditions on the importance weights, we derive explicit Poincaré comparison results between the Multiple-try algorithm and the ideal algorithm. We characterize the spectral gap of the latter, and finally in the Gaussian case prove explicit non-asymptotic convergence bounds for Multiple-try Metropolis by comparison.
Submitted 25 April, 2025;
originally announced April 2025.
-
Attentional Graph Meta-Learning for Indoor Localization Using Extremely Sparse Fingerprints
Authors:
Wenzhong Yan,
Feng Yin,
Jun Gao,
Ao Wang,
Yang Tian,
Ruizhi Chen
Abstract:
Fingerprint-based indoor localization is often labor-intensive due to the need for dense grids and repeated measurements across time and space. Maintaining high localization accuracy with extremely sparse fingerprints remains a persistent challenge. Existing benchmark methods primarily rely on the measured fingerprints, while neglecting valuable spatial and environmental characteristics. In this paper, we propose a systematic integration of an Attentional Graph Neural Network (AGNN) model, capable of learning spatial adjacency relationships and aggregating information from neighboring fingerprints, and a meta-learning framework that utilizes datasets with similar environmental characteristics to enhance model training. To minimize the labor required for fingerprint collection, we introduce two novel data augmentation strategies: 1) unlabeled fingerprint augmentation using moving platforms, which enables the semi-supervised AGNN model to incorporate information from unlabeled fingerprints, and 2) synthetic labeled fingerprint augmentation through environmental digital twins, which enhances the meta-learning framework through a practical distribution alignment, which can minimize the feature discrepancy between synthetic and real-world fingerprints effectively. By integrating these novel modules, we propose the Attentional Graph Meta-Learning (AGML) model. This novel model combines the strengths of the AGNN model and the meta-learning framework to address the challenges posed by extremely sparse fingerprints. To validate our approach, we collected multiple datasets from both consumer-grade WiFi devices and professional equipment across diverse environments. Extensive experiments conducted on both synthetic and real-world datasets demonstrate that the AGML model-based localization method consistently outperforms all baseline methods using sparse fingerprints across all evaluated metrics.
Submitted 7 April, 2025;
originally announced April 2025.
-
Distributed Primal-Dual Algorithms: Unification, Connections, and Insights
Authors:
Runxiong Wu,
Dong Liu,
Xueqin Wang,
Andi Wang
Abstract:
We study primal-dual algorithms for general empirical risk minimization problems in distributed settings, focusing on two prominent classes of algorithms. The first class is the communication-efficient distributed dual coordinate ascent (CoCoA), derived from the coordinate ascent method for solving the dual problem. The second class is the alternating direction method of multipliers (ADMM), including consensus ADMM, linearized ADMM, and proximal ADMM. We demonstrate that both classes of algorithms can be transformed into a unified update form that involves only primal and dual variables. This discovery reveals key connections between the two classes of algorithms: CoCoA can be interpreted as a special case of proximal ADMM for solving the dual problem, while consensus ADMM is closely related to a proximal ADMM algorithm. This discovery provides the insight that by adjusting the augmented Lagrangian parameter, we can easily enable the ADMM variants to outperform the CoCoA variants. We further explore linearized versions of ADMM and analyze the effects of tuning parameters on these ADMM variants in the distributed setting. Our theoretical findings are supported by extensive simulation studies and real-world data analysis.
Submitted 1 February, 2025;
originally announced February 2025.
-
Test-time regression: a unifying framework for designing sequence models with associative memory
Authors:
Ke Alexander Wang,
Jiaxin Shi,
Emily B. Fox
Abstract:
Sequence models lie at the heart of modern deep learning. However, rapid advancements have produced a diversity of seemingly unrelated architectures, such as Transformers and recurrent alternatives. In this paper, we introduce a unifying framework to understand and derive these sequence models, inspired by the empirical importance of associative recall, the capability to retrieve contextually relevant tokens. We formalize associative recall as a two-step process, memorization and retrieval, casting memorization as a regression problem. Layers that combine these two steps perform associative recall via "test-time regression" over their input tokens. Prominent layers, including linear attention, state-space models, fast-weight programmers, online learners, and softmax attention, arise as special cases defined by three design choices: the regression weights, the regressor function class, and the test-time optimization algorithm. Our approach clarifies how linear attention fails to capture inter-token correlations and offers a mathematical justification for the empirical effectiveness of query-key normalization in softmax attention. Further, it illuminates unexplored regions within the design space, which we use to derive novel higher-order generalizations of softmax attention. Beyond unification, our work bridges sequence modeling with classic regression methods, a field with extensive literature, paving the way for developing more powerful and theoretically principled architectures.
Submitted 1 May, 2025; v1 submitted 21 January, 2025;
originally announced January 2025.
-
Transfer Learning for Individualized Treatment Rules: Application to Sepsis Patients Data from eICU-CRD and MIMIC-III Databases
Authors:
Andong Wang,
Kelly Wentzlof,
Johnny Rajala,
Miontranese Green,
Yunshu Zhang,
Shu Yang
Abstract:
Modern precision medicine aims to utilize real-world data to provide the best treatment for an individual patient. An individualized treatment rule (ITR) maps each patient's characteristics to a recommended treatment scheme that maximizes the expected outcome of the patient. A challenge precision medicine faces is population heterogeneity, as studies on treatment effects are often conducted on source populations that differ from the populations of interest in terms of the distribution of patient characteristics. Our research goal is to explore a transfer learning algorithm that aims to address the population heterogeneity problem and obtain targeted, optimal, and interpretable ITRs. The algorithm incorporates a calibrated augmented inverse probability weighting (CAIPW) estimator for the average treatment effect (ATE) and employs value function maximization for the target population using Genetic Algorithm (GA) to produce our desired ITR. To demonstrate its practical utility, we apply this transfer learning algorithm to two large medical databases, Electronic Intensive Care Unit Collaborative Research Database (eICU-CRD) and Medical Information Mart for Intensive Care III (MIMIC-III). We first identify the important covariates, treatment options, and outcomes of interest based on the two databases, and then estimate the optimal linear ITRs for patients with sepsis. Our research introduces and applies new techniques for data fusion to obtain data-driven ITRs that cater to patients' individual medical needs in a population of interest. By emphasizing generalizability and personalized decision-making, this methodology extends its potential application beyond medicine to fields such as marketing, technology, social sciences, and education.
Submitted 3 January, 2025;
originally announced January 2025.
-
Revisiting Interactions of Multiple Driver States in Heterogenous Population and Cognitive Tasks
Authors:
Jiyao Wang,
Ange Wang,
Song Yan,
Dengbo He,
Kaishun Wu
Abstract:
In real-world driving scenarios, multiple states occur simultaneously due to individual differences and environmental factors, complicating the analysis and estimation of driver states. Previous studies, limited by experimental design and analytical methods, may not be able to disentangle the relationships among multiple driver states and environmental factors. This paper introduces the Double Machine Learning (DML) analysis method to the field of driver state analysis to tackle this challenge. To train and test the DML model, a driving simulator experiment with 42 participants was conducted. All participants drove SAE level-3 vehicles and conducted three types of cognitive tasks in a 3-hour driving experiment. Drivers' subjective cognitive load and drowsiness levels were collected throughout the experiment. Then, we isolated individual and environmental factors affecting driver state variations and the factors affecting drivers' physiological and eye-tracking metrics when they are under specific states. The results show that our approach successfully decoupled and inferred the complex causal relationships between multiple types of drowsiness and cognitive load. Additionally, we identified key physiological and eye-tracking indicators in the presence of multiple driver states and under the influence of a single state, excluding the influence of other driver states, environmental factors, and individual characteristics. Our causal inference analytical framework can offer new insights for subsequent analysis of drivers' states. Further, the updated causal relation graph based on the DML analysis can provide theoretical bases for driver state monitoring based on physiological and eye-tracking measures.
Submitted 19 December, 2024; v1 submitted 18 December, 2024;
originally announced December 2024.
-
Ensemble Kalman Diffusion Guidance: A Derivative-free Method for Inverse Problems
Authors:
Hongkai Zheng,
Wenda Chu,
Austin Wang,
Nikola Kovachki,
Ricardo Baptista,
Yisong Yue
Abstract:
When solving inverse problems, one increasingly popular approach is to use pre-trained diffusion models as plug-and-play priors. This framework can accommodate different forward models without re-training while preserving the generative capability of diffusion models. Despite their success in many imaging inverse problems, most existing methods rely on privileged information such as derivatives, pseudo-inverses, or full knowledge of the forward model. This reliance is a substantial limitation that restricts their use in a wide range of problems where such information is unavailable, such as in many scientific applications. We propose Ensemble Kalman Diffusion Guidance (EnKG), a derivative-free approach that can solve inverse problems by only accessing forward model evaluations and a pre-trained diffusion model prior. We study the empirical effectiveness of EnKG across various inverse problems, including scientific settings such as inferring fluid flows and astronomical objects, which are highly non-linear inverse problems that often only permit black-box access to the forward model. We open-source our code at https://github.com/devzhk/enkg-pytorch.
Submitted 2 June, 2025; v1 submitted 30 September, 2024;
originally announced September 2024.
-
Explicit convergence rates of underdamped Langevin dynamics under weighted and weak Poincaré--Lions inequalities
Authors:
Giovanni Brigati,
Gabriel Stoltz,
Andi Q. Wang,
Lihan Wang
Abstract:
We study the long-time behavior of the underdamped Langevin dynamics, in the case of so-called weak confinement. Indeed, any $\mathrm{L}^\infty$ distribution (in position and velocity) relaxes to equilibrium over time, and we quantify the convergence rate. In our situation, the spatial equilibrium distribution does not satisfy a Poincaré inequality. Instead, we assume a weighted Poincaré inequality, which allows for fat-tail or sub-exponential potential energies. We provide constructive and fully explicit estimates in $\mathrm{L}^2$-norm for $\mathrm{L}^\infty$ initial data. A key ingredient is a new space-time weighted Poincaré--Lions inequality, entailing, in turn, a weak Poincaré--Lions inequality.
Submitted 17 June, 2025; v1 submitted 22 July, 2024;
originally announced July 2024.
-
On Understanding Attention-Based In-Context Learning for Categorical Data
Authors:
Aaron T. Wang,
William Convertino,
Xiang Cheng,
Ricardo Henao,
Lawrence Carin
Abstract:
In-context learning based on attention models is examined for data with categorical outcomes, with inference in such models viewed from the perspective of functional gradient descent (GD). We develop a network composed of attention blocks, with each block employing a self-attention layer followed by a cross-attention layer, with associated skip connections. This model can exactly perform multi-step functional GD inference for in-context inference with categorical observations. We perform a theoretical analysis of this setup, generalizing many prior assumptions in this line of work, including the class of attention mechanisms for which it is appropriate. We demonstrate the framework empirically on synthetic data, image classification and language generation.
Submitted 6 May, 2025; v1 submitted 27 May, 2024;
originally announced May 2024.
-
skscope: Fast Sparsity-Constrained Optimization in Python
Authors:
Zezhi Wang,
Jin Zhu,
Peng Chen,
Huiyang Peng,
Xiaoke Zhang,
Anran Wang,
Junxian Zhu,
Xueqin Wang
Abstract:
Applying iterative solvers on sparsity-constrained optimization (SCO) requires tedious mathematical deduction and careful programming/debugging that hinders these solvers' broad impact. In the paper, the library skscope is introduced to overcome such an obstacle. With skscope, users can solve the SCO by just programming the objective function. The convenience of skscope is demonstrated through two examples in the paper, where sparse linear regression and trend filtering are addressed with just four lines of code. More importantly, skscope's efficient implementation allows state-of-the-art solvers to quickly attain the sparse solution regardless of the high dimensionality of parameter space. Numerical experiments reveal the available solvers in skscope can achieve up to 80x speedup on the competing relaxation solutions obtained via the benchmarked convex solver. skscope is published on the Python Package Index (PyPI) and Conda, and its source code is available at: https://github.com/abess-team/skscope.
Submitted 11 October, 2024; v1 submitted 27 March, 2024;
originally announced March 2024.
-
Weak Poincaré inequality comparisons for ideal and hybrid slice sampling
Authors:
Sam Power,
Daniel Rudolf,
Björn Sprungk,
Andi Q. Wang
Abstract:
Using the framework of weak Poincaré inequalities, we provide a general comparison between the Hybrid and Ideal Slice Sampling Markov chains in terms of their Dirichlet forms. In particular, under suitable assumptions Hybrid Slice Sampling will inherit fast convergence from Ideal Slice Sampling and conversely. We apply our results to analyse the convergence of the Independent Metropolis--Hastings, Slice Sampling with Stepping-Out and Shrinkage, and Hit-and-Run-within-Slice Sampling algorithms.
Submitted 21 February, 2024;
originally announced February 2024.
-
Weak Poincaré Inequalities for Markov chains: theory and applications
Authors:
Christophe Andrieu,
Anthony Lee,
Sam Power,
Andi Q. Wang
Abstract:
We investigate the application of Weak Poincaré Inequalities (WPI) to Markov chains to study their rates of convergence and to derive complexity bounds. At a theoretical level we investigate the necessity of the existence of WPIs to ensure $\mathrm{L}^{2}$-convergence, in particular by establishing equivalence with the Resolvent Uniform Positivity-Improving (RUPI) condition and providing a counterexample. From a more practical perspective, we extend the celebrated Cheeger's inequalities to the subgeometric setting, and further apply these techniques to study random-walk Metropolis algorithms for heavy-tailed target distributions and to obtain lower bounds on pseudo-marginal algorithms.
Submitted 18 December, 2023;
originally announced December 2023.
-
Go beyond End-to-End Training: Boosting Greedy Local Learning with Context Supply
Authors:
Chengting Yu,
Fengzhao Zhang,
Hanzhi Ma,
Aili Wang,
Erping Li
Abstract:
Traditional end-to-end (E2E) training of deep networks necessitates storing intermediate activations for back-propagation, resulting in a large memory footprint on GPUs and restricted model parallelization. As an alternative, greedy local learning partitions the network into gradient-isolated modules and trains each with supervision from local preliminary losses, thereby providing asynchronous and parallel training methods that substantially reduce memory cost. However, empirical experiments reveal that as the number of gradient-isolated modules increases, the performance of the local learning scheme degrades substantially, severely limiting its scalability. To avoid this issue, we theoretically analyze greedy local learning from the standpoint of information theory and propose a ContSup scheme, which incorporates context supply between isolated modules to compensate for information loss. Experiments on benchmark datasets (i.e. CIFAR, SVHN, STL-10) achieve SOTA results and indicate that our proposed method can significantly improve the performance of greedy local learning with minimal memory and computational overhead, allowing the number of isolated modules to be increased. Our code is available at https://github.com/Tab-ct/ContSup.
Submitted 3 December, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Interpretable Mechanistic Representations for Meal-level Glycemic Control in the Wild
Authors:
Ke Alexander Wang,
Emily B. Fox
Abstract:
Diabetes encompasses a complex landscape of glycemic control that varies widely among individuals. However, current methods do not faithfully capture this variability at the meal level. On the one hand, expert-crafted features lack the flexibility of data-driven methods; on the other hand, learned representations tend to be uninterpretable which hampers clinical adoption. In this paper, we propose a hybrid variational autoencoder to learn interpretable representations of CGM and meal data. Our method grounds the latent space to the inputs of a mechanistic differential equation, producing embeddings that reflect physiological quantities, such as insulin sensitivity, glucose effectiveness, and basal glucose levels. Moreover, we introduce a novel method to infer the glucose appearance rate, making the mechanistic model robust to unreliable meal logs. On a dataset of CGM and self-reported meals from individuals with type-2 diabetes and pre-diabetes, our unsupervised representation discovers a separation between individuals proportional to their disease severity. Our embeddings produce clusters that are up to 4x better than naive, expert, black-box, and pure mechanistic features. Our method provides a nuanced, yet interpretable, embedding space to compare glycemic control within and across individuals, directly learnable from in-the-wild data.
Submitted 6 December, 2023;
originally announced December 2023.
-
Ano-SuPs: Multi-size anomaly detection for manufactured products by identifying suspected patches
Authors:
Hao Xu,
Juan Du,
Andi Wang,
YingCong Chen
Abstract:
Image-based systems have gained popularity owing to their capacity to provide rich manufacturing status information, low implementation costs and high acquisition rates. However, the complexity of the image background and various anomaly patterns pose new challenges to existing matrix decomposition methods, which are inadequate for modeling requirements. Moreover, the uncertainty of the anomaly can cause anomaly contamination problems, making the designed model and method highly susceptible to external disturbances. To address these challenges, we propose a two-stage strategy anomaly detection method that detects anomalies by identifying suspected patches (Ano-SuPs). Specifically, we propose to detect the patches with anomalies by reconstructing the input image twice: the first step is to obtain a set of normal patches by removing those suspected patches, and the second step is to use those normal patches to refine the identification of the patches with anomalies. To demonstrate its effectiveness, we evaluate the proposed method systematically through simulation experiments and case studies. We further identified the key parameters and designed steps that impact the model's performance and efficiency.
Submitted 3 January, 2025; v1 submitted 20 September, 2023;
originally announced September 2023.
-
Replicability of Simulation Studies for the Investigation of Statistical Methods: The RepliSims Project
Authors:
K. Luijken,
A. Lohmann,
U. Alter,
J. Claramunt Gonzalez,
F. J. Clouth,
J. L. Fossum,
L. Hesen,
A. H. J. Huizing,
J. Ketelaar,
A. K. Montoya,
L. Nab,
R. C. C. Nijman,
B. B. L. Penning de Vries,
T. D. Tibbe,
Y. A. Wang,
R. H. H. Groenwold
Abstract:
Results of simulation studies evaluating the performance of statistical methods are often considered actionable and thus can have a major impact on the way empirical research is implemented. However, so far there is limited evidence about the reproducibility and replicability of statistical simulation studies. Therefore, eight highly cited statistical simulation studies were selected, and their replicability was assessed by teams of replicators with formal training in quantitative methodology. The teams found relevant information in the original publications and used it to write simulation code with the aim of replicating the results. The primary outcome was the feasibility of replicability based on reported information in the original publications. Replicability varied greatly: Some original studies provided detailed information leading to almost perfect replication of results, whereas other studies did not provide enough information to implement any of the reported simulations. Replicators had to make choices regarding missing or ambiguous information in the original studies, error handling, and software environment. Factors facilitating replication included public availability of code, and descriptions of the data-generating procedure and methods in graphs, formulas, structured text, and publicly accessible additional resources such as technical reports. Replicability of statistical simulation studies was mainly impeded by lack of information and sustainability of information sources. Reproducibility could be achieved for simulation studies by providing open code and data as a supplement to the publication. Additionally, simulation studies should be transparently reported with all relevant information either in the research paper itself or in easily accessible supplementary material to allow for replicability.
Submitted 5 July, 2023;
originally announced July 2023.
-
Tensor Dirichlet Process Multinomial Mixture Model for Passenger Trajectory Clustering
Authors:
Ziyue Li,
Hao Yan,
Chen Zhang,
Andi Wang,
Wolfgang Ketter,
Lijun Sun,
Fugee Tsung
Abstract:
Passenger clustering based on travel records is essential for transportation operators. However, existing methods cannot easily cluster the passengers due to the hierarchical structure of the passenger trip information, namely: each passenger has multiple trips, and each trip contains multi-dimensional multi-mode information. Furthermore, existing approaches rely on an accurate specification of the number of clusters to start, which is difficult when millions of commuters are using the transport systems on a daily basis. In this paper, we propose a novel Tensor Dirichlet Process Multinomial Mixture model (Tensor-DPMM), which is designed to preserve the multi-mode and hierarchical structure of the multi-dimensional trip information via tensor, and cluster them in a unified one-step manner. The model also has the ability to determine the number of clusters automatically by using the Dirichlet Process to decide the probabilities for a passenger to be either assigned to an existing cluster or to create a new cluster. This allows our model to grow the clusters as needed in a dynamic manner. Finally, existing methods do not consider spatial semantic graphs such as geographical proximity and functional similarity between the locations, which may cause inaccurate clustering. To this end, we further propose a variant of our model, namely the Tensor-DPMM with Graph. For the algorithm, we propose a tensor Collapsed Gibbs Sampling method, with an innovative step of "disband and relocating", which disbands clusters with too few members and relocates those members to the remaining clusters. This avoids uncontrolled growth in the number of clusters. A case study based on Hong Kong metro passenger data is conducted to demonstrate the automatic process of learning the number of clusters, and the learned clusters are better in within-cluster compactness and cross-cluster separateness.
△ Less
Submitted 23 June, 2023;
originally announced June 2023.
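The Dirichlet Process step described in the Tensor-DPMM abstract, where a passenger either joins an existing cluster or opens a new one, can be illustrated with the standard Chinese restaurant process prior. This is a minimal sketch of the prior probabilities only, not the paper's sampler: the function name and the concentration parameter `alpha` are assumptions, and the tensor likelihood terms that a collapsed Gibbs sampler would multiply in are omitted.

```python
def crp_assignment_probs(cluster_sizes, alpha):
    """Prior probabilities under a Chinese restaurant process.

    cluster_sizes: number of members currently in each existing cluster.
    alpha: concentration parameter (hypothetical value; larger values
           make opening a new cluster more likely).
    Returns (probabilities of joining each existing cluster,
             probability of creating a new cluster).
    """
    n = sum(cluster_sizes)
    denom = n + alpha
    existing = [size / denom for size in cluster_sizes]
    new = alpha / denom
    return existing, new


# Illustrative numbers only: three clusters holding 100 passengers in total.
existing, new = crp_assignment_probs([50, 30, 20], alpha=1.0)
```

Larger clusters attract new members in proportion to their size, while the chance of opening a new cluster shrinks as the total count grows, which is how the number of clusters can grow as needed without being fixed in advance.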
-
A Unified Probabilistic Framework for Spatiotemporal Passenger Crowdedness Inference within Urban Rail Transit Network
Authors:
Min Jiang,
Andi Wang,
Ziyue Li,
Fugee Tsung
Abstract:
This paper proposes the Spatio-Temporal Crowdedness Inference Model (STCIM), a framework to infer the passenger distribution inside the whole urban rail transit (URT) system in real time. Our model is practical because it is designed in a probabilistic manner and relies only on the entry and exit timestamp information collected by the automatic fare collection (AFC) system. First, the entire URT system is decomposed into several components of stations and segments. By decomposing a passenger's travel actions into entering, traveling, transferring, and exiting, we build a statistical model to estimate the passengers' lingering time within each component and the passengers' destination based on historical AFC data. Then, the passengers' spatial distribution is predicted in real time based on each passenger's elapsed travel time and their entry station. The effectiveness of the scheme is validated with a real dataset from a real URT system.
Submitted 14 June, 2023;
originally announced June 2023.
-
Interpretation and visualization of distance covariance through additive decomposition of correlations formula
Authors:
Andi Wang,
Hao Yan,
Juan Du
Abstract:
Distance covariance is a widely used statistical methodology for testing the dependency between two groups of variables. Despite the appealing properties of consistency and superior testing power, the testing results of distance covariance are often hard to interpret. This paper presents an elementary interpretation of the mechanism of distance covariance through an additive decomposition of correlations formula. Based on this formula, a visualization method is developed to provide practitioners with a more intuitive explanation of the distance covariance score.
△ Less
Submitted 24 May, 2023;
originally announced May 2023.
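For readers unfamiliar with the distance covariance score that the abstract above sets out to interpret, the following is a minimal sketch of the standard squared sample distance covariance for two univariate samples, computed from double-centred pairwise distance matrices. It does not implement the paper's additive decomposition, and the function names are hypothetical.

```python
def _double_centered_distances(z):
    """Pairwise |z_i - z_j| matrix with row, column and grand means removed."""
    n = len(z)
    d = [[abs(z[i] - z[j]) for j in range(n)] for i in range(n)]
    # The distance matrix is symmetric, so row means equal column means.
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def dcov_sq(x, y):
    """Squared sample distance covariance of two equal-length samples."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2


# Illustrative data: y depends on x nonlinearly, c is constant.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [v * v for v in x]
c = [1.0] * len(x)
dependent_score = dcov_sq(x, y)
constant_score = dcov_sq(x, c)
```

The score is a single averaged product of centred distances, which is part of why it is hard to read directly: it says that dependence is present without showing where in the data it comes from.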
-
Sequence Modeling with Multiresolution Convolutional Memory
Authors:
Jiaxin Shi,
Ke Alexander Wang,
Emily B. Fox
Abstract:
Efficiently capturing the long-range patterns in sequential data sources salient to a given task -- such as classification and generative modeling -- poses a fundamental challenge. Popular approaches in the space trade off between the memory burden of brute-force enumeration and comparison, as in transformers, the computational burden of complicated sequential dependencies, as in recurrent neural networks, or the parameter burden of convolutional networks with many or large filters. We instead take inspiration from wavelet-based multiresolution analysis to define a new building block for sequence modeling, which we call a MultiresLayer. The key component of our model is the multiresolution convolution, capturing multiscale trends in the input sequence. Our MultiresConv can be implemented with shared filters across a dilated causal convolution tree. Thus it garners the computational advantages of convolutional networks and the principled theoretical motivation of wavelet decompositions. Our MultiresLayer is straightforward to implement, requires significantly fewer parameters, and maintains at most an $\mathcal{O}(N\log N)$ memory footprint for a length $N$ sequence. Yet, by stacking such layers, our model yields state-of-the-art performance on a number of sequence classification and autoregressive density estimation tasks using CIFAR-10, ListOps, and PTB-XL datasets.
△ Less
Submitted 1 November, 2023; v1 submitted 2 May, 2023;
originally announced May 2023.
-
Compositionality in algorithms for smoothing
Authors:
Moritz Schauer,
Frank van der Meulen,
Andi Q. Wang
Abstract:
Backward Filtering Forward Guiding (BFFG) is a bidirectional algorithm proposed in Mider et al. [2021] and studied in more depth in a general setting in Van der Meulen and Schauer [2022]. In category theory, optics have been proposed for modelling systems with bidirectional data flow. We connect BFFG with optics by demonstrating that the forward and backward maps together define a functor from a category of Markov kernels into a category of optics, which can furthermore be lax monoidal under further assumptions.
△ Less
Submitted 16 April, 2025; v1 submitted 24 March, 2023;
originally announced March 2023.
-
Evaluating the Performance of Low-Cost PM2.5 Sensors in Mobile Settings
Authors:
Priyanka deSouza,
An Wang,
Yuki Machida,
Tiffany Duhl,
Simone Mora,
Prashant Kumar,
Ralph Kahn,
Carlo Ratti,
John L. Durant,
Neelakshi Hudda
Abstract:
Low-cost sensors (LCS) for measuring air pollution are increasingly being deployed in mobile applications but questions concerning the quality of the measurements remain unanswered. For example, what is the best way to correct LCS data in a mobile setting? Which factors most significantly contribute to differences between mobile LCS data and higher-quality instruments? Can data from LCS be used to identify hotspots and generate generalizable pollutant concentration maps? To help address these questions we deployed low-cost PM2.5 sensors (Alphasense OPC-N3) and a research-grade instrument (TSI DustTrak) in a mobile laboratory in Boston, MA, USA. We first collocated these instruments with stationary PM2.5 reference monitors at nearby regulatory sites. Next, using the reference measurements, we developed different models to correct the OPC-N3 and DustTrak measurements, and then transferred the corrections to the mobile setting. We observed that more complex correction models appeared to perform better than simpler models in the stationary setting; however, when transferred to the mobile setting, corrected OPC-N3 measurements agreed less well with corrected DustTrak data. In general, corrections developed using minute-level collocation measurements transferred better to the mobile setting than corrections developed using hourly-averaged data. Mobile laboratory speed, OPC-N3 orientation relative to the direction of travel, date, hour-of-the-day, and road class together explain a small but significant amount of variation between corrected OPC-N3 and DustTrak measurements during the mobile deployment. Persistent hotspots identified by the OPC-N3s agreed with those identified by the DustTrak. Similarly, maps of PM2.5 distribution produced from the mobile corrected OPC-N3 and DustTrak measurements agreed well.
△ Less
Submitted 10 January, 2023;
originally announced January 2023.
-
Assessing long-term medical remanufacturing emissions with Life Cycle Analysis
Authors:
Julia A. Meister,
Jack Sharp,
Yan Wang,
Khuong An Nguyen
Abstract:
The unsustainable take-make-dispose linear economy prevalent in healthcare contributes 4.4% to global Greenhouse Gas emissions. A popular but not yet widely-embraced solution is to remanufacture common single-use medical devices like electrophysiology catheters, significantly extending their lifetimes by enabling a circular life cycle. To support the adoption of catheter remanufacturing, we propose a comprehensive emission framework and carry out a holistic evaluation of virgin manufactured and remanufactured carbon emissions with Life Cycle Analysis (LCA). We followed ISO modelling standards and NHS reporting guidelines to ensure industry relevance. We conclude that remanufacturing may lead to a reduction of up to 60% per turn (-1.92 kg CO2eq, burden-free) and 57% per life (-1.87 kg CO2eq, burdened). Our extensive sensitivity analysis and industry-informed buy-back scheme simulation revealed long-term emission reductions of up to 48% per remanufactured catheter life (-1.73 kg CO2eq). Our comprehensive results encourage the adoption of electrophysiology catheter remanufacturing, and highlight the importance of estimating long-term emissions in addition to traditional emission metrics.
△ Less
Submitted 9 February, 2023; v1 submitted 28 November, 2022;
originally announced November 2022.
-
Operator Splitting Value Iteration
Authors:
Amin Rakhsha,
Andrew Wang,
Mohammad Ghavamzadeh,
Amir-massoud Farahmand
Abstract:
We introduce new planning and reinforcement learning algorithms for discounted MDPs that utilize an approximate model of the environment to accelerate the convergence of the value function. Inspired by the splitting approach in numerical linear algebra, we introduce Operator Splitting Value Iteration (OS-VI) for both Policy Evaluation and Control problems. OS-VI achieves a much faster convergence rate when the model is accurate enough. We also introduce a sample-based version of the algorithm called OS-Dyna. Unlike the traditional Dyna architecture, OS-Dyna still converges to the correct value function in the presence of model approximation error.
△ Less
Submitted 25 November, 2022;
originally announced November 2022.
-
Explicit convergence bounds for Metropolis Markov chains: isoperimetry, spectral gaps and profiles
Authors:
Christophe Andrieu,
Anthony Lee,
Sam Power,
Andi Q. Wang
Abstract:
We derive the first explicit bounds for the spectral gap of a random walk Metropolis algorithm on $R^d$ for any value of the proposal variance, which when scaled appropriately recovers the correct $d^{-1}$ dependence on dimension for suitably regular invariant distributions. We also obtain explicit bounds on the ${\rm L}^2$-mixing time for a broad class of models. In obtaining these results, we refine the use of isoperimetric profile inequalities to obtain conductance profile bounds, which also enable the derivation of explicit bounds in a much broader class of models. We also obtain similar results for the preconditioned Crank--Nicolson Markov chain, obtaining dimension-independent bounds under suitable assumptions.
△ Less
Submitted 31 October, 2023; v1 submitted 16 November, 2022;
originally announced November 2022.
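The random walk Metropolis algorithm whose spectral gap is bounded in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' setup: the target is a standard Gaussian on $R^d$, the function names are hypothetical, and the proposal variance is scaled as $2.38^2/d$, a common heuristic from the optimal scaling literature that matches the $d^{-1}$ scaling the abstract refers to.

```python
import math
import random


def random_walk_metropolis(log_density, x0, step, n_steps, seed=0):
    """Random walk Metropolis with isotropic Gaussian proposals.

    log_density: unnormalised log target density on R^d.
    step: proposal standard deviation per coordinate.
    Returns the chain (list of states) and the empirical acceptance rate.
    """
    rng = random.Random(seed)
    x = list(x0)
    logp = log_density(x)
    chain = []
    accepted = 0
    for _ in range(n_steps):
        proposal = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
        logp_prop = log_density(proposal)
        # Accept with probability min(1, pi(proposal) / pi(current)).
        if rng.random() < math.exp(min(0.0, logp_prop - logp)):
            x, logp = proposal, logp_prop
            accepted += 1
        chain.append(list(x))
    return chain, accepted / n_steps


def standard_normal_logpdf(x):
    """Unnormalised log density of a standard Gaussian (assumed target)."""
    return -0.5 * sum(v * v for v in x)


d = 5
chain, acc_rate = random_walk_metropolis(
    standard_normal_logpdf,
    x0=[0.0] * d,
    step=2.38 / math.sqrt(d),  # proposal variance proportional to 1/d
    n_steps=2000,
)
```

The `step` argument is the quantity the bounds are stated in terms of: too small and the chain moves slowly, too large and most proposals are rejected, so the mixing behaviour depends on how it is scaled with dimension.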
-
Sampling using Adaptive Regenerative Processes
Authors:
Hector McKimm,
Andi Q Wang,
Murray Pollock,
Christian P Robert,
Gareth O Roberts
Abstract:
Enriching Brownian motion with regenerations from a fixed regeneration distribution $μ$ at a particular regeneration rate $κ$ results in a Markov process that has a target distribution $π$ as its invariant distribution. For the purpose of Monte Carlo inference, implementing such a scheme requires firstly selection of regeneration distribution $μ$, and secondly computation of a specific constant $C$. Both of these tasks can be very difficult in practice for good performance. We introduce a method for adapting the regeneration distribution, by adding point masses to it. This allows the process to be simulated with as few regenerations as possible and obviates the need to find said constant $C$. Moreover, the choice of fixed $μ$ is replaced with the choice of the initial regeneration distribution, which is considerably less difficult. We establish convergence of this resulting self-reinforcing process and explore its effectiveness at sampling from a number of target distributions. The examples show that adapting the regeneration distribution guards against poor choices of fixed regeneration distribution and can reduce the error of Monte Carlo estimates of expectations of interest, especially when $π$ is skewed.
△ Less
Submitted 20 February, 2024; v1 submitted 18 October, 2022;
originally announced October 2022.
-
STSC-SNN: Spatio-Temporal Synaptic Connection with Temporal Convolution and Attention for Spiking Neural Networks
Authors:
Chengting Yu,
Zheming Gu,
Da Li,
Gaoang Wang,
Aili Wang,
Erping Li
Abstract:
Spiking Neural Networks (SNNs), as one of the algorithmic models in neuromorphic computing, have gained a great deal of research attention owing to their temporal information processing capability, low power consumption, and high biological plausibility. Their potential to efficiently extract spatio-temporal features makes them suitable for processing event streams. However, existing synaptic structures in SNNs are almost always full connections or spatial 2D convolutions, neither of which can extract temporal dependencies adequately. In this work, we take inspiration from biological synapses and propose a spatio-temporal synaptic connection SNN (STSC-SNN) model, to enhance the spatio-temporal receptive fields of synaptic connections, thereby establishing temporal dependencies across layers. Concretely, we incorporate temporal convolution and attention mechanisms to implement synaptic filtering and gating functions. We show that endowing synaptic models with temporal dependencies can improve the performance of SNNs on classification tasks. In addition, we investigate how performance varies with different spatio-temporal receptive fields and reevaluate the temporal modules in SNNs. Our approach is tested on neuromorphic datasets, including DVS128 Gesture (gesture recognition), N-MNIST, CIFAR10-DVS (image classification), and SHD (speech digit recognition). The results show that the proposed model outperforms the state-of-the-art accuracy on nearly all datasets.
△ Less
Submitted 11 October, 2022;
originally announced October 2022.
-
Plateau in Monotonic Linear Interpolation -- A "Biased" View of Loss Landscape for Deep Networks
Authors:
Xiang Wang,
Annie N. Wang,
Mo Zhou,
Rong Ge
Abstract:
Monotonic linear interpolation (MLI) - on the line connecting a random initialization with the minimizer it converges to, the loss and accuracy are monotonic - is a phenomenon that is commonly observed in the training of neural networks. Such a phenomenon may seem to suggest that optimization of neural networks is easy. In this paper, we show that the MLI property is not necessarily related to the hardness of optimization problems, and empirical observations on MLI for deep neural networks depend heavily on biases. In particular, we show that interpolating both weights and biases linearly leads to very different influences on the final output, and when different classes have different last-layer biases on a deep network, there will be a long plateau in both the loss and accuracy interpolation (which existing theory of MLI cannot explain). We also show how the last-layer biases for different classes can be different even on a perfectly balanced dataset using a simple model. Empirically we demonstrate that similar intuitions hold on practical networks and realistic datasets.
△ Less
Submitted 14 February, 2023; v1 submitted 3 October, 2022;
originally announced October 2022.
-
Poincaré inequalities for Markov chains: a meeting with Cheeger, Lyapunov and Metropolis
Authors:
Christophe Andrieu,
Anthony Lee,
Sam Power,
Andi Q. Wang
Abstract:
We develop a theory of weak Poincaré inequalities to characterize convergence rates of ergodic Markov chains. Motivated by the application of Markov chains in the context of algorithms, we develop a relevant set of tools which enable the practical study of convergence rates in the setting of Markov chain Monte Carlo methods, but also well beyond.
Submitted 10 August, 2022;
originally announced August 2022.
-
Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml
Authors:
Elham E Khoda,
Dylan Rankin,
Rafael Teixeira de Lima,
Philip Harris,
Scott Hauck,
Shih-Chieh Hsu,
Michael Kagan,
Vladimir Loncar,
Chaitanya Paikara,
Richa Rao,
Sioni Summers,
Caterina Vernieri,
Aaron Wang
Abstract:
Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited as a result of the difficulties of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers -- long short-term memory and gated recurrent unit -- within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latencies and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.
△ Less
Submitted 1 July, 2022;
originally announced July 2022.
-
The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights
Authors:
Maxime Gasse,
Quentin Cappart,
Jonas Charfreitag,
Laurent Charlin,
Didier Chételat,
Antonia Chmiela,
Justin Dumouchelle,
Ambros Gleixner,
Aleksandr M. Kazachkov,
Elias Khalil,
Pawel Lichocki,
Andrea Lodi,
Miles Lubin,
Chris J. Maddison,
Christopher Morris,
Dimitri J. Papageorgiou,
Augustin Parjadis,
Sebastian Pokutta,
Antoine Prouvost,
Lara Scavuzzo,
Giulia Zarpellon,
Linxin Yang,
Sha Lai,
Akang Wang,
Xiaodong Luo
, et al. (16 additional authors not shown)
Abstract:
Combinatorial optimization is a well-established area in operations research and computer science. Until recently, its methods have focused on solving problem instances in isolation, ignoring that they often stem from related data distributions in practice. However, recent years have seen a surge of interest in using machine learning as a new approach for solving combinatorial problems, either directly as solvers or by enhancing exact solvers. Based on this context, the ML4CO aims at improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components. The competition featured three challenging tasks: finding the best feasible solution, producing the tightest optimality certificate, and giving an appropriate solver configuration. Three realistic datasets were considered: balanced item placement, workload apportionment, and maritime inventory routing. This last dataset was kept anonymous for the contestants.
△ Less
Submitted 17 March, 2022; v1 submitted 4 March, 2022;
originally announced March 2022.
-
Robust classification with flexible discriminant analysis in heterogeneous data
Authors:
Pierre Houdouin,
Frédéric Pascal,
Matthieu Jonckheere,
Andrew Wang
Abstract:
Linear and Quadratic Discriminant Analysis are well-known classical methods but can suffer heavily from non-Gaussian distributions and/or contaminated datasets, mainly because the underlying Gaussian assumption is not robust. To fill this gap, this paper presents a new robust discriminant analysis where each data point is drawn from its own arbitrary Elliptically Symmetrical (ES) distribution with its own arbitrary scale parameter. Such a model allows for possibly very heterogeneous, independent but non-identically distributed samples. After deriving a new decision rule, it is shown that maximum-likelihood parameter estimation and classification are very simple, fast and robust compared to state-of-the-art methods.
△ Less
Submitted 9 January, 2022;
originally announced January 2022.
-
Is Importance Weighting Incompatible with Interpolating Classifiers?
Authors:
Ke Alexander Wang,
Niladri S. Chatterji,
Saminul Haque,
Tatsunori Hashimoto
Abstract:
Importance weighting is a classic technique to handle distribution shifts. However, prior work has presented strong empirical and theoretical evidence demonstrating that importance weights can have little to no effect on overparameterized neural networks. Is importance weighting truly incompatible with the training of overparameterized neural networks? Our paper answers this in the negative. We show that importance weighting fails not because of the overparameterization, but instead, as a result of using exponentially-tailed losses like the logistic or cross-entropy loss. As a remedy, we show that polynomially-tailed losses restore the effects of importance reweighting in correcting distribution shift in overparameterized models. We characterize the behavior of gradient descent on importance weighted polynomially-tailed losses with overparameterized linear models, and theoretically demonstrate the advantage of using polynomially-tailed losses in a label shift setting. Surprisingly, our theory shows that using weights that are obtained by exponentiating the classical unbiased importance weights can improve performance. Finally, we demonstrate the practical value of our analysis with neural network experiments on a subpopulation shift and a label shift dataset. When reweighted, our loss function can outperform reweighted cross-entropy by as much as 9% in test accuracy. Our loss function also gives test accuracies comparable to, or even exceeding, well-tuned state-of-the-art methods for correcting distribution shifts.
Submitted 4 March, 2022; v1 submitted 24 December, 2021;
originally announced December 2021.
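As a rough illustration of the contrast drawn in the abstract above, the sketch below compares an exponentially-tailed loss (logistic) with a polynomially-tailed one inside an importance-weighted empirical risk. The specific form of `poly_tailed_loss`, the `alpha` parameter, and the function names are assumptions made for illustration; they are not taken from the paper.

```python
import math


def logistic_loss(margin):
    """Exponentially-tailed loss: decays like exp(-margin) for large margins."""
    return math.log1p(math.exp(-margin))


def poly_tailed_loss(margin, alpha=1.0):
    """Hypothetical polynomially-tailed loss (illustrative form, an assumption).

    For margin >= 0 it decays like (1 + margin) ** (-alpha); for margin < 0 it is
    extended linearly so that value and slope match at margin == 0.
    """
    if margin >= 0.0:
        return (1.0 + margin) ** (-alpha)
    return 1.0 - alpha * margin


def weighted_risk(margins, weights, loss):
    """Importance-weighted empirical risk: sum_i w_i * loss(m_i) / sum_i w_i."""
    total_weight = sum(weights)
    return sum(w * loss(m) for m, w in zip(margins, weights)) / total_weight
```

At a large margin the polynomial tail is many orders of magnitude heavier than the logistic tail, so well-classified points keep contributing to the weighted objective; this is only meant to make the "tail" terminology concrete.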
-
Comparison of Markov chains via weak Poincaré inequalities with application to pseudo-marginal MCMC
Authors:
Christophe Andrieu,
Anthony Lee,
Sam Power,
Andi Q. Wang
Abstract:
We investigate the use of a certain class of functional inequalities known as weak Poincaré inequalities to bound convergence of Markov chains to equilibrium. We show that this enables the straightforward and transparent derivation of subgeometric convergence bounds for methods such as the Independent Metropolis--Hastings sampler and pseudo-marginal methods for intractable likelihoods, the latter being subgeometric in many practical settings. These results rely on novel quantitative comparison theorems between Markov chains. Associated proofs are simpler than those relying on drift/minorization conditions and the tools developed allow us to recover and further extend known results as particular cases. We are then able to provide new insights into the practical use of pseudo-marginal algorithms, analyse the effect of averaging in Approximate Bayesian Computation (ABC) and the use of products of independent averages, and also to study the case of lognormal weights relevant to particle marginal Metropolis--Hastings (PMMH).
Submitted 9 August, 2022; v1 submitted 10 December, 2021;
originally announced December 2021.
-
PDMP Monte Carlo methods for piecewise-smooth densities
Authors:
Augustin Chevallier,
Sam Power,
Andi Q. Wang,
Paul Fearnhead
Abstract:
There has been substantial interest in developing Markov chain Monte Carlo algorithms based on piecewise-deterministic Markov processes. However, existing algorithms can only be used if the target distribution of interest is differentiable everywhere. The key to adapting these algorithms so that they can sample from densities with discontinuities is defining appropriate dynamics for the process when it hits a discontinuity. We present a simple condition for the transition of the process at a discontinuity which can be used to extend any existing sampler for smooth densities, and give specific choices for this transition which work with popular algorithms such as the Bouncy Particle Sampler, the Coordinate Sampler and the Zig-Zag Process. Our theoretical results extend and make rigorous arguments that have been presented previously, for instance constructing samplers for continuous densities restricted to a bounded domain, and we present a version of the Zig-Zag Process that can work in such a scenario. Our novel approach to deriving the invariant distribution of a piecewise-deterministic Markov process with boundaries may be of independent interest.
Submitted 10 November, 2021;
originally announced November 2021.
-
Perturbation theory for killed Markov processes and quasi-stationary distributions
Authors:
Daniel Rudolf,
Andi Q. Wang
Abstract:
Motivated by recent developments of quasi-stationary Monte Carlo methods, we investigate the stability of quasi-stationary distributions of killed Markov processes under perturbations of the generator. We first consider a general bounded self-adjoint perturbation operator, and after that, study a particular unbounded perturbation corresponding to truncation of the killing rate. In both scenarios, we quantify the difference between eigenfunctions of the smallest eigenvalue of the perturbed and unperturbed generators in a Hilbert space norm. As a consequence, L1 norm estimates of the difference of the resulting quasi-stationary distributions in terms of the perturbation are provided.
Submitted 24 September, 2024; v1 submitted 28 September, 2021;
originally announced September 2021.
-
SKIing on Simplices: Kernel Interpolation on the Permutohedral Lattice for Scalable Gaussian Processes
Authors:
Sanyam Kapoor,
Marc Finzi,
Ke Alexander Wang,
Andrew Gordon Wilson
Abstract:
State-of-the-art methods for scalable Gaussian processes use iterative algorithms, requiring fast matrix vector multiplies (MVMs) with the covariance kernel. The Structured Kernel Interpolation (SKI) framework accelerates these MVMs by performing efficient MVMs on a grid and interpolating back to the original space. In this work, we develop a connection between SKI and the permutohedral lattice used for high-dimensional fast bilateral filtering. Using a sparse simplicial grid instead of a dense rectangular one, we can perform GP inference exponentially faster in the dimension than SKI. Our approach, Simplex-GP, enables scaling SKI to high dimensions, while maintaining strong predictive performance. We additionally provide a CUDA implementation of Simplex-GP, which enables significant GPU acceleration of MVM based inference.
Submitted 12 June, 2021;
originally announced June 2021.
-
Bayesian Algorithm Execution: Estimating Computable Properties of Black-box Functions Using Mutual Information
Authors:
Willie Neiswanger,
Ke Alexander Wang,
Stefano Ermon
Abstract:
In many real-world problems, we want to infer some property of an expensive black-box function $f$, given a budget of $T$ function evaluations. One example is budget constrained global optimization of $f$, for which Bayesian optimization is a popular method. Other properties of interest include local optima, level sets, integrals, or graph-structured information induced by $f$. Often, we can find an algorithm $\mathcal{A}$ to compute the desired property, but it may require far more than $T$ queries to execute. Given such an $\mathcal{A}$, and a prior distribution over $f$, we refer to the problem of inferring the output of $\mathcal{A}$ using $T$ evaluations as Bayesian Algorithm Execution (BAX). To tackle this problem, we present a procedure, InfoBAX, that sequentially chooses queries that maximize mutual information with respect to the algorithm's output. Applying this to Dijkstra's algorithm, for instance, we infer shortest paths in synthetic and real-world graphs with black-box edge costs. Using evolution strategies, we yield variants of Bayesian optimization that target local, rather than global, optima. On these problems, InfoBAX uses up to 500 times fewer queries to $f$ than required by the original algorithm. Our method is closely connected to other Bayesian optimal experimental design procedures such as entropy search methods and optimal sensor placement using Gaussian processes.
Submitted 6 July, 2021; v1 submitted 19 April, 2021;
originally announced April 2021.
-
A Two-Sample Robust Bayesian Mendelian Randomization Method Accounting for Linkage Disequilibrium and Idiosyncratic Pleiotropy with Applications to the COVID-19 Outcome
Authors:
Anqi Wang,
Zhonghua Liu
Abstract:
Mendelian randomization (MR) is a statistical method exploiting genetic variants as instrumental variables to estimate the causal effect of modifiable risk factors on an outcome of interest. Despite the wide use of popular two-sample MR methods based on genome-wide association study summary-level data, those methods can suffer from power loss and/or biased inference when the chosen genetic variants are in linkage disequilibrium (LD) and also have relatively large direct effects on the outcome, whose distribution might be heavy-tailed; this is commonly referred to as the idiosyncratic pleiotropy phenomenon. To resolve those two issues, we propose a novel Robust Bayesian Mendelian Randomization (RBMR) model that uses the more robust multivariate generalized t-distribution to model such direct effects in a probabilistic model framework which can also incorporate the LD structure explicitly. The generalized t-distribution can be represented as a Gaussian scaled mixture so that our model parameters can be estimated by EM-type algorithms. We compute the standard errors by calibrating the evidence lower bound using the likelihood ratio test. Through extensive simulation studies, we show that our RBMR has robust performance compared to other competing methods. We also apply our RBMR method to two benchmark data sets and find that RBMR has smaller bias and standard errors. Using our proposed RBMR method, we find that coronary artery disease is associated with increased risk of critically ill coronavirus disease 2019 (COVID-19). We also develop a user-friendly R package RBMR for public use.
Submitted 16 November, 2021; v1 submitted 4 March, 2021;
originally announced March 2021.
-
Subgeometric hypocoercivity for piecewise-deterministic Markov process Monte Carlo methods
Authors:
Christophe Andrieu,
Paul Dobson,
Andi Q. Wang
Abstract:
We extend the hypocoercivity framework for piecewise-deterministic Markov process (PDMP) Monte Carlo established in [Andrieu et al. (2018)] to heavy-tailed target distributions, which exhibit subgeometric rates of convergence to equilibrium. We make use of weak Poincaré inequalities, as developed in the work of [Grothaus and Wang (2019)], the ideas of which we adapt to the PDMPs of interest. On the way, we report largely potential-independent approaches to explicitly bounding solutions of the Poisson equation of the Langevin diffusion and its first and second derivatives, required here to control various terms arising in the application of the hypocoercivity result.
Submitted 29 April, 2021; v1 submitted 18 November, 2020;
originally announced November 2020.
-
Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints
Authors:
Marc Finzi,
Ke Alexander Wang,
Andrew Gordon Wilson
Abstract:
Reasoning about the physical world requires models that are endowed with the right inductive biases to learn the underlying dynamics. Recent works improve generalization for predicting trajectories by learning the Hamiltonian or Lagrangian of a system rather than the differential equations directly. While these methods encode the constraints of the systems using generalized coordinates, we show that embedding the system into Cartesian coordinates and enforcing the constraints explicitly with Lagrange multipliers dramatically simplifies the learning problem. We introduce a series of challenging chaotic and extended-body systems, including systems with N-pendulums, spring coupling, magnetic fields, rigid rotors, and gyroscopes, to push the limits of current approaches. Our experiments show that Cartesian coordinates with explicit constraints lead to a 100x improvement in accuracy and data efficiency.
Submitted 26 October, 2020;
originally announced October 2020.
-
Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks
Authors:
Yikai Wu,
Xingyu Zhu,
Chenwei Wu,
Annie Wang,
Rong Ge
Abstract:
The Hessian captures important properties of the deep neural network loss landscape. Previous works have observed low-rank structure in the Hessians of neural networks. In this paper, we propose a decoupling conjecture that decomposes the layer-wise Hessians of a network as the Kronecker product of two smaller matrices. We can analyze the properties of these smaller matrices and prove the structure of the top eigenspace for random 2-layer networks. The decoupling conjecture has several other interesting implications: top eigenspaces for different models have surprisingly high overlap, and top eigenvectors form low-rank matrices when they are reshaped into the same shape as the corresponding weight matrix. All of these can be verified empirically for deeper networks. Finally, we use the structure of the layer-wise Hessian to get better explicit generalization bounds for neural networks.
Submitted 21 October, 2022; v1 submitted 8 October, 2020;
originally announced October 2020.
-
SketchEmbedNet: Learning Novel Concepts by Imitating Drawings
Authors:
Alexander Wang,
Mengye Ren,
Richard S. Zemel
Abstract:
Sketch drawings capture the salient information of visual concepts. Previous work has shown that neural networks are capable of producing sketches of natural objects drawn from a small number of classes. While earlier approaches focus on generation quality or retrieval, we explore properties of image representations learned by training a model to produce sketches of images. We show that this generative, class-agnostic model produces informative embeddings of images from novel examples, classes, and even novel datasets in a few-shot setting. Additionally, we find that these learned representations exhibit interesting structure and compositionality.
Submitted 22 June, 2021; v1 submitted 27 August, 2020;
originally announced September 2020.
-
Trajectory Based Podcast Recommendation
Authors:
Greg Benton,
Ghazal Fazelnia,
Alice Wang,
Ben Carterette
Abstract:
Podcast recommendation is a growing area of research that presents new challenges and opportunities. Individuals interact with podcasts in a way that is distinct from most other media and, of primary concern to us, distinct from music consumption. We show that successful and consistent recommendations can be made by viewing users as moving through the podcast library sequentially. Recommendations for future podcasts are then made using the trajectory taken from their sequential behavior. Our experiments provide evidence that user behavior is confined to local trends, and that listening patterns tend to be found over short sequences of similar types of shows. Ultimately, our approach gives a 450% increase in effectiveness over a collaborative filtering baseline.
Submitted 8 September, 2020;
originally announced September 2020.
-
Stronger and Faster Wasserstein Adversarial Attacks
Authors:
Kaiwen Wu,
Allen Houze Wang,
Yaoliang Yu
Abstract:
Deep models, while being extremely flexible and accurate, are surprisingly vulnerable to "small, imperceptible" perturbations known as adversarial attacks. While the majority of existing attacks focus on measuring perturbations under the $\ell_p$ metric, Wasserstein distance, which takes geometry in pixel space into account, has long been known to be a suitable metric for measuring image quality and has recently risen as a compelling alternative to the $\ell_p$ metric in adversarial attacks. However, constructing an effective attack under the Wasserstein metric is computationally much more challenging and calls for better optimization algorithms. We address this gap in two ways: (a) we develop an exact yet efficient projection operator to enable a stronger projected gradient attack; (b) we show that the Frank-Wolfe method equipped with a suitable linear minimization oracle works extremely fast under Wasserstein constraints. Our algorithms not only converge faster but also generate much stronger attacks. For instance, we decrease the accuracy of a residual network on CIFAR-10 to $3.4\%$ within a Wasserstein perturbation ball of radius $0.005$, in contrast to $65.6\%$ using the previous Wasserstein attack based on an \emph{approximate} projection operator. Furthermore, employing our stronger attacks in adversarial training significantly improves the robustness of adversarially trained models.
Submitted 6 August, 2020;
originally announced August 2020.
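To make the role of a linear minimization oracle in the abstract above concrete, here is a minimal, generic Frank-Wolfe sketch. It is not the paper's Wasserstein attack: the feasible set is swapped for the probability simplex, the objective is a simple squared distance to a hypothetical `target` vector, and the step size 2 / (t + 2) is the textbook default.

```python
def frank_wolfe_simplex(target, num_iters=2000):
    """Minimize f(x) = ||x - target||^2 over the probability simplex.

    Illustrative setup only: on the simplex, the linear minimization oracle
    (LMO) returns the vertex e_i whose gradient coordinate is smallest.
    """
    n = len(target)
    x = [1.0 / n] * n  # start at the centre of the simplex
    for t in range(num_iters):
        grad = [2.0 * (xi - bi) for xi, bi in zip(x, target)]
        # LMO: argmin over vertices s of <grad, s> is a single coordinate.
        i_star = min(range(n), key=grad.__getitem__)
        gamma = 2.0 / (t + 2.0)  # standard diminishing step size
        # Convex combination x <- (1 - gamma) * x + gamma * e_{i_star}.
        x = [(1.0 - gamma) * xi for xi in x]
        x[i_star] += gamma
    return x
```

Each iterate is a convex combination of simplex vertices, so feasibility holds without any projection step; this projection-free property is the usual reason to prefer Frank-Wolfe when projecting onto the constraint set is expensive.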
-
Additive Tensor Decomposition Considering Structural Data Information
Authors:
Shancong Mou,
Andi Wang,
Chuck Zhang,
Jianjun Shi
Abstract:
Tensor data with rich structural information is becoming increasingly important in process modeling, monitoring, and diagnosis. Here, structural information refers to structural properties such as sparsity, smoothness, low rank, and piecewise constancy. To reveal useful information from tensor data, we propose to decompose the tensor into the sum of multiple components based on their different structural information. In this paper, we provide a new definition of structural information in tensor data. Based on it, we propose an additive tensor decomposition (ATD) framework to extract useful information from tensor data. This framework specifies a high-dimensional optimization problem to obtain the components with distinct structural information. An alternating direction method of multipliers (ADMM) algorithm is proposed to solve it, which is highly parallelizable and thus suitable for the proposed optimization problem. Two simulation examples and a real case study in medical image analysis illustrate the versatility and effectiveness of the ATD framework.
Submitted 27 July, 2020;
originally announced July 2020.
-
Quantum Criticism: A Tagged News Corpus Analysed for Sentiment and Named Entities
Authors:
Ashwini Badgujar,
Sheng Chen,
Andrew Wang,
Kai Yu,
Paul Intrevado,
David Guy Brizan
Abstract:
In this research, we continuously collect data from the RSS feeds of traditional news sources. We apply several pre-trained implementations of named entity recognition (NER) tools, quantifying the success of each implementation. We also perform sentiment analysis of each news article at the document, paragraph and sentence level, with the goal of creating a corpus of tagged news articles that is made available to the public through a web interface. Finally, we show how the data in this corpus could be used to identify bias in news reporting.
Submitted 5 June, 2020;
originally announced June 2020.
-
Fast Risk Assessment for Autonomous Vehicles Using Learned Models of Agent Futures
Authors:
Allen Wang,
Xin Huang,
Ashkan Jasour,
Brian Williams
Abstract:
This paper presents fast non-sampling based methods to assess the risk of trajectories for autonomous vehicles when probabilistic predictions of other agents' futures are generated by deep neural networks (DNNs). The presented methods address a wide range of representations for uncertain predictions including both Gaussian and non-Gaussian mixture models for predictions of both agent positions and controls. We show that the problem of risk assessment when Gaussian mixture models (GMMs) of agent positions are learned can be solved rapidly to arbitrary levels of accuracy with existing numerical methods. To address the problem of risk assessment for non-Gaussian mixture models of agent position, we propose finding upper bounds on risk using Chebyshev's Inequality and sums-of-squares (SOS) programming; they are both of interest as the former is much faster while the latter can be arbitrarily tight. These approaches only require statistical moments of agent positions to determine upper bounds on risk. To perform risk assessment when models are learned for agent controls as opposed to positions, we develop TreeRing, an algorithm analogous to tree search over the ring of polynomials that can be used to exactly propagate moments of control distributions into position distributions through nonlinear dynamics. The presented methods are demonstrated on realistic predictions from DNNs trained on the Argoverse and CARLA datasets and are shown to be effective for rapidly assessing the probability of low probability events.
Submitted 3 June, 2020; v1 submitted 27 May, 2020;
originally announced May 2020.
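The abstract above notes that upper bounds on risk can be obtained from statistical moments alone. A minimal sketch of that idea follows, using Cantelli's one-sided form of Chebyshev's inequality on a scalar safety margin. The scalar margin `g` (positive when safe), the function name, and the choice of the one-sided variant are assumptions for illustration, not the paper's formulation.

```python
def cantelli_risk_bound(mean, variance):
    """Upper bound on P(g <= 0) given only the mean and variance of g.

    Assumes g is a scalar safety margin (hypothetical setup): g > 0 means safe.
    Cantelli's inequality gives P(g - mean <= -mean) <= var / (var + mean^2)
    whenever mean > 0; otherwise only the trivial bound 1 is available.
    """
    if variance < 0.0:
        raise ValueError("variance must be non-negative")
    if mean <= 0.0:
        return 1.0
    return variance / (variance + mean * mean)
```

For example, a margin with mean 3 and variance 1 has risk at most 1 / (1 + 9) = 0.1 regardless of the shape of its distribution, which is why moment-based bounds of this kind apply to non-Gaussian predictions.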