
Showing 1–26 of 26 results for author: Xu, A

Searching in archive stat.
  1. arXiv:2505.14808 [pdf, ps, other]

    stat.ML cs.LG math.ST

    Out-of-Distribution Generalization of In-Context Learning: A Low-Dimensional Subspace Perspective

    Authors: Soo Min Kwon, Alec S. Xu, Can Yaras, Laura Balzano, Qing Qu

    Abstract: This work aims to demystify the out-of-distribution (OOD) capabilities of in-context learning (ICL) by studying linear regression tasks parameterized with low-rank covariance matrices. With such a parameterization, we can model distribution shifts as a varying angle between the subspace of the training and testing covariance matrices. We prove that a single-layer linear attention model incurs a te…

    Submitted 20 May, 2025; originally announced May 2025.

  2. arXiv:2503.17503 [pdf, other]

    cs.LG physics.geo-ph stat.ML

    Towards Understanding the Benefits of Neural Network Parameterizations in Geophysical Inversions: A Study With Neural Fields

    Authors: Anran Xu, Lindsey J. Heagy

    Abstract: In this work, we employ neural fields, which use neural networks to map a coordinate to the corresponding physical property value at that coordinate, in a test-time learning manner. For a test-time learning method, the weights are learned during the inversion, as compared to traditional approaches which require a network to be trained using a training dataset. Results for synthetic examples in sei…

    Submitted 22 May, 2025; v1 submitted 21 March, 2025; originally announced March 2025.

  3. arXiv:2502.01763 [pdf, other]

    cs.LG math.OC stat.ML

    On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning

    Authors: Thomas T. Zhang, Behrad Moniri, Ansh Nagwekar, Faraz Rahman, Anton Xue, Hamed Hassani, Nikolai Matni

    Abstract: Layer-wise preconditioning methods are a family of memory-efficient optimization algorithms that introduce preconditioners per axis of each layer's weight tensors. These methods have seen a recent resurgence, demonstrating impressive performance relative to entry-wise ("diagonal") preconditioning methods such as Adam(W) on a wide range of neural network optimization tasks. Complementary to their p…

    Submitted 3 February, 2025; originally announced February 2025.

  4. arXiv:2501.02364 [pdf, other]

    cs.LG cs.CV stat.ML

    Understanding How Nonlinear Layers Create Linearly Separable Features for Low-Dimensional Data

    Authors: Alec S. Xu, Can Yaras, Peng Wang, Qing Qu

    Abstract: Deep neural networks have attained remarkable success across diverse classification tasks. Recent empirical studies have shown that deep networks learn features that are linearly separable across classes. However, these findings often lack rigorous justifications, even under relatively simple settings. In this work, we address this gap by examining the linear separation capabilities of shallow non…

    Submitted 4 January, 2025; originally announced January 2025.

    Comments: 32 pages, 9 figures

  5. arXiv:2409.04897 [pdf, other]

    cs.DS cs.CY cs.LG econ.TH stat.ML

    Centralized Selection with Preferences in the Presence of Biases

    Authors: L. Elisa Celis, Amit Kumar, Nisheeth K. Vishnoi, Andrew Xu

    Abstract: This paper considers the scenario in which there are multiple institutions, each with a limited capacity for candidates, and candidates, each with preferences over the institutions. A central entity evaluates the utility of each candidate to the institutions, and the goal is to select candidates for each institution in a way that maximizes utility while also considering the candidates' preferences…

    Submitted 7 September, 2024; originally announced September 2024.

    Comments: The conference version of this paper appears in ICML 2024

  6. arXiv:2407.11426 [pdf, other]

    cs.LG cs.AI stat.ME

    Generally-Occurring Model Change for Robust Counterfactual Explanations

    Authors: Ao Xu, Tieru Wu

    Abstract: With the increasing impact of algorithmic decision-making on human lives, the interpretability of models has become a critical issue in machine learning. Counterfactual explanation is an important method in the field of interpretable machine learning, which can not only help users understand why machine learning models make specific decisions, but also help users understand how to change these dec…

    Submitted 16 July, 2024; originally announced July 2024.

  7. arXiv:2309.04626 [pdf, other]

    stat.ML cs.AI cs.IT cs.LG

    Perceptual adjustment queries and an inverted measurement paradigm for low-rank metric learning

    Authors: Austin Xu, Andrew D. McRae, Jingyan Wang, Mark A. Davenport, Ashwin Pananjady

    Abstract: We introduce a new type of query mechanism for collecting human feedback, called the perceptual adjustment query (PAQ). Being both informative and cognitively lightweight, the PAQ adopts an inverted measurement scheme, and combines advantages from both cardinal and ordinal queries. We showcase the PAQ in the metric learning problem, where we collect PAQ measurements to learn an unknown Mahalanobi…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: 42 pages, 6 figures

  8. arXiv:2305.19215 [pdf, other]

    stat.ML cs.LG

    dotears: Scalable, consistent DAG estimation using observational and interventional data

    Authors: Albert Xue, Jingyou Rao, Sriram Sankararaman, Harold Pimentel

    Abstract: New biological assays like Perturb-seq link highly parallel CRISPR interventions to a high-dimensional transcriptomic readout, providing insight into gene regulatory networks. Causal gene regulatory networks can be represented by directed acyclic graphs (DAGs), but learning DAGs from observational data is complicated by lack of identifiability and a combinatorial solution space. Score-based structu…

    Submitted 20 February, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

  9. arXiv:2303.17758 [pdf, other]

    stat.AP stat.CO

    Commuter Count: Inferring Travel Patterns from Location Data

    Authors: Nathan Musoke, Emily Kendall, Mateja Gosenca, Lillian Guo, Lerh Feng Low, Angela Xue, Richard Easther

    Abstract: In this Working Paper we analyse computational strategies for using aggregated spatio-temporal population data acquired from telecommunications networks to infer travel and movement patterns between geographical regions. Specifically, we focus on hour-by-hour cellphone counts for the SA-2 geographical regions covering the whole of New Zealand. This Working Paper describes the implementation of the…

    Submitted 30 March, 2023; originally announced March 2023.

    Comments: Submitted to Covid-19 Modelling Aotearoa

  10. arXiv:2301.08852 [pdf, other]

    stat.ME eess.SP stat.ML

    HeMPPCAT: Mixtures of Probabilistic Principal Component Analysers for Data with Heteroscedastic Noise

    Authors: Alec S. Xu, Laura Balzano, Jeffrey A. Fessler

    Abstract: Mixtures of probabilistic principal component analysis (MPPCA) is a well-known mixture model extension of principal component analysis (PCA). Similar to PCA, MPPCA assumes the data samples in each mixture contain homoscedastic noise. However, datasets with heterogeneous noise across samples are becoming increasingly common, as larger datasets are generated by collecting samples from several source…

    Submitted 25 January, 2023; v1 submitted 20 January, 2023; originally announced January 2023.

  11. arXiv:2212.02688 [pdf, other]

    stat.ME stat.AP

    Online Bayesian prediction of remaining useful life for gamma degradation process under conjugate priors

    Authors: Ancha Xu

    Abstract: The gamma process has been extensively used to model monotone degradation data. Statistical inference for the gamma process is difficult due to the complex parameter structure involved in the likelihood function. In this paper, we derive a conjugate prior for the homogeneous gamma process, and some properties of the prior distribution are explored. Three algorithms (Gibbs sampling, discrete grid sampl…

    Submitted 5 December, 2022; originally announced December 2022.

  12. arXiv:2206.06469 [pdf]

    cs.LG stat.ML

    Invariant Structure Learning for Better Generalization and Causal Explainability

    Authors: Yunhao Ge, Sercan Ö. Arik, Jinsung Yoon, Ao Xu, Laurent Itti, Tomas Pfister

    Abstract: Learning the causal structure behind data is invaluable for improving generalization and obtaining high-quality explanations. We propose a novel framework, Invariant Structure Learning (ISL), that is designed to improve causal structure discovery by utilizing generalization as an indication. ISL splits the data into different environments, and learns a structure that is invariant to the target acr…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: 16 pages (including Appendix), 4 figures

  13. arXiv:2202.01953 [pdf, other]

    cs.LG stat.ML

    Active metric learning and classification using similarity queries

    Authors: Namrata Nadagouda, Austin Xu, Mark A. Davenport

    Abstract: Active learning is commonly used to train label-efficient models by adaptively selecting the most informative queries. Most active learning strategies are designed to either learn a representation of the data (e.g., embedding or metric learning) or perform well on a task (e.g., classification) on the data. However, many machine learning tasks involve a combination of both representation l…

    Submitted 3 February, 2022; originally announced February 2022.

    Comments: 23 pages, 14 figures

  14. arXiv:2012.14868 [pdf, ps, other]

    cs.LG cs.AI cs.IT math.ST stat.ML

    Minimum Excess Risk in Bayesian Learning

    Authors: Aolin Xu, Maxim Raginsky

    Abstract: We analyze the best achievable performance of Bayesian learning under generative models by defining and upper-bounding the minimum excess risk (MER): the gap between the minimum expected loss attainable by learning from data and the minimum expected loss that could be achieved if the model realization were known. The definition of MER provides a principled way to define different notions of uncert…

    Submitted 31 December, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: Added results on realizable models and connection to VC dimension of the generative function class. Several results have been improved.

  15. arXiv:2009.02302 [pdf, other]

    stat.ML cs.LG

    Simultaneous Preference and Metric Learning from Paired Comparisons

    Authors: Austin Xu, Mark A. Davenport

    Abstract: A popular model of preference in the context of recommendation systems is the so-called ideal point model. In this model, a user is represented as a vector $\mathbf{u}$ together with a collection of items $\mathbf{x}_1, \ldots, \mathbf{x}_N$ in a common low-dimensional space. The vector $\mathbf{u}$ represents the user's "ideal point," or the ideal combination of features that represents a…

    Submitted 6 September, 2020; v1 submitted 4 September, 2020; originally announced September 2020.

    Comments: 16 pages, 10 figures

  16. arXiv:2008.06233 [pdf, other]

    cs.LG stat.ML

    Privacy-Preserving Asynchronous Federated Learning Algorithms for Multi-Party Vertically Collaborative Learning

    Authors: Bin Gu, An Xu, Zhouyuan Huo, Cheng Deng, Heng Huang

    Abstract: Privacy-preserving federated learning for vertically partitioned data has shown promising results as a solution for emerging multi-party joint modeling applications, in which the data holders (such as government branches, private finance and e-business companies) collaborate throughout the learning process rather than relying on a trusted third party to hold data. However, existing federat…

    Submitted 14 August, 2020; originally announced August 2020.

  17. arXiv:2008.05823 [pdf, other]

    cs.LG cs.DC stat.ML

    Step-Ahead Error Feedback for Distributed Training with Compressed Gradient

    Authors: An Xu, Zhouyuan Huo, Heng Huang

    Abstract: Although distributed machine learning methods can speed up the training of large deep neural networks, communication cost has become a non-negligible bottleneck that constrains performance. To address this challenge, gradient-compression-based, communication-efficient distributed learning methods were designed to reduce communication cost, and more recently the local error feedba…

    Submitted 24 January, 2022; v1 submitted 13 August, 2020; originally announced August 2020.

  18. arXiv:2004.05298 [pdf, other]

    cs.LG cs.CV stat.ML

    Detached Error Feedback for Distributed SGD with Random Sparsification

    Authors: An Xu, Heng Huang

    Abstract: The communication bottleneck has been a critical problem in large-scale distributed deep learning. In this work, we study distributed SGD with random block-wise sparsification as the gradient compressor, which is ring-allreduce compatible and highly computation-efficient but leads to inferior performance. To tackle this important issue, we improve the communication-efficient distributed SGD from a…

    Submitted 13 June, 2022; v1 submitted 10 April, 2020; originally announced April 2020.

  19. arXiv:2002.11082 [pdf, other]

    cs.LG cs.DC stat.ML

    Optimal Gradient Quantization Condition for Communication-Efficient Distributed Training

    Authors: An Xu, Zhouyuan Huo, Heng Huang

    Abstract: The communication of gradients is costly for training deep neural networks with multiple devices in computer vision applications. In particular, the growing size of deep learning models leads to higher communication overheads that defy the ideal linear training speedup with respect to the number of devices. Gradient quantization is one of the common methods to reduce communication costs. However, it can…

    Submitted 25 February, 2020; originally announced February 2020.

  20. arXiv:1912.07127 [pdf, other]

    cs.LG stat.ML

    Sepsis World Model: A MIMIC-based OpenAI Gym "World Model" Simulator for Sepsis Treatment

    Authors: Amirhossein Kiani, Chris Wang, Angela Xu

    Abstract: Sepsis is a life-threatening condition caused by the body's response to an infection. In order to treat patients with sepsis, physicians must control varying dosages of various antibiotics, fluids, and vasopressors based on a large number of variables in an emergency setting. In this project we employ a "world model" methodology to create a simulator that aims to predict the next state of a patien…

    Submitted 15 December, 2019; originally announced December 2019.

    Comments: This project was done as a class project for CS221 at Stanford University

  21. arXiv:1911.05268 [pdf, other]

    cs.LG cs.AI cs.CR stat.ML

    Adversarial Examples in Modern Machine Learning: A Review

    Authors: Rey Reza Wiyatno, Anqi Xu, Ousmane Dia, Archy de Berker

    Abstract: Recent research has found that many families of machine learning models are vulnerable to adversarial examples: inputs that are specifically designed to cause the target model to produce erroneous outputs. In this survey, we focus on machine learning models in the visual domain, where methods for generating and detecting such examples have been most extensively studied. We explore a variety of adv…

    Submitted 15 November, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

    Comments: Work in progress, 97 pages

  22. arXiv:1909.02625 [pdf, other]

    cs.LG cs.DC stat.ML

    On the Acceleration of Deep Learning Model Parallelism with Staleness

    Authors: An Xu, Zhouyuan Huo, Heng Huang

    Abstract: Training deep convolutional neural networks for computer vision problems is slow and inefficient, especially when the network is large and distributed across multiple devices. The inefficiency is caused by the backpropagation algorithm's forward locking, backward locking, and update locking problems. Existing solutions for acceleration either handle only one locking problem or lead to severe accurac…

    Submitted 19 January, 2022; v1 submitted 5 September, 2019; originally announced September 2019.

  23. arXiv:1904.00788 [pdf, other]

    cs.CL cs.LG stat.ML

    Neural Abstractive Text Summarization and Fake News Detection

    Authors: Soheil Esmaeilzadeh, Gao Xian Peh, Angela Xu

    Abstract: In this work, we study abstractive text summarization by exploring different models such as LSTM-encoder-decoder with attention, pointer-generator networks, coverage mechanisms, and transformers. After extensive and careful hyperparameter tuning, we compare the proposed architectures against each other for the abstractive text summarization task. Finally, as an extension of our work, we apply our te…

    Submitted 12 December, 2019; v1 submitted 24 March, 2019; originally announced April 2019.

  24. arXiv:1808.07945 [pdf, other]

    cs.LG stat.ML

    Maximal Jacobian-based Saliency Map Attack

    Authors: Rey Wiyatno, Anqi Xu

    Abstract: The Jacobian-based Saliency Map Attack is a family of adversarial attack methods for fooling classification models, such as deep neural networks for image classification tasks. By saturating a few pixels in a given image to their maximum or minimum values, JSMA can cause the model to misclassify the resulting adversarial image as a specified erroneous target class. We propose two variants of JSMA,…

    Submitted 23 August, 2018; originally announced August 2018.

    Comments: Extended version of extended abstract for MAIS 2018

  25. arXiv:1705.07809 [pdf, ps, other]

    cs.LG cs.IT stat.ML

    Information-theoretic analysis of generalization capability of learning algorithms

    Authors: Aolin Xu, Maxim Raginsky

    Abstract: We derive upper bounds on the generalization error of a learning algorithm in terms of the mutual information between its input and output. The bounds provide an information-theoretic understanding of generalization in learning problems, and give theoretical guidelines for striking the right balance between data fit and generalization by controlling the input-output mutual information. We propose…

    Submitted 6 November, 2017; v1 submitted 22 May, 2017; originally announced May 2017.

    Comments: Final version, accepted to NIPS 2017

  26. arXiv:1004.5442 [pdf, ps, other]

    cond-mat.soft cs.CE nlin.CG physics.comp-ph physics.flu-dyn stat.CO

    Multiple-Relaxation-Time Lattice Boltzmann Approach to Compressible Flows with Flexible Specific-Heat Ratio and Prandtl Number

    Authors: Feng Chen, Aiguo Xu, Guangcai Zhang, Yingjun Li, Sauro Succi

    Abstract: A new multiple-relaxation-time lattice Boltzmann scheme for compressible flows with arbitrary specific heat ratio and Prandtl number is presented. In the new scheme, which is based on a two-dimensional 16-discrete-velocity model, the moment space and the corresponding transformation matrix are constructed according to the seven-moment relations associated with the local equilibrium distribution fu…

    Submitted 31 May, 2010; v1 submitted 29 April, 2010; originally announced April 2010.

    Comments: Accepted for publication in EPL

    Journal ref: EPL (Europhysics Letters) 90, 54003 (2010)