
Showing 1–50 of 133 results for author: Cao, Y

Searching in archive stat.
  1. arXiv:2507.01732

    stat.ME

    Gold after Randomized Sand: Model-X Split Knockoffs for Controlled Transformation Selection

    Authors: Yang Cao, Hangyu Lin, Xinwei Sun, Yuan Yao

    Abstract: Controlling the False Discovery Rate (FDR) in variable selection is crucial for reproducibility and preventing over-selection, particularly with the increasing prevalence of predictive modeling. The Split Knockoff method, a recent extension of the canonical Knockoffs framework, offers finite-sample FDR control for selecting sparse transformations, finding applications across signal processing, eco…

    Submitted 2 July, 2025; originally announced July 2025.

  2. arXiv:2505.21074

    cs.LG cs.AI cs.CR cs.CV stat.ML

    Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling

    Authors: Yichuan Cao, Yibo Miao, Xiao-Shan Gao, Yinpeng Dong

    Abstract: Text-to-image (T2I) models raise ethical and safety concerns due to their potential to generate inappropriate or harmful images. Evaluating these models' security through red-teaming is vital, yet white-box approaches are limited by their need for internal access, complicating their use with closed-source models. Moreover, existing black-box methods often assume knowledge about the model's specifi…

    Submitted 27 May, 2025; originally announced May 2025.

  3. arXiv:2505.12129

    cs.LG stat.ME stat.ML

    Metric Graph Kernels via the Tropical Torelli Map

    Authors: Yueqi Cao, Anthea Monod

    Abstract: We propose new graph kernels grounded in the study of metric graphs via tropical algebraic geometry. In contrast to conventional graph kernels that are based on graph combinatorics such as nodes, edges, and subgraphs, our graph kernels are purely based on the geometry and topology of the underlying metric space. A key characterizing property of our construction is its invariance under edge subdivi…

    Submitted 17 May, 2025; originally announced May 2025.

    Comments: 20 pages, 7 figures

  4. arXiv:2505.09432

    cs.LG stat.ML

    Establishing Linear Surrogate Regret Bounds for Convex Smooth Losses via Convolutional Fenchel-Young Losses

    Authors: Yuzhou Cao, Han Bao, Lei Feng, Bo An

    Abstract: Surrogate regret bounds, also known as excess risk bounds, bridge the gap between the convergence rates of surrogate and target losses, with linear bounds favorable for their lossless regret transfer. While convex smooth surrogate losses are appealing in particular due to the efficient estimation and optimization, the existence of a trade-off between the smoothness and linear regret bound has been…

    Submitted 14 May, 2025; v1 submitted 14 May, 2025; originally announced May 2025.

  5. arXiv:2505.06531

    stat.ML cs.LG math.ST

    High-Dimensional Importance-Weighted Information Criteria: Theory and Optimality

    Authors: Yong-Syun Cao, Shinpei Imori, Ching-Kang Ing

    Abstract: Imori and Ing (2025) proposed the importance-weighted orthogonal greedy algorithm (IWOGA) for model selection in high-dimensional misspecified regression models under covariate shift. To determine the number of IWOGA iterations, they introduced the high-dimensional importance-weighted information criterion (HDIWIC). They argued that the combined use of IWOGA and HDIWIC, IWOGA + HDIWIC, achieves an…

    Submitted 10 May, 2025; originally announced May 2025.

  6. arXiv:2504.11344

    cs.LG cs.AI stat.ML

    Interpretable Hybrid-Rule Temporal Point Processes

    Authors: Yunyang Cao, Juekai Lin, Hongye Wang, Wenhao Li, Bo Jin

    Abstract: Temporal Point Processes (TPPs) are widely used for modeling event sequences in various medical domains, such as disease onset prediction, progression analysis, and clinical decision support. Although TPPs effectively capture temporal dynamics, their lack of interpretability remains a critical challenge. Recent advancements have introduced interpretable TPPs. However, these methods fail to incorpo…

    Submitted 19 April, 2025; v1 submitted 15 April, 2025; originally announced April 2025.

  7. arXiv:2504.08638

    stat.ML cs.LG

    Transformer Learns Optimal Variable Selection in Group-Sparse Classification

    Authors: Chenyang Zhang, Xuran Meng, Yuan Cao

    Abstract: Transformers have demonstrated remarkable success across various applications. However, the success of transformers has not been understood in theory. In this work, we give a case study of how transformers can be trained to learn a classic statistical model with "group sparsity", where the input variables form multiple groups, and the label only depends on the variables from one of the groups. We…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 63 pages, 6 figures

  8. arXiv:2504.08628

    stat.ML cs.LG

    Gradient Descent Robustly Learns the Intrinsic Dimension of Data in Training Convolutional Neural Networks

    Authors: Chenyang Zhang, Peifeng Gao, Difan Zou, Yuan Cao

    Abstract: Modern neural networks are usually highly over-parameterized. Behind the wide usage of over-parameterized networks is the belief that, if the data are simple, then the trained network will be automatically equivalent to a simple predictor. Following this intuition, many existing works have studied different notions of "ranks" of neural networks and their relation to the rank of data. In this work,…

    Submitted 11 April, 2025; originally announced April 2025.

    Comments: 43 pages, 4 figures

  9. arXiv:2504.02646

    cs.LG cs.AI cs.IR stat.ML

    Prompt Optimization with Logged Bandit Data

    Authors: Haruka Kiyohara, Daniel Yiming Cao, Yuta Saito, Thorsten Joachims

    Abstract: We study how to use naturally available user feedback, such as clicks, to optimize large language model (LLM) pipelines for generating personalized sentences using prompts. Naive approaches, which estimate the policy gradient in the prompt space, suffer either from variance caused by the large action space of prompts or bias caused by inaccurate reward predictions. To circumvent these challenges,…

    Submitted 3 April, 2025; originally announced April 2025.

    Comments: Preprint

  10. arXiv:2502.15609

    cs.CL cs.AI cs.LG stat.ML

    On the Robustness of Transformers against Context Hijacking for Linear Classification

    Authors: Tianle Li, Chenyang Zhang, Xingwu Chen, Yuan Cao, Difan Zou

    Abstract: Transformer-based Large Language Models (LLMs) have demonstrated powerful in-context learning capabilities. However, their predictions can be disrupted by factually correct context, a phenomenon known as context hijacking, revealing a significant robustness issue. To understand this phenomenon theoretically, we explore an in-context linear classification problem based on recent advances in linear…

    Submitted 21 February, 2025; originally announced February 2025.

  11. arXiv:2502.06007

    stat.ML cs.LG

    Transformers versus the EM Algorithm in Multi-class Clustering

    Authors: Yihan He, Hong-Yu Chen, Yuan Cao, Jianqing Fan, Han Liu

    Abstract: LLMs demonstrate significant inference capacities in complicated machine learning tasks, using the Transformer model as their backbone. Motivated by the limited understanding of such models on unsupervised learning problems, we study the learning guarantees of Transformers in performing multi-class clustering of the Gaussian Mixture Models. We develop a theory drawing strong connections between…

    Submitted 9 February, 2025; originally announced February 2025.

  12. arXiv:2501.02547

    stat.ML cs.LG

    Transformers Simulate MLE for Sequence Generation in Bayesian Networks

    Authors: Yuan Cao, Yihan He, Dennis Wu, Hong-Yu Chen, Jianqing Fan, Han Liu

    Abstract: Transformers have achieved significant success in various fields, notably excelling in tasks involving sequential data like natural language processing. Despite these achievements, the theoretical understanding of transformers' capabilities remains limited. In this paper, we investigate the theoretical capabilities of transformers to autoregressively generate sequences in Bayesian networks based o…

    Submitted 8 July, 2025; v1 submitted 5 January, 2025; originally announced January 2025.

    Comments: 51 pages, 17 figures, 5 tables

  13. arXiv:2501.01312

    stat.ML cs.LG math.ST

    Learning Spectral Methods by Transformers

    Authors: Yihan He, Yuan Cao, Hong-Yu Chen, Dennis Wu, Jianqing Fan, Han Liu

    Abstract: Transformers demonstrate significant advantages as the building block of modern LLMs. In this work, we study the capacities of Transformers in performing unsupervised learning. We show that multi-layered Transformers, given a sufficiently large set of pre-training instances, are able to learn the algorithms themselves and perform statistical estimation tasks given new instances. This learning para…

    Submitted 12 January, 2025; v1 submitted 2 January, 2025; originally announced January 2025.

    Comments: 77 pages, 12 figures

  14. arXiv:2412.19444

    cs.LG math.OC stat.ML

    Towards Simple and Provable Parameter-Free Adaptive Gradient Methods

    Authors: Yuanzhe Tao, Huizhuo Yuan, Xun Zhou, Yuan Cao, Quanquan Gu

    Abstract: Optimization algorithms such as AdaGrad and Adam have significantly advanced the training of deep models by dynamically adjusting the learning rate during the optimization process. However, ad hoc tuning of learning rates poses a challenge, leading to inefficiencies in practice. To address this issue, recent research has focused on developing "learning-rate-free" or "parameter-free" algorithms that…

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: 34 pages, 16 figures, 3 tables

  15. arXiv:2412.01021

    stat.ML cs.LG

    On the Feature Learning in Diffusion Models

    Authors: Andi Han, Wei Huang, Yuan Cao, Difan Zou

    Abstract: The predominant success of diffusion models in generative modeling has spurred significant interest in understanding their theoretical foundations. In this work, we propose a feature learning framework aimed at analyzing and comparing the training dynamics of diffusion models with those of traditional classification models. Our theoretical analysis demonstrates that diffusion models, due to the de…

    Submitted 2 March, 2025; v1 submitted 1 December, 2024; originally announced December 2024.

  16. arXiv:2411.17003

    cs.LG cs.AI stat.ML

    Can a Single Tree Outperform an Entire Forest?

    Authors: Qiangqiang Mao, Yankai Cao

    Abstract: The prevailing mindset is that a single decision tree underperforms classic random forests in testing accuracy, despite its advantages in interpretability and lightweight structure. This study challenges such a mindset by significantly improving the testing accuracy of an oblique regression tree through our gradient-based entire tree optimization framework, making its performance comparable to the…

    Submitted 25 November, 2024; originally announced November 2024.

  17. arXiv:2411.09128

    cs.IT stat.AP

    Performance Analysis of uRLLC in scalable Cell-free Radio Access Network System

    Authors: Ziyang Zhang, Dongming Wang, Yunxiang Guo, Yang Cao, Xiaohu You

    Abstract: As a critical component of beyond fifth-generation (B5G) and sixth-generation (6G) mobile communication systems, ultra-reliable low-latency communication (uRLLC) imposes stringent requirements on latency and reliability. In recent years, with the improvement of mobile communication networks, centralized and distributed processing schemes for cell-free massive multiple-input multiple-output (CF-mMIMO…

    Submitted 12 December, 2024; v1 submitted 13 November, 2024; originally announced November 2024.

  18. arXiv:2410.23610

    stat.ML cs.LG math.ST

    Global Convergence in Training Large-Scale Transformers

    Authors: Cheng Gao, Yuan Cao, Zihao Li, Yihan He, Mengdi Wang, Han Liu, Jason Matthew Klusowski, Jianqing Fan

    Abstract: Despite the widespread success of Transformers across various domains, their optimization guarantees in large-scale model settings are not well-understood. This paper rigorously analyzes the convergence properties of gradient flow in training Transformers with weight decay regularization. First, we construct the mean-field limit of large-scale Transformers, showing that as the model width and dept…

    Submitted 30 October, 2024; originally announced October 2024.

    Comments: to be published in 38th Conference on Neural Information Processing Systems (NeurIPS 2024)

    MSC Class: 35Q93

  19. arXiv:2410.20650

    cs.LG cs.AI stat.ML

    NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

    Authors: Yongchang Hao, Yanshuai Cao, Lili Mou

    Abstract: The performance of neural networks improves when more parameters are used. However, the model sizes are constrained by the available on-device memory during training and inference. Although applying techniques like quantization can alleviate the constraint, they suffer from performance degradation. In this work, we introduce NeuZip, a new weight compression scheme based on the entropy of floating-…

    Submitted 27 October, 2024; originally announced October 2024.

  20. A Systematic Review of Machine Learning Approaches for Detecting Deceptive Activities on Social Media: Methods, Challenges, and Biases

    Authors: Yunchong Liu, Xiaorui Shen, Yeyubei Zhang, Zhongyan Wang, Yexin Tian, Jianglai Dai, Yuchen Cao

    Abstract: Social media platforms like Twitter, Facebook, and Instagram have facilitated the spread of misinformation, necessitating automated detection systems. This systematic review evaluates 36 studies that apply machine learning (ML) and deep learning (DL) models to detect fake news, spam, and fake accounts on social media. Using the Prediction model Risk Of Bias ASsessment Tool (PROBAST), the review id…

    Submitted 9 March, 2025; v1 submitted 26 October, 2024; originally announced October 2024.

  21. arXiv:2410.19139

    cs.LG stat.ML

    Initialization Matters: On the Benign Overfitting of Two-Layer ReLU CNN with Fully Trainable Layers

    Authors: Shuning Shang, Xuran Meng, Yuan Cao, Difan Zou

    Abstract: Benign overfitting refers to how over-parameterized neural networks can fit training data perfectly and generalize well to unseen data. While this has been widely investigated theoretically, existing works are limited to two-layer networks with fixed output layers, where only the hidden weights are trained. We extend the analysis to two-layer ReLU convolutional neural networks (CNNs) with fully tr…

    Submitted 24 October, 2024; originally announced October 2024.

    Comments: 80 pages, 3 figures, 1 table

  22. arXiv:2409.18685

    cs.LG stat.ML

    Understanding the Benefits of SimCLR Pre-Training in Two-Layer Convolutional Neural Networks

    Authors: Han Zhang, Yuan Cao

    Abstract: SimCLR is one of the most popular contrastive learning methods for vision tasks. It pre-trains deep neural networks based on a large amount of unlabeled data by teaching the model to distinguish between positive and negative pairs of augmented images. It is believed that SimCLR can pre-train a deep neural network to learn efficient representations that can lead to a better performance of future su…

    Submitted 27 September, 2024; originally announced September 2024.

    Comments: 65 pages, 4 figures

  23. arXiv:2406.15439

    physics.soc-ph stat.AP

    Heterogeneous peer effects of college roommates on academic performance

    Authors: Yi Cao, Tao Zhou, Jian Gao

    Abstract: Understanding how student peers influence learning outcomes is crucial for effective education management in complex social systems. The complexities of peer selection and evolving peer relationships, however, pose challenges for identifying peer effects using static observational data. Here we use both null-model and regression approaches to examine peer effects using longitudinal data from 5,272…

    Submitted 29 May, 2024; originally announced June 2024.

    Comments: 56 pages, 4 figures, 2 tables, with Supplementary Information

    Journal ref: Nature Communications, 15(1), 4785 (2024)

  24. arXiv:2406.10650

    stat.ML cs.LG

    The Implicit Bias of Adam on Separable Data

    Authors: Chenyang Zhang, Difan Zou, Yuan Cao

    Abstract: Adam has become one of the most favored optimizers in deep learning problems. Despite its success in practice, numerous mysteries persist regarding its theoretical understanding. In this paper, we study the implicit bias of Adam in linear logistic regression. Specifically, we show that when the training data are linearly separable, Adam converges towards a linear classifier that achieves the maxim…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 33 pages, 2 figures

  25. arXiv:2402.03295

    cs.LG cs.AI math.OC stat.ML

    Ginger: An Efficient Curvature Approximation with Linear Complexity for General Neural Networks

    Authors: Yongchang Hao, Yanshuai Cao, Lili Mou

    Abstract: Second-order optimization approaches like the generalized Gauss-Newton method are considered more powerful as they utilize the curvature information of the objective function with preconditioning matrices. Albeit offering tempting theoretical benefits, they are not easily applicable to modern deep learning. The major reason is the quadratic memory and cubic time complexity to compute the in…

    Submitted 5 February, 2024; originally announced February 2024.

  26. arXiv:2402.03293

    cs.LG cs.AI stat.ML

    Flora: Low-Rank Adapters Are Secretly Gradient Compressors

    Authors: Yongchang Hao, Yanshuai Cao, Lili Mou

    Abstract: Despite large neural networks demonstrating remarkable abilities to complete different tasks, they require excessive memory usage to store the optimization states for training. To alleviate this, low-rank adaptation (LoRA) has been proposed to reduce the optimization states by training fewer parameters. However, LoRA restricts overall weight update matrices to be low-rank, limiting the model perform…

    Submitted 12 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: Accepted @ ICML 2024

  27. arXiv:2401.13624

    stat.ML cs.LG

    Can overfitted deep neural networks in adversarial training generalize? -- An approximation viewpoint

    Authors: Zhongjie Shi, Fanghui Liu, Yuan Cao, Johan A. K. Suykens

    Abstract: Adversarial training is a widely used method to improve the robustness of deep neural networks (DNNs) over adversarial perturbations. However, it is empirically observed that adversarial training on over-parameterized networks often suffers from robust overfitting: it can achieve almost zero adversarial training error while the robust generalization performance is not promising. In th…

    Submitted 24 January, 2024; originally announced January 2024.

  28. arXiv:2311.13958

    stat.ML cs.CV cs.LG

    Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework

    Authors: Jingjing Zheng, Wanglong Lu, Wenzhe Wang, Yankai Cao, Xiaoqin Zhang, Xianta Jiang

    Abstract: Recently, numerous tensor singular value decomposition (t-SVD)-based tensor recovery methods have shown promise in processing visual data, such as color images and videos. However, these methods often suffer from severe performance degradation when confronted with tensor data exhibiting non-smooth changes. It has been commonly observed in real-world scenarios but ignored by the traditional t-SVD-b…

    Submitted 13 July, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  29. arXiv:2311.04550

    cs.LG stat.ML

    Regression with Cost-based Rejection

    Authors: Xin Cheng, Yuzhou Cao, Haobo Wang, Hongxin Wei, Bo An, Lei Feng

    Abstract: Learning with rejection is an important framework that can refrain from making predictions to avoid critical mispredictions by balancing between prediction and rejection. Previous studies on cost-based rejection only focused on the classification setting, which cannot handle the continuous and infinite target space in the regression setting. In this paper, we investigate a novel regression problem…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: Accepted by NeurIPS 2023

  30. Split Knockoffs for Multiple Comparisons: Controlling the Directional False Discovery Rate

    Authors: Yang Cao, Xinwei Sun, Yuan Yao

    Abstract: Multiple comparisons in hypothesis testing often encounter structural constraints in various applications. For instance, in structural Magnetic Resonance Imaging for Alzheimer's Disease, the focus extends beyond examining atrophic brain regions to include comparisons of anatomically adjacent regions. These constraints can be modeled as linear transformations of parameters, where the sign patterns…

    Submitted 11 October, 2023; originally announced October 2023.

    Comments: Journal of the American Statistical Association, 2023

  31. arXiv:2306.11680

    cs.LG math.OC stat.ML

    The Implicit Bias of Batch Normalization in Linear Models and Two-layer Linear Convolutional Neural Networks

    Authors: Yuan Cao, Difan Zou, Yuanzhi Li, Quanquan Gu

    Abstract: We study the implicit bias of batch normalization trained by gradient descent. We show that when learning a linear model with batch normalization for binary classification, gradient descent converges to a uniform margin classifier on the training data with an $\exp(-\Omega(\log^2 t))$ convergence rate. This distinguishes linear models with batch normalization from those without batch normalization in t…

    Submitted 11 July, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

    Comments: 53 pages, 2 figures

  32. arXiv:2303.17940

    stat.ML cs.LG

    Per-Example Gradient Regularization Improves Learning Signals from Noisy Data

    Authors: Xuran Meng, Yuan Cao, Difan Zou

    Abstract: Gradient regularization, as described in Barrett and Dherin (2021), is a highly effective technique for promoting flat minima during gradient descent. Empirical evidence suggests that this regularization technique can significantly enhance the robustness of deep learning models against noisy perturbations, while also reducing test error. In this paper, we explore the per-example gradient regular…

    Submitted 31 March, 2023; originally announced March 2023.

  33. arXiv:2303.08433

    cs.LG cs.AI math.OC stat.ML

    The Benefits of Mixup for Feature Learning

    Authors: Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

    Abstract: Mixup, a simple data augmentation method that randomly mixes two data points via linear interpolation, has been extensively applied in various deep learning applications to gain better generalization. However, the theoretical underpinnings of its efficacy are not yet fully understood. In this paper, we aim to seek a fundamental understanding of the benefits of Mixup. We first show that Mixup using…

    Submitted 15 March, 2023; originally announced March 2023.

    Comments: 72 pages, 4 figures

  34. arXiv:2303.05793

    stat.ME

    Analyzing covariate clustering effects in healthcare cost subgroups: insights and applications for prediction

    Authors: Zhengxiao Li, Yifan Huang, Yang Cao

    Abstract: Healthcare cost prediction is a challenging task due to the high-dimensionality and high correlation among covariates. Additionally, the skewed, heavy-tailed, and often multi-modal nature of cost data can complicate matters further due to unobserved heterogeneity. In this study, we propose a novel framework for finite mixture regression models that incorporates covariate clustering methods to bett…

    Submitted 10 March, 2023; originally announced March 2023.

    Comments: 36 pages; 7 figures

  35. arXiv:2302.02334

    cs.LG cs.AI stat.ML

    Revisiting Discriminative vs. Generative Classifiers: Theory and Implications

    Authors: Chenyu Zheng, Guoqiang Wu, Fan Bao, Yue Cao, Chongxuan Li, Jun Zhu

    Abstract: A large-scale deep model pre-trained on massive labeled or unlabeled data transfers well to downstream tasks. Linear evaluation freezes parameters in the pre-trained model and trains a linear classifier separately, which is efficient and attractive for transfer. However, little work has investigated the classifier in linear evaluation except for the default logistic regression. Inspired by the sta…

    Submitted 29 May, 2023; v1 submitted 5 February, 2023; originally announced February 2023.

    Comments: Accepted by ICML 2023, 58 pages

  36. arXiv:2210.10479

    physics.ao-ph astro-ph.EP stat.AP

    Inferring changes to the global carbon cycle with WOMBAT v2.0, a hierarchical flux-inversion framework

    Authors: Michael Bertolacci, Andrew Zammit-Mangion, Andrew Schuh, Beata Bukosa, Jenny Fisher, Yi Cao, Aleya Kaushik, Noel Cressie

    Abstract: The natural cycles of the surface-to-atmosphere fluxes of carbon dioxide (CO$_2$) and other important greenhouse gases are changing in response to human influences. These changes need to be quantified to understand climate change and its impacts, but this is difficult to do because natural fluxes occur over large spatial and temporal scales. To infer trends in fluxes and identify phase shifts and…

    Submitted 19 October, 2022; originally announced October 2022.

  37. arXiv:2210.10003

    stat.AP math.OC stat.ML

    $k$-Means Clustering for Persistent Homology

    Authors: Yueqi Cao, Prudence Leung, Anthea Monod

    Abstract: Persistent homology is a methodology central to topological data analysis that extracts and summarizes the topological features within a dataset as a persistence diagram; it has recently gained much popularity from its myriad successful applications to many domains. However, its algebraic construction induces a metric space of persistence diagrams with a highly complex geometry. In this paper, we…

    Submitted 25 November, 2023; v1 submitted 18 October, 2022; originally announced October 2022.

    Comments: 21 pages, 6 figures

  38. arXiv:2208.09897

    math.ST cs.LG stat.ML

    Multiple Descent in the Multiple Random Feature Model

    Authors: Xuran Meng, Jianfeng Yao, Yuan Cao

    Abstract: Recent works have demonstrated a double descent phenomenon in over-parameterized learning. Although this phenomenon has been investigated by recent works, it has not been fully understood in theory. In this paper, we investigate the multiple descent phenomenon in a class of multi-component prediction models. We first consider a "double random feature model" (DRFM) concatenating two types of rand…

    Submitted 10 October, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

    Comments: 89 pages, 9 figures. Version 3 adds new description of triple descent in certain double random feature model, deletes the discussion of NTK regimes, and adds more literature references

    MSC Class: 62R07

  39. A Geometric Condition for Uniqueness of Fréchet Means of Persistence Diagrams

    Authors: Yueqi Cao, Anthea Monod

    Abstract: The Fréchet mean is an important statistical summary and measure of centrality of data; it has been defined and studied for persistent homology captured by persistence diagrams. However, the complicated geometry of the space of persistence diagrams implies that the Fréchet mean for a given set of persistence diagrams is not necessarily unique, which prohibits theoretical guarantees for empirical m…

    Submitted 2 January, 2025; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: 18 pages, 4 figures

  40. arXiv:2206.09908

    math.ST stat.ML

    Learning Optimal Flows for Non-Equilibrium Importance Sampling

    Authors: Yu Cao, Eric Vanden-Eijnden

    Abstract: Many applications in computational sciences and statistical inference require the computation of expectations with respect to complex high-dimensional distributions with unknown normalization constants, as well as the estimation of these constants. Here we develop a method to perform these calculations based on generating samples from a simple base distribution, transporting them by the flow gener…

    Submitted 24 October, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

  41. arXiv:2205.10833

    stat.AP

    Privacy Protection for Youth Risk Behavior Using Bayesian Data Synthesis: A Case Study to the YRBS

    Authors: Yixiao Cao, Jingchen Hu

    Abstract: The large number of publicly available survey datasets of wide variety, albeit useful, raise respondent-level privacy concerns. The synthetic data approach to data privacy and confidentiality has been shown useful in terms of privacy protection and utility preservation. This paper aims at illustrating how synthetic data can facilitate the dissemination of highly sensitive information about youth r…

    Submitted 22 May, 2022; originally announced May 2022.

  42. arXiv:2204.09155

    stat.ML cs.LG math.ST stat.ME

    Approximating Persistent Homology for Large Datasets

    Authors: Yueqi Cao, Anthea Monod

    Abstract: Persistent homology is an important methodology from topological data analysis which adapts theory from algebraic topology to data settings and has been successfully implemented in many applications. It produces a statistical summary in the form of a persistence diagram, which captures the shape and size of the data. Despite its widespread use, persistent homology is simply impossible to implement…

    Submitted 18 May, 2022; v1 submitted 19 April, 2022; originally announced April 2022.

    Comments: 24 pages, 9 figures

  43. arXiv:2202.06526

    cs.LG math.OC stat.ML

    Benign Overfitting in Two-layer Convolutional Neural Networks

    Authors: Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu

    Abstract: Modern neural networks often have great expressive power and can be trained to overfit the training data, while still achieving a good test performance. This phenomenon is referred to as "benign overfitting". Recently, a line of work has emerged studying "benign overfitting" from the theoretical perspective. However, they are limited to linear models or kernel/random feature models, and there i…

    Submitted 14 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: 42 pages, 1 figure. Version 3 improves the presentation and adds a comparison with a concurrent work

  44. arXiv:2112.15250

    cs.LG math.OC stat.ML

    Benign Overfitting in Adversarially Robust Linear Classification

    Authors: Jinghui Chen, Yuan Cao, Quanquan Gu

    Abstract: "Benign overfitting", where classifiers memorize noisy training data yet still achieve a good generalization performance, has drawn great attention in the machine learning community. To explain this surprising phenomenon, a series of works have provided theoretical justification in over-parameterized linear regression, classification, and kernel methods. However, it is not clear if benign overfitt…

    Submitted 30 December, 2021; originally announced December 2021.

    Comments: 24 pages, 5 figures

  45. arXiv:2110.15253  [pdf, other]

    cs.LG stat.ML

    Understanding How Encoder-Decoder Architectures Attend

    Authors: Kyle Aitken, Vinay V Ramasesh, Yuan Cao, Niru Maheswaranathan

    Abstract: Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However, the mechanisms used by networks to generate appropriate attention matrices are still mysterious. Moreover, how these mechanisms vary depending on the particular…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: 10+14 pages, 16 figures. NeurIPS 2021

  46. arXiv:2108.11371  [pdf, other]

    cs.LG math.OC stat.ML

    Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization

    Authors: Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu

    Abstract: Adaptive gradient methods such as Adam have gained increasing popularity in deep learning optimization. However, it has been observed that compared with (stochastic) gradient descent, Adam can converge to a different solution with a significantly worse test error in many deep learning applications such as image classification, even with a fine-tuned regularization. In this paper, we provide a theo…

    Submitted 25 August, 2021; originally announced August 2021.

    Comments: 42 pages, 2 figures and 1 table

  47. arXiv:2106.08864  [pdf, other]

    cs.LG stat.ML

    Multi-Class Classification from Single-Class Data with Confidences

    Authors: Yuzhou Cao, Lei Feng, Senlin Shu, Yitian Xu, Bo An, Gang Niu, Masashi Sugiyama

    Abstract: Can we learn a multi-class classifier from only data of a single class? We show that without any assumptions on the loss functions, models, and optimizers, we can successfully learn a multi-class classifier from only data of a single class with a rigorous consistency guarantee when confidences (i.e., the class-posterior probabilities for all the classes) are available. Specifically, we propose an…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 23 pages, 1 figure

  48. arXiv:2106.08360  [pdf, other]

    stat.ME

    Multi-sample estimation of centered log-ratio matrix in microbiome studies

    Authors: Yezheng Li, Hongzhe Li, Yuanpei Cao

    Abstract: In microbiome studies, one of the ways of studying bacterial abundances is to estimate bacterial composition based on the sequencing read counts. Various transformations are then applied to such compositional data for downstream statistical analysis, among which the centered log-ratio (clr) transformation is most commonly used. Due to limited sequencing depth and DNA dropouts, many rare bacteria…

    Submitted 15 June, 2021; originally announced June 2021.

  49. arXiv:2104.13628  [pdf, other]

    cs.LG math.ST stat.ML

    Risk Bounds for Over-parameterized Maximum Margin Classification on Sub-Gaussian Mixtures

    Authors: Yuan Cao, Quanquan Gu, Mikhail Belkin

    Abstract: Modern machine learning systems such as deep neural networks are often highly over-parameterized so that they can fit the noisy training data exactly, yet they can still achieve small test errors in practice. In this paper, we study this "benign overfitting" phenomenon of the maximum margin classifier for linear classification problems. Specifically, we consider data generated from sub-Gaussian mi…

    Submitted 2 January, 2022; v1 submitted 28 April, 2021; originally announced April 2021.

    Comments: 27 pages, 3 figures. In NeurIPS 2021

  50. arXiv:2104.01672  [pdf, other]

    stat.ML cs.LG math.AT

    Topological Information Retrieval with Dilation-Invariant Bottleneck Comparative Measures

    Authors: Yueqi Cao, Athanasios Vlontzos, Luca Schmidtke, Bernhard Kainz, Anthea Monod

    Abstract: Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval; recently, this has been achieved by embedding the graphical structure of the database into a manifold in a hierarchy-preserving manner using a variety of metrics. Persistent homology is a tool commonly used in topological data analysis that is able to rigorously c…

    Submitted 6 July, 2022; v1 submitted 4 April, 2021; originally announced April 2021.

    Comments: 29 pages, 10 figures, 4 tables

    MSC Class: 68P15; 68P20; 55N31

    Journal ref: Information and Inference: A Journal of the IMA, Volume 12, Issue 3 (2023)