
Showing 1–20 of 20 results for author: Ruan, F

Searching in archive stat.
  1. arXiv:2502.11665

    stat.ML cs.LG math.CA math.FA math.OC

    On the kernel learning problem

    Authors: Yang Li, Feng Ruan

    Abstract: The classical kernel ridge regression problem aims to find the best fit for the output $Y$ as a function of the input data $X\in \mathbb{R}^d$, with a fixed choice of regularization term imposed by a given choice of a reproducing kernel Hilbert space, such as a Sobolev space. Here we consider a generalization of the kernel ridge regression problem, by introducing an extra matrix parameter $U$, whi…

    Submitted 8 June, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: 61 pages

  2. arXiv:2411.13868

    stat.ME cs.CL cs.LG math.ST stat.ML

    Robust Detection of Watermarks for Large Language Models Under Human Edits

    Authors: Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J. Su

    Abstract: Watermarking has offered an effective approach to distinguishing text generated by large language models (LLMs) from human-written text. However, the pervasive presence of human edits on LLM-generated text dilutes watermark signals, thereby significantly degrading detection performance of existing methods. In this paper, by modeling human edits through mixture model detection, we introduce a new m…

    Submitted 27 June, 2025; v1 submitted 21 November, 2024; originally announced November 2024.

  3. arXiv:2406.06903

    stat.ML cs.LG math.ST

    On the Limitation of Kernel Dependence Maximization for Feature Selection

    Authors: Keli Liu, Feng Ruan

    Abstract: A simple and intuitive method for feature selection consists of choosing the feature subset that maximizes a nonparametric measure of dependence between the response and the features. A popular proposal from the literature uses the Hilbert-Schmidt Independence Criterion (HSIC) as the nonparametric dependence measure. The rationale behind this approach to feature selection is that important feature…

    Submitted 10 June, 2024; originally announced June 2024.

  4. arXiv:2405.10289

    math.OC math.ST stat.ML

    Subgradient Selection Convergence Implies Uniform Subdifferential Set Convergence: And Other Tight Convergences Rates in Stochastic Convex Composite Minimization

    Authors: Feng Ruan

    Abstract: In nonsmooth, nonconvex stochastic optimization, understanding the uniform convergence of subdifferential mappings is crucial for analyzing stationary points of sample average approximations of risk as they approach the population risk. Yet, characterizing this convergence remains a fundamental challenge. This work introduces a novel perspective by connecting the uniform convergence of subdifferen…

    Submitted 22 December, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: This revision extends Theorem 1 to locally Lipschitz cases and eliminates numerous typos

  5. arXiv:2404.03900

    stat.ML cs.AI cs.LG cs.NE

    Nonparametric Modern Hopfield Models

    Authors: Jerry Yao-Chieh Hu, Bo-Yu Chen, Dennis Wu, Feng Ruan, Han Liu

    Abstract: We present a nonparametric interpretation for deep learning compatible modern Hopfield models and utilize this new perspective to debut efficient variants. Our key contribution stems from interpreting the memory storage and retrieval processes in modern Hopfield models as a nonparametric regression problem subject to a set of query-memory pairs. Interestingly, our framework not only recovers the k…

    Submitted 8 June, 2025; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted at ICML 2025. Code available at https://github.com/MAGICS-LAB/NonparametricHopfield. v2 matches with camera-ready version

  6. arXiv:2404.01245

    math.ST cs.CL cs.CR cs.LG stat.ML

    A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

    Authors: Xiang Li, Feng Ruan, Huiyuan Wang, Qi Long, Weijie J. Su

    Abstract: Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable statistical signals into text generated by large language models (LLMs), also known as watermarking, has been used as a principled approach to provable detection of LLM-generated text from its human-written counterpart. In this paper, we introduce a general and flexible framework for reasoning about the statistical effi…

    Submitted 1 December, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: To appear in the Annals of Statistics

  7. arXiv:2310.11736

    math.ST math.OC stat.ML

    Layered Models can "Automatically" Regularize and Discover Low-Dimensional Structures via Feature Learning

    Authors: Yunlu Chen, Yang Li, Keli Liu, Feng Ruan

    Abstract: Layered models like neural networks appear to extract key features from data through empirical risk minimization, yet the theoretical understanding for this process remains unclear. Motivated by these observations, we study a two-layer nonparametric regression model where the input undergoes a linear transformation followed by a nonlinear mapping to predict the output, mirroring the structure of t…

    Submitted 30 January, 2025; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Updated title. Revised introduction. Restructured proofs. Added real data experiments. Fixed typos

  8. arXiv:2310.00176

    math.ST stat.ML

    Universality of max-margin classifiers

    Authors: Andrea Montanari, Feng Ruan, Basil Saeed, Youngtak Sohn

    Abstract: Maximum margin binary classification is one of the most fundamental algorithms in machine learning, yet the role of featurization maps and the high-dimensional asymptotics of the misclassification error for non-Gaussian features are still poorly understood. We consider settings in which we observe binary labels $y_i$ and either $d$-dimensional covariates ${\boldsymbol z}_i$ that are mapped to a…

    Submitted 29 September, 2023; originally announced October 2023.

    Comments: 60 pages

  9. arXiv:2208.09793

    stat.ML cs.AI stat.AP

    FastCPH: Efficient Survival Analysis for Neural Networks

    Authors: Xuelin Yang, Louis Abraham, Sejin Kim, Petr Smirnov, Feng Ruan, Benjamin Haibe-Kains, Robert Tibshirani

    Abstract: The Cox proportional hazards model is a canonical method in survival analysis for prediction of the life expectancy of a patient given clinical or genetic covariates -- it is a linear model in its original form. In recent years, several methods have been proposed to generalize the Cox model to neural networks, but none of these are both numerically correct and computationally efficient. We propose…

    Submitted 20 August, 2022; originally announced August 2022.

  10. arXiv:2110.05852

    stat.ML cs.LG math.ST

    On the Self-Penalization Phenomenon in Feature Selection

    Authors: Michael I. Jordan, Keli Liu, Feng Ruan

    Abstract: We describe an implicit sparsity-inducing mechanism based on minimization over a family of kernels: \begin{equation*} \min_{β, f}~\widehat{\mathbb{E}}[L(Y, f(β^{1/q} \odot X))] + λ_n \|f\|_{\mathcal{H}_q}^2~~\text{subject to}~~β\ge 0, \end{equation*} where $L$ is the loss, $\odot$ is coordinate-wise multiplication and $\mathcal{H}_q$ is the reproducing kernel Hilbert space based on the kernel…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: 54 pages

  11. arXiv:2106.09387

    math.ST stat.ME stat.ML

    Taming Nonconvexity in Kernel Feature Selection -- Favorable Properties of the Laplace Kernel

    Authors: Feng Ruan, Keli Liu, Michael I. Jordan

    Abstract: Kernel-based feature selection is an important tool in nonparametric statistics. Despite many practical applications of kernel-based feature selection, there is little statistical theory available to support the method. A core challenge is that the objective functions of the optimization problems used to define kernel-based feature selection are nonconvex. The literature has only studied the statistical…

    Submitted 25 May, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

    Comments: 26 pages main text; 74 pages total; appendix rewritten (typo fixed; proof structure reorganized)

  12. arXiv:2012.07348

    cs.LG cs.GT cs.MA stat.ML

    Bandit Learning in Decentralized Matching Markets

    Authors: Lydia T. Liu, Feng Ruan, Horia Mania, Michael I. Jordan

    Abstract: We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience. Also, we assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple player s…

    Submitted 21 June, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

    Comments: 34 pages

  13. arXiv:2011.12215

    stat.ME cs.LG math.ST

    A Self-Penalizing Objective Function for Scalable Interaction Detection

    Authors: Keli Liu, Feng Ruan

    Abstract: We tackle the problem of nonparametric variable selection with a focus on discovering interactions between variables. With $p$ variables there are $O(p^s)$ possible order-$s$ interactions making exhaustive search infeasible. It is nonetheless possible to identify the variables involved in interactions with only linear computation cost, $O(p)$. The trick is to maximize a class of parametrized nonpa…

    Submitted 12 December, 2020; v1 submitted 24 November, 2020; originally announced November 2020.

    Comments: 34 pages; the Appendix can be found on the authors' personal websites (the url is in the pdf)

  14. arXiv:2003.07337

    stat.ML cs.LG math.OC

    Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis

    Authors: Koulik Khamaru, Ashwin Pananjady, Feng Ruan, Martin J. Wainwright, Michael I. Jordan

    Abstract: We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms. Theory-inspired simulations s…

    Submitted 16 March, 2020; originally announced March 2020.

    Comments: 38 pages, 3 figures

  15. arXiv:1911.01544

    math.ST stat.ML

    The generalization error of max-margin linear classifiers: Benign overfitting and high dimensional asymptotics in the overparametrized regime

    Authors: Andrea Montanari, Feng Ruan, Youngtak Sohn, Jun Yan

    Abstract: Modern machine learning classifiers often exhibit vanishing classification error on the training set. They achieve this by learning nonlinear representations of the inputs that map the data into linearly separable classes. Motivated by these phenomena, we revisit high-dimensional maximum margin classification for linearly separable data. We consider a stylized setting in which data…

    Submitted 22 March, 2023; v1 submitted 4 November, 2019; originally announced November 2019.

    Comments: 97 pages; 12 pdf figures (A major revision; changed the title and included a new result on benign overfitting)

  16. arXiv:1907.12207

    stat.ML cs.LG

    LassoNet: A Neural Network with Feature Sparsity

    Authors: Ismael Lemhadri, Feng Ruan, Louis Abraham, Robert Tibshirani

    Abstract: Much work has been done recently to make neural networks more interpretable, and one obvious approach is to arrange for the network to use only a subset of the available features. In linear models, Lasso (or $\ell_1$-regularized) regression assigns zero weights to the most irrelevant or redundant features, and is widely used in data science. However the Lasso only applies to linear models. Here we…

    Submitted 16 June, 2021; v1 submitted 29 July, 2019; originally announced July 2019.

    Comments: 18 pages, 10 figures. arXiv admin note: text overlap with arXiv:1901.09346 by other authors

    Journal ref: Journal of Machine Learning Research 22 (2021) 1-29

  17. arXiv:1810.02954

    math.ST cs.IT stat.ME

    Adapting to Unknown Noise Distribution in Matrix Denoising

    Authors: Andrea Montanari, Feng Ruan, Jun Yan

    Abstract: We consider the problem of estimating an unknown matrix $\boldsymbol{X}\in {\mathbb R}^{m\times n}$, from observations $\boldsymbol{Y} = \boldsymbol{X}+\boldsymbol{W}$ where $\boldsymbol{W}$ is a noise matrix with independent and identically distributed entries, as to minimize estimation error measured in operator norm. Assuming that the underlying signal $\boldsymbol{X}$ is low-rank and incoheren…

    Submitted 4 November, 2018; v1 submitted 6 October, 2018; originally announced October 2018.

  18. arXiv:1612.05612

    math.ST math.OC stat.ML

    Asymptotic Optimality in Stochastic Optimization

    Authors: John Duchi, Feng Ruan

    Abstract: We study local complexity measures for stochastic convex optimization problems, providing a local minimax theory analogous to that of Hájek and Le Cam for classical statistical problems. We give complementary optimality results, developing fully online methods that adaptively achieve optimal convergence guarantees. Our results provide function-specific lower bounds and convergence results that mak…

    Submitted 2 November, 2018; v1 submitted 16 December, 2016; originally announced December 2016.

    Journal ref: Annals of Statistics 2019

  19. arXiv:1604.05910

    stat.AP

    A Tutorial on Libra: R package for the Linearized Bregman Algorithm in High Dimensional Statistics

    Authors: Jiechao Xiong, Feng Ruan, Yuan Yao

    Abstract: The R package, Libra, stands for the LInearized BRegman Algorithm in high dimensional statistics. The Linearized Bregman Algorithm is a simple iterative procedure to generate sparse regularization paths of model estimation, which were first discovered in applied mathematics for image restoration and particularly suitable for parallel implementation in large scale problems. The limit of such an al…

    Submitted 20 April, 2016; originally announced April 2016.

  20. Sparse Recovery via Differential Inclusions

    Authors: Stanley Osher, Feng Ruan, Jiechao Xiong, Yuan Yao, Wotao Yin

    Abstract: In this paper, we recover sparse signals from their noisy linear measurements by solving nonlinear differential inclusions, which is based on the notion of inverse scale space (ISS) developed in applied mathematics. Our goal here is to bring this idea to address a challenging problem in statistics, \emph{i.e.} finding the oracle estimator which is unbiased and sign-consistent using dynamics. We ca…

    Submitted 21 January, 2016; v1 submitted 30 June, 2014; originally announced June 2014.

    Comments: In Applied and Computational Harmonic Analysis, 2016

    Report number: CAM Report 14-61
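Entry 3 (arXiv:2406.06903) describes feature selection by choosing the feature subset that maximizes the Hilbert-Schmidt Independence Criterion (HSIC) between the response and the features. The paper studies the limitations of this rule. The code below is a minimal pure-Python sketch of the rule itself. It assumes a Gaussian kernel with a shared bandwidth on both features and response, the biased empirical estimator HSIC = trace(K H L H) / n^2 with centering matrix H = I - (1/n) 11^T, and exhaustive search over subsets of a fixed size. The kernel, bandwidth, normalization, and the helper names (gaussian_gram, center, hsic, select_features) are illustrative assumptions, not taken from the paper.

```python
import itertools
import math


def gaussian_gram(rows, bandwidth=1.0):
    """Gram matrix K[i][j] = exp(-||r_i - r_j||^2 / (2 * bandwidth^2)) (assumed kernel)."""
    n = len(rows)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
                      / (2.0 * bandwidth ** 2))
             for j in range(n)] for i in range(n)]


def center(gram):
    """Return H K H with H = I - (1/n) 11^T, i.e. double centering of the Gram matrix."""
    n = len(gram)
    row_means = [sum(r) / n for r in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n)] for i in range(n)]


def hsic(x_rows, y_rows, bandwidth=1.0):
    """Biased empirical HSIC: trace(K H L H) / n^2 = trace((H K H) L) / n^2."""
    n = len(x_rows)
    kc = center(gaussian_gram(x_rows, bandwidth))
    l = gaussian_gram(y_rows, bandwidth)
    # trace(A L) = sum_{i,j} A[i][j] * L[j][i]
    return sum(kc[i][j] * l[j][i] for i in range(n) for j in range(n)) / n ** 2


def select_features(X, y, size, bandwidth=1.0):
    """Exhaustively pick the `size`-subset of columns of X with the largest HSIC against y."""
    p = len(X[0])
    y_rows = [[v] for v in y]
    best = None
    for subset in itertools.combinations(range(p), size):
        x_rows = [[row[j] for j in subset] for row in X]
        score = hsic(x_rows, y_rows, bandwidth)
        if best is None or score > best[0]:
            best = (score, subset)
    return best[1]
```

As a usage example on hypothetical toy data, let ts = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5], X = [[t, 0.0, 1.0] for t in ts], and y = [t * t for t in ts]. Then select_features(X, y, 1) returns (0,). A constant column produces an all-ones Gram matrix, which centers to zero, so both constant columns score exactly zero. Exhaustive search over subsets grows combinatorially in the number of features, so the sketch is only practical for small examples.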
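Entry 19 (arXiv:1604.05910) describes the Linearized Bregman Algorithm as a simple iterative procedure that generates sparse regularization paths. The code below is a minimal pure-Python sketch of one standard form of that iteration for a linear model y ≈ Xβ with squared loss:

    z ← z + (α/n)·Xᵀ(y − Xβ)
    β ← κ·shrink(z, 1)

Here shrink is soft-thresholding. The function names are hypothetical. The sketch also assumes that α is chosen so that α·κ·λ_max(XᵀX)/n stays below 2, a stability condition taken as given rather than derived here. This is not the Libra package's code or R interface.

```python
def shrink(v, threshold):
    """Soft-thresholding: sign(v) * max(|v| - threshold, 0)."""
    if v > threshold:
        return v - threshold
    if v < -threshold:
        return v + threshold
    return 0.0


def linearized_bregman_path(X, y, kappa, alpha, n_iter):
    """Return the iterates beta_0, ..., beta_{n_iter}, i.e. the regularization path.

    Assumed form: z += (alpha / n) * X^T (y - X beta); beta = kappa * shrink(z, 1).
    """
    n, p = len(X), len(X[0])
    z = [0.0] * p
    beta = [0.0] * p
    path = [list(beta)]
    for _ in range(n_iter):
        # residual r = y - X beta
        r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        # gradient step on the auxiliary variable z
        for j in range(p):
            z[j] += alpha / n * sum(X[i][j] * r[i] for i in range(n))
        # sparse primal update: coordinates with |z_j| <= 1 stay exactly zero
        beta = [kappa * shrink(zj, 1.0) for zj in z]
        path.append(list(beta))
    return path
```

Consider a hypothetical example with X equal to the 3×3 identity, y = [5.0, 0.0, -2.0], kappa = 10, alpha = 0.15, and 300 iterations, so that α·κ/n = 0.5. The last iterate approaches [5.0, 0.0, -2.0], and the middle coordinate stays exactly zero throughout. Coordinate 0 becomes nonzero several iterations before coordinate 2, which shows how larger signals enter the path earlier.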