Showing 1–10 of 10 results for author: Zhang, H R

Searching in archive stat.
  1. arXiv:2409.06091  [pdf, other]

    cs.LG cs.AI cs.SI stat.ML

    Scalable Multitask Learning Using Gradient-based Estimation of Task Affinity

    Authors: Dongyue Li, Aneesh Sharma, Hongyang R. Zhang

    Abstract: Multitask learning is a widely used paradigm for training models on diverse tasks, with applications ranging from graph neural networks to language model fine-tuning. Since tasks may interfere with each other, a key notion for modeling their relationships is task affinity. This includes pairwise task affinity, computed among pairs of tasks, and higher-order affinity, computed among subsets of task…

    Submitted 20 November, 2024; v1 submitted 9 September, 2024; originally announced September 2024.

    Comments: 16 pages. Appeared in KDD 2024

  2. arXiv:2306.08553  [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach

    Authors: Hongyang R. Zhang, Dongyue Li, Haotian Ju

    Abstract: The training of over-parameterized neural networks has received much study in recent literature. An important consideration is the regularization of over-parameterized networks due to their highly nonconvex and nonlinear geometry. In this paper, we study noise injection algorithms, which can regularize the Hessian of the loss, leading to regions with flat loss surfaces. Specifically, by injecting…

    Submitted 23 September, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: 39 pages

    Journal ref: Trans. Mach. Learn. Res. 2024

  3. arXiv:2303.14582  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Identification of Negative Transfers in Multitask Learning Using Surrogate Models

    Authors: Dongyue Li, Huy L. Nguyen, Hongyang R. Zhang

    Abstract: Multitask learning is widely used in practice to train a low-resource target task by augmenting it with multiple related source tasks. Yet, naively combining all the source tasks with a target task does not always improve the prediction performance for the target task due to negative transfers. Thus, a critical problem in multitask learning is identifying subsets of source tasks that would benefit…

    Submitted 27 December, 2023; v1 submitted 25 March, 2023; originally announced March 2023.

    Comments: 30 pages. Appeared in TMLR'23

  4. arXiv:2302.04451  [pdf, other]

    cs.LG cs.SI math.ST stat.ML

    Generalization in Graph Neural Networks: Improved PAC-Bayesian Bounds on Graph Diffusion

    Authors: Haotian Ju, Dongyue Li, Aneesh Sharma, Hongyang R. Zhang

    Abstract: Graph neural networks are widely used tools for graph prediction tasks. Motivated by their empirical performance, prior works have developed generalization bounds for graph neural networks, which scale with graph structures in terms of the maximum degree. In this paper, we present generalization bounds that instead scale with the largest singular value of the graph neural network's feature diffusi…

    Submitted 23 October, 2023; v1 submitted 9 February, 2023; originally announced February 2023.

    Comments: 36 pages. Appeared in AISTATS 2023

  5. arXiv:2206.02659  [pdf, other]

    cs.LG cs.CV math.ST stat.ML

    Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees

    Authors: Haotian Ju, Dongyue Li, Hongyang R. Zhang

    Abstract: We consider fine-tuning a pretrained deep neural network on a target task. We study the generalization properties of fine-tuning to understand the problem of overfitting, which has often been observed (e.g., when the target dataset is small or when the training labels are noisy). Existing generalization measures for deep networks depend on notions such as distance from the initialization (i.e., th…

    Submitted 22 December, 2023; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: 38 pages. Appeared in ICML 2022

  6. arXiv:2111.04578  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Improved Regularization and Robustness for Fine-tuning in Neural Networks

    Authors: Dongyue Li, Hongyang R. Zhang

    Abstract: A widely used algorithm for transfer learning is fine-tuning, where a pre-trained model is fine-tuned on a target task with a small amount of labeled data. When the capacity of the pre-trained model is significantly larger than the size of the target dataset, fine-tuning is prone to overfitting and memorizing the training labels. Hence, a crucial question is to regularize fine-tuning and ensure it…

    Submitted 13 August, 2025; v1 submitted 8 November, 2021; originally announced November 2021.

    Comments: 22 pages. Appeared in NeurIPS'21

  7. arXiv:2010.11750  [pdf, ps, other]

    stat.ML cs.LG

    Precise High-Dimensional Asymptotics for Quantifying Heterogeneous Transfers

    Authors: Fan Yang, Hongyang R. Zhang, Sen Wu, Christopher Ré, Weijie J. Su

    Abstract: The problem of learning one task using samples from another task is central to transfer learning. In this paper, we focus on answering the following question: when does combining the samples from two related tasks perform better than learning with one target task alone? This question is motivated by an empirical phenomenon known as negative transfer, which has been observed in practice. While the…

    Submitted 9 June, 2025; v1 submitted 22 October, 2020; originally announced October 2020.

    Comments: 88 pages

    Journal ref: J. Mach. Learn. Res. 26 (2025)

  8. arXiv:2007.04596  [pdf, ps, other]

    cs.LG math.OC stat.ML

    Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK

    Authors: Yuanzhi Li, Tengyu Ma, Hongyang R. Zhang

    Abstract: We consider the dynamic of gradient descent for learning a two-layer neural network. We assume the input $x\in\mathbb{R}^d$ is drawn from a Gaussian distribution and the label of $x$ satisfies $f^{\star}(x) = a^{\top}|W^{\star}x|$, where $a\in\mathbb{R}^d$ is a nonnegative vector and $W^{\star} \in\mathbb{R}^{d\times d}$ is an orthonormal matrix. We show that an over-parametrized two-layer neural…

    Submitted 9 July, 2020; originally announced July 2020.

    Comments: Conference on Learning Theory (COLT) 2020

  9. arXiv:2005.00695  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    On the Generalization Effects of Linear Transformations in Data Augmentation

    Authors: Sen Wu, Hongyang R. Zhang, Gregory Valiant, Christopher Ré

    Abstract: Data augmentation is a powerful technique to improve performance in applications such as image and text classification tasks. Yet, there is little rigorous understanding of why and how various augmentations work. In this work, we consider a family of linear transformations and study their effects on the ridge estimator in an over-parametrized linear regression setting. First, we show that transfor…

    Submitted 26 July, 2023; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: 22 pages. Appeared in ICML 2020

  10. arXiv:1811.00148  [pdf, other]

    cs.LG cs.DS stat.ML

    Recovery Guarantees for Quadratic Tensors with Sparse Observations

    Authors: Hongyang R. Zhang, Vatsal Sharan, Moses Charikar, Yingyu Liang

    Abstract: We consider the tensor completion problem of predicting the missing entries of a tensor. The commonly used CP model has a triple product form, but an alternate family of quadratic models, which are the sum of pairwise products instead of a triple product, have emerged from applications such as recommendation systems. Non-convex methods are the method of choice for learning quadratic models, and th…

    Submitted 29 July, 2023; v1 submitted 31 October, 2018; originally announced November 2018.

    Comments: 16 pages. Appeared in AISTATS 2019