
Showing 1–23 of 23 results for author: Long, P

Searching in archive math.
  1. arXiv:2312.09617

    math.CV

    On the symmetric $q$-analog on the bi-univalent functions with respect to symmetric points

    Authors: Pinhong Long, Huili Han, Halit Orhan, Huo Tang

    Abstract: Our objective is to introduce and investigate the subclass $\widetilde{\mathcal{S^{*}_{\sum}}}^η_{q}(μ,λ;φ)$ of the function class $\sum$ of analytic and bi-univalent functions related to the symmetric $q$-derivative operator and the generalized Bernardi integral operator. On the one hand, without the generalized Bernardi integral operator we estimate the second Hankel determinants for the reduced su…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 19 pages

    MSC Class: 30C45; 30C50

  2. arXiv:2210.01513

    cs.LG math.OC stat.ML

    The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima

    Authors: Peter L. Bartlett, Philip M. Long, Olivier Bousquet

    Abstract: We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method for deep networks that has exhibited performance improvements on image and language prediction problems. We show that when SAM is applied with a convex quadratic objective, for most random initializations it converges to a cycle that oscillates between either side of the minimum in the direction with the largest c…

    Submitted 11 April, 2023; v1 submitted 4 October, 2022; originally announced October 2022.

  3. arXiv:2209.09315

    cs.LG cs.AI math.ST stat.ML

    Deep Linear Networks can Benignly Overfit when Shallow Ones Do

    Authors: Niladri S. Chatterji, Philip M. Long

    Abstract: We bound the excess risk of interpolating deep linear networks trained using gradient flow. In a setting previously used to establish risk bounds for the minimum $\ell_2$-norm interpolant, we show that randomly initialized deep linear networks can closely approximate or even match known bounds for the minimum $\ell_2$-norm interpolant. Our analysis also reveals that interpolating deep linear model…

    Submitted 6 February, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

  4. arXiv:2110.02914

    stat.ML cs.LG math.ST

    Foolish Crowds Support Benign Overfitting

    Authors: Niladri S. Chatterji, Philip M. Long

    Abstract: We prove a lower bound on the excess risk of sparse interpolating procedures for linear regression with Gaussian data in the overparameterized regime. We apply this result to obtain a lower bound for basis pursuit (the minimum $\ell_1$-norm interpolant) that implies that its excess risk can converge at an exponentially slower rate than OLS (the minimum $\ell_2$-norm interpolant), even when the gro…

    Submitted 17 March, 2022; v1 submitted 6 October, 2021; originally announced October 2021.

  5. arXiv:2108.11489

    stat.ML cs.LG math.ST

    The Interplay Between Implicit Bias and Benign Overfitting in Two-Layer Linear Networks

    Authors: Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

    Abstract: The recent success of neural network models has shone light on a rather surprising statistical phenomenon: statistical models that perfectly fit noisy data can generalize well to unseen test data. Understanding this phenomenon of $\textit{benign overfitting}$ has attracted intense theoretical and empirical study. In this paper, we consider interpolating two-layer linear neural networks trained wit…

    Submitted 9 September, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: Accepted for publication at JMLR

  6. arXiv:2102.04998

    stat.ML cs.AI cs.LG math.OC

    When does gradient descent with logistic loss interpolate using deep networks with smoothed ReLU activations?

    Authors: Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

    Abstract: We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero, and prove bounds on the rate of convergence. Our analysis applies for smoothed approximations to the ReLU, such as Swish and the Huberized ReLU, proposed in previous applied work. We provide two sufficient conditions for convergence. The first is simply a bound on the loss at…

    Submitted 1 July, 2021; v1 submitted 9 February, 2021; originally announced February 2021.

  7. arXiv:2012.02409

    stat.ML cs.LG math.OC

    When does gradient descent with logistic loss find interpolating two-layer networks?

    Authors: Niladri S. Chatterji, Philip M. Long, Peter L. Bartlett

    Abstract: We study the training of finite-width two-layer smoothed ReLU networks for binary classification using the logistic loss. We show that gradient descent drives the training loss to zero if the initial loss is small enough. When the data satisfies certain cluster and separation conditions and the network is wide enough, we show that one step of gradient descent reduces the loss sufficiently that the…

    Submitted 1 July, 2021; v1 submitted 4 December, 2020; originally announced December 2020.

  8. arXiv:2010.08479

    stat.ML cs.LG math.ST

    Failures of model-dependent generalization bounds for least-norm interpolation

    Authors: Peter L. Bartlett, Philip M. Long

    Abstract: We consider bounds on the generalization performance of the least-norm linear regressor, in the over-parameterized regime where it can interpolate the data. We describe a sense in which any generalization bound of a type that is commonly proved in statistical learning theory must sometimes be very loose when applied to analyze the least-norm interpolant. In particular, for a variety of natural joi…

    Submitted 20 January, 2021; v1 submitted 16 October, 2020; originally announced October 2020.

    Journal ref: JMLR, 22(204):1-15, 2021

  9. arXiv:2004.12019

    stat.ML cs.LG math.ST

    Finite-sample Analysis of Interpolating Linear Classifiers in the Overparameterized Regime

    Authors: Niladri S. Chatterji, Philip M. Long

    Abstract: We prove bounds on the population risk of the maximum margin algorithm for two-class linear classification. For linearly separable training data, the maximum margin algorithm has been shown in previous work to be equivalent to a limit of training with logistic loss using gradient descent, as the training error is driven to zero. We analyze this algorithm applied to random data including misclassif…

    Submitted 1 June, 2021; v1 submitted 24 April, 2020; originally announced April 2020.

    Comments: Corrected typographical errors from the previous version of this paper

  10. arXiv:2003.01094

    cs.LG math.OC stat.ML

    On the Global Convergence of Training Deep Linear ResNets

    Authors: Difan Zou, Philip M. Long, Quanquan Gu

    Abstract: We study the convergence of gradient descent (GD) and stochastic gradient descent (SGD) for training $L$-hidden-layer linear residual networks (ResNets). We prove that for training deep residual networks with certain linear transformations at input and output layers, which are fixed throughout training, both GD and SGD with zero initialization on all hidden weights can converge to the global minim…

    Submitted 2 March, 2020; originally announced March 2020.

    Comments: 26 pages, 1 figure. In ICLR 2020

  11. arXiv:2002.00291

    stat.ML cs.LG math.ST

    Oracle Lower Bounds for Stochastic Gradient Sampling Algorithms

    Authors: Niladri S. Chatterji, Peter L. Bartlett, Philip M. Long

    Abstract: We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove an information theoretic lower bound on the number of stochastic gradient queries of the log density needed. Several popular sampling algorithms (including many Markov chain Monte Carlo methods) operate by using stochastic gradients of the log density to generate a sample; our results establish an…

    Submitted 3 July, 2021; v1 submitted 1 February, 2020; originally announced February 2020.

    Comments: 21 pages; accepted for publication at Bernoulli

  12. arXiv:1906.11300

    stat.ML cs.LG math.ST

    Benign Overfitting in Linear Regression

    Authors: Peter L. Bartlett, Philip M. Long, Gábor Lugosi, Alexander Tsigler

    Abstract: The phenomenon of benign overfitting is one of the key mysteries uncovered by deep learning methodology: deep neural networks seem to predict well, even with a perfect fit to noisy training data. Motivated by this phenomenon, we consider when a perfect fit to training data in linear regression is compatible with accurate prediction. We give a characterization of linear regression problems for whic…

    Submitted 29 January, 2020; v1 submitted 26 June, 2019; originally announced June 2019.

  13. arXiv:1905.12600

    cs.LG cs.AI cs.NE math.ST stat.ML

    Generalization bounds for deep convolutional neural networks

    Authors: Philip M. Long, Hanie Sedghi

    Abstract: We prove bounds on the generalization error of convolutional networks. The bounds are in terms of the training loss, the number of parameters, the Lipschitz constant of the loss and the distance from the weights to the initial weights. They are independent of the number of pixels in the input, and the height and width of hidden feature maps. We present experiments using CIFAR-10 with varying hyper…

    Submitted 8 April, 2020; v1 submitted 29 May, 2019; originally announced May 2019.

    Comments: Published as a conference paper at ICLR 2020

  14. arXiv:1901.02104

    cs.LG cs.AI cs.NE math.ST stat.ML

    On the effect of the activation function on the distribution of hidden nodes in a deep network

    Authors: Philip M. Long, Hanie Sedghi

    Abstract: We analyze the joint probability distribution on the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to Gaussian distributions, and the input is in $\{ -1, 1\}^N$. We show that, if the activation function $φ$ satisfies a minimal set of assumptions, satisfied by all activation functions that…

    Submitted 7 January, 2019; originally announced January 2019.

  15. arXiv:1811.06702

    math.AP math.FA

    A characterization of rough fractional type integral operators and Campanato estimates for their commutators on the variable exponent vanishing generalized Morrey spaces

    Authors: Ferit Gürbüz, Shenghu Ding, Huili Han, Pinhong Long

    Abstract: In this paper, applying some properties of variable exponent analysis, we first dwell on Adams and Spanne type estimates for a class of fractional type integral operators of variable orders, respectively, and then obtain variable exponent generalized Campanato estimates for the corresponding commutators on the vanishing generalized Morrey spaces…

    Submitted 16 November, 2018; originally announced November 2018.

    Comments: 26 pages

    MSC Class: 42B20; 42B35; 46E30

  16. arXiv:1807.07013

    cs.DS cs.LG math.ST

    Learning Sums of Independent Random Variables with Sparse Collective Support

    Authors: Anindya De, Philip M. Long, Rocco A. Servedio

    Abstract: We study the learnability of sums of independent integer random variables given a bound on the size of the union of their supports. For $\mathcal{A} \subset \mathbf{Z}_{+}$, a sum of independent random variables with collective support $\mathcal{A}$ (called an $\mathcal{A}$-sum in this paper) is a distribution $\mathbf{S} = \mathbf{X}_1 + \cdots + \mathbf{X}_N$ where the $\mathbf{X}_i$'s are mutu…

    Submitted 12 November, 2020; v1 submitted 18 July, 2018; originally announced July 2018.

    Comments: Conference version in FOCS'18; Journal version to appear in JMLR

  17. arXiv:1804.05012

    cs.LG cs.AI cs.NE math.ST stat.ML

    Representing smooth functions as compositions of near-identity functions with implications for deep network optimization

    Authors: Peter L. Bartlett, Steven N. Evans, Philip M. Long

    Abstract: We show that any smooth bi-Lipschitz $h$ can be represented exactly as a composition $h_m \circ ... \circ h_1$ of functions $h_1,...,h_m$ that are close to the identity in the sense that each $\left(h_i-\mathrm{Id}\right)$ is Lipschitz, and the Lipschitz constant decreases inversely with the number $m$ of functions composed. This implies that $h$ can be represented to any accuracy by a deep residu…

    Submitted 16 April, 2018; v1 submitted 13 April, 2018; originally announced April 2018.

  18. arXiv:1802.06093

    cs.LG cs.NE math.OC math.ST stat.ML

    Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks

    Authors: Peter L. Bartlett, David P. Helmbold, Philip M. Long

    Abstract: We analyze algorithms for approximating a function $f(x) = Φx$ mapping $\Re^d$ to $\Re^d$ using deep linear neural networks, i.e. that learn a function $h$ parameterized by matrices $Θ_1,...,Θ_L$ and defined by $h(x) = Θ_L Θ_{L-1} ... Θ_1 x$. We focus on algorithms that learn through gradient descent on the population quadratic loss in the case that the distribution over the inputs is isotropic.…

    Submitted 18 June, 2018; v1 submitted 16 February, 2018; originally announced February 2018.

  19. arXiv:1801.09834

    stat.ME math.ST stat.ML

    A Flexible Procedure for Mixture Proportion Estimation in Positive-Unlabeled Learning

    Authors: Zhenfeng Lin, James P. Long

    Abstract: Positive-unlabeled (PU) learning considers two samples, a positive set P with observations from only one class and an unlabeled set U with observations from two classes. The goal is to classify observations in U. Class mixture proportion estimation (MPE) in U is a key step in PU learning. Blanchard et al. [2010] showed that MPE in PU learning is a generalization of the problem of estimating the p…

    Submitted 9 January, 2020; v1 submitted 29 January, 2018; originally announced January 2018.

    Comments: 28 pages (including 9 pages of Technical Notes), 4 figures, 1 table

  20. arXiv:1602.04484

    cs.LG cs.AI cs.NE math.ST stat.ML

    Surprising properties of dropout in deep networks

    Authors: David P. Helmbold, Philip M. Long

    Abstract: We analyze dropout in deep networks with rectified linear units and the quadratic loss. Our results expose surprising differences between the behavior of dropout and more traditional regularizers like weight decay. For example, on some simple data sets dropout training produces negative weights even though the output is the sum of the inputs. This provides a counterpoint to the suggestion that dro…

    Submitted 19 April, 2017; v1 submitted 14 February, 2016; originally announced February 2016.

  21. arXiv:1412.4736

    cs.LG cs.AI cs.NE math.ST stat.ML

    On the Inductive Bias of Dropout

    Authors: David P. Helmbold, Philip M. Long

    Abstract: Dropout is a simple but effective technique for learning in neural networks and other settings. A sound theoretical understanding of dropout is needed to determine when dropout should be applied and how to use it most effectively. In this paper we continue the exploration of dropout as a regularizer pioneered by Wager et al. We focus on linear classification where a convex proxy to the misclassif…

    Submitted 17 February, 2015; v1 submitted 15 December, 2014; originally announced December 2014.

    Journal ref: Journal of Machine Learning Research, 16, 3403-3454 (2015). (See http://jmlr.org/papers/volume16/helmbold15a/helmbold15a.pdf.)

  22. arXiv:1211.1082

    cs.LG math.ST stat.ML

    Active and passive learning of linear separators under log-concave distributions

    Authors: Maria Florina Balcan, Philip M. Long

    Abstract: We provide new results concerning label efficient, polynomial time, passive and active learning of linear separators. We prove that active learning provides an exponential improvement over PAC (passive) learning of homogeneous linear separators under nearly log-concave distributions. Building on this, we provide a computationally efficient PAC algorithm with optimal (up to a constant factor) sampl…

    Submitted 26 April, 2013; v1 submitted 5 November, 2012; originally announced November 2012.

  23. arXiv:1205.6081

    math.CA math.FA

    Criterions of Wiener type for minimally thin sets and rarefied sets associated with the stationary Schrödinger operator in a cone

    Authors: Pinhong Long, Zhiqiang Gao, Guantie Deng

    Abstract: In this paper we give some criteria for a-minimally thin sets and a-rarefied sets associated with the stationary Schrödinger operator at a fixed Martin boundary point or $\infty$ with respect to a cone. Moreover, we show that a positive superfunction on a cone behaves regularly outside an a-rarefied set. Finally, we illustrate the relation between a-minimally thin sets and a-rarefied sets in a cone.

    Submitted 28 May, 2012; originally announced May 2012.

    MSC Class: 31B05; 31B25; 31C35
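For entry 2 (arXiv:2210.01513), a minimal sketch of the SAM update on the one-dimensional quadratic $L(w) = \lambda w^2/2$ may help make the "oscillation across the minimum" concrete. It assumes the standard SAM step $w \leftarrow w - \eta \nabla L(w + \rho \nabla L(w)/|\nabla L(w)|)$; the function name and the constants are illustrative and not taken from the paper. In one dimension, with $0 < \eta\lambda < 1$, the normalized gradient is just the sign of $w$, and the iterates end up alternating between $\pm c$ with $c = \eta\lambda\rho/(2 - \eta\lambda)$ (solve $c = (1-\eta\lambda)(-c) + \eta\lambda\rho$).

```python
def sam_quadratic_1d(w0, lam, eta, rho, steps):
    """Run SAM on L(w) = lam * w**2 / 2 and return the list of iterates.

    Illustrative sketch (hypothetical helper): the gradient is lam * w, so for
    lam > 0 the normalized gradient is simply the sign of w.
    """
    w = w0
    iterates = [w]
    for _ in range(steps):
        grad = lam * w
        if grad == 0.0:
            break  # exactly at the minimum: the normalized gradient is undefined
        w_perturbed = w + rho * grad / abs(grad)  # ascent step of length rho
        w = w - eta * lam * w_perturbed           # descend using the gradient at the perturbed point
        iterates.append(w)
    return iterates
```

With `lam=1.0, eta=0.1, rho=0.05` and `w0=1.0`, the last iterates flip sign on every step and their magnitude approaches $c = 0.005/1.9$, rather than converging to the minimum at $0$.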
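Entries 3, 4 and 8 (arXiv:2209.09315, arXiv:2110.02914, arXiv:2010.08479) all refer to the minimum $\ell_2$-norm (least-norm) interpolant. When the $n \times d$ data matrix $X$ has $n \le d$ and full row rank, it has the closed form $\hat w = X^\top (X X^\top)^{-1} y$. The pure-Python sketch below computes it under that assumption; the helper names and the toy data are hypothetical, and in practice one would call a numerical library's pseudoinverse or least-squares routine instead of hand-written elimination.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (A square, nonsingular)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def min_norm_interpolant(X, y):
    """Return w = X^T (X X^T)^{-1} y, assuming X (n rows, d columns) has full row rank."""
    gram = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]  # X X^T
    alpha = solve(gram, y)
    d = len(X[0])
    # w is a combination of the rows of X, which is what makes its norm minimal.
    return [sum(alpha[i] * X[i][j] for i in range(len(X))) for j in range(d)]
```

For `X = [[1, 0, 1], [0, 1, 1]]` and `y = [1, 2]` this gives `w = [0, 1, 1]`, which fits both equations exactly and has squared norm 2, smaller than that of another interpolant such as `[1, 2, 0]` (squared norm 5).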
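For entry 18 (arXiv:1802.06093), the scalar case $d = 1$ gives a toy version of the setting: $h(x) = \theta_L \cdots \theta_1 x$ is trained by gradient descent on the population loss $\tfrac12 \mathbb{E}[(h(x) - \Phi x)^2] = \tfrac12 (\theta_L \cdots \theta_1 - \Phi)^2$, assuming $\mathbb{E}[x^2] = 1$, starting from the identity initialization $\theta_i = 1$. This is only an illustration with a hypothetical function name and arbitrary constants; the paper itself treats $d \times d$ matrices.

```python
import math


def train_scalar_deep_linear(phi, depth, eta, steps):
    """Gradient descent on 0.5 * (theta_L * ... * theta_1 - phi)**2 from theta_i = 1.

    Hypothetical helper for the scalar (d = 1) special case; assumes E[x^2] = 1.
    """
    theta = [1.0] * depth  # identity initialization
    for _ in range(steps):
        residual = math.prod(theta) - phi
        # d/d theta_i of the loss is residual * (product of all other factors)
        grads = [residual * math.prod(theta[:i] + theta[i + 1:]) for i in range(depth)]
        theta = [t - eta * g for t, g in zip(theta, grads)]
    return theta
```

With a positive target such as `phi=2.0`, `depth=3` and `eta=0.05`, the factors stay equal and each approaches $2^{1/3}$, so the product approaches $\Phi = 2$.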