Showing 1–3 of 3 results for author: Tseran, H

  1. arXiv:2305.19510  [pdf, other]

    cs.LG math.CO stat.ML

    Mildly Overparameterized ReLU Networks Have a Favorable Loss Landscape

    Authors: Kedar Karhadkar, Michael Murray, Hanna Tseran, Guido Montúfar

    Abstract: We study the loss landscape of both shallow and deep, mildly overparameterized ReLU neural networks on a generic finite input dataset for the squared error loss. We show both by count and volume that most activation patterns correspond to parameter regions with no bad local minima. Furthermore, for one-dimensional input data, we show most activation regions realizable by the network contain a high…

    Submitted 8 February, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: 40 pages

  2. arXiv:2301.06956  [pdf, other]

    stat.ML cs.LG

    Expected Gradients of Maxout Networks and Consequences to Parameter Initialization

    Authors: Hanna Tseran, Guido Montúfar

    Abstract: We study the gradients of a maxout network with respect to inputs and parameters and obtain bounds for the moments depending on the architecture and the parameter distribution. We observe that the distribution of the input-output Jacobian depends on the input, which complicates a stable parameter initialization. Based on the moments of the gradients, we formulate parameter initialization strategie…

    Submitted 18 May, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

    Comments: Published at ICML 2023, 42 pages, 11 figures

  3. arXiv:2107.00379  [pdf, other]

    stat.ML cs.LG

    On the Expected Complexity of Maxout Networks

    Authors: Hanna Tseran, Guido Montúfar

    Abstract: Learning with neural networks relies on the complexity of the representable functions, but more importantly, the particular assignment of typical parameters to functions of different complexity. Taking the number of activation regions as a complexity measure, recent works have shown that the practical complexity of deep ReLU networks is often far from the theoretical maximum. In this work, we show…

    Submitted 16 December, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

    Comments: Published at NeurIPS 2021, 47 pages, 18 figures