Showing 1–4 of 4 results for author: Helfrich, K

  1. arXiv:2108.01110  [pdf, other]

    cs.LG math.NA

    Batch Normalization Preconditioning for Neural Network Training

    Authors: Susanna Lange, Kyle Helfrich, Qiang Ye

    Abstract: Batch normalization (BN) is a popular and ubiquitous method in deep learning that has been shown to decrease training time and improve generalization performance of neural networks. Despite its success, BN is not theoretically well understood. It is not suitable for use with very small mini-batch sizes or online learning. In this paper, we propose a new method called Batch Normalization Preconditi…

    Submitted 19 January, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

  2. arXiv:1911.07964  [pdf, other]

    cs.LG stat.ML

    Eigenvalue Normalized Recurrent Neural Networks for Short Term Memory

    Authors: Kyle Helfrich, Qiang Ye

    Abstract: Several variants of recurrent neural networks (RNNs) with orthogonal or unitary recurrent matrices have recently been developed to mitigate the vanishing/exploding gradient problem and to model long-term dependencies of sequences. However, with the eigenvalues of the recurrent matrix on the unit circle, the recurrent state retains all input information which may unnecessarily consume model capacit…

    Submitted 18 November, 2019; originally announced November 2019.

  3. arXiv:1811.04142  [pdf, other]

    stat.ML cs.LG

    Complex Unitary Recurrent Neural Networks using Scaled Cayley Transform

    Authors: Kehelwala D. G. Maduranga, Kyle E. Helfrich, Qiang Ye

    Abstract: Recurrent neural networks (RNNs) have been successfully used on a wide range of sequential data problems. A well-known difficulty in using RNNs is the vanishing or exploding gradient problem. Recently, there have been several different RNN architectures that try to mitigate this issue by maintaining an orthogonal or unitary recurrent weight matrix. One such architecture is the scaled Cayl…

    Submitted 25 February, 2019; v1 submitted 9 November, 2018; originally announced November 2018.

  4. arXiv:1707.09520  [pdf, other]

    stat.ML cs.LG

    Orthogonal Recurrent Neural Networks with Scaled Cayley Transform

    Authors: Kyle Helfrich, Devin Willmott, Qiang Ye

    Abstract: Recurrent Neural Networks (RNNs) are designed to handle sequential data but suffer from vanishing or exploding gradients. Recent work on Unitary Recurrent Neural Networks (uRNNs) has been used to address this issue and, in some cases, exceeds the capabilities of Long Short-Term Memory networks (LSTMs). We propose a simpler and novel update scheme to maintain orthogonal recurrent weight matrices wit…

    Submitted 19 June, 2018; v1 submitted 29 July, 2017; originally announced July 2017.

    Comments: 12 pages