Search | arXiv e-print repository

Sensitivity-Based Layer Insertion for Residual and Feedforward Neural Networks

Authors: Evelyn Herberg, Roland Herzog, Frederik Köhne, Leonie Kreis, Anton Schiela

Abstract: The training of neural networks requires tedious and often manual tuning of the network architecture. We propose a systematic method to insert new layers during the training process, which eliminates the need to choose a fixed network size before training. Our technique borrows techniques from constrained optimization and is based on first-order sensitivity information of the objective with respec… ▽ More The training of neural networks requires tedious and often manual tuning of the network architecture. We propose a systematic method to insert new layers during the training process, which eliminates the need to choose a fixed network size before training. Our technique borrows techniques from constrained optimization and is based on first-order sensitivity information of the objective with respect to the virtual parameters that additional layers, if inserted, would offer. We consider fully connected feedforward networks with selected activation functions as well as residual neural networks. In numerical experiments, the proposed sensitivity-based layer insertion technique exhibits improved training decay, compared to not inserting the layer. Furthermore, the computational effort is reduced in comparison to inserting the layer from the beginning. The code is available at \url{https://github.com/LeonieKreis/layer_insertion_sensitivity_based}. △ Less

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2306.16111 [pdf, other]

Time Regularization in Optimal Time Variable Learning

Authors: Evelyn Herberg, Roland Herzog, Frederik Köhne

Abstract: Recently, optimal time variable learning in deep neural networks (DNNs) was introduced in arXiv:2204.08528. In this manuscript we extend the concept by introducing a regularization term that directly relates to the time horizon in discrete dynamical systems. Furthermore, we propose an adaptive pruning approach for Residual Neural Networks (ResNets), which reduces network complexity without comprom… ▽ More Recently, optimal time variable learning in deep neural networks (DNNs) was introduced in arXiv:2204.08528. In this manuscript we extend the concept by introducing a regularization term that directly relates to the time horizon in discrete dynamical systems. Furthermore, we propose an adaptive pruning approach for Residual Neural Networks (ResNets), which reduces network complexity without compromising expressiveness, while simultaneously decreasing training time. The results are illustrated by applying the proposed concepts to classification tasks on the well known MNIST and Fashion MNIST data sets. Our PyTorch code is available on https://github.com/frederikkoehne/time_variable_learning. △ Less

Submitted 6 December, 2023; v1 submitted 28 June, 2023; originally announced June 2023.

arXiv:2304.05133 [pdf, other]

Lecture Notes: Neural Network Architectures

Authors: Evelyn Herberg

Abstract: These lecture notes provide an overview of Neural Network architectures from a mathematical point of view. Especially, Machine Learning with Neural Networks is seen as an optimization problem. Covered are an introduction to Neural Networks and the following architectures: Feedforward Neural Network, Convolutional Neural Network, ResNet, and Recurrent Neural Network. These lecture notes provide an overview of Neural Network architectures from a mathematical point of view. Especially, Machine Learning with Neural Networks is seen as an optimization problem. Covered are an introduction to Neural Networks and the following architectures: Feedforward Neural Network, Convolutional Neural Network, ResNet, and Recurrent Neural Network. △ Less

Submitted 18 April, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

Comments: added more references

MSC Class: 68T07

arXiv:2204.08528 [pdf, other]

An Optimal Time Variable Learning Framework for Deep Neural Networks

Authors: Harbir Antil, Hugo Díaz, Evelyn Herberg

Abstract: Feature propagation in Deep Neural Networks (DNNs) can be associated to nonlinear discrete dynamical systems. The novelty, in this paper, lies in letting the discretization parameter (time step-size) vary from layer to layer, which needs to be learned, in an optimization framework. The proposed framework can be applied to any of the existing networks such as ResNet, DenseNet or Fractional-DNN. Thi… ▽ More Feature propagation in Deep Neural Networks (DNNs) can be associated to nonlinear discrete dynamical systems. The novelty, in this paper, lies in letting the discretization parameter (time step-size) vary from layer to layer, which needs to be learned, in an optimization framework. The proposed framework can be applied to any of the existing networks such as ResNet, DenseNet or Fractional-DNN. This framework is shown to help overcome the vanishing and exploding gradient issues. Stability of some of the existing continuous DNNs such as Fractional-DNN is also studied. The proposed approach is applied to an ill-posed 3D-Maxwell's equation. △ Less

Submitted 18 April, 2022; originally announced April 2022.

MSC Class: 34A08; 49J15; 68T05; 82C32

Showing 1–4 of 4 results for author: Herberg, E