
Showing 1–7 of 7 results for author: Hrycej, T

Searching in archive cs.
  1. arXiv:2410.16523

    cs.LG cs.CV stat.CO stat.ME stat.ML

    Efficient Neural Network Training via Subset Pretraining

    Authors: Jan Spörer, Bernhard Bermeitinger, Tomas Hrycej, Niklas Limacher, Siegfried Handschuh

    Abstract: In training neural networks, it is common practice to use partial gradients computed over batches, mostly very small subsets of the training set. This approach is motivated by the argument that such a partial gradient is close to the true one, with precision growing only with the square root of the batch size. A theoretical justification is provided by stochastic approximation theory. However…

    Submitted 29 October, 2024; v1 submitted 21 October, 2024; originally announced October 2024.

    Comments: To appear in KDIR 2024

    Journal ref: Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR2024
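    The abstract's premise is that a batch gradient approximates the full gradient with a precision that grows only with the square root of the batch size. A minimal sketch, assuming a hypothetical toy scalar least-squares model (not the paper's setup), measures how far sampled minibatch gradients scatter around the full-data gradient; the helper names are illustrative only.

```python
import random
import statistics


def per_sample_gradients(w, xs, ys):
    """Gradients of 0.5 * (w * x - y) ** 2 with respect to w, one per training example."""
    return [(w * x - y) * x for x, y in zip(xs, ys)]


def minibatch_error_std(grads, batch_size, trials, rng):
    """Standard deviation of (minibatch gradient - full gradient) over random batches."""
    full = statistics.fmean(grads)
    errors = []
    for _ in range(trials):
        batch = rng.choices(grads, k=batch_size)  # sample with replacement
        errors.append(statistics.fmean(batch) - full)
    return statistics.pstdev(errors)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]  # hypothetical data, true slope 2.0
grads = per_sample_gradients(0.5, xs, ys)          # gradients evaluated at w = 0.5

for b in (4, 16, 64, 256):
    print(f"batch size {b:4d}: error std {minibatch_error_std(grads, b, 2000, rng):.4f}")
```

    Because batches are drawn with replacement, the error variance is the per-sample variance divided by the batch size, so quadrupling the batch size should roughly halve the printed standard deviation: a 64-fold larger batch buys only about 8 times more precision.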

  2. Reducing the Transformer Architecture to a Minimum

    Authors: Bernhard Bermeitinger, Tomas Hrycej, Massimo Pavone, Julianus Kath, Siegfried Handschuh

    Abstract: Transformers are a widespread and successful model architecture, particularly in Natural Language Processing (NLP) and Computer Vision (CV). The essential innovation of this architecture is the Attention Mechanism, which solves the problem of extracting relevant context information from long sequences in NLP and realistic scenes in CV. A classical neural network component, a Multi-Layer Perceptron…

    Submitted 29 October, 2024; v1 submitted 17 October, 2024; originally announced October 2024.

    Comments: 8 pages, to appear in KDIR2024

    Journal ref: Proceedings of the 16th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR2024

  3. Make Deep Networks Shallow Again

    Authors: Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh

    Abstract: Deep neural networks have a good success record and are thus viewed as the best architecture choice for complex applications. For a long time, their main shortcoming was the vanishing gradient, which prevented numerical optimization algorithms from converging acceptably. A breakthrough was achieved by the concept of residual connections: an identity mapping parallel to a conventio…

    Submitted 15 September, 2023; originally announced September 2023.

    Comments: to be published at KDIR2023, Rome

    Journal ref: Proceedings of the 15th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management - KDIR2023
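    The abstract credits residual connections, an identity mapping in parallel with a conventional layer, with overcoming the vanishing gradient. The scalar toy below, assuming a hypothetical stack of 30 tanh layers that all share the weight 0.5, tracks the input-to-output derivative with and without the identity path. It only illustrates the gradient argument; it does not implement the architecture the paper proposes.

```python
import math


def forward_with_derivative(x, weights, residual):
    """Push a scalar through a stack of tanh layers; return the output and d(output)/d(input).

    Each plain layer computes f(h) = tanh(w * h); a residual layer computes h + f(h).
    """
    h, dh_dx = x, 1.0
    for w in weights:
        local = w * (1.0 - math.tanh(w * h) ** 2)  # f'(h) for f(h) = tanh(w * h)
        if residual:
            h, dh_dx = h + math.tanh(w * h), dh_dx * (1.0 + local)  # identity path adds 1
        else:
            h, dh_dx = math.tanh(w * h), dh_dx * local
    return h, dh_dx


weights = [0.5] * 30  # hypothetical depth and weight
print(forward_with_derivative(0.3, weights, residual=False)[1])  # vanishes toward 0
print(forward_with_derivative(0.3, weights, residual=True)[1])   # stays at least 1
```

    Without the identity path the derivative is a product of 30 factors, each at most 0.5, so it collapses toward zero; with it, each factor is 1 plus a non-negative term and the product cannot vanish.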

  4. Number of Attention Heads vs Number of Transformer-Encoders in Computer Vision

    Authors: Tomas Hrycej, Bernhard Bermeitinger, Siegfried Handschuh

    Abstract: Determining an appropriate number of attention heads on the one hand and the number of transformer-encoders on the other hand is an important choice for Computer Vision (CV) tasks using the Transformer architecture. Computing experiments confirmed the expectation that the total number of parameters has to satisfy the condition of overdetermination (i.e., number of constraints significantly exceeding…

    Submitted 15 September, 2022; originally announced September 2022.
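    The overdetermination condition compares the number of constraints with the number of free parameters. A sketch, assuming constraints are counted as training examples times output dimension and using a standard parameter count for one encoder block (multi-head self-attention with biases, a two-layer feed-forward network, two LayerNorms), shows how the choice of heads and encoders moves this ratio. Patch embeddings and the classification head are ignored, and all sizes and function names are hypothetical.

```python
def encoder_params(d_model, n_heads, d_head, d_ff):
    """Trainable parameters of one standard Transformer encoder block (hypothetical layout)."""
    inner = n_heads * d_head                       # concatenated width of all heads
    attention = 3 * (d_model * inner + inner)      # Q, K, V projections with biases
    attention += inner * d_model + d_model         # output projection with bias
    feed_forward = d_model * d_ff + d_ff + d_ff * d_model + d_model  # two dense layers with biases
    layer_norms = 2 * (2 * d_model)                # two LayerNorms, scale and shift each
    return attention + feed_forward + layer_norms


def overdetermination_ratio(n_samples, n_outputs, n_params):
    """Constraints (samples x outputs) per free parameter; values above 1 mean overdetermined."""
    return n_samples * n_outputs / n_params


# Hypothetical sizes: 50,000 training images, 10 classes, fixed per-head width.
n_samples, n_outputs = 50_000, 10
d_model, d_head, d_ff = 64, 16, 128
for n_encoders in (1, 2, 4, 8):
    for n_heads in (1, 4, 8):
        p = n_encoders * encoder_params(d_model, n_heads, d_head, d_ff)
        q = overdetermination_ratio(n_samples, n_outputs, p)
        print(f"encoders={n_encoders} heads={n_heads} params={p} Q={q:.1f}")
```

    With a fixed per-head width, adding heads or encoders both increase the parameter count and lower the ratio, which is why the two choices compete for the same budget. If the per-head width were instead set to d_model / n_heads, the attention parameter count in this layout would not depend on the number of heads at all.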

  5. Training Neural Networks in Single vs Double Precision

    Authors: Tomas Hrycej, Bernhard Bermeitinger, Siegfried Handschuh

    Abstract: The commitment to single-precision floating-point arithmetic is widespread in the deep learning community. To evaluate whether this commitment is justified, the influence of computing precision (single and double precision) on the optimization performance of the Conjugate Gradient (CG) method (a second-order optimization algorithm) and RMSprop (a first-order algorithm) has been investigated. Tests…

    Submitted 15 September, 2022; originally announced September 2022.
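    One concrete way computing precision can affect optimization is that small parameter updates fall below the rounding resolution. The sketch below uses the standard `struct` module to emulate rounding each result to IEEE single precision and applies 1000 updates of 1e-8 to a weight of 1.0. The function names are hypothetical, and this illustrates rounding only; it does not reproduce the paper's CG and RMSprop experiments.

```python
import struct


def to_float32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def apply_updates(w, step, n_steps, single):
    """Add `step` to `w` repeatedly, optionally rounding every intermediate result to float32."""
    if single:
        w, step = to_float32(w), to_float32(step)
    for _ in range(n_steps):
        w = w + step
        if single:
            w = to_float32(w)  # emulate storing the weight in single precision
    return w


print(apply_updates(1.0, 1e-8, 1000, single=True))   # stays at 1.0
print(apply_updates(1.0, 1e-8, 1000, single=False))  # close to 1.00001
```

    Near 1.0 the spacing between single-precision numbers is 2^-23 (about 1.2e-7), so each update of 1e-8 is less than half a spacing and is rounded away; in double precision the same updates accumulate to about 1.00001.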

  6. Representational Capacity of Deep Neural Networks -- A Computing Study

    Authors: Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh

    Abstract: There is some theoretical evidence that deep neural networks with multiple hidden layers have a potential for more efficient representation of multidimensional mappings than shallow networks with a single hidden layer. The question is whether it is possible to exploit this theoretical advantage for finding such representations with the help of numerical training methods. Tests using prototypical probl…

    Submitted 19 July, 2019; originally announced July 2019.

    Journal ref: 2019 11th International Conference on Knowledge Discovery and Information Retrieval (KDIR)

  7. arXiv:1906.11755

    cs.LG math.NA stat.ML

    Singular Value Decomposition and Neural Networks

    Authors: Bernhard Bermeitinger, Tomas Hrycej, Siegfried Handschuh

    Abstract: Singular Value Decomposition (SVD) constitutes a bridge between linear algebra concepts and multi-layer neural networks: it is their linear analogy. Beyond this insight, it can be used as a good initial guess for the network parameters, leading to substantially better optimization results.

    Submitted 27 June, 2019; originally announced June 2019.

    Journal ref: ICANN 2019: Artificial Neural Networks and Machine Learning - Deep Learning
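    The abstract describes SVD as the linear analogue of a multi-layer network and as an initial guess for its parameters. A minimal sketch, assuming a two-layer linear network y = W2 (W1 x) with hidden width k and splitting each singular value symmetrically as sqrt(s) between the two layers (one plausible choice, not necessarily the paper's), initializes the weights from the leading k singular triplets. Power iteration with deflation stands in for a library SVD such as `numpy.linalg.svd` so the example runs with the standard library alone; it assumes k does not exceed the rank of the matrix, and all function names are hypothetical.

```python
import math
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def normalize(x):
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def top_singular_triplets(a, k, iters=500, seed=0):
    """Leading k singular triplets (s, u, v) of `a` via power iteration with deflation.

    A simple stand-in for a library SVD; assumes k <= rank(a) and well-separated singular values.
    """
    rng = random.Random(seed)
    a = [row[:] for row in a]  # work on a copy, since deflation modifies it
    triplets = []
    for _ in range(k):
        at = transpose(a)
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(a[0]))])
        for _ in range(iters):
            v = normalize(matvec(at, matvec(a, v)))  # power step on A^T A
        av = matvec(a, v)
        s = math.sqrt(sum(x * x for x in av))
        u = [x / s for x in av]
        triplets.append((s, u, v))
        # Deflate: remove the found component so the next pass finds the next one.
        a = [[a[i][j] - s * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]
    return triplets


def svd_init(a, hidden):
    """Initial weights for y = W2 @ (W1 @ x): W1 = sqrt(S) V^T (hidden x n), W2 = U sqrt(S) (m x hidden)."""
    triplets = top_singular_triplets(a, hidden)
    w1 = [[math.sqrt(s) * vj for vj in v] for s, _, v in triplets]
    w2 = [[math.sqrt(s) * u[i] for s, u, _ in triplets] for i in range(len(a))]
    return w1, w2


# Hypothetical target map: its best rank-1 approximation is 1.5 in every entry.
a = [[2.0, 1.0], [1.0, 2.0]]
w1, w2 = svd_init(a, hidden=1)
print(matmul(w2, w1))
```

    By the Eckart–Young theorem, the product W2 W1 = U_k S_k V_k^T is then the best rank-k approximation of the target matrix in the Frobenius norm, so training would start from the best linear map that a hidden layer of width k can represent.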