arXiv:2009.14554 [pdf, other]

One Reflection Suffice

Authors: Alexander Mathiasen, Frederik Hvilshøj

Abstract: Orthogonal weight matrices are used in many areas of deep learning. Much previous work attempt to alleviate the additional computational resources it requires to constrain weight matrices to be orthogonal. One popular approach utilizes *many* Householder reflections. The only practical drawback is that many reflections cause low GPU utilization. We mitigate this final drawback by proving that *one* reflection is sufficient, if the reflection is computed by an auxiliary neural network.

Submitted 30 September, 2020; originally announced September 2020.

arXiv:2009.14075 [pdf, other]

Backpropagating through Fréchet Inception Distance

Authors: Alexander Mathiasen, Frederik Hvilshøj

Abstract: The Fréchet Inception Distance (FID) has been used to evaluate hundreds of generative models. We introduce FastFID, which can efficiently train generative models with FID as a loss function. Using FID as an additional loss for Generative Adversarial Networks improves their FID.

Submitted 14 April, 2021; v1 submitted 29 September, 2020; originally announced September 2020.

arXiv:2009.13977 [pdf, other]

What if Neural Networks had SVDs?

Authors: Alexander Mathiasen, Frederik Hvilshøj, Jakob Rødsgaard Jørgensen, Anshul Nasery, Davide Mottin

Abstract: Various Neural Networks employ time-consuming matrix operations like matrix inversion. Many such matrix operations are faster to compute given the Singular Value Decomposition (SVD). Previous work allows using the SVD in Neural Networks without computing it. In theory, the techniques can speed up matrix operations, however, in practice, they are not fast enough. We present an algorithm that is fast enough to speed up several matrix operations. The algorithm increases the degree of parallelism of an underlying matrix multiplication $H\cdot X$ where $H$ is an orthogonal matrix represented by a product of Householder matrices. Code is available at www.github.com/AlexanderMath/fasth .

Submitted 29 September, 2020; originally announced September 2020.
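
For readers unfamiliar with the product $H\cdot X$ mentioned in the abstract above, the following is a minimal sketch of the plain sequential way to apply a product of Householder reflections to a matrix. It is not the algorithm from the paper and not code from the linked repository; the function names (`householder_apply`, `apply_reflections`, `column_norms`) and the small sizes are assumptions chosen for illustration. Each reflection is taken to be $I - 2vv^T/(v^Tv)$, applied one after another without ever forming $H$ explicitly.

```python
import random


def householder_apply(v, X):
    """Return (I - 2 v v^T / (v^T v)) X without forming the d x d matrix.

    v is a list of length d, X is a d x m matrix stored as a list of rows.
    """
    vv = sum(a * a for a in v)
    n_cols = len(X[0])
    # w = v^T X: one dot product per column of X.
    w = [sum(v[i] * X[i][j] for i in range(len(v))) for j in range(n_cols)]
    return [
        [X[i][j] - 2.0 * v[i] * w[j] / vv for j in range(n_cols)]
        for i in range(len(v))
    ]


def apply_reflections(vs, X):
    """Compute H X for H = H_1 H_2 ... H_k, so H_k is applied first.

    This loop is inherently sequential: each step needs the previous result.
    """
    for v in reversed(vs):
        X = householder_apply(v, X)
    return X


def column_norms(X):
    """Euclidean norm of each column of X."""
    return [
        sum(X[i][j] ** 2 for i in range(len(X))) ** 0.5
        for j in range(len(X[0]))
    ]


# Illustrative sizes only (assumed): d = 4 rows, k = 3 reflections, m = 2 columns.
random.seed(0)
d, k, m = 4, 3, 2
vs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(k)]
X = [[random.gauss(0, 1) for _ in range(m)] for _ in range(d)]
HX = apply_reflections(vs, X)
```

Because every reflection is orthogonal, the column norms of `HX` match those of `X` up to rounding, which is a quick sanity check. The loop performs $k$ dependent steps, which is the kind of sequential dependency that limits parallelism in this baseline.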

arXiv:1909.12518 [pdf, ps, other]

Margin-Based Generalization Lower Bounds for Boosted Classifiers

Authors: Allan Grønlund, Lior Kamma, Kasper Green Larsen, Alexander Mathiasen, Jelani Nelson

Abstract: Boosting is one of the most successful ideas in machine learning. The most well-accepted explanations for the low generalization error of boosting algorithms such as AdaBoost stem from margin theory. The study of margins in the context of boosting algorithms was initiated by Schapire, Freund, Bartlett and Lee (1998) and has inspired numerous boosting algorithms and generalization bounds. To date, the strongest known generalization (upper bound) is the $k$th margin bound of Gao and Zhou (2013). Despite the numerous generalization upper bounds that have been proved over the last two decades, nothing is known about the tightness of these bounds. In this paper, we give the first margin-based lower bounds on the generalization error of boosted classifiers. Our lower bounds nearly match the $k$th margin bound and thus almost settle the generalization performance of boosted classifiers in terms of margins.

Submitted 7 May, 2020; v1 submitted 27 September, 2019; originally announced September 2019.

arXiv:1901.10789 [pdf, other]

Optimal Minimal Margin Maximization with Boosting

Authors: Allan Grønlund, Kasper Green Larsen, Alexander Mathiasen

Abstract: Boosting algorithms produce a classifier by iteratively combining base hypotheses. It has been observed experimentally that the generalization error keeps improving even after achieving zero training error. One popular explanation attributes this to improvements in margins. A common goal in a long line of research, is to maximize the smallest margin using as few base hypotheses as possible, culminating with the AdaBoostV algorithm by (Rätsch and Warmuth [JMLR'04]). The AdaBoostV algorithm was later conjectured to yield an optimal trade-off between number of hypotheses trained and the minimal margin over all training points (Nie et al. [JMLR'13]). Our main contribution is a new algorithm refuting this conjecture. Furthermore, we prove a lower bound which implies that our new algorithm is optimal.

Submitted 30 January, 2019; originally announced January 2019.

Showing 1–5 of 5 results for author: Mathiasen, A