-
Predicting the Critical Number of Layers for Hierarchical Support Vector Regression
Authors:
Ryan Mohr,
Maria Fonoberova,
Zlatko Drmač,
Iva Manojlović,
Igor Mezić
Abstract:
Hierarchical support vector regression (HSVR) models a function from data as a linear combination of SVR models at a range of scales, starting at a coarse scale and moving to finer scales as the hierarchy continues. In the original formulation of HSVR, there were no rules for choosing the depth of the model. In this paper, we observe in a number of models a phase transition in the training error: the error remains relatively constant as layers are added until a critical scale is passed, at which point the training error drops close to zero and remains nearly constant for added layers. We introduce a method to predict this critical scale a priori, based on the support of either a Fourier transform of the data or the Dynamic Mode Decomposition (DMD) spectrum. This allows us to determine the required number of layers prior to training any models.
Submitted 21 December, 2020;
originally announced December 2020.
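The idea of reading a critical scale off the Fourier support of the data can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's procedure: the relative amplitude threshold, the geometric scale schedule `sigma0 * decay**j`, and the rule "critical scale = 1 / highest significant frequency" are all hypothetical placeholders.

```python
import math
import numpy as np


def max_significant_frequency(samples, sample_rate, rel_threshold=0.01):
    """Highest frequency whose FFT amplitude is at least rel_threshold * peak.

    rel_threshold is a hypothetical cutoff chosen for this sketch.
    """
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    significant = spectrum >= rel_threshold * spectrum.max()
    return float(freqs[significant].max())


def first_layer_at_or_below(scale_target, sigma0=1.0, decay=0.5):
    """Smallest j with sigma0 * decay**j <= scale_target.

    The geometric schedule is an assumption, not taken from the paper.
    """
    j = math.ceil(math.log(scale_target / sigma0) / math.log(decay))
    return max(j, 0)


# Toy signal: 3 Hz and 40 Hz components, sampled at 1 kHz for one second.
t = np.arange(1000) / 1000.0
signal = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

f_max = max_significant_frequency(signal, sample_rate=1000.0)
# Hypothetical rule: the critical scale is the period of the fastest component.
critical_layer = first_layer_at_or_below(1.0 / f_max)
print(f_max, critical_layer)
```

The point of the sketch is only that the layer index is computed from the data's spectrum before any SVR model is trained.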
-
Applications of Koopman Mode Analysis to Neural Networks
Authors:
Iva Manojlović,
Maria Fonoberova,
Ryan Mohr,
Aleksandr Andrejčuk,
Zlatko Drmač,
Yannis Kevrekidis,
Igor Mezić
Abstract:
We consider the training process of a neural network as a dynamical system acting on the high-dimensional weight space. Each epoch is an application of the map induced by the optimization algorithm and the loss function. Using this induced map, we can apply observables on the weight space and measure their evolution. The evolution of the observables is given by the Koopman operator associated with the induced dynamical system. We use the spectrum and modes of the Koopman operator to realize the following objectives. Our methods can help to determine the network depth a priori; to detect a bad initialization of the network weights, allowing a restart before training too long; and to speed up training. Additionally, our methods help enable noise rejection and improve robustness. We show how the Koopman spectrum can be used to determine the number of layers required for the architecture. We also show how monitoring the spectrum can elucidate the convergence versus non-convergence of the training process, in particular how the existence of eigenvalues clustering around 1 determines when to terminate the learning process. We further show how Koopman modes can be used to selectively prune the network to speed up the training procedure. Finally, we show that incorporating loss functions based on negative Sobolev norms can allow for the reconstruction of a multi-scale signal polluted by very large amounts of noise.
Submitted 21 June, 2020;
originally announced June 2020.
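As a rough illustration of monitoring a spectrum for eigenvalues near 1, the sketch below runs a standard exact-DMD computation on a matrix of snapshots. It is a minimal sketch under assumptions: the toy linear map stands in for per-epoch weight snapshots, and the function names, the rank, and the tolerance are hypothetical choices, not the authors' implementation.

```python
import numpy as np


def dmd_eigenvalues(snapshots, rank):
    """Eigenvalues of the rank-truncated DMD operator.

    snapshots has one column per time step (e.g. one per epoch).
    """
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank, :]
    # A_tilde = U^H Y V S^{-1}; dividing by s scales column j by 1 / s[j].
    A_tilde = (U.conj().T @ Y @ Vh.conj().T) / s
    return np.linalg.eigvals(A_tilde)


def count_near_one(eigenvalues, tol=1e-6):
    """Number of eigenvalues within tol of 1 (tol is a hypothetical choice)."""
    return int(np.sum(np.abs(eigenvalues - 1.0) < tol))


# Toy stand-in for training dynamics: w_{k+1} = M w_k with M = diag(1, 0.5).
M = np.diag([1.0, 0.5])
w = np.array([1.0, 1.0])
columns = [w]
for _ in range(5):
    w = M @ w
    columns.append(w)
snapshots = np.column_stack(columns)

eigs = dmd_eigenvalues(snapshots, rank=2)
near_one = count_near_one(eigs)
print(np.sort(eigs.real), near_one)
```

For this toy map the recovered eigenvalues are those of `M`, so one of them sits at 1 and the other describes a decaying direction.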
-
New robust ScaLAPACK routine for computing the QR factorization with column pivoting
Authors:
Zvonimir Bujanović,
Zlatko Drmač
Abstract:
In this note we describe two modifications of the ScaLAPACK subroutines PxGEQPF for computing the QR factorization with the Businger-Golub column pivoting. First, we resolve a subtle numerical instability in the same way as we did for the LAPACK subroutines xGEQPF and xGEQP3 in 2006 [LAPACK Working Note 176 (2006); ACM Trans. Math. Softw. 2008]. The problem originates in the first release of LINPACK in the 1970s: due to severe cancellations in the down-dating of partial column norms, the pivoting procedure may be completely in the dark about the true norms of the pivot column candidates. This may cause mispivoting and, as a result, loss of the important rank-revealing structure of the computed triangular factor, with severe consequences for other solvers that rely on rank-revealing pivoting. The instability is so subtle that, e.g., inserting a WRITE statement or changing the process topology can drastically change the result. Second, we correct a programming error in the complex subroutines PCGEQPF and PZGEQPF, which also causes wrong pivoting because of erroneous use of PSCNRM2 and PDZNRM2 for the explicit norm computation.
Submitted 12 October, 2019;
originally announced October 2019.
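The cancellation in down-dating partial column norms can be seen in a two-entry toy column. The sketch below is only an illustration in double precision with made-up values; it does not reproduce the safeguard used in the LAPACK or ScaLAPACK routines.

```python
import math

# Hypothetical column: one dominant leading entry and a tiny trailing entry.
column = [1.0, 1e-9]

# Norm of the full column; 1 + 1e-18 rounds to 1.0 in double precision.
full_norm = math.sqrt(sum(v * v for v in column))

# Naive down-dating after eliminating the leading entry:
# partial_norm^2 = full_norm^2 - leading_entry^2, which cancels completely.
downdated_sq = full_norm ** 2 - column[0] ** 2
downdated = math.sqrt(max(downdated_sq, 0.0))

# Explicit recomputation from the remaining entries keeps the information.
explicit = math.sqrt(sum(v * v for v in column[1:]))

print(downdated, explicit)
```

The down-dated value carries no information about the remaining entry, whereas the explicit norm does, which is why a pivoting rule driven by down-dated norms alone can pick the wrong column.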
-
Learning low-dimensional dynamical-system models from noisy frequency-response data with Loewner rational interpolation
Authors:
Zlatko Drmač,
Benjamin Peherstorfer
Abstract:
Loewner rational interpolation provides a versatile tool to learn low-dimensional dynamical-system models from frequency-response measurements. This work investigates the robustness of the Loewner approach to noise. The key finding is that if the measurements are polluted with Gaussian noise, then the error due to noise grows at most linearly with the standard deviation with high probability under certain conditions. The analysis gives insights into making the Loewner approach robust against noise via linear transformations and judicious selections of measurements. Numerical results demonstrate the linear growth of the error on benchmark examples.
Submitted 4 November, 2020; v1 submitted 30 September, 2019;
originally announced October 2019.
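For orientation, the standard Loewner matrix built from two disjoint sets of frequency-response samples has entries $(v_i - w_j)/(\mu_i - \lambda_j)$. The sketch below constructs it for a noise-free toy system; the transfer function $H(s) = 1/(s+1)$, the sample points, and the rank tolerance are assumptions made for illustration, and the noise analysis of the paper is not reproduced.

```python
import numpy as np


def loewner_matrix(mu, v, lam, w):
    """Loewner matrix with entries (v[i] - w[j]) / (mu[i] - lam[j])."""
    return (v[:, None] - w[None, :]) / (mu[:, None] - lam[None, :])


def transfer(s):
    """Hypothetical first-order system H(s) = 1 / (s + 1)."""
    return 1.0 / (s + 1.0)


# Two disjoint sets of sample points on the imaginary axis (made-up values).
mu = 1j * np.array([1.0, 3.0, 5.0])
lam = 1j * np.array([2.0, 4.0, 6.0])

L = loewner_matrix(mu, transfer(mu), lam, transfer(lam))
# For exact data the numerical rank reflects the order of the underlying system.
rank = int(np.linalg.matrix_rank(L, tol=1e-10))
print(rank)
```

With exact samples of a first-order system the matrix has rank one; perturbing the samples with noise is what makes the rank decision, and hence the learned model, sensitive.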
-
Parallel solver for shifted systems in a hybrid CPU-GPU framework
Authors:
Nela Bosner,
Zvonimir Bujanović,
Zlatko Drmač
Abstract:
This paper proposes a combination of a hybrid CPU--GPU and a pure GPU software implementation of a direct algorithm for solving shifted linear systems $(A - σI)X = B$ with a large number of complex shifts $σ$ and multiple right-hand sides. Such problems often appear, e.g., in control theory when evaluating the transfer function, or as part of an algorithm performing interpolatory model reduction, as well as when computing pseudospectra and structured pseudospectra, or solving large linear systems of ordinary differential equations. The proposed algorithm first jointly reduces the general full $n\times n$ matrix $A$ and the $n\times m$ full right-hand side matrix $B$ to the controller Hessenberg canonical form that facilitates efficient solution: $A$ is transformed to a so-called $m$-Hessenberg form and $B$ is made upper-triangular. This is implemented as a blocked, highly parallel CPU--GPU hybrid algorithm; individual blocks are reduced by the CPU, and the necessary updates of the rest of the matrix are split among the cores of the CPU and the GPU. To enhance parallelization, the reduction and the updates are overlapped. In the next phase, the reduced $m$-Hessenberg--triangular systems are solved entirely on the GPU, with shifts divided into batches. The benefits of such load distribution are demonstrated by numerical experiments. In particular, we show that our proposed implementation provides an excellent basis for efficient implementations of computational methods in systems and control theory, from evaluation of the transfer function to interpolatory model reduction.
Submitted 21 August, 2017;
originally announced August 2017.
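To make the problem setup concrete, the sketch below solves $(A - σI)X = B$ for a batch of complex shifts with a plain dense batched solve on the CPU. It is a baseline illustration only: it performs no reduction to controller Hessenberg form and uses no GPU, and the function name and the small test matrices are hypothetical.

```python
import numpy as np


def solve_shifted_batch(A, B, shifts):
    """Solve (A - sigma * I) X = B for every sigma in shifts.

    Returns an array of shape (len(shifts), n, m). This is a dense
    reference solve, not the m-Hessenberg algorithm from the paper.
    """
    n, m = B.shape
    k = len(shifts)
    eye = np.eye(n)
    # Stack of shifted matrices, one (n, n) slice per shift.
    shifted = A[None, :, :] - shifts[:, None, None] * eye[None, :, :]
    # Give the right-hand side an explicit batch dimension.
    rhs = np.broadcast_to(B.astype(complex), (k, n, m))
    return np.linalg.solve(shifted, rhs)


A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0], [1.0]])
shifts = np.array([1j, 1.0 + 2.0j])

X = solve_shifted_batch(A, B, shifts)
residuals = [
    np.linalg.norm((A - sigma * np.eye(2)) @ X[i] - B)
    for i, sigma in enumerate(shifts)
]
print(X.shape, max(residuals))
```

Each shift here costs a full dense factorization, which is the cost that a one-time reduction of $A$ and $B$ shared by all shifts is meant to avoid.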