-
Data-Parallel Neural Network Training via Nonlinearly Preconditioned Trust-Region Method
Authors:
Samuel A. Cruz Alegría,
Ken Trotti,
Alena Kopaničáková,
Rolf Krause
Abstract:
Parallel training methods are increasingly relevant in machine learning (ML) due to the continuing growth in model and dataset sizes. We propose a variant of the Additively Preconditioned Trust-Region Strategy (APTS) for training deep neural networks (DNNs). The proposed APTS method utilizes a data-parallel approach to construct a nonlinear preconditioner employed in the nonlinear optimization strategy. In contrast to the common employment of Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Adam), which are both variants of gradient descent (GD) algorithms, the APTS method implicitly adjusts the step sizes in each iteration, thereby removing the need for costly hyperparameter tuning. We demonstrate the performance of the proposed APTS variant using the MNIST and CIFAR-10 datasets. The results obtained indicate that the APTS variant proposed here achieves comparable validation accuracy to SGD and Adam, all while allowing for parallel training and obviating the need for expensive hyperparameter tuning.
Submitted 7 February, 2025;
originally announced February 2025.
-
A note on the convergence of multigrid methods for the Riesz-space equation and an application to image deblurring
Authors:
Danyal Ahmad,
Marco Donatelli,
Mariarosa Mazza,
Stefano Serra-Capizzano,
Ken Trotti
Abstract:
In the past decades, a remarkable amount of research has been carried out regarding fast solvers for large linear systems resulting from various discretizations of fractional differential equations (FDEs). In the current work, we focus on multigrid methods for a Riesz-space FDE, for which the theoretical convergence analysis is currently limited to the two-grid method. Here we provide a detailed theoretical convergence study for the V-cycle and W-cycle. Moreover, we discuss the use of multigrid combined with a band approximation and compare the result with both $τ$ and circulant preconditioners. The numerical tests include 2D problems as well as the extension to the case of a Riesz-FDE with variable coefficients. Finally, we apply the best-performing method to an image deblurring problem with Tikhonov regularization.
Submitted 24 March, 2024;
originally announced March 2024.
-
Parallel Trust-Region Approaches in Neural Network Training: Beyond Traditional Methods
Authors:
Ken Trotti,
Samuel A. Cruz Alegría,
Alena Kopaničáková,
Rolf Krause
Abstract:
We propose to train neural networks (NNs) using a novel variant of the ``Additively Preconditioned Trust-region Strategy'' (APTS). The proposed method is based on a parallelizable additive domain decomposition approach applied to the neural network's parameters. Built upon the TR framework, the APTS method ensures global convergence towards a minimizer. Moreover, it eliminates the need for computationally expensive hyper-parameter tuning, as the TR algorithm automatically determines the step size in each iteration. We demonstrate the capabilities, strengths, and limitations of the proposed APTS training method by performing a series of numerical experiments. The presented numerical study includes a comparison with widely used training methods such as SGD, Adam, LBFGS, and the standard TR method.
Submitted 21 December, 2023;
originally announced December 2023.
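The abstract above notes that a trust-region (TR) algorithm determines the step size automatically in each iteration. As a rough illustration only, the following sketch shows a textbook radius update driven by the ratio `rho` of actual to predicted reduction. It is not the APTS method from the paper; the function name `update_radius` and all thresholds and factors (`eta1`, `eta2`, `shrink`, `grow`, `max_radius`) are assumptions chosen for illustration.

```python
def update_radius(rho, radius, eta1=0.25, eta2=0.75,
                  shrink=0.5, grow=2.0, max_radius=10.0):
    """Generic trust-region radius update (illustrative, not APTS).

    rho is the ratio (actual reduction) / (predicted reduction) for the
    trial step. Returns (accept_step, new_radius).
    All default constants are hypothetical example values.
    """
    if rho < eta1:
        # Model predicted the loss poorly: reject the step, shrink the region.
        return False, shrink * radius
    if rho > eta2:
        # Model was very accurate: accept and allow a larger step next time.
        return True, min(grow * radius, max_radius)
    # Acceptable agreement: accept the step and keep the radius.
    return True, radius
```

Because the radius adapts to how well the local model predicts the loss, no learning-rate schedule has to be tuned by hand in this scheme.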
-
A domain splitting strategy for solving PDEs
Authors:
Ken Trotti
Abstract:
In this work we develop a novel domain splitting strategy for the solution of partial differential equations. Focusing on a uniform discretization of the $d$-dimensional advection-diffusion equation, our proposal is a two-level algorithm that merges the solutions obtained from the discretization of the equation over highly anisotropic submeshes to compute an initial approximation of the fine solution. The algorithm then iteratively refines the initial guess by leveraging the structure of the residual. Performing costly calculations on anisotropic submeshes enables us to reduce the dimensionality of the problem by one, and the merging process, which involves the computation of solutions over disjoint domains, allows for parallel implementation.
Submitted 2 March, 2023;
originally announced March 2023.
-
A smoothing analysis for multigrid methods applied to tempered fractional problems
Authors:
D. Ahmad,
M. Donatelli,
M. Mazza,
S. Serra-Capizzano,
K. Trotti
Abstract:
We consider the numerical solution of time-dependent space tempered fractional diffusion equations. The use of Crank-Nicolson in time and of second-order accurate tempered weighted and shifted Grünwald difference in space leads to dense (multilevel) Toeplitz-like linear systems. By exploiting the related structure, we design an ad-hoc multigrid solver and multigrid-based preconditioners, all with weighted Jacobi as smoother. A new smoothing analysis is provided, which refines state-of-the-art results by expanding the set of suitable Jacobi weights. Furthermore, we prove that if a multigrid method is effective in the non-tempered case, then the same multigrid method is also effective in the tempered one. The numerical results confirm the theoretical analysis, showing that the resulting multigrid-based solvers are computationally effective for tempered fractional diffusion equations.
Submitted 10 October, 2022;
originally announced October 2022.
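The abstract above uses weighted Jacobi as the multigrid smoother. For readers unfamiliar with it, here is a minimal sketch of the iteration $x \leftarrow x + \omega D^{-1}(b - Ax)$, where $D$ is the diagonal of $A$. This is not the authors' solver: the small tridiagonal test matrix stands in for the dense Toeplitz-like systems of the paper, and the function name `weighted_jacobi` and the weight `omega = 2/3` are illustrative assumptions.

```python
def weighted_jacobi(A, b, x, omega=2.0 / 3.0, sweeps=1):
    """Apply `sweeps` weighted Jacobi iterations to A x = b.

    A is a square matrix given as a list of rows, b and x are lists.
    omega is the Jacobi weight (2/3 is a common example value, assumed here).
    Returns the updated iterate as a new list.
    """
    n = len(b)
    for _ in range(sweeps):
        # Residual r = b - A x computed from the current iterate.
        r = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Damped diagonal correction: x + omega * D^{-1} r.
        x = [x[i] + omega * r[i] / A[i][i] for i in range(n)]
    return x


# Small tridiagonal example (hypothetical data); exact solution is [1, 1, 1].
A = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]
b = [1.0, 0.0, 1.0]
x_smooth = weighted_jacobi(A, b, [0.0, 0.0, 0.0], sweeps=200)
```

In a multigrid cycle only a few sweeps would be applied per level, since the smoother is meant to damp high-frequency error rather than solve the system on its own.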
-
Multigrid for two-sided fractional differential equations discretized by finite volume elements on graded meshes
Authors:
Marco Donatelli,
Rolf Krause,
Mariarosa Mazza,
Ken Trotti
Abstract:
It is known that the solution of a conservative steady-state two-sided fractional diffusion problem can exhibit singularities near the boundaries. As a consequence of this, and due to the conservative nature of the problem, we adopt a finite volume elements discretization approach over a generic non-uniform mesh. We focus on grids mapped by a smooth function which consist of a combination of a graded mesh near the singularity and a uniform mesh where the solution is smooth. Such a choice gives rise to Toeplitz-like discretization matrices and thus allows a low computational cost of the matrix-vector product and a detailed spectral analysis. The obtained spectral information is used to develop an ad-hoc parameter-free multigrid preconditioner for GMRES, which is numerically shown to yield good convergence results in the presence of graded meshes mapped by power functions that accumulate points near the singularity. The approximation order of the considered graded meshes is numerically compared with that of a certain composite mesh given in the literature that still leads to Toeplitz-like linear systems and is therefore still well-suited for our multigrid method. Several numerical tests confirm that power graded meshes result in lower approximation errors than composite ones and that our solver has a wide range of applicability.
Submitted 19 September, 2022;
originally announced September 2022.