SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix
Authors:
Mrinmay Sen,
A. K. Qin,
Gayathri C,
Raghu Kishore N,
Yen-Wei Chen,
Balasubramanian Raman
Abstract:
This paper introduces a new stochastic optimization method based on the regularized Fisher information matrix (FIM), named SOFIM, which efficiently uses the FIM to approximate the Hessian matrix when computing Newton's gradient update in large-scale stochastic optimization of machine learning models. It can be viewed as a variant of natural gradient descent in which the challenge of storing and calculating the full FIM is addressed by using the regularized FIM and directly finding the gradient update direction via Sherman-Morrison matrix inversion. Additionally, like the popular Adam method, SOFIM uses the first moment of the gradient to address the issue of non-stationary objectives across mini-batches due to heterogeneous data. The use of the regularized FIM and Sherman-Morrison matrix inversion leads to an improved convergence rate with the same space and time complexities as stochastic gradient descent (SGD) with momentum. Extensive experiments on training deep learning models using several benchmark image classification datasets demonstrate that the proposed SOFIM outperforms SGD with momentum and several state-of-the-art Newton optimization methods in terms of the convergence speed for achieving pre-specified objectives of training and test losses as well as test accuracy.
Submitted 1 May, 2024; v1 submitted 5 March, 2024;
originally announced March 2024.
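The SOFIM abstract above mentions finding the update direction via Sherman-Morrison matrix inversion without storing the full FIM. The following is only a minimal sketch of that idea, not the authors' algorithm: it assumes the regularized FIM is approximated by the rank-one form rho*I + g g^T built from the current mini-batch gradient g, and that the first moment m is an exponential moving average of gradients. The names `sherman_morrison_direction`, `sofim_like_step`, `rho`, `beta`, and `lr` are hypothetical and do not come from the abstract.

```python
def sherman_morrison_direction(g, m, rho):
    """Return d = (rho*I + g g^T)^{-1} m without forming the matrix.

    Sherman-Morrison with A = rho*I, u = v = g gives
    d = m/rho - g * (g.m) / (rho^2 + rho * g.g).
    The rank-one FIM estimate is an assumption of this sketch.
    """
    g_dot_m = sum(gi * mi for gi, mi in zip(g, m))
    g_dot_g = sum(gi * gi for gi in g)
    scale = g_dot_m / (rho * rho + rho * g_dot_g)
    return [mi / rho - scale * gi for gi, mi in zip(g, m)]


def sofim_like_step(theta, g, m, lr=0.1, beta=0.9, rho=1.0):
    """One illustrative update: momentum on g, then a preconditioned step.

    Only O(n) vectors are stored, which matches the memory footprint
    of SGD with momentum.
    """
    # First moment of the gradient (assumed exponential moving average).
    m_new = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(m, g)]
    # Direction obtained by applying the inverse regularized FIM to m_new.
    d = sherman_morrison_direction(g, m_new, rho)
    theta_new = [t - lr * di for t, di in zip(theta, d)]
    return theta_new, m_new
```

Because the inverse is applied through two dot products and one vector update, the per-step cost stays linear in the number of parameters, which is consistent with the complexity claim in the abstract under the assumptions stated here.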
Parameterizing Path Partitions
Authors:
Henning Fernau,
Florent Foucaud,
Kevin Mann,
Utkarsh Padariya,
Rajath Rao K. N
Abstract:
We study the algorithmic complexity of partitioning the vertex set of a given (di)graph into a small number of paths. The Path Partition problem (PP) has been studied extensively, as it includes Hamiltonian Path as a special case. The natural variants where the paths are required to be either induced (Induced Path Partition, IPP) or shortest (Shortest Path Partition, SPP), have received much less attention. Both problems are known to be NP-complete on undirected graphs; we strengthen this by showing that they remain so even on planar bipartite directed acyclic graphs (DAGs), and that SPP remains NP-hard on undirected bipartite graphs. When parameterized by the natural parameter "number of paths", both SPP and IPP are shown to be W[1]-hard on DAGs. We also show that SPP is in XP both for DAGs and undirected graphs for the same parameter, as well as for other special subclasses of directed graphs (IPP is known to be NP-hard on undirected graphs, even for two paths). On the positive side, we show that for undirected graphs, both problems are in FPT, parameterized by neighborhood diversity. We also give an explicit algorithm for the vertex cover parameterization of PP. When considering the dual parameterization (graph order minus number of paths), all three variants, IPP, SPP and PP, are shown to be in FPT for undirected graphs. We also lift the mentioned neighborhood diversity and dual parameterization results to directed graphs; here, we need to define a proper novel notion of directed neighborhood diversity. As we also show, most of our results transfer to the case of covering by edge-disjoint paths, and purely covering.
Submitted 20 December, 2024; v1 submitted 22 December, 2022;
originally announced December 2022.
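To make the three variants in the Path Partitions abstract above concrete, here is a small sketch of a solution checker for undirected graphs. It only verifies a proposed partition; it does not solve PP, IPP, or SPP, and it is not taken from the paper. The adjacency-dictionary representation and the function names (`is_path_partition`, `is_induced`, `is_shortest`, `bfs_distance`) are assumptions made for illustration.

```python
from collections import deque


def is_path_partition(adj, paths):
    """True if every vertex lies on exactly one path and each path follows edges."""
    seen = [v for p in paths for v in p]
    if sorted(seen) != sorted(adj):
        return False
    return all(b in adj[a] for p in paths for a, b in zip(p, p[1:]))


def is_induced(adj, path):
    """True if no edge joins two non-consecutive vertices of the path."""
    n = len(path)
    return all(path[j] not in adj[path[i]]
               for i in range(n) for j in range(i + 2, n))


def bfs_distance(adj, source, target):
    """Unweighted shortest-path distance, or None if target is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return None


def is_shortest(adj, path):
    """True if the path is a shortest path between its two endpoints."""
    return bfs_distance(adj, path[0], path[-1]) == len(path) - 1


# A 4-cycle 0-1-2-3-0: one Hamiltonian path partitions it (PP with one path),
# but that path is neither induced nor shortest because 0 and 3 are adjacent.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
single = [[0, 1, 2, 3]]
pair = [[0, 1], [2, 3]]
```

On this example, `single` is a valid path partition with one path, while `pair` is a valid partition whose paths are both induced and shortest, which illustrates why IPP and SPP can need more paths than PP on the same graph.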