Which Optimizer Works Best for Physics-Informed Neural Networks and Kolmogorov-Arnold Networks?

Kiyani, Elham; Shukla, Khemraj; Urbán, Jorge F.; Darbon, Jérôme; Karniadakis, George Em

arXiv:2501.16371 (cs)

[Submitted on 22 Jan 2025 (v1), last revised 17 Apr 2025 (this version, v3)]

Abstract: Physics-Informed Neural Networks (PINNs) have revolutionized the computation of PDE solutions by integrating partial differential equations (PDEs) into the neural network's training process as soft constraints, becoming an important component of the scientific machine learning (SciML) ecosystem. More recently, physics-informed Kolmogorov-Arnold networks (PIKANs) have also been shown to be effective and comparable in accuracy with PINNs. In their current implementation, both PINNs and PIKANs are mainly optimized using first-order methods like Adam, as well as quasi-Newton methods such as BFGS and its low-memory variant, L-BFGS. However, these optimizers often struggle with highly non-linear and non-convex loss landscapes, leading to challenges such as slow convergence, local minima entrapment, and (non)degenerate saddle points. In this study, we investigate the performance of Self-Scaled BFGS (SSBFGS), Self-Scaled Broyden (SSBroyden) methods, and other advanced quasi-Newton schemes, including BFGS and L-BFGS with different line search strategies. These methods dynamically rescale updates based on historical gradient information, thus enhancing training efficiency and accuracy. We systematically compare these optimizers -- using both PINNs and PIKANs -- on key challenging linear, stiff, multi-scale and non-linear PDEs, including the Burgers, Allen-Cahn, Kuramoto-Sivashinsky, and Ginzburg-Landau equations. Our findings provide state-of-the-art results with orders-of-magnitude accuracy improvements without the use of adaptive weights or any other enhancements typically employed in PINNs. More broadly, our results reveal insights into the effectiveness of second-order optimization strategies in significantly improving the convergence and accurate generalization of PINNs and PIKANs.

Comments:	36 pages, 27 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Cite as:	arXiv:2501.16371 [cs.LG]
	(or arXiv:2501.16371v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2501.16371

Submission history

From: Elham Kianiharchegani
[v1] Wed, 22 Jan 2025 21:19:42 UTC (18,656 KB)
[v2] Tue, 15 Apr 2025 03:30:52 UTC (18,796 KB)
[v3] Thu, 17 Apr 2025 13:26:56 UTC (18,796 KB)
