Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses

Tanguy, Eloi


arXiv:2307.11714 (cs)

[Submitted on 21 Jul 2023 (v1), last revised 18 Mar 2024 (this version, v3)]


Authors: Eloi Tanguy


Abstract: Optimal Transport has sparked vivid interest in recent years, in particular thanks to the Wasserstein distance, which provides a geometrically sensible and intuitive way of comparing probability measures. For computational reasons, the Sliced Wasserstein (SW) distance was introduced as an alternative to the Wasserstein distance, and has seen uses for training generative Neural Networks (NNs). While convergence of Stochastic Gradient Descent (SGD) has been observed practically in such a setting, there is to our knowledge no theoretical guarantee for this observation. Leveraging recent works on convergence of SGD on non-smooth and non-convex functions by Bianchi et al. (2022), we aim to bridge that knowledge gap, and provide a realistic context under which fixed-step SGD trajectories for the SW loss on NN parameters converge. More precisely, we show that the trajectories approach the set of (sub)-gradient flow equations as the step decreases. Under stricter assumptions, we show a much stronger convergence result for noised and projected SGD schemes, namely that the long-run limits of the trajectories approach a set of generalised critical points of the loss function.
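To make the setting in the abstract concrete, the following is a minimal sketch of fixed-step SGD on a Monte Carlo Sliced Wasserstein loss. It is an illustration under stated assumptions, not the paper's implementation: the "generator" is a hypothetical translation map z -> z + u standing in for a neural network, the target is a Gaussian with an arbitrary mean, and the names `sw_sq_and_grad` and `run_sgd` are invented for this example. The gradient used is the almost-everywhere gradient obtained by treating the sorting permutations as locally constant.

```python
import numpy as np


def sw_sq_and_grad(x, y, theta):
    """Monte Carlo estimate of squared SW_2 between two equal-size point
    clouds x, y of shape (n, d), using unit directions theta of shape (p, d).
    Also returns an almost-everywhere gradient with respect to x."""
    n, p = x.shape[0], theta.shape[0]
    # One-dimensional projections of both clouds, one column per direction.
    px, py = x @ theta.T, y @ theta.T
    # In 1D, optimal transport between equal-size samples matches sorted values.
    ix = np.argsort(px, axis=0)
    iy = np.argsort(py, axis=0)
    diff_sorted = (np.take_along_axis(px, ix, axis=0)
                   - np.take_along_axis(py, iy, axis=0))
    value = np.mean(diff_sorted ** 2)
    # Undo the sort so that row i holds the residuals of point x_i.
    diff = np.empty_like(diff_sorted)
    np.put_along_axis(diff, ix, diff_sorted, axis=0)
    # Gradient with the sorting permutations held fixed.
    grad_x = (2.0 / (n * p)) * (diff @ theta)
    return value, grad_x


def run_sgd(steps=300, step=0.2, n=128, p=32, seed=0):
    """Fixed-step SGD on the parameter u of the toy generator z -> z + u.
    Fresh latent samples, target samples and directions are drawn each step."""
    rng = np.random.default_rng(seed)
    target_mean = np.array([2.0, -1.0])  # assumption: arbitrary toy target
    u = np.zeros(2)
    for _ in range(steps):
        z = rng.normal(size=(n, 2))                # latent batch
        y = target_mean + rng.normal(size=(n, 2))  # target batch
        theta = rng.normal(size=(p, 2))
        theta /= np.linalg.norm(theta, axis=1, keepdims=True)
        _, grad_x = sw_sq_and_grad(z + u, y, theta)
        # Chain rule for a translation: sum the per-point gradients.
        u = u - step * grad_x.sum(axis=0)
    return u
```

Calling `run_sgd()` returns the final parameter vector, which in this toy problem should end up near the target mean. With a real network, the sum over per-point gradients would be replaced by backpropagation through the generator.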

Subjects:	Machine Learning (cs.LG); Optimization and Control (math.OC); Probability (math.PR)
Cite as:	arXiv:2307.11714 [cs.LG]
	(or arXiv:2307.11714v3 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2307.11714
Journal reference:	Transactions on Machine Learning Research, 2023 2835-8856

Submission history

From: Eloi Tanguy
[v1] Fri, 21 Jul 2023 17:19:01 UTC (9,974 KB)
[v2] Tue, 30 Jan 2024 16:24:51 UTC (43 KB)
[v3] Mon, 18 Mar 2024 09:55:08 UTC (43 KB)
