Explicit Universal and Approximate-Universal Kernels on Compact Metric Spaces

Abstract: Universal kernels, whose Reproducing Kernel Hilbert Space is dense in the space of continuous functions, are of great practical and theoretical interest. In this paper, we introduce an explicit construction of universal kernels on compact metric spaces. We also introduce a notion of approximate universality, and construct tractable kernels that are approximately universal.

Submitted 13 June, 2025; v1 submitted 4 June, 2025; originally announced June 2025.

MSC Class: 46E22; 41A05

arXiv:2501.04016

Computing Barycentres of Measures for Generic Transport Costs

Authors: Eloi Tanguy, Julie Delon, Nathaël Gozlan

Abstract: Wasserstein barycentres represent average distributions between multiple probability measures for the Wasserstein distance. The numerical computation of Wasserstein barycentres is notoriously challenging. A common approach is to use Sinkhorn iterations, where an entropic regularisation term is introduced to make the problem more manageable. Another approach involves using fixed-point methods, akin to those employed for computing Fréchet means on manifolds. The convergence of such methods for 2-Wasserstein barycentres, specifically with a quadratic cost function and absolutely continuous measures, was studied by Alvarez-Esteban et al. (2016). In this paper, we delve into the main ideas behind this fixed-point method and explore how it can be generalised to accommodate more diverse transport costs and generic probability measures, thereby extending its applicability to a broader range of problems. We show convergence results for this approach and illustrate its numerical behaviour on several barycentre problems.

Submitted 20 December, 2024; originally announced January 2025.

arXiv:2407.13445

Constrained Approximate Optimal Transport Maps

Authors: Eloi Tanguy, Agnès Desolneux, Julie Delon

Abstract: We investigate finding a map $g$ within a function class $G$ that minimises an Optimal Transport (OT) cost between a target measure $ν$ and the image by $g$ of a source measure $μ$. This is relevant when an OT map from $μ$ to $ν$ does not exist or does not satisfy the desired constraints of $G$. We address existence and uniqueness for generic subclasses of $L$-Lipschitz functions, including gradients of (strongly) convex functions and typical Neural Networks. We explore a variant that approaches a transport plan, showing equivalence to a map problem in some cases. For the squared Euclidean cost, we propose alternating minimisation over a transport plan $π$ and map $g$, with the optimisation over $g$ being the $L^2$ projection on $G$ of the barycentric mapping $\overline{π}$. In dimension one, this global problem equates the $L^2$ projection of $\overline{π^*}$ onto $G$ for an OT plan $π^*$ between $μ$ and $ν$, but this does not extend to higher dimensions. We introduce a simple kernel method to find $g$ within a Reproducing Kernel Hilbert Space in the discrete case. We present numerical methods for $L$-Lipschitz gradients of $\ell$-strongly convex potentials, and study the convergence of Stochastic Gradient Descent methods for Neural Networks. We finish with an illustration on colour transfer, applying learned maps on new images, and showcasing outlier robustness.

Submitted 12 March, 2025; v1 submitted 18 July, 2024; originally announced July 2024.

MSC Class: 49Q22

arXiv:2307.11714

Convergence of SGD for Training Neural Networks with Sliced Wasserstein Losses

Authors: Eloi Tanguy

Abstract: Optimal Transport has sparked vivid interest in recent years, in particular thanks to the Wasserstein distance, which provides a geometrically sensible and intuitive way of comparing probability measures. For computational reasons, the Sliced Wasserstein (SW) distance was introduced as an alternative to the Wasserstein distance, and has seen uses for training generative Neural Networks (NNs). While convergence of Stochastic Gradient Descent (SGD) has been observed practically in such a setting, there is to our knowledge no theoretical guarantee for this observation. Leveraging recent works on convergence of SGD on non-smooth and non-convex functions by Bianchi et al. (2022), we aim to bridge that knowledge gap, and provide a realistic context under which fixed-step SGD trajectories for the SW loss on NN parameters converge. More precisely, we show that the trajectories approach the set of (sub)-gradient flow equations as the step decreases. Under stricter assumptions, we show a much stronger convergence result for noised and projected SGD schemes, namely that the long-run limits of the trajectories approach a set of generalised critical points of the loss function.

Submitted 18 March, 2024; v1 submitted 21 July, 2023; originally announced July 2023.

Journal ref: Transactions on Machine Learning Research, 2023 2835-8856

arXiv:2307.10352

doi 10.1090/mcom/3994

Properties of Discrete Sliced Wasserstein Losses

Authors: Eloi Tanguy, Rémi Flamary, Julie Delon

Abstract: The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems bear the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(γ_Y, γ_Z)$, i.e. the SW distance between two uniform discrete measures with the same amount of points as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence and a uniform Central Limit result on the process $\mathcal{E}_p(Y)$. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.

Submitted 14 May, 2025; v1 submitted 19 July, 2023; originally announced July 2023.

Journal ref: Mathematics of Computation (2024)

arXiv:2304.12029

Reconstructing discrete measures from projections. Consequences on the empirical Sliced Wasserstein Distance

Authors: Eloi Tanguy, Rémi Flamary, Julie Delon

Abstract: This paper deals with the reconstruction of a discrete measure $γ_Z$ on $\mathbb{R}^d$ from the knowledge of its pushforward measures $P_i\#γ_Z$ by linear applications $P_i: \mathbb{R}^d \rightarrow \mathbb{R}^{d_i}$ (for instance projections onto subspaces). The measure $γ_Z$ being fixed, assuming that the rows of the matrices $P_i$ are independent realizations of laws which do not give mass to hyperplanes, we show that if $\sum_i d_i > d$, this reconstruction problem has almost certainly a unique solution. This holds for any number of points in $γ_Z$. A direct consequence of this result is an almost-sure separability property on the empirical Sliced Wasserstein distance.

Submitted 12 April, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

Showing 1–6 of 6 results for author: Tanguy, E