-
Information Geometry of Wasserstein Statistics on Shapes and Affine Deformations
Authors:
Shun-ichi Amari,
Takeru Matsuda
Abstract:
Information geometry and Wasserstein geometry are two main structures introduced in a manifold of probability distributions, and they capture its different characteristics. We study characteristics of Wasserstein geometry in the framework of Li and Zhao (2023) for the affine deformation statistical model, which is a multi-dimensional generalization of the location-scale model. We compare merits and demerits of estimators based on information geometry and Wasserstein geometry. The shape of a probability distribution and its affine deformation are separated in the Wasserstein geometry, showing its robustness against the waveform perturbation in exchange for the loss in Fisher efficiency. We show that the Wasserstein estimator is the moment estimator in the case of the elliptically symmetric affine deformation model. It coincides with the information-geometrical estimator (maximum-likelihood estimator) when the waveform is Gaussian. The role of the Wasserstein efficiency is elucidated in terms of robustness against waveform change.
Submitted 25 June, 2024; v1 submitted 23 July, 2023;
originally announced July 2023.
-
Wasserstein Statistics in One-dimensional Location-Scale Model
Authors:
Shun-ichi Amari,
Takeru Matsuda
Abstract:
Wasserstein geometry and information geometry are two important structures to be introduced in a manifold of probability distributions. Wasserstein geometry is defined by using the transportation cost between two distributions, so it reflects the metric of the base manifold on which the distributions are defined. Information geometry is defined to be invariant under reversible transformations of the base space. Both have their own merits for applications. In particular, statistical inference is based upon information geometry, where the Fisher metric plays a fundamental role, whereas Wasserstein geometry is useful in computer vision and AI applications. In this study, we analyze statistical inference based on the Wasserstein geometry in the case that the base space is one-dimensional. By using the location-scale model, we further derive the W-estimator that explicitly minimizes the transportation cost from the empirical distribution to a statistical model and study its asymptotic behaviors. We show that the W-estimator is consistent and explicitly give its asymptotic distribution by using the functional delta method. The W-estimator is Fisher efficient in the Gaussian case.
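As an illustration of the W-estimator described in this abstract, the following is a minimal sketch, assuming a standard Gaussian waveform and the squared (2-Wasserstein) transportation cost; neither assumption is stated in the abstract. In one dimension that cost equals the integrated squared difference between the empirical quantile function and the model quantile function mu + sigma * Q(u), so the minimizer has a closed form as a weighted sum of order statistics. The function name `w_estimator_gaussian` is hypothetical.

```python
from statistics import NormalDist, fmean


def w_estimator_gaussian(samples):
    """Estimate (mu, sigma) by minimizing the 1D squared 2-Wasserstein cost.

    Assumes the waveform is standard Gaussian (mean 0, variance 1).
    The cost is  integral_0^1 (Q_n(u) - mu - sigma * Q(u))^2 du,  where Q_n is
    the empirical quantile function and Q the standard normal quantile function.
    """
    std = NormalDist()
    n = len(samples)
    xs = sorted(samples)  # order statistics x_(1) <= ... <= x_(n)

    # Density phi at the quantile boundaries z_i = Q(i/n); phi(-inf) = phi(+inf) = 0.
    phi = [0.0] + [std.pdf(std.inv_cdf(i / n)) for i in range(1, n)] + [0.0]

    # Setting the derivative in mu to zero gives the sample mean.
    mu_hat = fmean(xs)

    # Setting the derivative in sigma to zero gives sum_i k_i * x_(i), with
    # k_i = integral of Q(u) over ((i-1)/n, i/n] = phi(z_{i-1}) - phi(z_i).
    sigma_hat = sum(x * (phi[i] - phi[i + 1]) for i, x in enumerate(xs))
    return mu_hat, sigma_hat
```

The weights k_i sum to zero, so the scale estimate does not change when every sample is shifted by the same constant.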
Submitted 28 December, 2020; v1 submitted 21 July, 2020;
originally announced July 2020.
-
Wasserstein statistics in 1D location-scale model
Authors:
Shun-ichi Amari
Abstract:
Wasserstein geometry and information geometry are two important structures introduced in a manifold of probability distributions. The former is defined by using the transportation cost between two distributions, so it reflects the metric structure of the base manifold on which distributions are defined. Information geometry is constructed based on the invariance criterion that the geometry is invariant under reversible transformations of the base space. Both have their own merits for applications. Statistical inference is constructed on information geometry, where the Fisher metric plays a fundamental role, whereas Wasserstein geometry is useful for applications to computer vision and AI. We propose statistical inference based on the Wasserstein geometry in the case that the base space is 1-dimensional. By using the location-scale model, we derive the $W$-estimator explicitly and study its asymptotic behaviors.
Submitted 5 March, 2020;
originally announced March 2020.
-
Interpolating between Optimal Transport and MMD using Sinkhorn Divergences
Authors:
Jean Feydy,
Thibault Séjourné,
François-Xavier Vialard,
Shun-ichi Amari,
Alain Trouvé,
Gabriel Peyré
Abstract:
Comparing probability distributions is a fundamental problem in data sciences. Simple norms and divergences such as the total variation and the relative entropy only compare densities in a point-wise manner and fail to capture the geometric nature of the problem. In sharp contrast, Maximum Mean Discrepancies (MMD) and Optimal Transport distances (OT) are two classes of distances between measures that take into account the geometry of the underlying space and metrize the convergence in law.
This paper studies the Sinkhorn divergences, a family of geometric divergences that interpolates between MMD and OT. Relying on a new notion of geometric entropy, we provide theoretical guarantees for these divergences: positivity, convexity and metrization of the convergence in law. On the practical side, we detail a numerical scheme that enables the large scale application of these divergences for machine learning: on the GPU, gradients of the Sinkhorn loss can be computed for batches of a million samples.
Submitted 18 October, 2018;
originally announced October 2018.
-
Information Geometry Connecting Wasserstein Distance and Kullback-Leibler Divergence via the Entropy-Relaxed Transportation Problem
Authors:
Shun-ichi Amari,
Ryo Karakida,
Masafumi Oizumi
Abstract:
Two geometrical structures have been extensively studied for a manifold of probability distributions. One is based on the Fisher information metric, which is invariant under reversible transformations of random variables, while the other is based on the Wasserstein distance of optimal transportation, which reflects the structure of the distance between random variables. Here, we propose a new information-geometrical theory that is a unified framework connecting the Wasserstein distance and Kullback-Leibler (KL) divergence. We primarily considered a discrete case consisting of $n$ elements and studied the geometry of the probability simplex $S_{n-1}$, which is the set of all probability distributions over $n$ elements. The Wasserstein distance was introduced in $S_{n-1}$ by the optimal transportation of commodities from distribution ${\mathbf{p}}$ to distribution ${\mathbf{q}}$, where ${\mathbf{p}}$, ${\mathbf{q}} \in S_{n-1}$. We relaxed the optimal transportation by using entropy, which was introduced by Cuturi. The optimal solution was called the entropy-relaxed stochastic transportation plan. The entropy-relaxed optimal cost $C({\mathbf{p}}, {\mathbf{q}})$ was computationally much less demanding than the original Wasserstein distance but does not define a distance because it is not minimized at ${\mathbf{p}}={\mathbf{q}}$. To define a proper divergence while retaining the computational advantage, we first introduced a divergence function in the manifold $S_{n-1} \times S_{n-1}$ of optimal transportation plans. We fully explored the information geometry of the manifold of the optimal transportation plans and subsequently constructed a new one-parameter family of divergences in $S_{n-1}$ that are related to both the Wasserstein distance and the KL-divergence.
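The entropy-relaxed transportation plan mentioned in this abstract can be illustrated with the Sinkhorn scaling iteration associated with Cuturi's relaxation. The following is a minimal sketch, assuming strictly positive marginals and a small discrete problem; the function names `sinkhorn_plan` and `transport_cost`, the parameter name `lam` for the relaxation strength, and the iteration count are illustrative choices, not taken from the paper.

```python
import math


def sinkhorn_plan(p, q, cost, lam=0.5, iters=500):
    """Entropy-relaxed transportation plan via Sinkhorn scaling.

    p, q : marginal distributions (lists of positive numbers summing to 1).
    cost : cost matrix as a list of rows, cost[i][j] >= 0.
    lam  : strength of the entropy relaxation (assumed parameter name).
    The plan has the form P[i][j] = u[i] * K[i][j] * v[j], K = exp(-cost / lam).
    """
    n, m = len(p), len(q)
    K = [[math.exp(-cost[i][j] / lam) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Rescale rows to match p, then columns to match q.
        u = [p[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Linear transportation cost <cost, plan> of a given plan."""
    return sum(
        plan[i][j] * cost[i][j]
        for i in range(len(plan))
        for j in range(len(plan[0]))
    )
```

For example, with `p = [0.5, 0.3, 0.2]`, `q = [0.2, 0.3, 0.5]` and `cost[i][j] = (i - j) ** 2`, the returned plan has row sums close to `p` and column sums close to `q`. Smaller values of `lam` move the plan toward the unrelaxed optimal transportation plan, at the price of slower convergence.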
Submitted 28 September, 2017;
originally announced September 2017.
-
Curvature of Hessian Manifolds
Authors:
Shun-ichi Amari,
John Armstrong
Abstract:
We prove that, in dimensions greater than 2, the generic metric is not a Hessian metric and find a curvature condition on Hessian metrics in dimensions greater than 3. In particular we prove that the forms used to define the Pontryagin classes in terms of the curvature vanish on a Hessian manifold. By contrast all analytic Riemannian 2-metrics are Hessian metrics.
Submitted 4 December, 2013;
originally announced December 2013.
-
Achieving Precise Mechanical Control in Intrinsically Noisy Systems
Authors:
Wenlian Lu,
Jianfeng Feng,
Shun-ichi Amari,
David Waxman
Abstract:
How can precise control be realised in intrinsically noisy systems? Here, we develop a general theoretical framework that provides a way to achieve precise control in signal-dependent noisy environments. When the control signal has Poisson or supra-Poisson noise, precise control is not possible. If, however, the control signal has sub-Poisson noise, then precise control is possible. For this case, the precise control solution is not a function, but a rapidly varying random process that must be averaged with respect to a governing probability density functional. Our theoretical approach is applied to the control of straight-trajectory arm movement. Sub-Poisson noise in the control signal is shown to be capable of leading to precise control. Intriguingly, the control signal for this system has a natural counterpart, namely the bursting pulses of neurons --trains of Dirac-delta functions-- in biological systems to achieve precise control performance.
Submitted 30 April, 2013;
originally announced April 2013.
-
Dually flat structure with escort probability and its application to alpha-Voronoi diagrams
Authors:
Atsumi Ohara,
Hiroshi Matsuzoe,
Shun-ichi Amari
Abstract:
This paper studies the geometrical structure of the manifold of escort probability distributions and shows its new applicability to information science. In order to realize escort probabilities we use a conformal transformation that flattens the so-called alpha-geometry of the space of discrete probability distributions, which well characterizes nonadditive statistics on the space. As a result, escort probabilities are proved to be flat coordinates of the usual probabilities for the derived dually flat structure. Finally, we demonstrate that escort probabilities with the new structure admit a simple algorithm to compute Voronoi diagrams and centroids with respect to alpha-divergences.
Submitted 24 October, 2010;
originally announced October 2010.