-
Maximum likelihood estimation for the $λ$-exponential family
Authors:
Xiwei Tian,
Ting-Kam Leonard Wong,
Jiaowen Yang,
Jun Zhang
Abstract:
The $λ$-exponential family generalizes the standard exponential family via a generalized convex duality motivated by optimal transport. It is the constant-curvature analogue of the exponential family from the information-geometric point of view, but the development of computational methodologies is still in an early stage. In this paper, we propose a fixed point iteration for maximum likelihood estimation under i.i.d. sampling, and prove using the duality that the likelihood is monotone along the iterations. We illustrate the algorithm with the $q$-Gaussian distribution and the Dirichlet perturbation.
Submitted 6 May, 2025;
originally announced May 2025.
-
On the Wasserstein alignment problem
Authors:
Soumik Pal,
Bodhisattva Sen,
Ting-Kam Leonard Wong
Abstract:
Suppose we are given two metric spaces and a family of continuous transformations from one to the other. Given a probability distribution on each of these two spaces - namely the source and the target measures - the Wasserstein alignment problem seeks the transformation that minimizes the optimal transport cost between its pushforward of the source distribution and the target distribution, ensuring the closest possible alignment in a probabilistic sense. Examples of interest include two distributions on two Euclidean spaces $\mathbb{R}^n$ and $\mathbb{R}^d$, and we want a spatial embedding of the $n$-dimensional source measure in $\mathbb{R}^d$ that is closest in some Wasserstein metric to the target distribution on $\mathbb{R}^d$. Similar data alignment problems also commonly arise in shape analysis and computer vision. In this paper we show that this nonconvex optimal transport projection problem admits a convex Kantorovich-type dual. This allows us to characterize the set of projections and devise a linear programming algorithm. For certain special examples, such as orthogonal transformations on Euclidean spaces of unequal dimensions and the $2$-Wasserstein cost, we characterize the covariance of the optimal projections. Our results also cover the generalization when we penalize each transformation by a function. An example is the inner product Gromov-Wasserstein distance minimization problem which has recently gained popularity.
Submitted 9 March, 2025;
originally announced March 2025.
-
Adapted optimal transport between Gaussian processes in discrete time
Authors:
Madhu Gunasingam,
Ting-Kam Leonard Wong
Abstract:
We derive explicitly the adapted $2$-Wasserstein distance between non-degenerate Gaussian distributions on $\mathbb{R}^N$ and characterize the optimal bicausal coupling(s). This leads to an adapted version of the Bures-Wasserstein distance on the space of positive definite matrices.
Submitted 12 January, 2025; v1 submitted 9 April, 2024;
originally announced April 2024.
-
JKO schemes with general transport costs
Authors:
Cale Rankin,
Ting-Kam Leonard Wong
Abstract:
We modify the JKO scheme, which is a time discretization of Wasserstein gradient flows, by replacing the Wasserstein distance with more general transport costs on manifolds. We show that when the cost function has a mixed Hessian which defines a Riemannian metric, our modified JKO scheme converges under suitable conditions to the corresponding Riemannian Fokker--Planck equation. Thus on a Riemannian manifold one may replace the (squared) Riemannian distance with any cost function which induces the metric. Of interest is when the Riemannian distance is computationally intractable, but a suitable cost has a simple analytic expression. We consider the Fokker--Planck equation on compact submanifolds with the Neumann boundary condition and on complete Riemannian manifolds with a finite drift condition. As an application we consider Hessian manifolds, taking as a cost the Bregman divergence.
Submitted 27 February, 2024;
originally announced February 2024.
-
Information Geometry for the Working Information Theorist
Authors:
Kumar Vijay Mishra,
M. Ashok Kumar,
Ting-Kam Leonard Wong
Abstract:
Information geometry is a study of statistical manifolds, that is, spaces of probability distributions from a geometric perspective. Its classical information-theoretic applications relate to statistical concepts such as Fisher information, sufficient statistics, and efficient estimators. Today, information geometry has emerged as an interdisciplinary field that finds applications in diverse areas such as radar sensing, array signal processing, quantum physics, deep learning, and optimal transport. This article presents an overview of essential information geometry to initiate an information theorist, who may be unfamiliar with this exciting area of research. We explain the concepts of divergences on statistical manifolds, generalized notions of distances, orthogonality, and geodesics, thereby paving the way for concrete applications and novel theoretical investigations. We also highlight some recent information-geometric developments, which are of interest to the broader information theory community.
Submitted 5 October, 2023;
originally announced October 2023.
-
Bregman-Wasserstein divergence: geometry and applications
Authors:
Amanjit Singh Kainth,
Cale Rankin,
Ting-Kam Leonard Wong
Abstract:
The Bregman-Wasserstein divergence is the optimal transport cost when the underlying cost function is given by a Bregman divergence, and arises naturally in fields such as statistics and machine learning. We establish fundamental properties of the Bregman-Wasserstein divergence and propose a novel generalized transport geometry that promotes the Bregman geometry to the space of probability distributions. We provide a probabilistic interpretation involving exponential families and define generalized displacement interpolations compatible with the Bregman geometry. These interpolations are used to derive a generalized Pythagorean inequality, which is of independent interest. Furthermore, we construct a generalized dualistic geometry that lifts the differential geometry of the Bregman divergence to an infinite-dimensional statistical manifold. On the computational side, we demonstrate how Bregman-Wasserstein optimal transport maps can be estimated using neural approaches, establish the well-posedness of Bregman-Wasserstein barycenters, and relate them to Bayesian learning. Finally, we introduce the Bregman-Wasserstein JKO scheme for discretizing Riemannian Wasserstein gradient flows.
Submitted 10 April, 2025; v1 submitted 11 February, 2023;
originally announced February 2023.
-
Conformal Mirror Descent with Logarithmic Divergences
Authors:
Amanjit Singh Kainth,
Ting-Kam Leonard Wong,
Frank Rudzicz
Abstract:
The logarithmic divergence is an extension of the Bregman divergence motivated by optimal transport and a generalized convex duality, and satisfies many remarkable properties. Using the geometry induced by the logarithmic divergence, we introduce a generalization of continuous time mirror descent that we term the conformal mirror descent. We derive its dynamics under a generalized mirror map, and show that it is a time change of a corresponding Hessian gradient flow. We also prove convergence results in continuous time. We apply the conformal mirror descent to online estimation of a generalized exponential family, and construct a family of gradient flows on the unit simplex via the Dirichlet optimal transport problem.
Submitted 7 September, 2022;
originally announced September 2022.
-
An isomorphism theorem for models of Weak König's Lemma without primitive recursion
Authors:
Marta Fiori-Carones,
Leszek Aleksander Kołodziejczyk,
Tin Lok Wong,
Keita Yokoyama
Abstract:
We prove that if $(M,\mathcal{X})$ and $(M,\mathcal{Y})$ are countable models of the theory $\mathrm{WKL}^*_0$ such that $\mathrm{I}Σ_1(A)$ fails for some $A \in \mathcal{X} \cap \mathcal{Y}$, then $(M,\mathcal{X})$ and $(M,\mathcal{Y})$ are isomorphic. As a consequence, the analytic hierarchy collapses to $Δ^1_1$ provably in $\mathrm{WKL}^*_0 + \neg\mathrm{I}Σ^0_1$, and $\mathrm{WKL}$ is the strongest $Π^1_2$ statement that is $Π^1_1$-conservative over $\mathrm{RCA}^*_0 + \neg\mathrm{I}Σ^0_1$.
Applying our results to the $Δ^0_n$-definable sets in models of $\mathrm{RCA}^*_0 + \mathrm{B}Σ^0_n + \neg\mathrm{I}Σ^0_n$ that also satisfy an appropriate relativization of Weak König's Lemma, we prove that for each $n \ge 1$, the set of $Π^1_2$ sentences that are $Π^1_1$-conservative over $\mathrm{RCA}^*_0 + \mathrm{B}Σ^0_n + \neg\mathrm{I}Σ^0_n$ is c.e. In contrast, we prove that the set of $Π^1_2$ sentences that are $Π^1_1$-conservative over $\mathrm{RCA}^*_0 + \mathrm{B}Σ^0_n$ is $Π_2$-complete. This answers a question of Towsner.
We also show that $\mathrm{RCA}_0 + \mathrm{RT}^2_2$ is $Π^1_1$-conservative over $\mathrm{B}Σ^0_2$ if and only if it is conservative over $\mathrm{B}Σ^0_2$ with respect to $\forall Π^0_5$ sentences.
Submitted 27 August, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
Tsallis and Rényi deformations linked via a new $λ$-duality
Authors:
Ting-Kam Leonard Wong,
Jun Zhang
Abstract:
Tsallis and Rényi entropies, which are monotone transformations of each other, are deformations of the celebrated Shannon entropy. Maximization of these deformed entropies, under suitable constraints, leads to the $q$-exponential family which has applications in non-extensive statistical physics, information theory and statistics. In previous information-geometric studies, the $q$-exponential family was analyzed using classical convex duality and Bregman divergence. In this paper, we show that a generalized $λ$-duality, where $λ= 1 - q$ is the constant information-geometric curvature, leads to a generalized exponential family which is essentially equivalent to the $q$-exponential family and has deep connections with Rényi entropy and optimal transport. Using this generalized convex duality and its associated logarithmic divergence, we show that our $λ$-exponential family satisfies properties that parallel and generalize those of the exponential family. Under our framework, the Rényi entropy and divergence arise naturally, and we give a new proof of the Tsallis/Rényi entropy maximizing property of the $q$-exponential family. We also introduce a $λ$-mixture family which may be regarded as the dual of the $λ$-exponential family, and connect it with other mixture-type families. Finally, we discuss a duality between the $λ$-exponential family and the $λ$-logarithmic divergence, and study its statistical consequences.
Submitted 12 January, 2022; v1 submitted 25 July, 2021;
originally announced July 2021.
-
Projections with logarithmic divergences
Authors:
Zhixu Tao,
Ting-Kam Leonard Wong
Abstract:
In information geometry, generalized exponential families and statistical manifolds with curvature have been under active investigation in recent years. In this paper we consider the statistical manifold induced by a logarithmic $L^{(α)}$-divergence which generalizes the Bregman divergence. It is known that such a manifold is dually projectively flat with constant negative sectional curvature, and is closely related to the $\mathcal{F}^{(α)}$-family, a generalized exponential family introduced by the second author. Our main result constructs a dual foliation of the statistical manifold, i.e., an orthogonal decomposition consisting of primal and dual autoparallel submanifolds. This decomposition, which can be naturally interpreted in terms of primal and dual projections with respect to the logarithmic divergence, extends the dual foliation of a dually flat manifold studied by Amari. As an application, we formulate a new $L^{(α)}$-PCA problem which generalizes the exponential family PCA.
Submitted 8 May, 2021;
originally announced May 2021.
-
Functional portfolio optimization in stochastic portfolio theory
Authors:
Steven Campbell,
Ting-Kam Leonard Wong
Abstract:
In this paper we develop a concrete and fully implementable approach to the optimization of functionally generated portfolios in stochastic portfolio theory. The main idea is to optimize over a family of rank-based portfolios parameterized by an exponentially concave function on the unit interval. This choice can be motivated by the long term stability of the capital distribution observed in large equity markets, and allows us to circumvent the curse of dimensionality. The resulting optimization problem, which is convex, allows for various regularizations and constraints to be imposed on the generating function. We prove an existence and uniqueness result for our optimization problem and provide a stability estimate in terms of a Wasserstein metric of the input measure. Then, we formulate a discretization which can be implemented numerically using available software packages and analyze its approximation error. Finally, we present empirical examples using CRSP data from the US stock market, including the performance of the portfolios allowing for dividends, defaults, and transaction costs.
Submitted 9 October, 2021; v1 submitted 19 March, 2021;
originally announced March 2021.
-
Ramsey's theorem for pairs, collection, and proof size
Authors:
Leszek Aleksander Kołodziejczyk,
Tin Lok Wong,
Keita Yokoyama
Abstract:
We prove that any proof of a $\forall Σ^0_2$ sentence in the theory $\mathrm{WKL}_0 + \mathrm{RT}^2_2$ can be translated into a proof in $\mathrm{RCA}_0$ at the cost of a polynomial increase in size. In fact, the proof in $\mathrm{RCA}_0$ can be found by a polynomial-time algorithm. On the other hand, $\mathrm{RT}^2_2$ has non-elementary speedup over the weaker base theory $\mathrm{RCA}^*_0$ for proofs of $Σ_1$ sentences.
We also show that for $n \ge 0$, proofs of $Π_{n+2}$ sentences in $\mathrm{B}Σ_{n+1}+\exp$ can be translated into proofs in $\mathrm{I}Σ_{n} + \exp$ at polynomial cost. Moreover, the $Π_{n+2}$-conservativity of $\mathrm{B}Σ_{n+1} + \exp$ over $\mathrm{I}Σ_{n} + \exp$ can be proved in $\mathrm{PV}$, a fragment of bounded arithmetic corresponding to polynomial-time computation. For $n \ge 1$, this answers a question of Clote, Hájek, and Paris.
Submitted 16 January, 2021; v1 submitted 14 May, 2020;
originally announced May 2020.
-
Scalable Gradients for Stochastic Differential Equations
Authors:
Xuechen Li,
Ting-Kam Leonard Wong,
Ricky T. Q. Chen,
David Duvenaud
Abstract:
The adjoint sensitivity method scalably computes gradients of solutions to ordinary differential equations. We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients with high-order adaptive solvers. Specifically, we derive a stochastic differential equation whose solution is the gradient, a memory-efficient algorithm for caching noise, and conditions under which numerical solutions converge. In addition, we combine our method with gradient-based stochastic variational inference for latent stochastic differential equations. We use our method to fit stochastic dynamics defined by neural networks, achieving competitive performance on a 50-dimensional motion capture dataset.
Submitted 18 October, 2020; v1 submitted 5 January, 2020;
originally announced January 2020.
-
Where Pigeonhole Principles meet König Lemmas
Authors:
David Belanger,
Chitat Chong,
Wei Wang,
Tin Lok Wong,
Yue Yang
Abstract:
We study the pigeonhole principle for $Σ_2$-definable injections with domain twice as large as the codomain, and the weak König lemma for $Δ^0_2$-definable trees in which every level has at least half of the possible nodes. We show that the latter implies the existence of $2$-random reals, and is conservative over the former. We also show that the former is strictly weaker than the usual pigeonhole principle for $Σ_2$-definable injections.
Submitted 7 December, 2019;
originally announced December 2019.
-
Random concave functions
Authors:
Peter Baxendale,
Ting-Kam Leonard Wong
Abstract:
Spaces of convex and concave functions appear naturally in theory and applications. For example, convex regression and log-concave density estimation are important topics in nonparametric statistics. In stochastic portfolio theory, concave functions on the unit simplex measure the concentration of capital, and their gradient maps define novel investment strategies. The gradient maps may also be regarded as optimal transport maps on the simplex. In this paper we construct and study probability measures supported on spaces of concave functions. These measures may serve as prior distributions in Bayesian statistics and Cover's universal portfolio, and induce distribution-valued random variables via optimal transport. The random concave functions are constructed on the unit simplex by taking a suitably scaled (mollified, or soft) minimum of random hyperplanes. Depending on the regime of the parameters, we show that as the number of hyperplanes tends to infinity there are several possible limiting behaviors. In particular, there is a transition from a deterministic almost sure limit to a non-trivial limiting distribution that can be characterized using convex duality and Poisson point processes.
Submitted 24 May, 2021; v1 submitted 30 October, 2019;
originally announced October 2019.
-
Logarithmic divergences: geometry and interpretation of curvature
Authors:
Ting-Kam Leonard Wong,
Jiaowen Yang
Abstract:
We study the logarithmic $L^{(α)}$-divergence which extrapolates the Bregman divergence and corresponds to solutions to novel optimal transport problems. We show that this logarithmic divergence is equivalent to a conformal transformation of the Bregman divergence, and, via an explicit affine immersion, is equivalent to Kurose's geometric divergence. In particular, the $L^{(α)}$-divergence is a canonical divergence of a statistical manifold with constant sectional curvature $-α$. For such a manifold, we give a geometric interpretation of its sectional curvature in terms of how the divergence between a pair of primal and dual geodesics differs from the dually flat case. Further results can be found in our follow-up paper [27] which uncovers a novel relation between optimal transport and information geometry.
Submitted 17 June, 2019;
originally announced June 2019.
-
Pseudo-Riemannian geometry embeds information geometry in optimal transport
Authors:
Ting-Kam Leonard Wong,
Jiaowen Yang
Abstract:
Optimal transport and information geometry both study geometric structures on spaces of probability distributions. Optimal transport characterizes the cost-minimizing movement from one distribution to another, while information geometry originates from coordinate-invariant properties of statistical inference. Their connections and applications in statistics and machine learning have started to gain more attention. In this paper we give a new differential geometric connection between the two fields. Namely, the pseudo-Riemannian framework of Kim and McCann, a geometric perspective on the fundamental Ma-Trudinger-Wang (MTW) condition in the regularity theory of optimal transport maps, encodes the dualistic structure of a statistical manifold. This general relation is described using the natural framework of $c$-divergence, a divergence defined by an optimal transport map. As a by-product, we obtain a new information-geometric interpretation of the MTW tensor. This connection sheds light on old and new aspects of information geometry. The dually flat geometry of Bregman divergence corresponds to the quadratic cost and the pseudo-Euclidean space, and the $L^{(α)}$-divergence introduced by Pal and the first author has constant sectional curvature in a sense to be made precise. In these cases we give a geometric interpretation of the information-geometric curvature in terms of the divergence between a primal-dual pair of geodesics.
Submitted 5 May, 2021; v1 submitted 31 May, 2019;
originally announced June 2019.
-
Multiplicative Schrödinger problem and the Dirichlet transport
Authors:
Soumik Pal,
Ting-Kam Leonard Wong
Abstract:
We consider an optimal transport problem on the unit simplex whose solutions are given by gradients of exponentially concave functions and prove two main results. First, we show that the optimal transport is the large deviation limit of a particle system of Dirichlet processes transporting one probability measure on the unit simplex to another by coordinatewise multiplication and normalizing. The structure of our Lagrangian and the appearance of the Dirichlet process relate our problem closely to the entropic measure on the Wasserstein space as defined by von Renesse and Sturm in the context of Wasserstein diffusion. The limiting procedure is a triangular limit where we allow simultaneously the number of particles to grow to infinity while the "noise" tends to zero. The method, which generalizes easily to many other cost functions, including the squared Euclidean distance, provides a novel combination of the Schrödinger problem approach due to C. Léonard and the related Brownian particle systems by Adams et al. which does not require gamma convergence. Second, we analyze the behavior of entropy along the paths of transport. The reference measure on the simplex is taken to be the Dirichlet measure with all zero parameters which relates to the finite-dimensional distributions of the entropic measure. The interpolating curves are not the usual McCann lines. Nevertheless we show that entropy plus a multiple of the transport cost remains convex, which is reminiscent of the semiconvexity of entropy along lines of McCann interpolations in negative curvature spaces. We also obtain, under suitable conditions, dimension-free bounds of the optimal transport cost in terms of entropy.
Submitted 4 July, 2020; v1 submitted 15 July, 2018;
originally announced July 2018.
-
Random walks and induced Dirichlet forms on compact spaces of homogeneous type
Authors:
Shi-Lei Kong,
Ka-Sing Lau,
Ting-Kam Leonard Wong
Abstract:
We extend our study of random walks and induced Dirichlet forms on self-similar sets [arXiv:1604.05440, 1612.01708] to compact spaces of homogeneous type $(K, ρ,μ)$. A successive partition on $K$ brings a natural augmented tree structure $(X, E)$ that is Gromov hyperbolic, and the hyperbolic boundary is Hölder equivalent to $K$. We then introduce a class of transient reversible random walks on $(X, E)$ with return ratio $λ$. Using Silverstein's theory of Markov chains, we prove that the random walk induces an energy form on $K$ with $$ {\mathcal E}_K [u] \asymp \iint_{K\times K \setminus Δ} \frac{|u(ξ) - u(η)|^2}{V(ξ, η)ρ(ξ, η)^β} dμ(ξ) dμ(η), $$ where $V(ξ, η)$ is the $μ$-volume of the ball centered at $ξ$ with radius $ρ(ξ, η)$, $Δ$ is the diagonal, and $β$ depends on $λ$. In particular, for an $α$-set in ${\mathbb R}^d$, the kernel of the energy form is of order $\frac{1}{|ξ-η|^{α+β}}$. We also discuss conditions for this energy form to be a non-local regular Dirichlet form.
Submitted 8 April, 2018;
originally announced April 2018.
-
Logarithmic divergences from optimal transport and Rényi geometry
Authors:
Ting-Kam Leonard Wong
Abstract:
Divergences, also known as contrast functions, are distance-like quantities defined on manifolds of non-negative or probability measures. Using the duality in optimal transport, we introduce and study the one-parameter family of $L^{(\pm α)}$-divergences. It includes the Bregman divergence corresponding to the Euclidean quadratic cost, and the $L$-divergence introduced by Pal and the author in connection with portfolio theory and a logarithmic cost function. They admit natural generalizations of exponential family that are closely related to the $α$-family and $q$-exponential family. In particular, the $L^{(\pm α)}$-divergences of the corresponding potential functions are Rényi divergences. Using this unified framework we prove that the induced geometries are dually projectively flat with constant sectional curvatures, and a generalized Pythagorean theorem holds true. Conversely, we show that if a statistical manifold is dually projectively flat with constant curvature $\pm α$ with $α> 0$, then it is locally induced by an $L^{(\mp α)}$-divergence. We define in this context a canonical divergence which extends the one for dually flat manifolds.
Submitted 3 September, 2018; v1 submitted 10 December, 2017;
originally announced December 2017.
-
On portfolios generated by optimal transport
Authors:
Ting-Kam Leonard Wong
Abstract:
First introduced by Fernholz in stochastic portfolio theory, the functionally generated portfolio allows its investment performance to be attributed to directly observable and easily interpretable market quantities. In previous works we showed that Fernholz's multiplicatively generated portfolio has deep connections with optimal transport and the information geometry of exponentially concave functions. Recently, Karatzas and Ruf introduced a new additive portfolio generation whose relation with optimal transport was studied by Vervuurt. We show that the additively generated portfolio can be interpreted in terms of the well-known dually flat information geometry of Bregman divergence. Moreover, we characterize, in a sense to be made precise, all possible forms of functional portfolio constructions that contain additive and multiplicative generations as special cases. Each construction involves a divergence functional on the unit simplex measuring the market volatility captured, and admits a pathwise decomposition for the portfolio value. We illustrate with an empirical example.
Submitted 25 September, 2017; v1 submitted 10 September, 2017;
originally announced September 2017.
-
Some observations on the logical foundations of inductive theorem proving
Authors:
Stefan Hetzl,
Tin Lok Wong
Abstract:
In this paper we study the logical foundations of automated inductive theorem proving. To that aim we first develop a theoretical model that is centered around the difficulty of finding induction axioms which are sufficient for proving a goal.
Based on this model, we then analyze the following aspects: the choice of a proof shape, the choice of an induction rule and the language of the induction formula. In particular, using model-theoretic techniques, we clarify the relationship between notions of inductiveness that have been considered in the literature on automated inductive theorem proving. This is a corrected version of the paper arXiv:1704.01930v5 published originally on Nov. 16, 2017.
Submitted 12 April, 2018; v1 submitted 6 April, 2017;
originally announced April 2017.
-
Cover's universal portfolio, stochastic portfolio theory and the numeraire portfolio
Authors:
Christa Cuchiero,
Walter Schachermayer,
Ting-Kam Leonard Wong
Abstract:
Cover's celebrated theorem states that the long run yield of a properly chosen "universal" portfolio is as good as the long run yield of the best retrospectively chosen constant rebalanced portfolio. The "universality" pertains to the fact that this result is model-free, i.e., not dependent on an underlying stochastic process. We extend Cover's theorem to the setting of stochastic portfolio theory as initiated by R. Fernholz: the rebalancing rule need not be constant anymore but may depend on the present state of the stock market. This model-free result is complemented by a comparison with the log-optimal numeraire portfolio when fixing a stochastic model of the stock market. Roughly speaking, under appropriate assumptions, the optimal long run yield coincides for the three approaches mentioned in the title of this paper. We present our results in discrete and continuous time.
Submitted 29 November, 2016;
originally announced November 2016.
-
Exponentially concave functions and a new information geometry
Authors:
Soumik Pal,
Ting-Kam Leonard Wong
Abstract:
A function is exponentially concave if its exponential is concave. We consider exponentially concave functions on the unit simplex. In a previous paper we showed that gradient maps of exponentially concave functions provide solutions to a Monge-Kantorovich optimal transport problem and give a better gradient approximation than those of ordinary concave functions. The approximation error, called L-divergence, is different from the usual Bregman divergence. Using tools of information geometry and optimal transport, we show that L-divergence induces a new information geometry on the simplex consisting of a Riemannian metric and a pair of dually coupled affine connections which defines two kinds of geodesics. We show that the induced geometry is dually projectively flat but not flat. Nevertheless, we prove an analogue of the celebrated generalized Pythagorean theorem from classical information geometry. On the other hand, we consider displacement interpolation under a Lagrangian integral action that is consistent with the optimal transport problem and show that the action minimizing curves are dual geodesics. The Pythagorean theorem is also shown to have an interesting application of determining the optimal trading frequency in stochastic portfolio theory.
Submitted 31 May, 2017; v1 submitted 19 May, 2016;
originally announced May 2016.
-
Random walks and induced Dirichlet forms on self-similar sets
Authors:
Shi-Lei Kong,
Ka-Sing Lau,
Ting-Kam Leonard Wong
Abstract:
Let $K$ be a self-similar set satisfying the open set condition. Following Kaimanovich's elegant idea, it has been proved that on the symbolic space $X$ of $K$ a natural augmented tree structure ${\mathfrak E}$ exists; it is hyperbolic, and the hyperbolic boundary $\partial_HX$ with the Gromov metric is Hölder equivalent to $K$. In this paper we consider certain reversible random walks with return ratio $0< λ<1$ on $(X, {\mathfrak E})$. We show that the Martin boundary ${\mathcal M}$ can be identified with $\partial_H X$ and $K$. With this setup and a device of Silverstein, we obtain precise estimates of the Martin kernel and the Naïm kernel in terms of the Gromov product. Moreover, the Naïm kernel turns out to be a jump kernel satisfying the estimate $Θ(ξ, η) \asymp |ξ-η|^{-(α+ β)}$, where $α$ is the Hausdorff dimension of $K$ and $β$ depends on $λ$. For suitable $β$, the kernel defines a regular non-local Dirichlet form on $K$. This extends the results of Kigami concerning random walks on certain trees with Cantor-type sets as boundaries.
Submitted 19 October, 2017; v1 submitted 19 April, 2016;
originally announced April 2016.
-
Universal portfolios in stochastic portfolio theory
Authors:
Ting-Kam Leonard Wong
Abstract:
Consider a family of portfolio strategies with the aim of achieving the asymptotic growth rate of the best one. The idea behind Cover's universal portfolio is to build a wealth-weighted average which can be viewed as a buy-and-hold portfolio of portfolios. When an optimal portfolio exists, the wealth-weighted average converges to it by concentration of wealth. Working under a discrete time and pathwise setup, we show under suitable conditions that the distribution of wealth in the family satisfies a pathwise large deviation principle as time tends to infinity. Our main result extends Cover's portfolio to the nonparametric family of functionally generated portfolios in stochastic portfolio theory and establishes its asymptotic universality.
Submitted 12 December, 2016; v1 submitted 9 October, 2015;
originally announced October 2015.
-
Optimization of relative arbitrage
Authors:
Ting-Kam Leonard Wong
Abstract:
In stochastic portfolio theory, a relative arbitrage is an equity portfolio which is guaranteed to outperform a benchmark portfolio over a finite horizon. When the market is diverse and sufficiently volatile, and the benchmark is the market or a buy-and-hold portfolio, functionally generated portfolios introduced by Fernholz provide a systematic way of constructing relative arbitrages. In this paper we show that if the market portfolio is replaced by the equal or entropy weighted portfolio among many others, no relative arbitrages can be constructed under the same conditions using functionally generated portfolios. We also introduce and study a shape-constrained optimization problem for functionally generated portfolios in the spirit of maximum likelihood estimation of a log-concave density.
Submitted 24 November, 2014; v1 submitted 31 July, 2014;
originally announced July 2014.
-
The geometry of relative arbitrage
Authors:
Soumik Pal,
Ting-Kam Leonard Wong
Abstract:
Consider an equity market with $n$ stocks. The vector of proportions of the total market capitalizations that belong to each stock is called the market weight. The market weight defines the market portfolio which is a buy-and-hold portfolio representing the performance of the entire stock market. Consider a function that assigns a portfolio vector to each possible value of the market weight, and suppose we perform self-financing trading using this portfolio function. We study the problem of characterizing functions such that the resulting portfolio will outperform the market portfolio in the long run under the conditions of diversity and sufficient volatility. No other assumption on the future behavior of stock prices is made. We prove that the only solutions are functionally generated portfolios in the sense of Fernholz. A second characterization is given as the optimal maps of a remarkable optimal transport problem. Both characterizations follow from a novel property of portfolios called multiplicative cyclical monotonicity.
Submitted 27 July, 2015; v1 submitted 15 February, 2014;
originally announced February 2014.
-
Energy, entropy, and arbitrage
Authors:
Soumik Pal,
Ting-Kam Leonard Wong
Abstract:
We introduce a pathwise approach to analyze the relative performance of an equity portfolio with respect to a benchmark market portfolio. In this energy-entropy framework, the relative performance is decomposed into three components: a volatility term, a relative entropy term measuring the distance between the portfolio weights and the market capital distribution, and another entropy term that can be controlled by the investor by adopting a suitable rebalancing strategy. This framework leads to a class of portfolio strategies that allows one to outperform, in the long run, a market that is diverse and sufficiently volatile in the sense of stochastic portfolio theory. The framework is illustrated with several empirical examples.
Submitted 1 January, 2016; v1 submitted 25 August, 2013;
originally announced August 2013.