Maximum likelihood estimation for the $λ$-exponential family

Authors: Xiwei Tian, Ting-Kam Leonard Wong, Jiaowen Yang, Jun Zhang

Abstract: The $λ$-exponential family generalizes the standard exponential family via a generalized convex duality motivated by optimal transport. It is the constant-curvature analogue of the exponential family from the information-geometric point of view, but the development of computational methodologies is still in an early stage. In this paper, we propose a fixed point iteration for maximum likelihood estimation under i.i.d. sampling, and prove using the duality that the likelihood is monotone along the iterations. We illustrate the algorithm with the $q$-Gaussian distribution and the Dirichlet perturbation.

Submitted 6 May, 2025; originally announced May 2025.

Comments: 9 pages, 2 figures

arXiv:2503.06838 [pdf, other]

On the Wasserstein alignment problem

Authors: Soumik Pal, Bodhisattva Sen, Ting-Kam Leonard Wong

Abstract: Suppose we are given two metric spaces and a family of continuous transformations from one to the other. Given a probability distribution on each of these two spaces - namely the source and the target measures - the Wasserstein alignment problem seeks the transformation that minimizes the optimal transport cost between its pushforward of the source distribution and the target distribution, ensuring the closest possible alignment in a probabilistic sense. Examples of interest include two distributions on two Euclidean spaces $\mathbb{R}^n$ and $\mathbb{R}^d$, and we want a spatial embedding of the $n$-dimensional source measure in $\mathbb{R}^d$ that is closest in some Wasserstein metric to the target distribution on $\mathbb{R}^d$. Similar data alignment problems also commonly arise in shape analysis and computer vision. In this paper we show that this nonconvex optimal transport projection problem admits a convex Kantorovich-type dual. This allows us to characterize the set of projections and devise a linear programming algorithm. For certain special examples, such as orthogonal transformations on Euclidean spaces of unequal dimensions and the $2$-Wasserstein cost, we characterize the covariance of the optimal projections. Our results also cover the generalization when we penalize each transformation by a function. An example is the inner product Gromov-Wasserstein distance minimization problem which has recently gained popularity.

Submitted 9 March, 2025; originally announced March 2025.

Comments: 30 pages, 4 figures

MSC Class: 49Q22; 90C46 (Primary); 90C25; 90C05 (Secondary)

arXiv:2404.06625 [pdf, other]

Adapted optimal transport between Gaussian processes in discrete time

Authors: Madhu Gunasingam, Ting-Kam Leonard Wong

Abstract: We derive explicitly the adapted $2$-Wasserstein distance between non-degenerate Gaussian distributions on $\mathbb{R}^N$ and characterize the optimal bicausal coupling(s). This leads to an adapted version of the Bures-Wasserstein distance on the space of positive definite matrices.

Submitted 12 January, 2025; v1 submitted 9 April, 2024; originally announced April 2024.

Comments: 16 pages, 2 figures. To appear in Electronic Communications in Probability

MSC Class: 49Q22; 60G15 (Primary); 60B10 (Secondary)

arXiv:2402.17681 [pdf, other]

JKO schemes with general transport costs

Authors: Cale Rankin, Ting-Kam Leonard Wong

Abstract: We modify the JKO scheme, which is a time discretization of Wasserstein gradient flows, by replacing the Wasserstein distance with more general transport costs on manifolds. We show that when the cost function has a mixed Hessian which defines a Riemannian metric, our modified JKO scheme converges under suitable conditions to the corresponding Riemannian Fokker--Planck equation. Thus on a Riemannian manifold one may replace the (squared) Riemannian distance with any cost function which induces the metric. Of interest is when the Riemannian distance is computationally intractable, but a suitable cost has a simple analytic expression. We consider the Fokker--Planck equation on compact submanifolds with the Neumann boundary condition and on complete Riemannian manifolds with a finite drift condition. As an application we consider Hessian manifolds, taking as a cost the Bregman divergence.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 27 pages

MSC Class: 35K57 (Primary) 58J35; 82C31 (Secondary)

arXiv:2310.03884 [pdf, other]

Information Geometry for the Working Information Theorist

Authors: Kumar Vijay Mishra, M. Ashok Kumar, Ting-Kam Leonard Wong

Abstract: Information geometry is a study of statistical manifolds, that is, spaces of probability distributions from a geometric perspective. Its classical information-theoretic applications relate to statistical concepts such as Fisher information, sufficient statistics, and efficient estimators. Today, information geometry has emerged as an interdisciplinary field that finds applications in diverse areas such as radar sensing, array signal processing, quantum physics, deep learning, and optimal transport. This article presents an overview of essential information geometry to initiate an information theorist, who may be unfamiliar with this exciting area of research. We explain the concepts of divergences on statistical manifolds, generalized notions of distances, orthogonality, and geodesics, thereby paving the way for concrete applications and novel theoretical investigations. We also highlight some recent information-geometric developments, which are of interest to the broader information theory community.

Submitted 5 October, 2023; originally announced October 2023.

Comments: 12 pages, 3 figures, 1 table

arXiv:2302.05833 [pdf, other]

Bregman-Wasserstein divergence: geometry and applications

Authors: Amanjit Singh Kainth, Cale Rankin, Ting-Kam Leonard Wong

Abstract: The Bregman-Wasserstein divergence is the optimal transport cost when the underlying cost function is given by a Bregman divergence, and arises naturally in fields such as statistics and machine learning. We establish fundamental properties of the Bregman-Wasserstein divergence and propose a novel generalized transport geometry that promotes the Bregman geometry to the space of probability distributions. We provide a probabilistic interpretation involving exponential families and define generalized displacement interpolations compatible with the Bregman geometry. These interpolations are used to derive a generalized Pythagorean inequality, which is of independent interest. Furthermore, we construct a generalized dualistic geometry that lifts the differential geometry of the Bregman divergence to an infinite-dimensional statistical manifold. On the computational side, we demonstrate how Bregman-Wasserstein optimal transport maps can be estimated using neural approaches, establish the well-posedness of Bregman-Wasserstein barycenters, and relate them to Bayesian learning. Finally, we introduce the Bregman-Wasserstein JKO scheme for discretizing Riemannian Wasserstein gradient flows.

Submitted 10 April, 2025; v1 submitted 11 February, 2023; originally announced February 2023.

Comments: 57 pages, Significant changes to structure with new sections on applications

MSC Class: 53B12 (Primary) 49Q22; 58B20 (Secondary)
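
Editorial illustration for the entry above (not taken from the paper): a minimal Python sketch of the pointwise Bregman divergence that serves as the transport cost, assuming the standard definition $D_φ(x, y) = φ(x) - φ(y) - \langle \nabla φ(y), x - y \rangle$ for a differentiable convex $φ$. The function names and the two example generators are hypothetical choices made only for this sketch.

```python
import math

def bregman_divergence(phi, grad_phi, x, y):
    """Standard Bregman divergence: phi(x) - phi(y) - <grad phi(y), x - y>."""
    inner = sum(g * (xi - yi) for g, xi, yi in zip(grad_phi(y), x, y))
    return phi(x) - phi(y) - inner

# Example generator 1 (assumption): half the squared Euclidean norm.
def half_sq_norm(x):
    return 0.5 * sum(xi * xi for xi in x)

def grad_half_sq_norm(x):
    return list(x)

# Example generator 2 (assumption): negative Shannon entropy on positive vectors.
def neg_entropy(p):
    return sum(pi * math.log(pi) for pi in p)

def grad_neg_entropy(p):
    return [math.log(pi) + 1.0 for pi in p]

# With half_sq_norm the divergence reduces to half the squared distance.
d_quadratic = bregman_divergence(half_sq_norm, grad_half_sq_norm, [1.0, 2.0], [0.0, 0.0])

# With neg_entropy on probability vectors it reduces to the KL divergence.
d_kl = bregman_divergence(neg_entropy, grad_neg_entropy, [0.5, 0.5], [0.25, 0.75])
```

With the quadratic generator the associated transport cost is, up to the factor 1/2, the usual squared-distance cost, which is the sense in which a Bregman cost generalizes the $2$-Wasserstein setting.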

arXiv:2209.02938 [pdf, other]

Conformal Mirror Descent with Logarithmic Divergences

Authors: Amanjit Singh Kainth, Ting-Kam Leonard Wong, Frank Rudzicz

Abstract: The logarithmic divergence is an extension of the Bregman divergence motivated by optimal transport and a generalized convex duality, and satisfies many remarkable properties. Using the geometry induced by the logarithmic divergence, we introduce a generalization of continuous time mirror descent that we term the conformal mirror descent. We derive its dynamics under a generalized mirror map, and show that it is a time change of a corresponding Hessian gradient flow. We also prove convergence results in continuous time. We apply the conformal mirror descent to online estimation of a generalized exponential family, and construct a family of gradient flows on the unit simplex via the Dirichlet optimal transport problem.

Submitted 7 September, 2022; originally announced September 2022.

arXiv:2112.10876 [pdf, ps, other]

An isomorphism theorem for models of Weak König's Lemma without primitive recursion

Authors: Marta Fiori-Carones, Leszek Aleksander Kołodziejczyk, Tin Lok Wong, Keita Yokoyama

Abstract: We prove that if $(M,\mathcal{X})$ and $(M,\mathcal{Y})$ are countable models of the theory $\mathrm{WKL}^*_0$ such that $\mathrm{I}Σ_1(A)$ fails for some $A \in \mathcal{X} \cap \mathcal{Y}$, then $(M,\mathcal{X})$ and $(M,\mathcal{Y})$ are isomorphic. As a consequence, the analytic hierarchy collapses to $Δ^1_1$ provably in $\mathrm{WKL}^*_0 + \neg\mathrm{I}Σ^0_1$, and $\mathrm{WKL}$ is the strongest $Π^1_2$ statement that is $Π^1_1$-conservative over $\mathrm{RCA}^*_0 + \neg\mathrm{I}Σ^0_1$. Applying our results to the $Δ^0_n$-definable sets in models of $\mathrm{RCA}^*_0 + \mathrm{B}Σ^0_n + \neg\mathrm{I}Σ^0_n$ that also satisfy an appropriate relativization of Weak König's Lemma, we prove that for each $n \ge 1$, the set of $Π^1_2$ sentences that are $Π^1_1$-conservative over $\mathrm{RCA}^*_0 + \mathrm{B}Σ^0_n + \neg\mathrm{I}Σ^0_n$ is c.e. In contrast, we prove that the set of $Π^1_2$ sentences that are $Π^1_1$-conservative over $\mathrm{RCA}^*_0 + \mathrm{B}Σ^0_n$ is $Π_2$-complete. This answers a question of Towsner. We also show that $\mathrm{RCA}_0 + \mathrm{RT}^2_2$ is $Π^1_1$-conservative over $\mathrm{B}Σ^0_2$ if and only if it is conservative over $\mathrm{B}Σ^0_2$ with respect to $\forall Π^0_5$ sentences.

Submitted 27 August, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

Comments: 29 pages. Somewhat more polished version compared to v1, with small improvements and simplifications throughout the text but no major mathematical changes. Introduction slightly expanded to point out model-theoretic aspects of the paper

MSC Class: 03H15; 03C10; 03B30; 03C62; 03F30; 03F35

arXiv:2107.11925 [pdf, other]

Tsallis and Rényi deformations linked via a new $λ$-duality

Authors: Ting-Kam Leonard Wong, Jun Zhang

Abstract: Tsallis and Rényi entropies, which are monotone transformations of each other, are deformations of the celebrated Shannon entropy. Maximization of these deformed entropies, under suitable constraints, leads to the $q$-exponential family which has applications in non-extensive statistical physics, information theory and statistics. In previous information-geometric studies, the $q$-exponential family was analyzed using classical convex duality and Bregman divergence. In this paper, we show that a generalized $λ$-duality, where $λ= 1 - q$ is the constant information-geometric curvature, leads to a generalized exponential family which is essentially equivalent to the $q$-exponential family and has deep connections with Rényi entropy and optimal transport. Using this generalized convex duality and its associated logarithmic divergence, we show that our $λ$-exponential family satisfies properties that parallel and generalize those of the exponential family. Under our framework, the Rényi entropy and divergence arise naturally, and we give a new proof of the Tsallis/Rényi entropy maximizing property of the $q$-exponential family. We also introduce a $λ$-mixture family which may be regarded as the dual of the $λ$-exponential family, and connect it with other mixture-type families. Finally, we discuss a duality between the $λ$-exponential family and the $λ$-logarithmic divergence, and study its statistical consequences.

Submitted 12 January, 2022; v1 submitted 25 July, 2021; originally announced July 2021.

Comments: 41 pages, 7 figures. Revised
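
Editorial illustration for the entry above (not taken from the paper): a minimal Python sketch of the statement that Tsallis and Rényi entropies are monotone transformations of each other. It assumes the standard discrete definitions $S_q(p) = (1 - \sum_i p_i^q)/(q - 1)$ and $H_q(p) = \log(\sum_i p_i^q)/(1 - q)$ for $q \neq 1$; the function names are hypothetical and chosen only for this sketch.

```python
import math

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum p_i^q) / (q - 1), assuming q != 1."""
    return (1.0 - sum(pi ** q for pi in p)) / (q - 1.0)

def renyi_entropy(p, q):
    """Renyi entropy H_q(p) = log(sum p_i^q) / (1 - q), assuming q != 1."""
    return math.log(sum(pi ** q for pi in p)) / (1.0 - q)

def renyi_from_tsallis(s, q):
    """Map a Tsallis entropy value s to the Renyi entropy of the same order.

    Since 1 + (1 - q) * S_q(p) = sum p_i^q > 0, the map
    s -> log(1 + (1 - q) * s) / (1 - q) is well defined and increasing in s.
    """
    return math.log(1.0 + (1.0 - q) * s) / (1.0 - q)

p = [0.2, 0.3, 0.5]
# Gap between the transformed Tsallis entropy and the Renyi entropy, per order q.
gaps = [
    abs(renyi_from_tsallis(tsallis_entropy(p, q), q) - renyi_entropy(p, q))
    for q in (0.5, 2.0, 3.0)
]
```

Under these definitions every entry of `gaps` should vanish up to floating-point error, so for a fixed order $q$ the two entropies rank distributions identically and share maximizers under the same constraints.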

arXiv:2105.07767 [pdf, other]

Projections with logarithmic divergences

Authors: Zhixu Tao, Ting-Kam Leonard Wong

Abstract: In information geometry, generalized exponential families and statistical manifolds with curvature have been under active investigation in recent years. In this paper we consider the statistical manifold induced by a logarithmic $L^{(α)}$-divergence which generalizes the Bregman divergence. It is known that such a manifold is dually projectively flat with constant negative sectional curvature, and is closely related to the $\mathcal{F}^{(α)}$-family, a generalized exponential family introduced by the second author. Our main result constructs a dual foliation of the statistical manifold, i.e., an orthogonal decomposition consisting of primal and dual autoparallel submanifolds. This decomposition, which can be naturally interpreted in terms of primal and dual projections with respect to the logarithmic divergence, extends the dual foliation of a dually flat manifold studied by Amari. As an application, we formulate a new $L^{(α)}$-PCA problem which generalizes the exponential family PCA.

Submitted 8 May, 2021; originally announced May 2021.

Comments: 9 pages, 2 figures. To appear in GSI2021

arXiv:2103.10925 [pdf, other]

Functional portfolio optimization in stochastic portfolio theory

Authors: Steven Campbell, Ting-Kam Leonard Wong

Abstract: In this paper we develop a concrete and fully implementable approach to the optimization of functionally generated portfolios in stochastic portfolio theory. The main idea is to optimize over a family of rank-based portfolios parameterized by an exponentially concave function on the unit interval. This choice can be motivated by the long term stability of the capital distribution observed in large equity markets, and allows us to circumvent the curse of dimensionality. The resulting optimization problem, which is convex, allows for various regularizations and constraints to be imposed on the generating function. We prove an existence and uniqueness result for our optimization problem and provide a stability estimate in terms of a Wasserstein metric of the input measure. Then, we formulate a discretization which can be implemented numerically using available software packages and analyze its approximation error. Finally, we present empirical examples using CRSP data from the US stock market, including the performance of the portfolios allowing for dividends, defaults, and transaction costs.

Submitted 9 October, 2021; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: 41 pages, 7 figures, 1 table. Revised version

arXiv:2005.06854 [pdf, ps, other]

Ramsey's theorem for pairs, collection, and proof size

Authors: Leszek Aleksander Kołodziejczyk, Tin Lok Wong, Keita Yokoyama

Abstract: We prove that any proof of a $\forall Σ^0_2$ sentence in the theory $\mathrm{WKL}_0 + \mathrm{RT}^2_2$ can be translated into a proof in $\mathrm{RCA}_0$ at the cost of a polynomial increase in size. In fact, the proof in $\mathrm{RCA}_0$ can be found by a polynomial-time algorithm. On the other hand, $\mathrm{RT}^2_2$ has non-elementary speedup over the weaker base theory $\mathrm{RCA}^*_0$ for proofs of $Σ_1$ sentences. We also show that for $n \ge 0$, proofs of $Π_{n+2}$ sentences in $\mathrm{B}Σ_{n+1}+\exp$ can be translated into proofs in $\mathrm{I}Σ_{n} + \exp$ at polynomial cost. Moreover, the $Π_{n+2}$-conservativity of $\mathrm{B}Σ_{n+1} + \exp$ over $\mathrm{I}Σ_{n} + \exp$ can be proved in $\mathrm{PV}$, a fragment of bounded arithmetic corresponding to polynomial-time computation. For $n \ge 1$, this answers a question of Clote, Hájek, and Paris.

Submitted 16 January, 2021; v1 submitted 14 May, 2020; originally announced May 2020.

Comments: 33 pages. Corrected definition of forcing in Section 4, with appropriate modifications to the argument. Minor editorial changes throughout the text

MSC Class: 03B30; 03F35; 03F20 (Primary); 03F25; 03F30; 03H15; 05D10 (Secondary)

arXiv:2001.01328 [pdf, other]

Scalable Gradients for Stochastic Differential Equations

Authors: Xuechen Li, Ting-Kam Leonard Wong, Ricky T. Q. Chen, David Duvenaud

Abstract: The adjoint sensitivity method scalably computes gradients of solutions to ordinary differential equations. We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients with high-order adaptive solvers. Specifically, we derive a stochastic differential equation whose solution is the gradient, a memory-efficient algorithm for caching noise, and conditions under which numerical solutions converge. In addition, we combine our method with gradient-based stochastic variational inference for latent stochastic differential equations. We use our method to fit stochastic dynamics defined by neural networks, achieving competitive performance on a 50-dimensional motion capture dataset.

Submitted 18 October, 2020; v1 submitted 5 January, 2020; originally announced January 2020.

Comments: AISTATS 2020; 25 pages, 6 figures in main text; clarify notation in appendix

arXiv:1912.03487 [pdf, ps, other]

Where Pigeonhole Principles meet König Lemmas

Authors: David Belanger, Chitat Chong, Wei Wang, Tin Lok Wong, Yue Yang

Abstract: We study the pigeonhole principle for $Σ_2$-definable injections with domain twice as large as the codomain, and the weak König lemma for $Δ^0_2$-definable trees in which every level has at least half of the possible nodes. We show that the latter implies the existence of $2$-random reals, and is conservative over the former. We also show that the former is strictly weaker than the usual pigeonhole principle for $Σ_2$-definable injections.

Submitted 7 December, 2019; originally announced December 2019.

Comments: 33 pages

arXiv:1910.13668 [pdf, other]

Random concave functions

Authors: Peter Baxendale, Ting-Kam Leonard Wong

Abstract: Spaces of convex and concave functions appear naturally in theory and applications. For example, convex regression and log-concave density estimation are important topics in nonparametric statistics. In stochastic portfolio theory, concave functions on the unit simplex measure the concentration of capital, and their gradient maps define novel investment strategies. The gradient maps may also be regarded as optimal transport maps on the simplex. In this paper we construct and study probability measures supported on spaces of concave functions. These measures may serve as prior distributions in Bayesian statistics and Cover's universal portfolio, and induce distribution-valued random variables via optimal transport. The random concave functions are constructed on the unit simplex by taking a suitably scaled (mollified, or soft) minimum of random hyperplanes. Depending on the regime of the parameters, we show that as the number of hyperplanes tends to infinity there are several possible limiting behaviors. In particular, there is a transition from a deterministic almost sure limit to a non-trivial limiting distribution that can be characterized using convex duality and Poisson point processes.

Submitted 24 May, 2021; v1 submitted 30 October, 2019; originally announced October 2019.

Comments: 42 pages, 8 figures. Substantially revised. To appear in The Annals of Applied Probability

arXiv:1906.09103 [pdf, ps, other]

Logarithmic divergences: geometry and interpretation of curvature

Authors: Ting-Kam Leonard Wong, Jiaowen Yang

Abstract: We study the logarithmic $L^{(α)}$-divergence which extrapolates the Bregman divergence and corresponds to solutions to novel optimal transport problems. We show that this logarithmic divergence is equivalent to a conformal transformation of the Bregman divergence, and, via an explicit affine immersion, is equivalent to Kurose's geometric divergence. In particular, the $L^{(α)}$-divergence is a canonical divergence of a statistical manifold with constant sectional curvature $-α$. For such a manifold, we give a geometric interpretation of its sectional curvature in terms of how the divergence between a pair of primal and dual geodesics differs from the dually flat case. Further results can be found in our follow-up paper [27] which uncovers a novel relation between optimal transport and information geometry.

Submitted 17 June, 2019; originally announced June 2019.

Comments: 10 pages, International Conference on Geometric Science of Information. Springer, 2019. arXiv admin note: text overlap with arXiv:1906.00030

arXiv:1906.00030 [pdf, ps, other]

Pseudo-Riemannian geometry embeds information geometry in optimal transport

Authors: Ting-Kam Leonard Wong, Jiaowen Yang

Abstract: Optimal transport and information geometry both study geometric structures on spaces of probability distributions. Optimal transport characterizes the cost-minimizing movement from one distribution to another, while information geometry originates from coordinate-invariant properties of statistical inference. Their connections and applications in statistics and machine learning have started to gain more attention. In this paper we give a new differential geometric connection between the two fields. Namely, the pseudo-Riemannian framework of Kim and McCann, a geometric perspective on the fundamental Ma-Trudinger-Wang (MTW) condition in the regularity theory of optimal transport maps, encodes the dualistic structure of a statistical manifold. This general relation is described using the natural framework of $c$-divergence, a divergence defined by an optimal transport map. As a by-product, we obtain a new information-geometric interpretation of the MTW tensor. This connection sheds light on old and new aspects of information geometry. The dually flat geometry of Bregman divergence corresponds to the quadratic cost and the pseudo-Euclidean space, and the $L^{(α)}$-divergence introduced by Pal and the first author has constant sectional curvature in a sense to be made precise. In these cases we give a geometric interpretation of the information-geometric curvature in terms of the divergence between a primal-dual pair of geodesics.

Submitted 5 May, 2021; v1 submitted 31 May, 2019; originally announced June 2019.

Comments: 28 pages, 2 figures. Substantially revised. [Originally titled "Optimal transport and information geometry."]

arXiv:1807.05649 [pdf, ps, other]

Multiplicative Schrödinger problem and the Dirichlet transport

Authors: Soumik Pal, Ting-Kam Leonard Wong

Abstract: We consider an optimal transport problem on the unit simplex whose solutions are given by gradients of exponentially concave functions and prove two main results. First, we show that the optimal transport is the large deviation limit of a particle system of Dirichlet processes transporting one probability measure on the unit simplex to another by coordinatewise multiplication and normalizing. The structure of our Lagrangian and the appearance of the Dirichlet process relate our problem closely to the entropic measure on the Wasserstein space as defined by von-Renesse and Sturm in the context of Wasserstein diffusion. The limiting procedure is a triangular limit where we allow simultaneously the number of particles to grow to infinity while the 'noise' tends to zero. The method, which generalizes easily to many other cost functions, including the squared Euclidean distance, provides a novel combination of the Schrödinger problem approach due to C. Léonard and the related Brownian particle systems by Adams et al. which does not require gamma convergence. Second, we analyze the behavior of entropy along the paths of transport. The reference measure on the simplex is taken to be the Dirichlet measure with all zero parameters which relates to the finite-dimensional distributions of the entropic measure. The interpolating curves are not the usual McCann lines. Nevertheless we show that entropy plus a multiple of the transport cost remains convex, which is reminiscent of the semiconvexity of entropy along lines of McCann interpolations in negative curvature spaces. We also obtain, under suitable conditions, dimension-free bounds of the optimal transport cost in terms of entropy.

Submitted 4 July, 2020; v1 submitted 15 July, 2018; originally announced July 2018.

Comments: 40 pages. To appear in Probability Theory and Related Fields

MSC Class: 60J75; 60G57; 60F10

arXiv:1804.02646 [pdf, ps, other]

Random walks and induced Dirichlet forms on compact spaces of homogeneous type

Authors: Shi-Lei Kong, Ka-Sing Lau, Ting-Kam Leonard Wong

Abstract: We extend our study of random walks and induced Dirichlet forms on self-similar sets [arXiv:1604.05440, 1612.01708] to compact spaces of homogeneous type $(K, ρ,μ)$. A successive partition on $K$ brings a natural augmented tree structure $(X, E)$ that is Gromov hyperbolic, and the hyperbolic boundary is Hölder equivalent to $K$. We then introduce a class of transient reversible random walks on $(X, E)$ with return ratio $λ$. Using Silverstein's theory of Markov chains, we prove that the random walk induces an energy form on $K$ with $$ {\mathcal E}_K [u] \asymp \iint_{K\times K \setminus Δ} \frac{|u(ξ) - u(η)|^2}{V(ξ, η)ρ(ξ, η)^β} dμ(ξ) dμ(η), $$ where $V(ξ, η)$ is the $μ$-volume of the ball centered at $ξ$ with radius $ρ(ξ, η)$, $Δ$ is the diagonal, and $β$ depends on $λ$. In particular, for an $α$-set in ${\mathbb R}^d$, the kernel of the energy form is of order $\frac{1}{|ξ-η|^{α+β}}$. We also discuss conditions for this energy form to be a non-local regular Dirichlet form.

Submitted 8 April, 2018; originally announced April 2018.

Comments: 21 pages, no figures

MSC Class: Primary 60J10; Secondary 28A80; 60J50

arXiv:1712.03610 [pdf, ps, other]

Logarithmic divergences from optimal transport and Rényi geometry

Authors: Ting-Kam Leonard Wong

Abstract: Divergences, also known as contrast functions, are distance-like quantities defined on manifolds of non-negative or probability measures. Using the duality in optimal transport, we introduce and study the one-parameter family of $L^{(\pm α)}$-divergences. It includes the Bregman divergence corresponding to the Euclidean quadratic cost, and the $L$-divergence introduced by Pal and the author in connection with portfolio theory and a logarithmic cost function. They admit natural generalizations of exponential family that are closely related to the $α$-family and $q$-exponential family. In particular, the $L^{(\pm α)}$-divergences of the corresponding potential functions are Rényi divergences. Using this unified framework we prove that the induced geometries are dually projectively flat with constant sectional curvatures, and a generalized Pythagorean theorem holds true. Conversely, we show that if a statistical manifold is dually projectively flat with constant curvature $\pm α$ with $α > 0$, then it is locally induced by an $L^{(\mp α)}$-divergence. We define in this context a canonical divergence which extends the one for dually flat manifolds.

Submitted 3 September, 2018; v1 submitted 10 December, 2017; originally announced December 2017.

Comments: 39 pages. Revised

arXiv:1709.03169 [pdf, other]

On portfolios generated by optimal transport

Authors: Ting-Kam Leonard Wong

Abstract: First introduced by Fernholz in stochastic portfolio theory, functionally generated portfolio allows its investment performance to be attributed to directly observable and easily interpretable market quantities. In previous works we showed that Fernholz's multiplicatively generated portfolio has deep connections with optimal transport and the information geometry of exponentially concave functions. Recently, Karatzas and Ruf introduced a new additive portfolio generation whose relation with optimal transport was studied by Vervuurt. We show that additively generated portfolio can be interpreted in terms of the well-known dually flat information geometry of Bregman divergence. Moreover, we characterize, in a sense to be made precise, all possible forms of functional portfolio constructions that contain additive and multiplicative generations as special cases. Each construction involves a divergence functional on the unit simplex measuring the market volatility captured, and admits a pathwise decomposition for the portfolio value. We illustrate with an empirical example.

Submitted 25 September, 2017; v1 submitted 10 September, 2017; originally announced September 2017.

Comments: 19 pages, 4 figures. Revised

arXiv:1704.01930 [pdf, other]

doi 10.23638/LMCS-13(4:10)2017

Some observations on the logical foundations of inductive theorem proving

Authors: Stefan Hetzl, Tin Lok Wong

Abstract: In this paper we study the logical foundations of automated inductive theorem proving. To that aim we first develop a theoretical model that is centered around the difficulty of finding induction axioms which are sufficient for proving a goal. Based on this model, we then analyze the following aspects: the choice of a proof shape, the choice of an induction rule and the language of the induction formula. In particular, using model-theoretic techniques, we clarify the relationship between notions of inductiveness that have been considered in the literature on automated inductive theorem proving. This is a corrected version of the paper arXiv:1704.01930v5 published originally on Nov. 16, 2017.

Submitted 12 April, 2018; v1 submitted 6 April, 2017; originally announced April 2017.

Journal ref: Logical Methods in Computer Science, Volume 13, Issue 4, Automated deduction (April 13, 2018) lmcs:3256

arXiv:1611.09631 [pdf, ps, other]

Cover's universal portfolio, stochastic portfolio theory and the numeraire portfolio

Authors: Christa Cuchiero, Walter Schachermayer, Ting-Kam Leonard Wong

Abstract: Cover's celebrated theorem states that the long run yield of a properly chosen "universal" portfolio is as good as the long run yield of the best retrospectively chosen constant rebalanced portfolio. The "universality" pertains to the fact that this result is model-free, i.e., not dependent on an underlying stochastic process. We extend Cover's theorem to the setting of stochastic portfolio theory as initiated by R. Fernholz: the rebalancing rule need not be constant anymore but may depend on the present state of the stock market. This model-free result is complemented by a comparison with the log-optimal numeraire portfolio when fixing a stochastic model of the stock market. Roughly speaking, under appropriate assumptions, the optimal long run yield coincides for the three approaches mentioned in the title of this paper. We present our results in discrete and continuous time.

Submitted 29 November, 2016; originally announced November 2016.

arXiv:1605.05819 [pdf, other]

Exponentially concave functions and a new information geometry

Authors: Soumik Pal, Ting-Kam Leonard Wong

Abstract: A function is exponentially concave if its exponential is concave. We consider exponentially concave functions on the unit simplex. In a previous paper we showed that gradient maps of exponentially concave functions provide solutions to a Monge-Kantorovich optimal transport problem and give a better gradient approximation than those of ordinary concave functions. The approximation error, called L-divergence, is different from the usual Bregman divergence. Using tools of information geometry and optimal transport, we show that L-divergence induces a new information geometry on the simplex consisting of a Riemannian metric and a pair of dually coupled affine connections which defines two kinds of geodesics. We show that the induced geometry is dually projectively flat but not flat. Nevertheless, we prove an analogue of the celebrated generalized Pythagorean theorem from classical information geometry. On the other hand, we consider displacement interpolation under a Lagrangian integral action that is consistent with the optimal transport problem and show that the action minimizing curves are dual geodesics. The Pythagorean theorem is also shown to have an interesting application of determining the optimal trading frequency in stochastic portfolio theory.

Submitted 31 May, 2017; v1 submitted 19 May, 2016; originally announced May 2016.

Comments: 49 pages, 3 figures; revised. To appear in the Annals of Probability

arXiv:1604.05440 [pdf, ps, other]

doi 10.1016/j.aim.2017.09.029

Random walks and induced Dirichlet forms on self-similar sets

Authors: Shi-Lei Kong, Ka-Sing Lau, Ting-Kam Leonard Wong

Abstract: Let $K$ be a self-similar set satisfying the open set condition. Following Kaimanovich's elegant idea, it has been proved that on the symbolic space $X$ of $K$ a natural augmented tree structure ${\mathfrak E}$ exists; it is hyperbolic, and the hyperbolic boundary $\partial_H X$ with the Gromov metric is Hölder equivalent to $K$. In this paper we consider certain reversible random walks with return ratio $0 < λ < 1$ on $(X, {\mathfrak E})$. We show that the Martin boundary ${\mathcal M}$ can be identified with $\partial_H X$ and $K$. With this setup and a device of Silverstein, we obtain precise estimates of the Martin kernel and the Naïm kernel in terms of the Gromov product. Moreover, the Naïm kernel turns out to be a jump kernel satisfying the estimate $Θ(ξ, η) \asymp |ξ-η|^{-(α+β)}$, where $α$ is the Hausdorff dimension of $K$ and $β$ depends on $λ$. For suitable $β$, the kernel defines a regular non-local Dirichlet form on $K$. This extends the results of Kigami concerning random walks on certain trees with Cantor-type sets as boundaries.

Submitted 19 October, 2017; v1 submitted 19 April, 2016; originally announced April 2016.

Comments: 33 pages with 2 figures

MSC Class: 28A80; 60J10 (Primary); 60J50 (Secondary)

Journal ref: Advances in Mathematics 320 (2017), 1099--1134

arXiv:1510.02808 [pdf, ps, other]

Universal portfolios in stochastic portfolio theory

Authors: Ting-Kam Leonard Wong

Abstract: Consider a family of portfolio strategies with the aim of achieving the asymptotic growth rate of the best one. The idea behind Cover's universal portfolio is to build a wealth-weighted average which can be viewed as a buy-and-hold portfolio of portfolios. When an optimal portfolio exists, the wealth-weighted average converges to it by concentration of wealth. Working under a discrete time and pathwise setup, we show under suitable conditions that the distribution of wealth in the family satisfies a pathwise large deviation principle as time tends to infinity. Our main result extends Cover's portfolio to the nonparametric family of functionally generated portfolios in stochastic portfolio theory and establishes its asymptotic universality.

Submitted 12 December, 2016; v1 submitted 9 October, 2015; originally announced October 2015.

Comments: 25 pages; revised

arXiv:1407.8300 [pdf, other]

Optimization of relative arbitrage

Authors: Ting-Kam Leonard Wong

Abstract: In stochastic portfolio theory, a relative arbitrage is an equity portfolio which is guaranteed to outperform a benchmark portfolio over a finite horizon. When the market is diverse and sufficiently volatile, and the benchmark is the market or a buy-and-hold portfolio, functionally generated portfolios introduced by Fernholz provide a systematic way of constructing relative arbitrages. In this paper we show that if the market portfolio is replaced by the equal or entropy weighted portfolio among many others, no relative arbitrages can be constructed under the same conditions using functionally generated portfolios. We also introduce and study a shape-constrained optimization problem for functionally generated portfolios in the spirit of maximum likelihood estimation of a log-concave density.

Submitted 24 November, 2014; v1 submitted 31 July, 2014; originally announced July 2014.

Comments: 33 pages, 5 figures, 2 tables; revised version

arXiv:1402.3720 [pdf, other]

The geometry of relative arbitrage

Authors: Soumik Pal, Ting-Kam Leonard Wong

Abstract: Consider an equity market with $n$ stocks. The vector of proportions of the total market capitalizations that belong to each stock is called the market weight. The market weight defines the market portfolio which is a buy-and-hold portfolio representing the performance of the entire stock market. Consider a function that assigns a portfolio vector to each possible value of the market weight, and we perform self-financing trading using this portfolio function. We study the problem of characterizing functions such that the resulting portfolio will outperform the market portfolio in the long run under the conditions of diversity and sufficient volatility. No other assumption on the future behavior of stock prices is made. We prove that the only solutions are functionally generated portfolios in the sense of Fernholz. A second characterization is given as the optimal maps of a remarkable optimal transport problem. Both characterizations follow from a novel property of portfolios called multiplicative cyclical monotonicity.

Submitted 27 July, 2015; v1 submitted 15 February, 2014; originally announced February 2014.

Comments: 31 pages, 5 figures; substantially revised; Section 4 illustrates the optimal transport approach with empirical examples

arXiv:1308.5376 [pdf, other]

Energy, entropy, and arbitrage

Authors: Soumik Pal, Ting-Kam Leonard Wong

Abstract: We introduce a pathwise approach to analyze the relative performance of an equity portfolio with respect to a benchmark market portfolio. In this energy-entropy framework, the relative performance is decomposed into three components: a volatility term, a relative entropy term measuring the distance between the portfolio weights and the market capital distribution, and another entropy term that can be controlled by the investor by adopting a suitable rebalancing strategy. This framework leads to a class of portfolio strategies that allows one to outperform, in the long run, a market that is diverse and sufficiently volatile in the sense of stochastic portfolio theory. The framework is illustrated with several empirical examples.

Submitted 1 January, 2016; v1 submitted 25 August, 2013; originally announced August 2013.

Comments: 21 pages, 7 figures. Substantially revised

Showing 1–29 of 29 results for author: Wong, T L