-
Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures
Authors:
Tuan Thai,
TrungTin Nguyen,
Dat Do,
Nhat Ho,
Christopher Drovandi
Abstract:
Mixture of Experts (MoE) models constitute a widely utilized class of ensemble learning approaches in statistics and machine learning, known for their flexibility and computational efficiency. They have become integral components in numerous state-of-the-art deep neural network architectures, particularly for analyzing heterogeneous data across diverse domains. Despite their practical success, the theoretical understanding of model selection, especially concerning the optimal number of mixture components or experts, remains limited and poses significant challenges. These challenges primarily stem from the inclusion of covariates in both the Gaussian gating functions and expert networks, which introduces intrinsic interactions governed by partial differential equations with respect to their parameters. In this paper, we revisit the concept of dendrograms of mixing measures and introduce a novel extension to Gaussian-gated Gaussian MoE models that enables consistent estimation of the true number of mixture components and achieves the pointwise optimal convergence rate for parameter estimation in overfitted scenarios. Notably, this approach circumvents the need to train and compare a range of models with varying numbers of components, thereby alleviating the computational burden, particularly in high-dimensional or deep neural network settings. Experimental results on synthetic data demonstrate the effectiveness of the proposed method in accurately recovering the number of experts. It outperforms common criteria such as the Akaike information criterion, the Bayesian information criterion, and the integrated completed likelihood, while achieving optimal convergence rates for parameter estimation and accurately approximating the regression function.
Submitted 23 May, 2025; v1 submitted 19 May, 2025;
originally announced May 2025.
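The merge step behind a dendrogram of mixing measures can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure. Atoms are treated as generic parameter vectors rather than Gaussian gating and expert parameters. The Ward-type merge cost p_i p_j / (p_i + p_j) * ||theta_i - theta_j||^2 is an assumed criterion, and the function names are hypothetical.

```python
import numpy as np

def merge_cost(p_i, atom_i, p_j, atom_j):
    """Ward-type dissimilarity between two weighted atoms (an assumed criterion)."""
    diff = atom_i - atom_j
    return p_i * p_j / (p_i + p_j) * float(diff @ diff)

def mixing_measure_dendrogram(weights, atoms):
    """Repeatedly merge the two closest atoms of sum_i p_i * delta_{atom_i}.

    Returns one (height, merged_weight, merged_atom) tuple per merge, so a
    mixing measure with K atoms yields K - 1 levels.
    """
    weights = [float(w) for w in weights]
    atoms = [np.asarray(a, dtype=float) for a in atoms]
    levels = []
    while len(atoms) > 1:
        # Find the pair of atoms with the smallest merge cost.
        best = None
        for i in range(len(atoms)):
            for j in range(i + 1, len(atoms)):
                cost = merge_cost(weights[i], atoms[i], weights[j], atoms[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        # The merged atom keeps the total weight and the weighted mean location.
        w = weights[i] + weights[j]
        a = (weights[i] * atoms[i] + weights[j] * atoms[j]) / w
        # Delete index j before i (j > i) so that i still points at the right atom.
        for k in (j, i):
            del weights[k]
            del atoms[k]
        weights.append(w)
        atoms.append(a)
        levels.append((cost, w, a))
    return levels

levels = mixing_measure_dendrogram([0.3, 0.3, 0.4], [[0.0], [0.1], [5.0]])
```

Starting from an overfitted mixing measure, the merge heights record how far apart the collapsed atoms were. A selection rule would cut the tree at some level to read off a number of experts. How the paper chooses that cut is not reproduced here.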
-
Demystifying Softmax Gating Function in Gaussian Mixture of Experts
Authors:
Huy Nguyen,
TrungTin Nguyen,
Nhat Ho
Abstract:
Understanding the parameter estimation of softmax gating Gaussian mixture of experts has remained a long-standing open problem in the literature. It is mainly due to three fundamental theoretical challenges associated with the softmax gating function: (i) the identifiability only up to the translation of parameters; (ii) the intrinsic interaction via partial differential equations between the softmax gating and the expert functions in the Gaussian density; (iii) the complex dependence between the numerator and denominator of the conditional density of softmax gating Gaussian mixture of experts. We resolve these challenges by proposing novel Voronoi loss functions among parameters and establishing the convergence rates of maximum likelihood estimator (MLE) for solving parameter estimation in these models. When the true number of experts is unknown and over-specified, our findings show a connection between the convergence rate of the MLE and a solvability problem of a system of polynomial equations.
Submitted 29 October, 2023; v1 submitted 5 May, 2023;
originally announced May 2023.
-
Minimax Optimal Rate for Parameter Estimation in Multivariate Deviated Models
Authors:
Dat Do,
Huy Nguyen,
Khai Nguyen,
Nhat Ho
Abstract:
We study maximum likelihood estimation (MLE) in the multivariate deviated model where the data are generated from the density function $(1-λ^{\ast})h_{0}(x)+λ^{\ast}f(x|μ^{\ast}, Σ^{\ast})$ in which $h_{0}$ is a known function, $λ^{\ast} \in [0,1]$ and $(μ^{\ast}, Σ^{\ast})$ are unknown parameters to estimate. The main challenges in deriving the convergence rate of the MLE come from two issues: (1) the interaction between the function $h_{0}$ and the density function $f$; (2) the deviated proportion $λ^{\ast}$ can go to the extreme points of $[0,1]$ as the sample size tends to infinity. To address these challenges, we develop the \emph{distinguishability condition} to capture the linear independence relation between the function $h_{0}$ and the density function $f$. We then provide comprehensive convergence rates of the MLE via the vanishing rate of $λ^{\ast}$ to zero as well as the distinguishability of the two functions $h_{0}$ and $f$.
Submitted 29 October, 2023; v1 submitted 27 January, 2023;
originally announced January 2023.
-
On Excess Mass Behavior in Gaussian Mixture Models with Orlicz-Wasserstein Distances
Authors:
Aritra Guha,
Nhat Ho,
XuanLong Nguyen
Abstract:
Dirichlet Process mixture models (DPMM) in combination with Gaussian kernels have been an important modeling tool for numerous data domains arising from biological, physical, and social sciences. However, this versatility in applications does not extend to strong theoretical guarantees for the underlying parameter estimates, for which only a logarithmic rate is achieved. In this work, we (re)introduce and investigate a metric, named Orlicz-Wasserstein distance, in the study of the Bayesian contraction behavior for the parameters. We show that despite the overall slow convergence guarantees for all the parameters, posterior contraction for parameters happens at almost polynomial rates in outlier regions of the parameter space. Our theoretical results provide new insight in understanding the convergence behavior of parameters arising from various settings of hierarchical Bayesian nonparametric models. In addition, we provide an algorithm to compute the metric by leveraging Sinkhorn divergences and validate our findings through a simulation study.
Submitted 26 January, 2023;
originally announced January 2023.
-
Statistical and Computational Complexities of BFGS Quasi-Newton Method for Generalized Linear Models
Authors:
Qiujiang Jin,
Tongzheng Ren,
Nhat Ho,
Aryan Mokhtari
Abstract:
The gradient descent (GD) method has been used widely to solve parameter estimation in generalized linear models (GLMs), a generalization of linear models when the link function can be non-linear. In GLMs with a polynomial link function, it has been shown that in the high signal-to-noise ratio (SNR) regime, due to the problem's strong convexity and smoothness, GD converges linearly and reaches the final desired accuracy in a logarithmic number of iterations. In contrast, in the low SNR setting, where the problem becomes locally convex, GD converges at a slower rate and requires a polynomial number of iterations to reach the desired accuracy. Even though Newton's method can be used to resolve the flat curvature of the loss functions in the low SNR case, its computational cost is prohibitive in high-dimensional settings, as its per-iteration cost is $\mathcal{O}(d^3)$, where $d$ is the problem dimension. To address the shortcomings of GD and Newton's method, we propose the use of the BFGS quasi-Newton method to solve parameter estimation of the GLMs, which has a per-iteration cost of $\mathcal{O}(d^2)$. When the SNR is low, for GLMs with a polynomial link function of degree $p$, we demonstrate that the iterates of BFGS converge linearly to the optimal solution of the population least-square loss function, and the contraction coefficient of the BFGS algorithm is comparable to that of Newton's method. Moreover, the contraction factor of the linear rate is independent of problem parameters and only depends on the degree of the link function $p$. Also, for the empirical loss with $n$ samples, we prove that in the low SNR setting of GLMs with a polynomial link function of degree $p$, the iterates of BFGS reach a final statistical radius of $\mathcal{O}((d/n)^{\frac{1}{2p+2}})$ after at most $\log(n/d)$ iterations.
Submitted 14 March, 2024; v1 submitted 31 May, 2022;
originally announced June 2022.
-
Beyond EM Algorithm on Over-specified Two-Component Location-Scale Gaussian Mixtures
Authors:
Tongzheng Ren,
Fuheng Cui,
Sujay Sanghavi,
Nhat Ho
Abstract:
The Expectation-Maximization (EM) algorithm has been predominantly used to approximate the maximum likelihood estimate of location-scale Gaussian mixtures. However, when the models are over-specified, namely, the chosen number of components to fit the data is larger than the unknown true number of components, EM needs a polynomial number of iterations in terms of the sample size to reach the final statistical radius; this is computationally expensive in practice. The slow convergence of EM is due to the lack of local strong convexity, with respect to the location parameter, of the negative population log-likelihood function, i.e., the limit of the negative sample log-likelihood function when the sample size goes to infinity. To efficiently explore the curvature of the negative log-likelihood functions, by specifically considering two-component location-scale Gaussian mixtures, we develop the Exponential Location Update (ELU) algorithm. The idea of the ELU algorithm is that we first obtain the exact optimal solution for the scale parameter and then perform an exponential step-size gradient descent for the location parameter. We demonstrate theoretically and empirically that the ELU iterates converge to the final statistical radius of the models after a logarithmic number of iterations. To the best of our knowledge, it resolves the long-standing open question in the literature about developing an optimization algorithm that has optimal statistical and computational complexities for solving parameter estimation even under some specific settings of the over-specified Gaussian mixture models.
Submitted 23 May, 2022;
originally announced May 2022.
-
An Exponentially Increasing Step-size for Parameter Estimation in Statistical Models
Authors:
Nhat Ho,
Tongzheng Ren,
Sujay Sanghavi,
Purnamrita Sarkar,
Rachel Ward
Abstract:
Using gradient descent (GD) with a fixed or decaying step size is standard practice in unconstrained optimization problems. However, when the loss function is only locally convex, such a step-size schedule artificially slows GD down as it cannot explore the flat curvature of the loss function. To overcome that issue, we propose to exponentially increase the step size of the GD algorithm. Under homogeneous assumptions on the loss function, we demonstrate that the iterates of the proposed \emph{exponential step size gradient descent} (EGD) algorithm converge linearly to the optimal solution. Leveraging that optimization insight, we then consider using the EGD algorithm for solving parameter estimation under both regular and non-regular statistical models whose loss function becomes locally convex when the sample size goes to infinity. We demonstrate that the EGD iterates reach the final statistical radius around the true parameter after a logarithmic number of iterations, which is in stark contrast to a \emph{polynomial} number of iterations of the GD algorithm in non-regular statistical models. Therefore, the total computational complexity of the EGD algorithm is \emph{optimal} and exponentially cheaper than that of the GD for solving parameter estimation in non-regular statistical models while being comparable to that of the GD in regular statistical settings. To the best of our knowledge, it resolves a long-standing gap between statistical and algorithmic computational complexities of parameter estimation in non-regular statistical models. Finally, we provide targeted applications of the general theory to several classes of statistical models, including generalized linear models with polynomial link functions and location Gaussian mixture models.
Submitted 1 February, 2023; v1 submitted 16 May, 2022;
originally announced May 2022.
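The contrast between fixed-step GD and an exponentially increasing step size can be sketched on a toy loss. This is a minimal illustration, assuming the homogeneous, only locally convex loss f(theta) = theta**4 as a stand-in for a non-regular population loss. The base step 0.1, growth factor 2, and the function name are hypothetical choices, not the paper's tuned schedule.

```python
def gradient_descent(grad, theta0, step, n_iter, growth=1.0):
    """Gradient descent with step size step * growth**t at iteration t.

    growth == 1.0 gives ordinary fixed-step GD; growth > 1.0 gives an
    exponentially increasing step size in the spirit of EGD.
    """
    theta = theta0
    for t in range(n_iter):
        theta = theta - step * growth**t * grad(theta)
    return theta

grad = lambda th: 4.0 * th**3  # gradient of f(theta) = theta**4, minimized at 0

fixed = gradient_descent(grad, 1.0, 0.1, 50)              # decays like 1/sqrt(t)
egd = gradient_descent(grad, 1.0, 0.1, 50, growth=2.0)    # decays geometrically
```

On this loss the fixed-step iterate is still around 0.15 after 50 steps, because the gradient vanishes cubically near the optimum. The growing step size compensates for the flattening curvature, so each iteration shrinks theta by a roughly constant factor.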
-
Refined Convergence Rates for Maximum Likelihood Estimation under Finite Mixture Models
Authors:
Tudor Manole,
Nhat Ho
Abstract:
We revisit the classical problem of deriving convergence rates for the maximum likelihood estimator (MLE) in finite mixture models. The Wasserstein distance has become a standard loss function for the analysis of parameter estimation in these models, due in part to its ability to circumvent label switching and to accurately characterize the behaviour of fitted mixture components with vanishing weights. However, the Wasserstein distance is only able to capture the worst-case convergence rate among the remaining fitted mixture components. We demonstrate that when the log-likelihood function is penalized to discourage vanishing mixing weights, stronger loss functions can be derived to resolve this shortcoming of the Wasserstein distance. These new loss functions accurately capture the heterogeneity in convergence rates of fitted mixture components, and we use them to sharpen existing pointwise and uniform convergence rates in various classes of mixture models. In particular, these results imply that a subset of the components of the penalized MLE typically converge significantly faster than could have been anticipated from past work. We further show that some of these conclusions extend to the traditional MLE. Our theoretical findings are supported by a simulation study to illustrate these improved convergence rates.
Submitted 20 June, 2022; v1 submitted 17 February, 2022;
originally announced February 2022.
-
Improving Computational Complexity in Statistical Models with Second-Order Information
Authors:
Tongzheng Ren,
Jiacheng Zhuo,
Sujay Sanghavi,
Nhat Ho
Abstract:
It is known that when the statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes a polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory for applications. To further improve that computational complexity, we consider the utilization of second-order information in the design of optimization algorithms. Specifically, we study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models, which is a variant of the gradient descent algorithm whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function of statistical models. When the population loss function, i.e., the limit of the empirical loss function when $n$ goes to infinity, is homogeneous in all directions, we demonstrate that the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius. This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^τ)$ for some $τ > 1$, to reach the same statistical radius. We illustrate our general theory under two statistical models, generalized linear models and mixture models, and experimental results support the predictions of our general theory.
Submitted 13 April, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
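A NormGD-style update divides the gradient step by the largest Hessian eigenvalue. The sketch below shows this on a toy loss and is a minimal illustration under stated assumptions. It assumes the update theta <- theta - (eta / lambda_max(Hessian)) * gradient. The loss f(theta) = ||theta||^4, which is homogeneous in all directions, stands in for a population loss, and eta is an illustrative constant. The helper names are hypothetical.

```python
import numpy as np

def loss_grad_hess(theta):
    """Gradient and Hessian of f(theta) = ||theta||^4."""
    r2 = float(theta @ theta)
    grad = 4.0 * r2 * theta
    hess = 4.0 * r2 * np.eye(theta.size) + 8.0 * np.outer(theta, theta)
    return grad, hess

def norm_gd(theta0, eta, n_iter):
    """Gradient descent whose step size is eta / (largest Hessian eigenvalue)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        grad, hess = loss_grad_hess(theta)
        lam_max = np.linalg.eigvalsh(hess)[-1]  # eigvalsh sorts eigenvalues ascending
        theta = theta - (eta / lam_max) * grad
    return theta

theta0 = np.array([1.0, -2.0, 0.5])
out = norm_gd(theta0, eta=1.0, n_iter=30)
```

For this loss the largest Hessian eigenvalue is 12 ||theta||^2, so each update is exactly theta * (1 - eta / 3). This linear contraction is the behaviour a fixed step size cannot achieve once the curvature flattens near the optimum.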
-
Beyond Black Box Densities: Parameter Learning for the Deviated Components
Authors:
Dat Do,
Nhat Ho,
XuanLong Nguyen
Abstract:
As we collect additional samples from a data population for which a known density function estimate may have been previously obtained by a black box method, the increased complexity of the data set may result in the true density being deviated from the known estimate by a mixture distribution. To model this phenomenon, we consider the \emph{deviating mixture model} $(1-λ^{*})h_0 + λ^{*} (\sum_{i = 1}^{k} p_{i}^{*} f(x|θ_{i}^{*}))$, where $h_0$ is a known density function, while the deviated proportion $λ^{*}$ and latent mixing measure $G_{*} = \sum_{i = 1}^{k} p_{i}^{*} δ_{θ_i^{*}}$ associated with the mixture distribution are unknown. Via a novel notion of distinguishability between the known density $h_{0}$ and the deviated mixture distribution, we establish rates of convergence for the maximum likelihood estimates of $λ^{*}$ and $G_{*}$ under the Wasserstein metric. Simulation studies are carried out to illustrate the theory.
Submitted 26 October, 2022; v1 submitted 5 February, 2022;
originally announced February 2022.
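The deviating mixture density $(1-λ^{*})h_0 + λ^{*} \sum_{i} p_{i}^{*} f(x|θ_{i}^{*})$ and its log-likelihood can be written down directly. The sketch below is univariate and makes two illustrative assumptions: the known density h0 is the standard normal, and f(x | theta) is a normal location density with a fixed standard deviation. The function names are hypothetical. With a single deviated component, it reduces to a univariate instance of the deviated model $(1-λ^{\ast})h_{0}(x)+λ^{\ast}f(x|μ^{\ast}, Σ^{\ast})$.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def deviating_mixture_pdf(x, lam, weights, means, sd=1.0):
    """(1 - lam) * h0(x) + lam * sum_i p_i * f(x | theta_i), univariate sketch.

    h0 is assumed to be N(0, 1); f(x | theta) is assumed to be N(theta, sd^2).
    """
    x = np.asarray(x, dtype=float)
    h0 = normal_pdf(x, 0.0, 1.0)
    deviation = sum(p * normal_pdf(x, m, sd) for p, m in zip(weights, means))
    return (1.0 - lam) * h0 + lam * deviation

def log_likelihood(data, lam, weights, means, sd=1.0):
    """Sample log-likelihood; the MLE would maximize this over lam and (p_i, theta_i)."""
    return float(np.sum(np.log(deviating_mixture_pdf(data, lam, weights, means, sd))))
```

Setting lam = 0 recovers h0 exactly, which is one boundary case that makes the proportion hard to estimate when the true deviation vanishes.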
-
Bayesian Consistency with the Supremum Metric
Authors:
Nhat Ho,
Stephen G. Walker
Abstract:
We present simple conditions for Bayesian consistency in the supremum metric. The key to the technique is a triangle inequality which allows us to explicitly use weak convergence, a consequence of the standard Kullback--Leibler support condition for the prior. A further condition is to ensure that smoothed versions of densities are not too far from the original density, thus dealing with densities which could track the data too closely. A key result of the paper is that we demonstrate supremum consistency using weaker conditions compared to those currently used to secure $\mathbb{L}_1$ consistency.
Submitted 10 January, 2022;
originally announced January 2022.
-
Polytopes, supersymmetry, and integrable systems
Authors:
Martin A. Guest,
Nan-Kuo Ho
Abstract:
We review some links between Lie-theoretic polytopes and field theories in physics, which were proposed in the 1990s. A basic ingredient is the Coxeter Plane, whose relation to integrable systems and the Stokes Phenomenon has only recently come to light. We use this to give a systematic mathematical treatment, which gives further support to the physical proposals. This article is based on a talk which was scheduled to be given at the workshop "Representations of Discrete Groups and Geometric Topology on Manifolds", Josai University, 12-13 March 2020.
Submitted 28 October, 2021;
originally announced October 2021.
-
Towards Statistical and Computational Complexities of Polyak Step Size Gradient Descent
Authors:
Tongzheng Ren,
Fuheng Cui,
Alexia Atsidakou,
Sujay Sanghavi,
Nhat Ho
Abstract:
We study the statistical and computational complexities of the Polyak step size gradient descent algorithm under generalized smoothness and Lojasiewicz conditions of the population loss function, namely, the limit of the empirical loss function when the sample size goes to infinity, and the stability between the gradients of the empirical and population loss functions, namely, the polynomial growth on the concentration bound between the gradients of sample and population loss functions. We demonstrate that the Polyak step size gradient descent iterates reach a final statistical radius of convergence around the true parameter after a logarithmic number of iterations in terms of the sample size. This is computationally cheaper than the polynomial number of iterations, in terms of the sample size, that the fixed step size gradient descent algorithm needs to reach the same final statistical radius when the population loss function is not locally strongly convex. Finally, we illustrate our general theory under three statistical examples: the generalized linear model, the mixture model, and the mixed linear regression model.
Submitted 14 October, 2021;
originally announced October 2021.
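The classical Polyak step size (f(theta) - f*) / ||grad f(theta)||^2 can be sketched in a few lines for a scalar parameter. This is a minimal illustration, not the paper's analysis. It assumes the optimal value f_star is known, which in the statistical setting would be the minimum of the empirical loss. The toy loss f(theta) = theta**4 and the function name are hypothetical.

```python
def polyak_gd(f, grad, theta0, f_star, n_iter):
    """Scalar gradient descent with the Polyak step (f(theta) - f_star) / grad^2."""
    theta = theta0
    for _ in range(n_iter):
        g = grad(theta)
        if g == 0.0:
            break  # stationary point reached; the Polyak step is undefined here
        step = (f(theta) - f_star) / (g * g)
        theta = theta - step * g
    return theta

out = polyak_gd(lambda t: t**4, lambda t: 4.0 * t**3, theta0=1.0, f_star=0.0, n_iter=20)
```

For f(theta) = theta**4, the step equals 1 / (16 theta**2), so every update is theta <- 0.75 * theta. The step size grows automatically as the curvature flattens, which is why a logarithmic number of iterations suffices where fixed-step GD needs polynomially many.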
-
Entropic Gromov-Wasserstein between Gaussian Distributions
Authors:
Khang Le,
Dung Le,
Huy Nguyen,
Dat Do,
Tung Pham,
Nhat Ho
Abstract:
We study the entropic Gromov-Wasserstein and its unbalanced version between (unbalanced) Gaussian distributions with different dimensions. When the metric is the inner product, which we refer to as inner product Gromov-Wasserstein (IGW), we demonstrate that the optimal transportation plans of entropic IGW and its unbalanced variant are (unbalanced) Gaussian distributions. Via an application of von Neumann's trace inequality, we obtain closed-form expressions for the entropic IGW between these Gaussian distributions. Finally, we consider an entropic inner product Gromov-Wasserstein barycenter of multiple Gaussian distributions. We prove that the barycenter is a Gaussian distribution when the entropic regularization parameter is small. We further derive a closed-form expression for the covariance matrix of the barycenter.
Submitted 24 February, 2022; v1 submitted 24 August, 2021;
originally announced August 2021.
-
On Multimarginal Partial Optimal Transport: Equivalent Forms and Computational Complexity
Authors:
Khang Le,
Huy Nguyen,
Tung Pham,
Nhat Ho
Abstract:
We study the multi-marginal partial optimal transport (POT) problem between $m$ discrete (unbalanced) measures with at most $n$ supports. We first prove that we can obtain two equivalent forms of the multimarginal POT problem in terms of the multimarginal optimal transport problem via novel extensions of the cost tensor. The first equivalent form is derived under the assumption that the total masses of the measures are sufficiently close, while the second equivalent form does not require any conditions on these masses but comes at the price of a more sophisticated extended cost tensor. Our proof techniques for obtaining these equivalent forms rely on novel procedures of moving mass in graph theory to push the transportation plan into appropriate regions. Finally, based on the equivalent forms, we develop an optimization algorithm, named the ApproxMPOT algorithm, that builds upon the Sinkhorn algorithm for solving the entropic regularized multimarginal optimal transport. We demonstrate that the ApproxMPOT algorithm can approximate the optimal value of the multimarginal POT problem with a computational complexity upper bound of the order $\tilde{\mathcal{O}}(m^3(n+1)^{m}/ \varepsilon^2)$ where $\varepsilon > 0$ stands for the desired tolerance.
Submitted 24 February, 2022; v1 submitted 18 August, 2021;
originally announced August 2021.
-
On Integral Theorems and their Statistical Properties
Authors:
Nhat Ho,
Stephen G. Walker
Abstract:
We introduce a class of integral theorems based on cyclic functions and Riemann sums approximating integrals. The Fourier integral theorem, derived as a combination of a transform and inverse transform, arises as a special case. The integral theorems provide natural estimators of density functions via Monte Carlo methods. Assessments of the quality of the density estimators can be used to obtain optimal cyclic functions, alternatives to the sin function, which minimize square integrals. Our proof techniques rely on a variational approach in ordinary differential equations and the Cauchy residue theorem in complex analysis.
Submitted 20 March, 2022; v1 submitted 22 July, 2021;
originally announced July 2021.
-
BONuS: Multiple multivariate testing with a data-adaptive test statistic
Authors:
Chiao-Yu Yang,
Lihua Lei,
Nhat Ho,
Will Fithian
Abstract:
We propose a new adaptive empirical Bayes framework, the Bag-Of-Null-Statistics (BONuS) procedure, for multiple testing where each hypothesis testing problem is itself multivariate or nonparametric. BONuS is an adaptive and interactive knockoff-type method that helps improve the testing power while controlling the false discovery rate (FDR), and is closely connected to the "counting knockoffs" procedure analyzed in Weinstein et al. (2017). Contrary to procedures that start with a $p$-value for each hypothesis, our method analyzes the entire data set to adaptively estimate an optimal $p$-value transform based on an empirical Bayes model. Despite the extra adaptivity, our method controls FDR in finite samples even if the empirical Bayes model is incorrect or the estimation is poor. An extension, the Double BONuS procedure, validates the empirical Bayes model to guard against power loss due to model misspecification.
Submitted 1 July, 2021; v1 submitted 29 June, 2021;
originally announced June 2021.
-
On Robust Optimal Transport: Computational Complexity and Barycenter Computation
Authors:
Khang Le,
Huy Nguyen,
Quang Nguyen,
Tung Pham,
Hung Bui,
Nhat Ho
Abstract:
We consider robust variants of the standard optimal transport, named robust optimal transport, where marginal constraints are relaxed via the Kullback-Leibler divergence. We show that Sinkhorn-based algorithms can approximate the optimal cost of robust optimal transport in $\widetilde{\mathcal{O}}(\frac{n^2}{\varepsilon})$ time, in which $n$ is the number of supports of the probability distributions and $\varepsilon$ is the desired error. Furthermore, we investigate a fixed-support robust barycenter problem between $m$ discrete probability distributions with at most $n$ supports and develop an approximating algorithm based on iterative Bregman projections (IBP). For the specific case $m = 2$, we show that this algorithm can approximate the optimal barycenter value in $\widetilde{\mathcal{O}}(\frac{mn^2}{\varepsilon})$ time, thus being better than the previous complexity $\widetilde{\mathcal{O}}(\frac{mn^2}{\varepsilon^2})$ of the IBP algorithm for approximating the Wasserstein barycenter.
Submitted 27 October, 2021; v1 submitted 12 February, 2021;
originally announced February 2021.
-
Multivariate Smoothing via the Fourier Integral Theorem and Fourier Kernel
Authors:
Nhat Ho,
Stephen G. Walker
Abstract:
Starting with the Fourier integral theorem, we present natural Monte Carlo estimators of multivariate functions including densities, mixing densities, transition densities, regression functions, and the search for modes of multivariate density functions (modal regression). Rates of convergence are established and, in many cases, provide superior rates to current standard estimators such as those based on kernels, including kernel density estimators and kernel regression functions. Numerical illustrations are presented.
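A minimal sketch of a Monte Carlo density estimator built from the Fourier integral theorem, assuming the product form $\hat f(y) = \frac{1}{n}\sum_{i}\prod_{j} \frac{\sin(R(y_j - X_{ij}))}{\pi (y_j - X_{ij})}$, where the cutoff $R$ plays the role of an inverse bandwidth. The function name `fourier_density` and the choice `R=4.0` are illustrative assumptions, not the authors' code or tuning.

```python
import numpy as np

def fourier_density(y, X, R):
    """Fourier-kernel density estimate (hypothetical helper).

    y: (m, d) query points; X: (n, d) sample; R: frequency cutoff
    (larger R behaves like a smaller bandwidth).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    diff = y[:, None, :] - X[None, :, :]          # shape (m, n, d)
    # sin(R u) / (pi u) == (R / pi) * np.sinc(R u / pi); np.sinc handles u = 0 safely
    kernel = (R / np.pi) * np.sinc(R * diff / np.pi)
    # product over coordinates, then average over the n sample points
    return kernel.prod(axis=2).mean(axis=1)

# Usage: estimate a standard normal density at the origin from 5000 draws.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 1))
est0 = fourier_density(np.zeros((1, 1)), X, R=4.0)[0]
print(est0, 1 / np.sqrt(2 * np.pi))  # estimate vs. true value
```

Unlike a Gaussian kernel, the sinc-type kernel can take negative values, so pointwise estimates are not guaranteed to be non-negative.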
Submitted 28 December, 2020;
originally announced December 2020.
-
Projection Robust Wasserstein Distance and Riemannian Optimization
Authors:
Tianyi Lin,
Chenyou Fan,
Nhat Ho,
Marco Cuturi,
Michael I. Jordan
Abstract:
Projection robust Wasserstein (PRW) distance, or Wasserstein projection pursuit (WPP), is a robust variant of the Wasserstein distance. Recent work suggests that this quantity is more robust than the standard Wasserstein distance, in particular when comparing probability measures in high dimensions. However, it has been ruled out for practical applications because the optimization model is essentially non-convex and non-smooth, which makes the computation intractable. Our contribution in this paper is to revisit the original motivation behind WPP/PRW, but take the hard route of showing that, despite its non-convexity and lack of smoothness, and even despite some hardness results proved by Niles-Weed and Rigollet (2019) in a minimax sense, the original formulation for PRW/WPP can be efficiently computed in practice using Riemannian optimization, yielding in relevant cases better behavior than its convex relaxation. More specifically, we provide three simple algorithms with solid theoretical guarantees on their complexity bounds (one in the appendix), and demonstrate their effectiveness and efficiency by conducting extensive experiments on synthetic and real data. This paper provides a first step into a computational theory of the PRW distance and provides the links between optimal transport and Riemannian optimization.
Submitted 1 January, 2023; v1 submitted 12 June, 2020;
originally announced June 2020.
-
Uniform Convergence Rates for Maximum Likelihood Estimation under Two-Component Gaussian Mixture Models
Authors:
Tudor Manole,
Nhat Ho
Abstract:
We derive uniform convergence rates for the maximum likelihood estimator and minimax lower bounds for parameter estimation in two-component location-scale Gaussian mixture models with unequal variances. We assume the mixing proportions of the mixture are known and fixed, but make no separation assumption on the underlying mixture components. A phase transition is shown to exist in the optimal parameter estimation rate, depending on whether or not the mixture is balanced. Key to our analysis is a careful study of the dependence between the parameters of location-scale Gaussian mixture models, as captured through systems of polynomial equalities and inequalities whose solution set drives the rates we obtain. A simulation study illustrates the theoretical findings of this work.
Submitted 1 June, 2020;
originally announced June 2020.
-
Instability, Computational Efficiency and Statistical Accuracy
Authors:
Nhat Ho,
Koulik Khamaru,
Raaz Dwivedi,
Martin J. Wainwright,
Michael I. Jordan,
Bin Yu
Abstract:
Many statistical estimators are defined as the fixed point of a data-dependent operator, with estimators based on minimizing a cost function being an important special case. The limiting performance of such estimators depends on the properties of the population-level operator in the idealized limit of infinitely many samples. We develop a general framework that yields bounds on statistical accuracy based on the interplay between the deterministic convergence rate of the algorithm at the population level, and its degree of (in)stability when applied to an empirical object based on $n$ samples. Using this framework, we analyze both stable forms of gradient descent and some higher-order and unstable algorithms, including Newton's method and its cubic-regularized variant, as well as the EM algorithm. We provide applications of our general results to several concrete classes of models, including Gaussian mixture estimation, non-linear regression models, and informative non-response models. We exhibit cases in which an unstable algorithm can achieve the same statistical accuracy as a stable algorithm in exponentially fewer steps -- namely, with the number of iterations being reduced from polynomial to logarithmic in sample size $n$.
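A toy illustration of the polynomial-versus-logarithmic iteration gap at the population level, assuming the singular loss $f(\theta) = \theta^4$ (zero curvature at the minimizer). This is a hypothetical example chosen for clarity, not one of the paper's models: gradient descent contracts as $\theta_{t+1} = \theta_t(1 - 4\eta\theta_t^2)$, which is only polynomially fast, while Newton's method gives $\theta_{t+1} = \tfrac{2}{3}\theta_t$, which is geometric. The helper `steps_to_tol` and the step size `0.1` are illustrative assumptions.

```python
def steps_to_tol(update, theta0, tol, max_steps=10**6):
    """Count iterations of `update` until |theta| < tol (hypothetical helper)."""
    theta, steps = theta0, 0
    while abs(theta) >= tol and steps < max_steps:
        theta = update(theta)
        steps += 1
    return steps

# Toy singular loss f(theta) = theta**4, minimized at 0 with f''(0) = 0.
def grad(theta):
    return 4 * theta ** 3

def hess(theta):
    return 12 * theta ** 2

def gd_update(theta, step=0.1):
    return theta - step * grad(theta)       # stable, but slow near a singular minimum

def newton_update(theta):
    return theta - grad(theta) / hess(theta)  # equals (2/3) * theta here

gd_steps = steps_to_tol(gd_update, 1.0, 1e-2)
newton_steps = steps_to_tol(newton_update, 1.0, 1e-2)
print(gd_steps, newton_steps)
```

Since $(2/3)^{11} > 0.01 > (2/3)^{12}$, Newton needs 12 steps here, while gradient descent needs on the order of $10^4$.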
Submitted 20 March, 2022; v1 submitted 22 May, 2020;
originally announced May 2020.
-
Flat connections and the commutator map for SU(2)
Authors:
Nan-Kuo Ho,
Lisa C. Jeffrey,
Paul Selick,
Eugene Z. Xia
Abstract:
We study the topology of the SU(2)-representation variety of the compact oriented surface of genus 2 with one boundary component about which the holonomy is a generator of the center of SU(2).
Submitted 5 February, 2021; v1 submitted 15 May, 2020;
originally announced May 2020.
-
On Unbalanced Optimal Transport: An Analysis of Sinkhorn Algorithm
Authors:
Khiem Pham,
Khang Le,
Nhat Ho,
Tung Pham,
Hung Bui
Abstract:
We provide a computational complexity analysis for the Sinkhorn algorithm that solves the entropic regularized Unbalanced Optimal Transport (UOT) problem between two measures of possibly different masses with at most $n$ components. We show that the complexity of the Sinkhorn algorithm for finding an $\varepsilon$-approximate solution to the UOT problem is of order $\widetilde{\mathcal{O}}(n^2/ \varepsilon)$, which is near-linear time. To the best of our knowledge, this complexity is better than the complexity of the Sinkhorn algorithm for solving the Optimal Transport (OT) problem, which is of order $\widetilde{\mathcal{O}}(n^2/\varepsilon^2)$. Our proof technique is based on the geometric convergence of the Sinkhorn updates to the optimal dual solution of the entropic regularized UOT problem and some properties of the primal solution. It is also different from the proof for the complexity of the Sinkhorn algorithm for approximating the OT problem since the UOT solution does not have to meet the marginal constraints.
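A minimal sketch of Sinkhorn scaling for entropic UOT, assuming the objective $\langle C, P\rangle - \eta H(P) + \tau\,\mathrm{KL}(P\mathbf{1}\,\|\,a) + \tau\,\mathrm{KL}(P^\top\mathbf{1}\,\|\,b)$. Under that assumption the usual Sinkhorn updates are damped by the exponent $\tau/(\tau+\eta)$, and $\tau \to \infty$ recovers balanced Sinkhorn. The function name `sinkhorn_uot` and the parameters `reg` ($\eta$), `tau`, and `n_iter` are illustrative; this is not the authors' implementation, and it works in the plain (not log) domain, so very small `reg` can underflow.

```python
import numpy as np

def sinkhorn_uot(a, b, C, reg, tau, n_iter=1000):
    """Entropic unbalanced OT with KL-relaxed marginals (hypothetical helper).

    a, b: non-negative marginal vectors (masses may differ);
    C: cost matrix; reg: entropic weight; tau: KL marginal penalty weight.
    """
    K = np.exp(-C / reg)
    fi = tau / (tau + reg)           # damping exponent; fi -> 1 gives balanced Sinkhorn
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** fi      # row scaling
        v = (b / (K.T @ u)) ** fi    # column scaling
    return u[:, None] * K * v[None, :]  # plan diag(u) K diag(v)

# Usage: with a large tau, the marginals should be matched almost exactly.
a = np.full(3, 1 / 3)
b = np.full(3, 1 / 3)
idx = np.arange(3)
C = (idx[:, None] - idx[None, :]) ** 2 / 4.0
P = sinkhorn_uot(a, b, C, reg=0.1, tau=100.0, n_iter=5000)
print(P.sum(axis=1), P.sum(axis=0))
```

With a small `tau` the marginal constraints become soft, which is what lets the UOT solution move mass between measures of different total mass.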
Submitted 18 November, 2020; v1 submitted 9 February, 2020;
originally announced February 2020.
-
Sampling for Bayesian Mixture Models: MCMC with Polynomial-Time Mixing
Authors:
Wenlong Mou,
Nhat Ho,
Martin J. Wainwright,
Peter L. Bartlett,
Michael I. Jordan
Abstract:
We study the problem of sampling from the power posterior distribution in Bayesian Gaussian mixture models, a robust version of the classical posterior. This power posterior is known to be non-log-concave and multi-modal, which leads to exponential mixing times for some standard MCMC algorithms. We introduce and study the Reflected Metropolis-Hastings Random Walk (RMRW) algorithm for sampling. For symmetric two-component Gaussian mixtures, we prove that its mixing time is bounded as $d^{1.5}(d + \Vert \theta_{0} \Vert^2)^{4.5}$ as long as the sample size $n$ is of the order $d (d + \Vert \theta_{0} \Vert^2)$. Notably, this result requires no conditions on the separation of the two means. En route to proving this bound, we establish some new results of possible independent interest that allow for combining Poincaré inequalities for conditional and marginal densities.
Submitted 11 December, 2019;
originally announced December 2019.
-
On the Complexity of Approximating Multimarginal Optimal Transport
Authors:
Tianyi Lin,
Nhat Ho,
Marco Cuturi,
Michael I. Jordan
Abstract:
We study the complexity of approximating the multimarginal optimal transport (MOT) distance, a generalization of the classical optimal transport distance, considered here between $m$ discrete probability distributions, each supported on $n$ points. First, we show that the standard linear programming (LP) representation of the MOT problem is not a minimum-cost flow problem when $m \geq 3$. This negative result implies that some combinatorial algorithms, e.g., the network simplex method, are not suitable for approximating the MOT problem, while the worst-case complexity bound for the deterministic interior-point algorithm remains $\tilde{O}(n^{3m})$. We then propose two simple and deterministic algorithms for approximating the MOT problem. The first algorithm, which we refer to as the multimarginal Sinkhorn algorithm, is a provably efficient multimarginal generalization of the Sinkhorn algorithm. We show that it achieves a complexity bound of $\tilde{O}(m^3n^m\varepsilon^{-2})$ for a tolerance $\varepsilon \in (0, 1)$. This provides a first near-linear time complexity bound guarantee for approximating the MOT problem and matches the best known complexity bound for the Sinkhorn algorithm in the classical OT setting when $m = 2$. The second algorithm, which we refer to as the accelerated multimarginal Sinkhorn algorithm, achieves acceleration by incorporating an estimate sequence, and its complexity bound is $\tilde{O}(m^3n^{m+1/3}\varepsilon^{-4/3})$. This bound is better than that of the first algorithm in terms of $1/\varepsilon$, and better than that of the accelerated alternating minimization algorithm (Tupitsa et al., 2020) in terms of $n$. Finally, we compare our new algorithms with the commercial LP solver Gurobi. Preliminary results on synthetic data and real images demonstrate the effectiveness and efficiency of our algorithms.
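A minimal sketch of entropic multimarginal Sinkhorn: the plan is $B = K \odot (u_1 \otimes \cdots \otimes u_m)$ with $K = e^{-C/\eta}$, and each scaling $u_k$ is rescaled so that the $k$-th marginal of $B$ matches $r_k$. This sketch uses a simple cyclic update order, which may differ from the index-selection rule analyzed in the paper; the function name `multimarginal_sinkhorn` and its parameters are illustrative assumptions. It stores the full $n^m$ tensor, so it is only practical for small $m$ and $n$.

```python
import numpy as np

def multimarginal_sinkhorn(marginals, C, reg, n_iter=500):
    """Cyclic multimarginal Sinkhorn for an m-way cost tensor C (hypothetical helper)."""
    m = C.ndim
    K = np.exp(-C / reg)
    us = [np.ones(C.shape[k]) for k in range(m)]

    def plan():
        # B = K * (u_1 outer ... outer u_m), built by broadcasting each u_k along axis k
        B = K.copy()
        for k, u in enumerate(us):
            shape = [1] * m
            shape[k] = -1
            B = B * u.reshape(shape)
        return B

    for _ in range(n_iter):
        for k in range(m):
            other_axes = tuple(j for j in range(m) if j != k)
            current = plan().sum(axis=other_axes)       # k-th marginal of the current plan
            us[k] = us[k] * marginals[k] / current      # match the k-th marginal exactly
    return plan()

# Usage: three marginals on 3 points with a pairwise squared-distance cost.
idx = np.arange(3, dtype=float)
i, j, k = np.meshgrid(idx, idx, idx, indexing="ij")
C = ((i - j) ** 2 + (j - k) ** 2 + (i - k) ** 2) / 8.0   # scaled into [0, 1]
marginals = [np.array([0.2, 0.3, 0.5]), np.array([0.5, 0.3, 0.2]), np.full(3, 1 / 3)]
P = multimarginal_sinkhorn(marginals, C, reg=0.5)
```

With $m = 2$ this reduces to the classical alternating Sinkhorn updates.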
Submitted 21 February, 2022; v1 submitted 30 September, 2019;
originally announced October 2019.
-
A Diffusion Process Perspective on Posterior Contraction Rates for Parameters
Authors:
Wenlong Mou,
Nhat Ho,
Martin J. Wainwright,
Peter Bartlett,
Michael I. Jordan
Abstract:
We analyze the posterior contraction rates of parameters in Bayesian models via the Langevin diffusion process, in particular by controlling moments of the stochastic process and taking limits. Analogous to the non-asymptotic analysis of statistical M-estimators and stochastic optimization algorithms, our contraction rates depend on the structure of the population log-likelihood function, and stochastic perturbation bounds between the population and sample log-likelihood functions. Convergence rates are determined by a non-linear equation that relates the population-level structure to stochastic perturbation terms, along with a term characterizing the diffusive behavior. Based on this technique, we also prove non-asymptotic versions of a Bernstein-von Mises guarantee for the posterior. We illustrate this general theory by deriving posterior convergence rates for various concrete examples, as well as approximate posterior distributions computed using Langevin sampling procedures.
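A minimal sketch of a Langevin sampling procedure (the unadjusted Langevin algorithm, $\theta \leftarrow \theta + h\,\nabla \log \pi(\theta) + \sqrt{2h}\,\xi$), assuming a toy conjugate model $x_i \sim N(\theta, 1)$ with prior $\theta \sim N(0, 1)$ so that the exact posterior $N\big(\sum_i x_i/(n+1),\, 1/(n+1)\big)$ is available for comparison. The model, step size, and the helper name `ula_gaussian_location` are illustrative assumptions, not taken from the paper. Independent chains are run in parallel so the final states are nearly independent draws.

```python
import numpy as np

def ula_gaussian_location(x, n_chains=2000, n_steps=500, seed=0):
    """Unadjusted Langevin algorithm for theta | x in a toy Gaussian location model."""
    rng = np.random.default_rng(seed)
    n = len(x)
    s = x.sum()
    step = 0.1 / (n + 1)        # small relative to 2 / (n + 1), the stability limit here
    theta = np.zeros(n_chains)
    for _ in range(n_steps):
        grad_log_post = s - (n + 1) * theta   # derivative of the log posterior
        noise = rng.standard_normal(n_chains)
        theta = theta + step * grad_log_post + np.sqrt(2 * step) * noise
    return theta

# Usage: compare ULA draws with the exact conjugate posterior.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.7, scale=1.0, size=50)
draws = ula_gaussian_location(x)
post_mean = x.sum() / (len(x) + 1)
post_var = 1 / (len(x) + 1)
print(draws.mean(), post_mean, draws.var(), post_var)
```

Because ULA has no Metropolis correction, its stationary variance here is slightly inflated, by a factor $1/(1 - h(n+1)/2) \approx 1.05$ for this step size.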
Submitted 16 August, 2022; v1 submitted 3 September, 2019;
originally announced September 2019.
-
Convergence Rates for Gaussian Mixtures of Experts
Authors:
Nhat Ho,
Chiao-Yu Yang,
Michael I. Jordan
Abstract:
We provide a theoretical treatment of over-specified Gaussian mixtures of experts with covariate-free gating networks. We establish the convergence rates of the maximum likelihood estimation (MLE) for these models. Our proof technique is based on a novel notion of \emph{algebraic independence} of the expert functions. Drawing on optimal transport theory, we establish a connection between the algebraic independence and a certain class of partial differential equations (PDEs). Exploiting this connection allows us to derive convergence rates and minimax lower bounds for parameter estimation.
Submitted 7 March, 2022; v1 submitted 9 July, 2019;
originally announced July 2019.
-
Posterior Distribution for the Number of Clusters in Dirichlet Process Mixture Models
Authors:
Chiao-Yu Yang,
Eric Xia,
Nhat Ho,
Michael I. Jordan
Abstract:
Dirichlet process mixture models (DPMM) play a central role in Bayesian nonparametrics, with applications throughout statistics and machine learning. DPMMs are generally used in clustering problems where the number of clusters is not known in advance, and the posterior distribution is treated as providing inference for this number. Recently, however, it has been shown that the DPMM is inconsistent in inferring the true number of components in certain cases. This is an asymptotic result, and it would be desirable to understand whether it holds with finite samples, and to understand the full posterior more completely. In this work, we provide a rigorous study of the posterior distribution of the number of clusters in DPMM under different prior distributions on the parameters and constraints on the distributions of the data. We provide novel lower bounds on the ratios of probabilities between $s+1$ clusters and $s$ clusters when the prior distributions on parameters are chosen to be Gaussian or uniform distributions.
Submitted 18 October, 2020; v1 submitted 23 May, 2019;
originally announced May 2019.
-
Sharp Analysis of Expectation-Maximization for Weakly Identifiable Models
Authors:
Raaz Dwivedi,
Nhat Ho,
Koulik Khamaru,
Martin J. Wainwright,
Michael I. Jordan,
Bin Yu
Abstract:
We study a class of weakly identifiable location-scale mixture models for which the maximum likelihood estimates based on $n$ i.i.d. samples are known to have lower accuracy than the classical $n^{- \frac{1}{2}}$ error. We investigate whether the Expectation-Maximization (EM) algorithm also converges slowly for these models. We provide a rigorous characterization of EM for fitting a weakly identifiable Gaussian mixture in a univariate setting where we prove that the EM algorithm converges in order $n^{\frac{3}{4}}$ steps and returns estimates that are at a Euclidean distance of order ${ n^{- \frac{1}{8}}}$ and ${ n^{-\frac{1} {4}}}$ from the true location and scale parameter respectively. Establishing the slow rates in the univariate setting requires a novel localization argument with two stages, with each stage involving an epoch-based argument applied to a different surrogate EM operator at the population level. We demonstrate several multivariate ($d \geq 2$) examples that exhibit the same slow rates as the univariate case. We also prove slow statistical rates in higher dimensions in a special case, when the fitted covariance is constrained to be a multiple of the identity.
Submitted 15 November, 2021; v1 submitted 1 February, 2019;
originally announced February 2019.
-
On posterior contraction of parameters and interpretability in Bayesian mixture modeling
Authors:
Aritra Guha,
Nhat Ho,
XuanLong Nguyen
Abstract:
We study posterior contraction behaviors for parameters of interest in the context of Bayesian mixture modeling, where the number of mixing components is unknown while the model itself may or may not be correctly specified. Two representative types of prior specification will be considered: one explicitly requires a prior distribution on the number of mixture components, while the other places a nonparametric prior on the space of mixing distributions. The former is shown to yield an optimal rate of posterior contraction on the model parameters under minimal conditions, while the latter can be utilized to consistently recover the unknown number of mixture components, with the help of a fast probabilistic post-processing procedure. We then turn our study of these Bayesian procedures to the more realistic setting of model misspecification. It will be shown that the modeling choice of kernel density functions plays perhaps the most impactful role in determining the posterior contraction rates in the misspecified situations. Drawing on the concrete posterior contraction rates established in this paper, we wish to highlight some aspects of the interesting trade-offs between model expressiveness and interpretability that a statistical modeler must negotiate in the rich world of mixture modeling.
Submitted 15 January, 2019;
originally announced January 2019.
-
Invisible knots and rainbow rings: knots not determined by their determinants
Authors:
James Godzik,
Nancy Ho,
Jennifer Jones,
Thomas W. Mattman,
Dan Sours
Abstract:
We determine $p$-colorability of the paradromic rings. These rings arise by generalizing the well-known experiment of bisecting a Möbius strip. Instead of joining the ends with a single half twist, use $m$ twists, and, rather than bisecting ($n = 2$), cut the strip into $n$ sections. We call the resulting collection of thin strips $P(m,n)$. By replacing each thin strip with its midline, we think of $P(m,n)$ as a link, that is, a collection of circles in space. Using the notion of $p$-colorability from knot theory, we determine, for each $m$ and $n$, which primes $p$ can be used to color $P(m,n)$.
Amazingly, almost all admit 0, 1, or an infinite number of prime colorings! This is reminiscent of solution sets in linear algebra. Indeed, the problem quickly turns into a study of the eigenvalues of a large, nearly diagonal matrix.
Our paper combines this explicit calculation in linear algebra with a survey of several ideas from knot theory including colorability and torus links.
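To make the notion of $p$-colorability concrete, here is a brute-force sketch using one common convention (Fox colorings): label the arcs of a diagram with elements of $\mathbb{Z}_p$ so that $2c_{\text{over}} - c_{\text{under},1} - c_{\text{under},2} \equiv 0 \pmod p$ at every crossing, and call the diagram $p$-colorable if some non-constant labeling exists. The trefoil diagram below is a standard textbook example, not one of the paper's $P(m,n)$ links, and the helper names are hypothetical.

```python
from itertools import product

def count_p_colorings(crossings, n_arcs, p):
    """Count labelings of arcs by Z_p satisfying the Fox coloring rule at each crossing.

    crossings: list of (over, under1, under2) arc indices.
    """
    count = 0
    for c in product(range(p), repeat=n_arcs):
        if all((2 * c[o] - c[u1] - c[u2]) % p == 0 for o, u1, u2 in crossings):
            count += 1
    return count

def is_p_colorable(crossings, n_arcs, p):
    # The p constant labelings always satisfy the rule; colorable means more exist.
    return count_p_colorings(crossings, n_arcs, p) > p

# Standard trefoil diagram: 3 arcs, and at each crossing one arc passes over the other two.
TREFOIL = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]
print(count_p_colorings(TREFOIL, 3, 3), count_p_colorings(TREFOIL, 3, 5))
```

Each crossing contributes one linear equation mod $p$, so the number of colorings is $p^{k}$, where $k$ is the dimension of the null space of the coloring matrix over $\mathbb{Z}_p$. For the trefoil, $k = 2$ when $p = 3$ and $k = 1$ otherwise, which is the linear-algebra viewpoint the abstract describes.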
Submitted 4 January, 2019;
originally announced January 2019.
-
Singularity, Misspecification, and the Convergence Rate of EM
Authors:
Raaz Dwivedi,
Nhat Ho,
Koulik Khamaru,
Michael I. Jordan,
Martin J. Wainwright,
Bin Yu
Abstract:
A line of recent work has analyzed the behavior of the Expectation-Maximization (EM) algorithm in the well-specified setting, in which the population likelihood is locally strongly concave around its maximizing argument. Examples include suitably separated Gaussian mixture models and mixtures of linear regressions. We consider over-specified settings in which the number of fitted components is larger than the number of components in the true distribution. Such misspecified settings can lead to singularity in the Fisher information matrix, and moreover, the maximum likelihood estimator based on $n$ i.i.d. samples in $d$ dimensions can have a non-standard $\mathcal{O}((d/n)^{\frac{1}{4}})$ rate of convergence. Focusing on the simple setting of two-component mixtures fit to a $d$-dimensional Gaussian distribution, we study the behavior of the EM algorithm both when the mixture weights are different (unbalanced case), and are equal (balanced case). Our analysis reveals a sharp distinction between these two cases: in the former, the EM algorithm converges geometrically to a point at Euclidean distance of $\mathcal{O}((d/n)^{\frac{1}{2}})$ from the true parameter, whereas in the latter case, the convergence rate is exponentially slower, and the fixed point has a much lower $\mathcal{O}((d/n)^{\frac{1}{4}})$ accuracy. Analysis of this singular case requires the introduction of some novel techniques: in particular, we make use of a careful form of localization in the associated empirical process, and develop a recursive argument to progressively sharpen the statistical rate.
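A minimal sketch of the balanced over-specified case, assuming the fitted model $\tfrac12 N(\theta, I_d) + \tfrac12 N(-\theta, I_d)$ with known identity covariance, fit to data drawn from a single $N(0, I_d)$. Under these assumptions the E-step and M-step combine into the closed-form update $\theta_{t+1} = \frac{1}{n}\sum_i \tanh(\langle \theta_t, x_i\rangle)\, x_i$. The helper name `em_symmetric_mixture` and the sample sizes are illustrative assumptions.

```python
import numpy as np

def em_symmetric_mixture(X, theta0, n_iter):
    """EM for 0.5*N(theta, I) + 0.5*N(-theta, I); returns final theta and its norm path."""
    theta = np.asarray(theta0, dtype=float)
    norms = [np.linalg.norm(theta)]
    for _ in range(n_iter):
        # E-step weight w_i = sigmoid(2 <theta, x_i>); M-step theta = mean((2 w_i - 1) x_i)
        theta = (np.tanh(X @ theta)[:, None] * X).mean(axis=0)
        norms.append(np.linalg.norm(theta))
    return theta, np.array(norms)

# Usage: the true distribution is a single N(0, I_d), so the fitted model is over-specified.
rng = np.random.default_rng(0)
n, d = 10_000, 1
X = rng.standard_normal((n, d))
theta_hat, norms = em_symmetric_mixture(X, np.ones(d), n_iter=1000)
print(norms[[1, 10, 100, 1000]])
```

Printing the norm path shows the slow, sub-geometric shrinkage toward the true parameter $\theta^* = 0$; near zero the population update behaves like $\theta \mapsto \theta(1 - \theta^2)$, which is the source of the slowdown.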
Submitted 28 April, 2020; v1 submitted 1 October, 2018;
originally announced October 2018.
-
Sprague-Grundy Function of Matroids and Related Hypergraphs
Authors:
Endre Boros,
Vladimir Gurvich,
Nhan Bao Ho,
Kazuhisa Makino,
Peter Mursic
Abstract:
We consider a generalization of the classical game of $NIM$ called hypergraph $NIM$. Given a hypergraph $\mathcal{H}$ on the ground set $V = \{1, \ldots, n\}$ of $n$ piles of stones, two players alternate in choosing a hyperedge $H \in \mathcal{H}$ and strictly decreasing all piles $i\in H$. The player who makes the last move is the winner. In this paper we give an explicit formula that describes the Sprague-Grundy function of hypergraph $NIM$ for several classes of hypergraphs. In particular, we characterize all $2$-uniform hypergraphs (that is, graphs) and all matroids for which the formula works. We show that all self-dual matroids are included in this class.
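The game rules above can be checked on small instances by brute force: the Sprague-Grundy value of a position is the mex (minimum excluded non-negative integer) of the values of all positions reachable by choosing a hyperedge $H$ and strictly decreasing every pile in $H$. This sketch indexes piles from 0 and is only practical for tiny pile sizes; the helper name `sg_hypergraph_nim` is hypothetical, and it does not implement the paper's explicit formula.

```python
from functools import lru_cache
from itertools import product

def sg_hypergraph_nim(piles, hyperedges):
    """Brute-force Sprague-Grundy value of hypergraph NIM (piles indexed from 0)."""
    edges = [tuple(sorted(H)) for H in hyperedges]

    @lru_cache(maxsize=None)
    def sg(state):
        reachable = set()
        for H in edges:
            if any(state[i] == 0 for i in H):
                continue  # every pile in H must strictly decrease, so none may be empty
            for new_sizes in product(*(range(state[i]) for i in H)):
                nxt = list(state)
                for i, size in zip(H, new_sizes):
                    nxt[i] = size
                reachable.add(sg(tuple(nxt)))
        g = 0  # mex of the reachable values
        while g in reachable:
            g += 1
        return g

    return sg(tuple(piles))

nim_value = sg_hypergraph_nim((3, 5), [{0}, {1}])  # classical NIM: 3 XOR 5 = 6
both_value = sg_hypergraph_nim((3, 5), [{0, 1}])   # must decrease both piles: min(3, 5) = 3
print(nim_value, both_value)
```

The hypergraph of singletons recovers classical $NIM$, whose value is the XOR of the pile sizes by the Sprague-Grundy theorem. With the single hyperedge $\{0, 1\}$, an induction shows the value is the smaller pile.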
Submitted 19 March, 2019; v1 submitted 31 March, 2018;
originally announced April 2018.
-
Sprague-Grundy Function of Symmetric Hypergraphs
Authors:
Endre Boros,
Vladimir Gurvich,
Nhan Bao Ho,
Kazuhisa Makino,
Peter Mursic
Abstract:
We consider a generalization of the classical game of $NIM$ called hypergraph $NIM$. Given a hypergraph $\mathcal{H}$ on the ground set $V = \{1, \ldots, n\}$ of $n$ piles of stones, two players alternate in choosing a hyperedge $H \in \mathcal{H}$ and strictly decreasing all piles $i\in H$. The player who makes the last move is the winner. Recently it was shown that for many classes of hypergraphs the Sprague-Grundy function of the corresponding game is given by the formula introduced originally by Jenkyns and Mayberry (1980). In this paper we characterize symmetric hypergraphs for which the Sprague-Grundy function is described by the same formula.
Submitted 31 March, 2018;
originally announced April 2018.
-
Kostant, Steinberg, and the Stokes matrices of the tt*-Toda equations
Authors:
Martin Guest,
Nan-Kuo Ho
Abstract:
We propose a Lie-theoretic definition of the tt*-Toda equations for any complex simple Lie algebra $\mathfrak{g}$, based on the concept of topological-antitopological fusion which was introduced by Cecotti and Vafa. Our main result concerns the Stokes data of a certain meromorphic connection, whose isomonodromic deformations are controlled by these equations. Exploiting a framework introduced by Boalch, we show that this data has a remarkable structure, which can be described using Kostant's theory of Cartan subalgebras in apposition and Steinberg's theory of conjugacy classes of regular elements. A by-product of this is a convenient visualization of the orbit structure of the roots under the action of a Coxeter element. As an application, we compute canonical Stokes data of certain solutions of the tt*-Toda equations in terms of their asymptotics.
Submitted 4 February, 2018;
originally announced February 2018.
-
A Lie-theoretic description of the solution space of the tt*-Toda equations
Authors:
Martin Guest,
Nan-Kuo Ho
Abstract:
We give a Lie-theoretic explanation for the convex polytope which parametrizes the globally smooth solutions of the topological-antitopological fusion equations of Toda type (tt$^*$-Toda equations) which were introduced by Cecotti and Vafa. It is known from [GL] [GIL1] [M1] [M2] that these solutions can be parametrized by monodromy data of a certain flat $SL_{n+1}\mathbb{R}$-connection. Using Boalch's Lie-theoretic description of Stokes data, and Steinberg's description of regular conjugacy classes of a linear algebraic group, we express this monodromy data as a convex subset of a Weyl alcove of $SU_{n+1}$.
Submitted 31 January, 2018;
originally announced January 2018.
-
The SU(2)-character variety of the closed surface of genus 2
Authors:
Nan-Kuo Ho,
Lisa C. Jeffrey,
Khoa Dang Nguyen,
Eugene Z. Xia
Abstract:
We study the symplectic geometry of the SU(2)-representation variety of the compact oriented surface of genus 2. We use the Goldman flows to identify subsets of the moduli space with corresponding subsets of $\mathbb P^3(\mathbb C)$. We also define and study two antisymplectic involutions on the moduli space and their fixed point sets.
Submitted 6 November, 2017;
originally announced November 2017.
-
Robust estimation of mixing measures in finite mixture models
Authors:
Nhat Ho,
XuanLong Nguyen,
Ya'acov Ritov
Abstract:
In finite mixture models, apart from the underlying mixing measure, the true kernel density function of each subpopulation in the data is, in many scenarios, unknown. Perhaps the most popular approach is to choose some kernel functions that we empirically believe our data are generated from and use these kernels to fit our models. Nevertheless, as long as the chosen kernel and the true kernel are different, statistical inference of the mixing measure under this setting will be highly unstable. To overcome this challenge, we propose flexible and efficient robust estimators of the mixing measure in these models, which are inspired by the idea of the minimum Hellinger distance estimator, model selection criteria, and the superefficiency phenomenon. We demonstrate that our estimators consistently recover the true number of components and achieve the optimal convergence rates of parameter estimation under both the well- and mis-specified kernel settings for any fixed bandwidth. These desirable asymptotic properties are illustrated via careful simulation studies with both synthetic and real data.
Submitted 23 September, 2017;
originally announced September 2017.
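A minimal sketch of the minimum Hellinger distance idea mentioned above, reduced to a single Gaussian location parameter rather than a full mixing measure (a deliberate simplification; `mhd_location`, the bandwidth `h`, and the grid sizes are illustrative assumptions, not the authors' estimator). Both the data and the model are smoothed with the same Gaussian kernel of fixed bandwidth, and the location is chosen on a grid to minimize the Hellinger distance between the two smoothed densities.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var), evaluated elementwise (with broadcasting)."""
    return np.exp(-((x - mean) ** 2) / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def mhd_location(data, sigma=1.0, h=0.5, n_grid=4000, n_cand=2001):
    """Minimum Hellinger distance estimate of mu in the toy model N(mu, sigma^2).

    The data are smoothed by a Gaussian KDE with fixed bandwidth h; the model is
    smoothed by the same kernel, which turns N(mu, sigma^2) into N(mu, sigma^2 + h^2).
    """
    s2 = sigma**2 + h**2
    s = np.sqrt(s2)
    grid = np.linspace(data.min() - 6 * s, data.max() + 6 * s, n_grid)
    dx = grid[1] - grid[0]
    # Kernel density estimate on the grid: average of N(grid; x_j, h^2) over samples.
    kde = gaussian_pdf(grid[:, None], data[None, :], h**2).mean(axis=1)
    candidates = np.linspace(data.min(), data.max(), n_cand)
    # Twice the squared Hellinger distance, approximated by a Riemann sum.
    dists = [np.sum((np.sqrt(kde) - np.sqrt(gaussian_pdf(grid, mu, s2))) ** 2) * dx
             for mu in candidates]
    return candidates[int(np.argmin(dists))]

rng = np.random.default_rng(0)
clean = rng.normal(2.0, 1.0, size=500)
data = np.concatenate([clean, np.full(25, 20.0)])  # 25 gross outliers at 20
print("sample mean:", data.mean())
print("MHD estimate:", mhd_location(data))
```

The sample mean is pulled towards the outliers, while the Hellinger-based estimate should stay close to the bulk of the data; this robustness is the property that motivates using Hellinger-type criteria when the kernel may be misspecified.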
-
The Sprague-Grundy function for some nearly disjunctive sums of Nim and Silver Dollar games
Authors:
Graham Farr,
Nhan Bao Ho
Abstract:
We introduce and analyse an extension of the disjunctive sum operation on some classical impartial games. Whereas the disjunctive sum describes positions formed from independent subpositions, our operation combines positions that are not completely independent but interact only in a very restricted way. We extend the games Nim and Silver Dollar, played by moving counters along one-dimensional strips of cells, by joining several strips at their initial cell. We prove that, in certain cases, computing the Sprague-Grundy function can be simplified to that of a simpler game with at most two tokens in each strip. We give an algorithm that, for each Sprague-Grundy value $g$, computes the positions of two-token Star Nim whose Sprague-Grundy values are $g$. We establish that the sequence of differences of entries of these positions is ultimately additively periodic.
Submitted 22 February, 2017;
originally announced February 2017.
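For reference, the baseline that this paper relaxes is the ordinary disjunctive sum, whose Sprague-Grundy value is the XOR (nim-sum) of the components' values. The sketch below checks this by brute force for one illustrative pair of components chosen by us (a subtraction game with moves {1, 2} and a Nim heap); it does not implement the paper's joined-strip operation.

```python
from functools import lru_cache

def mex(values):
    """Minimum excludant: the smallest non-negative integer not in `values`."""
    seen = set(values)
    m = 0
    while m in seen:
        m += 1
    return m

SUBTRACTION_SET = (1, 2)  # component 1: remove 1 or 2 counters (SG value is a % 3)

@lru_cache(maxsize=None)
def sg_sum(a, b):
    """Brute-force SG value of (subtraction game with a counters) + (Nim heap of size b).

    In a disjunctive sum, a move is a move in exactly one component.
    """
    options = [sg_sum(a - s, b) for s in SUBTRACTION_SET if s <= a]
    options += [sg_sum(a, b - t) for t in range(1, b + 1)]
    return mex(options)

# Sprague-Grundy theorem: SG of the sum equals the XOR of the components' SG values.
for a in range(10):
    for b in range(10):
        assert sg_sum(a, b) == (a % 3) ^ b
print("disjunctive sum matches the nim-sum on all tested positions")
```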
-
Tetris Hypergraphs and Combinations of Impartial Games
Authors:
Endre Boros,
Vladimir Gurvich,
Nhan Bao Ho,
Kazuhisa Makino,
Peter Mursic
Abstract:
The Sprague-Grundy (SG) theory reduces the sum of impartial games to the classical game of $NIM$. We generalize the concept of sum and introduce $\mathcal{H}$-combinations of impartial games for any hypergraph $\mathcal{H}$. In particular, we introduce the game $NIM_{\mathcal{H}}$, which is the $\mathcal{H}$-combination of single-pile $NIM$ games. An impartial game is called SG decreasing if its SG value is decreased by every move. Extending the SG theory, we reduce the $\mathcal{H}$-combination of SG decreasing games to $NIM_{\mathcal{H}}$. We call $\mathcal{H}$ a Tetris hypergraph if $NIM_{\mathcal{H}}$ is SG decreasing. We provide some necessary and some sufficient conditions for a hypergraph to be Tetris.
Submitted 10 January, 2017;
originally announced January 2017.
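To make the definitions concrete, here is a brute-force sketch of $NIM$ on a hypergraph $\mathcal{H}$ under our reading of the abstract: a move picks an edge $H$ whose piles are all non-empty and removes at least one token from every pile in $H$. The helper names (`moves`, `make_sg`, `sg_decreasing_up_to`) are hypothetical, and the SG-decreasing test only inspects positions with bounded piles, so it can refute the Tetris property but never certify it.

```python
from functools import lru_cache
from itertools import product

def moves(x, edges):
    """Positions reachable in one move: pick an edge whose piles are all non-empty,
    then reduce every pile in that edge to any strictly smaller size."""
    for edge in edges:
        if all(x[i] > 0 for i in edge):
            for new_sizes in product(*(range(x[i]) for i in edge)):
                y = list(x)
                for i, v in zip(edge, new_sizes):
                    y[i] = v
                yield tuple(y)

def make_sg(edges):
    """Memoized brute-force Sprague-Grundy function for NIM on the given hyperedges."""
    @lru_cache(maxsize=None)
    def sg(x):
        options = {sg(y) for y in moves(x, edges)}
        m = 0
        while m in options:
            m += 1
        return m
    return sg

def sg_decreasing_up_to(edges, n, bound):
    """True if every move lowers the SG value, over all positions with piles <= bound."""
    sg = make_sg(edges)
    return all(sg(y) < sg(x)
               for x in product(range(bound + 1), repeat=n)
               for y in moves(x, edges))

single = [(0,)]                 # one pile: plain single-pile Nim
two_singletons = [(0,), (1,)]   # ordinary two-pile Nim
both_at_once = [(0, 1)]         # every move must reduce both piles
print(sg_decreasing_up_to(single, 1, 6))
print(sg_decreasing_up_to(two_singletons, 2, 4))
print(sg_decreasing_up_to(both_at_once, 2, 4))
```

Ordinary two-pile Nim fails the check (for example, moving from (3, 1) to (2, 1) raises the nim-sum from 2 to 3), whereas the single-edge hypergraph passes on the tested range.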
-
Conditions of smoothness of moduli spaces of flat connections and of character varieties
Authors:
Nan-Kuo Ho,
Graeme Wilkin,
Siye Wu
Abstract:
We use gauge theoretic and algebraic methods to examine sufficient conditions for smooth points on the moduli space of flat connections on a compact manifold and on the character variety of a finitely generated and presented group. We give a complete proof of the slice theorem for the action of the group of gauge transformations on the space of flat connections. Consequently, the slice is smooth if the second cohomology of the manifold with coefficients in the semisimple part of the adjoint bundle vanishes. On the other hand, we find that the smoothness of the slice for the character variety of a finitely generated and presented group depends not only on the second group cohomology but also on the relation module of the presentation. However, when there is a single relator or if there is no relation among the relators in the presentation, our condition reduces to the minimality of the second group cohomology. This is also verified using Fox calculus. Finally, we compare the conditions of smoothness in the two approaches.
Submitted 12 September, 2018; v1 submitted 31 October, 2016;
originally announced October 2016.
-
Singularity structures and impacts on parameter estimation in finite mixtures of distributions
Authors:
Nhat Ho,
XuanLong Nguyen
Abstract:
Singularities of a statistical model are the elements of the model's parameter space which make the corresponding Fisher information matrix degenerate. These are the points for which estimation techniques such as the maximum likelihood estimator and standard Bayesian procedures do not admit the root-$n$ parametric rate of convergence. We propose a general framework for the identification of singularity structures of the parameter space of finite mixtures, and study the impacts of the singularity structures on minimax lower bounds and rates of convergence for the maximum likelihood estimator over a compact parameter space. Our study makes explicit the deep links between model singularities, parameter estimation convergence rates and minimax lower bounds, and the algebraic geometry of the parameter space for mixtures of continuous distributions. The theory is applied to establish concrete convergence rates of parameter estimation for finite mixture of skew-normal distributions. This rich and increasingly popular mixture model is shown to exhibit a remarkably complex range of asymptotic behaviors which have not been hitherto reported in the literature.
Submitted 23 July, 2019; v1 submitted 9 September, 2016;
originally announced September 2016.
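A small numerical illustration of a singularity in the sense defined above, using the skew-normal family $SN(\mu, \sigma, \lambda)$ with density $\frac{2}{\sigma}\varphi(z)\Phi(\lambda z)$, $z = (x-\mu)/\sigma$. At $\lambda = 0$ the score for $\lambda$ equals $\sqrt{2/\pi}\, z$, which is proportional to the score for $\mu$, so the Fisher information matrix is degenerate. This sketch (our own quadrature-based helper `skew_normal_fisher`, with grid settings chosen for illustration) covers a single skew-normal, not the mixtures analysed in the paper.

```python
import math
import numpy as np

_erfc = np.vectorize(math.erfc)

def std_normal_pdf(t):
    return np.exp(-0.5 * t**2) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(t):
    # erfc keeps relative accuracy far into the left tail, unlike 1 + erf.
    return 0.5 * _erfc(-t / math.sqrt(2.0))

def skew_normal_fisher(lam, mu=0.0, sigma=1.0, n_grid=20001, half_width=12.0):
    """Fisher information of SN(mu, sigma, lam) in the parameters (mu, sigma, lam),
    computed as E[score score^T] by a Riemann sum on a uniform grid."""
    x = np.linspace(mu - half_width * sigma, mu + half_width * sigma, n_grid)
    dx = x[1] - x[0]
    z = (x - mu) / sigma
    cdf = std_normal_cdf(lam * z)
    dens = 2.0 / sigma * std_normal_pdf(z) * cdf
    r = std_normal_pdf(lam * z) / cdf          # phi / Phi evaluated at lam * z
    scores = np.vstack([
        (z - lam * r) / sigma,                 # d log f / d mu
        (z**2 - 1.0 - lam * z * r) / sigma,    # d log f / d sigma
        z * r,                                 # d log f / d lam
    ])
    return (scores * dens) @ scores.T * dx

for lam in (0.0, 1.0):
    print(lam, np.linalg.eigvalsh(skew_normal_fisher(lam)).min())
```

The smallest eigenvalue should be zero up to rounding at $\lambda = 0$ and clearly positive at $\lambda = 1$.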
-
Accelerating the Uzawa Algorithm
Authors:
Nguyenho Ho,
Sarah D. Olson,
Homer F. Walker
Abstract:
The Uzawa algorithm is an iterative method for the solution of saddle-point problems, which arise in many applications, including fluid dynamics. Viewing the Uzawa algorithm as a fixed-point iteration, we explore the use of Anderson acceleration (also known as Anderson mixing) to improve the convergence. We compare the performance of the preconditioned Uzawa algorithm with and without acceleration on several steady Stokes and Oseen problems for incompressible flows. For perspective, we include in our comparison several other iterative methods that have appeared in the literature. The results indicate that the accelerated preconditioned Uzawa algorithm converges significantly faster than the algorithm without acceleration and is competitive with the other methods considered.
Submitted 23 May, 2016; v1 submitted 14 October, 2015;
originally announced October 2015.
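The fixed-point view described above can be sketched on a small dense saddle-point system $A u + B^T p = f$, $B u = g$. Uzawa's method is written as a map $p \mapsto p + \omega (B u(p) - g)$ with $u(p) = A^{-1}(f - B^T p)$, and the same driver runs either plain iteration ($m = 0$) or Anderson acceleration with memory $m$. This is a simplified, unpreconditioned illustration with a random test matrix and a scalar step $\omega$ of our choosing, not the preconditioned Stokes and Oseen solvers studied in the paper.

```python
import numpy as np

def anderson(G, x0, m=5, tol=1e-10, maxit=20000):
    """Fixed-point iteration x = G(x); m > 0 enables Anderson acceleration.

    Returns (approximate fixed point, number of iterations used).
    """
    x = np.asarray(x0, dtype=float)
    gx = G(x)
    f = gx - x                      # fixed-point residual
    dG, dF = [], []                 # histories of differences of G-values and residuals
    for k in range(maxit):
        if np.linalg.norm(f) < tol:
            return x, k
        if m > 0 and dF:
            # Least-squares mixing coefficients: minimize ||f - dF @ gamma||.
            gamma = np.linalg.lstsq(np.column_stack(dF), f, rcond=None)[0]
            x_new = gx - np.column_stack(dG) @ gamma
        else:
            x_new = gx              # plain (unaccelerated) step
        gx_new = G(x_new)
        f_new = gx_new - x_new
        if m > 0:
            dG.append(gx_new - gx)
            dF.append(f_new - f)
            if len(dF) > m:         # keep only the last m differences
                dG.pop(0)
                dF.pop(0)
        x, gx, f = x_new, gx_new, f_new
    return x, maxit

def uzawa_map(A, B, f, g, omega):
    """Uzawa step for the pressure-like variable p."""
    def G(p):
        u = np.linalg.solve(A, f - B.T @ p)
        return p + omega * (B @ u - g)
    return G

rng = np.random.default_rng(0)
n, k = 20, 5
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)         # symmetric positive definite block
B = rng.standard_normal((k, n))
f = rng.standard_normal(n)
g = rng.standard_normal(k)
S = B @ np.linalg.solve(A, B.T)     # Schur complement B A^{-1} B^T
omega = 1.0 / np.linalg.eigvalsh(S).max()
G = uzawa_map(A, B, f, g, omega)

p_plain, it_plain = anderson(G, np.zeros(k), m=0)
p_aa, it_aa = anderson(G, np.zeros(k), m=5)
K = np.block([[A, B.T], [B, np.zeros((k, k))]])
p_exact = np.linalg.solve(K, np.concatenate([f, g]))[n:]
print("plain Uzawa iterations:", it_plain, " Anderson-accelerated:", it_aa)
```

With a memory at least as large as the number of unknowns in $p$, Anderson acceleration on this linear map behaves much like GMRES, which is why it needs only a handful of iterations here.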
-
On tame, pet, domestic, and miserable impartial games
Authors:
Vladimir Gurvich,
Nhan Bao Ho
Abstract:
Play of impartial games under the normal and misère conventions can differ considerably. However, there are also many "exceptions" for which normal and misère play are very similar. As early as 1901, Bouton noticed that this is the case for the game of Nim. In 1976, Conway introduced a large class of such games, which he called tame games. Here we introduce a proper subclass, pet games, and a proper superclass, domestic games. For each of these three classes we provide an efficiently verifiable characterization based on the following property. These games are closely related to another important subclass of the tame games, introduced in 2007 by the first author and called miserable games. We show that tame, pet, and domestic games turn into miserable games under "slight modifications" of their definitions. We also show that the sum of miserable games is miserable and find several other classes that respect summation. The developed techniques allow us to prove that many well-known impartial games fall into the classes mentioned above. Such examples include all subtraction games, which are pet; the game Euclid, which is miserable (and, hence, tame); and many versions of the Wythoff game and Nim, which may be miserable, pet, or domestic.
Submitted 19 May, 2017; v1 submitted 25 August, 2015;
originally announced August 2015.
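Bouton's observation that normal and misère Nim are nearly the same can be checked by brute force. The sketch below (helper names are our own) solves both conventions directly and compares misère play with the classical rule: play as in normal Nim, except that when every heap has at most one token the mover wins exactly when the number of one-token heaps is even.

```python
from functools import lru_cache, reduce
from itertools import product
from operator import xor

@lru_cache(maxsize=None)
def mover_wins(position, misere):
    """True if the player to move wins Nim from `position` (a sorted tuple of heaps)."""
    if sum(position) == 0:
        # No move available: a loss in normal play, a win in misère play
        # (the opponent made the last move).
        return misere
    for i, heap in enumerate(position):
        for smaller in range(heap):
            nxt = tuple(sorted(position[:i] + (smaller,) + position[i + 1:]))
            if not mover_wins(nxt, misere):
                return True
    return False

def bouton_misere_rule(position):
    """Misère Nim rule: like normal Nim unless all heaps are 0 or 1."""
    nim_sum = reduce(xor, position, 0)
    if all(h <= 1 for h in position):
        return nim_sum == 0         # even number of 1-heaps: the mover wins
    return nim_sum != 0

for raw in product(range(5), repeat=3):
    pos = tuple(sorted(raw))
    assert mover_wins(pos, False) == (reduce(xor, pos, 0) != 0)
    assert mover_wins(pos, True) == bouton_misere_rule(pos)
print("normal and misère outcomes agree with the nim-sum rules")
```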
-
Slow $k$-Nim
Authors:
Vladimir Gurvich,
Nhan Bao Ho
Abstract:
Given $n$ piles of tokens and a positive integer $k \leq n$, we study the following two impartial combinatorial games, Nim$^1_{n, \leq k}$ and Nim$^1_{n, =k}$. In the first (resp. second) game, a player, in one move, chooses at least $1$ and at most (resp. exactly) $k$ non-empty piles and removes one token from each of these piles. For the normal and misère versions of each game we compute the Sprague-Grundy function for the cases $n = k = 2$ and $n = k+1 = 3$. For the game Nim$^1_{n, \leq k}$ we also characterize its P-positions for the cases $n \leq k+2$ and $n = k+3 \leq 6$.
Submitted 24 August, 2015;
originally announced August 2015.
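Small cases of both games can be explored with a direct brute-force computation of the normal-play Sprague-Grundy function. The sketch below is our own illustration (the name `sg_slow` and the `exact` flag are hypothetical); it enumerates the allowed sets of piles and removes one token from each.

```python
from functools import lru_cache
from itertools import combinations

def mex(values):
    """Smallest non-negative integer not in `values`."""
    m = 0
    while m in values:
        m += 1
    return m

@lru_cache(maxsize=None)
def sg_slow(piles, k, exact):
    """Normal-play SG value of Nim^1_{n,<=k} (exact=False) or Nim^1_{n,=k} (exact=True).

    A move picks a set of non-empty piles (1..k of them, or exactly k)
    and removes one token from each chosen pile.
    """
    nonempty = [i for i, p in enumerate(piles) if p > 0]
    sizes = [k] if exact else range(1, k + 1)
    options = set()
    for size in sizes:
        for chosen in combinations(nonempty, size):
            nxt = list(piles)
            for i in chosen:
                nxt[i] -= 1
            options.add(sg_slow(tuple(nxt), k, exact))
    return mex(options)

# Table of SG values for n = k = 2 in the "at most k" variant.
for a in range(5):
    print([sg_slow((a, b), 2, False) for b in range(5)])
```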
-
On the Sprague-Grundy function of Exact $k$-Nim
Authors:
Endre Boros,
Vladimir Gurvich,
Nhan Bao Ho,
Kazuhisa Makino,
Peter Mursic
Abstract:
Moore's generalization of the game of Nim is played as follows. Let $n$ and $k$ be two integers such that $1 \leq k \leq n$. Given $n$ piles of tokens, two players move alternately, removing tokens from at least one and at most $k$ of the piles. The player who makes the last move wins. The game was solved by Moore in 1910, and an explicit formula for its Sprague-Grundy function was given by Jenkyns and Mayberry in 1980, for the case $n = k+1$ only. We introduce another generalization of Nim, called Exact $k$-Nim, in which each move reduces exactly $k$ piles. We give an explicit formula for the Sprague-Grundy function of Exact $k$-Nim in the case $2k \geq n$. In the case $n = 2k$, our formula is surprisingly similar to that of Jenkyns and Mayberry.
Submitted 17 January, 2017; v1 submitted 18 August, 2015;
originally announced August 2015.
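A brute-force sketch of Exact $k$-Nim as described above, useful for checking formulas on small positions (the function name is our own). Two easy sanity checks follow from the rules: for $k = 1$ the game is ordinary Nim, so the SG value is the XOR of the piles; for $n = k = 2$ every move lowers both piles, so by induction the SG value is $\min(a, b)$.

```python
from functools import lru_cache
from itertools import combinations, product

@lru_cache(maxsize=None)
def sg_exact_k(piles, k):
    """Brute-force SG value of Exact k-Nim: each move reduces exactly k non-empty
    piles, each by any positive number of tokens."""
    nonempty = [i for i, p in enumerate(piles) if p > 0]
    options = set()
    for chosen in combinations(nonempty, k):
        for new_sizes in product(*(range(piles[i]) for i in chosen)):
            nxt = list(piles)
            for i, v in zip(chosen, new_sizes):
                nxt[i] = v
            options.add(sg_exact_k(tuple(nxt), k))
    m = 0
    while m in options:
        m += 1
    return m

# k = 1 is ordinary Nim (SG = XOR); n = k = 2 gives SG = min(a, b).
assert all(sg_exact_k(p, 1) == p[0] ^ p[1] ^ p[2] for p in product(range(5), repeat=3))
assert all(sg_exact_k((a, b), 2) == min(a, b) for a in range(6) for b in range(6))
print([sg_exact_k((a, 3, 5, 6), 2) for a in range(6)])
```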
-
Three-pile Sharing Nim and the quadratic time winning strategy
Authors:
Nhan Bao Ho
Abstract:
We study a variant of 3-pile Nim in which a move consists of taking tokens from one pile and, instead of removing them, using them to top up a smaller pile, provided that the destination pile does not have more tokens than the source pile after the move. We discover a situation in which each column of the two-dimensional array of Sprague-Grundy values is a palindrome. We establish a formula for P-positions from which winning moves can be computed in quadratic time. We prove a formula for positions whose Sprague-Grundy values are 1 and estimate the distribution of positions whose nim-values are $g$. We discuss the periodicity of nim-sequences that appear to be bounded.
Submitted 10 May, 2016; v1 submitted 23 June, 2015;
originally announced June 2015.
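A brute-force sketch of three-pile Sharing Nim, assuming the move rule as we read it from the abstract: choose a source pile and a strictly smaller destination pile, and move $t \geq 1$ tokens so that afterwards the destination holds no more tokens than the source. The total number of tokens never changes, but the sum of squared pile sizes strictly decreases with every move, so the recursion terminates. The function name and the sorted-tuple encoding are our own choices.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def sg_sharing(position):
    """SG value of three-pile Sharing Nim; positions are sorted tuples of pile sizes."""
    options = set()
    for i, src in enumerate(position):
        for j, dst in enumerate(position):
            if i == j or dst >= src:
                continue            # destination must be a strictly smaller pile
            t = 1
            while dst + t <= src - t:   # destination may not overtake the source
                nxt = list(position)
                nxt[i], nxt[j] = src - t, dst + t
                options.add(sg_sharing(tuple(sorted(nxt))))
                t += 1
    m = 0
    while m in options:
        m += 1
    return m

# P-positions (SG value 0) with at most 8 tokens in total.
p_positions = [(a, b, c)
               for c in range(9) for b in range(c + 1) for a in range(b + 1)
               if a + b + c <= 8 and sg_sharing((a, b, c)) == 0]
print(p_positions)
```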
-
On the Sprague-Grundy Function of Tetris Extensions of Proper Nim
Authors:
Endre Boros,
Vladimir Gurvich,
Nhan Bao Ho,
Kazuhisa Makino
Abstract:
Given a hypergraph $\mathcal{H} \subseteq 2^I \setminus \{\emptyset\}$ on the ground set $I = \{1, \ldots, n\}$, we assign to each $i \in I$ a nonnegative integer $x_i$, that is, a pile of $x_i$ tokens, and consider the following generalization of the classical game of Nim: two players alternate turns, and in a move a player chooses an arbitrary edge $H \in \mathcal{H}$ and reduces all piles $i \in H$. The player who is out of moves loses. We call the resulting game hypergraph Nim. Such a game is called proper Nim when $\mathcal{H} = 2^I \setminus \{I, \emptyset\}$ is the family of all proper subsets of $I$. Jenkyns and Mayberry [JM80] described the Sprague-Grundy (SG) function of these games. In this paper we introduce Tetris extensions of hypergraph Nim and obtain a closed formula for the SG functions of the extensions of proper Nim when $n \geq 3$. Surprisingly, the case $n = 2$ is much more complicated. For this case we only suggest several partial results and conjectures.
Submitted 29 March, 2018; v1 submitted 27 April, 2015;
originally announced April 2015.
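For orientation, the base game (proper Nim, before any Tetris extension, which the abstract does not define) can be solved by brute force on small positions. For $n = 3$, proper Nim lets a player reduce one or two piles, which is Moore's Nim with $k = 2$; Moore's theorem says its P-positions are those in which, for every binary digit, the number of piles having that digit set is divisible by $k + 1$. The sketch below, with helper names of our own, checks this and the $n = 2$ case, where proper Nim is ordinary two-pile Nim.

```python
from functools import lru_cache
from itertools import combinations, product

def proper_edges(n):
    """All nonempty proper subsets of {0, ..., n-1}."""
    return [e for size in range(1, n) for e in combinations(range(n), size)]

@lru_cache(maxsize=None)
def sg_proper(piles):
    """Brute-force SG value of proper Nim: reduce every pile of one proper subset."""
    options = set()
    for edge in proper_edges(len(piles)):
        if all(piles[i] > 0 for i in edge):
            for new_sizes in product(*(range(piles[i]) for i in edge)):
                nxt = list(piles)
                for i, v in zip(edge, new_sizes):
                    nxt[i] = v
                options.add(sg_proper(tuple(nxt)))
    m = 0
    while m in options:
        m += 1
    return m

def moore_p_position(piles, k):
    """Moore's criterion: each binary digit column sums to 0 modulo k + 1."""
    bits = max(piles).bit_length()
    return all(sum((p >> b) & 1 for p in piles) % (k + 1) == 0 for b in range(bits))

assert all((sg_proper(p) == 0) == moore_p_position(p, 2) for p in product(range(8), repeat=3))
assert all(sg_proper((a, b)) == a ^ b for a in range(8) for b in range(8))
print([sg_proper((1, 2, c)) for c in range(8)])
```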
-
Intersection Cohomology of the Universal Imploded Cross-Section of SU(3)
Authors:
Nan-Kuo Ho,
Lisa Jeffrey
Abstract:
We compute the intersection cohomology of the universal imploded cross-section of SU(3), and show that it is different from the intersection cohomology of a point.
Submitted 19 February, 2015;
originally announced February 2015.