-
A Unified Framework for Variable Selection in Model-Based Clustering with Missing Not at Random
Authors:
Binh H. Ho,
Long Nguyen Chi,
TrungTin Nguyen,
Binh T. Nguyen,
Van Ha Hoang,
Christopher Drovandi
Abstract:
Model-based clustering integrated with variable selection is a powerful tool for uncovering latent structures within complex data. However, its effectiveness is often hindered by challenges such as identifying relevant variables that define heterogeneous subgroups and handling data that are missing not at random, a prevalent issue in fields like transcriptomics. While several notable methods have been proposed to address these problems, they typically tackle each issue in isolation, thereby limiting their flexibility and adaptability. This paper introduces a unified framework designed to address these challenges simultaneously. Our approach incorporates a data-driven penalty matrix into penalized clustering to enable more flexible variable selection, along with a mechanism that explicitly models the relationship between missingness and latent class membership. We demonstrate that, under certain regularity conditions, the proposed framework achieves both asymptotic consistency and selection consistency, even in the presence of missing data. This unified strategy significantly enhances the capability and efficiency of model-based clustering, advancing methodologies for identifying informative variables that define homogeneous subgroups in the presence of complex missing data patterns. The performance of the framework, including its computational efficiency, is evaluated through simulations and demonstrated using both synthetic and real-world transcriptomic datasets.
Submitted 25 May, 2025;
originally announced May 2025.
-
Atomic Cluster Expansion without Self-Interaction
Authors:
Cheuk Hin Ho,
Timon S. Gutleb,
Christoph Ortner
Abstract:
The Atomic Cluster Expansion (ACE) (Drautz, Phys. Rev. B 99, 2019) has been widely applied in high energy physics, quantum mechanics and atomistic modeling to construct many-body interaction models respecting physical symmetries. Computational efficiency is achieved by allowing non-physical self-interaction terms in the model. We propose and analyze an efficient method to evaluate and parameterize an orthogonal, or non-self-interacting, cluster expansion model. We present numerical experiments demonstrating improved conditioning and more robust approximation properties than the original expansion in regression tasks, both in simplified toy problems and in applications in the machine learning of interatomic potentials.
Submitted 4 January, 2024; v1 submitted 3 January, 2024;
originally announced January 2024.
-
A Multilevel Method for Many-Electron Schrödinger Equations Based on the Atomic Cluster Expansion
Authors:
Dexuan Zhou,
Huajie Chen,
Cheuk Hin Ho,
Christoph Ortner
Abstract:
The atomic cluster expansion (ACE) (Drautz, 2019) yields a highly efficient and interpretable parameterisation of symmetric polynomials that has achieved great success in modelling properties of many-particle systems. In the present work we extend the practical applicability of the ACE framework to the computation of many-electron wave functions. To that end, we develop a customized variational Monte-Carlo algorithm that exploits the sparsity and hierarchical properties of ACE wave functions. We demonstrate the feasibility on a range of proof-of-concept applications to one-dimensional systems.
Submitted 4 May, 2023; v1 submitted 9 April, 2023;
originally announced April 2023.
-
On the binary adder channel with complete feedback, with an application to quantitative group testing
Authors:
Samuel H. Florin,
Matthew H. Ho,
Zilin Jiang
Abstract:
We determine the exact value of the optimal symmetric rate point $(r, r)$ in the Dueck zero-error capacity region of the binary adder channel with complete feedback. We prove that the average zero-error capacity is $r = h(1/2-δ) \approx 0.78974$, where $h(\cdot)$ is the binary entropy function and $δ = 1/(2\log_2(2+\sqrt3))$. Our motivation is a problem in quantitative group testing. Given a set of $n$ elements, two of which are defective, the quantitative group testing problem asks for the identification of these two defectives through a series of tests. Each test gives the number of defectives contained in the tested subset, and the outcomes of previous tests are assumed known at the time of designing the current test. We establish that the minimum number of tests is asymptotic to $(\log_2 n) / r$ as $n \to \infty$.
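The closed form quoted in this abstract can be checked numerically. The snippet below is only an illustrative sketch: the function and variable names are assumptions made here, not taken from the paper, and it simply evaluates $δ$, $r = h(1/2-δ)$ and the leading constant $1/r$ in the asymptotic test count.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


# delta = 1 / (2 * log2(2 + sqrt(3))), as stated in the abstract
delta = 1 / (2 * math.log2(2 + math.sqrt(3)))

# Symmetric rate r = h(1/2 - delta); the abstract quotes r ≈ 0.78974
r = binary_entropy(0.5 - delta)

# Leading constant in the asymptotic number of tests, (log2 n) / r
tests_per_log2_n = 1 / r

print(f"delta = {delta:.6f}")
print(f"r     = {r:.6f}")
print(f"1 / r = {tests_per_log2_n:.6f}")
```

Under this reading, identifying the two defectives needs roughly $1/r$ tests per bit of $\log_2 n$ for large $n$.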
Submitted 28 December, 2021; v1 submitted 25 January, 2021;
originally announced January 2021.
-
Parallel FPGA Router using Sub-Gradient method and Steiner tree
Authors:
Rohit Agrawal,
Chin Hao Hoo,
Kapil Ahuja,
Akash Kumar
Abstract:
In the FPGA (Field Programmable Gate Array) design flow, one of the most time-consuming steps is the routing of nets. Therefore, there is a need to accelerate it. In a recent paper by Hoo et al., the authors developed a Linear Programming based framework that parallelizes this routing process to achieve significant speedups (the algorithm is termed ParaLaR). However, this approach has certain weaknesses, namely, constraint violations by the solution and a local minimum that could be improved. We address these two issues here.
In our paper, we use this framework and solve it using the Primal-Dual sub-gradient method, which better exploits the problem properties. We also propose a better way to update the size of the step taken by this iterative algorithm. We perform experiments on a set of standard benchmarks, where we show that our algorithm outperforms the standard existing algorithms (VPR and ParaLaR).
We achieve up to 22% improvement in the constraint violations and in the standard metric of the minimum channel width when compared with ParaLaR (which is the same as in VPR). We achieve about 20% savings in another standard metric, the total wire length (when compared with VPR), which is the same as for ParaLaR. Hence, our algorithm achieves the minimum value for all three parameters. Also, the critical path delay for our algorithm is almost the same as for VPR and ParaLaR. We achieve relative speedups of 3 times when we run a parallel version of our algorithm using 4 threads.
Submitted 19 August, 2018; v1 submitted 10 March, 2018;
originally announced March 2018.
-
Logarithmic Coefficients and Generalized Multifractality of Whole-Plane SLE
Authors:
Bertrand Duplantier,
Xuan Hieu Ho,
Thanh Binh Le,
Michel Zinsmeister
Abstract:
We consider the whole-plane SLE conformal map f from the unit disk to the slit plane, and show that its mixed moments, involving a power p of the derivative modulus |f'| and a power q of the map |f| itself, have closed forms along some integrability curves in the (p,q) moment plane, which depend continuously on the SLE parameter kappa. The generalization of this integrability property to the m-fold transform of f is also given. We define a generalized integral means spectrum corresponding to the singular behavior of the mixed moments above. By inversion, it allows for a unified description of the unbounded interior and bounded exterior versions of whole-plane SLE, and of their m-fold generalizations. The average generalized spectrum of whole-plane SLE takes four possible forms, separated by five phase transition lines in the moment plane, whereas the average generalized spectrum of the m-fold whole-plane SLE is directly obtained from a linear map acting in that plane. We also conjecture the form of the universal generalized integral means spectrum.
Submitted 21 April, 2017; v1 submitted 21 April, 2015;
originally announced April 2015.
-
Block Sampling under Strong Dependence
Authors:
Ting Zhang,
Hwai-Chung Ho,
Martin Wendler,
Wei Biao Wu
Abstract:
The paper considers the block sampling method for long-range dependent processes. Our theory generalizes earlier ones by Hall, Jing and Lahiri (1998) on functionals of Gaussian processes and Nordman and Lahiri (2005) on linear processes. In particular, we allow nonlinear transforms of linear processes. Under suitable conditions on physical dependence measures, we prove the validity of the block sampling method. The problem of estimating the self-similar index is also studied.
Submitted 19 December, 2013;
originally announced December 2013.
-
On Berry--Esseen bounds for non-instantaneous filters of linear processes
Authors:
Tsung-Lin Cheng,
Hwai-Chung Ho
Abstract:
Let $X_n=\sum_{i=1}^{\infty}a_iε_{n-i}$, where the $ε_i$ are i.i.d. with mean 0 and at least finite second moment, and the $a_i$ are assumed to satisfy $|a_i|=O(i^{-β})$ with $β>1/2$. When $1/2<β<1$, $X_n$ is usually called a long-range dependent or long-memory process. For a certain class of Borel functions $K(x_1,...,x_{d+1})$, $d\ge0$, from ${\mathcal{R}}^{d+1}$ to $\mathcal{R}$, which includes indicator functions and polynomials, the stationary sequence $K(X_n,X_{n+1},...,X_{n+d})$ is considered. By developing a finite orthogonal expansion of $K(X_n,...,X_{n+d})$, the Berry--Esseen type bounds for the normalized sum $Q_N/\sqrt{N}$, $Q_N=\sum_{n=1}^N(K(X_n,...,X_{n+d})-\mathrm{E}K(X_n,...,X_{n+d}))$, are obtained when $Q_N/\sqrt{N}$ obeys the central limit theorem with positive limiting variance.
Submitted 14 May, 2008;
originally announced May 2008.
-
Time Series and Related Topics. In Memory of Ching-Zong Wei
Authors:
Hwai-Chung Ho,
Ching-Kang Ing,
Tze Leung Lai
Abstract:
A major research area of Ching-Zong Wei (1949--2004) was time series models and their applications in econometrics and engineering, to which he made many important contributions. A conference on time series and related topics in memory of him was held on December 12--14, 2005, at Academia Sinica in Taipei, where he was Director of the Institute of Statistical Science from 1993 to 1999. Of the forty-two speakers at the conference, twenty contributed to this volume. These papers are listed under the following three headings.
Submitted 2 March, 2007;
originally announced March 2007.
-
Estimation errors of the Sharpe ratio for long-memory stochastic volatility models
Authors:
Hwai-Chung Ho
Abstract:
The Sharpe ratio, which is defined as the ratio of the excess expected return of an investment to its standard deviation, has been widely cited in the financial literature by researchers and practitioners. However, very little attention has been paid to the statistical properties of the estimation of the ratio. Lo (2002) derived the $\sqrt{n}$-normality of the ratio's estimation errors for returns which are iid or stationary with serial correlations, and pointed out that to make inference on the accuracy of the estimation, the serial correlation among the returns needs to be taken into account. In the present paper a class of time series models for returns is introduced to demonstrate that there exists a factor other than the serial correlation of the returns that dominates the asymptotic behavior of the Sharpe ratio statistics. The model under consideration is a linear process whose innovation sequence has summable coefficients and contains a latent volatility component which is long-memory. It is proved that the estimation errors of the ratio are asymptotically normal with a convergence rate slower than $\sqrt{n}$ and that the estimation deviation of the expected return makes no contribution to the limiting distribution.
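For orientation, the definition at the start of this abstract corresponds to the plain sample estimator sketched below. This is only an illustration of the definition: the function name and the constant `risk_free_rate` argument are assumptions made here, and the sketch does not model the long-memory volatility component studied in the paper.

```python
import statistics


def sample_sharpe_ratio(returns, risk_free_rate=0.0):
    """Sample Sharpe ratio: mean excess return over its sample standard deviation.

    `risk_free_rate` is a hypothetical per-period constant used only for illustration.
    """
    excess = [r - risk_free_rate for r in returns]
    # statistics.stdev uses the (n - 1) denominator and needs at least two points
    return statistics.mean(excess) / statistics.stdev(excess)


# Toy per-period returns, made up for illustration only
toy_returns = [0.01, 0.03, 0.02]
print(sample_sharpe_ratio(toy_returns))
```

The abstract's point is that the sampling error of this kind of estimator can converge more slowly than $\sqrt{n}$ when the returns carry a long-memory volatility component.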
Submitted 27 February, 2007;
originally announced February 2007.
-
Iterated smoothed bootstrap confidence intervals for population quantiles
Authors:
Yvonne H. S. Ho,
Stephen M. S. Lee
Abstract:
This paper investigates the effects of smoothed bootstrap iterations on coverage probabilities of smoothed bootstrap and bootstrap-t confidence intervals for population quantiles, and establishes the optimal kernel bandwidths at various stages of the smoothing procedures. The conventional smoothed bootstrap and bootstrap-t methods have been known to yield one-sided coverage errors of orders O(n^{-1/2}) and o(n^{-2/3}), respectively, for intervals based on the sample quantile of a random sample of size n. We sharpen the latter result to O(n^{-5/6}) with proper choices of bandwidths at the bootstrapping and Studentization steps. We show further that calibration of the nominal coverage level by means of the iterated bootstrap succeeds in reducing the coverage error of the smoothed bootstrap percentile interval to the order O(n^{-2/3}) and that of the smoothed bootstrap-t interval to O(n^{-58/57}), provided that bandwidths are selected of appropriate orders. Simulation results confirm our asymptotic findings, suggesting that the iterated smoothed bootstrap-t method yields the most accurate coverage. On the other hand, the iterated smoothed bootstrap percentile method interval has the advantage of being shorter and more stable than the bootstrap-t intervals.
Submitted 25 April, 2005;
originally announced April 2005.