-
Improving statistical learning methods via features selection without replacement sampling and random projection
Authors:
Sulaiman Khan,
Muhammad Ahmad,
Fida Ullah,
Carlos Aguilar Ibañez,
José Eduardo Valdez Rodriguez
Abstract:
Cancer is fundamentally a genetic disease characterized by genetic and epigenetic alterations that disrupt normal gene expression, leading to uncontrolled cell growth and metastasis. High-dimensional microarray datasets pose challenges for classification models due to the "small n, large p" problem, resulting in overfitting. This study makes three key contributions: 1) We propose a machine learning-based approach integrating the Feature Selection Without Replacement (FSWOR) technique and a projection method to improve classification accuracy. 2) We apply the Kendall statistical test to identify the most significant genes from the brain cancer microarray dataset (GSE50161), reducing the feature space from 54,675 to 20,890 genes. 3) We apply machine learning models using k-fold cross-validation, in which our model incorporates ensemble classifiers with LDA projection and Naïve Bayes, achieving a test score of 96%, outperforming existing methods by 9.09%. The results demonstrate the effectiveness of our approach in high-dimensional gene expression analysis, improving classification accuracy while mitigating overfitting. This study contributes to cancer biomarker discovery, offering a robust computational method for analyzing microarray data.
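As a rough illustration of the Kendall-based filtering step mentioned in this abstract, the sketch below ranks features by the absolute value of Kendall's tau-a against the class label and keeps the top few. This is an assumption-laden stand-in, not the authors' pipeline: the abstract does not say which Kendall variant or significance threshold was used, and the function names `kendall_tau_a` and `rank_features` are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    score = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            score += 1  # concordant pair
        elif prod < 0:
            score -= 1  # discordant pair (tied pairs contribute 0)
    return score / (n * (n - 1) / 2)


def rank_features(X, y, keep):
    """Return indices of the `keep` features with the largest |tau-a|.

    X is a list of samples (rows), each a list of feature values.
    Ties in score are broken by the lower feature index.
    """
    scores = []
    for f in range(len(X[0])):
        column = [row[f] for row in X]
        scores.append((abs(kendall_tau_a(column, y)), f))
    scores.sort(key=lambda t: (-t[0], t[1]))
    return [f for _, f in scores[:keep]]


# Toy data (made up): features 0 and 2 track the label, feature 1 does not.
X = [[1.0, 5.0, 9.0],
     [2.0, 1.0, 8.0],
     [3.0, 4.0, 2.0],
     [4.0, 2.0, 1.0]]
y = [0, 0, 1, 1]
selected = rank_features(X, y, keep=2)  # -> [0, 2]
```

The pairwise loop is quadratic in the number of samples, which is tolerable in the "small n, large p" setting the abstract describes; a library routine would be the usual choice for larger n.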
Submitted 28 May, 2025;
originally announced June 2025.
-
Method of moments for Gaussian mixtures: Implementation and benchmarks
Authors:
Haley Colgate Kottler,
Julia Lindberg,
Jose Israel Rodriguez
Abstract:
Gaussian mixture models are universal approximators in the sense that any smooth density can be approximated arbitrarily well with a Gaussian mixture model with enough components. Due to their broad expressive power, Gaussian mixture models appear in many applications. As a result, algebraic parameter recovery for Gaussian mixture models from data is a valuable contribution to multiple fields. Our work documents the performance of the method of moments for high-dimensional Gaussian mixtures. We outline the method of moments and the selections of moments and their corresponding polynomials that work well for parameter recovery in practice. Our main contribution puts these ideas into practice with an implementation as a Julia package, GMMParameterEstimation, as well as computational benchmarks.
Submitted 11 February, 2025;
originally announced February 2025.
-
Activation degree thresholds and expressiveness of polynomial neural networks
Authors:
Bella Finkel,
Jose Israel Rodriguez,
Chenxi Wu,
Thomas Yahl
Abstract:
We study the expressive power of deep polynomial neural networks through the geometry of their neurovariety. We introduce the notion of the activation degree threshold of a network architecture to express when the dimension of the neurovariety achieves its theoretical maximum. We prove the existence of the activation degree threshold for all polynomial neural networks without width-one bottlenecks and demonstrate a universal upper bound that is quadratic in the width of largest size. In doing so, we prove the high activation degree conjecture of Kileel, Trager, and Bruna. Certain structured architectures have exceptional activation degree thresholds, making them especially expressive in the sense of their neurovariety dimension. In this direction, we prove that polynomial neural networks with equi-width architectures are maximally expressive by showing their activation degree threshold is one.
Submitted 24 April, 2025; v1 submitted 8 August, 2024;
originally announced August 2024.
-
Experimental validation of the Kibble-Zurek Mechanism on a Digital Quantum Computer
Authors:
Santiago Higuera-Quintero,
Ferney J. Rodríguez,
Luis Quiroga,
Fernando J. Gómez-Ruiz
Abstract:
The Kibble-Zurek mechanism (KZM) captures the essential physics of nonequilibrium quantum phase transitions with symmetry breaking. KZM predicts a universal scaling power law for the defect density which is fully determined by the system's critical exponents at equilibrium and the quenching rate. We experimentally tested the KZM for the simplest quantum case, a single qubit under the Landau-Zener evolution, on an open-access IBM quantum computer (IBM-Q). We find that for this simple one-qubit model, experimental data validate the central KZM assumption of the adiabatic-impulse approximation for a well-isolated qubit. Furthermore, we report on extensive IBM-Q experiments on individual qubits embedded in different circuit environments and topologies, separately elucidating the role of crosstalk between qubits and the increasing decoherence effects associated with the quantum circuit depth on the KZM predictions. Our results strongly suggest that increasing circuit depth acts as a decoherence source, producing a rapid deviation of experimental data from theoretical unitary predictions.
Submitted 25 October, 2022; v1 submitted 1 August, 2022;
originally announced August 2022.
-
Estimating Gaussian mixtures using sparse polynomial moment systems
Authors:
Julia Lindberg,
Carlos Améndola,
Jose Israel Rodriguez
Abstract:
The method of moments is a classical statistical technique for density estimation that solves a system of moment equations to estimate the parameters of an unknown distribution. A fundamental question critical to understanding identifiability asks how many moment equations are needed to get finitely many solutions and how many solutions there are. We answer this question for classes of Gaussian mixture models using the tools of polyhedral geometry. In addition, we show that a generic Gaussian $k$-mixture model is identifiable from its first $3k+2$ moments. Using these results, we present a homotopy algorithm that performs parameter recovery for high-dimensional Gaussian mixture models where the number of paths tracked scales linearly in the dimension.
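To make the moment equations in this abstract concrete, here is a minimal sketch of the forward map only: it computes the raw moments of a univariate Gaussian mixture from known parameters, using the standard recursion $M_n = \mu M_{n-1} + (n-1)\sigma^2 M_{n-2}$ for a single Gaussian. It does not implement the homotopy algorithm or the parameter recovery described above, and the function names are hypothetical.

```python
def gaussian_moments(mu, sigma2, order):
    """Raw moments E[X^0], ..., E[X^order] of N(mu, sigma2).

    Uses the recursion M_n = mu * M_{n-1} + (n - 1) * sigma2 * M_{n-2}.
    """
    m = [1.0, mu]
    for n in range(2, order + 1):
        m.append(mu * m[n - 1] + (n - 1) * sigma2 * m[n - 2])
    return m[:order + 1]


def mixture_moments(weights, mus, sigma2s, order):
    """Raw moments of a Gaussian mixture: weighted sums of component moments."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("mixture weights must sum to 1")
    total = [0.0] * (order + 1)
    for w, mu, s2 in zip(weights, mus, sigma2s):
        for n, m in enumerate(gaussian_moments(mu, s2, order)):
            total[n] += w * m
    return total


# Illustrative k = 2 mixture (parameters made up); moments up to order 3k + 2 = 8.
k = 2
moments = mixture_moments([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0], order=3 * k + 2)
```

Entry 0 of the returned list is the trivial moment $M_0 = 1$. The method of moments inverts this map: sample moments replace the left-hand sides, and the resulting polynomial system is solved for the weights, means, and variances.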
Submitted 10 June, 2024; v1 submitted 29 June, 2021;
originally announced June 2021.
-
Feature Selection from High-Dimensional Data with Very Low Sample Size: A Cautionary Tale
Authors:
Ludmila I. Kuncheva,
Clare E. Matthews,
Álvar Arnaiz-González,
Juan J. Rodríguez
Abstract:
In classification problems, the purpose of feature selection is to identify a small, highly discriminative subset of the original feature set. In many applications, the dataset may have thousands of features and only a few dozen samples (sometimes termed 'wide'). This study is a cautionary tale demonstrating why feature selection in such cases may lead to undesirable results. To highlight the sample size issue, we derive the required sample size for declaring two features different. Using an example, we illustrate the heavy dependency between feature set and classifier, which poses a question to classifier-agnostic feature selection methods. However, the choice of a good selector-classifier pair is hampered by the low correlation between estimated and true error rate, as illustrated by another example. While previous studies raising similar issues validate their message with mostly synthetic data, here we carried out an experiment with 20 real datasets. We created an exaggerated scenario whereby we cut a very small portion of the data (10 instances per class) for feature selection and used the rest of the data for testing. The results reinforce the caution and suggest that it may be better to refrain from feature selection from very wide datasets rather than return misleading output to the user.
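The "exaggerated scenario" in this abstract can be sketched as a per-class split: a fixed, small number of instances per class is set aside for feature selection and everything else is held out for testing. The code below is an illustrative guess at such a split, not the authors' experimental code; `split_per_class`, the seed handling, and the toy labels are assumptions.

```python
import random
from collections import defaultdict


def split_per_class(labels, per_class=10, seed=0):
    """Return (selection_indices, test_indices).

    `per_class` randomly chosen instances of every class go to the
    feature-selection set; all remaining instances go to the test set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    selection, test = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        if len(indices) <= per_class:
            raise ValueError(f"class {label!r} has too few instances")
        rng.shuffle(indices)
        selection.extend(indices[:per_class])
        test.extend(indices[per_class:])
    return sorted(selection), sorted(test)


# Toy labels (made up): 30 instances of class 0 and 25 of class 1.
labels = [0] * 30 + [1] * 25
sel_idx, test_idx = split_per_class(labels, per_class=10, seed=42)
```

Feature selection would then see only `sel_idx`, so any error estimate computed on `test_idx` is untouched by the selection step.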
Submitted 27 August, 2020;
originally announced August 2020.
-
Automated Thalamic Nuclei Segmentation Using Multi-Planar Cascaded Convolutional Neural Networks
Authors:
Mohammad S Majdi,
Mahesh B Keerthivasan,
Brian K Rutt,
Natalie M Zahr,
Jeffrey J Rodriguez,
Manojkumar Saranathan
Abstract:
A cascaded multi-planar scheme with a modified residual U-Net architecture was used to segment thalamic nuclei on conventional and white-matter-nulled (WMn) magnetization prepared rapid gradient echo (MPRAGE) data. A single network was optimized to work with images from healthy controls and patients with multiple sclerosis (MS) and essential tremor (ET), acquired at both 3T and 7T field strengths. Dice similarity coefficient and volume similarity index (VSI) were used to evaluate performance. Clinical utility was demonstrated by applying this method to study the effect of MS on thalamic nuclei atrophy. Segmentation of each thalamus into twelve nuclei was achieved in under a minute. For 7T WMn-MPRAGE, the proposed method outperforms current state-of-the-art on patients with ET with statistically significant improvements in Dice for five nuclei (increase in the range of 0.05-0.18) and VSI for four nuclei (increase in the range of 0.05-0.19), while performing comparably for healthy and MS subjects. Dice and VSI achieved using 7T WMn-MPRAGE data are comparable to those using 3T WMn-MPRAGE data. For conventional MPRAGE, the proposed method shows a statistically significant Dice improvement in the range of 0.14-0.63 over FreeSurfer for all nuclei and disease types. Effect of noise on network performance shows robustness to images with SNR as low as half the baseline SNR. Atrophy of four thalamic nuclei and whole thalamus was observed for MS patients compared to healthy control subjects, after controlling for the effect of parallel imaging, intracranial volume, gender, and age (p<0.004). The proposed segmentation method is fast, accurate, performs well across disease types and field strengths, and shows great potential for improving our understanding of thalamic nuclei involvement in neurological diseases.
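For readers unfamiliar with the two evaluation metrics named in this abstract, the sketch below computes them on toy binary masks represented as sets of voxel coordinates. The Dice formula is standard; the abstract does not define VSI, so the common volumetric similarity definition $1 - \lvert |A| - |B| \rvert / (|A| + |B|)$ is assumed here. Function names and the toy masks are hypothetical.

```python
def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


def volume_similarity(a, b):
    """Assumed VSI definition: 1 - ||A| - |B|| / (|A| + |B|).

    Depends only on the two volumes, not on their overlap.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return 1 - abs(len(a) - len(b)) / (len(a) + len(b))


# Toy masks (made up): four voxels each, two of them shared.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
reference = {(0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1)}
d = dice(predicted, reference)                # 0.5
v = volume_similarity(predicted, reference)   # 1.0
```

The toy case shows why both metrics are usually reported together: two masks of equal volume score a perfect volume similarity even when only half of their voxels overlap.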
Submitted 17 June, 2020; v1 submitted 16 December, 2019;
originally announced December 2019.
-
Optimization Methods for Interpretable Differentiable Decision Trees in Reinforcement Learning
Authors:
Andrew Silva,
Taylor Killian,
Ivan Dario Jimenez Rodriguez,
Sung-Hyun Son,
Matthew Gombolay
Abstract:
Decision trees are ubiquitous in machine learning for their ease of use and interpretability. Yet, these models are not typically employed in reinforcement learning as they cannot be updated online via stochastic gradient descent. We overcome this limitation by allowing for a gradient update over the entire tree that improves sample complexity and affords interpretable policy extraction. First, we include theoretical motivation on the need for policy-gradient learning by examining the properties of gradient descent over differentiable decision trees. Second, we demonstrate that our approach equals or outperforms a neural network on all domains and can learn discrete decision trees online with average rewards up to 7x higher than a batch-trained decision tree. Third, we conduct a user study to quantify the interpretability of a decision tree, rule list, and a neural network with statistically significant results ($p < 0.001$).
Submitted 25 June, 2020; v1 submitted 21 March, 2019;
originally announced March 2019.
-
Differentiable MPC for End-to-end Planning and Control
Authors:
Brandon Amos,
Ivan Dario Jimenez Rodriguez,
Jacob Sacks,
Byron Boots,
J. Zico Kolter
Abstract:
We present foundations for using Model Predictive Control (MPC) as a differentiable policy class for reinforcement learning in continuous state and action spaces. This provides one way of leveraging and combining the advantages of model-free and model-based approaches. Specifically, we differentiate through MPC by using the KKT conditions of the convex approximation at a fixed point of the controller. Using this strategy, we are able to learn the cost and dynamics of a controller via end-to-end learning. Our experiments focus on imitation learning in the pendulum and cartpole domains, where we learn the cost and dynamics terms of an MPC policy class. We show that our MPC policies are significantly more data-efficient than a generic neural network and that our method is superior to traditional system identification in a setting where the expert is unrealizable.
Submitted 14 October, 2019; v1 submitted 31 October, 2018;
originally announced October 2018.
-
The Maximum Likelihood Degree of Toric Varieties
Authors:
Carlos Améndola,
Nathan Bliss,
Isaac Burke,
Courtney R. Gibbons,
Martin Helmer,
Serkan Hoşten,
Evan D. Nash,
Jose Israel Rodriguez,
Daniel Smolkin
Abstract:
We study the maximum likelihood degree (ML degree) of toric varieties, known as discrete exponential models in statistics. By introducing scaling coefficients to the monomial parameterization of the toric variety, one can change the ML degree. We show that the ML degree is equal to the degree of the toric variety for generic scalings, while it drops if and only if the scaling vector is in the locus of the principal $A$-determinant. We also illustrate how to compute the ML estimate of a toric variety numerically via homotopy continuation from a scaled toric variety with low ML degree. Throughout, we include examples motivated by algebraic geometry and statistics. We compute the ML degree of rational normal scrolls and a large class of Veronese-type varieties. In addition, we investigate the ML degree of scaled Segre varieties, hierarchical loglinear models, and graphical models.
Submitted 8 November, 2017; v1 submitted 7 March, 2017;
originally announced March 2017.
-
Entropy measure for the quantification of upper quantile interdependence in multivariate distributions
Authors:
Jhan Rodríguez,
András Bárdossy
Abstract:
We introduce a new measure of interdependence among the components of a random vector along the main diagonal of the vector copula, i.e., along the line $u_{1}=\ldots=u_{J}$, for $\left(u_{1},\ldots,u_{J}\right)\in\left[0,1\right]^{J}$. Our measure is related to the Shannon entropy of a discrete random variable, hence we call it an "entropy index". This entropy index is invariant with respect to marginal non-decreasing transformations and can be used to quantify the intensity of the association among the vector components in arbitrary dimensions. We show the applicability of our entropy index with an example using real data on the prices of four stocks in the DAX index. If the random vector is in the domain of attraction of an extreme value distribution, our index is shown to have as its limit the distribution's extremal coefficient, which can be interpreted as the effective number of asymptotically independent components in the vector.
Submitted 28 August, 2014;
originally announced August 2014.
-
Maximum Likelihood for Matrices with Rank Constraints
Authors:
Jonathan Hauenstein,
Jose Rodriguez,
Bernd Sturmfels
Abstract:
Maximum likelihood estimation is a fundamental optimization problem in statistics. We study this problem on manifolds of matrices with bounded rank. These represent mixtures of distributions of two independent discrete random variables. We determine the maximum likelihood degree for a range of determinantal varieties, and we apply numerical algebraic geometry to compute all critical points of their likelihood functions. This led to the discovery of maximum likelihood duality between matrices of complementary ranks, a result proved subsequently by Draisma and Rodriguez.
Submitted 18 March, 2013; v1 submitted 30 September, 2012;
originally announced October 2012.