-
Feature Network Methods in Machine Learning and Applications
Authors:
Xinying Mu,
Mark Kon
Abstract:
A machine learning (ML) feature network is a graph that connects ML features in learning tasks based on their similarity. This network representation allows us to view feature vectors as functions on the network. By leveraging function operations from Fourier analysis and from functional analysis, one can easily generate new and novel features, making use of the graph structure imposed on the feat…
▽ More
A machine learning (ML) feature network is a graph that connects ML features in learning tasks based on their similarity. This network representation allows us to view feature vectors as functions on the network. By leveraging function operations from Fourier analysis and from functional analysis, one can easily generate new and novel features, making use of the graph structure imposed on the feature vectors. Such network structures have previously been studied implicitly in image processing and computational biology. We thus describe feature networks as graph structures imposed on feature vectors, and provide applications in machine learning. One application involves graph-based generalizations of convolutional neural networks, involving structured deep learning with hierarchical representations of features that have varying depth or complexity. This extends also to learning algorithms that are able to generate useful new multilevel features. Additionally, we discuss the use of feature networks to engineer new features, which can enhance the expressiveness of the model. We give a specific example of a deep tree-structured feature network, where hierarchical connections are formed through feature clustering and feed-forward learning. This results in low learning complexity and computational efficiency. Unlike "standard" neural features which are limited to modulated (thresholded) linear combinations of adjacent ones, feature networks offer more general feedforward dependencies among features. For example, radial basis functions or graph structure-based dependencies between features can be utilized.
△ Less
Submitted 9 January, 2024;
originally announced January 2024.
-
Stochastic Functional Analysis and Multilevel Vector Field Anomaly Detection
Authors:
Julio E Castrillon-Candas,
Mark Kon
Abstract:
Massive vector field datasets are common in multi-spectral optical and radar sensors, among many other emerging areas of application. In this paper we develop a novel stochastic functional (data) analysis approach for detecting anomalies based on the covariance structure of nominal stochastic behavior across a domain. An optimal vector field Karhunen-Loeve expansion is applied to such random field…
▽ More
Massive vector field datasets are common in multi-spectral optical and radar sensors, among many other emerging areas of application. In this paper we develop a novel stochastic functional (data) analysis approach for detecting anomalies based on the covariance structure of nominal stochastic behavior across a domain. An optimal vector field Karhunen-Loeve expansion is applied to such random field data. A series of multilevel orthogonal functional subspaces is constructed from the geometry of the domain, adapted from the KL expansion. Detection is achieved by examining the projection of the random field on the multilevel basis. In addition, reliable hypothesis tests are formed that do not require prior assumptions on probability distributions of the data. The method is applied to the important problem of deforestation and degradation in the Amazon forest. This is a complex non-monotonic process, as forests can degrade and recover. Using multi-spectral satellite data from Sentinel-2, the multilevel filter is constructed and anomalies are treated as deviations from the initial state of the forest. Forest anomalies are quantified with robust hypothesis tests. Our approach shows the advantage of using multiple bands of data in a vectorized complex, leading to better anomaly detection beyond the capabilities of scalar-based methods.
△ Less
Submitted 5 October, 2022; v1 submitted 11 July, 2022;
originally announced July 2022.
-
Multilevel Stochastic Optimization for Imputation in Massive Medical Data Records
Authors:
Wenrui Li,
Xiaoyu Wang,
Yuetian Sun,
Snezana Milanovic,
Mark Kon,
Julio Enrique Castrillon-Candas
Abstract:
It has long been a recognized problem that many datasets contain significant levels of missing numerical data. A potentially critical predicate for application of machine learning methods to datasets involves addressing this problem. However, this is a challenging task. In this paper, we apply a recently developed multi-level stochastic optimization approach to the problem of imputation in massive…
▽ More
It has long been a recognized problem that many datasets contain significant levels of missing numerical data. A potentially critical predicate for application of machine learning methods to datasets involves addressing this problem. However, this is a challenging task. In this paper, we apply a recently developed multi-level stochastic optimization approach to the problem of imputation in massive medical records. The approach is based on computational applied mathematics techniques and is highly accurate. In particular, for the Best Linear Unbiased Predictor (BLUP) this multi-level formulation is exact, and is significantly faster and more numerically stable. This permits practical application of Kriging methods to data imputation problems for massive datasets. We test this approach on data from the National Inpatient Sample (NIS) data records, Healthcare Cost and Utilization Project (HCUP), Agency for Healthcare Research and Quality. Numerical results show that the multi-level method significantly outperforms current approaches and is numerically robust. It has superior accuracy as compared with methods recommended in the recent report from HCUP. Benchmark tests show up to 75% reductions in error. Furthermore, the results are also superior to recent state of the art methods such as discriminative deep learning.
△ Less
Submitted 3 April, 2024; v1 submitted 18 October, 2021;
originally announced October 2021.
-
Stochastic tensor space feature theory with applications to robust machine learning
Authors:
Julio Enrique Castrillon-Candas,
Dingning Liu,
Sicheng Yang,
Xiaoling Zhang,
Mark Kon
Abstract:
In this paper we develop a Multilevel Orthogonal Subspace (MOS) Karhunen-Loeve feature theory based on stochastic tensor spaces, for the construction of robust machine learning features. Training data is treated as instances of a random field within a relevant Bochner space. Our key observation is that separate machine learning classes can reside predominantly in mostly distinct subspaces. Using t…
▽ More
In this paper we develop a Multilevel Orthogonal Subspace (MOS) Karhunen-Loeve feature theory based on stochastic tensor spaces, for the construction of robust machine learning features. Training data is treated as instances of a random field within a relevant Bochner space. Our key observation is that separate machine learning classes can reside predominantly in mostly distinct subspaces. Using the Karhunen-Loeve expansion and a hierarchical expansion of the first (nominal) class, a MOS is constructed to detect anomalous signal components, treating the second class as an outlier of the first. The projection coefficients of the input data into these subspaces are then used to train a Machine Learning (ML) classifier. These coefficients become new features from which much clearer separation surfaces can arise for the underlying classes. Tests in the blood plasma dataset (Alzheimer's Disease Neuroimaging Initiative) show dramatic increases in accuracy. This is in contrast to popular ML methods such as Gradient Boosting, RUS Boost, Random Forest and (Convolutional) Neural Networks.
△ Less
Submitted 20 March, 2025; v1 submitted 4 October, 2021;
originally announced October 2021.
-
Coupled VAE: Improved Accuracy and Robustness of a Variational Autoencoder
Authors:
Shichen Cao,
Jingjing Li,
Kenric P. Nelson,
Mark A. Kon
Abstract:
We present a coupled Variational Auto-Encoder (VAE) method that improves the accuracy and robustness of the probabilistic inferences on represented data. The new method models the dependency between input feature vectors (images) and weighs the outliers with a higher penalty by generalizing the original loss function to the coupled entropy function, using the principles of nonlinear statistical co…
▽ More
We present a coupled Variational Auto-Encoder (VAE) method that improves the accuracy and robustness of the probabilistic inferences on represented data. The new method models the dependency between input feature vectors (images) and weighs the outliers with a higher penalty by generalizing the original loss function to the coupled entropy function, using the principles of nonlinear statistical coupling. We evaluate the performance of the coupled VAE model using the MNIST dataset. Compared with the traditional VAE algorithm, the output images generated by the coupled VAE method are clearer and less blurry. The visualization of the input images embedded in 2D latent variable space provides a deeper insight into the structure of new model with coupled loss function: the latent variable has a smaller deviation and a more compact latent space generates the output values. We analyze the histogram of the likelihoods of the input images using the generalized mean, which measures the model's accuracy as a function of the relative risk. The neutral accuracy, which is the geometric mean and is consistent with a measure of the Shannon cross-entropy, is improved. The robust accuracy, measured by the -2/3 generalized mean, is also improved.
△ Less
Submitted 12 July, 2021; v1 submitted 2 June, 2019;
originally announced June 2019.
-
Analytic regularity and stochastic collocation of high dimensional Newton iterates
Authors:
Julio Enrique Castrillon-Candas,
Mark Kon
Abstract:
In this paper we introduce concepts from uncertainty quantification (UQ) and numerical analysis for the efficient evaluation of stochastic high dimensional Newton iterates. In particular, we develop complex analytic regularity theory of the solution with respect to the random variables. This justifies the application of sparse grids for the computation of stochastic moments. Convergence rates are…
▽ More
In this paper we introduce concepts from uncertainty quantification (UQ) and numerical analysis for the efficient evaluation of stochastic high dimensional Newton iterates. In particular, we develop complex analytic regularity theory of the solution with respect to the random variables. This justifies the application of sparse grids for the computation of stochastic moments. Convergence rates are derived and are shown to be subexponential or algebraic with respect to the number of realizations of random perturbations. Due the accuracy of the method, sparse grids are well suited for computing low probability events with high confidence. We apply our method to the power flow problem. Numerical experiments on the 39 bus New England power system model with large stochastic loads are consistent with the theoretical convergence rates.
△ Less
Submitted 22 May, 2019;
originally announced May 2019.
-
Use of the geometric mean as a statistic for the scale of the coupled Gaussian distributions
Authors:
Kenric P. Nelson,
Mark A. Kon,
Sabir R. Umarov
Abstract:
The geometric mean is shown to be an appropriate statistic for the scale of a heavy-tailed coupled Gaussian distribution or equivalently the Student's t distribution. The coupled Gaussian is a member of a family of distributions parameterized by the nonlinear statistical coupling which is the reciprocal of the degree of freedom and is proportional to fluctuations in the inverse scale of the Gaussi…
▽ More
The geometric mean is shown to be an appropriate statistic for the scale of a heavy-tailed coupled Gaussian distribution or equivalently the Student's t distribution. The coupled Gaussian is a member of a family of distributions parameterized by the nonlinear statistical coupling which is the reciprocal of the degree of freedom and is proportional to fluctuations in the inverse scale of the Gaussian. Existing estimators of the scale of the coupled Gaussian have relied on estimates of the full distribution, and they suffer from problems related to outliers in heavy-tailed distributions. In this paper, the scale of a coupled Gaussian is proven to be equal to the product of the generalized mean and the square root of the coupling. From our numerical computations of the scales of coupled Gaussians using the generalized mean of random samples, it is indicated that only samples from a Cauchy distribution (with coupling parameter one) form an unbiased estimate with diminishing variance for large samples. Nevertheless, we also prove that the scale is a function of the geometric mean, the coupling term and a harmonic number. Numerical experiments show that this estimator is unbiased with diminishing variance for large samples for a broad range of coupling values.
△ Less
Submitted 13 August, 2018; v1 submitted 4 April, 2018;
originally announced April 2018.
-
Feature vector regularization in machine learning
Authors:
Yue Fan,
Louise Raphael,
Mark Kon
Abstract:
Problems in machine learning (ML) can involve noisy input data, and ML classification methods have reached limiting accuracies when based on standard ML data sets consisting of feature vectors and their classes. Greater accuracy will require incorporation of prior structural information on data into learning. We study methods to regularize feature vectors (unsupervised regularization methods), ana…
▽ More
Problems in machine learning (ML) can involve noisy input data, and ML classification methods have reached limiting accuracies when based on standard ML data sets consisting of feature vectors and their classes. Greater accuracy will require incorporation of prior structural information on data into learning. We study methods to regularize feature vectors (unsupervised regularization methods), analogous to supervised regularization for estimating functions in ML. We study regularization (denoising) of ML feature vectors using Tikhonov and other regularization methods for functions on ${\bf R}^n$. A feature vector ${\bf x}=(x_1,\ldots,x_n)=\{x_q\}_{q=1}^n$ is viewed as a function of its index $q$, and smoothed using prior information on its structure. This can involve a penalty functional on feature vectors analogous to those in statistical learning, or use of proximity (e.g. graph) structure on the set of indices. Such feature vector regularization inherits a property from function denoising on ${\bf R}^n$, in that accuracy is non-monotonic in the denoising (regularization) parameter $α$. Under some assumptions about the noise level and the data structure, we show that the best reconstruction accuracy also occurs at a finite positive $α$ in index spaces with graph structures. We adapt two standard function denoising methods used on ${\bf R}^n$, local averaging and kernel regression. In general the index space can be any discrete set with a notion of proximity, e.g. a metric space, a subset of ${\bf R}^n$, or a graph/network, with feature vectors as functions with some notion of continuity. We show this improves feature vector recovery, and thus the subsequent classification or regression done on them. We give an example in gene expression analysis for cancer classification with the genome as an index space and network structure based protein-protein interactions.
△ Less
Submitted 30 December, 2013; v1 submitted 18 December, 2012;
originally announced December 2012.
-
A complexity analysis of statistical learning algorithms
Authors:
Mark A. Kon
Abstract:
We apply information-based complexity analysis to support vector machine (SVM) algorithms, with the goal of a comprehensive continuous algorithmic analysis of such algorithms. This involves complexity measures in which some higher order operations (e.g., certain optimizations) are considered primitive for the purposes of measuring complexity. We consider classes of information operators and algori…
▽ More
We apply information-based complexity analysis to support vector machine (SVM) algorithms, with the goal of a comprehensive continuous algorithmic analysis of such algorithms. This involves complexity measures in which some higher order operations (e.g., certain optimizations) are considered primitive for the purposes of measuring complexity. We consider classes of information operators and algorithms made up of scaled families, and investigate the utility of scaling the complexities to minimize error. We look at the division of statistical learning into information and algorithmic components, at the complexities of each, and at applications to support vector machine (SVM) and more general machine learning algorithms. We give applications to SVM algorithms graded into linear and higher order components, and give an example in biomedical informatics.
△ Less
Submitted 18 December, 2012;
originally announced December 2012.
-
On the probabilistic continuous complexity conjecture
Authors:
Mark A. Kon
Abstract:
In this paper we prove the probabilistic continuous complexity conjecture. In continuous complexity theory, this states that the complexity of solving a continuous problem with probability approaching 1 converges (in this limit) to the complexity of solving the same problem in its worst case. We prove the conjecture holds if and only if space of problem elements is uniformly convex. The non-unifor…
▽ More
In this paper we prove the probabilistic continuous complexity conjecture. In continuous complexity theory, this states that the complexity of solving a continuous problem with probability approaching 1 converges (in this limit) to the complexity of solving the same problem in its worst case. We prove the conjecture holds if and only if space of problem elements is uniformly convex. The non-uniformly convex case has a striking counterexample in the problem of identifying a Brownian path in Wiener space, where it is shown that probabilistic complexity converges to only half of the worst case complexity in this limit.
△ Less
Submitted 6 December, 2012;
originally announced December 2012.
-
On Some Integrated Approaches to Inference
Authors:
Mark A. Kon,
Leszek Plaskota
Abstract:
We present arguments for the formulation of unified approach to different standard continuous inference methods from partial information. It is claimed that an explicit partition of information into a priori (prior knowledge) and a posteriori information (data) is an important way of standardizing inference approaches so that they can be compared on a normative scale, and so that notions of optima…
▽ More
We present arguments for the formulation of unified approach to different standard continuous inference methods from partial information. It is claimed that an explicit partition of information into a priori (prior knowledge) and a posteriori information (data) is an important way of standardizing inference approaches so that they can be compared on a normative scale, and so that notions of optimal algorithms become farther-reaching. The inference methods considered include neural network approaches, information-based complexity, and Monte Carlo, spline, and regularization methods. The model is an extension of currently used continuous complexity models, with a class of algorithms in the form of optimization methods, in which an optimization functional (involving the data) is minimized. This extends the family of current approaches in continuous complexity theory, which include the use of interpolatory algorithms in worst and average case settings.
△ Less
Submitted 5 December, 2012;
originally announced December 2012.
-
Empirical Normalization for Quadratic Discriminant Analysis and Classifying Cancer Subtypes
Authors:
Mark A. Kon,
Nikolay Nikolaev
Abstract:
We introduce a new discriminant analysis method (Empirical Discriminant Analysis or EDA) for binary classification in machine learning. Given a dataset of feature vectors, this method defines an empirical feature map transforming the training and test data into new data with components having Gaussian empirical distributions. This map is an empirical version of the Gaussian copula used in probabil…
▽ More
We introduce a new discriminant analysis method (Empirical Discriminant Analysis or EDA) for binary classification in machine learning. Given a dataset of feature vectors, this method defines an empirical feature map transforming the training and test data into new data with components having Gaussian empirical distributions. This map is an empirical version of the Gaussian copula used in probability and mathematical finance. The purpose is to form a feature mapped dataset as close as possible to Gaussian, after which standard quadratic discriminants can be used for classification. We discuss this method in general, and apply it to some datasets in computational biology.
△ Less
Submitted 29 October, 2012; v1 submitted 28 March, 2012;
originally announced March 2012.