Search | arXiv e-print repository

Conditional regression for the Nonlinear Single-Variable Model

Abstract: Regressing a function $F$ on $\mathbb{R}^d$ without the statistical and computational curse of dimensionality requires special statistical models, for example that impose geometric assumptions on the distribution of the data (e.g., that its support is low-dimensional), or strong smoothness assumptions on $F$, or a special structure $F$. Among the latter, compositional models $F=f\circ g$ with $g$… ▽ More Regressing a function $F$ on $\mathbb{R}^d$ without the statistical and computational curse of dimensionality requires special statistical models, for example that impose geometric assumptions on the distribution of the data (e.g., that its support is low-dimensional), or strong smoothness assumptions on $F$, or a special structure $F$. Among the latter, compositional models $F=f\circ g$ with $g$ mapping to $\mathbb{R}^r$ with $r\ll d$ include classical single- and multi-index models, as well as neural networks. While the case where $g$ is linear is well-understood, less is known when $g$ is nonlinear, and in particular for which $g$'s the curse of dimensionality in estimating $F$, or both $f$ and $g$, may be circumvented. Here we consider a model $F(X):=f(Π_γX)$ where $Π_γ:\mathbb{R}^d\to[0,\textrm{len}_γ]$ is the closest-point projection onto the parameter of a regular curve $γ:[0, \textrm{len}_γ]\to\mathbb{R}^d$, and $f:[0,\textrm{len}_γ]\to \mathbb{R}^1$. The input data $X$ is not low-dimensional: it can be as far from $γ$ as the condition that $Π_γ(X)$ is well-defined allows. The distribution $X$, the curve $γ$ and the function $f$ are all unknown. This model is a natural nonlinear generalization of the single-index model, corresponding to $γ$ being a line. We propose a nonparametric estimator, based on conditional regression, that under suitable assumptions, the strongest of which being that $f$ is coarsely monotone, achieves, up to log factors, the $\textit{one-dimensional}$ optimal min-max rate for non-parametric regression, up to the level of noise in the observations, and be constructed in time $\mathcal{O}(d^2 n\log n)$. All the constants in the learning bounds, in the minimal number of samples required for our bounds to hold, and in the computational complexity are at most low-order polynomials in $d$. △ Less

Submitted 11 July, 2025; v1 submitted 14 November, 2024; originally announced November 2024.

Comments: 57 pages, 10 figures

MSC Class: 62G08

arXiv:2410.04655 [pdf, other]

Graph Fourier Neural Kernels (G-FuNK): Learning Solutions of Nonlinear Diffusive Parametric PDEs on Multiple Domains

Authors: Shane E. Loeffler, Zan Ahmad, Syed Yusuf Ali, Carolyna Yamamoto, Dan M. Popescu, Alana Yee, Yash Lal, Natalia Trayanova, Mauro Maggioni

Abstract: Predicting time-dependent dynamics of complex systems governed by non-linear partial differential equations (PDEs) with varying parameters and domains is a challenging task motivated by applications across various fields. We introduce a novel family of neural operators based on our Graph Fourier Neural Kernels, designed to learn solution generators for nonlinear PDEs in which the highest-order ter… ▽ More Predicting time-dependent dynamics of complex systems governed by non-linear partial differential equations (PDEs) with varying parameters and domains is a challenging task motivated by applications across various fields. We introduce a novel family of neural operators based on our Graph Fourier Neural Kernels, designed to learn solution generators for nonlinear PDEs in which the highest-order term is diffusive, across multiple domains and parameters. G-FuNK combines components that are parameter- and domain-adapted with others that are not. The domain-adapted components are constructed using a weighted graph on the discretized domain, where the graph Laplacian approximates the highest-order diffusive term, ensuring boundary condition compliance and capturing the parameter and domain-specific behavior. Meanwhile, the learned components transfer across domains and parameters using our variant Fourier Neural Operators. This approach naturally embeds geometric and directional information, improving generalization to new test domains without need for retraining the network. To handle temporal dynamics, our method incorporates an integrated ODE solver to predict the evolution of the system. Experiments show G-FuNK's capability to accurately approximate heat, reaction diffusion, and cardiac electrophysiology equations across various geometries and anisotropic diffusivity fields. G-FuNK achieves low relative errors on unseen domains and fiber fields, significantly accelerating predictions compared to traditional finite-element solvers. △ Less

Submitted 9 October, 2024; v1 submitted 6 October, 2024; originally announced October 2024.

arXiv:2408.16138 [pdf, ps, other]

Thinner Latent Spaces: Detecting Dimension and Imposing Invariance with Conformal Autoencoders

Authors: George A. Kevrekidis, Zan Ahmad, Mauro Maggioni, Soledad Villar, Yannis G. Kevrekidis

Abstract: Conformal Autoencoders are a neural network architecture that imposes orthogonality conditions between the gradients of latent variables to obtain disentangled representations of data. In this work we show that orthogonality relations within the latent layer of the network can be leveraged to infer the intrinsic dimensionality of nonlinear manifold data sets (locally characterized by the dimension… ▽ More Conformal Autoencoders are a neural network architecture that imposes orthogonality conditions between the gradients of latent variables to obtain disentangled representations of data. In this work we show that orthogonality relations within the latent layer of the network can be leveraged to infer the intrinsic dimensionality of nonlinear manifold data sets (locally characterized by the dimension of their tangent space), while simultaneously computing encoding and decoding (embedding) maps. We outline the relevant theory relying on differential geometry, and describe the corresponding gradient-descent optimization algorithm. The method is applied to several data sets and we highlight its applicability, advantages, and shortcomings. In addition, we demonstrate that the same computational technology can be used to build coordinate invariance to local group actions when defined only on a (reduced) submanifold of the embedding space. △ Less

Submitted 10 July, 2025; v1 submitted 28 August, 2024; originally announced August 2024.

arXiv:2402.08412 [pdf, other]

Interacting Particle Systems on Networks: joint inference of the network and the interaction kernel

Authors: Quanjun Lang, Xiong Wang, Fei Lu, Mauro Maggioni

Abstract: Modeling multi-agent systems on networks is a fundamental challenge in a wide variety of disciplines. We jointly infer the weight matrix of the network and the interaction kernel, which determine respectively which agents interact with which others and the rules of such interactions from data consisting of multiple trajectories. The estimator we propose leads naturally to a non-convex optimization… ▽ More Modeling multi-agent systems on networks is a fundamental challenge in a wide variety of disciplines. We jointly infer the weight matrix of the network and the interaction kernel, which determine respectively which agents interact with which others and the rules of such interactions from data consisting of multiple trajectories. The estimator we propose leads naturally to a non-convex optimization problem, and we investigate two approaches for its solution: one is based on the alternating least squares (ALS) algorithm; another is based on a new algorithm named operator regression with alternating least squares (ORALS). Both algorithms are scalable to large ensembles of data trajectories. We establish coercivity conditions guaranteeing identifiability and well-posedness. The ALS algorithm appears statistically efficient and robust even in the small data regime but lacks performance and convergence guarantees. The ORALS estimator is consistent and asymptotically normal under a coercivity condition. We conduct several numerical experiments ranging from Kuramoto particle systems on networks to opinion dynamics in leader-follower models. △ Less

Submitted 13 February, 2024; originally announced February 2024.

Comments: 53 pages, 17 figures

MSC Class: 62F12; 82C22

arXiv:2212.00746 [pdf, other]

Learning Transition Operators From Sparse Space-Time Samples

Authors: Christian Kümmerle, Mauro Maggioni, Sui Tang

Abstract: We consider the nonlinear inverse problem of learning a transition operator $\mathbf{A}$ from partial observations at different times, in particular from sparse observations of entries of its powers $\mathbf{A},\mathbf{A}^2,\cdots,\mathbf{A}^{T}$. This Spatio-Temporal Transition Operator Recovery problem is motivated by the recent interest in learning time-varying graph signals that are driven by… ▽ More We consider the nonlinear inverse problem of learning a transition operator $\mathbf{A}$ from partial observations at different times, in particular from sparse observations of entries of its powers $\mathbf{A},\mathbf{A}^2,\cdots,\mathbf{A}^{T}$. This Spatio-Temporal Transition Operator Recovery problem is motivated by the recent interest in learning time-varying graph signals that are driven by graph operators depending on the underlying graph topology. We address the nonlinearity of the problem by embedding it into a higher-dimensional space of suitable block-Hankel matrices, where it becomes a low-rank matrix completion problem, even if $\mathbf{A}$ is of full rank. For both a uniform and an adaptive random space-time sampling model, we quantify the recoverability of the transition operator via suitable measures of incoherence of these block-Hankel embedding matrices. For graph transition operators these measures of incoherence depend on the interplay between the dynamics and the graph topology. We develop a suitable non-convex iterative reweighted least squares (IRLS) algorithm, establish its quadratic local convergence, and show that, in optimal scenarios, no more than $\mathcal{O}(rn \log(nT))$ space-time samples are sufficient to ensure accurate recovery of a rank-$r$ operator $\mathbf{A}$ of size $n \times n$. This establishes that spatial samples can be substituted by a comparable number of space-time samples. We provide an efficient implementation of the proposed IRLS algorithm with space complexity of order $O(r n T)$ and per-iteration time complexity linear in $n$. Numerical experiments for transition operators based on several graph models confirm that the theoretical findings accurately track empirical phase transitions, and illustrate the applicability and scalability of the proposed algorithm. △ Less

Submitted 1 December, 2022; originally announced December 2022.

Comments: 34 pages, 12 figures

arXiv:2207.05242 [pdf, other]

Unsupervised learning of observation functions in state-space models by nonparametric moment methods

Authors: Qingci An, Yannis Kevrekidis, Fei Lu, Mauro Maggioni

Abstract: We investigate the unsupervised learning of non-invertible observation functions in nonlinear state-space models. Assuming abundant data of the observation process along with the distribution of the state process, we introduce a nonparametric generalized moment method to estimate the observation function via constrained regression. The major challenge comes from the non-invertibility of the observ… ▽ More We investigate the unsupervised learning of non-invertible observation functions in nonlinear state-space models. Assuming abundant data of the observation process along with the distribution of the state process, we introduce a nonparametric generalized moment method to estimate the observation function via constrained regression. The major challenge comes from the non-invertibility of the observation function and the lack of data pairs between the state and observation. We address the fundamental issue of identifiability from quadratic loss functionals and show that the function space of identifiability is the closure of a RKHS that is intrinsic to the state process. Numerical results show that the first two moments and temporal correlations, along with upper and lower bounds, can identify functions ranging from piecewise polynomials to smooth functions, leading to convergent estimators. The limitations of this method, such as non-identifiability due to symmetry and stationarity, are also discussed. △ Less

Submitted 11 July, 2022; originally announced July 2022.

MSC Class: 62G05; 68Q32; 62M15

arXiv:2104.02120 [pdf, other]

Nonlinear model reduction for slow-fast stochastic systems near unknown invariant manifolds

Authors: Felix X. -F. Ye, Sichen Yang, Mauro Maggioni

Abstract: We introduce a nonlinear stochastic model reduction technique for high-dimensional stochastic dynamical systems that have a low-dimensional invariant effective manifold with slow dynamics, and high-dimensional, large fast modes. Given only access to a black box simulator from which short bursts of simulation can be obtained, we design an algorithm that outputs an estimate of the invariant manifold… ▽ More We introduce a nonlinear stochastic model reduction technique for high-dimensional stochastic dynamical systems that have a low-dimensional invariant effective manifold with slow dynamics, and high-dimensional, large fast modes. Given only access to a black box simulator from which short bursts of simulation can be obtained, we design an algorithm that outputs an estimate of the invariant manifold, a process of the effective stochastic dynamics on it, which has averaged out the fast modes, and a simulator thereof. This simulator is efficient in that it exploits of the low dimension of the invariant manifold, and takes time steps of size dependent on the regularity of the effective process, and therefore typically much larger than that of the original simulator, which had to resolve the fast modes. The algorithm and the estimation can be performed on-the-fly, leading to efficient exploration of the effective state space, without losing consistency with the underlying dynamics. This construction enables fast and efficient simulation of paths of the effective dynamics, together with estimation of crucial features and observables of such dynamics, including the stationary distribution, identification of metastable states, and residence times and transition rates between them. △ Less

Submitted 24 October, 2023; v1 submitted 5 April, 2021; originally announced April 2021.

arXiv:2101.05119 [pdf, ps, other]

Multiscale regression on unknown manifolds

Authors: Wenjing Liao, Mauro Maggioni, Stefano Vigogna

Abstract: We consider the regression problem of estimating functions on $\mathbb{R}^D$ but supported on a $d$-dimensional manifold $ \mathcal{M} \subset \mathbb{R}^D $ with $ d \ll D $. Drawing ideas from multi-resolution analysis and nonlinear approximation, we construct low-dimensional coordinates on $\mathcal{M}$ at multiple scales, and perform multiscale regression by local polynomial fitting. We propos… ▽ More We consider the regression problem of estimating functions on $\mathbb{R}^D$ but supported on a $d$-dimensional manifold $ \mathcal{M} \subset \mathbb{R}^D $ with $ d \ll D $. Drawing ideas from multi-resolution analysis and nonlinear approximation, we construct low-dimensional coordinates on $\mathcal{M}$ at multiple scales, and perform multiscale regression by local polynomial fitting. We propose a data-driven wavelet thresholding scheme that automatically adapts to the unknown regularity of the function, allowing for efficient estimation of functions exhibiting nonuniform regularity at different locations and scales. We analyze the generalization error of our method by proving finite sample bounds in high probability on rich classes of priors. Our estimator attains optimal learning rates (up to logarithmic factors) as if the function was defined on a known Euclidean domain of dimension $d$, instead of an unknown manifold embedded in $\mathbb{R}^D$. The implemented algorithm has quasilinear complexity in the sample size, with constants linear in $D$ and exponential in $d$. Our work therefore establishes a new framework for regression on low-dimensional sets embedded in high dimensions, with fast implementation and strong theoretical guarantees. △ Less

Submitted 13 January, 2021; originally announced January 2021.

arXiv:2010.03729 [pdf, other]

Learning Theory for Inferring Interaction Kernels in Second-Order Interacting Agent Systems

Authors: Jason Miller, Sui Tang, Ming Zhong, Mauro Maggioni

Abstract: Modeling the complex interactions of systems of particles or agents is a fundamental scientific and mathematical problem that is studied in diverse fields, ranging from physics and biology, to economics and machine learning. In this work, we describe a very general second-order, heterogeneous, multivariable, interacting agent model, with an environment, that encompasses a wide variety of known sys… ▽ More Modeling the complex interactions of systems of particles or agents is a fundamental scientific and mathematical problem that is studied in diverse fields, ranging from physics and biology, to economics and machine learning. In this work, we describe a very general second-order, heterogeneous, multivariable, interacting agent model, with an environment, that encompasses a wide variety of known systems. We describe an inference framework that uses nonparametric regression and approximation theory based techniques to efficiently derive estimators of the interaction kernels which drive these dynamical systems. We develop a complete learning theory which establishes strong consistency and optimal nonparametric min-max rates of convergence for the estimators, as well as provably accurate predicted trajectories. The estimators exploit the structure of the equations in order to overcome the curse of dimensionality and we describe a fundamental coercivity condition on the inverse problem which ensures that the kernels can be learned and relates to the minimal singular value of the learning matrix. The numerical algorithm presented to build the estimators is parallelizable, performs well on high-dimensional problems, and is demonstrated on complex dynamical systems. △ Less

Submitted 7 October, 2020; originally announced October 2020.

Comments: 68 pages

MSC Class: 62Gxx; 37Nxx; 68Txx

arXiv:2007.15174 [pdf, other]

Learning interaction kernels in stochastic systems of interacting particles from multiple trajectories

Authors: Fei Lu, Mauro Maggioni, Sui Tang

Abstract: We consider stochastic systems of interacting particles or agents, with dynamics determined by an interaction kernel which only depends on pairwise distances. We study the problem of inferring this interaction kernel from observations of the positions of the particles, in either continuous or discrete time, along multiple independent trajectories. We introduce a nonparametric inference approach to… ▽ More We consider stochastic systems of interacting particles or agents, with dynamics determined by an interaction kernel which only depends on pairwise distances. We study the problem of inferring this interaction kernel from observations of the positions of the particles, in either continuous or discrete time, along multiple independent trajectories. We introduce a nonparametric inference approach to this inverse problem, based on a regularized maximum likelihood estimator constrained to suitable hypothesis spaces adaptive to data. We show that a coercivity condition enables us to control the condition number of this problem and prove the consistency of our estimator, and that in fact it converges at a near-optimal learning rate, equal to the min-max rate of $1$-dimensional non-parametric regression. In particular, this rate is independent of the dimension of the state space, which is typically very high. We also analyze the discretization errors in the case of discrete-time observations, showing that it is of order $1/2$ in terms of the time gaps between observations. This term, when large, dominates the sampling error and the approximation error, preventing convergence of the estimator. Finally, we exhibit an efficient parallel algorithm to construct the estimator from data, and we demonstrate the effectiveness of our algorithm with numerical tests on prototype systems including stochastic opinion dynamics and a Lennard-Jones model. △ Less

Submitted 29 July, 2020; originally announced July 2020.

Comments: 38 pages; 9 figures

MSC Class: 70F17; 62G05; 62M05

arXiv:1912.11123 [pdf, other]

Data-driven Discovery of Emergent Behaviors in Collective Dynamics

Authors: Mauro Maggioni, Jason Miller, Ming Zhong

Abstract: Particle- and agent-based systems are a ubiquitous modeling tool in many disciplines. We consider the fundamental problem of inferring interaction kernels from observations of agent-based dynamical systems given observations of trajectories, in particular for collective dynamical systems exhibiting emergent behaviors with complicated interaction kernels, in a nonparametric fashion, and for kernels… ▽ More Particle- and agent-based systems are a ubiquitous modeling tool in many disciplines. We consider the fundamental problem of inferring interaction kernels from observations of agent-based dynamical systems given observations of trajectories, in particular for collective dynamical systems exhibiting emergent behaviors with complicated interaction kernels, in a nonparametric fashion, and for kernels which are parametrized by a single unknown parameter. We extend the estimators introduced in \cite{PNASLU}, which are based on suitably regularized least squares estimators, to these larger classes of systems. We provide extensive numerical evidence that the estimators provide faithful approximations to the interaction kernels, and provide accurate predictions for trajectories started at new initial conditions, both throughout the ``training'' time interval in which the observations were made, and often much beyond. We demonstrate these features on prototypical systems displaying collective behaviors, ranging from opinion dynamics, flocking dynamics, self-propelling particle dynamics, synchronized oscillator dynamics, and a gravitational system. Our experiments also suggest that our estimated systems can display the same emergent behaviors of the observed systems, that occur at larger timescales than those used in the training data. Finally, in the case of families of systems governed by a parameterized family of interaction kernels, we introduce novel estimators that estimate the parameterized family of kernels, splitting it into a common interaction kernel and the action of parameters. We demonstrate this in the case of gravity, by learning both the ``common component'' $1/r^2$ and the dependency on mass, without any a priori knowledge of either one, from observations of planetary motions in our solar system. △ Less

Submitted 30 March, 2020; v1 submitted 23 December, 2019; originally announced December 2019.

arXiv:1910.04832 [pdf, other]

Learning interaction kernels in heterogeneous systems of agents from multiple trajectories

Authors: Fei Lu, Mauro Maggioni, Sui Tang

Abstract: Systems of interacting particles or agents have wide applications in many disciplines such as Physics, Chemistry, Biology and Economics. These systems are governed by interaction laws, which are often unknown: estimating them from observation data is a fundamental task that can provide meaningful insights and accurate predictions of the behaviour of the agents. In this paper, we consider the inver… ▽ More Systems of interacting particles or agents have wide applications in many disciplines such as Physics, Chemistry, Biology and Economics. These systems are governed by interaction laws, which are often unknown: estimating them from observation data is a fundamental task that can provide meaningful insights and accurate predictions of the behaviour of the agents. In this paper, we consider the inverse problem of learning interaction laws given data from multiple trajectories, in a nonparametric fashion, when the interaction kernels depend on pairwise distances. We establish a condition for learnability of interaction kernels, and construct estimators that are guaranteed to converge in a suitable $L^2$ space, at the optimal min-max rate for 1-dimensional nonparametric regression. We propose an efficient learning algorithm based on least squares, which can be implemented in parallel for multiple trajectories and is therefore well-suited for the high dimensional, big data regime. Numerical simulations on a variety examples, including opinion dynamics, predator-swarm dynamics and heterogeneous particle dynamics, suggest that the learnability condition is satisfied in models used in practice, and the rate of convergence of our estimator is consistent with the theory. These simulations also suggest that our estimators are robust to noise in the observations, and produce accurate predictions of dynamics in relative large time intervals, even when they are learned from data collected in short time intervals. △ Less

Submitted 14 July, 2020; v1 submitted 10 October, 2019; originally announced October 2019.

Comments: 63 pages, revised various places

MSC Class: 62GXX

arXiv:1905.12989 [pdf, other]

Learning by Active Nonlinear Diffusion

Authors: Mauro Maggioni, James M. Murphy

Abstract: This article proposes an active learning method for high dimensional data, based on intrinsic data geometries learned through diffusion processes on graphs. Diffusion distances are used to parametrize low-dimensional structures on the dataset, which allow for high-accuracy labelings of the dataset with only a small number of carefully chosen labels. The geometric structure of the data suggests reg… ▽ More This article proposes an active learning method for high dimensional data, based on intrinsic data geometries learned through diffusion processes on graphs. Diffusion distances are used to parametrize low-dimensional structures on the dataset, which allow for high-accuracy labelings of the dataset with only a small number of carefully chosen labels. The geometric structure of the data suggests regions that have homogeneous labels, as well as regions with high label complexity that should be queried for labels. The proposed method enjoys theoretical performance guarantees on a general geometric data model, in which clusters corresponding to semantically meaningful classes are permitted to have nonlinear geometries, high ambient dimensionality, and suffer from significant noise and outlier corruption. The proposed algorithm is implemented in a manner that is quasilinear in the number of unlabeled data points, and exhibits competitive empirical performance on synthetic datasets and real hyperspectral remote sensing images. △ Less

Submitted 30 May, 2019; originally announced May 2019.

Comments: 20 pages, 10 figures

arXiv:1902.05402 [pdf, other]

doi 10.1109/LGRS.2019.2943001

Spectral-Spatial Diffusion Geometry for Hyperspectral Image Clustering

Authors: James M. Murphy, Mauro Maggioni

Abstract: An unsupervised learning algorithm to cluster hyperspectral image (HSI) data is proposed that exploits spatially-regularized random walks. Markov diffusions are defined on the space of HSI spectra with transitions constrained to near spatial neighbors. The explicit incorporation of spatial regularity into the diffusion construction leads to smoother random processes that are more adapted for unsup… ▽ More An unsupervised learning algorithm to cluster hyperspectral image (HSI) data is proposed that exploits spatially-regularized random walks. Markov diffusions are defined on the space of HSI spectra with transitions constrained to near spatial neighbors. The explicit incorporation of spatial regularity into the diffusion construction leads to smoother random processes that are more adapted for unsupervised machine learning than those based on spectra alone. The regularized diffusion process is subsequently used to embed the high-dimensional HSI into a lower dimensional space through diffusion distances. Cluster modes are computed using density estimation and diffusion distances, and all other points are labeled according to these modes. The proposed method has low computational complexity and performs competitively against state-of-the-art HSI clustering algorithms on real data. In particular, the proposed spatial regularization confers an empirical advantage over non-regularized methods. △ Less

Submitted 8 February, 2019; originally announced February 2019.

arXiv:1812.06003 [pdf, other]

doi 10.1073/pnas.1822012116

Nonparametric inference of interaction laws in systems of agents from trajectory data

Authors: Fei Lu, Mauro Maggioni, Sui Tang, Ming Zhong

Abstract: Inferring the laws of interaction between particles and agents in complex dynamical systems from observational data is a fundamental challenge in a wide variety of disciplines. We propose a non-parametric statistical learning approach to estimate the governing laws of distance-based interactions, with no reference or assumption about their analytical form, from data consisting trajectories of inte… ▽ More Inferring the laws of interaction between particles and agents in complex dynamical systems from observational data is a fundamental challenge in a wide variety of disciplines. We propose a non-parametric statistical learning approach to estimate the governing laws of distance-based interactions, with no reference or assumption about their analytical form, from data consisting trajectories of interacting agents. We demonstrate the effectiveness of our learning approach both by providing theoretical guarantees, and by testing the approach on a variety of prototypical systems in various disciplines. These systems include homogeneous and heterogeneous agents systems, ranging from particle systems in fundamental physics to agent-based systems modeling opinion dynamics under the social influence, prey-predator dynamics, flocking and swarming, and phototaxis in cell dynamics. △ Less

Submitted 23 March, 2019; v1 submitted 14 December, 2018; originally announced December 2018.

arXiv:1810.06702 [pdf, other]

Learning by Unsupervised Nonlinear Diffusion

Authors: Mauro Maggioni, James M. Murphy

Abstract: This paper proposes and analyzes a novel clustering algorithm that combines graph-based diffusion geometry with techniques based on density and mode estimation. The proposed method is suitable for data generated from mixtures of distributions with densities that are both multimodal and have nonlinear shapes. A crucial aspect of this algorithm is the use of time of a data-adapted diffusion process… ▽ More This paper proposes and analyzes a novel clustering algorithm that combines graph-based diffusion geometry with techniques based on density and mode estimation. The proposed method is suitable for data generated from mixtures of distributions with densities that are both multimodal and have nonlinear shapes. A crucial aspect of this algorithm is the use of time of a data-adapted diffusion process as a scale parameter that is different from the local spatial scale parameter used in many clustering algorithms. We prove estimates for the behavior of diffusion distances with respect to this time parameter under a flexible nonparametric data model, identifying a range of times in which the mesoscopic equilibria of the underlying process are revealed, corresponding to a gap between within-cluster and between-cluster diffusion distances. These structures can be missed by the top eigenvectors of the graph Laplacian, commonly used in spectral clustering. This analysis is leveraged to prove sufficient conditions guaranteeing the accuracy of the proposed \emph{learning by unsupervised nonlinear diffusion (LUND)} procedure. We implement LUND and confirm its theoretical properties on illustrative datasets, demonstrating the theoretical and empirical advantages over both spectral clustering and density-based clustering techniques. △ Less

Submitted 29 December, 2018; v1 submitted 15 October, 2018; originally announced October 2018.

Comments: 40 Pages, 17 Figures

arXiv:1712.06206 [pdf, other]

Path-Based Spectral Clustering: Guarantees, Robustness to Outliers, and Fast Algorithms

Authors: Anna Little, Mauro Maggioni, James M. Murphy

Abstract: We consider the problem of clustering with the longest-leg path distance (LLPD) metric, which is informative for elongated and irregularly shaped clusters. We prove finite-sample guarantees on the performance of clustering with respect to this metric when random samples are drawn from multiple intrinsically low-dimensional clusters in high-dimensional space, in the presence of a large number of hi… ▽ More We consider the problem of clustering with the longest-leg path distance (LLPD) metric, which is informative for elongated and irregularly shaped clusters. We prove finite-sample guarantees on the performance of clustering with respect to this metric when random samples are drawn from multiple intrinsically low-dimensional clusters in high-dimensional space, in the presence of a large number of high-dimensional outliers. By combining these results with spectral clustering with respect to LLPD, we provide conditions under which the Laplacian eigengap statistic correctly determines the number of clusters for a large class of data sets, and prove guarantees on the labeling accuracy of the proposed algorithm. Our methods are quite general and provide performance guarantees for spectral clustering with any ultrametric. We also introduce an efficient, easy to implement approximation algorithm for the LLPD based on a multiscale analysis of adjacency graphs, which allows for the runtime of LLPD spectral clustering to be quasilinear in the number of data points. △ Less

Submitted 6 March, 2019; v1 submitted 17 December, 2017; originally announced December 2017.

Comments: 59 pages, 12 figures

arXiv:1709.01233 [pdf, other]

Supervised Dimensionality Reduction for Big Data

Authors: Joshua T. Vogelstein, Eric Bridgeford, Minh Tang, Da Zheng, Christopher Douville, Randal Burns, Mauro Maggioni

Abstract: To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences. Because sample sizes are typically orders of magnitude smaller than the dimensionality of these data, valid inferences require finding a low-dimensional representation tha… ▽ More To solve key biomedical problems, experimentalists now routinely measure millions or billions of features (dimensions) per sample, with the hope that data science techniques will be able to build accurate data-driven inferences. Because sample sizes are typically orders of magnitude smaller than the dimensionality of these data, valid inferences require finding a low-dimensional representation that preserves the discriminating information (e.g., whether the individual suffers from a particular disease). There is a lack of interpretable supervised dimensionality reduction methods that scale to millions of dimensions with strong statistical theoretical guarantees.We introduce an approach, XOX, to extending principal components analysis by incorporating class-conditional moment estimates into the low-dimensional projection. The simplest ver-sion, "Linear Optimal Low-rank" projection (LOL), incorporates the class-conditional means. We prove, and substantiate with both synthetic and real data benchmarks, that LOL and its generalizations in the XOX framework lead to improved data representations for subsequent classification, while maintaining computational efficiency and scalability. Using multiple brain imaging datasets consisting of >150 million features, and several genomics datasets with>500,000 features, LOL outperforms other scalable linear dimensionality reduction techniques in terms of accuracy, while only requiring a few minutes on a standard desktop computer. △ Less

Submitted 23 January, 2021; v1 submitted 5 September, 2017; originally announced September 2017.

Comments: 6 figures

arXiv:1611.01179 [pdf, other]

Adaptive Geometric Multiscale Approximations for Intrinsically Low-dimensional Data

Authors: Wenjing Liao, Mauro Maggioni

Abstract: We consider the problem of efficiently approximating and encoding high-dimensional data sampled from a probability distribution $ρ$ in $\mathbb{R}^D$, that is nearly supported on a $d$-dimensional set $\mathcal{M}$ - for example supported on a $d$-dimensional Riemannian manifold. Geometric Multi-Resolution Analysis (GMRA) provides a robust and computationally efficient procedure to construct low-d… ▽ More We consider the problem of efficiently approximating and encoding high-dimensional data sampled from a probability distribution $ρ$ in $\mathbb{R}^D$, that is nearly supported on a $d$-dimensional set $\mathcal{M}$ - for example supported on a $d$-dimensional Riemannian manifold. Geometric Multi-Resolution Analysis (GMRA) provides a robust and computationally efficient procedure to construct low-dimensional geometric approximations of $\mathcal{M}$ at varying resolutions. We introduce a thresholding algorithm on the geometric wavelet coefficients, leading to what we call adaptive GMRA approximations. We show that these data-driven, empirical approximations perform well, when the threshold is chosen as a suitable universal function of the number of samples $n$, on a wide variety of measures $ρ$, that are allowed to exhibit different regularity at different scales and locations, thereby efficiently encoding data from more complex measures than those supported on manifolds. These approximations yield a data-driven dictionary, together with a fast transform mapping data to coefficients, and an inverse of such a map. The algorithms for both the dictionary construction and the transforms have complexity $C n \log n$ with the constant linear in $D$ and exponential in $d$. Our work therefore establishes adaptive GMRA as a fast dictionary learning algorithm with approximation guarantees. We include several numerical experiments on both synthetic and real data, confirming our theoretical results and demonstrating the effectiveness of adaptive GMRA. △ Less

Submitted 18 July, 2017; v1 submitted 3 November, 2016; originally announced November 2016.

arXiv:1609.05148 [pdf, other]

doi 10.7554/eLife.41690

Discovering and Deciphering Relationships Across Disparate Data Modalities

Authors: Joshua T. Vogelstein, Eric Bridgeford, Qing Wang, Carey E. Priebe, Mauro Maggioni, Cencheng Shen

Abstract: Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets. While existing approaches can test whether two properties are related, they often require unfeasibly large sample sizes in real data scenarios, and do not provide any insight into how or… ▽ More Understanding the relationships between different properties of data, such as whether a connectome or genome has information about disease status, is becoming increasingly important in modern biological datasets. While existing approaches can test whether two properties are related, they often require unfeasibly large sample sizes in real data scenarios, and do not provide any insight into how or why the procedure reached its decision. Our approach, "Multiscale Graph Correlation" (MGC), is a dependence test that juxtaposes previously disparate data science techniques, including k-nearest neighbors, kernel methods (such as support vector machines), and multiscale analysis (such as wavelets). Other methods typically require double or triple the number samples to achieve the same statistical power as MGC in a benchmark suite including high-dimensional and nonlinear relationships - spanning polynomial (linear, quadratic, cubic), trigonometric (sinusoidal, circular, ellipsoidal, spiral), geometric (square, diamond, W-shape), and other functions, with dimensionality ranging from 1 to 1000. Moreover, MGC uniquely provides a simple and elegant characterization of the potentially complex latent geometry underlying the relationship, providing insight while maintaining computational efficiency. In several real data applications, including brain imaging and cancer genetics, MGC is the only method that can both detect the presence of a dependency and provide specific guidance for the next experiment and/or analysis to conduct. △ Less

Submitted 6 December, 2018; v1 submitted 16 September, 2016; originally announced September 2016.

Journal ref: eLife 8, e41690, 2019

arXiv:1509.07497 [pdf, other]

High Dimensional Data Modeling Techniques for Detection of Chemical Plumes and Anomalies in Hyperspectral Images and Movies

Authors: Yi, Wang, Guangliang Chen, Mauro Maggioni

Abstract: We briefly review recent progress in techniques for modeling and analyzing hyperspectral images and movies, in particular for detecting plumes of both known and unknown chemicals. For detecting chemicals of known spectrum, we extend the technique of using a single subspace for modeling the background to a "mixture of subspaces" model to tackle more complicated background. Furthermore, we use parti… ▽ More We briefly review recent progress in techniques for modeling and analyzing hyperspectral images and movies, in particular for detecting plumes of both known and unknown chemicals. For detecting chemicals of known spectrum, we extend the technique of using a single subspace for modeling the background to a "mixture of subspaces" model to tackle more complicated background. Furthermore, we use partial least squares regression on a resampled training set to boost performance. For the detection of unknown chemicals we view the problem as an anomaly detection problem, and use novel estimators with low-sampled complexity for intrinsically low-dimensional data in high-dimensions that enable us to model the "normal" spectra and detect anomalies. We apply these algorithms to benchmark data sets made available by the Automated Target Detection program co-funded by NSF, DTRA and NGA, and compare, when applicable, to current state-of-the-art algorithms, with favorable results. △ Less

Submitted 29 January, 2016; v1 submitted 24 September, 2015; originally announced September 2015.

arXiv:1506.03410 [pdf, other]

Sparse Projection Oblique Randomer Forests

Authors: Tyler M. Tomita, James Browne, Cencheng Shen, Jaewon Chung, Jesse L. Patsolic, Benjamin Falk, Jason Yim, Carey E. Priebe, Randal Burns, Mauro Maggioni, Joshua T. Vogelstein

Abstract: Decision forests, including Random Forests and Gradient Boosting Trees, have recently demonstrated state-of-the-art performance in a variety of machine learning settings. Decision forests are typically ensembles of axis-aligned decision trees; that is, trees that split only along feature dimensions. In contrast, many recent extensions to decision forests are based on axis-oblique splits. Unfortuna… ▽ More Decision forests, including Random Forests and Gradient Boosting Trees, have recently demonstrated state-of-the-art performance in a variety of machine learning settings. Decision forests are typically ensembles of axis-aligned decision trees; that is, trees that split only along feature dimensions. In contrast, many recent extensions to decision forests are based on axis-oblique splits. Unfortunately, these extensions forfeit one or more of the favorable properties of decision forests based on axis-aligned splits, such as robustness to many noise dimensions, interpretability, or computational efficiency. We introduce yet another decision forest, called "Sparse Projection Oblique Randomer Forests" (SPORF). SPORF uses very sparse random projections, i.e., linear combinations of a small subset of features. SPORF significantly improves accuracy over existing state-of-the-art algorithms on a standard benchmark suite for classification with >100 problems of varying dimension, sample size, and number of classes. To illustrate how SPORF addresses the limitations of both axis-aligned and existing oblique decision forest methods, we conduct extensive simulated experiments. SPORF typically yields improved performance over existing decision forests, while mitigating computational efficiency and scalability and maintaining interpretability. SPORF can easily be incorporated into other ensemble methods such as boosting to obtain potentially similar gains. △ Less

Submitted 3 October, 2019; v1 submitted 10 June, 2015; originally announced June 2015.

Comments: 31 pages; submitted to Journal of Machine Learning Research for review

MSC Class: 68T10 ACM Class: I.5.2

Journal ref: Journal of Machine Learning Research 21(104), 1-39, 2020

arXiv:1212.1143 [pdf, other]

Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning

Authors: Jake Bouvrie, Mauro Maggioni

Abstract: Many problems in sequential decision making and stochastic control often have natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homog… ▽ More Many problems in sequential decision making and stochastic control often have natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of sub-problems at different scales is automatically determined. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other and can lead to substantial improvements in convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems, where solutions of sub-tasks at different levels of the hierarchy may be amenable to transfer to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous statespaces. △ Less

Submitted 5 December, 2012; originally announced December 2012.

Comments: 86 pages, 15 figures

arXiv:1105.4924 [pdf, ps, other]

Multiscale Geometric Methods for Data Sets II: Geometric Multi-Resolution Analysis

Authors: William K. Allard, Guangliang Chen, Mauro Maggioni

Abstract: Data sets are often modeled as point clouds in $R^D$, for $D$ large. It is often assumed that the data has some interesting low-dimensional structure, for example that of a $d$-dimensional manifold $M$, with $d$ much smaller than $D$. When $M$ is simply a linear subspace, one may exploit this assumption for encoding efficiently the data by projecting onto a dictionary of $d$ vectors in $R^D$ (for… ▽ More Data sets are often modeled as point clouds in $R^D$, for $D$ large. It is often assumed that the data has some interesting low-dimensional structure, for example that of a $d$-dimensional manifold $M$, with $d$ much smaller than $D$. When $M$ is simply a linear subspace, one may exploit this assumption for encoding efficiently the data by projecting onto a dictionary of $d$ vectors in $R^D$ (for example found by SVD), at a cost $(n+D)d$ for $n$ data points. When $M$ is nonlinear, there are no "explicit" constructions of dictionaries that achieve a similar efficiency: typically one uses either random dictionaries, or dictionaries obtained by black-box optimization. In this paper we construct data-dependent multi-scale dictionaries that aim at efficient encoding and manipulating of the data. Their construction is fast, and so are the algorithms that map data points to dictionary coefficients and vice versa. In addition, data points are guaranteed to have a sparse representation in terms of the dictionary. We think of dictionaries as the analogue of wavelets, but for approximating point clouds rather than functions. △ Less

Submitted 7 September, 2011; v1 submitted 24 May, 2011; originally announced May 2011.

Comments: Re-formatted using AMS style

Showing 1–24 of 24 results for author: Maggioni, M