-
Covariate-dependent Graphical Model Estimation via Neural Networks with Statistical Guarantees
Authors:
Jiahe Lin,
Yikai Zhang,
George Michailidis
Abstract:
Graphical models are widely used in diverse application domains to model the conditional dependencies amongst a collection of random variables. In this paper, we consider settings where the graph structure is covariate-dependent, and investigate a deep neural network-based approach to estimate it. The method allows for flexible functional dependency on the covariate, and fits the data reasonably well in the absence of a Gaussianity assumption. Theoretical results with PAC guarantees are established for the method, under assumptions commonly used in an Empirical Risk Minimization framework. The performance of the proposed method is evaluated on several synthetic data settings and benchmarked against existing approaches. The method is further illustrated on real datasets involving data from neuroscience and finance, respectively, and produces interpretable results.
Submitted 22 April, 2025;
originally announced April 2025.
-
Neural Network-Based Change Point Detection for Large-Scale Time-Evolving Data
Authors:
Jialiang Geng,
George Michailidis
Abstract:
The paper studies the problem of detecting and locating change points in multivariate time-evolving data. The problem has a long history in statistics and signal processing, and various algorithms have been developed, primarily for simple parametric models. In this work, we focus on modeling the data through feed-forward neural networks and develop a detection strategy based on the following two-step procedure. In the first step, the neural network is trained over a prespecified window of the data, and its test error function is calibrated over another prespecified window. Then, the test error function is used over a moving window to identify the change point. Once a change point is detected, the procedure involving these two steps is repeated until all change points are identified. The proposed strategy yields consistent estimates for both the number and the locations of the change points under temporal dependence of the data-generating process. The effectiveness of the proposed strategy is illustrated on synthetic data sets, which provide insights on how to select the tuning parameters of the algorithm in practice, and on real data sets. Finally, we note that although the detection strategy is general and can work with different neural network architectures, the theoretical guarantees provided are specific to feed-forward neural architectures.
Submitted 12 March, 2025;
originally announced March 2025.
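As a rough illustration of the two-step procedure described in the abstract above, the following Python sketch fits a predictor on one window, calibrates its test error on the next window, and then slides a window forward until the error exceeds the calibrated level. It is only a sketch under stated assumptions: an ordinary least-squares line stands in for the feed-forward network, and the function name `detect_change_points`, the window sizes, and the "factor times the largest calibrated error" threshold are all hypothetical choices, not the authors' algorithm.

```python
import numpy as np

def detect_change_points(x, y, train=100, calib=50, window=20, factor=5.0):
    """Hypothetical sketch: fit on one window, calibrate the test error on the
    next, then slide a window until the error exceeds the calibrated level."""
    n, start, found = len(y), 0, []
    while start + train + calib + window <= n:
        a, b = start + train, start + train + calib
        # Step 1a: fit the predictor (least squares stands in for a neural net).
        design = np.column_stack([x[start:a], np.ones(train)])
        coef, *_ = np.linalg.lstsq(design, y[start:a], rcond=None)
        err = (y - (coef[0] * x + coef[1])) ** 2
        # Step 1b: calibrate using windowed mean errors on the held-out window
        # (assumed rule: a multiple of the largest calibrated windowed error).
        calib_means = np.convolve(err[a:b], np.ones(window) / window, mode="valid")
        threshold = factor * calib_means.max()
        # Step 2: slide a window forward and flag the first exceedance.
        hit = None
        for t in range(b, n - window + 1):
            if err[t:t + window].mean() > threshold:
                hit = t + window - 1  # last index of the flagged window
                break
        if hit is None:
            break
        found.append(hit)
        start = hit + 1  # repeat both steps on the data after the detected change
    return found
```

Calling `detect_change_points(x, y)` on two equal-length one-dimensional NumPy arrays returns a list of estimated change locations. In this sketch the thresholding rule is a crude heuristic chosen for readability; the consistency guarantees mentioned in the abstract concern the authors' calibration, not this stand-in.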
-
Deep Learning-based Approaches for State Space Models: A Selective Review
Authors:
Jiahe Lin,
George Michailidis
Abstract:
State-space models (SSMs) offer a powerful framework for dynamical system analysis, wherein the temporal dynamics of the system are assumed to be captured through the evolution of the latent states, which govern the values of the observations. This paper provides a selective review of recent advancements in deep neural network-based approaches for SSMs, and presents a unified perspective for discrete time deep state space models and continuous time ones such as latent neural Ordinary Differential and Stochastic Differential Equations. It starts with an overview of the classical maximum likelihood based approach for learning SSMs, reviews variational autoencoder as a general learning pipeline for neural network-based approaches in the presence of latent variables, and discusses in detail representative deep learning models that fall under the SSM framework. Very recent developments, where SSMs are used as standalone architectural modules for improving efficiency in sequence modeling, are also examined. Finally, examples involving mixed frequency and irregularly-spaced time series data are presented to demonstrate the advantage of SSMs in these settings.
Submitted 15 December, 2024;
originally announced December 2024.
-
A generalized Bayesian approach for high-dimensional robust regression with serially correlated errors and predictors
Authors:
Saptarshi Chakraborty,
Kshitij Khare,
George Michailidis
Abstract:
This paper introduces a loss-based generalized Bayesian methodology for high-dimensional robust regression with serially correlated errors and predictors. The proposed framework employs a novel scaled pseudo-Huber (SPH) loss function, which smooths the well-known Huber loss, effectively balancing quadratic ($\ell_2$) and absolute linear ($\ell_1$) loss behaviors. This flexibility enables the framework to accommodate both thin-tailed and heavy-tailed data efficiently. The generalized Bayesian approach constructs a working likelihood based on the SPH loss, facilitating efficient and stable estimation while providing rigorous uncertainty quantification for all model parameters. Notably, this approach allows formal statistical inference without requiring ad hoc tuning parameter selection while adaptively addressing a wide range of tail behavior in the errors. By specifying appropriate prior distributions for the regression coefficients--such as ridge priors for small or moderate-dimensional settings and spike-and-slab priors for high-dimensional settings--the framework ensures principled inference. We establish rigorous theoretical guarantees for accurate parameter estimation and correct predictor selection under sparsity assumptions for a wide range of data generating setups. Extensive simulation studies demonstrate the superior performance of our approach compared to traditional Bayesian regression methods based on $\ell_2$ and $\ell_1$-loss functions. The results highlight its flexibility and robustness, particularly in challenging high-dimensional settings characterized by data contamination.
Submitted 12 March, 2025; v1 submitted 7 December, 2024;
originally announced December 2024.
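To make the "balancing quadratic and absolute linear loss" remark in the abstract above concrete, the sketch below evaluates the textbook pseudo-Huber loss, $L_\delta(r) = \delta^2\left(\sqrt{1 + (r/\delta)^2} - 1\right)$, which is approximately $r^2/2$ for small residuals and approximately $\delta |r|$ for large ones. This is the standard form only; the exact scaling used in the paper's SPH loss is not given in the abstract and may differ. The function name `pseudo_huber` and the test values are illustrative assumptions.

```python
import numpy as np

def pseudo_huber(r, delta=1.0):
    """Textbook pseudo-Huber loss; the paper's scaled variant (SPH) may differ."""
    r = np.asarray(r, dtype=float)
    return delta**2 * (np.sqrt(1.0 + (r / delta) ** 2) - 1.0)

small, large = 1e-3, 1e3
# Near zero the loss is close to the quadratic r^2 / 2 (ratio close to 1).
print(pseudo_huber(small) / (0.5 * small**2))
# Far from zero it grows linearly, close to delta * |r| (ratio close to 1).
print(pseudo_huber(large) / large)
```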
-
A VAE-based Framework for Learning Multi-Level Neural Granger-Causal Connectivity
Authors:
Jiahe Lin,
Huitian Lei,
George Michailidis
Abstract:
Granger causality has been widely used in various application domains to capture lead-lag relationships amongst the components of complex dynamical systems, and the focus in extant literature has been on a single dynamical system. In certain applications in macroeconomics and neuroscience, one has access to data from a collection of related such systems, wherein the modeling task of interest is to extract the shared common structure that is embedded across them, as well as to identify the idiosyncrasies within individual ones. This paper introduces a Variational Autoencoder (VAE) based framework that jointly learns Granger-causal relationships amongst components in a collection of related-yet-heterogeneous dynamical systems, and handles the aforementioned task in a principled way. The performance of the proposed framework is evaluated on several synthetic data settings and benchmarked against existing approaches designed for individual system learning. The method is further illustrated on a real dataset involving time series data from a neurophysiological experiment and produces interpretable results.
Submitted 25 February, 2024;
originally announced February 2024.
-
A Functional Coefficients Network Autoregressive Model
Authors:
Hang Yin,
Abolfazl Safikhani,
George Michailidis
Abstract:
The paper introduces a flexible model for the analysis of multivariate nonlinear time series data. The proposed Functional Coefficients Network Autoregressive (FCNAR) model considers the response of each node in the network to depend in a nonlinear fashion on its own past values (autoregressive component), as well as on past values of each neighbor (network component). Key issues of model stability/stationarity, together with model parameter identifiability, estimation and inference, are addressed for error processes that can be heavier-tailed than Gaussian, for both a fixed and a growing number of network nodes. The performance of the estimators for the FCNAR model is assessed on synthetic data and the applicability of the model is illustrated on multiple indicators of air pollution data.
Submitted 11 February, 2024;
originally announced February 2024.
-
Structural Discovery with Partial Ordering Information for Time-Dependent Data with Convergence Guarantees
Authors:
Jiahe Lin,
Huitian Lei,
George Michailidis
Abstract:
Structural discovery amongst a set of variables is of interest in both static and dynamic settings. In the presence of lead-lag dependencies in the data, the dynamics of the system can be represented through a structural equation model (SEM) that simultaneously captures the contemporaneous and temporal relationships amongst the variables, with the former encoded through a directed acyclic graph (DAG) for model identification. In many real applications, a partial ordering amongst the nodes of the DAG is available, which makes it either beneficial or imperative to incorporate it as a constraint in the problem formulation. This paper develops an algorithm that can seamlessly incorporate a priori partial ordering information for solving a linear SEM (also known as Structural Vector Autoregression) under a high-dimensional setting. The proposed algorithm is provably convergent to a stationary point, and exhibits competitive performance on both synthetic and real data sets.
Submitted 26 November, 2023;
originally announced November 2023.
-
Low Tree-Rank Bayesian Vector Autoregression Model
Authors:
Leo L. Duan,
Zeyu Yuwen,
George Michailidis,
Zhengwu Zhang
Abstract:
Vector autoregression has been widely used for modeling and analysis of multivariate time series data. In high-dimensional settings, model parameter regularization schemes inducing sparsity yield interpretable models and achieve good forecasting performance. However, in many data applications, such as those in neuroscience, the Granger causality graph estimates from existing vector autoregression methods tend to be quite dense and difficult to interpret, unless one compromises on the goodness-of-fit. To address this issue, this paper proposes to incorporate a commonly used structural assumption -- that the ground-truth graph should be largely connected, in the sense that it should only contain at most a few components. We take a Bayesian approach and develop a novel tree-rank prior distribution for the regression coefficients. Specifically, this prior distribution forces the non-zero coefficients to appear only on the union of a few spanning trees. Since each spanning tree connects $p$ nodes with only $(p-1)$ edges, it effectively achieves both high connectivity and high sparsity. We develop a computationally efficient Gibbs sampler that is scalable to large sample size and high dimension. In analyzing test-retest functional magnetic resonance imaging data, our model produces a much more interpretable graph estimate, compared to popular existing approaches. In addition, we show appealing properties of this new method, such as efficient computation, mild stability conditions and posterior consistency.
Submitted 6 June, 2023; v1 submitted 4 April, 2022;
originally announced April 2022.
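The sparsity-plus-connectivity argument in the abstract above is simple edge counting: a union of $k$ spanning trees on $p$ nodes has at most $k(p-1)$ edges out of $p(p-1)/2$ possible, yet is always connected. The sketch below checks this numerically. It is an illustration only: the trees are drawn by a simple random-attachment scheme (not uniformly, and not from the paper's tree-rank prior), and the helper names `random_spanning_tree` and `is_connected` and the values of `p` and `k` are assumptions.

```python
import random

def random_spanning_tree(p, rng):
    """Random (not uniformly sampled) spanning tree on nodes 0..p-1."""
    order = list(range(p))
    rng.shuffle(order)
    # Attach each node to a randomly chosen node that appears earlier in the order.
    return {frozenset((order[i], order[rng.randrange(i)])) for i in range(1, p)}

def is_connected(p, edges):
    """Check connectivity with a depth-first search from node 0."""
    adj = {v: set() for v in range(p)}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == p

rng = random.Random(0)
p, k = 50, 3
# Support of the non-zero coefficients: the union of k spanning trees.
support = set().union(*(random_spanning_tree(p, rng) for _ in range(k)))
print(len(support), "edges used out of", p * (p - 1) // 2, "possible")
print("connected:", is_connected(p, support))
```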
-
Multivariate Analysis for Multiple Network Data via Semi-Symmetric Tensor PCA
Authors:
Michael Weylandt,
George Michailidis
Abstract:
Network data are commonly collected in a variety of applications, representing either directly measured or statistically inferred connections between features of interest. In an increasing number of domains, these networks are collected over time, such as interactions between users of a social media platform on different days, or across multiple subjects, such as in multi-subject studies of brain connectivity. When analyzing multiple large networks, dimensionality reduction techniques are often used to embed networks in a more tractable low-dimensional space. To this end, we develop a framework for principal components analysis (PCA) on collections of networks via a specialized tensor decomposition we term Semi-Symmetric Tensor PCA or SS-TPCA. We derive computationally efficient algorithms for computing our proposed SS-TPCA decomposition and establish statistical efficiency of our approach under a standard low-rank signal plus noise model. Remarkably, we show that SS-TPCA achieves the same estimation accuracy as classical matrix PCA, with error proportional to the square root of the number of vertices in the network and not the number of edges as might be expected. Our framework inherits many of the strengths of classical PCA and is suitable for a wide range of unsupervised learning tasks, including identifying principal networks, isolating meaningful changepoints or outlying observations, and for characterizing the "variability network" of the most varying edges. Finally, we demonstrate the effectiveness of our proposal on simulated data and on an example from empirical legal studies. The techniques used to establish our main consistency results are surprisingly straightforward and may find use in a variety of other network analysis problems.
Submitted 2 September, 2022; v1 submitted 9 February, 2022;
originally announced February 2022.
-
A generalized likelihood based Bayesian approach for scalable joint regression and covariance selection in high dimensions
Authors:
Srijata Samanta,
Kshitij Khare,
George Michailidis
Abstract:
The paper addresses joint sparsity selection in the regression coefficient matrix and the error precision (inverse covariance) matrix for high-dimensional multivariate regression models in the Bayesian paradigm. The selected sparsity patterns are crucial to help understand the network of relationships between the predictor and response variables, as well as the conditional relationships among the latter. While Bayesian methods have the advantage of providing natural uncertainty quantification through posterior inclusion probabilities and credible intervals, current Bayesian approaches either restrict to specific sub-classes of sparsity patterns and/or are not scalable to settings with hundreds of responses and predictors. Bayesian approaches which only focus on estimating the posterior mode are scalable, but do not generate samples from the posterior distribution for uncertainty quantification. Using a bi-convex regression based generalized likelihood and spike-and-slab priors, we develop an algorithm called Joint Regression Network Selector (JRNS) for joint regression and covariance selection which (a) can accommodate general sparsity patterns, (b) provides posterior samples for uncertainty quantification, and (c) is scalable and orders of magnitude faster than the state-of-the-art Bayesian approaches providing uncertainty quantification. We demonstrate the statistical and computational efficacy of the proposed approach on synthetic data and through the analysis of selected cancer data sets. We also establish high-dimensional posterior consistency for one of the developed algorithms.
Submitted 14 January, 2022;
originally announced January 2022.
-
Hybrid Modeling of Regional COVID-19 Transmission Dynamics in the U.S
Authors:
Yue Bai,
Abolfazl Safikhani,
George Michailidis
Abstract:
The fast transmission rate of COVID-19 worldwide has made this virus the most important challenge of the year 2020. Many mitigation policies have been imposed by governments at different regional levels (country, state, county, and city) to stop the spread of this virus. Quantifying the effect of such mitigation strategies on the transmission and recovery rates, and predicting the rate of new daily cases, are two crucial tasks. In this paper, we propose a hybrid modeling framework which not only accounts for such policies but also utilizes the spatial and temporal information to characterize the pattern of COVID-19 progression. Specifically, a piecewise susceptible-infected-recovered (SIR) model is developed, in which the dates at which the transmission/recovery rates change significantly are defined as "break points". A novel, data-driven algorithm is designed to locate the break points using ideas from fused lasso and thresholding. In order to enhance the forecasting power and to describe additional temporal dependence among the daily number of cases, this model is further coupled with spatial smoothing covariates and a vector auto-regressive (VAR) model. The proposed model is applied to several U.S. states and counties, and the results confirm the effect of "stay-at-home orders" and some states' early "re-openings" by detecting break points close to such events. Further, the model provided satisfactory short-term forecasts of the number of new daily cases at regional levels by utilizing the estimated spatio-temporal covariance structures. These forecasts were also better than or on par with those of other proposed models in the literature, including flexible deep learning ones. Finally, selected theoretical results and the empirical performance of the proposed methodology on synthetic data are reported, which justify the good performance of the proposed method.
Submitted 21 December, 2021;
originally announced December 2021.
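The piecewise SIR idea in the abstract above can be sketched as a discrete-time SIR recursion whose transmission and recovery rates are constant between break points and switch at them. The code below only simulates such a model; it does not implement the break-point detection (fused lasso and thresholding), the spatial covariates, or the VAR component. The function name `piecewise_sir`, the use of population fractions, and all parameter values are hypothetical.

```python
import numpy as np

def piecewise_sir(s0, i0, r0, betas, gammas, breaks, n_days):
    """Discrete-time SIR on population fractions; rates switch at the break days."""
    s, i, r = [s0], [i0], [r0]
    for t in range(n_days):
        # Index of the segment active on day t (segment j starts at breaks[j-1]).
        seg = int(np.searchsorted(breaks, t, side="right"))
        new_inf = betas[seg] * s[-1] * i[-1]   # new infections on day t
        new_rec = gammas[seg] * i[-1]          # new recoveries on day t
        s.append(s[-1] - new_inf)
        i.append(i[-1] + new_inf - new_rec)
        r.append(r[-1] + new_rec)
    return np.array(s), np.array(i), np.array(r)

# Illustrative values only: transmission drops at day 30 and partly rebounds at day 90.
s, i, r = piecewise_sir(0.99, 0.01, 0.0,
                        betas=[0.30, 0.08, 0.15], gammas=[0.10, 0.10, 0.10],
                        breaks=[30, 90], n_days=150)
print("peak infected fraction:", i.max(), "on day", int(i.argmax()))
```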
-
Joint Learning of Linear Time-Invariant Dynamical Systems
Authors:
Aditya Modi,
Mohamad Kazem Shirani Faradonbeh,
Ambuj Tewari,
George Michailidis
Abstract:
Linear time-invariant systems are very popular models in system theory and applications. A fundamental problem in system identification that remains rather unaddressed in extant literature is to leverage commonalities amongst related linear systems to estimate their transition matrices more accurately. To address this problem, the current paper investigates methods for jointly estimating the transition matrices of multiple systems. It is assumed that the transition matrices are unknown linear functions of some unknown shared basis matrices. We establish finite-time estimation error rates that fully reflect the roles of trajectory lengths, dimension, and number of systems under consideration. The presented results are fairly general and show the significant gains that can be achieved by pooling data across systems in comparison to learning each system individually. Further, they are shown to be robust against model misspecifications. To obtain the results, we develop novel techniques that are of interest for addressing similar joint-learning problems. They include tightly bounding estimation errors in terms of the eigen-structures of transition matrices, establishing sharp high probability bounds for singular values of dependent random matrices, and capturing effects of misspecified transition matrices as the systems evolve over time.
Submitted 2 January, 2024; v1 submitted 20 December, 2021;
originally announced December 2021.
-
A General Modeling Framework for Network Autoregressive Processes
Authors:
Hang Yin,
Abolfazl Safikhani,
George Michailidis
Abstract:
The paper develops a general flexible framework for Network Autoregressive Processes (NAR), wherein the response of each node linearly depends on its past values, a prespecified linear combination of neighboring nodes and a set of node-specific covariates. The corresponding coefficients are node-specific, while the framework can accommodate heavier than Gaussian errors with both spatial-autoregressive and factor-based covariance structures. We provide a sufficient condition that ensures the stability (stationarity) of the underlying NAR that is significantly weaker than its counterparts in previous work in the literature. Further, we develop ordinary and generalized least squares estimators for both a fixed, as well as a diverging number of network nodes, and also provide their ridge regularized counterparts that exhibit better performance in large network settings, together with their asymptotic distributions. We also address the issue of misspecifying the network connectivity and its impact on the aforementioned asymptotic distributions of the various NAR parameter estimators. The framework is illustrated on both synthetic and real air pollution data.
Submitted 18 October, 2021;
originally announced October 2021.
-
High Dimensional Logistic Regression Under Network Dependence
Authors:
Somabha Mukherjee,
Ziang Niu,
Sagnik Halder,
Bhaswar B. Bhattacharya,
George Michailidis
Abstract:
Logistic regression is a key method for modeling the probability of a binary outcome based on a collection of covariates. However, the classical formulation of logistic regression relies on the independent sampling assumption, which is often violated when the outcomes interact through an underlying network structure, such as over a temporal/spatial domain or on a social network. This necessitates the development of models that can simultaneously handle both the network `peer-effect' and the effect of high-dimensional covariates. In this paper, we develop a framework for incorporating such dependencies in a high-dimensional logistic regression model by introducing a quadratic interaction term, as in the Ising model, designed to capture the pairwise interactions from the underlying network. The resulting model can also be viewed as an Ising model, where the node-dependent external fields linearly encode the high-dimensional covariates. We propose a penalized maximum pseudo-likelihood method for estimating the network peer-effect and the effect of the covariates (the regression coefficients), which, in addition to handling the high-dimensionality of the parameters, conveniently avoids the computational intractability of the maximum likelihood approach. Under various standard regularity conditions, we show that the corresponding estimate attains the classical high-dimensional rate of consistency. Our results imply that even under network dependence it is possible to consistently estimate the model parameters at the same rate as in classical (independent) logistic regression, when the true parameter is sparse and the underlying network is not too dense. We also develop an efficient algorithm for computing the estimates and validate our theoretical results in numerical experiments. An application to selecting genes in clustering spatial transcriptomics data is also discussed.
Submitted 24 September, 2024; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Multiple Change Point Detection in Reduced Rank High Dimensional Vector Autoregressive Models
Authors:
Peiliang Bai,
Abolfazl Safikhani,
George Michailidis
Abstract:
We study the problem of detecting and locating change points in high-dimensional Vector Autoregressive (VAR) models, whose transition matrices exhibit low rank plus sparse structure. We first address the problem of detecting a single change point using an exhaustive search algorithm and establish a finite sample error bound for its accuracy. Next, we extend the results to the case of multiple change points, whose number can grow as a function of the sample size. Their detection is based on a two-step algorithm: in the first step, an exhaustive search for a candidate change point is employed over overlapping windows, and subsequently, a backward elimination procedure is used to screen out redundant candidates. The two-step strategy yields consistent estimates of the number and the locations of the change points. To reduce computation cost, we also investigate conditions under which a surrogate VAR model with a weakly sparse transition matrix can accurately estimate the change points and their locations for data generated by the original model. This work also addresses and resolves a number of novel technical challenges posed by the nature of the VAR models under consideration. The effectiveness of the proposed algorithms and methodology is illustrated on both synthetic and two real data sets.
Submitted 29 September, 2021;
originally announced September 2021.
-
A Fast Detection Method of Break Points in Effective Connectivity Networks
Authors:
Peiliang Bai,
Abolfazl Safikhani,
George Michailidis
Abstract:
There is increasing interest in identifying changes in the underlying states of brain networks. The availability of large scale neuroimaging data creates a strong need to develop fast, scalable methods for detecting and localizing in time such changes and also identify their drivers, thus enabling neuroscientists to hypothesize about potential mechanisms. This paper presents a fast method for detecting break points in exceedingly long time series neuroimaging data, based on vector autoregressive (Granger causal) models. It uses a multi-step strategy based on a regularized objective function that leads to fast identification of candidate break points, followed by clustering steps to select the final set of break points and subsequent estimation with false positives control of the underlying Granger causal networks. The latter provides insights into key changes in network connectivity that led to the presence of break points. The proposed methodology is illustrated on synthetic data varying in their length, dimensionality, number of break points, strength of the signal, and also applied to EEG data related to visual tasks.
Submitted 8 January, 2022; v1 submitted 29 September, 2021;
originally announced September 2021.
-
Inference for Change Points in High Dimensional Mean Shift Models
Authors:
Abhishek Kaul,
George Michailidis
Abstract:
We consider the problem of constructing confidence intervals for the locations of change points in a high-dimensional mean shift model. To that end, we develop a locally refitted least squares estimator and obtain component-wise and simultaneous rates of estimation of the underlying change points. The simultaneous rate is the sharpest available in the literature by at least a factor of $\log p,$ while the component-wise one is optimal. These results enable existence of limiting distributions. Component-wise distributions are characterized under both vanishing and non-vanishing jump size regimes, while joint distributions for any finite subset of change point estimates are characterized under the latter regime, which also yields asymptotic independence of these estimates. The combined results are used to construct asymptotically valid component-wise and simultaneous confidence intervals for the change point parameters. The results are established under a high dimensional scaling, allowing for diminishing jump sizes, in the presence of diverging number of change points and under subexponential errors. They are illustrated on synthetic data and on sensor measurements from smartphones for activity recognition.
Submitted 19 July, 2021;
originally announced July 2021.
-
A Decentralized Adaptive Momentum Method for Solving a Class of Min-Max Optimization Problems
Authors:
Babak Barazandeh,
Tianjian Huang,
George Michailidis
Abstract:
Min-max saddle point games have recently been intensely studied, due to their wide range of applications, including training Generative Adversarial Networks (GANs). However, most of the recent efforts for solving them are limited to special regimes such as convex-concave games. Further, it is customarily assumed that the underlying optimization problem is solved either by a single machine or in the case of multiple machines connected in a centralized fashion, wherein each one communicates with a central node. The latter approach becomes challenging when the underlying communications network has low bandwidth. In addition, privacy considerations may dictate that certain nodes can communicate only with a subset of other nodes. Hence, it is of interest to develop methods that solve min-max games in a decentralized manner. To that end, we develop a decentralized adaptive momentum (ADAM)-type algorithm for solving min-max optimization problems under the condition that the objective function satisfies a Minty Variational Inequality condition, which is a generalization of the convex-concave case. The proposed method overcomes shortcomings of recent non-adaptive gradient-based decentralized algorithms for min-max optimization problems that do not perform well in practice and require careful tuning. In this paper, we obtain non-asymptotic rates of convergence of the proposed algorithm (coined DADAM$^3$) for finding a (stochastic) first-order Nash equilibrium point and subsequently evaluate its performance on training GANs. The extensive empirical evaluation shows that DADAM$^3$ outperforms recently developed methods, including decentralized optimistic stochastic gradient for solving such min-max problems.
Submitted 28 June, 2021; v1 submitted 10 June, 2021;
originally announced June 2021.
-
Multiple Change Point Detection in Structured VAR Models: the VARDetect R Package
Authors:
Peiliang Bai,
Yue Bai,
Abolfazl Safikhani,
George Michailidis
Abstract:
Vector Auto-Regressive (VAR) models capture lead-lag temporal dynamics of multivariate time series data. They have been widely used in macroeconomics, financial econometrics, neuroscience and functional genomics. In many applications, the data exhibit structural changes in their autoregressive dynamics, which correspond to changes in the transition matrices of the VAR model that specify such dynamics. We present the R package VARDetect that implements two classes of algorithms to detect multiple change points in piecewise stationary VAR models. The first exhibits sublinear computational complexity in the number of time points and is best suited for structured sparse models, while the second exhibits linear time complexity and is designed for models whose transition matrices are assumed to have a low rank plus sparse decomposition. The package also has functions to generate data from the various variants of VAR models discussed, which is useful in simulation studies, as well as to visualize the results through network layouts.
Submitted 13 October, 2021; v1 submitted 23 May, 2021;
originally announced May 2021.
-
Solving a class of non-convex min-max games using adaptive momentum methods
Authors:
Babak Barazandeh,
Davoud Ataee Tarzanagh,
George Michailidis
Abstract:
Adaptive momentum methods have recently attracted a lot of attention for training of deep neural networks. They use an exponential moving average of past gradients of the objective function to update both search directions and learning rates. However, these methods are not suited for solving min-max optimization problems that arise in training generative adversarial networks. In this paper, we propose an adaptive momentum min-max algorithm that generalizes adaptive momentum methods to the non-convex min-max regime. Further, we establish non-asymptotic rates of convergence for the proposed algorithm when used in a reasonably broad class of non-convex min-max optimization problems. Experimental results illustrate its superior performance vis-a-vis benchmark methods for solving such problems.
Submitted 26 April, 2021;
originally announced April 2021.
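The exponential-moving-average mechanism mentioned in the abstract above can be illustrated with a minimal sketch of a generic Adam-style step for a scalar minimization problem. This is not the min-max algorithm proposed in the paper; the function names (`adam_step`, `minimize_quadratic`), the step size, and the decay rates are assumptions chosen only for illustration.

```python
import math


def adam_step(x, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One generic Adam-style update for a scalar parameter (illustrative only)."""
    # Exponential moving average of past gradients: sets the search direction.
    m = beta1 * m + (1 - beta1) * grad
    # Exponential moving average of squared gradients: scales the learning rate.
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias corrections compensate for initializing m and v at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    x = x - lr * m_hat / (math.sqrt(v_hat) + eps)
    return x, m, v


def minimize_quadratic(x0=5.0, steps=200):
    """Run the update on f(x) = x^2, whose gradient is 2x (hypothetical test problem)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        x, m, v = adam_step(x, 2.0 * x, m, v, t)
    return x
```

On this toy problem the iterate moves from the starting point toward the minimizer at zero. Handling the min-max setting requires additional ideas that this sketch does not attempt to reproduce.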
-
Sparse Partial Least Squares for Coarse Noisy Graph Alignment
Authors:
Michael Weylandt,
George Michailidis,
T. Mitchell Roddenberry
Abstract:
Graph signal processing (GSP) provides a powerful framework for analyzing signals arising in a variety of domains. In many applications of GSP, multiple network structures are available, each of which captures different aspects of the same underlying phenomenon. To integrate these different data sources, graph alignment techniques attempt to find the best correspondence between vertices of two graphs. We consider a generalization of this problem, where there is no natural one-to-one mapping between vertices, but where there is correspondence between the community structures of each graph. Because we seek to learn structure at this higher community level, we refer to this problem as "coarse" graph alignment. To this end, we propose a novel regularized partial least squares method which both incorporates the observed graph structures and imposes sparsity in order to reflect the underlying block community structure. We provide efficient algorithms for our method and demonstrate its effectiveness in simulations.
Submitted 6 April, 2021;
originally announced April 2021.
-
Automatic Registration and Clustering of Time Series
Authors:
Michael Weylandt,
George Michailidis
Abstract:
Clustering of time series data exhibits a number of challenges not present in other settings, notably the problem of registration (alignment) of observed signals. Typical approaches include pre-registration to a user-specified template or time warping approaches which attempt to optimally align series with a minimum of distortion. For many signals obtained from recording or sensing devices, these methods may be unsuitable as a template signal is not available for pre-registration, while the distortion of warping approaches may obscure meaningful temporal information. We propose a new method for automatic time series alignment within a clustering problem. Our approach, Temporal Registration using Optimal Unitary Transformations (TROUT), is based on a novel dissimilarity measure between time series that is easy to compute and automatically identifies optimal alignment between pairs of time series. By embedding our new measure in an optimization formulation, we retain well-known advantages of computational and statistical performance. We provide an efficient algorithm for TROUT-based clustering and demonstrate its superior performance over a range of competitors.
Submitted 10 February, 2021; v1 submitted 8 December, 2020;
originally announced December 2020.
-
A semi-parametric model for target localization in distributed systems
Authors:
Rohit K. Patra,
Moulinath Banerjee,
George Michailidis
Abstract:
Distributed systems serve as a key technological infrastructure for monitoring diverse systems across space and time. Examples of their widespread applications include precision agriculture, surveillance, ecosystem and physical infrastructure monitoring, animal behavior and tracking, and disaster response and recovery, to name a few. Such systems comprise a large number of sensor devices at fixed locations, wherein each individual sensor obtains measurements that are subsequently fused and processed at a central processing node. A key problem for such systems is to detect targets and identify their locations, for which a large body of literature has been developed focusing primarily on employing parametric models for signal attenuation from target to device. In this paper, we adopt a nonparametric approach that only assumes that the signal is nonincreasing as a function of the distance between the sensor and the target. We propose a simple tuning-parameter-free estimator for the target location, namely, the simple score estimator (SSCE). We show that the SSCE is $\sqrt{n}$-consistent and has a Gaussian limit distribution which can be used to construct asymptotic confidence regions for the location of the target. We study the performance of the SSCE through extensive simulations, and finally demonstrate an application to target detection in a video surveillance data set.
Submitted 3 December, 2020;
originally announced December 2020.
-
Inference on the Change Point for High Dimensional Dynamic Graphical Models
Authors:
Abhishek Kaul,
Hongjin Zhang,
Konstantinos Tsampourakis,
George Michailidis
Abstract:
We develop an estimator for the change point parameter for a dynamically evolving graphical model, and also obtain its asymptotic distribution under high dimensional scaling. To procure the latter result, we establish that the proposed estimator exhibits an $O_p(ψ^{-2})$ rate of convergence, wherein $ψ$ represents the jump size between the graphical model parameters before and after the change point. Further, it retains sufficient adaptivity against plug-in estimates of the graphical model parameters. We characterize the forms of the asymptotic distribution under both a vanishing and a non-vanishing regime of the magnitude of the jump size. Specifically, in the former case it corresponds to the argmax of a negative drift asymmetric two sided Brownian motion, while in the latter case to the argmax of a negative drift asymmetric two sided random walk, whose increments depend on the distribution of the graphical model. Easy to implement algorithms are provided for estimating the change point and their performance is assessed on synthetic data. The proposed methodology is further illustrated on RNA-sequenced microbiome data and their changes between young and older individuals.
Submitted 21 February, 2021; v1 submitted 19 May, 2020;
originally announced May 2020.
-
Adaptive First-and Zeroth-order Methods for Weakly Convex Stochastic Optimization Problems
Authors:
Parvin Nazari,
Davoud Ataee Tarzanagh,
George Michailidis
Abstract:
In this paper, we design and analyze a new family of adaptive subgradient methods for solving an important class of weakly convex (possibly nonsmooth) stochastic optimization problems. Adaptive methods that use exponential moving averages of past gradients to update search directions and learning rates have recently attracted a lot of attention for solving optimization problems that arise in machine learning. Nevertheless, their convergence analysis almost exclusively requires smoothness and/or convexity of the objective function. In contrast, we establish non-asymptotic rates of convergence of first and zeroth-order adaptive methods and their proximal variants for a reasonably broad class of nonsmooth and nonconvex optimization problems. Experimental results indicate how the proposed algorithms empirically outperform stochastic gradient descent and its zeroth-order variant for solving such optimization problems.
Submitted 24 May, 2020; v1 submitted 19 May, 2020;
originally announced May 2020.
-
B-CONCORD -- A scalable Bayesian high-dimensional precision matrix estimation procedure
Authors:
Peyman Jalali,
Kshitij Khare,
George Michailidis
Abstract:
Sparse estimation of the precision matrix under high-dimensional scaling constitutes a canonical problem in statistics and machine learning. Numerous regression and likelihood based approaches, many frequentist and some Bayesian in nature, have been developed. Bayesian methods provide direct uncertainty quantification of the model parameters through the posterior distribution and thus do not require a second round of computations for obtaining debiased estimates of the model parameters and their confidence intervals. However, they are computationally expensive for settings involving more than 500 variables. To that end, we develop B-CONCORD for the problem at hand, a Bayesian analogue of the CONvex CORrelation selection methoD (CONCORD) introduced by Khare et al. (2015). B-CONCORD leverages the CONCORD generalized likelihood function together with a spike-and-slab prior distribution to induce sparsity in the precision matrix parameters. We establish model selection and estimation consistency under high-dimensional scaling; further, we develop a procedure that refits only the non-zero parameters of the precision matrix, leading to significant improvements in the estimates in finite samples. Extensive numerical work illustrates the computational scalability of the proposed approach vis-a-vis competing Bayesian methods, as well as its accuracy.
Submitted 18 May, 2020;
originally announced May 2020.
-
Online detection of local abrupt changes in high-dimensional Gaussian graphical models
Authors:
Hossein Keshavarz,
George Michailidis
Abstract:
The problem of identifying change points in high-dimensional Gaussian graphical models (GGMs) in an online fashion is of interest, due to new applications in biology, economics and social sciences. The offline version of the problem, where all the data are a priori available, has led to a number of methods and associated algorithms involving regularized loss functions. However, for the online version, there is currently only a single work in the literature that develops a sequential testing procedure and also studies its asymptotic false alarm probability and power. The latter test is best suited for the detection of change points driven by global changes in the structure of the precision matrix of the GGM, in the sense that many edges are involved. Nevertheless, in many practical settings the change point is driven by local changes, in the sense that only a small number of edges exhibit changes. To that end, we develop a novel test to address this problem that is based on the $\ell_\infty$ norm of the normalized covariance matrix of an appropriately selected portion of incoming data. The study of the asymptotic distribution of the proposed test statistic under the null (no presence of a change point) and the alternative (presence of a change point) hypotheses requires new technical tools that examine maxima of graph-dependent Gaussian random variables, and that are of independent interest. It is further shown that these tools lead to the imposition of mild regularity conditions for key model parameters, instead of more stringent ones required by leveraging previously used tools in related problems in the literature. Numerical work on synthetic data illustrates the good performance of the proposed detection procedure both in terms of computational and statistical efficiency across numerous experimental settings.
Submitted 15 March, 2020;
originally announced March 2020.
-
Regularized Estimation of High-dimensional Factor-Augmented Vector Autoregressive (FAVAR) Models
Authors:
Jiahe Lin,
George Michailidis
Abstract:
A factor-augmented vector autoregressive (FAVAR) model is defined by a VAR equation that captures lead-lag correlations amongst a set of observed variables $X$ and latent factors $F$, and a calibration equation that relates another set of observed variables $Y$ with $F$ and $X$. The latter equation is used to estimate the factors that are subsequently used in estimating the parameters of the VAR system. The FAVAR model has become popular in applied economic research, since it can summarize a large number of variables of interest as a few factors through the calibration equation and subsequently examine their influence on core variables of primary interest through the VAR equation. However, there is increasing need for examining lead-lag relationships between a large number of time series, while incorporating information from another high-dimensional set of variables. Hence, in this paper we investigate the FAVAR model under high-dimensional scaling. We introduce an appropriate identification constraint for the model parameters, which when incorporated into the formulated optimization problem yields estimates with good statistical properties. Further, we address a number of technical challenges introduced by the fact that estimates of the VAR system model parameters are based on estimated rather than directly observed quantities. The performance of the proposed estimators is evaluated on synthetic data. Further, the model is applied to commodity prices and reveals interesting and interpretable relationships between the prices and the factors extracted from a set of global macroeconomic indicators.
Submitted 31 May, 2020; v1 submitted 9 December, 2019;
originally announced December 2019.
-
Approximate Factor Models with Strongly Correlated Idiosyncratic Errors
Authors:
Jiahe Lin,
George Michailidis
Abstract:
We consider the estimation of approximate factor models for time series data, where strong serial and cross-sectional correlations amongst the idiosyncratic component are present. This setting comes up naturally in many applications, but existing approaches in the literature rely on the assumption that such correlations are weak, leading to mis-specification of the number of factors selected and consequently inaccurate inference. In this paper, we explicitly incorporate the dependent structure present in the idiosyncratic component through lagged values of the observed multivariate time series. We formulate a constrained optimization problem to estimate the factor space and the transition matrices of the lagged values {\em simultaneously}, wherein the constraints reflect the low rank nature of the common factors and the sparsity of the transition matrices. We establish theoretical properties of the obtained estimates, and introduce an easy-to-implement computational procedure for empirical work. The performance of the model and the implementation procedure is evaluated on synthetic data and compared with competing approaches, and further illustrated on a data set involving weekly log-returns of 75 US large financial institutions for the 2001-2016 period.
Submitted 9 December, 2019;
originally announced December 2019.
-
Regularized and Smooth Double Core Tensor Factorization for Heterogeneous Data
Authors:
Davoud Ataee Tarzanagh,
George Michailidis
Abstract:
We introduce a general tensor model suitable for data analytic tasks for {\em heterogeneous} datasets, wherein there are joint low-rank structures within groups of observations, but also discriminative structures across different groups. To capture such complex structures, a double core tensor (DCOT) factorization model is introduced together with a family of smoothing loss functions. By leveraging the proposed smoothing function, the model accurately estimates the model factors, even in the presence of missing entries. A linearized ADMM method is employed to solve regularized versions of DCOT factorizations, that avoid large tensor operations and large memory storage requirements. Further, we establish theoretically its global convergence, together with consistency of the estimates of the model parameters. The effectiveness of the DCOT model is illustrated on several real-world examples including image completion, recommender systems, subspace clustering and detecting modules in heterogeneous Omics multi-modal data, since it provides more insightful decompositions than conventional tensor methods.
Submitted 3 October, 2022; v1 submitted 23 November, 2019;
originally announced November 2019.
-
Spiked Laplacian Graphs: Bayesian Community Detection in Heterogeneous Networks
Authors:
Leo L Duan,
George Michailidis,
Mingzhou Ding
Abstract:
In network data analysis, it is becoming common to work with a collection of graphs that exhibit \emph{heterogeneity}. For example, neuroimaging data from patient cohorts are increasingly available. A critical analytical task is to identify communities, and graph Laplacian-based methods are routinely used. However, these methods are currently limited to a single network and do not provide measures of uncertainty on the community assignment. In this work, we propose a probabilistic network model called the ``Spiked Laplacian Graph'' that considers each network as an invertible transform of the Laplacian, with its eigenvalues modeled by a modified spiked structure. This effectively reduces the number of parameters in the eigenvectors, and their sign patterns allow efficient estimation of the community structure. Further, the posterior distribution of the eigenvectors provides uncertainty quantification for the community estimates. Subsequently, we introduce a Bayesian non-parametric approach to address the issue of heterogeneity in a collection of graphs. Theoretical results are established on the posterior consistency of the procedure and provide insights on the trade-off between model resolution and accuracy. We illustrate the performance of the methodology on synthetic data sets, as well as a neuroscience study related to brain activity in working memory.
Keywords: Hierarchical Community Detection, Isoperimetric Constant, Mixed-Effect Eigendecomposition, Normalized Graph Cut, Stiefel Manifold
Submitted 9 March, 2020; v1 submitted 6 October, 2019;
originally announced October 2019.
-
Analyses of Multi-collection Corpora via Compound Topic Modeling
Authors:
Clint P. George,
Wei Xia,
George Michailidis
Abstract:
As electronically stored data grow in daily life, obtaining novel and relevant information becomes challenging in text mining. Thus people have sought statistical methods based on term frequency, matrix algebra, or topic modeling for text mining. Popular topic models have centered on one single text collection, which is deficient for comparative text analyses. We consider a setting where one can partition the corpus into subcollections. Each subcollection shares a common set of topics, but there exists relative variation in topic proportions among collections. Including any prior knowledge about the corpus (e.g. organization structure), we propose the compound latent Dirichlet allocation (cLDA) model, improving on previous work, encouraging generalizability, and depending less on user-input parameters. To identify the parameters of interest in cLDA, we study Markov chain Monte Carlo (MCMC) and variational inference approaches extensively, and suggest an efficient MCMC method. We evaluate cLDA qualitatively and quantitatively using both synthetic and real-world corpora. The usability study on some real-world corpora illustrates the superiority of cLDA in not only exploring the underlying topics automatically but also modeling their connections and variations across multiple collections.
Submitted 17 June, 2019;
originally announced July 2019.
-
Online Distributed Estimation of Principal Eigenspaces
Authors:
Davoud Ataee Tarzanagh,
Mohamad Kazem Shirani Faradonbeh,
George Michailidis
Abstract:
Principal components analysis (PCA) is a widely used dimension reduction technique with an extensive range of applications. In this paper, an online distributed algorithm is proposed for recovering the principal eigenspaces. We further establish its rate of convergence and show how it relates to the number of nodes employed in the distributed computation, the effective rank of the data matrix under consideration, and the gap in the spectrum of the underlying population covariance matrix. The proposed algorithm is illustrated on low-rank approximation and $\boldsymbol{k}$-means clustering tasks. The numerical results show a substantial computational speed-up vis-a-vis standard distributed PCA algorithms, without compromising learning accuracy.
Submitted 17 May, 2019;
originally announced May 2019.
-
Randomized Algorithms for Data-Driven Stabilization of Stochastic Linear Systems
Authors:
Mohamad Kazem Shirani Faradonbeh,
Ambuj Tewari,
George Michailidis
Abstract:
Data-driven control strategies for dynamical systems with unknown parameters are popular in theory and applications. An essential problem is to prevent stochastic linear systems from becoming destabilized due to the uncertainty of the decision-maker about the dynamical parameter. Two randomized algorithms have been proposed for this problem, but their performance has not been sufficiently investigated. Further, the effect of key parameters of the algorithms, such as the magnitude and the frequency of applying the randomizations, is not currently known. This work studies the stabilization speed and the failure probability of data-driven procedures. We provide numerical analyses for the performance of two methods: stochastic feedback, and stochastic parameter. The presented results imply that as long as the number of statistically independent randomizations is not too small, fast stabilization is guaranteed.
Submitted 16 May, 2019;
originally announced May 2019.
-
Change Point Estimation in Panel Data with Temporal and Cross-sectional Dependence
Authors:
Monika Bhattacharjee,
Moulinath Banerjee,
George Michailidis
Abstract:
We study the problem of detecting a common change point in large panel data based on a mean shift model, wherein the errors exhibit both temporal and cross-sectional dependence. A least squares based procedure is used to estimate the location of the change point. Further, we establish the convergence rate and obtain the asymptotic distribution of the least squares estimator. The form of the distribution is determined by the behavior of the norm difference of the means before and after the change point. Since the behavior of this norm difference is, a priori, unknown to the practitioner, we also develop a novel data driven adaptive procedure that provides valid confidence intervals for the common change point, without requiring any such knowledge. Numerical work based on synthetic data illustrates the performance of the estimator in finite samples under different settings of temporal and cross-sectional dependence, sample size and number of panels. Finally, we examine an application to financial stock data and discuss the identified change points.
Submitted 24 April, 2019;
originally announced April 2019.
-
On Applications of Bootstrap in Continuous Space Reinforcement Learning
Authors:
Mohamad Kazem Shirani Faradonbeh,
Ambuj Tewari,
George Michailidis
Abstract:
In decision making problems for continuous state and action spaces, linear dynamical models are widely employed. Specifically, policies for stochastic linear systems subject to quadratic cost functions capture a large number of applications in reinforcement learning. Selected randomized policies have been studied in the literature recently that address the trade-off between identification and control. However, little is known about policies based on bootstrapping observed states and actions. In this work, we show that bootstrap-based policies achieve a square root scaling of regret with respect to time. We also obtain results on the accuracy of learning the model's dynamics. Corroborative numerical analysis that illustrates the technical results is also provided.
Submitted 20 April, 2019; v1 submitted 13 March, 2019;
originally announced March 2019.
-
Estimation of Gaussian directed acyclic graphs using partial ordering information with an application to dairy cattle data
Authors:
Syed Rahman,
Kshitij Khare,
George Michailidis,
Carlos Martinez,
Juan Carulla
Abstract:
Estimating a directed acyclic graph (DAG) from observational data represents a canonical learning problem and has generated a lot of interest in recent years. Research has focused mostly on the following two cases: when no information regarding the ordering of the nodes in the DAG is available, and when a domain-specific complete ordering of the nodes is available. In this paper, motivated by a recent application in dairy science, we develop a method for DAG estimation for the middle scenario, where a partition-based partial ordering of the nodes is known based on domain-specific knowledge. We develop an efficient algorithm that solves the posited problem, coined Partition-DAG. Through extensive simulations using the DREAM3 Yeast data, we illustrate that Partition-DAG effectively incorporates the partial ordering information to improve both speed and accuracy. We then illustrate the usefulness of Partition-DAG by applying it to recently collected dairy cattle data, and inferring relationships between various variables involved in dairy agroecosystems.
Submitted 13 February, 2019;
originally announced February 2019.
-
A Bayesian Approach to Joint Estimation of Multiple Graphical Models
Authors:
Peyman Jalali,
Kshitij Khare,
George Michailidis
Abstract:
The problem of joint estimation of multiple graphical models from high dimensional data has been studied in the statistics and machine learning literature, due to its importance in diverse fields including molecular biology, neuroscience and the social sciences. This work develops a Bayesian approach that decomposes the model parameters across the multiple graphical models into shared components across subsets of models and edges, and idiosyncratic ones. Further, it leverages a novel multivariate prior distribution, coupled with a pseudo-likelihood that enables fast computations through a robust and efficient Gibbs sampling scheme. We establish strong posterior consistency for model selection, as well as estimation of model parameters under high dimensional scaling with the number of variables growing exponentially with the sample size. The efficacy of the proposed approach is illustrated on both synthetic and real data.
Keywords: Pseudo-likelihood, Gibbs sampling, posterior consistency, Omics data
Submitted 2 July, 2019; v1 submitted 10 February, 2019;
originally announced February 2019.
-
DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization
Authors:
Parvin Nazari,
Davoud Ataee Tarzanagh,
George Michailidis
Abstract:
Adaptive gradient-based optimization methods such as Adagrad, RMSprop, and Adam are widely used in solving large-scale machine learning problems including deep learning. A number of schemes have been proposed in the literature aiming at parallelizing them, based on communications of peripheral nodes with a central node, but these incur high communication costs. To address this issue, we develop a novel consensus-based distributed adaptive moment estimation method (DADAM) for online optimization over a decentralized network that enables data parallelization, as well as decentralized computation. The method is particularly useful, since it can accommodate settings where access to local data is allowed. Further, as established theoretically in this work, it can outperform centralized adaptive algorithms, for certain classes of loss functions used in applications. We analyze the convergence properties of the proposed algorithm and provide a dynamic regret bound on the convergence rate of adaptive moment estimation methods in both stochastic and deterministic settings. Empirical results demonstrate that DADAM also works well in practice and compares favorably to competing online optimization methods.
Submitted 28 May, 2019; v1 submitted 25 January, 2019;
originally announced January 2019.
-
Low Rank and Structured Modeling of High-dimensional Vector Autoregressions
Authors:
Sumanta Basu,
Xianqi Li,
George Michailidis
Abstract:
Network modeling of high-dimensional time series data is a key learning task due to its widespread use in a number of application areas, including macroeconomics, finance and neuroscience. While the problem of sparse modeling based on vector autoregressive models (VAR) has been investigated in depth in the literature, more complex network structures that involve low rank and group sparse components have received considerably less attention, despite their presence in data. Failure to account for low-rank structures results in spurious connectivity among the observed time series, which may lead practitioners to draw incorrect conclusions about pertinent scientific or policy questions. In order to accurately estimate a network of Granger causal interactions after accounting for latent effects, we introduce a novel approach for estimating low-rank and structured sparse high-dimensional VAR models. We introduce a regularized framework involving a combination of nuclear norm and lasso (or group lasso) penalty. We subsequently establish non-asymptotic upper bounds on the estimation error rates of the low-rank and the structured sparse components. We also introduce a fast estimation algorithm and finally demonstrate the performance of the proposed modeling framework over standard sparse VAR estimates through numerical experiments on synthetic and real data.
Submitted 9 December, 2018;
originally announced December 2018.
-
Change Point Estimation in a Dynamic Stochastic Block Model
Authors:
Monika Bhattacharjee,
Moulinath Banerjee,
George Michailidis
Abstract:
We consider the problem of estimating the location of a single change point in a dynamic stochastic block model. We propose two methods of estimating the change point, together with the model parameters. The first employs a least squares criterion function that takes into consideration the full structure of the stochastic block model and is evaluated at each point in time. Hence, as an intermediate step, it requires estimating the community structure based on a clustering algorithm at every time point. The second method comprises the following two steps: in the first one, a least squares function is used and evaluated at each time point, but it ignores the community structures and just considers a random graph generating mechanism exhibiting a change point. Once the change point is identified, in the second step, all network data before and after it are used together with a clustering algorithm to obtain the corresponding community structures and subsequently estimate the generating stochastic block model parameters. A comparison between these two methods is illustrated. Further, for both methods under their respective identifiability and certain additional regularity conditions, we establish rates of convergence and derive the asymptotic distributions of the change point estimators. The results are illustrated on synthetic data.
Submitted 20 May, 2020; v1 submitted 7 December, 2018;
originally announced December 2018.
-
Finite Time Adaptive Stabilization of LQ Systems
Authors:
Mohamad Kazem Shirani Faradonbeh,
Ambuj Tewari,
George Michailidis
Abstract:
Stabilization of linear systems with unknown dynamics is a canonical problem in adaptive control. Since the lack of knowledge of system parameters can cause it to become destabilized, an adaptive stabilization procedure is needed prior to regulation. Therefore, the adaptive stabilization needs to be completed in finite time. In order to achieve this goal, asymptotic approaches are not very helpful. There are only a few existing non-asymptotic results and a full treatment of the problem is not currently available.
In this work, leveraging the novel method of random linear feedbacks, we establish high probability guarantees for finite time stabilization. Our results hold for remarkably general settings because we carefully choose a minimal set of assumptions. These include stabilizability of the underlying system and restricting the degree of heaviness of the noise distribution. To derive our results, we also introduce a number of new concepts and technical tools to address regularity and instability of the closed-loop matrix.
Submitted 22 July, 2018;
originally announced July 2018.
-
On Adaptive Linear-Quadratic Regulators
Authors:
Mohamad Kazem Shirani Faradonbeh,
Ambuj Tewari,
George Michailidis
Abstract:
Performance of adaptive control policies is assessed through the regret with respect to the optimal regulator, which reflects the increase in the operating cost due to uncertainty about the dynamics parameters. However, available results in the literature do not provide a quantitative characterization of the effect of the unknown parameters on the regret. Further, there are problems regarding the efficient implementation of some of the existing adaptive policies. Finally, results regarding the accuracy with which the system's parameters are identified are scarce and rather incomplete.
This study aims to comprehensively address these three issues. First, by introducing a novel decomposition of adaptive policies, we establish a sharp expression for the regret of an arbitrary policy in terms of the deviations from the optimal regulator. Second, we show that adaptive policies based on slight modifications of the Certainty Equivalence scheme are efficient. Specifically, we establish a regret of (nearly) square-root rate for two families of randomized adaptive policies. The presented regret bounds are obtained by using anti-concentration results on the random matrices employed for randomizing the estimates of the unknown parameters. Moreover, we study the minimal additional information about the dynamics matrices under which the regret becomes of logarithmic order. Finally, the rates at which the unknown parameters of the system are being identified are presented.
Submitted 20 March, 2020; v1 submitted 27 June, 2018;
originally announced June 2018.
-
Sequential change-point detection in high-dimensional Gaussian graphical models
Authors:
Hossein Keshavarz,
George Michailidis,
Yves Atchade
Abstract:
High dimensional piecewise stationary graphical models represent a versatile class for modelling time varying networks arising in diverse application areas, including biology, economics, and social sciences. There has been recent work in offline detection and estimation of regime changes in the topology of sparse graphical models. However, the online setting remains largely unexplored, despite its high relevance to applications in sensor networks and other engineering monitoring systems, as well as financial markets. To that end, this work introduces a novel scalable online algorithm for detecting an unknown number of abrupt changes in the inverse covariance matrix of sparse Gaussian graphical models with small delay. The proposed algorithm is based upon monitoring the conditional log-likelihood of all nodes in the network and can be extended to a large class of continuous and discrete graphical models. We also investigate asymptotic properties of our procedure under certain mild regularity conditions on the graph size, sparsity level, number of samples, and pre- and post-changes in the topology of the network. Numerical work on both synthetic and real data illustrates the good performance of the proposed methodology both in terms of computational and statistical efficiency across numerous experimental settings.
Submitted 20 June, 2018;
originally announced June 2018.
-
Joint Estimation and Inference for Data Integration Problems based on Multiple Multi-layered Gaussian Graphical Models
Authors:
Subhabrata Majumdar,
George Michailidis
Abstract:
The rapid development of high-throughput technologies has enabled the generation of data from biological or disease processes that span multiple layers, like genomic, proteomic or metabolomic data, and further pertain to multiple sources, like disease subtypes or experimental conditions. In this work, we propose a general statistical framework based on Gaussian graphical models for horizontal (i.e. across conditions or subtypes) and vertical (i.e. across different layers containing data on molecular compartments) integration of information in such datasets. We start with decomposing the multi-layer problem into a series of two-layer problems. For each two-layer problem, we model the outcomes at a node in the lower layer as dependent on those of other nodes in that layer, as well as all nodes in the upper layer. We use a combination of neighborhood selection and group-penalized regression to obtain sparse estimates of all model parameters. Following this, we develop a debiasing technique and asymptotic distributions of inter-layer directed edge weights that utilize already computed neighborhood selection coefficients for nodes in the upper layer. Subsequently, we establish global and simultaneous testing procedures for these edge weights. Performance of the proposed methodology is evaluated on synthetic and real data.
Submitted 21 January, 2022; v1 submitted 8 March, 2018;
originally announced March 2018.
-
Optimism-Based Adaptive Regulation of Linear-Quadratic Systems
Authors:
Mohamad Kazem Shirani Faradonbeh,
Ambuj Tewari,
George Michailidis
Abstract:
The main challenge for adaptive regulation of linear-quadratic systems is the trade-off between identification and control. An adaptive policy needs to address both the estimation of unknown dynamics parameters (exploration), as well as the regulation of the underlying system (exploitation). To this end, optimism-based methods which bias the identification in favor of optimistic approximations of the true parameter are employed in the literature. A number of asymptotic results have been established, but their finite time counterparts are few, with important restrictions.
This study establishes results for the worst-case regret of optimism-based adaptive policies. The presented high probability upper bounds are optimal up to logarithmic factors. The non-asymptotic analysis of this work requires only very mild assumptions: (i) stabilizability of the system's dynamics, and (ii) limiting the degree of heaviness of the noise distribution. To establish such bounds, certain novel techniques are developed to comprehensively address the probabilistic behavior of dependent random matrices with heavy-tailed distributions.
Submitted 28 March, 2019; v1 submitted 20 November, 2017;
originally announced November 2017.
-
Intelligent sampling for multiple change-points in exceedingly long time series with rate guarantees
Authors:
Zhiyuan Lu,
Moulinath Banerjee,
George Michailidis
Abstract:
Change point estimation in its offline version is traditionally performed by optimizing over the data set of interest, by considering each data point as the true location parameter and computing a data fit criterion. Subsequently, the data point that minimizes the criterion is declared as the change point estimate. For estimating multiple change points, the procedures are analogous in spirit, but significantly more involved in execution. Since change-points are local discontinuities, only data points close to the actual change point provide useful information for estimation, while data points far away are superfluous, to the point where using only a few points close to the true parameter is just as precise as using the full data set. Leveraging this "locality principle", we introduce a two-stage procedure for the problem at hand, which in the 1st stage uses a sparse subsample to obtain pilot estimates of the underlying change points, and in the 2nd stage refines these estimates by sampling densely in appropriately defined neighborhoods around them. We establish that this method achieves the same rate of convergence and even virtually the same asymptotic distribution as the analysis of the full data, while reducing computational complexity to $O(N^{0.5})$ time ($N$ being the length of the data set), as opposed to at least $O(N)$ time for all current procedures, making it promising for the analysis of exceedingly long data sets with adequately spaced out change points. The main results are established under a signal plus noise model with independent and identically distributed error terms, but extensions to dependent data settings, as well as multiple stage (>2) procedures are also provided. The performance of our procedure -- which is coined "intelligent sampling" -- is illustrated on both synthetic and real Internet data streams.
Submitted 9 April, 2020; v1 submitted 20 October, 2017;
originally announced October 2017.
-
Regularized Estimation and Testing for High-Dimensional Multi-Block Vector-Autoregressive Models
Authors:
Jiahe Lin,
George Michailidis
Abstract:
Dynamical systems comprising multiple components that can be partitioned into distinct blocks originate in many scientific areas. A pertinent example is the interactions between financial assets and selected macroeconomic indicators, which have been studied extensively at the aggregate level (e.g. a stock index and an employment index) in the macroeconomics literature. A key shortcoming of this approach is that it ignores potential influences from other related components (e.g. Gross Domestic Product) that may exert influence on the system's dynamics and structure and thus produces incorrect results. To mitigate this issue, we consider a multi-block linear dynamical system with Granger-causal ordering between blocks, wherein the blocks' temporal dynamics are described by vector autoregressive processes and are influenced by blocks higher in the system hierarchy. We derive the maximum likelihood estimator for the posited model for Gaussian data in the high-dimensional setting based on appropriate regularization schemes for the parameters of the block components. To optimize the underlying non-convex likelihood function, we develop an iterative algorithm with convergence guarantees. We establish theoretical properties of the maximum likelihood estimates, leveraging the decomposability of the regularizers and a careful analysis of the iterates. Finally, we develop testing procedures for the null hypothesis of whether a block "Granger-causes" another block of variables. The performance of the model and the testing procedures are evaluated on synthetic data, and illustrated on a data set involving log-returns of the US S&P100 component stocks and key macroeconomic variables for the 2001--16 period.
Submitted 19 August, 2017;
originally announced August 2017.
-
Likelihood Inference for Large Scale Stochastic Blockmodels with Covariates based on a Divide-and-Conquer Parallelizable Algorithm with Communication
Authors:
Sandipan Roy,
Yves Atchadé,
George Michailidis
Abstract:
We consider a stochastic blockmodel equipped with node covariate information, that is helpful in analyzing social network data. The key objective is to obtain maximum likelihood estimates of the model parameters. For this task, we devise a fast, scalable Monte Carlo EM type algorithm based on case-control approximation of the log-likelihood coupled with a subsampling approach. A key feature of the proposed algorithm is its parallelizability, by processing portions of the data on several cores, while leveraging communication of key statistics across the cores during each iteration of the algorithm. The performance of the algorithm is evaluated on synthetic data sets and compared with competing methods for blockmodel parameter estimation. We also illustrate the model on data from a Facebook derived social network enhanced with node covariate information.
Submitted 7 August, 2018; v1 submitted 30 October, 2016;
originally announced October 2016.
-
Penalized Maximum Likelihood Estimation of Multi-layered Gaussian Graphical Models
Authors:
Jiahe Lin,
Sumanta Basu,
Moulinath Banerjee,
George Michailidis
Abstract:
Analyzing multi-layered graphical models provides insight into understanding the conditional relationships among nodes within layers after adjusting for and quantifying the effects of nodes from other layers. We obtain the penalized maximum likelihood estimator for Gaussian multi-layered graphical models, based on a computational approach involving screening of variables, iterative estimation of the directed edges between layers and undirected edges within layers and a final refitting and stability selection step that provides improved performance in finite sample settings. We establish the consistency of the estimator in a high-dimensional setting. To obtain this result, we develop a strategy that leverages the biconvexity of the likelihood function to ensure convergence of the developed iterative algorithm to a stationary point, as well as careful uniform error control of the estimates over iterations. The performance of the maximum likelihood estimator is illustrated on synthetic data.
Submitted 5 January, 2016;
originally announced January 2016.