-
Latent Space Score-based Diffusion Model for Probabilistic Multivariate Time Series Imputation
Authors:
Guojun Liang,
Najmeh Abiri,
Atiye Sadat Hashemi,
Jens Lundström,
Stefan Byttner,
Prayag Tiwari
Abstract:
Accurate imputation is essential for the reliability and success of downstream tasks. Recently, diffusion models have attracted great attention in this field. However, these models neglect the latent distribution in a lower-dimensional space derived from the observed data, which limits the generative capacity of the diffusion model. Additionally, dealing with the original missing data without labels is particularly problematic. To address these issues, we propose the Latent Space Score-Based Diffusion Model (LSSDM) for probabilistic multivariate time series imputation. In this unsupervised learning approach, observed values are projected onto a low-dimensional latent space, and coarse values of the missing data are reconstructed without access to their ground-truth values. The reconstructed values are then fed into a conditional diffusion model to obtain precise imputed values of the time series. In this way, LSSDM not only identifies the latent distribution but also seamlessly integrates the diffusion model to obtain high-fidelity imputed values and assess the uncertainty of the dataset. Experimental results demonstrate that LSSDM achieves superior imputation performance while also providing a better explanation and uncertainty analysis of the imputation mechanism. The code is available at https://github.com/gorgen2020/LSSDM_imputation.
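As a rough illustration of the diffusion side only, the sketch below shows the standard DDPM forward (noising) process applied to the entries flagged as missing, while observed entries are kept fixed as conditioning. The linear beta schedule, its endpoints, the helper names, and the idea that the coarse reconstruction stands in for the missing entries are assumptions for illustration; how LSSDM actually conditions on observed and reconstructed values is not reproduced here.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for an (assumed) linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def noise_missing(x0, mask, alpha_bar, eps):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps on missing entries (mask == 0).

    Observed entries (mask == 1) are returned unchanged so they can act as conditioning.
    Assumption: on missing entries, x0 holds a coarse reconstruction of the series.
    """
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [xi if m == 1 else a * xi + s * e for xi, m, e in zip(x0, mask, eps)]


rng = random.Random(0)
alpha_bars = linear_alpha_bars(50)
x0 = [0.5, 1.2, -0.3, 0.8]   # hypothetical values: observed and coarse-imputed entries
mask = [1, 0, 1, 0]          # 1 = observed, 0 = missing
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = noise_missing(x0, mask, alpha_bars[-1], eps)
```

A reverse (denoising) network would then be trained to predict `eps` on the missing entries given the observed ones; that part is model-specific and omitted.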
Submitted 13 September, 2024;
originally announced September 2024.
-
Unveiling the Cycloid Trajectory of EM Iterations in Mixed Linear Regression
Authors:
Zhankun Luo,
Abolfazl Hashemi
Abstract:
We study the trajectory of iterations and the convergence rates of the Expectation-Maximization (EM) algorithm for two-component Mixed Linear Regression (2MLR). The fundamental goal of MLR is to learn the regression models from unlabeled observations, and the EM algorithm is widely used to fit mixtures of linear regressions. Recent results have established the super-linear convergence of EM for 2MLR in the noiseless and high-SNR settings under some assumptions, as well as its global convergence rate with random initialization. However, the exponent of convergence has not been theoretically estimated, and the geometric properties of the trajectory of EM iterations are not well understood. In this paper, we first use Bessel functions to provide explicit closed-form expressions for the EM updates under all SNR regimes. Then, in the noiseless setting, we completely characterize the behavior of EM iterations by deriving a recurrence relation at the population level, and notably show that all the iterations lie on a certain cycloid. Based on this new trajectory-based analysis, we provide a theoretical estimate for the exponent of super-linear convergence and further improve the statistical error bound at the finite-sample level. Our analysis provides a new framework for studying the behavior of EM for Mixed Linear Regression.
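For intuition about the updates being analyzed, here is a minimal scalar sketch of sample-level EM for symmetric two-component MLR, y = z·β*·x (plus noise) with z uniform on {−1, +1} and a known noise level σ. It uses the textbook E-step E[z | x, y] = tanh(yxβ/σ²) followed by a weighted least-squares M-step. It does not reproduce the paper's Bessel-function population updates or the cycloid trajectory, which concern the multivariate case; the data, initialization, and σ are made up.

```python
import math
import random


def em_step(beta, xs, ys, sigma):
    """One EM update for symmetric 2MLR with scalar features."""
    # E-step: posterior mean of the hidden sign, E[z | x, y] = tanh(y * x * beta / sigma^2).
    # M-step: least squares with the soft signs, beta = sum(E[z] * y * x) / sum(x^2).
    num = sum(math.tanh(y * x * beta / sigma**2) * y * x for x, y in zip(xs, ys))
    den = sum(x * x for x in xs)
    return num / den


rng = random.Random(1)
beta_true = 2.0
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys = [rng.choice([-1.0, 1.0]) * beta_true * x for x in xs]  # noiseless observations

beta = 0.5  # initial guess
for _ in range(20):
    beta = em_step(beta, xs, ys, sigma=0.05)  # a small sigma approximates the noiseless limit
```

Because the model is symmetric in ±β*, EM can only recover β* up to sign; a positive initialization here converges toward +β*.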
Submitted 3 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Optimistic Regret Bounds for Online Learning in Adversarial Markov Decision Processes
Authors:
Sang Bin Moon,
Abolfazl Hashemi
Abstract:
The Adversarial Markov Decision Process (AMDP) is a learning framework that deals with unknown and varying tasks in decision-making applications such as robotics and recommendation systems. A major limitation of the AMDP formalism, however, is that its regret analysis is pessimistic: although the cost function can change from one episode to the next, in many settings its evolution is not adversarial. To address this, we introduce and study a new variant of AMDP that aims to minimize regret while utilizing a set of cost predictors. For this setting, we develop a new policy search method that achieves a sublinear optimistic regret with high probability, that is, a regret bound that gracefully degrades with the estimation power of the cost predictors. Establishing such optimistic regret bounds is nontrivial given that (i) as we demonstrate, the existing importance-weighted cost estimators cannot establish optimistic bounds, and (ii) the feedback model of AMDP is different from (and more realistic than) that of existing optimistic online learning works. Our result, in particular, hinges upon developing a novel optimistically biased cost estimator that leverages cost predictors and enables a high-probability regret analysis without imposing restrictive assumptions. We further discuss practical extensions of the proposed scheme and demonstrate its efficacy numerically.
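The notion of optimistic regret can be illustrated in the simplest full-information setting. The sketch below is optimistic Hedge (exponential weights that also add a predicted cost vector m_t before acting), a standard optimistic online learning method swapped in purely for illustration. It is not the paper's policy search method or its optimistically biased cost estimator, and it ignores MDP dynamics and the AMDP feedback model. With uninformative predictions (m_t = 0) it reduces to plain Hedge.

```python
import math


def optimistic_hedge(costs, predictions, eta):
    """Optimistic exponential weights over K actions (full information).

    Round t plays p_t proportional to exp(-eta * (L_{t-1} + m_t)), where L_{t-1} is the
    cumulative observed cost vector and m_t the predicted cost vector for round t.
    Returns the total expected cost sum_t <p_t, c_t>.
    """
    k = len(costs[0])
    cumulative = [0.0] * k
    total = 0.0
    for c, m in zip(costs, predictions):
        scores = [-eta * (cumulative[i] + m[i]) for i in range(k)]
        top = max(scores)
        w = [math.exp(s - top) for s in scores]  # shift by the max for numerical stability
        z = sum(w)
        total += sum(wi / z * ci for wi, ci in zip(w, c))
        cumulative = [L + ci for L, ci in zip(cumulative, c)]
    return total


T = 100
costs = [[1.0, 0.0] if t % 2 == 0 else [0.0, 1.0] for t in range(T)]  # alternating toy costs
perfect = costs                  # a predictor that happens to be exact
no_info = [[0.0, 0.0]] * T       # uninformative predictor: plain Hedge
cost_opt = optimistic_hedge(costs, perfect, eta=5.0)
cost_plain = optimistic_hedge(costs, no_info, eta=5.0)
best_fixed = min(sum(c[i] for c in costs) for i in range(2))
```

On this alternating sequence, plain Hedge keeps chasing the action that was good last round and pays more than the best fixed action, while accurate predictions push the incurred cost well below it.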
Submitted 3 May, 2024;
originally announced May 2024.
-
Efficient Hierarchical Bayesian Inference for Spatio-temporal Regression Models in Neuroimaging
Authors:
Ali Hashemi,
Yijing Gao,
Chang Cai,
Sanjay Ghosh,
Klaus-Robert Müller,
Srikantan S. Nagarajan,
Stefan Haufe
Abstract:
Several problems in neuroimaging and beyond require inference on the parameters of multi-task sparse hierarchical regression models. Examples include M/EEG inverse problems, neural encoding models for task-based fMRI analyses, and applications in climate science. In these domains, both the model parameters to be inferred and the measurement noise may exhibit a complex spatio-temporal structure. Existing work either neglects the temporal structure or leads to computationally demanding inference schemes. Overcoming these limitations, we devise a novel, flexible hierarchical Bayesian framework in which the spatio-temporal dynamics of model parameters and noise are modeled with a Kronecker product covariance structure. Inference in our framework is based on majorization-minimization optimization and has guaranteed convergence properties. Our highly efficient algorithms exploit the intrinsic Riemannian geometry of temporal autocovariance matrices. For stationary dynamics described by Toeplitz matrices, the theory of circulant embeddings is employed. We prove convex bounding properties and derive update rules for the resulting algorithms. On both synthetic and real neural data from M/EEG, we demonstrate that our methods lead to improved performance.
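The circulant-embedding idea for stationary (Toeplitz) temporal covariances can be illustrated on its most basic use: an n×n Toeplitz matrix is embedded in a (2n−1)×(2n−1) circulant matrix, which the discrete Fourier transform diagonalizes, so products with the Toeplitz matrix reduce to Fourier transforms. This is a generic sketch, not the paper's update rules; a naive O(n²) DFT keeps it dependency-free where an FFT would normally be used, and the matrix and vector are arbitrary examples.

```python
import cmath


def dft(v, inverse=False):
    """Naive O(n^2) DFT; inverse=True applies the inverse transform (with the 1/n factor)."""
    n = len(v)
    sign = 1 if inverse else -1
    out = [
        sum(v[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
        for j in range(n)
    ]
    return [o / n for o in out] if inverse else out


def toeplitz_matvec_circulant(col, row, x):
    """Compute T @ x for the Toeplitz T with first column `col` and first row `row`
    (col[0] == row[0]) by embedding T in a (2n - 1) x (2n - 1) circulant matrix."""
    n = len(x)
    c = list(col) + list(reversed(row[1:]))  # first column of the circulant embedding
    x_pad = list(x) + [0.0] * (n - 1)
    # Circulant matrices are diagonalized by the DFT: C x = IDFT(DFT(c) * DFT(x)).
    y = dft([a * b for a, b in zip(dft(c), dft(x_pad))], inverse=True)
    return [v.real for v in y[:n]]  # the first n entries equal T @ x


def toeplitz_matvec_direct(col, row, x):
    """Reference O(n^2) product using T[i][j] = col[i - j] if i >= j else row[j - i]."""
    n = len(x)
    return [
        sum((col[i - j] if i >= j else row[j - i]) * x[j] for j in range(n))
        for i in range(n)
    ]


col, row, x = [4.0, 1.0, 0.5], [4.0, 2.0, 3.0], [1.0, -2.0, 0.5]
fast = toeplitz_matvec_circulant(col, row, x)
slow = toeplitz_matvec_direct(col, row, x)
```

A stationary autocovariance matrix is the symmetric special case `col == row`.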
Submitted 23 November, 2021; v1 submitted 2 November, 2021;
originally announced November 2021.
-
Robust Training in High Dimensions via Block Coordinate Geometric Median Descent
Authors:
Anish Acharya,
Abolfazl Hashemi,
Prateek Jain,
Sujay Sanghavi,
Inderjit S. Dhillon,
Ufuk Topcu
Abstract:
Geometric median (GM) is a classical method in statistics for achieving robust estimation of uncorrupted data; under gross corruption, it achieves the optimal breakdown point of 0.5. However, its computational complexity makes it infeasible for robustifying stochastic gradient descent (SGD) in high-dimensional optimization problems. In this paper, we show that by applying GM to only a judiciously chosen block of coordinates at a time and using a memory mechanism, one can retain the breakdown point of 0.5 for smooth non-convex problems, with non-asymptotic convergence rates comparable to those of SGD with GM.
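To make the block idea concrete, the sketch below computes the geometric median with Weiszfeld's classical fixed-point iteration and applies it only to a caller-chosen block of coordinates. The block selection rule, the helper names, and zeroing the other coordinates are assumptions for illustration; the paper's memory mechanism, which would carry the remaining coordinates forward, is omitted.

```python
import math


def geometric_median(points, iters=200, eps=1e-12):
    """Weiszfeld's fixed-point iteration for the geometric median of equal-length vectors."""
    d = len(points[0])
    y = [sum(p[i] for p in points) / len(points) for i in range(d)]  # start from the mean
    for _ in range(iters):
        # Each point is weighted by the inverse of its distance to the current estimate;
        # eps guards against division by zero when the estimate lands on a data point.
        w = [1.0 / max(math.dist(p, y), eps) for p in points]
        total = sum(w)
        y = [sum(wj * p[i] for wj, p in zip(w, points)) / total for i in range(d)]
    return y


def block_gm_aggregate(grads, block):
    """Hypothetical aggregator: geometric median on the coordinates listed in `block`,
    zeros elsewhere (no memory of the skipped coordinates is kept here)."""
    gm = geometric_median([[g[i] for i in block] for g in grads])
    out = [0.0] * len(grads[0])
    for i, v in zip(block, gm):
        out[i] = v
    return out


grads = [
    [1.0, 5.0, -1.0],
    [1.1, 4.0, -1.0],
    [0.9, 6.0, -1.0],
    [1.05, 5.0, -1.0],
    [50.0, 5.0, -1.0],  # a corrupted worker with a huge first coordinate
]
agg = block_gm_aggregate(grads, block=[0])
```

On the first coordinate the mean would be pulled above 10 by the corrupted worker, whereas the geometric median stays at the median value 1.05.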
Submitted 16 June, 2021;
originally announced June 2021.
-
On the Convergence of Differentially Private Federated Learning on Non-Lipschitz Objectives, and with Normalized Client Updates
Authors:
Rudrajit Das,
Abolfazl Hashemi,
Sujay Sanghavi,
Inderjit S. Dhillon
Abstract:
There is a dearth of convergence results for differentially private federated learning (FL) with non-Lipschitz objective functions (i.e., when gradient norms are not bounded). The primary reason is that the clipping operation (i.e., projection onto an $\ell_2$ ball of a fixed radius called the clipping threshold), used to bound the sensitivity of the average update to each client's update, introduces a bias that depends on the clipping threshold and the number of local steps in FL, and this bias is not easy to analyze. For Lipschitz functions, the Lipschitz constant serves as a trivial clipping threshold with zero bias. However, Lipschitzness does not hold in many practical settings; moreover, verifying it and computing the Lipschitz constant are hard. Thus, the choice of the clipping threshold is non-trivial and requires a lot of tuning in practice. In this paper, we provide the first convergence result for private FL on smooth convex objectives for a general clipping threshold, without assuming Lipschitzness. We also look at a simpler alternative to clipping (for bounding sensitivity), namely normalization, where we use only a scaled version of the unit vector along each client update, completely discarding the magnitude information. The resulting normalization-based private FL algorithm is theoretically shown to have better convergence than its clipping-based counterpart on smooth convex functions. We corroborate our theory with synthetic experiments as well as experiments on benchmark datasets.
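The two sensitivity-bounding operations contrasted above are easy to state in code. Clipping projects a client update onto the ℓ2 ball of radius C; normalization keeps only the direction, rescaled to a fixed length. The sketch below averages bounded client updates and adds Gaussian noise; the noise scale `sigma * bound / m`, the function names, and the toy updates are illustrative assumptions, and calibrating `sigma` to a target (ε, δ) privacy level is not shown.

```python
import math
import random


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def clip(update, threshold):
    """Project the update onto the l2 ball of radius `threshold` (small updates are unchanged)."""
    n = l2_norm(update)
    scale = min(1.0, threshold / n) if n > 0 else 1.0
    return [scale * x for x in update]


def normalize(update, gamma):
    """Keep only the direction of the update, rescaled to length `gamma`."""
    n = l2_norm(update)
    return [gamma * x / n for x in update] if n > 0 else [0.0] * len(update)


def private_average(updates, bound, sigma, mode, rng):
    """Average bounded client updates and add N(0, (sigma * bound / m)^2) noise per coordinate.

    After clipping or normalizing, every update has norm at most `bound`, which bounds the
    sensitivity of the average; sigma would be calibrated to the privacy target (not shown).
    """
    op = clip if mode == "clip" else normalize
    bounded = [op(u, bound) for u in updates]
    m = len(bounded)
    avg = [sum(col) / m for col in zip(*bounded)]
    return [a + rng.gauss(0.0, sigma * bound / m) for a in avg]


updates = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical client updates with norms 5 and 0.5
clipped_avg = private_average(updates, bound=1.0, sigma=0.0, mode="clip", rng=random.Random(0))
normalized_avg = private_average(updates, bound=1.0, sigma=0.0, mode="normalize", rng=random.Random(0))
```

With noise switched off (`sigma=0.0`), the small update keeps its magnitude under clipping but is stretched to unit length under normalization, which is exactly the magnitude information normalization discards.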
Submitted 15 April, 2022; v1 submitted 13 June, 2021;
originally announced June 2021.
-
Generalization Bounds for Sparse Random Feature Expansions
Authors:
Abolfazl Hashemi,
Hayden Schaeffer,
Robert Shi,
Ufuk Topcu,
Giang Tran,
Rachel Ward
Abstract:
Random feature methods have been successful in various machine learning tasks, are easy to compute, and come with theoretical accuracy bounds. They serve as an alternative to standard neural networks since they can represent similar function spaces without a costly training phase. However, for accuracy, random feature methods require more measurements than trainable parameters, limiting their use for data-scarce applications or problems in scientific machine learning. This paper introduces the sparse random feature expansion to obtain parsimonious random feature models. Specifically, we leverage ideas from compressive sensing to generate random feature expansions with theoretical guarantees even in the data-scarce setting. In particular, we provide generalization bounds for functions in a certain class (that is dense in a reproducing kernel Hilbert space) depending on the number of samples and the distribution of features. The generalization bounds improve under additional structural conditions, such as coordinate sparsity, compact clusters of the spectrum, or rapid spectral decay. Moreover, by introducing sparse features, i.e., features with random sparse weights, we provide improved bounds for low-order functions. We show that sparse random feature expansions outperform shallow networks in several scientific machine learning tasks.
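As a rough illustration of "features with random sparse weights", the sketch below draws weight vectors with exactly q nonzero Gaussian coordinates and builds the corresponding feature matrix. The cosine feature map, the standard normal weights, the uniform phases, and the helper names are assumptions; the compressive-sensing step that would fit sparse coefficients to this matrix (for example by ℓ1 minimization) is not shown.

```python
import math
import random


def sparse_random_features(dim, num_features, q, rng):
    """Draw weight vectors in R^dim with exactly q nonzero N(0, 1) entries each,
    plus phases uniform on [0, 2*pi)."""
    weights, phases = [], []
    for _ in range(num_features):
        w = [0.0] * dim
        for i in rng.sample(range(dim), q):  # q distinct active coordinates
            w[i] = rng.gauss(0.0, 1.0)
        weights.append(w)
        phases.append(rng.uniform(0.0, 2.0 * math.pi))
    return weights, phases


def feature_matrix(xs, weights, phases):
    """Entry (j, k) is cos(<w_k, x_j> + b_k); a cosine feature map is assumed."""
    return [
        [math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in zip(weights, phases)]
        for x in xs
    ]


weights, phases = sparse_random_features(dim=5, num_features=20, q=2, rng=random.Random(0))
Phi = feature_matrix([[0.1] * 5, [1.0, 0.0, 0.0, 0.0, 0.0]], weights, phases)
```

Each feature then depends on only q input coordinates, which is what makes such expansions natural for low-order functions.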
Submitted 20 August, 2021; v1 submitted 4 March, 2021;
originally announced March 2021.
-
Faster Non-Convex Federated Learning via Global and Local Momentum
Authors:
Rudrajit Das,
Anish Acharya,
Abolfazl Hashemi,
Sujay Sanghavi,
Inderjit S. Dhillon,
Ufuk Topcu
Abstract:
We propose FedGLOMO, a novel federated learning (FL) algorithm with an iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon$-stationary point (i.e., $\mathbb{E}[\|\nabla f(\mathbf{x})\|^2] \leq \epsilon$) for smooth non-convex functions, under arbitrary client heterogeneity and compressed communication, compared to the $\mathcal{O}(\epsilon^{-2})$ complexity of most prior works. The key algorithmic idea behind this improved complexity is the observation that convergence in FL is hampered by two sources of high variance: (i) the global server aggregation step with multiple local updates, exacerbated by client heterogeneity, and (ii) the noise of the local client-level stochastic gradients. By modeling the server aggregation step as a generalized gradient-type update, we propose a variance-reducing momentum-based global update at the server, which, when applied in conjunction with variance-reduced local updates at the clients, enables FedGLOMO to enjoy an improved convergence rate. Moreover, we derive our results under a novel and more realistic client-heterogeneity assumption, which we verify empirically, unlike prior assumptions that are hard to verify. Our experiments illustrate the intrinsic variance reduction effect of FedGLOMO, which implicitly suppresses client drift in heterogeneous data distribution settings and promotes communication efficiency.
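The "variance-reducing momentum" idea can be sketched with a STORM-style recursive estimator, in which the same sample is used to evaluate the gradient at the current and previous iterate so that their difference corrects the running estimate. This is a single-machine illustration with assumed names and a toy quadratic; it is not FedGLOMO's server or client update and omits client heterogeneity and compression.

```python
import random


def storm(grad, x0, lr, a, steps, sample):
    """Run SGD driven by the momentum-based variance-reduced estimator
        d_t = g(x_t; s_t) + (1 - a) * (d_{t-1} - g(x_{t-1}; s_t)),
    where the same sample s_t is used for both gradient evaluations."""
    x_prev = list(x0)
    d = grad(x_prev, sample())
    x = [xp - lr * di for xp, di in zip(x_prev, d)]
    for _ in range(steps):
        s = sample()
        g_new, g_old = grad(x, s), grad(x_prev, s)
        d = [gn + (1 - a) * (dp - go) for gn, dp, go in zip(g_new, d, g_old)]
        x_prev, x = x, [xc - lr * di for xc, di in zip(x, d)]
    return x


# Toy problem: f(x) = 0.5 * ||x - b||^2 with additive gradient noise (illustrative only).
rng = random.Random(0)
b = [1.0, -2.0, 0.5]


def noisy_grad(x, noise):
    return [xc - bc + nc for xc, bc, nc in zip(x, b, noise)]


def draw_noise():
    return [rng.gauss(0.0, 0.1) for _ in b]


x_final = storm(noisy_grad, [0.0, 0.0, 0.0], lr=0.1, a=0.1, steps=500, sample=draw_noise)
```

In this toy problem the shared sample makes the noise cancel in `g_new - g_old`, so the estimator error follows e_t = a·n_t + (1 − a)·e_{t−1} and stays well below the raw gradient noise.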
Submitted 24 October, 2021; v1 submitted 7 December, 2020;
originally announced December 2020.
-
On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization
Authors:
Abolfazl Hashemi,
Anish Acharya,
Rudrajit Das,
Haris Vikalo,
Sujay Sanghavi,
Inderjit Dhillon
Abstract:
In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e., averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be lossy compressed versions of the local parameters. In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having multiple gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by reducing the precision of the compressed information. In particular, we show that having $O(\log\frac{1}{\epsilon})$ gradient iterations with a constant step size, and $O(\log\frac{1}{\epsilon})$ gossip steps between every pair of these iterations, enables convergence to within $\epsilon$ of the optimal value for smooth non-convex objectives satisfying the Polyak-Łojasiewicz condition. This result also holds for smooth strongly convex objectives. To our knowledge, this is the first work that derives convergence results for nonconvex optimization under arbitrary communication compression.
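One reason extra gossip rounds pay off is that each averaging step with a symmetric, doubly stochastic mixing matrix W shrinks the disagreement between nodes geometrically, at a rate set by the second-largest eigenvalue magnitude of W, so O(log(1/ε)) rounds reach ε-consensus. The sketch below shows only this uncompressed averaging on a ring with Metropolis-Hastings weights; the compression operator and the interleaved gradient steps analyzed in the paper are omitted, and the topology and values are arbitrary examples.

```python
def gossip(values, neighbors, steps):
    """Run `steps` rounds of x <- W x with Metropolis-Hastings weights
    W_ij = 1 / (1 + max(deg_i, deg_j)) for neighbors j of i, and W_ii = 1 - sum_j W_ij."""
    deg = [len(nb) for nb in neighbors]
    x = list(values)
    for _ in range(steps):
        x = [
            x[i] + sum((x[j] - x[i]) / (1 + max(deg[i], deg[j])) for j in neighbors[i])
            for i in range(len(x))
        ]
    return x


def consensus_error(x):
    """Largest deviation of any node from the network average."""
    mean = sum(x) / len(x)
    return max(abs(v - mean) for v in x)


n = 6
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
start = [float(i) for i in range(n)]  # network average is 2.5
errors = [consensus_error(gossip(start, ring, k)) for k in (0, 5, 10, 20)]
```

Because W is doubly stochastic, the network average is preserved at every round; only the spread around it shrinks.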
Submitted 20 November, 2020;
originally announced November 2020.
-
Identifying Sparse Low-Dimensional Structures in Markov Chains: A Nonnegative Matrix Factorization Approach
Authors:
Mahsa Ghasemi,
Abolfazl Hashemi,
Haris Vikalo,
Ufuk Topcu
Abstract:
We consider the problem of learning low-dimensional representations for large-scale Markov chains. We formulate the task of representation learning as that of mapping the state space of the model to a low-dimensional state space, called the kernel space. The kernel space contains a set of meta states, each of which should be representative of only a small subset of the original states. To promote this structural property, we constrain the number of nonzero entries of the mappings between the state space and the kernel space. By imposing the desired characteristics of the representation, we cast the problem as a constrained nonnegative matrix factorization. To compute the solution, we propose an efficient block coordinate gradient descent algorithm and theoretically analyze its convergence properties.
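One ingredient such a constrained factorization needs is a projection onto nonnegative vectors with at most s nonzeros, which keeps each mapping tied to few states. The sketch below gives that projection (clip negatives to zero, keep the s largest entries, which is the exact Euclidean projection onto that set) and one projected-gradient step on the factor U in P ≈ UV with V held fixed, as one block of a block coordinate scheme. The per-row sparsity pattern, the step size, and the helper names are assumptions; the paper's exact constraint set and update may differ.

```python
def project_nonneg_sparse(v, s):
    """Euclidean projection onto {u : u >= 0, at most s nonzeros}:
    clip negative entries to zero, then keep the s largest entries."""
    clipped = [max(x, 0.0) for x in v]
    keep = sorted(range(len(v)), key=lambda i: clipped[i], reverse=True)[:s]
    out = [0.0] * len(v)
    for i in keep:
        out[i] = clipped[i]
    return out


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def update_U(P, U, V, lr, s):
    """One projected-gradient step on 0.5 * ||P - U V||_F^2 with respect to U (V fixed),
    projecting each row of U onto nonnegative s-sparse vectors (assumed constraint)."""
    E = [[r - p for r, p in zip(rr, pr)] for rr, pr in zip(matmul(U, V), P)]  # U V - P
    G = matmul(E, [list(c) for c in zip(*V)])                                    # (U V - P) V^T
    return [
        project_nonneg_sparse([u - lr * g for u, g in zip(urow, grow)], s)
        for urow, grow in zip(U, G)
    ]


U = [[1.0, 0.0], [0.0, 2.0]]
V = [[0.5, 0.5], [0.2, 0.8]]
P = matmul(U, V)  # P is exactly U V, so U is a fixed point of the update
U_next = update_U(P, U, V, lr=0.1, s=1)
```

A full block coordinate scheme would alternate such steps on U and V until the objective stops decreasing.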
Submitted 7 April, 2020; v1 submitted 27 September, 2019;
originally announced September 2019.
-
Sampling and Reconstruction of Graph Signals via Weak Submodularity and Semidefinite Relaxation
Authors:
Abolfazl Hashemi,
Rasoul Shafipour,
Haris Vikalo,
Gonzalo Mateos
Abstract:
We study the problem of sampling a bandlimited graph signal in the presence of noise, where the objective is to select a node subset of prescribed cardinality that minimizes the signal reconstruction mean squared error (MSE). To that end, we formulate the task at hand as the minimization of MSE subject to binary constraints, and approximate the resulting NP-hard problem via semidefinite programming (SDP) relaxation. Moreover, we provide an alternative formulation based on maximizing a monotone weak submodular function and propose a randomized greedy algorithm to find a sub-optimal subset. We then derive a worst-case performance guarantee on the MSE returned by the randomized greedy algorithm for general non-stationary graph signals. The efficacy of the proposed methods is illustrated through numerical simulations on synthetic and real-world graphs. Notably, the randomized greedy algorithm yields an order-of-magnitude speedup over state-of-the-art greedy sampling schemes, while incurring only a marginal MSE performance loss.
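The randomized greedy idea can be sketched for a generic monotone set function f: instead of scanning every remaining node, each step evaluates marginal gains on a random sample of candidates and keeps the best one. A common choice from the stochastic-greedy literature is a sample size of about (n/k)·log(1/δ) for n candidates and k picks. For graph sampling, f would be the reduction in reconstruction MSE, which is not implemented here; the function names and the toy weights are hypothetical.

```python
import random


def randomized_greedy(f, ground, k, sample_size, rng):
    """Pick k elements of `ground`, evaluating marginal gains of f on a random
    subset of at most `sample_size` remaining candidates at each step."""
    selected, remaining = [], list(ground)
    for _ in range(k):
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        base = f(selected)
        best = max(candidates, key=lambda e: f(selected + [e]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy check with a modular f (sum of node weights); with sample_size = n it is plain greedy.
node_weights = {0: 0.3, 1: 2.0, 2: 1.1, 3: 0.7, 4: 1.6}


def f(S):
    return sum(node_weights[v] for v in S)


picked = randomized_greedy(f, node_weights, 3, sample_size=len(node_weights), rng=random.Random(0))
```

Smaller sample sizes trade a bounded loss in the objective for far fewer evaluations of f, which is where the speedup over exhaustive greedy comes from.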
Submitted 31 October, 2017;
originally announced November 2017.
-
Accelerated Sparse Subspace Clustering
Authors:
Abolfazl Hashemi,
Haris Vikalo
Abstract:
State-of-the-art algorithms for sparse subspace clustering perform spectral clustering on a similarity matrix typically obtained by representing each data point as a sparse combination of other points using either basis pursuit (BP) or orthogonal matching pursuit (OMP). BP-based methods are often prohibitive in practice, while the performance of OMP-based schemes is unsatisfactory, especially in settings where data points are highly similar. In this paper, we propose a novel algorithm that exploits an accelerated variant of orthogonal least-squares to efficiently find the underlying subspaces. We show that under certain conditions the proposed algorithm returns a subspace-preserving solution. Simulation results illustrate that the proposed method compares favorably with BP-based methods in terms of running time while being significantly more accurate than OMP-based schemes.
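Once each point has been expressed as a sparse combination of the others (by BP, OMP, or an OLS-type method), the clustering stage works on a symmetric affinity built from the coefficients. The sketch below builds W = |C| + |C|ᵀ, checks the subspace-preserving property (nonzero coefficients only between points of the same subspace), and, as a simplified stand-in for spectral clustering, reads clusters off the connected components of W. The coefficient matrix and labels are made-up toy values.

```python
def affinity(C):
    """Symmetric affinity W = |C| + |C|^T from a self-expressive coefficient matrix C."""
    n = len(C)
    return [[abs(C[i][j]) + abs(C[j][i]) for j in range(n)] for i in range(n)]


def is_subspace_preserving(C, labels, tol=1e-12):
    """True if every nonzero coefficient C[i][j] links two points from the same subspace."""
    n = len(C)
    return all(abs(C[i][j]) <= tol or labels[i] == labels[j] for i in range(n) for j in range(n))


def connected_components(W, tol=1e-12):
    """Label the connected components of the graph whose edges are entries W[i][j] > tol."""
    n = len(W)
    label, comp = [-1] * n, 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = comp
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if W[i][j] > tol and label[j] == -1:
                    label[j] = comp
                    stack.append(j)
        comp += 1
    return label


# Points 0, 1 lie in one subspace and points 2, 3 in another (toy example).
C = [[0.0, 0.9, 0.0, 0.0], [0.8, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.1], [0.0, 0.0, 0.7, 0.0]]
labels = [0, 0, 1, 1]
clusters = connected_components(affinity(C))
```

A subspace-preserving C guarantees no edges between subspaces, but not that each subspace forms a single connected component, which is one reason spectral clustering is used in practice.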
Submitted 31 October, 2017;
originally announced November 2017.
-
Sparse recovery via Orthogonal Least-Squares under presence of Noise
Authors:
Abolfazl Hashemi,
Haris Vikalo
Abstract:
We consider the Orthogonal Least-Squares (OLS) algorithm for the recovery of an $m$-dimensional $k$-sparse signal from a low number of noisy linear measurements. The Exact Recovery Condition (ERC) in the bounded-noise scenario is established for OLS under a certain condition on the nonzero elements of the signal. The new result also improves the existing guarantees for the Orthogonal Matching Pursuit (OMP) algorithm. In addition, this framework is employed to provide probabilistic guarantees for the case where the coefficient matrix is drawn at random according to a Gaussian or Bernoulli distribution, by exploiting concentration properties. It is shown that under certain conditions, OLS recovers the true support in $k$ iterations with high probability. This in turn demonstrates that ${\cal O}\left(k\log m\right)$ measurements are sufficient for exact recovery of sparse signals via OLS.
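The OLS selection rule can be sketched directly: at each step, every unselected column is orthogonalized against the span of the columns already chosen, and the column whose orthogonalized version most reduces the residual norm is added. (OMP instead scores columns by their correlation with the residual.) This is a plain, unoptimized illustration with hypothetical helper names and a toy matrix, not the authors' implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ols(A_cols, y, k):
    """Orthogonal Least-Squares: greedily add the column that most reduces the
    least-squares residual. A_cols is a list of columns, each a list of length n."""
    basis, support, r = [], [], list(y)  # orthonormal basis of the selected span, residual
    for _ in range(k):
        best, best_score, best_q = None, -1.0, None
        for j, a in enumerate(A_cols):
            if j in support:
                continue
            q = list(a)
            for b in basis:  # Gram-Schmidt against the selected columns
                c = dot(q, b)
                q = [qi - c * bi for qi, bi in zip(q, b)]
            nq = math.sqrt(dot(q, q))
            if nq < 1e-12:
                continue  # a_j already lies in the selected span
            score = abs(dot(q, r)) / nq  # the residual norm^2 drops by score**2
            if score > best_score:
                best, best_score, best_q = j, score, [qi / nq for qi in q]
        if best is None:
            break
        support.append(best)
        basis.append(best_q)
        c = dot(r, best_q)
        r = [ri - c * qi for ri, qi in zip(r, best_q)]
    return support, r


s = 1 / math.sqrt(3)
A_cols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [s, s, s]]  # 3 x 4 toy matrix
y = [3.0, 0.0, 1.0]  # = 3 * column 0 + 1 * column 2
support, residual = ols(A_cols, y, k=2)
```

Here OLS selects columns 0 and 2 and drives the residual to zero.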
Submitted 8 August, 2016;
originally announced August 2016.
-
Sampling Requirements and Accelerated Schemes for Sparse Linear Regression with Orthogonal Least-Squares
Authors:
Abolfazl Hashemi,
Haris Vikalo
Abstract:
We study the problem of inferring a sparse vector from random linear combinations of its components. We propose the Accelerated Orthogonal Least-Squares (AOLS) algorithm, which improves the performance of the well-known Orthogonal Least-Squares (OLS) algorithm while requiring significantly lower computational costs. While OLS greedily selects columns of the coefficient matrix that correspond to non-zero components of the sparse vector, AOLS employs a novel computationally efficient procedure that speeds up the search by anticipating future selections via choosing $L$ columns in each step, where $L$ is an adjustable hyper-parameter. We analyze the performance of AOLS and establish lower bounds on the probability of exact recovery for both noiseless and noisy random linear measurements. In the noiseless scenario, it is shown that when the coefficients are samples from a Gaussian distribution, AOLS with high probability recovers a $k$-sparse $m$-dimensional vector using ${\cal O}(k\log \frac{m}{k+L-1})$ measurements. A similar result is established for the bounded-noise scenario, where an additional condition on the smallest nonzero element of the unknown vector is required. The asymptotic sampling complexity of AOLS is lower than that of existing sparse reconstruction algorithms. In simulations, AOLS is compared to state-of-the-art sparse recovery techniques and shown to provide better performance in terms of accuracy, running time, or both. Finally, we consider an application of AOLS to clustering high-dimensional data lying on the union of low-dimensional subspaces and demonstrate its superiority over existing methods.
Submitted 13 April, 2018; v1 submitted 8 August, 2016;
originally announced August 2016.
-
Sparse Linear Regression via Generalized Orthogonal Least-Squares
Authors:
Abolfazl Hashemi,
Haris Vikalo
Abstract:
Sparse linear regression, which entails finding a sparse solution to an underdetermined system of linear equations, can formally be expressed as an $\ell_0$-constrained least-squares problem. The Orthogonal Least-Squares (OLS) algorithm sequentially selects the features (i.e., columns of the coefficient matrix) to greedily find an approximate sparse solution. In this paper, we propose a generalization of OLS that relies on a recursive relation between the components of the optimal solution to select $L$ features at each step and solve the resulting overdetermined system of equations. Simulation results demonstrate that the generalized OLS algorithm is computationally efficient and achieves performance superior to that of existing greedy algorithms broadly used in the literature.
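After a support has been chosen (one column or L columns per step), the coefficients on that support come from solving the resulting overdetermined system A_S x_S ≈ y in the least-squares sense. The sketch below does this through the normal equations with Gaussian elimination and partial pivoting. It is a generic illustration with hypothetical names and does not implement the paper's recursive selection rule; in practice a QR-based solver is preferable, because forming AᵀA squares the condition number.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve_least_squares(cols, y):
    """Minimize ||sum_j x_j * cols[j] - y||_2 via the normal equations (A^T A) x = A^T y,
    solved by Gaussian elimination with partial pivoting."""
    p = len(cols)
    G = [[dot(ci, cj) for cj in cols] + [dot(ci, y)] for ci in cols]  # augmented [A^T A | A^T y]
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(G[r][i]))
        G[i], G[piv] = G[piv], G[i]
        for r in range(i + 1, p):
            factor = G[r][i] / G[i][i]
            G[r] = [a - factor * b for a, b in zip(G[r], G[i])]
    x = [0.0] * p
    for i in reversed(range(p)):  # back-substitution
        x[i] = (G[i][p] - sum(G[i][j] * x[j] for j in range(i + 1, p))) / G[i][i]
    return x


# Selected support with two columns and three equations: fit y = x0 + x1 * t at t = 0, 1, 2.
support_cols = [[1.0, 1.0, 1.0], [0.0, 1.0, 2.0]]
coef = solve_least_squares(support_cols, [1.0, 3.0, 5.0])
```

Here the data lie exactly on y = 1 + 2t, so the least-squares solution recovers those coefficients.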
Submitted 28 July, 2016; v1 submitted 22 February, 2016;
originally announced February 2016.