Search | arXiv e-print repository

arXiv:2004.01468 [pdf, other]

Epitomic Variational Graph Autoencoder

Authors: Rayyan Ahmad Khan, Muhammad Umer Anwaar, Martin Kleinsteuber

Abstract: Variational autoencoder (VAE) is a widely used generative model for learning latent representations. Burda et al. in their seminal paper showed that learning capacity of VAE is limited by over-pruning. It is a phenomenon where a significant number of latent variables fail to capture any information about the input data and the corresponding hidden units become inactive. This adversely affects lear… ▽ More Variational autoencoder (VAE) is a widely used generative model for learning latent representations. Burda et al. in their seminal paper showed that learning capacity of VAE is limited by over-pruning. It is a phenomenon where a significant number of latent variables fail to capture any information about the input data and the corresponding hidden units become inactive. This adversely affects learning diverse and interpretable latent representations. As variational graph autoencoder (VGAE) extends VAE for graph-structured data, it inherits the over-pruning problem. In this paper, we adopt a model based approach and propose epitomic VGAE (EVGAE),a generative variational framework for graph datasets which successfully mitigates the over-pruning problem and also boosts the generative ability of VGAE. We consider EVGAE to consist of multiple sparse VGAE models, called epitomes, that are groups of latent variables sharing the latent space. This approach aids in increasing active units as epitomes compete to learn better representation of the graph data. We verify our claims via experiments on three benchmark datasets. Our experiments show that EVGAE has a better generative ability than VGAE. Moreover, EVGAE outperforms VGAE on link prediction task in citation networks. △ Less

Submitted 7 August, 2020; v1 submitted 3 April, 2020; originally announced April 2020.

arXiv:1608.05493 [pdf, ps, other]

doi 10.1109/TNSM.2016.2598788

Network Volume Anomaly Detection and Identification in Large-scale Networks based on Online Time-structured Traffic Tensor Tracking

Authors: Hiroyuki Kasai, Wolfgang Kellerer, Martin Kleinsteuber

Abstract: This paper addresses network anomography, that is, the problem of inferring network-level anomalies from indirect link measurements. This problem is cast as a low-rank subspace tracking problem for normal flows under incomplete observations, and an outlier detection problem for abnormal flows. Since traffic data is large-scale time-structured data accompanied with noise and outliers under partial… ▽ More This paper addresses network anomography, that is, the problem of inferring network-level anomalies from indirect link measurements. This problem is cast as a low-rank subspace tracking problem for normal flows under incomplete observations, and an outlier detection problem for abnormal flows. Since traffic data is large-scale time-structured data accompanied with noise and outliers under partial observations, an efficient modeling method is essential. To this end, this paper proposes an online subspace tracking of a Hankelized time-structured traffic tensor for normal flows based on the Candecomp/PARAFAC decomposition exploiting the recursive least squares (RLS) algorithm. We estimate abnormal flows as outlier sparse flows via sparsity maximization in the underlying under-constrained linear-inverse problem. A major advantage is that our algorithm estimates normal flows by low-dimensional matrices with time-directional features as well as the spatial correlation of multiple links without using the past observed measurements and the past model parameters. Extensive numerical evaluations show that the proposed algorithm achieves faster convergence per iteration of model approximation, and better volume anomaly detection performance compared to state-of-the-art algorithms. △ Less

Submitted 19 August, 2016; originally announced August 2016.

Comments: IEEE Transactions on Network and Service Management

Journal ref: IEEE Transactions on Network and Service Management, vol.13, no.3, pp.636-650, 2016

arXiv:1506.03958 [pdf, other]

Robust Structured Low-Rank Approximation on the Grassmannian

Authors: Clemens Hage, Martin Kleinsteuber

Abstract: Over the past years Robust PCA has been established as a standard tool for reliable low-rank approximation of matrices in the presence of outliers. Recently, the Robust PCA approach via nuclear norm minimization has been extended to matrices with linear structures which appear in applications such as system identification and data series analysis. At the same time it has been shown how to control… ▽ More Over the past years Robust PCA has been established as a standard tool for reliable low-rank approximation of matrices in the presence of outliers. Recently, the Robust PCA approach via nuclear norm minimization has been extended to matrices with linear structures which appear in applications such as system identification and data series analysis. At the same time it has been shown how to control the rank of a structured approximation via matrix factorization approaches. The drawbacks of these methods either lie in the lack of robustness against outliers or in their static nature of repeated batch-processing. We present a Robust Structured Low-Rank Approximation method on the Grassmannian that on the one hand allows for fast re-initialization in an online setting due to subspace identification with manifolds, and that is robust against outliers due to a smooth approximation of the $\ell_p$-norm cost function on the other hand. The method is evaluated in online time series forecasting tasks on simulated and real-world data. △ Less

Submitted 12 June, 2015; originally announced June 2015.

arXiv:1503.02398 [pdf, other]

doi 10.1109/TSP.2015.2481875

Learning Co-Sparse Analysis Operators with Separable Structures

Authors: Matthias Seibert, Julian Wörmann, Rémi Gribonval, Martin Kleinsteuber

Abstract: In the co-sparse analysis model a set of filters is applied to a signal out of the signal class of interest yielding sparse filter responses. As such, it may serve as a prior in inverse problems, or for structural analysis of signals that are known to belong to the signal class. The more the model is adapted to the class, the more reliable it is for these purposes. The task of learning such operat… ▽ More In the co-sparse analysis model a set of filters is applied to a signal out of the signal class of interest yielding sparse filter responses. As such, it may serve as a prior in inverse problems, or for structural analysis of signals that are known to belong to the signal class. The more the model is adapted to the class, the more reliable it is for these purposes. The task of learning such operators for a given class is therefore a crucial problem. In many applications, it is also required that the filter responses are obtained in a timely manner, which can be achieved by filters with a separable structure. Not only can operators of this sort be efficiently used for computing the filter responses, but they also have the advantage that less training samples are required to obtain a reliable estimate of the operator. The first contribution of this work is to give theoretical evidence for this claim by providing an upper bound for the sample complexity of the learning process. The second is a stochastic gradient descent (SGD) method designed to learn an analysis operator with separable structures, which includes a novel and efficient step size selection rule. Numerical experiments are provided that link the sample complexity to the convergence speed of the SGD algorithm. △ Less

Submitted 11 September, 2015; v1 submitted 9 March, 2015; originally announced March 2015.

Comments: 11 pages double column, 4 figures, 3 tables

arXiv:1406.1621 [pdf, other]

Separable Cosparse Analysis Operator Learning

Authors: Matthias Seibert, Julian Wörmann, Rémi Gribonval, Martin Kleinsteuber

Abstract: The ability of having a sparse representation for a certain class of signals has many applications in data analysis, image processing, and other research fields. Among sparse representations, the cosparse analysis model has recently gained increasing interest. Many signals exhibit a multidimensional structure, e.g. images or three-dimensional MRI scans. Most data analysis and learning algorithms u… ▽ More The ability of having a sparse representation for a certain class of signals has many applications in data analysis, image processing, and other research fields. Among sparse representations, the cosparse analysis model has recently gained increasing interest. Many signals exhibit a multidimensional structure, e.g. images or three-dimensional MRI scans. Most data analysis and learning algorithms use vectorized signals and thereby do not account for this underlying structure. The drawback of not taking the inherent structure into account is a dramatic increase in computational cost. We propose an algorithm for learning a cosparse Analysis Operator that adheres to the preexisting structure of the data, and thus allows for a very efficient implementation. This is achieved by enforcing a separable structure on the learned operator. Our learning algorithm is able to deal with multidimensional data of arbitrary order. We evaluate our method on volumetric data at the example of three-dimensional MRI scans. △ Less

Submitted 6 June, 2014; originally announced June 2014.

Comments: 5 pages, 3 figures, accepted at EUSIPCO 2014

arXiv:1403.5112 [pdf, ps, other]

doi 10.1109/SSP.2014.6884621

On The Sample Complexity of Sparse Dictionary Learning

Authors: Matthias Seibert, Martin Kleinsteuber, Rémi Gribonval, Rodolphe Jenatton, Francis Bach

Abstract: In the synthesis model signals are represented as a sparse combinations of atoms from a dictionary. Dictionary learning describes the acquisition process of the underlying dictionary for a given set of training samples. While ideally this would be achieved by optimizing the expectation of the factors over the underlying distribution of the training data, in practice the necessary information about… ▽ More In the synthesis model signals are represented as a sparse combinations of atoms from a dictionary. Dictionary learning describes the acquisition process of the underlying dictionary for a given set of training samples. While ideally this would be achieved by optimizing the expectation of the factors over the underlying distribution of the training data, in practice the necessary information about the distribution is not available. Therefore, in real world applications it is achieved by minimizing an empirical average over the available samples. The main goal of this paper is to provide a sample complexity estimate that controls to what extent the empirical average deviates from the cost function. This estimate then provides a suitable estimate to the accuracy of the representation of the learned dictionary. The presented approach exemplifies the general results proposed by the authors in Sample Complexity of Dictionary Learning and other Matrix Factorizations, Gribonval et al. and gives more concrete bounds of the sample complexity of dictionary learning. We cover a variety of sparsity measures employed in the learning procedure. △ Less

Submitted 20 March, 2014; originally announced March 2014.

Comments: 4 pages, submitted to Statistical Signal Processing Workshop 2014

arXiv:1312.3790 [pdf, ps, other]

Sample Complexity of Dictionary Learning and other Matrix Factorizations

Authors: Rémi Gribonval, Rodolphe Jenatton, Francis Bach, Martin Kleinsteuber, Matthias Seibert

Abstract: Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), $K$-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection. While the idealized task would be to optimize the expected quality of the factors… ▽ More Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), $K$-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection. While the idealized task would be to optimize the expected quality of the factors over the underlying distribution of training vectors, it is achieved in practice by minimizing an empirical average over the considered collection. The focus of this paper is to provide sample complexity estimates to uniformly control how much the empirical average deviates from the expected cost function. Standard arguments imply that the performance of the empirical predictor also exhibit such guarantees. The level of genericity of the approach encompasses several possible constraints on the factors (tensor product structure, shift-invariance, sparsity \ldots), thus providing a unified perspective on the sample complexity of several widely used matrix factorization schemes. The derived generalization bounds behave proportional to $\sqrt{\log(n)/n}$ w.r.t.\ the number of samples $n$ for the considered matrix factorization techniques. △ Less

Submitted 9 April, 2015; v1 submitted 13 December, 2013; originally announced December 2013.

Comments: to appear

Journal ref: IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers (IEEE), 2015, pp.18

arXiv:1303.5244 [pdf, other]

Separable Dictionary Learning

Authors: Simon Hawe, Matthias Seibert, Martin Kleinsteuber

Abstract: Many techniques in computer vision, machine learning, and statistics rely on the fact that a signal of interest admits a sparse representation over some dictionary. Dictionaries are either available analytically, or can be learned from a suitable training set. While analytic dictionaries permit to capture the global structure of a signal and allow a fast implementation, learned dictionaries often… ▽ More Many techniques in computer vision, machine learning, and statistics rely on the fact that a signal of interest admits a sparse representation over some dictionary. Dictionaries are either available analytically, or can be learned from a suitable training set. While analytic dictionaries permit to capture the global structure of a signal and allow a fast implementation, learned dictionaries often perform better in applications as they are more adapted to the considered class of signals. In imagery, unfortunately, the numerical burden for (i) learning a dictionary and for (ii) employing the dictionary for reconstruction tasks only allows to deal with relatively small image patches that only capture local image information. The approach presented in this paper aims at overcoming these drawbacks by allowing a separable structure on the dictionary throughout the learning process. On the one hand, this permits larger patch-sizes for the learning phase, on the other hand, the dictionary is applied efficiently in reconstruction tasks. The learning procedure is based on optimizing over a product of spheres which updates the dictionary as a whole, thus enforces basic dictionary properties such as mutual coherence explicitly during the learning procedure. In the special case where no separable structure is enforced, our method competes with state-of-the-art dictionary learning methods like K-SVD. △ Less

Submitted 21 March, 2013; originally announced March 2013.

Comments: 12 pages, 2 figures, 1 table

arXiv:1210.0805 [pdf, other]

Robust PCA and subspace tracking from incomplete observations using L0-surrogates

Authors: Clemens Hage, Martin Kleinsteuber

Abstract: Many applications in data analysis rely on the decomposition of a data matrix into a low-rank and a sparse component. Existing methods that tackle this task use the nuclear norm and L1-cost functions as convex relaxations of the rank constraint and the sparsity measure, respectively, or employ thresholding techniques. We propose a method that allows for reconstructing and tracking a subspace of up… ▽ More Many applications in data analysis rely on the decomposition of a data matrix into a low-rank and a sparse component. Existing methods that tackle this task use the nuclear norm and L1-cost functions as convex relaxations of the rank constraint and the sparsity measure, respectively, or employ thresholding techniques. We propose a method that allows for reconstructing and tracking a subspace of upper-bounded dimension from incomplete and corrupted observations. It does not require any a priori information about the number of outliers. The core of our algorithm is an intrinsic Conjugate Gradient method on the set of orthogonal projection matrices, the so-called Grassmannian. Non-convex sparsity measures are used for outlier detection, which leads to improved performance in terms of robustly recovering and tracking the low-rank matrix. In particular, our approach can cope with more outliers and with an underlying matrix of higher rank than other state-of-the-art methods. △ Less

Submitted 17 January, 2013; v1 submitted 2 October, 2012; originally announced October 2012.

Showing 1–9 of 9 results for author: Kleinsteuber, M