A Metropolis-Adjusted Langevin Algorithm for Sampling Jeffreys Prior

Authors: Yibo Shi, Braghadeesh Lakshminarayanan, Cristian R. Rojas

Abstract: Inference and estimation are fundamental aspects of statistics, system identification and machine learning. For most inference problems, prior knowledge is available on the system to be modeled, and Bayesian analysis is a natural framework to impose such prior information in the form of a prior distribution. However, in many situations, coming up with a fully specified prior distribution is not easy, as prior knowledge might be too vague, so practitioners prefer to use a prior distribution that is as 'ignorant' or 'uninformative' as possible, in the sense of not imposing subjective beliefs, while still supporting reliable statistical analysis. Jeffreys prior is an appealing uninformative prior because it offers two important benefits: (i) it is invariant under any re-parameterization of the model, (ii) it encodes the intrinsic geometric structure of the parameter space through the Fisher information matrix, which in turn enhances the diversity of parameter samples. Despite these benefits, drawing samples from Jeffreys prior is a challenging task. In this paper, we propose a general sampling scheme using the Metropolis-Adjusted Langevin Algorithm that enables sampling of parameter values from Jeffreys prior, and provide numerical illustrations of our approach through several examples.

Submitted 15 April, 2025; v1 submitted 8 April, 2025; originally announced April 2025.

Comments: 7 pages
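
Illustrative sketch (not taken from the paper): the abstract above describes sampling Jeffreys prior with the Metropolis-Adjusted Langevin Algorithm. The toy example below assumes a Bernoulli model, for which Jeffreys prior is known in closed form (Beta(1/2, 1/2) in theta), and runs a textbook MALA chain in the logit coordinate phi, where the prior density is proportional to 1 / (2 cosh(phi / 2)). The function names, the step size and the choice of model are assumptions made for illustration; the paper's general scheme may differ.

```python
import math
import random

def log_density(phi):
    # Jeffreys prior of a Bernoulli model in logit coordinates
    # phi = log(theta / (1 - theta)): proportional to
    # sqrt(theta * (1 - theta)) = 1 / (2 * cosh(phi / 2)).
    return -math.log(math.cosh(phi / 2.0))

def grad_log_density(phi):
    # Derivative of -log(cosh(phi / 2)) with respect to phi.
    return -0.5 * math.tanh(phi / 2.0)

def log_proposal(to, frm, step):
    # Log density (up to a constant) of the Langevin proposal
    # N(frm + step * grad(frm), 2 * step) evaluated at `to`.
    mean = frm + step * grad_log_density(frm)
    return -((to - mean) ** 2) / (4.0 * step)

def mala_jeffreys_bernoulli(n_samples, step=1.0, seed=0):
    """Return n_samples values of theta from a MALA chain run in phi."""
    rng = random.Random(seed)
    phi = 0.0
    thetas = []
    for _ in range(n_samples):
        # Langevin proposal: gradient drift plus Gaussian noise.
        proposal = (phi + step * grad_log_density(phi)
                    + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0))
        # Metropolis-Hastings correction for the asymmetric proposal.
        log_alpha = (log_density(proposal) - log_density(phi)
                     + log_proposal(phi, proposal, step)
                     - log_proposal(proposal, phi, step))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            phi = proposal
        # Map back to the original parameter theta in (0, 1).
        thetas.append(1.0 / (1.0 + math.exp(-phi)))
    return thetas
```

Because Jeffreys prior is invariant under re-parameterization, the theta values produced this way should follow the Beta(1/2, 1/2) law, so their empirical mean should be close to 0.5 for a long enough chain.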

arXiv:2402.06048 [pdf, ps, other]

Balancing Application Relevant and Sparsity Revealing Excitation in Input Design

Authors: Javad Parsa, Cristian R. Rojas, Håkan Hjalmarsson

Abstract: The maximum absolute correlation between regressors, which is called mutual coherence, plays an essential role in sparse estimation. A regressor matrix whose columns are highly correlated may result from optimal input design, since there is no constraint on the mutual coherence, making it difficult to handle sparse estimation. This paper aims to tackle this issue for fixed denominator models, which include Laguerre, Kautz, and generalized orthonormal basis function expansion models, for example. The paper proposes an optimal input design method where the achieved Fisher information matrix is fitted to the desired Fisher matrix, together with a coordinate transformation designed to make the regressors in the transformed coordinates have low mutual coherence. The method can be used together with any sparse estimation method and any desired Fisher matrix. A numerical study shows its potential for alleviating the problem of model order selection when used in conjunction with, for example, classical methods such as the Akaike Information Criterion.

Submitted 10 October, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: Accepted to the IEEE Transactions on Automatic Control
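
Illustrative sketch (not taken from the paper): the abstract above defines mutual coherence as the maximum absolute correlation between regressors. A minimal way to compute it, assuming the usual definition as the largest absolute normalized inner product between two distinct, nonzero columns, is shown below. The function name and the column-list input format are assumptions made for illustration.

```python
import math

def mutual_coherence(columns):
    """Largest absolute normalized inner product between distinct columns.

    `columns` is a list of equal-length lists, one per regressor.
    Every column is assumed to be nonzero.
    """
    norms = [math.sqrt(sum(x * x for x in col)) for col in columns]
    best = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            inner = sum(a * b for a, b in zip(columns[i], columns[j]))
            best = max(best, abs(inner) / (norms[i] * norms[j]))
    return best
```

Orthogonal regressors give a value of 0, while two collinear regressors give a value of 1, which is the regime the abstract describes as difficult for sparse estimation.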

arXiv:2311.11657 [pdf, other]

Minimax Two-Stage Gradient Boosting for Parameter Estimation

Authors: Braghadeesh Lakshminarayanan, Cristian R. Rojas

Abstract: Parameter estimation is an important sub-field in statistics and system identification. Various methods for parameter estimation have been proposed in the literature, among which the Two-Stage (TS) approach is particularly promising, due to its ease of implementation and reliable estimates. Among the different statistical frameworks used to derive TS estimators, the min-max framework is attractive due to its mild dependence on prior knowledge about the parameters to be estimated. However, the existing implementation of the minimax TS approach currently has limited applicability, due to its heavy computational load. In this paper, we overcome this difficulty by using a gradient boosting machine (GBM) in the second stage of the TS approach. We call the resulting algorithm the Two-Stage Gradient Boosting Machine (TSGBM) estimator. Finally, we test our proposed TSGBM estimator on several numerical examples including models of dynamical systems.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 6 pages

arXiv:2306.07024 [pdf, other]

DRCFS: Doubly Robust Causal Feature Selection

Authors: Francesco Quinzan, Ashkan Soleymani, Patrick Jaillet, Cristian R. Rojas, Stefan Bauer

Abstract: Knowing the features of a complex system that are highly relevant to a particular target variable is of fundamental interest in many areas of science. Existing approaches are often limited to linear settings, sometimes lack guarantees, and in most cases, do not scale to the problem at hand, in particular to images. We propose DRCFS, a doubly robust feature selection method for identifying the causal features even in nonlinear and high-dimensional settings. We provide theoretical guarantees, illustrate necessary conditions for our assumptions, and perform extensive experiments across a wide range of simulated and semi-synthetic datasets. DRCFS significantly outperforms existing state-of-the-art methods, selecting robust features even in challenging highly nonlinear and high-dimensional problems.

Submitted 5 July, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

arXiv:2305.03295 [pdf, other]

Decentralized diffusion-based learning under non-parametric limited prior knowledge

Authors: Paweł Wachel, Krzysztof Kowalczyk, Cristian R. Rojas

Abstract: We study the problem of diffusion-based network learning of a nonlinear phenomenon, $m$, from local agents' measurements collected in a noisy environment. For a decentralized network and information spreading merely between directly neighboring nodes, we propose a non-parametric learning algorithm that avoids raw data exchange and requires only mild a priori knowledge about $m$. Non-asymptotic estimation error bounds are derived for the proposed method. Its potential applications are illustrated through simulation experiments.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2211.10332 [pdf, other]

A Unified Approach to Differentially Private Bayes Point Estimation

Authors: Braghadeesh Lakshminarayanan, Cristian R. Rojas

Abstract: Parameter estimation in statistics and system identification relies on data that may contain sensitive information. To protect this sensitive information, the notion of differential privacy (DP) has been proposed, which enforces confidentiality by introducing randomization in the estimates. Standard algorithms for differentially private estimation are based on adding an appropriate amount of noise to the output of a traditional point estimation method. This leads to an accuracy-privacy trade-off, as adding more noise reduces the accuracy while increasing privacy. In this paper, we propose a new Unified Bayes Private Point (UBaPP) approach to Bayes point estimation of the unknown parameters of a data-generating mechanism under a DP constraint that achieves a better accuracy-privacy trade-off than traditional approaches. We verify the performance of our approach on a simple numerical example.

Submitted 18 November, 2022; originally announced November 2022.
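
Illustrative sketch (not taken from the paper): the abstract above contrasts UBaPP with standard algorithms that add noise to the output of a traditional point estimator. The example below sketches that baseline only, using the well-known Laplace mechanism applied to a clipped sample mean. The function names, the clipping bounds and the choice of the mean as estimator are assumptions made for illustration; UBaPP itself is not implemented here.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential variables with mean
    # `scale` follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(data, lower, upper, epsilon, rng):
    """Sample mean released through the Laplace mechanism.

    Each value is clipped to [lower, upper], so replacing one record
    changes the mean by at most (upper - lower) / n.
    """
    clipped = [min(max(x, lower), upper) for x in data]
    mean = sum(clipped) / len(clipped)
    sensitivity = (upper - lower) / len(clipped)
    # Noise scale grows as epsilon shrinks: more privacy, less accuracy.
    return mean + laplace_noise(sensitivity / epsilon, rng)
```

A call such as `private_mean(values, 0.0, 1.0, epsilon=0.5, rng=random.Random(0))` returns a noisy estimate; lowering `epsilon` widens the noise, which is the accuracy-privacy trade-off the abstract refers to.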

arXiv:2204.00036 [pdf, other]

doi 10.1109/CDC51059.2022.9993024

A Statistical Decision-Theoretical Perspective on the Two-Stage Approach to Parameter Estimation

Authors: Braghadeesh Lakshminarayanan, Cristian R. Rojas

Abstract: One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Empirical Minimization (EM), can be used when the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach on models for independent and identically distributed samples, by computing quantiles of the data as a first step, and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.

Submitted 15 April, 2022; v1 submitted 31 March, 2022; originally announced April 2022.

Comments: 7 pages, 6 figures, 1 table

arXiv:2105.14114 [pdf, other]

Asymptotically Optimal Bandits under Weighted Information

Authors: Matias I. Müller, Cristian R. Rojas

Abstract: We study the problem of regret minimization in a multi-armed bandit setup where the agent is allowed to play multiple arms at each round by spreading the resources usually allocated to only one arm. At each iteration the agent selects a normalized power profile and receives a Gaussian vector as outcome, where the unknown variance of each sample is inversely proportional to the power allocated to that arm. The reward corresponds to a linear combination of the power profile and the outcomes, resembling a linear bandit. By spreading the power, the agent can choose to collect information much faster than in a traditional multi-armed bandit at the price of reducing the accuracy of the samples. This setup is fundamentally different from that of a linear bandit: the regret is known to scale as $\Theta(\sqrt{T})$ for linear bandits, while in this setup the agent receives much more detailed feedback, for which we derive a tight $\log(T)$ problem-dependent lower bound. We propose a Thompson-Sampling-based strategy, called Weighted Thompson Sampling (WTS), that designs the power profile as its posterior belief of each arm being the best arm, and show that its upper bound matches the derived logarithmic lower bound. Finally, we apply this strategy to a problem of control and system identification, where the goal is to estimate the maximum gain (also called $\mathcal{H}_\infty$-norm) of a linear dynamical system based on batches of input-output samples.

Submitted 28 May, 2021; originally announced May 2021.

Comments: 9 content pages, 3 references pages, 22 appendix pages, 4 figures, 34 total pages

arXiv:1912.08103 [pdf, ps, other]

A Finite-Sample Deviation Bound for Stable Autoregressive Processes

Authors: Rodrigo A. González, Cristian R. Rojas

Abstract: In this paper, we study non-asymptotic deviation bounds of the least squares estimator in Gaussian AR($n$) processes. By relying on martingale concentration inequalities and a tail bound for $\chi^2$-distributed variables, we provide a concentration bound for the sample covariance matrix of the process output. With this, we present a problem-dependent finite-time bound on the deviation probability of any fixed linear combination of the estimated parameters of the AR($n$) process. We discuss extensions and limitations of our approach.

Submitted 25 May, 2020; v1 submitted 17 December, 2019; originally announced December 2019.

Comments: 15 pages

arXiv:1912.01308 [pdf, other]

Bayesian Model Selection for Change Point Detection and Clustering

Authors: Othmane Mazhar, Cristian R. Rojas, Carlo Fischione, Mohammad R. Hesamzadeh

Abstract: We address the new problem of estimating a piece-wise constant signal with the purpose of detecting its change points and the levels of clusters. Our approach is to model it as a nonparametric penalized least-squares model selection on a family of models indexed over the collection of partitions of the design points and to propose a computationally efficient algorithm to approximately solve it. Statistically, minimizing such a penalized criterion yields an approximation to the maximum a posteriori probability (MAP) estimator. The criterion is then analyzed and an oracle inequality is derived using a Gaussian concentration inequality. The oracle inequality is used to derive, on the one hand, conditions for consistency and, on the other hand, an adaptive upper bound on the expected square risk of the estimator, which statistically motivates our approximation. Finally, we apply our algorithm to simulated data to experimentally validate the statistical guarantees and illustrate its behavior.

Submitted 3 December, 2019; originally announced December 2019.

Comments: 37 pages, 4 figures, Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR 80:3433-3442, 2018

arXiv:1507.07238 [pdf, ps, other]

Estimator Selection: End-Performance Metric Aspects

Authors: Dimitrios Katselis, Cristian R. Rojas, Carolyn L. Beck

Abstract: Recently, a framework for application-oriented optimal experiment design has been introduced. In this context, the distance of the estimated system from the true one is measured in terms of a particular end-performance metric. This treatment leads to estimates of the unknown system that are superior to those obtained with classical experiment designs based on the usual pointwise functional distances of the estimated system from the true one. The separation of the system estimator from the experiment design is done within this new framework by choosing and fixing the estimation method to either a maximum likelihood (ML) approach or a Bayesian estimator such as the minimum mean square error (MMSE). Since the MMSE estimator delivers a system estimate with lower mean square error (MSE) than the ML estimator for finite-length experiments, it is usually considered the best choice in practice in signal processing and control applications. Within the application-oriented framework, a related meaningful question is: Are there end-performance metrics for which the ML estimator outperforms the MMSE when the experiment is finite-length? In this paper, we affirmatively answer this question based on a simple linear Gaussian regression example.

Submitted 26 July, 2015; originally announced July 2015.

Comments: arXiv admin note: substantial text overlap with arXiv:1303.4289

arXiv:1507.06346 [pdf, other]

Evaluation of Spectral Learning for the Identification of Hidden Markov Models

Authors: Robert Mattila, Cristian R. Rojas, Bo Wahlberg

Abstract: Hidden Markov models have successfully been applied as models of discrete time series in many fields. Often, when applied in practice, the parameters of these models have to be estimated. The currently predominant identification methods, such as maximum-likelihood estimation and especially expectation-maximization, are iterative and prone to problems with local minima. A non-iterative method employing a spectral subspace-like approach has recently been proposed in the machine learning literature. This paper evaluates the performance of this algorithm, and compares it to the performance of the expectation-maximization algorithm, on a number of numerical examples. We find that the performance is mixed; it successfully identifies some systems with relatively few available observations, but fails completely for some systems even when a large number of observations is available. An open question is how this discrepancy can be explained. We provide some indications that it could be related to how well-conditioned some system parameters are.

Submitted 22 July, 2015; originally announced July 2015.

Comments: This paper is accepted and will be published in The Proceedings of the 17th IFAC Symposium on System Identification (SYSID 2015), Beijing, China, 2015

arXiv:1501.05740 [pdf, other]

Bayesian Learning for Low-Rank matrix reconstruction

Authors: Martin Sundin, Cristian R. Rojas, Magnus Jansson, Saikat Chatterjee

Abstract: We develop latent variable models for Bayesian learning based low-rank matrix completion and reconstruction from linear measurements. For under-determined systems, the developed methods are shown to reconstruct low-rank matrices when neither the rank nor the noise power is known a priori. We derive relations between the latent variable models and several low-rank promoting penalty functions. The relations justify the use of Kronecker structured covariance matrices in a Gaussian-based prior. In the methods, we use evidence approximation and expectation-maximization to learn the model parameters. The performance of the methods is evaluated through extensive numerical simulations.

Submitted 23 January, 2015; originally announced January 2015.

Comments: Submitted to IEEE Transactions on Signal Processing

arXiv:1412.0607 [pdf, other]

How to monitor and mitigate stair-casing in l1 trend filtering

Authors: Cristian R. Rojas, Bo Wahlberg

Abstract: In this paper we study the estimation of changing trends in time-series using $\ell_1$ trend filtering. This method generalizes 1D Total Variation (TV) denoising for detection of step changes in means to detecting changes in trends, and it relies on a convex optimization problem for which there are very efficient numerical algorithms. It is known that TV denoising suffers from the so-called stair-case effect, which leads to detecting false change points. The objective of this paper is to show that $\ell_1$ trend filtering also suffers from a certain stair-case problem. The analysis is based on an interpretation of the dual variables of the optimization problem in the method as an integrated random walk. We discuss consistency conditions for $\ell_1$ trend filtering, how to monitor their fulfillment, and how to modify the algorithm to avoid the stair-case false detection problem.

Submitted 1 December, 2014; originally announced December 2014.

arXiv:1407.5820 [pdf, other]

Approximate Regularization Path for Nuclear Norm Based H2 Model Reduction

Authors: Niclas Blomberg, Cristian R. Rojas, Bo Wahlberg

Abstract: This paper concerns model reduction of dynamical systems using the nuclear norm of the Hankel matrix to make a trade-off between model fit and model complexity. This results in a convex optimization problem where this trade-off is determined by one crucial design parameter. The main contribution is a methodology to approximately calculate all solutions up to a certain tolerance to the model reduction problem as a function of the design parameter. This is called the regularization path in sparse estimation and is a very important tool in order to find the appropriate balance between fit and complexity. We extend this to the more complicated nuclear norm case. The key idea is to determine when to exactly calculate the optimal solution using an upper bound based on the so-called duality gap. Hence, by solving a fixed number of optimization problems the whole regularization path up to a given tolerance can be efficiently computed. We illustrate this approach on some numerical examples.

Submitted 22 July, 2014; originally announced July 2014.

arXiv:1401.5408 [pdf, other]

On change point detection using the fused lasso method

Authors: Cristian R. Rojas, Bo Wahlberg

Abstract: In this paper we analyze the asymptotic properties of l1 penalized maximum likelihood estimation of signals with piece-wise constant mean values and/or variances. The focus is on segmentation of a non-stationary time series with respect to changes in these model parameters. This change point detection and estimation problem is also referred to as total variation denoising or l1 mean filtering and has many important applications in most fields of science and engineering. We establish the (approximate) sparse consistency properties, including rate of convergence, of the so-called fused lasso signal approximator (FLSA). We show that this only holds if the signs of the corresponding consecutive changes are all different, and that this estimator is otherwise incapable of correctly detecting the underlying sparsity pattern. The key idea is to notice that the optimality conditions for this problem can be analyzed using techniques related to Brownian bridge theory.

Submitted 21 January, 2014; originally announced January 2014.

MSC Class: 62G08; 62G20

arXiv:1209.4887 [pdf, other]

doi 10.1109/TSP.2013.2272291

A Note on the SPICE Method

Authors: Cristian R. Rojas, Dimitrios Katselis, Håkan Hjalmarsson

Abstract: In this article, we analyze the SPICE method developed in [1], and establish its connections with other standard sparse estimation methods such as the Lasso and the LAD-Lasso. This result positions SPICE as a computationally efficient technique for the calculation of Lasso-type estimators. Conversely, this connection is very useful for establishing the asymptotic properties of SPICE under several problem scenarios and for suggesting suitable modifications in cases where the naive version of SPICE would not work.

Submitted 21 September, 2012; originally announced September 2012.

Comments: 5 pages, 1 figure. Submitted to the IEEE Transactions on Signal Processing

arXiv:1111.5948 [pdf, other]

On l_1 Mean and Variance Filtering

Authors: Bo Wahlberg, Cristian R. Rojas, Mariette Annergren

Abstract: This paper addresses the problem of segmenting a time-series with respect to changes in the mean value or in the variance. The first case is when the time data is modeled as a sequence of independent and normally distributed random variables with unknown, possibly changing, mean value but fixed variance. The main assumption is that the mean value is piecewise constant in time, and the task is to estimate the change times and the mean values within the segments. The second case is when the mean value is constant, but the variance can change. The assumption is that the variance is piecewise constant in time, and we want to estimate change times and the variance values within the segments. To find solutions to these problems, we will study an l_1 regularized maximum likelihood method, related to the fused lasso method and l_1 trend filtering, where the parameters to be estimated are free to vary at each sample. To penalize variations in the estimated parameters, the l_1-norm of the time difference of the parameters is used as a regularization term. This idea is closely related to total variation denoising. The main contribution is that a convex formulation of this variance estimation problem, where the parametrization is based on the inverse of the variance, can be formulated as a certain l_1 mean estimation problem. This implies that results and methods for mean estimation can be applied to the challenging problem of variance segmentation/estimation.

Submitted 25 November, 2011; originally announced November 2011.

Comments: The 45th Annual Asilomar Conference on Signals, Systems, and Computers, November 6-9, 2011, Pacific Grove, California, USA

Showing 1–18 of 18 results for author: Rojas, C R