-
Expressivity of Quadratic Neural ODEs
Authors:
Joshua Hanson,
Maxim Raginsky
Abstract:
This work focuses on deriving quantitative approximation error bounds for neural ordinary differential equations having at most quadratic nonlinearities in the dynamics. The simple dynamics of this model form demonstrates how expressivity can be derived primarily from iteratively composing many basic elementary operations, versus from the complexity of those elementary operations themselves. Like the analog differential analyzer and universal polynomial DAEs, the expressivity is derived instead primarily from the "depth" of the model. These results contribute to our understanding of what depth specifically imparts to the capabilities of deep learning architectures.
Submitted 12 April, 2025;
originally announced April 2025.
-
Talagrand Meets Talagrand: Upper and Lower Bounds on Expected Soft Maxima of Gaussian Processes with Finite Index Sets
Authors:
Yifeng Chu,
Maxim Raginsky
Abstract:
Analysis of extremal behavior of stochastic processes is a key ingredient in a wide variety of applications, including probability, statistical physics, theoretical computer science, and learning theory. In this paper, we consider centered Gaussian processes on finite index sets and investigate expected values of their smoothed, or ``soft,'' maxima. We obtain upper and lower bounds for these expected values using a combination of ideas from statistical physics (the Gibbs variational principle for the equilibrium free energy and replica-symmetric representations of Gibbs averages) and from probability theory (Sudakov minoration). These bounds are parametrized by an inverse temperature $β> 0$ and reduce to the usual Gaussian maximal inequalities in the zero-temperature limit $β\to \infty$. We provide an illustration of our methods in the context of the Random Energy Model, one of the simplest models of physical systems with random disorder.
Submitted 10 February, 2025;
originally announced February 2025.
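A minimal numerical sketch of the soft maximum in the entry above, assuming it is the log-sum-exp quantity (1/β) log Σ_i exp(β X_i) over a finite index set. The function names and the choice of i.i.d. standard Gaussians are illustrative assumptions, not taken from the paper. Under that assumption the soft maximum always lies between max_i X_i and max_i X_i + log(n)/β, so it approaches the ordinary maximum as β grows.

```python
import math
import random


def soft_max(values, beta):
    """Return (1/beta) * log(sum_i exp(beta * x_i)), computed stably."""
    m = max(values)
    # Subtract the maximum before exponentiating to avoid overflow.
    return m + math.log(sum(math.exp(beta * (x - m)) for x in values)) / beta


def expected_soft_max(n, beta, trials=2000, seed=0):
    """Monte Carlo estimate of E[soft_max] for n i.i.d. N(0, 1) variables (an assumed toy model)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += soft_max(xs, beta)
    return total / trials


if __name__ == "__main__":
    # Larger beta moves the estimate toward the expected hard maximum.
    for beta in (0.5, 2.0, 8.0, 32.0):
        print(beta, expected_soft_max(n=16, beta=beta))
```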
-
A variational approach to sampling in diffusion processes
Authors:
Maxim Raginsky
Abstract:
We revisit the work of Mitter and Newton on an information-theoretic interpretation of Bayes' formula through the Gibbs variational principle. This formulation allowed them to pose nonlinear estimation for diffusion processes as a problem in stochastic optimal control, so that the posterior density of the signal given the observation path could be sampled by adding a drift to the signal process. We show that this control-theoretic approach to sampling provides a common mechanism underlying several distinct problems involving diffusion processes, specifically importance sampling using Feynman-Kac averages, time reversal, and Schrödinger bridges.
Submitted 30 April, 2024;
originally announced May 2024.
-
Some Remarks on Controllability of the Liouville Equation
Authors:
Maxim Raginsky
Abstract:
We revisit the work of Roger Brockett on controllability of the Liouville equation, with a particular focus on the following problem: Given a smooth controlled dynamical system of the form $\dot{x} = f(x,u)$ and a state-space diffeomorphism $ψ$, design a feedback control $u(t,x)$ to steer an arbitrary initial state $x_0$ to $ψ(x_0)$ in finite time. This formulation of the problem makes contact with the theory of optimal transportation and with nonlinear controllability. For controllable linear systems, Brockett showed that this is possible under a fairly restrictive condition on $ψ$. We prove that controllability suffices for a much larger class of diffeomorphisms. For nonlinear systems defined on smooth manifolds, we review a recent result of Agrachev and Caponigro regarding controllability on the group of diffeomorphisms. A corollary of this result states that, for control-affine systems satisfying a bracket generating condition, any $ψ$ in a neighborhood of the identity can be implemented using a time-varying feedback control law that switches between finitely many time-invariant flows. We prove a quantitative version which allows us to describe the implementation complexity of the Agrachev-Caponigro construction in terms of a lower bound on the number of switchings.
Submitted 6 December, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
Revisiting Stochastic Realization Theory using Functional Itô Calculus
Authors:
Tanya Veeravalli,
Maxim Raginsky
Abstract:
This paper considers the problem of constructing finite-dimensional state space realizations for stochastic processes that can be represented as the outputs of a certain type of a causal system driven by a continuous semimartingale input process. The main assumption is that the output process is infinitely differentiable, where the notion of differentiability comes from the functional Itô calculus introduced by Dupire as a causal (nonanticipative) counterpart to Malliavin's stochastic calculus of variations. The proposed approach builds on the ideas of Hijab, who had considered the case of processes driven by a Brownian motion, and makes contact with the realization theory of deterministic systems based on formal power series and Chen-Fliess functional expansions.
Submitted 15 February, 2024;
originally announced February 2024.
-
Rademacher Complexity of Neural ODEs via Chen-Fliess Series
Authors:
Joshua Hanson,
Maxim Raginsky
Abstract:
We show how continuous-depth neural ODE models can be framed as single-layer, infinite-width nets using the Chen--Fliess series expansion for nonlinear ODEs. In this net, the output ``weights'' are taken from the signature of the control input -- a tool used to represent infinite-dimensional paths as a sequence of tensors -- which comprises iterated integrals of the control input over a simplex. The ``features'' are taken to be iterated Lie derivatives of the output function with respect to the vector fields in the controlled ODE model. The main result of this work applies this framework to derive compact expressions for the Rademacher complexity of ODE models that map an initial condition to a scalar output at some terminal time. The result leverages the straightforward analysis afforded by single-layer architectures. We conclude with some examples instantiating the bound for some specific systems and discuss potential follow-up work.
Submitted 20 May, 2024; v1 submitted 29 January, 2024;
originally announced January 2024.
-
Generalization Bounds: Perspectives from Information Theory and PAC-Bayes
Authors:
Fredrik Hellström,
Giuseppe Durisi,
Benjamin Guedj,
Maxim Raginsky
Abstract:
A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms, and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands. In this monograph, we highlight this strong connection and present a unified treatment of PAC-Bayesian and information-theoretic generalization bounds. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework; analytical studies of the information complexity of learning algorithms; and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning.
Submitted 27 March, 2024; v1 submitted 8 September, 2023;
originally announced September 2023.
-
A Constructive Approach to Function Realization by Neural Stochastic Differential Equations
Authors:
Tanya Veeravalli,
Maxim Raginsky
Abstract:
The problem of function approximation by neural dynamical systems has typically been approached in a top-down manner: Any continuous function can be approximated to an arbitrary accuracy by a sufficiently complex model with a given architecture. This can lead to high-complexity controls which are impractical in applications. In this paper, we take the opposite, constructive approach: We impose various structural restrictions on system dynamics and consequently characterize the class of functions that can be realized by such a system. The systems are implemented as a cascade interconnection of a neural stochastic differential equation (Neural SDE), a deterministic dynamical system, and a readout map. Both probabilistic and geometric (Lie-theoretic) methods are used to characterize the classes of functions realized by such systems.
Submitted 21 September, 2023; v1 submitted 30 June, 2023;
originally announced July 2023.
-
Majorizing Measures, Codes, and Information
Authors:
Yifeng Chu,
Maxim Raginsky
Abstract:
The majorizing measure theorem of Fernique and Talagrand is a fundamental result in the theory of random processes. It relates the boundedness of random processes indexed by elements of a metric space to complexity measures arising from certain multiscale combinatorial structures, such as packing and covering trees. This paper builds on the ideas first outlined in a little-noticed preprint of Andreas Maurer to present an information-theoretic perspective on the majorizing measure theorem, according to which the boundedness of random processes is phrased in terms of the existence of efficient variable-length codes for the elements of the indexing metric space.
Submitted 6 May, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
A Chain Rule for the Expected Suprema of Bernoulli Processes
Authors:
Yifeng Chu,
Maxim Raginsky
Abstract:
We obtain an upper bound on the expected supremum of a Bernoulli process indexed by the image of an index set under a uniformly Lipschitz function class in terms of properties of the index set and the function class, extending an earlier result of Maurer for Gaussian processes. The proof makes essential use of recent results of Bednorz and Latala on the boundedness of Bernoulli processes.
Submitted 27 April, 2023;
originally announced April 2023.
-
Variational Principles for Mirror Descent and Mirror Langevin Dynamics
Authors:
Belinda Tzen,
Anant Raj,
Maxim Raginsky,
Francis Bach
Abstract:
Mirror descent, introduced by Nemirovski and Yudin in the 1970s, is a primal-dual convex optimization method that can be tailored to the geometry of the optimization problem at hand through the choice of a strongly convex potential function. It arises as a basic primitive in a variety of applications, including large-scale optimization, machine learning, and control. This paper proposes a variational formulation of mirror descent and of its stochastic variant, mirror Langevin dynamics. The main idea, inspired by the classic work of Brezis and Ekeland on variational principles for gradient flows, is to show that mirror descent emerges as a closed-loop solution for a certain optimal control problem, and the Bellman value function is given by the Bregman divergence between the initial condition and the global minimizer of the objective function.
Submitted 16 March, 2023;
originally announced March 2023.
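For readers unfamiliar with mirror descent, here is a small discrete-time sketch of the method described in the entry above. It assumes the negative-entropy potential on the probability simplex, which gives the familiar exponentiated-gradient update. The function name, step size, and test objective are hypothetical illustrations; the paper itself concerns a variational formulation, which this sketch does not implement.

```python
import math


def mirror_descent_simplex(grad, x0, step, iters):
    """Mirror descent on the simplex with the negative-entropy potential.

    Each iteration takes a gradient step in the dual (log) coordinates and
    maps back to the simplex by normalizing, i.e. exponentiated gradient.
    """
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        # Dual step: grad(potential)(x) = log x (up to a constant), minus step * gradient.
        logits = [math.log(xi) - step * gi for xi, gi in zip(x, g)]
        # Map back to the simplex (softmax), shifting by the max for stability.
        m = max(logits)
        w = [math.exp(l - m) for l in logits]
        s = sum(w)
        x = [wi / s for wi in w]
    return x


if __name__ == "__main__":
    # Assumed toy objective: f(x) = <c, x>, minimized at the vertex with the smallest c_i.
    c = [3.0, 1.0, 2.0]
    x = mirror_descent_simplex(lambda x: c, [1 / 3, 1 / 3, 1 / 3], step=0.5, iters=100)
    print(x)
```

The choice of potential is what tailors the method to the geometry: with the squared Euclidean norm in place of the entropy, the same template reduces to projected gradient descent.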
-
Nonlinear controllability and function representation by neural stochastic differential equations
Authors:
Tanya Veeravalli,
Maxim Raginsky
Abstract:
There has been a great deal of recent interest in learning and approximation of functions that can be expressed as expectations of a given nonlinearity with respect to its random internal parameters. Examples of such representations include "infinitely wide" neural nets, where the underlying nonlinearity is given by the activation function of an individual neuron. In this paper, we bring this perspective to function representation by neural stochastic differential equations (SDEs). A neural SDE is an Itô diffusion process whose drift and diffusion matrix are elements of some parametric families. We show that the ability of a neural SDE to realize nonlinear functions of its initial condition can be related to the problem of optimally steering a certain deterministic dynamical system between two given points in finite time. This auxiliary system is obtained by formally replacing the Brownian motion in the SDE by a deterministic control input. We derive upper and lower bounds on the minimum control effort needed to accomplish this steering; these bounds may be of independent interest in the context of motion planning and deterministic optimal control.
Submitted 1 December, 2022;
originally announced December 2022.
-
Fitting an immersed submanifold to data via Sussmann's orbit theorem
Authors:
Joshua Hanson,
Maxim Raginsky
Abstract:
This paper describes an approach for fitting an immersed submanifold of a finite-dimensional Euclidean space to random samples. The reconstruction mapping from the ambient space to the desired submanifold is implemented as a composition of an encoder that maps each point to a tuple of (positive or negative) times and a decoder given by a composition of flows along finitely many vector fields starting from a fixed initial point. The encoder supplies the times for the flows. The encoder-decoder map is obtained by empirical risk minimization, and a high-probability bound is given on the excess risk relative to the minimum expected reconstruction error over a given class of encoder-decoder maps. The proposed approach makes fundamental use of Sussmann's orbit theorem, which guarantees that the image of the reconstruction map is indeed contained in an immersed submanifold.
Submitted 14 September, 2022; v1 submitted 3 April, 2022;
originally announced April 2022.
-
Input-to-State Stable Neural Ordinary Differential Equations with Applications to Transient Modeling of Circuits
Authors:
Alan Yang,
Jie Xiong,
Maxim Raginsky,
Elyse Rosenbaum
Abstract:
This paper proposes a class of neural ordinary differential equations parametrized by provably input-to-state stable continuous-time recurrent neural networks. The model dynamics are defined by construction to be input-to-state stable (ISS) with respect to an ISS-Lyapunov function that is learned jointly with the dynamics. We use the proposed method to learn cheap-to-simulate behavioral models for electronic circuits that can accurately reproduce the behavior of various digital and analog circuits when simulated by a commercial circuit simulator, even when interconnected with circuit components not encountered during training. We also demonstrate the feasibility of learning ISS-preserving perturbations to the dynamics for modeling degradation effects due to circuit aging.
Submitted 13 February, 2022;
originally announced February 2022.
-
Minimum Excess Risk in Bayesian Learning
Authors:
Aolin Xu,
Maxim Raginsky
Abstract:
We analyze the best achievable performance of Bayesian learning under generative models by defining and upper-bounding the minimum excess risk (MER): the gap between the minimum expected loss attainable by learning from data and the minimum expected loss that could be achieved if the model realization were known. The definition of MER provides a principled way to define different notions of uncertainties in Bayesian learning, including the aleatoric uncertainty and the minimum epistemic uncertainty. Two methods for deriving upper bounds for the MER are presented. The first method, generally suitable for Bayesian learning with a parametric generative model, upper-bounds the MER by the conditional mutual information between the model parameters and the quantity being predicted given the observed data. It allows us to quantify the rate at which the MER decays to zero as more data becomes available. Under realizable models, this method also relates the MER to the richness of the generative function class, notably the VC dimension in binary classification. The second method, particularly suitable for Bayesian learning with a parametric predictive model, relates the MER to the minimum estimation error of the model parameters from data. It explicitly shows how the uncertainty in model parameter estimation translates to the MER and to the final prediction uncertainty. We also extend the definition and analysis of MER to the setting with multiple model families and the setting with nonparametric models. Along the discussions we draw some comparisons between the MER in Bayesian learning and the excess risk in frequentist learning.
Submitted 31 December, 2021; v1 submitted 29 December, 2020;
originally announced December 2020.
-
Learning Recurrent Neural Net Models of Nonlinear Systems
Authors:
Joshua Hanson,
Maxim Raginsky,
Eduardo Sontag
Abstract:
We consider the following learning problem: Given sample pairs of input and output signals generated by an unknown nonlinear system (which is not assumed to be causal or time-invariant), we wish to find a continuous-time recurrent neural net with hyperbolic tangent activation function that approximately reproduces the underlying i/o behavior with high confidence. Leveraging earlier work concerned with matching output derivatives up to a given finite order, we reformulate the learning problem in familiar system-theoretic language and derive quantitative guarantees on the sup-norm risk of the learned model in terms of the number of neurons, the sample size, the number of derivatives being matched, and the regularity properties of the inputs, the outputs, and the unknown i/o map.
Submitted 16 November, 2021; v1 submitted 18 November, 2020;
originally announced November 2020.
-
Function approximation by neural nets in the mean-field regime: Entropic regularization and controlled McKean-Vlasov dynamics
Authors:
Belinda Tzen,
Maxim Raginsky
Abstract:
We consider the problem of function approximation by two-layer neural nets with random weights that are "nearly Gaussian" in the sense of Kullback-Leibler divergence. Our setting is the mean-field limit, where the finite population of neurons in the hidden layer is replaced by a continuous ensemble. We show that the problem can be phrased as global minimization of a free energy functional on the space of (finite-length) paths over probability measures on the weights. This functional trades off the $L^2$ approximation risk of the terminal measure against the KL divergence of the path with respect to an isotropic Brownian motion prior. We characterize the unique global minimizer and examine the dynamics in the space of probability measures over weights that can achieve it. In particular, we show that the optimal path-space measure corresponds to the Föllmer drift, the solution to a McKean-Vlasov optimal control problem closely related to the classic Schrödinger bridge problem. While the Föllmer drift cannot in general be obtained in closed form, thus limiting its potential algorithmic utility, we illustrate the viability of the mean-field Langevin diffusion as a finite-time approximation under various conditions on entropic regularization. Specifically, we show that it closely tracks the Föllmer drift when the regularization is such that the minimizing density is log-concave.
Submitted 22 June, 2024; v1 submitted 5 February, 2020;
originally announced February 2020.
-
Universal Approximation of Input-Output Maps by Temporal Convolutional Nets
Authors:
Joshua Hanson,
Maxim Raginsky
Abstract:
There has been a recent shift in sequence-to-sequence modeling from recurrent network architectures to convolutional network architectures due to computational advantages in training and operation while still achieving competitive performance. For systems having limited long-term temporal dependencies, the approximation capability of recurrent networks is essentially equivalent to that of temporal convolutional nets (TCNs). We prove that TCNs can approximate a large class of input-output maps having approximately finite memory to arbitrary error tolerance. Furthermore, we derive quantitative approximation rates for deep ReLU TCNs in terms of the width and depth of the network and modulus of continuity of the original input-output map, and apply these results to input-output maps of systems that admit finite-dimensional state-space realizations (i.e., recurrent models).
Submitted 27 October, 2019; v1 submitted 21 June, 2019;
originally announced June 2019.
-
Non-signaling Approximations of Stochastic Team Problems
Authors:
Naci Saldi,
Can Deha Karıksız,
Maxim Raginsky,
Eric Chitambar
Abstract:
In this paper, we consider non-signaling approximation of finite stochastic teams. We first introduce a hierarchy of team decision rules that can be classified in an increasing order as randomized policies, quantum-correlated policies, and non-signaling policies. Then, we establish an approximation of team-optimal policies for sequential teams via extendible non-signaling policies. We prove that the distance between extendible non-signaling policies and decentralized policies is small if the extension is sufficiently large. Using this result, we establish a linear programming (LP) approximation of sequential teams. Finally, we state an open problem regarding computation of optimal value of quantum-correlated policies.
Submitted 16 June, 2020; v1 submitted 17 May, 2019;
originally announced May 2019.
-
Theoretical guarantees for sampling and inference in generative models with latent diffusions
Authors:
Belinda Tzen,
Maxim Raginsky
Abstract:
We introduce and study a class of probabilistic generative models, where the latent object is a finite-dimensional diffusion process on a finite time interval and the observed variable is drawn conditionally on the terminal point of the diffusion. We make the following contributions:
We provide a unified viewpoint on both sampling and variational inference in such generative models through the lens of stochastic control.
We quantify the expressiveness of diffusion-based generative models. Specifically, we show that one can efficiently sample from a wide class of terminal target distributions by choosing the drift of the latent diffusion from the class of multilayer feedforward neural nets, with the accuracy of sampling measured by the Kullback-Leibler divergence to the target distribution.
Finally, we present and analyze a scheme for unbiased simulation of generative models with latent diffusions and provide bounds on the variance of the resulting estimators. This scheme can be implemented as a deep generative model with a random number of layers.
Submitted 31 May, 2019; v1 submitted 4 March, 2019;
originally announced March 2019.
-
Discrete-time Risk-sensitive Mean-field Games
Authors:
Naci Saldi,
Tamer Basar,
Maxim Raginsky
Abstract:
In this paper, we study a class of discrete-time mean-field games under the infinite-horizon risk-sensitive discounted-cost optimality criterion. Risk-sensitivity is introduced for each agent (player) via an exponential utility function. In this game model, each agent is coupled with the rest of the population through the empirical distribution of the states, which affects both the agent's individual cost and its state dynamics. Under mild assumptions, we establish the existence of a mean-field equilibrium in the infinite-population limit as the number of agents ($N$) goes to infinity, and then show that the policy obtained from the mean-field equilibrium constitutes an approximate Nash equilibrium when $N$ is sufficiently large.
Submitted 4 October, 2018; v1 submitted 12 August, 2018;
originally announced August 2018.
-
Local Optimality and Generalization Guarantees for the Langevin Algorithm via Empirical Metastability
Authors:
Belinda Tzen,
Tengyuan Liang,
Maxim Raginsky
Abstract:
We study the detailed path-wise behavior of the discrete-time Langevin algorithm for non-convex Empirical Risk Minimization (ERM) through the lens of metastability, adopting some techniques from Berglund and Gentz (2003).
For a particular local optimum of the empirical risk, with an arbitrary initialization, we show that, with high probability, at least one of the following two events will occur: (1) the Langevin trajectory ends up somewhere outside the $\varepsilon$-neighborhood of this particular optimum within a short recurrence time; (2) it enters this $\varepsilon$-neighborhood by the recurrence time and stays there until a potentially exponentially long escape time. We call this phenomenon empirical metastability.
This two-timescale characterization aligns nicely with the existing literature in the following two senses. First, the effective recurrence time (i.e., number of iterations multiplied by stepsize) is dimension-independent, and resembles the convergence time of continuous-time deterministic Gradient Descent (GD). However, unlike GD, the Langevin algorithm does not require strong conditions on local initialization, and has the possibility of eventually visiting all optima. Second, the scaling of the escape time is consistent with the Eyring-Kramers law, which states that the Langevin scheme will eventually visit all local minima, but it will take an exponentially long time to transit among them. We apply this path-wise concentration result in the context of statistical learning to examine local notions of generalization and optimality.
Submitted 5 June, 2018; v1 submitted 18 February, 2018;
originally announced February 2018.
-
Sequential Empirical Coordination Under an Output Entropy Constraint
Authors:
Ehsan Shafieepoorfard,
Maxim Raginsky
Abstract:
This paper considers the problem of sequential empirical coordination, where the objective is to achieve a given value of the expected uniform deviation between state-action empirical averages and statistical expectations under a given strategic probability measure, with respect to a given universal Glivenko-Cantelli class of test functions. A communication constraint is imposed on the Shannon entropy of the resulting action sequence. It is shown that the fundamental limit on the output entropy is given by the minimum of the mutual information between the state and the action processes under all strategic measures that have the same marginal state process as the target measure and approximate the target measure to desired accuracy with respect to the underlying Glivenko--Cantelli seminorm. The fundamental limit is shown to be asymptotically achievable by tree-structured codes.
Submitted 11 June, 2018; v1 submitted 27 October, 2017;
originally announced October 2017.
-
Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis
Authors:
Maxim Raginsky,
Alexander Rakhlin,
Matus Telgarsky
Abstract:
Stochastic Gradient Langevin Dynamics (SGLD) is a popular variant of Stochastic Gradient Descent, where properly scaled isotropic Gaussian noise is added to an unbiased estimate of the gradient at each iteration. This modest change allows SGLD to escape local minima and suffices to guarantee asymptotic convergence to global minimizers for sufficiently regular non-convex objectives (Gelfand and Mitter, 1991). The present work provides a nonasymptotic analysis in the context of non-convex learning problems, giving finite-time guarantees for SGLD to find approximate minimizers of both empirical and population risks. As in the asymptotic setting, our analysis relates the discrete-time SGLD Markov chain to a continuous-time diffusion process. A new tool that drives the results is the use of weighted transportation cost inequalities to quantify the rate of convergence of SGLD to a stationary distribution in the Euclidean $2$-Wasserstein distance.
Submitted 4 June, 2017; v1 submitted 13 February, 2017;
originally announced February 2017.
-
Markov-Nash Equilibria in Mean-Field Games with Discounted Cost
Authors:
Naci Saldi,
Tamer Başar,
Maxim Raginsky
Abstract:
In this paper, we consider discrete-time dynamic games of the mean-field type with a finite number $N$ of agents subject to an infinite-horizon discounted-cost optimality criterion. The state space of each agent is a locally compact Polish space. At each time, the agents are coupled through the empirical distribution of their states, which affects both the agents' individual costs and their state transition probabilities. We introduce a new solution concept of the Markov-Nash equilibrium, under which a policy is player-by-player optimal in the class of all Markov policies. Under mild assumptions, we demonstrate the existence of a mean-field equilibrium in the infinite-population limit $N \to \infty$, and then show that the policy obtained from the mean-field equilibrium is approximately Markov-Nash when the number of agents $N$ is sufficiently large.
Submitted 14 January, 2017; v1 submitted 23 December, 2016;
originally announced December 2016.
-
Information-Theoretic Lower Bounds on Bayes Risk in Decentralized Estimation
Authors:
Aolin Xu,
Maxim Raginsky
Abstract:
We derive lower bounds on the Bayes risk in decentralized estimation, where the estimator does not have direct access to the random samples generated conditionally on the random parameter of interest, but only to the data received from local processors that observe the samples. The received data are subject to communication constraints, due to quantization and the noise in the communication channels from the processors to the estimator. We first derive general lower bounds on the Bayes risk using information-theoretic quantities, such as mutual information, information density, small ball probability, and differential entropy. We then apply these lower bounds to the decentralized case, using strong data processing inequalities to quantify the contraction of information due to communication constraints. We treat the cases of a single processor and of multiple processors, where the samples observed by different processors may be conditionally dependent given the parameter, for noninteractive and interactive communication protocols. Our results recover and improve recent lower bounds on the Bayes risk and the minimax risk for certain decentralized estimation problems, where previously only conditionally independent sample sets and noiseless channels have been considered. Moreover, our results provide a general way to quantify the degradation of estimation performance caused by distributing resources to multiple processors, which is only discussed for specific examples in existing works.
Submitted 2 July, 2016;
originally announced July 2016.
-
Concentration of measure without independence: a unified approach via the martingale method
Authors:
Aryeh Kontorovich,
Maxim Raginsky
Abstract:
The concentration of measure phenomenon may be summarized as follows: a function of many weakly dependent random variables that is not too sensitive to any of its individual arguments will tend to take values very close to its expectation. This phenomenon is most completely understood when the arguments are mutually independent random variables, and there exist several powerful complementary methods for proving concentration inequalities, such as the martingale method, the entropy method, and the method of transportation inequalities. The setting of dependent arguments is much less well understood. This chapter focuses on the martingale method for deriving concentration inequalities without independence assumptions. In particular, we use the machinery of so-called Wasserstein matrices to show that the Azuma-Hoeffding concentration inequality for martingales with almost surely bounded differences, when applied in a sufficiently abstract setting, is powerful enough to recover and sharpen several known concentration results for nonproduct measures. Wasserstein matrices provide a natural formalism for capturing the interplay between the metric and the probabilistic structures, which is fundamental to the concentration phenomenon.
Submitted 17 November, 2016; v1 submitted 1 February, 2016;
originally announced February 2016.
-
Concentration of Measure Inequalities and Their Communication and Information-Theoretic Applications
Authors:
Maxim Raginsky,
Igal Sason
Abstract:
During the last two decades, concentration of measure has been a subject of various exciting developments in convex geometry, functional analysis, statistical physics, high-dimensional statistics, probability theory, information theory, communications and coding theory, computer science, and learning theory. One common theme which emerges in these fields is probabilistic stability: complicated, nonlinear functions of a large number of independent or weakly dependent random variables often tend to concentrate sharply around their expected values. Information theory plays a key role in the derivation of concentration inequalities. Indeed, the entropy method and the approach based on transportation-cost inequalities are two major information-theoretic paths toward proving concentration.
This brief survey is based on a recent monograph of the authors in the Foundations and Trends in Communications and Information Theory (available online at https://arxiv.boxedpaper.com/pdf/1212.4663v8.pdf), and on a tutorial given by the authors at ISIT 2015. It introduces information theorists to three main techniques for deriving concentration inequalities: the martingale method, the entropy method, and transportation-cost inequalities. Some applications in information theory, communications, and coding theory are used to illustrate the main ideas.
Submitted 10 October, 2015;
originally announced October 2015.
-
Coordinate Dual Averaging for Decentralized Online Optimization with Nonseparable Global Objectives
Authors:
Soomin Lee,
Angelia Nedić,
Maxim Raginsky
Abstract:
We consider a decentralized online convex optimization problem in a network of agents, where each agent controls only a coordinate (or a part) of the global decision vector. For such a problem, we propose two decentralized variants (ODA-C and ODA-PS) of Nesterov's primal-dual algorithm with dual averaging. In ODA-C, to mitigate the disagreements on the primal-vector updates, the agents implement a generalization of the local information-exchange dynamics recently proposed by Li and Marden over a static undirected graph. In ODA-PS, the agents implement the broadcast-based push-sum dynamics over a time-varying sequence of uniformly connected digraphs. We show that the regret bounds in both cases have sublinear growth of $O(\sqrt{T})$, with the time horizon $T$, when the stepsize is of the form $1/\sqrt{t}$ and the objective functions are Lipschitz-continuous convex functions with Lipschitz gradients. We also implement the proposed algorithms on a sensor network to complement our theoretical analysis.
Submitted 20 May, 2016; v1 submitted 31 August, 2015;
originally announced August 2015.
-
Rationally inattentive control of Markov processes
Authors:
Ehsan Shafieepoorfard,
Maxim Raginsky,
Sean P. Meyn
Abstract:
The article poses a general model for optimal control subject to information constraints, motivated in part by recent work of Sims and others on information-constrained decision-making by economic agents. In the average-cost optimal control framework, the general model introduced in this paper reduces to a variant of the linear-programming representation of the average-cost optimal control problem, subject to an additional mutual information constraint on the randomized stationary policy. The resulting optimization problem is convex and admits a decomposition based on the Bellman error, which is the object of study in approximate dynamic programming. The theory is illustrated through the example of an information-constrained linear-quadratic-Gaussian (LQG) control problem. Some results on the infinite-horizon discounted-cost criterion are also presented.
Submitted 23 February, 2016; v1 submitted 12 February, 2015;
originally announced February 2015.
-
Poisson's equation in nonlinear filtering
Authors:
Richard S. Laugesen,
Prashant G. Mehta,
Sean P. Meyn,
Maxim Raginsky
Abstract:
The aim of this paper is to provide a variational interpretation of the nonlinear filter in continuous time. A time-stepping procedure is introduced, consisting of successive minimization problems in the space of probability densities. The weak form of the nonlinear filter is derived via analysis of the first-order optimality conditions for these problems. The derivation shows the nonlinear filter dynamics may be regarded as a gradient flow, or a steepest descent, for a certain energy functional with respect to the Kullback-Leibler divergence.
The second part of the paper is concerned with derivation of the feedback particle filter algorithm, based again on the analysis of the first variation. The algorithm is shown to be exact. That is, the posterior distribution of the particle matches exactly the true posterior, provided the filter is initialized with the true prior.
Submitted 18 December, 2014;
originally announced December 2014.
-
Strong data processing inequalities and $Φ$-Sobolev inequalities for discrete channels
Authors:
Maxim Raginsky
Abstract:
The noisiness of a channel can be measured by comparing suitable functionals of the input and output distributions. For instance, the worst-case ratio of output relative entropy to input relative entropy for all possible pairs of input distributions is bounded from above by unity, by the data processing theorem. However, for a fixed reference input distribution, this quantity may be strictly smaller than one, giving so-called strong data processing inequalities (SDPIs). The same considerations apply to an arbitrary $Φ$-divergence. This paper presents a systematic study of optimal constants in SDPIs for discrete channels, including their variational characterizations, upper and lower bounds, structural results for channels on product probability spaces, and the relationship between SDPIs and so-called $Φ$-Sobolev inequalities (another class of inequalities that can be used to quantify the noisiness of a channel by controlling entropy-like functionals of the input distribution by suitable measures of input-output correlation). Several applications to information theory, discrete probability, and statistical physics are discussed.
Submitted 30 March, 2016; v1 submitted 13 November, 2014;
originally announced November 2014.
-
Online Markov decision processes with Kullback-Leibler control cost
Authors:
Peng Guan,
Maxim Raginsky,
Rebecca Willett
Abstract:
This paper considers an online (real-time) control problem that involves an agent performing a discrete-time random walk over a finite state space. The agent's action at each time step is to specify the probability distribution for the next state given the current state. Following the set-up of Todorov, the state-action cost at each time step is a sum of a state cost and a control cost given by the Kullback-Leibler (KL) divergence between the agent's next-state distribution and that determined by some fixed passive dynamics. The online aspect of the problem is due to the fact that the state cost functions are generated by a dynamic environment, and the agent learns the current state cost only after selecting an action. An explicit construction of a computationally efficient strategy with small regret (i.e., expected difference between its actual total cost and the smallest cost attainable using noncausal knowledge of the state costs) under mild regularity conditions is presented, along with a demonstration of the performance of the proposed strategy on a simulated target tracking problem. A number of new results on Markov decision processes with KL control cost are also obtained.
Submitted 14 January, 2014;
originally announced January 2014.
-
Relax but stay in control: from value to algorithms for online Markov decision processes
Authors:
Peng Guan,
Maxim Raginsky,
Rebecca Willett
Abstract:
Online learning algorithms are designed to perform in non-stationary environments, but generally there is no notion of a dynamic state to model constraints on current and future actions as a function of past actions. State-based models are common in stochastic control settings, but commonly used frameworks such as Markov Decision Processes (MDPs) assume a known stationary environment. In recent years, there has been a growing interest in combining the above two frameworks and considering an MDP setting in which the cost function is allowed to change arbitrarily after each time step. However, most of the work in this area has been algorithmic: given a problem, one would develop an algorithm almost from scratch. Moreover, the presence of the state and the assumption of an arbitrarily varying environment complicate both the theoretical analysis and the development of computationally efficient methods. This paper describes a broad extension of the ideas proposed by Rakhlin et al. to give a general framework for deriving algorithms in an MDP setting with arbitrarily changing costs. This framework leads to a unifying view of existing methods and provides a general procedure for constructing new ones. Several new methods are presented, and one of them is shown to have important advantages over a similar method developed from scratch via an online version of approximate dynamic programming.
Submitted 31 August, 2015; v1 submitted 27 October, 2013;
originally announced October 2013.
-
Online discrete optimization in social networks in the presence of Knightian uncertainty
Authors:
Maxim Raginsky,
Angelia Nedić
Abstract:
We study a model of collective real-time decision-making (or learning) in a social network operating in an uncertain environment, for which no a priori probabilistic model is available. Instead, the environment's impact on the agents in the network is seen through a sequence of cost functions, revealed to the agents in a causal manner only after all the relevant actions are taken. There are two kinds of costs: individual costs incurred by each agent and local-interaction costs incurred by each agent and its neighbors in the social network. Moreover, agents have inertia: each agent has a default mixed strategy that stays fixed regardless of the state of the environment, and must expend effort to deviate from this strategy in order to respond to cost signals coming from the environment. We construct a decentralized strategy, wherein each agent selects its action based only on the costs directly affecting it and on the decisions made by its neighbors in the network. In this setting, we quantify social learning in terms of regret, which is given by the difference between the realized network performance over a given time horizon and the best performance that could have been achieved in hindsight by a fictitious centralized entity with full knowledge of the environment's evolution. We show that our strategy achieves regret that scales polylogarithmically with the time horizon and polynomially with the number of agents and the maximum number of neighbors of any agent in the social network.
Submitted 28 January, 2015; v1 submitted 1 July, 2013;
originally announced July 2013.
-
Concentration of Measure Inequalities in Information Theory, Communications and Coding (Second Edition)
Authors:
Maxim Raginsky,
Igal Sason
Abstract:
During the last two decades, concentration inequalities have been the subject of exciting developments in various areas, including convex geometry, functional analysis, statistical physics, high-dimensional statistics, pure and applied probability theory, information theory, theoretical computer science, and learning theory. This monograph focuses on some of the key modern mathematical tools that are used for the derivation of concentration inequalities, on their links to information theory, and on their various applications to communications and coding. In addition to being a survey, this monograph also includes various new recent results derived by the authors. The first part of the monograph introduces classical concentration inequalities for martingales, as well as some recent refinements and extensions. The power and versatility of the martingale approach is exemplified in the context of codes defined on graphs and iterative decoding algorithms, as well as codes for wireless communication. The second part of the monograph introduces the entropy method, an information-theoretic technique for deriving concentration inequalities. The basic ingredients of the entropy method are discussed first in the context of logarithmic Sobolev inequalities, which underlie the so-called functional approach to concentration of measure, and then from a complementary information-theoretic viewpoint based on transportation-cost inequalities and probability in metric spaces. Some representative results on concentration for dependent random variables are briefly summarized, with emphasis on their connections to the entropy method. Finally, we discuss several applications of the entropy method to problems in communications and coding, including strong converses, empirical distributions of good channel codes, and an information-theoretic converse for concentration of measure.
Submitted 24 February, 2015; v1 submitted 19 December, 2012;
originally announced December 2012.
-
A recursive procedure for density estimation on the binary hypercube
Authors:
Maxim Raginsky,
Jorge Silva,
Svetlana Lazebnik,
Rebecca Willett
Abstract:
This paper describes a recursive estimation procedure for multivariate binary densities (probability distributions of vectors of Bernoulli random variables) using orthogonal expansions. For $d$ covariates, there are $2^d$ basis coefficients to estimate, which renders conventional approaches computationally prohibitive when $d$ is large. However, for a wide class of densities that satisfy a certain sparsity condition, our estimator runs in probabilistic polynomial time and adapts to the unknown sparsity of the underlying density in two key ways: (1) it attains near-minimax mean-squared error for moderate sample sizes, and (2) the computational complexity is lower for sparser densities. Our method also allows for flexible control of the trade-off between mean-squared error and computational complexity.
Submitted 29 November, 2012; v1 submitted 6 December, 2011;
originally announced December 2011.
-
Target Detection Performance Bounds in Compressive Imaging
Authors:
Kalyani Krishnamurthy,
Rebecca Willett,
Maxim Raginsky
Abstract:
This paper describes computationally efficient approaches and associated theoretical performance guarantees for the detection of known targets and anomalies from few projection measurements of the underlying signals. The proposed approaches accommodate signals of different strengths contaminated by a colored Gaussian background, and perform detection without reconstructing the underlying signals from the observations. The theoretical performance bounds of the target detector highlight fundamental tradeoffs among the number of measurements collected, amount of background signal present, signal-to-noise ratio, and similarity among potential targets coming from a known dictionary. The anomaly detector is designed to control the number of false discoveries. The proposed approach does not depend on a known sparse representation of targets; rather, the theoretical performance bounds exploit the structure of a known dictionary of targets and the distance preservation property of the measurement matrix. Simulation experiments illustrate the practicality and effectiveness of the proposed approaches.
Submitted 14 August, 2012; v1 submitted 2 December, 2011;
originally announced December 2011.
-
Divergence-based characterization of fundamental limitations of adaptive dynamical systems
Authors:
Maxim Raginsky
Abstract:
Adaptive dynamical systems arise in a multitude of contexts, e.g., optimization, control, communications, signal processing, and machine learning. A precise characterization of their fundamental limitations is therefore of paramount importance. In this paper, we consider the general problem of adaptively controlling and/or identifying a stochastic dynamical system, where our {\em a priori} knowledge allows us to place the system in a subset of a metric space (the uncertainty set). We present an information-theoretic meta-theorem that captures the trade-off between the metric complexity (or richness) of the uncertainty set, the amount of information acquired online in the process of controlling and observing the system, and the residual uncertainty remaining after the observations have been collected. Following the approach of Zames, we quantify {\em a priori} information by the Kolmogorov (metric) entropy of the uncertainty set, while the information acquired online is expressed as a sum of information divergences. The general theory is used to derive new minimax lower bounds on the metric identification error, as well as to give a simple derivation of the minimum time needed to stabilize an uncertain stochastic linear system.
Submitted 11 October, 2010;
originally announced October 2010.
-
Information-based complexity, feedback and dynamics in convex programming
Authors:
Maxim Raginsky,
Alexander Rakhlin
Abstract:
We study the intrinsic limitations of sequential convex optimization through the lens of feedback information theory. In the oracle model of optimization, an algorithm queries an {\em oracle} for noisy information about the unknown objective function, and the goal is to (approximately) minimize every function in a given class using as few queries as possible. We show that, in order for a function to be optimized, the algorithm must be able to accumulate enough information about the objective. This, in turn, puts limits on the speed of optimization under specific assumptions on the oracle and the type of feedback. Our techniques are akin to the ones used in statistical literature to obtain minimax lower bounds on the risks of estimation procedures; the notable difference is that, unlike in the case of i.i.d. data, a sequential optimization algorithm can gather observations in a {\em controlled} manner, so that the amount of information at each step is allowed to change in time. In particular, we show that optimization algorithms often obey the law of diminishing returns: the signal-to-noise ratio drops as the optimization algorithm approaches the optimum. To underscore the generality of the tools, we use our approach to derive fundamental lower bounds for a certain active learning problem. Overall, the present work connects the intuitive notions of information in optimization, experimental design, estimation, and active learning to the quantitative notion of Shannon information.
Submitted 9 September, 2011; v1 submitted 11 October, 2010;
originally announced October 2010.
-
Operational distance and fidelity for quantum channels
Authors:
Viacheslav P. Belavkin,
Giacomo Mauro D'Ariano,
Maxim Raginsky
Abstract:
We define and study a fidelity criterion for quantum channels, which we term the minimax fidelity, through a noncommutative generalization of maximal Hellinger distance between two positive kernels in classical probability theory. Like other known fidelities for quantum channels, the minimax fidelity is well-defined for channels between finite-dimensional algebras, but it also applies to a certain class of channels between infinite-dimensional algebras (explicitly, those channels that possess an operator-valued Radon--Nikodym density with respect to the trace in the sense of Belavkin--Staszewski) and induces a metric on the set of quantum channels which is topologically equivalent to the CB-norm distance between channels, precisely in the same way as the Bures metric on the density operators associated with statistical states of quantum-mechanical systems, derived from the well-known fidelity (`generalized transition probability') of Uhlmann, is topologically equivalent to the trace-norm distance.
Submitted 18 January, 2005; v1 submitted 25 August, 2004;
originally announced August 2004.
-
A Phase Transition and Stochastic Domination in Pippenger's Probabilistic Failure Model for Boolean Networks with Unreliable Gates
Authors:
Maxim Raginsky
Abstract:
We study Pippenger's model of Boolean networks with unreliable gates. In this model, the conditional probability that a particular gate fails, given the failure status of any subset of gates preceding it in the network, is bounded from above by some $ε$. We show that if we pick a Boolean network with $n$ gates at random according to the Barak-Erdős model of a random acyclic digraph, such that the expected edge density is $c n^{-1}\log n$, and if $ε$ is equal to a certain function of the size of the largest reflexive, transitive closure of a vertex (with respect to a particular realization of the random digraph), then Pippenger's model exhibits a phase transition at $c=1$. Namely, with probability $1-o(1)$ as $n\to\infty$, we have the following: for $0 \le c \le 1$, the minimum of the probability that no gate has failed, taken over all probability distributions of gate failures consistent with Pippenger's model, is equal to $o(1)$, whereas for $c >1$ it is equal to $\exp(-\frac{c}{e(c-1)}) + o(1)$. We also indicate how a more refined analysis of Pippenger's model, e.g., for the purpose of estimating probabilities of monotone events, can be carried out using the machinery of stochastic domination.
Submitted 24 November, 2003; v1 submitted 4 November, 2003;
originally announced November 2003.
-
Radon-Nikodym derivatives of quantum operations
Authors:
Maxim Raginsky
Abstract:
Given a completely positive (CP) map $T$, there is a theorem of the Radon-Nikodym type [W.B. Arveson, Acta Math. {\bf 123}, 141 (1969); V.P. Belavkin and P. Staszewski, Rep. Math. Phys. {\bf 24}, 49 (1986)] that completely characterizes all CP maps $S$ such that $T-S$ is also a CP map. This theorem is reviewed, and several alternative formulations are given along the way. We then use the Radon-Nikodym formalism to study the structure of order intervals of quantum operations, as well as a certain one-to-one correspondence between CP maps and positive operators, already fruitfully exploited in many quantum information-theoretic treatments. We also comment on how the Radon-Nikodym theorem can be used to derive norm estimates for differences of CP maps in general, and of quantum operations in particular.
Submitted 15 September, 2003; v1 submitted 25 March, 2003;
originally announced March 2003.
-
Entropy production rates of bistochastic strictly contractive quantum channels on a matrix algebra
Authors:
Maxim Raginsky
Abstract:
We derive, for a bistochastic strictly contractive quantum channel on a matrix algebra, a relation between the contraction rate and the rate of entropy production. We also sketch some applications of our result to the statistical physics of irreversible processes and to quantum information processing.
Submitted 5 September, 2002; v1 submitted 29 July, 2002;
originally announced July 2002.