-
Information Theory for Expectation Measures
Authors:
Peter Harremoës
Abstract:
Shannon based his information theory on the notion of probability measures as developed by Kolmogorov. In this paper we study some fundamental problems in information theory based on expectation measures. In the theory of expectation measures it is natural to study data sets where no randomness is present, and it is also natural to study information theory for point processes as well as sampling where the sample size is not fixed. Expectation measures in combination with Kraft's inequality can be used to clarify in which cases probability measures can be used to quantify randomness.
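Kraft's inequality says that the codeword lengths $l_i$ of a prefix code over a $D$-letter alphabet satisfy $\sum_i D^{-l_i}\le 1$, so the lengths define a measure of total mass at most 1. The short Python sketch below makes this correspondence concrete with exact arithmetic; it is an illustration only, and the function names are hypothetical rather than taken from the paper.

```python
from fractions import Fraction

# Illustrative sketch; kraft_sum and code_measure are hypothetical helper names.


def kraft_sum(lengths, alphabet_size=2):
    """Exact Kraft sum: sum over codewords of alphabet_size ** (-length)."""
    return sum((Fraction(1, alphabet_size**l) for l in lengths), Fraction(0))


def code_measure(lengths, alphabet_size=2):
    """Masses alphabet_size ** (-length); a prefix code gives total mass <= 1."""
    return [Fraction(1, alphabet_size**l) for l in lengths]


# The binary prefix code {0, 10, 110, 111} has lengths 1, 2, 3, 3.
print(kraft_sum([1, 2, 3, 3]))  # 1: a complete code, i.e. a probability measure
print(kraft_sum([1, 2, 3]))     # 7/8: drop a codeword and the measure is sub-normalized
print(code_measure([1, 2, 3, 3]))
```

Exact fractions are used so that "total mass exactly 1" can be tested without floating-point tolerance.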
Submitted 29 January, 2025;
originally announced January 2025.
-
Reverse Information Projections and Optimal E-statistics
Authors:
Tyron Lardy,
Peter Grünwald,
Peter Harremoës
Abstract:
Information projections have found important applications in probability theory, statistics, and related areas. In the field of hypothesis testing in particular, the reverse information projection (RIPr) has recently been shown to lead to growth-rate optimal (GRO) e-statistics for testing simple alternatives against composite null hypotheses. However, the RIPr as well as the GRO criterion are undefined whenever the infimum information divergence between the null and alternative is infinite. We show that in such scenarios, under some assumptions, there still exists a measure in the null that is closest to the alternative in a specific sense. Whenever the information divergence is finite, this measure coincides with the usual RIPr. It therefore gives a natural extension of the RIPr to certain cases where the latter was previously not defined. This extended notion of the RIPr is shown to lead to optimal e-statistics in a sense that is a novel, but natural, extension of the GRO criterion. We also give conditions under which the (extension of the) RIPr is a strict sub-probability measure, as well as conditions under which an approximation of the RIPr leads to approximate e-statistics. For this case we provide tight relations between the corresponding approximation rates.
Submitted 30 July, 2024; v1 submitted 28 June, 2023;
originally announced June 2023.
-
Unnormalized Measures in Information Theory
Authors:
Peter Harremoës
Abstract:
Information theory is built on probability measures, and by definition a probability measure has total mass 1. Probability measures are used to model uncertainty, and one may ask how important it is that the total mass is one. We claim that the main reason to normalize measures is that probability measures are related to codes via Kraft's inequality. Using a minimum description length approach to statistics, we will demonstrate that measures that are not normalized require a new interpretation that we will call the Poisson interpretation. With the Poisson interpretation many problems can be simplified. The focus will shift from probabilities to mean values. We give examples of improved test procedures, improved inequalities, simplified algorithms, new projection results, and improvements in our description of quantum systems.
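One standard way to extend information divergence to measures that are not normalized is $D(\mu\|\nu)=\sum_i\left(\mu_i\ln\frac{\mu_i}{\nu_i}-\mu_i+\nu_i\right)$. This quantity coincides with the divergence between products of independent Poisson distributions with means $\mu_i$ and $\nu_i$, which is one natural reading of the Poisson interpretation. The Python sketch below checks this numerically; the definition is our assumption, and the paper's own formulation may differ.

```python
import math

# Illustrative sketch; the generalized divergence formula is an assumed standard form.


def generalized_divergence(mu, nu):
    """sum_i mu_i ln(mu_i / nu_i) - mu_i + nu_i for positive, possibly unnormalized, vectors."""
    total = 0.0
    for m, n in zip(mu, nu):
        if m > 0:
            total += m * math.log(m / n)
        total += n - m
    return total


def poisson_log_pmf(k, lam):
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def poisson_kl(lam, kappa, terms=200):
    """KL divergence D(Po(lam) || Po(kappa)) by direct summation, truncated at `terms`."""
    total = 0.0
    for k in range(terms):
        log_p = poisson_log_pmf(k, lam)
        total += math.exp(log_p) * (log_p - poisson_log_pmf(k, kappa))
    return total


mu = [2.0, 0.5, 1.5]  # total mass 4, not a probability vector
nu = [1.0, 1.0, 1.0]  # total mass 3
print(generalized_divergence(mu, nu))
print(sum(poisson_kl(m, n) for m, n in zip(mu, nu)))  # agrees up to rounding
```

For probability vectors the extra terms $-\mu_i+\nu_i$ sum to zero, so the formula reduces to ordinary information divergence.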
Submitted 5 February, 2022;
originally announced February 2022.
-
Rate Distortion Theory for Descriptive Statistics
Authors:
Peter Harremoës
Abstract:
Rate distortion theory was developed for optimizing lossy compression of data, but it also has many applications in statistics. In this paper we will see how rate distortion theory can be used to analyze a complicated data set involving orientations of early Islamic mosques. The analysis involves testing, identification of outliers, choice of compression rate, calculation of optimal reconstruction points, and assigning "descriptive confidence regions" to the reconstruction points. In this paper the focus will be on the methods, so the integrity of the data set and the interpretation of the results will not be discussed.
Submitted 16 February, 2022; v1 submitted 10 January, 2022;
originally announced January 2022.
-
Bounds on the Information Divergence for Hypergeometric Distributions
Authors:
Peter Harremoës,
František Matúš
Abstract:
The hypergeometric distributions have many important applications, but they have not received sufficient attention in information theory. Hypergeometric distributions can be approximated by binomial distributions or Poisson distributions. In this paper we present upper and lower bounds on information divergence. These bounds are important for statistical testing and for a better understanding of the notion of exchangeability.
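The quality of the binomial approximation can be checked numerically. The Python sketch below computes, by brute force, the divergence from a hypergeometric distribution (drawing $n$ of $N$ items without replacement, $K$ of them marked) to the binomial distribution with success probability $K/N$. It does not implement the bounds of the paper; the direction of the divergence, the parameter values, and the helper names are our own choices.

```python
import math

# Illustrative brute-force sketch; not the paper's bounds.


def hypergeom_pmf(k, N, K, n):
    """P(X = k) when drawing n of N items without replacement, K of them marked."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)


def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def kl_hypergeom_binom(N, K, n):
    """D(Hyp(N, K, n) || Bin(n, K/N)) in nats, summed over the hypergeometric support."""
    p = K / N
    total = 0.0
    for k in range(max(0, n - (N - K)), min(n, K) + 1):
        h = hypergeom_pmf(k, N, K, n)
        total += h * math.log(h / binom_pmf(k, n, p))
    return total


# Sample size fixed at 10 while the population grows: the divergence shrinks.
for N in (20, 100, 1000):
    print(N, kl_hypergeom_binom(N, N // 2, 10))
```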
Submitted 7 February, 2020;
originally announced February 2020.
-
Statistical Inference and Exact Saddle Point Approximations
Authors:
Peter Harremoës
Abstract:
Statistical inference may follow a frequentist approach, a Bayesian approach, or the minimum description length (MDL) principle. Our goal is to identify situations in which these different approaches to statistical inference coincide. It is proved that for exponential families MDL and Bayesian inference coincide if and only if the renormalized saddle point approximation for the conjugate exponential family is exact. For 1-dimensional exponential families the only families with exact renormalized saddle point approximations are the Gaussian location family, the Gamma family, and the inverse Gaussian family. They are conjugate families of the Gaussian location family, the Gamma family, and the Poisson-exponential family. The first two families are self-conjugate, which implies that only for these two families is the Bayesian approach consistent with the frequentist approach. In higher dimensions there are more examples.
Submitted 6 May, 2018;
originally announced May 2018.
-
Quantum Information on Spectral Sets
Authors:
Peter Harremoës
Abstract:
For convex optimization problems Bregman divergences appear as regret functions. Such regret functions can be defined on any convex set, but if a sufficiency condition is added, the regret function must be proportional to information divergence and the convex set must be spectral. Spectral sets are sets where different orthogonal decompositions of a state into pure states have unique mixing coefficients. Only on such spectral sets is it possible to define well-behaved information theoretic quantities like entropy and divergence. It is only possible to perform measurements in a reversible way if the state space is spectral. The most important spectral sets can be represented as positive elements of Jordan algebras with trace 1. This means that Jordan algebras provide a natural framework for studying quantum information. We compare information theory on Hilbert spaces with information theory in more general Jordan algebras, and conclude that much of the formalism is unchanged but also identify some important differences.
Submitted 10 February, 2017; v1 submitted 23 January, 2017;
originally announced January 2017.
-
Divergence and Sufficiency for Convex Optimization
Authors:
Peter Harremoës
Abstract:
Logarithmic score and information divergence appear in information theory, statistics, statistical mechanics, and portfolio theory. We demonstrate that all these topics involve some kind of optimization that leads directly to regret functions and such regret functions are often given by a Bregman divergence. If the regret function also fulfills a sufficiency condition it must be proportional to information divergence. We will demonstrate that sufficiency is equivalent to the apparently weaker notion of locality and it is also equivalent to the apparently stronger notion of monotonicity. These sufficiency conditions have quite different relevance in the different areas of application, and often they are not fulfilled. Therefore sufficiency conditions can be used to explain when results from one area can be transferred directly to another and when one will experience differences.
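The regret functions mentioned above have the Bregman form $B_F(x,y)=F(x)-F(y)-\langle\nabla F(y),x-y\rangle$ for a convex function $F$. As a minimal Python sketch (helper names are hypothetical), taking $F$ to be negative Shannon entropy recovers information divergence on probability vectors, while the squared norm gives squared Euclidean distance, a Bregman divergence that is not proportional to information divergence.

```python
import math

# Illustrative sketch of Bregman divergences; helper names are hypothetical.


def bregman(F, grad_F, x, y):
    """B_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    return F(x) - F(y) - sum(g * (xi - yi) for g, xi, yi in zip(grad_F(y), x, y))


def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x if xi > 0)


def grad_neg_entropy(x):
    return [math.log(xi) + 1 for xi in x]


def squared_norm(x):
    return sum(xi * xi for xi in x)


def grad_squared_norm(x):
    return [2 * xi for xi in x]


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p, q = [0.7, 0.2, 0.1], [0.4, 0.4, 0.2]
print(bregman(neg_entropy, grad_neg_entropy, p, q), kl(p, q))  # the two agree
print(bregman(squared_norm, grad_squared_norm, p, q))          # squared Euclidean distance
```

For general positive vectors the entropy-based Bregman divergence equals $\sum_i p_i\ln(p_i/q_i)-\sum_i p_i+\sum_i q_i$; the last two sums cancel only when both vectors are normalized.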
Submitted 10 April, 2017; v1 submitted 4 January, 2017;
originally announced January 2017.
-
Maximum Entropy and Sufficiency
Authors:
Peter Harremoës
Abstract:
The notions of Bregman divergence and sufficiency will be defined on general convex state spaces. It is demonstrated that only spectral sets can have a Bregman divergence that satisfies a sufficiency condition. Positive elements with trace 1 in a Jordan algebra are examples of spectral sets, and the most important example is the set of density matrices with complex entries. It is conjectured that information theoretic considerations lead directly to the notion of Jordan algebra under some regularity conditions.
Submitted 3 September, 2016; v1 submitted 8 July, 2016;
originally announced July 2016.
-
Sufficiency on the Stock Market
Authors:
Peter Harremoës
Abstract:
It is well known that there are a number of relations between finance theory and information theory. Some of these relations are exact and some are approximate. In this paper we will explore some of these relations and determine under which conditions the relations are exact. It turns out that portfolio theory always leads to Bregman divergences. The Bregman divergence is only proportional to information divergence in situations that are essentially equal to the type of gambling studied by Kelly. This can be related to an abstract sufficiency condition.
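In Kelly's horse-race model, a gambler who bets all capital, a fraction $b_i$ on outcome $i$ at $o_i$-for-1 odds, has expected log-growth $W(b)=\sum_i p_i\ln(b_io_i)$. Betting $b=p$ is optimal, and the regret $W(p)-W(b)$ equals $D(p\|b)$ whatever the odds. The Python sketch below checks this identity; it assumes full investment with no cash option, and all numbers are made up for illustration.

```python
import math

# Illustrative sketch of Kelly gambling; assumes all capital is bet (sum of b is 1).


def doubling_rate(p, b, odds):
    """W(b) = sum_i p_i ln(b_i * o_i): expected log-growth of wealth per race."""
    return sum(pi * math.log(bi * oi) for pi, bi, oi in zip(p, b, odds) if pi > 0)


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p = [0.5, 0.3, 0.2]     # win probabilities of three horses
odds = [2.0, 4.0, 6.0]  # o-for-1 odds
b = [0.4, 0.4, 0.2]     # a suboptimal allocation of all capital

regret = doubling_rate(p, p, odds) - doubling_rate(p, b, odds)
print(regret, kl(p, b))  # equal up to rounding, whatever the odds
```

The odds cancel in the difference $\sum_i p_i\ln(p_io_i)-\sum_i p_i\ln(b_io_i)$, which is why the regret depends only on $p$ and $b$.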
Submitted 27 January, 2016;
originally announced January 2016.
-
Lattices with non-Shannon Inequalities
Authors:
Peter Harremoës
Abstract:
We study the existence or absence of non-Shannon inequalities for variables that are related by functional dependencies. Although the power set on four variables is the smallest Boolean lattice with non-Shannon inequalities, there exist lattices with many more variables without non-Shannon inequalities. We search for conditions that ensure that no non-Shannon inequalities exist. It is demonstrated that 3-dimensional distributive lattices cannot have non-Shannon inequalities and that planar modular lattices cannot have non-Shannon inequalities. The existence of non-Shannon inequalities is related to the question of whether a lattice is isomorphic to a lattice of subgroups of a group.
Submitted 15 February, 2015;
originally announced February 2015.
-
Mutual information of Contingency Tables and Related Inequalities
Authors:
Peter Harremoës
Abstract:
For testing independence it is very popular to use either the $\chi^{2}$-statistic or the $G^{2}$-statistic (mutual information). Asymptotically both are $\chi^{2}$-distributed, so an obvious question is which of the two statistics has a distribution closest to the $\chi^{2}$-distribution. Surprisingly, the distribution of mutual information is much better approximated by a $\chi^{2}$-distribution than the distribution of the $\chi^{2}$-statistic. For technical reasons we shall focus on the simplest case with one degree of freedom. We introduce the signed log-likelihood and demonstrate that its distribution function can be related to the distribution function of a standard Gaussian by inequalities. For the hypergeometric distribution we formulate a general conjecture about how close the signed log-likelihood is to a standard Gaussian, and this conjecture gives much more accurate estimates of the tail probabilities of this type of distribution than previously published results. The conjecture has been proved numerically in all cases relevant for testing independence, and further evidence of its validity is given.
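With one degree of freedom, the signed log-likelihood of a count is the signed square root of a $G^2$-type statistic. The Python sketch below uses the binomial case as the simplest illustration (the conjecture above concerns the hypergeometric case, which is not reproduced), and the form $\operatorname{sign}(x-np)\sqrt{2n\,D(x/n\,\|\,p)}$ is our assumption of the standard definition. It only tabulates the binomial distribution function next to the standard Gaussian distribution function evaluated at the signed log-likelihood; the paper's inequalities are not implemented.

```python
import math

# Illustrative sketch for a binomial count; the signed log-likelihood form is assumed.


def binary_kl(a, p):
    """D(Ber(a) || Ber(p)) in nats, using the convention 0 ln 0 = 0."""
    total = 0.0
    if a > 0:
        total += a * math.log(a / p)
    if a < 1:
        total += (1 - a) * math.log((1 - a) / (1 - p))
    return total


def signed_log_likelihood(x, n, p):
    """sign(x - n p) * sqrt(2 n D(x/n || p)) for a binomial count x."""
    d = max(binary_kl(x / n, p), 0.0)  # guard against tiny negative rounding
    return math.copysign(math.sqrt(2 * n * d), x - n * p)


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def binom_cdf(x, n, p):
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


n, p = 30, 0.3
for x in (3, 6, 9, 12, 15):
    g = signed_log_likelihood(x, n, p)
    print(x, round(g, 3), round(binom_cdf(x, n, p), 4), round(std_normal_cdf(g), 4))
```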
Submitted 1 February, 2014;
originally announced February 2014.
-
Horizon-Independent Optimal Prediction with Log-Loss in Exponential Families
Authors:
Peter Bartlett,
Peter Grunwald,
Peter Harremoes,
Fares Hedayati,
Wojciech Kotlowski
Abstract:
We study online learning under logarithmic loss with regular parametric models. Hedayati and Bartlett (2012b) showed that a Bayesian prediction strategy with Jeffreys prior and sequential normalized maximum likelihood (SNML) coincide and are optimal if and only if the latter is exchangeable, and if and only if the optimal strategy can be calculated without knowing the time horizon in advance. They posed the question of which families have exchangeable SNML strategies. This paper fully answers this open problem for one-dimensional exponential families. The exchangeability can happen only for three classes of natural exponential family distributions, namely the Gaussian, the Gamma, and the Tweedie exponential family of order 3/2. Keywords: SNML Exchangeability, Exponential Family, Online Learning, Logarithmic Loss, Bayesian Strategy, Jeffreys Prior, Fisher Information
Submitted 19 May, 2013;
originally announced May 2013.
-
Extendable MDL
Authors:
Peter Harremoës
Abstract:
In this paper we show that combining the minimum description length principle with an exchangeability condition leads directly to the use of Jeffreys prior. This approach works in most cases, even when Jeffreys prior cannot be normalized. Kraft's inequality links codes and distributions, but a closer look at this inequality demonstrates that this link only makes sense when sequences are considered as prefixes of potentially longer sequences. For technical reasons only results for exponential families are stated. Results on when Jeffreys prior can be normalized after conditioning on an initializing string are given. An exotic case where no initial string allows Jeffreys prior to be normalized is given, and some ways of handling such exotic cases are discussed.
Submitted 19 May, 2013; v1 submitted 28 January, 2013;
originally announced January 2013.
-
Minimum KL-divergence on complements of $L_1$ balls
Authors:
Daniel Berend,
Peter Harremoës,
Aryeh Kontorovich
Abstract:
Pinsker's widely used inequality upper-bounds the total variation distance $\|P-Q\|_1$ in terms of the Kullback-Leibler divergence $D(P\|Q)$. Although in general a bound in the reverse direction is impossible, in many applications the quantity of interest is actually $D^*(P,\varepsilon)$, defined, for an arbitrary fixed $P$, as the infimum of $D(P\|Q)$ over all distributions $Q$ that are $\varepsilon$-far away from $P$ in total variation. We show that $D^*(P,\varepsilon)\le C\varepsilon^2 + O(\varepsilon^3)$, where $C=C(P)=1/2$ for "balanced" distributions, thereby providing a kind of reverse Pinsker inequality. An application to large deviations is given, and some of the structural results may be of independent interest. Keywords: Pinsker inequality, Sanov's theorem, large deviations
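For the balanced binary distribution $P=(1/2,1/2)$ the infimum defining $D^*(P,\varepsilon)$ can be computed by hand: the divergence grows as $Q$ moves away from $P$, so the infimum is attained at $Q=(1/2\pm\varepsilon/2)$ and equals $-\tfrac12\ln(1-\varepsilon^2)=\varepsilon^2/2+O(\varepsilon^4)$ nats. This is consistent with $C=1/2$, and it also shows that Pinsker's bound $D(P\|Q)\ge\tfrac12\|P-Q\|_1^2$ is tight to leading order here. The Python check below is our own worked example, not taken from the paper.

```python
import math

# Illustrative worked example for P = (1/2, 1/2); divergences in nats.


def kl(p, q):
    """D(P||Q) for finite distributions with q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def l1(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


p = [0.5, 0.5]
for eps in (0.4, 0.2, 0.1, 0.05):
    q = [0.5 + eps / 2, 0.5 - eps / 2]         # a closest Q with ||P - Q||_1 = eps
    d = kl(p, q)
    closed_form = -0.5 * math.log(1 - eps**2)  # = eps^2/2 + eps^4/4 + ...
    print(eps, l1(p, q), d, closed_form, d / eps**2)
```

The last column, $D^*/\varepsilon^2$, approaches $1/2$ as $\varepsilon$ decreases.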
Submitted 20 February, 2014; v1 submitted 27 June, 2012;
originally announced June 2012.
-
Rényi Divergence and Kullback-Leibler Divergence
Authors:
Tim van Erven,
Peter Harremoës
Abstract:
Rényi divergence is related to Rényi entropy much like Kullback-Leibler divergence is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as Kullback-Leibler divergence, and depends on a parameter that is called its order. In particular, the Rényi divergence of order 1 equals the Kullback-Leibler divergence.
We review and extend the most important properties of Rényi divergence and Kullback-Leibler divergence, including convexity, continuity, limits of $σ$-algebras and the relation of the special order 0 to the Gaussian dichotomy and contiguity. We also show how to generalize the Pythagorean inequality to orders different from 1, and we extend the known equivalence between channel capacity and minimax redundancy to continuous channel inputs (for all orders) and present several other minimax results.
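For finite alphabets the Rényi divergence of order $\alpha$ is $D_\alpha(P\|Q)=\frac{1}{\alpha-1}\ln\sum_i p_i^\alpha q_i^{1-\alpha}$, and order 0 gives $-\ln Q(\{i: p_i>0\})$. The Python sketch below (an illustration with our own helper names, assuming all $q_i>0$) shows numerically that $D_\alpha$ approaches the Kullback-Leibler divergence as $\alpha\to1$ and is nondecreasing in $\alpha$.

```python
import math

# Illustrative sketch for finite alphabets; assumes every q_i > 0.


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def renyi(p, q, alpha):
    """D_alpha(P||Q) in nats for alpha > 0, alpha != 1."""
    s = sum(pi**alpha * qi ** (1 - alpha) for pi, qi in zip(p, q) if pi > 0)
    return math.log(s) / (alpha - 1)


def renyi_order0(p, q):
    """Order 0: minus the log of the Q-probability of the support of P."""
    return -math.log(sum(qi for pi, qi in zip(p, q) if pi > 0))


p, q = [0.6, 0.3, 0.1], [0.2, 0.5, 0.3]
for alpha in (0.5, 0.9, 0.999, 1.001, 2.0):
    print(alpha, renyi(p, q, alpha))
print("KL:", kl(p, q))
```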
Submitted 24 April, 2014; v1 submitted 12 June, 2012;
originally announced June 2012.
-
Information Divergence is more chi squared distributed than the chi squared statistics
Authors:
Peter Harremoës,
Gábor Tusnády
Abstract:
For testing goodness of fit it is very popular to use either the chi square statistic or the G statistic (information divergence). Asymptotically both are chi square distributed, so an obvious question is which of the two statistics has a distribution closest to the chi square distribution. Surprisingly, when there is only one degree of freedom, the distribution of information divergence seems to be much better approximated by a chi square distribution than the distribution of the chi square statistic. For random variables we introduce a new transformation that transforms several important distributions into new random variables that are almost Gaussian. For the binomial distributions and the Poisson distributions we formulate a general conjecture about how close their transforms are to the Gaussian. The conjecture is proved for Poisson distributions.
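To see the two statistics side by side, the Python sketch below computes Pearson's chi square statistic and the G statistic for the same counts in the one-degree-of-freedom case (two cells). The G statistic equals twice the sample size times the information divergence from the empirical distribution to the null. The numbers are purely illustrative, and the sketch does not reproduce the paper's transformation or its distributional comparison.

```python
import math

# Illustrative sketch; expected counts must have the same total as observed counts.


def chi_square_stat(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def g_stat(observed, expected):
    """G = 2 sum_i o_i ln(o_i / e_i); equals 2 n D(empirical || null) when totals agree."""
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


n, p0 = 20, 0.3  # one degree of freedom: 20 trials, null success probability 0.3
expected = [n * p0, n * (1 - p0)]
for x in (2, 6, 10):
    observed = [x, n - x]
    print(x, round(chi_square_stat(observed, expected), 4), round(g_stat(observed, expected), 4))
```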
Submitted 17 June, 2012; v1 submitted 6 February, 2012;
originally announced February 2012.
-
Lower bounds on Information Divergence
Authors:
Peter Harremoës,
Christophe Vignat
Abstract:
In this paper we establish lower bounds on information divergence from a distribution to certain important classes of distributions, such as Gaussian, exponential, Gamma, Poisson, geometric, and binomial. These lower bounds are tight, and for several convergence theorems where a rate of convergence can be computed, this rate is determined by the lower bounds proved in this paper. General techniques for getting lower bounds in terms of moments are developed.
Submitted 12 February, 2011;
originally announced February 2011.
-
On Pairs of $f$-divergences and their Joint Range
Authors:
Peter Harremoës,
Igor Vajda
Abstract:
We compare two f-divergences and prove that their joint range is the convex hull of the joint range for distributions supported on only two points. Some applications of this result are given.
Submitted 1 July, 2010;
originally announced July 2010.
-
Rényi Divergence and Majorization
Authors:
Tim van Erven,
Peter Harremoës
Abstract:
Rényi divergence is related to Rényi entropy much like information divergence (also called Kullback-Leibler divergence or relative entropy) is related to Shannon's entropy, and comes up in many settings. It was introduced by Rényi as a measure of information that satisfies almost the same axioms as information divergence. We review the most important properties of Rényi divergence, including its relation to some other distances. We show how Rényi divergence appears when the theory of majorization is generalized from the finite to the continuous setting. Finally, Rényi divergence plays a role in analyzing the number of binary questions required to guess the values of a sequence of random variables.
Submitted 27 May, 2010; v1 submitted 25 January, 2010;
originally announced January 2010.
-
Joint Range of f-divergences
Authors:
Peter Harremoës,
Igor Vajda
Abstract:
We provide a general method for evaluation of the joint range of f-divergences for two different functions f. Via topological arguments we prove that the joint range for general distributions equals the convex hull of the joint range achieved by the distributions on a two-element set. The joint range technique provides important inequalities between different f-divergences with various applications in information theory and statistics.
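The reduction to a two-element set suggests a direct numerical picture of a joint range: sample pairs $(P,Q)$ on two points, evaluate both f-divergences, and, by the result above, take the convex hull of the resulting region to obtain the joint range for general distributions. The Python sketch below is illustrative only; the choice of total variation and information divergence, the grid resolution, and the helper names are our assumptions.

```python
import math

# Illustrative sketch; f_kl and f_tv generate information divergence and total variation.


def f_divergence(f, p, q):
    """D_f(P||Q) = sum_i q_i f(p_i / q_i), assuming every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def f_kl(t):
    return t * math.log(t) if t > 0 else 0.0


def f_tv(t):
    return 0.5 * abs(t - 1)


def two_point_range(f, g, steps=50):
    """Pairs (D_f, D_g) over a grid of distributions P, Q on a two-element set."""
    points = []
    for i in range(1, steps):
        for j in range(1, steps):
            p = [i / steps, 1 - i / steps]
            q = [j / steps, 1 - j / steps]
            points.append((f_divergence(f, p, q), f_divergence(g, p, q)))
    return points


points = two_point_range(f_tv, f_kl)
# Smallest sampled gap to Pinsker's lower boundary KL = 2 * TV**2 (nats).
print(min(kl - 2 * tv**2 for tv, kl in points))
```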
Submitted 27 May, 2010; v1 submitted 25 January, 2010;
originally announced January 2010.
-
Thinning, Entropy and the Law of Thin Numbers
Authors:
Peter Harremoes,
Oliver Johnson,
Ioannis Kontoyiannis
Abstract:
Renyi's "thinning" operation on a discrete random variable is a natural discrete analog of the scaling operation for continuous random variables. The properties of thinning are investigated in an information-theoretic context, especially in connection with information-theoretic inequalities related to Poisson approximation results. The classical Binomial-to-Poisson convergence (sometimes referred to as the "law of small numbers") is seen to be a special case of a thinning limit theorem for convolutions of discrete distributions. A rate of convergence is provided for this limit, and nonasymptotic bounds are also established. This development parallels, in part, the development of Gaussian inequalities leading to the information-theoretic version of the central limit theorem. In particular, a "thinning Markov chain" is introduced, and it is shown to play a role analogous to that of the Ornstein-Uhlenbeck process in connection with the entropy power inequality.
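Rényi's thinning of a distribution $P$ on $\{0,1,2,\dots\}$ keeps each of the $X$ points independently with probability $\alpha$, so $(T_\alpha P)(k)=\sum_{n\ge k}P(n)\binom{n}{k}\alpha^k(1-\alpha)^{n-k}$. The Python sketch below (truncated to finitely many terms; helper names are hypothetical) checks two standard facts numerically: thinning a Poisson distribution with mean $\lambda$ gives a Poisson distribution with mean $\alpha\lambda$, and thinning the $n$-fold convolution of a Bernoulli distribution, that is a binomial, by $1/n$ moves it toward a Poisson distribution.

```python
import math

# Illustrative sketch; pmfs are lists indexed by 0, 1, 2, ... and truncated.


def thin(pmf, alpha):
    """Rényi thinning: keep each of X points independently with probability alpha."""
    out = [0.0] * len(pmf)
    for n, pn in enumerate(pmf):
        for k in range(n + 1):
            out[k] += pn * math.comb(n, k) * alpha**k * (1 - alpha) ** (n - k)
    return out


def poisson_pmf(lam, size):
    return [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) for k in range(size)]


def binom_pmf(n, p):
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def tv(a, b):
    """Total variation distance, zero-padding the shorter pmf."""
    size = max(len(a), len(b))
    a = a + [0.0] * (size - len(a))
    b = b + [0.0] * (size - len(b))
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


# Thinning Po(4) with alpha = 1/4 gives Po(1), up to truncation error.
print(tv(thin(poisson_pmf(4.0, 60), 0.25), poisson_pmf(1.0, 60)))

# Bin(n, 0.8), the n-fold convolution of Ber(0.8), thinned by 1/n approaches Po(0.8).
for n in (10, 100):
    print(n, tv(thin(binom_pmf(n, 0.8), 1 / n), poisson_pmf(0.8, max(n + 1, 60))))
```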
Submitted 3 June, 2009;
originally announced June 2009.
-
Joint Range of Rényi Entropies
Authors:
Peter Harremoës
Abstract:
The exact range of the joint values of several Rényi entropies is determined. The method is based on topology, with special emphasis on the orientation of the objects studied. As in the case when only two orders of Rényi entropies are studied, one can parametrize upper and lower bounds, but an explicit formula for a tight upper or lower bound cannot be given.
Submitted 16 April, 2009;
originally announced April 2009.
-
Testing Goodness-of-Fit via Rate Distortion
Authors:
Peter Harremoes
Abstract:
A framework is developed for using techniques from rate distortion theory in statistical testing. The idea is first to do optimal compression according to a certain distortion function and then to use the information divergence from the compressed empirical distribution to the compressed null hypothesis as the test statistic. Only very special cases have been studied in more detail, but they indicate that the approach can be used under very general conditions.
Submitted 31 March, 2009;
originally announced March 2009.
-
Regret and Jeffreys Integrals in Exp. Families
Authors:
Peter Grunwald,
Peter Harremoes
Abstract:
The question of whether minimax redundancy, minimax regret and Jeffreys integrals are finite or infinite is discussed.
Submitted 31 March, 2009;
originally announced March 2009.
-
Maximum Entropy on Compact Groups
Authors:
Peter Harremoes
Abstract:
On a compact group the Haar probability measure plays the role of the uniform distribution. The entropy and rate distortion theory for this uniform distribution are studied. New results and simplified proofs on convergence of convolutions on compact groups are presented, and they can be formulated as the statement that entropy increases to its maximum. Information theoretic techniques and Markov chains play a crucial role. The convergence results are also formulated via rate distortion functions. The rate of convergence is shown to be exponential.
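The simplest compact groups are the finite cyclic groups $\mathbb{Z}_n$, where the Haar probability measure is the uniform distribution. The Python sketch below, our own illustration with a hypothetical choice of step distribution, convolves a point mass repeatedly with a fixed distribution on $\mathbb{Z}_7$. The entropy never decreases, since $H(X+Y)\ge H(X)$ for independent $X$ and $Y$, and in this example it approaches the maximum $\ln 7$ geometrically fast.

```python
import math

# Illustrative sketch on the cyclic group Z_7; the step distribution is a hypothetical choice.


def entropy(pmf):
    """Shannon entropy in nats."""
    return -sum(x * math.log(x) for x in pmf if x > 0)


def convolve_cyclic(a, b):
    """Distribution of (X + Y) mod n for independent X ~ a and Y ~ b on Z_n."""
    n = len(a)
    out = [0.0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % n] += ai * bj
    return out


n = 7
step = [0.5, 0.5] + [0.0] * (n - 2)  # stay put or move +1, each with probability 1/2
dist = [1.0] + [0.0] * (n - 1)       # start as a point mass at 0
history = []
for _ in range(60):
    dist = convolve_cyclic(dist, step)
    history.append(entropy(dist))
print(round(history[0], 4), round(history[-1], 6), round(math.log(n), 6))
```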
Submitted 29 March, 2009; v1 submitted 30 December, 2008;
originally announced January 2009.