-
The Magnitude of Categories of Texts Enriched by Language Models
Authors:
Tai-Danae Bradley,
Juan Pablo Vigneaux
Abstract:
The purpose of this article is twofold. Firstly, we use the next-token probabilities given by a language model to explicitly define a $[0,1]$-enrichment of a category of texts in natural language, in the sense of Bradley, Terilla, and Vlassopoulos. We consider explicitly the terminating conditions for text generation and determine when the enrichment itself can be interpreted as a probability over texts. Secondly, we compute the Möbius function and the magnitude of an associated generalized metric space $\mathcal{M}$ of texts using a combinatorial version of these quantities recently introduced by Vigneaux. The magnitude function $f(t)$ of $\mathcal{M}$ is a sum over texts $x$ (prompts) of the Tsallis $t$-entropies of the next-token probability distributions $p(-|x)$ plus the cardinality of the model's possible outputs. The derivative of $f$ at $t=1$ recovers a sum of Shannon entropies, which justifies seeing magnitude as a partition function. Following Leinster and Schulman, we also express the magnitude function of $\mathcal{M}$ as an Euler characteristic of magnitude homology and provide an explicit description of the zeroth and first magnitude homology groups.
Submitted 11 January, 2025;
originally announced January 2025.
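The link between Tsallis and Shannon entropies mentioned in this abstract can be checked numerically. The sketch below assumes the common convention $S_t(p) = (1 - \sum_i p_i^t)/(t-1)$ for the Tsallis $t$-entropy; the abstract does not fix a normalization, and the function names and the sample distribution are illustrative only. Under that assumption, $S_t(p)$ approaches the Shannon entropy (in nats) as $t \to 1$.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in nats; zero-probability outcomes contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def tsallis_entropy(p, t):
    """Tsallis t-entropy (1 - sum_i p_i^t) / (t - 1) (assumed convention)."""
    if t == 1:
        return shannon_entropy(p)  # limiting value at t = 1
    return (1 - sum(pi ** t for pi in p if pi > 0)) / (t - 1)


# Hypothetical next-token distribution over three tokens.
p = [0.5, 0.25, 0.25]
for t in (2.0, 1.1, 1.001):
    print(t, tsallis_entropy(p, t))
print("Shannon:", shannon_entropy(p))
```

The printed values should drift toward the Shannon entropy as $t$ gets closer to 1, which is the elementary fact behind reading the derivative at $t=1$ as an entropy.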
-
A combinatorial approach to categorical Möbius inversion and pseudoinversion
Authors:
Juan Pablo Vigneaux
Abstract:
We use Cramer's formula for the inverse of a matrix and a combinatorial expression for the determinant in terms of paths of an associated digraph (which can be traced back to Coates) to give a combinatorial interpretation of Möbius inversion whenever it exists. Every Möbius coefficient is a quotient of two sums, each indexed by certain collections of paths in the digraph. Our result contains, as particular cases, previous theorems by Hall (for posets) and Leinster (for skeletal categories whose idempotents are identities). A byproduct is a novel expression for the magnitude of a metric space as a sum over self-avoiding paths with finitely many terms. By means of Berg's formula, our main constructions can be extended to Moore-Penrose pseudoinverses, yielding an analogous combinatorial interpretation of Möbius pseudoinversion and, consequently, of the magnitude of an arbitrary finite category.
Submitted 19 July, 2024;
originally announced July 2024.
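Hall's theorem for posets, cited in this abstract as a particular case, expresses the Möbius function as a signed count of chains: $\mu(x,y) = \sum_k (-1)^k c_k(x,y)$, where $c_k(x,y)$ is the number of chains $x = x_0 < \cdots < x_k = y$. The sketch below applies it to the divisors of 12 ordered by divisibility; the poset and all names are chosen here for illustration and are not taken from the paper.

```python
def mobius_by_chains(elements, less_than, x, y):
    """Signed chain count: sum over chains x = x0 < ... < xk = y of (-1)**k."""
    if x == y:
        return 1  # the single chain of length 0
    # Split each chain by its first step x < w; that step flips the sign once.
    return -sum(
        mobius_by_chains(elements, less_than, w, y)
        for w in elements
        if less_than(x, w) and (w == y or less_than(w, y))
    )


divisors = [1, 2, 3, 4, 6, 12]


def strictly_divides(a, b):
    """Strict order of the divisibility poset."""
    return a != b and b % a == 0


values = {d: mobius_by_chains(divisors, strictly_divides, 1, d) for d in divisors}
print(values)
```

On this poset the values $\mu(1,d)$ agree with the classical number-theoretic Möbius function of $d$, for instance $\mu(1,6) = 1$ and $\mu(1,12) = 0$.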
-
On the entropy of rectifiable and stratified measures
Authors:
Juan Pablo Vigneaux
Abstract:
We summarize some results of geometric measure theory concerning rectifiable sets and measures. Combined with the entropic chain rule for disintegrations (Vigneaux, 2021), they account for some properties of the entropy of rectifiable measures with respect to the Hausdorff measure, first studied by Koliander et al. (2016). Then we present some recent work on stratified measures, which are convex combinations of rectifiable measures. These generalize discrete-continuous mixtures and may have a singular continuous part. Their entropy obeys a chain rule, whose conditional term is an average of the entropies of the rectifiable measures involved. We state an asymptotic equipartition property (AEP) for stratified measures that shows concentration on strata of a few "typical dimensions" and that links the conditional term of the chain rule to the volume growth of typical sequences in each stratum.
Submitted 31 May, 2023;
originally announced June 2023.
-
A formula for the categorical magnitude in terms of the Moore-Penrose pseudoinverse
Authors:
Stephanie Chen,
Juan Pablo Vigneaux
Abstract:
The magnitude of finite categories is a generalization of the Euler characteristic. It is defined using the coarse incidence algebra of rational-valued functions on the given finite category, and a distinguished element in this algebra: the Dirichlet zeta function. The incidence algebra may be identified with the algebra of $n \times n$ matrices over the rational numbers, where $n$ is the cardinality of the underlying object set. The Moore-Penrose pseudoinverse of a matrix is a generalization of the inverse; it exists and is unique for any given matrix over the complex numbers. In this article, we derive a new method for calculating the magnitude of a finite category, using the pseudoinverse of the matrix that corresponds to the zeta function. The magnitude equals the sum of the entries of this pseudoinverse.
Submitted 13 December, 2023; v1 submitted 21 March, 2023;
originally announced March 2023.
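The formula stated in this abstract (magnitude as the sum of the entries of the pseudoinverse of the zeta matrix) can be tried on small examples. The sketch below is only an illustration under stated assumptions: the zeta matrix is taken to have entries $\zeta(a,b) = \#\mathrm{Hom}(a,b)$, the pseudoinverse is computed exactly over the rationals through a full-rank factorization $A = CF$ with $A^+ = F^T (F F^T)^{-1} (C^T C)^{-1} C^T$ (a standard textbook route, not necessarily the one used in the paper), and all function names are hypothetical. It assumes a nonzero matrix, which always holds for a zeta matrix because of identity morphisms.

```python
from fractions import Fraction


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def rref(A):
    """Reduced row echelon form over the rationals, plus pivot column indices."""
    R = [[Fraction(x) for x in row] for row in A]
    pivots, r = [], 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [x - f * y for x, y in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == len(R):
            break
    return R, pivots


def inverse(A):
    """Inverse of an invertible square matrix via row reduction of [A | I]."""
    n = len(A)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    R, _ = rref(aug)
    return [row[n:] for row in R]


def pinv(A):
    """Moore-Penrose pseudoinverse of a nonzero rational matrix."""
    R, pivots = rref(A)
    C = [[Fraction(row[c]) for c in pivots] for row in A]  # pivot columns of A
    F = R[:len(pivots)]                                    # nonzero rows of the RREF
    Ct, Ft = transpose(C), transpose(F)
    left = matmul(Ft, inverse(matmul(F, Ft)))
    right = matmul(inverse(matmul(Ct, C)), Ct)
    return matmul(left, right)


def magnitude(zeta):
    """Sum of the entries of the pseudoinverse of the zeta matrix."""
    return sum(sum(row) for row in pinv(zeta))


print(magnitude([[1, 1], [0, 1]]))  # poset 0 < 1: zeta is invertible
print(magnitude([[1, 1], [1, 1]]))  # two isomorphic objects: zeta is singular
```

Both examples are equivalent to the one-object category, so both sums should equal 1; the second case is the interesting one, since the ordinary inverse of its zeta matrix does not exist.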
-
Categorical magnitude and entropy
Authors:
Stephanie Chen,
Juan Pablo Vigneaux
Abstract:
Given any finite set equipped with a probability measure, one may compute its Shannon entropy or information content. The entropy becomes the logarithm of the cardinality of the set when the uniform probability is used. Leinster introduced a notion of Euler characteristic for certain finite categories, also known as magnitude, that can be seen as a categorical generalization of cardinality. This paper aims to connect the two ideas by considering the extension of Shannon entropy to finite categories endowed with probability, in such a way that the magnitude is recovered when a certain choice of "uniform" probability is made.
Submitted 12 December, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
-
Typicality for stratified measures
Authors:
Juan Pablo Vigneaux
Abstract:
Stratified measures on Euclidean space are defined here as convex combinations of rectifiable measures. They are possibly singular with respect to the Lebesgue measure and generalize continuous-discrete mixtures. A stratified measure $ρ$ can thus be represented as $\sum_{i=1}^k q_i ρ_i$, where $(q_1,\ldots,q_k)$ is a probability vector and each $ρ_i$ is $m_i$-rectifiable for some integer $m_i$, i.e., absolutely continuous with respect to the $m_i$-Hausdorff measure $μ_i$ on an $m_i$-rectifiable set $E_i$ (e.g., a smooth $m_i$-manifold). We introduce a set of strongly typical realizations of $ρ^{\otimes n}$ (memoryless source) that occur with high probability. The typical realizations are supported on a finite union of strata $\{E_{i_1}\times \cdots \times E_{i_n}\}$ whose dimension concentrates around the mean dimension $\sum_{i=1}^k q_i m_i$. For each $n$, an appropriate sum of Hausdorff measures on the different strata gives a natural notion of reference "volume"; the exponential growth rate of the typical set's volume is quantified by Csiszár's generalized entropy of $ρ$ with respect to $μ=\sum_{i=1}^k μ_i$. Moreover, we prove that this generalized entropy satisfies a chain rule and that the conditional term is related to the volume growth of the typical realizations in each stratum. The chain rule and its asymptotic interpretation hold in the more general framework of piecewise continuous measures: convex combinations of measures restricted to pairwise disjoint sets equipped with reference $σ$-finite measures. Finally, we establish that our notion of mean dimension coincides with Rényi's information dimension when applied to stratified measures, but the generalized entropy used here differs from Rényi's dimensional entropy.
Submitted 21 December, 2022;
originally announced December 2022.
-
Information cohomology of classical vector-valued observables
Authors:
Juan Pablo Vigneaux
Abstract:
We provide here a novel algebraic characterization of two information measures associated with a vector-valued random variable, its differential entropy and the dimension of the underlying space, purely based on their recursive properties (the chain rule and the nullity-rank theorem, respectively). More precisely, we compute the information cohomology of Baudot and Bennequin with coefficients in a module of continuous probabilistic functionals over a category that mixes discrete observables and continuous vector-valued observables, characterizing completely the 1-cocycles; evaluated on continuous laws, these cocycles are linear combinations of the differential entropy and the dimension.
Submitted 9 July, 2021;
originally announced July 2021.
-
Entropy under disintegrations
Authors:
Juan Pablo Vigneaux
Abstract:
We consider the differential entropy of probability measures absolutely continuous with respect to a given $σ$-finite reference measure on an arbitrary measurable space. We state the asymptotic equipartition property in this general case; the result is part of the folklore but our presentation is to some extent novel. Then we study a general framework under which such entropies satisfy a chain rule: disintegrations of measures. We give an asymptotic interpretation for conditional entropies in this case. Finally, we apply our result to Haar measures in canonical relation.
Submitted 9 July, 2021; v1 submitted 18 February, 2021;
originally announced February 2021.
-
Extra-fine sheaves and interaction decompositions
Authors:
Daniel Bennequin,
Olivier Peltre,
Grégoire Sergeant-Perthuis,
Juan Pablo Vigneaux
Abstract:
We introduce an original notion of extra-fine sheaf on a topological space, and a variant (hyper-extra-fine) for which Čech cohomology in strictly positive degree vanishes. We provide a characterization of such sheaves when the topological space is a partially ordered set (poset) equipped with the Alexandrov topology. Then we further specialize our results to some sheaves of vector spaces and injective maps, where extra-fineness is (essentially) equivalent to the decomposition of the sheaf into a direct sum of subfunctors, known as interaction decomposition, and can be expressed by a sum-intersection condition. We use these results to compute the dimension of the space of global sections when the presheaves are freely generated over a functor of sets, generalizing classical counting formulae for the number of solutions of the linearized marginal problem (Kellerer and Matúš). We finish with a comparison theorem between the Čech cohomology associated to a covering and the topos cohomology of the poset with coefficients in the presheaf, which is also the cohomology of a cosimplicial local system over the nerve of the poset. For that, we give a detailed treatment of cosimplicial local systems on simplicial sets. The appendixes present presheaves, sheaves and Čech cohomology, and their application to the marginal problem.
Submitted 18 December, 2020; v1 submitted 26 September, 2020;
originally announced September 2020.
-
A homological characterization of generalized multinomial coefficients related to the entropic chain rule
Authors:
Juan Pablo Vigneaux
Abstract:
There is an asymptotic relationship between the multiplicative relations among multinomial coefficients and the (additive) recurrence property of Shannon entropy known as the chain rule. We show that both types of identities are manifestations of a unique algebraic construction: a $1$-cocycle condition in information cohomology, an algebraic invariant of presheaves of modules on information structures (categories of observables). Baudot and Bennequin introduced this cohomology and proved that Shannon entropy represents the only nontrivial cohomology class in degree $1$ when the coefficients are a natural presheaf of probabilistic functionals. The author later obtained a $1$-parameter family of deformations of that presheaf, in such a way that each Tsallis $α$-entropy appears as the unique $1$-cocycle associated to the parameter $α$. In this article, we introduce a new presheaf of combinatorial functionals, which are measurable functions of finite arrays of integers; these arrays represent histograms associated to random experiments. In this case, the only cohomology class in degree $0$ is generated by the exponential function and $1$-cocycles are Fontené-Ward generalized multinomial coefficients. As a byproduct, we get a simple combinatorial analogue of the fundamental equation of information theory that characterizes the generalized binomial coefficients. The asymptotic relationship mentioned above is extended to a correspondence between certain generalized multinomial coefficients and any $α$-entropy, which sheds new light on the meaning of the chain rule and its deformations.
Submitted 4 March, 2020;
originally announced March 2020.
-
A functional equation related to generalized entropies and the modular group
Authors:
Daniel Bennequin,
Juan Pablo Vigneaux
Abstract:
We solve a functional equation connected to the algebraic characterization of generalized information functions. To prove the symmetry of the solution, we study a related system of functional equations, which involves two homographies. These transformations generate the modular group, and this fact plays a crucial role in solving the system. The method suggests a more general relation between conditional probabilities and arithmetic.
Submitted 4 March, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
-
Information theory with finite vector spaces
Authors:
Juan Pablo Vigneaux
Abstract:
Whereas Shannon entropy is related to the growth rate of multinomial coefficients, we show that the quadratic entropy (Tsallis 2-entropy) is connected to their $q$-deformation; when $q$ is a prime power, these $q$-multinomial coefficients count flags of finite vector spaces with prescribed length and dimensions. In particular, the $q$-binomial coefficients count vector subspaces of given dimension. In this way we obtain a combinatorial explanation for the nonadditivity of the quadratic entropy, which arises from a recursive counting of flags. We show that statistical systems whose configurations are described by flags provide a frequentist justification for the maximum entropy principle with Tsallis statistics. We then introduce a discrete-time stochastic process associated to the $q$-binomial probability distribution, which generates at time $n$ a vector subspace of $\mathbb{F}_q^n$ (here $\mathbb{F}_q$ is the finite field of order $q$). The concentration of measure on certain "typical subspaces" allows us to extend the asymptotic equipartition property to this setting. The size of the typical set is quantified by the quadratic entropy. We discuss the applications to Shannon theory, particularly to source coding, when messages correspond to vector spaces.
Submitted 25 March, 2020; v1 submitted 13 July, 2018;
originally announced July 2018.
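The counting statement in this abstract (the $q$-binomial coefficients count vector subspaces of given dimension) can be checked by brute force for $q = 2$. The sketch below uses the standard product formula $\binom{n}{k}_q = \prod_{i=0}^{k-1} (q^{n-i}-1)/(q^{i+1}-1)$ and encodes vectors of $\mathbb{F}_2^n$ as bitmasks with XOR as addition; the function names and the encoding are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def q_binomial(n, k, q):
    """Gaussian binomial coefficient via the standard product formula."""
    if k < 0 or k > n:
        return 0
    num, den = 1, 1
    for i in range(k):
        num *= q ** (n - i) - 1
        den *= q ** (i + 1) - 1
    return num // den  # the quotient is always an integer


def count_subspaces_f2(n, k):
    """Count k-dimensional subspaces of F_2^n by enumerating spanning sets."""
    nonzero_vectors = range(1, 2 ** n)  # bitmask encoding of F_2^n minus 0
    subspaces = set()
    for basis in combinations(nonzero_vectors, k):
        span = {0}
        for v in basis:
            span |= {u ^ v for u in span}  # close the span under adding v
        if len(span) == 2 ** k:  # the k vectors are linearly independent
            subspaces.add(frozenset(span))
    return len(subspaces)


for n, k in [(3, 1), (3, 2), (4, 2)]:
    print(n, k, q_binomial(n, k, 2), count_subspaces_f2(n, k))
```

For each pair $(n, k)$ the two printed counts should coincide; the brute-force enumeration is only practical for very small $n$.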
-
Information structures and their cohomology
Authors:
Juan Pablo Vigneaux
Abstract:
We introduce the category of information structures, whose objects are suitable diagrams of measurable sets that encode the possible outputs of a given family of observables and their mutual relationships of refinement; they serve as mathematical models of contextuality in classical and quantum settings. Each information structure can be regarded as a ringed site with trivial topology; the structure ring is generated by the observables themselves and its multiplication corresponds to joint measurement. We extend Baudot and Bennequin's definition of information cohomology to this setting, as a derived functor in the category of modules over the structure ring, and show explicitly that the bar construction gives a projective resolution in that category, recovering in this way the cochain complexes previously considered in the literature. Finally, we study the particular case of a one-parameter family of coefficients made of functions of probability distributions. The only 1-cocycles are Shannon entropy or Tsallis $α$-entropy, depending on the value of the parameter.
Submitted 8 November, 2021; v1 submitted 22 September, 2017;
originally announced September 2017.