Search | arXiv e-print repository

The Magnitude of Categories of Texts Enriched by Language Models

Authors: Tai-Danae Bradley, Juan Pablo Vigneaux

Abstract: The purpose of this article is twofold. Firstly, we use the next-token probabilities given by a language model to explicitly define a $[0,1]$-enrichment of a category of texts in natural language, in the sense of Bradley, Terilla, and Vlassopoulos. We consider explicitly the terminating conditions for text generation and determine when the enrichment itself can be interpreted as a probability over texts. Secondly, we compute the Möbius function and the magnitude of an associated generalized metric space $\mathcal{M}$ of texts using a combinatorial version of these quantities recently introduced by Vigneaux. The magnitude function $f(t)$ of $\mathcal{M}$ is a sum over texts $x$ (prompts) of the Tsallis $t$-entropies of the next-token probability distributions $p(-|x)$ plus the cardinality of the model's possible outputs. The derivative of $f$ at $t=1$ recovers a sum of Shannon entropies, which justifies seeing magnitude as a partition function. Following Leinster and Schulman, we also express the magnitude function of $\mathcal{M}$ as an Euler characteristic of magnitude homology and provide an explicit description of the zeroth and first magnitude homology groups.

Submitted 11 January, 2025; originally announced January 2025.

MSC Class: 18D20; 68T50 ACM Class: I.2.7; G.3
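
The abstract above ties the magnitude function to Tsallis $t$-entropies and to Shannon entropies at $t=1$. The following is a minimal numerical sketch of the underlying fact only, assuming the standard definition $H_t(p) = (1 - \sum_i p_i^t)/(t-1)$ (the paper's normalization may differ) and a made-up distribution `p`; it is not the paper's magnitude computation.

```python
import math

def shannon_entropy(p):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def tsallis_entropy(p, t):
    """Tsallis t-entropy (1 - sum_i p_i^t) / (t - 1); Shannon entropy at t = 1."""
    if t == 1:
        return shannon_entropy(p)
    return (1.0 - sum(x ** t for x in p if x > 0)) / (t - 1.0)

# Hypothetical next-token distribution, chosen only for illustration.
p = [0.5, 0.25, 0.25]

# As t approaches 1, the Tsallis entropy approaches the Shannon entropy.
for t in (2.0, 1.1, 1.001):
    print(t, tsallis_entropy(p, t))
print("Shannon:", shannon_entropy(p))
```

Under this definition, $(t-1)H_t(p) = 1 - \sum_i p_i^t$, whose derivative in $t$ at $t=1$ is $-\sum_i p_i \ln p_i$, the Shannon entropy in nats.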

arXiv:2407.14647 [pdf, ps, other]

A combinatorial approach to categorical Möbius inversion and pseudoinversion

Authors: Juan Pablo Vigneaux

Abstract: We use Cramer's formula for the inverse of a matrix and a combinatorial expression for the determinant in terms of paths of an associated digraph (which can be traced back to Coates) to give a combinatorial interpretation of Möbius inversion whenever it exists. Every Möbius coefficient is a quotient of two sums, each indexed by certain collections of paths in the digraph. Our result contains, as particular cases, previous theorems by Hall (for posets) and Leinster (for skeletal categories whose idempotents are identities). A byproduct is a novel expression for the magnitude of a metric space as a sum over self-avoiding paths with finitely many terms. By means of Berg's formula, our main constructions can be extended to Moore-Penrose pseudoinverses, yielding an analogous combinatorial interpretation of Möbius pseudoinversion and, consequently, of the magnitude of an arbitrary finite category.

Submitted 19 July, 2024; originally announced July 2024.

Comments: 17 pages. Some of the results were presented at the conference "Magnitude 2023" in Osaka, Japan

MSC Class: 15A09; 15A10; 18A05; 54E35; 18D20

arXiv:2306.00305 [pdf, ps, other]

On the entropy of rectifiable and stratified measures

Authors: Juan Pablo Vigneaux

Abstract: We summarize some results of geometric measure theory concerning rectifiable sets and measures. Combined with the entropic chain rule for disintegrations (Vigneaux, 2021), they account for some properties of the entropy of rectifiable measures with respect to the Hausdorff measure first studied by Koliander et al. (2016). Then we present some recent work on stratified measures, which are convex combinations of rectifiable measures. These generalize discrete-continuous mixtures and may have a singular continuous part. Their entropy obeys a chain rule, whose conditional term is an average of the entropies of the rectifiable measures involved. We state an asymptotic equipartition property (AEP) for stratified measures that shows concentration on strata of a few "typical dimensions" and that links the conditional term of the chain rule to the volume growth of typical sequences in each stratum.

Submitted 31 May, 2023; originally announced June 2023.

Comments: To appear in the proceedings of Geometric Science of Information (GSI2023)

MSC Class: 94A17; 94A24; 28A75

arXiv:2303.12176 [pdf, ps, other]

doi 10.36045/j.bbms.230331

A formula for the categorical magnitude in terms of the Moore-Penrose pseudoinverse

Authors: Stephanie Chen, Juan Pablo Vigneaux

Abstract: The magnitude of finite categories is a generalization of the Euler characteristic. It is defined using the coarse incidence algebra of rational-valued functions on the given finite category, and a distinguished element in this algebra: the Dirichlet zeta function. The incidence algebra may be identified with the algebra of $n \times n$ matrices over the rational numbers, where $n$ is the cardinality of the underlying object set. The Moore-Penrose pseudoinverse of a matrix is a generalization of the inverse; it exists and is unique for any given matrix over the complex numbers. In this article, we derive a new method for calculating the magnitude of a finite category, using the pseudoinverse of the matrix that corresponds to the zeta function. The magnitude equals the sum of the entries of this pseudoinverse.

Submitted 13 December, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

MSC Class: 18D99; 15A10

Journal ref: Bull. Belg. Math. Soc. Simon Stevin 30(3): 341-353 (November 2023)
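
The abstract above states that the magnitude equals the sum of the entries of the pseudoinverse of the zeta matrix. As a rough sketch, the code below covers only the special case where the zeta matrix is invertible, so that the pseudoinverse coincides with the ordinary inverse. The helper names `invert` and `magnitude` and the example categories are illustrative assumptions, not taken from the paper; the general singular case would need a true Moore-Penrose pseudoinverse.

```python
from fractions import Fraction

def invert(matrix):
    """Exact inverse by Gauss-Jordan elimination; raises ValueError if singular."""
    n = len(matrix)
    # Augment the matrix with the identity, using exact rational arithmetic.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("zeta matrix is singular; a pseudoinverse would be needed")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def magnitude(zeta):
    """Sum of all entries of the inverse of the zeta matrix (invertible case only)."""
    return sum(sum(row) for row in invert(zeta))

# zeta[i][j] = number of morphisms from object i to object j.
arrow = [[1, 1], [0, 1]]                      # two objects, one non-identity arrow
discrete = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # three objects, identities only

print(magnitude(arrow))     # 1
print(magnitude(discrete))  # 3
```

For the discrete category the result is the number of objects, consistent with magnitude generalizing cardinality.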

arXiv:2303.00879 [pdf, ps, other]

doi 10.1007/978-3-031-38271-0_28

Categorical magnitude and entropy

Authors: Stephanie Chen, Juan Pablo Vigneaux

Abstract: Given any finite set equipped with a probability measure, one may compute its Shannon entropy or information content. The entropy becomes the logarithm of the cardinality of the set when the uniform probability is used. Leinster introduced a notion of Euler characteristic for certain finite categories, also known as magnitude, that can be seen as a categorical generalization of cardinality. This paper aims to connect the two ideas by considering the extension of Shannon entropy to finite categories endowed with probability, in such a way that the magnitude is recovered when a certain choice of "uniform" probability is made.

Submitted 12 December, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

Comments: 11 pages, published in GSI 2023 conference proceedings

MSC Class: 18; 94

Journal ref: In: Nielsen, F., Barbaresco, F. (eds) Geometric Science of Information. GSI 2023. Lecture Notes in Computer Science, vol 14071. Springer, Cham

arXiv:2212.10809 [pdf, ps, other]

Typicality for stratified measures

Authors: Juan Pablo Vigneaux

Abstract: Stratified measures on Euclidean space are defined here as convex combinations of rectifiable measures. They are possibly singular with respect to the Lebesgue measure and generalize continuous-discrete mixtures. A stratified measure $ρ$ can thus be represented as $\sum_{i=1}^k q_i ρ_i$, where $(q_1,\ldots,q_k)$ is a probability vector and each $ρ_i$ is $m_i$-rectifiable for some integer $m_i$, i.e., absolutely continuous with respect to the $m_i$-Hausdorff measure $μ_i$ on an $m_i$-rectifiable set $E_i$ (e.g. a smooth $m_i$-manifold). We introduce a set of strongly typical realizations of $ρ^{\otimes n}$ (memoryless source) that occur with high probability. The typical realizations are supported on a finite union of strata $\{E_{i_1}\times \cdots \times E_{i_n}\}$ whose dimension concentrates around the mean dimension $\sum_{i=1}^k q_i m_i$. For each $n$, an appropriate sum of Hausdorff measures on the different strata gives a natural notion of reference "volume"; the exponential growth rate of the typical set's volume is quantified by Csiszár's generalized entropy of $ρ$ with respect to $μ=\sum_{i=1}^k μ_i$. Moreover, we prove that this generalized entropy satisfies a chain rule and that the conditional term is related to the volume growth of the typical realizations in each stratum. The chain rule and its asymptotic interpretation hold in the more general framework of piecewise continuous measures: convex combinations of measures restricted to pairwise disjoint sets equipped with reference $σ$-finite measures. Finally, we establish that our notion of mean dimension coincides with Rényi's information dimension when applied to stratified measures, but the generalized entropy used here differs from Rényi's dimensional entropy.

Submitted 21 December, 2022; originally announced December 2022.

MSC Class: 94A17; 94A24; 28A75; 60F99

arXiv:2107.04377 [pdf, ps, other]

Information cohomology of classical vector-valued observables

Authors: Juan Pablo Vigneaux

Abstract: We provide here a novel algebraic characterization of two information measures associated with a vector-valued random variable, its differential entropy and the dimension of the underlying space, purely based on their recursive properties (the chain rule and the nullity-rank theorem, respectively). More precisely, we compute the information cohomology of Baudot and Bennequin with coefficients in a module of continuous probabilistic functionals over a category that mixes discrete observables and continuous vector-valued observables, characterizing completely the 1-cocycles; evaluated on continuous laws, these cocycles are linear combinations of the differential entropy and the dimension.

Submitted 9 July, 2021; originally announced July 2021.

Comments: 10 pages, no figures. Conference paper (GSI2021)

MSC Class: 94A17; 18G90

Journal ref: Chapter 58 in F. Nielsen, F. Barbaresco (eds.). Geometric Science of Information: 5th International Conference, GSI 2021, Paris, France, 2021, Proceedings. LNCS Vol. 12829. Springer, 2021

arXiv:2102.09584 [pdf, ps, other]

Entropy under disintegrations

Authors: Juan Pablo Vigneaux

Abstract: We consider the differential entropy of probability measures absolutely continuous with respect to a given $σ$-finite reference measure on an arbitrary measurable space. We state the asymptotic equipartition property in this general case; the result is part of the folklore but our presentation is to some extent novel. Then we study a general framework under which such entropies satisfy a chain rule: disintegrations of measures. We give an asymptotic interpretation for conditional entropies in this case. Finally, we apply our result to Haar measures in canonical relation.

Submitted 9 July, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: 10 pages, no figures. Conference Paper (GSI2021)

MSC Class: 94A17 (Primary) 60B05 (Secondary)

Journal ref: Chapter 38 in F. Nielsen, F. Barbaresco (eds.). Geometric Science of Information: 5th International Conference, GSI 2021, Paris, France, 2021, Proceedings. LNCS Vol. 12829. Springer, 2021

arXiv:2009.12646 [pdf, ps, other]

Extra-fine sheaves and interaction decompositions

Authors: Daniel Bennequin, Olivier Peltre, Grégoire Sergeant-Perthuis, Juan Pablo Vigneaux

Abstract: We introduce an original notion of extra-fine sheaf on a topological space, and a variant (hyper-extra-fine) for which Čech cohomology in strictly positive degree vanishes. We provide a characterization of such sheaves when the topological space is a partially ordered set (poset) equipped with the Alexandrov topology. Then we further specialize our results to some sheaves of vector spaces and injective maps, where extra-fineness is (essentially) equivalent to the decomposition of the sheaf into a direct sum of subfunctors, known as interaction decomposition, and can be expressed by a sum-intersection condition. We use these results to compute the dimension of the space of global sections when the presheaves are freely generated over a functor of sets, generalizing classical counting formulae for the number of solutions of the linearized marginal problem (Kellerer and Matúš). We finish with a comparison theorem between the Čech cohomology associated to a covering and the topos cohomology of the poset with coefficients in the presheaf, which is also the cohomology of a cosimplicial local system over the nerve of the poset. For that, we give a detailed treatment of cosimplicial local systems on simplicial sets. The appendixes present presheaves, sheaves and Čech cohomology, and their application to the marginal problem.

Submitted 18 December, 2020; v1 submitted 26 September, 2020; originally announced September 2020.

Comments: 47 pages, no figures. Several corrections to the previous version were introduced, mainly to current Thm. 2.6, Prop. 4.3, Thm. 4.4 and Thm. 4.6. The introduction was expanded. The structure of the article is the same

MSC Class: Primary 14F05; 55N30; 55U10 Secondary 06A11; 60B99

arXiv:2003.02021 [pdf, ps, other]

A homological characterization of generalized multinomial coefficients related to the entropic chain rule

Authors: Juan Pablo Vigneaux

Abstract: There is an asymptotic relationship between the multiplicative relations among multinomial coefficients and the (additive) recurrence property of Shannon entropy known as the chain rule. We show that both types of identities are manifestations of a unique algebraic construction: a $1$-cocycle condition in \emph{information cohomology}, an algebraic invariant of presheaves of modules on \emph{information structures} (categories of observables). Baudot and Bennequin introduced this cohomology and proved that Shannon entropy represents the only nontrivial cohomology class in degree $1$ when the coefficients are a natural presheaf of probabilistic functionals. The author later obtained a $1$-parameter family of deformations of that presheaf, in such a way that each Tsallis $α$-entropy appears as the unique $1$-cocycle associated to the parameter $α$. In this article, we introduce a new presheaf of \emph{combinatorial functionals}, which are measurable functions of finite arrays of integers; these arrays represent \emph{histograms} associated to random experiments. In this case, the only cohomology class in degree $0$ is generated by the exponential function and $1$-cocycles are Fontené-Ward generalized multinomial coefficients. As a byproduct, we get a simple combinatorial analogue of the fundamental equation of information theory that characterizes the generalized binomial coefficients. The asymptotic relationship mentioned above is extended to a correspondence between certain generalized multinomial coefficients and any $α$-entropy, which sheds new light on the meaning of the chain rule and its deformations.

Submitted 4 March, 2020; originally announced March 2020.

MSC Class: 05A10; 05A16; 39B22; 18G60; 94A17

arXiv:1910.06927 [pdf, ps, other]

doi 10.1007/s00010-020-00717-2

A functional equation related to generalized entropies and the modular group

Authors: Daniel Bennequin, Juan Pablo Vigneaux

Abstract: We solve a functional equation connected to the algebraic characterization of generalized information functions. To prove the symmetry of the solution, we study a related system of functional equations, which involves two homographies. These transformations generate the modular group, and this fact plays a crucial role in solving the system. The method suggests a more general relation between conditional probabilities and arithmetic.

Submitted 4 March, 2020; v1 submitted 15 October, 2019; originally announced October 2019.

Comments: Originally uploaded as an appendix to arXiv:1709.07807v1. Changes in v2: the introduction was extended to summarize in more detail previous results; there is a new lemma at the end of Section 4

MSC Class: 97I70; 94A17

Journal ref: Aequationes Mathematicae, 2020

arXiv:1807.05152 [pdf, ps, other]

doi 10.1109/TIT.2019.2907590

Information theory with finite vector spaces

Authors: Juan Pablo Vigneaux

Abstract: Whereas Shannon entropy is related to the growth rate of multinomial coefficients, we show that the quadratic entropy (Tsallis 2-entropy) is connected to their $q$-deformation; when $q$ is a prime power, these $q$-multinomial coefficients count flags of finite vector spaces with prescribed length and dimensions. In particular, the $q$-binomial coefficients count vector subspaces of given dimension. We obtain this way a combinatorial explanation for the nonadditivity of the quadratic entropy, which arises from a recursive counting of flags. We show that statistical systems whose configurations are described by flags provide a frequentist justification for the maximum entropy principle with Tsallis statistics. We introduce then a discrete-time stochastic process associated to the $q$-binomial probability distribution, that generates at time $n$ a vector subspace of $\mathbb{F}_q^n$ (here $\mathbb{F}_q$ is the finite field of order $q$). The concentration of measure on certain "typical subspaces" allows us to extend the asymptotic equipartition property to this setting. The size of the typical set is quantified by the quadratic entropy. We discuss the applications to Shannon theory, particularly to source coding, when messages correspond to vector spaces.

Submitted 25 March, 2020; v1 submitted 13 July, 2018; originally announced July 2018.

Comments: Presented in part at the Latin American Week on Coding and Information 2018 (Campinas, Brazil)

MSC Class: 94A15; 05A10; 60G99

Journal ref: IEEE Transactions on Information Theory, vol. 65, no. 9, pp. 5674-5687, Sept. 2019
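
The abstract above notes that, for a prime power $q$, the $q$-binomial coefficients count vector subspaces of a given dimension. A small sketch of that count follows, using the standard product formula $\prod_{i=0}^{k-1} (q^{n-i}-1)/(q^{i+1}-1)$; the function name `gaussian_binomial` is an illustrative choice, not taken from the paper.

```python
def gaussian_binomial(n, k, q):
    """q-binomial coefficient: number of k-dimensional subspaces of F_q^n
    when q is a prime power."""
    if k < 0 or k > n:
        return 0
    numerator = 1
    denominator = 1
    for i in range(k):
        numerator *= q ** (n - i) - 1
        denominator *= q ** (i + 1) - 1
    # The quotient is always an integer, so integer division is exact.
    return numerator // denominator

print(gaussian_binomial(2, 1, 2))  # 3 lines through the origin in F_2^2
print(gaussian_binomial(4, 2, 2))  # 35 planes in F_2^4
```

Building the numerator and denominator as integers before dividing avoids floating-point error for large $n$ and $q$.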

arXiv:1709.07807 [pdf, ps, other]

Information structures and their cohomology

Authors: Juan Pablo Vigneaux

Abstract: We introduce the category of information structures, whose objects are suitable diagrams of measurable sets that encode the possible outputs of a given family of observables and their mutual relationships of refinement; they serve as mathematical models of contextuality in classical and quantum settings. Each information structure can be regarded as a ringed site with trivial topology; the structure ring is generated by the observables themselves and its multiplication corresponds to joint measurement. We extend Baudot and Bennequin's definition of information cohomology to this setting, as a derived functor in the category of modules over the structure ring, and show explicitly that the bar construction gives a projective resolution in that category, recovering in this way the cochain complexes previously considered in the literature. Finally, we study the particular case of a one-parameter family of coefficients made of functions of probability distributions. The only 1-cocycles are Shannon entropy or Tsallis $α$-entropy, depending on the value of the parameter.

Submitted 8 November, 2021; v1 submitted 22 September, 2017; originally announced September 2017.

Comments: 54 pages, 1 figure. This improved version was finally published in Theory and Applications of Categories. It took into account multiple suggestions of the reviewer

MSC Class: 94A15; 55N35 (Primary); 39B05; 60A99 (Secondary)

Journal ref: Theory and Applications of Categories, Vol. 35, 2020, No. 38, pp 1476-1529

Showing 1–13 of 13 results for author: Vigneaux, J P