-
A minimal base or a direct base? That is the question!
Authors:
Jaume Baixeries,
Amedeo Napoli
Abstract:
In this paper we revisit the problem of computing the closure of a set of attributes given a basis of dependencies or implications. This problem is of main interest in logics, in the relational database model, in lattice theory, and in Formal Concept Analysis as well. A basis of dependencies may have different characteristics, among which being ``minimal'', e.g., the Duquenne-Guigues Basis, or being ``direct'', e.g., the Canonical Basis and the D-basis. Here we propose an extensive and experimental study of the impacts of minimality and directness on the closure algorithms. The results of the experiments performed on real and synthetic datasets are analyzed in depth, and suggest a different and fresh look at computing the closure of a set of attributes w.r.t. a basis of dependencies.
This paper has been submitted to the International Journal of Approximate Reasoning.
Submitted 7 March, 2025; v1 submitted 18 April, 2024;
originally announced April 2024.
-
Database Dependencies and Formal Concept Analysis
Authors:
Jaume Baixeries
Abstract:
This is an account of the characterization of database dependencies with Formal Concept Analysis.
Submitted 20 March, 2024;
originally announced March 2024.
-
Zipf's laws of meaning in Catalan
Authors:
Neus Català,
Jaume Baixeries,
Ramon Ferrer-Cancho,
Lluís Padró,
Antoni Hernández-Fernández
Abstract:
In his pioneering research, G. K. Zipf formulated a couple of statistical laws on the relationship between the frequency of a word and its number of meanings: the law of meaning distribution, relating the frequency of a word and its frequency rank, and the meaning-frequency law, relating the frequency of a word with its number of meanings. Although these laws were formulated more than half a century ago, they have only been investigated in a few languages. Here we present the first study of these laws in Catalan.
We verify these laws in Catalan via the relationship among their exponents and that of the rank-frequency law. We present a new protocol for the analysis of these Zipfian laws that can be extended to other languages. We report the first evidence of two marked regimes for these laws in written language and speech, paralleling the two regimes in Zipf's rank-frequency law in large multi-author corpora discovered in the early 2000s. Finally, the implications of these two regimes will be discussed.
Submitted 30 June, 2021;
originally announced July 2021.
-
Polysemy and brevity versus frequency in language
Authors:
Bernardino Casas,
Antoni Hernández-Fernández,
Neus Català,
Ramon Ferrer-i-Cancho,
Jaume Baixeries
Abstract:
The pioneering research of G. K. Zipf on the relationship between word frequency and other word features led to the formulation of various linguistic laws. The most popular is Zipf's law for word frequencies. Here we focus on two laws that have been studied less intensively: the meaning-frequency law, i.e. the tendency of more frequent words to be more polysemous, and the law of abbreviation, i.e. the tendency of more frequent words to be shorter. In a previous work, we tested the robustness of these Zipfian laws for English, roughly measuring word length in number of characters and distinguishing adult from child speech. In the present article, we extend our study to other languages (Dutch and Spanish) and introduce two additional measures of length: syllabic length and phonemic length. Our correlation analysis indicates that both the meaning-frequency law and the law of abbreviation hold overall in all the analyzed languages.
Submitted 27 March, 2019;
originally announced April 2019.
-
The polysemy of the words that children learn over time
Authors:
Bernardino Casas,
Neus Català,
Ramon Ferrer-i-Cancho,
Antoni Hernández-Fernández,
Jaume Baixeries
Abstract:
Here we study polysemy as a potential learning bias in vocabulary learning in children. Words of low polysemy could be preferred as they reduce the disambiguation effort for the listener. However, such preference could be a side-effect of another bias: the preference of children for nouns in combination with the lower polysemy of nouns with respect to other part-of-speech categories. Our results show that mean polysemy in children increases over time in two phases, i.e. a fast growth till the 31st month followed by a slower tendency towards adult speech. In contrast, this evolution is not found in adults interacting with children. This suggests that children have a preference for non-polysemous words in their early stages of vocabulary acquisition. Interestingly, the evolutionary pattern described above weakens when controlling for syntactic category (noun, verb, adjective or adverb) but it does not disappear completely, suggesting that it could result from a combination of a standalone bias for low polysemy and a preference for nouns.
Submitted 26 March, 2019; v1 submitted 27 November, 2016;
originally announced November 2016.
-
The challenges of statistical patterns of language: the case of Menzerath's law in genomes
Authors:
Ramon Ferrer-i-Cancho,
Núria Forns,
Antoni Hernández-Fernández,
Gemma Bel-Enguix,
Jaume Baixeries
Abstract:
The importance of statistical patterns of language has been debated over decades. Although Zipf's law is perhaps the most popular case, recently, Menzerath's law has begun to be involved. Menzerath's law manifests in language, music and genomes as a tendency of the mean size of the parts to decrease as the number of parts increases in many situations. This statistical regularity also emerges in the context of genomes, for instance, as a tendency of species with more chromosomes to have a smaller mean chromosome size. It has been argued that the instantiation of this law in genomes is not indicative of any parallel between language and genomes because (a) the law is inevitable and (b) non-coding DNA dominates genomes. Here mathematical, statistical and conceptual challenges of these criticisms are discussed. Two major conclusions are drawn: the law is not inevitable and languages also have a correlate of non-coding DNA. However, the wide range of manifestations of the law in and outside genomes suggests that the striking similarities between non-coding DNA and certain linguistic units could be anecdotal for understanding the recurrence of that statistical law.
Submitted 29 September, 2012; v1 submitted 3 July, 2012;
originally announced July 2012.