-
Wikipedia information flow analysis reveals the scale-free architecture of the Semantic Space
Authors:
A. P. Masucci,
A. Kalampokis,
V. M. Eguíluz,
E. Hernández-García
Abstract:
In this paper we extract the topology of the semantic space in its encyclopedic sense, measuring the semantic flow between the different entries of the largest modern encyclopedia, Wikipedia, and thus creating a directed complex network of semantic flows. Notably, at the percolation threshold the semantic space is characterised by scale-free behaviour at different levels of complexity, and this relates the semantic space to a wide range of biological, social and linguistic phenomena. In particular, we find that the cluster size distribution, representing the size of different semantic areas, is scale-free. Moreover, the topology of the resulting semantic space is scale-free in the connectivity distribution and displays small-world properties. However, its statistical properties do not allow a classical interpretation via a generative model based on a simple multiplicative process. After giving a detailed description and interpretation of the topological properties of the semantic space, we introduce a stochastic model of a content-based network, based on a copy and mutation algorithm and on Heaps' law, that is able to capture the main statistical properties of the analysed semantic space, including Zipf's law for the word frequency distribution.
Submitted 2 February, 2011;
originally announced February 2011.
-
Extracting directed information flow networks: an application to genetics and semantics
Authors:
A. P. Masucci,
A. Kalampokis,
V. M. Eguíluz,
E. Hernández-García
Abstract:
We introduce a general method to infer the directional information flow between populations whose elements are described by n-dimensional vectors of symbolic attributes. The method is based on the Jensen-Shannon divergence and on the Shannon entropy and has a wide range of applications. We show here the results of two applications: first, we extract the network of genetic flow between the meadows of the seagrass Posidonia oceanica, where the meadow elements are specified by sets of microsatellite markers; then we extract the semantic flow network from a set of Wikipedia pages, showing the semantic channels between different areas of knowledge.
Submitted 29 December, 2010; v1 submitted 24 September, 2010;
originally announced September 2010.
-
Evolution of Vocabulary on Scale-free and Random Networks
Authors:
Alkiviadis Kalampokis,
Kosmas Kosmidis,
Panos Argyrakis
Abstract:
We examine the evolution of the vocabulary of a group of individuals (linguistic agents) on a scale-free network, using Monte Carlo simulations and assumptions from evolutionary game theory. It is known that when the agents are arranged in a two-dimensional lattice structure and interact by diffusion and encounter, their final vocabulary size is the maximum possible. Knowing all available words is essential in order to increase the probability of "surviving" by effective reproduction. On scale-free networks we find a different result: it is not necessary to learn the entire available vocabulary. Survival chances are increased by using the vocabulary of the "hubs" (nodes with high degree). The existence of the "hubs" in a scale-free network is the source of an additional important fitness-generating mechanism.
Submitted 8 January, 2007;
originally announced January 2007.
-
Language Time Series Analysis
Authors:
Kosmas Kosmidis,
Alkiviadis Kalampokis,
Panos Argyrakis
Abstract:
We use the Detrended Fluctuation Analysis (DFA) and the Grassberger-Procaccia (GP) analysis methods in order to study language characteristics. Although we construct our signals using only word lengths or word frequencies, thereby excluding a huge amount of information from language, the application of GP analysis indicates that linguistic signals may be considered as the manifestation of a complex system of high dimensionality, different from random signals or systems of low dimensionality such as the Earth's climate. The DFA method is additionally able to distinguish a natural language signal from a computer code signal. This last result may be useful in the field of cryptography.
Submitted 11 July, 2006;
originally announced July 2006.
-
Statistical Mechanical Approach to Human Language
Authors:
Kosmas Kosmidis,
Alkiviadis Kalampokis,
Panos Argyrakis
Abstract:
We use the formulation of equilibrium statistical mechanics in order to study some important characteristics of language. Using a simple expression for the Hamiltonian of a language system, which is directly implied by Zipf's law, we are able to explain several characteristic features of human language that seem completely unrelated, such as the universality of the Zipf exponent, the vocabulary size of children, and the reduced communication abilities of people suffering from schizophrenia. While several explanations are necessarily only qualitative at this stage, we have nevertheless been able to derive a formula for the vocabulary size of children as a function of age, which agrees rather well with experimental data.
Submitted 4 October, 2005;
originally announced October 2005.