-
On The Maximum Linear Arrangement Problem for Trees
Authors:
Lluís Alemany-Puig,
Juan Luis Esteban,
Ramon Ferrer-i-Cancho
Abstract:
Linear arrangements of graphs are a well-known type of graph labeling and are found in many important computational problems, such as the Minimum Linear Arrangement Problem ($\texttt{minLA}$). A linear arrangement is usually defined as a permutation of the $n$ vertices of a graph. An intuitive geometric setting is that of vertices lying on consecutive integer positions in the real line, starting at 1; edges are often drawn as semicircles above the real line. In this paper we study the Maximum Linear Arrangement problem ($\texttt{MaxLA}$), the maximization variant of $\texttt{minLA}$. We devise a new characterization of maximum arrangements of general graphs, and prove that $\texttt{MaxLA}$ can be solved for cycle graphs in constant time, and for $k$-linear trees ($k\le2$) in time $O(n)$. We present two constrained variants of $\texttt{MaxLA}$ that we call $\texttt{bipartite MaxLA}$ and $\texttt{1-thistle MaxLA}$. We prove that the former can be solved in time $O(n)$ for any bipartite graph, and the latter by an algorithm that typically runs in time $O(n^4)$ on unlabelled trees. The combination of the two variants has two promising characteristics. First, it solves $\texttt{MaxLA}$ for almost all trees consisting of a few tens of nodes. Second, we prove that it constitutes a $3/2$-approximation algorithm for $\texttt{MaxLA}$ for trees. Furthermore, we conjecture that $\texttt{bipartite MaxLA}$ solves $\texttt{MaxLA}$ for at least $50\%$ of all free trees.
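As a hedged illustration of the problem statement (not the authors' algorithms), the sketch below solves MaxLA by exhaustive search over all $n!$ arrangements, which is only feasible for a handful of vertices. It also evaluates a bipartite-constrained maximum under an assumption of ours: that a "bipartite arrangement" places all vertices of one colour class before all vertices of the other. The names `max_la` and `two_coloring` are hypothetical.

```python
from itertools import permutations


def two_coloring(n, edges):
    """2-colour a connected bipartite graph on vertices 0..n-1 by graph traversal."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    color = {0: 0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in color:
                color[w] = 1 - color[u]
                stack.append(w)
    return color


def max_la(n, edges, bipartite_only=False):
    """Brute-force MaxLA: maximum of sum |pi(u) - pi(v)| over all arrangements.

    If bipartite_only is True, only arrangements in which one colour class
    occupies a prefix of the positions are kept (our assumed reading of a
    "bipartite arrangement", not a definition taken from the paper).
    """
    color = two_coloring(n, edges) if bipartite_only else None
    best = -1
    for perm in permutations(range(1, n + 1)):
        pos = dict(enumerate(perm))  # vertex v sits at position perm[v]
        if bipartite_only:
            order = sorted(range(n), key=pos.get)  # vertices from left to right
            changes = sum(color[order[i]] != color[order[i + 1]] for i in range(n - 1))
            if changes > 1:  # the colour classes interleave: skip this arrangement
                continue
        best = max(best, sum(abs(pos[u] - pos[v]) for u, v in edges))
    return best


path = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(max_la(5, path), max_la(5, path, bipartite_only=True))
```

On this 5-vertex path both calls return 11: the arrangement placing vertices 0..4 at positions 2, 4, 1, 5, 3 is optimal and keeps each colour class contiguous.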
Submitted 9 July, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
The expected sum of edge lengths in planar linearizations of trees. Theory and applications
Authors:
Lluís Alemany-Puig,
Ramon Ferrer-i-Cancho
Abstract:
Dependency trees have proven to be a very successful model to represent the syntactic structure of sentences of human languages. In these structures, vertices are words and edges connect syntactically dependent words. The tendency of these dependencies to be short has been demonstrated using random baselines for the sum of the lengths of the edges or its variants. A ubiquitous baseline is the expected sum in projective orderings (wherein edges do not cross and the root word of the sentence is not covered by any edge), which can be computed in time $O(n)$. Here we focus on a weaker formal constraint, namely planarity. In the theoretical domain, we present a characterization of planarity that, given a sentence, yields either the number of planar permutations or an efficient algorithm to generate uniformly random planar permutations of the words. We also show the relationship between the expected sum in planar arrangements and the expected sum in projective arrangements. In the domain of applications, we derive an $O(n)$-time algorithm to calculate the expected value of the sum of edge lengths. We also apply this research to a parallel corpus and find that the gap between actual dependency distance and the random baseline shrinks as the strength of the formal constraint on dependency structures increases, suggesting that formal constraints absorb part of the dependency distance minimization effect. Our research paves the way for replicating past research on dependency distance minimization using random planar linearizations as a random baseline.
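For short sentences, the quantities discussed above can be checked by exhaustive enumeration. The sketch below is a brute-force reference of our own, not the characterization or the $O(n)$-time algorithm of the paper: it counts the planar arrangements of a tree (no two edges cross when drawn as semicircles above the line) and averages the sum of edge lengths over them. The function names are illustrative.

```python
from itertools import combinations, permutations


def crosses(e, f, pos):
    """Edges e and f cross iff their endpoints strictly interleave along the line."""
    a, b = sorted((pos[e[0]], pos[e[1]]))
    c, d = sorted((pos[f[0]], pos[f[1]]))
    return a < c < b < d or c < a < d < b


def planar_expectation(n, edges):
    """Return (number of planar arrangements, mean sum of edge lengths over them)."""
    count, total = 0, 0
    for perm in permutations(range(1, n + 1)):
        pos = dict(enumerate(perm))  # vertex v sits at position perm[v]
        if any(crosses(e, f, pos) for e, f in combinations(edges, 2)):
            continue  # not planar
        count += 1
        total += sum(abs(pos[u] - pos[v]) for u, v in edges)
    return count, total / count


print(planar_expectation(4, [(0, 1), (0, 2), (0, 3)]))  # star tree
print(planar_expectation(4, [(0, 1), (1, 2), (2, 3)]))  # path tree
```

For a star tree every arrangement is planar, because all edges share the hub, so the planar mean coincides with the unconstrained baseline $(n^2-1)/3$. For the 4-vertex path only the edges $\{0,1\}$ and $\{2,3\}$ can cross, which happens in one third of the 24 arrangements, leaving 16 planar ones.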
Submitted 18 September, 2023; v1 submitted 12 July, 2022;
originally announced July 2022.
-
The Maximum Linear Arrangement Problem for trees under projectivity and planarity
Authors:
Lluís Alemany-Puig,
Juan Luis Esteban,
Ramon Ferrer-i-Cancho
Abstract:
A linear arrangement is a mapping $π$ from the $n$ vertices of a graph $G$ to $n$ distinct consecutive integers. Linear arrangements can be represented by drawing the vertices along a horizontal line and drawing the edges as semicircles above said line. In this setting, the length of an edge is defined as the absolute value of the difference between the positions of its two vertices in the arrangement, and the cost of an arrangement as the sum of all edge lengths. Here we study two variants of the Maximum Linear Arrangement problem (MaxLA), which consists of finding an arrangement that maximizes the cost. In the planar variant for free trees, vertices have to be arranged in such a way that there are no edge crossings. In the projective variant for rooted trees, arrangements have to be planar and the root of the tree cannot be covered by any edge. In this paper we present algorithms that are linear in time and space to solve planar and projective MaxLA for trees. We also prove several properties of maximum projective and planar arrangements, and show that caterpillar trees maximize planar MaxLA over all trees of a fixed size, thereby generalizing a previous extremal result on trees.
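The definitions in this abstract translate directly into a brute-force checker. The sketch below is an exhaustive reference of our own, not the linear-time algorithms of the paper: it computes the cost of an arrangement as the sum of edge lengths and maximizes it over all arrangements, over planar ones (no crossings), or over projective ones (planar, and the root not strictly inside any edge). The `mode` parameter and all function names are illustrative.

```python
from itertools import combinations, permutations


def cost(edges, pos):
    """Cost of an arrangement: sum over edges of |pos[u] - pos[v]|."""
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def has_crossing(edges, pos):
    """True if two edges have strictly interleaving endpoints."""
    for e, f in combinations(edges, 2):
        a, b = sorted((pos[e[0]], pos[e[1]]))
        c, d = sorted((pos[f[0]], pos[f[1]]))
        if a < c < b < d or c < a < d < b:
            return True
    return False


def root_covered(edges, pos, root):
    """True if some edge passes strictly over the root's position."""
    r = pos[root]
    return any(min(pos[u], pos[v]) < r < max(pos[u], pos[v]) for u, v in edges)


def max_la(n, edges, mode="free", root=None):
    """Brute-force MaxLA where mode is 'free', 'planar' or 'projective'."""
    best = -1
    for perm in permutations(range(1, n + 1)):
        pos = dict(enumerate(perm))  # vertex v sits at position perm[v]
        if mode in ("planar", "projective") and has_crossing(edges, pos):
            continue
        if mode == "projective" and root_covered(edges, pos, root):
            continue
        best = max(best, cost(edges, pos))
    return best


path = [(0, 1), (1, 2), (2, 3)]
print(max_la(4, path), max_la(4, path, "planar"), max_la(4, path, "projective", root=0))
```

On the path with edges $\{0,1\}$, $\{1,2\}$, $\{2,3\}$, the unconstrained maximum of 7 is reached only by arrangements in which $\{0,1\}$ and $\{2,3\}$ cross, so the planar and projective maxima drop to 6.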
Submitted 21 March, 2023; v1 submitted 14 June, 2022;
originally announced June 2022.
-
The Linear Arrangement Library. A new tool for research on syntactic dependency structures
Authors:
Lluís Alemany-Puig,
Juan Luis Esteban,
Ramon Ferrer-i-Cancho
Abstract:
The new and growing field of Quantitative Dependency Syntax has emerged at the crossroads between Dependency Syntax and Quantitative Linguistics. One of the main concerns in this field is the study of the statistical patterns of syntactic dependency structures. These structures, grouped in treebanks, are the source for statistical analyses in these and related areas; dozens of scores devised over the years are the tools of a new industry to search for patterns and perform other sorts of analyses. The plethora of such metrics and their increasing complexity require sharing the source code of the programs used to perform such analyses. However, such code is rarely shared with the scientific community or is tested following unknown standards. Here we present a new open-source tool, the Linear Arrangement Library (LAL), which caters especially to the needs of inexperienced programmers. This tool enables the calculation of these metrics on single syntactic dependency structures, treebanks, and collections of treebanks, grounded on ease of use and yet with great flexibility. LAL has been designed to be efficient, easy to use (while satisfying the needs of all levels of programming expertise), reliable (thanks to thorough testing), and to unite research from different traditions, geographic areas, and research fields.
Submitted 5 December, 2021;
originally announced December 2021.
-
Linear-time calculation of the expected sum of edge lengths in random projective linearizations of trees
Authors:
Lluís Alemany-Puig,
Ramon Ferrer-i-Cancho
Abstract:
The syntactic structure of a sentence is often represented using syntactic dependency trees. The sum of the distances between syntactically related words has been in the limelight for decades. Research on dependency distances led to the formulation of the principle of dependency distance minimization, whereby words in sentences are ordered so as to minimize that sum. Numerous random baselines have been defined to carry out related quantitative studies on languages. The simplest random baseline is the expected value of the sum in unconstrained random permutations of the words in the sentence, namely when all the shufflings of the words of a sentence are allowed and equally likely. Here we focus on a popular baseline: random projective permutations of the words of the sentence, that is, permutations where the syntactic dependency structure is projective, a formal constraint that sentences often satisfy. Thus far, the expectation of the sum of dependency distances in random projective shufflings of a sentence has been estimated approximately with a Monte Carlo procedure whose cost is of the order of $Rn$, where $n$ is the number of words of the sentence and $R$ is the number of samples; it is well known that the larger $R$, the lower the error of the estimation but the larger the time cost. Here we present formulae to compute that expectation without error in time of the order of $n$. Furthermore, we show that star trees maximize it, and give an algorithm to retrieve the trees that minimize it.
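The Monte Carlo procedure mentioned above can be sketched as follows; this is the approximate approach that the paper replaces with exact formulae, not the paper's method. Assumptions: the rooted tree is given as a hypothetical `children` dictionary, and a random projective arrangement is drawn by shuffling, at every vertex, the vertex itself together with the blocks formed by its children's subtrees. Projective arrangements correspond one-to-one to such choices of orderings, so this sampling is uniform. The exact unconstrained baseline uses the fact that an edge has expected length $(n+1)/3$ in a uniformly random permutation, giving $(n^2-1)/3$ for a tree with $n-1$ edges.

```python
import random


def random_projective(children, u, rng):
    """Uniformly random projective arrangement (list of vertices) of the subtree rooted at u."""
    blocks = [[u]] + [random_projective(children, c, rng) for c in children.get(u, [])]
    rng.shuffle(blocks)  # random order of u and its children's subtree blocks
    return [v for block in blocks for v in block]


def sum_of_lengths(order, edges):
    """Sum of dependency distances for vertices placed at positions 1..n in `order`."""
    pos = {v: i for i, v in enumerate(order, start=1)}
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def monte_carlo_projective(children, root, edges, samples, seed=0):
    """Estimate the expected sum over random projective arrangements from `samples` draws."""
    rng = random.Random(seed)
    total = sum(sum_of_lengths(random_projective(children, root, rng), edges)
                for _ in range(samples))
    return total / samples


def expected_unconstrained(n):
    """Exact expectation over all n! permutations for a tree: (n - 1) * (n + 1) / 3."""
    return (n * n - 1) / 3


path_children = {0: [1], 1: [2]}  # path 0-1-2 rooted at vertex 0
print(monte_carlo_projective(path_children, 0, [(0, 1), (1, 2)], 20000))
print(expected_unconstrained(3))
```

For the path 0-1-2 rooted at vertex 0, only 4 of the 6 permutations are projective and their mean sum is 2.5, below the unconstrained value $8/3$; the estimate approaches 2.5 as `samples` grows, while its cost grows linearly with `samples`.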
Submitted 24 May, 2022; v1 submitted 7 July, 2021;
originally announced July 2021.
-
Minimum projective linearizations of trees in linear time
Authors:
Lluís Alemany-Puig,
Juan Luis Esteban,
Ramon Ferrer-i-Cancho
Abstract:
The Minimum Linear Arrangement problem (MLA) consists of finding a mapping $π$ from vertices of a graph to distinct integers that minimizes $\sum_{\{u,v\}\in E}|π(u) - π(v)|$. In that setting, vertices are often assumed to lie on a horizontal line and edges are drawn as semicircles above said line. For trees, various algorithms are available to solve the problem in polynomial time in $n=|V|$. There exist variants of the MLA in which the arrangements are constrained. Iordanskii, and later Hochberg and Stallmann (HS), put forward $O(n)$-time algorithms that solve the problem when arrangements are constrained to be planar (also known as one-page book embeddings). We also consider linear arrangements of rooted trees that are constrained to be projective (planar embeddings where the root is not covered by any edge). Gildea and Temperley (GT) sketched an algorithm for projective arrangements which they claimed runs in $O(n)$ but did not provide any justification of its cost. In contrast, Park and Levy claimed that GT's algorithm runs in $O(n \log d_{max})$ where $d_{max}$ is the maximum degree but did not provide sufficient detail. Here we correct an error in HS's algorithm for the planar case, show its relationship with the projective case, and derive simple algorithms for the projective and planar cases that provably run in $O(n)$ time.
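Because the objective is easy to state, a brute-force minimizer is a convenient oracle for testing faster algorithms such as those discussed above. The sketch below is such an oracle (exhaustive, so only usable for very small trees), not the algorithm of Iordanskii, HS, GT, or the paper. The example tree, with edges chosen by us, shows that the projective optimum can exceed the planar one: the root has degree 1, so any arrangement that leaves it uncovered must place it at one end.

```python
from itertools import combinations, permutations


def interleave(e, f, pos):
    """Edges e and f cross iff their endpoints strictly interleave along the line."""
    a, b = sorted((pos[e[0]], pos[e[1]]))
    c, d = sorted((pos[f[0]], pos[f[1]]))
    return a < c < b < d or c < a < d < b


def min_la(n, edges, mode="free", root=None):
    """Minimum of sum |pi(u) - pi(v)| over 'free', 'planar' or 'projective' arrangements."""
    best = None
    for perm in permutations(range(1, n + 1)):
        pos = dict(enumerate(perm))  # vertex v sits at position perm[v]
        if mode != "free" and any(interleave(e, f, pos) for e, f in combinations(edges, 2)):
            continue  # planar and projective arrangements have no crossings
        if mode == "projective":
            r = pos[root]
            if any(min(pos[u], pos[v]) < r < max(pos[u], pos[v]) for u, v in edges):
                continue  # the root is covered by an edge
        d = sum(abs(pos[u] - pos[v]) for u, v in edges)
        if best is None or d < best:
            best = d
    return best


# Root 0 hangs from vertex 1, which has children 2 and 3, each with one leaf child.
tree = [(0, 1), (1, 2), (1, 3), (2, 4), (3, 5)]
print(min_la(6, tree), min_la(6, tree, "planar"), min_la(6, tree, "projective", root=0))
```

This prints `6 6 7`: the unconstrained and planar optima place the root between vertex 1 and one of its other neighbours, which projectivity forbids.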
Submitted 12 September, 2024; v1 submitted 5 February, 2021;
originally announced February 2021.
-
The optimality of syntactic dependency distances
Authors:
Ramon Ferrer-i-Cancho,
Carlos Gómez-Rodríguez,
Juan Luis Esteban,
Lluís Alemany-Puig
Abstract:
It is often stated that human languages, like other biological systems, are shaped by cost-cutting pressures, but to what extent? Attempts to quantify the degree of optimality of languages by means of an optimality score have been scarce and focused mostly on English. Here we recast the problem of the optimality of the word order of a sentence as an optimization problem on a spatial network where the vertices are words, arcs indicate syntactic dependencies, and the space is defined by the linear order of the words in the sentence. We introduce a new score to quantify the cognitive pressure to reduce the distance between linked words in a sentence. The analysis of sentences from 93 languages representing 19 linguistic families reveals that half of the languages are optimized to 70% or more. The score indicates that distances are not significantly reduced in a few languages and confirms two theoretical predictions: that longer sentences are more optimized, and that distances are more likely to be longer than expected by chance in short sentences. We present a new hierarchical ranking of languages by their degree of optimization. The new score has implications for various fields of language research (dependency linguistics, typology, historical linguistics, clinical linguistics and cognitive science). Finally, the principles behind the design of the score have implications for network science.
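To make the idea of an optimality score concrete, the sketch below computes one normalized score for a single sentence. We assume here (the abstract does not give the definition) a score of the form $\Omega = (D_{rla} - D)/(D_{rla} - D_{min})$, where $D$ is the observed sum of dependency distances, $D_{rla} = (n^2-1)/3$ is its expectation in uniformly random arrangements of a tree with $n$ vertices, and $D_{min}$ is the minimum over all arrangements, found here by brute force. Under this assumption, $\Omega = 1$ for an optimal order, $\Omega = 0$ at the random baseline, and $\Omega < 0$ when distances are longer than expected by chance.

```python
from itertools import permutations


def sum_of_distances(edges, pos):
    return sum(abs(pos[u] - pos[v]) for u, v in edges)


def d_min(n, edges):
    """Minimum sum of distances over all n! orders (brute force, small n only)."""
    return min(sum_of_distances(edges, dict(enumerate(p))) for p in permutations(range(n)))


def omega(n, edges, pos):
    """Assumed score (D_rla - D) / (D_rla - D_min) for a tree with n >= 3 vertices."""
    if n < 3:
        raise ValueError("D_rla equals D_min for n <= 2, so the score is undefined")
    d = sum_of_distances(edges, pos)
    d_rla = (n * n - 1) / 3  # expected D over uniformly random permutations of a tree
    return (d_rla - d) / (d_rla - d_min(n, edges))


path = [(0, 1), (1, 2), (2, 3)]
print(omega(4, path, {0: 1, 1: 2, 2: 3, 3: 4}))  # words in path order
print(omega(4, path, {0: 2, 1: 4, 2: 1, 3: 3}))  # a maximally spread-out order
```

The first order attains $D = D_{min} = 3$ and scores 1.0; the second has $D = 7$, above the baseline $D_{rla} = 5$, and scores -1.0.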
Submitted 4 October, 2021; v1 submitted 30 July, 2020;
originally announced July 2020.
-
Reappraising the distribution of the number of edge crossings of graphs on a sphere
Authors:
Lluís Alemany-Puig,
Mercè Mora,
Ramon Ferrer-i-Cancho
Abstract:
Many real transportation and mobility networks have their vertices placed on the surface of the Earth. In such embeddings, the edges laid on that surface may cross. In his pioneering research, Moon analyzed the distribution of the number of crossings on complete graphs and complete bipartite graphs whose vertices are located uniformly at random on the surface of a sphere, assuming that vertex placements are independent of each other. Here we revise his derivation of the variance of that number in the light of recent theoretical developments on the variance of crossings and computer simulations. We show that Moon's formulae are inaccurate in predicting the true variance and provide exact formulae.
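Claims about the variance can be checked against simulation. The sketch below is our own Monte Carlo estimator, not the exact formulae of the paper: it places the vertices of a complete graph independently and uniformly on the unit sphere, joins them by minor great-circle arcs, and counts crossings between arcs with four distinct endpoints. A symmetry argument suggests that two such arcs cross with probability 1/8, so for $K_4$ (three pairs of independent edges) the sample mean should approach 3/8; the sample variance is the quantity one would compare with a closed-form prediction.

```python
import math
import random
from itertools import combinations


def random_point(rng):
    """Uniform point on the unit sphere: normalise three independent standard normals."""
    x, y, z = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    r = math.sqrt(x * x + y * y + z * z)
    return (x / r, y / r, z / r)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def on_minor_arc(p, a, b, normal):
    """p lies on the great circle of a and b; test whether it is strictly inside the minor arc."""
    return dot(cross(a, p), normal) > 0 and dot(cross(p, b), normal) > 0


def arcs_cross(a, b, c, d):
    """Minor arcs ab and cd cross iff one of the two circle intersection points lies on both."""
    n1, n2 = cross(a, b), cross(c, d)
    t = cross(n1, n2)  # direction shared by both great-circle planes
    for p in (t, (-t[0], -t[1], -t[2])):
        if on_minor_arc(p, a, b, n1) and on_minor_arc(p, c, d, n2):
            return True
    return False


def crossings_complete_graph(points):
    edges = list(combinations(range(len(points)), 2))
    count = 0
    for (u, v), (w, x) in combinations(edges, 2):
        if len({u, v, w, x}) == 4 and arcs_cross(points[u], points[v], points[w], points[x]):
            count += 1
    return count


def simulate(n, trials, seed=0):
    """Sample mean and unbiased sample variance of the number of crossings of K_n."""
    rng = random.Random(seed)
    samples = [crossings_complete_graph([random_point(rng) for _ in range(n)])
               for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((s - mean) ** 2 for s in samples) / (trials - 1)
    return mean, var


print(simulate(4, 20000))
```

Only the signs of the dot products matter in `on_minor_arc`, so the intersection direction `t` does not need to be normalised.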
Submitted 31 July, 2020; v1 submitted 6 March, 2020;
originally announced March 2020.
-
Fast calculation of the variance of edge crossings in random arrangements
Authors:
Lluís Alemany-Puig,
Ramon Ferrer-i-Cancho
Abstract:
The crossing number of a graph $G$, $\mathrm{cr}(G)$, is the minimum number of edge crossings arising when drawing a graph on a certain surface. Determining $\mathrm{cr}(G)$ is a problem of great importance in Graph Theory. Its maximum variant, i.e. the maximum crossing number, $\mathrm{max-cr}(G)$, is receiving growing attention. Instead of an optimization problem on the number of crossings, here we consider the variance of the number of edge crossings when embedding the vertices of an arbitrary graph uniformly at random in some space. In his pioneering research, Moon derived this variance on random linear arrangements of complete unipartite and bipartite graphs. Given the need for efficient algorithms to support this sort of research, and given also the growing interest in the number of edge crossings in spatial networks (networks where vertices are embedded in some space), here we derive an algorithm to calculate the variance in arbitrary graphs in time $o(nm^2)$ that we transform into one that runs in time $O(nm)$ by reusing computations. We also derive one for forests that runs in time $O(n)$. These algorithms work on a wide range of random layouts (not only on Moon's) and are based on novel arithmetic expressions for the calculation of the variance that we develop from previous theoretical work. This paves the way for many applications that rely on a fast but exact calculation of the variance.
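Fast algorithms for this variance are easiest to validate against a slow but exact reference. The sketch below is our own (its cost is $O(n!\,m^2)$, so it only suits tiny graphs) and is not one of the paper's algorithms: it enumerates every linear arrangement, counts crossings (two edges cross iff their endpoints strictly interleave), and returns the exact mean and variance as fractions.

```python
from fractions import Fraction
from itertools import combinations, permutations


def num_crossings(edges, pos):
    """Number of pairs of edges whose endpoints strictly interleave."""
    c = 0
    for (s, t), (u, v) in combinations(edges, 2):
        a, b = sorted((pos[s], pos[t]))
        x, y = sorted((pos[u], pos[v]))
        if a < x < b < y or x < a < y < b:
            c += 1
    return c


def crossing_moments(n, edges):
    """Exact mean and variance of the number of crossings over all n! arrangements."""
    values = [num_crossings(edges, dict(enumerate(p))) for p in permutations(range(n))]
    total = len(values)
    mean = Fraction(sum(values), total)
    variance = Fraction(sum(v * v for v in values), total) - mean ** 2
    return mean, variance


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(crossing_moments(4, cycle))
```

For the 4-cycle exactly one of the three ways of pairing its vertices interleaves in any arrangement, and two of those pairings are pairs of cycle edges, so the number of crossings is 1 with probability 2/3 and 0 otherwise.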
Submitted 25 July, 2023; v1 submitted 6 March, 2020;
originally announced March 2020.
-
Edge crossings in random linear arrangements
Authors:
Lluís Alemany-Puig,
Ramon Ferrer-i-Cancho
Abstract:
In spatial networks, vertices are arranged in some space and edges may cross. When vertices are arranged in a one-dimensional lattice, edges may cross when drawn above the vertex sequence, as happens in linguistic and biological networks. Here we investigate the general problem of the distribution of edge crossings in random arrangements of the vertices. We generalize the existing formula for the expectation of this number in random linear arrangements of trees to any network, and derive an expression for the variance of the number of crossings in an arbitrary layout, relying on a novel characterization of the algebraic structure of that variance in an arbitrary space. We provide compact formulae for the expectation and the variance in complete graphs, complete bipartite graphs, cycle graphs, one-regular graphs and various kinds of trees (star trees, quasi-star trees and linear trees). In these networks, the scaling of expectation and variance as a function of network size is asymptotically power-law-like in random linear arrangements. Our work paves the way for further research and applications in one dimension, or for investigating the distribution of the number of crossings in lattices of higher dimension or other embeddings.
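The expectation for an arbitrary graph has a simple form that can be checked by brute force. Four distinct vertices in random order fall into the three possible pairings with equal probability, and exactly one pairing interleaves, so two edges without a common vertex cross with probability 1/3. Hence $E[C] = q/3$, where $q = (m(m+1) - \sum_i k_i^2)/2$ is the number of pairs of edges sharing no vertex in a simple graph with $m$ edges and degrees $k_i$. The sketch below compares this closed form with exhaustive enumeration; it is our own check, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, permutations


def independent_pairs(n, edges):
    """Pairs of edges sharing no vertex: (m(m+1) - sum of squared degrees) / 2."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    return (m * (m + 1) - sum(k * k for k in degree)) // 2


def expected_crossings(n, edges):
    """E[C] in uniformly random linear arrangements: each independent pair crosses w.p. 1/3."""
    return Fraction(independent_pairs(n, edges), 3)


def expected_crossings_brute_force(n, edges):
    """Average number of crossings over all n! arrangements."""
    total, count = 0, 0
    for perm in permutations(range(n)):
        pos = dict(enumerate(perm))
        for e, f in combinations(edges, 2):
            a, b = sorted((pos[e[0]], pos[e[1]]))
            c, d = sorted((pos[f[0]], pos[f[1]]))
            total += a < c < b < d or c < a < d < b
        count += 1
    return Fraction(total, count)


complete4 = list(combinations(range(4), 2))
print(expected_crossings(4, complete4), expected_crossings_brute_force(4, complete4))
```

Both calls give 1 for $K_4$, whose 3 pairs of independent edges each cross with probability 1/3. For a star tree $q = 0$, so no crossings are expected.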
Submitted 21 February, 2020; v1 submitted 9 October, 2019;
originally announced October 2019.