-
Compressing Hypergraphs using Suffix Sorting
Authors:
Enno Adler,
Stefan Böttcher,
Rita Hartel
Abstract:
Hypergraphs model complex, non-binary relationships like co-authorships, social group memberships, and recommendations. Like traditional graphs, hypergraphs can grow large, posing challenges for storage, transmission, and query performance. We propose HyperCSA, a novel compression method for hypergraphs that maintains support for standard queries over the succinct representation. HyperCSA achieves compression ratios of 26% to 79% of the original file size on real-world hypergraphs, outperforming existing methods on all large hypergraphs in our experiments. Additionally, HyperCSA scales to larger datasets than existing approaches. Furthermore, for common real-world hypergraphs, HyperCSA evaluates neighbor queries 6 to 40 times faster than both standard data structures and other hypergraph compression approaches.
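To illustrate why suffix sorting supports neighbor queries on hypergraphs, here is a toy sketch. It is not the HyperCSA data structure: it simply concatenates the hyperedges (each as sorted vertex ids followed by a hypothetical separator `SEP`), suffix-sorts the sequence naively, and answers "which hyperedges contain v" by binary search, because all suffixes starting with v form one contiguous interval of the suffix array. All function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

SEP = -1  # hypothetical edge separator; sorts before every vertex id >= 0


def build_index(hyperedges):
    """Concatenate hyperedges (sorted vertex ids + SEP) and suffix-sort the result."""
    seq, owner = [], []
    for e_id, edge in enumerate(hyperedges):
        for v in sorted(set(edge)) + [SEP]:
            seq.append(v)
            owner.append(e_id)  # which hyperedge this position belongs to
    # Naive suffix sorting (quadratic); practical tools use far faster algorithms.
    sa = sorted(range(len(seq)), key=lambda i: seq[i:])
    firsts = [seq[i] for i in sa]  # first symbols of the sorted suffixes, non-decreasing
    return owner, sa, firsts


def incident_edges(index, v):
    """All hyperedges containing v: suffixes starting with v form one SA interval."""
    owner, sa, firsts = index
    lo, hi = bisect_left(firsts, v), bisect_right(firsts, v)
    return sorted({owner[sa[k]] for k in range(lo, hi)})


def neighbors(hyperedges, index, v):
    """Vertices sharing at least one hyperedge with v."""
    out = set()
    for e in incident_edges(index, v):
        out.update(hyperedges[e])
    out.discard(v)
    return sorted(out)


H = [[0, 1, 2], [1, 3], [2, 3, 4]]
idx = build_index(H)
print(incident_edges(idx, 1))  # [0, 1]
print(neighbors(H, idx, 1))    # [0, 2, 3]
```

A compressed suffix array replaces the explicit `sa` and `firsts` lists with succinct structures; the interval idea stays the same.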
Submitted 5 June, 2025;
originally announced June 2025.
-
IBB: Fast Burrows-Wheeler Transform Construction for Length-Diverse DNA Data
Authors:
Enno Adler,
Stefan Böttcher,
Rita Hartel,
Cederic Alexander Steininger
Abstract:
The Burrows-Wheeler transform (BWT) is integral to the FM-index, which is used extensively in text compression, indexing, pattern search, and bioinformatics problems such as de novo assembly and read alignment. Thus, efficient construction of the BWT in terms of time and memory usage is key to these applications. We present a novel external-memory algorithm called Improved-Bucket Burrows-Wheeler transform (IBB) for constructing the BWT of DNA datasets with highly diverse sequence lengths. IBB uses a right-aligned approach to efficiently handle sequences of varying lengths, a tree-based data structure to manage relative insert positions and ranks, and fine buckets to reduce the amount of input and output to external memory. Our experiments demonstrate that IBB is 10% to 40% faster than the best existing state-of-the-art BWT construction algorithms on most datasets while maintaining competitive memory consumption.
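IBB is an external-memory algorithm; as a reference point only, the following naive in-memory sketch defines a multi-string BWT of the kind such algorithms produce. It assumes a common convention that the abstract does not spell out: string i ends with its own end marker $_i, with $_0 < $_1 < ... and every marker smaller than any base. The BWT is the sequence of characters preceding each suffix (cyclically within its own string) in sorted suffix order.

```python
def multi_string_bwt(strings):
    """Naive BWT of a string collection (assumed convention: distinct end markers $_i)."""
    rows = []
    for i, s in enumerate(strings):
        t = s + "$"
        for j in range(len(t)):
            # Sort key: suffix text, then string index. Python ranks a proper prefix
            # first, which matches "$ is the smallest symbol"; equal suffix texts are
            # ordered by their end markers $_i, i.e. by string index i.
            key = (s[j:], i)
            # Preceding character, cyclic within the same string: t[-1] == "$" for j == 0.
            rows.append((key, t[j - 1]))
    rows.sort()
    return "".join(c for _, c in rows)


print(multi_string_bwt(["banana"]))   # annb$aa
print(multi_string_bwt(["AC", "A"]))  # CA$$A
```

This costs O(n^2 log n) time and materializes every suffix, which is exactly what external and bucket-based constructions such as IBB are designed to avoid.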
Submitted 3 February, 2025;
originally announced February 2025.
-
String Partition for Building Long Burrows-Wheeler Transforms
Authors:
Enno Adler,
Stefan Böttcher,
Rita Hartel
Abstract:
Constructing the Burrows-Wheeler transform (BWT) for long strings poses significant challenges regarding construction time and memory usage. We use a prefix of the suffix array to partition a long string into shorter substrings, thereby enabling multi-string BWT construction algorithms to process these partitions quickly. We provide an implementation, partDNA, for DNA sequences. Through comparison with state-of-the-art BWT construction algorithms, we show that partDNA with IBB offers a novel trade-off between construction time and memory usage for BWT construction on real genome datasets. Beyond this, the proposed partitioning strategy is applicable to strings over any alphabet.
Submitted 14 May, 2025; v1 submitted 15 June, 2024;
originally announced June 2024.
-
ITR: Grammar-based Graph Compression Supporting Fast Triple Queries
Authors:
Enno Adler,
Stefan Böttcher,
Rita Hartel
Abstract:
Neighborhood queries and triple queries are the most common queries on graphs; thus, it is desirable to answer them efficiently on compressed data structures. We present a compression scheme called Incidence-Type-RePair (ITR) for graphs with labeled nodes and labeled edges based on RePair and apply the scheme to network, version, and RDF graphs. We show that ITR performs neighborhood queries and triple queries within only a few milliseconds and thereby outperforms existing RePair-based solutions on graphs while achieving a compressed size comparable to existing graph compressors.
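ITR builds on RePair. As a reminder of what RePair does on a plain sequence (not on graphs, and not ITR's incidence-type scheme), here is a naive sketch: repeatedly replace the most frequent adjacent pair with a fresh nonterminal until no pair occurs twice. The function names and the `("N", k)` nonterminal encoding are hypothetical.

```python
from collections import Counter


def repair(seq):
    """Naive RePair on a sequence; returns (final sequence, grammar rules)."""
    seq = list(seq)
    rules = {}
    while True:
        pairs = Counter(zip(seq, seq[1:]))  # counts overlapping pairs (simplification)
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        nt = ("N", len(rules))  # fresh nonterminal
        rules[nt] = pair
        out, i = [], 0
        while i < len(seq):  # left-to-right, non-overlapping replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nt)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(sym, rules):
    """Decompress one symbol back to terminals."""
    if sym in rules:
        a, b = rules[sym]
        return expand(a, rules) + expand(b, rules)
    return [sym]


seq, rules = repair("abcabcabc")
print(len(seq), len(rules))  # 2 3
print("".join(c for sym in seq for c in expand(sym, rules)))  # abcabcabc
```

This version rescans the whole sequence after every replacement, so it is quadratic; the original RePair achieves linear time with priority queues and pair-occurrence lists.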
Submitted 10 October, 2024; v1 submitted 1 June, 2023;
originally announced June 2023.
-
Deep reinforced learning heuristic tested on spin-glass ground states: The larger picture
Authors:
Stefan Boettcher
Abstract:
In Changjun Fan et al. [Nature Communications https://doi.org/10.1038/s41467-023-36363-w (2023)], the authors present a deep reinforced learning approach to augment combinatorial optimization heuristics. In particular, they present results for several spin glass ground-state problems, for which instances on non-planar networks are generally NP-hard, in comparison with several Monte Carlo based methods, such as simulated annealing (SA) or parallel tempering (PT). Indeed, those results demonstrate that the reinforced learning improves on the results obtained with SA or PT, or at least reduces the runtime the heuristics need to obtain results of comparable quality. To facilitate the conclusion that their method is "superior", the authors pursue two basic strategies: (1) the commercial GUROBI solver is called on to procure a sample of exact ground states as a testbed to compare with, and (2) a head-to-head comparison between the heuristics is given for a sample of larger instances where exact ground states are hard to ascertain. Here, we put these studies into a larger context, showing that the claimed superiority is at best marginal for smaller samples and becomes essentially irrelevant with respect to any sensible approximation of true ground states in the larger samples. For example, this method becomes irrelevant as a means to determine stiffness exponents $\theta$ in $d>2$, as mentioned by the authors, where the problem is not only NP-hard but requires the subtraction of two almost equal ground-state energies, so that systematic errors of $\approx 1\%$ in each, as found here, are unacceptable. This larger picture of the method arises from a straightforward finite-size corrections study over the spin glass ensembles the authors employ, using data that have been available for decades.
Submitted 14 September, 2023; v1 submitted 21 February, 2023;
originally announced February 2023.
-
Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems like Max-Cut
Authors:
Stefan Boettcher
Abstract:
In Nature Machine Intelligence 4, 367 (2022), Schuetz et al. provide a scheme to employ graph neural networks (GNN) as a heuristic to solve a variety of classical, NP-hard combinatorial optimization problems. The paper describes how the network is trained on sample instances and how the resulting GNN heuristic is evaluated with widely used techniques to determine its ability to succeed. Clearly, the idea of harnessing the powerful abilities of such networks to "learn" the intricacies of complex, multimodal energy landscapes in such a hands-off approach seems enticing. And based on the observed performance, the heuristic promises to be highly scalable, with a computational cost linear in the input size $n$, although there is likely a significant overhead in the prefactor due to the GNN itself. However, closer inspection shows that the reported results for this GNN are only minutely better than those for gradient descent and are outperformed by a greedy algorithm, for example, for Max-Cut. The discussion also highlights what I believe are some common misconceptions in the evaluation of heuristics.
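For concreteness, here is one simple greedy heuristic for Max-Cut: start from a random split and flip any vertex that has more neighbors on its own side than on the other, until no such vertex remains. This is a generic single-flip greedy local search chosen for illustration; it is not necessarily the exact greedy algorithm used in the comparison above. Every flip strictly increases the cut, so the loop terminates, and at the end each vertex has at least half of its edges cut, so the cut contains at least half of all edges.

```python
import random


def cut_size(edges, side):
    return sum(side[u] != side[v] for u, v in edges)


def greedy_maxcut(n, edges, seed=0):
    """Greedy single-flip local search for Max-Cut on vertices 0..n-1."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    side = [rng.randint(0, 1) for _ in range(n)]
    improved = True
    while improved:
        improved = False
        for v in range(n):
            same = sum(side[w] == side[v] for w in adj[v])
            if same > len(adj[v]) - same:  # flipping v cuts more edges than it uncuts
                side[v] ^= 1
                improved = True
    return side, cut_size(edges, side)


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(greedy_maxcut(4, K4)[1])  # 4 (every local optimum of K4 is a 2-2 split)
```

Each pass costs time linear in the number of edges, which is the kind of baseline against which the GNN's linear scaling and solution quality can be compared.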
Submitted 2 October, 2022;
originally announced October 2022.
-
Simulation computation in grammar-compressed graphs
Authors:
Stefan Böttcher,
Rita Hartel,
Sven Peeters
Abstract:
Like [1], we present an algorithm to compute the simulation of a query pattern in a graph of labeled nodes and unlabeled edges. However, our algorithm works on a compressed graph grammar, instead of on the original graph. The speed-up of our algorithm compared to the algorithm in [1] grows with the size of the graph and with the compression strength.
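As a reminder of what is being computed: a pattern node u is simulated by a data node v if both carry the same label and every pattern edge u→u' can be matched by some data edge v→v' where v' simulates u'. The sketch below computes the largest such relation by naive fixpoint refinement on the uncompressed graph. It is neither the grammar-based algorithm of this paper nor the algorithm of [1], and all names are hypothetical.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Largest simulation: sim[u] = set of data nodes that simulate pattern node u."""
    g_succ = {v: set() for v in g_labels}
    for a, b in g_edges:
        g_succ[a].add(b)
    q_succ = {u: set() for u in q_labels}
    for a, b in q_edges:
        q_succ[a].add(b)
    # Start with all label-compatible candidates, then remove violators until stable.
    sim = {u: {v for v in g_labels if g_labels[v] == q_labels[u]} for u in q_labels}
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for u2 in q_succ[u]:
                keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
                if keep != sim[u]:
                    sim[u] = keep
                    changed = True
    return sim


q_labels = {"x": "A", "y": "B"}
q_edges = [("x", "y")]
g_labels = {1: "A", 2: "B", 3: "A", 4: "C", 5: "B"}
g_edges = [(1, 2), (3, 4)]
print(graph_simulation(q_labels, q_edges, g_labels, g_edges))  # {'x': {1}, 'y': {2, 5}}
```

The cost of this refinement grows with the size of the data graph, which is why working on a grammar-compressed graph, where repeated substructures are represented once, can pay off.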
Submitted 14 January, 2020;
originally announced January 2020.
-
Analysis of the Relation between Quadratic Unconstrained Binary Optimization (QUBO) and the Spin Glass Ground-State Problem
Authors:
Stefan Boettcher
Abstract:
We analyze the transformation of QUBO from its conventional Boolean formulation into an equivalent spin glass problem with coupled $\pm1$ spin variables exposed to a site-dependent external field. We find that in a widely used testbed for QUBO these fields tend to be rather large compared to the typical coupling, and many spins in each optimal configuration simply align with the fields irrespective of their constraints. Thereby, the testbed instances tend to exhibit large redundancies: seemingly independent variables that, however, contribute little to the hardness of the problem. We demonstrate various consequences of this insight, for QUBO solvers as well as for heuristics developed for finding spin glass ground states. To this end, we implement the Extremal Optimization (EO) heuristic in a new adaptation for the QUBO problem. We also propose a novel way to assess the quality of heuristics for increasing problem sizes based on asymptotic scaling.
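The transformation itself follows from substituting $x_i = (1+s_i)/2$: diagonal terms give $Q_{ii}(1+s_i)/2$ because $x_i^2 = x_i$, and each off-diagonal pair gives $(Q_{ij}+Q_{ji})(1+s_i+s_j+s_is_j)/4$. The sketch below collects these into a constant, site fields $h_i$, and couplings $J_{ij}$, and checks the equivalence by brute force. The sign convention $E = c + \sum_i h_i s_i + \sum_{i<j} J_{ij} s_i s_j$ and the example matrix are assumptions for illustration; comparing the magnitudes of $h_i$ with those of $J_{ij}$ is the kind of field-versus-coupling diagnostic the abstract refers to.

```python
from itertools import product


def qubo_to_ising(Q):
    """Map sum_ij Q_ij x_i x_j (x in {0,1}) via x_i = (1 + s_i) / 2 to
    const + sum_i h_i s_i + sum_{i<j} J_ij s_i s_j (s in {-1,+1})."""
    n = len(Q)
    h = [Q[i][i] / 2 for i in range(n)]        # diagonal: Q_ii x_i = Q_ii (1 + s_i) / 2
    const = sum(Q[i][i] for i in range(n)) / 2
    J = {}
    for i in range(n):
        for j in range(i + 1, n):
            w = (Q[i][j] + Q[j][i]) / 4         # x_i x_j = (1 + s_i + s_j + s_i s_j) / 4
            if w:
                J[(i, j)] = w
            h[i] += w                           # the linear parts feed the site fields
            h[j] += w
            const += w
    return h, J, const


def qubo_energy(Q, x):
    n = len(Q)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def ising_energy(h, J, const, s):
    return const + sum(hi * si for hi, si in zip(h, s)) + sum(
        w * s[i] * s[j] for (i, j), w in J.items())


def max_mismatch(Q):
    """Largest |E_QUBO(x) - E_Ising(s)| over all 2^n assignments, with s_i = 2 x_i - 1."""
    h, J, const = qubo_to_ising(Q)
    return max(abs(qubo_energy(Q, x) - ising_energy(h, J, const, [2 * xi - 1 for xi in x]))
               for x in product((0, 1), repeat=len(Q)))


Q = [[-3, 2, 0], [0, 1, -4], [1, 0, 2]]
print(max_mismatch(Q))  # 0.0
```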
Submitted 3 December, 2019; v1 submitted 24 June, 2019;
originally announced June 2019.
-
Complexity Bounds on Quantum Search Algorithms in finite-dimensional Networks
Authors:
Stefan Boettcher,
Shanshan Li,
Tharso D. Fernandes,
Renato Portugal
Abstract:
We establish a lower bound on the computational complexity of Grover's algorithm on fractal networks. This bound provides general predictions for the quantum advantage gained for searching unstructured lists. It yields a fundamental criterion, derived from quantum transport properties, for the improvement a quantum search algorithm achieves over the corresponding classical search in a network, based solely on its spectral dimension, $d_{s}$. Our analysis employs recent advances in the interpretation of the venerable real-space renormalization group (RG) as applied to quantum walks. It clarifies the competition between Grover's abstract algorithm, i.e., a rotation in Hilbert space, and quantum transport in an actual geometry. The latter is characterized in terms of the quantum walk dimension $d_{w}^{Q}$ and the spatial (fractal) dimension $d_{f}$, which are summarized simply by the spectral dimension of the network. The analysis simultaneously determines the optimal time for a quantum measurement and the probability of successfully pinpointing a marked element in the network. The RG further encompasses an optimization scheme devised by Tulsi that allows this probability to be tuned to certainty, leaving quantum transport as the only limiting process. It allows entire families of problems to be studied, thereby establishing large universality classes for quantum search, which we verify with extensive simulations. The methods we develop could point the way towards systematic studies of universality classes in computational complexity to enable modification and control of search behavior.
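The "abstract" Grover algorithm mentioned above, a rotation in Hilbert space with no underlying geometry, can be simulated directly on a state vector: an oracle flips the sign of the marked amplitude and the diffusion step inverts all amplitudes about their mean. Each iteration rotates the state by $2\theta$ with $\sin\theta = 1/\sqrt{N}$, so about $(\pi/4)\sqrt{N}$ iterations suffice. This sketch covers only that idealized case, not spatial search on fractal networks.

```python
import math


def grover_success(n_items, marked, iterations):
    """Simulate abstract Grover search on an unstructured list of n_items entries."""
    amp = [1 / math.sqrt(n_items)] * n_items  # uniform superposition
    for _ in range(iterations):
        amp[marked] = -amp[marked]            # oracle: phase flip of the marked item
        mean = sum(amp) / n_items
        amp = [2 * mean - a for a in amp]     # diffusion: inversion about the mean
    return amp[marked] ** 2                   # probability of measuring the marked item


def rotation_prediction(n_items, iterations):
    """Closed form: success probability sin^2((2k + 1) theta), sin(theta) = 1/sqrt(N)."""
    theta = math.asin(1 / math.sqrt(n_items))
    return math.sin((2 * iterations + 1) * theta) ** 2


N = 64
k = math.floor(math.pi / 4 * math.sqrt(N))  # here 6 iterations
print(k, grover_success(N, 3, k), rotation_prediction(N, k))
```

On a network, each oracle-plus-diffusion step must instead be realized by a quantum walk, and transport through the geometry can slow the rotation down; that competition is what the spectral-dimension criterion above captures.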
Submitted 18 July, 2018; v1 submitted 17 August, 2017;
originally announced August 2017.
-
Efficient XML Keyword Search based on DAG-Compression
Authors:
Stefan Böttcher,
Rita Hartel,
Jonathan Rabe
Abstract:
In contrast to XML query languages such as XPath, which require knowledge of the query language as well as of the document structure, keyword search is open to anybody. As the size of XML sources grows rapidly, the need for efficient search indices on XML data that support keyword search increases. In this paper, we present an approach to XML keyword search that is based on the DAG of the XML data, where repeated substructures are considered only once and therefore have to be searched only once. As our performance evaluation shows, this DAG-based extension of the set intersection search algorithm [1], [2] can lead to search times on large documents that are more than twice as fast as those of the XML-based approach. Additionally, we utilize a smaller index, i.e., we consume less main memory to compute the results.
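The DAG mentioned above can be obtained by sharing identical subtrees (hash-consing). In the sketch below, which uses a hypothetical `(label, [children])` tree encoding rather than a real XML parser, each distinct subtree receives one node id, so the repeated `book` subtree is stored, and would be searched, only once.

```python
def compress_to_dag(tree):
    """Hash-consing sketch: identical subtrees are stored once.

    tree: (label, [children]); returns (root_id, table), where table maps
    (label, tuple_of_child_ids) -> DAG node id.
    """
    table = {}

    def visit(node):
        label, children = node
        key = (label, tuple(visit(c) for c in children))
        if key not in table:
            table[key] = len(table)  # first occurrence gets a fresh DAG node id
        return table[key]

    return visit(tree), table


def count_tree_nodes(tree):
    return 1 + sum(count_tree_nodes(c) for c in tree[1])


book = ("book", [("title", []), ("author", [])])
doc = ("lib", [book, book, ("book", [("title", []), ("year", [])])])
root, dag = compress_to_dag(doc)
print(count_tree_nodes(doc), len(dag))  # 10 6
```

A keyword search over such a DAG can evaluate each shared node once and reuse the result for every position where the subtree occurs in the original document.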
Submitted 26 November, 2013;
originally announced November 2013.
-
Optimization of transport protocols with path-length constraints in complex networks
Authors:
Jose J. Ramasco,
Marta S. de la Lama,
Eduardo Lopez,
Stefan Boettcher
Abstract:
We propose a protocol optimization technique that is applicable to both weighted and unweighted graphs. Our aim is to explore by how much a small variation around the Shortest Path or Optimal Path protocols can enhance protocol performance. Such an optimization strategy can be necessary because, even though some protocols can achieve very high traffic tolerance levels, this is commonly done by enlarging the path lengths, which may jeopardize scalability. We use ideas borrowed from Extremal Optimization to guide our algorithm, which proves to be an effective technique. Our method exploits the degeneracy of the paths or their close-weight alternatives, which significantly improves the scalability of the protocols in comparison to Shortest Paths or Optimal Paths protocols, while keeping the length or weight of the paths almost intact. This characteristic ensures that the optimized routing protocols are composed of paths that are quick to traverse, avoiding negative effects in data communication due to path-length increases that can become especially relevant when information losses are present.
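The path degeneracy exploited above can be quantified in unweighted graphs as the number of distinct shortest paths between two nodes; the more such alternatives exist, the more freedom a protocol has to spread traffic without lengthening routes. The sketch below counts them with breadth-first search. It illustrates only the degeneracy itself, not the Extremal-Optimization-guided protocol of the paper; `grid_graph` is a hypothetical helper.

```python
from collections import deque


def shortest_path_counts(adj, source):
    """BFS from source: distance to and number of shortest paths to every node."""
    dist = {source: 0}
    count = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:              # first time reached: v is one level deeper
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:   # another shortest route into v
                count[v] += count[u]
    return dist, count


def grid_graph(w, h):
    """Hypothetical helper: w x h square lattice as an adjacency dict."""
    adj = {(x, y): [] for x in range(w) for y in range(h)}
    for x, y in adj:
        for nb in ((x + 1, y), (x, y + 1)):
            if nb in adj:
                adj[(x, y)].append(nb)
                adj[nb].append((x, y))
    return adj


dist, count = shortest_path_counts(grid_graph(3, 3), (0, 0))
print(dist[(2, 2)], count[(2, 2)])  # 4 6
```

The counts are correct because BFS pops every node at distance d before any node at distance d + 1, so a node's count is final before it is propagated.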
Submitted 3 June, 2010;
originally announced June 2010.
-
Conjecture on the maximum cut and bisection width in random regular graphs
Authors:
Lenka Zdeborová,
Stefan Boettcher
Abstract:
Asymptotic properties of random regular graphs are the object of extensive study in mathematics. In this note we argue, based on the theory of spin glasses, that in random regular graphs the maximum cut size asymptotically equals the number of edges in the graph minus the minimum bisection size. Maximum cut and minimum bisection are two famous NP-complete problems with no known general relation between them; hence our conjecture is a surprising property of random regular graphs. We further support the conjecture with numerical simulations. A rigorous proof of this relation is obviously a challenge.
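To make the two quantities in the conjecture concrete, the sketch below computes both by exhaustive enumeration on tiny graphs: the maximum cut, and the number of edges minus the minimum bisection. The conjecture concerns large random regular graphs asymptotically; on tiny, highly structured graphs such as these the two numbers generally differ, as the output shows, so this only illustrates the definitions.

```python
from itertools import combinations, product


def max_cut(n, edges):
    """Exact max cut by enumerating all 2^n two-sided splits (tiny graphs only)."""
    return max(sum(side[u] != side[v] for u, v in edges)
               for side in product((0, 1), repeat=n))


def min_bisection(n, edges):
    """Exact minimum bisection: all splits into two halves of n/2 vertices (n even)."""
    best = len(edges)
    for half in combinations(range(n), n // 2):
        left = set(half)
        best = min(best, sum((u in left) != (v in left) for u, v in edges))
    return best


K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
C6 = [(i, (i + 1) % 6) for i in range(6)]
for name, n, edges in (("K4", 4, K4), ("C6", 6, C6)):
    print(name, max_cut(n, edges), len(edges) - min_bisection(n, edges))
# K4 4 2
# C6 6 4
```

Both enumerations are exponential in n, consistent with the NP-completeness of the two problems; numerical support for the conjecture on large graphs requires heuristics instead.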
Submitted 24 December, 2009;
originally announced December 2009.
-
The Peculiar Phase Structure of Random Graph Bisection
Authors:
Allon G. Percus,
Gabriel Istrate,
Bruno Goncalves,
Robert Z. Sumi,
Stefan Boettcher
Abstract:
The mincut graph bisection problem involves partitioning the n vertices of a graph into disjoint subsets, each containing exactly n/2 vertices, while minimizing the number of "cut" edges with an endpoint in each subset. When considered over sparse random graphs, the phase structure of the graph bisection problem displays certain familiar properties, but also some surprises. It is known that when the mean degree is below the critical value of 2 log 2, the cutsize is zero with high probability. We study how the minimum cutsize increases with mean degree above this critical threshold, finding a new analytical upper bound that improves considerably upon previous bounds. Combined with recent results on expander graphs, our bound suggests the unusual scenario that random graph bisection is replica symmetric up to and beyond the critical threshold, with a replica symmetry breaking transition possibly taking place above the threshold. An intriguing algorithmic consequence is that although the problem is NP-hard, we can find near-optimal cutsizes (whose ratio to the optimal value approaches 1 asymptotically) in polynomial time for typical instances near the phase transition.
Submitted 19 November, 2008; v1 submitted 11 August, 2008;
originally announced August 2008.
-
Analysis of the Karmarkar-Karp Differencing Algorithm
Authors:
Stefan Boettcher,
Stephan Mertens
Abstract:
The Karmarkar-Karp differencing algorithm is the best known polynomial-time heuristic for the number partitioning problem, fundamental in both theoretical computer science and statistical physics. We analyze the performance of the differencing algorithm on random instances by mapping it to a nonlinear rate equation. Our analysis reveals strong finite-size effects that explain why the precise asymptotics of the differencing solution are hard to establish by simulations. The asymptotic series emerging from the rate equation satisfies all known bounds on the Karmarkar-Karp algorithm and projects a scaling $n^{-c\ln n}$, where $c=1/(2\ln2)=0.7213...$. Our calculations reveal subtle relations between the algorithm and Fibonacci-like sequences, and we establish an explicit identity to that effect.
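The differencing heuristic itself is short: repeatedly remove the two largest numbers and insert their difference, which commits them to opposite sides of the partition; the last remaining number is the discrepancy between the two sides. A minimal sketch with a max-heap:

```python
import heapq


def karmarkar_karp(numbers):
    """Differencing heuristic: returns the partition discrepancy it achieves."""
    heap = [-a for a in numbers]  # heapq is a min-heap, so store negated values
    heapq.heapify(heap)
    while len(heap) > 1:
        a = -heapq.heappop(heap)  # largest
        b = -heapq.heappop(heap)  # second largest
        heapq.heappush(heap, -(a - b))
    return -heap[0] if heap else 0


print(karmarkar_karp([8, 7, 6, 5, 4]))  # 2
```

On this small example the heuristic leaves a discrepancy of 2, although a perfect split exists ({8, 7} versus {6, 5, 4}), which illustrates that differencing is a heuristic rather than an exact method.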
Submitted 3 October, 2008; v1 submitted 27 February, 2008;
originally announced February 2008.
-
Spines of Random Constraint Satisfaction Problems: Definition and Connection with Computational Complexity
Authors:
Gabriel Istrate,
Stefan Boettcher,
Allon G. Percus
Abstract:
We study the connection between the order of phase transitions in combinatorial problems and the complexity of decision algorithms for such problems. We rigorously show that, for a class of random constraint satisfaction problems, a limited connection between the two phenomena indeed exists. Specifically, we extend the definition of the spine order parameter of Bollobás et al. to random constraint satisfaction problems, rigorously showing that for such problems a discontinuity of the spine is associated with a $2^{\Omega(n)}$ resolution complexity (and thus a $2^{\Omega(n)}$ complexity of DPLL algorithms) on random instances. The two phenomena have a common underlying cause: the emergence of "large" (linear size) minimally unsatisfiable subformulas of a random formula at the satisfiability phase transition.
We present several further results that add weight to the intuition that random constraint satisfaction problems with a sharp threshold and a continuous spine are "qualitatively similar to random 2-SAT". Finally, we argue that it is the spine rather than the backbone parameter whose continuity has implications for the decision complexity of combinatorial problems, and we provide experimental evidence that the two parameters can behave differently.
Submitted 29 March, 2005;
originally announced March 2005.
-
Extremal Optimization: an Evolutionary Local-Search Algorithm
Authors:
Stefan Boettcher,
Allon G. Percus
Abstract:
A recently introduced general-purpose heuristic for finding high-quality solutions for many hard optimization problems is reviewed. The method is inspired by recent progress in understanding far-from-equilibrium phenomena in terms of self-organized criticality, a concept introduced to describe emergent complexity in physical systems. This method, called extremal optimization, successively replaces the value of extremely undesirable variables in a sub-optimal solution with new, random ones. Large, avalanche-like fluctuations in the cost function self-organize from this dynamics, effectively scaling barriers to explore local optima in distant neighborhoods of the configuration space while eliminating the need to tune parameters. Drawing upon models used to simulate the dynamics of granular media, evolution, or geology, extremal optimization complements approximation methods inspired by equilibrium statistical physics, such as simulated annealing. It may be but one example of applying new insights into non-equilibrium phenomena systematically to hard optimization problems. This method is widely applicable and so far has proved competitive with, and even superior to, more elaborate general-purpose heuristics on testbeds of constrained optimization problems with up to $10^5$ variables, such as bipartitioning, coloring, and satisfiability. Analysis of a suitable model predicts the only free parameter of the method in accordance with all experimental results.
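The update rule described above can be sketched for a spin glass with energy $E = -\sum_{(i,j)} J_{ij} s_i s_j$, following the commonly described τ-EO variant: give each spin a fitness equal to its local bond satisfaction, rank spins from worst (rank 1) to best, pick a rank k with probability proportional to $k^{-\tau}$, and flip that spin unconditionally, keeping the best configuration seen. The value τ = 1.4, the step count, and the function name are assumptions for illustration; the review's model-based prediction of τ is not reproduced here.

```python
import random


def tau_eo(n, couplings, tau=1.4, steps=20_000, seed=0):
    """tau-EO sketch for E = -sum J_ij s_i s_j; couplings: dict {(i, j): J_ij}."""
    rng = random.Random(seed)
    nbrs = [[] for _ in range(n)]
    for (i, j), w in couplings.items():
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))

    def energy(s):
        return -sum(w * s[i] * s[j] for (i, j), w in couplings.items())

    # Rank-selection weights: P(rank k) proportional to k^(-tau), k = 1 (worst) .. n.
    weights = [k ** -tau for k in range(1, n + 1)]
    s = [rng.choice((-1, 1)) for _ in range(n)]
    best_s, best_e = s[:], energy(s)
    for _ in range(steps):
        # Fitness of spin i: sum of its satisfied minus unsatisfied bond strengths.
        fitness = [s[i] * sum(w * s[j] for j, w in nbrs[i]) for i in range(n)]
        ranked = sorted(range(n), key=fitness.__getitem__)  # worst first
        k = rng.choices(range(n), weights=weights)[0]
        s[ranked[k]] *= -1  # flip unconditionally: no acceptance test, unlike SA
        e = energy(s)
        if e < best_e:
            best_s, best_e = s[:], e
    return best_s, best_e


ring = {(i, (i + 1) % 8): 1 for i in range(8)}  # ferromagnetic ring of 8 spins
print(tau_eo(8, ring)[1])  # -8
```

Because even the worst-ranked spin is flipped without any acceptance test, the dynamics never settles into a local optimum; the power-law rank selection is what produces the large fluctuations described above. For clarity the sketch recomputes all fitnesses each step, whereas practical implementations update only the neighbors of the flipped spin.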
Submitted 26 September, 2002;
originally announced September 2002.
-
Jamming Model for the Extremal Optimization Heuristic
Authors:
S. Boettcher,
M. Grigni
Abstract:
Extremal Optimization, a recently introduced meta-heuristic for hard optimization problems, is analyzed on a simple model of jamming. The model is motivated first by the problem of finding lowest energy configurations for a disordered spin system on a fixed-valence graph. The numerical results for the spin system exhibit the same phenomena found in all earlier studies of extremal optimization, and our analytical results for the model reproduce many of these features.
Submitted 12 December, 2001; v1 submitted 9 October, 2001;
originally announced October 2001.
-
Extremal Optimization for Graph Partitioning
Authors:
S. Boettcher,
A. G. Percus
Abstract:
Extremal optimization is a new general-purpose method for approximating solutions to hard optimization problems. We study the method in detail by way of the NP-hard graph partitioning problem. We discuss the scaling behavior of extremal optimization, focusing on the convergence of the average run as a function of runtime and system size. The method has a single free parameter, which we determine numerically and justify using a simple argument. Our numerical results demonstrate that on random graphs, extremal optimization maintains consistent accuracy for increasing system sizes, with an approximation error decreasing over runtime roughly as a power law $t^{-0.4}$. On geometrically structured graphs, the scaling of results from the average run suggests that these are far from optimal, with large fluctuations between individual trials. But when only the best runs are considered, results consistent with theoretical arguments are recovered.
Submitted 11 April, 2001;
originally announced April 2001.
-
Optimization with Extremal Dynamics
Authors:
S. Boettcher,
A. G. Percus
Abstract:
We explore a new general-purpose heuristic for finding high-quality solutions to hard optimization problems. The method, called extremal optimization, is inspired by self-organized criticality, a concept introduced to describe emergent complexity in physical systems. Extremal optimization successively replaces extremely undesirable variables of a single sub-optimal solution with new, random ones. Large fluctuations ensue that efficiently explore many local optima. With only one adjustable parameter, the heuristic's performance has proven competitive with more elaborate methods, especially near phase transitions, which are believed to coincide with the hardest instances. We use extremal optimization to elucidate the phase transition in the 3-coloring problem, and we provide independent confirmation of previously reported extrapolations for the ground-state energy of $\pm J$ spin glasses in $d=3$ and $4$.
Submitted 8 April, 2001; v1 submitted 23 October, 2000;
originally announced October 2000.