-
Attempting the impossible: enumerating extremal submodular functions for n=6
Authors:
Elod P. Csirmaz,
Laszlo Csirmaz
Abstract:
Enumerating the extremal submodular functions defined on subsets of a fixed base set has only been done for base sets up to five elements. This paper reports the results of attempting to generate all such functions on a six-element base set. Using improved tools from polyhedral geometry, we have computed 360 billion of them, and provide the first reasonable estimate of their total number, which is expected to be between 1,000 and 10,000 times this number. The applied Double Description and Adjacency Decomposition methods require an insertion order of the defining inequalities. We introduce two novel orders, which speed up the computations significantly, and provide additional insight into the highly symmetric structure of submodular functions. We also present an improvement to the combinatorial test used as part of the Double Description method, and use statistical analyses to estimate the degeneracy of the polyhedral cone used to describe these functions. The statistical results also highlight the limitations of the applied methods.
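As a small illustration of the objects being enumerated, the sketch below checks the standard submodularity inequality f(A) + f(B) >= f(A ∪ B) + f(A ∩ B) by brute force on a tiny base set. It assumes only the textbook definition; the paper may impose further normalization, and the helper names `subsets`, `is_submodular`, `capped` and `squared` are hypothetical, not taken from the authors' tools. It does not test extremality.

```python
from itertools import combinations


def subsets(base):
    """Yield every subset of `base` as a frozenset."""
    items = list(base)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def is_submodular(f, base):
    """Return True if f(A) + f(B) >= f(A | B) + f(A & B) for all A, B."""
    all_sets = list(subsets(base))
    return all(
        f(a) + f(b) >= f(a | b) + f(a & b)
        for a in all_sets
        for b in all_sets
    )


base = {1, 2, 3}


def capped(s):
    # Concave in |S|, so it satisfies the inequality (illustrative choice).
    return min(len(s), 2)


def squared(s):
    # Convex in |S|; fails for A={1}, B={2}: 1 + 1 < 4 + 0.
    return len(s) ** 2


print(is_submodular(capped, base))
print(is_submodular(squared, base))
```

The check compares all pairs of subsets, so its cost grows as 4^n; it is meant only to make the definition concrete, not to reflect the polyhedral methods the abstract describes.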
Submitted 20 October, 2024;
originally announced October 2024.
-
Synchronizing Many Filesystems in Near Linear Time
Authors:
Elod P. Csirmaz,
Laszlo Csirmaz
Abstract:
Finding a provably correct subquadratic synchronization algorithm for many filesystem replicas is one of the main theoretical problems in Operational Transformation (OT) and Conflict-free Replicated Data Types (CRDT) frameworks. Based on the Algebraic Theory of Filesystems, which incorporates non-commutative filesystem commands natively, we developed and built a proof-of-concept implementation of an algorithm suite which synchronizes an arbitrary number of replicas. The result is provably correct, and the synchronized system is created in linear space and time after an initial sorting phase. It works by identifying conflicting command pairs and requesting one of the commands to be removed. The method can be guided to reach any of the theoretically possible synchronized states. The algorithm also allows asynchronous usage. After the client sends a synchronization request, the local replica remains available for further modifications. When the synchronization instructions arrive, they can be merged with the changes made since the synchronization request. The suite also works on filesystems with directed acyclic graph-based path structure in place of the traditional tree-like arrangement. Consequently, our algorithms apply to filesystems with hard or soft links as long as the links create no loops.
Submitted 17 May, 2023; v1 submitted 19 February, 2023;
originally announced February 2023.
-
Data Synchronization: A Complete Theoretical Solution for Filesystems
Authors:
Elod P. Csirmaz,
Laszlo Csirmaz
Abstract:
Data reconciliation in general, and filesystem synchronization in particular, lacks rigorous theoretical foundation. This paper presents, for the first time, a complete analysis of synchronization for two replicas of a theoretical filesystem. Synchronization has two main stages: identifying the conflicts, and resolving them. All existing (both theoretical and practical) synchronizers are operation-based: they define, using some rationale or heuristics, how conflicts are to be resolved without considering the effect of the resolution on subsequent conflicts. Instead, our approach is declaration-based: we define what constitutes the resolution of all conflicts, and for each possible scenario we prove the existence of sequences of operations / commands which convert the replicas into a common synchronized state. These sequences consist of operations rolling back some local changes, followed by operations performed on the other replica. The set of rolled-back operations provides the user with clear and intuitive information on the proposed changes, so she can easily decide whether to accept them or ask for other alternatives. All possible synchronized states are described by specifying a set of conflicts, a partial order on the conflicts describing the order in which they need to be resolved, as well as the effect of each decision on subsequent conflicts. Using this classification, the outcomes of different conflict resolution policies can be investigated easily.
Submitted 12 November, 2022; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Algebra of Data Reconciliation
Authors:
Elod P. Csirmaz,
Laszlo Csirmaz
Abstract:
With distributed computing and mobile applications becoming ever more prevalent, synchronizing diverging replicas of the same data is a common problem. Reconciliation -- bringing two replicas of the same data structure as close as possible without overriding local changes -- is investigated in an algebraic model. Our approach is to consider two sequences of simple commands that describe the changes in the replicas compared to the original structure, and then determine the maximal subsequences of each that can be propagated to the other. The proposed command set is shown to be functionally complete, and an update detection algorithm is presented which produces a command sequence transforming the original data structure into the replica while traversing both simultaneously. Syntactical characterization is provided in terms of a rewriting system for semantically equivalent command sequences. Algebraic properties of sequence pairs that are applicable to the same data structure are investigated. Based on these results the reconciliation problem is shown to have a unique maximal solution. In addition, syntactical properties of the maximal solution allow for an efficient algorithm that produces it.
Submitted 9 August, 2022; v1 submitted 12 October, 2021;
originally announced October 2021.
-
Algebraic File Synchronization: Adequacy and Completeness
Authors:
Elod Pal Csirmaz
Abstract:
With distributed computing and mobile applications, synchronizing diverging replicas of data structures is a more and more common problem. We use algebraic methods to reason about filesystem operations, and introduce a simplified definition of conflicting updates to filesystems. We also define algorithms for update detection and reconciliation and present rigorous proofs that they not only work as intended, but also cannot be improved on.
To achieve this, we introduce a novel, symmetric set of filesystem commands with higher information content, which removes edge cases and increases the predictive powers of our algebraic model. We also present a number of generally useful classes and properties of sequences of commands.
While these results are often intuitive, providing exact proofs for them is far from trivial. They contribute to our understanding of this special type of algebraic model, and toward building more complete algebras of filesystem trees and extending algebraic approaches to other data storage protocols. They also form a theoretical basis for specifying and guaranteeing the error-free operation of applications that implement an algebraic approach to synchronization.
Submitted 20 July, 2018; v1 submitted 7 January, 2016;
originally announced January 2016.