-
Factorised Representations of Join Queries: Tight Bounds and a New Dichotomy
Authors:
Christoph Berkholz,
Harry Vinall-Smeeth
Abstract:
A common theme in factorised databases and knowledge compilation is the representation of solution sets in a useful yet succinct data structure. In this paper, we study the representation of the result of join queries (or, equivalently, the set of homomorphisms between two relational structures). We focus on the very general format of $\{\cup, \times\}$-circuits -- also known as d-representations or DNNF circuits -- and aim to find the limits of this approach.
In prior work, it has been shown that there always exists a $\{\cup, \times\}$-circuit of size $N^{O(subw)}$ representing the query result, where $N$ is the size of the database and subw the submodular width of the query. If the arity of all relations is bounded by a constant, then subw is linear in the treewidth tw of the query. In this setting, the authors of this paper proved a lower bound of $N^{Ω(tw^{\varepsilon})}$ on the circuit size (ICALP 2023), where $\varepsilon>0$ depends on the excluded grid theorem.
Our first main contribution is to improve this lower bound to $N^{Ω(tw)}$, which is tight up to a constant factor in the exponent. Our second contribution is an $N^{Ω(subw^{1/4})}$ lower bound on the circuit size for join queries over relations of unbounded arity. Both lower bounds are unconditional lower bounds on the circuit size for well-chosen database instances. Their proofs use a combination of structural (hyper)graph theory with communication complexity in a simple yet novel way. While the second lower bound is asymptotically equivalent to Marx's conditional bound on the decision complexity (JACM 2013), our $N^{Θ(tw)}$ bound in the bounded-arity setting is tight, while the best conditional bound on the decision complexity is $N^{Ω(tw/\log tw)}$. Note that removing this logarithmic factor in the decision setting is a major open problem.
Submitted 26 March, 2025;
originally announced March 2025.
-
A Lower Bound on Unambiguous Context Free Grammars via Communication Complexity
Authors:
Stefan Mengel,
Harry Vinall-Smeeth
Abstract:
Motivated by recent connections to factorised databases, we analyse the efficiency of representations by context-free grammars (CFGs). Concretely, we prove a recent conjecture by Kimelfeld, Martens, and Niewerth (ICDT 2025) that, for finite languages, representations by general CFGs can be doubly-exponentially smaller than those by unambiguous CFGs. To do so, we show the first exponential lower bounds for representations by unambiguous CFGs of a finite language that can be efficiently represented by CFGs. Our proof first reduces the problem to proving a lower bound in a non-standard model of communication complexity. Then, using an argument similar in spirit to a recent discrepancy argument, we show the required communication complexity lower bound. Our result also implies that a finite language may admit an exponentially smaller representation as a nondeterministic finite automaton than as an unambiguous CFG.
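A CFG is unambiguous when every word it generates has exactly one parse tree. As a small illustration of that notion only (not of the paper's lower-bound technique), the sketch below counts parse trees with the standard CYK dynamic program, assuming the grammar is given in Chomsky normal form. The function name `count_parse_trees` and the two toy grammars are hypothetical examples: both generate the language $a^+$, but one is ambiguous and the other is not.

```python
from collections import defaultdict
from typing import List, Tuple


def count_parse_trees(word: str, start: str,
                      binary_rules: List[Tuple[str, str, str]],
                      terminal_rules: List[Tuple[str, str]]) -> int:
    """Count parse trees of `word` rooted at `start` for a grammar in Chomsky normal form.

    binary_rules holds rules A -> B C as (A, B, C); terminal_rules holds A -> a as (A, a).
    """
    n = len(word)
    # table[(i, j)][A] = number of distinct parse trees rooted at A that yield word[i:j]
    table = defaultdict(lambda: defaultdict(int))
    for i, ch in enumerate(word):
        for head, terminal in terminal_rules:
            if terminal == ch:
                table[(i, i + 1)][head] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for head, left, right in binary_rules:
                    table[(i, j)][head] += table[(i, k)][left] * table[(k, j)][right]
    return table[(0, n)][start]


# Ambiguous grammar S -> S S | a: the word a^n has Catalan(n-1) parse trees.
AMBIGUOUS = ([("S", "S", "S")], [("S", "a")])
# Unambiguous grammar for the same language: S -> A S | a, A -> a.
UNAMBIGUOUS = ([("S", "A", "S")], [("S", "a"), ("A", "a")])

for name, (binary, terminal) in [("ambiguous", AMBIGUOUS), ("unambiguous", UNAMBIGUOUS)]:
    print(name, count_parse_trees("aaaa", "S", binary, terminal))
# prints "ambiguous 5" and then "unambiguous 1"
```

Counting rather than merely recognising is the point here: a grammar is unambiguous on a finite language exactly when this count is at most 1 for every word.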
Submitted 31 March, 2025; v1 submitted 4 December, 2024;
originally announced December 2024.
-
Supercritical Size-Width Tree-Like Resolution Trade-Offs for Graph Isomorphism
Authors:
Christoph Berkholz,
Moritz Lichter,
Harry Vinall-Smeeth
Abstract:
We study the refutation complexity of graph isomorphism in the tree-like resolution calculus. Torán and Wörz (TOCL 2023) showed that there is a resolution refutation of narrow width $k$ for two graphs if and only if they can be distinguished in ($k+1$)-variable first-order logic (FO$^{k+1}$) and hence by a count-free variant of the $k$-dimensional Weisfeiler-Leman algorithm. While DAG-like narrow width $k$ resolution refutations have size at most $n^k$, tree-like refutations may be much larger. We show that there are graphs of order $n$ whose isomorphism can be refuted in narrow width $k$ but only in tree-like size $2^{Ω(n^{k/2})}$. This is a supercritical trade-off where bounding one parameter (the narrow width) causes the other parameter (the size) to grow above its worst case. The size lower bound is super-exponential in the formula size and improves a related supercritical width versus tree-like size trade-off by Razborov (JACM 2016). To prove our result, we develop a new variant of the $k$-pebble EF-game for FO$^k$ to reason about tree-like refutation size in a similar way as the Prover-Delayer games in proof complexity. We analyze this game on a modified variant of the compressed CFI graphs introduced by Grohe, Lichter, Neuen, and Schweitzer (FOCS 2023). Using a recent improved robust compressed CFI construction of Janett, Nordström, and Pang (unpublished manuscript), we obtain a similar bound for width $k$ (instead of the stronger but less common narrow width) and make the result more robust.
Submitted 25 July, 2024;
originally announced July 2024.
-
Structured d-DNNF Is Not Closed Under Negation
Authors:
Harry Vinall-Smeeth
Abstract:
Both structured d-DNNF and SDD can be exponentially more succinct than OBDD. Moreover, SDD is essentially as tractable as OBDD. But this has left two important open questions. Firstly, does OBDD support more tractable transformations than structured d-DNNF? And secondly, is structured d-DNNF more succinct than SDD? In this paper, we answer both questions in the affirmative. For the first question, we show that, unlike OBDD, structured d-DNNF does not support polytime negation, disjunction, or existential quantification operations. As a corollary, we deduce that there are functions with an equivalent polynomial-sized structured d-DNNF but with no such representation as an SDD, thus answering the second question. We also lift this second result to arithmetic circuits (AC) to show a succinctness gap between PSDD and the monotone AC analogue to structured d-DNNF.
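For contrast with the result above, the sketch below illustrates the textbook fact that negation is cheap for OBDDs: complementing an ordered BDD only requires swapping its two terminals, which keeps the variable order and the number of nodes unchanged. The class names `Terminal` and `DecisionNode` and the helper `negate` are hypothetical; this is a plain, unreduced OBDD sketch (no unique table), not code from the paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Terminal:
    value: bool


@dataclass(frozen=True)
class DecisionNode:
    var: str
    low: "OBDD"   # followed when var is False
    high: "OBDD"  # followed when var is True


OBDD = Union[Terminal, DecisionNode]


def evaluate(node: OBDD, assignment: Dict[str, bool]) -> bool:
    """Follow the unique path selected by the assignment down to a terminal."""
    while isinstance(node, DecisionNode):
        node = node.high if assignment[node.var] else node.low
    return node.value


def negate(node: OBDD, memo: Optional[Dict[int, OBDD]] = None) -> OBDD:
    """Return an OBDD for NOT f by swapping the terminals.

    Memoised on node identity so shared sub-diagrams are copied once,
    which keeps the result the same size as the input.
    """
    if memo is None:
        memo = {}
    key = id(node)
    if key not in memo:
        if isinstance(node, Terminal):
            memo[key] = Terminal(not node.value)
        else:
            memo[key] = DecisionNode(node.var, negate(node.low, memo), negate(node.high, memo))
    return memo[key]


# x XOR y with variable order x < y; both y-nodes share the two terminals.
T, F = Terminal(True), Terminal(False)
xor = DecisionNode("x", DecisionNode("y", F, T), DecisionNode("y", T, F))
nxor = negate(xor)
for x, y in product([False, True], repeat=2):
    assert evaluate(nxor, {"x": x, "y": y}) == (x == y)
```

The abstract's point is that no analogous polynomial-time transformation exists for structured d-DNNF.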
Submitted 7 February, 2024;
originally announced February 2024.
-
From Quantifier Depth to Quantifier Number: Separating Structures with k Variables
Authors:
Harry Vinall-Smeeth
Abstract:
Given two $n$-element structures, $\mathcal{A}$ and $\mathcal{B}$, which can be distinguished by a sentence of $k$-variable first-order logic ($\mathcal{L}^k$), what is the minimum $f(n)$ such that there is guaranteed to be a sentence $φ\in \mathcal{L}^k$ with at most $f(n)$ quantifiers, such that $\mathcal{A} \models φ$ but $\mathcal{B} \not \models φ$? We present various results related to this question obtained by using the recently introduced QVT games. In particular, we show that when we limit the number of variables, there can be an exponential gap between the quantifier depth and the quantifier number needed to separate two structures. Through the lens of this question, we will highlight some difficulties that arise in analysing the QVT game and some techniques which can help to overcome them. As a consequence, we show that $\mathcal{L}^{k+1}$ is exponentially more succinct than $\mathcal{L}^{k}$. We also show, in the setting of the existential-positive fragment, how to lift quantifier depth lower bounds to quantifier number lower bounds. This leads to almost tight bounds.
Submitted 23 February, 2024; v1 submitted 27 November, 2023;
originally announced November 2023.
-
A dichotomy for succinct representations of homomorphisms
Authors:
Christoph Berkholz,
Harry Vinall-Smeeth
Abstract:
The task of computing homomorphisms between two finite relational structures $\mathcal{A}$ and $\mathcal{B}$ is a well-studied question with numerous applications. Since the set $\operatorname{Hom}(\mathcal{A},\mathcal{B})$ of all homomorphisms may be very large, a method of representing it succinctly, especially one which enables efficient enumeration and counting, could be extremely useful.
One simple yet powerful way of doing so is to decompose $\operatorname{Hom}(\mathcal{A},\mathcal{B})$ using union and Cartesian product. Such data structures, called d-representations, were introduced by Olteanu and Zavodny in the context of database theory. Their results also imply that if the treewidth of the left-hand side structure $\mathcal{A}$ is bounded, then a d-representation of polynomial size can be found in polynomial time. We show that for structures of bounded arity this is optimal: if the treewidth is unbounded, then there are instances where the size of any d-representation is superpolynomial. Along the way, we develop tools for proving lower bounds on the size of d-representations; in particular, we define a notion of reduction suitable for this context and prove an almost tight lower bound on the size of d-representations of all $k$-cliques in a graph.
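To make the union/Cartesian-product decomposition concrete, here is a minimal sketch (not the authors' or Olteanu and Zavodny's construction) for the hypothetical case where $\mathcal{A}$ is the directed path x -> y -> z and $\mathcal{B}$ is a directed graph given by its edge set. It branches on the middle variable y; once y is fixed, the choices for x and z are independent and can be combined by a product. The classes `Leaf`, `UnionGate`, `ProdGate` and the helper `path_hom_drep` are illustrative names. Counting by summing over unions and multiplying over products is valid here because union children are disjoint and product children assign disjoint variables.

```python
import math
from dataclasses import dataclass
from itertools import product
from typing import Dict, Iterator, Set, Tuple, Union


@dataclass(frozen=True)
class Leaf:
    """A single variable assignment, e.g. x -> 3."""
    var: str
    val: int


@dataclass(frozen=True)
class UnionGate:
    """Disjoint union: children represent pairwise disjoint sets of assignments."""
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class ProdGate:
    """Cartesian product: children assign pairwise disjoint sets of variables."""
    children: Tuple["Node", ...]


Node = Union[Leaf, UnionGate, ProdGate]


def count(node: Node) -> int:
    """Number of represented assignments, computed in one pass over the tree-shaped circuit."""
    if isinstance(node, Leaf):
        return 1
    if isinstance(node, UnionGate):
        return sum(count(c) for c in node.children)
    return math.prod(count(c) for c in node.children)


def enumerate_homs(node: Node) -> Iterator[Dict[str, int]]:
    """Yield every represented assignment as a dict from variable to value."""
    if isinstance(node, Leaf):
        yield {node.var: node.val}
    elif isinstance(node, UnionGate):
        for child in node.children:
            yield from enumerate_homs(child)
    else:
        parts = [list(enumerate_homs(c)) for c in node.children]
        for combo in product(*parts):
            merged: Dict[str, int] = {}
            for partial in combo:
                merged.update(partial)
            yield merged


def path_hom_drep(edges: Set[Tuple[int, int]]) -> Node:
    """d-representation of Hom(x -> y -> z, G) for the directed graph G with these edges."""
    branches = []
    for b in sorted({v for e in edges for v in e}):
        xs = tuple(Leaf("x", a) for (a, c) in sorted(edges) if c == b)  # in-neighbours of b
        zs = tuple(Leaf("z", c) for (a, c) in sorted(edges) if a == b)  # out-neighbours of b
        if xs and zs:
            branches.append(ProdGate((Leaf("y", b), UnionGate(xs), UnionGate(zs))))
    return UnionGate(tuple(branches))


E = {(0, 1), (1, 2), (2, 0), (1, 0)}
root = path_hom_drep(E)
print(count(root))  # 5
```

The circuit has one leaf per vertex plus one per edge endpoint, while the flat list of homomorphisms can grow with the sum over y of in-degree times out-degree; this gap is the kind of saving d-representations exploit.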
Submitted 26 May, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.