Search | arXiv e-print repository

Faster Weighted and Unweighted Tree Edit Distance and APSP Equivalence

Authors: Jakob Nogler, Adam Polak, Barna Saha, Virginia Vassilevska Williams, Yinzhan Xu, Christopher Ye

Abstract: The tree edit distance (TED) between two rooted ordered trees with $n$ nodes labeled from an alphabet $Σ$ is the minimum cost of transforming one tree into the other by a sequence of valid operations consisting of insertions, deletions, and relabelings of nodes. The tree edit distance is a well-known generalization of string edit distance and has been studied since the 1970s. Years of steady improvements have led to an $O(n^3)$ algorithm [DMRW 2010]. Fine-grained complexity casts light on the hardness of TED, showing that a truly subcubic time algorithm for TED implies a truly subcubic time algorithm for All-Pairs Shortest Paths (APSP) [BGMW 2020]. Therefore, under the popular APSP hypothesis, a truly subcubic time algorithm for TED cannot exist. However, unlike many problems in fine-grained complexity for which conditional hardness based on APSP also comes with equivalence to APSP, whether TED can be reduced to APSP has remained unknown. In this paper, we resolve this question. Not only do we show that TED is fine-grained equivalent to APSP, but our reduction is also tight enough that, combined with the fastest APSP algorithm to date [Williams 2018], it gives the first-ever subcubic time algorithm for TED, running in $n^3/2^{Ω(\sqrt{\log{n}})}$ time. We also consider the unweighted tree edit distance problem, in which the cost of each edit is one. For unweighted TED, a truly subcubic algorithm is known due to Mao [Mao 2022], later improved slightly by Dürr [Dürr 2023] to run in $O(n^{2.9148})$ time.
Their algorithm uses the bounded monotone min-plus product as a crucial subroutine, and the best running time for this product is $\tilde{O}(n^{\frac{3+ω}{2}})\leq O(n^{2.6857})$ (where $ω$ is the exponent of fast matrix multiplication). In this work, we close this gap and give an algorithm for unweighted TED that runs in $\tilde{O}(n^{\frac{3+ω}{2}})$ time.

Submitted 31 March, 2025; v1 submitted 10 November, 2024; originally announced November 2024.

Comments: Replaced with revised version
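
For readers who want the TED definition in executable form, the sketch below computes unit-cost tree edit distance with the classical forest recursion (the recurrence behind Zhang–Shasha-style dynamic programs). It is a slow memoized baseline, not the subcubic algorithm of the paper above; the `(label, children)` tuple encoding and all function names are hypothetical choices made for illustration.

```python
from functools import lru_cache

# Hypothetical encoding: a tree is (label, children), where children is a
# tuple of trees; a forest is a tuple of trees ordered left to right.


def forest_size(forest):
    """Number of nodes in a forest."""
    return sum(1 + forest_size(children) for _, children in forest)


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between ordered forests f and g."""
    if not f and not g:
        return 0
    if not f:
        return forest_size(g)  # insert every node of g
    if not g:
        return forest_size(f)  # delete every node of f
    (label_v, kids_v), (label_w, kids_w) = f[-1], g[-1]  # rightmost roots
    return min(
        # Delete v: its children take v's place at the right end of f.
        forest_distance(f[:-1] + kids_v, g) + 1,
        # Insert w: symmetric to deleting w from g.
        forest_distance(f, g[:-1] + kids_w) + 1,
        # Map v to w (relabel if labels differ), then match their subtrees
        # with each other and the remaining forests with each other.
        forest_distance(kids_v, kids_w)
        + forest_distance(f[:-1], g[:-1])
        + int(label_v != label_w),
    )


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


def leaf(label):
    return (label, ())


t1 = ("a", (leaf("b"), leaf("c")))
t2 = ("a", (leaf("b"), leaf("d")))
print(tree_edit_distance(t1, t2))  # 1: relabel c -> d
```

The recursion always peels off the rightmost root, which keeps the set of memoized subforests manageable for small inputs; the number of distinct subproblems still grows quickly, which is why the abstract's algorithms matter.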

arXiv:2410.06808

Near-Optimal-Time Quantum Algorithms for Approximate Pattern Matching

Authors: Tomasz Kociumaka, Jakob Nogler, Philip Wellnitz

Abstract: Approximate Pattern Matching is among the most fundamental string-processing tasks. Given a text $T$ of length $n$, a pattern $P$ of length $m$, and a threshold $k$, the task is to identify the fragments of $T$ that are at distance at most $k$ from $P$. We consider the two most common distances: Hamming distance (the number of character substitutions) in Pattern Matching with Mismatches and edit distance (the minimum number of character insertions, deletions, and substitutions) in Pattern Matching with Edits. We revisit the complexity of these two problems in the quantum setting. Our recent work [STOC'24] shows that $\hat{O}(\sqrt{nk})$ quantum queries are sufficient to solve (the decision version of) Pattern Matching with Edits. However, the quantum time complexity of the underlying solution does not provide any improvement over classical computation. On the other hand, the state-of-the-art algorithm for Pattern Matching with Mismatches [Jin and Nogler; SODA'23] achieves query complexity $\hat{O}(\sqrt{nk^{3/2}})$ and time complexity $\tilde{O}(\sqrt{nk^2})$, falling short of an unconditional lower bound of $Ω(\sqrt{nk})$ queries. In this work, we present quantum algorithms with a time complexity of $\tilde{O}(\sqrt{nk}+\sqrt{n/m}\cdot k^2)$ for Pattern Matching with Mismatches and $\hat{O}(\sqrt{nk}+\sqrt{n/m}\cdot k^{3.5})$ for Pattern Matching with Edits; both solutions use $\hat{O}(\sqrt{nk})$ queries.
The running times are near-optimal for $k\ll m^{1/3}$ and $k\ll m^{1/6}$, respectively, and offer an advantage over classical algorithms for $k\ll (mn)^{1/4}$ and $k\ll (mn)^{1/7}$, respectively. Our solutions can also report the starting positions of approximate occurrences of $P$ in $T$ (represented as collections of arithmetic progressions); in this case, the unconditional lower bound and the complexities of our algorithms increase by a $Θ(\sqrt{n/m})$ factor.

Submitted 9 October, 2024; originally announced October 2024.

Comments: 69 pages, 2 figures
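
As a point of reference for Pattern Matching with Mismatches, the classical problem that the quantum algorithms above speed up can be solved by a naive scan. The following minimal sketch is a brute-force $O(nm)$ baseline, not the algorithm of this paper, and the function name is hypothetical; it reports every starting position of a length-$m$ fragment of $T$ at Hamming distance at most $k$ from $P$.

```python
def k_mismatch_occurrences(text, pattern, k):
    """Starting positions i with Hamming(text[i:i+m], pattern) <= k."""
    m = len(pattern)
    starts = []
    for i in range(len(text) - m + 1):
        mismatches = 0
        for a, b in zip(text[i:i + m], pattern):
            if a != b:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already exceeds the threshold
        if mismatches <= k:
            starts.append(i)
    return starts


print(k_mismatch_occurrences("abracadabra", "abr", 1))  # [0, 7]
print(k_mismatch_occurrences("abracadabra", "abr", 2))  # [0, 3, 5, 7]
```

The early `break` is the only optimization; it does not change the worst case, which is what quantum and structural techniques aim to beat.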

arXiv:2403.18812

On the Communication Complexity of Approximate Pattern Matching

Authors: Tomasz Kociumaka, Jakob Nogler, Philip Wellnitz

Abstract: The decades-old Pattern Matching with Edits problem, given a length-$n$ string $T$ (the text), a length-$m$ string $P$ (the pattern), and a positive integer $k$ (the threshold), asks to list all fragments of $T$ that are at edit distance at most $k$ from $P$. The one-way communication complexity of this problem is the minimum amount of space needed to encode the answer so that it can be retrieved without accessing the input strings $P$ and $T$. The closely related Pattern Matching with Mismatches problem (defined in terms of the Hamming distance instead of the edit distance) is already well understood from the communication complexity perspective: Clifford, Kociumaka, and Porat [SODA 2019] proved that $Ω(n/m \cdot k \log(m/k))$ bits are necessary and $O(n/m \cdot k\log (m|Σ|/k))$ bits are sufficient; the upper bound allows encoding not only the occurrences of $P$ in $T$ with at most $k$ mismatches but also the substitutions needed to make each $k$-mismatch occurrence exact. Despite recent improvements in the running time [Charalampopoulos, Kociumaka, and Wellnitz; FOCS 2020 and 2022], the communication complexity of Pattern Matching with Edits remained unexplored, with a lower bound of $Ω(n/m \cdot k\log(m/k))$ bits and an upper bound of $O(n/m \cdot k^3\log m)$ bits stemming from previous research. In this work, we prove an upper bound of $O(n/m \cdot k \log^2 m)$ bits, thus establishing the optimal communication complexity up to logarithmic factors.
We also show that $O(n/m \cdot k \log m \log (m|Σ|))$ bits allow encoding, for each $k$-error occurrence of $P$ in $T$, the shortest sequence of edits needed to make the occurrence exact. We leverage the techniques behind our new result on the communication complexity to obtain quantum algorithms for Pattern Matching with Edits.

Submitted 9 October, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

Comments: 67 pages; 5 figures; abstract shortened
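
For Pattern Matching with Edits, a classical $O(nm)$ baseline is the dynamic program often attributed to Sellers, in which an approximate occurrence may start at any text position for free. The minimal sketch below (hypothetical function name, not the technique of this paper) returns the end positions $j$ such that some fragment $T[i..j)$ is at edit distance at most $k$ from $P$; listing every fragment, as the problem statement asks, would need extra bookkeeping for start positions that is omitted here.

```python
def k_edit_occurrence_ends(text, pattern, k):
    """Return every end position j (0 <= j <= len(text)) such that some
    fragment text[i:j] is within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = min over start positions of ED(pattern[:i], text[start:j]);
    # for j = 0 only the empty fragment exists, so col[i] = i.
    col = list(range(m + 1))
    ends = [0] if col[m] <= k else []
    for j, c in enumerate(text, start=1):
        new = [0]  # empty pattern prefix matches the empty fragment text[j:j]
        for i in range(1, m + 1):
            new.append(min(
                col[i] + 1,                          # text char c left unmatched
                new[i - 1] + 1,                      # pattern char left unmatched
                col[i - 1] + (pattern[i - 1] != c),  # match or substitute
            ))
        if new[m] <= k:
            ends.append(j)
        col = new
    return ends


print(k_edit_occurrence_ends("abracadabra", "abr", 0))  # [3, 10]
print(k_edit_occurrence_ends("abracadabra", "abr", 1))  # [2, 3, 4, 9, 10, 11]
```

Processing the text one column at a time keeps memory at $O(m)$; the row for the empty pattern prefix is fixed at 0, which is what lets an occurrence begin anywhere in $T$.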

arXiv:2403.06376

The Geometry of Cyclical Social Trends

Authors: Bernard Chazelle, Kritkorn Karntikoon, Jakob Nogler

Abstract: We investigate the emergence of periodic behavior in opinion dynamics and its underlying geometry. For this, we use a bounded-confidence model with contrarian agents in a convolution social network. This means that agents adapt their opinions by interacting with their neighbors in a time-varying social network. Being contrarian, the agents are kept from reaching consensus. This is the key feature that allows the emergence of cyclical trends. We show that the systems either converge to a nonconsensual equilibrium or are attracted to periodic or quasi-periodic orbits. We bound the dimension of the attractors and the period of cyclical trends. We exhibit instances where each orbit is dense and uniformly distributed within its attractor. We also investigate the case of randomly changing social networks.

Submitted 10 March, 2024; originally announced March 2024.
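
The abstract does not spell out the update rule, so the sketch below uses the classical Hegselmann–Krause bounded-confidence model on a complete network with no contrarian agents. This is an assumption made purely for illustration and is not the model of this paper: each agent moves to the average opinion of all agents within a confidence radius `eps` (itself included). Without contrarians, such dynamics settle into frozen clusters, the kind of outcome the paper's contrarian agents are kept from. The function names are hypothetical.

```python
def hk_step(opinions, eps):
    """One synchronous Hegselmann-Krause update for 1-D opinions."""
    new = []
    for x in opinions:
        neighbors = [y for y in opinions if abs(y - x) <= eps]  # includes x
        new.append(sum(neighbors) / len(neighbors))
    return new


def run_until_frozen(opinions, eps, tol=1e-12, max_steps=1000):
    """Iterate hk_step until no opinion moves by more than tol."""
    for _ in range(max_steps):
        nxt = hk_step(opinions, eps)
        if max(abs(a - b) for a, b in zip(opinions, nxt)) <= tol:
            return nxt
        opinions = nxt
    return opinions


final = run_until_frozen([0.0, 0.1, 0.2, 1.0], eps=0.15)
print(final)  # the first three agents merge near 0.1; the last stays at 1.0
```

Here the outcome is a nonconsensual but static configuration; the periodic and quasi-periodic orbits described in the abstract require the contrarian mechanism and time-varying network that this sketch deliberately leaves out.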

arXiv:2211.15945

Quantum Speed-ups for String Synchronizing Sets, Longest Common Substring, and k-mismatch Matching

Authors: Ce Jin, Jakob Nogler

Abstract: Longest Common Substring (LCS) is an important text processing problem, which has recently been investigated in the quantum query model. The decisional version of this problem, LCS with threshold $d$, asks whether two length-$n$ input strings have a common substring of length $d$. The two extreme cases, $d=1$ and $d=n$, correspond respectively to Element Distinctness and Unstructured Search, two fundamental problems in quantum query complexity. However, the intermediate case $1\ll d\ll n$ was not fully understood. We show that the complexity of LCS with threshold $d$ smoothly interpolates between the two extreme cases up to $n^{o(1)}$ factors: LCS with threshold $d$ has a quantum algorithm with $n^{2/3+o(1)}/d^{1/6}$ query and time complexity, and requires $Ω(n^{2/3}/d^{1/6})$ quantum queries. Our result improves upon previous upper bounds $\tilde O(\min \{n/d^{1/2}, n^{2/3}\})$ (Le Gall and Seddighin ITCS 2022, Akmal and Jin SODA 2022), and answers an open question of Akmal and Jin. Our main technical contribution is a quantum speed-up of the powerful String Synchronizing Set technique introduced by Kempa and Kociumaka (STOC 2019). It consistently samples $n/τ^{1-o(1)}$ synchronizing positions in the string depending on their length-$Θ(τ)$ contexts, and each synchronizing position can be reported by a quantum algorithm in $\tilde O(τ^{1/2+o(1)})$ time.
As another application of our quantum string synchronizing set, we study the $k$-mismatch Matching problem, which asks whether the pattern has an occurrence in the text with at most $k$ Hamming mismatches. Using a structural result of Charalampopoulos, Kociumaka, and Wellnitz (FOCS 2020), we obtain a quantum algorithm for $k$-mismatch matching with $k^{3/4} n^{1/2+o(1)}$ query complexity and $\tilde O(kn^{1/2})$ time complexity.

Submitted 29 November, 2022; originally announced November 2022.

Comments: SODA 2023. Abstract shortened to meet arXiv requirements
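
Classically, LCS with threshold $d$ can be decided by storing every length-$d$ window of one string in a hash set and probing it with the windows of the other. The minimal sketch below is such a baseline, not the quantum algorithm of this paper; the function name is hypothetical, and Python's built-in string hashing stands in for the Karp–Rabin fingerprints a careful implementation would use. It also shows the two extremes from the abstract: with $d=1$ it asks whether the strings share a character, and with $d=n$ (equal lengths) whether they are identical.

```python
def has_common_substring(s, t, d):
    """Decide LCS with threshold d: do s and t share a substring of length d?"""
    if d == 0:
        return True  # the empty string is always common
    if d > min(len(s), len(t)):
        return False
    windows = {s[i:i + d] for i in range(len(s) - d + 1)}
    return any(t[j:j + d] in windows for j in range(len(t) - d + 1))


print(has_common_substring("banana", "ananas", 5))  # True (shared "anana")
print(has_common_substring("banana", "ananas", 6))  # False
```

Slicing makes each window cost $O(d)$, so this runs in $O(nd)$ expected time; rolling hashes would bring it down to $O(n)$ expected time.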

Showing 1–5 of 5 results for author: Nogler, J