Compressed Dictionary Matching on Run-Length Encoded Strings

Bille, Philip; Gørtz, Inge Li; Puglisi, Simon J.; Tarnow, Simon R.

doi:10.4230/LIPIcs.CPM.2025.21

arXiv:2509.03265 (cs)

[Submitted on 3 Sep 2025]

Abstract: Given a set of pattern strings $\mathcal{P}=\{P_1, P_2, \ldots, P_k\}$ and a text string $S$, the classic dictionary matching problem is to report all occurrences of each pattern in $S$. We study the dictionary matching problem in the compressed setting, where the pattern strings and the text string are compressed using run-length encoding, and the goal is to solve the problem without decompression and achieve efficient time and space in the size of the compressed strings. Let $m$ and $n$ be the total length of the patterns $\mathcal{P}$ and the length of the text string $S$, respectively, and let $\overline{m}$ and $\overline{n}$ be the total number of runs in the run-length encoding of the patterns in $\mathcal{P}$ and $S$, respectively. Our main result is an algorithm that achieves $O((\overline{m} + \overline{n})\log \log m + \mathrm{occ})$ expected time and $O(\overline{m})$ space, where $\mathrm{occ}$ is the total number of occurrences of patterns in $S$. This is the first non-trivial solution to the problem. Since any solution must read the input, our time bound is optimal within a $\log \log m$ factor. We introduce several new techniques to achieve our bounds, including a new compressed representation of the classic Aho-Corasick automaton and a new efficient string index that supports fast queries in run-length encoded strings.
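To make the problem setting concrete, here is a minimal Python sketch of dictionary matching performed directly on run-length encodings, assuming runs are represented as (character, length) pairs. It is a naive per-pattern baseline, not the paper's Aho-Corasick-based algorithm. It relies on a simple observation: in a pattern with at least two runs, every inner run must coincide exactly with a text run, while the first and last pattern runs only need to fit at the end and start, respectively, of the neighbouring text runs. The function names `rle`, `rle_occurrences`, and `dictionary_match` are hypothetical and chosen for illustration. This baseline takes roughly $O(\overline{n}\cdot\overline{m} + \mathrm{occ})$ time, far from the paper's bound.

```python
from itertools import groupby
from typing import List, Tuple

Run = Tuple[str, int]  # (character, run length); consecutive runs differ in character


def rle(s: str) -> List[Run]:
    """Run-length encode s, e.g. 'aaabcc' -> [('a', 3), ('b', 1), ('c', 2)]."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_occurrences(text: List[Run], pat: List[Run]) -> List[int]:
    """Report 0-based start positions of pat in text, working on runs only.

    Hypothetical naive sketch: try to anchor pat at every text run.
    """
    # starts[i] = offset in the (never materialized) text where run i begins
    starts, pos = [], 0
    for _, length in text:
        starts.append(pos)
        pos += length

    r = len(pat)
    out: List[int] = []
    if r == 0:
        return out
    if r == 1:
        # A single-run pattern c^l occurs L - l + 1 times inside each text run c^L with L >= l.
        c, l = pat[0]
        for i, (tc, tl) in enumerate(text):
            if tc == c and tl >= l:
                out.extend(range(starts[i], starts[i] + tl - l + 1))
        return out

    (c1, l1), (cr, lr) = pat[0], pat[-1]
    for i in range(len(text) - r + 1):
        tc, tl = text[i]
        # The first pattern run must be a suffix of text run i.
        if tc != c1 or tl < l1:
            continue
        # Inner pattern runs must equal text runs i+1 .. i+r-2 exactly.
        if text[i + 1 : i + r - 1] != pat[1:-1]:
            continue
        # The last pattern run must be a prefix of text run i+r-1.
        ec, el = text[i + r - 1]
        if ec == cr and el >= lr:
            out.append(starts[i] + tl - l1)
    return out


def dictionary_match(text: List[Run], patterns: List[List[Run]]) -> List[Tuple[int, int]]:
    """Return (position, pattern index) pairs, grouped by pattern, positions increasing."""
    hits: List[Tuple[int, int]] = []
    for j, pat in enumerate(patterns):
        hits.extend((p, j) for p in rle_occurrences(text, pat))
    return hits


if __name__ == "__main__":
    S = rle("aaabaab")  # n = 7 characters, but only 4 runs
    P = [rle("aab"), rle("a"), rle("ba")]
    print(dictionary_match(S, P))
    # prints [(1, 0), (4, 0), (0, 1), (1, 1), (2, 1), (4, 1), (5, 1), (3, 2)]
```

Each pattern is scanned against all $\overline{n}$ text runs independently, which is exactly the repeated work that a shared automaton over all patterns is meant to avoid.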

Subjects: Data Structures and Algorithms (cs.DS)
Cite as: arXiv:2509.03265 [cs.DS]
(or arXiv:2509.03265v1 [cs.DS] for this version)
https://doi.org/10.48550/arXiv.2509.03265
Journal reference: 36th Annual Symposium on Combinatorial Pattern Matching (CPM 2025). Leibniz International Proceedings in Informatics (LIPIcs), Volume 331, pp. 21:1-21:16, Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2025)
Related DOI: https://doi.org/10.4230/LIPIcs.CPM.2025.21

Submission history

From: Simon Tarnow
[v1] Wed, 3 Sep 2025 12:30:08 UTC (132 KB)
