Using Colors and Sketches to Count Subgraphs in a Streaming Graph
Authors:
Shirin Handjani,
Douglas Jungreis,
Mark Tiefenbruck
Abstract:
Suppose we wish to estimate $\#H$, the number of copies of some small graph $H$ in a large streaming graph $G$. There are many algorithms for this task when $H$ is a triangle, but only a few that apply to arbitrary $H$. We focus on one such algorithm, introduced by Kane, Mehlhorn, Sauerwald, and Sun. Its storage and its update time per edge are both $O(m^k/(\#H)^2)$, where $m$ is the number of edges in $G$ and $k$ is the number of edges in $H$. We propose three modifications to the algorithm that can dramatically reduce both the storage and the update time. Suppose that $H$ has no leaves and that $G$ has maximum degree $\leq m^{1/2 - \alpha}$, where $\alpha > 0$. Define $C = \min(m^{2\alpha}, m^{1/3})$. Then in our version of the algorithm, the update time per edge is $O(1)$, and the storage is reduced by a factor of approximately $C^{2k-t-2}$, where $t$ is the number of vertices in $H$; in particular, the storage is $O(C^2 + m^k/(C^{2k-t-2} (\#H)^2))$.
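To get a feel for the bounds stated above, the following minimal sketch evaluates the two storage expressions for concrete parameters. It is only an illustration: the function name `storage_bounds` and the sample values are hypothetical, and every constant hidden by the $O(\cdot)$ notation is assumed to be 1, so the numbers indicate relative scale rather than actual memory use.

```python
def storage_bounds(m, alpha, k, t, num_copies):
    """Evaluate the abstract's storage expressions with hidden constants set to 1.

    m          -- number of edges in G
    alpha      -- G has maximum degree <= m^(1/2 - alpha), alpha > 0
    k, t       -- number of edges and vertices in H
    num_copies -- #H, the number of copies of H in G
    """
    # C = min(m^(2*alpha), m^(1/3)), as defined in the abstract.
    C = min(m ** (2 * alpha), m ** (1 / 3))
    # Exponent of the approximate reduction factor C^(2k - t - 2).
    exponent = 2 * k - t - 2
    # Storage of the original algorithm: m^k / (#H)^2.
    original = m ** k / num_copies ** 2
    # Storage of the modified algorithm: C^2 + m^k / (C^(2k-t-2) (#H)^2).
    modified = C ** 2 + m ** k / (C ** exponent * num_copies ** 2)
    return C, exponent, original, modified


# Hypothetical example: H is a triangle (k = 3, t = 3), so the exponent is 1.
C, exponent, original, modified = storage_bounds(
    m=10**6, alpha=0.25, k=3, t=3, num_copies=10**6
)
print(f"C ~ {C:.1f}, exponent = {exponent}")
print(f"original ~ {original:.3g}, modified ~ {modified:.3g}")
```

For a triangle the exponent $2k-t-2$ equals 1, so the second term shrinks by a factor of roughly $C$, while the additive $C^2$ term sets a floor on the storage; for a 4-cycle ($k = t = 4$) the exponent is 2.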
Submitted 23 February, 2023;
originally announced February 2023.