Search | arXiv e-print repository

Differential privacy and Sublinear time are incompatible sometimes

Authors: Jeremiah Blocki, Hendrik Fichtenberger, Elena Grigorescu, Tamalika Mukherjee

Abstract: Differential privacy and sublinear algorithms are both rapidly emerging algorithmic themes in times of big data analysis. Although recent works have shown the existence of differentially private sublinear algorithms for many problems including graph parameter estimation and clustering, little is known regarding hardness results for these algorithms. In this paper, we initiate the study of lower bounds for problems that aim for both differentially-private and sublinear-time algorithms. Our main result is the incompatibility of the two desiderata in the general case. In particular, we prove that a simple problem based on one-way marginals admits both a differentially private algorithm and a sublinear-time algorithm, but does not admit a "strictly" sublinear-time algorithm that is also differentially private.

Submitted 13 January, 2025; v1 submitted 9 July, 2024; originally announced July 2024.
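To make the two desiderata concrete, here is a minimal Python sketch, under assumptions of our own, of the two separate algorithms the abstract contrasts for one-way marginals (the per-column means of an n × d table of 0/1 rows): a Laplace-mechanism release that reads every row, and a sampling estimate that reads only m rows but adds no privacy noise. The function names, the replace-one-row neighbouring relation, and the sample size m are illustrative assumptions, not the paper's construction.

```python
import random

def laplace(rng, scale):
    """Laplace(0, scale) noise, drawn as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def dp_marginals(rows, epsilon, seed=0):
    """epsilon-DP one-way marginals via the Laplace mechanism (reads all n rows).

    Replacing one 0/1 row changes each column mean by at most 1/n, so the
    L1 sensitivity of the d-dimensional output is d/n.
    """
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    scale = d / (n * epsilon)
    return [sum(r[j] for r in rows) / n + laplace(rng, scale) for j in range(d)]

def sampled_marginals(rows, m, seed=0):
    """Sublinear-time estimate from m rows sampled without replacement (not private)."""
    rng = random.Random(seed)
    sample = rng.sample(rows, m)
    d = len(rows[0])
    return [sum(r[j] for r in sample) / m for j in range(d)]
```

Each function satisfies one desideratum on its own; the paper's result concerns the impossibility of getting both at once in the "strictly" sublinear regime.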

arXiv:2403.14332

A Differentially Private Clustering Algorithm for Well-Clustered Graphs

Authors: Weiqiang He, Hendrik Fichtenberger, Pan Peng

Abstract: We study differentially private (DP) algorithms for recovering clusters in well-clustered graphs, which are graphs whose vertex set can be partitioned into a small number of sets, each inducing a subgraph of high inner conductance and small outer conductance. Such graphs have widespread application as a benchmark in the theoretical analysis of spectral clustering. We provide an efficient $(ε,δ)$-DP algorithm tailored specifically for such graphs. Our algorithm draws inspiration from the recent work of Chen et al., who developed DP algorithms for recovery of stochastic block models in cases where the graph comprises exactly two nearly-balanced clusters. Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters, and its misclassification ratio almost matches that of the best-known non-private algorithms. We conduct experimental evaluations on datasets with known ground-truth clusters to substantiate the effectiveness of our algorithm. We also show that any (pure) $ε$-DP algorithm would result in substantial error.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2307.14490

doi 10.1145/3580305.3599840

HUGE: Huge Unsupervised Graph Embeddings with TPUs

Authors: Brandon Mayer, Anton Tsitsulin, Hendrik Fichtenberger, Jonathan Halcrow, Bryan Perozzi

Abstract: Graphs are a representation of structured data that captures the relationships between sets of objects. With the ubiquity of available network data, there is increasing industrial and academic need to quickly analyze graphs with billions of nodes and trillions of edges. A common first step for network understanding is Graph Embedding, the process of creating a continuous representation of nodes in a graph. A continuous representation is often more amenable, especially at scale, to solving downstream machine learning tasks such as classification, link prediction, and clustering. We present a high-performance graph embedding architecture that leverages Tensor Processing Units (TPUs) with configurable amounts of high-bandwidth memory, simplifies the graph embedding problem, and scales to graphs with billions of nodes and trillions of edges. We verify the embedding space quality on real and synthetic large-scale datasets.

Submitted 26 July, 2023; originally announced July 2023.

Comments: As appeared at KDD 2023

arXiv:2303.11843

doi 10.1137/1.9781611977554.ch101

Optimal Fully Dynamic $k$-Center Clustering for Adaptive and Oblivious Adversaries

Authors: MohammadHossein Bateni, Hossein Esfandiari, Hendrik Fichtenberger, Monika Henzinger, Rajesh Jayaram, Vahab Mirrokni, Andreas Wiese

Abstract: In fully dynamic clustering problems, a clustering of a given data set in a metric space must be maintained while it is modified through insertions and deletions of individual points. In this paper, we resolve the complexity of fully dynamic $k$-center clustering against both adaptive and oblivious adversaries. Against oblivious adversaries, we present the first algorithm for fully dynamic $k$-center in an arbitrary metric space that maintains an optimal $(2+ε)$-approximation in $O(k \cdot \mathrm{polylog}(n,Δ))$ amortized update time. Here, $n$ is an upper bound on the number of active points at any time, and $Δ$ is the aspect ratio of the metric space. Previously, the best known amortized update time was $O(k^2\cdot \mathrm{polylog}(n,Δ))$, due to Chan, Guerqin, and Sozio (2018). Moreover, we demonstrate that our runtime is optimal up to $\mathrm{polylog}(n,Δ)$ factors. In fact, we prove that even offline algorithms for $k$-clustering tasks in arbitrary metric spaces, including $k$-medians, $k$-means, and $k$-center, must make at least $Ω(n k)$ distance queries to achieve any non-trivial approximation factor. This implies a lower bound of $Ω(k)$ on the update time, which holds even in the insertion-only setting.
We also show deterministic lower and upper bounds for adaptive adversaries, demonstrate that an update time sublinear in $k$ is possible against oblivious adversaries for metric spaces that admit locality-sensitive hash functions (LSH), and give the first fully dynamic $O(1)$-approximation algorithms for the closely related $k$-sum-of-radii and $k$-sum-of-diameter problems.

Submitted 21 March, 2023; originally announced March 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2112.07050, arXiv:2112.07217
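For orientation, the sketch below is the classic static baseline for $k$-center, Gonzalez's farthest-first traversal, which gives a 2-approximation in any metric using roughly $nk$ distance evaluations. It is not the paper's fully dynamic algorithm; the use of Euclidean points with `math.dist` as the metric is an assumption made for the example.

```python
import math

def gonzalez_k_center(points, k):
    """Farthest-first traversal: returns (centers, radius) with radius <= 2 * OPT."""
    centers = [points[0]]                                  # arbitrary first center
    dist = [math.dist(p, centers[0]) for p in points]      # distance to nearest center
    while len(centers) < min(k, len(points)):
        far = max(range(len(points)), key=dist.__getitem__)  # farthest point so far
        centers.append(points[far])
        dist = [min(d, math.dist(p, points[far])) for d, p in zip(dist, points)]
    return centers, max(dist)
```

A dynamic algorithm must maintain a comparable guarantee while points are inserted and deleted, which this static routine would handle only by recomputing from scratch.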

arXiv:2203.14225

Approximately Counting Subgraphs in Data Streams

Authors: Hendrik Fichtenberger, Pan Peng

Abstract: Estimating the number of subgraphs in data streams is a fundamental problem that has received great attention in the past decade. In this paper, we give improved streaming algorithms for approximately counting the number of occurrences of an arbitrary subgraph $H$, denoted $\# H$, when the input graph $G$ is represented as a stream of $m$ edges. To obtain our algorithms, we provide a generic transformation that converts constant-round sublinear-time graph algorithms in the query access model to constant-pass sublinear-space graph streaming algorithms. Using this transformation, we obtain the following results. 1. We give a $3$-pass turnstile streaming algorithm for $(1\pm ε)$-approximating $\# H$ in $\tilde{O}(\frac{m^{ρ(H)}}{ε^2\cdot \# H})$ space, where $ρ(H)$ is the fractional edge-cover of $H$. This improves upon and generalizes a result of McGregor et al. [PODS 2016], who gave a $3$-pass insertion-only streaming algorithm for $(1\pm ε)$-approximating the number $\# T$ of triangles in $\tilde{O}(\frac{m^{3/2}}{ε^2\cdot \# T})$ space if the algorithm is given additional oracle access to the degrees. 2. We provide a constant-pass streaming algorithm for $(1\pm ε)$-approximating $\# K_r$ in $\tilde{O}(\frac{mλ^{r-2}}{ε^2\cdot \# K_r})$ space for any $r\geq 3$, in a graph $G$ with degeneracy $λ$, where $K_r$ is a clique on $r$ vertices. This resolves a conjecture by Bera and Seshadhri [PODS 2020].
More generally, our reduction relates the adaptivity of a query algorithm to the pass complexity of a corresponding streaming algorithm, and it is applicable to all algorithms in standard sublinear-time graph query models, e.g., the (augmented) general model.

Submitted 27 March, 2022; originally announced March 2022.
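The reduction described above turns each round of queries into a pass over the stream. As a hedged illustration of that idea only (not the paper's transformation), the sketch below emulates a two-round query algorithm: the first pass answers an edge-sampling query via reservoir sampling, and the second pass answers degree queries for the sampled edge's endpoints. The stream is assumed to be a re-iterable, insertion-only list of edges; the turnstile model would need different primitives.

```python
import random

def pass_sample_edge(stream, rng):
    """Pass 1: a uniformly random edge via size-1 reservoir sampling."""
    chosen = None
    for i, edge in enumerate(stream, start=1):
        if rng.randrange(i) == 0:      # keep the i-th edge with probability 1/i
            chosen = edge
    return chosen

def pass_degrees(stream, vertices):
    """Pass 2: answer degree queries for a small set of vertices."""
    deg = {v: 0 for v in vertices}
    for u, v in stream:
        if u in deg:
            deg[u] += 1
        if v in deg:
            deg[v] += 1
    return deg

def two_pass_emulation(stream, seed=0):
    """Round 1 (edge sample) -> pass 1; round 2 (degree queries) -> pass 2."""
    rng = random.Random(seed)
    u, v = pass_sample_edge(stream, rng)
    return (u, v), pass_degrees(stream, {u, v})
```

Only the queried vertices' counters are stored in the second pass, which is what keeps the space sublinear when the emulated algorithm makes few queries per round.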

arXiv:2202.11205

Constant matters: Fine-grained Complexity of Differentially Private Continual Observation

Authors: Hendrik Fichtenberger, Monika Henzinger, Jalaj Upadhyay

Abstract: We study fine-grained error bounds for differentially private algorithms for counting under continual observation. Our main insight is that the matrix mechanism, when instantiated with lower-triangular matrices, can be used in the continual observation model. More specifically, we give an explicit factorization for the counting matrix $M_\mathsf{count}$ and upper bound the error explicitly. We also give a fine-grained analysis, specifying the exact constant in the upper bound. Our analysis is based on upper and lower bounds on the completely bounded norm (cb-norm) of $M_\mathsf{count}$. Along the way, we improve the 28-year-old best-known bound by Mathias (SIAM Journal on Matrix Analysis and Applications, 1993) on the cb-norm of $M_\mathsf{count}$ for a large range of the dimension of $M_\mathsf{count}$. Furthermore, we are the first to give concrete error bounds for various problems under continual observation such as binary counting, maintaining a histogram, releasing an approximately cut-preserving synthetic graph, many graph-based statistics, and substring and episode counting. Finally, we note that our result can be used to get a fine-grained error bound for non-interactive local learning and the first lower bounds on the additive error for $(ε,δ)$-differentially-private counting under continual observation. Subsequent to this work, Henzinger et al. (SODA 2023) showed that our factorization also achieves fine-grained mean-squared error.

Submitted 5 February, 2024; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: Updated a citation and corrected an off-by-one calculation error in the accuracy analysis
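The sketch below illustrates the matrix-mechanism idea with one natural lower-triangular factorization $M_\mathsf{count} = L \cdot L$: $L$ is the lower-triangular Toeplitz matrix with entries $a_k = \binom{2k}{k}/4^k$, the coefficients of $(1-x)^{-1/2}$, whose square is $(1-x)^{-1}$, the generating function of the all-ones lower-triangular matrix. Treat it as an assumption that this matches the paper's explicit factorization. Since $L(Lx + z) = M_\mathsf{count}x + Lz$, each release is the true prefix sum plus correlated noise $(Lz)_t$. The noise scale `sigma` is a hypothetical parameter and is not calibrated to any $(ε,δ)$ guarantee.

```python
import random
from fractions import Fraction
from math import comb

def sqrt_count_coeffs(n):
    """a_k = C(2k, k) / 4^k for k = 0..n-1, the coefficients of (1 - x)^(-1/2)."""
    return [Fraction(comb(2 * k, k), 4 ** k) for k in range(n)]

def lower_toeplitz(coeffs):
    """Lower-triangular Toeplitz matrix with (i, j) entry coeffs[i - j] for i >= j."""
    n = len(coeffs)
    return [[coeffs[i - j] if i >= j else Fraction(0) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain square matrix product (exact when entries are Fractions)."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def noisy_prefix_sums(stream, sigma, seed=0):
    """Prefix sums of `stream` plus correlated Gaussian noise (L z)_t.

    sigma is a hypothetical, uncalibrated noise scale.
    """
    rng = random.Random(seed)
    a = [float(c) for c in sqrt_count_coeffs(len(stream))]
    z = [rng.gauss(0.0, sigma) for _ in stream]
    out, running = [], 0
    for t, x in enumerate(stream):
        running += x
        noise = sum(a[t - i] * z[i] for i in range(t + 1))  # row t of L times z
        out.append(running + noise)
    return out
```

Because every row of $L$ has the same entries, the noise at time $t$ can be computed in a streaming fashion from the first $t$ Gaussian draws.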

arXiv:2112.07217

On fully dynamic constant-factor approximation algorithms for clustering problems

Authors: Hendrik Fichtenberger, Monika Henzinger, Andreas Wiese

Abstract: Clustering is an important task with applications in many fields of computer science. We study the fully dynamic setting in which we want to maintain good clusters efficiently when input points (from a metric space) can be inserted and deleted. Many clustering problems are $\mathsf{APX}$-hard but admit polynomial-time $O(1)$-approximation algorithms. Thus, it is a natural question whether we can maintain $O(1)$-approximate solutions for them in subpolynomial update time, against adaptive and oblivious adversaries. Only a few results are known that give partial answers to this question. There are dynamic algorithms for $k$-center, $k$-means, and $k$-median that maintain constant-factor approximations in expected $\tilde{O}(k^{2})$ update time against an oblivious adversary. However, for these problems there are no algorithms known with an update time that is subpolynomial in $k$, and against an adaptive adversary no (non-trivial) dynamic algorithms are known at all. In this paper, we complete the picture of the question above for all these clustering problems. 1. We show that there is no fully dynamic $O(1)$-approximation algorithm for any of the classic clustering problems above with an update time in $n^{o(1)}h(k)$ against an adaptive adversary, for an arbitrary function $h$. 2. We give a lower bound of $Ω(k)$ on the update time for each of the above problems, even against an oblivious adversary. 3. We give the first $O(1)$-approximate fully dynamic algorithms for $k$-sum-of-radii and for $k$-sum-of-diameters with expected update time of $\tilde{O}(k^{O(1)})$ against an oblivious adversary. 4. Finally, for $k$-center we present a fully dynamic $(6+ε)$-approximation algorithm with an expected update time of $\tilde{O}(k)$ against an oblivious adversary.

Submitted 14 December, 2021; originally announced December 2021.

arXiv:2106.14756

Differentially Private Algorithms for Graphs Under Continual Observation

Authors: Hendrik Fichtenberger, Monika Henzinger, Lara Ost

Abstract: Differentially private algorithms protect individuals in data analysis scenarios by ensuring that there is only a weak correlation between the existence of the user in the data and the result of the analysis. Dynamic graph algorithms maintain the solution to a problem (e.g., a matching) on an evolving input, i.e., a graph where nodes or edges are inserted or deleted over time. They output the value of the solution after each update operation, i.e., continuously. We study (event-level and user-level) differentially private algorithms for graph problems under continual observation, i.e., differentially private dynamic graph algorithms. We present event-level private algorithms for partially dynamic counting-based problems such as triangle counting that improve the additive error over the state of the art by a polynomial factor (in the length $T$ of the update sequence), resulting in the first algorithms with additive error polylogarithmic in $T$. We also give $\varepsilon$-differentially private and partially dynamic algorithms for minimum spanning tree, minimum cut, densest subgraph, and maximum matching. The additive error of our improved MST algorithm is $O(W \log^{3/2}T / \varepsilon)$, where $W$ is the maximum weight of any edge, which, as we show, is tight up to a $(\sqrt{\log T} / \varepsilon)$-factor. For the other problems, we present a partially-dynamic algorithm with multiplicative error $(1+β)$ for any constant $β> 0$ and additive error $O(W \log(nW) \log(T) / (\varepsilon β))$.
Finally, we show that the additive error for a broad class of dynamic graph algorithms with user-level privacy must be linear in the value of the output solution's range.

Submitted 28 November, 2023; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: Corrected typos in lower bounds in Table 1. Fixed missing factor $\ell$ in statement of Theorem 45
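Additive errors polylogarithmic in $T$ under continual observation are typically obtained with tree-based counting. As background only (a standard technique, not this paper's graph algorithms), here is a minimal sketch of the binary tree mechanism for event-level $\varepsilon$-DP counting of a 0/1 stream: each dyadic block of the stream receives one Laplace draw of scale $\ell/\varepsilon$, where $\ell$ is the number of levels, and each prefix sum is assembled from at most $\ell$ blocks. Since every stream element lies in at most $\ell$ blocks, the total privacy loss is $\varepsilon$.

```python
import random

def laplace(rng, scale):
    """Laplace(0, scale) noise as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def binary_mechanism(stream, epsilon, seed=0):
    """Noisy prefix sums of a 0/1 stream via dyadic (binary-tree) decomposition."""
    rng = random.Random(seed)
    T = len(stream)
    levels = max(1, T.bit_length())      # each element is in at most `levels` blocks
    scale = levels / epsilon
    prefix = [0]
    for x in stream:
        prefix.append(prefix[-1] + x)
    node_noise = {}                      # one noise draw per dyadic block, reused
    out = []
    for t in range(1, T + 1):
        start, total = 0, 0.0
        for level in reversed(range(levels)):
            if t >> level & 1:           # the binary digits of t pick the blocks of [0, t)
                end = start + (1 << level)
                key = (level, start >> level)
                if key not in node_noise:
                    node_noise[key] = laplace(rng, scale)
                total += prefix[end] - prefix[start] + node_noise[key]
                start = end
        out.append(total)
    return out
```

For readability the sketch computes exact block sums from a prefix array; a streaming implementation would keep only the $O(\log T)$ currently open blocks.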

arXiv:2011.06888

Consistent k-Clustering for General Metrics

Authors: Hendrik Fichtenberger, Silvio Lattanzi, Ashkan Norouzi-Fard, Ola Svensson

Abstract: Given a stream of points in a metric space, is it possible to maintain a constant-factor approximate clustering by changing the cluster centers only a small number of times during the entire execution of the algorithm? This question has received attention in recent years in the machine learning literature and, before our work, the best known algorithm performs $\widetilde{O}(k^2)$ center swaps (the $\widetilde{O}(\cdot)$ notation hides polylogarithmic factors in the number of points $n$ and the aspect ratio $Δ$ of the input instance). This is a quadratic increase compared to the offline case (where the whole stream is known in advance and one is interested in keeping a constant approximation at any point in time), for which $\widetilde{O}(k)$ swaps are known to be sufficient and simple examples show that $Ω(k \log(n Δ))$ swaps are necessary. We close this gap by developing an algorithm that, perhaps surprisingly, matches the guarantees in the offline setting. Specifically, we show how to maintain a constant-factor approximation for the $k$-median problem by performing an optimal (up to polylogarithmic factors) number $\widetilde{O}(k)$ of center swaps. To obtain our result we leverage new structural properties of $k$-median clustering that may be of independent interest.

Submitted 13 November, 2020; originally announced November 2020.

arXiv:2005.01861

Sampling Arbitrary Subgraphs Exactly Uniformly in Sublinear Time

Authors: Hendrik Fichtenberger, Mingze Gao, Pan Peng

Abstract: We present a simple sublinear-time algorithm for sampling an arbitrary subgraph $H$ exactly uniformly from a graph $G$ with $m$ edges, to which the algorithm has access by performing the following types of queries: (1) degree queries, (2) neighbor queries, (3) pair queries and (4) edge sampling queries. The query complexity and running time of our algorithm are $\tilde{O}(\min\{m, \frac{m^{ρ(H)}}{\# H}\})$ and $\tilde{O}(\frac{m^{ρ(H)}}{\# H})$, respectively, where $ρ(H)$ is the fractional edge-cover of $H$ and $\# H$ is the number of copies of $H$ in $G$. For any clique on $r$ vertices, i.e., $H=K_r$, our algorithm is almost optimal as any algorithm that samples an $H$ from any distribution that has $Ω(1)$ total probability mass on the set of all copies of $H$ must perform $Ω(\min\{m, \frac{m^{ρ(H)}}{\# H\cdot (cr)^r}\})$ queries. Together with the query and time complexities of the $(1\pm \varepsilon)$-approximation algorithm for the number of subgraphs $H$ by Assadi, Kapralov and Khanna [ITCS 2018] and the lower bound by Eden and Rosenbaum [APPROX 2018] for approximately counting cliques, our results suggest that in our query model, approximately counting cliques is "equivalent to" exactly uniformly sampling cliques, in the sense that the query and time complexities of exactly uniform sampling and randomized approximate counting are within a polylogarithmic factor of each other.
This stands in interesting contrast to an analogous relation between approximate counting and almost uniform sampling for self-reducible problems in the polynomial-time regime by Jerrum, Valiant and Vazirani [TCS 1986].

Submitted 25 March, 2021; v1 submitted 4 May, 2020; originally announced May 2020.

Comments: ICALP 2020
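Exact uniformity in such query models typically comes from rejection sampling. As a hedged and much simpler illustration than the paper's algorithm, the sketch below samples an oriented edge exactly uniformly using uniform vertex samples (an assumption; this is not one of the four query types listed above), degree queries, and neighbor queries: pick a vertex $v$ uniformly, pick a uniform neighbor, and accept with probability $\deg(v)/d_{\max}$. One trial then returns each oriented edge with probability $1/(n \cdot d_{\max})$. The adjacency-dict input and a known `d_max` are assumptions.

```python
import random
from fractions import Fraction

def sample_edge_uniform(adj, d_max, rng):
    """Return an oriented edge (v, u), exactly uniformly at random, via rejection."""
    vertices = list(adj)
    while True:
        v = rng.choice(vertices)                # uniform vertex sample
        deg = len(adj[v])                       # degree query
        if deg == 0:
            continue
        u = adj[v][rng.randrange(deg)]          # neighbor query
        if rng.random() < deg / d_max:          # accept w.p. deg(v) / d_max
            return v, u

def trial_probability(adj, d_max, v, u):
    """Exact probability that a single trial outputs the oriented edge (v, u)."""
    deg = len(adj[v])
    return Fraction(1, len(adj)) * Fraction(1, deg) * Fraction(deg, d_max)
```

The degree factor cancels, so every oriented edge has the same per-trial probability; the price is an expected number of trials of $n \cdot d_{\max} / (2m)$.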

arXiv:1905.01644

Testable Properties in General Graphs and Random Order Streaming

Authors: Artur Czumaj, Hendrik Fichtenberger, Pan Peng, Christian Sohler

Abstract: We present a novel framework closely linking the areas of property testing and data streaming algorithms in the setting of general graphs. It has been recently shown (Monemizadeh et al. 2017) that for bounded-degree graphs, any constant-query tester can be emulated in the random order streaming model by a streaming algorithm that uses only space required to store a constant number of words. However, in a more natural setting of general graphs, with no restriction on the maximum degree, no such results were known because of our lack of understanding of constant-query testers in general graphs and lack of techniques to appropriately emulate, in the streaming setting, off-line algorithms allowing many high-degree vertices. In this work we advance our understanding of both of these challenges. First, we provide canonical testers for all constant-query testers for general graphs, both for one-sided and two-sided errors. Such canonizations were only known before (in the adjacency matrix model) for dense graphs (Goldreich and Trevisan 2003) and (in the adjacency list model) for bounded degree (di-)graphs (Goldreich and Ron 2011, Czumaj et al. 2016). Using the concept of canonical testers, we then prove that every property of general graphs that is constant-query testable with one-sided error can also be tested in constant-space with one-sided error in the random order streaming model. Our results imply, among others, that properties like $(s,t)$ disconnectivity, $k$-path-freeness, etc. are constant-space testable in random order streams.

Submitted 5 May, 2019; originally announced May 2019.

arXiv:1812.09249

doi 10.1007/s10458-021-09505-x

Testing Stability Properties in Graphical Hedonic Games

Authors: Hendrik Fichtenberger, Anja Rey

Abstract: In hedonic games, players form coalitions based on individual preferences over the group of players they could belong to. Several concepts to describe the stability of coalition structures in a game have been proposed and analysed in the literature. However, prior research focuses on algorithms with time complexity that is at least linear in the input size. In light of very large games that arise from, e.g., social networks and advertising, we initiate the study of sublinear time property testing algorithms for existence and verification problems under several notions of coalition stability in a model of hedonic games represented by graphs with bounded degree. In graph property testing, one shall decide whether a given input has a property (e.g., a game admits a stable coalition structure) or is far from it, i.e., one has to modify at least an $ε$-fraction of the input (e.g., the game's preferences) to make it have the property. In particular, we consider verification of perfection, individual rationality, Nash stability, (contractual) individual stability, and core stability. While there is always a Nash-stable coalition structure (which also implies individually stable coalitions), we show that the existence of a perfect coalition structure is not tautological but can be tested. All our testers have one-sided error and time complexity that is independent of the input size.

Submitted 29 July, 2021; v1 submitted 21 December, 2018; originally announced December 2018.

Journal ref: Autonomous Agents and Multi-Agent Systems 35.2 (2021): 1-26

arXiv:1811.02937

Every Testable (Infinite) Property of Bounded-Degree Graphs Contains an Infinite Hyperfinite Subproperty

Authors: Hendrik Fichtenberger, Pan Peng, Christian Sohler

Abstract: One of the most fundamental questions in graph property testing is to characterize the combinatorial structure of properties that are testable with a constant number of queries. We work towards an answer to this question for the bounded-degree graph model introduced in [Goldreich, Ron, 2002], where the input graphs have maximum degree bounded by a constant $d$. In this model, it is known (among other results) that every hyperfinite property is constant-query testable [Newman, Sohler, 2013], where, informally, a graph property is hyperfinite if for every $δ>0$ every graph in the property can be partitioned into small connected components by removing $δn$ edges. In this paper we show that hyperfiniteness plays a role in every testable property, i.e., we show that every testable property is either finite (which trivially implies hyperfiniteness and testability) or contains an infinite hyperfinite subproperty. A simple consequence of our result is that no infinite graph property that only consists of expander graphs is constant-query testable. Based on the above findings, one could ask if every infinite testable non-hyperfinite property might contain an infinite family of expander (or near-expander) graphs. We show that this is not true.
Motivated by our counter-example we develop a theorem that shows that we can partition the set of vertices of every bounded degree graph into a constant number of subsets and a separator set, such that the separator set is small and the distribution of $k$-disks on every subset of a partition class is roughly the same as that of the partition class if the subset has small expansion.

Submitted 7 November, 2018; originally announced November 2018.

arXiv:1810.05064

A Theory-Based Evaluation of Nearest Neighbor Models Put Into Practice

Authors: Hendrik Fichtenberger, Dennis Rohde

Abstract: In the $k$-nearest neighborhood model ($k$-NN), we are given a set of points $P$, and we shall answer queries $q$ by returning the $k$ nearest neighbors of $q$ in $P$ according to some metric. This concept is crucial in many areas of data analysis and data processing, e.g., computer vision, document retrieval and machine learning. Many $k$-NN algorithms have been published and implemented, but often the relation between parameters and accuracy of the computed $k$-NN is not explicit. We study property testing of $k$-NN graphs in theory and evaluate it empirically: given a point set $P \subset \mathbb{R}^δ$ and a directed graph $G=(P,E)$, is $G$ a $k$-NN graph, i.e., every point $p \in P$ has outgoing edges to its $k$ nearest neighbors, or is it $ε$-far from being a $k$-NN graph? Here, $ε$-far means that one has to change more than an $ε$-fraction of the edges in order to make $G$ a $k$-NN graph. We develop a randomized algorithm with one-sided error that decides this question, i.e., a property tester for the $k$-NN property, with complexity $O(\sqrt{n} k^2 / ε^2)$ measured in terms of the number of vertices and edges it inspects, and we prove a lower bound of $Ω(\sqrt{n / εk})$. We evaluate our tester empirically on the $k$-NN models computed by various algorithms and show that it can be used to detect $k$-NN models with bad accuracy in significantly less time than the building time of the $k$-NN model.

Submitted 30 November, 2018; v1 submitted 11 October, 2018; originally announced October 2018.
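To make the one-sided-error notion concrete, here is a deliberately naive Python tester, not the paper's: it samples vertices and compares each sampled vertex's out-edges with its true $k$ nearest neighbors found by a full scan. It never rejects a genuine $k$-NN graph (assuming distinct pairwise distances, so the $k$-NN set is well defined), but each check costs a pass over $P$, so it does not achieve the $O(\sqrt{n} k^2 / ε^2)$ complexity above. The sample size $\lceil 2/ε \rceil$ is an arbitrary placeholder, not derived from the paper's analysis.

```python
import math
import random

def knn(points, p_idx, k):
    """Indices of the k nearest neighbours of points[p_idx] (ties broken by index)."""
    p = points[p_idx]
    others = [i for i in range(len(points)) if i != p_idx]
    others.sort(key=lambda i: (math.dist(p, points[i]), i))
    return set(others[:k])

def naive_knn_tester(points, out_edges, k, eps, seed=0):
    """Accept (True) or reject (False); rejection always comes with a witness vertex."""
    rng = random.Random(seed)
    n = len(points)
    samples = min(n, math.ceil(2 / eps))        # hypothetical sample size
    for v in rng.sample(range(n), samples):
        if set(out_edges[v]) != knn(points, v, k):
            return False                        # G is certainly not a k-NN graph
    return True
```

The paper's contribution is precisely avoiding the full scan per sampled vertex while keeping the one-sided guarantee.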

arXiv:1707.06126

doi 10.4230/LIPIcs.ICALP.2018.52

A Sublinear Tester for Outerplanarity (and Other Forbidden Minors) With One-Sided Error

Authors: Hendrik Fichtenberger, Reut Levi, Yadu Vasudev, Maximilian Wötzel

Abstract: We consider one-sided error property testing of $\mathcal{F}$-minor freeness in bounded-degree graphs for any finite family of graphs $\mathcal{F}$ that contains a minor of $K_{2,k}$, the $k$-circus graph, or the $(k\times 2)$-grid for any $k\in\mathbb{N}$. This includes, for instance, testing whether a graph is outerplanar or a cactus graph. The query complexity of our algorithm in terms of the number of vertices in the graph, $n$, is $\tilde{O}(n^{2/3} / ε^5)$. Czumaj et al. showed that cycle-freeness and $C_k$-minor freeness can be tested with query complexity $\tilde{O}(\sqrt{n})$ by using random walks, and that testing $H$-minor freeness for any $H$ that contains a cycle requires $Ω(\sqrt{n})$ queries. In contrast to these results, we analyze the structure of the graph and show that either we can find a subgraph of sublinear size that includes the forbidden minor $H$, or we can find a pair of disjoint subsets of vertices whose edge-cut is large, which induces an $H$-minor.

Submitted 8 August, 2018; v1 submitted 19 July, 2017; originally announced July 2017.

Comments: extended to testing outerplanarity, full version of ICALP paper

Journal ref: 45th International Colloquium on Automata, Languages, and Programming (ICALP), pages 52:1-52:14, 2018

arXiv:1705.08174

Distributed Testing of Conductance

Authors: Hendrik Fichtenberger, Yadu Vasudev

Abstract: We study the problem of testing conductance in the setting of distributed computing and give a two-sided tester that takes $\mathcal{O}(\log(n) / (εΦ^2))$ rounds to decide if a graph has conductance at least $Φ$ or is $ε$-far from having conductance at least $Φ^2 / 1000$ in the distributed CONGEST model. We also show that $Ω(\log n)$ rounds are necessary for testing conductance even in the LOCAL model. In the case of a connected graph, we show that we can perform the test even when the number of vertices in the graph is not known a priori. This is the first two-sided tester in the distributed model we are aware of. A key observation is that one can perform a polynomial number of random walks from a small set of vertices if it is sufficient to track only some small statistics of the walks. This greatly reduces the congestion on the edges compared to tracking each walk individually.

Submitted 19 October, 2017; v1 submitted 23 May, 2017; originally announced May 2017.

Comments: revised introduction and some fixes
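The key observation about tracking only small statistics can be illustrated centrally: instead of forwarding each walk's identity, every vertex forwards only how many walk tokens it passes along each incident edge. The sketch below simulates lazy random walks that way. It is a hypothetical illustration of the message-size idea, not the paper's CONGEST tester, and the adjacency-list input and laziness probability 1/2 are assumptions.

```python
import random
from collections import Counter

def aggregated_lazy_walks(adj, sources, walks_per_source, rounds, seed=0):
    """Simulate lazy random walks, keeping only a token count per vertex.

    In a distributed round, vertex v would send each neighbour a single
    integer (the number of tokens moving along that edge) instead of a list
    of walk identifiers, which keeps per-edge messages small.
    """
    rng = random.Random(seed)
    counts = Counter({s: walks_per_source for s in sources})
    for _ in range(rounds):
        nxt = Counter()
        for v, c in counts.items():
            outgoing = Counter()                      # per-edge message: a token count
            for _ in range(c):
                if not adj[v] or rng.random() < 0.5:  # lazy step: stay at v
                    nxt[v] += 1
                else:
                    outgoing[rng.choice(adj[v])] += 1
            for u, tokens in outgoing.items():        # "send" one integer per edge
                nxt[u] += tokens
        counts = nxt
    return counts
```

The resulting per-vertex counts are the kind of aggregate statistic (for example, an empirical endpoint distribution) that a conductance tester could compare against a stationary distribution.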

arXiv:1309.5781

PROBI: A Heuristic for the probabilistic k-median problem

Authors: Hendrik Fichtenberger, Melanie Schmidt

Abstract: We develop the heuristic PROBI for the probabilistic Euclidean k-median problem based on a coreset construction by Lammersen et al. Our algorithm computes a summary of the data and then uses an adapted version of k-means++ (Arthur and Vassilvitskii, 2007) to compute a good solution on the summary. The summary is maintained in a data stream, so PROBI can be used in a data stream setting on very large data sets. We experimentally evaluate the quality of the summary and of the computed solution and compare the running time to state-of-the-art data stream clustering algorithms.

Submitted 23 September, 2013; originally announced September 2013.
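PROBI's adaptation of k-means++ is not spelled out above. For reference, here is a minimal sketch of standard k-means++ seeding ($D^2$ sampling) on a weighted summary, the kind of input a coreset provides. The weighting and the squared-distance rule are assumptions of this example; a k-median variant might instead sample proportionally to distance.

```python
import math
import random

def weighted_kmeanspp_seeding(points, weights, k, seed=0):
    """Pick up to k centers: the first with probability proportional to weight,
    each later one with probability proportional to weight * D(p)^2, where D(p)
    is the distance from p to the nearest center chosen so far."""
    rng = random.Random(seed)
    idx = range(len(points))
    first = rng.choices(idx, weights=weights)[0]
    centers = [points[first]]
    d2 = [math.dist(p, centers[0]) ** 2 for p in points]
    while len(centers) < k:
        scores = [w * d for w, d in zip(weights, d2)]
        if sum(scores) == 0:          # every point already coincides with a center
            break
        nxt = rng.choices(idx, weights=scores)[0]
        centers.append(points[nxt])
        d2 = [min(d, math.dist(p, points[nxt]) ** 2) for d, p in zip(d2, points)]
    return centers
```

Running the seeding on a small weighted summary rather than the full stream is what lets a streaming heuristic afford the quadratic-looking distance updates.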

Showing 1–17 of 17 results for author: Fichtenberger, H