Search | arXiv e-print repository

arXiv:2407.20187 [pdf, ps, other]

Eliminating Majority Illusion is Easy

Authors: Jack Dippel, Max Dupré la Tour, April Niu, Sanjukta Roy, Adrian Vetta

Abstract: Majority Illusion is a phenomenon in social networks wherein the decision by the majority of the network is not the same as one's personal social circle's majority, leading to an incorrect perception of the majority in a large network. In this paper, we present polynomial-time algorithms which can eliminate majority illusion in a network by altering as few connections as possible. Additionally, we… ▽ More Majority Illusion is a phenomenon in social networks wherein the decision by the majority of the network is not the same as one's personal social circle's majority, leading to an incorrect perception of the majority in a large network. In this paper, we present polynomial-time algorithms which can eliminate majority illusion in a network by altering as few connections as possible. Additionally, we prove that the more general problem of ensuring all neighbourhoods in the network are at least a $p$-fraction of the majority is NP-hard for most values of $p$. △ Less

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.11217 [pdf, ps, other]

Almost-linear Time Approximation Algorithm to Euclidean $k$-median and $k$-means

Authors: Max Dupré la Tour, David Saulpic

Abstract: Clustering is one of the staples of data analysis and unsupervised learning. As such, clustering algorithms are often used on massive data sets, and they need to be extremely fast. We focus on the Euclidean $k$-median and $k$-means problems, two of the standard ways to model the task of clustering. For these, the go-to algorithm is $k$-means++, which yields an $O(\log k)$-approximation in time… ▽ More Clustering is one of the staples of data analysis and unsupervised learning. As such, clustering algorithms are often used on massive data sets, and they need to be extremely fast. We focus on the Euclidean $k$-median and $k$-means problems, two of the standard ways to model the task of clustering. For these, the go-to algorithm is $k$-means++, which yields an $O(\log k)$-approximation in time $\tilde O(nkd)$. While it is possible to improve either the approximation factor [Lattanzi and Sohler, ICML19] or the running time [Cohen-Addad et al., NeurIPS 20], it is unknown how precise a linear-time algorithm can be. In this paper, we almost answer this question by presenting an almost linear-time algorithm to compute a constant-factor approximation. △ Less

Submitted 18 December, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.02412 [pdf, other]

$k$-Leaf Powers Cannot be Characterized by a Finite Set of Forbidden Induced Subgraphs for $k \geq 5$

Authors: Max Dupré la Tour, Manuel Lafond, Ndiamé Ndiaye, Adrian Vetta

Abstract: A graph $G=(V,E)$ is a $k$-leaf power if there is a tree $T$ whose leaves are the vertices of $G$ with the property that a pair of leaves $u$ and $v$ induce an edge in $G$ if and only if they are distance at most $k$ apart in $T$. For $k\le 4$, it is known that there exists a finite set $F_k$ of graphs such that the class $L(k)$ of $k$-leaf power graphs is characterized as the set of strongly chor… ▽ More A graph $G=(V,E)$ is a $k$-leaf power if there is a tree $T$ whose leaves are the vertices of $G$ with the property that a pair of leaves $u$ and $v$ induce an edge in $G$ if and only if they are distance at most $k$ apart in $T$. For $k\le 4$, it is known that there exists a finite set $F_k$ of graphs such that the class $L(k)$ of $k$-leaf power graphs is characterized as the set of strongly chordal graphs that do not contain any graph in $F_k$ as an induced subgraph. We prove no such characterization holds for $k\ge 5$. That is, for any $k\ge 5$, there is no finite set $F_k$ of graphs such that $L(k)$ is equivalent to the set of strongly chordal graphs that do not contain as an induced subgraph any graph in $F_k$. △ Less

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.19926 [pdf, other]

Fully Dynamic k-Means Coreset in Near-Optimal Update Time

Authors: Max Dupré la Tour, Monika Henzinger, David Saulpic

Abstract: We study in this paper the problem of maintaining a solution to $k$-median and $k$-means clustering in a fully dynamic setting. To do so, we present an algorithm to efficiently maintain a coreset, a compressed version of the dataset, that allows easy computation of a clustering solution at query time. Our coreset algorithm has near-optimal update time of $\tilde O(k)$ in general metric spaces, whi… ▽ More We study in this paper the problem of maintaining a solution to $k$-median and $k$-means clustering in a fully dynamic setting. To do so, we present an algorithm to efficiently maintain a coreset, a compressed version of the dataset, that allows easy computation of a clustering solution at query time. Our coreset algorithm has near-optimal update time of $\tilde O(k)$ in general metric spaces, which reduces to $\tilde O(d)$ in the Euclidean space $\mathbb{R}^d$. The query time is $O(k^2)$ in general metrics, and $O(kd)$ in $\mathbb{R}^d$. To maintain a constant-factor approximation for $k$-median and $k$-means clustering in Euclidean space, this directly leads to an algorithm update time $\tilde O(d)$, and query time $\tilde O(kd + k^2)$. To maintain a $O(polylog~k)$-approximation, the query time is reduced to $\tilde O(kd)$. △ Less

Submitted 28 June, 2024; originally announced June 2024.

Comments: To appear at ESA 2024

arXiv:2406.11649 [pdf, other]

Making Old Things New: A Unified Algorithm for Differentially Private Clustering

Authors: Max Dupré la Tour, Monika Henzinger, David Saulpic

Abstract: As a staple of data analysis and unsupervised learning, the problem of private clustering has been widely studied under various privacy models. Centralized differential privacy is the first of them, and the problem has also been studied for the local and the shuffle variation. In each case, the goal is to design an algorithm that computes privately a clustering, with the smallest possible error. T… ▽ More As a staple of data analysis and unsupervised learning, the problem of private clustering has been widely studied under various privacy models. Centralized differential privacy is the first of them, and the problem has also been studied for the local and the shuffle variation. In each case, the goal is to design an algorithm that computes privately a clustering, with the smallest possible error. The study of each variation gave rise to new algorithms: the landscape of private clustering algorithms is therefore quite intricate. In this paper, we show that a 20-year-old algorithm can be slightly modified to work for any of these models. This provides a unified picture: while matching almost all previously known results, it allows us to improve some of them and extend it to a new privacy model, the continual observation setting, where the input is changing over time and the algorithm must output a new solution at each time step. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: Oral presentation at ICML 2024

arXiv:2312.14721 [pdf, other]

Gerrymandering Planar Graphs

Authors: Jack Dippel, Max Dupré la Tour, April Niu, Sanjukta Roy, Adrian Vetta

Abstract: We study the computational complexity of the map redistricting problem (gerrymandering). Mathematically, the electoral district designer (gerrymanderer) attempts to partition a weighted graph into $k$ connected components (districts) such that its candidate (party) wins as many districts as possible. Prior work has principally concerned the special cases where the graph is a path or a tree. Our fo… ▽ More We study the computational complexity of the map redistricting problem (gerrymandering). Mathematically, the electoral district designer (gerrymanderer) attempts to partition a weighted graph into $k$ connected components (districts) such that its candidate (party) wins as many districts as possible. Prior work has principally concerned the special cases where the graph is a path or a tree. Our focus concerns the realistic case where the graph is planar. We prove that the gerrymandering problem is solvable in polynomial time in $λ$-outerplanar graphs, when the number of candidates and $λ$ are constants and the vertex weights (voting weights) are polynomially bounded. In contrast, the problem is NP-complete in general planar graphs even with just two candidates. This motivates the study of approximation algorithms for gerrymandering planar graphs. However, when the number of candidates is large, we prove it is hard to distinguish between instances where the gerrymanderer cannot win a single district and instances where the gerrymanderer can win at least one district. This immediately implies that the redistricting problem is inapproximable in polynomial time in planar graphs, unless P=NP. This conclusion appears terminal for the design of good approximation algorithms -- but it is not. The inapproximability bound can be circumvented as it only applies when the maximum number of districts the gerrymanderer can win is extremely small, say one. Indeed, for a fixed number of candidates, our main result is that there is a constant factor approximation algorithm for redistricting unweighted planar graphs, provided the optimal value is a large enough constant. △ Less

Submitted 7 January, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

arXiv:2307.03430 [pdf, ps, other]

Differential Privacy for Clustering Under Continual Observation

Authors: Max Dupré la Tour, Monika Henzinger, David Saulpic

Abstract: We consider the problem of clustering privately a dataset in $\mathbb{R}^d$ that undergoes both insertion and deletion of points. Specifically, we give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation. This is the first approximation algorithm for that problem with an additive error that depends only logarithmically in the number… ▽ More We consider the problem of clustering privately a dataset in $\mathbb{R}^d$ that undergoes both insertion and deletion of points. Specifically, we give an $\varepsilon$-differentially private clustering mechanism for the $k$-means objective under continual observation. This is the first approximation algorithm for that problem with an additive error that depends only logarithmically in the number $T$ of updates. The multiplicative error is almost the same as non privately. To do so we show how to perform dimension reduction under continual observation and combine it with a differentially private greedy approximation algorithm for $k$-means. We also partially extend our results to the $k$-median problem. △ Less

Submitted 27 July, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

Showing 1–7 of 7 results for author: la Tour, M D