Predicting Positive and Negative Links with Noisy Queries: Theory & Practice

Tsourakakis, Charalampos E.; Mitzenmacher, Michael; Larsen, Kasper Green; Błasiok, Jarosław; Lawson, Ben; Nakkiran, Preetum; Nakos, Vasileios

arXiv:1709.07308 (cs)

[Submitted on 19 Sep 2017 (v1), last revised 6 Dec 2020 (this version, v3)]

Abstract: Social networks involve both positive and negative relationships, which can be captured in signed graphs. The edge sign prediction problem aims to predict whether an interaction between a pair of nodes will be positive or negative. We provide theoretical results for this problem that motivate natural improvements to recent heuristics.
The edge sign prediction problem is related to correlation clustering: a positive relationship means that two nodes belong to the same cluster. We consider the following model for two clusters: we may query any pair of nodes to ask whether they belong to the same cluster, but the answer to each query is corrupted with some probability $0<q<\frac{1}{2}$. Let $\delta=1-2q$ be the bias. We provide an algorithm that recovers all signs correctly with high probability in the presence of noise with $O(\frac{n\log n}{\delta^2}+\frac{\log^2 n}{\delta^6})$ queries. This is the best known result for this problem for all but tiny $\delta$, improving on the recent work of Mazumdar and Saha (2017). We also provide an algorithm that performs $O(\frac{n\log n}{\delta^4})$ queries and uses breadth-first search as its main algorithmic primitive. While both the running time and the number of queries for this algorithm are sub-optimal, our result relies on novel theoretical techniques and naturally suggests the use of edge-disjoint paths as a feature for predicting signs in online social networks. Correspondingly, we experiment with using short edge-disjoint $s$-$t$ paths as a feature for predicting the sign of edge $(s,t)$ in real-world signed networks. Empirical findings suggest that the use of such paths improves the classification accuracy, especially for pairs of nodes with no common neighbors.
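
The query model and the idea of voting over edge-disjoint paths can be illustrated with a small simulation. The sketch below is a simplified illustration and not the authors' algorithm: it uses only length-2 paths $s$-$w$-$t$ in place of the breadth-first-search construction, and the names `make_noisy_oracle` and `predict_sign` are hypothetical. It assumes that each pair's answer is flipped independently with probability $q$ and that repeating a query returns the same answer. Under that assumption, the product of the two noisy signs on a path agrees with the true sign of $(s,t)$ with probability $(1+\delta^2)/2$, so a majority vote over many such paths is far more reliable than a single query.

```python
import random


def make_noisy_oracle(labels, q, seed=0):
    """Build a same-cluster oracle whose answers are flipped with probability q.

    labels[u] is the hidden cluster (0 or 1) of node u. Answers are cached,
    so repeating a query returns the same (possibly wrong) answer.
    """
    rng = random.Random(seed)
    cache = {}

    def query(u, v):
        key = (min(u, v), max(u, v))
        if key not in cache:
            truth = 1 if labels[u] == labels[v] else -1
            flipped = rng.random() < q
            cache[key] = -truth if flipped else truth
        return cache[key]

    return query


def predict_sign(s, t, intermediates, query):
    """Majority vote over the length-2 paths s - w - t, one per node w.

    Distinct w give edge-disjoint paths, so the votes are independent.
    The product of the two noisy signs on a path equals the true sign of
    (s, t) exactly when an even number of the two answers was flipped.
    """
    votes = sum(query(s, w) * query(w, t) for w in intermediates)
    return 1 if votes >= 0 else -1


# Hypothetical usage: 400 nodes in two hidden clusters, q = 0.2 (delta = 0.6).
n = 400
labels = [u % 2 for u in range(n)]
query = make_noisy_oracle(labels, q=0.2, seed=1)
s, t = 0, 1
others = [w for w in range(n) if w not in (s, t)]
print("single noisy query:", query(s, t))
print("path-based prediction:", predict_sign(s, t, others, query))
```

This sketch spends about $2n$ queries per predicted pair, so it says nothing about the query bounds stated above; it only shows why aggregating over edge-disjoint paths helps under noise.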

Comments:	arXiv admin note: text overlap with arXiv:1609.00750
Subjects:	Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Machine Learning (cs.LG); Social and Information Networks (cs.SI); Combinatorics (math.CO)
Cite as:	arXiv:1709.07308 [cs.DS]
	(or arXiv:1709.07308v3 [cs.DS] for this version)
	https://doi.org/10.48550/arXiv.1709.07308

Submission history

From: Charalampos Tsourakakis
[v1] Tue, 19 Sep 2017 20:38:10 UTC (227 KB)
[v2] Tue, 7 Aug 2018 21:28:34 UTC (256 KB)
[v3] Sun, 6 Dec 2020 21:54:16 UTC (255 KB)
