
Naively Sorting Evolving Data is Optimal and Robust

Authors: George Giakkoupis, Marcos Kiwi, Dimitrios Los

Abstract: We study sorting in the evolving data model, introduced by [AKMU11], where the true total order changes while the sorting algorithm is processing the input. More precisely, each comparison operation of the algorithm is followed by a sequence of evolution steps, where an evolution step perturbs the rank of a random item by a "small" random value. The goal is to maintain an ordering that remains close to the true order over time. Previous works have analyzed adaptations of classic sorting algorithms, assuming that an evolution step changes the rank of an item by just one, and that a fixed constant number $b$ of evolution steps take place between two comparisons. In fact, the only previous result achieving optimal linear total deviation, by [BvDEGJ18a], applies just for $b=1$. We analyze a very simple sorting algorithm suggested by [M14], which samples a random pair of adjacent items in each step and swaps them if they are out of order. We show that the algorithm achieves and maintains, with high probability, optimal total deviation, $O(n)$, and optimal maximum deviation, $O(\log n)$, under very general model settings. Namely, the perturbation introduced by each evolution step is sampled from a general distribution of bounded moment generating function, and we just require that the average number of evolution steps between two sorting steps be bounded by an (arbitrary) constant, where the average is over a linear number of steps.
The key ingredients of our proof are a novel potential function argument that inserts "gaps" in the list of items, and a general analysis framework which separates the analysis of sorting from that of the evolution steps, and is applicable to a variety of settings for which previous approaches do not apply. Our results settle conjectures and open problems in the aforementioned works, and provide theoretical support for empirical observations in [BvDEGJ18b].
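The naive algorithm in the abstract above can be sketched as a small simulation. This is only an illustration under simplifying assumptions, not the paper's general model: each evolution step here swaps one uniformly random item with its successor in the true order (the "rank changes by one" setting of earlier work), and exactly `b` evolution steps follow each comparison. The function name `simulate_naive_sort` and all parameters are hypothetical.

```python
import random

def simulate_naive_sort(n, steps, b=1, seed=0):
    """Return the total deviation after `steps` sorting steps (illustrative sketch)."""
    rng = random.Random(seed)
    true_order = list(range(n))   # true_order[r] = item whose true rank is r
    rank = list(range(n))         # rank[item] = current true rank of item
    est = list(range(n))          # the ordering maintained by the algorithm
    rng.shuffle(est)              # assumption: start from a random ordering
    for _ in range(steps):
        # Sorting step: compare a random adjacent pair, swap if out of order.
        i = rng.randrange(n - 1)
        if rank[est[i]] > rank[est[i + 1]]:
            est[i], est[i + 1] = est[i + 1], est[i]
        # Evolution steps (assumed model): swap a random item with its
        # successor in the true order, so two ranks change by one.
        for _ in range(b):
            r = rng.randrange(n - 1)
            u, v = true_order[r], true_order[r + 1]
            true_order[r], true_order[r + 1] = v, u
            rank[u], rank[v] = r + 1, r
    # Total deviation: sum over items of |maintained position - true rank|.
    return sum(abs(pos - rank[item]) for pos, item in enumerate(est))
```

With `b=0` the true order never changes and the maintained list simply becomes sorted; with `b>0` one can track how the returned total deviation compares with `n` as `steps` grows.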

Submitted 22 September, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: 40 pages, 6 figures

MSC Class: 68W20; 68W40; 68P10 ACM Class: F.2.2; G.3

Journal ref: In Proceedings of the 65th IEEE Annual Symposium on Foundations of Computer Science (FOCS 2024)

arXiv:2302.03569

Label propagation on binomial random graphs

Authors: Marcos Kiwi, Lyuben Lichev, Dieter Mitsche, Paweł Prałat

Abstract: We study the behavior of a label propagation algorithm (LPA) on the Erdős-Rényi random graph $\mathcal{G}(n,p)$. Initially, given a network, each vertex starts with a random label in the interval $[0,1]$. Then, in each round of LPA, every vertex switches its label to the majority label in its neighborhood (including its own label). At the first round, ties are broken towards smaller labels, while at each of the next rounds, ties are broken uniformly at random. The algorithm terminates once all labels stay the same in two consecutive iterations. LPA is successfully used in practice for detecting communities in networks (corresponding to vertex sets with the same label after termination of the algorithm). Perhaps surprisingly, LPA's performance on dense random graphs is hard to analyze, and so far convergence to consensus was known only when $np\ge n^{3/4+\varepsilon}$, where LPA converges in three rounds. By defining an alternative label attribution procedure which converges to the label propagation algorithm after three rounds, a careful multi-stage exposure of the edges allows us to break the $n^{3/4+\varepsilon}$ barrier and show that, when $np \ge n^{5/8+\varepsilon}$, a.a.s.\ the algorithm terminates with a single label. Moreover, we show that, if $np\gg n^{2/3}$, a.a.s.\ this label is the smallest one, whereas if $n^{5/8+\varepsilon}\le np\ll n^{2/3}$, the surviving label is a.a.s.\ not the smallest one. En passant, we show a presumably new monotonicity lemma for Binomial random variables that might be of independent interest.
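The LPA rounds described in the abstract above can be sketched as follows. This is a minimal illustration, assuming synchronous updates; the helper names `gnp` and `label_propagation` and the `max_rounds` safety cap are hypothetical and not taken from the paper.

```python
import random
from collections import Counter

def gnp(n, p, rng):
    """Sample adjacency lists of a binomial random graph G(n, p)."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

def label_propagation(adj, rng, max_rounds=100):
    """Run LPA; return (final labels, number of rounds executed)."""
    n = len(adj)
    labels = [rng.random() for _ in range(n)]  # random labels in [0, 1)
    for rnd in range(1, max_rounds + 1):
        new = []
        for v in range(n):
            # Count labels in the closed neighborhood (neighbors plus v itself).
            counts = Counter(labels[u] for u in adj[v])
            counts[labels[v]] += 1
            top = max(counts.values())
            tied = [lab for lab, c in counts.items() if c == top]
            # Round 1: ties go to the smallest label; later rounds: uniform.
            new.append(min(tied) if rnd == 1 else rng.choice(tied))
        if new == labels:  # labels unchanged between consecutive rounds
            return labels, rnd
        labels = new
    return labels, max_rounds  # assumption: give up after max_rounds
```

For example, `label_propagation(gnp(200, 0.9, rng), rng)` with `rng = random.Random(0)` lets one check empirically whether a single label survives on a dense instance.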

Submitted 22 May, 2025; v1 submitted 7 February, 2023; originally announced February 2023.

Comments: 47 pages

MSC Class: 05C80; 60C05; 05D40 ACM Class: G.3; G.2.2

arXiv:2207.06956

Cover and Hitting Times of Hyperbolic Random Graphs

Authors: Marcos Kiwi, Markus Schepers, John Sylvester

Abstract: We study random walks on the giant component of Hyperbolic Random Graphs (HRGs), in the regime when the degree distribution obeys a power law with exponent in the range $(2,3)$. In particular, we first focus on the expected time for a random walk to hit a given vertex or visit, i.e. cover, all vertices. We show that, a.a.s. (with respect to the HRG), and up to multiplicative constants: the cover time is $n(\log n)^2$, the maximum hitting time is $n\log n$, and the average hitting time is $n$. We then determine the expected time to commute between two given vertices a.a.s., up to a small factor polylogarithmic in $n$, and under some mild hypothesis on the pair of vertices involved. Our results are proved by controlling effective resistances using the energy dissipated by carefully designed network flows associated to a tiling of the hyperbolic plane, on which we overlay a forest-like structure.

Submitted 1 May, 2024; v1 submitted 14 July, 2022; originally announced July 2022.

Comments: 55 pages, 4 figures. Appeared in Proceedings of RANDOM 2022

MSC Class: 05C80; 60J10; 60G40

arXiv:2003.03664

doi 10.1016/j.ejc.2021.103403

Quasi-random words and limits of word sequences

Authors: Hiêp Hàn, Marcos Kiwi, Matías Pavez-Signé

Abstract: Words are sequences of letters over a finite alphabet. We study two intimately related topics for this object: quasi-randomness and limit theory. With respect to the first topic we investigate the notion of uniform distribution of letters over intervals, and in the spirit of the famous Chung--Graham--Wilson theorem for graphs we provide a list of word properties which are equivalent to uniformity. In particular, we show that uniformity is equivalent to counting 3-letter subsequences. Inspired by graph limit theory we then investigate limits of convergent word sequences, those in which all subsequence densities converge. We show that convergent word sequences have a natural limit, namely Lebesgue measurable functions of the form $f:[0,1]\to[0,1]$. Via this theory we show that every hereditary word property is testable, address the problem of finite forcibility for word limits and establish as a byproduct a new model of random word sequences. Along the lines of the proof of the existence of word limits, we can also establish the existence of limits for higher dimensional structures. In particular, we obtain an alternative proof of the result by Hoppen, Kohayakawa, Moreira, Ráth and Sampaio [{\it J. Combin. Theory Ser. B 103(1):93--113, 2013}] establishing the existence of permutons.

Submitted 30 August, 2021; v1 submitted 7 March, 2020; originally announced March 2020.

MSC Class: 05A05 (Primary); 68Q99 (Secondary) ACM Class: G.2.1

Journal ref: European Journal of Combinatorics, Volume 98, 2021

arXiv:1308.3831

Strict majority bootstrap percolation in the r-wheel

Authors: Marcos Kiwi, Pablo Moisset de Espanés, Ivan Rapaport, Sergio Rica, Guillaume Theyssier

Abstract: In this paper we study the strict majority bootstrap percolation process on graphs. Vertices may be active or passive. Initially, active vertices are chosen independently with probability p. Each passive vertex becomes active if at least half of its neighbors are active (and thereafter never changes its state). If at the end of the process all vertices become active then we say that the initial set of active vertices percolates on the graph. We address the problem of finding graphs for which percolation is likely to occur for small values of p. Specifically, we study a graph that we call r-wheel: a ring of n vertices augmented with a universal vertex where each vertex in the ring is connected to its r closest neighbors to the left and to its r closest neighbors to the right. We prove that the critical probability is 1/4. In other words, if p>1/4 then for large values of r percolation occurs with probability arbitrarily close to 1 as n goes to infinity. On the other hand, if p<1/4 then the probability of percolation is bounded away from 1.
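The r-wheel and the activation process from the abstract above can be sketched as follows. This is an illustrative sketch with hypothetical names (`r_wheel`, `percolates`). One assumption to flag: the title says "strict majority" while the abstract's wording is "at least half", so the rule is exposed as a `strict` flag (default: strictly more than half) rather than fixed. The sketch also assumes n > 2r so that ring neighborhoods do not wrap onto themselves.

```python
def r_wheel(n, r):
    """Adjacency sets of the r-wheel: ring vertices 0..n-1, universal vertex n."""
    adj = [set() for _ in range(n + 1)]
    for v in range(n):
        for d in range(1, r + 1):
            w = (v + d) % n          # d-th closest ring neighbor on one side
            adj[v].add(w)
            adj[w].add(v)
        adj[v].add(n)                # every ring vertex touches the universal vertex
        adj[n].add(v)
    return adj

def percolates(adj, initially_active, strict=True):
    """Run the activation process to a fixed point; True if all vertices end active."""
    active = set(initially_active)
    while True:
        newly = []
        for v in range(len(adj)):
            if v in active:
                continue             # active vertices never change state
            k = sum(1 for u in adj[v] if u in active)
            # strict=True: more than half of the neighbors; False: at least half.
            ok = 2 * k > len(adj[v]) if strict else 2 * k >= len(adj[v])
            if ok:
                newly.append(v)
        if not newly:
            return len(active) == len(adj)
        active.update(newly)
```

Because activation is monotone, the final active set does not depend on the order in which vertices are updated, which is why synchronous rounds suffice here. Sampling the initial set with independent probability p and averaging `percolates` over many trials gives an empirical view of the threshold behavior around p = 1/4.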

Submitted 19 November, 2013; v1 submitted 18 August, 2013; originally announced August 2013.

Comments: 10 pages

MSC Class: 60K35; 60C05

arXiv:1305.4883

Repetition-free longest common subsequence of random sequences

Authors: Marcos Kiwi, Cristina G. Fernandes

Abstract: A repetition free Longest Common Subsequence (LCS) of two sequences x and y is an LCS of x and y where each symbol may appear at most once. Let R denote the length of a repetition free LCS of two sequences of n symbols each one chosen randomly, uniformly, and independently over a k-ary alphabet. We study the asymptotic, in n and k, behavior of R and establish that there are three distinct regimes, depending on the relative speed of growth of n and k. For each regime we establish the limiting behavior of R. In fact, we do more, since we actually establish tail bounds for large deviations of R from its limiting behavior. Our study is motivated by the so called exemplar model proposed by Sankoff (1999) and the related similarity measure introduced by Adi et al. (2007). A natural question that arises in this context, which as we show is related to long standing open problems in the area of probabilistic combinatorics, is to understand the asymptotic, in n and k, behavior of parameter R.
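To make the definition of R in the abstract above concrete, here is a brute-force sketch that computes the length of a repetition-free common subsequence of maximum length for two short sequences. It is exponential in the input length and meant only for tiny examples; the function names are hypothetical and this is not an algorithm from the paper.

```python
from itertools import combinations

def is_subsequence(s, t):
    """True if s occurs in t as a (not necessarily contiguous) subsequence."""
    it = iter(t)
    return all(ch in it for ch in s)   # `in` advances the iterator past each match

def repetition_free_lcs_length(x, y):
    """Length of a longest common subsequence of x and y with no repeated symbol."""
    # No repetition-free common subsequence can exceed the number of shared symbols.
    k_max = len(set(x) & set(y))
    for k in range(k_max, 0, -1):
        for idx in combinations(range(len(x)), k):
            cand = [x[i] for i in idx]
            if len(set(cand)) == k and is_subsequence(cand, y):
                return k
    return 0
```

For instance, for x = y = "abab" the ordinary LCS has length 4, while `repetition_free_lcs_length("abab", "abab")` is 2, since only the two distinct symbols can be used once each.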

Submitted 21 May, 2013; originally announced May 2013.

Comments: 15 pages, 1 figure

MSC Class: 05A19; 05A15 ACM Class: G.2.1

arXiv:1107.3767

Computational Hardness of Enumerating Satisfying Spin-Assignments in Triangulations

Authors: Andrea Jiménez, Marcos Kiwi

Abstract: Satisfying spin-assignments in triangulations of a surface are states of minimum energy of the antiferromagnetic Ising model on triangulations which correspond (via geometric duality) to perfect matchings in cubic bridgeless graphs. In this work we show that it is NP-complete to decide whether or not a surface triangulation admits a satisfying spin-assignment, and that it is #P-complete to determine the number of such assignments. Both results are derived via an elaborate (and atypical) reduction that maps a Boolean formula in 3-conjunctive normal form into a triangulation of an orientable closed surface.

Submitted 19 July, 2011; originally announced July 2011.

Comments: 20 pages, 25 figures

ACM Class: F.1.3; J.2

arXiv:1105.0474

Generalizations and Variants of the Largest Non-crossing Matching Problem in Random Bipartite Graphs

Authors: Marcos Kiwi, José A. Soto

Abstract: We are interested in the statistics of the length of the longest increasing subsequence of 2-rowed lexicographically sorted arrays chosen according to distinct families of distributions D = (D_n)_n, and when n goes to infinity. This framework encompasses well studied problems such as the so called Longest Increasing Subsequence problem, the Longest Common Subsequence problem, problems concerning directed bond percolation models, among others. We define several natural families of distinct distributions and characterize the asymptotic behavior of the expected length of a longest increasing subsequence chosen according to them. In particular, we consider generalizations to d-rowed arrays as well as symmetry restricted two-rowed arrays.

Submitted 3 May, 2011; originally announced May 2011.

Comments: 32 pages, 5 figures

ACM Class: G.2.1

Showing 1–8 of 8 results for author: Kiwi, M