-
Noisy group testing via spatial coupling
Authors:
Amin Coja-Oghlan,
Max Hahn-Klimroth,
Lukas Hintze,
Dominik Kaaser,
Lena Krieg,
Maurice Rolvien,
Olga Scheftelowitsch
Abstract:
We study the problem of identifying a small set $k\sim n^θ$, $0<θ<1$, of infected individuals within a large population of size $n$ by testing groups of individuals simultaneously. All tests are conducted concurrently. The goal is to minimise the total number of tests required. In this paper we make the (realistic) assumption that tests are noisy, i.e. that a group that contains an infected individual may return a negative test result, or one that does not contain an infected individual may return a positive test result, with a certain probability. The noise need not be symmetric. We develop an algorithm called SPARC that correctly identifies the set of infected individuals up to $o(k)$ errors with high probability with the asymptotically minimum number of tests. Additionally, we develop an algorithm called SPEX that exactly identifies the set of infected individuals w.h.p. with a number of tests that matches the information-theoretic lower bound for the constant column design, a powerful and well-studied test design.
Submitted 5 February, 2024;
originally announced February 2024.
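
The noise model described in this abstract can be illustrated with a minimal sketch. It is not SPARC or SPEX; it only simulates test outcomes. The function name `noisy_test_outcomes` and the parameters `p_false_neg` and `p_false_pos` are hypothetical names chosen for illustration, and the assumption that each test is flipped independently is ours.

```python
import random

def noisy_test_outcomes(pools, infected, p_false_neg, p_false_pos, rng):
    """Return one noisy 0/1 outcome per pool.

    Assumption: each test errs independently. A pool containing an infected
    individual reads negative with probability p_false_neg; a pool without
    one reads positive with probability p_false_pos. The two probabilities
    may differ, so the noise need not be symmetric.
    """
    outcomes = []
    for pool in pools:
        truly_positive = any(i in infected for i in pool)
        if truly_positive:
            flip = rng.random() < p_false_neg
        else:
            flip = rng.random() < p_false_pos
        # The observed result is the true result, inverted when flip is True.
        outcomes.append(int(truly_positive != flip))
    return outcomes

if __name__ == "__main__":
    rng = random.Random(0)
    pools = [[0, 1], [2, 3], [1, 4]]
    print(noisy_test_outcomes(pools, {1}, 0.1, 0.02, rng))
```

Setting both probabilities to zero recovers the noiseless model, in which a test is positive exactly when its pool contains an infected individual.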
-
On a Near-Optimal & Efficient Algorithm for the Sparse Pooled Data Problem
Authors:
Max Hahn-Klimroth,
Remco van der Hofstad,
Noela Müller,
Connor Riddlesden
Abstract:
The pooled data problem asks to identify the unknown labels of a set of items from condensed measurements. More precisely, given $n$ items, assume that each item has a label in $\{0,1,\ldots, d\}$, encoded via the ground-truth $\boldsymbol{\sigma}$. We call the pooled data problem sparse if the number of non-zero entries of $\boldsymbol{\sigma}$ scales as $k \sim n^θ$ for $θ\in (0,1)$. The information that is revealed about $\boldsymbol{\sigma}$ comes from pooled measurements, each indicating how many items of each label are contained in the pool. The most basic question is to design a pooling scheme that uses as few pools as possible, while reconstructing $\boldsymbol{\sigma}$ with high probability. Variants of the problem and its combinatorial ramifications have been studied for at least 35 years. However, the study of the modern question of \emph{efficient} inference of the labels has suggested a statistical-to-computational gap of order $\log n$ in the minimum number of pools needed for theoretically possible versus efficient inference. In this article, we resolve the question of whether this $\log n$-gap is artificial or of a fundamental nature by the design of an efficient algorithm, called \algoname, based upon a novel pooling scheme on a number of pools very close to the information-theoretic threshold.
Submitted 22 December, 2023;
originally announced December 2023.
-
Statistical and Computational Phase Transitions in Group Testing
Authors:
Amin Coja-Oghlan,
Oliver Gebhard,
Max Hahn-Klimroth,
Alexander S. Wein,
Ilias Zadik
Abstract:
We study the group testing problem where the goal is to identify a set of k infected individuals carrying a rare disease within a population of size n, based on the outcomes of pooled tests which return positive whenever there is at least one infected individual in the tested group. We consider two different simple random procedures for assigning individuals to tests: the constant-column design and Bernoulli design. Our first set of results concerns the fundamental statistical limits. For the constant-column design, we give a new information-theoretic lower bound which implies that the proportion of correctly identifiable infected individuals undergoes a sharp "all-or-nothing" phase transition when the number of tests crosses a particular threshold. For the Bernoulli design, we determine the precise number of tests required to solve the associated detection problem (where the goal is to distinguish between a group testing instance and pure noise), improving both the upper and lower bounds of Truong, Aldridge, and Scarlett (2020). For both group testing models, we also study the power of computationally efficient (polynomial-time) inference procedures. We determine the precise number of tests required for the class of low-degree polynomial algorithms to solve the detection problem. This provides evidence for an inherent computational-statistical gap in both the detection and recovery problems at small sparsity levels. Notably, our evidence is contrary to that of Iliopoulos and Zadik (2021), who predicted the absence of a computational-statistical gap in the Bernoulli design.
Submitted 15 June, 2022;
originally announced June 2022.
-
The full rank condition for sparse random matrices
Authors:
Amin Coja-Oghlan,
Pu Gao,
Max Hahn-Klimroth,
Joon Lee,
Noela Müller,
Maurice Rolvien
Abstract:
We derive a sufficient condition for a sparse random matrix with given numbers of non-zero entries in the rows and columns to have full row rank. The result covers both matrices over finite fields with independent non-zero entries and $\{0,1\}$-matrices over the rationals. The sufficient condition is generally necessary as well.
Submitted 7 February, 2022; v1 submitted 28 December, 2021;
originally announced December 2021.
-
Minimum degree conditions for containing an $r$-regular $r$-connected subgraph
Authors:
Max Hahn-Klimroth,
Olaf Parczyk,
Yury Person
Abstract:
We study optimal minimum degree conditions when an $n$-vertex graph $G$ contains an $r$-regular $r$-connected subgraph. We prove for $r$ fixed and $n$ large the condition to be $δ(G) \ge \frac{n+r-2}{2}$ when $nr \equiv 0 \pmod 2$. This answers a question of M. Kriesell.
Submitted 17 August, 2021;
originally announced August 2021.
-
Near optimal efficient decoding from pooled data
Authors:
Max Hahn-Klimroth,
Noela Müller
Abstract:
Consider $n$ items, each of which is characterised by one of $d+1$ possible features in $\{0, \ldots, d\}$. We study the inference task of learning these types by queries on subsets, or pools, of the items that only reveal a form of coarsened information on the features - in our case, the sum of all the features in the pool. This is a realistic scenario in situations where one has memory or technical constraints in the data collection process, or where the data is subject to anonymisation. Related prominent problems are the quantitative group testing problem, of which it is a generalisation, as well as the compressed sensing problem, of which it is a special case. In the present article, we are interested in the minimum number of queries needed to efficiently infer the labels, if one of the features, say $0$, is dominant in the sense that the number $k$ of non-zero features among the items is much smaller than $n$. It is known that in this case, all features can be recovered in exponential time by using no more than $O(k)$ queries. However, so far, all \textit{efficient} inference algorithms required at least $Ω(k\ln n)$ queries, and it was unknown whether this gap is artificial or of a fundamental nature. Here we show that indeed, the previous gap between the information-theoretic and computational bounds is not inherent to the problem by providing an efficient algorithm that succeeds with high probability and employs no more than $O(k)$ measurements. This also solves a long-standing open question for the quantitative group testing problem.
Submitted 9 February, 2022; v1 submitted 9 August, 2021;
originally announced August 2021.
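
As a small illustration of the additive query model described in this abstract (each pool reveals only the sum of the features it contains), the following sketch computes the query results for a given set of pools. It is not the paper's decoding algorithm; `additive_pool_queries` is a hypothetical name, and the example data are made up.

```python
def additive_pool_queries(features, pools):
    """Return, for each pool, the sum of the features of its items.

    features[i] is the feature of item i, a value in {0, ..., d}.
    Each pool is a list of item indices.
    """
    return [sum(features[i] for i in pool) for pool in pools]

if __name__ == "__main__":
    # Five items, feature 0 dominant; two items carry non-zero features.
    features = [0, 2, 0, 1, 0]
    pools = [[0, 1, 2], [3, 4], [1, 3]]
    print(additive_pool_queries(features, pools))
```

When every feature is 0 or 1, the same function returns the number of non-zero items per pool, which is the quantitative group testing measurement mentioned above.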
-
Inference and mutual information on random factor graphs
Authors:
Amin Coja-Oghlan,
Max Hahn-Klimroth,
Philipp Loick,
Noela Müller,
Konstantinos Panagiotou,
Matija Pasch
Abstract:
Random factor graphs provide a powerful framework for the study of inference problems such as decoding problems or the stochastic block model. Information-theoretically the key quantity of interest is the mutual information between the observed factor graph and the underlying ground truth around which the factor graph was created; in the stochastic block model, this would be the planted partition. The mutual information gauges whether and how well the ground truth can be inferred from the observable data. For a very general model of random factor graphs we verify a formula for the mutual information predicted by physics techniques. As an application we prove a conjecture about low-density generator matrix codes from [Montanari: IEEE Transactions on Information Theory 2005]. Further applications include phase transitions of the stochastic block model and the mixed $k$-spin model from physics.
Submitted 15 July, 2020;
originally announced July 2020.
-
Near optimal sparsity-constrained group testing: improved bounds and algorithms
Authors:
Oliver Gebhard,
Max Hahn-Klimroth,
Olaf Parczyk,
Manuel Penschuck,
Maurice Rolvien,
Jonathan Scarlett,
Nelvin Tan
Abstract:
Recent advances in noiseless non-adaptive group testing have led to a precise asymptotic characterization of the number of tests required for high-probability recovery in the sublinear regime $k = n^θ$ (with $θ\in (0,1)$), with $n$ individuals among which $k$ are infected. However, the required number of tests may increase substantially under real-world practical constraints, notably including bounds on the maximum number $Δ$ of tests an individual can be placed in, or the maximum number $Γ$ of individuals in a given test. While previous works have given recovery guarantees for these settings, significant gaps remain between the achievability and converse bounds. In this paper, we substantially or completely close several of the most prominent gaps. In the case of $Δ$-divisible items, we show that the definite defectives (DD) algorithm coupled with a random regular design is asymptotically optimal in dense scaling regimes, and optimal to within a factor of $\mathrm{e}$ more generally; we establish this by strengthening both the best known achievability and converse bounds. In the case of $Γ$-sized tests, we provide a comprehensive analysis of the regime $Γ= Θ(1)$, and again establish a precise threshold proving the asymptotic optimality of SCOMP (a slight refinement of DD) equipped with a tailored pooling scheme. Finally, for each of these two settings, we provide near-optimal adaptive algorithms based on sequential splitting, and provably demonstrate gaps between the performance of optimal adaptive and non-adaptive algorithms.
Submitted 22 December, 2021; v1 submitted 24 April, 2020;
originally announced April 2020.
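
The definite defectives (DD) algorithm named in this abstract can be sketched as follows for noiseless tests. This is the standard two-step form of DD as we understand it, not the paper's analysis or its pooling design; the function name `definite_defectives` and the toy pools are assumptions made for illustration.

```python
def definite_defectives(pools, outcomes, n):
    """Sketch of the DD decoder for noiseless group testing.

    pools[t] lists the individuals in test t; outcomes[t] is 0 or 1.
    Step 1: everyone appearing in a negative test is cleared.
    Step 2: a positive test with exactly one uncleared member proves
            that member is infected.
    Returns the set of individuals declared infected.
    """
    cleared = set()
    for pool, outcome in zip(pools, outcomes):
        if outcome == 0:
            cleared.update(pool)
    candidates = set(range(n)) - cleared

    declared = set()
    for pool, outcome in zip(pools, outcomes):
        if outcome == 1:
            remaining = [i for i in pool if i in candidates]
            if len(remaining) == 1:
                declared.add(remaining[0])
    return declared

if __name__ == "__main__":
    pools = [[0, 1], [2, 3], [1, 4], [0, 4]]
    outcomes = [1, 0, 1, 0]  # consistent with individual 1 being infected
    print(definite_defectives(pools, outcomes, n=5))
```

With noiseless tests DD never declares a healthy individual infected; it can only miss infected individuals whose positive tests all contain another uncleared candidate.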
-
Random perturbation of sparse graphs
Authors:
Max Hahn-Klimroth,
Giulia S. Maesaka,
Yannick Mogge,
Samuel Mohr,
Olaf Parczyk
Abstract:
In the model of randomly perturbed graphs we consider the union of a deterministic graph $\mathcal{G}_α$ with minimum degree $αn$ and the binomial random graph $\mathbb{G}(n,p)$. This model was introduced by Bohman, Frieze, and Martin and for Hamilton cycles their result bridges the gap between Dirac's theorem and the results by Pósa and Koršunov on the threshold in $\mathbb{G}(n,p)$. In this note we extend this result in $\mathcal{G}_α\cup \mathbb{G}(n,p)$ to sparser graphs with $α=o(1)$. More precisely, for any $\varepsilon>0$ and $α\colon \mathbb{N} \to (0,1)$ we show that a.a.s. $\mathcal{G}_α\cup \mathbb{G}(n,β/n)$ is Hamiltonian, where $β= -(6 + \varepsilon) \log(α)$. If $α>0$ is a fixed constant this gives the aforementioned result by Bohman, Frieze, and Martin and if $α=O(1/n)$ the random part $\mathbb{G}(n,p)$ is sufficient for a Hamilton cycle. We also discuss embeddings of bounded degree trees and other spanning structures in this model, which lead to interesting questions on almost spanning embeddings into $\mathbb{G}(n,p)$.
Submitted 9 April, 2020;
originally announced April 2020.
-
The random 2-SAT partition function
Authors:
Dimitris Achlioptas,
Amin Coja-Oghlan,
Max Hahn-Klimroth,
Joon Lee,
Noela Müller,
Manuel Penschuck,
Guangyan Zhou
Abstract:
We show that throughout the satisfiable phase the normalised number of satisfying assignments of a random $2$-SAT formula converges in probability to an expression predicted by the cavity method from statistical physics. The proof is based on showing that the Belief Propagation algorithm renders the correct marginal probability that a variable is set to `true' under a uniformly random satisfying assignment.
Submitted 10 February, 2020;
originally announced February 2020.
-
Optimal group testing
Authors:
Amin Coja-Oghlan,
Oliver Gebhard,
Max Hahn-Klimroth,
Philipp Loick
Abstract:
In the group testing problem the aim is to identify a small set of $k\sim n^θ$ infected individuals out of a population of size $n$, $0<θ<1$. We avail ourselves of a test procedure capable of testing groups of individuals, with the test returning a positive result iff at least one individual in the group is infected. The aim is to devise a test design with as few tests as possible so that the set of infected individuals can be identified correctly with high probability. We establish an explicit sharp information-theoretic/algorithmic phase transition $m_{\mathrm{inf}}$ for non-adaptive group testing, where all tests are conducted in parallel. Thus, with more than $m_{\mathrm{inf}}$ tests the infected individuals can be identified in polynomial time w.h.p., while learning the set of infected individuals is information-theoretically impossible with fewer tests. In addition, we develop an optimal adaptive scheme where the tests are conducted in two stages.
Submitted 18 April, 2020; v1 submitted 6 November, 2019;
originally announced November 2019.
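
The non-adaptive setting in this abstract (all tests fixed in advance, each positive iff its group contains an infected individual) can be simulated with a short sketch. The design below places every individual in the same number of tests, in the spirit of the constant-column design mentioned in other entries on this page; drawing the tests without replacement is our assumption, and the names `constant_column_design`, `noiseless_outcomes` and `delta` are hypothetical. It is not the paper's test design or decoder.

```python
import random

def constant_column_design(n, m, delta, rng):
    """Assign each of n individuals to delta distinct tests out of m.

    Assumption: tests are sampled uniformly without replacement, so every
    individual appears in exactly delta different pools.
    """
    pools = [[] for _ in range(m)]
    for individual in range(n):
        for t in rng.sample(range(m), delta):
            pools[t].append(individual)
    return pools

def noiseless_outcomes(pools, infected):
    """A test is positive (1) iff its pool contains an infected individual."""
    return [int(any(i in infected for i in pool)) for pool in pools]

if __name__ == "__main__":
    rng = random.Random(1)
    pools = constant_column_design(n=20, m=8, delta=3, rng=rng)
    print(noiseless_outcomes(pools, infected={4, 11}))
```

Because all pools are fixed before any outcome is seen, the tests can be run in parallel, which is what distinguishes this setting from the two-stage adaptive scheme the abstract also mentions.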
-
The cut metric for probability distributions
Authors:
Amin Coja-Oghlan,
Max Hahn-Klimroth
Abstract:
Guided by the theory of graph limits, we investigate a variant of the cut metric for limit objects of sequences of discrete probability distributions. Apart from establishing basic results, we introduce a natural operation called {\em pinning} on the space of limit objects and show how this operation yields a canonical cut metric approximation to a given probability distribution akin to the weak regularity lemma for graphons. We also establish the cut metric continuity of basic operations such as taking product measures.
Submitted 30 November, 2020; v1 submitted 31 May, 2019;
originally announced May 2019.