-
Parameterized Algorithms on Integer Sets with Small Doubling: Integer Programming, Subset Sum and k-SUM
Authors:
Tim Randolph,
Karol Węgrzycki
Abstract:
We study the parameterized complexity of algorithmic problems whose input is an integer set $A$ in terms of the doubling constant $C := |A + A|/|A|$, a fundamental measure of additive structure. We present evidence that this new parameterization is algorithmically useful in the form of new results for two difficult, well-studied problems: Integer Programming and Subset Sum.
First, we show that determining the feasibility of bounded Integer Programs is a tractable problem when parameterized in the doubling constant. Specifically, we prove that the feasibility of an integer program $I$ with $n$ polynomially-bounded variables and $m$ constraints can be determined in time $n^{O_C(1)} \mathrm{poly}(|I|)$ when the column set of the constraint matrix has doubling constant $C$.
Second, we show that the Subset Sum and Unbounded Subset Sum problems can be solved in time $n^{O_C(1)}$ and $n^{O_C(\log \log \log n)}$, respectively, where the $O_C$ notation hides functions that depend only on the doubling constant $C$. We also show the equivalence of achieving an FPT algorithm for Subset Sum with bounded doubling and achieving a milestone result for the parameterized complexity of Box ILP. Finally, we design near-linear time algorithms for $k$-SUM as well as tight lower bounds for 4-SUM and nearly tight lower bounds for $k$-SUM, under the $k$-SUM conjecture.
Several of our results rely on a new proof that Freiman's Theorem, a central result in additive combinatorics, can be made efficiently constructive. This result may be of independent interest.
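As a small illustration of the parameter (not taken from the paper), the doubling constant $C = |A + A|/|A|$ can be computed directly from its definition. The function name `doubling_constant` and the two example sets below are assumptions made for this sketch.

```python
def doubling_constant(A):
    """Return |A + A| / |A| for a finite, nonempty collection of integers."""
    A = set(A)
    if not A:
        raise ValueError("A must be nonempty")
    # A + A is the set of all pairwise sums a + b with a, b in A.
    sumset = {a + b for a in A for b in A}
    return len(sumset) / len(A)


# Hypothetical examples: a structured set and an unstructured one.
progression = range(0, 50, 5)          # 10-term arithmetic progression
powers = [2 ** i for i in range(10)]   # 10 powers of two

print(doubling_constant(progression))  # small: sums collapse onto few values
print(doubling_constant(powers))       # large: almost all sums are distinct
```

An arithmetic progression of length $k$ has $|A + A| = 2k - 1$, so its doubling constant stays below 2, while a set with no additive structure can have a doubling constant that grows linearly in $|A|$.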
Submitted 25 July, 2024;
originally announced July 2024.
-
Testing Sumsets is Hard
Authors:
Xi Chen,
Shivam Nadimpalli,
Tim Randolph,
Rocco A. Servedio,
Or Zamir
Abstract:
A subset $S$ of the Boolean hypercube $\mathbb{F}_2^n$ is a sumset if $S = \{a + b : a, b\in A\}$ for some $A \subseteq \mathbb{F}_2^n$. Sumsets are central objects of study in additive combinatorics, featuring in several influential results. We prove a lower bound of $\Omega(2^{n/2})$ for the number of queries needed to test whether a Boolean function $f:\mathbb{F}_2^n \to \{0,1\}$ is the indicator function of a sumset. Our lower bound for testing sumsets follows from sharp bounds on the related problem of shift testing, which may be of independent interest. We also give a near-optimal $2^{n/2} \cdot \mathrm{poly}(n)$-query algorithm for a smoothed analysis formulation of the sumset refutation problem.
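To make the definition of a sumset in $\mathbb{F}_2^n$ concrete, the following sketch encodes vectors as integers in $[0, 2^n)$, so that addition in $\mathbb{F}_2^n$ becomes bitwise XOR. The exhaustive check `is_sumset` is an assumption of this illustration and is unrelated to the query algorithms or lower bounds in the paper; it is only feasible for very small $n$.

```python
from itertools import combinations


def sumset(A):
    """Return {a + b : a, b in A} over F_2^n, with vectors encoded as ints."""
    return {a ^ b for a in A for b in A}


def is_sumset(S, n):
    """Exhaustively decide whether S equals A + A for some A in F_2^n.

    Tries every subset A of the 2^n vectors, so it is doubly exponential
    in n and meant only as a definitional check for tiny n.
    """
    S = set(S)
    universe = range(2 ** n)
    for size in range(len(universe) + 1):
        for A in combinations(universe, size):
            if sumset(A) == S:
                return True
    return False


print(is_sumset({0, 1}, 2))     # A = {0, 1} works
print(is_sumset({1}, 2))        # a nonempty sumset must contain 0 = a + a
```

Because $a + a = 0$ in $\mathbb{F}_2^n$, every nonempty sumset contains the zero vector, which gives a quick necessary condition.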
Submitted 4 February, 2024; v1 submitted 14 January, 2024;
originally announced January 2024.
-
Subset Sum in Time $2^{n/2} / poly(n)$
Authors:
Xi Chen,
Yaonan Jin,
Tim Randolph,
Rocco A. Servedio
Abstract:
A major goal in the area of exact exponential algorithms is to give an algorithm for the (worst-case) $n$-input Subset Sum problem that runs in time $2^{(1/2 - c)n}$ for some constant $c>0$. In this paper we give a Subset Sum algorithm with worst-case running time $O(2^{n/2} \cdot n^{-\gamma})$ for a constant $\gamma > 0.5023$ in standard word RAM or circuit RAM models. To the best of our knowledge, this is the first improvement on the classical "meet-in-the-middle" algorithm for worst-case Subset Sum, due to Horowitz and Sahni, which can be implemented in time $O(2^{n/2})$ in these memory models.
Our algorithm combines a number of different techniques, including the "representation method" introduced by Howgrave-Graham and Joux and subsequent adaptations of the method in Austrin, Kaski, Koivisto, and Nederlof, and Nederlof and Węgrzycki, and "bit-packing" techniques used in the work of Baran, Demaine, and Patrascu on subquadratic algorithms for 3SUM.
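For readers unfamiliar with the baseline being improved, here is a minimal sketch of the meet-in-the-middle idea for Subset Sum. It is a simplified variant that uses a hash set for the lookup rather than the sorted-list merge of Horowitz and Sahni, and it does not implement the paper's algorithm; the function names are assumptions of this illustration.

```python
def subset_sums(nums):
    """Return the sums of all 2^len(nums) subsets of nums (with repeats)."""
    sums = [0]
    for x in nums:
        # Each existing sum either excludes x (kept) or includes x (appended).
        sums += [s + x for s in sums]
    return sums


def has_subset_sum(nums, target):
    """Decide whether some subset of nums sums to target.

    Splits the input into two halves, enumerates about 2^(n/2) subset sums
    per half, and looks for a left sum and a right sum adding up to target.
    """
    half = len(nums) // 2
    left = subset_sums(nums[:half])
    right = set(subset_sums(nums[half:]))
    return any(target - s in right for s in left)


print(has_subset_sum([3, 34, 4, 12, 5, 2], 9))   # 4 + 5 = 9
print(has_subset_sum([3, 34, 4, 12, 5, 2], 30))  # no subset reaches 30
```

Both halves are enumerated in time and space proportional to $2^{n/2}$, which is the benchmark the abstract refers to.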
Submitted 29 January, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
-
Average-Case Subset Balancing Problems
Authors:
Xi Chen,
Yaonan Jin,
Tim Randolph,
Rocco A. Servedio
Abstract:
Given a set of $n$ input integers, the Equal Subset Sum problem asks us to find two distinct subsets with the same sum. In this paper we present an algorithm that runs in time $O^*(3^{0.387n})$ in the average case, significantly improving over the $O^*(3^{0.488n})$ running time of the best known worst-case algorithm and the Meet-in-the-Middle benchmark of $O^*(3^{0.5n})$.
Our algorithm generalizes to a number of related problems, such as the "Generalized Equal Subset Sum" problem, which asks us to assign a coefficient $c_i$ from a set $C$ to each input number $x_i$ such that $\sum_{i} c_i x_i = 0$. Our algorithm for the average-case version of this problem runs in time $|C|^{(0.5-c_0/|C|)n}$ for some positive constant $c_0$, whenever $C=\{0, \pm 1, \dots, \pm d\}$ or $\{\pm 1, \dots, \pm d\}$ for some positive integer $d$ (with $O^*(|C|^{0.45n})$ when $|C|<10$). Our results extend to the problem of finding "nearly balanced" solutions in which the target is a not-too-large nonzero offset $\tau$.
Our approach relies on new structural results that characterize the probability that $\sum_{i} c_i x_i = \tau$ has a solution $c \in C^n$ when $x_i$'s are chosen randomly; these results may be of independent interest. Our algorithm is inspired by the "representation technique" introduced by Howgrave-Graham and Joux. This requires several new ideas to overcome preprocessing hurdles that arise in the representation framework, as well as a novel application of dynamic programming in the solution recovery phase of the algorithm.
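The Meet-in-the-Middle benchmark mentioned above can be sketched as follows, using the coefficient view of the problem: two disjoint subsets with equal sums correspond to a nonzero vector $c \in \{-1, 0, 1\}^n$ with $\sum_i c_i x_i = 0$. This is only an illustration of the $O^*(3^{0.5n})$ baseline, not the paper's average-case algorithm, and the function names are assumptions of this sketch.

```python
from itertools import product


def signed_sums(nums):
    """Map each value of sum(c_i * x_i), c_i in {-1, 0, 1}, to one vector c.

    When several vectors give the same value, a nonzero vector is preferred
    over the all-zero one.
    """
    table = {}
    for coeffs in product((-1, 0, 1), repeat=len(nums)):
        s = sum(c * x for c, x in zip(coeffs, nums))
        if s not in table or not any(table[s]):
            table[s] = coeffs
    return table


def equal_subset_sum(nums):
    """Return two disjoint sublists of nums with equal sums, or None."""
    half = len(nums) // 2
    left = signed_sums(nums[:half])
    right_nums = nums[half:]
    for r in product((-1, 0, 1), repeat=len(right_nums)):
        s = sum(c * x for c, x in zip(r, right_nums))
        l = left.get(-s)
        # Reject only the trivial all-zero assignment.
        if l is not None and (any(l) or any(r)):
            coeffs = l + r
            plus = [x for c, x in zip(coeffs, nums) if c == 1]
            minus = [x for c, x in zip(coeffs, nums) if c == -1]
            return plus, minus
    return None


print(equal_subset_sum([3, 5, 8, 13]))  # some pair of sublists with equal sums
print(equal_subset_sum([1, 2, 4, 8]))   # None: all subset sums are distinct
```

Each half contributes $3^{n/2}$ coefficient vectors, which is where the $3^{0.5n}$ bound comes from.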
Submitted 27 October, 2021;
originally announced October 2021.
-
A Lower Bound on Cycle-Finding in Sparse Digraphs
Authors:
Xi Chen,
Tim Randolph,
Rocco A. Servedio,
Timothy Sun
Abstract:
We consider the problem of finding a cycle in a sparse directed graph $G$ that is promised to be far from acyclic, meaning that the smallest feedback arc set in $G$ is large. We prove an information-theoretic lower bound, showing that for $N$-vertex graphs with constant outdegree any algorithm for this problem must make $\tilde{\Omega}(N^{5/9})$ queries to an adjacency list representation of $G$. In the language of property testing, our result is an $\tilde{\Omega}(N^{5/9})$ lower bound on the query complexity of one-sided algorithms for testing whether sparse digraphs with constant outdegree are far from acyclic. This is the first improvement on the $\Omega(\sqrt{N})$ lower bound, implicit in Bender and Ron, which follows from a simple birthday paradox argument.
Submitted 28 July, 2019;
originally announced July 2019.
-
Incorporating Deep Features in the Analysis of Tissue Microarray Images
Authors:
Donghui Yan,
Timothy W. Randolph,
Jian Zou,
Peng Gong
Abstract:
Tissue microarray (TMA) images have been used increasingly often in cancer studies and the validation of biomarkers. TACOMA, a cutting-edge automatic scoring algorithm for TMA images, is comparable to pathologists in terms of accuracy and repeatability. Here we consider how this algorithm may be further improved. Inspired by the recent success of deep learning, we propose to incorporate representations learnable through computation. We explore representations of a group nature through unsupervised learning, e.g., hierarchical clustering and recursive space partition. Information carried by clustering or spatial partitioning may be more concrete than the labels when the data are heterogeneous, or could help when the labels are noisy. The use of such information could be viewed as regularization in model fitting. It is motivated by major challenges in TMA image scoring (heterogeneity and label noise) and by the cluster assumption in semi-supervised learning. Using this information on TMA images of breast cancer, we have reduced the error rate of TACOMA by about 6%. Further simulations on synthetic data provide insights on when such representations would likely help. Although we focus on TMAs, learnable representations of this type are expected to be applicable in other settings.
Submitted 25 November, 2018;
originally announced December 2018.
-
(k,p)-Planarity: A Relaxation of Hybrid Planarity
Authors:
Emilio Di Giacomo,
William J. Lenhart,
Giuseppe Liotta,
Timothy W. Randolph,
Alessandra Tappini
Abstract:
We present a new model for hybrid planarity that relaxes existing hybrid representations. A graph $G = (V,E)$ is $(k,p)$-planar if $V$ can be partitioned into clusters of size at most $k$ such that $G$ admits a drawing where: (i) each cluster is associated with a closed, bounded planar region, called a cluster region; (ii) cluster regions are pairwise disjoint; (iii) each vertex $v \in V$ is identified with at most $p$ distinct points, called ports, on the boundary of its cluster region; (iv) each inter-cluster edge $(u,v) \in E$ is identified with a Jordan arc connecting a port of $u$ to a port of $v$; (v) inter-cluster edges do not cross or intersect cluster regions except at their endpoints. We first tightly bound the number of edges in a $(k,p)$-planar graph with $p<k$. We then prove that $(4,1)$-planarity testing and $(2,2)$-planarity testing are NP-complete problems. Finally, we prove that neither the class of $(2,2)$-planar graphs nor the class of $1$-planar graphs contains the other, indicating that the $(k,p)$-planar graphs are a large and novel class.
Submitted 21 September, 2018; v1 submitted 29 June, 2018;
originally announced June 2018.
-
Statistical methods for tissue array images - algorithmic scoring and co-training
Authors:
Donghui Yan,
Pei Wang,
Michael Linden,
Beatrice Knudsen,
Timothy Randolph
Abstract:
Recent advances in tissue microarray technology have allowed immunohistochemistry to become a powerful medium-to-high throughput analysis tool, particularly for the validation of diagnostic and prognostic biomarkers. However, as study size grows, the manual evaluation of these assays becomes a prohibitive limitation; it vastly reduces throughput and greatly increases variability and expense. We propose an algorithm - Tissue Array Co-Occurrence Matrix Analysis (TACOMA) - for quantifying cellular phenotypes based on textural regularity summarized by local inter-pixel relationships. The algorithm can be easily trained for any staining pattern, is free of sensitive tuning parameters, and can report the salient pixels in an image that contribute to its score. Pathologists' input via informative training patches is an important aspect of the algorithm that allows training for any specific marker or cell type. With co-training, the error rate of TACOMA can be reduced substantially for a very small training sample (e.g., with size 30). We give theoretical insights into the success of co-training via thinning of the feature set in a high-dimensional setting when there is "sufficient" redundancy among the features. TACOMA is flexible, transparent and provides a scoring process that can be evaluated with clarity and confidence. In a study based on an estrogen receptor (ER) marker, we show that TACOMA is comparable to, or outperforms, pathologists' performance in terms of accuracy and repeatability.
Submitted 1 October, 2012; v1 submitted 31 January, 2011;
originally announced February 2011.