Search | arXiv e-print repository

Parameterized Algorithms on Integer Sets with Small Doubling: Integer Programming, Subset Sum and k-SUM

Abstract: We study the parameterized complexity of algorithmic problems whose input is an integer set $A$ in terms of the doubling constant $C := |A + A|/|A|$, a fundamental measure of additive structure. We present evidence that this new parameterization is algorithmically useful in the form of new results for two difficult, well-studied problems: Integer Programming and Subset Sum. First, we show that d… ▽ More We study the parameterized complexity of algorithmic problems whose input is an integer set $A$ in terms of the doubling constant $C := |A + A|/|A|$, a fundamental measure of additive structure. We present evidence that this new parameterization is algorithmically useful in the form of new results for two difficult, well-studied problems: Integer Programming and Subset Sum. First, we show that determining the feasibility of bounded Integer Programs is a tractable problem when parameterized in the doubling constant. Specifically, we prove that the feasibility of an integer program $I$ with $n$ polynomially-bounded variables and $m$ constraints can be determined in time $n^{O_C(1)} poly(|I|)$ when the column set of the constraint matrix has doubling constant $C$. Second, we show that the Subset Sum and Unbounded Subset Sum problems can be solved in time $n^{O_C(1)}$ and $n^{O_C(\log \log \log n)}$, respectively, where the $O_C$ notation hides functions that depend only on the doubling constant $C$. We also show the equivalence of achieving an FPT algorithm for Subset Sum with bounded doubling and achieving a milestone result for the parameterized complexity of Box ILP. Finally, we design near-linear time algorithms for $k$-SUM as well as tight lower bounds for 4-SUM and nearly tight lower bounds for $k$-SUM, under the $k$-SUM conjecture. Several of our results rely on a new proof that Freiman's Theorem, a central result in additive combinatorics, can be made efficiently constructive. This result may be of independent interest. △ Less

Submitted 25 July, 2024; originally announced July 2024.

Comments: 24 pages, 0 figures

ACM Class: F.2.1; F.2.2

arXiv:2401.07242 [pdf, ps, other]

Testing Sumsets is Hard

Authors: Xi Chen, Shivam Nadimpalli, Tim Randolph, Rocco A. Servedio, Or Zamir

Abstract: A subset $S$ of the Boolean hypercube $\mathbb{F}_2^n$ is a sumset if $S = \{a + b : a, b\in A\}$ for some $A \subseteq \mathbb{F}_2^n$. Sumsets are central objects of study in additive combinatorics, featuring in several influential results. We prove a lower bound of $Ω(2^{n/2})$ for the number of queries needed to test whether a Boolean function $f:\mathbb{F}_2^n \to \{0,1\}$ is the indicator fu… ▽ More A subset $S$ of the Boolean hypercube $\mathbb{F}_2^n$ is a sumset if $S = \{a + b : a, b\in A\}$ for some $A \subseteq \mathbb{F}_2^n$. Sumsets are central objects of study in additive combinatorics, featuring in several influential results. We prove a lower bound of $Ω(2^{n/2})$ for the number of queries needed to test whether a Boolean function $f:\mathbb{F}_2^n \to \{0,1\}$ is the indicator function of a sumset. Our lower bound for testing sumsets follows from sharp bounds on the related problem of shift testing, which may be of independent interest. We also give a near-optimal $2^{n/2} \cdot \mathrm{poly}(n)$-query algorithm for a smoothed analysis formulation of the sumset refutation problem. △ Less

Submitted 4 February, 2024; v1 submitted 14 January, 2024; originally announced January 2024.

Comments: 18 pages

arXiv:2301.07134 [pdf, other]

Subset Sum in Time $2^{n/2} / poly(n)$

Authors: Xi Chen, Yaonan Jin, Tim Randolph, Rocco A. Servedio

Abstract: A major goal in the area of exact exponential algorithms is to give an algorithm for the (worst-case) $n$-input Subset Sum problem that runs in time $2^{(1/2 - c)n}$ for some constant $c>0$. In this paper we give a Subset Sum algorithm with worst-case running time $O(2^{n/2} \cdot n^{-γ})$ for a constant $γ> 0.5023$ in standard word RAM or circuit RAM models. To the best of our knowledge, this is… ▽ More A major goal in the area of exact exponential algorithms is to give an algorithm for the (worst-case) $n$-input Subset Sum problem that runs in time $2^{(1/2 - c)n}$ for some constant $c>0$. In this paper we give a Subset Sum algorithm with worst-case running time $O(2^{n/2} \cdot n^{-γ})$ for a constant $γ> 0.5023$ in standard word RAM or circuit RAM models. To the best of our knowledge, this is the first improvement on the classical ``meet-in-the-middle'' algorithm for worst-case Subset Sum, due to Horowitz and Sahni, which can be implemented in time $O(2^{n/2})$ in these memory models. Our algorithm combines a number of different techniques, including the ``representation method'' introduced by Howgrave-Graham and Joux and subsequent adaptations of the method in Austrin, Kaski, Koivisto, and Nederlof, and Nederlof and Wegrzycki, and ``bit-packing'' techniques used in the work of Baran, Demaine, and Patrascu on subquadratic algorithms for 3SUM. △ Less

Submitted 29 January, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: 26 pages, 9 figures

MSC Class: 68Q25

arXiv:2110.14607 [pdf, other]

Average-Case Subset Balancing Problems

Authors: Xi Chen, Yaonan Jin, Tim Randolph, Rocco A. Servedio

Abstract: Given a set of $n$ input integers, the Equal Subset Sum problem asks us to find two distinct subsets with the same sum. In this paper we present an algorithm that runs in time $O^*(3^{0.387n})$ in the~average case, significantly improving over the $O^*(3^{0.488n})$ running time of the best known worst-case algorithm and the Meet-in-the-Middle benchmark of $O^*(3^{0.5n})$. Our algorithm generaliz… ▽ More Given a set of $n$ input integers, the Equal Subset Sum problem asks us to find two distinct subsets with the same sum. In this paper we present an algorithm that runs in time $O^*(3^{0.387n})$ in the~average case, significantly improving over the $O^*(3^{0.488n})$ running time of the best known worst-case algorithm and the Meet-in-the-Middle benchmark of $O^*(3^{0.5n})$. Our algorithm generalizes to a number of related problems, such as the ``Generalized Equal Subset Sum'' problem, which asks us to assign a coefficient $c_i$ from a set $C$ to each input number $x_i$ such that $\sum_{i} c_i x_i = 0$. Our algorithm for the average-case version of this problem runs in~time $|C|^{(0.5-c_0/|C|)n}$ for some positive constant $c_0$, whenever $C=\{0, \pm 1, \dots, \pm d\}$ or $\{\pm 1, \dots, \pm d\}$ for some positive integer $d$ (with $O^*(|C|^{0.45n})$ when $|C|<10$). Our results extend to the~problem of finding ``nearly balanced'' solutions in which the target is a not-too-large nonzero offset $τ$. Our approach relies on new structural results that characterize the probability that $\sum_{i} c_i x_i$ $=τ$ has a solution $c \in C^n$ when $x_i$'s are chosen randomly; these results may be of independent interest. Our algorithm is inspired by the ``representation technique'' introduced by Howgrave-Graham and Joux. This requires several new ideas to overcome preprocessing hurdles that arise in the representation framework, as well as a novel application of dynamic programming in the solution recovery phase of the algorithm. △ Less

Submitted 27 October, 2021; originally announced October 2021.

Comments: 44 pages, 5 figures

MSC Class: 68Q25 ACM Class: F.2.1

arXiv:1907.12106 [pdf, other]

A Lower Bound on Cycle-Finding in Sparse Digraphs

Authors: Xi Chen, Tim Randolph, Rocco A. Servedio, Timothy Sun

Abstract: We consider the problem of finding a cycle in a sparse directed graph $G$ that is promised to be far from acyclic, meaning that the smallest feedback arc set in $G$ is large. We prove an information-theoretic lower bound, showing that for $N$-vertex graphs with constant outdegree any algorithm for this problem must make $\tildeΩ(N^{5/9})$ queries to an adjacency list representation of $G$. In the… ▽ More We consider the problem of finding a cycle in a sparse directed graph $G$ that is promised to be far from acyclic, meaning that the smallest feedback arc set in $G$ is large. We prove an information-theoretic lower bound, showing that for $N$-vertex graphs with constant outdegree any algorithm for this problem must make $\tildeΩ(N^{5/9})$ queries to an adjacency list representation of $G$. In the language of property testing, our result is an $\tildeΩ(N^{5/9})$ lower bound on the query complexity of one-sided algorithms for testing whether sparse digraphs with constant outdegree are far from acyclic. This is the first improvement on the $Ω(\sqrt{N})$ lower bound, implicit in Bender and Ron, which follows from a simple birthday paradox argument. △ Less

Submitted 28 July, 2019; originally announced July 2019.

Comments: 25 pages, 2 figures

arXiv:1812.00887 [pdf, ps, other]

Incorporating Deep Features in the Analysis of Tissue Microarray Images

Authors: Donghui Yan, Timothy W. Randolph, Jian Zou, Peng Gong

Abstract: Tissue microarray (TMA) images have been used increasingly often in cancer studies and the validation of biomarkers. TACOMA---a cutting-edge automatic scoring algorithm for TMA images---is comparable to pathologists in terms of accuracy and repeatability. Here we consider how this algorithm may be further improved. Inspired by the recent success of deep learning, we propose to incorporate represen… ▽ More Tissue microarray (TMA) images have been used increasingly often in cancer studies and the validation of biomarkers. TACOMA---a cutting-edge automatic scoring algorithm for TMA images---is comparable to pathologists in terms of accuracy and repeatability. Here we consider how this algorithm may be further improved. Inspired by the recent success of deep learning, we propose to incorporate representations learnable through computation. We explore representations of a group nature through unsupervised learning, e.g., hierarchical clustering and recursive space partition. Information carried by clustering or spatial partitioning may be more concrete than the labels when the data are heterogeneous, or could help when the labels are noisy. The use of such information could be viewed as regularization in model fitting. It is motivated by major challenges in TMA image scoring---heterogeneity and label noise, and the cluster assumption in semi-supervised learning. Using this information on TMA images of breast cancer, we have reduced the error rate of TACOMA by about 6%. Further simulations on synthetic data provide insights on when such representations would likely help. Although we focus on TMAs, learnable representations of this type are expected to be applicable in other settings. △ Less

Submitted 25 November, 2018; originally announced December 2018.

Comments: 23 pages, 6 figures

arXiv:1806.11413 [pdf, other]

(k,p)-Planarity: A Relaxation of Hybrid Planarity

Authors: Emilio Di Giacomo, William J. Lenhart, Giuseppe Liotta, Timothy W. Randolph, Alessandra Tappini

Abstract: We present a new model for hybrid planarity that relaxes existing hybrid representations. A graph $G = (V,E)$ is $(k,p)$-planar if $V$ can be partitioned into clusters of size at most $k$ such that $G$ admits a drawing where: (i) each cluster is associated with a closed, bounded planar region, called a cluster region; (ii) cluster regions are pairwise disjoint, (iii) each vertex $v \in V$ is ident… ▽ More We present a new model for hybrid planarity that relaxes existing hybrid representations. A graph $G = (V,E)$ is $(k,p)$-planar if $V$ can be partitioned into clusters of size at most $k$ such that $G$ admits a drawing where: (i) each cluster is associated with a closed, bounded planar region, called a cluster region; (ii) cluster regions are pairwise disjoint, (iii) each vertex $v \in V$ is identified with at most $p$ distinct points, called \emph{ports}, on the boundary of its cluster region; (iv) each inter-cluster edge $(u,v) \in E$ is identified with a Jordan arc connecting a port of $u$ to a port of $v$; (v) inter-cluster edges do not cross or intersect cluster regions except at their endpoints. We first tightly bound the number of edges in a $(k,p)$-planar graph with $p<k$. We then prove that $(4,1)$-planarity testing and $(2,2)$-planarity testing are NP-complete problems. Finally, we prove that neither the class of $(2,2)$-planar graphs nor the class of $1$-planar graphs contains the other, indicating that the $(k,p)$-planar graphs are a large and novel class. △ Less

Submitted 21 September, 2018; v1 submitted 29 June, 2018; originally announced June 2018.

arXiv:1102.0059 [pdf, ps, other]

doi 10.1214/12-AOAS543

Statistical methods for tissue array images - algorithmic scoring and co-training

Authors: Donghui Yan, Pei Wang, Michael Linden, Beatrice Knudsen, Timothy Randolph

Abstract: Recent advances in tissue microarray technology have allowed immunohistochemistry to become a powerful medium-to-high throughput analysis tool, particularly for the validation of diagnostic and prognostic biomarkers. However, as study size grows, the manual evaluation of these assays becomes a prohibitive limitation; it vastly reduces throughput and greatly increases variability and expense. We pr… ▽ More Recent advances in tissue microarray technology have allowed immunohistochemistry to become a powerful medium-to-high throughput analysis tool, particularly for the validation of diagnostic and prognostic biomarkers. However, as study size grows, the manual evaluation of these assays becomes a prohibitive limitation; it vastly reduces throughput and greatly increases variability and expense. We propose an algorithm - Tissue Array Co-Occurrence Matrix Analysis (TACOMA) - for quantifying cellular phenotypes based on textural regularity summarized by local inter-pixel relationships. The algorithm can be easily trained for any staining pattern, is absent of sensitive tuning parameters and has the ability to report salient pixels in an image that contribute to its score. Pathologists' input via informative training patches is an important aspect of the algorithm that allows the training for any specific marker or cell type. With co-training, the error rate of TACOMA can be reduced substantially for a very small training sample (e.g., with size 30). We give theoretical insights into the success of co-training via thinning of the feature set in a high-dimensional setting when there is "sufficient" redundancy among the features. TACOMA is flexible, transparent and provides a scoring process that can be evaluated with clarity and confidence. In a study based on an estrogen receptor (ER) marker, we show that TACOMA is comparable to, or outperforms, pathologists' performance in terms of accuracy and repeatability. △ Less

Submitted 1 October, 2012; v1 submitted 31 January, 2011; originally announced February 2011.

Comments: Published in at http://dx.doi.org/10.1214/12-AOAS543 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

Report number: IMS-AOAS-AOAS543

Journal ref: Annals of Applied Statistics 2012, Vol. 6, No. 3, 1280-1305

Showing 1–8 of 8 results for author: Randolph, T