Search | arXiv e-print repository

PARQO: Penalty-Aware Robust Plan Selection in Query Optimization

Authors: Haibo Xiu, Pankaj K. Agarwal, Jun Yang

Abstract: The effectiveness of a query optimizer relies on the accuracy of selectivity estimates. The execution plan generated by the optimizer can be extremely poor in reality due to uncertainty in these estimates. This paper presents PARQO (Penalty-Aware Robust Plan Selection in Query Optimization), a novel system where users can define powerful robustness metrics that assess the expected penalty of a plan with respect to true optimal plans under uncertain selectivity estimates. PARQO uses workload-informed profiling to build error models, and employs principled sensitivity analysis techniques to identify human-interpretable selectivity dimensions with the largest impact on penalty. Experiments on three benchmarks demonstrate that PARQO finds robust, performant plans, and enables efficient and effective parametric optimization.

Submitted 15 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: This paper has been accepted with shepherding by VLDB 2024 (Vol 17)

arXiv:2403.16312 [pdf, other]

On Reporting Durable Patterns in Temporal Proximity Graphs

Authors: Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, Jun Yang

Abstract: Finding patterns in graphs is a fundamental problem in databases and data mining. In many applications, graphs are temporal and evolve over time, so we are interested in finding durable patterns, such as triangles and paths, which persist over a long time. While there has been work on finding durable simple patterns, existing algorithms do not have provable guarantees and run in strictly super-linear time. The paper leverages the observation that many graphs arising in practice are naturally proximity graphs or can be approximated as such, where nodes are embedded as points in some high-dimensional space, and two nodes are connected by an edge if they are close to each other. We work with an implicit representation of the proximity graph, where nodes are additionally annotated by time intervals, and design near-linear-time algorithms for finding (approximately) durable patterns above a given durability threshold. We also consider an interactive setting where a client experiments with different durability thresholds in a sequence of queries; we show how to compute incremental changes to result patterns efficiently in time near-linear in the size of the changes.

Submitted 24 March, 2024; originally announced March 2024.

Journal ref: PODS 2024

arXiv:2403.12276 [pdf, ps, other]

Semi-Algebraic Off-line Range Searching and Biclique Partitions in the Plane

Authors: Pankaj K. Agarwal, Esther Ezra, Micha Sharir

Abstract: Let $P$ be a set of $m$ points in ${\mathbb R}^2$, let $Σ$ be a set of $n$ semi-algebraic sets of constant complexity in ${\mathbb R}^2$, let $(S,+)$ be a semigroup, and let $w: P \rightarrow S$ be a weight function on the points of $P$. We describe a randomized algorithm for computing $w(P\capσ)$ for every $σ\inΣ$ in overall expected time $O^*\bigl( m^{\frac{2s}{5s-4}}n^{\frac{5s-6}{5s-4}} + m^{2/3}n^{2/3} + m + n \bigr)$, where $s>0$ is a constant that bounds the maximum complexity of the regions of $Σ$, and where the $O^*(\cdot)$ notation hides subpolynomial factors. For $s\ge 3$, surprisingly, this bound is smaller than the best-known bound for answering $m$ such queries in an on-line manner. The latter takes $O^*(m^{\frac{s}{2s-1}}n^{\frac{2s-2}{2s-1}}+m+n)$ time. Let $Φ: Σ\times P \rightarrow \{0,1\}$ be the Boolean predicate (of constant complexity) such that $Φ(σ,p) = 1$ if $p\inσ$ and $0$ otherwise, and let $Σ\mathopΦ P = \{ (σ,p) \in Σ\times P \mid Φ(σ,p)=1\}$. Our algorithm actually computes a partition ${\mathcal B}_Φ$ of $Σ\mathopΦ P$ into bipartite cliques (bicliques) of size (i.e., sum of the sizes of the vertex sets of its bicliques) $O^*\bigl( m^{\frac{2s}{5s-4}}n^{\frac{5s-6}{5s-4}} + m^{2/3}n^{2/3} + m + n \bigr)$. It is straightforward to compute $w(P\capσ)$ for all $σ\in Σ$ from ${\mathcal B}_Φ$. Similarly, if $η: Σ\rightarrow S$ is a weight function on the regions of $Σ$, $\sum_{σ\in Σ: p \in σ} η(σ)$, for every point $p\in P$, can be computed from ${\mathcal B}_Φ$ in a straightforward manner. A recent work of Chan et al. solves the online version of this dual point enclosure problem within the same performance bound as our off-line solution. We also mention a few other applications of computing ${\mathcal B}_Φ$.

Submitted 16 September, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2401.06047 [pdf, other]

Computing Data Distribution from Query Selectivities

Authors: Pankaj K. Agarwal, Rahul Raychaudhury, Stavros Sintos, Jun Yang

Abstract: We are given a set $\mathcal{Z}=\{(R_1,s_1),\ldots, (R_n,s_n)\}$, where each $R_i$ is a \emph{range} in $\Re^d$, such as a rectangle or ball, and $s_i \in [0,1]$ denotes its \emph{selectivity}. The goal is to compute a small-size \emph{discrete data distribution} $\mathcal{D}=\{(q_1,w_1),\ldots, (q_m,w_m)\}$, where $q_j\in \Re^d$ and $w_j\in [0,1]$ for each $1\leq j\leq m$, and $\sum_{1\leq j\leq m}w_j= 1$, such that $\mathcal{D}$ is the most \emph{consistent} with $\mathcal{Z}$, i.e., $\mathrm{err}_p(\mathcal{D},\mathcal{Z})=\frac{1}{n}\sum_{i=1}^n\! \lvert{s_i-\sum_{j=1}^m w_j\cdot 1(q_j\in R_i)}\rvert^p$ is minimized. In a database setting, $\mathcal{Z}$ corresponds to a workload of range queries over some table, together with their observed selectivities (i.e., fraction of tuples returned), and $\mathcal{D}$ can be used as a compact model for approximating the data distribution within the table without accessing the underlying contents. In this paper, we obtain both upper and lower bounds for this problem. In particular, we show that the problem of finding the best data distribution from selectivity queries is $\mathsf{NP}$-complete. On the positive side, we describe a Monte Carlo algorithm that constructs, in time $O((n+δ^{-d})δ^{-2}\mathop{\mathrm{polylog}})$, a discrete distribution $\tilde{\mathcal{D}}$ of size $O(δ^{-2})$, such that $\mathrm{err}_p(\tilde{\mathcal{D}},\mathcal{Z})\leq \min_{\mathcal{D}}\mathrm{err}_p(\mathcal{D},\mathcal{Z})+δ$ (for $p=1,2,\infty$) where the minimum is taken over all discrete distributions. We also establish conditional lower bounds, which strongly indicate the infeasibility of relative approximations as well as removal of the exponential dependency on the dimension for additive approximations. This suggests that significant improvements to our algorithm are unlikely.
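The error measure in the abstract above can be evaluated directly. The following is a minimal sketch, not the paper's algorithm: it only computes $\mathrm{err}_p$ for a given candidate distribution, assuming the ranges are axis-aligned boxes and $p$ is finite. The names `in_box` and `err_p` and the tuple-based data layout are illustrative assumptions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]
Box = Tuple[Point, Point]  # (lower corner, upper corner); axis-aligned box (assumed range type)


def in_box(q: Point, box: Box) -> bool:
    """Return True if point q lies inside the closed box."""
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, q, hi))


def err_p(dist: Sequence[Tuple[Point, float]],
          workload: Sequence[Tuple[Box, float]],
          p: float = 1.0) -> float:
    """Average over ranges of |s_i - sum_j w_j * 1(q_j in R_i)|^p, for finite p."""
    total = 0.0
    for box, s in workload:
        # Estimated selectivity: total weight of the distribution points inside the range.
        estimate = sum(w for q, w in dist if in_box(q, box))
        total += abs(s - estimate) ** p
    return total / len(workload)


# Two range queries with observed selectivities 0.5 and 1.0.
workload = [(((0.0, 0.0), (1.0, 1.0)), 0.5),
            (((0.0, 0.0), (2.0, 2.0)), 1.0)]
consistent = [((0.5, 0.5), 0.5), ((1.5, 1.5), 0.5)]  # matches both selectivities exactly
lumped = [((0.5, 0.5), 1.0)]                          # overestimates the first range by 0.5

print(err_p(consistent, workload, p=1))
print(err_p(lumped, workload, p=1))
print(err_p(lumped, workload, p=2))
```

The case $p=\infty$ mentioned in the abstract is left out here because the abstract gives the formula only in its finite-$p$ form.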

Submitted 11 January, 2024; originally announced January 2024.

Journal ref: ICDT 2024

arXiv:2311.02172 [pdf, other]

Fast and Accurate Approximations of the Optimal Transport in Semi-Discrete and Discrete Settings

Authors: Pankaj K. Agarwal, Sharath Raghvendra, Pouyan Shirzadian, Keegan Yao

Abstract: Given a $d$-dimensional continuous (resp. discrete) probability distribution $μ$ and a discrete distribution $ν$, the semi-discrete (resp. discrete) Optimal Transport (OT) problem asks for computing a minimum-cost plan to transport mass from $μ$ to $ν$; we assume $n$ to be the size of the support of the discrete distributions, and we assume we have access to an oracle outputting the mass of $μ$ inside a constant-complexity region in $O(1)$ time. In this paper, we present three approximation algorithms for the OT problem. (i) Semi-discrete additive approximation: For any $ε>0$, we present an algorithm that computes a semi-discrete transport plan with $ε$-additive error in $n^{O(d)}\log\frac{C_{\max}}ε$ time; here, $C_{\max}$ is the diameter of the supports of $μ$ and $ν$. (ii) Semi-discrete relative approximation: For any $ε>0$, we present an algorithm that computes a $(1+ε)$-approximate semi-discrete transport plan in $nε^{-O(d)}\log(n)\log^{O(d)}(\log n)$ time; here, we assume the ground distance is any $L_p$ norm. (iii) Discrete relative approximation: For any $ε>0$, we present a Monte-Carlo $(1+ε)$-approximation algorithm that computes a transport plan under any $L_p$ norm in $nε^{-O(d)}\log(n)\log^{O(d)}(\log n)$ time; here, we assume that the spread of the supports of $μ$ and $ν$ is polynomially bounded.

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2311.02050 [pdf, other]

Fast Approximation Algorithms for Piercing Boxes by Points

Authors: Pankaj K. Agarwal, Sariel Har-Peled, Rahul Raychaudhury, Stavros Sintos

Abstract: $\newcommand{\popt}{\mathcal{p}} \newcommand{\Re}{\mathbb{R}}\newcommand{\N}{\mathcal{N}} \newcommand{\BX}{\mathcal{B}} \newcommand{\bb}{\mathsf{b}} \newcommand{\eps}{\varepsilon} \newcommand{\polylog}{\mathrm{polylog}} $ Let $\mathcal{B}=\{\mathsf{b}_1, \ldots ,\mathsf{b}_n\}$ be a set of $n$ axis-aligned boxes in $\Re^d$ where $d\geq2$ is a constant. The \emph{piercing problem} is to compute a smallest set of points $\N \subset \Re^d$ that hits every box in $\mathcal{B}$, i.e., $\N\cap \mathsf{b}_i\neq \emptyset$, for $i=1,\ldots, n$. Let $\popt=\popt(\mathcal{B})$, the \emph{piercing number}, be the minimum size of a piercing set of $\mathcal{B}$. We present a randomized $O(d^2\log\log \popt)$-approximation algorithm with expected running time $O(n^{d/2}\polylog n)$. Next, we present a faster $O(n^{\log d+1})$-time algorithm but with a slightly inferior approximation factor of $O(2^{4d}\log\log\popt)$. The running time of both algorithms can be improved to near-linear using a sampling-based technique, if $\popt = O(n^{1/d})$. For the dynamic version of the problem in the plane, we obtain a randomized $O(\log\log\popt)$-approximation algorithm with $O(n^{1/2}\polylog n )$ amortized expected update time for insertion or deletion of boxes. For squares in $\Re^2$, the update time can be improved to $O(n^{1/3}\polylog n )$.

Submitted 24 April, 2025; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: Appeared in SODA 2024

arXiv:2311.01597 [pdf, other]

Vertical Decomposition in 3D and 4D with Applications to Line Nearest-Neighbor Searching in 3D

Authors: Pankaj K. Agarwal, Esther Ezra, Micha Sharir

Abstract: Vertical decomposition is a widely used general technique for decomposing the cells of arrangements of semi-algebraic sets in $d$-space into constant-complexity subcells. In this paper, we settle in the affirmative a few long-standing open problems involving the vertical decomposition of substructures of arrangements for $d=3,4$: (i) Let $\mathcal{S}$ be a collection of $n$ semi-algebraic sets of constant complexity in 3D, and let $U(m)$ be an upper bound on the complexity of the union $\mathcal{U}(\mathcal{S}')$ of any subset $\mathcal{S}'\subseteq \mathcal{S}$ of size at most $m$. We prove that the complexity of the vertical decomposition of the complement of $\mathcal{U}(\mathcal{S})$ is $O^*(n^2+U(n))$ (where the $O^*(\cdot)$ notation hides subpolynomial factors). We also show that the complexity of the vertical decomposition of the entire arrangement $\mathcal{A}(\mathcal{S})$ is $O^*(n^2+X)$, where $X$ is the number of vertices in $\mathcal{A}(\mathcal{S})$. (ii) Let $\mathcal{F}$ be a collection of $n$ trivariate functions whose graphs are semi-algebraic sets of constant complexity. We show that the complexity of the vertical decomposition of the portion of the arrangement $\mathcal{A}(\mathcal{F})$ in 4D lying below the lower envelope of $\mathcal{F}$ is $O^*(n^3)$. These results lead to efficient algorithms for a variety of problems involving these decompositions, including algorithms for constructing the decompositions themselves, and for constructing $(1/r)$-cuttings of substructures of arrangements of the kinds considered above. One additional algorithm of interest is for output-sensitive point enclosure queries amid semi-algebraic sets in three or four dimensions. In addition, as a main domain of applications, we study various proximity problems involving points and lines in 3D.

Submitted 2 November, 2023; originally announced November 2023.

arXiv:2310.20615 [pdf, other]

Near-Optimal Min-Sum Motion Planning for Two Square Robots in a Polygonal Environment

Authors: Pankaj K. Agarwal, Dan Halperin, Micha Sharir, Alex Steiger

Abstract: Let $\mathcal{W} \subset \mathbb{R}^2$ be a planar polygonal environment (i.e., a polygon potentially with holes) with a total of $n$ vertices, and let $A,B$ be two robots, each modeled as an axis-aligned unit square, that can translate inside $\mathcal{W}$. Given source and target placements $s_A,t_A,s_B,t_B \in \mathcal{W}$ of $A$ and $B$, respectively, the goal is to compute a \emph{collision-free motion plan} $\mathbfπ^*$, i.e., a motion plan that continuously moves $A$ from $s_A$ to $t_A$ and $B$ from $s_B$ to $t_B$ so that $A$ and $B$ remain inside $\mathcal{W}$ and do not collide with each other during the motion. Furthermore, if such a plan exists, then we wish to return a plan that minimizes the sum of the lengths of the paths traversed by the robots, $\left|\mathbfπ^*\right|$. Given $\mathcal{W}, s_A,t_A,s_B,t_B$ and a parameter $\varepsilon > 0$, we present an $n^2\varepsilon^{-O(1)} \log n$-time $(1+\varepsilon)$-approximation algorithm for this problem. We are not aware of any polynomial time algorithm for this problem, nor do we know whether the problem is NP-Hard. Our result is the first polynomial-time $(1+\varepsilon)$-approximation algorithm for an optimal motion planning problem involving two robots moving in a polygonal environment.

Submitted 31 October, 2023; originally announced October 2023.

Comments: The conference version of the paper is accepted to SODA 2024

arXiv:2210.11643 [pdf, other]

All Politics is Local: Redistricting via Local Fairness

Authors: Shao-Heng Ko, Erin Taylor, Pankaj K. Agarwal, Kamesh Munagala

Abstract: In this paper, we propose to use the concept of local fairness for auditing and ranking redistricting plans. Given a redistricting plan, a deviating group is a population-balanced contiguous region in which a majority of individuals are of the same interest and in the minority of their respective districts; such a set of individuals has a justified complaint with how the redistricting plan was drawn. A redistricting plan with no deviating groups is called locally fair. We show that the problem of auditing a given plan for local fairness is NP-complete. We present an MCMC approach for auditing as well as ranking redistricting plans. We also present a dynamic programming based algorithm for the auditing problem that we use to demonstrate the efficacy of our MCMC approach. Using these tools, we test local fairness on real-world election data, showing that it is indeed possible to find plans that are almost or exactly locally fair. Further, we show that such plans can be generated while sacrificing very little in terms of compactness and existing fairness measures such as competitiveness of the districts or seat shares of the plans.

Submitted 19 November, 2022; v1 submitted 20 October, 2022; originally announced October 2022.

arXiv:2210.00123 [pdf, other]

doi 10.4230/LIPIcs.ISAAC.2022.35

doi 10.1016/j.comgeo.2023.102019

Multi-Robot Motion Planning for Unit Discs with Revolving Areas

Authors: Pankaj K. Agarwal, Tzvika Geft, Dan Halperin, Erin Taylor

Abstract: We study the problem of motion planning for a collection of $n$ labeled unit disc robots in a polygonal environment. We assume that the robots have revolving areas around their start and final positions: that each start and each final is contained in a radius $2$ disc lying in the free space, not necessarily concentric with the start or final position, which is free from other start or final positions. This assumption allows a weakly-monotone motion plan, in which robots move according to an ordering as follows: during the turn of a robot $R$ in the ordering, it moves fully from its start to final position, while other robots do not leave their revolving areas. As $R$ passes through a revolving area, a robot $R'$ that is inside this area may move within the revolving area to avoid a collision. Notwithstanding the existence of a motion plan, we show that minimizing the total traveled distance in this setting, specifically even when the motion plan is restricted to be weakly-monotone, is APX-hard, ruling out any polynomial-time $(1+ε)$-approximation algorithm. On the positive side, we present the first constant-factor approximation algorithm for computing a feasible weakly-monotone motion plan. The total distance traveled by the robots is within an $O(1)$ factor of that of the optimal motion plan, which need not be weakly monotone. Our algorithm extends to an online setting in which the polygonal environment is fixed but the initial and final positions of robots are specified in an online manner. Finally, we observe that the overhead in the overall cost that we add while editing the paths to avoid robot-robot collision can vary significantly depending on the ordering we choose. Finding the best ordering in this respect is known to be NP-hard, and we provide a polynomial time $O(\log n \log \log n)$-approximation algorithm for this problem.

Submitted 15 June, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

Journal ref: Computational Geometry, 102019 (2023)

arXiv:2207.07211 [pdf, other]

Computing Optimal Kernels in Two Dimensions

Authors: Pankaj K. Agarwal, Sariel Har-Peled

Abstract: Let $P$ be a set of $n$ points in $\Re^2$. For a parameter $\varepsilon\in (0,1)$, a subset $C\subseteq P$ is an \emph{$\varepsilon$-kernel} of $P$ if the projection of the convex hull of $C$ approximates that of $P$ within $(1-\varepsilon)$-factor in every direction. The set $C$ is a \emph{weak $\varepsilon$-kernel} of $P$ if its directional width approximates that of $P$ in every direction. Let $\mathsf{k}_{\varepsilon}(P)$ (resp.\ $\mathsf{k}^{\mathsf{w}}_{\varepsilon}(P)$) denote the minimum size of an $\varepsilon$-kernel (resp. weak $\varepsilon$-kernel) of $P$. We present an $O(n\mathsf{k}_{\varepsilon}(P)\log n)$-time algorithm for computing an $\varepsilon$-kernel of $P$ of size $\mathsf{k}_{\varepsilon}(P)$, and an $O(n^2\log n)$-time algorithm for computing a weak $\varepsilon$-kernel of $P$ of size ${\mathsf{k}}^{\mathsf{w}}_{\varepsilon}(P)$. We also present a fast algorithm for the Hausdorff variant of this problem. In addition, we introduce the notion of \emph{$\varepsilon$-core}, a convex polygon lying inside $\mathsf{ch}(P)$, prove that it is a good approximation of the optimal $\varepsilon$-kernel, present an efficient algorithm for computing it, and use it to compute an $\varepsilon$-kernel of small size.

Submitted 13 March, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

Comments: To appear in SoCG 2023

arXiv:2204.03875 [pdf, other]

Deterministic, Near-Linear $\varepsilon$-Approximation Algorithm for Geometric Bipartite Matching

Authors: Pankaj K. Agarwal, Hsien-Chih Chang, Sharath Raghvendra, Allen Xiao

Abstract: Given point sets $A$ and $B$ in $\mathbb{R}^d$ where $A$ and $B$ have equal size $n$ for some constant dimension $d$ and a parameter $\varepsilon>0$, we present the first deterministic algorithm that computes, in $n\cdot(\varepsilon^{-1} \log n)^{O(d)}$ time, a perfect matching between $A$ and $B$ whose cost is within a $(1+\varepsilon)$ factor of the optimal under any $\smash{\ell_p}$-norm. Although a Monte-Carlo algorithm with a similar running time is proposed by Raghvendra and Agarwal [J. ACM 2020], the best-known deterministic $\varepsilon$-approximation algorithm takes $Ω(n^{3/2})$ time. Our algorithm constructs a (refinement of a) tree cover of $\mathbb{R}^d$, and we develop several new tools to apply a tree-cover based approach to compute an $\varepsilon$-approximate perfect matching.

Submitted 8 April, 2022; originally announced April 2022.

Comments: The conference version of the paper is accepted to STOC 2022

arXiv:2203.10241 [pdf, other]

Intersection Queries for Flat Semi-Algebraic Objects in Three Dimensions and Related Problems

Authors: Pankaj K. Agarwal, Boris Aronov, Esther Ezra, Matthew J. Katz, Micha Sharir

Abstract: Let $\mathcal{T}$ be a set of $n$ flat (planar) semi-algebraic regions in $\mathbb{R}^3$ of constant complexity (e.g., triangles, disks), which we call plates. We wish to preprocess $\mathcal{T}$ into a data structure so that for a query object $γ$, which is also a plate, we can quickly answer various intersection queries, such as detecting whether $γ$ intersects any plate of $\mathcal{T}$, reporting all the plates intersected by $γ$, or counting them. We also consider two simpler cases of this general setting: (i) the input objects are plates and the query objects are constant-degree parametrized algebraic arcs in $\mathbb{R}^3$ (arcs, for short), or (ii) the input objects are arcs and the query objects are plates in $\mathbb{R}^3$. Besides being interesting in their own right, the data structures for these two special cases form the building blocks for handling the general case. By combining the polynomial-partitioning technique with additional tools from real algebraic geometry, we present many different data structures for intersection queries, which also provide trade-offs between their size and query time. For example, if $\mathcal{T}$ is a set of plates and the query objects are algebraic arcs, we obtain a data structure that uses $O^*(n^{4/3})$ storage (where the $O^*(\cdot)$ notation hides factors of the form $n^ε$, for an arbitrarily small $ε>0$) and answers an arc-intersection query in $O^*(n^{2/3})$ time. This result is significant since the exponents do not depend on the specific shape of the input and query objects. We generalize and slightly improve this result: for a parameter $s\in [n^{4/3}, n^{t_q}]$, where ${t_q}\ge 3$ is the number of real parameters needed to specify a query arc, the query time can be decreased to $O^*((n/s^{1/{t_q}})^{\tfrac{2/3}{1-1/{t_q}}})$ by increasing the storage to $O^*(s)$.

Submitted 16 March, 2025; v1 submitted 19 March, 2022; originally announced March 2022.

Comments: 57 pages, 8 figures, a much extended and expanded version of SoCG'22 paper

arXiv:2112.06899 [pdf, ps, other]

Locally Fair Partitioning

Authors: Pankaj K. Agarwal, Shao-Heng Ko, Kamesh Munagala, Erin Taylor

Abstract: We model the societal task of redistricting political districts as a partitioning problem: Given a set of $n$ points in the plane, each belonging to one of two parties, and a parameter $k$, our goal is to compute a partition $Π$ of the plane into regions so that each region contains roughly $σ= n/k$ points. $Π$ should satisfy a notion of "local" fairness, which is related to the notion of core, a well-studied concept in cooperative game theory. A region is associated with the majority party in that region, and a point is unhappy in $Π$ if it belongs to the minority party. A group $D$ of roughly $σ$ contiguous points is called a deviating group with respect to $Π$ if a majority of points in $D$ are unhappy in $Π$. The partition $Π$ is locally fair if there is no deviating group with respect to $Π$. This paper focuses on a restricted case when points lie in $1$D. The problem is non-trivial even in this case. We consider both adversarial and "beyond worst-case" settings for this problem. For the former, we characterize the input parameters for which a locally fair partition always exists; we also show that a locally fair partition may not exist for certain parameters. We then consider input models where there are "runs" of red and blue points. For such clustered inputs, we show that a locally fair partition may not exist for certain values of $σ$, but an approximate locally fair partition exists if we allow some regions to have smaller sizes. We finally present a polynomial-time algorithm for computing a locally fair partition if one exists.
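The 1D definitions in the abstract above (unhappy point, deviating group, locally fair partition) can be made concrete with a brute-force check. This is a minimal sketch under simplifying assumptions that are not from the paper: a group is taken to be exactly $σ$ consecutive points rather than "roughly" $σ$, a tied region makes nobody unhappy, and the names `unhappy_flags` and `find_deviating_group` are illustrative. It is not the paper's polynomial-time algorithm for computing a partition.

```python
from typing import List, Optional, Tuple


def unhappy_flags(colors: List[str], boundaries: List[int]) -> List[bool]:
    """Mark each point (sorted along the line) that is in the minority of its region.

    `boundaries` holds the exclusive end index of each consecutive region.
    Assumption: in a tied region no point counts as unhappy.
    """
    flags: List[bool] = []
    start = 0
    for end in boundaries:
        region = colors[start:end]
        reds = region.count("R")
        blues = len(region) - reds
        if reds > blues:
            minority = "B"
        elif blues > reds:
            minority = "R"
        else:
            minority = None
        flags.extend(c == minority for c in region)
        start = end
    return flags


def find_deviating_group(colors: List[str], boundaries: List[int],
                         sigma: int) -> Optional[Tuple[int, int]]:
    """Return the first window [i, i + sigma) with a strict majority of unhappy points."""
    flags = unhappy_flags(colors, boundaries)
    for i in range(len(colors) - sigma + 1):
        if 2 * sum(flags[i:i + sigma]) > sigma:
            return (i, i + sigma)
    return None  # no deviating group of this size: locally fair under these assumptions


# n = 6 points, k = 2 regions, sigma = 3.
print(find_deviating_group(list("RRBBRR"), [3, 6], 3))  # the two blue points straddle the cut
print(find_deviating_group(list("RRRBBB"), [3, 6], 3))  # every point is in its region's majority
```

In the first example both blue points are in the minority of their regions, so the window covering indices 1 to 3 has two unhappy points out of three and is a deviating group.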

Submitted 15 December, 2021; v1 submitted 13 December, 2021; originally announced December 2021.

arXiv:2105.01818 [pdf, other]

Dynamic Enumeration of Similarity Joins

Authors: Pankaj K. Agarwal, Xiao Hu, Stavros Sintos, Jun Yang

Abstract: This paper considers enumerating answers to similarity-join queries under dynamic updates: Given two sets of $n$ points $A,B$ in $\mathbb{R}^d$, a metric $φ(\cdot)$, and a distance threshold $r > 0$, report all pairs of points $(a, b) \in A \times B$ with $φ(a,b) \le r$. Our goal is to store $A,B$ into a dynamic data structure that, whenever asked, can enumerate all result pairs with worst-case delay guarantee, i.e., the time between enumerating two consecutive pairs is bounded. Furthermore, the data structure can be efficiently updated when a point is inserted into or deleted from $A$ or $B$. We propose several efficient data structures for answering similarity-join queries in low dimension. For exact enumeration of similarity join, we present near-linear-size data structures for $\ell_1, \ell_\infty$ metrics with $\log^{O(1)} n$ update time and delay. We show that such a data structure is not feasible for the $\ell_2$ metric for $d \ge 4$. For approximate enumeration of similarity join, where the distance threshold is a soft constraint, we obtain a unified linear-size data structure for $\ell_p$ metric, with $\log^{O(1)} n$ delay and update time. In high dimensions, we present an efficient data structure with worst-case delay-guarantee using locality sensitive hashing (LSH).
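The query semantics in the abstract above can be stated as a short brute-force enumerator. This sketch only illustrates what a similarity join returns; it scans all of $A \times B$, so it has no delay guarantee and no support for efficient updates, which are the paper's actual contributions. The names `lp_distance` and `similarity_join` are illustrative assumptions.

```python
import math
from itertools import product
from typing import Iterator, Sequence, Tuple

Point = Tuple[float, ...]


def lp_distance(a: Point, b: Point, p: float) -> float:
    """l_p distance between two points; p = math.inf gives the l_infinity metric."""
    if math.isinf(p):
        return max(abs(x - y) for x, y in zip(a, b))
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def similarity_join(A: Sequence[Point], B: Sequence[Point],
                    r: float, p: float = 2.0) -> Iterator[Tuple[Point, Point]]:
    """Lazily yield every pair (a, b) in A x B whose l_p distance is at most r."""
    for a, b in product(A, B):
        if lp_distance(a, b, p) <= r:
            yield (a, b)


A = [(0.0, 0.0), (5.0, 5.0)]
B = [(1.0, 0.0), (5.0, 6.0), (9.0, 9.0)]
print(list(similarity_join(A, B, r=1.0, p=2.0)))
print(list(similarity_join(A, B, r=1.0, p=math.inf)))
```

Writing the join as a generator mirrors the enumeration setting, where pairs are reported one at a time, but the time between consecutive pairs here can be as large as the whole scan.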

Submitted 4 May, 2021; originally announced May 2021.

arXiv:2103.12887 [pdf, other]

Efficiently Answering Durability Prediction Queries

Authors: Junyang Gao, Yifan Xu, Pankaj K. Agarwal, Jun Yang

Abstract: We consider a class of queries called durability prediction queries that arise commonly in predictive analytics, where we use a given predictive model to answer questions about possible futures to inform our decisions. Examples of durability prediction queries include "what is the probability that this financial product will keep losing money over the next 12 quarters before turning in any profit?" and "what is the chance for our proposed server cluster to fail the required service-level agreement before its term ends?" We devise a general method called Multi-Level Splitting Sampling (MLSS) that can efficiently handle complex queries and complex models -- including those involving black-box functions -- as long as the models allow us to simulate possible futures step by step. Our method addresses the inefficiency of standard Monte Carlo (MC) methods by applying the idea of importance splitting to let one "promising" sample path prefix generate multiple "offspring" paths, thereby directing simulation efforts toward more promising paths. We propose practical techniques for designing splitting strategies, freeing users from manual tuning. Experiments show that our approach is able to achieve unbiased estimates and the same error guarantees as standard MC while offering an order-of-magnitude cost reduction.

Submitted 31 March, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

Comments: in SIGMOD 2021

arXiv:2102.12072 [pdf, other]

Durable Top-K Instant-Stamped Temporal Records with User-Specified Scoring Functions

Authors: Junyang Gao, Stavros Sintos, Pankaj K. Agarwal, Jun Yang

Abstract: A way of finding interesting or exceptional records from instant-stamped temporal data is to consider their "durability," or, intuitively speaking, how well they compare with other records that arrived earlier or later, and how long they retain their supremacy. For example, people are naturally fascinated by claims with long durability, such as: "On January 22, 2006, Kobe Bryant dropped 81 points against Toronto Raptors. Since then, this scoring record has yet to be broken." In general, given a sequence of instant-stamped records, suppose that we can rank them by a user-specified scoring function $f$, which may consider multiple attributes of a record to compute a single score for ranking. This paper studies "durable top-$k$ queries", which find records whose scores were within top-$k$ among those records within a "durability window" of given length, e.g., a 10-year window starting/ending at the timestamp of the record. The parameter $k$, the length of the durability window, and parameters of the scoring function (which capture user preference) can all be given at the query time. We illustrate why this problem formulation yields more meaningful answers in some practical situations than other similar types of queries considered previously. We propose new algorithms for solving this problem, and provide a comprehensive theoretical analysis on the complexities of the problem itself and of our algorithms. Our algorithms vastly outperform various baselines (by up to two orders of magnitude on real and synthetic datasets).

Submitted 23 March, 2021; v1 submitted 24 February, 2021; originally announced February 2021.

Comments: in ICDE 2021
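
The durable top-$k$ query described in this abstract can be illustrated with a naive quadratic-time sketch. This is a baseline under stated assumptions, not the paper's algorithm: the window is taken to end at each record's own timestamp, "within top-$k$" is read as "fewer than $k$ records in the window score strictly higher", and the name `durable_top_k` and its parameters are hypothetical.

```python
def durable_top_k(records, score, k, window):
    """Return records that rank within the top k of their look-back window.

    records: list of (timestamp, attributes) pairs.
    score:   user-specified scoring function applied to the attributes.
    window:  length of the durability window ending at the record's timestamp.
    """
    result = []
    for t, attrs in records:
        s = score(attrs)
        # Count records in [t - window, t] that score strictly higher.
        better = sum(
            1
            for t2, attrs2 in records
            if t - window <= t2 <= t and score(attrs2) > s
        )
        if better < k:
            result.append((t, attrs))
    return result


if __name__ == "__main__":
    # (timestamp, points scored); the scoring function is the identity here.
    games = [(1, 30), (2, 50), (3, 40), (4, 45), (10, 20)]
    print(durable_top_k(games, score=lambda x: x, k=1, window=3))
```

Because `k`, `window`, and `score` are ordinary arguments, all three can change from one call to the next, mirroring the abstract's point that they are supplied at query time.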

arXiv:2009.14358 [pdf, other]

Clustering under Perturbation Stability in Near-Linear Time

Authors: Pankaj K. Agarwal, Hsien-Chih Chang, Kamesh Munagala, Erin Taylor, Emo Welzl

Abstract: We consider the problem of center-based clustering in low-dimensional Euclidean spaces under the perturbation stability assumption. An instance is $α$-stable if the underlying optimal clustering continues to remain optimal even when all pairwise distances are arbitrarily perturbed by a factor of at most $α$. Our main contribution is in presenting efficient exact algorithms for $α$-stable clustering instances whose running times depend near-linearly on the size of the data set when $α\ge 2 + \sqrt{3}$. For $k$-center and $k$-means problems, our algorithms also achieve polynomial dependence on the number of clusters, $k$, when $α\geq 2 + \sqrt{3} + ε$ for any constant $ε> 0$ in any fixed dimension. For $k$-median, our algorithms have polynomial dependence on $k$ for $α> 5$ in any fixed dimension; and for $α\geq 2 + \sqrt{3}$ in two dimensions. Our algorithms are simple, and only require applying techniques such as local search or dynamic programming to a suitably modified metric space, combined with careful choice of data structures.

Submitted 29 September, 2020; originally announced September 2020.

arXiv:2009.12369 [pdf, other]

On Two-Handed Planar Assembly Partitioning with Connectivity Constraints

Authors: Pankaj K. Agarwal, Boris Aronov, Tzvika Geft, Dan Halperin

Abstract: Assembly planning is a fundamental problem in robotics and automation, which involves designing a sequence of motions to bring the separate constituent parts of a product into their final placement in the product. Assembly planning is naturally cast as a disassembly problem, giving rise to the assembly partitioning problem: Given a set $A$ of parts, find a subset $S\subset A$, referred to as a subassembly, such that $S$ can be rigidly translated to infinity along a prescribed direction without colliding with $A\setminus S$. While assembly partitioning is efficiently solvable, it is further desirable for the parts of a subassembly to be easily held together. This motivates the problem that we study, called connected-assembly-partitioning, which additionally requires each of the two subassemblies, $S$ and $A\setminus S$, to be connected. We show that this problem is NP-complete, settling an open question posed by Wilson et al. (1995) a quarter of a century ago, even when $A$ consists of unit-grid squares (i.e., $A$ is polyomino-shaped). Towards this result, we prove the NP-hardness of a new Planar 3-SAT variant having an adjacency requirement for variables appearing in the same clause, which may be of independent interest. On the positive side, we give an $O(2^k n^2)$-time fixed-parameter tractable algorithm (requiring low degree polynomial-time pre-processing) for an assembly $A$ consisting of polygons in the plane, where $n=|A|$ and $k=|S|$. We also describe a special case of unit-grid square assemblies, where a connected partition can always be found in $O(n)$-time.

Submitted 21 March, 2023; v1 submitted 25 September, 2020; originally announced September 2020.

Comments: This version generalizes our algorithm from the SODA '21 version for unit-grid squares to polygonal assemblies and improves presentation

arXiv:2009.08014 [pdf, other]

doi 10.1145/3397536.3422269

1D and 2D Flow Routing on a Terrain

Authors: Aaron Lowe, Svend C. Svendsen, Pankaj K. Agarwal, Lars Arge

Abstract: An important problem in terrain analysis is modeling how water flows across a terrain creating floods by forming channels and filling depressions. In this paper we study a number of \emph{flow-query} related problems: Given a terrain $Σ$, represented as a triangulated $xy$-monotone surface with $n$ vertices, a rain distribution $R$ which may vary over time, determine how much water is flowing over a given edge as a function of time. We develop internal-memory as well as I/O-efficient algorithms for flow queries. This paper contains four main results: (i) We present an internal-memory algorithm that preprocesses $Σ$ into a linear-size data structure that for a (possibly time varying) rain distribution $R$ can return the flow-rate functions of all edges of $Σ$ in $O(ρk+|φ| \log n)$ time, where $ρ$ is the number of sinks in $Σ$, $k$ is the number of times the rain distribution changes, and $|φ|$ is the total complexity of the flow-rate functions that have non-zero values; (ii) We also present an I/O-efficient algorithm for preprocessing $Σ$ into a linear-size data structure so that for a rain distribution $R$, it can compute the flow-rate function of all edges using $O(\text{Sort}(|φ|))$ I/Os and $O(|φ| \log |φ|)$ internal computation time. (iii) $Σ$ can be preprocessed into a linear-size data structure so that for a given rain distribution $R$, the flow-rate function of an edge $(q,r) \in Σ$ under the single-flow direction (SFD) model can be computed more efficiently. (iv) We present an algorithm for computing the two-dimensional channel along which water flows using Manning's equation, a widely used empirical equation that relates the flow-rate of water in an open channel to the geometry of the channel along with the height of water in the channel.

Submitted 16 September, 2020; originally announced September 2020.

Comments: 12 pages, to be published in SIGSPATIAL'20

arXiv:2003.00202 [pdf, other]

Dynamic geometric set cover and hitting set

Authors: Pankaj K. Agarwal, Hsien-Chih Chang, Subhash Suri, Allen Xiao, Jie Xue

Abstract: We investigate dynamic versions of geometric set cover and hitting set where points and ranges may be inserted or deleted, and we want to efficiently maintain an (approximately) optimal solution for the current problem instance. While their static versions have been extensively studied in the past, surprisingly little is known about dynamic geometric set cover and hitting set. For instance, even for the most basic case of one-dimensional interval set cover and hitting set, no nontrivial results were known. The main contributions of our paper are two frameworks that lead to efficient data structures for dynamically maintaining set covers and hitting sets in $\mathbb{R}^1$ and $\mathbb{R}^2$. The first framework uses bootstrapping and gives a $(1+\varepsilon)$-approximate data structure for dynamic interval set cover in $\mathbb{R}^1$ with $O(n^α/\varepsilon)$ amortized update time for any constant $α> 0$; in $\mathbb{R}^2$, this method gives $O(1)$-approximate data structures for unit-square (and quadrant) set cover and hitting set with $O(n^{1/2+α})$ amortized update time. The second framework uses local modification, and leads to a $(1+\varepsilon)$-approximate data structure for dynamic interval hitting set in $\mathbb{R}^1$ with $\widetilde{O}(1/\varepsilon)$ amortized update time; in $\mathbb{R}^2$, it gives $O(1)$-approximate data structures for unit-square (and quadrant) set cover and hitting set in the \textit{partially} dynamic settings with $\widetilde{O}(1)$ amortized update time.

Submitted 29 February, 2020; originally announced March 2020.

Comments: A preliminary version will appear in SoCG'20

arXiv:1909.05380 [pdf, ps, other]

doi 10.14778/3358701.3358708

Selecting Data to Clean for Fact Checking: Minimizing Uncertainty vs. Maximizing Surprise

Authors: Stavros Sintos, Pankaj K. Agarwal, Jun Yang

Abstract: We study the optimization problem of selecting numerical quantities to clean in order to fact-check claims based on such data. Oftentimes, such claims are technically correct, but they can still mislead for two reasons. First, data may contain uncertainty and errors. Second, data can be "fished" to advance particular positions. In practice, fact-checkers cannot afford to clean all data and must choose to clean what "matters the most" to checking a claim. We explore alternative definitions of what "matters the most": one is to ascertain claim qualities (by minimizing uncertainty in these measures), while an alternative is just to counter the claim (by maximizing the probability of finding a counterargument). We show whether the two objectives align with each other, with important implications on when fact-checkers should exercise care in selective data cleaning, to avoid potential bias introduced by their desire to counter claims. We develop efficient algorithms for solving the various variants of the optimization problem, showing significant improvements over naive solutions. The problem is particularly challenging because the objectives in the fact-checking context are complex, non-linear functions over data. We obtain results that generalize to a large class of functions, with potential applications beyond fact-checking.

Submitted 11 September, 2019; originally announced September 2019.

arXiv:1903.10943 [pdf, other]

doi 10.1145/3527614

Maintaining the Union of Unit Discs under Insertions with Near-Optimal Overhead

Authors: Pankaj K. Agarwal, Ravid Cohen, Dan Halperin, Wolfgang Mulzer

Abstract: We present efficient dynamic data structures for maintaining the union of unit discs and the lower envelope of pseudo-lines in the plane. More precisely, we present three main results in this paper: (i) We present a linear-size data structure to maintain the union of a set of unit discs under insertions. It can insert a disc and update the union in $O((k+1) \log^2 n)$ time, where $n$ is the current number of unit discs and $k$ is the combinatorial complexity of the structural change in the union due to the insertion of the new disc. It can also compute, within the same time bound, the area of the union after the insertion of each disc. (ii) We propose a linear-size data structure for maintaining the lower envelope of a set of $x$-monotone pseudo-lines. It can handle insertion/deletion of a pseudo-line in $O(\log^2 n)$ time; for a query point $x_0\in\mathbb{R}$, it can report, in $O(\log n)$ time, the point on the lower envelope with $x$-coordinate $x_0$; and for a query point $q\in\mathbb{R}^2$, it can return all $k$ pseudo-lines lying below $q$ in time $O(\log n+k\log^2 n)$. (iii) We present a linear-size data structure for storing a set of circular arcs of unit radius (not necessarily on the boundary of the union of the corresponding discs), so that for a query unit disc $D$, all input arcs intersecting $D$ can be reported in $O(n^{1/2+\varepsilon} + k)$ time, where $k$ is the output size and $\varepsilon > 0$ is an arbitrarily small constant. A unit-circle arc can be inserted or deleted in $O(\log^2 n)$ time.

Submitted 5 July, 2023; v1 submitted 20 March, 2019; originally announced March 2019.

Comments: 29 pages, 19 figures; this article is an extension of our previous work arXiv:1902.09565; a preliminary version appeared at SoCG 2019

Journal ref: ACM Transactions on Algorithms (TALG), 18(3), July 2022, Article 26

arXiv:1903.09358 [pdf, other]

Efficient Algorithms for Geometric Partial Matching

Authors: Pankaj K. Agarwal, Hsien-Chih Chang, Allen Xiao

Abstract: Let $A$ and $B$ be two point sets in the plane of sizes $r$ and $n$ respectively (assume $r \leq n$), and let $k$ be a parameter. A matching between $A$ and $B$ is a family of pairs in $A \times B$ so that any point of $A \cup B$ appears in at most one pair. Given two positive integers $p$ and $q$, we define the cost of matching $M$ to be $c(M) = \sum_{(a, b) \in M}\|{a-b}\|_p^q$ where $\|{\cdot}\|_p$ is the $L_p$-norm. The geometric partial matching problem asks to find the minimum-cost size-$k$ matching between $A$ and $B$. We present efficient algorithms for geometric partial matching problem that work for any powers of $L_p$-norm matching objective: An exact algorithm that runs in $O((n + k^2) {\mathop{\mathrm{polylog}}} n)$ time, and a $(1 + \varepsilon)$-approximation algorithm that runs in $O((n + k\sqrt{k}) {\mathop{\mathrm{polylog}}} n \cdot \log\varepsilon^{-1})$ time. Both algorithms are based on the primal-dual flow augmentation scheme; the main improvements involve using dynamic data structures to achieve efficient flow augmentations. With similar techniques, we give an exact algorithm for the planar transportation problem running in $O(\min\{n^2, rn^{3/2}\} {\mathop{\mathrm{polylog}}} n)$ time.

Submitted 22 March, 2019; originally announced March 2019.

arXiv:1903.08263 [pdf, other]

Faster Algorithms for the Geometric Transportation Problem

Authors: Pankaj K. Agarwal, Kyle Fox, Debmalya Panigrahi, Kasturi R. Varadarajan, Allen Xiao

Abstract: Let $R$ and $B$ be two point sets in $\mathbb{R}^d$, with $|R|+ |B| = n$ and where $d$ is a constant. Next, let $λ: R \cup B \to \mathbb{N}$ such that $\sum_{r \in R } λ(r) = \sum_{b \in B} λ(b)$ be demand functions over $R$ and $B$. Let $\|\cdot\|$ be a suitable distance function such as the $L_p$ distance. The transportation problem asks to find a map $τ: R \times B \to \mathbb{N}$ such that $\sum_{b \in B}τ(r,b) = λ(r)$, $\sum_{r \in R}τ(r,b) = λ(b)$, and $\sum_{r \in R, b \in B} τ(r,b) \|r-b\|$ is minimized. We present three new results for the transportation problem when $\|r-b\|$ is any $L_p$ metric: - For any constant $\varepsilon > 0$, an $O(n^{1+\varepsilon})$ expected time randomized algorithm that returns a transportation map with expected cost $O(\log^2(1/\varepsilon))$ times the optimal cost. - For any $\varepsilon > 0$, a $(1+\varepsilon)$-approximation in $O(n^{3/2}\varepsilon^{-d} \operatorname{polylog}(U) \operatorname{polylog}(n))$ time, where $U = \max_{p\in R\cup B} λ(p)$. - An exact strongly polynomial $O(n^2 \operatorname{polylog}n)$ time algorithm, for $d = 2$.

Submitted 19 March, 2019; originally announced March 2019.

Comments: 33 pages, 6 figures, full version of a paper that appeared in SoCG 2017

ACM Class: F.2.2

Journal ref: Symposium on Computational Geometry 2017: 7:1-7:16

arXiv:1902.09565 [pdf, other]

Dynamic Maintenance of the Lower Envelope of Pseudo-Lines

Authors: Pankaj K. Agarwal, Ravid Cohen, Dan Halperin, Wolfgang Mulzer

Abstract: We present a fully dynamic data structure for the maintenance of lower envelopes of pseudo-lines. The structure has $O(\log^2 n)$ update time and $O(\log n)$ vertical ray shooting query time. To achieve this performance, we devise a new algorithm for finding the intersection between two lower envelopes of pseudo-lines in $O(\log n)$ time, using \emph{tentative} binary search; the lower envelopes are special in that at $x=-\infty$ any pseudo-line contributing to the first envelope lies below every pseudo-line contributing to the second envelope. The structure requires $O(n)$ storage space.

Submitted 25 February, 2019; originally announced February 2019.

Comments: appeared in EuroCG 2019 (European conference on Computational Geometry)

arXiv:1812.10269 [pdf, other]

An Efficient Algorithm for Generalized Polynomial Partitioning and Its Applications

Authors: Pankaj K. Agarwal, Boris Aronov, Esther Ezra, Joshua Zahl

Abstract: In 2015, Guth proved that if $S$ is a collection of $n$ $g$-dimensional semi-algebraic sets in $\mathbb{R}^d$ and if $D\geq 1$ is an integer, then there is a $d$-variate polynomial $P$ of degree at most $D$ so that each connected component of $\mathbb{R}^d\setminus Z(P)$ intersects $O(n/D^{d-g})$ sets from $S$. Such a polynomial is called a generalized partitioning polynomial. We present a randomized algorithm that computes such polynomials efficiently -- the expected running time of our algorithm is linear in $|S|$. Our approach exploits the technique of quantifier elimination combined with that of $ε$-samples. We also present an extension of our construction to multi-level polynomial partitioning for semi-algebraic sets in $\mathbb{R}^d$. We present five applications of our result. The first is a data structure for answering point-enclosure queries among a family of semi-algebraic sets in $\mathbb{R}^d$ in $O(\log n)$ time, with storage complexity and expected preprocessing time of $O(n^{d+ε})$. The second is a data structure for answering range-searching queries with semi-algebraic ranges in $\mathbb{R}^d$ in $O(\log n)$ time, with $O(n^{t+ε})$ storage and expected preprocessing time, where $t > 0$ is an integer that depends on $d$ and the description complexity of the ranges. The third is a data structure for answering vertical ray-shooting queries among semi-algebraic sets in $\mathbb{R}^{d}$ in $O(\log^2 n)$ time, with $O(n^{d+ε})$ storage and expected preprocessing time. The fourth is an efficient algorithm for cutting algebraic curves in $\mathbb{R}^2$ into pseudo-segments. The fifth application is for eliminating depth cycles among triangles in $\mathbb{R}^3$, where we show a nearly-optimal algorithm to cut $n$ pairwise disjoint non-vertical triangles in $\mathbb{R}^3$ into pieces that form a depth order.

Submitted 23 January, 2021; v1 submitted 26 December, 2018; originally announced December 2018.

Comments: 30 pages, 0 figures. v2: final version, to appear in SIAM J. Comput

MSC Class: 68Q01; 68W01; 14Q20 ACM Class: F.2.2; I.3.5

arXiv:1810.10466 [pdf, other]

Approximate Minimum-Weight Matching with Outliers under Translation

Authors: Pankaj K. Agarwal, Haim Kaplan, Geva Kipper, Wolfgang Mulzer, Günter Rote, Micha Sharir, Allen Xiao

Abstract: Our goal is to compare two planar point sets by finding subsets of a given size such that a minimum-weight matching between them has the smallest weight. This can be done by a translation of one set that minimizes the weight of the matching. We give efficient algorithms (a) for finding approximately optimal matchings, when the cost of a matching is the $L_p$-norm of the tuple of the Euclidean distances between the pairs of matched points, for any $p\in [1,\infty]$, and (b) for constructing small-size approximate minimization (or matching) diagrams: partitions of the translation space into regions, together with an approximate optimal matching for each region.

Submitted 24 October, 2018; originally announced October 2018.

Comments: 13 pages, 2 figures

arXiv:1803.05765 [pdf, other]

Improved Dynamic Geodesic Nearest Neighbor Searching in a Simple Polygon

Authors: Pankaj K. Agarwal, Lars Arge, Frank Staals

Abstract: We present an efficient dynamic data structure that supports geodesic nearest neighbor queries for a set $S$ of point sites in a static simple polygon $P$. Our data structure allows us to insert a new site in $S$, delete a site from $S$, and ask for the site in $S$ closest to an arbitrary query point $q \in P$. All distances are measured using the geodesic distance, that is, the length of the shortest path that is completely contained in $P$. Our data structure achieves polylogarithmic update and query times, and uses $O(n\log^3n\log m + m)$ space, where $n$ is the number of sites in $S$ and $m$ is the number of vertices in $P$. The crucial ingredient in our data structure is an implicit representation of a vertical shallow cutting of the geodesic distance functions. We show that such an implicit representation exists, and that we can compute it efficiently.

Submitted 15 March, 2018; originally announced March 2018.

Comments: full version of our SoCG 2018 paper. arXiv admin note: substantial text overlap with arXiv:1707.02961

arXiv:1706.02939 [pdf, other]

An Efficient Algorithm for Computing High-Quality Paths amid Polygonal Obstacles

Authors: Pankaj K. Agarwal, Kyle Fox, Oren Salzman

Abstract: We study a path-planning problem amid a set $\mathcal{O}$ of obstacles in $\mathbb{R}^2$, in which we wish to compute a short path between two points while also maintaining a high clearance from $\mathcal{O}$; the clearance of a point is its distance from a nearest obstacle in $\mathcal{O}$. Specifically, the problem asks for a path minimizing the reciprocal of the clearance integrated over the length of the path. We present the first polynomial-time approximation scheme for this problem. Let $n$ be the total number of obstacle vertices and let $\varepsilon \in (0,1]$. Our algorithm computes in time $O(\frac{n^2}{\varepsilon ^2} \log \frac{n}{\varepsilon})$ a path of total cost at most $(1+\varepsilon)$ times the cost of the optimal path.

Submitted 9 June, 2017; originally announced June 2017.

Comments: A preliminary version of this work appeared in the Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms

arXiv:1702.01446 [pdf, other]

Efficient Algorithms for k-Regret Minimizing Sets

Authors: Pankaj K. Agarwal, Nirman Kumar, Stavros Sintos, Subhash Suri

Abstract: A regret minimizing set Q is a small size representation of a much larger database P so that user queries executed on Q return answers whose scores are not much worse than those on the full dataset. In particular, a k-regret minimizing set has the property that the regret ratio between the score of the top-1 item in Q and the score of the top-k item in P is minimized, where the score of an item is the inner product of the item's attributes with a user's weight (preference) vector. The problem is challenging because we want to find a single representative set Q whose regret ratio is small with respect to all possible user weight vectors. We show that k-regret minimization is NP-Complete for all dimensions d >= 3. This settles an open problem from Chester et al. [VLDB 2014], and resolves the complexity status of the problem for all d: the problem is known to have polynomial-time solution for d <= 2. In addition, we propose two new approximation schemes for regret minimization, both with provable guarantees, one based on coresets and another based on hitting sets. We also carry out extensive experimental evaluation, and show that our schemes compute regret-minimizing sets comparable in size to the greedy algorithm proposed in [VLDB 14] but our schemes are significantly faster and scalable to large data sets.

Submitted 8 February, 2017; v1 submitted 5 February, 2017; originally announced February 2017.
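
As an illustration of the regret ratio defined in this abstract, the sketch below evaluates it for a candidate subset Q against a finite sample of weight vectors. Two points are assumptions rather than anything stated in the listing: the ratio is normalised by the top-k score in P, and the maximum is taken over a supplied list of weight vectors instead of all possible ones. The function names are hypothetical, and the code only evaluates a given Q; it does not search for a minimizing set.

```python
def score(item, w):
    """Inner product of an item's attributes with a weight (preference) vector."""
    return sum(x * y for x, y in zip(item, w))


def k_regret_ratio(P, Q, w, k):
    """Regret of answering with Q's top-1 item instead of P's top-k item under w."""
    kth_in_p = sorted((score(p, w) for p in P), reverse=True)[k - 1]
    best_in_q = max(score(q, w) for q in Q)
    if kth_in_p <= 0:
        return 0.0  # Assumption: no regret is charged when the reference score is non-positive.
    return max(0.0, kth_in_p - best_in_q) / kth_in_p


def max_k_regret_ratio(P, Q, weights, k):
    """Worst regret ratio of Q over a finite sample of weight vectors."""
    return max(k_regret_ratio(P, Q, w, k) for w in weights)


if __name__ == "__main__":
    P = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
    Q = [(0.6, 0.6)]
    weights = [(1, 0), (0, 1), (1, 1)]
    print(max_k_regret_ratio(P, Q, weights, k=1))
    print(max_k_regret_ratio(P, Q, weights, k=2))
```

In this toy instance the single balanced item has positive regret for k = 1 but none for k = 2 on the sampled weights, which shows how relaxing the reference from the top-1 to the top-k item can shrink the set that is needed.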

arXiv:1606.00112 [pdf, other]

Nearest-Neighbor Searching Under Uncertainty II

Authors: Pankaj K. Agarwal, Boris Aronov, Sariel Har-Peled, Jeff M. Philips, Ke Yi, Wuzhou Zhang

Abstract: Nearest-neighbor search, which returns the nearest neighbor of a query point in a set of points, is an important and widely studied problem in many fields, and it has a wide range of applications. In many of them, such as sensor databases, location-based services, face recognition, and mobile data, the location of data is imprecise. We therefore study nearest-neighbor queries in a probabilistic framework in which the location of each input point is specified as a probability distribution function. We present efficient algorithms for - computing all points that are nearest neighbors of a query point with nonzero probability; and - estimating the probability of a point being the nearest neighbor of a query point, either exactly or within a specified additive error.

Submitted 1 June, 2016; originally announced June 2016.

arXiv:1512.01876 [pdf, other]

Approximating Dynamic Time Warping and Edit Distance for a Pair of Point Sequences

Authors: Pankaj K. Agarwal, Kyle Fox, Jiangwei Pan, Rex Ying

Abstract: We give the first subquadratic-time approximation schemes for dynamic time warping (DTW) and edit distance (ED) of several natural families of point sequences in $\mathbb{R}^d$, for any fixed $d \ge 1$. In particular, our algorithms compute $(1+\varepsilon)$-approximations of DTW and ED in time near-linear for point sequences drawn from $κ$-packed or $κ$-bounded curves, and subquadratic for backbone sequences. Roughly speaking, a curve is $κ$-packed if the length of its intersection with any ball of radius $r$ is at most $κ\cdot r$, and a curve is $κ$-bounded if the sub-curve between two curve points does not go too far from the two points compared to the distance between the two points. In backbone sequences, consecutive points are spaced at approximately equal distances apart, and no two points lie very close together. Recent results suggest that a subquadratic algorithm for DTW or ED is unlikely for an arbitrary pair of point sequences even for $d=1$. Our algorithms work by constructing a small set of rectangular regions that cover the entries of the dynamic programming table commonly used for these distance measures. The weights of entries inside each rectangle are roughly the same, so we are able to use efficient procedures to approximately compute the cheapest paths through these rectangles.

Submitted 7 January, 2016; v1 submitted 6 December, 2015; originally announced December 2015.

ACM Class: F.2.2

arXiv:1509.05751 [pdf, ps, other]

Computing the Gromov-Hausdorff Distance for Metric Trees

Authors: Pankaj K. Agarwal, Kyle Fox, Abhinandan Nath, Anastasios Sidiropoulos, Yusu Wang

Abstract: The Gromov-Hausdorff (GH) distance is a natural way to measure distance between two metric spaces. We prove that it is $\mathrm{NP}$-hard to approximate the Gromov-Hausdorff distance better than a factor of $3$ for geodesic metrics on a pair of trees. We complement this result by providing a polynomial time $O(\min\{n, \sqrt{rn}\})$-approximation algorithm for computing the GH distance between a pair of metric trees, where $r$ is the ratio of the longest edge length in both trees to the shortest edge length. For metric trees with unit length edges, this yields an $O(\sqrt{n})$-approximation algorithm.

Submitted 13 June, 2017; v1 submitted 18 September, 2015; originally announced September 2015.

Comments: Appeared in Proceedings of the 26th International Symposium on Algorithms and Computation

arXiv:1504.06851 [pdf, other]

Stable Delaunay Graphs

Authors: Pankaj K. Agarwal, Jie Gao, Leonidas J. Guibas, Haim Kaplan, Natan Rubin, Micha Sharir

Abstract: Let $P$ be a set of $n$ points in $\mathbb{R}^2$, and let $\mathrm{DT}(P)$ denote its Euclidean Delaunay triangulation. We introduce the notion of an edge of $\mathrm{DT}(P)$ being \emph{stable}. Defined in terms of a parameter $α>0$, a Delaunay edge $pq$ is called $α$-stable if the (equal) angles at which $p$ and $q$ see the corresponding Voronoi edge $e_{pq}$ are at least $α$. A subgraph $G$ of $\mathrm{DT}(P)$ is called a \emph{$(cα, α)$-stable Delaunay graph} ($\mathrm{SDG}$ for short), for some constant $c \ge 1$, if every edge in $G$ is $α$-stable and every $cα$-stable edge of $\mathrm{DT}(P)$ is in $G$. We show that if an edge is stable in the Euclidean Delaunay triangulation of $P$, then it is also a stable edge, though for a different value of $α$, in the Delaunay triangulation of $P$ under any convex distance function that is sufficiently close to the Euclidean norm, and vice versa. In particular, a $6α$-stable edge in $\mathrm{DT}(P)$ is $α$-stable in the Delaunay triangulation under the distance function induced by a regular $k$-gon for $k \ge 2π/α$, and vice versa. Exploiting this relationship and the analysis in the companion paper [arXiv:1404.4851], we present a linear-size kinetic data structure (KDS) for maintaining an $(8α,α)$-$\mathrm{SDG}$ as the points of $P$ move. If the points move along algebraic trajectories of bounded degree, the KDS processes a nearly quadratic number of events during the motion, each of which can be processed in $O(\log n)$ time. Finally, we show that a number of useful properties of $\mathrm{DT}(P)$ are retained by the $\mathrm{SDG}$ of $P$.

Submitted 26 April, 2015; originally announced April 2015.

Comments: This is a revision of the paper arXiv:1104.0622 presented in SoCG 2010. The revised analysis relies on results reported in the companion paper arXiv:1404.4851

ACM Class: F.2.2; G.2.1

arXiv:1406.6599 [pdf, other]

Convex Hulls under Uncertainty

Authors: Pankaj K. Agarwal, Sariel Har-Peled, Subhash Suri, Hakan Yildiz, Wuzhou Zhang

Abstract: We study the convex-hull problem in a probabilistic setting, motivated by the need to handle data uncertainty inherent in many applications, including sensor databases, location-based services and computer vision. In our framework, the uncertainty of each input site is described by a probability distribution over a finite number of possible locations including a \emph{null} location to account for non-existence of the point. Our results include both exact and approximation algorithms for computing the probability of a query point lying inside the convex hull of the input, time-space tradeoffs for the membership queries, a connection between Tukey depth and membership queries, as well as a new notion of $\some$-hull that may be a useful representation of uncertain hulls.

Submitted 25 June, 2014; originally announced June 2014.

arXiv:1406.4005 [pdf, other]

Maintaining Contour Trees of Dynamic Terrains

Authors: Pankaj K. Agarwal, Lars Arge, Thomas Mølhave, Morten Revsbæk, Jungwoo Yang

Abstract: We consider maintaining the contour tree $\mathbb{T}$ of a piecewise-linear triangulation $\mathbb{M}$ that is the graph of a time-varying height function $h: \mathbb{R}^2 \rightarrow \mathbb{R}$. We carefully describe the combinatorial changes in $\mathbb{T}$ that happen as $h$ varies over time and how these changes relate to topological changes in $\mathbb{M}$. We present a kinetic data structure that maintains the contour tree of $h$ over time. Our data structure maintains certificates that fail only when $h(v)=h(u)$ for two adjacent vertices $v$ and $u$ in $\mathbb{M}$, or when two saddle vertices lie on the same contour of $\mathbb{M}$. A certificate failure is handled in $O(\log n)$ time. We also show how our data structure can be extended to handle a set of general update operations on $\mathbb{M}$ and how it can be applied to maintain topological persistence pairs of time-varying functions.

Submitted 25 June, 2014; v1 submitted 16 June, 2014; originally announced June 2014.

ACM Class: F.2.2

arXiv:1404.4851 [pdf, other]

Kinetic Voronoi Diagrams and Delaunay Triangulations under Polygonal Distance Functions

Authors: Pankaj K. Agarwal, Haim Kaplan, Natan Rubin, Micha Sharir

Abstract: Let $P$ be a set of $n$ points and $Q$ a convex $k$-gon in $\mathbb{R}^2$. We analyze in detail the topological (or discrete) changes in the structure of the Voronoi diagram and the Delaunay triangulation of $P$, under the convex distance function defined by $Q$, as the points of $P$ move along prespecified continuous trajectories. Assuming that each point of $P$ moves along an algebraic trajectory of bounded degree, we establish an upper bound of $O(k^4nλ_r(n))$ on the number of topological changes experienced by the diagrams throughout the motion; here $λ_r(n)$ is the maximum length of an $(n,r)$-Davenport-Schinzel sequence, and $r$ is a constant depending on the algebraic degree of the motion of the points. Finally, we describe an algorithm for efficiently maintaining the above structures, using the kinetic data structure (KDS) framework.

Submitted 18 April, 2014; originally announced April 2014.

ACM Class: F.2.2; G.2.1; I.3.5

arXiv:1303.1585 [pdf, other]

Computing Similarity between a Pair of Trajectories

Authors: Swaminathan Sankararaman, Pankaj K. Agarwal, Thomas Mølhave, Arnold P. Boedihardjo

Abstract: With recent advances in sensing and tracking technology, trajectory data is becoming increasingly pervasive and analysis of trajectory data is becoming exceedingly important. A fundamental problem in analyzing trajectory data is that of identifying common patterns between pairs or among groups of trajectories. In this paper, we consider the problem of identifying similar portions between a pair of trajectories, each observed as a sequence of points sampled from it. We present new measures of trajectory similarity (both local and global) between a pair of trajectories to distinguish between similar and dissimilar portions. Our model is robust under noise and outliers, it does not make any assumptions on the sampling rates of either trajectory, and it works even if they are partially observed. The model also yields a scalar similarity score which can be used to rank multiple pairs of trajectories according to similarity, e.g. in clustering applications. We also present efficient algorithms for computing the similarity under our measures; the worst-case running time is quadratic in the number of sample points. Finally, we present an extensive experimental study evaluating the effectiveness of our approach on real datasets, comparing it with earlier approaches, and illustrating many issues that arise in trajectory data. Our experiments show that our approach is highly accurate in distinguishing similar and dissimilar portions as compared to earlier methods even with sparse sampling.

Submitted 6 March, 2013; originally announced March 2013.

arXiv:1209.4463 [pdf, other]

Sparsification of Motion-Planning Roadmaps by Edge Contraction

Authors: Doron Shaharabani, Oren Salzman, Pankaj K. Agarwal, Dan Halperin

Abstract: We present Roadmap Sparsification by Edge Contraction (RSEC), a simple and effective algorithm for reducing the size of a motion-planning roadmap. The algorithm exhibits minimal effect on the quality of paths that can be extracted from the new roadmap. The primitive operation used by RSEC is edge contraction: the contraction of a roadmap edge to a single vertex and the connection of the new vertex to the neighboring vertices of the contracted edge. For certain scenarios, we compress more than 98% of the edges and vertices at the cost of degradation of average shortest path length by at most 2%.
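The edge-contraction primitive described in the abstract can be illustrated on a plain undirected graph. This is a minimal sketch under stated assumptions: the roadmap is an adjacency dictionary of sets, and `contract_edge` and the `merged` label are hypothetical names. The abstract does not say where the new vertex is placed in configuration space or when a contraction is considered legal, so neither is modeled here.

```python
def contract_edge(adj, u, v, merged):
    """Contract edge (u, v) of an undirected graph into one vertex, in place.

    adj maps each vertex to the set of its neighbours. The new vertex
    `merged` is connected to every former neighbour of u or v.
    """
    if v not in adj[u]:
        raise ValueError("u and v are not adjacent")
    neighbours = (adj[u] | adj[v]) - {u, v}
    for endpoint in (u, v):
        # Remove the endpoint and erase it from its neighbours' sets.
        for other in adj.pop(endpoint):
            if other in adj:  # the other endpoint may already be removed
                adj[other].discard(endpoint)
    adj[merged] = set(neighbours)
    for other in neighbours:
        adj[other].add(merged)
    return adj
```

For example, contracting edge `("a", "b")` in the path `c - a - b - d` leaves a single vertex adjacent to both `c` and `d`, so the graph loses one vertex and one edge.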

Submitted 20 September, 2012; originally announced September 2012.

arXiv:1208.3384 [pdf, other]

On Range Searching with Semialgebraic Sets II

Authors: Pankaj K. Agarwal, Jiri Matousek, Micha Sharir

Abstract: Let $P$ be a set of $n$ points in $\mathbb{R}^d$. We present a linear-size data structure for answering range queries on $P$ with constant-complexity semialgebraic sets as ranges, in time close to $O(n^{1-1/d})$. It essentially matches the performance of similar structures for simplex range searching, and, for $d\ge 5$, significantly improves earlier solutions by the first two authors obtained in 1994. This almost settles a long-standing open problem in range searching. The data structure is based on the polynomial-partitioning technique of Guth and Katz [arXiv:1011.4105], which shows that for a parameter $r$, $1 < r \le n$, there exists a $d$-variate polynomial $f$ of degree $O(r^{1/d})$ such that each connected component of $\mathbb{R}^d\setminus Z(f)$ contains at most $n/r$ points of $P$, where $Z(f)$ is the zero set of $f$. We present an efficient randomized algorithm for computing such a polynomial partition, which is of independent interest and is likely to have additional applications.

Submitted 30 May, 2013; v1 submitted 16 August, 2012; originally announced August 2012.

arXiv:1204.5333 [pdf, ps, other]

Computing the Discrete Fréchet Distance in Subquadratic Time

Authors: Pankaj K. Agarwal, Rinat Ben Avraham, Haim Kaplan, Micha Sharir

Abstract: The Fréchet distance is a similarity measure between two curves $A$ and $B$: Informally, it is the minimum length of a leash required to connect a dog, constrained to be on $A$, and its owner, constrained to be on $B$, as they walk without backtracking along their respective curves from one endpoint to the other. The advantage of this measure over other measures such as the Hausdorff distance is that it takes into account the ordering of the points along the curves. The discrete Fréchet distance replaces the dog and its owner by a pair of frogs that can only reside on $n$ and $m$ specific pebbles on the curves $A$ and $B$, respectively. These frogs hop from a pebble to the next without backtracking. The discrete Fréchet distance can be computed by a rather straightforward quadratic dynamic programming algorithm. However, despite a considerable amount of work on this problem and its variations, there is no subquadratic algorithm known, even for approximation versions of the problem. In this paper we present a subquadratic algorithm for computing the discrete Fréchet distance between two sequences of points in the plane, of respective lengths $m\le n$. The algorithm runs in $O\left(\frac{mn\log\log n}{\log n}\right)$ time and uses $O(n+m)$ storage. Our approach uses the geometry of the problem in a subtle way to encode legal positions of the frogs as states of a finite automaton.
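The "rather straightforward quadratic dynamic programming algorithm" that the abstract uses as its baseline can be sketched as below. This is the textbook recurrence, not the paper's subquadratic method; the function name `discrete_frechet` is an assumption, and both inputs are assumed to be non-empty sequences of points.

```python
import math


def discrete_frechet(a, b):
    """Discrete Fréchet distance of two non-empty point sequences in O(nm) time."""
    n, m = len(a), len(b)
    # reach[i][j] = shortest leash that lets the frogs get from the first
    # pebbles to (a[i], b[j]) without either frog backtracking
    reach = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            leash = math.dist(a[i], b[j])
            if i == 0 and j == 0:
                before = 0.0
            elif i == 0:
                before = reach[0][j - 1]
            elif j == 0:
                before = reach[i - 1][0]
            else:
                before = min(
                    reach[i - 1][j],      # only the frog on a hopped
                    reach[i][j - 1],      # only the frog on b hopped
                    reach[i - 1][j - 1],  # both frogs hopped
                )
            reach[i][j] = max(leash, before)
    return reach[n - 1][m - 1]
```

The only difference from a sum-based alignment cost is the `max`: the leash must be long enough for the worst position along the walk, not for the total.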

Submitted 24 April, 2012; originally announced April 2012.

arXiv:1104.0622 [pdf, ps, other]

Kinetic Stable Delaunay Graphs

Authors: Pankaj K. Agarwal, Jie Gao, Leonidas J. Guibas, Haim Kaplan, Vladlen Koltun, Natan Rubin, Micha Sharir

Abstract: We consider the problem of maintaining the Euclidean Delaunay triangulation $\mathrm{DT}$ of a set $P$ of $n$ moving points in the plane, along algebraic trajectories of constant description complexity. Since the best known upper bound on the number of topological changes in the full $\mathrm{DT}$ is nearly cubic, we seek to maintain a suitable portion of it that is less volatile yet retains many useful properties. We introduce the notion of a stable Delaunay graph, which is a dynamic subgraph of the Delaunay triangulation. The stable Delaunay graph (a) is easy to define, (b) experiences only a nearly quadratic number of discrete changes, (c) is robust under small changes of the norm, and (d) possesses certain useful properties. The stable Delaunay graph ($\mathrm{SDG}$ for short) is defined in terms of a parameter $α>0$, and consists of Delaunay edges $pq$ for which the angles at which $p$ and $q$ see their Voronoi edge $e_{pq}$ are at least $α$. We show that (i) $\mathrm{SDG}$ always contains at least roughly one third of the Delaunay edges; (ii) it contains the $β$-skeleton of $P$, for $β=1+Ω(α^2)$; (iii) it is stable, in the sense that its edges survive for long periods of time, as long as the orientations of the segments connecting (nearby) points of $P$ do not change by much; and (iv) stable Delaunay edges remain stable (with an appropriate redefinition of stability) if we replace the Euclidean norm by any sufficiently close norm. In particular, we can approximate the Euclidean norm by a polygonal norm (namely, a regular $k$-gon, with $k=Θ(1/α)$), and keep track of a Euclidean $\mathrm{SDG}$ by maintaining the full Delaunay triangulation of $P$ under the polygonal norm. We describe two kinetic data structures for maintaining $\mathrm{SDG}$. Both structures use $O^*(n)$ storage and process $O^*(n^2)$ events during the motion, each in $O^*(1)$ time.

Submitted 4 April, 2011; originally announced April 2011.

Comments: A preliminary version appeared in Proc. SoCG 2010

ACM Class: F.2.2; G.2.1

arXiv:1012.2694 [pdf, ps, other]

The 2-Center Problem in Three Dimensions

Authors: Pankaj K. Agarwal, Rinat Ben Avraham, Micha Sharir

Abstract: Let P be a set of n points in R^3. The 2-center problem for P is to find two congruent balls of minimum radius whose union covers P. We present two randomized algorithms for computing a 2-center of P. The first algorithm runs in O(n^3 log^5 n) expected time, and the second algorithm runs in O((n^2 log^5 n) /(1-r*/r_0)^3) expected time, where r* is the radius of the 2-center balls of P and r_0 is the radius of the smallest enclosing ball of P. The second algorithm is faster than the first one as long as r* is not too close to r_0, which is equivalent to the condition that the centers of the two covering balls are not too close to each other.

Submitted 13 December, 2010; originally announced December 2010.

ACM Class: F.2.2; I.5.3

arXiv:1003.5874 [pdf, other]

Stability of epsilon-Kernels

Authors: Pankaj K. Agarwal, Jeff M. Phillips, Hai Yu

Abstract: Given a set P of n points in R^d, an eps-kernel K subset P approximates the directional width of P in every direction within a relative (1-eps) factor. In this paper we study the stability of eps-kernels under dynamic insertion and deletion of points to P and by changing the approximation factor eps. In the first case, we say an algorithm for dynamically maintaining an eps-kernel is stable if at most O(1) points change in K as one point is inserted into or deleted from P. We describe an algorithm to maintain an eps-kernel of size O(1/eps^{(d-1)/2}) in O(1/eps^{(d-1)/2} + log n) time per update. Not only does our algorithm maintain a stable eps-kernel, its update time is faster than any known algorithm that maintains an eps-kernel of size O(1/eps^{(d-1)/2}). Next, we show that if there is an eps-kernel of P of size k, which may be dramatically less than O(1/eps^{(d-1)/2}), then there is an (eps/2)-kernel of P of size O(min {1/eps^{(d-1)/2}, k^{floor(d/2)} log^{d-2} (1/eps)}). Moreover, there exists a point set P in R^d and a parameter eps > 0 such that if every eps-kernel of P has size at least k, then any (eps/2)-kernel of P has size Omega(k^{floor(d/2)}).
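The eps-kernel condition stated in the abstract (the width of K is at least (1-eps) times the width of P in every direction) can be made concrete with a small checker. This is a sketch for illustration only: it is restricted to the plane, it tests a finite sample of directions rather than all of them, and the names `directional_width`, `looks_like_eps_kernel` and `num_directions` are assumptions.

```python
import math


def directional_width(points, u):
    """Width of a point set in direction u: extent of its projection onto u."""
    projections = [sum(x * y for x, y in zip(p, u)) for p in points]
    return max(projections) - min(projections)


def looks_like_eps_kernel(P, K, eps, num_directions=360):
    """Check the (1 - eps) width condition on sampled planar directions.

    Passing this test does not prove K is an eps-kernel, because only
    finitely many directions are examined; failing it is a real violation.
    """
    for t in range(num_directions):
        # Directions u and -u give the same width, so [0, pi) suffices.
        angle = math.pi * t / num_directions
        u = (math.cos(angle), math.sin(angle))
        if directional_width(K, u) < (1 - eps) * directional_width(P, u) - 1e-12:
            return False
    return True
```

A subset that keeps all extreme points of P passes for any eps, while a subset lying on a single line fails in the direction perpendicular to that line.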

Submitted 30 March, 2010; originally announced March 2010.

Comments: 15 pages, 7 figures

arXiv:0912.5182 [pdf, other]

Lipschitz Unimodal and Isotonic Regression on Paths and Trees

Authors: Pankaj K. Agarwal, Jeff M. Phillips, Bardia Sadri

Abstract: We describe algorithms for finding the regression of t, a sequence of values, to the closest sequence s by mean squared error, so that s is always increasing (isotonicity) and so the values of two consecutive points do not increase by too much (Lipschitz). The isotonicity constraint can be replaced with a unimodal constraint, where there is exactly one local maximum in s. These algorithms are generalized from sequences of values to trees of values. For each scenario we describe near-linear time algorithms.

Submitted 28 December, 2009; originally announced December 2009.

Comments: 18 pages, 5 figures

arXiv:0912.4115 [pdf, other]

On Channel-Discontinuity-Constraint Routing in Wireless Networks

Authors: Swaminathan Sankararaman, Alon Efrat, Srinivasan Ramasubramanian, Pankaj K. Agarwal

Abstract: Multi-channel wireless networks are increasingly being employed as infrastructure networks, e.g. in metro areas. Nodes in these networks frequently employ directional antennas to improve spatial throughput. In such networks, given a source and destination, it is of interest to compute an optimal path and channel assignment on every link in the path such that the path bandwidth is the same as the link bandwidth and the path satisfies the constraint that no two consecutive links on the path are assigned the same channel, referred to as the "Channel Discontinuity Constraint" (CDC). CDC-paths are also quite useful for TDMA systems, where preferably every two consecutive links along a path are assigned different time slots. This paper contains several contributions. We first present an $O(N^{2})$ distributed algorithm for discovering the shortest CDC-path between a given source and destination. This improves the running time of the $O(N^{3})$ centralized algorithm of Ahuja et al. for finding the minimum-weight CDC-path. Our second result is a generalized $t$-spanner for CDC-paths: for any $θ>0$ we show how to construct a sub-network containing only $O(\frac{N}{θ})$ edges, such that the length of shortest CDC-paths between arbitrary sources and destinations increases by only a factor of at most $(1-2\sin\tfrac{θ}{2})^{-2}$. We propose a novel algorithm to compute the spanner in a distributed manner using only $O(n\log{n})$ messages. An important consequence of this scheme arises when directional antennas are used: in this case, it is enough to consider only the two closest nodes in each cone.

Submitted 26 February, 2010; v1 submitted 21 December, 2009; originally announced December 2009.

arXiv:0806.4326 [pdf, other]

An Efficient Algorithm for 2D Euclidean 2-Center with Outliers

Authors: Pankaj K. Agarwal, Jeff M. Phillips

Abstract: For a set P of n points in R^2, the Euclidean 2-center problem computes a pair of congruent disks of the minimal radius that cover P. We extend this to the (2,k)-center problem where we compute the minimal radius pair of congruent disks to cover n-k points of P. We present a randomized algorithm with O(n k^7 log^3 n) expected running time for the (2,k)-center problem. We also study the (p,k)-center problem in R^2 under the \ell_\infty-metric. We give solutions for p=4 in O(k^{O(1)} n log n) time and for p=5 in O(k^{O(1)} n log^5 n) time.

Submitted 12 September, 2008; v1 submitted 26 June, 2008; originally announced June 2008.

Comments: 19 pages, 6 figures. Longer version of paper in ESA08. Adds section on l_\infty (p,k)-center

arXiv:cs/9909001 [pdf, ps, other]

Emerging Challenges in Computational Topology

Authors: Marshall Bern, David Eppstein, Pankaj K. Agarwal, Nina Amenta, Paul Chew, Tamal Dey, David P. Dobkin, Herbert Edelsbrunner, Cindy Grimm, Leonidas J. Guibas, John Harer, Joel Hass, Andrew Hicks, Carroll K. Johnson, Gilad Lerman, David Letscher, Paul Plassmann, Eric Sedgwick, Jack Snoeyink, Jeff Weeks, Chee Yap, Denis Zorin

Abstract: Here we present the results of the NSF-funded Workshop on Computational Topology, which met on June 11 and 12 in Miami Beach, Florida. This report identifies important problems involving both computation and topology.

Submitted 1 September, 1999; originally announced September 1999.

Comments: 20 pages

ACM Class: F.2.2; I.2.9; I.2.10; I.3.5; J.2

arXiv:cs/9808008 [pdf, ps, other]

Computational Geometry Column 34

Authors: Pankaj K. Agarwal, Joseph O'Rourke

Abstract: Problems presented at the open-problem session of the 14th Annual ACM Symposium on Computational Geometry are listed.

Submitted 31 August, 1998; originally announced August 1998.

ACM Class: F.2.2

Journal ref: SIGACT News, 29(3) (Issue 108) 27-32, Sept. 1998

Showing 1–50 of 50 results for author: Agarwal, P K