-
Approximate Furthest Neighbor with Application to Annulus Query
Authors:
Rasmus Pagh,
Francesco Silvestri,
Johan Sivertsen,
Matthew Skala
Abstract:
Much recent work has been devoted to approximate nearest neighbor queries. Motivated by applications in recommender systems, we consider approximate furthest neighbor (AFN) queries and present a simple, fast, and highly practical data structure for answering AFN queries in high-dimensional Euclidean space. The method builds on the technique of Indyk (SODA 2003), storing random projections to provide sublinear query time for AFN. However, we introduce a different query algorithm, improving on Indyk's approximation factor and reducing the running time by a logarithmic factor. We also present a variation based on a query-independent ordering of the database points; while this does not have the provable approximation factor of the query-dependent data structure, it offers significant improvement in time and space complexity. We give a theoretical analysis and experimental results. As an application, the query-dependent approach is used for deriving a data structure for the approximate annulus query problem, which is defined as follows: given an input set S and two parameters r > 0 and w >= 1, construct a data structure that returns for each query point q a point p in S such that the distance between p and q is at least r/w and at most wr.
Submitted 22 November, 2016;
originally announced November 2016.
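The annulus query condition stated in the abstract above can be illustrated with a brute-force linear scan. This is only a minimal sketch of the problem definition, not the data structure from the paper: the function name `annulus_query_bruteforce` and the sample points are hypothetical, and the scan takes time linear in |S| rather than the sublinear query time the paper targets.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def annulus_query_bruteforce(
    S: Sequence[Point], q: Point, r: float, w: float
) -> Optional[Point]:
    """Return some p in S with r/w <= dist(p, q) <= w*r, or None if none exists.

    Hypothetical baseline: checks every point in S, so it only illustrates
    the query condition from the abstract, not an efficient method.
    """
    if r <= 0 or w < 1:
        raise ValueError("requires r > 0 and w >= 1")
    inner, outer = r / w, w * r  # radii of the annulus around q
    for p in S:
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        if inner <= d <= outer:
            return p
    return None


# Illustrative data, not taken from the paper.
S = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
hit = annulus_query_bruteforce(S, (0.0, 0.0), r=5.0, w=1.5)
miss = annulus_query_bruteforce(S, (0.0, 0.0), r=100.0, w=1.1)
```

With these sample points, only (3.0, 4.0) lies at a distance between 5/1.5 and 7.5 from the origin, so it is the point returned for the first query; the second query has no qualifying point.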
-
A Structural Query System for Han Characters
Authors:
Matthew Skala
Abstract:
The IDSgrep structural query system for Han character dictionaries is presented. This system includes a data model and syntax for describing the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes) based on the Unicode IDS syntax; a language for querying EIDS databases, designed to suit the needs of font developers and foreign language learners; a bit vector index inspired by Bloom filters for faster query operations; a freely available implementation; and format translation from popular third-party IDS and XML character databases. Experimental results are included, with a comparison to other software used for similar applications.
Submitted 22 April, 2014;
originally announced April 2014.
-
Robust Non-Parametric Data Approximation of Pointsets via Data Reduction
Authors:
Stephane Durocher,
Alexandre Leblanc,
Jason Morrison,
Matthew Skala
Abstract:
In this paper we present a novel non-parametric method of simplifying piecewise linear curves and we apply this method as a statistical approximation of structure within sequential data in the plane. We consider the problem of minimizing the average length of sequences of consecutive input points that lie on any one side of the simplified curve. Specifically, given a sequence $P$ of $n$ points in the plane that determine a simple polygonal chain consisting of $n-1$ segments, we describe algorithms for selecting an ordered subset $Q \subset P$ (including the first and last points of $P$) that determines a second polygonal chain to approximate $P$, such that the number of crossings between the two polygonal chains is maximized, and the cardinality of $Q$ is minimized among all such maximizing subsets of $P$. Our algorithms have respective running times $O(n^2\log n)$ when $P$ is monotonic and $O(n^2\log^2 n)$ when $P$ is an arbitrary simple polyline. Finally, we examine the application of our algorithms iteratively in a bootstrapping technique to define a smooth robust non-parametric approximation of the original sequence.
Submitted 30 May, 2012;
originally announced May 2012.