
doi 10.1016/j.is.2016.07.006

Approximate Furthest Neighbor with Application to Annulus Query

Authors: Rasmus Pagh, Francesco Silvestri, Johan Sivertsen, Matthew Skala

Abstract: Much recent work has been devoted to approximate nearest neighbor queries. Motivated by applications in recommender systems, we consider approximate furthest neighbor (AFN) queries and present a simple, fast, and highly practical data structure for answering AFN queries in high-dimensional Euclidean space. The method builds on the technique of Indyk (SODA 2003), storing random projections to provide sublinear query time for AFN. However, we introduce a different query algorithm, improving on Indyk's approximation factor and reducing the running time by a logarithmic factor. We also present a variation based on a query-independent ordering of the database points; while this does not have the provable approximation factor of the query-dependent data structure, it offers significant improvement in time and space complexity. We give a theoretical analysis and experimental results. As an application, the query-dependent approach is used for deriving a data structure for the approximate annulus query problem, which is defined as follows: given an input set S and two parameters r > 0 and w >= 1, construct a data structure that returns for each query point q a point p in S such that the distance between p and q is at least r/w and at most wr.
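
The annulus query condition stated in the abstract can be illustrated with a minimal brute-force sketch. This is not the paper's data structure, which relies on random projections to achieve sublinear query time; it is only a linear scan that checks the distance condition r/w <= d(p, q) <= wr directly. The function name `annulus_query_bruteforce` and the choice to return `None` when no point qualifies are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence

Point = Sequence[float]


def annulus_query_bruteforce(
    S: Sequence[Point], q: Point, r: float, w: float
) -> Optional[Point]:
    """Return some p in S with r/w <= dist(p, q) <= w*r, or None.

    Hypothetical linear-scan baseline for the annulus condition in the
    abstract; it is NOT the sublinear structure the paper describes.
    """
    if r <= 0 or w < 1:
        raise ValueError("requires r > 0 and w >= 1")
    lo, hi = r / w, w * r  # inner and outer radii of the annulus
    for p in S:
        d = math.dist(p, q)  # Euclidean distance (Python 3.8+)
        if lo <= d <= hi:
            return p
    return None  # assumption: signal "no qualifying point" with None
```

For example, with S = [(0, 0), (3, 4), (10, 0)], q = (0, 0), r = 5 and w = 2, the annulus spans distances 2.5 to 10, and the scan returns (3, 4), the first point that falls inside it. A scan like this costs time linear in |S| per query, which is the baseline a sublinear structure is meant to beat.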

Submitted 22 November, 2016; originally announced November 2016.

Journal ref: Information Systems, Available online 22 July 2016, ISSN 0306-4379

arXiv:1404.5585 [pdf, other]

A Structural Query System for Han Characters

Authors: Matthew Skala

Abstract: The IDSgrep structural query system for Han character dictionaries is presented. This system includes a data model and syntax for describing the spatial structure of Han characters using Extended Ideographic Description Sequences (EIDSes) based on the Unicode IDS syntax; a language for querying EIDS databases, designed to suit the needs of font developers and foreign language learners; a bit vector index inspired by Bloom filters for faster query operations; a freely available implementation; and format translation from popular third-party IDS and XML character databases. Experimental results are included, with a comparison to other software used for similar applications.

Submitted 22 April, 2014; originally announced April 2014.

Comments: 28 pages, 5 figures, for submission to ACM Transactions on Asian Language Information Processing

ACM Class: H.3.1

Journal ref: International Journal of Asian Language Processing 23(2) (2015) 127-159

arXiv:1205.6717 [pdf, other]

Robust Non-Parametric Data Approximation of Pointsets via Data Reduction

Authors: Stephane Durocher, Alexandre Leblanc, Jason Morrison, Matthew Skala

Abstract: In this paper we present a novel non-parametric method of simplifying piecewise linear curves and we apply this method as a statistical approximation of structure within sequential data in the plane. We consider the problem of minimizing the average length of sequences of consecutive input points that lie on any one side of the simplified curve. Specifically, given a sequence $P$ of $n$ points in the plane that determine a simple polygonal chain consisting of $n-1$ segments, we describe algorithms for selecting an ordered subset $Q \subset P$ (including the first and last points of $P$) that determines a second polygonal chain to approximate $P$, such that the number of crossings between the two polygonal chains is maximized, and the cardinality of $Q$ is minimized among all such maximizing subsets of $P$. Our algorithms have respective running times $O(n^2\log n)$ when $P$ is monotonic and $O(n^2\log^2 n)$ when $P$ is an arbitrary simple polyline. Finally, we examine the application of our algorithms iteratively in a bootstrapping technique to define a smooth robust non-parametric approximation of the original sequence.

Submitted 30 May, 2012; originally announced May 2012.

Comments: 13 pages, 6 figures

ACM Class: F.2.1; G.1.2

Showing 1–3 of 3 results for author: Skala, M