Showing 1–2 of 2 results for author: Doshi, I

Search v0.5.6 released 2020-02-24

arXiv:2010.09426 [pdf, other]

cs.IR

LANNS: A Web-Scale Approximate Nearest Neighbor Lookup System

Authors: Ishita Doshi, Dhritiman Das, Ashish Bhutani, Rajeev Kumar, Rushi Bhatt, Niranjan Balasubramanian

Abstract: Nearest neighbor search (NNS) has a wide range of applications in information retrieval, computer vision, machine learning, databases, and other areas. The existing state-of-the-art algorithm for nearest neighbor search, Hierarchical Navigable Small World Networks (HNSW), is unable to scale to large datasets of 100M records in high dimensions. In this paper, we propose LANNS, an end-to-end platform for Approximate Nearest Neighbor Search that scales to web-scale datasets. Library for Large Scale Approximate Nearest Neighbor Search (LANNS) is deployed in multiple production systems for identifying topK ($100 \leq topK \leq 200$) approximate nearest neighbors with a latency of a few milliseconds per query and a high throughput of 2.5k Queries Per Second (QPS) on a single node, on large ($\sim$180M data points), high-dimensional (50-2048 dimensional) datasets.

Submitted 19 October, 2020; originally announced October 2020.

Comments: 10 pages, 9 figures, 9 tables

ACM Class: H.3.3; H.3.4; H.3.1

arXiv:2008.10828 [pdf, other]

cs.DS

Efficient Hierarchical Clustering for Classification and Anomaly Detection

Authors: Ishita Doshi, Sreekalyan Sajjalla, Jayesh Choudhari, Rushi Bhatt, Anirban Dasgupta

Abstract: We address the problem of large-scale real-time classification of content posted on social networks, along with the need to rapidly identify novel spam types. Obtaining manual labels for user-generated content through editorial labeling and taxonomy development lags behind the rate at which new content types need to be classified. We propose a class of hierarchical clustering algorithms that can be used both for efficient and scalable real-time multiclass classification and for detecting new anomalies in user-generated content. Our methods have low query time, linear space usage, and come with theoretical guarantees with respect to a specific hierarchical clustering cost function (Dasgupta, 2016). We compare our solutions against a range of classification techniques and demonstrate excellent empirical performance.

Submitted 25 August, 2020; originally announced August 2020.

Comments: 19 pages, 2 figures, 9 tables

ACM Class: H.3.3; I.5.3; I.7.0; E.1
