scikit-hubness: Hubness Reduction and Approximate Neighbor Search

Feldbauer, Roman; Rattei, Thomas; Flexer, Arthur

doi:10.21105/joss.01957

arXiv:1912.00706 (cs)

[Submitted on 2 Dec 2019]

Abstract: This paper introduces scikit-hubness, a Python package for efficient nearest neighbor search in high-dimensional spaces. Hubness is an aspect of the curse of dimensionality, and is known to impair various learning tasks, including classification, clustering, and visualization. scikit-hubness provides algorithms for hubness analysis ("Is my data affected by hubness?"), hubness reduction ("How can we improve neighbor retrieval in high dimensions?"), and approximate neighbor search ("Does it work for large data sets?"). It is integrated into the scikit-learn environment, enabling rapid adoption by Python-based machine learning researchers and practitioners. Users will find all functionality of the scikit-learn neighbors package, plus additional support for transparent hubness reduction and approximate nearest neighbor search. scikit-hubness is developed using several quality assessment tools and principles, such as PEP8 compliance, unit tests with high code coverage, continuous integration on all major platforms (Linux, macOS, Windows), and additional checks by LGTM. The source code is available at this https URL under the BSD 3-clause license. Install from the Python Package Index with $ pip install scikit-hubness.
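
To make the hubness analysis question ("Is my data affected by hubness?") concrete, the following is a minimal sketch of one commonly used hubness measure, the skewness of the k-occurrence distribution. It is written in plain Python and does not use or reproduce the scikit-hubness API; the function names `k_occurrence`, `skewness`, and `random_data` are hypothetical and chosen only for this illustration. The k-occurrence of a point is the number of times it appears among the k nearest neighbors of the other points. A strongly right-skewed distribution of these counts suggests that a few points act as hubs.

```python
import math
import random


def k_occurrence(points, k):
    """Count how often each point appears among the k nearest neighbors of the others."""
    n = len(points)
    counts = [0] * n
    for i in range(n):
        # Euclidean distances from point i to every other point (brute force, O(n^2))
        dists = [(math.dist(points[i], points[j]), j) for j in range(n) if j != i]
        dists.sort()
        for _, j in dists[:k]:
            counts[j] += 1
    return counts


def skewness(values):
    """Third standardized moment; returns 0.0 when all values are equal."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def random_data(n, dim, seed=0):
    """Illustrative data only: n points with i.i.d. standard normal coordinates."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    # Compare a low-dimensional and a high-dimensional sample of the same size.
    for dim in (3, 100):
        counts = k_occurrence(random_data(200, dim), k=5)
        print(f"dim={dim}: skewness={skewness(counts):.3f}, max k-occurrence={max(counts)}")
```

The brute-force neighbor search here scales quadratically with the number of points, which is the kind of cost that approximate neighbor search is meant to avoid on large data sets.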

Subjects:	Machine Learning (cs.LG); Information Retrieval (cs.IR); Machine Learning (stat.ML)
Cite as:	arXiv:1912.00706 [cs.LG]
	(or arXiv:1912.00706v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1912.00706
Related DOI:	https://doi.org/10.21105/joss.01957

Submission history

From: Roman Feldbauer
[v1] Mon, 2 Dec 2019 12:04:32 UTC (14 KB)
