Showing 1–23 of 23 results for author: Aumüller, M

  1. arXiv:2505.17810

    cs.LG cs.IR

    VIBE: Vector Index Benchmark for Embeddings

    Authors: Elias Jääsaari, Ville Hyvönen, Matteo Ceccarello, Teemu Roos, Martin Aumüller

    Abstract: Approximate nearest neighbor (ANN) search is a performance-critical component of many machine learning pipelines. Rigorous benchmarking is essential for evaluating the performance of vector indexes for ANN search. However, the datasets of the existing benchmarks are no longer representative of the current applications of ANN search. Hence, there is an urgent need for an up-to-date set of benchmark…

    Submitted 23 May, 2025; originally announced May 2025.

    Comments: 25 pages

  2. arXiv:2409.17424

    cs.IR cs.DS cs.LG cs.PF

    Results of the Big ANN: NeurIPS'23 competition

    Authors: Harsha Vardhan Simhadri, Martin Aumüller, Amir Ingber, Matthijs Douze, George Williams, Magdalen Dobson Manohar, Dmitry Baranchuk, Edo Liberty, Frank Liu, Ben Landrum, Mazin Karjikar, Laxman Dhulipala, Meng Chen, Yue Chen, Rui Ma, Kai Zhang, Yuzheng Cai, Jiayang Shi, Yizhuo Chen, Weiguo Zheng, Zihao Wan, Jie Yin, Ben Huang

    Abstract: The 2023 Big ANN Challenge, held at NeurIPS 2023, focused on advancing the state-of-the-art in indexing data structures and search algorithms for practical variants of Approximate Nearest Neighbor (ANN) search that reflect the growing complexity and diversity of workloads. Unlike prior challenges that emphasized scaling up classical ANN search ~\cite{DBLP:conf/nips/SimhadriWADBBCH21}, this competi…

    Submitted 25 September, 2024; originally announced September 2024.

    Comments: Code: https://github.com/harsha-simhadri/big-ann-benchmarks/releases/tag/v0.3.0

    ACM Class: H.3.3

  3. arXiv:2409.07187

    cs.DS

    Differentially Private High-Dimensional Approximate Range Counting, Revisited

    Authors: Martin Aumüller, Fabrizio Boninsegna, Francesco Silvestri

    Abstract: Locality Sensitive Filters are known for offering a quasi-linear space data structure with rigorous guarantees for the Approximate Near Neighbor search (ANN) problem. Building on Locality Sensitive Filters, we derive a simple data structure for the Approximate Near Neighbor Counting (ANNC) problem under differential privacy (DP). Moreover, we provide a simple analysis leveraging a connection with…

    Submitted 2 May, 2025; v1 submitted 11 September, 2024; originally announced September 2024.

  4. arXiv:2306.08745

    cs.CR cs.DS cs.LG

    PLAN: Variance-Aware Private Mean Estimation

    Authors: Martin Aumüller, Christian Janos Lebeda, Boel Nelson, Rasmus Pagh

    Abstract: Differentially private mean estimation is an important building block in privacy-preserving algorithms for data analysis and machine learning. Though the trade-off between privacy and utility is well understood in the worst case, many datasets exhibit structure that could potentially be exploited to yield better algorithms. In this paper we present $\textit{Private Limit Adapted Noise}$ (PLAN), a…

    Submitted 10 April, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

  5. arXiv:2205.03763

    cs.LG cs.DB cs.DS cs.PF

    Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search

    Authors: Harsha Vardhan Simhadri, George Williams, Martin Aumüller, Matthijs Douze, Artem Babenko, Dmitry Baranchuk, Qi Chen, Lucas Hosseini, Ravishankar Krishnaswamy, Gopal Srinivasa, Suhas Jayaram Subramanya, Jingdong Wang

    Abstract: Despite the broad range of algorithms for Approximate Nearest Neighbor Search, most empirical evaluations of algorithms have focused on smaller datasets, typically of 1 million points~\citep{Benchmark}. However, deploying recent advances in embedding based techniques for search, recommendation and ranking at scale require ANNS indices at billion, trillion or larger scale. Barring a few recent pape…

    Submitted 7 May, 2022; originally announced May 2022.

  6. arXiv:2107.02736

    cs.DS cs.LG

    DEANN: Speeding up Kernel-Density Estimation using Approximate Nearest Neighbor Search

    Authors: Matti Karppa, Martin Aumüller, Rasmus Pagh

    Abstract: Kernel Density Estimation (KDE) is a nonparametric method for estimating the shape of a density function, given a set of samples from the distribution. Recently, locality-sensitive hashing, originally proposed as a tool for nearest neighbor search, has been shown to enable fast KDE data structures. However, these approaches do not take advantage of the many other advances that have been made in al…

    Submitted 1 March, 2022; v1 submitted 6 July, 2021; originally announced July 2021.

    Comments: 35 pages, 1 figure. AISTATS 2022
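For context on what DEANN aims to speed up, the naive kernel density estimate touches every sample for every query. The sketch below assumes a Gaussian kernel and 1-D data (both are illustrative choices, not taken from the paper). Because the Gaussian kernel decays quickly, most of the sum typically comes from samples close to the query, which is the intuition for bringing nearest-neighbor search into KDE.

```python
import math

def gaussian_kde(query, samples, bandwidth):
    """Naive Gaussian kernel density estimate at a 1-D query point.

    Cost is O(n) per query: every sample contributes one kernel term.
    """
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((query - x) / bandwidth) ** 2) for x in samples)
```

Summing the estimate over a fine grid and multiplying by the step size should give approximately 1, since each kernel term integrates to 1.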

  7. arXiv:2106.10068

    cs.CR cs.DS

    Differentially Private Sparse Vectors with Low Error, Optimal Space, and Fast Access

    Authors: Martin Aumüller, Christian Janos Lebeda, Rasmus Pagh

    Abstract: Representing a sparse histogram, or more generally a sparse vector, is a fundamental task in differential privacy. An ideal solution would use space close to information-theoretical lower bounds, have an error distribution that depends optimally on the desired privacy level, and allow fast random access to entries in the vector. However, existing approaches have only achieved two of these three go…

    Submitted 27 September, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

  8. arXiv:2101.10905

    cs.DS cs.DB cs.LG

    Sampling a Near Neighbor in High Dimensions -- Who is the Fairest of Them All?

    Authors: Martin Aumüller, Sariel Har-Peled, Sepideh Mahabadi, Rasmus Pagh, Francesco Silvestri

    Abstract: Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. Given a set of points $S$ and a radius parameter $r>0$, the $r$-near neighbor ($r$-NN) problem asks for a data structure that, given any query point $q$, returns a point $p$ within distance at most $r$ from $q$. In this paper, we study the $r$-NN problem in the light of individual fairness a…

    Submitted 26 January, 2021; originally announced January 2021.

    Comments: arXiv admin note: text overlap with arXiv:1906.02640

  9. arXiv:2008.08134

    cs.LG cs.CR cs.IR stat.ML

    Differentially Private Sketches for Jaccard Similarity Estimation

    Authors: Martin Aumüller, Anders Bourgeat, Jana Schmurr

    Abstract: This paper describes two locally-differential private algorithms for releasing user vectors such that the Jaccard similarity between these vectors can be efficiently estimated. The basic building block is the well known MinHash method. To achieve a privacy-utility trade-off, MinHash is extended in two ways using variants of Generalized Randomized Response and the Laplace Mechanism. A theoretical a…

    Submitted 18 August, 2020; originally announced August 2020.

    Comments: Accepted at SISAP 2020
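The MinHash building block named in this abstract can be sketched as follows. This is plain, non-private MinHash; the paper's privacy extensions (Generalized Randomized Response and the Laplace Mechanism) are not shown. Salted BLAKE2b hashes stand in for random hash functions, which is an illustrative assumption. The probability that two sets share the same minimum under one hash function equals their Jaccard similarity, so the fraction of agreeing signature positions estimates it.

```python
import hashlib

def _hash(seed, item):
    """Pseudo-random 64-bit hash of `item` under hash function number `seed`."""
    data = f"{seed}:{item!r}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(items, num_hashes):
    """MinHash signature of a non-empty set: the minimum hash value per hash function."""
    return [min(_hash(i, x) for x in items) for i in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of agreeing positions; estimates |A intersect B| / |A union B|."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

More hash functions reduce the variance of the estimate at the cost of longer signatures.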

  10. arXiv:1907.07387

    cs.IR cs.DB

    The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search

    Authors: Martin Aumüller, Matteo Ceccarello

    Abstract: This paper reconsiders common benchmarking approaches to nearest neighbor search. It is shown that the concept of local intrinsic dimensionality (LID) allows to choose query sets of a wide range of difficulty for real-world datasets. Moreover, the effect of different LID distributions on the running time performance of implementations is empirically studied. To this end, different visualization co…

    Submitted 17 July, 2019; originally announced July 2019.

    Comments: Preprint of the paper accepted at SISAP 2019
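Local intrinsic dimensionality is usually estimated from a query's distances to its k nearest neighbors. A minimal sketch of one common maximum-likelihood estimator, LID = -((1/k) * sum_i ln(r_i / r_k))^(-1), is shown below; whether the paper uses exactly this estimator or this choice of k is an assumption here.

```python
import math

def lid_mle(distances):
    """MLE estimate of local intrinsic dimensionality.

    `distances` are the (positive) distances from a query to its k nearest
    neighbors; r_k is the largest of them.
    """
    d = sorted(distances)
    r_k = d[-1]
    s = sum(math.log(r / r_k) for r in d)
    if s == 0:
        raise ValueError("all distances are equal; the estimate is undefined")
    return -len(d) / s
```

If distances grow slowly with neighbor rank (many neighbors at nearly the same distance), the estimate is large, which corresponds to a harder query for distance-based search.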

  11. arXiv:1906.12211

    cs.DS cs.CG

    PUFFINN: Parameterless and Universally Fast FInding of Nearest Neighbors

    Authors: Martin Aumüller, Tobias Christiani, Rasmus Pagh, Michael Vesterli

    Abstract: We present PUFFINN, a parameterless LSH-based index for solving the $k$-nearest neighbor problem with probabilistic guarantees. By parameterless we mean that the user is only required to specify the amount of memory the index is supposed to use and the result quality that should be achieved. The index combines several heuristic ideas known in the literature. By small adaptions to the query algorit…

    Submitted 28 June, 2019; originally announced June 2019.

    Comments: Extended version of the ESA 2019 paper

  12. arXiv:1906.01859

    cs.DS cs.CG cs.IR cs.LG

    Fair Near Neighbor Search: Independent Range Sampling in High Dimensions

    Authors: Martin Aumüller, Rasmus Pagh, Francesco Silvestri

    Abstract: Similarity search is a fundamental algorithmic primitive, widely used in many computer science disciplines. There are several variants of the similarity search problem, and one of the most relevant is the $r$-near neighbor ($r$-NN) problem: given a radius $r>0$ and a set of points $S$, construct a data structure that, for any given query point $q$, returns a point $p$ within distance at most $r$ f…

    Submitted 15 June, 2020; v1 submitted 5 June, 2019; originally announced June 2019.

    Comments: Proceedings of the 39th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS), Pages 191-204, June 2020
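Entries 8 and 12 study the r-NN problem under individual fairness. Assuming "fair" means that every point within distance r of the query is returned with equal probability (which matches the "independent range sampling" framing in the title above), a linear-scan baseline is easy to state. This is only a reference behavior for testing, not the paper's LSH-based data structure.

```python
import math
import random

def sample_near_neighbor(points, q, r, rng=random):
    """Return a uniformly random point within Euclidean distance r of q, or None.

    Linear scan: collects the whole r-neighborhood, then samples from it.
    """
    near = [p for p in points if math.dist(p, q) <= r]
    return rng.choice(near) if near else None
```

Repeated queries return independent samples, so each neighbor in the r-ball is reported with frequency close to 1/|ball|.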

  13. arXiv:1810.12047

    cs.DS

    Simple and Fast BlockQuicksort using Lomuto's Partitioning Scheme

    Authors: Martin Aumüller, Nikolaj Hass

    Abstract: This paper presents simple variants of the BlockQuicksort algorithm described by Edelkamp and Weiss (ESA 2016). The simplification is achieved by using Lomuto's partitioning scheme instead of Hoare's crossing pointer technique to partition the input. To achieve a robust sorting algorithm that works well on many different input types, the paper introduces a novel two-pivot variant of Lomuto's parti…

    Submitted 29 October, 2018; originally announced October 2018.

    Comments: Accepted at ALENEX 2019

    ACM Class: F.2.2
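For reference, the textbook single-pivot Lomuto partitioning scheme mentioned in this abstract scans left to right with one pointer and grows a prefix of elements smaller than the pivot. The sketch below is that classic scheme, not the paper's block-based or two-pivot variant; taking the last element as the pivot is an illustrative choice.

```python
def lomuto_partition(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i = lo                      # a[lo..i-1] holds elements < pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]   # move the pivot between the two parts
    return i

def quicksort(a, lo=0, hi=None):
    """In-place quicksort using Lomuto partitioning."""
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        p = lomuto_partition(a, lo, hi)
        # Recurse on the smaller side and loop on the larger one to bound stack depth.
        if p - lo < hi - p:
            quicksort(a, lo, p - 1)
            lo = p + 1
        else:
            quicksort(a, p + 1, hi)
            hi = p - 1
    return a
```

Unlike Hoare's crossing pointers, every element smaller than the pivot triggers a swap, which is the branch pattern that block-based variants restructure.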

  14. arXiv:1807.05614

    cs.IR cs.DB

    ANN-Benchmarks: A Benchmarking Tool for Approximate Nearest Neighbor Algorithms

    Authors: Martin Aumüller, Erik Bernhardsson, Alexander Faithfull

    Abstract: This paper describes ANN-Benchmarks, a tool for evaluating the performance of in-memory approximate nearest neighbor algorithms. It provides a standard interface for measuring the performance and quality achieved by nearest neighbor algorithms on different standard data sets. It supports several different ways of integrating $k$-NN algorithms, and its configuration system automatically tests a ran…

    Submitted 17 July, 2018; v1 submitted 15 July, 2018; originally announced July 2018.

    Comments: Full version of the SISAP 2017 conference paper. v2: Updated the abstract to avoid arXiv linking to the wrong URL

    ACM Class: H.3.3
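The "quality" side of such benchmarks is commonly reported as recall: the fraction of the true k nearest neighbors that an index actually returns. A minimal sketch based on neighbor ids is shown below; ANN-Benchmarks' own implementation may compute it differently (for example by comparing distances to handle ties), so treat this as an illustration of the metric rather than the tool's code.

```python
def recall_at_k(true_ids, returned_ids, k):
    """Fraction of the true k nearest neighbors found among the first k returned ids."""
    return len(set(true_ids[:k]) & set(returned_ids[:k])) / k

def mean_recall(all_true, all_returned, k):
    """Average recall@k over a batch of queries."""
    return sum(recall_at_k(t, r, k) for t, r in zip(all_true, all_returned)) / len(all_true)
```

Plotting mean recall against queries per second for different index parameters gives the typical recall/throughput trade-off curve.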

  15. arXiv:1703.07867

    cs.DS

    Distance-Sensitive hashing

    Authors: Martin Aumüller, Tobias Christiani, Rasmus Pagh, Francesco Silvestri

    Abstract: Locality-sensitive hashing (LSH) is an important tool for managing high-dimensional noisy or uncertain data, for example in connection with data cleaning (similarity join) and noise-robust search (similarity search). However, for a number of problems the LSH framework is not known to yield good solutions, and instead ad hoc solutions have been designed for particular similarity and distance measur…

    Submitted 17 April, 2018; v1 submitted 22 March, 2017; originally announced March 2017.

    Comments: Accepted at PODS'18. Abstract shortened due to character limit

    ACM Class: H.3.3

  16. Dual-Pivot Quicksort: Optimality, Analysis and Zeros of Associated Lattice Paths

    Authors: Martin Aumüller, Martin Dietzfelbinger, Clemens Heuberger, Daniel Krenn, Helmut Prodinger

    Abstract: We present an average case analysis of a variant of dual-pivot quicksort. We show that the used algorithmic partitioning strategy is optimal, i.e., it minimizes the expected number of key comparisons. For the analysis, we calculate the expected number of comparisons exactly as well as asymptotically, in particular, we provide exact expressions for the linear, logarithmic, and constant terms. An…

    Submitted 27 November, 2017; v1 submitted 1 November, 2016; originally announced November 2016.

    Comments: This article supersedes arXiv:1602.04031

    MSC Class: 05A16; 68R05; 68P10; 68Q25; 68W40

    Journal ref: Combin. Probab. Comput. 28 (2019), no. 4, 485-518

  17. arXiv:1611.00029

    cs.DS

    A Simple Hash Class with Strong Randomness Properties in Graphs and Hypergraphs

    Authors: Martin Aumüller, Martin Dietzfelbinger, Philipp Woelfel

    Abstract: We study randomness properties of graphs and hypergraphs generated by simple hash functions. Several hashing applications can be analyzed by studying the structure of $d$-uniform random ($d$-partite) hypergraphs obtained from a set $S$ of $n$ keys and $d$ randomly chosen hash functions $h_1,\dots,h_d$ by associating each key $x\in S$ with a hyperedge $\{h_1(x),\dots, h_d(x)\}$. Often it is assumed…

    Submitted 31 October, 2016; originally announced November 2016.

    MSC Class: 68P05; 68R10; 68W20; 05C80

  18. Parameter-free Locality Sensitive Hashing for Spherical Range Reporting

    Authors: Thomas D. Ahle, Martin Aumüller, Rasmus Pagh

    Abstract: We present a data structure for *spherical range reporting* on a point set $S$, i.e., reporting all points in $S$ that lie within radius $r$ of a given query point $q$. Our solution builds upon the Locality-Sensitive Hashing (LSH) framework of Indyk and Motwani, which represents the asymptotically best solutions to near neighbor problems in high dimensions. While traditional LSH data structures ha…

    Submitted 20 July, 2016; v1 submitted 9 May, 2016; originally announced May 2016.

    Comments: 21 pages, 5 figures, due to the limitation "The abstract field cannot be longer than 1,920 characters", the abstract appearing here is slightly shorter than that in the PDF file

    ACM Class: H.3.3
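The traditional LSH framework that this abstract builds on can be illustrated with random-hyperplane hashing (SimHash), one well-known LSH family for angular distance: two vectors at angle θ agree on a single hash bit with probability 1 - θ/π. The sketch below is a classic multi-table index with fixed, hypothetical parameters `k_bits` and `n_tables`; it does not reproduce the paper's parameter-free scheme. Points are assumed to be roughly unit length so that Euclidean and angular distances agree in ordering.

```python
import math
import random

def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random Gaussian normal vectors, one hyperplane per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def simhash(v, planes):
    """One bit per hyperplane: which side of the hyperplane v lies on."""
    return tuple(int(sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0.0) for p in planes)

class HyperplaneLSH:
    """Multi-table LSH index; each table keys points by k_bits concatenated bits."""

    def __init__(self, dim, k_bits, n_tables, seed=0):
        self.tables = [({}, make_hyperplanes(dim, k_bits, seed + t)) for t in range(n_tables)]
        self.points = []

    def insert(self, v):
        idx = len(self.points)
        self.points.append(v)
        for buckets, planes in self.tables:
            buckets.setdefault(simhash(v, planes), []).append(idx)

    def range_query(self, q, r):
        """Ids of colliding candidates that lie within Euclidean distance r of q."""
        candidates = set()
        for buckets, planes in self.tables:
            candidates.update(buckets.get(simhash(q, planes), []))
        return sorted(i for i in candidates if math.dist(self.points[i], q) <= r)
```

Because only colliding candidates are checked, a query can miss true neighbors; more tables lower that probability, while more bits per table shrink the buckets. Choosing these two numbers is exactly the tuning burden that parameter-free approaches try to remove.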

  19. arXiv:1604.02093

    cond-mat.mtrl-sci cond-mat.soft

    Impact of water on the charge transport of a glass-forming ionic liquid

    Authors: P. Sippel, V. Dietrich, D. Reuter, M. Aumüller, P. Lunkenheimer, A. Loidl, S. Krohns

    Abstract: Using dielectric spectroscopy and differential scanning calorimetry, we have performed a detailed investigation of the influence of water uptake on the translational and reorientational glassy dynamics in the typical ionic liquid 1-Butyl-3-methyl-imidazolium chloride. From a careful analysis of the measured dielectric permittivity and conductivity spectra, we find a significant acceleration of cat…

    Submitted 7 April, 2016; originally announced April 2016.

    Comments: 10 pages, 7 figures

    Journal ref: J. Mol. Liq. 223 (2016) 635

  20. arXiv:1602.04031

    math.CO cs.DS

    Counting Zeros in Random Walks on the Integers and Analysis of Optimal Dual-Pivot Quicksort

    Authors: Martin Aumüller, Martin Dietzfelbinger, Clemens Heuberger, Daniel Krenn, Helmut Prodinger

    Abstract: We present an average case analysis of two variants of dual-pivot quicksort, one with a non-algorithmic comparison-optimal partitioning strategy, the other with a closely related algorithmic strategy. For both we calculate the expected number of comparisons exactly as well as asymptotically, in particular, we provide exact expressions for the linear, logarithmic, and constant terms. An essential s…

    Submitted 11 May, 2016; v1 submitted 12 February, 2016; originally announced February 2016.

    Comments: extended abstract

    MSC Class: 05A16; 68R05; 68P10; 68Q25; 68W40

  21. arXiv:1510.04676

    cs.DS

    How Good is Multi-Pivot Quicksort?

    Authors: Martin Aumüller, Martin Dietzfelbinger, Pascal Klaue

    Abstract: Multi-Pivot Quicksort refers to variants of classical quicksort where in the partitioning step $k$ pivots are used to split the input into $k + 1$ segments. For many years, multi-pivot quicksort was regarded as impractical, but in 2009 a 2-pivot approach by Yaroslavskiy, Bentley, and Bloch was chosen as the standard sorting algorithm in Sun's Java 7. In 2014 at ALENEX, Kushagra et al. introduced a…

    Submitted 31 May, 2016; v1 submitted 15 October, 2015; originally announced October 2015.

    Comments: Submitted to a journal, v2: Fixed statement of Gibbs' inequality, v3: Revised version, especially improving on the experiments in Section 9

    ACM Class: F.2.2

  22. arXiv:1303.5217

    cs.DS

    Optimal Partitioning for Dual-Pivot Quicksort

    Authors: Martin Aumüller, Martin Dietzfelbinger

    Abstract: Dual-pivot quicksort refers to variants of classical quicksort where in the partitioning step two pivots are used to split the input into three segments. This can be done in different ways, giving rise to different algorithms. Recently, a dual-pivot algorithm proposed by Yaroslavskiy received much attention, because a variant of it replaced the well-engineered quicksort algorithm in Sun's Java 7 r…

    Submitted 13 October, 2015; v1 submitted 21 March, 2013; originally announced March 2013.

    Comments: Accepted for publication in ACM Transactions on Algorithms
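The dual-pivot scheme attributed to Yaroslavskiy in this abstract can be sketched as below, following the commonly published description of its partitioning loop: two pivots p <= q split the input into elements < p, elements between the pivots, and elements >= q. This is a plain illustration; it is not the comparison-optimal strategy analyzed in the paper, and engineering details of the Java 7 variant (such as pivot sampling and insertion-sort cutoffs) are omitted.

```python
def dual_pivot_quicksort(a, lo=0, hi=None):
    """Sort a[lo..hi] in place with Yaroslavskiy-style dual-pivot partitioning."""
    if hi is None:
        hi = len(a) - 1
    if hi <= lo:
        return a
    if a[lo] > a[hi]:
        a[lo], a[hi] = a[hi], a[lo]
    p, q = a[lo], a[hi]               # pivots with p <= q
    l, k, g = lo + 1, lo + 1, hi - 1
    # Invariant: a[lo+1..l-1] < p, p <= a[l..k-1] <= q, a[g+1..hi-1] >= q.
    while k <= g:
        if a[k] < p:                  # small element: grow the left segment
            a[k], a[l] = a[l], a[k]
            l += 1
        elif a[k] >= q:               # large element: move it to the right segment
            while a[g] > q and k < g:
                g -= 1
            a[k], a[g] = a[g], a[k]
            g -= 1
            if a[k] < p:              # the element swapped in may be small
                a[k], a[l] = a[l], a[k]
                l += 1
        k += 1
    l -= 1
    g += 1
    a[lo], a[l] = a[l], a[lo]         # place p at its final position
    a[hi], a[g] = a[g], a[hi]         # place q at its final position
    dual_pivot_quicksort(a, lo, l - 1)
    dual_pivot_quicksort(a, l + 1, g - 1)
    dual_pivot_quicksort(a, g + 1, hi)
    return a
```

Each element in the scanned range is compared with one or both pivots; the order in which those comparisons are made is precisely the design choice whose expected comparison count the paper studies.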

  23. arXiv:1204.4431

    cs.DS

    Explicit and Efficient Hash Families Suffice for Cuckoo Hashing with a Stash

    Authors: Martin Aumüller, Martin Dietzfelbinger, Philipp Woelfel

    Abstract: It is shown that for cuckoo hashing with a stash as proposed by Kirsch, Mitzenmacher, and Wieder (2008) families of very simple hash functions can be used, maintaining the favorable performance guarantees: with stash size $s$ the probability of a rehash is $O(1/n^{s+1})$, and the evaluation time is $O(s)$. Instead of the full randomness needed for the analysis of Kirsch et al. and of Kutzelnigg (2…

    Submitted 19 April, 2012; originally announced April 2012.

    Comments: 18 pages

    ACM Class: F.2.2
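The data structure analyzed in entry 23 can be sketched as two-table cuckoo hashing with a small stash: a key that cannot be placed after a bounded number of evictions goes into the stash, and only when the stash is full is everything rehashed with new hash functions. The sketch assumes integer keys and uses random linear maps modulo a prime as a stand-in hash family; it does not reproduce the specific hash classes from the paper, and `stash_size` and `max_kicks` are hypothetical parameters. A lookup probes one slot per table plus the stash, so its cost grows with the stash size.

```python
import random

class CuckooHashTableWithStash:
    """Two-table cuckoo hashing with a bounded stash, for integer keys."""

    P = (1 << 61) - 1  # Mersenne prime used as the hash modulus

    def __init__(self, capacity, stash_size=2, max_kicks=64, seed=0):
        self.m = capacity            # slots per table
        self.s = stash_size          # maximum number of keys in the stash
        self.max_kicks = max_kicks   # placement attempts before a key goes to the stash
        self.rng = random.Random(seed)
        self.rehashes = 0
        self._reset()

    def _reset(self):
        # Draw fresh hash functions and empty both tables and the stash.
        self.params = [(self.rng.randrange(1, self.P), self.rng.randrange(self.P))
                       for _ in range(2)]
        self.tables = [[None] * self.m, [None] * self.m]
        self.stash = []

    def _h(self, which, key):
        a, b = self.params[which]
        return ((a * key + b) % self.P) % self.m

    def contains(self, key):
        return (any(self.tables[t][self._h(t, key)] == key for t in range(2))
                or key in self.stash)

    def insert(self, key):
        if self.contains(key):
            return
        cur = key
        for step in range(self.max_kicks):
            t = step % 2                                     # alternate between tables
            i = self._h(t, cur)
            cur, self.tables[t][i] = self.tables[t][i], cur  # place cur, evict occupant
            if cur is None:
                return
        if len(self.stash) < self.s:
            self.stash.append(cur)                           # nestless key goes to the stash
        else:
            self._rehash(cur)

    def _rehash(self, pending):
        # Collect every stored key plus the pending one, then reinsert with new hashes.
        keys = [k for table in self.tables for k in table if k is not None]
        keys += self.stash + [pending]
        self.rehashes += 1
        self._reset()
        for k in keys:
            self.insert(k)

    def __len__(self):
        return sum(k is not None for table in self.tables for k in table) + len(self.stash)
```

Keeping the load well below one half (here, fewer keys than slots in a single table) is what makes long eviction chains, and therefore stash use and rehashes, rare.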