Showing 1–14 of 14 results for author: Beretta, L

  1. arXiv:2505.05819  [pdf, ps, other]

    cs.LG cs.DS

    New Statistical and Computational Results for Learning Junta Distributions

    Authors: Lorenzo Beretta

    Abstract: We study the problem of learning junta distributions on $\{0, 1\}^n$, where a distribution is a $k$-junta if its probability mass function depends on a subset of at most $k$ variables. We make two main contributions: - We show that learning $k$-junta distributions is \emph{computationally} equivalent to learning $k$-parity functions with noise (LPN), a landmark problem in computational learning…

    Submitted 19 May, 2025; v1 submitted 9 May, 2025; originally announced May 2025.

  2. arXiv:2505.04604  [pdf, ps, other]

    cs.LG cs.CC cs.DS stat.ML

    Testing Juntas Optimally with Samples

    Authors: Lorenzo Beretta, Nathaniel Harms, Caleb Koch

    Abstract: We prove tight upper and lower bounds of $\Theta\left(\tfrac{1}{\varepsilon}\left( \sqrt{2^k \log\binom{n}{k} } + \log\binom{n}{k} \right)\right)$ on the number of samples required for distribution-free $k$-junta testing. This is the first tight bound for testing a natural class of Boolean functions in the distribution-free sample-based model. Our bounds also hold for the feature selection problem, showing that a…

    Submitted 7 May, 2025; originally announced May 2025.

  3. arXiv:2409.15008  [pdf, other]

    math.NA

    Sketched Lanczos uncertainty score: a low-memory summary of the Fisher information

    Authors: Marco Miani, Lorenzo Beretta, Søren Hauberg

    Abstract: Current uncertainty quantification is memory and compute expensive, which hinders practical uptake. To counter, we develop Sketched Lanczos Uncertainty (SLU): an architecture-agnostic uncertainty score that can be applied to pre-trained neural networks with minimal overhead. Importantly, the memory use of SLU only grows logarithmically with the number of model parameters. We combine Lanczos' algor…

    Submitted 25 October, 2024; v1 submitted 23 September, 2024; originally announced September 2024.

  4. Dimensionality Reduction and Nearest Neighbors for Improving Out-of-Distribution Detection in Medical Image Segmentation

    Authors: McKell Woodland, Nihil Patel, Austin Castelo, Mais Al Taie, Mohamed Eltaher, Joshua P. Yung, Tucker J. Netherton, Tiffany L. Calderone, Jessica I. Sanchez, Darrel W. Cleere, Ahmed Elsaiey, Nakul Gupta, David Victor, Laura Beretta, Ankit B. Patel, Kristy K. Brock

    Abstract: Clinically deployed deep learning-based segmentation models are known to fail on data outside of their training distributions. While clinicians review the segmentations, these models tend to perform well in most instances, which could exacerbate automation bias. Therefore, detecting out-of-distribution images at inference is critical to warn the clinicians that the model likely failed. This work a…

    Submitted 2 October, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:020. Expansion of "Dimensionality Reduction for Improving Out-of-Distribution Detection in Medical Image Segmentation" arXiv:2308.03723. Code available at https://github.com/mckellwoodland/dimen_reduce_mahal (https://zenodo.org/records/13881989)

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024) 2006

  5. arXiv:2406.19257  [pdf, other]

    cs.DS cs.CG

    Online sorting and online TSP: randomized, stochastic, and high-dimensional

    Authors: Mikkel Abrahamsen, Ioana O. Bercea, Lorenzo Beretta, Jonas Klausen, László Kozma

    Abstract: In the online sorting problem, $n$ items are revealed one by one and have to be placed (immediately and irrevocably) into empty cells of a size-$n$ array. The goal is to minimize the sum of absolute differences between items in consecutive cells. This natural problem was recently introduced by Aamand, Abrahamsen, Beretta, and Kleist (SODA 2023) as a tool in their study of online geometric packing…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 23 pages, appeared in ESA 2024

  6. arXiv:2403.07126  [pdf, other]

    stat.AP cs.CV

    Heterogeneous Image-based Classification Using Distributional Data Analysis

    Authors: Alec Reinhardt, Newsha Nikzad, Raven J. Hollis, Galia Jacobson, Millicent A. Roach, Mohamed Badawy, Peter Chul Park, Laura Beretta, Prasun K Jalal, David T. Fuentes, Eugene J. Koay, Suprateek Kundu

    Abstract: Diagnostic imaging has gained prominence as potential biomarkers for early detection and diagnosis in a diverse array of disorders including cancer. However, existing methods routinely face challenges arising from various factors such as image heterogeneity. We develop a novel imaging-based distributional data analysis (DDA) approach that incorporates the probability (quantile) distribution of the…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 16, 2 figures, 3 tables

  7. arXiv:2310.19514  [pdf, other]

    cs.DS cs.CG

    Approximate Earth Mover's Distance in Truly-Subquadratic Time

    Authors: Lorenzo Beretta, Aviad Rubinstein

    Abstract: We design an additive approximation scheme for estimating the cost of the min-weight bipartite matching problem: given a bipartite graph with non-negative edge costs and $\varepsilon > 0$, our algorithm estimates the cost of matching all but $O(\varepsilon)$-fraction of the vertices in truly subquadratic time $O(n^{2-\delta(\varepsilon)})$. Our algorithm has a natural interpretation for computing the…

    Submitted 10 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

  8. arXiv:2309.16384  [pdf, other]

    cs.CG cs.LG

    Multi-Swap $k$-Means++

    Authors: Lorenzo Beretta, Vincent Cohen-Addad, Silvio Lattanzi, Nikos Parotsidis

    Abstract: The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular $k$-means clustering objective and is known to give an $O(\log k)$-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting $k$-means++ with $O(k \log \log k)$ local search steps obtained through the…

    Submitted 25 October, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

    Comments: NeurIPS 2023

  9. arXiv:2309.03980  [pdf, ps, other]

    physics.med-ph

    Enhancement Pattern Mapping for Detection of Hepatocellular Carcinoma in Patients with Cirrhosis

    Authors: Newsha Nikzad, David Thomas Fuentes, Millicent Roach, Tasadduk Chowdhury, Matthew Cagley, Mohamed Badawy, Ahmed Elkhesen, Manal Hassan, Khaled Elsayes, Laura Beretta, Eugene Jon Koay, Prasun Kumar Jalal

    Abstract: Background and Aims: Limited methods exist to accurately characterize risk of malignant progression of liver lesions in patients undergoing surveillance for hepatocellular carcinoma (HCC). Enhancement pattern mapping (EPM) measures voxel-based root mean square deviation (RMSD) and improves the contrast-to-noise ratio (CNR) of liver lesions on standard of care imaging. This study investigates the u…

    Submitted 15 September, 2023; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Pre-print, 9 pages, 4 figures

  10. arXiv:2308.14134  [pdf, ps, other]

    cs.DS

    Locally Uniform Hashing

    Authors: Ioana O. Bercea, Lorenzo Beretta, Jonas Klausen, Jakob Bæk Tejs Houen, Mikkel Thorup

    Abstract: Hashing is a common technique used in data processing, with a strong impact on the time and resources spent on computation. Hashing also affects the applicability of theoretical results that often assume access to (unrealistic) uniform/fully-random hash functions. In this paper, we are concerned with designing hash functions that are practical and come with strong theoretical guarantees on their p…

    Submitted 28 September, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

    Comments: FOCS 2023

  11. arXiv:2112.03791  [pdf, other]

    cs.CG cs.DS

    Online Sorting and Translational Packing of Convex Polygons

    Authors: Anders Aamand, Mikkel Abrahamsen, Lorenzo Beretta, Linda Kleist

    Abstract: We investigate several online packing problems in which convex polygons arrive one by one and have to be placed irrevocably into a container, while the aim is to minimize the used space. Among other variants, we consider strip packing and bin packing, where the container is the infinite horizontal strip $[0,\infty)\times [0,1]$ or a collection of $1 \times 1$ bins, respectively. We draw interest…

    Submitted 8 April, 2024; v1 submitted 7 December, 2021; originally announced December 2021.

  12. An Optimal Algorithm for Finding Champions in Tournament Graphs

    Authors: Lorenzo Beretta, Franco Maria Nardini, Roberto Trani, Rossano Venturini

    Abstract: A tournament graph is a complete directed graph, which can be used to model a round-robin tournament between $n$ players. In this paper, we address the problem of finding a champion of the tournament, also known as Copeland winner, which is a player that wins the highest number of matches. In detail, we aim to investigate algorithms that find the champion by playing a low number of matches. Solvin…

    Submitted 18 April, 2023; v1 submitted 26 November, 2021; originally announced November 2021.

  13. arXiv:2110.14948  [pdf, other]

    cs.DS math.PR math.ST

    Better Sum Estimation via Weighted Sampling

    Authors: Lorenzo Beretta, Jakub Tětek

    Abstract: Given a large set $U$ where each item $a\in U$ has weight $w(a)$, we want to estimate the total weight $W=\sum_{a\in U} w(a)$ to within factor of $1\pm\varepsilon$ with some constant probability $>1/2$. Since $n=|U|$ is large, we want to do this without looking at the entire set $U$. In the traditional setting in which we are allowed to sample elements from $U$ uniformly, sampling $\Omega(n)$ items is…

    Submitted 28 October, 2021; originally announced October 2021.

    Comments: To appear at SODA 2022

  14. arXiv:2101.09024  [pdf, other]

    cs.CG

    Online Packing to Minimize Area or Perimeter

    Authors: Mikkel Abrahamsen, Lorenzo Beretta

    Abstract: We consider online packing problems where we get a stream of axis-parallel rectangles. The rectangles have to be placed in the plane without overlapping, and each rectangle must be placed without knowing the subsequent rectangles. The goal is to minimize the perimeter or the area of the axis-parallel bounding box of the rectangles. We either allow rotations by 90 degrees or translations only. Fo…

    Submitted 25 January, 2021; v1 submitted 22 January, 2021; originally announced January 2021.