Showing 1–6 of 6 results for author: Pellegrina, L

Searching in archive cs.
  1. Scalable Rule Lists Learning with Sampling

    Authors: Leonardo Pellegrina, Fabio Vandin

    Abstract: Learning interpretable models has become a major focus of machine learning research, given the increasing prominence of machine learning in socially important decision-making. Among interpretable models, rule lists are among the best-known and easily interpretable ones. However, finding optimal rule lists is computationally challenging, and current approaches are impractical for large datasets.…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted to KDD 2024

  2. arXiv:2406.11803  [pdf, other]

    cs.LG cs.DB stat.ML

    Efficient Discovery of Significant Patterns with Few-Shot Resampling

    Authors: Leonardo Pellegrina, Fabio Vandin

    Abstract: Significant pattern mining is a fundamental task in mining transactional data, requiring to identify patterns significantly associated with the value of a given feature, the target. In several applications, such as biomedicine, basket market analysis, and social networks, the goal is to discover patterns whose association with the target is defined with respect to an underlying population, or proc…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted to VLDB 2024

  3. Efficient Centrality Maximization with Rademacher Averages

    Authors: Leonardo Pellegrina

    Abstract: The identification of the set of k most central nodes of a graph, or centrality maximization, is a key task in network analysis, with various applications ranging from finding communities in social and biological networks to understanding which seed nodes are important to diffuse information in a graph. As the exact computation of centrality measures does not scale to modern-sized networks, the mo…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: Accepted to KDD '23

  4. arXiv:2106.03462  [pdf, other]

    cs.DS cs.SI

    SILVAN: Estimating Betweenness Centralities with Progressive Sampling and Non-uniform Rademacher Bounds

    Authors: Leonardo Pellegrina, Fabio Vandin

    Abstract: Betweenness centrality is a popular centrality measure with applications in several domains, and whose exact computation is impractical for modern-sized networks. We present SILVAN, a novel, efficient algorithm to compute, with high probability, accurate estimates of the betweenness centrality of all nodes of a graph and a high-quality approximation of the top-k betweenness centralities. SILVAN fo…

    Submitted 1 June, 2022; v1 submitted 7 June, 2021; originally announced June 2021.

  5. arXiv:2010.12103  [pdf, other]

    cs.LG math.PR stat.ML

    Sharper convergence bounds of Monte Carlo Rademacher Averages through Self-Bounding functions

    Authors: Leonardo Pellegrina

    Abstract: We derive sharper probabilistic concentration bounds for the Monte Carlo Empirical Rademacher Averages (MCERA), which are proved through recent results on the concentration of self-bounding functions. Our novel bounds are characterized by convergence rates that depend on data-dependent characteristic quantities of the set of functions under consideration, such as the empirical wimpy variance, an e…

    Submitted 16 January, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

  6. arXiv:2006.09085  [pdf, other]

    cs.LG cs.DB cs.DS stat.ML

    MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining

    Authors: Leonardo Pellegrina, Cyrus Cousins, Fabio Vandin, Matteo Riondato

    Abstract: We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for families of functions exhibiting poset (e.g., lattice) structure, such as those that arise in many pattern mining tasks. The MCERA allows us to compute upper bounds to the maximum deviation of sample means from their expectations, thus it can be used to find both statistically-signi…

    Submitted 16 June, 2020; originally announced June 2020.