Showing 1–14 of 14 results for author: Riondato, M

  1. Alice and the Caterpillar: A more descriptive null model for assessing data mining results

    Authors: Giulia Preti, Gianmarco De Francisci Morales, Matteo Riondato

    Abstract: We introduce novel null models for assessing the results obtained from observed binary transactional and sequence datasets, using statistical hypothesis testing. Our null models maintain more properties of the observed dataset than existing ones. Specifically, they preserve the Bipartite Joint Degree Matrix of the bipartite (multi-)graph corresponding to the dataset, which ensures that the number…

    Submitted 11 June, 2025; originally announced June 2025.

    Journal ref: Knowledge and Information Systems, 2024

  2. Polaris: Sampling from the Multigraph Configuration Model with Prescribed Color Assortativity

    Authors: Giulia Preti, Matteo Riondato, Aristides Gionis, Gianmarco De Francisci Morales

    Abstract: We introduce Polaris, a network null model for colored multi-graphs that preserves the Joint Color Matrix. Polaris is specifically designed for studying network polarization, where vertices belong to a side in a debate or a partisan group, represented by a vertex color, and relations have different strengths, represented by an integer-valued edge multiplicity. The key feature of Polaris is preserv…

    Submitted 18 December, 2024; v1 submitted 2 September, 2024; originally announced September 2024.

    Comments: Accepted for publication at WSDM2025

  3. arXiv:2308.10838  [pdf, other]

    cs.SI physics.soc-ph

    An impossibility result for Markov Chain Monte Carlo sampling from micro-canonical bipartite graph ensembles

    Authors: Giulia Preti, Gianmarco De Francisci Morales, Matteo Riondato

    Abstract: Markov Chain Monte Carlo (MCMC) algorithms are commonly used to sample from graph ensembles. Two graphs are neighbors in the state space if one can be obtained from the other with only a few modifications, e.g., edge rewirings. For many common ensembles, e.g., those preserving the degree sequences of bipartite graphs, rewiring operations involving two edges are sufficient to create a fully-connect…

    Submitted 10 September, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted for publication in Physical Review E

  4. RePBubLik: Reducing the Polarized Bubble Radius with Link Insertions

    Authors: Shahrzad Haddadan, Cristina Menghini, Matteo Riondato, Eli Upfal

    Abstract: The topology of the hyperlink graph among pages expressing different opinions may influence the exposure of readers to diverse content. Structural bias may trap a reader in a polarized bubble with no access to other opinions. We model readers' behavior as random walks. A node is in a polarized bubble if the expected length of a random walk from it to a page of different opinion is large. The struc…

    Submitted 12 January, 2021; originally announced January 2021.

  5. arXiv:2006.09085  [pdf, other]

    cs.LG cs.DB cs.DS stat.ML

    MCRapper: Monte-Carlo Rademacher Averages for Poset Families and Approximate Pattern Mining

    Authors: Leonardo Pellegrina, Cyrus Cousins, Fabio Vandin, Matteo Riondato

    Abstract: We present MCRapper, an algorithm for efficient computation of Monte-Carlo Empirical Rademacher Averages (MCERA) for families of functions exhibiting poset (e.g., lattice) structure, such as those that arise in many pattern mining tasks. The MCERA allows us to compute upper bounds to the maximum deviation of sample means from their expectations, thus it can be used to find both statistically-signi…

    Submitted 16 June, 2020; originally announced June 2020.

  6. arXiv:1602.07424  [pdf, other]

    cs.DS cs.DB

    TRIÈST: Counting Local and Global Triangles in Fully-dynamic Streams with Fixed Memory Size

    Authors: Lorenzo De Stefani, Alessandro Epasto, Matteo Riondato, Eli Upfal

    Abstract: We present TRIÈST, a suite of one-pass streaming algorithms to compute unbiased, low-variance, high-quality approximations of the global and local (i.e., incident to each vertex) number of triangles in a fully-dynamic graph represented as an adversarial stream of edge insertions and deletions. Our algorithms use reservoir sampling and its variants to exploit the user-specified memory space at all…

    Submitted 28 June, 2016; v1 submitted 24 February, 2016; originally announced February 2016.

    Comments: 49 pages, 7 figures, extended version of the paper that appeared at ACM KDD'16

    ACM Class: G.2.2; H.2.8

  7. arXiv:1602.05866  [pdf, other]

    cs.DS

    ABRA: Approximating Betweenness Centrality in Static and Dynamic Graphs with Rademacher Averages

    Authors: Matteo Riondato, Eli Upfal

    Abstract: We present ABRA, a suite of algorithms that compute and maintain probabilistically-guaranteed, high-quality approximations of the betweenness centrality of all nodes (or edges) on both static and fully dynamic graphs. Our algorithms rely on random sampling and their analysis leverages Rademacher averages and pseudodimension, fundamental concepts from statistical learning theory. To our knowled…

    Submitted 18 February, 2016; originally announced February 2016.

    ACM Class: G.2.2; H.2.8

  8. arXiv:1504.03275  [pdf, other]

    cs.SI

    Wiggins: Detecting Valuable Information in Dynamic Networks Using Limited Resources

    Authors: Ahmad Mahmoody, Matteo Riondato, Eli Upfal

    Abstract: Detecting new information and events in a dynamic network by probing individual nodes has many practical applications: discovering new webpages, analyzing influence properties in networks, and detecting failure propagation in electronic circuits or infections in public drinkable water systems. In practice, it is infeasible for anyone but the owner of the network (if existent) to monitor all nodes a…

    Submitted 29 July, 2015; v1 submitted 13 April, 2015; originally announced April 2015.

  9. arXiv:1301.1218  [pdf, ps, other]

    cs.LG cs.DB cs.DS stat.ML

    Finding the True Frequent Itemsets

    Authors: Matteo Riondato, Fabio Vandin

    Abstract: Frequent Itemsets (FIs) mining is a fundamental primitive in data mining. It requires identifying all itemsets appearing in at least a fraction $θ$ of a transactional dataset $\mathcal{D}$. Often though, the ultimate goal of mining $\mathcal{D}$ is not an analysis of the dataset per se, but the understanding of the underlying process that generated it. Specifically, in many applications…

    Submitted 22 January, 2014; v1 submitted 7 January, 2013; originally announced January 2013.

    Comments: 13 pages, extended version of work that appeared in SIAM International Conference on Data Mining, 2014

    ACM Class: H.2.8

  10. arXiv:1111.6937  [pdf, other]

    cs.DS cs.DB cs.LG

    Efficient Discovery of Association Rules and Frequent Itemsets through Sampling with Tight Performance Guarantees

    Authors: Matteo Riondato, Eli Upfal

    Abstract: The tasks of extracting (top-$K$) Frequent Itemsets (FI's) and Association Rules (AR's) are fundamental primitives in data mining and database applications. Exact algorithms for these problems exist and are widely used, but their running time is hindered by the need to scan the entire dataset, possibly multiple times. High quality approximations of FI's and AR's are sufficient for most practic…

    Submitted 22 February, 2013; v1 submitted 29 November, 2011; originally announced November 2011.

    Comments: 19 pages, 7 figures. A shorter version of this paper appeared in the proceedings of ECML PKDD 2012

    ACM Class: H.2.8

  11. Space-Round Tradeoffs for MapReduce Computations

    Authors: Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal

    Abstract: This work explores fundamental modeling and algorithmic issues arising in the well-established MapReduce framework. First, we formally specify a computational model for MapReduce which captures the functional flavor of the paradigm by allowing for a flexible use of parallelism. Indeed, the model diverges from a traditional processor-centric view by featuring parameters which embody only global and…

    Submitted 9 November, 2011; originally announced November 2011.

    Journal ref: Final version in Proc. of the 26th ACM international conference on Supercomputing, pages 235-244, 2012

  12. arXiv:1101.5805  [pdf, ps, other]

    cs.DB cs.DS cs.LG

    The VC-Dimension of Queries and Selectivity Estimation Through Sampling

    Authors: Matteo Riondato, Mert Akdere, Ugur Cetintemel, Stanley B. Zdonik, Eli Upfal

    Abstract: We develop a novel method, based on the statistical concept of the Vapnik-Chervonenkis dimension, to evaluate the selectivity (output cardinality) of SQL queries - a crucial step in optimizing the execution of large scale database and data-mining operations. The major theoretical contribution of this work, which is of independent interest, is an explicit bound to the VC-dimension of a range space…

    Submitted 11 August, 2011; v1 submitted 30 January, 2011; originally announced January 2011.

    Comments: 20 pages, 3 figures

    ACM Class: H.2.4; G.3

  13. Mining Top-K Frequent Itemsets Through Progressive Sampling

    Authors: Andrea Pietracaprina, Matteo Riondato, Eli Upfal, Fabio Vandin

    Abstract: We study the use of sampling for efficiently mining the top-K frequent itemsets of cardinality at most w. To this end, we define an approximation to the top-K frequent itemsets to be a family of itemsets which includes (resp., excludes) all very frequent (resp., very infrequent) itemsets, together with an estimate of these itemsets' frequencies with a bounded error. Our first result is an uppe…

    Submitted 27 June, 2010; originally announced June 2010.

    Comments: 16 pages, 2 figures, accepted for presentation at ECML PKDD 2010 and publication in the ECML PKDD 2010 special issue of the Data Mining and Knowledge Discovery journal

  14. arXiv:physics/0506008  [pdf, ps, other]

    physics.med-ph

    Preliminary study of metabolic radiotherapy with 188Re via small animal imaging

    Authors: G. Baldazzi, D. Bollini, A. Muciaccio, F.-L. Navarria, G. Pancaldi, A. Perrotta, M. Zuffa, P. Boccaccio, N. Uzunov, M. Bello, D. Bernardini, U. Mazzi, G. Moschini, M. Riondato, A. Rosato, F. Garibaldi, R. Pani, A. Antoccia, F. de Notaristefani, G. Hull, V. Orsolini Cencelli, A. Sgura, C. Tanzarella

    Abstract: 188Re is a beta- (Emax = 2.12 MeV) and gamma (155 keV) emitter. Since its chemistry is similar to that of the widely employed tracer, 99mTc, molecules of hyaluronic acid (HA) have been labelled with 188Re to produce a target specific radiopharmaceutical. The radiolabeled compound, i.v. injected in healthy mice, is able to accumulate in the liver after a few minutes. To study the effect of met…

    Submitted 1 June, 2005; originally announced June 2005.

    Comments: 6 pages, 8 figs. To appear in Nucl. Phys. B (PS), proc. of Innovative Particle and Radiation Detectors, Siena, 23-26 May 2004