Showing 1–2 of 2 results for author: López, F L
-
Differentiable Clustering with Perturbed Spanning Forests
Authors:
Lawrence Stewart,
Francis S Bach,
Felipe Llinares López,
Quentin Berthet
Abstract:
We introduce a differentiable clustering method based on stochastic perturbations of minimum-weight spanning forests. This allows us to include clustering in end-to-end trainable pipelines, with efficient gradients. We show that our method performs well even in difficult settings, such as data sets with high noise and challenging geometries. We also formulate an ad hoc loss to efficiently learn fr…
▽ More
We introduce a differentiable clustering method based on stochastic perturbations of minimum-weight spanning forests. This allows us to include clustering in end-to-end trainable pipelines, with efficient gradients. We show that our method performs well even in difficult settings, such as data sets with high noise and challenging geometries. We also formulate an ad hoc loss to efficiently learn from partial clustering data using this operation. We demonstrate its performance on several data sets for supervised and semi-supervised tasks.
△ Less
Submitted 6 November, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Significant Subgraph Mining with Multiple Testing Correction
Authors:
Mahito Sugiyama,
Felipe Llinares López,
Niklas Kasenburg,
Karsten M. Borgwardt
Abstract:
The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. It was shown to lead to greater statistical power, the discovery of more truly significant itemsets, than the sta…
▽ More
The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. It was shown to lead to greater statistical power, the discovery of more truly significant itemsets, than the standard Bonferroni correction on real-world datasets. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power.
△ Less
Submitted 30 January, 2015; v1 submitted 1 July, 2014;
originally announced July 2014.