Showing 1–2 of 2 results for author: López, F L

Search v0.5.6 released 2020-02-24

arXiv:2305.16358

cs.LG cs.AI stat.ML

Differentiable Clustering with Perturbed Spanning Forests

Authors: Lawrence Stewart, Francis S Bach, Felipe Llinares López, Quentin Berthet

Abstract: We introduce a differentiable clustering method based on stochastic perturbations of minimum-weight spanning forests. This allows us to include clustering in end-to-end trainable pipelines, with efficient gradients. We show that our method performs well even in difficult settings, such as data sets with high noise and challenging geometries. We also formulate an ad hoc loss to efficiently learn from partial clustering data using this operation. We demonstrate its performance on several data sets for supervised and semi-supervised tasks.

Submitted 6 November, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

Journal ref: 37th Conference on Neural Information Processing Systems, Dec 2023, New Orleans, United States

arXiv:1407.0316

stat.ME cs.LG stat.ML

Significant Subgraph Mining with Multiple Testing Correction

Authors: Mahito Sugiyama, Felipe Llinares López, Niklas Kasenburg, Karsten M. Borgwardt

Abstract: The problem of finding itemsets that are statistically significantly enriched in a class of transactions is complicated by the need to correct for multiple hypothesis testing. Pruning untestable hypotheses was recently proposed as a strategy for this task of significant itemset mining. It was shown to lead to greater statistical power, the discovery of more truly significant itemsets, than the standard Bonferroni correction on real-world datasets. An open question, however, is whether this strategy of excluding untestable hypotheses also leads to greater statistical power in subgraph mining, in which the number of hypotheses is much larger than in itemset mining. Here we answer this question by an empirical investigation on eight popular graph benchmark datasets. We propose a new efficient search strategy, which always returns the same solution as the state-of-the-art approach and is approximately two orders of magnitude faster. Moreover, we exploit the dependence between subgraphs by considering the effective number of tests and thereby further increase the statistical power.

Submitted 30 January, 2015; v1 submitted 1 July, 2014; originally announced July 2014.

Comments: 18 pages, 5 figures, accepted to the 2015 SIAM International Conference on Data Mining (SDM15)
