Interdiction of minimum spanning trees and other matroid bases
Authors:
Noah Weninger,
Ricardo Fukasawa
Abstract:
In the minimum spanning tree (MST) interdiction problem, we are given a graph $G=(V,E)$ with edge weights, and want to find some $X\subseteq E$ satisfying a knapsack constraint such that the MST weight in $(V,E\setminus X)$ is maximized. Since MSTs of $G$ are the minimum weight bases in the graphic matroid of $G$, this problem is a special case of matroid interdiction on a matroid $M=(E,\mathcal{I})$, in which the objective is instead to maximize the minimum weight of a basis of $M$ which is disjoint from $X$. By reduction from 0-1 knapsack, matroid interdiction is NP-complete, even for uniform matroids.
We develop a new exact algorithm to solve the matroid interdiction problem. One of the key components of our algorithm is a dynamic programming upper bound which only requires that a simpler discrete derivative problem can be calculated/approximated for the given matroid. Our exact algorithm then uses this bound within a custom branch-and-bound algorithm. For different matroids, we show how this discrete derivative can be calculated/approximated. In particular, for partition matroids, this yields a pseudopolynomial time algorithm. For graphic matroids, an approximation can be obtained by solving a sequence of minimum cut problems, which we apply to the MST interdiction problem. The running time of our algorithm is asymptotically faster than the best known MST interdiction algorithm, up to polylog factors. Furthermore, our algorithm achieves state-of-the-art computational performance: we solved all available instances from the literature, and in many cases reduced the best running time from hours to seconds.
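To make the MST interdiction problem statement concrete, here is a minimal brute-force sketch in Python. It is not the exact algorithm described in the abstract (no dynamic programming bound, no branch-and-bound); it simply enumerates every removal set that fits the budget. The function names, the `(u, v, weight, cost)` edge tuple layout, and the choice to skip removal sets that disconnect the graph are assumptions made for illustration only.

```python
from itertools import combinations


def mst_weight(n, edges):
    """Kruskal's algorithm on vertices 0..n-1.

    Each edge is a tuple (u, v, weight, cost). Returns the MST weight,
    or None if the edges do not connect all n vertices.
    """
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, used = 0, 0
    for u, v, w, _cost in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += w
            used += 1
    return total if used == n - 1 else None


def brute_force_interdiction(n, edges, budget):
    """Try every removal set X whose total cost is within the budget.

    Returns (best MST weight, indices of removed edges). Exponential in
    the number of edges, so only usable on tiny instances.
    Assumption: removal sets that disconnect the graph are skipped.
    """
    best_weight, best_removed = None, ()
    for k in range(len(edges) + 1):
        for removed in combinations(range(len(edges)), k):
            if sum(edges[i][3] for i in removed) > budget:
                continue  # violates the knapsack constraint
            remaining = [e for i, e in enumerate(edges) if i not in removed]
            w = mst_weight(n, remaining)
            if w is not None and (best_weight is None or w > best_weight):
                best_weight, best_removed = w, removed
    return best_weight, best_removed
```

With unit removal costs, the knapsack constraint reduces to a cardinality limit on the removed set; the enumeration above is the naive baseline that an exact method with upper bounds is meant to avoid.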
Submitted 20 July, 2024;
originally announced July 2024.
Trained Random Forests Completely Reveal your Dataset
Authors:
Julien Ferry,
Ricardo Fukasawa,
Timothée Pascal,
Thibaut Vidal
Abstract:
We introduce an optimization-based reconstruction attack capable of completely or near-completely reconstructing a dataset utilized for training a random forest. Notably, our approach relies solely on information readily available in commonly used libraries such as scikit-learn. To achieve this, we formulate the reconstruction problem as a combinatorial problem under a maximum likelihood objective. We demonstrate that this problem is NP-hard, though solvable at scale using constraint programming -- an approach rooted in constraint propagation and solution-domain reduction. Through an extensive computational investigation, we demonstrate that random forests trained without bootstrap aggregation but with feature randomization are susceptible to a complete reconstruction. This holds true even with a small number of trees. Even with bootstrap aggregation, the majority of the data can also be reconstructed. These findings underscore a critical vulnerability inherent in widely adopted ensemble methods, warranting attention and mitigation. Although the potential for such reconstruction attacks has been discussed in privacy research, our study provides clear empirical evidence of their practicability.
Submitted 14 August, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.