-
Bypassing orthogonalization in the quantum DPP sampler
Authors:
Michaël Fanuel,
Rémi Bardenet
Abstract:
Given an $n\times r$ matrix $X$ of rank $r$, consider the problem of sampling $r$ integers $\mathtt{C}\subset \{1, \dots, n\}$ with probability proportional to the squared determinant of the rows of $X$ indexed by $\mathtt{C}$. The distribution of $\mathtt{C}$ is called a projection determinantal point process (DPP). The vanilla classical algorithm to sample a DPP works in two steps, an orthogonal…
▽ More
Given an $n\times r$ matrix $X$ of rank $r$, consider the problem of sampling $r$ integers $\mathtt{C}\subset \{1, \dots, n\}$ with probability proportional to the squared determinant of the rows of $X$ indexed by $\mathtt{C}$. The distribution of $\mathtt{C}$ is called a projection determinantal point process (DPP). The vanilla classical algorithm to sample a DPP works in two steps, an orthogonalization in $\mathcal{O}(nr^2)$ and a sampling step of the same cost. The bottleneck of recent quantum approaches to DPP sampling remains that preliminary orthogonalization step. For instance, (Kerenidis and Prakash, 2022) proposed an algorithm with the same $\mathcal{O}(nr^2)$ orthogonalization, followed by a $\mathcal{O}(nr)$ classical step to find the gates in a quantum circuit. The classical $\mathcal{O}(nr^2)$ orthogonalization thus still dominates the cost. Our first contribution is to reduce preprocessing to normalizing the columns of $X$, obtaining $\mathsf{X}$ in $\mathcal{O}(nr)$ classical operations. We show that a simple circuit inspired by the formalism of Kerenidis et al., 2022 samples a DPP of a type we had never encountered in applications, which is different from our target DPP. Plugging this circuit into a rejection sampling routine, we recover our target DPP after an expected $1/\det \mathsf{X}^\top\mathsf{X} = 1/a$ preparations of the quantum circuit. Using amplitude amplification, our second contribution is to boost the acceptance probability from $a$ to $1-a$ at the price of a circuit depth of $\mathcal{O}(r\log n/\sqrt{a})$ and $\mathcal{O}(\log n)$ extra qubits. Prepending a fast, sketching-based classical approximation of $a$, we obtain a pipeline to sample a projection DPP on a quantum computer, where the former $\mathcal{O}(nr^2)$ preprocessing bottleneck has been replaced by the $\mathcal{O}(nr)$ cost of normalizing the columns and the cost of our approximation of $a$.
△ Less
Submitted 21 March, 2025; v1 submitted 7 March, 2025;
originally announced March 2025.
-
Negative Dependence as a toolbox for machine learning : review and new developments
Authors:
Hoang-Son Tran,
Vladimir Petrovic,
Remi Bardenet,
Subhroshekhar Ghosh
Abstract:
Negative dependence is becoming a key driver in advancing learning capabilities beyond the limits of traditional independence. Recent developments have evidenced support towards negatively dependent systems as a learning paradigm in a broad range of fundamental machine learning challenges including optimization, sampling, dimensionality reduction and sparse signal recovery, often surpassing the pe…
▽ More
Negative dependence is becoming a key driver in advancing learning capabilities beyond the limits of traditional independence. Recent developments have evidenced support towards negatively dependent systems as a learning paradigm in a broad range of fundamental machine learning challenges including optimization, sampling, dimensionality reduction and sparse signal recovery, often surpassing the performance of current methods based on statistical independence. The most popular negatively dependent model has been that of determinantal point processes (DPPs), which have their origins in quantum theory. However, other models, such as perturbed lattice models, strongly Rayleigh measures, zeros of random functions have gained salience in various learning applications. In this article, we review this burgeoning field of research, as it has developed over the past two decades or so. We also present new results on applications of DPPs to the parsimonious representation of neural networks. In the limited scope of the article, we mostly focus on aspects of this area to which the authors contributed over the recent years, including applications to Monte Carlo methods, coresets and stochastic gradient descent, stochastic networks, signal processing and connections to quantum computation. However, starting from basics of negative dependence for the uninitiated reader, extensive references are provided to a broad swath of related developments which could not be covered within our limited scope. While existing works and reviews generally focus on specific negatively dependent models (e.g. DPPs), a notable feature of this article is that it addresses negative dependence as a machine learning methodology as a whole. In this vein, it covers within its span an array of negatively dependent models and their applications well beyond DPPs, thereby putting forward a very general and rather unique perspective.
△ Less
Submitted 11 February, 2025;
originally announced February 2025.
-
Small coresets via negative dependence: DPPs, linear statistics, and concentration
Authors:
Rémi Bardenet,
Subhroshekhar Ghosh,
Hugo Simon-Onfroy,
Hoang-Son Tran
Abstract:
Determinantal point processes (DPPs) are random configurations of points with tunable negative dependence. Because sampling is tractable, DPPs are natural candidates for subsampling tasks, such as minibatch selection or coreset construction. A \emph{coreset} is a subset of a (large) training set, such that minimizing an empirical loss averaged over the coreset is a controlled replacement for the i…
▽ More
Determinantal point processes (DPPs) are random configurations of points with tunable negative dependence. Because sampling is tractable, DPPs are natural candidates for subsampling tasks, such as minibatch selection or coreset construction. A \emph{coreset} is a subset of a (large) training set, such that minimizing an empirical loss averaged over the coreset is a controlled replacement for the intractable minimization of the original empirical loss. Typically, the control takes the form of a guarantee that the average loss over the coreset approximates the total loss uniformly across the parameter space. Recent work has provided significant empirical support in favor of using DPPs to build randomized coresets, coupled with interesting theoretical results that are suggestive but leave some key questions unanswered. In particular, the central question of whether the cardinality of a DPP-based coreset is fundamentally smaller than one based on independent sampling remained open. In this paper, we answer this question in the affirmative, demonstrating that \emph{DPPs can provably outperform independently drawn coresets}. In this vein, we contribute a conceptual understanding of coreset loss as a \emph{linear statistic} of the (random) coreset. We leverage this structural observation to connect the coresets problem to a more general problem of concentration phenomena for linear statistics of DPPs, wherein we obtain \emph{effective concentration inequalities that extend well-beyond the state-of-the-art}, encompassing general non-projection, even non-symmetric kernels. The latter have been recently shown to be of interest in machine learning beyond coresets, but come with a limited theoretical toolbox, to the extension of which our result contributes. Finally, we are also able to address the coresets problem for vector-valued objective functions, a novelty in the coresets literature.
△ Less
Submitted 1 November, 2024;
originally announced November 2024.
-
Cycling in the forest with Wilson's algorithm
Authors:
Michaël Fanuel,
Rémi Bardenet
Abstract:
We consider a probability measure on cycle-rooted spanning forests (CRSFs) introduced by Kenyon. CRSFs are spanning subgraphs, each connected component of which has a unique cycle; they generalize spanning trees. A generalization of Wilson's celebrated CyclePopping algorithm for uniform spanning trees has been proposed for CRSFs, and several concise proofs have been given that the algorithm sample…
▽ More
We consider a probability measure on cycle-rooted spanning forests (CRSFs) introduced by Kenyon. CRSFs are spanning subgraphs, each connected component of which has a unique cycle; they generalize spanning trees. A generalization of Wilson's celebrated CyclePopping algorithm for uniform spanning trees has been proposed for CRSFs, and several concise proofs have been given that the algorithm samples from Kenyon's distribution. In this survey, we flesh out all the details of such a proof of correctness, progressively generalizing a proof by Marchal for spanning trees. This detailed proof has several interests. First, it serves as a modern tutorial on Wilson's algorithm, suitable for applied probability and computer science audiences. Compared to uniform spanning trees, the more sophisticated motivating application to CRSFs brings forth connections to recent research topics such as loop measures, partial rejection sampling, and heaps of cycles. Second, the detailed proof \emph{à la} Marchal yields the law of the time complexity of the sampling algorithm, shedding light on practical situations where the algorithm is expected to run fast.
△ Less
Submitted 9 July, 2025; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Point Processes and spatial statistics in time-frequency analysis
Authors:
Barbara Pascal,
Rémi Bardenet
Abstract:
A finite-energy signal is represented by a square-integrable, complex-valued function $t\mapsto s(t)$ of a real variable $t$, interpreted as time. Similarly, a noisy signal is represented by a random process. Time-frequency analysis, a subfield of signal processing, amounts to describing the temporal evolution of the frequency content of a signal. Loosely speaking, if $s$ is the audio recording of…
▽ More
A finite-energy signal is represented by a square-integrable, complex-valued function $t\mapsto s(t)$ of a real variable $t$, interpreted as time. Similarly, a noisy signal is represented by a random process. Time-frequency analysis, a subfield of signal processing, amounts to describing the temporal evolution of the frequency content of a signal. Loosely speaking, if $s$ is the audio recording of a musical piece, time-frequency analysis somehow consists in writing the musical score of the piece. Mathematically, the operation is performed through a transform $\mathcal{V}$, mapping $s \in L^2(\mathbb{R})$ onto a complex-valued function $\mathcal{V}s \in L^2(\mathbb{R}^2)$ of time $t$ and angular frequency $ω$. The squared modulus $(t, ω) \mapsto \vert\mathcal{V}s(t,ω)\vert^2$ of the time-frequency representation is known as the spectrogram of $s$; in the musical score analogy, a peaked spectrogram at $(t_0,ω_0)$ corresponds to a musical note at angular frequency $ω_0$ localized at time $t_0$. More generally, the intuition is that upper level sets of the spectrogram contain relevant information about in the original signal. Hence, many signal processing algorithms revolve around identifying maxima of the spectrogram. In contrast, zeros of the spectrogram indicate perfect silence, that is, a time at which a particular frequency is absent. Assimilating $\mathbb{R}^2$ to $\mathbb{C}$ through $z = ω+ \mathrm{i}t$, this chapter focuses on time-frequency transforms $\mathcal{V}$ that map signals to analytic functions. The zeros of the spectrogram of a noisy signal are then the zeros of a random analytic function, hence forming a Point Process in $\mathbb{C}$. This chapter is devoted to the study of these Point Processes, to their links with zeros of Gaussian Analytic Functions, and to designing signal detection and denoising algorithms using spatial statistics.
△ Less
Submitted 15 April, 2025; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Monte Carlo with kernel-based Gibbs measures: Guarantees for probabilistic herding
Authors:
Martin Rouault,
Rémi Bardenet,
Mylène Maïda
Abstract:
Kernel herding belongs to a family of deterministic quadratures that seek to minimize the worst-case integration error over a reproducing kernel Hilbert space (RKHS). In spite of strong experimental support, it has revealed difficult to prove that this worst-case error decreases at a faster rate than the standard square root of the number of quadrature nodes, at least in the usual case where the R…
▽ More
Kernel herding belongs to a family of deterministic quadratures that seek to minimize the worst-case integration error over a reproducing kernel Hilbert space (RKHS). In spite of strong experimental support, it has revealed difficult to prove that this worst-case error decreases at a faster rate than the standard square root of the number of quadrature nodes, at least in the usual case where the RKHS is infinite-dimensional. In this theoretical paper, we study a joint probability distribution over quadrature nodes, whose support tends to minimize the same worst-case error as kernel herding. We prove that it does outperform i.i.d. Monte Carlo, in the sense of coming with a tighter concentration inequality on the worst-case integration error. While not improving the rate yet, this demonstrates that the mathematical tools of the study of Gibbs measures can help understand to what extent kernel herding and its variants improve on computationally cheaper methods. Moreover, we provide early experimental evidence that a faster rate of convergence, though not worst-case, is likely.
△ Less
Submitted 18 February, 2024;
originally announced February 2024.
-
Benchmarking multi-component signal processing methods in the time-frequency plane
Authors:
Juan M. Miramont,
Rémi Bardenet,
Pierre Chainais,
Francois Auger
Abstract:
Signal processing in the time-frequency plane has a long history and remains a field of methodological innovation. For instance, detection and denoising based on the zeros of the spectrogram have been proposed since 2015, contrasting with a long history of focusing on larger values of the spectrogram. Yet, unlike neighboring fields like optimization and machine learning, time-frequency signal proc…
▽ More
Signal processing in the time-frequency plane has a long history and remains a field of methodological innovation. For instance, detection and denoising based on the zeros of the spectrogram have been proposed since 2015, contrasting with a long history of focusing on larger values of the spectrogram. Yet, unlike neighboring fields like optimization and machine learning, time-frequency signal processing lacks widely-adopted benchmarking tools. In this work, we contribute an open-source, Python-based toolbox termed MCSM-Benchs for benchmarking multi-component signal analysis methods, and we demonstrate our toolbox on three time-frequency benchmarks. First, we compare different methods for signal detection based on the zeros of the spectrogram, including unexplored variations of previously proposed detection tests. Second, we compare zero-based denoising methods to both classical and novel methods based on large values and ridges of the spectrogram. Finally, we compare the denoising performance of these methods against typical spectrogram thresholding strategies, in terms of post-processing artifacts commonly referred to as musical noise. At a low level, the obtained results provide new insight on the assessed approaches, and in particular research directions to further develop zero-based methods. At a higher level, our benchmarks exemplify the benefits of using a public, collaborative, common framework for benchmarking.
△ Less
Submitted 13 February, 2024;
originally announced February 2024.
-
On sampling determinantal and Pfaffian point processes on a quantum computer
Authors:
Rémi Bardenet,
Michaël Fanuel,
Alexandre Feller
Abstract:
DPPs were introduced by Macchi as a model in quantum optics the 1970s. Since then, they have been widely used as models and subsampling tools in statistics and computer science. Most applications require sampling from a DPP, and given their quantum origin, it is natural to wonder whether sampling a DPP on a quantum computer is easier than on a classical one. We focus here on DPPs over a finite sta…
▽ More
DPPs were introduced by Macchi as a model in quantum optics the 1970s. Since then, they have been widely used as models and subsampling tools in statistics and computer science. Most applications require sampling from a DPP, and given their quantum origin, it is natural to wonder whether sampling a DPP on a quantum computer is easier than on a classical one. We focus here on DPPs over a finite state space, which are distributions over the subsets of $\{1,\dots,N\}$ parametrized by an $N\times N$ Hermitian kernel matrix. Vanilla sampling consists in two steps, of respective costs $\mathcal{O}(N^3)$ and $\mathcal{O}(Nr^2)$ operations on a classical computer, where $r$ is the rank of the kernel matrix. A large first part of the current paper consists in explaining why the state-of-the-art in quantum simulation of fermionic systems already yields quantum DPP sampling algorithms. We then modify existing quantum circuits, and discuss their insertion in a full DPP sampling pipeline that starts from practical kernel specifications. The bottom line is that, with $P$ (classical) parallel processors, we can divide the preprocessing cost by $P$ and build a quantum circuit with $\mathcal{O}(Nr)$ gates that sample a given DPP, with depth varying from $\mathcal{O}(N)$ to $\mathcal{O}(r\log N)$ depending on qubit-communication constraints on the target machine. We also connect existing work on the simulation of superconductors to Pfaffian point processes, which generalize DPPs and would be a natural addition to the machine learner's toolbox. In particular, we describe "projective" Pfaffian point processes, the cardinality of which has constant parity, almost surely. Finally, the circuits are empirically validated on a classical simulator and on 5-qubit IBM machines.
△ Less
Submitted 22 November, 2023; v1 submitted 25 May, 2023;
originally announced May 2023.
-
Sparsification of the regularized magnetic Laplacian with multi-type spanning forests
Authors:
Michaël Fanuel,
Rémi Bardenet
Abstract:
In this paper, we consider a ${\rm U}(1)$-connection graph, that is, a graph where each oriented edge is endowed with a unit modulus complex number that is conjugated under orientation flip. A natural replacement for the combinatorial Laplacian is then the magnetic Laplacian, an Hermitian matrix that includes information about the graph's connection. Magnetic Laplacians appear, e.g., in the proble…
▽ More
In this paper, we consider a ${\rm U}(1)$-connection graph, that is, a graph where each oriented edge is endowed with a unit modulus complex number that is conjugated under orientation flip. A natural replacement for the combinatorial Laplacian is then the magnetic Laplacian, an Hermitian matrix that includes information about the graph's connection. Magnetic Laplacians appear, e.g., in the problem of angular synchronization. In the context of large and dense graphs, we study here sparsifiers of the magnetic Laplacian $Δ$, i.e., spectral approximations based on subgraphs with few edges. Our approach relies on sampling multi-type spanning forests (MTSFs) using a custom determinantal point process, a probability distribution over edges that favours diversity. In a word, an MTSF is a spanning subgraph whose connected components are either trees or cycle-rooted trees. The latter partially capture the angular inconsistencies of the connection graph, and thus provide a way to compress the information contained in the connection. Interestingly, when the connection graph has weakly inconsistent cycles, samples from the determinantal point process under consideration can be obtained à la Wilson, using a random walk with cycle popping. We provide statistical guarantees for a choice of natural estimators of the connection Laplacian, and investigate two practical applications of our sparsifiers: ranking with angular synchronization and graph-based semi-supervised learning. From a statistical perspective, a side result of this paper of independent interest is a matrix Chernoff bound with intrinsic dimension, which allows considering the influence of a regularization -- of the form $Δ+ q \mathbb{I}$ with $q>0$ -- on sparsification guarantees.
△ Less
Submitted 20 March, 2024; v1 submitted 31 August, 2022;
originally announced August 2022.
-
Determinantal point processes based on orthogonal polynomials for sampling minibatches in SGD
Authors:
Remi Bardenet,
Subhro Ghosh,
Meixia Lin
Abstract:
Stochastic gradient descent (SGD) is a cornerstone of machine learning. When the number N of data items is large, SGD relies on constructing an unbiased estimator of the gradient of the empirical risk using a small subset of the original dataset, called a minibatch. Default minibatch construction involves uniformly sampling a subset of the desired size, but alternatives have been explored for vari…
▽ More
Stochastic gradient descent (SGD) is a cornerstone of machine learning. When the number N of data items is large, SGD relies on constructing an unbiased estimator of the gradient of the empirical risk using a small subset of the original dataset, called a minibatch. Default minibatch construction involves uniformly sampling a subset of the desired size, but alternatives have been explored for variance reduction. In particular, experimental evidence suggests drawing minibatches from determinantal point processes (DPPs), distributions over minibatches that favour diversity among selected items. However, like in recent work on DPPs for coresets, providing a systematic and principled understanding of how and why DPPs help has been difficult. In this work, we contribute an orthogonal polynomial-based DPP paradigm for minibatch sampling in SGD. Our approach leverages the specific data distribution at hand, which endows it with greater sensitivity and power over existing data-agnostic methods. We substantiate our method via a detailed theoretical analysis of its convergence properties, interweaving between the discrete data set and the underlying continuous domain. In particular, we show how specific DPPs and a string of controlled approximations can lead to gradient estimators with a variance that decays faster with the batchsize than under uniform sampling. Coupled with existing finite-time guarantees for SGD on convex objectives, this entails that, DPP minibatches lead to a smaller bound on the mean square approximation error than uniform minibatches. Moreover, our estimators are amenable to a recent algorithm that directly samples linear statistics of DPPs (i.e., the gradient estimator) without sampling the underlying DPP (i.e., the minibatch), thereby reducing computational overhead. We provide detailed synthetic as well as real data experiments to substantiate our theoretical claims.
△ Less
Submitted 11 December, 2021;
originally announced December 2021.
-
Nonparametric estimation of continuous DPPs with kernel methods
Authors:
Michaël Fanuel,
Rémi Bardenet
Abstract:
Determinantal Point Process (DPPs) are statistical models for repulsive point patterns. Both sampling and inference are tractable for DPPs, a rare feature among models with negative dependence that explains their popularity in machine learning and spatial statistics. Parametric and nonparametric inference methods have been proposed in the finite case, i.e. when the point patterns live in a finite…
▽ More
Determinantal Point Process (DPPs) are statistical models for repulsive point patterns. Both sampling and inference are tractable for DPPs, a rare feature among models with negative dependence that explains their popularity in machine learning and spatial statistics. Parametric and nonparametric inference methods have been proposed in the finite case, i.e. when the point patterns live in a finite ground set. In the continuous case, only parametric methods have been investigated, while nonparametric maximum likelihood for DPPs -- an optimization problem over trace-class operators -- has remained an open question. In this paper, we show that a restricted version of this maximum likelihood (MLE) problem falls within the scope of a recent representer theorem for nonnegative functions in an RKHS. This leads to a finite-dimensional problem, with strong statistical ties to the original MLE. Moreover, we propose, analyze, and demonstrate a fixed point algorithm to solve this finite-dimensional problem. Finally, we also provide a controlled estimate of the correlation kernel of the DPP, thus providing more interpretability.
△ Less
Submitted 27 November, 2021; v1 submitted 27 June, 2021;
originally announced June 2021.
-
Learning from DPPs via Sampling: Beyond HKPV and symmetry
Authors:
Rémi Bardenet,
Subhroshekhar Ghosh
Abstract:
Determinantal point processes (DPPs) have become a significant tool for recommendation systems, feature selection, or summary extraction, harnessing the intrinsic ability of these probabilistic models to facilitate sample diversity. The ability to sample from DPPs is paramount to the empirical investigation of these models. Most exact samplers are variants of a spectral meta-algorithm due to Hough…
▽ More
Determinantal point processes (DPPs) have become a significant tool for recommendation systems, feature selection, or summary extraction, harnessing the intrinsic ability of these probabilistic models to facilitate sample diversity. The ability to sample from DPPs is paramount to the empirical investigation of these models. Most exact samplers are variants of a spectral meta-algorithm due to Hough, Krishnapur, Peres and Virág (henceforth HKPV), which is in general time and resource intensive. For DPPs with symmetric kernels, scalable HKPV samplers have been proposed that either first downsample the ground set of items, or force the kernel to be low-rank, using e.g. Nyström-type decompositions.
In the present work, we contribute a radically different approach than HKPV. Exploiting the fact that many statistical and learning objectives can be effectively accomplished by only sampling certain key observables of a DPP (so-called linear statistics), we invoke an expression for the Laplace transform of such an observable as a single determinant, which holds in complete generality. Combining traditional low-rank approximation techniques with Laplace inversion algorithms from numerical analysis, we show how to directly approximate the distribution function of a linear statistic of a DPP. This distribution function can then be used in hypothesis testing or to actually sample the linear statistic, as per requirement. Our approach is scalable and applies to very general DPPs, beyond traditional symmetric kernels.
△ Less
Submitted 8 July, 2020;
originally announced July 2020.
-
Kernel interpolation with continuous volume sampling
Authors:
Ayoub Belhadji,
Rémi Bardenet,
Pierre Chainais
Abstract:
A fundamental task in kernel methods is to pick nodes and weights, so as to approximate a given function from an RKHS by the weighted sum of kernel translates located at the nodes. This is the crux of kernel density estimation, kernel quadrature, or interpolation from discrete samples. Furthermore, RKHSs offer a convenient mathematical and computational framework. We introduce and analyse continuo…
▽ More
A fundamental task in kernel methods is to pick nodes and weights, so as to approximate a given function from an RKHS by the weighted sum of kernel translates located at the nodes. This is the crux of kernel density estimation, kernel quadrature, or interpolation from discrete samples. Furthermore, RKHSs offer a convenient mathematical and computational framework. We introduce and analyse continuous volume sampling (VS), the continuous counterpart -- for choosing node locations -- of a discrete distribution introduced in (Deshpande & Vempala, 2006). Our contribution is theoretical: we prove almost optimal bounds for interpolation and quadrature under VS. While similar bounds already exist for some specific RKHSs using ad-hoc node constructions, VS offers bounds that apply to any Mercer kernel and depend on the spectrum of the associated integration operator. We emphasize that, unlike previous randomized approaches that rely on regularized leverage scores or determinantal point processes, evaluating the pdf of VS only requires pointwise evaluations of the kernel. VS is thus naturally amenable to MCMC samplers.
△ Less
Submitted 22 February, 2020;
originally announced February 2020.
-
Kernel quadrature with DPPs
Authors:
Ayoub Belhadji,
Rémi Bardenet,
Pierre Chainais
Abstract:
We study quadrature rules for functions from an RKHS, using nodes sampled from a determinantal point process (DPP). DPPs are parametrized by a kernel, and we use a truncated and saturated version of the RKHS kernel. This link between the two kernels, along with DPP machinery, leads to relatively tight bounds on the quadrature error, that depends on the spectrum of the RKHS kernel. Finally, we expe…
▽ More
We study quadrature rules for functions from an RKHS, using nodes sampled from a determinantal point process (DPP). DPPs are parametrized by a kernel, and we use a truncated and saturated version of the RKHS kernel. This link between the two kernels, along with DPP machinery, leads to relatively tight bounds on the quadrature error, that depends on the spectrum of the RKHS kernel. Finally, we experimentally compare DPPs to existing kernel-based quadratures such as herding, Bayesian quadrature, or leverage score sampling. Numerical results confirm the interest of DPPs, and even suggest faster rates than our bounds in particular cases.
△ Less
Submitted 31 December, 2019; v1 submitted 18 June, 2019;
originally announced June 2019.
-
A determinantal point process for column subset selection
Authors:
Ayoub Belhadji,
Rémi Bardenet,
Pierre Chainais
Abstract:
Dimensionality reduction is a first step of many machine learning pipelines. Two popular approaches are principal component analysis, which projects onto a small number of well chosen but non-interpretable directions, and feature selection, which selects a small number of the original features. Feature selection can be abstracted as a numerical linear algebra problem called the column subset selec…
▽ More
Dimensionality reduction is a first step of many machine learning pipelines. Two popular approaches are principal component analysis, which projects onto a small number of well chosen but non-interpretable directions, and feature selection, which selects a small number of the original features. Feature selection can be abstracted as a numerical linear algebra problem called the column subset selection problem (CSSP). CSSP corresponds to selecting the best subset of columns of a matrix $X \in \mathbb{R}^{N \times d}$, where \emph{best} is often meant in the sense of minimizing the approximation error, i.e., the norm of the residual after projection of $X$ onto the space spanned by the selected columns. Such an optimization over subsets of $\{1,\dots,d\}$ is usually impractical. One workaround that has been vastly explored is to resort to polynomial-cost, random subset selection algorithms that favor small values of this approximation error. We propose such a randomized algorithm, based on sampling from a projection determinantal point process (DPP), a repulsive distribution over a fixed number $k$ of indices $\{1,\dots,d\}$ that favors diversity among the selected columns. We give bounds on the ratio of the expected approximation error for this DPP over the optimal error of PCA. These bounds improve over the state-of-the-art bounds of \emph{volume sampling} when some realistic structural assumptions are satisfied for $X$. Numerical experiments suggest that our bounds are tight, and that our algorithms have comparable performance with the \emph{double phase} algorithm, often considered to be the practical state-of-the-art. Column subset selection with DPPs thus inherits the best of both worlds: good empirical performance and tight error bounds.
△ Less
Submitted 23 December, 2018;
originally announced December 2018.
-
DPPy: Sampling DPPs with Python
Authors:
Guillaume Gautier,
Guillermo Polito,
Rémi Bardenet,
Michal Valko
Abstract:
Determinantal point processes (DPPs) are specific probability distributions over clouds of points that are used as models and computational tools across physics, probability, statistics, and more recently machine learning. Sampling from DPPs is a challenge and therefore we present DPPy, a Python toolbox that gathers known exact and approximate sampling algorithms for both finite and continuous DPP…
▽ More
Determinantal point processes (DPPs) are specific probability distributions over clouds of points that are used as models and computational tools across physics, probability, statistics, and more recently machine learning. Sampling from DPPs is a challenge and therefore we present DPPy, a Python toolbox that gathers known exact and approximate sampling algorithms for both finite and continuous DPPs. The project is hosted on GitHub and equipped with an extensive documentation.
△ Less
Submitted 12 August, 2019; v1 submitted 19 September, 2018;
originally announced September 2018.
-
Zonotope hit-and-run for efficient sampling from projection DPPs
Authors:
Guillaume Gautier,
Rémi Bardenet,
Michal Valko
Abstract:
Determinantal point processes (DPPs) are distributions over sets of items that model diversity using kernels. Their applications in machine learning include summary extraction and recommendation systems. Yet, the cost of sampling from a DPP is prohibitive in large-scale applications, which has triggered an effort towards efficient approximate samplers. We build a novel MCMC sampler that combines i…
▽ More
Determinantal point processes (DPPs) are distributions over sets of items that model diversity using kernels. Their applications in machine learning include summary extraction and recommendation systems. Yet, the cost of sampling from a DPP is prohibitive in large-scale applications, which has triggered an effort towards efficient approximate samplers. We build a novel MCMC sampler that combines ideas from combinatorial geometry, linear programming, and Monte Carlo methods to sample from DPPs with a fixed sample cardinality, also called projection DPPs. Our sampler leverages the ability of the hit-and-run MCMC kernel to efficiently move across convex bodies. Previous theoretical results yield a fast mixing time of our chain when targeting a distribution that is close to a projection DPP, but not a DPP in general. Our empirical results demonstrate that this extends to sampling projection DPPs, i.e., our sampler is more sample-efficient than previous approaches which in turn translates to faster convergence when dealing with costly-to-evaluate functions, such as summary extraction in our experiments.
△ Less
Submitted 15 June, 2017; v1 submitted 30 May, 2017;
originally announced May 2017.