Search | arXiv e-print repository

Differentiable Expectation-Maximisation and Applications to Gaussian Mixture Model Optimal Transport

Authors: Samuel Boïté, Eloi Tanguy, Julie Delon, Agnès Desolneux, Rémi Flamary

Abstract: The Expectation-Maximisation (EM) algorithm is a central tool in statistics and machine learning, widely used for latent-variable models such as Gaussian Mixture Models (GMMs). Despite its ubiquity, EM is typically treated as a non-differentiable black box, preventing its integration into modern learning pipelines where end-to-end gradient propagation is essential. In this work, we present and compare several differentiation strategies for EM, from full automatic differentiation to approximate methods, assessing their accuracy and computational efficiency. As a key application, we leverage this differentiable EM in the computation of the Mixture Wasserstein distance $\mathrm{MW}_2$ between GMMs, allowing $\mathrm{MW}_2$ to be used as a differentiable loss in imaging and machine learning tasks. To complement our practical use of $\mathrm{MW}_2$, we contribute a novel stability result which provides theoretical justification for the use of $\mathrm{MW}_2$ with EM, and also introduce a novel unbalanced variant of $\mathrm{MW}_2$. Numerical experiments on barycentre computation, colour and style transfer, image generation, and texture synthesis illustrate the versatility of the proposed approach in different settings.

Submitted 29 September, 2025; v1 submitted 2 September, 2025; originally announced September 2025.

arXiv:2307.10352 [pdf, other]

doi 10.1090/mcom/3994

Properties of Discrete Sliced Wasserstein Losses

Authors: Eloi Tanguy, Rémi Flamary, Julie Delon

Abstract: The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Widespread applications include image processing, domain adaptation and generative modelling, where it is common to optimise some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems share the same sub-problem, which is minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(γ_Y, γ_Z)$, i.e. the SW distance between two uniform discrete measures with the same number of points as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy, as well as its Monte-Carlo approximation $\mathcal{E}_p$ (estimating the expectation in SW using only $p$ samples) and show convergence results on the critical points of $\mathcal{E}_p$ to those of $\mathcal{E}$, as well as an almost-sure uniform convergence and a uniform Central Limit result on the process $\mathcal{E}_p(Y)$. Finally, we show that in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.

Submitted 14 May, 2025; v1 submitted 19 July, 2023; originally announced July 2023.

Journal ref: Mathematics of Computation (2024)
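
The Monte-Carlo energy $\mathcal{E}_p$ mentioned in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function name `sw_energy_mc`, the `seed` argument, and the choice of drawing directions by normalising Gaussian vectors are assumptions made for this example. It relies on the standard fact that, in one dimension, the squared 2-Wasserstein distance between two uniform discrete measures with the same number of points is the mean squared difference of their sorted samples.

```python
import numpy as np

def sw_energy_mc(Y, Z, p=100, seed=0):
    """Monte-Carlo estimate of SW_2^2 between the uniform discrete
    measures supported on the rows of Y and Z (both of shape (n, d)),
    using p random projection directions (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    # Draw p directions uniformly on the unit sphere of R^d.
    theta = rng.standard_normal((p, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both point clouds on every direction and sort each
    # projected sample: columns index directions, rows index points.
    proj_Y = np.sort(Y @ theta.T, axis=0)
    proj_Z = np.sort(Z @ theta.T, axis=0)
    # 1D squared W_2 per direction is the mean squared gap between
    # sorted samples; averaging over directions gives the estimate.
    return float(np.mean((proj_Y - proj_Z) ** 2))
```

With a fixed `seed` the directions are frozen, so the estimate is a deterministic function of `Y`; it is quadratic on each region where the sorting permutations do not change.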

arXiv:2304.12029 [pdf, other]

doi 10.5802/crmath.601

Reconstructing discrete measures from projections. Consequences on the empirical Sliced Wasserstein Distance

Authors: Eloi Tanguy, Rémi Flamary, Julie Delon

Abstract: This paper deals with the reconstruction of a discrete measure $γ_Z$ on $\mathbb{R}^d$ from the knowledge of its pushforward measures $P_i\#γ_Z$ by linear maps $P_i: \mathbb{R}^d \rightarrow \mathbb{R}^{d_i}$ (for instance projections onto subspaces). The measure $γ_Z$ being fixed, assuming that the rows of the matrices $P_i$ are independent realizations of laws which do not give mass to hyperplanes, we show that if $\sum_i d_i > d$, this reconstruction problem almost surely has a unique solution. This holds for any number of points in $γ_Z$. A direct consequence of this result is an almost-sure separability property on the empirical Sliced Wasserstein distance.

Submitted 12 April, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

Journal ref: Comptes Rendus. Mathématique 362 (2024), pp. 1121-1129

arXiv:2110.15813 [pdf, other]

Sliding window strategy for convolutional spike sorting with Lasso : Algorithm, theoretical guarantees and complexity

Authors: Laurent Dragoni, Rémi Flamary, Karim Lounici, Patricia Reynaud-Bouret

Abstract: Spike sorting is a class of algorithms used in neuroscience to attribute the time occurrences of particular electric signals, called action potentials or spikes, to neurons. We rephrase this problem as a particular optimization problem: Lasso for convolutional models in high dimension. Lasso (i.e. least absolute shrinkage and selection operator) is a very generic tool in machine learning that helps us to look for sparse solutions (here the time occurrences). However, for the size of the problem at hand in this neuroscience context, classical Lasso solvers fail. We present here a new and much faster algorithm. Making use of biological properties related to neurons, we explain how the particular structure of the problem allows several optimizations, leading to an algorithm with a temporal complexity which grows linearly with respect to the size of the recorded signal and can be performed online. Moreover, the spatial separability of the initial problem allows it to be broken into subproblems, further reducing the complexity and making it applicable to the latest recording devices, which comprise a large number of sensors. We provide several mathematical results: the size and numerical complexity of the subproblems can be estimated mathematically by using percolation theory. We also show under reasonable assumptions that the Lasso estimator retrieves the true time occurrences of the spikes with large probability. Finally, the theoretical time complexity of the algorithm is given. Numerical simulations are also provided in order to illustrate the efficiency of our approach.

Submitted 11 April, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

arXiv:2110.00629 [pdf, other]

Factored couplings in multi-marginal optimal transport via difference of convex programming

Authors: Quang Huy Tran, Hicham Janati, Ievgen Redko, Rémi Flamary, Nicolas Courty

Abstract: Optimal transport (OT) theory underlies many emerging machine learning (ML) methods nowadays solving a wide range of tasks such as generative modeling, transfer learning and information retrieval. These latter works, however, usually build upon a traditional OT setup with two distributions, while leaving a more general multi-marginal OT formulation somewhat unexplored. In this paper, we study the multi-marginal OT (MMOT) problem and unify several popular OT methods under its umbrella by promoting structural information on the coupling. We show that incorporating such structural information into MMOT results in an instance of a difference of convex (DC) programming problem, allowing us to solve it numerically. Despite the high computational cost of the latter procedure, the solutions provided by DC optimization are usually as qualitative as those obtained using currently employed optimization schemes.

Submitted 1 December, 2021; v1 submitted 1 October, 2021; originally announced October 2021.

Comments: Revision of notation and proofs

arXiv:2106.04145 [pdf, other]

Unbalanced Optimal Transport through Non-negative Penalized Linear Regression

Authors: Laetitia Chapel, Rémi Flamary, Haoran Wu, Cédric Févotte, Gilles Gasso

Abstract: This paper addresses the problem of Unbalanced Optimal Transport (UOT) in which the marginal conditions are relaxed (using weighted penalties in lieu of equality) and no additional regularization is enforced on the OT plan. In this context, we show that the corresponding optimization problem can be reformulated as a non-negative penalized linear regression problem. This reformulation allows us to propose novel algorithms inspired by inverse problems and nonnegative matrix factorization. In particular, we consider majorization-minimization which leads in our setting to efficient multiplicative updates for a variety of penalties. Furthermore, we derive for the first time an efficient algorithm to compute the regularization path of UOT with quadratic penalties. The proposed algorithm provides a continuity of piece-wise linear OT plans converging to the solution of balanced OT (corresponding to infinite penalty weights). We perform several numerical experiments on simulated and real data illustrating the new algorithms, and provide a detailed discussion about more sophisticated optimization tools that can further be used to solve OT problems thanks to our reformulation.

Submitted 8 June, 2021; originally announced June 2021.

Comments: Laetitia Chapel and Rémi Flamary have equal contribution

arXiv:2103.03606 [pdf, other]

Unbalanced minibatch Optimal Transport; applications to Domain Adaptation

Authors: Kilian Fatras, Thibault Séjourné, Nicolas Courty, Rémi Flamary

Abstract: Optimal transport distances have found many applications in machine learning for their capacity to compare non-parametric probability distributions. Yet their algorithmic complexity generally prevents their direct use on large scale datasets. Among the possible strategies to alleviate this issue, practitioners can rely on computing estimates of these distances over subsets of data, i.e. minibatches. While computationally appealing, we highlight in this paper some limits of this strategy, arguing it can lead to undesirable smoothing effects. As an alternative, we suggest that the same minibatch strategy coupled with unbalanced optimal transport can yield more robust behavior. We discuss the associated theoretical properties, such as unbiased estimators, existence of gradients and concentration bounds. Our experimental study shows that in challenging problems associated with domain adaptation, the use of unbalanced optimal transport leads to significantly better results, competing with or surpassing recent baselines.

Submitted 5 March, 2021; originally announced March 2021.

arXiv:1906.12077 [pdf, other]

Large scale Lasso with windowed active set for convolutional spike sorting

Authors: Laurent Dragoni, Rémi Flamary, Karim Lounici, Patricia Reynaud-Bouret

Abstract: Spike sorting is a fundamental preprocessing step in neuroscience that is central to accessing simultaneous but distinct neuronal activities and therefore to better understanding the animal or even human brain. But numerical complexity limits studies that require processing large scale datasets in terms of number of electrodes, neurons, spikes and length of the recorded signals. We propose in this work a novel active set algorithm aimed at solving the Lasso for a classical convolutional model. Our algorithm can be implemented efficiently on parallel architectures and has a linear complexity w.r.t. the temporal dimensionality, which ensures scaling and will open the door to online spike sorting. We provide theoretical results about the complexity of the algorithm and illustrate it in numerical experiments along with results about the accuracy of the spike recovery and robustness to the regularization parameter.

Submitted 28 June, 2019; originally announced June 2019.

arXiv:1905.10155 [pdf, other]

Concentration bounds for linear Monge mapping estimation and optimal transport domain adaptation

Authors: Rémi Flamary, Karim Lounici, André Ferrari

Abstract: This article investigates the quality of the estimator of the linear Monge mapping between distributions. We provide the first concentration result on the linear mapping operator and prove a sample complexity of $n^{-1/2}$ when using empirical estimates of first and second order moments. This result is then used to derive a generalization bound for domain adaptation with optimal transport. As a consequence, this method approaches the performance of the theoretical Bayes predictor under mild conditions on the covariance structure of the problem. We also discuss the computational complexity of the linear mapping estimation and show that when the source and target are stationary the mapping is a convolution that can be estimated very efficiently using fast Fourier transforms. Numerical experiments reproduce the behavior of the proven bounds on simulated and real data for mapping estimation and domain adaptation on images.

Submitted 1 December, 2020; v1 submitted 24 May, 2019; originally announced May 2019.
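
As an illustration of the abstract above, the sketch below plugs empirical means and covariances into the standard closed-form Monge map between Gaussian distributions, $T(x) = m_t + A(x - m_s)$ with $A = \Sigma_s^{-1/2}(\Sigma_s^{1/2}\Sigma_t\Sigma_s^{1/2})^{1/2}\Sigma_s^{-1/2}$. It is a minimal NumPy sketch and not the authors' implementation: the names `spd_sqrt`, `fit_linear_monge`, `apply_linear_monge` and the small ridge `reg` added to the covariances are assumptions made for this example.

```python
import numpy as np

def spd_sqrt(S):
    """Square root of a symmetric positive semi-definite matrix,
    computed from its eigendecomposition (hypothetical helper)."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def fit_linear_monge(Xs, Xt, reg=1e-8):
    """Estimate (A, ms, mt) from source samples Xs and target samples
    Xt, given as arrays of shape (n_samples, d)."""
    d = Xs.shape[1]
    ms, mt = Xs.mean(axis=0), Xt.mean(axis=0)
    # Empirical second-order moments; `reg` is an assumed ridge term
    # that keeps the source covariance invertible.
    Cs = np.atleast_2d(np.cov(Xs, rowvar=False)) + reg * np.eye(d)
    Ct = np.atleast_2d(np.cov(Xt, rowvar=False)) + reg * np.eye(d)
    Cs_half = spd_sqrt(Cs)
    Cs_inv_half = np.linalg.inv(Cs_half)
    A = Cs_inv_half @ spd_sqrt(Cs_half @ Ct @ Cs_half) @ Cs_inv_half
    return A, ms, mt

def apply_linear_monge(X, A, ms, mt):
    """Apply T(x) = mt + A (x - ms) to every row of X.
    A is symmetric, so right-multiplying the rows by A is equivalent."""
    return mt + (X - ms) @ A
```

By construction $A \Sigma_s A = \Sigma_t$, so the mapped source samples match the target mean exactly and the target covariance up to the ridge term.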

arXiv:1711.11423 [pdf, other]

doi 10.1109/TSIPN.2018.2863218

On reducing the communication cost of the diffusion LMS algorithm

Authors: Ibrahim El Khalil Harrane, Rémi Flamary, Cédric Richard

Abstract: The rise of digital and mobile communications has recently made the world more connected and networked, resulting in an unprecedented volume of data flowing between sources, data centers, or processes. While these data may be processed in a centralized manner, it is often more suitable to consider distributed strategies such as diffusion as they are scalable and can handle large amounts of data by distributing tasks over networked agents. Although it is relatively simple to implement diffusion strategies over a cluster, it appears to be challenging to deploy them in an ad-hoc network with limited energy budget for communication. In this paper, we introduce a diffusion LMS strategy that significantly reduces communication costs without compromising the performance. Then, we analyze the proposed algorithm in the mean and mean-square sense. Next, we conduct numerical experiments to confirm the theoretical findings. Finally, we perform large scale simulations to test the algorithm efficiency in a scenario where energy is limited.

Submitted 23 July, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

arXiv:1705.06603 [pdf, other]

Distributed Deblurring of Large Images of Wide Field-Of-View

Authors: Rahul Mourya, André Ferrari, Rémi Flamary, Pascal Bianchi, Cédric Richard

Abstract: Image deblurring is an economical way to reduce certain degradations (blur and noise) in acquired images. Thus, it has become an essential tool in high resolution imaging in many applications, e.g., astronomy, microscopy or computational photography. In applications such as astronomy and satellite imaging, the size of acquired images can be extremely large (up to gigapixels), covering a wide field-of-view and suffering from shift-variant blur. Most of the existing image deblurring techniques are designed and implemented to work efficiently on centralized computing systems having multiple processors and a shared memory. Thus, the largest image that can be handled is limited by the size of the physical memory available on the system. In this paper, we propose a distributed nonblind image deblurring algorithm in which several connected processing nodes (with reasonable computational resources) simultaneously process different portions of a large image while maintaining certain coherency among them to finally obtain a single crisp image. Unlike the existing centralized techniques, image deblurring in a distributed fashion raises several issues. To tackle these issues, we consider certain approximations that trade off the quality of the deblurred image against the computational resources required to achieve it. The experimental results show that our algorithm produces images of similar quality to the existing centralized techniques while allowing distribution, and thus being cost effective for extremely large images.

Submitted 17 May, 2017; originally announced May 2017.

Comments: 16 pages, 10 figures, submitted to IEEE Trans. on Image Processing

arXiv:1606.07286 [pdf, ps, other]

doi 10.1109/CAMSAP.2015.7383796

Importance sampling strategy for non-convex randomized block-coordinate descent

Authors: Rémi Flamary, Alain Rakotomamonjy, Gilles Gasso

Abstract: As the number of samples and dimensionality of optimization problems related to statistics and machine learning explode, block coordinate descent algorithms have gained popularity since they reduce the original problem to several smaller ones. Coordinates to be optimized are usually selected randomly according to a given probability distribution. We introduce an importance sampling strategy that helps randomized coordinate descent algorithms to focus on blocks that are still far from convergence. The framework applies to problems composed of the sum of two possibly non-convex terms, one being separable and non-smooth. We have compared our algorithm to a full gradient proximal approach as well as to a randomized block coordinate algorithm that considers uniform sampling and cyclic block coordinate descent. Experimental evidence shows the clear benefit of using an importance sampling strategy.

Submitted 23 June, 2016; originally announced June 2016.

Journal ref: IEEE INTERNATIONAL WORKSHOP ON COMPUTATIONAL ADVANCES IN MULTI-SENSOR ADAPTIVE PROCESSING, Dec 2015, Cancun, Mexico. 2015

arXiv:1510.06567 [pdf, ps, other]

Generalized conditional gradient: analysis of convergence and applications

Authors: Alain Rakotomamonjy, Rémi Flamary, Nicolas Courty

Abstract: The objective of this technical report is to provide additional results on the generalized conditional gradient methods introduced by Bredies et al. [BLM05]. Indeed, when the objective function is smooth, we provide a novel certificate of optimality and we show that the algorithm has a linear convergence rate. Applications of this algorithm are also discussed.

Submitted 22 October, 2015; originally announced October 2015.

arXiv:1507.00438 [pdf, ps, other]

DC Proximal Newton for Non-Convex Optimization Problems

Authors: Alain Rakotomamonjy, Remi Flamary, Gilles Gasso

Abstract: We introduce a novel algorithm for solving learning problems where both the loss function and the regularizer are non-convex but belong to the class of difference of convex (DC) functions. Our contribution is a new general purpose proximal Newton algorithm that is able to deal with such a situation. The algorithm consists in obtaining a descent direction from an approximation of the loss function and then in performing a line search to ensure sufficient descent. A theoretical analysis is provided showing that the iterates of the proposed algorithm admit as limit points stationary points of the DC objective function. Numerical experiments show that our approach is more efficient than the current state of the art for a problem with a convex loss function and a non-convex regularizer. We have also illustrated the benefit of our algorithm in a high-dimensional transductive learning problem where both the loss function and the regularizer are non-convex.

Submitted 2 July, 2015; originally announced July 2015.

Showing 1–14 of 14 results for author: Flamary, R