
Showing 1–11 of 11 results for author: Muzellec, B

Searching in archive cs.
  1. arXiv:2210.04620 [pdf, other]

    cs.LG cs.CV

    FLamby: Datasets and Benchmarks for Cross-Silo Federated Learning in Realistic Healthcare Settings

    Authors: Jean Ogier du Terrail, Samy-Safwan Ayed, Edwige Cyffers, Felix Grimberg, Chaoyang He, Regis Loeb, Paul Mangold, Tanguy Marchand, Othmane Marfoq, Erum Mushtaq, Boris Muzellec, Constantin Philippenko, Santiago Silva, Maria Teleńczuk, Shadi Albarqouni, Salman Avestimehr, Aurélien Bellet, Aymeric Dieuleveut, Martin Jaggi, Sai Praneeth Karimireddy, Marco Lorenzi, Giovanni Neglia, Marc Tommasi, Mathieu Andreux

    Abstract: Federated Learning (FL) is a novel approach enabling several clients holding sensitive data to collaboratively train machine learning models, without centralizing data. The cross-silo FL setting corresponds to the case of few (2–50) reliable clients, each holding medium to large datasets, and is typically found in applications such as healthcare, finance, or industry. While previous works hav…

    Submitted 5 May, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Accepted to the NeurIPS Datasets and Benchmarks Track; this version fixes typos in the datasets table and the appendix.
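
    As a generic illustration of training across silos without pooling their data, the sketch below implements federated averaging (FedAvg). FedAvg is not described in the truncated abstract above; it is swapped in here only as a familiar example of cross-silo training. The toy task (each client estimating one scalar by local gradient steps on a squared loss) and the names `fedavg_aggregate`, `local_gd` and `fedavg` are hypothetical.

```python
def fedavg_aggregate(client_params, client_sizes):
    """Average client parameter vectors, weighting each client by its sample count."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * params[i] for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


def local_gd(w, data, lr, local_steps):
    """Hypothetical local update: full-batch gradient steps on 0.5 * mean((w - x)^2)."""
    for _ in range(local_steps):
        grad = sum(w - x for x in data) / len(data)
        w = w - lr * grad
    return w


def fedavg(clients, rounds=50, lr=0.1, local_steps=5, w0=0.0):
    """Toy FedAvg loop for a scalar parameter.

    Each client only ever sees its own data inside local_gd; the server
    receives the locally updated parameters, never the raw values.
    """
    sizes = [len(data) for data in clients]
    w = w0
    for _ in range(rounds):
        local_params = [[local_gd(w, data, lr, local_steps)] for data in clients]
        w = fedavg_aggregate(local_params, sizes)[0]
    return w
```

    With the same number of local steps on every client, one round maps the global parameter w to m + c(w - m), where m is the sample-weighted mean over all clients and c = (1 - lr)^local_steps, so this toy run converges to the pooled mean although no client shares its data.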

  2. arXiv:2210.01639 [pdf, other]

    cs.LG

    SecureFedYJ: a safe feature Gaussianization protocol for Federated Learning

    Authors: Tanguy Marchand, Boris Muzellec, Constance Beguier, Jean Ogier du Terrail, Mathieu Andreux

    Abstract: The Yeo-Johnson (YJ) transformation is a standard parametrized per-feature unidimensional transformation often used to Gaussianize features in machine learning. In this paper, we investigate the problem of applying the YJ transformation in a cross-silo Federated Learning setting under privacy constraints. For the first time, we prove that the YJ negative log-likelihood is in fact convex, which all…

    Submitted 13 October, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022
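
    The abstract states that the YJ negative log-likelihood is convex. As a plaintext illustration only (no federation, no secure computation), the sketch below implements the standard Yeo-Johnson transform, a Gaussian negative log-likelihood of λ up to additive constants, and a ternary search for λ, which suffices for a convex objective. The search interval [-5, 5] and the names `yeo_johnson`, `yj_nll` and `fit_lambda` are assumptions; this is not the SecureFedYJ protocol.

```python
import math


def yeo_johnson(y, lmbda):
    """Standard Yeo-Johnson transform of a single real value."""
    if y >= 0:
        if abs(lmbda) < 1e-12:
            return math.log1p(y)
        return ((y + 1.0) ** lmbda - 1.0) / lmbda
    if abs(lmbda - 2.0) < 1e-12:
        return -math.log1p(-y)
    return -((1.0 - y) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


def yj_nll(data, lmbda):
    """Gaussian negative log-likelihood of lmbda, up to additive constants.

    Uses the maximum-likelihood variance of the transformed data and the
    log-Jacobian term (lmbda - 1) * sum(sign(y) * log(1 + |y|)).
    Assumes the data are not all equal (otherwise the variance is zero).
    """
    n = len(data)
    z = [yeo_johnson(y, lmbda) for y in data]
    mean = sum(z) / n
    var = sum((v - mean) ** 2 for v in z) / n
    log_jacobian = sum(math.copysign(math.log1p(abs(y)), y) for y in data)
    return 0.5 * n * math.log(var) - (lmbda - 1.0) * log_jacobian


def fit_lambda(data, lo=-5.0, hi=5.0, iters=200):
    """Ternary search for the minimizer of yj_nll on [lo, hi].

    Ternary search is valid for unimodal objectives, which convexity of the
    negative log-likelihood guarantees.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if yj_nll(data, m1) < yj_nll(data, m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)
```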

  3. arXiv:2112.01907 [pdf, other]

    stat.ML cs.LG math.ST

    Near-optimal estimation of smooth transport maps with kernel sums-of-squares

    Authors: Boris Muzellec, Adrien Vacher, Francis Bach, François-Xavier Vialard, Alessandro Rudi

    Abstract: It was recently shown that under smoothness conditions, the squared Wasserstein distance between two distributions could be efficiently computed with appealing statistical error upper bounds. However, rather than the distance itself, the object of interest for applications such as generative modeling is the underlying optimal transport map. Hence, computational and statistical guarantees need to b…

    Submitted 29 December, 2021; v1 submitted 3 December, 2021; originally announced December 2021.

  4. arXiv:2111.11306 [pdf, other]

    stat.ML cs.LG

    Learning PSD-valued functions using kernel sums-of-squares

    Authors: Boris Muzellec, Francis Bach, Alessandro Rudi

    Abstract: Shape constraints such as positive semi-definiteness (PSD) for matrices or convexity for functions play a central role in many applications in machine learning and sciences, including metric learning, optimal transport, and economics. Yet, very few function models exist that enforce PSD-ness or convexity with good empirical performance and theoretical guarantees. In this paper, we introduce a kern…

    Submitted 24 January, 2022; v1 submitted 22 November, 2021; originally announced November 2021.

  5. arXiv:2106.09994 [pdf, other]

    cs.LG stat.ML

    A Note on Optimizing Distributions using Kernel Mean Embeddings

    Authors: Boris Muzellec, Francis Bach, Alessandro Rudi

    Abstract: Kernel mean embeddings are a popular tool that consists in representing probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space. When the kernel is characteristic, mean embeddings can be used to define a distance between probability measures, known as the maximum mean discrepancy (MMD). A well-known advantage of mean embeddings and MMD is their low…

    Submitted 27 June, 2021; v1 submitted 18 June, 2021; originally announced June 2021.
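
    The MMD mentioned in the abstract can be estimated directly from samples. A minimal sketch, assuming a Gaussian (RBF) kernel, which is characteristic on R^d, and a hypothetical bandwidth `sigma`; it computes the biased (V-statistic) estimate of the squared MMD and does not reproduce the optimization scheme studied in the note.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mmd2_biased(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD:

        mean k(x, x') + mean k(y, y') - 2 mean k(x, y),

    i.e. the squared RKHS distance between the two empirical mean embeddings.
    """
    kxx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy
```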

  6. arXiv:2003.00306 [pdf, ps, other]

    math.PR cs.LG stat.ML

    Dimension-free convergence rates for gradient Langevin dynamics in RKHS

    Authors: Boris Muzellec, Kanji Sato, Mathurin Massias, Taiji Suzuki

    Abstract: Gradient Langevin dynamics (GLD) and stochastic GLD (SGLD) have attracted considerable attention lately, as a way to provide convergence guarantees in a non-convex setting. However, the known rates grow exponentially with the dimension of the space. In this work, we provide a convergence analysis of GLD and SGLD when the optimization space is an infinite dimensional Hilbert space. More precisely,…

    Submitted 26 March, 2020; v1 submitted 29 February, 2020; originally announced March 2020.
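
    For intuition, the standard GLD recursion in finite dimension is x_{k+1} = x_k - η ∇F(x_k) + sqrt(2η/β) ξ_k, with standard Gaussian noise ξ_k and inverse temperature β. The sketch below assumes R^d rather than the infinite-dimensional Hilbert space analyzed in the paper; the function name `gradient_langevin` and the quadratic test objective are hypothetical. SGLD would replace `grad` with a stochastic gradient estimate.

```python
import math
import random


def gradient_langevin(grad, x0, step, beta, n_steps, seed=0):
    """Gradient Langevin dynamics in R^d (finite-dimensional sketch):

        x_{k+1} = x_k - step * grad(x_k) + sqrt(2 * step / beta) * xi_k,

    with xi_k ~ N(0, I). Returns all iterates, starting with x0.
    Passing beta=math.inf disables the noise (plain gradient descent).
    """
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step / beta)
    x = list(x0)
    path = [list(x)]
    for _ in range(n_steps):
        g = grad(x)
        x = [xi - step * gi + noise_scale * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
        path.append(list(x))
    return path
```

    With F(x) = ||x||^2 / 2, step 0.1 and beta = 1, each coordinate follows x' = 0.9 x + sqrt(0.2) ξ, whose stationary variance is 0.2 / 0.19 ≈ 1.05, close to the 1/β of the target Gibbs measure.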

  7. arXiv:2002.03860 [pdf, other]

    stat.ML cs.LG

    Missing Data Imputation using Optimal Transport

    Authors: Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi

    Abstract: Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Starting from the simple assumption that two batches extracted randomly from the same dataset should share the same distribution, we leverage optimal transport distances to quantify that criterion and turn it into a loss function to impute missing data values. We propose practical methods to minimize…

    Submitted 1 July, 2020; v1 submitted 10 February, 2020; originally announced February 2020.
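
    The abstract's criterion compares two random batches through an optimal transport distance. A minimal sketch of such a batch loss, assuming entropic OT solved with log-domain Sinkhorn iterations, a squared Euclidean cost and uniform weights; `eps`, `n_iter` and the function names are hypothetical choices, and the imputation step itself (updating missing entries to decrease this loss) is not shown.

```python
import math


def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def sinkhorn_cost(xs, ys, eps=0.05, n_iter=1000):
    """Transport cost <P, C> of the entropic OT plan between two batches.

    Dual potentials f, g are updated in the log domain; the plan is
    P_ij = a_i * b_j * exp((f_i + g_j - C_ij) / eps) with uniform a, b.
    """
    n, m = len(xs), len(ys)
    log_a, log_b = -math.log(n), -math.log(m)
    C = [[sq_dist(x, y) for y in ys] for x in xs]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(n_iter):
        # f-update: make the row sums of P equal to a
        f = [-eps * logsumexp([log_b + (g[j] - C[i][j]) / eps for j in range(m)])
             for i in range(n)]
        # g-update: make the column sums of P equal to b
        g = [-eps * logsumexp([log_a + (f[i] - C[i][j]) / eps for i in range(n)])
             for j in range(m)]
    return sum(
        math.exp(log_a + log_b + (f[i] + g[j] - C[i][j]) / eps) * C[i][j]
        for i in range(n) for j in range(m)
    )
```

    With a small `eps`, Sinkhorn iterations can need many steps to converge; a larger `eps` converges faster but yields a more diffuse transport plan, whose cost is further from the unregularized OT cost.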

  8. arXiv:1905.10099 [pdf, other]

    cs.LG stat.ML

    Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections

    Authors: Boris Muzellec, Marco Cuturi

    Abstract: Computing optimal transport (OT) between measures in high dimensions is doomed by the curse of dimensionality. A popular approach to avoid this curse is to project input measures on lower-dimensional subspaces (1D lines in the case of sliced Wasserstein distances), solve the OT problem between these reduced measures, and settle for the Wasserstein distance between these reductions, rather than tha…

    Submitted 28 October, 2019; v1 submitted 24 May, 2019; originally announced May 2019.
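
    The abstract mentions the popular approach of projecting onto 1D lines, as in sliced Wasserstein distances; this is the baseline the paper contrasts with, not its subspace detour construction. In 1D, optimal transport between two equal-size empirical measures matches sorted samples, which the sketch below exploits. Equal sample sizes, uniform weights and the names `sliced_w2_sq` and `random_directions` are assumptions.

```python
import math
import random


def sliced_w2_sq(xs, ys, directions):
    """Squared sliced 2-Wasserstein distance between two equal-size point clouds
    with uniform weights, averaged over the given projection directions."""
    assert len(xs) == len(ys), "this sketch assumes equal sample sizes"
    total = 0.0
    for theta in directions:
        norm = math.sqrt(sum(t * t for t in theta))
        u = [t / norm for t in theta]
        # Project both clouds onto the line spanned by u
        px = sorted(sum(a * b for a, b in zip(x, u)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, u)) for y in ys)
        # 1D optimal transport: match samples of equal rank
        total += sum((a - b) ** 2 for a, b in zip(px, py)) / len(px)
    return total / len(directions)


def random_directions(n, dim, seed=0):
    """Draw n directions uniformly on the unit sphere (normalized Gaussian vectors)."""
    rng = random.Random(seed)
    dirs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(t * t for t in v))
        dirs.append([t / norm for t in v])
    return dirs
```

    For a pure translation by a vector t, each direction u contributes exactly (u · t)^2, so in 2D the sliced value averages to about |t|^2 / 2 while the full squared Wasserstein distance is |t|^2; this loss of information along discarded directions is the kind of gap that motivates working beyond 1D projections.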

  9. arXiv:1805.07594 [pdf, other]

    stat.ML cs.LG

    Generalizing Point Embeddings using the Wasserstein Space of Elliptical Distributions

    Authors: Boris Muzellec, Marco Cuturi

    Abstract: Embedding complex objects as vectors in low dimensional spaces is a longstanding problem in machine learning. We propose in this work an extension of that approach, which consists in embedding objects as elliptical probability distributions, namely distributions whose densities have elliptical level sets. We endow these measures with the 2-Wasserstein metric, with two important benefits: (i) For s…

    Submitted 17 February, 2019; v1 submitted 19 May, 2018; originally announced May 2018.

    Journal ref: Advances in Neural Information Processing Systems 31, pages 10258–10269, 2018
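
    Gaussian measures are elliptical distributions, and between two Gaussians the 2-Wasserstein distance has a well-known closed form: the squared distance between the means plus a Bures term on the covariances. The sketch below is restricted to 2D Gaussians (a choice made here, not in the paper) so that matrix square roots can be avoided: for a 2x2 positive semi-definite M, tr M^{1/2} = sqrt(tr M + 2 sqrt(det M)), and A^{1/2} B A^{1/2} has the same eigenvalues as AB. The function name is hypothetical; this is not the paper's embedding code.

```python
import math


def w2_sq_gaussian_2d(m1, A, m2, B):
    """Squared 2-Wasserstein distance between N(m1, A) and N(m2, B) in 2D:

        |m1 - m2|^2 + tr A + tr B - 2 tr (A^{1/2} B A^{1/2})^{1/2}.

    The last trace is computed as sqrt(tr(AB) + 2 sqrt(det A * det B)),
    valid for 2x2 positive semi-definite covariance matrices A and B.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(m1, m2))
    tr_a = A[0][0] + A[1][1]
    tr_b = B[0][0] + B[1][1]
    det_a = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    det_b = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    tr_ab = sum(A[i][k] * B[k][i] for i in range(2) for k in range(2))
    cross = math.sqrt(tr_ab + 2.0 * math.sqrt(det_a * det_b))
    return mean_term + tr_a + tr_b - 2.0 * cross
```

    For diagonal covariances the formula reduces to summing (sqrt(a_i) - sqrt(b_i))^2 over coordinates; for example, diag(1, 4) versus diag(9, 16) with equal means gives (1 - 3)^2 + (2 - 4)^2 = 8.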

  10. arXiv:1609.07082 [pdf, other]

    cs.LG cs.CG cs.CV

    Large Margin Nearest Neighbor Classification using Curved Mahalanobis Distances

    Authors: Frank Nielsen, Boris Muzellec, Richard Nock

    Abstract: We consider the supervised classification problem of machine learning in Cayley-Klein projective geometries: We show how to learn a curved Mahalanobis metric distance corresponding to either the hyperbolic geometry or the elliptic geometry using the Large Margin Nearest Neighbor (LMNN) framework. We report on our experimental results, and further consider the case of learning a mixed curved Mahala…

    Submitted 26 September, 2016; v1 submitted 22 September, 2016; originally announced September 2016.

    Comments: 21 pages, 8 figures, 5 tables; extends the ICIP 2016 paper entitled "Classification with Mixtures of Curved Mahalanobis Metrics"

  11. arXiv:1609.04495 [pdf, other]

    cs.LG

    Tsallis Regularized Optimal Transport and Ecological Inference

    Authors: Boris Muzellec, Richard Nock, Giorgio Patrini, Frank Nielsen

    Abstract: Optimal transport is a powerful framework for computing distances between probability distributions. We unify the two main approaches to optimal transport, namely Monge-Kantorovitch and Sinkhorn-Cuturi, into what we define as Tsallis regularized optimal transport (TROT). TROT interpolates a rich family of distortions from Wasserstein to Kullback-Leibler, encompassing as well Pearson, Neyman and…

    Submitted 14 September, 2016; originally announced September 2016.

    ACM Class: G.1.6
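
    Tsallis regularized optimal transport takes its name from the Tsallis entropy. A minimal sketch of that entropy for a discrete distribution (for example, a flattened transport plan), S_q(p) = (1 - Σ_i p_i^q) / (q - 1), which recovers the Shannon entropy as q → 1. How this entropy enters the regularized transport objective is described in the paper and is not reproduced here; the function name `tsallis_entropy` is hypothetical.

```python
import math


def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1) for q > 0.

    At q == 1 the limit, the Shannon entropy -sum_i p_i log p_i, is returned.
    Zero-probability entries contribute nothing in either case.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(pi * math.log(pi) for pi in p if pi > 0)
    return (1.0 - sum(pi ** q for pi in p if pi > 0)) / (q - 1.0)
```

    For the uniform distribution on two points, S_2 = (1 - 0.25 - 0.25) / 1 = 0.5, while the Shannon limit is log 2.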