Search | arXiv e-print repository

KD$^{2}$M: An unifying framework for feature knowledge distillation

Abstract: Knowledge Distillation (KD) seeks to transfer the knowledge of a teacher, towards a student neural net. This process is often done by matching the networks' predictions (i.e., their output), but, recently several works have proposed to match the distributions of neural nets' activations (i.e., their features), a process known as \emph{distribution matching}. In this paper, we propose an unifying f… ▽ More Knowledge Distillation (KD) seeks to transfer the knowledge of a teacher, towards a student neural net. This process is often done by matching the networks' predictions (i.e., their output), but, recently several works have proposed to match the distributions of neural nets' activations (i.e., their features), a process known as \emph{distribution matching}. In this paper, we propose an unifying framework, Knowledge Distillation through Distribution Matching (KD$^{2}$M), which formalizes this strategy. Our contributions are threefold. We i) provide an overview of distribution metrics used in distribution matching, ii) benchmark on computer vision datasets, and iii) derive new theoretical results for KD. △ Less

Submitted 2 April, 2025; originally announced April 2025.

Comments: 8 pages, 2 figures, 1 table, under review

arXiv:2503.17683 [pdf, other]

Decentralized Federated Dataset Dictionary Learning for Multi-Source Domain Adaptation

Authors: Rebecca Clain, Eduardo Fernandes Montesuma, Fred Ngolè Mboula

Abstract: Decentralized Multi-Source Domain Adaptation (DMSDA) is a challenging task that aims to transfer knowledge from multiple related and heterogeneous source domains to an unlabeled target domain within a decentralized framework. Our work tackles DMSDA through a fully decentralized federated approach. In particular, we extend the Federated Dataset Dictionary Learning (FedDaDiL) framework by eliminatin… ▽ More Decentralized Multi-Source Domain Adaptation (DMSDA) is a challenging task that aims to transfer knowledge from multiple related and heterogeneous source domains to an unlabeled target domain within a decentralized framework. Our work tackles DMSDA through a fully decentralized federated approach. In particular, we extend the Federated Dataset Dictionary Learning (FedDaDiL) framework by eliminating the necessity for a central server. FedDaDiL leverages Wasserstein barycenters to model the distributional shift across multiple clients, enabling effective adaptation while preserving data privacy. By decentralizing this framework, we enhance its robustness, scalability, and privacy, removing the risk of a single point of failure. We compare our method to its federated counterpart and other benchmark algorithms, showing that our approach effectively adapts source domains to an unlabeled target domain in a fully decentralized manner. △ Less

Submitted 22 March, 2025; originally announced March 2025.

Comments: Accepted at ICASSP 2025

arXiv:2502.12793 [pdf, other]

Unsupervised Anomaly Detection through Mass Repulsing Optimal Transport

Authors: Eduardo Fernandes Montesuma, Adel El Habazi, Fred Ngole Mboula

Abstract: Detecting anomalies in datasets is a longstanding problem in machine learning. In this context, anomalies are defined as a sample that significantly deviates from the remaining data. Meanwhile, optimal transport (OT) is a field of mathematics concerned with the transportation, between two probability measures, at least effort. In classical OT, the optimal transportation strategy of a measure to it… ▽ More Detecting anomalies in datasets is a longstanding problem in machine learning. In this context, anomalies are defined as a sample that significantly deviates from the remaining data. Meanwhile, optimal transport (OT) is a field of mathematics concerned with the transportation, between two probability measures, at least effort. In classical OT, the optimal transportation strategy of a measure to itself is the identity. In this paper, we tackle anomaly detection by forcing samples to displace its mass, while keeping the least effort objective. We call this new transportation problem Mass Repulsing Optimal Transport (MROT). Naturally, samples lying in low density regions of space will be forced to displace mass very far, incurring a higher transportation cost. We use these concepts to design a new anomaly score. Through a series of experiments in existing benchmarks, and fault detection problems, we show that our algorithm improves over existing methods. △ Less

Submitted 18 February, 2025; originally announced February 2025.

Comments: 15 pages, 9 figures, 1 table, under review

arXiv:2501.13732 [pdf, other]

A dimensionality reduction technique based on the Gromov-Wasserstein distance

Authors: Rafael P. Eufrazio, Eduardo Fernandes Montesuma, Charles C. Cavalcante

Abstract: Analyzing relationships between objects is a pivotal problem within data science. In this context, Dimensionality reduction (DR) techniques are employed to generate smaller and more manageable data representations. This paper proposes a new method for dimensionality reduction, based on optimal transportation theory and the Gromov-Wasserstein distance. We offer a new probabilistic view of the class… ▽ More Analyzing relationships between objects is a pivotal problem within data science. In this context, Dimensionality reduction (DR) techniques are employed to generate smaller and more manageable data representations. This paper proposes a new method for dimensionality reduction, based on optimal transportation theory and the Gromov-Wasserstein distance. We offer a new probabilistic view of the classical Multidimensional Scaling (MDS) algorithm and the nonlinear dimensionality reduction algorithm, Isomap (Isometric Mapping or Isometric Feature Mapping) that extends the classical MDS, in which we use the Gromov-Wasserstein distance between the probability measure of high-dimensional data, and its low-dimensional representation. Through gradient descent, our method embeds high-dimensional data into a lower-dimensional space, providing a robust and efficient solution for analyzing complex high-dimensional datasets. △ Less

Submitted 23 January, 2025; originally announced January 2025.

arXiv:2407.19853 [pdf, other]

Online Multi-Source Domain Adaptation through Gaussian Mixtures and Dataset Dictionary Learning

Authors: Eduardo Fernandes Montesuma, Stevan Le Stanc, Fred Ngolè Mboula

Abstract: This paper addresses the challenge of online multi-source domain adaptation (MSDA) in transfer learning, a scenario where one needs to adapt multiple, heterogeneous source domains towards a target domain that comes in a stream. We introduce a novel approach for the online fit of a Gaussian Mixture Model (GMM), based on the Wasserstein geometry of Gaussian measures. We build upon this method and re… ▽ More This paper addresses the challenge of online multi-source domain adaptation (MSDA) in transfer learning, a scenario where one needs to adapt multiple, heterogeneous source domains towards a target domain that comes in a stream. We introduce a novel approach for the online fit of a Gaussian Mixture Model (GMM), based on the Wasserstein geometry of Gaussian measures. We build upon this method and recent developments in dataset dictionary learning for proposing a novel strategy in online MSDA. Experiments on the challenging Tennessee Eastman Process benchmark demonstrate that our approach is able to adapt \emph{on the fly} to the stream of target domain data. Furthermore, our online GMM serves as a memory, representing the whole stream of data. △ Less

Submitted 29 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures, accepted at the IEEE International Workshop on Machine Learning for Signal Processing 2024

arXiv:2407.11647 [pdf, other]

Dataset Dictionary Learning in a Wasserstein Space for Federated Domain Adaptation

Authors: Eduardo Fernandes Montesuma, Fabiola Espinoza Castellon, Fred Ngolè Mboula, Aurélien Mayoue, Antoine Souloumiac, Cédric Gouy-Pailler

Abstract: Multi-Source Domain Adaptation (MSDA) is a challenging scenario where multiple related and heterogeneous source datasets must be adapted to an unlabeled target dataset. Conventional MSDA methods often overlook that data holders may have privacy concerns, hindering direct data sharing. In response, decentralized MSDA has emerged as a promising strategy to achieve adaptation without centralizing cli… ▽ More Multi-Source Domain Adaptation (MSDA) is a challenging scenario where multiple related and heterogeneous source datasets must be adapted to an unlabeled target dataset. Conventional MSDA methods often overlook that data holders may have privacy concerns, hindering direct data sharing. In response, decentralized MSDA has emerged as a promising strategy to achieve adaptation without centralizing clients' data. Our work proposes a novel approach, Decentralized Dataset Dictionary Learning, to address this challenge. Our method leverages Wasserstein barycenters to model the distributional shift across multiple clients, enabling effective adaptation while preserving data privacy. Specifically, our algorithm expresses each client's underlying distribution as a Wasserstein barycenter of public atoms, weighted by private barycentric coordinates. Our approach ensures that the barycentric coordinates remain undisclosed throughout the adaptation process. Extensive experimentation across five visual domain adaptation benchmarks demonstrates the superiority of our strategy over existing decentralized MSDA techniques. Moreover, our method exhibits enhanced robustness to client parallelism while maintaining relative resilience compared to conventional decentralized MSDA methodologies. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: 17 pages,7 figures

arXiv:2404.10261 [pdf, other]

Lighter, Better, Faster Multi-Source Domain Adaptation with Gaussian Mixture Models and Optimal Transport

Authors: Eduardo Fernandes Montesuma, Fred Ngolè Mboula, Antoine Souloumiac

Abstract: In this paper, we tackle Multi-Source Domain Adaptation (MSDA), a task in transfer learning where one adapts multiple heterogeneous, labeled source probability measures towards a different, unlabeled target measure. We propose a novel framework for MSDA, based on Optimal Transport (OT) and Gaussian Mixture Models (GMMs). Our framework has two key advantages. First, OT between GMMs can be solved ef… ▽ More In this paper, we tackle Multi-Source Domain Adaptation (MSDA), a task in transfer learning where one adapts multiple heterogeneous, labeled source probability measures towards a different, unlabeled target measure. We propose a novel framework for MSDA, based on Optimal Transport (OT) and Gaussian Mixture Models (GMMs). Our framework has two key advantages. First, OT between GMMs can be solved efficiently via linear programming. Second, it provides a convenient model for supervised learning, especially classification, as components in the GMM can be associated with existing classes. Based on the GMM-OT problem, we propose a novel technique for calculating barycenters of GMMs. Based on this novel algorithm, we propose two new strategies for MSDA: GMM-Wasserstein Barycenter Transport (WBT) and GMM-Dataset Dictionary Learning (DaDiL). We empirically evaluate our proposed methods on four benchmarks in image classification and fault diagnosis, showing that we improve over the prior art while being faster and involving fewer parameters. Our code is publicly available at https://github.com/eddardd/gmm_msda △ Less

Submitted 21 August, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: 13 pages, 6 figures, accepted as a research track paper at the ECML-PKDD 2024 conference

arXiv:2403.13847 [pdf, other]

Optimal Transport for Domain Adaptation through Gaussian Mixture Models

Authors: Eduardo Fernandes Montesuma, Fred Maurice Ngolè Mboula, Antoine Souloumiac

Abstract: Machine learning systems operate under the assumption that training and test data are sampled from a fixed probability distribution. However, this assumptions is rarely verified in practice, as the conditions upon which data was acquired are likely to change. In this context, the adaptation of the unsupervised domain requires minimal access to the data of the new conditions for learning models rob… ▽ More Machine learning systems operate under the assumption that training and test data are sampled from a fixed probability distribution. However, this assumptions is rarely verified in practice, as the conditions upon which data was acquired are likely to change. In this context, the adaptation of the unsupervised domain requires minimal access to the data of the new conditions for learning models robust to changes in the data distribution. Optimal transport is a theoretically grounded tool for analyzing changes in distribution, especially as it allows the mapping between domains. However, these methods are usually computationally expensive as their complexity scales cubically with the number of samples. In this work, we explore optimal transport between Gaussian Mixture Models (GMMs), which is conveniently written in terms of the components of source and target GMMs. We experiment with 9 benchmarks, with a total of $85$ adaptation tasks, showing that our methods are more efficient than previous shallow domain adaptation methods, and they scale well with number of samples $n$ and dimensions $d$. △ Less

Submitted 22 January, 2025; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: 29 pages, 9 figures, 8 tables, accepted at Transactions on Machine Learning Research

arXiv:2309.07670 [pdf, other]

Federated Dataset Dictionary Learning for Multi-Source Domain Adaptation

Authors: Fabiola Espinoza Castellon, Eduardo Fernandes Montesuma, Fred Ngolè Mboula, Aurélien Mayoue, Antoine Souloumiac, Cédric Gouy-Pailler

Abstract: In this article, we propose an approach for federated domain adaptation, a setting where distributional shift exists among clients and some have unlabeled data. The proposed framework, FedDaDiL, tackles the resulting challenge through dictionary learning of empirical distributions. In our setting, clients' distributions represent particular domains, and FedDaDiL collectively trains a federated dic… ▽ More In this article, we propose an approach for federated domain adaptation, a setting where distributional shift exists among clients and some have unlabeled data. The proposed framework, FedDaDiL, tackles the resulting challenge through dictionary learning of empirical distributions. In our setting, clients' distributions represent particular domains, and FedDaDiL collectively trains a federated dictionary of empirical distributions. In particular, we build upon the Dataset Dictionary Learning framework by designing collaborative communication protocols and aggregation operations. The chosen protocols keep clients' data private, thus enhancing overall privacy compared to its centralized counterpart. We empirically demonstrate that our approach successfully generates labeled data on the target domain with extensive experiments on (i) Caltech-Office, (ii) TEP, and (iii) CWRU benchmarks. Furthermore, we compare our method to its centralized counterpart and other benchmarks in federated domain adaptation. △ Less

Submitted 8 November, 2023; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: 7 pages,2 figures; v2: fixed typos

arXiv:2309.07666 [pdf, other]

Multi-Source Domain Adaptation meets Dataset Distillation through Dataset Dictionary Learning

Authors: Eduardo Fernandes Montesuma, Fred Ngolè Mboula, Antoine Souloumiac

Abstract: In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second attacks the problem of synthesizing a small summary containing all the information about the data… ▽ More In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second attacks the problem of synthesizing a small summary containing all the information about the datasets. We thus consider a new problem called MSDA-DD. To solve it, we adapt previous works in the MSDA literature, such as Wasserstein Barycenter Transport and Dataset Dictionary Learning, as well as DD method Distribution Matching. We thoroughly experiment with this novel problem on four benchmarks (Caltech-Office 10, Tennessee-Eastman Process, Continuous Stirred Tank Reactor, and Case Western Reserve University), where we show that, even with as little as 1 sample per class, one achieves state-of-the-art adaptation performance. △ Less

Submitted 14 September, 2023; originally announced September 2023.

Comments: 7 pages,4 figures

arXiv:2308.11247 [pdf, other]

Benchmarking Domain Adaptation for Chemical Processes on the Tennessee Eastman Process

Authors: Eduardo Fernandes Montesuma, Michela Mulas, Fred Ngolè Mboula, Francesco Corona, Antoine Souloumiac

Abstract: In system monitoring, automatic fault diagnosis seeks to infer the systems' state based on sensor readings, e.g., through machine learning models. In this context, it is of key importance that, based on historical data, these systems are able to generalize to incoming data. In parallel, many factors may induce changes in the data probability distribution, hindering the possibility of such models t… ▽ More In system monitoring, automatic fault diagnosis seeks to infer the systems' state based on sensor readings, e.g., through machine learning models. In this context, it is of key importance that, based on historical data, these systems are able to generalize to incoming data. In parallel, many factors may induce changes in the data probability distribution, hindering the possibility of such models to generalize. In this sense, domain adaptation is an important framework for adapting models to different probability distributions. In this paper, we propose a new benchmark, based on the Tennessee Eastman Process of Downs and Vogel (1993), for benchmarking domain adaptation methods in the context of chemical processes. Besides describing the process, and its relevance for domain adaptation, we describe a series of data processing steps for reproducing our benchmark. We then test 11 domain adaptation strategies on this novel benchmark, showing that optimal transport-based techniques outperform other strategies. △ Less

Submitted 29 July, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: 16 pages, 9 figures, 5 tables. Accepted as a Workshop paper at the ECML-PKDD 2024 conference

arXiv:2307.14953 [pdf, other]

doi 10.3233/FAIA230459

Multi-Source Domain Adaptation through Dataset Dictionary Learning in Wasserstein Space

Authors: Eduardo Fernandes Montesuma, Fred Ngolè Mboula, Antoine Souloumiac

Abstract: This paper seeks to solve Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution. As such, we express each domain as a Wasse… ▽ More This paper seeks to solve Multi-Source Domain Adaptation (MSDA), which aims to mitigate data distribution shifts when transferring knowledge from multiple labeled source domains to an unlabeled target domain. We propose a novel MSDA framework based on dictionary learning and optimal transport. We interpret each domain in MSDA as an empirical distribution. As such, we express each domain as a Wasserstein barycenter of dictionary atoms, which are empirical distributions. We propose a novel algorithm, DaDiL, for learning via mini-batches: (i) atom distributions; (ii) a matrix of barycentric coordinates. Based on our dictionary, we propose two novel methods for MSDA: DaDil-R, based on the reconstruction of labeled samples in the target domain, and DaDiL-E, based on the ensembling of classifiers learned on atom distributions. We evaluate our methods in 3 benchmarks: Caltech-Office, Office 31, and CRWU, where we improved previous state-of-the-art by 3.15%, 2.29%, and 7.71% in classification performance. Finally, we show that interpolations in the Wasserstein hull of learned atoms provide data that can generalize to the target domain. △ Less

Submitted 8 November, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

Comments: 13 pages,8 figures,Published as a conference paper at the 26th European Conference on Artificial Intelligence; v2: corrected typos

arXiv:2306.16156 [pdf, other]

Recent Advances in Optimal Transport for Machine Learning

Authors: Eduardo Fernandes Montesuma, Fred Ngolè Mboula, Antoine Souloumiac

Abstract: Recently, Optimal Transport has been proposed as a probabilistic framework in Machine Learning for comparing and manipulating probability distributions. This is rooted in its rich history and theory, and has offered new solutions to different problems in machine learning, such as generative modeling and transfer learning. In this survey we explore contributions of Optimal Transport for Machine Lea… ▽ More Recently, Optimal Transport has been proposed as a probabilistic framework in Machine Learning for comparing and manipulating probability distributions. This is rooted in its rich history and theory, and has offered new solutions to different problems in machine learning, such as generative modeling and transfer learning. In this survey we explore contributions of Optimal Transport for Machine Learning over the period 2012 -- 2023, focusing on four sub-fields of Machine Learning: supervised, unsupervised, transfer and reinforcement learning. We further highlight the recent development in computational Optimal Transport and its extensions, such as partial, unbalanced, Gromov and Neural Optimal Transport, and its interplay with Machine Learning practice. △ Less

Submitted 21 August, 2024; v1 submitted 28 June, 2023; originally announced June 2023.

Comments: 20 pages,15 figures,under review

arXiv:1910.08328 [pdf, other]

doi 10.1109/ICASSP40776.2020.9053937

OpenDenoising: an Extensible Benchmark for Building Comparative Studies of Image Denoisers

Authors: Florian Lemarchand, Eduardo Fernandes Montesuma, Maxime Pelcat, Erwan Nogues

Abstract: Image denoising has recently taken a leap forward due to machine learning. However, image denoisers, both expert-based and learning-based, are mostly tested on well-behaved generated noises (usually Gaussian) rather than on real-life noises, making performance comparisons difficult in real-world conditions. This is especially true for learning-based denoisers which performance depends on training… ▽ More Image denoising has recently taken a leap forward due to machine learning. However, image denoisers, both expert-based and learning-based, are mostly tested on well-behaved generated noises (usually Gaussian) rather than on real-life noises, making performance comparisons difficult in real-world conditions. This is especially true for learning-based denoisers which performance depends on training data. Hence, choosing which method to use for a specific denoising problem is difficult. This paper proposes a comparative study of existing denoisers, as well as an extensible open tool that makes it possible to reproduce and extend the study. MWCNN is shown to outperform other methods when trained for a real-world image interception noise, and additionally is the second least compute hungry of the tested methods. To evaluate the robustness of conclusions, three test sets are compared. A Kendall's Tau correlation of only 60% is obtained on methods ranking between noise types, demonstrating the need for a benchmarking tool. △ Less

Submitted 18 October, 2019; originally announced October 2019.

Showing 1–14 of 14 results for author: Montesuma, E F