Missing Data Imputation using Optimal Transport

Muzellec, Boris; Josse, Julie; Boyer, Claire; Cuturi, Marco


arXiv:2002.03860 (stat)

[Submitted on 10 Feb 2020 (v1), last revised 1 Jul 2020 (this version, v3)]



Abstract: Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Starting from the simple assumption that two batches extracted randomly from the same dataset should share the same distribution, we leverage optimal transport (OT) distances to quantify that criterion and turn it into a loss function to impute missing data values. We propose practical methods to minimize these losses using end-to-end learning, which may or may not exploit parametric assumptions on the underlying distributions of values. We evaluate our methods on datasets from the UCI repository, in missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR) settings. These experiments show that OT-based methods match or outperform state-of-the-art imputation methods, even for high percentages of missing values.
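
The batch criterion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which OT distance or optimizer is used, so the code below assumes an entropy-regularized (Sinkhorn) divergence with a squared Euclidean cost as a stand-in, and it only evaluates the loss for two candidate fill-in values rather than minimizing it by gradient descent. The function names, the toy batches, and the values of `eps` and `n_iter` are all hypothetical.

```python
import math


def sq_dist(x, y):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def entropic_ot(xs, ys, eps=0.5, n_iter=200):
    """Dual value of entropy-regularized OT between two uniform point clouds.

    Runs log-domain Sinkhorn updates on the dual potentials f and g.
    """
    n, m = len(xs), len(ys)
    log_a, log_b = -math.log(n), -math.log(m)  # uniform weights
    cost = [[sq_dist(x, y) for y in ys] for x in xs]
    f, g = [0.0] * n, [0.0] * m
    for _ in range(n_iter):
        f = [-eps * _logsumexp([log_b + (g[j] - cost[i][j]) / eps
                                for j in range(m)]) for i in range(n)]
        g = [-eps * _logsumexp([log_a + (f[i] - cost[i][j]) / eps
                                for i in range(n)]) for j in range(m)]
    return sum(f) / n + sum(g) / m


def sinkhorn_divergence(xs, ys, eps=0.5, n_iter=200):
    """Debiased entropic OT: OT(x, y) - OT(x, x) / 2 - OT(y, y) / 2."""
    return (entropic_ot(xs, ys, eps, n_iter)
            - 0.5 * entropic_ot(xs, xs, eps, n_iter)
            - 0.5 * entropic_ot(ys, ys, eps, n_iter))


# Toy data (hypothetical): both batches come from points near the line y = x.
batch_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]


def batch_b_with(fill):
    """Second batch whose last point has a missing second coordinate."""
    return [(0.5, 0.5), (1.5, 1.5), (2.5, fill)]


# Compare the batch-to-batch loss for a plausible and an implausible fill-in.
loss_plausible = sinkhorn_divergence(batch_a, batch_b_with(2.5))
loss_implausible = sinkhorn_divergence(batch_a, batch_b_with(-5.0))
print(loss_plausible, loss_implausible)
```

In this toy setting the fill-in that keeps the second batch on the same line as the first should yield the lower loss, which is the signal an imputation procedure of this kind would follow. A practical version would draw many random batch pairs and update the imputed entries with an automatic-differentiation framework.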

Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as: arXiv:2002.03860 [stat.ML] (or arXiv:2002.03860v3 [stat.ML] for this version)
DOI: https://doi.org/10.48550/arXiv.2002.03860

Submission history

From: Boris Muzellec
[v1] Mon, 10 Feb 2020 15:23:42 UTC (533 KB)
[v2] Wed, 26 Feb 2020 09:06:16 UTC (752 KB)
[v3] Wed, 1 Jul 2020 09:16:41 UTC (2,865 KB)

