Learning from data with structured missingness
Authors:
Robin Mitra,
Sarah F. McGough,
Tapabrata Chakraborti,
Chris Holmes,
Ryan Copping,
Niels Hagenbuch,
Stefanie Biedermann,
Jack Noonan,
Brieuc Lehmann,
Aditi Shenvi,
Xuan Vinh Doan,
David Leslie,
Ginestra Bianconi,
Ruben Sanchez-Garcia,
Alisha Davies,
Maxine Mackintosh,
Eleni-Rosalina Andrinopoulou,
Anahid Basiri,
Chris Harbron,
Ben D. MacArthur
Abstract:
Missing data are an unavoidable complication in many machine learning tasks. When data are 'missing at random', a range of tools and techniques exists to deal with the issue. However, as machine learning studies become more ambitious and seek to learn from ever-larger volumes of heterogeneous data, an increasingly common problem arises in which missing values exhibit an association or structure, either explicitly or implicitly. Such 'structured missingness' raises a range of challenges that have not yet been systematically addressed, and presents a fundamental hindrance to machine learning at scale. Here, we outline the current literature and propose a set of grand challenges in learning from data with structured missingness.
Submitted 3 April, 2023;
originally announced April 2023.