Bias in Data-driven AI Systems -- An Introductory Survey
Authors:
Eirini Ntoutsi,
Pavlos Fafalios,
Ujwal Gadiraju,
Vasileios Iosifidis,
Wolfgang Nejdl,
Maria-Esther Vidal,
Salvatore Ruggieri,
Franco Turini,
Symeon Papadopoulos,
Emmanouil Krasanakis,
Ioannis Kompatsiaris,
Katharina Kinder-Kurlanda,
Claudia Wagner,
Fariba Karimi,
Miriam Fernandez,
Harith Alani,
Bettina Berendt,
Tina Kruegel,
Christian Heinze,
Klaus Broelemann,
Gjergji Kasneci,
Thanassis Tiropanis,
Steffen Staab
Abstract:
AI-based systems are widely employed nowadays to make decisions that have far-reaching impacts on individuals and society. Their decisions might affect everyone, everywhere and anytime, entailing concerns about potential human rights issues. Therefore, it is necessary to move beyond traditional AI algorithms optimized for predictive performance and embed ethical and legal principles in their design, training and deployment to ensure social good while still benefiting from the huge potential of AI technology. The goal of this survey is to provide a broad multi-disciplinary overview of the area of bias in AI systems, focusing on technical challenges and solutions, and to suggest new research directions towards approaches well-grounded in a legal frame. In this survey, we focus on data-driven AI, as a large part of AI is powered nowadays by (big) data and powerful Machine Learning (ML) algorithms. Unless otherwise specified, we use the general term bias to describe problems related to the gathering or processing of data that might result in prejudiced decisions on the basis of demographic features like race, sex, etc.
Submitted 14 January, 2020;
originally announced January 2020.
DUAL-LOCO: Distributing Statistical Estimation Using Random Projections
Authors:
Christina Heinze,
Brian McWilliams,
Nicolai Meinshausen
Abstract:
We present DUAL-LOCO, a communication-efficient algorithm for distributed statistical estimation. DUAL-LOCO assumes that the data is distributed according to the features rather than the samples. It requires only a single round of communication, in which low-dimensional random projections are used to approximate the dependences between features available to different workers. We show that DUAL-LOCO has a bounded approximation error that depends only weakly on the number of workers. We compare DUAL-LOCO against a state-of-the-art distributed optimization method on a variety of real-world datasets and show that it obtains better speedups while retaining good accuracy.
Submitted 8 January, 2016; v1 submitted 8 June, 2015;
originally announced June 2015.