-
Impact Of Missing Data Imputation On The Fairness And Accuracy Of Graph Node Classifiers
Authors:
Haris Mansoor,
Sarwan Ali,
Shafiq Alam,
Muhammad Asad Khan,
Umair ul Hassan,
Imdadullah Khan
Abstract:
Analysis of the fairness of machine learning (ML) algorithms has recently attracted many researchers' interest. Most ML methods show bias toward protected groups, which limits the applicability of ML models in applications such as crime rate prediction. Data may also have missing values which, if not handled appropriately, are known to further harm fairness. Many imputation methods have been proposed to deal with missing data. However, the effect of missing data imputation on fairness has not been well studied. In this paper, we analyze the effect on fairness in the context of graph data (node attributes) imputation using different embedding and neural network methods. Extensive experiments on six datasets demonstrate severe fairness issues in missing data imputation under graph node classification. We also find that the choice of the imputation method affects both fairness and accuracy. Our results provide valuable insights into graph data fairness and how to handle missingness in graphs efficiently. This work also provides directions for theoretical studies on fairness in graph data.
Submitted 1 November, 2022;
originally announced November 2022.
-
Efficient Data Analytics on Augmented Similarity Triplets
Authors:
Sarwan Ali,
Muhammad Ahmad,
Umair ul Hassan,
Muhammad Asad Khan,
Shafiq Alam,
Imdadullah Khan
Abstract:
Data analysis requires a pairwise proximity measure over objects. Recent work has extended this to situations where the distance information between objects is given as the results of comparing distances among three objects (triplets). Humans find such comparison tasks much easier than exact distance computation, and such data can be easily obtained in large quantities via crowd-sourcing. In this work, we propose triplets augmentation, an efficient method to extend the triplets data by inferring the hidden implicit information from the existing data. Triplets augmentation improves the quality of kernel-based and kernel-free data analytics. We also propose a novel set of algorithms for common data analysis tasks based on triplets. These methods work directly with triplets and avoid kernel evaluations, and are thus scalable to big data. We demonstrate that our methods outperform the current best-known techniques and are robust to noisy data.
Submitted 18 February, 2023; v1 submitted 27 December, 2019;
originally announced December 2019.
-
Societal impacts of big data: challenges and opportunities in Europe
Authors:
Martí Cuquet,
Guillermo Vega-Gorgojo,
Hans Lammerant,
Rachel Finn,
Umair ul Hassan
Abstract:
This paper presents the risks and opportunities of big data and the potential social benefits it can bring. The research is based on an analysis of the societal impacts observed in a set of six case studies across different European sectors. These impacts are divided into economic, social and ethical, legal and political impacts, and affect areas such as improved efficiency, innovation and decision making, changing business models, dependency on public funding, participation, equality, discrimination and trust, data protection and intellectual property rights, private and public tensions and losing control to actors abroad. A special focus is given to the risks and opportunities coming from the legal framework and how to counter the negative impacts of big data. Recommendations are presented for four specific legal frameworks: copyright and database protection, protection of trade secrets, privacy and data protection and anti-discrimination. In addition, the potential social benefits of big data are exemplified in six domains: improved decision making and event detection; data-driven innovations and new business models; direct social, environmental and other citizen benefits; citizen participation, transparency and public trust; privacy-aware data practices; and big data for identifying discrimination. Several best practices are suggested to capture these benefits.
Submitted 11 April, 2017;
originally announced April 2017.