-
What is Fair? Exploring Pareto-Efficiency for Fairness Constrained Classifiers
Authors:
Ananth Balashankar,
Alyssa Lees,
Chris Welty,
Lakshminarayanan Subramanian
Abstract:
The potential for learned models to amplify existing societal biases has been broadly recognized. Fairness-aware classifier constraints, which apply equality metrics of performance across subgroups defined on sensitive attributes such as race and gender, seek to rectify inequity but can yield non-uniform degradation in performance for skewed datasets. In certain domains, imbalanced degradation of performance can yield another form of unintentional bias. In the spirit of constructing fairness-aware algorithms as a societal imperative, we explore an alternative: Pareto-Efficient Fairness (PEF). Theoretically, we prove that PEF identifies the operating point on the Pareto curve of subgroup performances closest to the fairness hyperplane, maximizing accuracy across multiple subgroups. Empirically, we demonstrate that PEF outperforms strict fairness constraints by achieving Pareto levels of accuracy for all subgroups on several UCI datasets.
Submitted 30 October, 2019;
originally announced October 2019.
-
Mammography Assessment using Multi-Scale Deep Classifiers
Authors:
Ulzee An,
Khader Shameer,
Lakshmi Subramanian
Abstract:
Applying deep learning methods to mammography assessment has remained a challenging topic. Dense noise with sparse expressions, mega-pixel raw data resolution, and a lack of diverse examples have all been factors affecting performance. The lack of pixel-level ground truths has especially limited segmentation methods in pushing beyond approximate bounding regions. We propose a classification approach grounded in high-performance tissue assessment as an alternative to all-in-one localization and assessment models that is also capable of pinpointing the causal pixels. First, the objective of the mammography assessment task is formalized in the context of local tissue classifiers. Then, the accuracy of a convolutional neural net is evaluated on classifying patches of tissue with suspicious findings at varying scales, where the highest obtained AUC is above $0.9$. The local evaluations of one such expert tissue classifier are used to augment the results of a heatmap regression model and additionally recover the exact causal regions at high resolution as a saliency image suitable for clinical settings.
Submitted 29 June, 2018;
originally announced July 2018.
-
A Model-based Projection Technique for Segmenting Customers
Authors:
Srikanth Jagabathula,
Lakshminarayanan Subramanian,
Ashwin Venkataraman
Abstract:
We consider the problem of segmenting a large population of customers into non-overlapping groups with similar preferences, using diverse preference observations such as purchases, ratings, and clicks over subsets of items. We focus on the setting where the universe of items is large (ranging from thousands to millions) and unstructured (lacking well-defined attributes) and each customer provides observations for only a few items. These data characteristics limit the applicability of existing techniques in marketing and machine learning. To overcome these limitations, we propose a model-based projection technique, which transforms the diverse set of observations into a more comparable scale and deals with missing data by projecting the transformed data onto a low-dimensional space. We then cluster the projected data to obtain the customer segments. Theoretically, we derive precise necessary and sufficient conditions that guarantee asymptotic recovery of the true customer segments. Empirically, we demonstrate the speed and performance of our method in two real-world case studies: (a) an 84% improvement in the accuracy of new movie recommendations on the MovieLens data set and (b) a 6% improvement in the performance of a similar-item recommendation algorithm on an offline dataset at eBay. We show that our method outperforms standard latent-class and demographic-based techniques.
Submitted 25 January, 2017;
originally announced January 2017.