-
Equalizing Closeness Centralities via Edge Additions
Authors:
Alex Crane,
Sorelle A. Friedler,
Mihir Patel,
Blair D. Sullivan
Abstract:
Graph modification problems with the goal of optimizing some measure of a given node's network position have a rich history in the algorithms literature. Less commonly explored are modification problems with the goal of equalizing positions, though this class of problems is well-motivated from the perspective of equalizing social capital, i.e., algorithmic fairness. In this work, we study how to add edges to make the closeness centralities of a given pair of nodes more equal. We formalize two versions of this problem: Closeness Ratio Improvement, which aims to maximize the ratio of closeness centralities between two specified nodes, and Closeness Gap Minimization, which aims to minimize the absolute difference of centralities. We show that both problems are $\textsf{NP}$-hard, and for Closeness Ratio Improvement we present a quasilinear-time $\frac{6}{11}$-approximation, complemented by a bicriteria inapproximability bound. In contrast, we show that Closeness Gap Minimization admits no multiplicative approximation unless $\textsf{P} = \textsf{NP}$. We conclude with a discussion of open directions for this style of problem, including several natural generalizations.
Submitted 9 May, 2025;
originally announced May 2025.
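The two objectives named in the abstract above can be illustrated on a toy graph. This is a minimal sketch, not the paper's algorithm: it assumes the common definition of closeness centrality as (n - 1) divided by the sum of shortest-path distances in a connected, unweighted graph, and it takes the "ratio" to be the smaller centrality over the larger one. The helper names `closeness`, `ratio_and_gap`, and `with_edge` are hypothetical.

```python
from collections import deque


def closeness(adj, source):
    """Closeness of `source`: (n - 1) / sum of BFS distances.

    Assumes a connected, unweighted, undirected graph given as
    {node: set_of_neighbors}. This definition is an assumption;
    the paper may normalize differently.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj) - 1) / sum(dist.values())


def ratio_and_gap(adj, a, b):
    """Return (smaller/larger closeness, absolute difference) for nodes a and b."""
    ca, cb = closeness(adj, a), closeness(adj, b)
    return min(ca, cb) / max(ca, cb), abs(ca - cb)


def with_edge(adj, u, v):
    """Copy of the graph with the undirected edge {u, v} added."""
    new = {node: set(nbrs) for node, nbrs in adj.items()}
    new[u].add(v)
    new[v].add(u)
    return new


# Toy instance: the path 0-1-2-3-4, comparing an endpoint (0) with the middle (2).
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
before = ratio_and_gap(path, 0, 2)
after = ratio_and_gap(with_edge(path, 0, 3), 0, 2)
```

On this path, adding the single edge {0, 3} gives both nodes the same distance sum, so the ratio rises to 1 and the gap falls to 0. Choosing the best set of edges under a budget is the hard part that the paper addresses, and this sketch does not attempt it.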
-
Feature Responsiveness Scores: Model-Agnostic Explanations for Recourse
Authors:
Seung Hyun Cheon,
Anneke Wernerfelt,
Sorelle A. Friedler,
Berk Ustun
Abstract:
Machine learning models routinely automate decisions in applications like lending and hiring. In such settings, consumer protection rules require companies that deploy models to explain predictions to decision subjects. These rules are motivated, in part, by the belief that explanations can promote recourse by revealing information that individuals can use to contest or improve their outcomes. In practice, many companies comply with these rules by providing individuals with a list of the most important features for their prediction, which they identify based on feature importance scores from feature attribution methods such as SHAP or LIME. In this work, we show how these practices can undermine consumers by highlighting features that would not lead to an improved outcome and by explaining predictions that cannot be changed. We propose to address these issues by highlighting features based on their responsiveness score -- i.e., the probability that an individual can attain a target prediction by changing a specific feature. We develop efficient methods to compute responsiveness scores for any model and any dataset. We conduct an extensive empirical study on the responsiveness of explanations in lending. Our results show that standard practices in consumer finance can backfire by presenting consumers with reasons without recourse, and demonstrate how our approach improves consumer protection by highlighting responsive features and identifying fixed predictions.
Submitted 28 March, 2025; v1 submitted 29 October, 2024;
originally announced October 2024.
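The responsiveness score described in the abstract above, the probability that changing one feature attains a target prediction, can be sketched for a toy model. This is an illustration under stated assumptions, not the authors' method: it assumes each feature has a small finite set of reachable values, weights them uniformly, and changes one feature at a time. The function `responsiveness_score`, the rule in `toy_model`, and the applicant data are all hypothetical.

```python
def responsiveness_score(predict, x, feature, reachable_values, target=1):
    """Fraction of single-feature changes to `feature` that yield `target`.

    Assumption: `reachable_values` is a finite set of values the individual
    could move to, each treated as equally likely.
    """
    alternatives = [v for v in reachable_values if v != x[feature]]
    if not alternatives:
        return 0.0
    hits = 0
    for value in alternatives:
        changed = dict(x)          # leave the original record untouched
        changed[feature] = value
        if predict(changed) == target:
            hits += 1
    return hits / len(alternatives)


def toy_model(applicant):
    """Hypothetical approval rule, standing in for any black-box classifier."""
    return int(applicant["income"] >= 50 and applicant["late_payments"] <= 1)


applicant = {"income": 60, "late_payments": 3, "age": 30}
scores = {
    "income": responsiveness_score(toy_model, applicant, "income", range(30, 91, 10)),
    "late_payments": responsiveness_score(toy_model, applicant, "late_payments", range(0, 5)),
    "age": responsiveness_score(toy_model, applicant, "age", range(20, 61, 10)),
}
```

In this toy case income is used by the model but has a score of 0, because no change to income alone overturns the denial, while reducing late payments succeeds for half of the reachable values. A record whose scores are all 0 would correspond to what the abstract calls a fixed prediction.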
-
Identity-related Speech Suppression in Generative AI Content Moderation
Authors:
Oghenefejiro Isaacs Anigboro,
Charlie M. Crawford,
Grace Proebsting,
Danaë Metaxa,
Sorelle A. Friedler
Abstract:
Automated content moderation has long been used to help identify and filter undesired user-generated content online. Generative AI systems now use such filters to keep undesired generated content from being created by or shown to users. From classrooms to Hollywood, as generative AI is increasingly used for creative or expressive text generation, whose stories will these technologies allow to be told, and whose will they suppress? In this paper, we define and introduce measures of speech suppression, focusing on speech related to different identity groups incorrectly filtered by a range of content moderation APIs. Using both short-form, user-generated datasets traditional in content moderation and longer generative AI-focused data, including two datasets we introduce in this work, we create a benchmark for measurement of speech suppression for nine identity groups. Across one traditional and four generative AI-focused automated content moderation services tested, we find that identity-related speech is more likely to be incorrectly suppressed than other speech. We find differences in identity-related speech suppression for traditional versus generative AI data, with APIs performing better on generative AI data but worse on longer text instances, and by identity, with identity-specific reasons for incorrect flagging behavior. Overall, we find that on traditional short-form data incorrectly suppressed speech is likely to be political, while for generative AI creative data it is likely to be television violence.
Submitted 5 April, 2025; v1 submitted 9 September, 2024;
originally announced September 2024.
-
Fast algorithms to improve fair information access in networks
Authors:
Dennis Robert Windham,
Caroline J. Wendt,
Alex Crane,
Madelyn J Warr,
Freda Shi,
Sorelle A. Friedler,
Blair D. Sullivan,
Aaron Clauset
Abstract:
We consider the problem of selecting $k$ seed nodes in a network to maximize the minimum probability of activation under an independent cascade beginning at these seeds. The motivation is to promote fairness by ensuring that even the least advantaged members of the network have good access to information. Our problem can be viewed as a variant of the classic influence maximization objective, but it appears somewhat more difficult to solve: only heuristics are known. Moreover, the scalability of these methods is sharply constrained by the need to repeatedly estimate access probabilities.
We design and evaluate a suite of $10$ new scalable algorithms which crucially do not require probability estimation. To facilitate comparison with the state-of-the-art, we make three more contributions which may be of broader interest. We introduce a principled method of selecting a pairwise information transmission parameter used in experimental evaluations, as well as a new performance metric which allows for comparison of algorithms across a range of values for the parameter $k$. Finally, we provide a new benchmark corpus of $174$ networks drawn from $6$ domains. Our algorithms retain most of the performance of the state-of-the-art while reducing running time by orders of magnitude. Specifically, a meta-learner approach is on average only $20\%$ less effective than the state-of-the-art on held-out data, but about $75$ to $130$ times faster. Further, the meta-learner's performance exceeds the state-of-the-art on about $20\%$ of networks, and the magnitude of its running time advantage is maintained on much larger networks.
Submitted 19 February, 2025; v1 submitted 4 September, 2024;
originally announced September 2024.
-
Reducing Access Disparities in Networks using Edge Augmentation
Authors:
Ashkan Bashardoust,
Sorelle A. Friedler,
Carlos E. Scheidegger,
Blair D. Sullivan,
Suresh Venkatasubramanian
Abstract:
In social networks, a node's position is a form of \textit{social capital}. Better-positioned members not only benefit from (faster) access to diverse information, but innately have more potential influence on information spread. Structural biases often arise from network formation, and can lead to significant disparities in information access based on position. Further, processes such as link recommendation can exacerbate this inequality by relying on network structure to augment connectivity.
We argue that one can understand and quantify this social capital through the lens of information flow in the network. We consider the setting where all nodes may be sources of distinct information, and a node's (dis)advantage is determined by its ability to access all information available on the network. We introduce three new measures of advantage (broadcast, influence, and control), which are quantified in terms of position in the network using \textit{access signatures} -- vectors that represent a node's ability to share information. We then consider the problem of improving equity by making interventions to increase the access of the least-advantaged nodes. We argue that edge augmentation is most appropriate for mitigating bias in the network structure, and frame a budgeted intervention problem for maximizing minimum pairwise access.
Finally, we propose heuristic strategies for selecting edge augmentations and empirically evaluate their performance on a corpus of real-world social networks. We demonstrate that a small number of interventions significantly increase the broadcast measure of access for the least-advantaged nodes (over 5 times more than random), and also improve the minimum influence. Additional analysis shows that these interventions can also dramatically shrink the gap in advantage between nodes (by over 82\%) and reduce disparities between their access signatures.
Submitted 15 September, 2022;
originally announced September 2022.
-
Measuring and mitigating voting access disparities: a study of race and polling locations in Florida and North Carolina
Authors:
Mohsen Abbasi,
Suresh Venkatasubramanian,
Sorelle A. Friedler,
Kristian Lum,
Calvin Barrett
Abstract:
Voter suppression and associated racial disparities in access to voting are long-standing civil rights concerns in the United States. Barriers to voting have taken many forms over the decades. A history of violent explicit discouragement has shifted to more subtle access limitations that can include long lines and wait times, long travel times to reach a polling station, and other logistical barriers to voting. Our focus in this work is on quantifying disparities in voting access pertaining to the overall time-to-vote, and how they could be remedied via a better choice of polling location or provisioning more sites where voters can cast ballots. However, appropriately calibrating access disparities is difficult because of the need to account for factors such as population density and different community expectations for reasonable travel times.
In this paper, we quantify access to polling locations, developing a methodology for the calibrated measurement of racial disparities in polling location "load" and distance to polling locations. We apply this methodology to a study of real-world data from Florida and North Carolina to identify disparities in voting access from the 2020 election. We also introduce algorithms, with modifications to handle scale, that can reduce these disparities by suggesting new polling locations from a given list of identified public locations (including schools and libraries). Applying these algorithms on the 2020 election location data also helps to expose and explore tradeoffs between the cost of allocating more polling locations and the potential impact on access disparities. The developed voting access measurement methodology and algorithmic remediation technique is a first step in better polling location assignment.
Submitted 30 May, 2022;
originally announced May 2022.
-
Information access representations and social capital in networks
Authors:
Ashkan Bashardoust,
Hannah C. Beilinson,
Sorelle A. Friedler,
Jiajie Ma,
Jade Rousseau,
Carlos E. Scheidegger,
Blair D. Sullivan,
Nasanbayar Ulzii-Orshikh,
Suresh Venkatasubramanian
Abstract:
Social network position confers power and social capital. In the setting of online social networks that have massive reach, creating mathematical representations of social capital is an important step towards understanding how network position can differentially confer advantage to different groups and how network position can itself be a source of advantage. In this paper, we use well established models for information flow on networks as a base to propose a formal descriptor of the network position of a node as represented by its information access. Combining these descriptors allows a full representation of social capital across the network. Using real-world networks, we demonstrate that this representation allows the identification of differences between groups based on network specific measures of inequality of access.
Submitted 16 October, 2023; v1 submitted 23 October, 2020;
originally announced October 2020.
-
Energy Usage Reports: Environmental awareness as part of algorithmic accountability
Authors:
Kadan Lottick,
Silvia Susai,
Sorelle A. Friedler,
Jonathan P. Wilson
Abstract:
The carbon footprint of algorithms must be measured and transparently reported so computer scientists can take an honest and active role in environmental sustainability. In this paper, we take analyses usually applied at the industrial level and make them accessible for individual computer science researchers with an easy-to-use Python package. Localizing to the energy mixture of the electrical power grid, we make the conversion from energy usage to CO2 emissions, in addition to contextualizing these results with more human-understandable benchmarks such as automobile miles driven. We also include comparisons with energy mixtures employed in electrical grids around the world. We propose including these automatically-generated Energy Usage Reports as part of standard algorithmic accountability practices, and demonstrate the use of these reports as part of model-choice in a machine learning context.
Submitted 16 December, 2019; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Disentangling Influence: Using Disentangled Representations to Audit Model Predictions
Authors:
Charles T. Marx,
Richard Lanas Phillips,
Sorelle A. Friedler,
Carlos Scheidegger,
Suresh Venkatasubramanian
Abstract:
Motivated by the need to audit complex and black box models, there has been extensive research on quantifying how data features influence model predictions. Feature influence can be direct (a direct influence on model outcomes) and indirect (model outcomes are influenced via proxy features). Feature influence can also be expressed in aggregate over the training or test data or locally with respect to a single point. Current research has typically focused on one of each of these dimensions. In this paper, we develop disentangled influence audits, a procedure to audit the indirect influence of features. Specifically, we show that disentangled representations provide a mechanism to identify proxy features in the dataset, while allowing an explicit computation of feature influence on either individual outcomes or aggregate-level outcomes. We show through both theory and experiments that disentangled influence audits can both detect proxy features and show, for each individual or in aggregate, which of these proxy features affects the classifier being audited the most. In this respect, our method is more powerful than existing methods for ascertaining feature influence.
Submitted 20 June, 2019;
originally announced June 2019.
-
Gaps in Information Access in Social Networks
Authors:
Benjamin Fish,
Ashkan Bashardoust,
danah boyd,
Sorelle A. Friedler,
Carlos Scheidegger,
Suresh Venkatasubramanian
Abstract:
The study of influence maximization in social networks has largely ignored disparate effects these algorithms might have on the individuals contained in the social network. Individuals may place a high value on receiving information, e.g. job openings or advertisements for loans. While well-connected individuals at the center of the network are likely to receive the information that is being distributed through the network, poorly connected individuals are systematically less likely to receive the information, producing a gap in access to the information between individuals. In this work, we study how best to spread information in a social network while minimizing this access gap. We propose to use the maximin social welfare function as an objective function, where we maximize the minimum probability of receiving the information under an intervention. We prove that in this setting this welfare function constrains the access gap whereas maximizing the expected number of nodes reached does not. We also investigate the difficulties of using the maximin, and present hardness results and analysis for standard greedy strategies. Finally, we investigate practical ways of optimizing for the maximin, and give empirical evidence that a simple greedy-based strategy works well in practice.
Submitted 5 March, 2019;
originally announced March 2019.
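The maximin objective in the abstract above, choosing seeds to maximize the minimum probability that any node receives the information, can be sketched with a plain greedy loop. This is a minimal illustration and not necessarily the strategy the authors evaluate: it assumes an independent cascade with one uniform transmission probability `p`, estimates access probabilities by Monte Carlo simulation, and breaks ties by node order. The names `access_probabilities` and `greedy_maximin` are hypothetical.

```python
import random


def access_probabilities(adj, seeds, p, trials=200, rng=None):
    """Estimate each node's chance of activation under independent cascade.

    Assumption: every edge transmits independently with the same probability p.
    """
    rng = rng or random.Random(0)
    counts = {v: 0 for v in adj}
    for _ in range(trials):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            u = frontier.pop()
            # Each newly active node gets one attempt per inactive neighbor.
            for v in adj[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    frontier.append(v)
        for v in active:
            counts[v] += 1
    return {v: c / trials for v, c in counts.items()}


def greedy_maximin(adj, k, p, trials=200):
    """Greedily add the seed that most raises the minimum access probability."""
    seeds = []
    for _ in range(k):
        best, best_value = None, -1.0
        for candidate in sorted(adj):
            if candidate in seeds:
                continue
            probs = access_probabilities(
                adj, seeds + [candidate], p, trials, random.Random(0)
            )
            value = min(probs.values())
            if value > best_value:  # ties keep the earlier candidate
                best, best_value = candidate, value
        seeds.append(best)
    return seeds


# Toy graph with two components: {0, 1, 2} and {3, 4}.
graph = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
chosen = greedy_maximin(graph, k=2, p=1.0, trials=10)
```

With two disconnected components, no single seed lifts the minimum above zero, so the first pick is arbitrary and the second is forced into the other component. The repeated simulation inside the loop also shows why probability estimation dominates the running time of approaches like this one.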
-
Assessing the Local Interpretability of Machine Learning Models
Authors:
Dylan Slack,
Sorelle A. Friedler,
Carlos Scheidegger,
Chitradeep Dutta Roy
Abstract:
The increasing adoption of machine learning tools has led to calls for accountability via model interpretability. But what does it mean for a machine learning model to be interpretable by humans, and how can this be assessed? We focus on two definitions of interpretability that have been introduced in the machine learning literature: simulatability (a user's ability to run a model on a given input) and "what if" local explainability (a user's ability to correctly determine a model's prediction under local changes to the input, given knowledge of the model's original prediction). Through a user study with 1,000 participants, we test whether humans perform well on tasks that mimic the definitions of simulatability and "what if" local explainability on models that are typically considered locally interpretable. To track the relative interpretability of models, we employ a simple metric, the runtime operation count on the simulatability task. We find evidence that as the number of operations increases, participant accuracy on the local interpretability tasks decreases. In addition, this evidence is consistent with the common intuition that decision trees and logistic regression models are interpretable and are more interpretable than neural networks.
Submitted 2 August, 2019; v1 submitted 9 February, 2019;
originally announced February 2019.
-
Fairness in representation: quantifying stereotyping as a representational harm
Authors:
Mohsen Abbasi,
Sorelle A. Friedler,
Carlos Scheidegger,
Suresh Venkatasubramanian
Abstract:
While harms of allocation have been increasingly studied as part of the subfield of algorithmic fairness, harms of representation have received considerably less attention. In this paper, we formalize two notions of stereotyping and show how they manifest in later allocative harms within the machine learning pipeline. We also propose mitigation strategies and demonstrate their effectiveness on synthetic datasets.
Submitted 28 January, 2019;
originally announced January 2019.
-
A comparative study of fairness-enhancing interventions in machine learning
Authors:
Sorelle A. Friedler,
Carlos Scheidegger,
Suresh Venkatasubramanian,
Sonam Choudhary,
Evan P. Hamilton,
Derek Roth
Abstract:
Computers are increasingly used to make decisions that have significant impact in people's lives. Often, these predictions can affect different population subgroups disproportionately. As a result, the issue of fairness has received much recent interest, and a number of fairness-enhanced classifiers and predictors have appeared in the literature. This paper seeks to study the following questions: how do these different techniques fundamentally compare to one another, and what accounts for the differences? Specifically, we seek to bring attention to many under-appreciated aspects of such fairness-enhancing interventions. Concretely, we present the results of an open benchmark we have developed that lets us compare a number of different algorithms under a variety of fairness measures, and a large number of existing datasets. We find that although different algorithms tend to prefer specific formulations of fairness preservations, many of these measures strongly correlate with one another. In addition, we find that fairness-preserving algorithms tend to be sensitive to fluctuations in dataset composition (simulated in our benchmark by varying training-test splits), indicating that fairness interventions might be more brittle than previously thought.
Submitted 12 February, 2018;
originally announced February 2018.
-
Interpretable Active Learning
Authors:
Richard L. Phillips,
Kyu Hyun Chang,
Sorelle A. Friedler
Abstract:
Active learning has long been a topic of study in machine learning. However, as increasingly complex and opaque models have become standard practice, the process of active learning, too, has become more opaque. There has been little investigation into interpreting what specific trends and patterns an active learning strategy may be exploring. This work expands on the Local Interpretable Model-agnostic Explanations framework (LIME) to provide explanations for active learning recommendations. We demonstrate how LIME can be used to generate locally faithful explanations for an active learning strategy, and how these explanations can be used to understand how different models and datasets explore a problem space over time. In order to quantify the per-subgroup differences in how an active learning strategy queries spatial regions, we introduce a notion of uncertainty bias (based on disparate impact) to measure the discrepancy in the confidence for a model's predictions between one subgroup and another. Using the uncertainty bias measure, we show that our query explanations accurately reflect the subgroup focus of the active learning queries, allowing for an interpretable explanation of what is being learned as points with similar sources of uncertainty have their uncertainty bias resolved. We demonstrate that this technique can be applied to track uncertainty bias over user-defined clusters or automatically generated clusters based on the source of uncertainty.
Submitted 23 June, 2018; v1 submitted 31 July, 2017;
originally announced August 2017.
-
Runaway Feedback Loops in Predictive Policing
Authors:
Danielle Ensign,
Sorelle A. Friedler,
Scott Neville,
Carlos Scheidegger,
Suresh Venkatasubramanian
Abstract:
Predictive policing systems are increasingly used to determine how to allocate police across a city in order to best prevent crime. Discovered crime data (e.g., arrest counts) are used to help update the model, and the process is repeated. Such systems have been empirically shown to be susceptible to runaway feedback loops, where police are repeatedly sent back to the same neighborhoods regardless of the true crime rate.
In response, we develop a mathematical model of predictive policing that proves why this feedback loop occurs, show empirically that this model exhibits such problems, and demonstrate how to change the inputs to a predictive policing system (in a black-box manner) so the runaway feedback loop does not occur, allowing the true crime rate to be learned. Our results are quantitative: we can establish a link (in our model) between the degree to which runaway feedback causes problems and the disparity in crime rates between areas. Moreover, we can also demonstrate the way in which \emph{reported} incidents of crime (those reported by residents) and \emph{discovered} incidents of crime (i.e. those directly observed by police officers dispatched as a result of the predictive policing algorithm) interact: in brief, while reported incidents can attenuate the degree of runaway feedback, they cannot entirely remove it without the interventions we suggest.
Submitted 21 December, 2017; v1 submitted 29 June, 2017;
originally announced June 2017.
-
On the (im)possibility of fairness
Authors:
Sorelle A. Friedler,
Carlos Scheidegger,
Suresh Venkatasubramanian
Abstract:
What does it mean for an algorithm to be fair? Different papers use different notions of algorithmic fairness, and although these appear internally consistent, they also seem mutually incompatible. We present a mathematical setting in which the distinctions in previous papers can be made formal. In addition to characterizing the spaces of inputs (the "observed" space) and outputs (the "decision" space), we introduce the notion of a construct space: a space that captures unobservable, but meaningful variables for the prediction.
We show that in order to prove desirable properties of the entire decision-making process, different mechanisms for fairness require different assumptions about the nature of the mapping from construct space to decision space. The results in this paper imply that future treatments of algorithmic fairness should more explicitly state assumptions about the relationship between constructs and observations.
Submitted 23 September, 2016;
originally announced September 2016.
-
Auditing Black-box Models for Indirect Influence
Authors:
Philip Adler,
Casey Falk,
Sorelle A. Friedler,
Gabriel Rybeck,
Carlos Scheidegger,
Brandon Smith,
Suresh Venkatasubramanian
Abstract:
Data-trained predictive models see widespread use, but for the most part they are used as black boxes which output a prediction or score. It is therefore hard to acquire a deeper understanding of model behavior, and in particular how different features influence the model prediction. This is important when interpreting the behavior of complex models, or asserting that certain problematic attributes (like race or gender) are not unduly influencing decisions.
In this paper, we present a technique for auditing black-box models, which lets us study the extent to which existing models take advantage of particular features in the dataset, without knowing how the models work. Our work focuses on the problem of indirect influence: how some features might indirectly influence outcomes via other, related features. As a result, we can find attribute influences even in cases where, upon further direct examination of the model, the attribute is not referred to by the model at all.
Our approach does not require the black-box model to be retrained. This is important if (for example) the model is only accessible via an API, and contrasts our work with other methods that investigate feature influence like feature selection. We present experimental evidence for the effectiveness of our procedure using a variety of publicly available datasets and models. We also validate our procedure using techniques from interpretable learning and feature selection, as well as against other black-box auditing procedures.
Submitted 30 November, 2016; v1 submitted 22 February, 2016;
originally announced February 2016.
-
Convex Hull for Probabilistic Points
Authors:
F. Betul Atalay,
Sorelle A. Friedler,
Dianna Xu
Abstract:
We analyze the correctness of an O(n log n) time divide-and-conquer algorithm for the convex hull problem when each input point is a location determined by a normal distribution. We show that the algorithm finds the convex hull of such probabilistic points to precision within some expected correctness determined by a user-given confidence value. In order to precisely explain how correct the resulting structure is, we introduce a new certificate error model for calculating and understanding approximate geometric error based on the fundamental properties of a geometric structure. We show that this new error model implies correctness under a robust statistical error model, in which each point lies within the hull with probability at least that of the user-given confidence value, for the convex hull problem.
Submitted 5 August, 2016; v1 submitted 2 December, 2014;
originally announced December 2014.