Should you use LLMs to simulate opinions? Quality checks for early-stage deliberation

Authors: Terrence Neumann, Maria De-Arteaga, Sina Fazelpour

Abstract: The emergent capabilities of large language models (LLMs) have sparked interest in assessing their ability to simulate human opinions in a variety of contexts, potentially serving as surrogates for human subjects in opinion surveys. However, previous evaluations of this capability have depended heavily on costly, domain-specific human survey data, and mixed empirical results about LLM effectiveness create uncertainty for managers about whether investing in this technology is justified in early-stage research. To address these challenges, we introduce a series of quality checks to support early-stage deliberation about the viability of using LLMs for simulating human opinions. These checks emphasize logical constraints, model stability, and alignment with stakeholder expectations of model outputs, thereby reducing dependence on human-generated data in the initial stages of evaluation. We demonstrate the usefulness of the proposed quality control tests in the context of AI-assisted content moderation, an application that both advocates and critics of LLMs' capabilities to simulate human opinion see as a desirable potential use case. None of the tested models passed all quality control checks, revealing several failure modes. We conclude by discussing implications of these failure modes and recommend how organizations can utilize our proposed tests for prompt engineering and in their risk management practices when considering the use of LLMs for opinion simulation. We make our crowdsourced dataset of claims with human and LLM annotations publicly available for future research.

Submitted 1 May, 2025; v1 submitted 11 April, 2025; originally announced April 2025.

arXiv:2504.04243 [pdf, other]

doi 10.1145/3715275.3732070

Perils of Label Indeterminacy: A Case Study on Prediction of Neurological Recovery After Cardiac Arrest

Authors: Jakob Schoeffer, Maria De-Arteaga, Jonathan Elmer

Abstract: The design of AI systems to assist human decision-making typically requires the availability of labels to train and evaluate supervised models. Frequently, however, these labels are unknown, and different ways of estimating them involve unverifiable assumptions or arbitrary choices. In this work, we introduce the concept of label indeterminacy and derive important implications in high-stakes AI-assisted decision-making. We present an empirical study in a healthcare context, focusing specifically on predicting the recovery of comatose patients after resuscitation from cardiac arrest. Our study shows that label indeterminacy can result in models that perform similarly when evaluated on patients with known labels, but vary drastically in their predictions for patients where labels are unknown. After demonstrating crucial ethical implications of label indeterminacy in this high-stakes context, we discuss takeaways for evaluation, reporting, and design.

Submitted 7 May, 2025; v1 submitted 5 April, 2025; originally announced April 2025.

Comments: The 2025 ACM Conference on Fairness, Accountability, and Transparency (FAccT '25)

arXiv:2503.00333 [pdf, other]

More of the Same: Persistent Representational Harms Under Increased Representation

Authors: Jennifer Mickel, Maria De-Arteaga, Leqi Liu, Kevin Tian

Abstract: To recognize and mitigate the harms of generative AI systems, it is crucial to consider who is represented in the outputs of generative AI systems and how people are represented. A critical gap emerges when naively improving who is represented, as this does not imply bias mitigation efforts have been applied to address how people are represented. We critically examined this by investigating gender representation in occupation across state-of-the-art large language models. We first show evidence suggesting that over time there have been interventions to models altering the resulting gender distribution, and we find that women are more represented than men when models are prompted to generate biographies or personas. We then demonstrate that representational biases persist in how different genders are represented by examining statistically significant word differences across genders. This results in a proliferation of representational harms, stereotypes, and neoliberal ideals that, despite existing interventions to increase female representation, reinforce existing systems of oppression.

Submitted 28 February, 2025; originally announced March 2025.

Comments: 26 pages, 7 figures, 6 tables, pre-print

arXiv:2411.18122 [pdf, other]

Using Machine Bias To Measure Human Bias

Authors: Wanxue Dong, Maria De-Arteaga, Maytal Saar-Tsechansky

Abstract: Biased human decisions have consequential impacts across various domains, yielding unfair treatment of individuals and resulting in suboptimal outcomes for organizations and society. In recognition of this fact, organizations regularly design and deploy interventions aimed at mitigating these biases. However, measuring human decision biases remains an important but elusive task. Organizations are frequently concerned with mistaken decisions disproportionately affecting one group. In practice, however, this is typically not possible to assess due to the scarcity of a gold standard: a label that indicates what the correct decision would have been. In this work, we propose a machine learning-based framework to assess bias in human-generated decisions when gold standard labels are scarce. We provide theoretical guarantees and empirical evidence demonstrating the superiority of our method over existing alternatives. This proposed methodology establishes a foundation for transparency in human decision-making, carrying substantial implications for managerial duties, and offering potential for alleviating algorithmic biases when human decisions are used as labels to train algorithms.

Submitted 10 December, 2024; v1 submitted 27 November, 2024; originally announced November 2024.

arXiv:2407.11933 [pdf, other]

Fairly Accurate: Optimizing Accuracy Parity in Fair Target-Group Detection

Authors: Soumyajit Gupta, Venelin Kovatchev, Maria De-Arteaga, Matthew Lease

Abstract: In algorithmic toxicity detection pipelines, it is important to identify which demographic group(s) are the subject of a post, a task commonly known as target (group) detection. While accurate detection is clearly important, we further advocate a fairness objective: to provide equal protection to all groups who may be targeted. To this end, we adopt Accuracy Parity (AP) -- balanced detection accuracy across groups -- as our fairness objective. However, in order to align model training with our AP fairness objective, we require an equivalent loss function. Moreover, for gradient-based models such as neural networks, this loss function needs to be differentiable. Because no such loss function exists today for AP, we propose Group Accuracy Parity (GAP): the first differentiable loss function having a one-to-one mapping to AP. We empirically show that GAP addresses disparate impact on groups for target detection. Furthermore, because a single post often targets multiple groups in practice, we also provide a mathematical extension of GAP to larger multi-group settings, something typically requiring heuristics in prior work. Our findings show that by optimizing AP, GAP better mitigates bias in comparison with other commonly employed loss functions.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2401.16558 [pdf, other]

Diverse, but Divisive: LLMs Can Exaggerate Gender Differences in Opinion Related to Harms of Misinformation

Authors: Terrence Neumann, Sooyong Lee, Maria De-Arteaga, Sina Fazelpour, Matthew Lease

Abstract: The pervasive spread of misinformation and disinformation poses a significant threat to society. Professional fact-checkers play a key role in addressing this threat, but the vast scale of the problem forces them to prioritize their limited resources. This prioritization may consider a range of factors, such as varying risks of harm posed to specific groups of people. In this work, we investigate potential implications of using a large language model (LLM) to facilitate such prioritization. Because fact-checking impacts a wide range of diverse segments of society, it is important that diverse views are represented in the claim prioritization process. This paper examines whether an LLM can reflect the views of various groups when assessing the harms of misinformation, focusing on gender as a primary variable. We pose two central questions: (1) To what extent do prompts with explicit gender references reflect gender differences in opinion in the United States on topics of social relevance? and (2) To what extent do gender-neutral prompts align with gendered viewpoints on those topics? To analyze these questions, we present the TopicMisinfo dataset, containing 160 fact-checked claims from diverse topics, supplemented by nearly 1600 human annotations with subjective perceptions and annotator demographics. Analyzing responses to gender-specific and neutral prompts, we find that GPT 3.5-Turbo reflects empirically observed gender differences in opinion but amplifies the extent of these differences. These findings illuminate AI's complex role in moderating online communication, with implications for fact-checkers, algorithm designers, and the use of crowd-workers as annotators. We also release the TopicMisinfo dataset to support continuing research in the community.

Submitted 29 January, 2024; originally announced January 2024.

Comments: Under Review

arXiv:2310.13007 [pdf, other]

doi 10.1145/3630106.3658990

A Critical Survey on Fairness Benefits of Explainable AI

Authors: Luca Deck, Jakob Schoeffer, Maria De-Arteaga, Niklas Kühl

Abstract: In this critical survey, we analyze typical claims on the relationship between explainable AI (XAI) and fairness to disentangle the multidimensional relationship between these two concepts. Based on a systematic literature review and a subsequent qualitative content analysis, we identify seven archetypal claims from 175 scientific articles on the alleged fairness benefits of XAI. We present crucial caveats with respect to these claims and provide an entry point for future discussions around the potentials and limitations of XAI for specific fairness desiderata. Importantly, we notice that claims are often (i) vague and simplistic, (ii) lacking normative grounding, or (iii) poorly aligned with the actual capabilities of XAI. We suggest conceiving of XAI not as an ethical panacea but as one of many tools to approach the multidimensional, sociotechnical challenge of algorithmic fairness. Moreover, when making a claim about XAI and fairness, we emphasize the need to be more specific about what kind of XAI method is used, which fairness desideratum it refers to, how exactly it enables fairness, and who is the stakeholder that benefits from XAI.

Submitted 7 May, 2024; v1 submitted 15 October, 2023; originally announced October 2023.

Comments: ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT '24)

arXiv:2307.08945 [pdf, other]

Mitigating Label Bias via Decoupled Confident Learning

Authors: Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky

Abstract: Growing concerns regarding algorithmic fairness have led to a surge in methodologies to mitigate algorithmic bias. However, such methodologies largely assume that observed labels in training data are correct. This is problematic because bias in labels is pervasive across important domains, including healthcare, hiring, and content moderation. In particular, human-generated labels are prone to encoding societal biases. While the presence of labeling bias has been discussed conceptually, there is a lack of methodologies to address this problem. We propose a pruning method -- Decoupled Confident Learning (DeCoLe) -- specifically designed to mitigate label bias. After illustrating its performance on a synthetic dataset, we apply DeCoLe in the context of hate speech detection, where label bias has been recognized as an important challenge, and show that it successfully identifies biased labels and outperforms competing approaches.

Submitted 29 September, 2023; v1 submitted 17 July, 2023; originally announced July 2023.

Comments: AI & HCI Workshop at the 40th International Conference on Machine Learning (ICML), Honolulu, Hawaii, USA. 2023

arXiv:2302.08157 [pdf, other]

doi 10.1145/3544549.3583178

Human-Centered Responsible Artificial Intelligence: Current & Future Trends

Authors: Mohammad Tahaei, Marios Constantinides, Daniele Quercia, Sean Kennedy, Michael Muller, Simone Stumpf, Q. Vera Liao, Ricardo Baeza-Yates, Lora Aroyo, Jess Holbrook, Ewa Luger, Michael Madaio, Ilana Golbin Blumenfeld, Maria De-Arteaga, Jessica Vitak, Alexandra Olteanu

Abstract: In recent years, the CHI community has seen significant growth in research on Human-Centered Responsible Artificial Intelligence. While different research communities may use different terminology to discuss similar topics, all of this work is ultimately aimed at developing AI that benefits humanity while being grounded in human rights and ethics, and reducing the potential harms of AI. In this special interest group, we aim to bring together researchers from academia and industry interested in these topics to map current and future research trends to advance this important area of research by fostering collaboration and sharing ideas.

Submitted 16 February, 2023; originally announced February 2023.

Comments: To appear in Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems

arXiv:2302.07372 [pdf, other]

Same Same, But Different: Conditional Multi-Task Learning for Demographic-Specific Toxicity Detection

Authors: Soumyajit Gupta, Sooyong Lee, Maria De-Arteaga, Matthew Lease

Abstract: Algorithmic bias often arises as a result of differential subgroup validity, in which predictive relationships vary across groups. For example, in toxic language detection, comments targeting different demographic groups can vary markedly across groups. In such settings, trained models can be dominated by the relationships that best fit the majority group, leading to disparate performance. We propose framing toxicity detection as multi-task learning (MTL), allowing a model to specialize on the relationships that are relevant to each demographic group while also leveraging shared properties across groups. With toxicity detection, each task corresponds to identifying toxicity against a particular demographic group. However, traditional MTL requires labels for all tasks to be present for every data point. To address this, we propose Conditional MTL (CondMTL), wherein only training examples relevant to the given demographic group are considered by the loss function. This lets us learn group-specific representations in each branch which are not cross-contaminated by irrelevant labels. Results on synthetic and real data show that using CondMTL improves predictive recall over various baselines in general and for the minority demographic group in particular, while having similar overall accuracy.

Submitted 6 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Journal ref: Proceedings of the Web Conference, WWW 2023

arXiv:2302.02944 [pdf, other]

Learning Complementary Policies for Human-AI Teams

Authors: Ruijiang Gao, Maytal Saar-Tsechansky, Maria De-Arteaga, Ligong Han, Wei Sun, Min Kyung Lee, Matthew Lease

Abstract: Human-AI complementarity is important when neither the algorithm nor the human yields dominant performance across all instances in a given context. Recent work that explored human-AI collaboration has considered decisions that correspond to classification tasks. However, in many important contexts where humans can benefit from AI complementarity, humans undertake courses of action. In this paper, we propose a framework for a novel human-AI collaboration for selecting advantageous courses of action, which we refer to as Learning Complementary Policy for Human-AI teams (LCP-HAI). Our solution aims to exploit the human-AI complementarity to maximize decision rewards by learning both an algorithmic policy that aims to complement humans and a routing model that defers decisions to either a human or the AI to leverage the resulting complementarity. We then extend our approach to leverage opportunities and mitigate risks that arise in important contexts in practice: 1) when a team is composed of multiple humans with differential and potentially complementary abilities, 2) when the observational data includes consistent deterministic actions, and 3) when the covariate distribution of future decisions differs from that in the historical data. We demonstrate the effectiveness of our proposed methods using data on real human responses and semi-synthetic data, and find that our methods offer reliable and advantageous performance across settings, and that they are superior to when either the human or the AI makes decisions on their own. We also find that the extensions we propose effectively improve the robustness of the human-AI collaboration performance in the presence of different challenging settings.

Submitted 6 February, 2023; originally announced February 2023.

Comments: Previous name: Robust Human-AI Collaboration with Bandit Feedback; Best student paper award at Conference on Information Systems and Technology (CIST), 2022

arXiv:2209.11812 [pdf, other]

doi 10.1145/3613904.3642621

Explanations, Fairness, and Appropriate Reliance in Human-AI Decision-Making

Authors: Jakob Schoeffer, Maria De-Arteaga, Niklas Kuehl

Abstract: In this work, we study the effects of feature-based explanations on distributive fairness of AI-assisted decisions, specifically focusing on the task of predicting occupations from short textual bios. We also investigate how any effects are mediated by humans' fairness perceptions and their reliance on AI recommendations. Our findings show that explanations influence fairness perceptions, which, in turn, relate to humans' tendency to adhere to AI recommendations. However, we see that such explanations do not enable humans to discern correct and incorrect AI recommendations. Instead, we show that they may affect reliance irrespective of the correctness of AI recommendations. Depending on which features an explanation highlights, this can foster or hinder distributive fairness: when explanations highlight features that are task-irrelevant and evidently associated with the sensitive attribute, this prompts overrides that counter AI recommendations that align with gender stereotypes. Meanwhile, if explanations appear task-relevant, this induces reliance behavior that reinforces stereotype-aligned errors. These results imply that feature-based explanations are not a reliable mechanism to improve distributive fairness.

Submitted 18 March, 2024; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: ACM CHI Conference on Human Factors in Computing Systems (CHI '24)

arXiv:2208.06648 [pdf, other]

Imputation Strategies Under Clinical Presence: Impact on Algorithmic Fairness

Authors: Vincent Jeanselme, Maria De-Arteaga, Zhe Zhang, Jessica Barrett, Brian Tom

Abstract: Machine learning risks reinforcing biases present in data and, as we argue in this work, in what is absent from data. In healthcare, societal and decision biases shape patterns in missing data, yet the algorithmic fairness implications of group-specific missingness are poorly understood. The way we address missingness in healthcare can have detrimental impacts on downstream algorithmic fairness. Our work questions current recommendations and practices aimed at handling missing data with a focus on their effect on algorithmic fairness, and offers a path forward. Specifically, we consider the theoretical underpinnings of existing recommendations as well as their empirical predictive performance and corresponding algorithmic fairness measured through subgroup performances. Our results show that current practices for handling missingness lack principled foundations, are disconnected from the realities of missingness mechanisms in healthcare, and can be counterproductive. For example, we show that favouring a group-specific imputation strategy can be misguided and exacerbate prediction disparities. We then build on our findings to propose a framework for empirically guiding imputation choices, and an accompanying reporting framework. Our work constitutes an important contribution to recent efforts by regulators and practitioners to grapple with the realities of real-world data, and to foster the responsible and transparent deployment of machine learning systems. We demonstrate the practical utility of the proposed framework through experimentation on widely used datasets, where we show how the proposed framework can guide the selection of imputation strategies, allowing us to choose among strategies that yield equal overall predictive performance but present different algorithmic fairness properties.

Submitted 17 March, 2025; v1 submitted 13 August, 2022; originally announced August 2022.

Comments: Full Journal Version under review; Presented at the conference Machine Learning for Health (ML4H) 2022 Published in the Proceedings of Machine Learning Research (193)

arXiv:2207.13834 [pdf, ps, other]

Toward Supporting Perceptual Complementarity in Human-AI Collaboration via Reflection on Unobservables

Authors: Kenneth Holstein, Maria De-Arteaga, Lakshmi Tumati, Yanghuidi Cheng

Abstract: In many real-world contexts, successful human-AI collaboration requires humans to productively integrate complementary sources of information into AI-informed decisions. However, in practice human decision-makers often lack understanding of what information an AI model has access to in relation to themselves. There are few available guidelines regarding how to effectively communicate about unobservables: features that may influence the outcome, but which are unavailable to the model. In this work, we conducted an online experiment to understand whether and how explicitly communicating potentially relevant unobservables influences how people integrate model outputs and unobservables when making predictions. Our findings indicate that presenting prompts about unobservables can change how humans integrate model outputs and unobservables, but does not necessarily lead to improved performance. Furthermore, the impacts of these prompts can vary depending on decision-makers' prior domain expertise. We conclude by discussing implications for future research and design of AI-based decision support tools.

Submitted 26 January, 2023; v1 submitted 27 July, 2022; originally announced July 2022.

Comments: CSCW 2023

arXiv:2207.10991 [pdf, other]

Algorithmic Fairness in Business Analytics: Directions for Research and Practice

Authors: Maria De-Arteaga, Stefan Feuerriegel, Maytal Saar-Tsechansky

Abstract: The extensive adoption of business analytics (BA) has brought financial gains and increased efficiencies. However, these advances have simultaneously drawn attention to rising legal and ethical challenges when BA inform decisions with fairness implications. As a response to these concerns, the emerging study of algorithmic fairness deals with algorithmic outputs that may result in disparate outcomes or other forms of injustices for subgroups of the population, especially those who have been historically marginalized. Fairness is relevant on the basis of legal compliance, social responsibility, and utility; if not adequately and systematically addressed, unfair BA systems may lead to societal harms and may also threaten an organization's own survival, its competitiveness, and overall performance. This paper offers a forward-looking, BA-focused review of algorithmic fairness. We first review the state-of-the-art research on sources and measures of bias, as well as bias mitigation algorithms. We then provide a detailed discussion of the utility-fairness relationship, emphasizing that the frequent assumption of a trade-off between these two constructs is often mistaken or short-sighted. Finally, we chart a path forward by identifying opportunities for business scholars to address impactful, open challenges that are key to the effective and responsible deployment of BA.

Submitted 22 July, 2022; originally announced July 2022.

arXiv:2207.07723 [pdf, other]

More Data Can Lead Us Astray: Active Data Acquisition in the Presence of Label Bias

Authors: Yunyi Li, Maria De-Arteaga, Maytal Saar-Tsechansky

Abstract: An increased awareness concerning risks of algorithmic bias has driven a surge of efforts around bias mitigation strategies. A vast majority of the proposed approaches fall under one of two categories: (1) imposing algorithmic fairness constraints on predictive models, and (2) collecting additional training samples. Most recently and at the intersection of these two categories, methods that propose active learning under fairness constraints have been developed. However, proposed bias mitigation strategies typically overlook the bias present in the observed labels. In this work, we study fairness considerations of active data collection strategies in the presence of label bias. We first present an overview of different types of label bias in the context of supervised learning systems. We then empirically show that, when overlooking label bias, collecting more data can aggravate bias, and imposing fairness constraints that rely on the observed labels in the data collection process may not address the problem. Our results illustrate the unintended consequences of deploying a model that attempts to mitigate a single type of bias while neglecting others, emphasizing the importance of explicitly differentiating between the types of bias that fairness-aware algorithms aim to address, and highlighting the risks of neglecting label bias during data collection.

Submitted 15 July, 2022; originally announced July 2022.

Report number: https://ojs.aaai.org/index.php/HCOMP/article/view/21994/21770

Journal ref: Proceedings of the AAAI Conference on Human Computation and Crowdsourcing 2022 Oct 14 (Vol. 10, pp. 133-146)

arXiv:2205.00072 [pdf, other]

Doubting AI Predictions: Influence-Driven Second Opinion Recommendation

Authors: Maria De-Arteaga, Alexandra Chouldechova, Artur Dubrawski

Abstract: Effective human-AI collaboration requires a system design that provides humans with meaningful ways to make sense of and critically evaluate algorithmic recommendations. In this paper, we propose a way to augment human-AI collaboration by building on a common organizational practice: identifying experts who are likely to provide complementary opinions. When machine learning algorithms are trained to predict human-generated assessments, experts' rich multitude of perspectives is frequently lost in monolithic algorithmic recommendations. The proposed approach aims to leverage productive disagreement by (1) identifying whether some experts are likely to disagree with an algorithmic assessment and, if so, (2) recommending an expert to request a second opinion from.

Submitted 29 April, 2022; originally announced May 2022.

Comments: ACM CHI 2022 Workshop on Trust and Reliance in AI-Human Teams (TRAIT)

arXiv:2204.13568 [pdf, other]

doi 10.1145/3531146.3533205

Justice in Misinformation Detection Systems: An Analysis of Algorithms, Stakeholders, and Potential Harms

Authors: Terrence Neumann, Maria De-Arteaga, Sina Fazelpour

Abstract: Faced with the scale and surge of misinformation on social media, many platforms and fact-checking organizations have turned to algorithms for automating key parts of misinformation detection pipelines. While offering a promising solution to the challenge of scale, the ethical and societal risks associated with algorithmic misinformation detection are not well-understood. In this paper, we employ and extend upon the notion of informational justice to develop a framework for explicating issues of justice relating to representation, participation, distribution of benefits and burdens, and credibility in the misinformation detection pipeline. Drawing on the framework: (1) we show how injustices materialize for stakeholders across three algorithmic stages in the pipeline; (2) we suggest empirical measures for assessing these injustices; and (3) we identify potential sources of these harms. This framework should help researchers, policymakers, and practitioners reason about potential harms or risks associated with these algorithms and provide conceptual guidance for the design of algorithmic fairness audits in this domain.

Submitted 29 April, 2022; v1 submitted 28 April, 2022; originally announced April 2022.

Comments: Accepted at ACM Conference on Fairness, Accountability, and Transparency (FAccT), 2022

arXiv:2204.13156 [pdf, other]

On the Relationship Between Explanations, Fairness Perceptions, and Decisions

Authors: Jakob Schoeffer, Maria De-Arteaga, Niklas Kuehl

Abstract: It is known that recommendations of AI-based systems can be incorrect or unfair. Hence, it is often proposed that a human be the final decision-maker. Prior work has argued that explanations are an essential pathway to help human decision-makers enhance decision quality and mitigate bias, i.e., facilitate human-AI complementarity. For these benefits to materialize, explanations should enable humans to appropriately rely on AI recommendations and override the algorithmic recommendation when necessary to increase distributive fairness of decisions. The literature, however, does not provide conclusive empirical evidence as to whether explanations enable such complementarity in practice. In this work, we (a) provide a conceptual framework to articulate the relationships between explanations, fairness perceptions, reliance, and distributive fairness, (b) apply it to understand (seemingly) contradictory research findings at the intersection of explanations and fairness, and (c) derive cohesive implications for the formulation of research questions and the design of experiments.

Submitted 6 May, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

Comments: ACM CHI 2022 Workshop on Human-Centered Explainable AI (HCXAI), May 12--13, 2022, New Orleans, LA, USA

arXiv:2204.07661 [pdf, other]

doi 10.47989/ir30iConf47572

Finding Pareto Trade-offs in Fair and Accurate Detection of Toxic Speech

Authors: Soumyajit Gupta, Venelin Kovatchev, Anubrata Das, Maria De-Arteaga, Matthew Lease

Abstract: Optimizing NLP models for fairness poses many challenges. Lack of differentiable fairness measures prevents gradient-based loss training or requires surrogate losses that diverge from the true metric of interest. In addition, competing objectives (e.g., accuracy vs. fairness) often require making trade-offs based on stakeholder preferences, but stakeholders may not know their preferences before seeing system performance under different trade-off settings. To address these challenges, we begin by formulating a differentiable version of a popular fairness measure, Accuracy Parity, to provide balanced accuracy across demographic groups. Next, we show how model-agnostic, HyperNetwork optimization can efficiently train arbitrary NLP model architectures to learn Pareto-optimal trade-offs between competing metrics. Focusing on the task of toxic language detection, we show the generality and efficacy of our methods across two datasets, three neural architectures, and three fairness losses.

Submitted 9 April, 2025; v1 submitted 15 April, 2022; originally announced April 2022.

Journal ref: Published in Information Research, vol. 30, iConf, pp. 123--141, 2025

arXiv:2108.11056 [pdf, other]

doi 10.1007/s10618-022-00910-8

Social Norm Bias: Residual Harms of Fairness-Aware Algorithms

Authors: Myra Cheng, Maria De-Arteaga, Lester Mackey, Adam Tauman Kalai

Abstract: Many modern machine learning algorithms mitigate bias by enforcing fairness constraints across coarsely-defined groups related to a sensitive attribute like gender or race. However, these algorithms seldom account for within-group heterogeneity and biases that may disproportionately affect some members of a group. In this work, we characterize Social Norm Bias (SNoB), a subtle but consequential type of algorithmic discrimination that may be exhibited by machine learning models, even when these systems achieve group fairness objectives. We study this issue through the lens of gender bias in occupation classification. We quantify SNoB by measuring how an algorithm's predictions are associated with conformity to inferred gender norms. When predicting if an individual belongs to a male-dominated occupation, this framework reveals that "fair" classifiers still favor biographies written in ways that align with inferred masculine norms. We compare SNoB across algorithmic fairness methods and show that it is frequently a residual bias, and post-processing approaches do not mitigate this type of bias at all.

Submitted 10 August, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

Comments: Spotlighted at the 2021 ICML Machine Learning for Data Workshop and presented at the 2021 ICML Socially Responsible Machine Learning Workshop

Report number: Data Min Knowl Disc (2023)

arXiv:2107.09163 [pdf, other]

Diversity in Sociotechnical Machine Learning Systems

Authors: Sina Fazelpour, Maria De-Arteaga

Abstract: There has been a surge of recent interest in sociocultural diversity in machine learning (ML) research, with researchers (i) examining the benefits of diversity as an organizational solution for alleviating problems with algorithmic bias, and (ii) proposing measures and methods for implementing diversity as a design desideratum in the construction of predictive algorithms. Currently, however, there is a gap between discussions of measures and benefits of diversity in ML, on the one hand, and the broader research on the underlying concepts of diversity and the precise mechanisms of its functional benefits, on the other. This gap is problematic because diversity is not a monolithic concept. Rather, different concepts of diversity are based on distinct rationales that should inform how we measure diversity in a given context. Similarly, the lack of specificity about the precise mechanisms underpinning diversity's potential benefits can result in uninformative generalities, invalid experimental designs, and illicit interpretations of findings. In this work, we draw on research in philosophy, psychology, and social and organizational sciences to make three contributions: First, we introduce a taxonomy of different diversity concepts from philosophy of science, and explicate the distinct epistemic and political rationales underlying these concepts. Second, we provide an overview of mechanisms by which diversity can benefit group performance. Third, we situate these taxonomies--of concepts and mechanisms--in the lifecycle of sociotechnical ML systems and make a case for their usefulness in fair and accountable ML. We do so by illustrating how they clarify the discourse around diversity in the context of ML systems, promote the formulation of more precise research questions about diversity's impact, and provide conceptual tools to further advance research and practice.

Submitted 19 July, 2021; originally announced July 2021.

arXiv:2105.10614 [pdf, other]

Human-AI Collaboration with Bandit Feedback

Authors: Ruijiang Gao, Maytal Saar-Tsechansky, Maria De-Arteaga, Ligong Han, Min Kyung Lee, Matthew Lease

Abstract: Human-machine complementarity is important when neither the algorithm nor the human yields dominant performance across all instances in a given domain. Most research on algorithmic decision-making solely centers on the algorithm's performance, while recent work that explores human-machine collaboration has framed the decision-making problems as classification tasks. In this paper, we first propose and then develop a solution for a novel human-machine collaboration problem in a bandit feedback setting. Our solution aims to exploit the human-machine complementarity to maximize decision rewards. We then extend our approach to settings with multiple human decision makers. We demonstrate the effectiveness of our proposed methods using both synthetic and real human responses, and find that our methods outperform both the algorithm and the human when they each make decisions on their own. We also show how personalized routing in the presence of multiple human decision-makers can further improve the human-machine team performance.

Submitted 21 May, 2021; originally announced May 2021.

Comments: Accepted at IJCAI 2021

Journal ref: In Proceedings of the 30th International Joint Conference on Artificial Intelligence (IJCAI), pages 1722--1728, 2021

arXiv:2102.00128 [pdf, other]

doi 10.1145/3442188.3445877

The effect of differential victim crime reporting on predictive policing systems

Authors: Nil-Jana Akpinar, Maria De-Arteaga, Alexandra Chouldechova

Abstract: Police departments around the world have been experimenting with forms of place-based data-driven proactive policing for over two decades. Modern incarnations of such systems are commonly known as hot spot predictive policing. These systems predict where future crime is likely to concentrate such that police can allocate patrols to these areas and deter crime before it occurs. Previous research on fairness in predictive policing has concentrated on the feedback loops which occur when models are trained on discovered crime data, but has limited implications for models trained on victim crime reporting data. We demonstrate how differential victim crime reporting rates across geographical areas can lead to outcome disparities in common crime hot spot prediction models. Our analysis is based on a simulation patterned after district-level victimization and crime reporting survey data for Bogotá, Colombia. Our results suggest that differential crime reporting rates can lead to a displacement of predicted hotspots from high crime but low reporting areas to high or medium crime and high reporting areas. This may lead to misallocations both in the form of over-policing and under-policing.

Submitted 4 February, 2021; v1 submitted 29 January, 2021; originally announced February 2021.

Comments: Conference on Fairness, Accountability, and Transparency (FAccT 2021)

arXiv:2101.09648 [pdf, other]

Leveraging Expert Consistency to Improve Algorithmic Decision Support

Authors: Maria De-Arteaga, Vincent Jeanselme, Artur Dubrawski, Alexandra Chouldechova

Abstract: Machine learning (ML) is increasingly being used to support high-stakes decisions. However, there is frequently a construct gap: a gap between the construct of interest to the decision-making task and what is captured in proxies used as labels to train ML models. As a result, ML models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. Thus, an essential step in the design of ML systems for decision support is selecting a target label among available proxies. In this work, we explore the use of historical expert decisions as a rich -- yet also imperfect -- source of information that can be combined with observed outcomes to narrow the construct gap. We argue that managers and system designers may be interested in learning from experts in instances where they exhibit consistency with each other, while learning from observed outcomes otherwise. We develop a methodology to enable this goal using information that is commonly available in organizational information systems. This involves two core steps. First, we propose an influence function-based methodology to estimate expert consistency indirectly when each case in the data is assessed by a single expert. Second, we introduce a label amalgamation approach that allows ML models to simultaneously learn from expert decisions and observed outcomes. Our empirical evaluation, using simulations in a clinical setting and real-world data from the child welfare domain, indicates that the proposed approach successfully narrows the construct gap, yielding better predictive performance than learning from either observed outcomes or expert decisions alone.

Submitted 3 June, 2024; v1 submitted 24 January, 2021; originally announced January 2021.

Comments: Best Paper Runner-Up Award, Workshop on Information Technologies and Systems (WITS), 2021

arXiv:2002.08035 [pdf, other]

doi 10.1145/3313831.3376638

A Case for Humans-in-the-Loop: Decisions in the Presence of Erroneous Algorithmic Scores

Authors: Maria De-Arteaga, Riccardo Fogliato, Alexandra Chouldechova

Abstract: The increased use of algorithmic predictions in sensitive domains has been accompanied by both enthusiasm and concern. To understand the opportunities and risks of these technologies, it is key to study how experts alter their decisions when using such tools. In this paper, we study the adoption of an algorithmic tool used to assist child maltreatment hotline screening decisions. We focus on the question: Are humans capable of identifying cases in which the machine is wrong, and of overriding those recommendations? We first show that humans do alter their behavior when the tool is deployed. Then, we show that humans are less likely to adhere to the machine's recommendation when the score displayed is an incorrect estimate of risk, even when overriding the recommendation requires supervisory approval. These results highlight the risks of full automation and the importance of designing decision pipelines that provide humans with autonomy.

Submitted 20 February, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: Accepted at ACM Conference on Human Factors in Computing Systems (ACM CHI), 2020

arXiv:2001.00249

Proceedings of NeurIPS 2019 Workshop on Machine Learning for the Developing World: Challenges and Risks of ML4D

Authors: Maria De-Arteaga, Tejumade Afonja, Amanda Coston

Abstract: This is the proceedings of the 3rd ML4D workshop, which was held in Vancouver, Canada on December 13, 2019 as part of the Neural Information Processing Systems conference.

Submitted 10 April, 2020; v1 submitted 1 January, 2020; originally announced January 2020.

arXiv:1906.08206 [pdf, other]

Killings of social leaders in the Colombian post-conflict: Data analysis for investigative journalism

Authors: Maria De-Arteaga, Benedikt Boecking

Abstract: After the peace agreement of 2016 with FARC, the killings of social leaders have emerged as an important post-conflict challenge for Colombia. We present a data analysis based on official records obtained from the Colombian General Attorney's Office spanning the time period from 2012 to 2017. The results of the analysis show a drastic increase in the officially recorded number of killings of democratically elected leaders of community organizations, in particular those belonging to Juntas de Acción Comunal [Community Action Boards]. These are important entities that have been part of the Colombian democratic apparatus since 1958, and enable communities to advocate for their needs. We also describe how the data analysis guided a journalistic investigation that was motivated by the Colombian government's denial of the systematic nature of social leaders killings.

Submitted 19 June, 2019; originally announced June 2019.

arXiv:1904.05233 [pdf, other]

What's in a Name? Reducing Bias in Bios without Access to Protected Attributes

Authors: Alexey Romanov, Maria De-Arteaga, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Anna Rumshisky, Adam Tauman Kalai

Abstract: There is a growing body of work that proposes methods for mitigating bias in machine learning systems. These methods typically rely on access to protected attributes such as race, gender, or age. However, this raises two significant challenges: (1) protected attributes may not be available or it may not be legal to use them, and (2) it is often desirable to simultaneously consider multiple protected attributes, as well as their intersections. In the context of mitigating bias in occupation classification, we propose a method for discouraging correlation between the predicted probability of an individual's true occupation and a word embedding of their name. This method leverages the societal biases that are encoded in word embeddings, eliminating the need for access to protected attributes. Crucially, it only requires access to individuals' names at training time and not at deployment time. We evaluate two variations of our proposed method using a large-scale dataset of online biographies. We find that both variations simultaneously reduce race and gender biases, with almost no reduction in the classifier's overall true positive rate.

Submitted 10 April, 2019; originally announced April 2019.

Comments: Accepted at NAACL 2019; Best Thematic Paper

arXiv:1901.09451 [pdf, other]

doi 10.1145/3287560.3287572

Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting

Authors: Maria De-Arteaga, Alexey Romanov, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, Adam Tauman Kalai

Abstract: We present a large-scale study of gender bias in occupation classification, a task where the use of machine learning may lead to negative outcomes on people's lives. We analyze the potential allocation harms that can result from semantic representation bias. To do so, we study the impact on occupation classification of including explicit gender indicators---such as first names and pronouns---in different semantic representations of online biographies. Additionally, we quantify the bias that remains when these indicators are "scrubbed," and describe proxy behavior that occurs in the absence of explicit gender indicators. As we demonstrate, differences in true positive rates between genders are correlated with existing gender imbalances in occupations, which may compound these imbalances.

Submitted 27 January, 2019; originally announced January 2019.

Comments: Accepted at ACM Conference on Fairness, Accountability, and Transparency (ACM FAT*), 2019

arXiv:1812.10398

Proceedings of NeurIPS 2018 Workshop on Machine Learning for the Developing World: Achieving Sustainable Impact

Authors: Maria De-Arteaga, Amanda Coston, William Herlands

Abstract: This is the Proceedings of NeurIPS 2018 Workshop on Machine Learning for the Developing World: Achieving Sustainable Impact, held in Montreal, Canada on December 8, 2018.

Submitted 18 February, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

Comments: 18 papers in the proceedings. 10 additional papers were presented at the workshop but not included in the proceedings

arXiv:1812.08769 [pdf, other]

What are the biases in my word embedding?

Authors: Nathaniel Swinger, Maria De-Arteaga, Neil Thomas Heffernan IV, Mark DM Leiserson, Adam Tauman Kalai

Abstract: This paper presents an algorithm for enumerating biases in word embeddings. The algorithm exposes a large number of offensive associations related to sensitive features such as race and gender on publicly available embeddings, including a supposedly "debiased" embedding. These biases are concerning in light of the widespread use of word embeddings. The associations are identified by geometric patterns in word embeddings that run parallel between people's names and common lower-case tokens. The algorithm is highly unsupervised: it does not even require the sensitive features to be pre-specified. This is desirable because: (a) many forms of discrimination--such as racial discrimination--are linked to social constructs that may vary depending on the context, rather than to categories with fixed definitions; and (b) it makes it easier to identify biases against intersectional groups, which depend on combinations of sensitive features. The inputs to our algorithm are a list of target tokens, e.g. names, and a word embedding. It outputs a number of Word Embedding Association Tests (WEATs) that capture various biases present in the data. We illustrate the utility of our approach on publicly available word embeddings and lists of names, and evaluate its output using crowdsourcing. We also show how removing names may not remove potential proxy bias.

Submitted 19 June, 2019; v1 submitted 20 December, 2018; originally announced December 2018.

Comments: At AIES 2019: the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society

arXiv:1807.00905 [pdf, other]

Learning under selective labels in the presence of expert consistency

Authors: Maria De-Arteaga, Artur Dubrawski, Alexandra Chouldechova

Abstract: We explore the problem of learning under selective labels in the context of algorithm-assisted decision making. Selective labels is a pervasive selection bias problem that arises when historical decision making blinds us to the true outcome for certain instances. Examples of this are common in many applications, ranging from predicting recidivism using pre-trial release data to diagnosing patients. In this paper we discuss why selective labels often cannot be effectively tackled by standard methods for adjusting for sample selection bias, even if there are no unobservables. We propose a data augmentation approach that can be used to either leverage expert consistency to mitigate the partial blindness that results from selective labels, or to empirically validate whether learning under such framework may lead to unreliable models prone to systemic discrimination.

Submitted 4 July, 2018; v1 submitted 2 July, 2018; originally announced July 2018.

Comments: Presented at the 2018 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2018)

arXiv:1711.09522

Proceedings of NIPS 2017 Workshop on Machine Learning for the Developing World

Authors: Maria De-Arteaga, William Herlands

Abstract: This is the Proceedings of NIPS 2017 Workshop on Machine Learning for the Developing World, held in Long Beach, California, USA on December 8, 2017.

Submitted 12 December, 2017; v1 submitted 26 November, 2017; originally announced November 2017.

Comments: 15 papers

arXiv:1711.06538 [pdf, other]

doi 10.5281/zenodo.571551

Discovery of Complex Anomalous Patterns of Sexual Violence in El Salvador

Authors: Maria De-Arteaga, Artur Dubrawski

Abstract: When sexual violence is a product of organized crime or social imaginary, the links between sexual violence episodes can be understood as a latent structure. With this assumption in place, we can use data science to uncover complex patterns. In this paper we focus on the use of data mining techniques to unveil complex anomalous spatiotemporal patterns of sexual violence. We illustrate their use by analyzing all reported rapes in El Salvador over a period of nine years. Through our analysis, we are able to provide evidence of phenomena that, to the best of our knowledge, have not been previously reported in literature. We devote special attention to a pattern we discover in the East, where underage victims report their boyfriends as perpetrators at anomalously high rates. Finally, we explain how such analyses could be conducted in real-time, enabling early detection of emerging patterns to allow law enforcement agencies and policy makers to react accordingly.

Submitted 17 November, 2017; originally announced November 2017.

Comments: Conference paper at Data for Policy 2016 - Frontiers of Data Science for Government: Ideas, Practices and Projections (Data for Policy)

arXiv:1511.06419 [pdf, other]

Canonical Autocorrelation Analysis

Authors: Maria De-Arteaga, Artur Dubrawski, Peter Huggins

Abstract: We present an extension of sparse Canonical Correlation Analysis (CCA) designed for finding multiple-to-multiple linear correlations within a single set of variables. Unlike CCA, which finds correlations between two sets of data where the rows are matched exactly but the columns represent separate sets of variables, the method proposed here, Canonical Autocorrelation Analysis (CAA), finds multivariate correlations within just one set of variables. This can be useful when we look for hidden parsimonious structures in data, each involving only a small subset of all features. In addition, the discovered correlations are highly interpretable as they are formed by pairs of sparse linear combinations of the original features. We show how CAA can be of use as a tool for anomaly detection when the expected structure of correlations is not followed by anomalous data. We illustrate the utility of CAA in two application domains where single-class and unsupervised learning of correlation structures are particularly relevant: breast cancer diagnosis and radiation threat detection. When applied to the Wisconsin Breast Cancer data, single-class CAA is competitive with supervised methods used in the literature. On the radiation threat detection task, unsupervised CAA performs significantly better than an unsupervised alternative prevalent in the domain, while providing valuable additional insights for threat analysis.

Submitted 19 November, 2015; originally announced November 2015.

Comments: 6 pages, 5 figures

arXiv:1511.04402 [pdf, other]

Lass-0: sparse non-convex regression by local search

Authors: William Herlands, Maria De-Arteaga, Daniel Neill, Artur Dubrawski

Abstract: We compute approximate solutions to L0 regularized linear regression using L1 regularization, also known as the Lasso, as an initialization step. Our algorithm, the Lass-0 ("Lass-zero"), uses a computationally efficient stepwise search to determine a locally optimal L0 solution given any L1 regularization solution. We present theoretical results of consistency under orthogonality and appropriate handling of redundant features. Empirically, we use synthetic data to demonstrate that Lass-0 solutions are closer to the true sparse support than L1 regularization models. Additionally, in real-world data Lass-0 finds more parsimonious solutions than L1 regularization while maintaining similar predictive accuracy.

Submitted 17 February, 2016; v1 submitted 13 November, 2015; originally announced November 2015.

Comments: 8 pages, 1 figure. NIPS 2015 Workshop of Optimization (OPT2015)

Showing 1–37 of 37 results for author: De-Arteaga, M