Search | arXiv e-print repository

arXiv:2501.19294 [pdf, ps, other]

The Cost of Balanced Training-Data Production in an Online Data Market

Authors: Augustin Chaintreau, Roland Maio, Juba Ziani

Abstract: Many ethical issues in machine learning are connected to the training data. Online data markets are an important source of training data, facilitating both production and distribution. Recently, a trend has emerged of for-profit "ethical" participants in online data markets. This trend raises a fascinating question: Can online data markets sustainably and efficiently address ethical issues in the broader machine-learning economy? In this work, we study this question in a stylized model of an online data market. We investigate the effects of intervening in the data market to achieve balanced training-data production. The model reveals the crucial role of market conditions. In small and emerging markets, an intervention can drive the data producers out of the market, so that the cost of fairness is maximal. Yet, in large and established markets, the cost of fairness can vanish (as a fraction of overall welfare) as the market grows. Our results suggest that "ethical" online data markets can be economically feasible under favorable market conditions, and motivate more models to consider the role of data production and distribution in mediating the impacts of ethical interventions.

Submitted 31 January, 2025; originally announced January 2025.

arXiv:2402.13787 [pdf, other]

doi 10.1145/3589334.3645609

Fairness Rising from the Ranks: HITS and PageRank on Homophilic Networks

Authors: Ana-Andreea Stoica, Nelly Litvak, Augustin Chaintreau

Abstract: In this paper, we investigate the conditions under which link analysis algorithms prevent minority groups from reaching high ranking slots. We find that the most common link-based algorithms using centrality metrics, such as PageRank and HITS, can reproduce and even amplify bias against minority groups in networks. Yet, their behavior differs: on one hand, we empirically show that PageRank mirrors the degree distribution for most of the ranking positions and it can equalize representation of minorities among the top ranked nodes; on the other hand, we find that HITS amplifies pre-existing bias in homophilic networks through a novel theoretical analysis, supported by empirical results. We find the root cause of bias amplification in HITS to be the level of homophily present in the network, modeled through an evolving network model with two communities. We illustrate our theoretical analysis on both synthetic and real datasets and we present directions for future work.

Submitted 8 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

Comments: Accepted for publication in Proceedings of The Web Conference, 2024

arXiv:2112.00269 [pdf, other]

doi 10.1145/3656017

Unequal Opportunities in Multi-hop Referral Programs

Authors: Yiguang Zhang, Augustin Chaintreau

Abstract: As modern social networks allow for faster and broader interactions with friends and acquaintances, online referral programs that promote sales through existing users are becoming increasingly popular. Because it is all too common that online networks reproduce historical structural bias, members of disadvantaged groups often benefit less from such referral opportunities. For instance, one-hop referral programs that distribute rewards only among pairs of friends or followers may offer fewer rewards and opportunities to minorities in networks where it was proved that their degrees are statistically smaller. Here, we examine the fairness of general referral programs, increasingly popular forms of marketing in which an existing referrer is encouraged to initiate the recruitment of new referred users over multiple hops. While this clearly expands opportunities for rewards, it remains unclear whether it helps address fairness concerns or makes them worse. We show, from studying 4 real-world networks and performing theoretical analysis on networks created with minority-majority affiliations and homophily, that the change of bias in multi-hop referral programs highly depends on the network structures and the referral strategies.
Specifically, under three different constrained referral strategies which limit the number of referrals each person can share to a fixed number, we show that even with no explicit intention to discriminate and without access to sensitive attributes such as gender and race, certain referral strategies can still amplify the structural biases further when higher hops are allowed. Moreover, when there is no constraint on the number of referrals each person can distribute and when the effect of referral strategies is removed, we prove a precise condition under which the bias in 1-hop referral programs is amplified in higher-hop referral programs.

Submitted 30 November, 2021; originally announced December 2021.

Comments: preprint

Journal ref: Proceedings of the ACM on Measurement and Analysis of Computing Systems 8.2 (2024): 1-28

arXiv:2102.11925 [pdf, other]

doi 10.1145/3460083

Chasm in Hegemony: Explaining and Reproducing Disparities in Homophilous Networks

Authors: Yiguang Zhang, Jessy Xinyi Han, Ilica Mahajan, Priyanjana Bengani, Augustin Chaintreau

Abstract: In networks with a minority and a majority community, it is well-studied that minorities are under-represented at the top of the social hierarchy. However, researchers are less clear about the representation of minorities from the lower levels of the hierarchy, where other disadvantages or vulnerabilities may exist. We offer a more complete picture of social disparities at each social level with empirical evidence that the minority representation exhibits two opposite phases: at the higher rungs of the social ladder, the representation of the minority community decreases; but, lower in the ladder, which is more populous, as you ascend, the representation of the minority community improves. We refer to this opposing phenomenon between the upper-level and lower-level as the \emph{chasm effect}. Previous models of network growth with homophily fail to detect and explain the presence of this chasm effect. We analyze the interactions among a few well-observed network-growing mechanisms with a simple model to reveal the sufficient and necessary conditions for both phases in the chasm effect to occur. By generalizing the simple model naturally, we present a complete bi-affiliation bipartite network-growth model that could successfully capture disparities at all social levels and reproduce real social networks.
Finally, we illustrate that addressing the chasm effect can create fairer systems with two applications in advertisement and fact-checks, thereby demonstrating the potential impact of the chasm effect on the future research of minority-majority disparities and fair algorithms.

Submitted 14 June, 2021; v1 submitted 23 February, 2021; originally announced February 2021.

Journal ref: Proceedings of the ACM on Measurement and Analysis of Computing Systems 5.2 (2021): 1-38

arXiv:2012.02394 [pdf, ps, other]

Biased Programmers? Or Biased Data? A Field Experiment in Operationalizing AI Ethics

Authors: Bo Cowgill, Fabrizio Dell'Acqua, Samuel Deng, Daniel Hsu, Nakul Verma, Augustin Chaintreau

Abstract: Why do biased predictions arise? What interventions can prevent them? We evaluate 8.2 million algorithmic predictions of math performance from $\approx$400 AI engineers, each of whom developed an algorithm under a randomly assigned experimental condition. Our treatment arms modified programmers' incentives, training data, awareness, and/or technical knowledge of AI ethics. We then assess out-of-sample predictions from their algorithms using randomized audit manipulations of algorithm inputs and ground-truth math performance for 20K subjects. We find that biased predictions are mostly caused by biased training data. However, one-third of the benefit of better training data comes through a novel economic mechanism: Engineers exert greater effort and are more responsive to incentives when given better training data. We also assess how performance varies with programmers' demographic characteristics, and their performance on a psychological test of implicit bias (IAT) concerning gender and careers. We find no evidence that female, minority and low-IAT engineers exhibit lower bias or discrimination in their code. However, we do find that prediction errors are correlated within demographic groups, which creates performance improvements through cross-demographic averaging. Finally, we quantify the benefits and tradeoffs of practical managerial or policy interventions such as technical advice, simple reminders, and improved incentives for decreasing algorithmic bias.

Submitted 3 December, 2020; originally announced December 2020.

Comments: Part of the Navigating the Broader Impacts of AI Research Workshop at NeurIPS 2020

arXiv:1812.03379 [pdf, other]

PopFactor: Live-Streamer Behavior and Popularity

Authors: Robert Netzorg, Lauren Arnett, Augustin Chaintreau, Eugene Wu

Abstract: Live video-streaming platforms such as Twitch enable top content creators to reap significant profits and influence. To that effect, various behavioral norms are recommended to new entrants and those seeking to increase their popularity and success. Chief among them are to simply put in the effort and promote on social media outlets such as Twitter, Instagram, and the like. But does following these behaviors indeed have a relationship with eventual popularity? In this paper, we collect a corpus of Twitch streamer popularity measures --- spanning social and financial measures --- and their behavior data on Twitch and third-party platforms. We also compile a set of community-defined behavioral norms. We then perform temporal analysis to identify the increased predictive value that a streamer's future behavior contributes to predicting future popularity. At the population level, we find that behavioral information improves the prediction of relative growth that exceeds the median streamer. At the individual level, we find that although it is difficult to quickly become successful in absolute terms, streamers that put in considerable effort are more successful than the rest, and that creating social media accounts to promote oneself is effective irrespective of when the accounts are created. Ultimately, we find that studying the popularity and success of content creators in the long term is a promising and rich research area.

Submitted 8 December, 2018; originally announced December 2018.

arXiv:1810.02318 [pdf, other]

Information Market for Web Browsing: Design, Usability and Incremental Adoption

Authors: Arash Molavi Kakhki, Vijay Erramilli, Phillipa Gill, Augustin Chaintreau, Balachander Krishnamurthy

Abstract: Browsing privacy solutions face an uphill battle to deployment. Many operate counter to the economic objectives of popular online services (e.g., by completely blocking ads) and do not provide enough incentive for users who may be subject to performance degradation for deploying them. In this study, we take a step towards realizing a system for online privacy that is mutually beneficial to users and online advertisers: an information market. This system not only maintains economic viability for online services, but also provides users with financial compensation to encourage them to participate. We prototype and evaluate an information market that provides privacy and revenue to users while preserving and sometimes improving their Web performance. We evaluate feasibility of the market via a one month field study with 63 users and find that users are indeed willing to sell their browsing information. We also use Web traces of millions of users to drive a simulation study to evaluate the system at scale. We find that the system can indeed be profitable to both users and online advertisers.

Submitted 4 October, 2018; originally announced October 2018.

Comments: 12 pages, 9 figures, 3 tables, 2 appendixes. To appear in Performance18, December 5-7, 2018, Toulouse, France

arXiv:1508.06911 [pdf, other]

Who Contributes to the Knowledge Sharing Economy?

Authors: Arthi Ramachandran, Augustin Chaintreau

Abstract: Information sharing dynamics of social networks rely on a small set of influencers to effectively reach a large audience. Our recent results and observations demonstrate that the shape and identity of this elite, especially those contributing \emph{original} content, is difficult to predict. Information acquisition is often cited as an example of a public good. However, this emerging and powerful theory has yet to provably offer qualitative insights on how specialization of users into active and passive participants occurs. This paper bridges, for the first time, the theory of public goods and the analysis of diffusion in social media. We introduce a non-linear model of \emph{perishable} public goods, leveraging new observations about sharing of media sources. The primary contribution of this work is to show that \emph{shelf time}, which characterizes the rate at which content gets renewed, is a critical factor in audience participation. Our model proves a fundamental \emph{dichotomy} in information diffusion: While short-lived content has simple and predictable diffusion, long-lived content has complex specialization. This occurs even when all information seekers are \emph{ex ante} identical and could be a contributing factor to the difficulty of predicting social network participation and evolution.

Submitted 27 August, 2015; originally announced August 2015.

Comments: 15 pages in ACM Conference on Online Social Networks 2015

arXiv:1407.2323 [pdf, other]

XRay: Enhancing the Web's Transparency with Differential Correlation

Authors: Mathias Lecuyer, Guillaume Ducoffe, Francis Lan, Andrei Papancea, Theofilos Petsios, Riley Spahn, Augustin Chaintreau, Roxana Geambasu

Abstract: Today's Web services - such as Google, Amazon, and Facebook - leverage user data for varied purposes, including personalizing recommendations, targeting advertisements, and adjusting prices. At present, users have little insight into how their data is being used. Hence, they cannot make informed choices about the services they choose. To increase transparency, we developed XRay, the first fine-grained, robust, and scalable personal data tracking system for the Web. XRay predicts which data in an arbitrary Web account (such as emails, searches, or viewed products) is being used to target which outputs (such as ads, recommended products, or prices). XRay's core functions are service agnostic and easy to instantiate for new services, and they can track data within and across services. To make predictions independent of the audited service, XRay relies on the following insight: by comparing outputs from different accounts with similar, but not identical, subsets of data, one can pinpoint targeting through correlation. We show both theoretically, and through experiments on Gmail, Amazon, and YouTube, that XRay achieves high precision and recall by correlating data from a surprisingly small number of extra accounts.

Submitted 7 October, 2014; v1 submitted 8 July, 2014; originally announced July 2014.

Comments: Extended version of a paper presented at the 23rd USENIX Security Symposium (USENIX Security 14)

arXiv:1212.3782 [pdf, other]

Can Selfish Groups be Self-Enforcing?

Authors: Guillaume Ducoffe, Dorian Mazauric, Augustin Chaintreau

Abstract: Algorithmic graph theory has thoroughly analyzed how, given a network describing constraints between various nodes, groups can be formed among these so that the resulting configuration optimizes a \emph{global} metric. In contrast, for various social and economic networks, groups are formed \emph{de facto} by the choices of selfish players. A fundamental problem in this setting is the existence of, and convergence to, a \emph{self-enforcing} configuration: an assignment of players into groups such that no player has an incentive to move into a group other than her own. Motivated by information sharing on social networks -- and the difficult tradeoff between its benefits and the associated privacy risk -- we study the possible emergence of such stable configurations in a general selfish group formation game. Our paper considers this general game for the first time, and it completes its analysis. We show that convergence critically depends on the level of \emph{collusions} among the players -- which allow multiple players to move simultaneously as long as \emph{all of them} benefit. Solving a previously open problem we exactly show when, depending on collusions, convergence occurs within polynomial time, non-polynomial time, and when it never occurs. We also prove that previously known bounds on convergence time are all loose: by a novel combinatorial analysis of the evolution of this game we are able to provide the first \emph{asymptotically exact} formula on its convergence.
Moreover, we extend these results by providing a complete analysis when groups may \emph{overlap}, and for general utility functions representing \emph{multi-modal} interactions. Finally, we prove that collusions have a significant and \emph{positive} effect on the \emph{efficiency} of the equilibrium that is attained.

Submitted 12 February, 2014; v1 submitted 16 December, 2012; originally announced December 2012.

arXiv:0803.0248 [pdf, ps, other]

Networks become navigable as nodes move and forget

Authors: Augustin Chaintreau, Pierre Fraigniaud, Emmanuelle Lebhar

Abstract: We propose a dynamical process for network evolution, aiming at explaining the emergence of the small world phenomenon, i.e., the statistical observation that any pair of individuals are linked by a short chain of acquaintances computable by a simple decentralized routing algorithm, known as greedy routing. Previously proposed dynamical processes made it possible to demonstrate experimentally (by simulations) that the small world phenomenon can emerge from local dynamics. However, the analysis of greedy routing using the probability distributions arising from these dynamics is quite complex because of mutual dependencies. In contrast, our process enables complete formal analysis. It is based on the combination of two simple processes: a random walk process, and a harmonic forgetting process. Both processes reflect natural behaviors of the individuals, viewed as nodes in the network of inter-individual acquaintances. We prove that, in k-dimensional lattices, the combination of these two processes generates long-range links mutually independently distributed as a k-harmonic distribution. We analyze the performance of greedy routing at the stationary regime of our process, and prove that the expected number of steps for routing from any source to any target in any multidimensional lattice is a polylogarithmic function of the distance between the two nodes in the lattice. To our knowledge, these results are the first formal proof that navigability in small worlds can emerge from a dynamical process for network evolution.
Our dynamical process can find practical applications to the design of spatial gossip and resource location protocols.

Submitted 3 March, 2008; originally announced March 2008.

Comments: 21 pages, 1 figure

ACM Class: C.2.1; C.2.2; C.2.4

Showing 1–11 of 11 results for author: Chaintreau, A