-
Solidago: A Modular Collaborative Scoring Pipeline
Authors:
Lê Nguyên Hoang,
Romain Beylerian,
Bérangère Colbois,
Julien Fageot,
Louis Faucon,
Aidan Jungo,
Alain Le Noac'h,
Adrien Matissart,
Oscar Villemaud
Abstract:
This paper presents Solidago, an end-to-end modular pipeline to allow any community of users to collaboratively score any number of entities. Solidago proposes a six-module decomposition. First, it uses pretrust and peer-to-peer vouches to assign trust scores to users. Second, based on participation, trust scores are turned into voting rights per user per entity. Third, for each user, a preference model is learned from the user's evaluation data. Fourth, users' models are put on a similar scale. Fifth, these models are securely aggregated. Sixth, models are post-processed to yield human-readable global scores. We also propose default implementations of the six modules, including a novel trust propagation algorithm, and adaptations of state-of-the-art scaling and aggregation solutions. Our pipeline has been successfully deployed on the open-source platform tournesol.app. We thereby lay an appealing foundation for the collaborative, effective, scalable, fair, interpretable and secure scoring of any set of entities.
Submitted 25 September, 2024; v1 submitted 30 October, 2022; originally announced November 2022.
-
Robust Sparse Voting
Authors:
Youssef Allouah,
Rachid Guerraoui,
Lê-Nguyên Hoang,
Oscar Villemaud
Abstract:
Many applications, such as content moderation and recommendation, require reviewing and scoring a large number of alternatives. Doing so robustly is however very challenging. Indeed, voters' inputs are inevitably sparse: most alternatives are only scored by a small fraction of voters. This sparsity amplifies the effects of biased voters introducing unfairness, and of malicious voters seeking to hack the voting process by reporting dishonest scores. We give a precise definition of the problem of robust sparse voting, highlight its underlying technical challenges, and present a novel voting mechanism addressing the problem. We prove that, using this mechanism, no voter can have more than a small parameterizable effect on each alternative's score; a property we call Lipschitz resilience. We also identify conditions of voters comparability under which any unanimous preferences can be recovered, even when each voter provides sparse scores, on a scale that is potentially very different from any other voter's score scale. Proving these properties required us to introduce, analyze and carefully compose novel aggregation primitives which could be of independent interest.
Submitted 25 January, 2024; v1 submitted 17 February, 2022; originally announced February 2022.
-
An Equivalence Between Data Poisoning and Byzantine Gradient Attacks
Authors:
Sadegh Farhadkhani,
Rachid Guerraoui,
Lê-Nguyên Hoang,
Oscar Villemaud
Abstract:
To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. More specifically, we prove that every gradient attack can be reduced to data poisoning, in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic). This equivalence makes it possible to obtain new impossibility results on the resilience of any "robust" learning algorithm to data poisoning in highly heterogeneous applications, as corollaries of existing impossibility theorems on Byzantine machine learning. Moreover, using our equivalence, we derive a practical attack that we show (theoretically and empirically) can be very effective against classical personalized federated learning models.
Submitted 20 July, 2022; v1 submitted 17 February, 2022; originally announced February 2022.