-
Two-Sided Matching with Resource-Regional Caps
Authors:
Felipe Garrido-Lucero,
Denis Sokolov,
Patrick Loiseau,
Simon Mauras
Abstract:
We study two-sided many-to-one matching problems under a novel type of distributional constraints, resource-regional caps. In the context of college admissions, under resource-regional caps, an admitted student may be provided with a unit of some resource through a college, which belongs to a region possessing some amount of this resource. A student may be admitted to a college with at most one unit of any resource, i.e., all resources are close substitutes, e.g., dorms on the campus, dorms outside the campus, subsidies for renting a room, etc. The core feature of our model is that students are allowed to be admitted without any resource, which breaks the heredity property of previously studied models with regions.
It is well known that a stable matching may not exist in markets with regional constraints. Thus, we focus on three weakened versions of stability that restore existence under resource-regional caps: envy-freeness, non-wastefulness, and a novel notion, direct-envy stability. For each version of stability we design corresponding matching mechanism(s). Finally, we compare the stability performance of the constructed mechanisms using simulations, and conclude that the more sophisticated direct-envy stable mechanism is the go-to mechanism for maximal stability of the resulting matching under resource-regional caps.
Submitted 20 February, 2025;
originally announced February 2025.
-
On the Impact of the Utility in Semivalue-based Data Valuation
Authors:
Mélissa Tamine,
Benjamin Heymann,
Patrick Loiseau,
Maxime Vono
Abstract:
Semivalue-based data valuation uses cooperative-game theory intuitions to assign each data point a value reflecting its contribution to a downstream task. Still, those values depend on the practitioner's choice of utility, raising the question: How robust is semivalue-based data valuation to changes in the utility? This issue is critical when the utility is set as a trade-off between several criteria and when practitioners must select among multiple equally valid utilities. We address it by introducing the notion of a dataset's spatial signature: given a semivalue, we embed each data point into a lower-dimensional space where any utility becomes a linear functional, making the data valuation framework amenable to a simpler geometric picture. Building on this, we propose a practical methodology centered on an explicit robustness metric that informs practitioners whether and by how much their data valuation results will shift as the utility changes. We validate this approach across diverse datasets and semivalues, demonstrating strong agreement with rank-correlation analyses and offering analytical insight into how choosing a semivalue can amplify or diminish robustness.
Submitted 23 May, 2025; v1 submitted 10 February, 2025;
originally announced February 2025.
-
Prophet Inequalities: Competing with the Top $\ell$ Items is Easy
Authors:
Mathieu Molina,
Nicolas Gast,
Patrick Loiseau,
Vianney Perchet
Abstract:
We explore a prophet inequality problem, where the values of a sequence of items are drawn i.i.d. from some distribution, and an online decision maker must select one item irrevocably. We establish that $\mathrm{CR}_{\ell}$, the worst-case competitive ratio between the expected optimal performance of an online decision maker and that of a prophet who uses the average of the top $\ell$ items, is exactly the solution to an integral equation. This quantity $\mathrm{CR}_{\ell}$ is larger than $1-e^{-\ell}$, which implies that the bound converges exponentially fast to $1$ as $\ell$ grows. In particular, for $\ell=2$, $\mathrm{CR}_{2} \approx 0.966$, which is much closer to $1$ than the classical bound of $0.745$ for $\ell=1$. Additionally, we prove asymptotic lower bounds for the competitive ratio of a more general scenario, where the decision maker is permitted to select $k$ items. This subsumes the $k$ multi-unit i.i.d. prophet problem and provides the current best asymptotic guarantees, as well as enabling a broader understanding of the more general framework. Finally, we prove a tight asymptotic competitive ratio when only static threshold policies are allowed.
Submitted 10 January, 2025; v1 submitted 14 August, 2024;
originally announced August 2024.
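To make the exponential convergence quoted in this abstract concrete, the short sketch below tabulates the lower bound $1-e^{-\ell}$ for small $\ell$. It only evaluates the closed-form bound stated in the abstract; it does not solve the integral equation that defines $\mathrm{CR}_{\ell}$ exactly, and the function name `lower_bound` is a hypothetical helper introduced for illustration.

```python
import math


def lower_bound(ell: int) -> float:
    """Lower bound 1 - e^{-ell} on CR_ell, as quoted in the abstract."""
    return 1.0 - math.exp(-ell)


# Tabulate the bound and its gap to 1 for the first few values of ell.
bounds = {ell: lower_bound(ell) for ell in range(1, 6)}
for ell, bound in bounds.items():
    print(f"ell={ell}: CR_ell > {bound:.4f}, gap to 1 < {1 - bound:.4f}")
```

Each increment of $\ell$ shrinks the gap to $1$ by a factor of $e$. The bound is not tight: for $\ell=2$ it sits below the value $\mathrm{CR}_{2} \approx 0.966$ reported in the abstract.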
-
Equitable Auctions
Authors:
Simon Finster,
Patrick Loiseau,
Simon Mauras,
Mathieu Molina,
Bary Pradelski
Abstract:
We initiate the study of how auction design affects the division of surplus among buyers. We propose a parsimonious measure for equity and apply it to the family of standard auctions for homogeneous goods. Our surplus-equitable mechanism is efficient, Bayesian-Nash incentive compatible, and achieves surplus parity among winners ex-post. The uniform-price auction is equity-optimal if and only if buyers have a pure common value. Against intuition, the pay-as-bid auction is not always preferred in terms of equity if buyers have pure private values. In auctions with price mixing between pay-as-bid and uniform prices, we provide prior-free bounds on the equity-preferred pricing rule under a common regularity condition on signals.
Submitted 13 November, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
The Price of Opportunity Fairness in Matroid Allocation Problems
Authors:
Rémi Castera,
Felipe Garrido-Lucero,
Patrick Loiseau,
Simon Mauras,
Mathieu Molina,
Vianney Perchet
Abstract:
We consider matroid allocation problems under opportunity fairness constraints: resources need to be allocated to a set of agents under matroid constraints (which includes classical problems such as bipartite matching). Agents are divided into C groups according to a sensitive attribute, and an allocation is opportunity-fair if each group receives the same share proportional to the maximum feasible allocation it could achieve in isolation. We study the Price of Fairness (PoF), i.e., the ratio between maximum size allocations and maximum size opportunity-fair allocations. We first provide a characterization of the PoF leveraging the underlying polymatroid structure of the allocation problem. Based on this characterization, we prove bounds on the PoF in various settings from fully adversarial (worst-case) to fully random. Notably, one of our main results considers an arbitrary matroid structure with agents randomly divided into groups. In this setting, we prove a PoF bound as a function of the size of the largest group. Our result implies that, as long as there is no dominant group (i.e., the largest group is not too large), opportunity fairness constraints do not induce any loss of social welfare (defined as the allocation size). Overall, our results give insights into which aspects of the problem's structure affect the trade-off between opportunity fairness and social welfare.
Submitted 13 March, 2025; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Trading-off price for data quality to achieve fair online allocation
Authors:
Mathieu Molina,
Nicolas Gast,
Patrick Loiseau,
Vianney Perchet
Abstract:
We consider the problem of online allocation subject to a long-term fairness penalty. Contrary to existing works, however, we do not assume that the decision-maker observes the protected attributes -- which is often unrealistic in practice. Instead they can purchase data that help estimate them from sources of different quality; and hence reduce the fairness penalty at some cost. We model this problem as a multi-armed bandit problem where each arm corresponds to the choice of a data source, coupled with the online allocation problem. We propose an algorithm that jointly solves both problems and show that it has a regret bounded by $\mathcal{O}(\sqrt{T})$. A key difficulty is that the rewards received by selecting a source are correlated by the fairness penalty, which leads to a need for randomization (despite a stochastic setting). Our algorithm takes into account contextual information available before the source selection, and can adapt to many different fairness notions. We also show that in some instances, the estimates used can be learned on the fly.
Submitted 4 December, 2023; v1 submitted 23 June, 2023;
originally announced June 2023.
-
DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation
Authors:
Felipe Garrido-Lucero,
Benjamin Heymann,
Maxime Vono,
Patrick Loiseau,
Vianney Perchet
Abstract:
We consider the dataset valuation problem, that is, the problem of quantifying the incremental gain, to some relevant pre-defined utility of a machine learning task, of aggregating an individual dataset to others. The Shapley value is a natural tool to perform dataset valuation due to its formal axiomatic justification, which can be combined with Monte Carlo integration to overcome the computational tractability challenges. Such generic approximation methods, however, remain expensive in some cases. In this paper, we exploit the knowledge about the structure of the dataset valuation problem to devise more efficient Shapley value estimators. We propose a novel approximation, referred to as discrete uniform Shapley, which is expressed as an expectation under a discrete uniform distribution with support of reasonable size. We justify the relevancy of the proposed framework via asymptotic and non-asymptotic theoretical guarantees and illustrate its benefits via an extensive set of numerical experiments.
Submitted 4 November, 2024; v1 submitted 3 June, 2023;
originally announced June 2023.
-
Understanding Blockchain Governance: Analyzing Decentralized Voting to Amend DeFi Smart Contracts
Authors:
Johnnatan Messias,
Vabuk Pahari,
Balakrishnan Chandrasekaran,
Krishna P. Gummadi,
Patrick Loiseau
Abstract:
Smart contracts are contractual agreements between participants of a blockchain, who cannot implicitly trust one another. They are software programs that run on top of a blockchain, and we may need to change them from time to time (e.g., to fix bugs or address new use cases). Governance protocols define the means for amending or changing these smart contracts without any centralized authority. They distribute the decision-making power to every user of the smart contract: Users vote on accepting or rejecting every change.
In this work, we review and characterize decentralized governance in practice, using Compound and Uniswap -- two widely used governance protocols -- as a case study. We reveal a high concentration of voting power in both Compound and Uniswap: 10 voters hold together 57.86% and 44.72% of the voting power, respectively. Although proposals to change or amend the protocol receive, on average, a substantial number of votes (i.e., 89.39%) in favor within the Compound protocol, they require fewer than three voters to obtain 50% or more votes. We show that voting on Compound proposals can be unfairly expensive for small token holders, and we discover voting coalitions that can further marginalize these users.
Submitted 21 April, 2024; v1 submitted 28 May, 2023;
originally announced May 2023.
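The concentration figures in this abstract (the share held by the top 10 voters, and the number of voters needed to reach 50% of the votes) can be computed from a list of per-voter voting power. The sketch below is a minimal illustration of those two statistics, not the authors' analysis pipeline; the function names and the toy numbers are hypothetical and are not taken from Compound or Uniswap data.

```python
def top_k_share(voting_power: list[float], k: int) -> float:
    """Fraction of total voting power held by the k largest holders."""
    total = sum(voting_power)
    if total == 0:
        return 0.0
    return sum(sorted(voting_power, reverse=True)[:k]) / total


def min_voters_for_share(voting_power: list[float], threshold: float) -> int:
    """Smallest number of top holders whose combined share reaches threshold."""
    total = sum(voting_power)
    running = 0.0
    for count, power in enumerate(sorted(voting_power, reverse=True), start=1):
        running += power
        if running >= threshold * total:
            return count
    return len(voting_power)


# Hypothetical toy distribution of voting power (illustrative only).
toy_power = [40, 25, 15, 10, 5, 3, 2]
share_top_2 = top_k_share(toy_power, 2)
voters_for_half = min_voters_for_share(toy_power, 0.5)
print(share_top_2, voters_for_half)
```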
-
Dissecting Bitcoin and Ethereum Transactions: On the Lack of Transaction Contention and Prioritization Transparency in Blockchains
Authors:
Johnnatan Messias,
Vabuk Pahari,
Balakrishnan Chandrasekaran,
Krishna P. Gummadi,
Patrick Loiseau
Abstract:
In permissionless blockchains, transaction issuers include a fee to incentivize miners to include their transactions. To accurately estimate this prioritization fee for a transaction, transaction issuers (or blockchain participants, more generally) rely on two fundamental notions of transparency, namely contention and prioritization transparency. Contention transparency implies that participants are aware of every pending transaction that will contend with a given transaction for inclusion. Prioritization transparency states that the participants are aware of the transaction or prioritization fees paid by every such contending transaction. Neither of these notions of transparency holds well today. Private relay networks, for instance, allow users to send transactions privately to miners. Besides, users can offer fees to miners via either direct transfers to miners' wallets or off-chain payments -- neither of which is public. In this work, we characterize the lack of contention and prioritization transparency in Bitcoin and Ethereum resulting from such practices. We show that private relay networks are widely used and private transactions are quite prevalent. We show that the lack of transparency makes it easier for miners to collude and overcharge users, who may use these private relay networks even though they offer little to no guarantee on transaction prioritization. The lack of these forms of transparency in blockchains has crucial implications for transaction issuers as well as the stability of blockchains. Finally, we make our data sets and scripts publicly available.
Submitted 24 May, 2023; v1 submitted 14 February, 2023;
originally announced February 2023.
-
Bounding and Approximating Intersectional Fairness through Marginal Fairness
Authors:
Mathieu Molina,
Patrick Loiseau
Abstract:
Discrimination in machine learning often arises along multiple dimensions (a.k.a. protected attributes); it is then desirable to ensure \emph{intersectional fairness} -- i.e., that no subgroup is discriminated against. It is known that ensuring \emph{marginal fairness} for every dimension independently is not sufficient in general. Due to the exponential number of subgroups, however, directly measuring intersectional fairness from data is impossible. In this paper, our primary goal is to understand in detail the relationship between marginal and intersectional fairness through statistical analysis. We first identify a set of sufficient conditions under which an exact relationship can be obtained. Then, we prove bounds (easily computable through marginal fairness and other meaningful statistical quantities) in high-probability on intersectional fairness in the general case. Beyond their descriptive value, we show that these theoretical bounds can be leveraged to derive a heuristic improving the approximation and bounds of intersectional fairness by choosing, in a relevant manner, protected attributes for which we describe intersectional subgroups. Finally, we test the performance of our approximations and bounds on real and synthetic data-sets.
Submitted 23 June, 2023; v1 submitted 12 June, 2022;
originally announced June 2022.
-
Fairness in Selection Problems with Strategic Candidates
Authors:
Vitalii Emelianov,
Nicolas Gast,
Patrick Loiseau
Abstract:
To better understand discrimination and the effect of affirmative action in selection problems (e.g., college admission or hiring), a recent line of research proposed a model based on differential variance. This model assumes that the decision-maker has a noisy estimate of each candidate's quality and puts forward the difference in the noise variances between different demographic groups as a key factor to explain discrimination. The literature on differential variance, however, does not consider the strategic behavior of candidates who can react to the selection procedure to improve their outcome, which is well-known to happen in many domains.
In this paper, we study how the strategic aspect affects fairness in selection problems. We propose to model selection problems with strategic candidates as a contest game: A population of rational candidates compete by choosing an effort level to increase their quality. They incur a cost-of-effort but get a (random) quality whose expectation equals the chosen effort. A Bayesian decision-maker observes a noisy estimate of the quality of each candidate (with differential variance) and selects the fraction $α$ of best candidates based on their posterior expected quality; each selected candidate receives a reward $S$. We characterize the (unique) equilibrium of this game in the different parameters' regimes, both when the decision-maker is unconstrained and when they are constrained to respect the fairness notion of demographic parity. Our results reveal important impacts of the strategic behavior on the discrimination observed at equilibrium and allow us to understand the effect of imposing demographic parity in this context. In particular, we find that, in many cases, the results contrast with the non-strategic setting.
Submitted 24 May, 2022;
originally announced May 2022.
-
Pareto-Optimal Fairness-Utility Amortizations in Rankings with a DBN Exposure Model
Authors:
Till Kletti,
Jean-Michel Renders,
Patrick Loiseau
Abstract:
In recent years, it has become clear that rankings delivered in many areas need not only be useful to the users but also respect fairness of exposure for the item producers. We consider the problem of finding ranking policies that achieve a Pareto-optimal tradeoff between these two aspects. Several methods were proposed to solve it; for instance a popular one is to use linear programming with a Birkhoff-von Neumann decomposition. These methods, however, are based on a classical Position Based exposure Model (PBM), which assumes independence between the items (hence the exposure only depends on the rank). In many applications, this assumption is unrealistic and the community increasingly moves towards considering other models that include dependences, such as the Dynamic Bayesian Network (DBN) exposure model. For such models, computing (exact) optimal fair ranking policies remains an open question.
We answer this question by leveraging a new geometrical method based on the so-called expohedron proposed recently for the PBM (Kletti et al., WSDM'22). We lay out the structure of a new geometrical object (the DBN-expohedron), and propose for it a Carathéodory decomposition algorithm of complexity $O(n^3)$, where $n$ is the number of documents to rank. Such an algorithm enables expressing any feasible expected exposure vector as a distribution over at most $n$ rankings; furthermore we show that we can compute the whole set of Pareto-optimal expected exposure vectors with the same complexity $O(n^3)$. Our work constitutes the first exact algorithm able to efficiently find a Pareto-optimal distribution of rankings. It is applicable to a broad range of fairness notions, including classical notions of meritocratic and demographic fairness. We empirically evaluate our method on the TREC2020 and MSLR datasets and compare it to several baselines in terms of Pareto-optimality and speed.
Submitted 16 May, 2022;
originally announced May 2022.
-
Introducing the Expohedron for Efficient Pareto-optimal Fairness-Utility Amortizations in Repeated Rankings
Authors:
Till Kletti,
Jean-Michel Renders,
Patrick Loiseau
Abstract:
We consider the problem of computing a sequence of rankings that maximizes consumer-side utility while minimizing producer-side individual unfairness of exposure. While prior work has addressed this problem using linear or quadratic programs on bistochastic matrices, such approaches, relying on Birkhoff-von Neumann (BvN) decompositions, are too slow to be implemented at large scale.
In this paper we introduce a geometrical object, a polytope that we call expohedron, whose points represent all achievable exposures of items for a Position Based Model (PBM). We exhibit some of its properties and lay out a Carathéodory decomposition algorithm with complexity $O(n^2\log(n))$ able to express any point inside the expohedron as a convex sum of at most $n$ vertices, where $n$ is the number of items to rank. Such a decomposition makes it possible to express any feasible target exposure as a distribution over at most $n$ rankings. Furthermore we show that we can use this polytope to recover the whole Pareto frontier of the multi-objective fairness-utility optimization problem, using a simple geometrical procedure with complexity $O(n^2\log(n))$. Our approach compares favorably to linear or quadratic programming baselines in terms of algorithmic complexity and empirical runtime and is applicable to any merit that is a non-decreasing function of item relevance. Furthermore our solution can be expressed as a distribution over only $n$ permutations, instead of the $(n-1)^2 + 1$ achieved with BvN decompositions. We perform experiments on synthetic and real-world datasets, confirming our theoretical results.
Submitted 7 February, 2022;
originally announced February 2022.
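As a rough illustration of the objects named in this abstract, the sketch below computes the expected exposure of each item under a Position Based Model for a distribution over rankings, and compares the support sizes quoted in the abstract ($n$ rankings versus $(n-1)^2+1$ with BvN decompositions). It is not the authors' Carathéodory decomposition algorithm. The function names and the DCG-style position weights $1/\log_2(k+1)$ are assumptions made for this example.

```python
import math


def pbm_expected_exposure(rankings, probs, weights):
    """Expected exposure per item under a PBM.

    rankings: list of tuples; ranking[position] is the item shown there.
    probs: probability of each ranking (should sum to 1).
    weights: exposure weight of each position (assumed, DCG-style below).
    """
    exposure = [0.0] * len(weights)
    for ranking, prob in zip(rankings, probs):
        for position, item in enumerate(ranking):
            exposure[item] += prob * weights[position]
    return exposure


def bvn_max_terms(n: int) -> int:
    """Support size quoted in the abstract for BvN decompositions."""
    return (n - 1) ** 2 + 1


def expohedron_max_terms(n: int) -> int:
    """Support size quoted in the abstract for the expohedron approach."""
    return n


n = 3
weights = [1 / math.log2(k + 2) for k in range(n)]  # assumed position weights
rankings = [(0, 1, 2), (1, 0, 2)]  # items 0 and 1 alternate at the top
probs = [0.5, 0.5]
exposure = pbm_expected_exposure(rankings, probs, weights)
print(exposure, bvn_max_terms(100), expohedron_max_terms(100))
```

In this toy case, mixing the two rankings equalizes the exposure of items 0 and 1, which is the kind of target exposure a distribution over rankings is meant to realize.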
-
On Fair Selection in the Presence of Implicit and Differential Variance
Authors:
Vitalii Emelianov,
Nicolas Gast,
Krishna P. Gummadi,
Patrick Loiseau
Abstract:
Discrimination in selection problems such as hiring or college admission is often explained by implicit bias from the decision maker against disadvantaged demographic groups. In this paper, we consider a model where the decision maker receives a noisy estimate of each candidate's quality, whose variance depends on the candidate's group -- we argue that such differential variance is a key feature of many selection problems. We analyze two notable settings: in the first, the noise variances are unknown to the decision maker who simply picks the candidates with the highest estimated quality independently of their group; in the second, the variances are known and the decision maker picks candidates having the highest expected quality given the noisy estimate. We show that both baseline decision makers yield discrimination, although in opposite directions: the first leads to underrepresentation of the low-variance group while the second leads to underrepresentation of the high-variance group. We study the effect on the selection utility of imposing a fairness mechanism that we term the $γ$-rule (it is an extension of the classical four-fifths rule and it also includes demographic parity). In the first setting (with unknown variances), we prove that under mild conditions, imposing the $γ$-rule increases the selection utility -- here there is no trade-off between fairness and utility. In the second setting (with known variances), imposing the $γ$-rule decreases the utility but we prove a bound on the utility loss due to the fairness mechanism.
Submitted 10 December, 2021;
originally announced December 2021.
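The abstract does not spell out the $γ$-rule, so the sketch below rests on an assumption: that it generalizes the four-fifths rule by requiring every group's selection rate to be at least $γ$ times the highest group's rate, so that $γ=0.8$ recovers the four-fifths rule and $γ=1$ recovers demographic parity. The function name and the toy counts are hypothetical.

```python
def satisfies_gamma_rule(selected, totals, gamma, tol=1e-12):
    """Check an assumed form of the gamma-rule on group selection rates.

    selected: dict mapping group -> number of selected candidates.
    totals: dict mapping group -> number of candidates in the group.
    gamma: required ratio between the lowest and highest selection rate.
    """
    rates = {group: selected[group] / totals[group] for group in totals}
    highest = max(rates.values())
    return all(rate >= gamma * highest - tol for rate in rates.values())


# Hypothetical toy counts: selection rates are 0.50 and 0.42.
totals = {"A": 100, "B": 100}
selected = {"A": 50, "B": 42}
passes_four_fifths = satisfies_gamma_rule(selected, totals, gamma=0.8)
passes_parity = satisfies_gamma_rule(selected, totals, gamma=1.0)
print(passes_four_fifths, passes_parity)
```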
-
Selfish & Opaque Transaction Ordering in the Bitcoin Blockchain: The Case for Chain Neutrality
Authors:
Johnnatan Messias,
Mohamed Alzayat,
Balakrishnan Chandrasekaran,
Krishna P. Gummadi,
Patrick Loiseau,
Alan Mislove
Abstract:
Most public blockchain protocols, including the popular Bitcoin and Ethereum blockchains, do not formally specify the order in which miners should select transactions from the pool of pending (or uncommitted) transactions for inclusion in the blockchain. Over the years, informal conventions or "norms" for transaction ordering have, however, emerged via the use of shared software by miners, e.g., the GetBlockTemplate (GBT) mining protocol in Bitcoin Core. Today, a widely held view is that Bitcoin miners prioritize transactions based on their offered "transaction fee-per-byte." Bitcoin users are, consequently, encouraged to increase the fees to accelerate the commitment of their transactions, particularly during periods of congestion. In this paper, we audit the Bitcoin blockchain and present statistically significant evidence of mining pools deviating from the norms to accelerate the commitment of transactions for which they have (i) a selfish or vested interest, or (ii) received dark-fee payments via opaque (non-public) side-channels. As blockchains are increasingly being used as a record-keeping substrate for a variety of decentralized (financial technology) systems, our findings call for an urgent discussion on defining neutrality norms that miners must adhere to when ordering transactions in the chains. Finally, we make our data sets and scripts publicly available.
Submitted 22 October, 2021;
originally announced October 2021.
-
Scalable Optimal Classifiers for Adversarial Settings under Uncertainty
Authors:
Patrick Loiseau,
Benjamin Roussillon
Abstract:
We consider the problem of finding optimal classifiers in an adversarial setting where the class-1 data is generated by an attacker whose objective is not known to the defender -- an aspect that is key to realistic applications but has so far been overlooked in the literature. To model this situation, we propose a Bayesian game framework where the defender chooses a classifier with no a priori restriction on the set of possible classifiers. The key difficulty in the proposed framework is that the set of possible classifiers is exponential in the set of possible data, which is itself exponential in the number of features used for classification. To counter this, we first show that Bayesian Nash equilibria can be characterized completely via functional threshold classifiers with a small number of parameters. We then show that this low-dimensional characterization enables us to develop a training method to compute provably approximately optimal classifiers in a scalable manner, and to develop a learning algorithm for the online setting with low regret (both independent of the dimension of the set of possible data). We illustrate our results through simulations.
Submitted 25 October, 2021; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Asymptotic Degradation of Linear Regression Estimates With Strategic Data Sources
Authors:
Benjamin Roussillon,
Nicolas Gast,
Patrick Loiseau,
Panayotis Mertikopoulos
Abstract:
We consider the problem of linear regression from strategic data sources with a public good component, i.e., when data is provided by strategic agents who seek to minimize an individual provision cost for increasing their data's precision while benefiting from the model's overall precision. In contrast to previous works, our model tackles the case where there is uncertainty on the attributes characterizing the agents' data -- a critical aspect of the problem when the number of agents is large. We provide a characterization of the game's equilibrium, which reveals an interesting connection with optimal design. Subsequently, we focus on the asymptotic behavior of the covariance of the linear regression parameters estimated via generalized least squares as the number of data sources becomes large. We provide upper and lower bounds for this covariance matrix and we show that, when the agents' provision costs are superlinear, the model's covariance converges to zero but at a slower rate relative to virtually all learning problems with exogenous data. On the other hand, if the agents' provision costs are linear, this covariance fails to converge. This shows that even the basic property of consistency of generalized least squares estimators is compromised when the data sources are strategic.
Submitted 11 March, 2022; v1 submitted 28 June, 2021;
originally announced June 2021.
-
Modeling Coordinated vs. P2P Mining: An Analysis of Inefficiency and Inequality in Proof-of-Work Blockchains
Authors:
Mohamed Alzayat,
Johnnatan Messias,
Balakrishnan Chandrasekaran,
Krishna P. Gummadi,
Patrick Loiseau
Abstract:
We study efficiency in a proof-of-work blockchain with non-zero latencies, focusing in particular on the (inequality in) individual miners' efficiencies. Prior work attributed differences in miners' efficiencies mostly to attacks, but we pursue a different question: Can inequality in miners' efficiencies be explained by delays, even when all miners are honest? Traditionally, such efficiency-related questions were tackled only at the level of the overall system, and in a peer-to-peer (P2P) setting where miners directly connect to one another. Despite it being common today for miners to pool compute capacities in a mining pool managed by a centralized coordinator, efficiency in such a coordinated setting has barely been studied.
In this paper, we propose a simple model of a proof-of-work blockchain with latencies for both the P2P and the coordinated settings. We derive a closed-form expression for the efficiency in the coordinated setting with an arbitrary number of miners and arbitrary latencies, both for the overall system and for each individual miner. We leverage this result to show that inequalities arise from variability in the delays, but that if all miners are equidistant from the coordinator, they have equal efficiency irrespective of their compute capacities. We then prove that, under a natural consistency condition, the overall system efficiency in the P2P setting is higher than that in the coordinated setting. Finally, we perform a simulation-based study to demonstrate that even in the P2P setting delays between miners introduce inequalities, and that there is a more complex interplay between delays and compute capacities.
Submitted 5 June, 2021;
originally announced June 2021.
-
Colonel Blotto Games with Favoritism: Competitions with Pre-allocations and Asymmetric Effectiveness
Authors:
Dong Quan Vu,
Patrick Loiseau
Abstract:
We introduce the Colonel Blotto game with favoritism, an extension of the famous Colonel Blotto game where the winner-determination rule is generalized to include pre-allocations and asymmetry of the players' resources effectiveness on each battlefield. Such favoritism is found in many classical applications of the Colonel Blotto game. We focus on the Nash equilibrium. First, we consider the closely related model of all-pay auctions with favoritism and completely characterize its equilibrium. Based on this result, we prove the existence of a set of optimal univariate distributions -- which serve as candidate marginals for an equilibrium -- of the Colonel Blotto game with favoritism and show an explicit construction thereof. In several particular cases, this directly leads to an equilibrium of the Colonel Blotto game with favoritism. In other cases, we use these optimal univariate distributions to derive an approximate equilibrium with well-controlled approximation error. Finally, we propose an algorithm -- based on the notion of winding number in parametric curves -- to efficiently compute an approximation of the proposed optimal univariate distributions with arbitrarily small error.
Submitted 1 June, 2021;
originally announced June 2021.
-
On Fair Selection in the Presence of Implicit Variance
Authors:
Vitalii Emelianov,
Nicolas Gast,
Krishna P. Gummadi,
Patrick Loiseau
Abstract:
Quota-based fairness mechanisms like the so-called Rooney rule or four-fifths rule are used in selection problems such as hiring or college admission to reduce inequalities based on sensitive demographic attributes. These mechanisms are often viewed as introducing a trade-off between selection fairness and utility. In recent work, however, Kleinberg and Raghavan showed that, in the presence of implicit bias in estimating candidates' quality, the Rooney rule can increase the utility of the selection process.
We argue that even in the absence of implicit bias, the estimates of candidates' quality from different groups may differ in another fundamental way, namely, in their variance. We term this phenomenon implicit variance and we ask: can fairness mechanisms be beneficial to the utility of a selection process in the presence of implicit variance (even in the absence of implicit bias)? To answer this question, we propose a simple model in which candidates have a true latent quality that is drawn from a group-independent normal distribution. To make the selection, a decision maker receives an unbiased estimate of the quality of each candidate, with normal noise, but whose variance depends on the candidate's group. We then compare the utility obtained by imposing a fairness mechanism that we term $γ$-rule (it includes demographic parity and the four-fifths rule as special cases), to that of a group-oblivious selection algorithm that picks the candidates with the highest estimated quality independently of their group. Our main result shows that the demographic parity mechanism always increases the selection utility, while any $γ$-rule weakly increases it. We extend our model to a two-stage selection process where the true quality is observed at the second stage. We discuss multiple extensions of our results, in particular to different distributions of the true latent quality.
Submitted 24 June, 2020;
originally announced June 2020.
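
The two selection rules compared in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' model: candidates are hypothetical `(name, group, estimated_quality)` tuples, and demographic parity is implemented as selecting the same fraction of each group, best-first within the group.

```python
from collections import defaultdict


def group_oblivious(candidates, k):
    """Pick the k highest estimated qualities, ignoring group membership."""
    return sorted(candidates, key=lambda c: c[2], reverse=True)[:k]


def demographic_parity(candidates, k):
    """Select the same fraction k/n from every group, best-first per group.

    Assumption: group sizes make the per-group quotas integral; with
    rounding, the quotas may not sum exactly to k in general.
    """
    rate = k / len(candidates)
    by_group = defaultdict(list)
    for c in candidates:
        by_group[c[1]].append(c)
    selected = []
    for members in by_group.values():
        quota = round(rate * len(members))
        members.sort(key=lambda c: c[2], reverse=True)
        selected.extend(members[:quota])
    return selected


# Hypothetical data: group B has one strong candidate and weaker estimates otherwise.
candidates = [
    ("a1", "A", 9.0), ("a2", "A", 8.0), ("a3", "A", 7.0), ("a4", "A", 6.0),
    ("b1", "B", 7.5), ("b2", "B", 5.0), ("b3", "B", 4.0), ("b4", "B", 3.0),
]
oblivious_names = [c[0] for c in group_oblivious(candidates, 4)]
parity_names = sorted(c[0] for c in demographic_parity(candidates, 4))
print(oblivious_names, parity_names)
```

The sketch only shows how the two rules pick different sets; whether parity raises utility depends on the noise model described in the abstract, which is not simulated here.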
-
Path Planning Problems with Side Observations-When Colonels Play Hide-and-Seek
Authors:
Dong Quan Vu,
Patrick Loiseau,
Alonso Silva,
Long Tran-Thanh
Abstract:
Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions. Indeed, due to their extremely large strategy space, it remains an open question how one can efficiently learn in these games. In this work, we show that the online CB and HS games can be cast as path planning problems with side-observations (SOPPP): at each stage, a learner chooses a path on a directed acyclic graph and suffers the sum of losses that are adversarially assigned to the corresponding edges; and she then receives semi-bandit feedback with side-observations (i.e., she observes the losses on the chosen edges plus some others). We propose a novel algorithm, EXP3-OE, the first-of-its-kind with guaranteed efficient running time for SOPPP without requiring any auxiliary oracle. We provide an expected-regret bound of EXP3-OE in SOPPP matching the order of the best benchmark in the literature. Moreover, we introduce additional assumptions on the observability model under which we can further improve the regret bounds of EXP3-OE. We illustrate the benefit of using EXP3-OE in SOPPP by applying it to the online CB and HS games.
Submitted 21 November, 2019; v1 submitted 19 November, 2019;
originally announced November 2019.
-
Approximate Equilibria in Generalized Colonel Blotto and Generalized Lottery Blotto Games
Authors:
Dong Quan Vu,
Patrick Loiseau,
Alonso Silva
Abstract:
In the Colonel Blotto game, two players with a fixed budget simultaneously allocate their resources across n battlefields to maximize the aggregate value gained from the battlefields where they have the higher allocation. Despite its long-standing history and important applications, the Colonel Blotto game still lacks a complete Nash equilibrium characterization in its most general form where players are asymmetric and battlefields' values are heterogeneous across battlefields and different between the two players---this is called the Generalized Colonel Blotto game. In this work, we propose a simply-constructed class of strategies---the independently uniform strategies---and we prove that they are approximate equilibria of the Generalized Colonel Blotto game; moreover, we characterize the approximation error according to the game's parameters. We also consider an extension called the Generalized Lottery Blotto game, with stochastic winner-determination rules allowing more flexibility in modeling practical contests. We prove that the proposed strategies are also approximate equilibria of the Generalized Lottery Blotto game.
Submitted 3 November, 2020; v1 submitted 15 October, 2019;
originally announced October 2019.
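
The winner-determination rule described in this abstract can be written down in a few lines. This is a minimal sketch: the function name `blotto_payoffs` and the sample numbers are hypothetical, and the tie rule (nobody gains a tied battlefield) is an assumption, since the abstract does not specify one.

```python
def blotto_payoffs(alloc_a, alloc_b, values_a, values_b):
    """Payoffs of a one-shot Colonel Blotto game with player-specific values.

    Each battlefield goes to the player with the strictly higher allocation,
    who gains their own value for it. Assumption: ties give nothing to
    either player.
    """
    pay_a = pay_b = 0.0
    for xa, xb, va, vb in zip(alloc_a, alloc_b, values_a, values_b):
        if xa > xb:
            pay_a += va
        elif xb > xa:
            pay_b += vb
    return pay_a, pay_b


# Three battlefields; the players have different budgets (5 vs 4) and
# value the battlefields differently, as in the generalized game.
payoffs = blotto_payoffs(
    alloc_a=[3, 1, 1], alloc_b=[1, 2, 1],
    values_a=[1, 2, 3], values_b=[2, 2, 1],
)
print(payoffs)
```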
-
Nonzero-sum Adversarial Hypothesis Testing Games
Authors:
Sarath Yasodharan,
Patrick Loiseau
Abstract:
We study nonzero-sum hypothesis testing games that arise in the context of adversarial classification, in both the Bayesian as well as the Neyman-Pearson frameworks. We first show that these games admit mixed strategy Nash equilibria, and then we examine some interesting concentration phenomena of these equilibria. Our main results are on the exponential rates of convergence of classification errors at equilibrium, which are analogous to the well-known Chernoff-Stein lemma and Chernoff information that describe the error exponents in the classical binary hypothesis testing problem, but with parameters derived from the adversarial model. The results are validated through numerical experiments.
Submitted 28 September, 2019;
originally announced September 2019.
-
Combinatorial Bandits for Sequential Learning in Colonel Blotto Games
Authors:
Dong Quan Vu,
Patrick Loiseau,
Alonso Silva
Abstract:
The Colonel Blotto game is a renowned resource allocation problem with a long-standing literature in game theory (almost 100 years). However, its scope of application is still restricted by the lack of studies on the incomplete-information situations where a learning model is needed. In this work, we propose and study a regret-minimization model where a learner repeatedly plays the Colonel Blotto game against several adversaries. At each stage, the learner distributes her budget of resources on a fixed number of battlefields to maximize the aggregate value of battlefields she wins; each battlefield is won if no adversary has a higher allocation. We focus on the bandit feedback setting. We first show that it can be modeled as a path planning problem. It is then possible to use the classical COMBAND algorithm to guarantee a sub-linear regret in terms of time horizon, but this entails two fundamental challenges: (i) the computation is inefficient due to the huge size of the action set, and (ii) the standard exploration distribution leads to a loose guarantee in practice. To address the first, we construct a modified algorithm that can be efficiently implemented by applying a dynamic programming technique called weight pushing; for the second, we propose methods optimizing the exploration distribution to improve the regret bound. Finally, we implement our proposed algorithm and perform numerical experiments that show the regret improvement in practice.
Submitted 11 September, 2019;
originally announced September 2019.
-
The Price of Local Fairness in Multistage Selection
Authors:
Vitalii Emelianov,
George Arvanitakis,
Nicolas Gast,
Krishna Gummadi,
Patrick Loiseau
Abstract:
The rise of algorithmic decision making has led to active research on how to define and guarantee fairness, mostly focusing on one-shot decision making. In several important applications such as hiring, however, decisions are made in multiple stages with additional information at each stage. In such cases, fairness issues remain poorly understood.
In this paper we study fairness in $k$-stage selection problems where additional features are observed at every stage. We first introduce two fairness notions, local (per stage) and global (final stage) fairness, that extend the classical fairness notions to the $k$-stage setting. We propose a simple model based on a probabilistic formulation and show that the locally and globally fair selections that maximize precision can be computed via a linear program. We then define the price of local fairness to measure the loss of precision induced by local constraints, and investigate this quantity theoretically and empirically. In particular, our experiments show that the price of local fairness is generally smaller when the sensitive attribute is observed at the first stage; but globally fair selections are more locally fair when the sensitive attribute is observed at the second stage---hence in both cases it is often possible to have a selection that has a small price of local fairness and is close to locally fair.
Submitted 15 June, 2019;
originally announced June 2019.
-
Path Planning Problems with Side Observations-When Colonels Play Hide-and-Seek
Authors:
Dong Quan Vu,
Patrick Loiseau,
Alonso Silva,
Long Tran-Thanh
Abstract:
Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions. Indeed, due to their extremely large strategy space, it remains an open question how one can efficiently learn in these games. In this work, we show that the online CB and HS games can be cast as path planning problems with side-observations (SOPPP): at each stage, a learner chooses a path on a directed acyclic graph and suffers the sum of losses that are adversarially assigned to the corresponding edges; and she then receives semi-bandit feedback with side-observations (i.e., she observes the losses on the chosen edges plus some others). We propose a novel algorithm, EXP3-OE, the first-of-its-kind with guaranteed efficient running time for SOPPP without requiring any auxiliary oracle. We provide an expected-regret bound of EXP3-OE in SOPPP matching the order of the best benchmark in the literature. Moreover, we introduce additional assumptions on the observability model under which we can further improve the regret bounds of EXP3-OE. We illustrate the benefit of using EXP3-OE in SOPPP by applying it to the online CB and HS games.
Submitted 21 November, 2019; v1 submitted 27 May, 2019;
originally announced May 2019.
-
Equality of Voice: Towards Fair Representation in Crowdsourced Top-K Recommendations
Authors:
Abhijnan Chakraborty,
Gourab K Patro,
Niloy Ganguly,
Krishna P. Gummadi,
Patrick Loiseau
Abstract:
To help their users to discover important items at a particular time, major websites like Twitter, Yelp, TripAdvisor or NYTimes provide Top-K recommendations (e.g., 10 Trending Topics, Top 5 Hotels in Paris or 10 Most Viewed News Stories), which rely on crowdsourced popularity signals to select the items. However, different sections of a crowd may have different preferences, and there is a large silent majority who do not explicitly express their opinion. Also, the crowd often consists of actors like bots, spammers, or people running orchestrated campaigns. Recommendation algorithms today largely do not consider such nuances, hence are vulnerable to strategic manipulation by small but hyper-active user groups.
To fairly aggregate the preferences of all users while recommending top-K items, we borrow ideas from prior research on social choice theory, and identify a voting mechanism called Single Transferable Vote (STV) as having many of the fairness properties we desire in top-K item (s)elections. We develop an innovative mechanism to attribute preferences to the silent majority, which also makes STV completely operational. We show the generalizability of our approach by implementing it on two different real-world datasets. Through extensive experimentation and comparison with state-of-the-art techniques, we show that our proposed approach provides maximum user satisfaction, and cuts down drastically on items disliked by most but hyper-actively promoted by a few users.
Submitted 21 November, 2018;
originally announced November 2018.
-
Forgetting the Forgotten with Letheia, Concealing Content Deletion from Persistent Observers
Authors:
Mohsen Minaei,
Mainack Mondal,
Patrick Loiseau,
Krishna Gummadi,
Aniket Kate
Abstract:
Most social platforms offer mechanisms allowing users to delete their posts, and a significant fraction of users exercise this right to be forgotten. However, ironically, users' attempt to reduce attention to sensitive posts via deletion, in practice, attracts unwanted attention from stalkers specifically to those posts. Thus, deletions may leave users more vulnerable to attacks on their privacy in general. Users hoping to make their posts forgotten face a "damned if I do, damned if I don't" dilemma. Many are shifting towards ephemeral social platforms like Snapchat, which will deprive us of important user-data archival. We present Lethe, a novel solution to this problem of forgetting the forgotten, in the form of intermittent withdrawals. If the next-generation social platforms are willing to give up the uninterrupted availability of non-deleted posts by a very small fraction, Lethe provides privacy to the deleted posts over long durations. In the presence of Lethe, an adversarial observer becomes unsure if some posts are permanently deleted or just temporarily withdrawn by Lethe; at the same time, the adversarial observer is overwhelmed by a large number of falsely flagged undeleted posts. To demonstrate the feasibility and performance of Lethe, we analyze large-scale real data about users' deletion over Twitter and thoroughly investigate how to choose time duration distributions for alternating between temporary withdrawals and resurrections of non-deleted posts. We find a favorable trade-off between privacy, availability and adversarial overhead in different settings for users exercising their right to delete. We show that, even against an ultimate adversary with uninterrupted access to the entire platform, Lethe offers deletion privacy for up to 3 months from the time of deletion, while maintaining content availability as high as 95% and keeping the adversarial precision to 20%.
Submitted 25 September, 2018; v1 submitted 30 October, 2017;
originally announced October 2017.
-
Dissecting demand response mechanisms: the role of consumption forecasts and personalized offers
Authors:
Alberto Benegiamo,
Patrick Loiseau,
Giovanni Neglia
Abstract:
Demand-Response (DR) programs, whereby users of an electricity network are encouraged by economic incentives to rearrange their consumption in order to reduce production costs, are envisioned to be a key feature of the smart grid paradigm. Several recent works proposed DR mechanisms and used analytical models to derive optimal incentives. Most of these works, however, rely on a macroscopic description of the population that does not model individual choices of users. In this paper, we conduct a detailed analysis of those models and we argue that the macroscopic descriptions hide important assumptions that can jeopardize the mechanisms' implementation (such as the ability to make personalized offers and to perfectly estimate the demand that is moved from one timeslot to another). Then, we start from a microscopic description that explicitly models each user's decision. We introduce four DR mechanisms with various assumptions on the provider's capabilities. Contrary to previous studies, we find that the optimization problems that result from our mechanisms are complex and can be solved numerically only through a heuristic. We present numerical simulations that compare the different mechanisms and their sensitivity to forecast errors. At a high level, our results show that the performance of DR mechanisms under reasonable assumptions on the provider's capabilities is significantly lower than
Submitted 12 December, 2016;
originally announced December 2016.
-
A Game-Theoretic Analysis of Adversarial Classification
Authors:
Lemonia Dritsoula,
Patrick Loiseau,
John Musacchio
Abstract:
Attack detection is usually approached as a classification problem. However, standard classification tools often perform poorly because an adaptive attacker can shape his attacks in response to the algorithm. This has led to the recent interest in developing methods for adversarial classification, but to the best of our knowledge, there have been very few prior studies that take into account the attacker's tradeoff between adapting to the classifier being used against him and his desire to maintain the efficacy of his attack. Including this effect is key to deriving solutions that perform well in practice.
In this investigation we model the interaction as a game between a defender who chooses a classifier to distinguish between attacks and normal behavior based on a set of observed features and an attacker who chooses his attack features (class 1 data). Normal behavior (class 0 data) is random and exogenous. The attacker's objective balances the benefit from attacks and the cost of being detected while the defender's objective balances the benefit of a correct attack detection and the cost of false alarm. We provide an efficient algorithm to compute all Nash equilibria and a compact characterization of the possible forms of a Nash equilibrium that reveals intuitive messages on how to perform classification in the presence of an attacker. We also explore qualitatively and quantitatively the impact of the non-attacker and underlying parameters on the equilibrium strategies.
Submitted 22 June, 2017; v1 submitted 17 October, 2016;
originally announced October 2016.
-
Efficient Algorithms for Scheduling Moldable Tasks
Authors:
Xiaohu Wu,
Patrick Loiseau
Abstract:
We study the problem of scheduling $n$ independent moldable tasks on $m$ processors that arises in large-scale parallel computations. When tasks are monotonic, the best known result is a $(\frac{3}{2}+ε)$-approximation algorithm for makespan minimization with a complexity linear in $n$ and polynomial in $\log{m}$ and $\frac{1}ε$ where $ε$ is arbitrarily small. We propose a new perspective of the existing speedup models: the speedup of a task $T_{j}$ is linear when the number $p$ of assigned processors is small (up to a threshold $δ_{j}$) while it presents monotonicity when $p$ ranges in $[δ_{j}, k_{j}]$; the bound $k_{j}$ indicates an unacceptable overhead when parallelizing on too many processors. The generality of this model is proved to be between the classic monotonic and linear-speedup models. For any given integer $δ\geq 5$, let $u=\left\lceil \sqrt[2]δ \right\rceil-1\geq 2$. In this paper, we propose a $\frac{1}{θ(δ)} (1+ε)$-approximation algorithm for makespan minimization where $θ(δ) = \frac{u+1}{u+2}\left( 1- \frac{k}{m} \right)$ ($m\gg k$). As a by-product, we also propose a $θ(δ)$-approximation algorithm for throughput maximization with a common deadline.
Submitted 29 March, 2023; v1 submitted 27 September, 2016;
originally announced September 2016.
-
Towards Designing Cost-Optimal Policies to Utilize IaaS Clouds under Online Learning
Authors:
Xiaohu Wu,
Patrick Loiseau,
Esa Hyytia
Abstract:
Many businesses possess a small infrastructure that they can use for their computing tasks, but also often buy extra computing resources from clouds. Cloud vendors such as Amazon EC2 offer two types of purchase options: on-demand and spot instances. As tenants have limited budgets to satisfy their computing needs, it is crucial for them to determine how to purchase different options and utilize them (in addition to possible self-owned instances) in a cost-effective manner while respecting their response-time targets. In this paper, we propose a framework to design policies to allocate self-owned, on-demand and spot instances to arriving jobs. In particular, we propose a near-optimal policy to determine the number of self-owned instances and an optimal policy to determine the number of on-demand instances to buy and the number of spot instances to bid for at each time unit. Our policies rely on a small number of parameters and we use an online learning technique to infer their optimal values. Through numerical simulations, we show the effectiveness of our proposed policies, in particular that they achieve a cost reduction of up to 64.51% when spot and on-demand instances are considered and of up to 43.74% when self-owned instances are considered, compared to previously proposed or intuitive policies.
Submitted 12 August, 2019; v1 submitted 18 July, 2016;
originally announced July 2016.
-
An Approximate Dynamic Programming Approach to Adversarial Online Learning
Authors:
Vijay Kamble,
Patrick Loiseau,
Jean Walrand
Abstract:
We describe an approximate dynamic programming (ADP) approach to compute approximations of the optimal strategies and of the minimal losses that can be guaranteed in discounted repeated games with vector-valued losses. Such games prominently arise in the analysis of regret in repeated decision-making in adversarial environments, also known as adversarial online learning. At the core of our approach is a characterization of the lower Pareto frontier of the set of expected losses that a player can guarantee in these games as the unique fixed point of a set-valued dynamic programming operator. When applied to the problem of regret minimization with discounted losses, our approach yields algorithms that achieve markedly improved performance bounds compared to off-the-shelf online learning algorithms like Hedge. These results thus suggest the significant potential of ADP-based approaches in adversarial online learning.
Submitted 26 October, 2020; v1 submitted 16 March, 2016;
originally announced March 2016.
-
Improved Competitive Analysis of Online Scheduling Deadline-Sensitive Jobs
Authors:
Patrick Loiseau,
Xiaohu Wu
Abstract:
We consider the following scheduling problem. There is a single machine and the jobs arrive online for completion. Each job j is preemptive and, upon its arrival, its other characteristics are immediately revealed to the machine: the deadline requirement, the workload and the value. The objective is to maximize the aggregate value of jobs completed by their deadlines. The slackness s is defined as the minimum, over all jobs, of the ratio of deadline minus arrival time to workload. A non-committed and a committed online scheduling algorithm are proposed in [Lucier et al., SPAA'13; Azar et al., EC'15], achieving competitive ratios of $2+f(s)$ and $(2+f(s \cdot b))/b$ respectively, where $f(s)=\mathcal{O}\left(\frac{1}{(\sqrt[3]{s}-1)^{2}}\right)$, $b=ω(1-ω)$, $ω$ is in $(0, 1)$, and $s$ is no less than $1/b$. In this paper, without recourse to the dual fitting technique used in the above works, we propose a simpler and more intuitive analytical framework for the two algorithms, improving the competitive ratio of the first algorithm by 1 and therefore improving the competitive ratio of the second algorithm by 1/b. As stated in [Lucier et al., SPAA'13; Azar et al., EC'15], it is justifiable in scenarios like online batch processing for cloud computing that the slackness s is large, hence the big O term in the above competitive ratios can be ignored. Under this assumption, our analysis brings very significant improvements to the competitive ratios of the two algorithms: from 2 to 1 and from 2/b to 1/b respectively.
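The slackness used in this abstract can be illustrated with a small sketch. It only computes the quantity described in the text (the minimum over all jobs of deadline minus arrival time, divided by workload); the `Job` class, its field names, and the sample numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Job:
    """Hypothetical job record; field names are illustrative assumptions."""
    arrival: float
    deadline: float
    workload: float
    value: float


def slackness(jobs: Sequence[Job]) -> float:
    """Return min over jobs of (deadline - arrival) / workload."""
    if not jobs:
        raise ValueError("slackness is undefined for an empty job set")
    return min((j.deadline - j.arrival) / j.workload for j in jobs)


# Made-up instance: the per-job ratios are 5.0, 2.0 and 4.0.
jobs = [
    Job(arrival=0, deadline=10, workload=2, value=5.0),
    Job(arrival=1, deadline=7, workload=3, value=1.0),
    Job(arrival=4, deadline=20, workload=4, value=2.0),
]
print(slackness(jobs))  # smallest ratio across the three jobs
```

A larger slackness means every job has more room between arrival and deadline relative to its workload, which is the regime in which the abstract says the big O term can be ignored.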
Submitted 11 September, 2015; v1 submitted 30 June, 2015;
originally announced June 2015.
-
On the Reliability of Profile Matching Across Large Online Social Networks
Authors:
Oana Goga,
Patrick Loiseau,
Robin Sommer,
Renata Teixeira,
Krishna P. Gummadi
Abstract:
Matching the profiles of a user across multiple online social networks brings opportunities for new services and applications as well as new insights on user online behavior, yet it raises serious privacy concerns. Prior literature has proposed methods to match profiles and showed that it is possible to do it accurately, but using evaluations that focused on sampled datasets only. In this paper, we study the extent to which we can reliably match profiles in practice, across real-world social networks, by exploiting public attributes, i.e., information users publicly provide about themselves. Today's social networks have hundreds of millions of users, which brings completely new challenges as a reliable matching scheme must identify the correct matching profile out of the millions of possible profiles. We first define a set of properties for profile attributes--Availability, Consistency, non-Impersonability, and Discriminability (ACID)--that are both necessary and sufficient to determine the reliability of a matching scheme. Using these properties, we propose a method to evaluate the accuracy of matching schemes in real practical cases. Our results show that the accuracy in practice is significantly lower than the one reported in prior literature. When considering entire social networks, there is a non-negligible number of profiles that belong to different users but have similar attributes, which leads to many false matches. Our paper sheds light on the limits of matching profiles in the real world and illustrates the correct methodology to evaluate matching schemes in realistic scenarios.
Submitted 7 June, 2015;
originally announced June 2015.
-
A Game-Theoretic Study on Non-Monetary Incentives in Data Analytics Projects with Privacy Implications
Authors:
Michela Chessa,
Jens Grossklags,
Patrick Loiseau
Abstract:
The amount of personal information contributed by individuals to digital repositories such as social network sites has grown substantially. The existence of this data offers unprecedented opportunities for data analytics research in various domains of societal importance including medicine and public policy. The results of these analyses can be considered a public good which benefits data contributors as well as individuals who are not making their data available. At the same time, the release of personal information carries perceived and actual privacy risks to the contributors. Our research addresses this problem area. In our work, we study a game-theoretic model in which individuals take control over participation in data analytics projects in two ways: 1) individuals can contribute data at a self-chosen level of precision, and 2) individuals can decide whether they want to contribute at all (or not). From the analyst's perspective, we investigate to which degree the research analyst has flexibility to set requirements for data precision, so that individuals are still willing to contribute to the project, and the quality of the estimation improves. We study this tradeoff scenario for populations of homogeneous and heterogeneous individuals, and determine Nash equilibria that reflect the optimal level of participation and precision of contributions. We further prove that the analyst can substantially increase the accuracy of the analysis by imposing a lower bound on the precision of the data that users can reveal.
Submitted 10 May, 2015;
originally announced May 2015.
-
Algorithms for Scheduling Malleable Cloud Tasks
Authors:
Xiaohu Wu,
Patrick Loiseau
Abstract:
Due to the ubiquity of batch data processing in cloud computing, the related problem of scheduling malleable batch tasks and its extensions have received significant attention recently. In this paper, we consider a fundamental model where a set of n tasks is to be processed on C identical machines and each task is specified by a value, a workload, a deadline and a parallelism bound. Within the parallelism bound, the number of machines assigned to a task can vary over time without affecting its workload. For this model, we obtain two core results: a sufficient and necessary condition such that a set of tasks can be finished by their deadlines on C machines, and an algorithm to produce such a schedule. These core results provide a conceptual tool and an optimal scheduling algorithm that enable proposing new algorithmic analysis and design and improving existing algorithms under various objectives.
Submitted 4 February, 2018; v1 submitted 18 January, 2015;
originally announced January 2015.
-
Causal study of Network Performance
Authors:
Hadrien Hours,
Ernst Biersack,
Patrick Loiseau
Abstract:
The use of the Internet in everyday life has driven its rapid evolution. The heterogeneity of the equipment supporting its networks, as well as the different devices from which it can be accessed, has increased the complexity of understanding its global behavior and performance. In our study we propose a new method for studying the performance of the TCP protocol based on causal graphs. Causal graphs offer models that are easy to interpret and use. They highlight the structural model of the system they represent and give us access to the causal dependences between the different parameters of the system. One of the major contributions of causal graphs is their ability to predict the effects of an intervention from observations made before this intervention.
Submitted 22 April, 2014;
originally announced April 2014.
-
Linear Regression from Strategic Data Sources
Authors:
Nicolas Gast,
Stratis Ioannidis,
Patrick Loiseau,
Benjamin Roussillon
Abstract:
Linear regression is a fundamental building block of statistical data analysis. It amounts to estimating the parameters of a linear model that maps input features to corresponding outputs. In the classical setting where the precision of each data point is fixed, the famous Aitken/Gauss-Markov theorem in statistics states that generalized least squares (GLS) is a so-called "Best Linear Unbiased Estimator" (BLUE). In modern data science, however, one often faces strategic data sources, namely, individuals who incur a cost for providing high-precision data.
In this paper, we study a setting in which features are public but individuals choose the precision of the outputs they reveal to an analyst. We assume that the analyst performs linear regression on this dataset, and individuals benefit from the outcome of this estimation. We model this scenario as a game where individuals minimize a cost comprising two components: (a) an (agent-specific) disclosure cost for providing high-precision data; and (b) a (global) estimation cost representing the inaccuracy in the linear model estimate. In this game, the linear model estimate is a public good that benefits all individuals. We establish that this game has a unique non-trivial Nash equilibrium. We study the efficiency of this equilibrium and we prove tight bounds on the price of stability for a large class of disclosure and estimation costs. Finally, we study the estimator accuracy achieved at equilibrium. We show that, in general, Aitken's theorem does not hold under strategic data sources, though it does hold if individuals have identical disclosure costs (up to a multiplicative factor). When individuals have non-identical costs, we derive a bound on the improvement of the equilibrium estimation cost that can be achieved by deviating from GLS, under mild assumptions on the disclosure cost functions.
Submitted 12 December, 2019; v1 submitted 30 September, 2013;
originally announced September 2013.
-
Incentive Mechanisms for Internet Congestion Management: Fixed-Budget Rebate versus Time-of-Day Pricing
Authors:
Patrick Loiseau,
Galina Schwartz,
John Musacchio,
Saurabh Amin,
S. Shankar Sastry
Abstract:
Mobile data traffic has been steadily rising in the past years. This has generated a significant interest in the deployment of incentive mechanisms to reduce peak-time congestion. Typically, the design of these mechanisms requires information about user demand and sensitivity to prices. Such information is naturally imperfect. In this paper, we propose a fixed-budget rebate mechanism that gives each user a reward proportional to his percentage contribution to the aggregate reduction in peak-time demand. For comparison, we also study a time-of-day pricing mechanism that gives each user a fixed reward per unit reduction of his peak-time demand. To evaluate the two mechanisms, we introduce a game-theoretic model that captures the public good nature of decongestion. For each mechanism, we demonstrate that the socially optimal level of decongestion is achievable for a specific choice of the mechanism's parameter. We then investigate how imperfect information about user demand affects the mechanisms' effectiveness. From our results, the fixed-budget rebate pricing is more robust when the users' sensitivity to congestion is "sufficiently" convex. This feature of the fixed-budget rebate mechanism is attractive for many situations of interest and is driven by its closed-loop property, i.e., the unit reward decreases as the peak-time demand decreases.
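The two reward rules contrasted in this abstract can be sketched as follows. This is only an illustration of the payment rules as described in the text, not the paper's game-theoretic model; the function names, the treatment of a zero total reduction, and the sample numbers are assumptions.

```python
from typing import List, Sequence


def fixed_budget_rebates(reductions: Sequence[float], budget: float) -> List[float]:
    """Split a fixed budget in proportion to each user's share of the
    aggregate reduction in peak-time demand."""
    total = sum(reductions)
    if total <= 0:
        # Assumption: nobody is paid when there is no aggregate reduction.
        return [0.0 for _ in reductions]
    return [budget * r / total for r in reductions]


def time_of_day_rewards(reductions: Sequence[float], unit_reward: float) -> List[float]:
    """Pay each user a fixed reward per unit of peak-time demand reduced."""
    return [unit_reward * r for r in reductions]


# Made-up reductions for three users and a made-up budget.
reductions = [2.0, 3.0, 5.0]
print(fixed_budget_rebates(reductions, budget=100.0))
print(time_of_day_rewards(reductions, unit_reward=4.0))

# Closed-loop property: with a fixed budget, the reward per unit of
# reduction is budget / total reduction, so it shrinks as users cut more.
print(100.0 / sum(reductions), 100.0 / sum(2 * r for r in reductions))
```

Under the fixed-budget rule the total payout never exceeds the budget regardless of how much users reduce, whereas under the per-unit rule the total payout grows with the aggregate reduction.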
Submitted 29 May, 2013;
originally announced May 2013.
-
Un modèle de trafic adapté à la volatilité de charge d'un service de vidéo à la demande: Identification, validation et application à la gestion dynamique de ressources
Authors:
Shubhabrata Roy,
Thomas Begin,
Patrick Loiseau,
Paulo Goncalves
Abstract:
Dynamic resource management has become an active area of research in the Cloud Computing paradigm. The cost of resources varies significantly depending on the configuration used. Hence efficient management of resources is of prime interest to both Cloud Providers and Cloud Users. In this report we suggest a probabilistic resource provisioning approach that can be exploited as the input of a dynamic resource management scheme. Using a Video on Demand use case to justify our claims, we propose an analytical model inspired by standard models of epidemic spreading, to represent sudden and intense workload variations. As an essential step we also derive a heuristic identification procedure to calibrate all the model parameters and evaluate the performance of our estimator on synthetic time series. We show how well our model fits real workload traces with respect to the stationary case in terms of steady-state probability and autocorrelation structure. We find that the resulting model verifies a Large Deviation Principle that statistically characterizes extreme rare events, such as the ones produced by "buzz effects" that may cause workload overflow in the VoD context. This analysis provides valuable insight into the abnormal behaviors that can be expected of systems. We exploit the information obtained using the Large Deviation Principle for the proposed Video on Demand use case for defining policies (Service Level Agreements). We believe these policies for elastic resource provisioning and usage may be of some interest to all stakeholders in the emerging context of cloud networking.
Submitted 2 October, 2012; v1 submitted 24 September, 2012;
originally announced September 2012.
-
Dynamic Resource Management in Clouds: A Probabilistic Approach
Authors:
Paulo Gonçalves,
Shubhabrata Roy,
Thomas Begin,
Patrick Loiseau
Abstract:
Dynamic resource management has become an active area of research in the Cloud Computing paradigm. The cost of resources varies significantly depending on the configuration used. Hence efficient management of resources is of prime interest to both Cloud Providers and Cloud Users. In this work we suggest a probabilistic resource provisioning approach that can be exploited as the input of a dynamic resource management scheme. Using a Video on Demand use case to justify our claims, we propose an analytical model inspired by standard models of epidemic spreading, to represent sudden and intense workload variations. We show that the resulting model verifies a Large Deviation Principle that statistically characterizes extreme rare events, such as the ones produced by "buzz/flash crowd effects" that may cause workload overflow in the VoD context. This analysis provides valuable insight into the abnormal behaviors that can be expected of systems. We exploit the information obtained using the Large Deviation Principle for the proposed Video on Demand use case for defining policies (Service Level Agreements). We believe these policies for elastic resource provisioning and usage may be of some interest to all stakeholders in the emerging context of cloud networking.
Submitted 21 September, 2012;
originally announced September 2012.
-
A Game-Theoretical Approach for Finding Optimal Strategies in an Intruder Classification Game
Authors:
Lemonia Dritsoula,
Patrick Loiseau,
John Musacchio
Abstract:
We consider a game in which a strategic defender classifies an intruder as spy or spammer. The classification is based on the number of file server and mail server attacks observed during a fixed window. The spammer naively attacks (with a known distribution) his main target: the mail server. The spy strategically selects the number of attacks on his main target: the file server. The defender strategically selects his classification policy: a threshold on the number of file server attacks. We model the interaction of the two players (spy and defender) as a nonzero-sum game: The defender needs to balance missed detections and false alarms in his objective function, while the spy has a tradeoff between attacking the file server more aggressively and increasing the chances of getting caught. We give a characterization of the Nash equilibria in mixed strategies, and demonstrate how the Nash equilibria can be computed in polynomial time. Our characterization gives interesting and non-intuitive insights on the players' strategies at equilibrium: The defender uniformly randomizes between a set of thresholds that includes very large values. The strategy of the spy is a truncated version of the spammer's distribution. We present numerical simulations that validate and illustrate our theoretical results.
Submitted 3 July, 2012;
originally announced July 2012.