-
GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression
Authors:
Junghyun Lee,
Kyoungseok Jang,
Kwang-Sung Jun,
Milan Vojnović,
Se-Young Yun
Abstract:
We present `GL-LowPopArt`, a novel Catoni-style estimator for generalized low-rank trace regression. Building on `LowPopArt` (Jang et al., 2024), it employs a two-stage approach: nuclear norm regularization followed by matrix Catoni estimation. We establish state-of-the-art estimation error bounds, surpassing existing guarantees (Fan et al., 2019; Kang et al., 2022), and reveal a novel experimental design objective, $\mathrm{GL}(π)$. The key technical challenge is controlling bias from the nonlinear inverse link function, which we address with our two-stage approach. We prove a *local* minimax lower bound, showing that our `GL-LowPopArt` enjoys instance-wise optimality up to the condition number of the ground-truth Hessian. Applications include generalized linear matrix completion, where `GL-LowPopArt` achieves a state-of-the-art Frobenius error guarantee, and **bilinear dueling bandits**, a novel setting inspired by general preference learning (Zhang et al., 2024). Our analysis of a `GL-LowPopArt`-based explore-then-commit algorithm reveals a new, potentially interesting problem-dependent quantity, along with an improved Borda regret bound compared with vectorization (Wu et al., 2024).
Submitted 3 June, 2025; v1 submitted 3 June, 2025;
originally announced June 2025.
-
What is the Alignment Objective of GRPO?
Authors:
Milan Vojnovic,
Se-Young Yun
Abstract:
In this note, we examine the aggregation of preferences achieved by the Group Relative Policy Optimisation (GRPO) algorithm, a reinforcement learning method used to train advanced artificial intelligence models such as DeepSeek-R1-Zero and DeepSeekMath. The GRPO algorithm trains a policy using a reward preference model, which is computed by sampling a set of outputs for a given context, observing the corresponding rewards, and applying shift-and-scale normalisation to these reward values. Additionally, it incorporates a penalty function to discourage deviations from a reference policy.
We present a framework that enables us to characterise the stationary policies of the GRPO algorithm. This analysis reveals that the aggregation of preferences differs fundamentally from standard logarithmic pooling, which is implemented by other approaches such as RLHF. The precise form of preference aggregation arises from the way the reward preference model is defined and from the penalty function, which we show to essentially correspond to the reverse Kullback-Leibler (KL) divergence between the aggregation policy and the reference policy.
Interestingly, we demonstrate that for groups of size two, the reward preference model corresponds to pairwise comparison preferences, similar to those in other alignment methods based on pairwise comparison feedback. We provide explicit characterisations of the aggregate preference for binary questions, for groups of size two, and in the limit of large group size. This provides insights into the dependence of the aggregate preference on parameters such as the regularisation constant and the confidence margin of question answers.
Finally, we discuss the aggregation of preferences obtained by modifying the GRPO algorithm to use direct KL divergence as the penalty or to use rewards without scale normalisation.
Submitted 13 March, 2025; v1 submitted 25 February, 2025;
originally announced February 2025.
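The shift-and-scale normalisation of rewards described in the GRPO abstract above can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function name `normalised_rewards` is hypothetical, and using the population standard deviation of the group as the scale is an assumption. Under that assumption, a group of size two always yields normalised rewards of +1 and -1, which is consistent with the abstract's remark that groups of size two correspond to pairwise comparison preferences.

```python
from statistics import fmean, pstdev


def normalised_rewards(rewards):
    """Shift-and-scale normalisation of one group of sampled rewards.

    Each reward is shifted by the group mean and divided by the group
    standard deviation (population version; this choice is an assumption).
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    if std == 0.0:
        # All rewards are identical: the group carries no preference signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


# Group of size two: the better output gets +1, the other gets -1,
# regardless of how large the reward difference is.
pair = normalised_rewards([0.9, 0.2])

# Larger group with binary rewards.
group = normalised_rewards([1.0, 0.0, 0.0, 1.0])
```

Because only the sign of the reward difference survives in the size-two case, the normalised reward behaves like the outcome of a pairwise comparison rather than a raw reward value.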
-
An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints
Authors:
Jung-hun Kim,
Milan Vojnovic,
Se-Young Yun
Abstract:
In this study, we consider the infinitely many-armed bandit problems in a rested rotting setting, where the mean reward of an arm may decrease with each pull, while otherwise, it remains unchanged. We explore two scenarios regarding the rotting of rewards: one in which the cumulative amount of rotting is bounded by $V_T$, referred to as the slow-rotting case, and the other in which the cumulative number of rotting instances is bounded by $S_T$, referred to as the abrupt-rotting case. To address the challenge posed by rotting rewards, we introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias and variance trade-off arising due to rotting rewards. Our proposed algorithm achieves tight regret bounds for both slow and abrupt rotting scenarios. Lastly, we demonstrate the performance of our algorithm using numerical experiments.
Submitted 1 June, 2025; v1 submitted 22 April, 2024;
originally announced April 2024.
-
On the Convergence of Loss and Uncertainty-based Active Learning Algorithms
Authors:
Daniel Haimovich,
Dima Karamshuk,
Fridolin Linder,
Niek Tax,
Milan Vojnovic
Abstract:
We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm, where data points are sampled based on either their loss value or uncertainty value. These training methods are particularly relevant for active learning and data subset selection problems. For SGD with a constant step size update, we present convergence results for linear classifiers and linearly separable datasets using squared hinge loss and similar training loss functions. Additionally, we extend our analysis to more general classifiers and datasets, considering a wide range of loss-based sampling strategies and smooth convex training loss functions. We propose a novel algorithm called Adaptive-Weight Sampling (AWS) that utilizes SGD with an adaptive step size that achieves stochastic Polyak's step size in expectation. We establish convergence rate results for AWS for smooth convex training loss functions. Our numerical experiments demonstrate the efficiency of AWS on various datasets by using either exact or estimated loss values.
Submitted 22 November, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Combinatorial Bandits for Maximum Value Reward Function under Max Value-Index Feedback
Authors:
Yiliu Wang,
Wei Chen,
Milan Vojnović
Abstract:
We consider a combinatorial multi-armed bandit problem for maximum value reward function under maximum value and index feedback. This is a new feedback structure that lies in between commonly studied semi-bandit and full-bandit feedback structures. We propose an algorithm and provide a regret bound for problem instances with stochastic arm outcomes according to arbitrary distributions with finite supports. The regret analysis rests on considering an extended set of arms, associated with values and probabilities of arm outcomes, and applying a smoothness condition. Our algorithm achieves an $O((k/Δ)\log(T))$ distribution-dependent and a $\tilde{O}(\sqrt{T})$ distribution-independent regret, where $k$ is the number of arms selected in each round, $Δ$ is a distribution-dependent reward gap, and $T$ is the time horizon. Perhaps surprisingly, the regret bound is comparable to the previously known bound under the more informative semi-bandit feedback. We demonstrate the effectiveness of our algorithm through experimental results.
Submitted 25 May, 2023;
originally announced May 2023.
-
Doubly Adversarial Federated Bandits
Authors:
Jialin Yi,
Milan Vojnović
Abstract:
We study a new non-stochastic federated multi-armed bandit problem with multiple agents collaborating via a communication network. The losses of the arms are assigned by an oblivious adversary that specifies the loss of each arm not only for each time step but also for each agent, which we call "doubly adversarial". In this setting, different agents may choose the same arm in the same time step but observe different feedback. The goal of each agent is to find a globally best arm in hindsight that has the lowest cumulative loss averaged over all agents, which necessitates communication among agents. We provide regret lower bounds for any federated bandit algorithm under different settings, when agents have access to full-information feedback or to bandit feedback. For the bandit feedback setting, we propose a near-optimal federated bandit algorithm called FEDEXP3. Our algorithm gives a positive answer to an open question posed in Cesa-Bianchi et al. (2016): FEDEXP3 can guarantee a sub-linear regret without exchanging sequences of selected arm identities or loss sequences among agents. We also provide numerical evaluations of our algorithm to validate our theoretical results and demonstrate its effectiveness on synthetic and real-world datasets.
Submitted 21 October, 2023; v1 submitted 22 January, 2023;
originally announced January 2023.
-
On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits
Authors:
Jialin Yi,
Milan Vojnović
Abstract:
We consider the nonstochastic multi-agent multi-armed bandit problem with agents collaborating via a communication network with delays. We show a lower bound for individual regret of all agents. We show that with suitable regularizers and communication protocols, a collaborative multi-agent *follow-the-regularized-leader* (FTRL) algorithm has an individual regret upper bound that matches the lower bound up to a constant factor when the number of arms is large enough relative to degrees of agents in the communication graph. We also show that an FTRL algorithm with a suitable regularizer is regret optimal with respect to the scaling with the edge-delay parameter. We present numerical experiments validating our theoretical results and demonstrate cases when our algorithms outperform previously proposed algorithms.
Submitted 21 October, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
Dynamics and Inference for Voter Model Processes
Authors:
Milan Vojnovic,
Kaifang Zhou
Abstract:
We consider a discrete-time voter model process on a set of nodes, each being in one of two states, either 0 or 1. In each time step, each node adopts the state of a randomly sampled neighbor according to sampling probabilities, referred to as node interaction parameters. We study the maximum likelihood estimation of the node interaction parameters from observed node states for a given number of realizations of the voter model process. In contrast to previous work on parameter estimation of network autoregressive processes, whose long-run behavior follows a stationary stochastic process, the voter model is an absorbing stochastic process that eventually reaches a consensus state. This requires developing a framework for deriving parameter estimation error bounds from observations consisting of several realizations of a voter model process. We present parameter estimation error bounds by interpreting the observation data as being generated according to an extended voter process that consists of cycles, each corresponding to a realization of the voter model process until absorption to a consensus state. The consensus time of a voter model process plays an important role in obtaining these results. We present new bounds for all moments of the consensus time, and a bound that holds with any given probability, which may be of independent interest. In contrast to most existing work, our results yield a consensus time bound that holds with high probability. We also present a sampling complexity lower bound for parameter estimation within a prescribed error tolerance for the class of locally stable estimators.
Submitted 24 November, 2022;
originally announced November 2022.
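The voter model dynamics described in the abstract above (every node copies the state of a randomly sampled node at each time step, until consensus) can be simulated with a few lines of code. The sketch below is illustrative only: the function name `simulate_voter_model`, the `max_steps` cap, and allowing a node to sample itself are assumptions made here, not details taken from the paper.

```python
import random


def simulate_voter_model(states, probs, max_steps=10_000, seed=0):
    """Run a synchronous discrete-time voter model until consensus.

    states: list of 0/1 initial node states.
    probs[u][v]: probability that node u copies node v in a time step
        (the node interaction parameters; each row should sum to 1).
    Returns (final_states, steps_taken). The cap max_steps guards against
    configurations that never reach consensus.
    """
    rng = random.Random(seed)
    nodes = list(range(len(states)))
    current = list(states)
    for step in range(max_steps):
        if len(set(current)) == 1:
            # Consensus is absorbing: all nodes share one state.
            return current, step
        # All nodes update simultaneously from the previous configuration.
        current = [
            current[rng.choices(nodes, weights=probs[u])[0]]
            for u in nodes
        ]
    return current, max_steps


# Four nodes, each sampling uniformly among all nodes (itself included).
uniform = [[0.25] * 4 for _ in range(4)]
final_states, consensus_time = simulate_voter_model([0, 1, 0, 1], uniform)
```

Repeating such runs from fresh initial states produces the kind of data the abstract refers to: several realizations of the process, each ending in absorption at a consensus state.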
-
Rotting Infinitely Many-armed Bandits
Authors:
Jung-hun Kim,
Milan Vojnovic,
Se-Young Yun
Abstract:
We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho=o(1)$. We show that this learning problem has an $Ω(\max\{\varrho^{1/3}T,\sqrt{T}\})$ worst-case regret lower bound where $T$ is the horizon time. We show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T,\sqrt{T}\})$, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate $\varrho$. We also show that an $\tilde{O}(\max\{\varrho^{1/3}T,T^{3/4}\})$ regret upper bound can be achieved by an algorithm that does not know the value of $\varrho$, by using an adaptive UCB index along with an adaptive threshold value.
Submitted 17 December, 2023; v1 submitted 30 January, 2022;
originally announced January 2022.
-
Scheduling Servers with Stochastic Bilinear Rewards
Authors:
Jung-hun Kim,
Milan Vojnovic
Abstract:
We address a control system optimization problem that arises in multi-class, multi-server queueing system scheduling with uncertainty. In this scenario, jobs incur holding costs while awaiting completion, and job-server assignments yield observable stochastic rewards with unknown mean values. The rewards for job-server assignments are assumed to follow a bilinear model with respect to features characterizing jobs and servers. Our objective is regret minimization, aiming to maximize the cumulative reward of job-server assignments over a time horizon while maintaining a bounded total job holding cost, thus ensuring queueing system stability. This problem is motivated by applications in computing services and online platforms. To address this problem, we propose a scheduling algorithm based on weighted proportional fair allocation criteria augmented with marginal costs for reward maximization, incorporating a bandit strategy. Our algorithm achieves sub-linear regret and sub-linear mean holding cost (and queue length bound) with respect to the time horizon, thus guaranteeing queueing system stability. Additionally, we establish stability conditions for distributed iterative algorithms for computing allocations, which are relevant to large-scale system applications. Finally, we validate the efficiency of our algorithm through numerical experiments.
Submitted 1 September, 2024; v1 submitted 12 December, 2021;
originally announced December 2021.
-
Pure Exploration and Regret Minimization in Matching Bandits
Authors:
Flore Sentenac,
Jialin Yi,
Clément Calauzènes,
Vianney Perchet,
Milan Vojnovic
Abstract:
Finding an optimal matching in a weighted graph is a standard combinatorial problem. We consider its semi-bandit version where either a pair or a full matching is sampled sequentially. We prove that it is possible to leverage a rank-1 assumption on the adjacency matrix to reduce the sample complexity and the regret of off-the-shelf algorithms, down to a linear dependency on the number of vertices (up to polylogarithmic terms).
Submitted 31 July, 2021;
originally announced August 2021.
-
Scheduling Jobs with Stochastic Holding Costs
Authors:
Dabeen Lee,
Milan Vojnovic
Abstract:
We study a single-server scheduling problem for the objective of minimizing the expected cumulative holding cost incurred by jobs, where parameters defining stochastic job holding costs are unknown to the scheduler. We consider a general setting allowing for different job classes, where jobs of the same class have statistically identical holding costs and service times, with an arbitrary number of jobs across classes. In each time step, the server can process a job and observes random holding costs of the jobs that are yet to be completed. We consider a learning-based $cμ$ rule scheduling which starts with a preemption period of fixed duration, serving as a learning phase, and having gathered data about jobs, it switches to nonpreemptive scheduling. Our algorithms are designed to handle instances with large and small gaps in mean job holding costs and achieve near-optimal performance guarantees. The performance of algorithms is evaluated by regret, where the benchmark is the minimum possible total holding cost attained by the $cμ$ rule scheduling policy when the parameters of jobs are known. We show regret lower bounds and algorithms that achieve nearly matching regret upper bounds. Our numerical results demonstrate the efficacy of our algorithms and show that our regret analysis is nearly tight.
Submitted 21 September, 2022; v1 submitted 28 May, 2021;
originally announced May 2021.
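The $cμ$ rule used as the benchmark in the abstract above serves jobs in decreasing order of the product of holding cost rate $c$ and service rate $μ$. The sketch below illustrates the rule with known parameters only; it does not attempt the learning phase studied in the paper. The job dictionaries, the function names, and the simplified cost model (nonpreemptive service, cost $c$ per unit time until completion, mean service time $1/μ$) are assumptions made for illustration.

```python
from itertools import permutations


def c_mu_order(jobs):
    """Sort jobs by decreasing c * mu (holding cost rate times service rate)."""
    return sorted(jobs, key=lambda job: job["c"] * job["mu"], reverse=True)


def expected_holding_cost(sequence):
    """Expected total holding cost of serving jobs in the given order.

    Each job pays c per unit time until it completes, and its mean
    service time is 1 / mu.
    """
    elapsed, total = 0.0, 0.0
    for job in sequence:
        elapsed += 1.0 / job["mu"]   # expected completion time of this job
        total += job["c"] * elapsed
    return total


jobs = [
    {"name": "a", "c": 1.0, "mu": 0.5},   # c * mu = 0.5
    {"name": "b", "c": 3.0, "mu": 1.0},   # c * mu = 3.0
    {"name": "c", "c": 2.0, "mu": 0.5},   # c * mu = 1.0
]
best = c_mu_order(jobs)

# Sanity check against exhaustive search over all service orders.
brute_force = min(permutations(jobs), key=expected_holding_cost)
```

On this small instance the rule serves `b`, then `c`, then `a`, with expected holding cost 14.0, which matches the minimum found by exhaustive search.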
-
Test Score Algorithms for Budgeted Stochastic Utility Maximization
Authors:
Dabeen Lee,
Milan Vojnovic,
Se-Young Yun
Abstract:
Motivated by recent developments in designing algorithms based on individual item scores for solving utility maximization problems, we study the framework of using test scores, defined as a statistic of observed individual item performance data, for solving the budgeted stochastic utility maximization problem. We extend an existing scoring mechanism, namely the replication test scores, to incorporate heterogeneous item costs as well as item values. We show that a natural greedy algorithm that selects items solely based on their replication test scores outputs solutions within a constant factor of the optimum for a broad class of utility functions. Our algorithms and approximation guarantees assume that test scores are noisy estimates of certain expected values with respect to marginal distributions of individual item values, thus making our algorithms practical and extending previous work that assumes noiseless estimates. Moreover, we show how our algorithm can be adapted to the setting where items arrive in a streaming fashion while maintaining the same approximation guarantee. We present numerical results, using synthetic data and data sets from the Academia.StackExchange Q&A forum, which show that our test score algorithm can achieve competitive, and in some cases better, performance than a benchmark algorithm that requires access to a value oracle to evaluate function values.
Submitted 24 February, 2022; v1 submitted 30 December, 2020;
originally announced December 2020.
-
Popularity Prediction for Social Media over Arbitrary Time Horizons
Authors:
Daniel Haimovich,
Dima Karamshuk,
Thomas J. Leeper,
Evgeniy Riabenko,
Milan Vojnovic
Abstract:
Predicting the popularity of social media content in real time requires approaches that efficiently operate at global scale. Popularity prediction is important for many applications, including detection of harmful viral content to enable timely content moderation. The prediction task is difficult because views result from interactions between user interests, content features, resharing, feed ranking, and network structure. We consider the problem of accurately predicting popularity both at any given prediction time since a content item's creation and for arbitrary time horizons into the future. In order to achieve high accuracy for different prediction time horizons, it is essential for models to use static features (of content and user) as well as observed popularity growth up to prediction time.
We propose a feature-based approach based on a self-excited Hawkes point process model, which involves prediction of the content's popularity at one or more reference horizons in tandem with a point predictor of an effective growth parameter that reflects the timescale of popularity growth. This results in a highly scalable method for popularity prediction over arbitrary prediction time horizons that also achieves a high degree of accuracy, compared to several leading baselines, on a dataset of public page content on Facebook over a two-month period, covering billions of content views and hundreds of thousands of distinct content items. The model has shown competitive prediction accuracy against a strong baseline that consists of separately trained models for specific prediction time horizons.
Submitted 22 December, 2021; v1 submitted 4 September, 2020;
originally announced September 2020.
-
Accelerated MM Algorithms for Ranking Scores Inference from Comparison Data
Authors:
Milan Vojnovic,
Seyoung Yun,
Kaifang Zhou
Abstract:
In this paper, we study a popular method for inference of the Bradley-Terry model parameters, namely the MM algorithm, for maximum likelihood estimation and maximum a posteriori probability estimation. This class of models includes the Bradley-Terry model of paired comparisons, the Rao-Kupper model of paired comparisons allowing for tie outcomes, the Luce choice model, and the Plackett-Luce ranking model. We establish tight characterizations of the convergence rate for the MM algorithm, and show that it is essentially equivalent to that of a gradient descent algorithm. For the maximum likelihood estimation, the convergence is shown to be linear with the rate crucially determined by the algebraic connectivity of the matrix of item pair co-occurrences in observed comparison data. For the Bayesian inference, the convergence rate is also shown to be linear, with the rate determined by a parameter of the prior distribution in a way that can make the convergence arbitrarily slow for small values of this parameter. We propose a simple modification of the classical MM algorithm that avoids the observed slow convergence issue and accelerates the convergence. The key component of the accelerated MM algorithm is a parameter rescaling performed at each iteration step that is carefully chosen based on theoretical analysis and characterisation of the convergence rate.
Our experimental results, performed on both synthetic and real-world data, demonstrate the identified slow convergence issue of the classic MM algorithm, and show that significant efficiency gains can be obtained by our new proposed method.
Submitted 26 December, 2020; v1 submitted 1 January, 2019;
originally announced January 2019.
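For reference, the classical MM iteration for maximum likelihood estimation in the Bradley-Terry model, which the abstract above analyses, can be sketched as follows. This is the standard (non-accelerated) update, not the accelerated variant proposed in the paper; the function name `bradley_terry_mm`, the fixed iteration count, and the normalisation of strengths to sum to one are choices made here for illustration. The sketch assumes every item wins at least once and that the comparison graph is connected, so that the maximum likelihood estimate exists.

```python
def bradley_terry_mm(wins, iters=1000):
    """Classical MM iteration for Bradley-Terry maximum likelihood.

    wins[i][j] = number of comparisons in which item i beat item j.
    Returns item strengths normalised to sum to 1.
    """
    n = len(wins)
    w = [1.0 / n] * n
    total_wins = [sum(row) for row in wins]
    for _ in range(iters):
        new_w = []
        for i in range(n):
            # Sum over opponents j of (comparisons between i and j) / (w_i + w_j).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (w[i] + w[j])
                for j in range(n)
                if j != i
            )
            new_w.append(total_wins[i] / denom)
        scale = sum(new_w)
        w = [x / scale for x in new_w]
    return w


# Three items, each pair compared four times.
wins = [
    [0, 3, 4],
    [1, 0, 3],
    [0, 1, 0],
]
strengths = bradley_terry_mm(wins)
```

At a fixed point, each item's observed number of wins equals its expected number of wins under the fitted model, which is the likelihood equation of the Bradley-Terry model.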
-
KONG: Kernels for ordered-neighborhood graphs
Authors:
Moez Draief,
Konstantin Kutzkov,
Kevin Scaman,
Milan Vojnovic
Abstract:
We present novel graph kernels for graphs with node and edge labels that have ordered neighborhoods, i.e. when neighbor nodes follow an order. Graphs with ordered neighborhoods are a natural data representation for evolving graphs where edges are created over time, which induces an order. Combining convolutional subgraph kernels and string kernels, we design new scalable algorithms for generation of explicit graph feature maps using sketching techniques. We obtain precise bounds for the approximation accuracy and computational complexity of the proposed approaches and demonstrate their applicability on real datasets. In particular, our experiments demonstrate that neighborhood ordering results in more informative features. For the special case of general graphs, i.e. graphs without ordered neighborhoods, the new graph kernels yield efficient and simple algorithms for the comparison of label distributions between graphs.
Submitted 29 May, 2018; v1 submitted 25 May, 2018;
originally announced May 2018.
-
Parameter Estimation for Thurstone Choice Models
Authors:
Milan Vojnovic,
Se-Young Yun
Abstract:
We consider the estimation accuracy of individual strength parameters of a Thurstone choice model when each input observation consists of a choice of one item from a set of two or more items (so-called top-1 lists). This model accommodates well-known choice models such as the Luce choice model for comparison sets of two or more items and the Bradley-Terry model for pair comparisons.
We provide a tight characterization of the mean squared error of the maximum likelihood parameter estimator. We also provide similar characterizations for parameter estimators defined by a rank-breaking method, which amounts to deducing one or more pair comparisons from a comparison of two or more items, assuming independence of these pair comparisons, and maximizing a likelihood function derived under these assumptions. We also consider a related binary classification problem where each individual parameter takes value from a set of two possible values and the goal is to correctly classify all items within a prescribed classification error.
Submitted 29 April, 2017;
originally announced May 2017.
-
Communication complexity of approximate maximum matching in the message-passing model
Authors:
Zengfeng Huang,
Bozidar Radunovic,
Milan Vojnovic,
Qin Zhang
Abstract:
We consider the communication complexity of finding an approximate maximum matching in a graph in a multi-party message-passing communication model. The maximum matching problem is one of the most fundamental graph combinatorial problems, with a variety of applications.
The input to the problem is a graph $G$ that has $n$ vertices and the set of edges partitioned over $k$ sites, and an approximation ratio parameter $α$. The output is required to be a matching in $G$ that has to be reported by one of the sites, whose size is at least factor $α$ of the size of a maximum matching in $G$.
We show that the communication complexity of this problem is $Ω(α^2 k n)$ information bits. This bound is shown to be tight up to a $\log n$ factor, by constructing an algorithm and establishing both its correctness and an upper bound on its communication cost. The lower bound also applies to other graph combinatorial problems in the message-passing communication model, including max-flow and graph sparsification.
Submitted 27 April, 2017;
originally announced April 2017.
-
Adaptive Matching for Expert Systems with Uncertain Task Types
Authors:
Virag Shah,
Lennart Gulikers,
Laurent Massoulie,
Milan Vojnovic
Abstract:
A matching in a two-sided market often incurs an externality: a matched resource may become unavailable to the other side of the market, at least for a while. This is especially an issue in online platforms involving human experts as the expert resources are often scarce. The efficient utilization of experts in these platforms is made challenging by the fact that the information available about the parties involved is usually limited.
To address this challenge, we develop a model of a task-expert matching system where a task is matched to an expert using not only the prior information about the task but also the feedback obtained from the past matches. In our model the tasks arrive online while the experts are fixed and constrained by a finite service capacity. For this model, we characterize the maximum task resolution throughput a platform can achieve. We show that the natural greedy approach, where each expert is assigned the task most suitable to her skill, is suboptimal, as it does not internalize the above externality. We develop a throughput optimal backpressure algorithm which does so by accounting for the 'congestion' among different task types. Finally, we validate our model and confirm our theoretical findings with data-driven simulations via logs of Math.StackExchange, a StackOverflow forum dedicated to mathematics.
Submitted 26 October, 2018; v1 submitted 2 March, 2017;
originally announced March 2017.
-
QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Authors:
Dan Alistarh,
Demjan Grubic,
Jerry Li,
Ryota Tomioka,
Milan Vojnovic
Abstract:
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks. A fundamental barrier for parallelizing large-scale SGD is the fact that the cost of communicating the gradient updates between nodes can be very large. Consequently, lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always provably converge, and it is not clear whether they are optimal.
In this paper, we propose Quantized SGD (QSGD), a family of compression schemes which allow the compression of gradient updates at each node, while guaranteeing convergence under standard assumptions. QSGD allows the user to trade off compression and convergence time: it can communicate a sublinear number of bits per iteration in the model dimension, and can achieve asymptotically optimal communication cost. We complement our theoretical results with empirical data, showing that QSGD can significantly reduce communication cost, while being competitive with standard uncompressed techniques on a variety of real tasks.
In particular, experiments show that gradient quantization applied to training of deep neural networks for image classification and automated speech recognition can lead to significant reductions in communication cost, and end-to-end training time. For instance, on 16 GPUs, we are able to train a ResNet-152 network on ImageNet 1.8x faster to full accuracy. Of note, we show that there exist generic parameter settings under which all known network architectures preserve or slightly improve their full accuracy when using quantization.
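As an illustration of the kind of gradient quantization discussed above, here is a minimal sketch of unbiased stochastic rounding onto a uniform grid. The abstract does not spell out the scheme, so the details are assumptions: each coordinate is scaled by the vector's Euclidean norm, mapped onto `s` equal levels, and rounded up with probability equal to its fractional part. The name `quantize` and its parameters are hypothetical, and the encoding stage mentioned in the title is not shown.

```python
import math
import random

def quantize(v, s, rng):
    """Stochastically round each coordinate of v onto s uniform levels.

    Each output is sign(x) * norm * level / s with level in {0, ..., s},
    chosen so that the expected output equals x (unbiased).
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        # Position of |x| on the grid [0, s]; clamp guards against rounding error.
        scaled = min(abs(x) / norm * s, float(s))
        lower = math.floor(scaled)
        # Round up with probability equal to the fractional part.
        level = lower + (1 if rng.random() < scaled - lower else 0)
        out.append(math.copysign(norm * level / s, x))
    return out

rng = random.Random(0)
q = quantize([3.0, -4.0], s=4, rng=rng)
```

Only the norm, the signs, and the small integer levels would need to be communicated, which is where the reduction in bits comes from; a larger `s` lowers the quantization variance at the cost of more bits per coordinate.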
Submitted 6 December, 2017; v1 submitted 6 October, 2016;
originally announced October 2016.
-
Submodular Maximization using Test Scores
Authors:
Shreyas Sekar,
Milan Vojnovic,
Se-Young Yun
Abstract:
We study the canonical problem of maximizing a stochastic submodular function subject to a cardinality constraint, where the goal is to select a subset from a ground set of items with uncertain individual performances to maximize their expected group value. Although near-optimal algorithms have been proposed for this problem, practical concerns regarding scalability, compatibility with distributed implementation, and expensive oracle queries persist in large-scale applications. Motivated by online platforms that rely on individual item scores for content recommendation and team selection, we propose a special class of algorithms that select items based solely on individual performance measures known as test scores. The central contribution of this work is a novel and systematic framework for designing test score based algorithms for a broad class of naturally occurring utility functions. We introduce a new scoring mechanism that we refer to as replication test scores and prove that as long as the objective function satisfies a diminishing returns property, one can leverage these scores to compute solutions that are within a constant factor of the optimum. We then extend our results to the more general stochastic submodular welfare maximization problem, where the goal is to select items and assign them to multiple groups to maximize the sum of the expected group values. For this more difficult problem, we show that replication test scores can be used to develop an algorithm that approximates the optimum solution up to a logarithmic factor. The techniques presented in this work bridge the gap between the rigorous theoretical work on submodular optimization and simple, scalable heuristics that are useful in certain domains.
Submitted 9 May, 2019; v1 submitted 23 May, 2016;
originally announced May 2016.
-
Spectral Ranking using Seriation
Authors:
Fajwel Fogel,
Alexandre d'Aspremont,
Milan Vojnovic
Abstract:
We describe a seriation algorithm for ranking a set of items given pairwise comparisons between these items. Intuitively, the algorithm assigns similar rankings to items that compare similarly with all others. It does so by constructing a similarity matrix from pairwise comparisons, using seriation methods to reorder this matrix and construct a ranking. We first show that this spectral seriation algorithm recovers the true ranking when all pairwise comparisons are observed and consistent with a total order. We then show that ranking reconstruction is still exact when some pairwise comparisons are corrupted or missing, and that seriation-based spectral ranking is more robust to noise than classical scoring methods. Finally, we bound the ranking error when only a random subset of the comparisons are observed. An additional benefit of the seriation formulation is that it allows us to solve semi-supervised ranking problems. Experiments on both synthetic and real datasets demonstrate that seriation-based spectral ranking achieves competitive and in some cases superior performance compared to classical ranking methods.
Submitted 10 March, 2016; v1 submitted 20 June, 2014;
originally announced June 2014.
-
Incentives and Efficiency in Uncertain Collaborative Environments
Authors:
Yoram Bachrach,
Vasilis Syrgkanis,
Milan Vojnovic
Abstract:
We consider collaborative systems where users make contributions across multiple available projects and are rewarded for their contributions in individual projects according to a local sharing of the value produced. This serves as a model of online social computing systems such as online Q&A forums and of credit sharing in scientific co-authorship settings. We show that the maximum feasible produced value can be well approximated by simple local sharing rules where users are approximately rewarded in proportion to their marginal contributions and that this holds even under incomplete information about the players' abilities and effort constraints. For natural instances we show almost 95% optimality at equilibrium. When players incur a cost for their effort, we identify a threshold phenomenon: the efficiency is a constant fraction of the optimal when the cost is strictly convex and decreases with the number of players if the cost is linear.
Submitted 5 August, 2013;
originally announced August 2013.
-
Strong Price of Anarchy and Coalitional Dynamics
Authors:
Yoram Bachrach,
Vasilis Syrgkanis,
Eva Tardos,
Milan Vojnovic
Abstract:
We introduce a framework for studying the effect of cooperation on the quality of outcomes in utility games. Our framework is a coalitional analog of the smoothness framework of non-cooperative games. Coalitional smoothness implies bounds on the strong price of anarchy, the loss of quality of coalitionally stable outcomes, as well as bounds on coalitional versions of coarse correlated equilibria and sink equilibria, which we define as out-of-equilibrium myopic behavior as determined by a natural coalitional version of best-response dynamics.
Our coalitional smoothness framework captures existing results bounding the strong price of anarchy of network design games. We show that in any monotone utility-maximization game, if each player's utility is at least his marginal contribution to the welfare, then the strong price of anarchy is at most 2. This captures a broad class of games, including games with a very high price of anarchy. Additionally, we show that in potential games the strong price of anarchy is close to the price of stability, the quality of the best Nash equilibrium.
△ Less
Submitted 9 July, 2013;
originally announced July 2013.
-
Bargaining Dynamics in Exchange Networks
Authors:
Moez Draief,
Milan Vojnovic
Abstract:
We consider a dynamical system for computing Nash bargaining solutions on graphs and focus on its rate of convergence. More precisely, we analyze the edge-balanced dynamical system by Azar et al. and fully specify its convergence for an important class of elementary graph structures that arise in Kleinberg and Tardos' procedure for computing a Nash bargaining solution on general graphs. We show that all these dynamical systems are either linear or eventually become linear and that their convergence times are quadratic in the number of matched edges.
△ Less
Submitted 6 February, 2012;
originally announced February 2012.
-
Convergence Speed of Binary Interval Consensus
Authors:
Moez Draief,
Milan Vojnovic
Abstract:
We consider the convergence time for solving the binary consensus problem using the interval consensus algorithm proposed by Bénézit, Thiran and Vetterli (2009). In the binary consensus problem, each node initially holds one of two states and the goal for each node is to correctly decide which one of these two states was initially held by a majority of nodes.
We derive an upper bound on the expected convergence time that holds for arbitrary connected graphs, which is based on the location of eigenvalues of some contact rate matrices. We instantiate our bound for particular networks of interest, including complete graphs, paths, cycles, star-shaped networks, and Erdős-Rényi random graphs; for these graphs, we compare our bound with alternative computations. We find that for all these examples our bound is tight, yielding the exact order with respect to the number of nodes.
We pinpoint the fact that the expected convergence time critically depends on the voting margin defined as the difference between the fraction of nodes that initially held the majority and the minority states, respectively. The characterization of the expected convergence time yields exact relation between the expected convergence time and the voting margin, for some of these graphs, which reveals how the expected convergence time goes to infinity as the voting margin approaches zero.
Our results provide insights into how the expected convergence time depends on the network topology which can be used for performance evaluation and network design. The results are of interest in the context of networked systems, in particular, peer-to-peer networks, sensor networks and distributed databases.
Submitted 6 February, 2012;
originally announced February 2012.