Search | arXiv e-print repository

GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression

Authors: Junghyun Lee, Kyoungseok Jang, Kwang-Sung Jun, Milan Vojnović, Se-Young Yun

Abstract: We present `GL-LowPopArt`, a novel Catoni-style estimator for generalized low-rank trace regression. Building on `LowPopArt` (Jang et al., 2024), it employs a two-stage approach: nuclear norm regularization followed by matrix Catoni estimation. We establish state-of-the-art estimation error bounds, surpassing existing guarantees (Fan et al., 2019; Kang et al., 2022), and reveal a novel experimental design objective, $\mathrm{GL}(π)$. The key technical challenge is controlling bias from the nonlinear inverse link function, which we address with our two-stage approach. We prove a *local* minimax lower bound, showing that `GL-LowPopArt` enjoys instance-wise optimality up to the condition number of the ground-truth Hessian. Applications include generalized linear matrix completion, where `GL-LowPopArt` achieves a state-of-the-art Frobenius error guarantee, and **bilinear dueling bandits**, a novel setting inspired by general preference learning (Zhang et al., 2024). Our analysis of a `GL-LowPopArt`-based explore-then-commit algorithm reveals a new, potentially interesting problem-dependent quantity, along with an improved Borda regret bound compared to vectorization (Wu et al., 2024).

Submitted 3 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

Comments: 53 pages, 2 figures, 3 tables; Accepted as a Spotlight Poster to the 42nd International Conference on Machine Learning (ICML 2025). Minor correction to the arXiv title in v2 ;)

arXiv:2502.18548 [pdf, other]

What is the Alignment Objective of GRPO?

Authors: Milan Vojnovic, Se-Young Yun

Abstract: In this note, we examine the aggregation of preferences achieved by the Group Relative Policy Optimisation (GRPO) algorithm, a reinforcement learning method used to train advanced artificial intelligence models such as DeepSeek-R1-Zero and DeepSeekMath. The GRPO algorithm trains a policy using a reward preference model, which is computed by sampling a set of outputs for a given context, observing the corresponding rewards, and applying shift-and-scale normalisation to these reward values. Additionally, it incorporates a penalty function to discourage deviations from a reference policy. We present a framework that enables us to characterise the stationary policies of the GRPO algorithm. This analysis reveals that the aggregation of preferences differs fundamentally from standard logarithmic pooling, which is implemented by other approaches such as RLHF. The precise form of preference aggregation arises from the way the reward preference model is defined and from the penalty function, which we show to essentially correspond to the reverse Kullback-Leibler (KL) divergence between the aggregation policy and the reference policy. Interestingly, we demonstrate that for groups of size two, the reward preference model corresponds to pairwise comparison preferences, similar to those in other alignment methods based on pairwise comparison feedback. We provide explicit characterisations of the aggregate preference for binary questions, for groups of size two, and in the limit of large group size. This provides insights into the dependence of the aggregate preference on parameters such as the regularisation constant and the confidence margin of question answers. Finally, we discuss the aggregation of preferences obtained by modifying the GRPO algorithm to use direct KL divergence as the penalty or to use rewards without scale normalisation.

Submitted 13 March, 2025; v1 submitted 25 February, 2025; originally announced February 2025.
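
The shift-and-scale normalisation described in the GRPO abstract can be illustrated with a minimal Python sketch. It assumes that the normalisation subtracts the group mean and divides by the population standard deviation; the function name `group_normalise` and the convention for zero variance are illustrative choices, not taken from the paper.

```python
from statistics import mean, pstdev


def group_normalise(rewards):
    """Shift-and-scale a group of rewards, assumed form: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption)
    if sigma == 0:
        # All rewards are equal, so there is no preference signal
        # (illustrative convention, not from the paper).
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# A group of size two with distinct rewards.
pair = group_normalise([1.0, 3.0])
```

Under these assumptions, a group of size two with distinct rewards always maps to -1 and +1, so only the ordering of the two rewards matters. This is in line with the abstract's remark that groups of size two correspond to pairwise comparison preferences.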

arXiv:2404.14202 [pdf, ps, other]

An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints

Authors: Jung-hun Kim, Milan Vojnovic, Se-Young Yun

Abstract: In this study, we consider the infinitely many-armed bandit problems in a rested rotting setting, where the mean reward of an arm may decrease with each pull, while otherwise, it remains unchanged. We explore two scenarios regarding the rotting of rewards: one in which the cumulative amount of rotting is bounded by $V_T$, referred to as the slow-rotting case, and the other in which the cumulative number of rotting instances is bounded by $S_T$, referred to as the abrupt-rotting case. To address the challenge posed by rotting rewards, we introduce an algorithm that utilizes UCB with an adaptive sliding window, designed to manage the bias and variance trade-off arising due to rotting rewards. Our proposed algorithm achieves tight regret bounds for both slow and abrupt rotting scenarios. Lastly, we demonstrate the performance of our algorithm using numerical experiments.

Submitted 1 June, 2025; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: NeurIPS 2024

arXiv:2312.13927 [pdf, other]

On the Convergence of Loss and Uncertainty-based Active Learning Algorithms

Authors: Daniel Haimovich, Dima Karamshuk, Fridolin Linder, Niek Tax, Milan Vojnovic

Abstract: We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm, where data points are sampled based on either their loss value or uncertainty value. These training methods are particularly relevant for active learning and data subset selection problems. For SGD with a constant step size update, we present convergence results for linear classifiers and linearly separable datasets using squared hinge loss and similar training loss functions. Additionally, we extend our analysis to more general classifiers and datasets, considering a wide range of loss-based sampling strategies and smooth convex training loss functions. We propose a novel algorithm called Adaptive-Weight Sampling (AWS) that utilizes SGD with an adaptive step size that achieves stochastic Polyak's step size in expectation. We establish convergence rate results for AWS for smooth convex training loss functions. Our numerical experiments demonstrate the efficiency of AWS on various datasets by using either exact or estimated loss values.

Submitted 22 November, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

arXiv:2305.16074 [pdf, other]

Combinatorial Bandits for Maximum Value Reward Function under Max Value-Index Feedback

Authors: Yiliu Wang, Wei Chen, Milan Vojnović

Abstract: We consider a combinatorial multi-armed bandit problem for maximum value reward function under maximum value and index feedback. This is a new feedback structure that lies in between commonly studied semi-bandit and full-bandit feedback structures. We propose an algorithm and provide a regret bound for problem instances with stochastic arm outcomes according to arbitrary distributions with finite supports. The regret analysis rests on considering an extended set of arms, associated with values and probabilities of arm outcomes, and applying a smoothness condition. Our algorithm achieves an $O((k/Δ)\log(T))$ distribution-dependent and a $\tilde{O}(\sqrt{T})$ distribution-independent regret, where $k$ is the number of arms selected in each round, $Δ$ is a distribution-dependent reward gap, and $T$ is the time horizon. Perhaps surprisingly, the regret bound is comparable to the previously known bound under more informative semi-bandit feedback. We demonstrate the effectiveness of our algorithm through experimental results.

Submitted 25 May, 2023; originally announced May 2023.

arXiv:2301.09223 [pdf, other]

Doubly Adversarial Federated Bandits

Authors: Jialin Yi, Milan Vojnović

Abstract: We study a new non-stochastic federated multi-armed bandit problem with multiple agents collaborating via a communication network. The losses of the arms are assigned by an oblivious adversary that specifies the loss of each arm not only for each time step but also for each agent, which we call "doubly adversarial". In this setting, different agents may choose the same arm in the same time step but observe different feedback. The goal of each agent is to find a globally best arm in hindsight that has the lowest cumulative loss averaged over all agents, which necessitates communication among agents. We provide regret lower bounds for any federated bandit algorithm under different settings, when agents have access to full-information feedback or to bandit feedback. For the bandit feedback setting, we propose a near-optimal federated bandit algorithm called FEDEXP3. Our algorithm gives a positive answer to an open question proposed in Cesa-Bianchi et al. (2016): FEDEXP3 can guarantee a sub-linear regret without exchanging sequences of selected arm identities or loss sequences among agents. We also provide numerical evaluations of our algorithm to validate our theoretical results and demonstrate its effectiveness on synthetic and real-world datasets.

Submitted 21 October, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

Comments: Published in ICML 2023 https://proceedings.mlr.press/v202/yi23a.html

Journal ref: Proceedings of the 40th International Conference on Machine Learning 2023

arXiv:2211.17154 [pdf, other]

doi 10.5555/3545946.3598780

On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits

Authors: Jialin Yi, Milan Vojnović

Abstract: We consider the nonstochastic multi-agent multi-armed bandit problem with agents collaborating via a communication network with delays. We show a lower bound for individual regret of all agents. We show that with suitable regularizers and communication protocols, a collaborative multi-agent \emph{follow-the-regularized-leader} (FTRL) algorithm has an individual regret upper bound that matches the lower bound up to a constant factor when the number of arms is large enough relative to degrees of agents in the communication graph. We also show that an FTRL algorithm with a suitable regularizer is regret optimal with respect to the scaling with the edge-delay parameter. We present numerical experiments validating our theoretical results and demonstrate cases when our algorithms outperform previously proposed algorithms.

Submitted 21 October, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: Published in AAMAS 2023

Journal ref: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 1329-1335

arXiv:2211.13628 [pdf, other]

Dynamics and Inference for Voter Model Processes

Authors: Milan Vojnovic, Kaifang Zhou

Abstract: We consider a discrete-time voter model process on a set of nodes, each being in one of two states, either 0 or 1. In each time step, each node adopts the state of a randomly sampled neighbor according to sampling probabilities, referred to as node interaction parameters. We study the maximum likelihood estimation of the node interaction parameters from observed node states for a given number of realizations of the voter model process. In contrast to previous work on parameter estimation of network autoregressive processes, whose long-run behavior follows a stationary stochastic process, the voter model is an absorbing stochastic process that eventually reaches a consensus state. This requires developing a framework for deriving parameter estimation error bounds from observations consisting of several realizations of a voter model process. We present parameter estimation error bounds by interpreting the observation data as being generated according to an extended voter process that consists of cycles, each corresponding to a realization of the voter model process until absorption to a consensus state. The consensus time of a voter model process plays an important role in obtaining these results. We present new bounds for all moments and a bound that holds with any given probability for consensus time, which may be of independent interest. In contrast to most existing work, our results yield a consensus time bound that holds with high probability. We also present a sampling complexity lower bound for parameter estimation within a prescribed error tolerance for the class of locally stable estimators.

Submitted 24 November, 2022; originally announced November 2022.

arXiv:2201.12975 [pdf, other]

Rotting Infinitely Many-armed Bandits

Authors: Jung-hun Kim, Milan Vojnovic, Se-Young Yun

Abstract: We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho=o(1)$. We show that this learning problem has an $Ω(\max\{\varrho^{1/3}T,\sqrt{T}\})$ worst-case regret lower bound where $T$ is the horizon time. We show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T,\sqrt{T}\})$, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate $\varrho$. We also show that an $\tilde{O}(\max\{\varrho^{1/3}T,T^{3/4}\})$ regret upper bound can be achieved by an algorithm that does not know the value of $\varrho$, by using an adaptive UCB index along with an adaptive threshold value.

Submitted 17 December, 2023; v1 submitted 30 January, 2022; originally announced January 2022.

Comments: ICML 2022

arXiv:2112.06362 [pdf, other]

Scheduling Servers with Stochastic Bilinear Rewards

Authors: Jung-hun Kim, Milan Vojnovic

Abstract: We address a control system optimization problem that arises in multi-class, multi-server queueing system scheduling with uncertainty. In this scenario, jobs incur holding costs while awaiting completion, and job-server assignments yield observable stochastic rewards with unknown mean values. The rewards for job-server assignments are assumed to follow a bilinear model with respect to features characterizing jobs and servers. Our objective is regret minimization, aiming to maximize the cumulative reward of job-server assignments over a time horizon while maintaining a bounded total job holding cost, thus ensuring queueing system stability. This problem is motivated by applications in computing services and online platforms. To address this problem, we propose a scheduling algorithm based on weighted proportional fair allocation criteria augmented with marginal costs for reward maximization, incorporating a bandit strategy. Our algorithm achieves sub-linear regret and sub-linear mean holding cost (and queue length bound) with respect to the time horizon, thus guaranteeing queueing system stability. Additionally, we establish stability conditions for distributed iterative algorithms for computing allocations, which are relevant to large-scale system applications. Finally, we validate the efficiency of our algorithm through numerical experiments.

Submitted 1 September, 2024; v1 submitted 12 December, 2021; originally announced December 2021.

arXiv:2108.00230 [pdf, other]

Pure Exploration and Regret Minimization in Matching Bandits

Authors: Flore Sentenac, Jialin Yi, Clément Calauzènes, Vianney Perchet, Milan Vojnovic

Abstract: Finding an optimal matching in a weighted graph is a standard combinatorial problem. We consider its semi-bandit version where either a pair or a full matching is sampled sequentially. We prove that it is possible to leverage a rank-1 assumption on the adjacency matrix to reduce the sample complexity and the regret of off-the-shelf algorithms up to reaching a linear dependency in the number of vertices (up to poly log terms).

Submitted 31 July, 2021; originally announced August 2021.

Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

arXiv:2105.13655 [pdf, other]

Scheduling Jobs with Stochastic Holding Costs

Authors: Dabeen Lee, Milan Vojnovic

Abstract: We study a single-server scheduling problem for the objective of minimizing the expected cumulative holding cost incurred by jobs, where parameters defining stochastic job holding costs are unknown to the scheduler. We consider a general setting allowing for different job classes, where jobs of the same class have statistically identical holding costs and service times, with an arbitrary number of jobs across classes. In each time step, the server can process a job and observes random holding costs of the jobs that are yet to be completed. We consider a learning-based $cμ$ rule scheduling which starts with a preemption period of fixed duration, serving as a learning phase, and having gathered data about jobs, it switches to nonpreemptive scheduling. Our algorithms are designed to handle instances with large and small gaps in mean job holding costs and achieve near-optimal performance guarantees. The performance of algorithms is evaluated by regret, where the benchmark is the minimum possible total holding cost attained by the $cμ$ rule scheduling policy when the parameters of jobs are known. We show regret lower bounds and algorithms that achieve nearly matching regret upper bounds. Our numerical results demonstrate the efficacy of our algorithms and show that our regret analysis is nearly tight.

Submitted 21 September, 2022; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: Extended abstract appeared in NeurIPS 2021
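
The benchmark $cμ$ rule mentioned in the abstract can be sketched in Python for the case where job parameters are known. This is only the classical priority rule, not the learning-based algorithm of the paper; the job tuples and the name `c_mu_order` are hypothetical, and $μ$ is assumed to be the service rate (the reciprocal of the mean service time).

```python
def c_mu_order(jobs):
    """Return job names ordered by decreasing c * mu.

    jobs: list of (name, c, mu) tuples, where c is the mean holding cost
    per time step and mu is the service rate (assumed known here).
    """
    ranked = sorted(jobs, key=lambda job: job[1] * job[2], reverse=True)
    return [name for name, _, _ in ranked]


# Hypothetical instance: indices are A -> 0.5, B -> about 0.6, C -> 1.0.
jobs = [("A", 1.0, 0.5), ("B", 3.0, 0.2), ("C", 2.0, 0.5)]
order = c_mu_order(jobs)
```

In this hypothetical instance the rule serves C first, then B, then A: a job that is both costly to hold and quick to finish gets priority over one that is merely costly.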

arXiv:2012.15194 [pdf, other]

Test Score Algorithms for Budgeted Stochastic Utility Maximization

Authors: Dabeen Lee, Milan Vojnovic, Se-Young Yun

Abstract: Motivated by recent developments in designing algorithms based on individual item scores for solving utility maximization problems, we study the framework of using test scores, defined as a statistic of observed individual item performance data, for solving the budgeted stochastic utility maximization problem. We extend an existing scoring mechanism, namely the replication test scores, to incorporate heterogeneous item costs as well as item values. We show that a natural greedy algorithm that selects items solely based on their replication test scores outputs solutions within a constant factor of the optimum for a broad class of utility functions. Our algorithms and approximation guarantees assume that test scores are noisy estimates of certain expected values with respect to marginal distributions of individual item values, thus making our algorithms practical and extending previous work that assumes noiseless estimates. Moreover, we show how our algorithm can be adapted to the setting where items arrive in a streaming fashion while maintaining the same approximation guarantee. We present numerical results, using synthetic data and data sets from the Academia.StackExchange Q&A forum, which show that our test score algorithm can be competitive with, and in some cases outperform, a benchmark algorithm that requires access to a value oracle to evaluate function values.

Submitted 24 February, 2022; v1 submitted 30 December, 2020; originally announced December 2020.

arXiv:2009.02092 [pdf, other]

doi 10.14778/3503585.3503593

Popularity Prediction for Social Media over Arbitrary Time Horizons

Authors: Daniel Haimovich, Dima Karamshuk, Thomas J. Leeper, Evgeniy Riabenko, Milan Vojnovic

Abstract: Predicting the popularity of social media content in real time requires approaches that efficiently operate at global scale. Popularity prediction is important for many applications, including detection of harmful viral content to enable timely content moderation. The prediction task is difficult because views result from interactions between user interests, content features, resharing, feed ranking, and network structure. We consider the problem of accurately predicting popularity both at any given prediction time since a content item's creation and for arbitrary time horizons into the future. In order to achieve high accuracy for different prediction time horizons, it is essential for models to use static features (of content and user) as well as observed popularity growth up to prediction time. We propose a feature-based approach based on a self-excited Hawkes point process model, which involves prediction of the content's popularity at one or more reference horizons in tandem with a point predictor of an effective growth parameter that reflects the timescale of popularity growth. This results in a highly scalable method for popularity prediction over arbitrary prediction time horizons that also achieves a high degree of accuracy, compared to several leading baselines, on a dataset of public page content on Facebook over a two-month period, covering billions of content views and hundreds of thousands of distinct content items. The model has shown competitive prediction accuracy against a strong baseline that consists of separately trained models for specific prediction time horizons.

Submitted 22 December, 2021; v1 submitted 4 September, 2020; originally announced September 2020.

Comments: International Conference on Very Large Data Bases (VLDB'2022)

arXiv:1901.00150 [pdf, other]

Accelerated MM Algorithms for Ranking Scores Inference from Comparison Data

Authors: Milan Vojnovic, Seyoung Yun, Kaifang Zhou

Abstract: In this paper, we study a popular method for inference of the Bradley-Terry model parameters, namely the MM algorithm, for maximum likelihood estimation and maximum a posteriori probability estimation. This class of models includes the Bradley-Terry model of paired comparisons, the Rao-Kupper model of paired comparisons allowing for tie outcomes, the Luce choice model, and the Plackett-Luce ranking model. We establish tight characterizations of the convergence rate for the MM algorithm, and show that it is essentially equivalent to that of a gradient descent algorithm. For the maximum likelihood estimation, the convergence is shown to be linear with the rate crucially determined by the algebraic connectivity of the matrix of item pair co-occurrences in observed comparison data. For the Bayesian inference, the convergence rate is also shown to be linear, with the rate determined by a parameter of the prior distribution in a way that can make the convergence arbitrarily slow for small values of this parameter. We propose a simple modification of the classical MM algorithm that avoids the observed slow convergence issue and accelerates the convergence. The key component of the accelerated MM algorithm is a parameter rescaling performed at each iteration step that is carefully chosen based on theoretical analysis and characterisation of the convergence rate. Our experimental results, performed on both synthetic and real-world data, demonstrate the identified slow convergence issue of the classical MM algorithm, and show that significant efficiency gains can be obtained by our newly proposed method.

Submitted 26 December, 2020; v1 submitted 1 January, 2019; originally announced January 2019.
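
For orientation, here is a minimal Python sketch of the classical MM iteration for maximum likelihood estimation in the Bradley-Terry model, without the parameter rescaling that the paper proposes. It assumes that every item has at least one win and that the comparison graph is connected; the name `bradley_terry_mm` and the layout of the `wins` matrix are illustrative.

```python
def bradley_terry_mm(wins, iters=100):
    """Classical MM updates for Bradley-Terry strengths.

    wins[i][j] is the number of times item i beat item j.
    Returns strengths normalised to sum to one.
    """
    n = len(wins)
    w = [1.0 / n] * n
    for _ in range(iters):
        new_w = []
        for i in range(n):
            others = [j for j in range(n) if j != i]
            total_wins = sum(wins[i][j] for j in others)
            # Each pair contributes (number of comparisons) / (w_i + w_j).
            denom = sum((wins[i][j] + wins[j][i]) / (w[i] + w[j]) for j in others)
            new_w.append(total_wins / denom)
        scale = sum(new_w)
        w = [x / scale for x in new_w]  # fix the scale, which the model leaves free
    return w


# Two items: item 0 beat item 1 three times and lost once.
strengths = bradley_terry_mm([[0, 3], [1, 0]])
```

For this two-item example the iteration settles at strengths in the ratio 3:1, matching the empirical win ratio. The slow-convergence regime discussed in the abstract concerns MAP estimation and is not reproduced here.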

arXiv:1805.10014 [pdf, other]

KONG: Kernels for ordered-neighborhood graphs

Authors: Moez Draief, Konstantin Kutzkov, Kevin Scaman, Milan Vojnovic

Abstract: We present novel graph kernels for graphs with node and edge labels that have ordered neighborhoods, i.e. when neighbor nodes follow an order. Graphs with ordered neighborhoods are a natural data representation for evolving graphs where edges are created over time, which induces an order. Combining convolutional subgraph kernels and string kernels, we design new scalable algorithms for generation of explicit graph feature maps using sketching techniques. We obtain precise bounds for the approximation accuracy and computational complexity of the proposed approaches and demonstrate their applicability on real datasets. In particular, our experiments demonstrate that neighborhood ordering results in more informative features. For the special case of general graphs, i.e. graphs without ordered neighborhoods, the new graph kernels yield efficient and simple algorithms for the comparison of label distributions between graphs.

Submitted 29 May, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

arXiv:1705.00136 [pdf, other]

Parameter Estimation for Thurstone Choice Models

Authors: Milan Vojnovic, Se-Young Yun

Abstract: We consider the estimation accuracy of individual strength parameters of a Thurstone choice model when each input observation consists of a choice of one item from a set of two or more items (so-called top-1 lists). This model accommodates well-known choice models such as the Luce choice model for comparison sets of two or more items and the Bradley-Terry model for pair comparisons. We provide a tight characterization of the mean squared error of the maximum likelihood parameter estimator. We also provide similar characterizations for parameter estimators defined by a rank-breaking method, which amounts to deducing one or more pair comparisons from a comparison of two or more items, assuming independence of these pair comparisons, and maximizing a likelihood function derived under these assumptions. We also consider a related binary classification problem where each individual parameter takes value from a set of two possible values and the goal is to correctly classify all items within a prescribed classification error.

Submitted 29 April, 2017; originally announced May 2017.

Comments: 55 pages

arXiv:1704.08462 [pdf, ps, other]

Communication complexity of approximate maximum matching in the message-passing model

Authors: Zengfeng Huang, Bozidar Radunovic, Milan Vojnovic, Qin Zhang

Abstract: We consider the communication complexity of finding an approximate maximum matching in a graph in a multi-party message-passing communication model. The maximum matching problem is one of the most fundamental graph combinatorial problems, with a variety of applications. The input to the problem is a graph $G$ that has $n$ vertices and the set of edges partitioned over $k$ sites, and an approximation ratio parameter $α$. The output is required to be a matching in $G$ that has to be reported by one of the sites, whose size is at least factor $α$ of the size of a maximum matching in $G$. We show that the communication complexity of this problem is $Ω(α^2 k n)$ information bits. This bound is shown to be tight up to a $\log n$ factor, by constructing an algorithm, establishing its correctness, and an upper bound on the communication cost. The lower bound also applies to other graph combinatorial problems in the message-passing communication model, including max-flow and graph sparsification.

Submitted 27 April, 2017; originally announced April 2017.

arXiv:1703.00674 [pdf, other]

Adaptive Matching for Expert Systems with Uncertain Task Types

Authors: Virag Shah, Lennart Gulikers, Laurent Massoulie, Milan Vojnovic

Abstract: A matching in a two-sided market often incurs an externality: a matched resource may become unavailable to the other side of the market, at least for a while. This is especially an issue in online platforms involving human experts as the expert resources are often scarce. The efficient utilization of experts in these platforms is made challenging by the fact that the information available about the parties involved is usually limited. To address this challenge, we develop a model of a task-expert matching system where a task is matched to an expert using not only the prior information about the task but also the feedback obtained from the past matches. In our model the tasks arrive online while the experts are fixed and constrained by a finite service capacity. For this model, we characterize the maximum task resolution throughput a platform can achieve. We show that the natural greedy approach, where each expert is assigned a task most suitable to her skill, is suboptimal, as it does not internalize the above externality. We develop a throughput-optimal backpressure algorithm which does so by accounting for the 'congestion' among different task types. Finally, we validate our model and confirm our theoretical findings with data-driven simulations via logs of Math.StackExchange, a StackOverflow forum dedicated to mathematics.

Submitted 26 October, 2018; v1 submitted 2 March, 2017; originally announced March 2017.

Comments: A part of it presented at Allerton Conference 2017, 18 pages

arXiv:1610.02132 [pdf, other]

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

Authors: Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnovic

Abstract: Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks. A fundamental barrier for parallelizing large-scale SGD is the fact that the cost of communicating the gradient updates between nodes can be very large. Consequently, lossy compression heuristics have been proposed, by which nodes only communicate quantized gradients. Although effective in practice, these heuristics do not always provably converge, and it is not clear whether they are optimal. In this paper, we propose Quantized SGD (QSGD), a family of compression schemes which allow the compression of gradient updates at each node, while guaranteeing convergence under standard assumptions. QSGD allows the user to trade off compression and convergence time: it can communicate a sublinear number of bits per iteration in the model dimension, and can achieve asymptotically optimal communication cost. We complement our theoretical results with empirical data, showing that QSGD can significantly reduce communication cost, while being competitive with standard uncompressed techniques on a variety of real tasks. In particular, experiments show that gradient quantization applied to training of deep neural networks for image classification and automated speech recognition can lead to significant reductions in communication cost, and end-to-end training time. For instance, on 16 GPUs, we are able to train a ResNet-152 network on ImageNet 1.8x faster to full accuracy. Of note, we show that there exist generic parameter settings under which all known network architectures preserve or slightly improve their full accuracy when using quantization.

Submitted 6 December, 2017; v1 submitted 6 October, 2016; originally announced October 2016.

arXiv:1605.07172 [pdf, other]

Submodular Maximization using Test Scores

Authors: Shreyas Sekar, Milan Vojnovic, Se-Young Yun

Abstract: We study the canonical problem of maximizing a stochastic submodular function subject to a cardinality constraint, where the goal is to select a subset from a ground set of items with uncertain individual performances to maximize their expected group value. Although near-optimal algorithms have been proposed for this problem, practical concerns regarding scalability, compatibility with distributed implementation, and expensive oracle queries persist in large-scale applications. Motivated by online platforms that rely on individual item scores for content recommendation and team selection, we propose a special class of algorithms that select items based solely on individual performance measures known as test scores. The central contribution of this work is a novel and systematic framework for designing test score based algorithms for a broad class of naturally occurring utility functions. We introduce a new scoring mechanism that we refer to as replication test scores and prove that as long as the objective function satisfies a diminishing returns property, one can leverage these scores to compute solutions that are within a constant factor of the optimum. We then extend our results to the more general stochastic submodular welfare maximization problem, where the goal is to select items and assign them to multiple groups to maximize the sum of the expected group values. For this more difficult problem, we show that replication test scores can be used to develop an algorithm that approximates the optimum solution up to a logarithmic factor. The techniques presented in this work bridge the gap between the rigorous theoretical work on submodular optimization and simple, scalable heuristics that are useful in certain domains.

Submitted 9 May, 2019; v1 submitted 23 May, 2016; originally announced May 2016.

Comments: Under review

arXiv:1406.5370 [pdf, other]

Spectral Ranking using Seriation

Authors: Fajwel Fogel, Alexandre d'Aspremont, Milan Vojnovic

Abstract: We describe a seriation algorithm for ranking a set of items given pairwise comparisons between these items. Intuitively, the algorithm assigns similar rankings to items that compare similarly with all others. It does so by constructing a similarity matrix from pairwise comparisons, using seriation methods to reorder this matrix and construct a ranking. We first show that this spectral seriation algorithm recovers the true ranking when all pairwise comparisons are observed and consistent with a total order. We then show that ranking reconstruction is still exact when some pairwise comparisons are corrupted or missing, and that seriation-based spectral ranking is more robust to noise than classical scoring methods. Finally, we bound the ranking error when only a random subset of the comparisons are observed. An additional benefit of the seriation formulation is that it allows us to solve semi-supervised ranking problems. Experiments on both synthetic and real datasets demonstrate that seriation-based spectral ranking achieves competitive and in some cases superior performance compared to classical ranking methods.

Submitted 10 March, 2016; v1 submitted 20 June, 2014; originally announced June 2014.

Comments: Substantially revised. Accepted by JMLR

MSC Class: 62F07; 06A07; 90C27

arXiv:1308.0990 [pdf, ps, other]

Incentives and Efficiency in Uncertain Collaborative Environments

Authors: Yoram Bachrach, Vasilis Syrgkanis, Milan Vojnovic

Abstract: We consider collaborative systems where users make contributions across multiple available projects and are rewarded for their contributions in individual projects according to a local sharing of the value produced. This serves as a model of online social computing systems such as online Q&A forums and of credit sharing in scientific co-authorship settings. We show that the maximum feasible produced value can be well approximated by simple local sharing rules where users are approximately rewarded in proportion to their marginal contributions and that this holds even under incomplete information about the players' abilities and effort constraints. For natural instances we show almost 95% optimality at equilibrium. When players incur a cost for their effort, we identify a threshold phenomenon: the efficiency is a constant fraction of the optimal when the cost is strictly convex and decreases with the number of players if the cost is linear.

Submitted 5 August, 2013; originally announced August 2013.

arXiv:1307.2537 [pdf, ps, other]

Strong Price of Anarchy and Coalitional Dynamics

Authors: Yoram Bachrach, Vasilis Syrgkanis, Eva Tardos, Milan Vojnovic

Abstract: We introduce a framework for studying the effect of cooperation on the quality of outcomes in utility games. Our framework is a coalitional analog of the smoothness framework of non-cooperative games. Coalitional smoothness implies bounds on the strong price of anarchy, the loss of quality of coalitionally stable outcomes, as well as bounds on coalitional versions of coarse correlated equilibria and sink equilibria, which we define as out-of-equilibrium myopic behavior as determined by a natural coalitional version of best-response dynamics. Our coalitional smoothness framework captures existing results bounding the strong price of anarchy of network design games. We show that in any monotone utility-maximization game, if each player's utility is at least his marginal contribution to the welfare, then the strong price of anarchy is at most 2. This captures a broad class of games, including games with a very high price of anarchy. Additionally, we show that in potential games the strong price of anarchy is close to the price of stability, the quality of the best Nash equilibrium.

Submitted 9 July, 2013; originally announced July 2013.

arXiv:1202.1089 [pdf, ps, other]

Bargaining Dynamics in Exchange Networks

Authors: Moez Draief, Milan Vojnovic

Abstract: We consider a dynamical system for computing Nash bargaining solutions on graphs and focus on its rate of convergence. More precisely, we analyze the edge-balanced dynamical system by Azar et al. and fully specify its convergence for an important class of elementary graph structures that arise in Kleinberg and Tardos' procedure for computing a Nash bargaining solution on general graphs. We show that all these dynamical systems are either linear or eventually become linear and that their convergence times are quadratic in the number of matched edges.

Submitted 6 February, 2012; originally announced February 2012.

Comments: Short version appeared in Allerton 2010

arXiv:1202.1083 [pdf, other]

Convergence Speed of Binary Interval Consensus

Authors: Moez Draief, Milan Vojnovic

Abstract: We consider the convergence time for solving the binary consensus problem using the interval consensus algorithm proposed by Bénézit, Thiran and Vetterli (2009). In the binary consensus problem, each node initially holds one of two states and the goal for each node is to correctly decide which one of these two states was initially held by a majority of nodes. We derive an upper bound on the expected convergence time that holds for arbitrary connected graphs, which is based on the location of eigenvalues of some contact rate matrices. We instantiate our bound for particular networks of interest, including complete graphs, paths, cycles, star-shaped networks, and Erdős-Rényi random graphs; for these graphs, we compare our bound with alternative computations. We find that for all these examples our bound is tight, yielding the exact order with respect to the number of nodes. We pinpoint the fact that the expected convergence time critically depends on the voting margin defined as the difference between the fraction of nodes that initially held the majority and the minority states, respectively. The characterization of the expected convergence time yields an exact relation between the expected convergence time and the voting margin, for some of these graphs, which reveals how the expected convergence time goes to infinity as the voting margin approaches zero. Our results provide insights into how the expected convergence time depends on the network topology which can be used for performance evaluation and network design. The results are of interest in the context of networked systems, in particular, peer-to-peer networks, sensor networks and distributed databases.

Submitted 6 February, 2012; originally announced February 2012.

Comments: To appear in SIAM Optimization and Control. Short version appeared in INFOCOM 2010

Showing 1–26 of 26 results for author: Vojnovic, M