
Showing 1–26 of 26 results for author: Vojnovic, M

  1. arXiv:2506.03074 [pdf, ps, other]

    stat.ML cs.LG

    GL-LowPopArt: A Nearly Instance-Wise Minimax-Optimal Estimator for Generalized Low-Rank Trace Regression

    Authors: Junghyun Lee, Kyoungseok Jang, Kwang-Sung Jun, Milan Vojnović, Se-Young Yun

    Abstract: We present `GL-LowPopArt`, a novel Catoni-style estimator for generalized low-rank trace regression. Building on `LowPopArt` (Jang et al., 2024), it employs a two-stage approach: nuclear norm regularization followed by matrix Catoni estimation. We establish state-of-the-art estimation error bounds, surpassing existing guarantees (Fan et al., 2019; Kang et al., 2022), and reveal a novel experimenta…

    Submitted 3 June, 2025; v1 submitted 3 June, 2025; originally announced June 2025.

    Comments: 53 pages, 2 figures, 3 tables; Accepted as a Spotlight Poster to the 42nd International Conference on Machine Learning (ICML 2025). Minor correction to the arXiv title in v2 ;)

  2. arXiv:2502.18548 [pdf, other]

    cs.LG cs.AI

    What is the Alignment Objective of GRPO?

    Authors: Milan Vojnovic, Se-Young Yun

    Abstract: In this note, we examine the aggregation of preferences achieved by the Group Relative Policy Optimisation (GRPO) algorithm, a reinforcement learning method used to train advanced artificial intelligence models such as DeepSeek-R1-Zero and DeepSeekMath. The GRPO algorithm trains a policy using a reward preference model, which is computed by sampling a set of outputs for a given context, observing the corre…

    Submitted 13 March, 2025; v1 submitted 25 February, 2025; originally announced February 2025.

  3. arXiv:2404.14202 [pdf, ps, other]

    cs.LG stat.ML

    An Adaptive Approach for Infinitely Many-armed Bandits under Generalized Rotting Constraints

    Authors: Jung-hun Kim, Milan Vojnovic, Se-Young Yun

    Abstract: In this study, we consider the infinitely many-armed bandit problems in a rested rotting setting, where the mean reward of an arm may decrease with each pull, while otherwise, it remains unchanged. We explore two scenarios regarding the rotting of rewards: one in which the cumulative amount of rotting is bounded by $V_T$, referred to as the slow-rotting case, and the other in which the cumulative…

    Submitted 1 June, 2025; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: NeurIPS 2024

  4. arXiv:2312.13927 [pdf, other]

    cs.LG cs.AI

    On the Convergence of Loss and Uncertainty-based Active Learning Algorithms

    Authors: Daniel Haimovich, Dima Karamshuk, Fridolin Linder, Niek Tax, Milan Vojnovic

    Abstract: We investigate the convergence rates and data sample sizes required for training a machine learning model using a stochastic gradient descent (SGD) algorithm, where data points are sampled based on either their loss value or uncertainty value. These training methods are particularly relevant for active learning and data subset selection problems. For SGD with a constant step size update, we presen…

    Submitted 22 November, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

  5. arXiv:2305.16074 [pdf, other]

    cs.LG math.ST

    Combinatorial Bandits for Maximum Value Reward Function under Max Value-Index Feedback

    Authors: Yiliu Wang, Wei Chen, Milan Vojnović

    Abstract: We consider a combinatorial multi-armed bandit problem for maximum value reward function under maximum value and index feedback. This is a new feedback structure that lies in between commonly studied semi-bandit and full-bandit feedback structures. We propose an algorithm and provide a regret bound for problem instances with stochastic arm outcomes according to arbitrary distributions with finite…

    Submitted 25 May, 2023; originally announced May 2023.

  6. arXiv:2301.09223 [pdf, other]

    stat.ML cs.AI cs.LG cs.MA

    Doubly Adversarial Federated Bandits

    Authors: Jialin Yi, Milan Vojnović

    Abstract: We study a new non-stochastic federated multi-armed bandit problem with multiple agents collaborating via a communication network. The losses of the arms are assigned by an oblivious adversary that specifies the loss of each arm not only for each time step but also for each agent, which we call "doubly adversarial". In this setting, different agents may choose the same arm in the same time step b…

    Submitted 21 October, 2023; v1 submitted 22 January, 2023; originally announced January 2023.

    Comments: Published in ICML 2023 https://proceedings.mlr.press/v202/yi23a.html

    Journal ref: Proceedings of the 40th International Conference on Machine Learning 2023

  7. arXiv:2211.17154 [pdf, other]

    stat.ML cs.LG cs.MA math.ST

    On Regret-optimal Cooperative Nonstochastic Multi-armed Bandits

    Authors: Jialin Yi, Milan Vojnović

    Abstract: We consider the nonstochastic multi-agent multi-armed bandit problem with agents collaborating via a communication network with delays. We show a lower bound for individual regret of all agents. We show that with suitable regularizers and communication protocols, a collaborative multi-agent follow-the-regularized-leader (FTRL) algorithm has an individual regret upper bound that matches the…

    Submitted 21 October, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: Published in AAMAS 2023

    Journal ref: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pp. 1329–1335

  8. arXiv:2211.13628 [pdf, other]

    math.OC math.PR

    Dynamics and Inference for Voter Model Processes

    Authors: Milan Vojnovic, Kaifang Zhou

    Abstract: We consider a discrete-time voter model process on a set of nodes, each being in one of two states, either 0 or 1. In each time step, each node adopts the state of a randomly sampled neighbor according to sampling probabilities, referred to as node interaction parameters. We study the maximum likelihood estimation of the node interaction parameters from observed node states for a given number of r…

    Submitted 24 November, 2022; originally announced November 2022.

  9. arXiv:2201.12975 [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    Rotting Infinitely Many-armed Bandits

    Authors: Jung-hun Kim, Milan Vojnovic, Se-Young Yun

    Abstract: We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho=o(1)$. We show that this learning problem has an $\Omega(\max\{\varrho^{1/3}T,\sqrt{T}\})$ worst-case regret lower bound where $T$ is the horizon time. We show that a matching upper bound…

    Submitted 17 December, 2023; v1 submitted 30 January, 2022; originally announced January 2022.

    Comments: ICML 2022

  10. arXiv:2112.06362 [pdf, other]

    cs.LG cs.DS math.OC

    Scheduling Servers with Stochastic Bilinear Rewards

    Authors: Jung-hun Kim, Milan Vojnovic

    Abstract: We address a control system optimization problem that arises in multi-class, multi-server queueing system scheduling with uncertainty. In this scenario, jobs incur holding costs while awaiting completion, and job-server assignments yield observable stochastic rewards with unknown mean values. The rewards for job-server assignments are assumed to follow a bilinear model with respect to features cha…

    Submitted 1 September, 2024; v1 submitted 12 December, 2021; originally announced December 2021.

  11. arXiv:2108.00230 [pdf, other]

    stat.ML cs.LG

    Pure Exploration and Regret Minimization in Matching Bandits

    Authors: Flore Sentenac, Jialin Yi, Clément Calauzènes, Vianney Perchet, Milan Vojnovic

    Abstract: Finding an optimal matching in a weighted graph is a standard combinatorial problem. We consider its semi-bandit version where either a pair or a full matching is sampled sequentially. We prove that it is possible to leverage a rank-1 assumption on the adjacency matrix to reduce the sample complexity and the regret of off-the-shelf algorithms up to reaching a linear dependency in the number of ver…

    Submitted 31 July, 2021; originally announced August 2021.

    Journal ref: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021

  12. arXiv:2105.13655 [pdf, other]

    cs.LG cs.DS math.OC stat.ML

    Scheduling Jobs with Stochastic Holding Costs

    Authors: Dabeen Lee, Milan Vojnovic

    Abstract: We study a single-server scheduling problem for the objective of minimizing the expected cumulative holding cost incurred by jobs, where parameters defining stochastic job holding costs are unknown to the scheduler. We consider a general setting allowing for different job classes, where jobs of the same class have statistically identical holding costs and service times, with an arbitrary number of…

    Submitted 21 September, 2022; v1 submitted 28 May, 2021; originally announced May 2021.

    Comments: Extended abstract appeared in NeurIPS 2021

  13. arXiv:2012.15194 [pdf, other]

    cs.DS cs.LG math.OC

    Test Score Algorithms for Budgeted Stochastic Utility Maximization

    Authors: Dabeen Lee, Milan Vojnovic, Se-Young Yun

    Abstract: Motivated by recent developments in designing algorithms based on individual item scores for solving utility maximization problems, we study the framework of using test scores, defined as a statistic of observed individual item performance data, for solving the budgeted stochastic utility maximization problem. We extend an existing scoring mechanism, namely the replication test scores, to incorpor…

    Submitted 24 February, 2022; v1 submitted 30 December, 2020; originally announced December 2020.

  14. Popularity Prediction for Social Media over Arbitrary Time Horizons

    Authors: Daniel Haimovich, Dima Karamshuk, Thomas J. Leeper, Evgeniy Riabenko, Milan Vojnovic

    Abstract: Predicting the popularity of social media content in real time requires approaches that efficiently operate at global scale. Popularity prediction is important for many applications, including detection of harmful viral content to enable timely content moderation. The prediction task is difficult because views result from interactions between user interests, content features, resharing, feed ranki…

    Submitted 22 December, 2021; v1 submitted 4 September, 2020; originally announced September 2020.

    Comments: International Conference on Very Large Data Bases (VLDB 2022)

  15. arXiv:1901.00150 [pdf, other]

    stat.ML cs.LG math.OC

    Accelerated MM Algorithms for Ranking Scores Inference from Comparison Data

    Authors: Milan Vojnovic, Seyoung Yun, Kaifang Zhou

    Abstract: In this paper, we study a popular method for inference of generalized Bradley-Terry model parameters, namely the MM algorithm, for maximum likelihood estimation and maximum a posteriori probability estimation. This class of models includes the Bradley-Terry model of paired comparisons, the Rao-Kupper model of paired comparisons allowing for tie outcomes, the Luce choice model, and the Plackett-Luce rankin…

    Submitted 26 December, 2020; v1 submitted 1 January, 2019; originally announced January 2019.

  16. arXiv:1805.10014 [pdf, other]

    cs.LG stat.ML

    KONG: Kernels for ordered-neighborhood graphs

    Authors: Moez Draief, Konstantin Kutzkov, Kevin Scaman, Milan Vojnovic

    Abstract: We present novel graph kernels for graphs with node and edge labels that have ordered neighborhoods, i.e., when neighbor nodes follow an order. Graphs with ordered neighborhoods are a natural data representation for evolving graphs where edges are created over time, which induces an order. Combining convolutional subgraph kernels and string kernels, we design new scalable algorithms for generation…

    Submitted 29 May, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

  17. arXiv:1705.00136 [pdf, other]

    math.ST

    Parameter Estimation for Thurstone Choice Models

    Authors: Milan Vojnovic, Se-Young Yun

    Abstract: We consider the estimation accuracy of individual strength parameters of a Thurstone choice model when each input observation consists of a choice of one item from a set of two or more items (so-called top-1 lists). This model accommodates the well-known choice models such as the Luce choice model for comparison sets of two or more items and the Bradley-Terry model for pair comparisons. We provi…

    Submitted 29 April, 2017; originally announced May 2017.

    Comments: 55 pages

  18. arXiv:1704.08462 [pdf, ps, other]

    cs.DS

    Communication complexity of approximate maximum matching in the message-passing model

    Authors: Zengfeng Huang, Bozidar Radunovic, Milan Vojnovic, Qin Zhang

    Abstract: We consider the communication complexity of finding an approximate maximum matching in a graph in a multi-party message-passing communication model. The maximum matching problem is one of the most fundamental graph combinatorial problems, with a variety of applications. The input to the problem is a graph $G$ that has $n$ vertices and the set of edges partitioned over $k$ sites, and an approxima…

    Submitted 27 April, 2017; originally announced April 2017.

  19. arXiv:1703.00674 [pdf, other]

    cs.AI cs.LG stat.ML

    Adaptive Matching for Expert Systems with Uncertain Task Types

    Authors: Virag Shah, Lennart Gulikers, Laurent Massoulié, Milan Vojnovic

    Abstract: A matching in a two-sided market often incurs an externality: a matched resource may become unavailable to the other side of the market, at least for a while. This is especially an issue in online platforms involving human experts as the expert resources are often scarce. The efficient utilization of experts in these platforms is made challenging by the fact that the information available about th…

    Submitted 26 October, 2018; v1 submitted 2 March, 2017; originally announced March 2017.

    Comments: Part of this work was presented at the Allerton Conference 2017; 18 pages

  20. arXiv:1610.02132 [pdf, other]

    cs.LG cs.DS

    QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

    Authors: Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnovic

    Abstract: Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks. A fundamental barrier for parallelizing large-scale SGD is the fact that the cost of communicating the gradient updates between nodes can be very large. Conseq…

    Submitted 6 December, 2017; v1 submitted 6 October, 2016; originally announced October 2016.

  21. arXiv:1605.07172 [pdf, other]

    cs.DS math.CO

    Submodular Maximization using Test Scores

    Authors: Shreyas Sekar, Milan Vojnovic, Se-Young Yun

    Abstract: We study the canonical problem of maximizing a stochastic submodular function subject to a cardinality constraint, where the goal is to select a subset from a ground set of items with uncertain individual performances to maximize their expected group value. Although near-optimal algorithms have been proposed for this problem, practical concerns regarding scalability, compatibility with distributed…

    Submitted 9 May, 2019; v1 submitted 23 May, 2016; originally announced May 2016.

    Comments: Under review

  22. arXiv:1406.5370 [pdf, other]

    cs.LG cs.AI stat.ML

    Spectral Ranking using Seriation

    Authors: Fajwel Fogel, Alexandre d'Aspremont, Milan Vojnovic

    Abstract: We describe a seriation algorithm for ranking a set of items given pairwise comparisons between these items. Intuitively, the algorithm assigns similar rankings to items that compare similarly with all others. It does so by constructing a similarity matrix from pairwise comparisons, using seriation methods to reorder this matrix and construct a ranking. We first show that this spectral seriation a…

    Submitted 10 March, 2016; v1 submitted 20 June, 2014; originally announced June 2014.

    Comments: Substantially revised. Accepted by JMLR

    MSC Class: 62F07; 06A07; 90C27

  23. arXiv:1308.0990 [pdf, ps, other]

    cs.GT

    Incentives and Efficiency in Uncertain Collaborative Environments

    Authors: Yoram Bachrach, Vasilis Syrgkanis, Milan Vojnovic

    Abstract: We consider collaborative systems where users make contributions across multiple available projects and are rewarded for their contributions in individual projects according to a local sharing of the value produced. This serves as a model of online social computing systems such as online Q&A forums and of credit sharing in scientific co-authorship settings. We show that the maximum feasible produc…

    Submitted 5 August, 2013; originally announced August 2013.

  24. arXiv:1307.2537 [pdf, ps, other]

    cs.GT

    Strong Price of Anarchy and Coalitional Dynamics

    Authors: Yoram Bachrach, Vasilis Syrgkanis, Eva Tardos, Milan Vojnovic

    Abstract: We introduce a framework for studying the effect of cooperation on the quality of outcomes in utility games. Our framework is a coalitional analog of the smoothness framework of non-cooperative games. Coalitional smoothness implies bounds on the strong price of anarchy, the loss of quality of coalitionally stable outcomes, as well as bounds on coalitional versions of coarse correlated equilibria a…

    Submitted 9 July, 2013; originally announced July 2013.

  25. arXiv:1202.1089 [pdf, ps, other]

    cs.GT math.OC

    Bargaining Dynamics in Exchange Networks

    Authors: Moez Draief, Milan Vojnovic

    Abstract: We consider a dynamical system for computing Nash bargaining solutions on graphs and focus on its rate of convergence. More precisely, we analyze the edge-balanced dynamical system by Azar et al. and fully specify its convergence for an important class of elementary graph structures that arise in Kleinberg and Tardos' procedure for computing a Nash bargaining solution on general graphs. We show tha…

    Submitted 6 February, 2012; originally announced February 2012.

    Comments: Short version appeared in Allerton 2010

  26. arXiv:1202.1083 [pdf, other]

    math.PR cs.DM math.OC

    Convergence Speed of Binary Interval Consensus

    Authors: Moez Draief, Milan Vojnovic

    Abstract: We consider the convergence time for solving the binary consensus problem using the interval consensus algorithm proposed by Bénézit, Thiran and Vetterli (2009). In the binary consensus problem, each node initially holds one of two states and the goal for each node is to correctly decide which one of these two states was initially held by a majority of nodes. We derive an upper bound on th…

    Submitted 6 February, 2012; originally announced February 2012.

    Comments: To appear in SIAM Optimization and Control. Short version appeared in INFOCOM 2010
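    Entry 15 above concerns the MM algorithm for maximum likelihood estimation of Bradley-Terry model parameters. As a rough illustration only, the sketch below implements the classical (non-accelerated) MM iteration for the plain Bradley-Terry model, p_i <- W_i / sum_{j != i} n_ij / (p_i + p_j), where W_i is the number of wins of item i and n_ij is the number of comparisons between items i and j. It is not the accelerated method studied in that paper; the function name `bradley_terry_mm`, the win-matrix input format, and the stopping rule are hypothetical choices made for this example, and it assumes a connected comparison graph in which every item has at least one win and one loss.

```python
import numpy as np


def bradley_terry_mm(wins, max_iters=10_000, tol=1e-12):
    """Classical MM iteration for Bradley-Terry maximum likelihood (illustrative sketch).

    wins[i, j] is the (hypothetical input format) count of comparisons in which
    item i beat item j. Assumes a connected comparison graph where every item has
    at least one win and one loss, so the MLE exists and is unique up to scale.
    """
    wins = np.asarray(wins, dtype=float)
    n_items = wins.shape[0]
    games = wins + wins.T            # n_ij: number of comparisons between i and j
    np.fill_diagonal(games, 0.0)     # ignore any self-comparisons
    total_wins = wins.sum(axis=1) - np.diag(wins)  # W_i, excluding the diagonal
    p = np.ones(n_items)             # initial strengths
    for _ in range(max_iters):
        # Denominator of the MM update: sum over j != i of n_ij / (p_i + p_j).
        denom = (games / (p[:, None] + p[None, :])).sum(axis=1)
        p_new = total_wins / denom
        # The likelihood is invariant to rescaling p, so normalize to sum to 1.
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            return p_new
        p = p_new
    return p
```

    For two items where item 0 wins 3 of 4 comparisons, the maximum likelihood condition p_0 / (p_0 + p_1) = 3/4 gives normalized strengths of about [0.75, 0.25], which is what `bradley_terry_mm([[0, 3], [1, 0]])` returns. Normalizing after every step only fixes the scale; it does not change the ratios that the model identifies.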