Skip to main content

Showing 1–50 of 99 results for author: Kveton, B

.
  1. arXiv:2506.06964  [pdf, ps, other

    cs.CL cs.LG

    Learning to Clarify by Reinforcement Learning Through Reward-Weighted Fine-Tuning

    Authors: Subhojyoti Mukherjee, Viet Dac Lai, Raghavendra Addanki, Ryan Rossi, Seunghyun Yoon, Trung Bui, Anup Rao, Jayakumar Subramanian, Branislav Kveton

    Abstract: Question answering (QA) agents automatically answer questions posed in natural language. In this work, we learn to ask clarifying questions in QA agents. The key idea in our method is to simulate conversations that contain clarifying questions and learn from them using reinforcement learning (RL). To make RL practical, we propose and analyze offline RL objectives that can be viewed as reward-weigh… ▽ More

    Submitted 7 June, 2025; originally announced June 2025.

    Comments: 39 pages

  2. arXiv:2506.06561  [pdf, ps, other

    cs.CL cs.AI

    LaMP-Cap: Personalized Figure Caption Generation With Multimodal Figure Profiles

    Authors: Ho Yin 'Sam' Ng, Ting-Yao Hsu, Aashish Anantha Ramakrishnan, Branislav Kveton, Nedim Lipka, Franck Dernoncourt, Dongwon Lee, Tong Yu, Sungchul Kim, Ryan A. Rossi, Ting-Hao 'Kenneth' Huang

    Abstract: Figure captions are crucial for helping readers understand and remember a figure's key message. Many models have been developed to generate these captions, helping authors compose better quality captions more easily. Yet, authors almost always need to revise generic AI-generated captions to match their writing style and the domain's style, highlighting the need for personalization. Despite languag… ▽ More

    Submitted 6 June, 2025; originally announced June 2025.

  3. arXiv:2506.02945  [pdf, ps, other

    cs.CL cs.LG

    Quantitative LLM Judges

    Authors: Aishwarya Sahoo, Jeevana Kruthi Karnuthala, Tushar Parmanand Budhwani, Pranchal Agarwal, Sankaran Vaidyanathan, Alexa Siu, Franck Dernoncourt, Jennifer Healey, Nedim Lipka, Ryan Rossi, Uttaran Bhattacharya, Branislav Kveton

    Abstract: LLM-as-a-judge is a framework in which a large language model (LLM) automatically evaluates the output of another LLM. We propose quantitative LLM judges, which align evaluation scores of existing LLM judges to human scores in a given domain using regression models. The models are trained to improve the score of the original judge by using the judge's textual evaluation and score. We present four… ▽ More

    Submitted 3 June, 2025; originally announced June 2025.

  4. arXiv:2505.14826  [pdf, other

    cs.LG cs.CL stat.ML

    FisherSFT: Data-Efficient Supervised Fine-Tuning of Language Models Using Information Gain

    Authors: Rohan Deb, Kiran Thekumparampil, Kousha Kalantari, Gaurush Hiranandani, Shoham Sabach, Branislav Kveton

    Abstract: Supervised fine-tuning (SFT) is a standard approach to adapting large language models (LLMs) to new domains. In this work, we improve the statistical efficiency of SFT by selecting an informative subset of training examples. Specifically, for a fixed budget of training examples, which determines the computational cost of fine-tuning, we determine the most informative ones. The key idea in our meth… ▽ More

    Submitted 20 May, 2025; originally announced May 2025.

  5. arXiv:2505.14106  [pdf, ps, other

    cs.CL cs.AI

    A Personalized Conversational Benchmark: Towards Simulating Personalized Conversations

    Authors: Li Li, Peilin Cai, Ryan A. Rossi, Franck Dernoncourt, Branislav Kveton, Junda Wu, Tong Yu, Linxin Song, Tiankai Yang, Yuehan Qin, Nesreen K. Ahmed, Samyadeep Basu, Subhojyoti Mukherjee, Ruiyi Zhang, Zhengmian Hu, Bo Ni, Yuxiao Zhou, Zichao Wang, Yue Huang, Yu Wang, Xiangliang Zhang, Philip S. Yu, Xiyang Hu, Yue Zhao

    Abstract: We present PersonaConvBench, a large-scale benchmark for evaluating personalized reasoning and generation in multi-turn conversations with large language models (LLMs). Unlike existing work that focuses on either personalization or conversational structure in isolation, PersonaConvBench integrates both, offering three core tasks: sentence classification, impact regression, and user-centric text ge… ▽ More

    Submitted 25 May, 2025; v1 submitted 20 May, 2025; originally announced May 2025.

  6. RecGaze: The First Eye Tracking and User Interaction Dataset for Carousel Interfaces

    Authors: Santiago de Leon-Martinez, Jingwei Kang, Robert Moro, Maarten de Rijke, Branislav Kveton, Harrie Oosterhuis, Maria Bielikova

    Abstract: Carousel interfaces are widely used in e-commerce and streaming services, but little research has been devoted to them. Previous studies of interfaces for presenting search and recommendation results have focused on single ranked lists, but it appears their results cannot be extrapolated to carousels due to the added complexity. Eye tracking is a highly informative approach to understanding how us… ▽ More

    Submitted 29 April, 2025; originally announced April 2025.

    Comments: Accepted to Resource & Reproducibility Track SIGIR '25

  7. arXiv:2503.16734  [pdf, other

    cs.AI cs.IR

    Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models

    Authors: Chengkai Huang, Junda Wu, Yu Xia, Zixu Yu, Ruhan Wang, Tong Yu, Ruiyi Zhang, Ryan A. Rossi, Branislav Kveton, Dongruo Zhou, Julian McAuley, Lina Yao

    Abstract: Recent breakthroughs in Large Language Models (LLMs) have led to the emergence of agentic AI systems that extend beyond the capabilities of standalone models. By empowering LLMs to perceive external environments, integrate multimodal information, and interact with various tools, these agentic systems exhibit greater autonomy and adaptability across complex tasks. This evolution brings new opportun… ▽ More

    Submitted 20 March, 2025; originally announced March 2025.

  8. arXiv:2503.02119  [pdf, other

    cs.LG

    An Efficient Plugin Method for Metric Optimization of Black-Box Models

    Authors: Siddartha Devic, Nurendra Choudhary, Anirudh Srinivasan, Sahika Genc, Branislav Kveton, Gaurush Hiranandani

    Abstract: Many machine learning algorithms and classifiers are available only via API queries as a ``black-box'' -- that is, the downstream user has no ability to change, re-train, or fine-tune the model on a particular target distribution. Indeed, the downstream user may not even have knowledge of the \emph{original} training distribution or performance metric used to construct and optimize the black-box m… ▽ More

    Submitted 3 March, 2025; originally announced March 2025.

  9. arXiv:2503.01076  [pdf, other

    cs.LG stat.ML

    Active Learning for Direct Preference Optimization

    Authors: Branislav Kveton, Xintong Li, Julian McAuley, Ryan Rossi, Jingbo Shang, Junda Wu, Tong Yu

    Abstract: Direct preference optimization (DPO) is a form of reinforcement learning from human feedback (RLHF) where the policy is learned directly from preferential feedback. Although many models of human preferences exist, the critical task of selecting the most informative feedback for training them is under-explored. We propose an active learning framework for DPO, which can be applied to collect human f… ▽ More

    Submitted 2 March, 2025; originally announced March 2025.

  10. arXiv:2502.11767  [pdf, ps, other

    cs.LG cs.CL

    From Selection to Generation: A Survey of LLM-based Active Learning

    Authors: Yu Xia, Subhojyoti Mukherjee, Zhouhang Xie, Junda Wu, Xintong Li, Ryan Aponte, Hanjia Lyu, Joe Barrow, Hongjie Chen, Franck Dernoncourt, Branislav Kveton, Tong Yu, Ruiyi Zhang, Jiuxiang Gu, Nesreen K. Ahmed, Yu Wang, Xiang Chen, Hanieh Deilamsalehy, Sungchul Kim, Zhengmian Hu, Yue Zhao, Nedim Lipka, Seunghyun Yoon, Ting-Hao Kenneth Huang, Zichao Wang , et al. (9 additional authors not shown)

    Abstract: Active Learning (AL) has been a powerful paradigm for improving model efficiency and performance by selecting the most informative data points for labeling and training. In recent active learning frameworks, Large Language Models (LLMs) have been employed not only for selection but also for generating entirely new data instances and providing more cost-effective annotations. Motivated by the incre… ▽ More

    Submitted 31 May, 2025; v1 submitted 17 February, 2025; originally announced February 2025.

    Comments: ACL 2025

  11. arXiv:2412.19396  [pdf, other

    cs.LG cs.AI cs.IT math.OC stat.ML

    Comparing Few to Rank Many: Active Human Preference Learning using Randomized Frank-Wolfe

    Authors: Kiran Koshy Thekumparampil, Gaurush Hiranandani, Kousha Kalantari, Shoham Sabach, Branislav Kveton

    Abstract: We study learning of human preferences from a limited comparison feedback. This task is ubiquitous in machine learning. Its applications such as reinforcement learning from human feedback, have been transformational. We formulate this problem as learning a Plackett-Luce model over a universe of $N$ choices from $K$-way comparison feedback, where typically $K \ll N$. Our solution is the D-optimal d… ▽ More

    Submitted 26 December, 2024; originally announced December 2024.

    Comments: Submitted to AISTATS 2025 on October 10, 2024

  12. arXiv:2412.13501  [pdf, other

    cs.AI cs.HC

    GUI Agents: A Survey

    Authors: Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, Xintong Li, Jing Shi, Hongjie Chen, Viet Dac Lai, Zhouhang Xie, Sungchul Kim, Ruiyi Zhang, Tong Yu, Mehrab Tanjim, Nesreen K. Ahmed, Puneet Mathur, Seunghyun Yoon, Lina Yao, Branislav Kveton, Thien Huu Nguyen , et al. (4 additional authors not shown)

    Abstract: Graphical User Interface (GUI) agents, powered by Large Foundation Models, have emerged as a transformative approach to automating human-computer interaction. These agents autonomously interact with digital systems or software applications via GUIs, emulating human actions such as clicking, typing, and navigating visual elements across diverse platforms. Motivated by the growing interest and funda… ▽ More

    Submitted 17 December, 2024; originally announced December 2024.

  13. arXiv:2412.05469  [pdf, other

    cs.LG

    Multi-Objective Alignment of Large Language Models Through Hypervolume Maximization

    Authors: Subhojyoti Mukherjee, Anusha Lalitha, Sailik Sengupta, Aniket Deshmukh, Branislav Kveton

    Abstract: Multi-objective alignment from human feedback (MOAHF) in large language models (LLMs) is a challenging problem as human preferences are complex, multifaceted, and often conflicting. Recent works on MOAHF considered a-priori multi-objective optimization (MOO), where human preferences are known at training or inference time. In contrast, when human preferences are unknown or difficult to quantify, a… ▽ More

    Submitted 6 December, 2024; originally announced December 2024.

  14. arXiv:2411.09065  [pdf, other

    cs.IR cs.AI cs.LG

    Language-Model Prior Overcomes Cold-Start Items

    Authors: Shiyu Wang, Hao Ding, Yupeng Gu, Sergul Aydore, Kousha Kalantari, Branislav Kveton

    Abstract: The growth of recommender systems (RecSys) is driven by digitization and the need for personalized content in areas such as e-commerce and video streaming. The content in these systems often changes rapidly and therefore they constantly face the ongoing cold-start problem, where new items lack interaction data and are hard to value. Existing solutions for the cold-start problem, such as content-ba… ▽ More

    Submitted 13 November, 2024; originally announced November 2024.

    Comments: This paper is dedicated to cold-start item recommendation using language-model priors

  15. arXiv:2411.00027  [pdf, other

    cs.CL

    Personalization of Large Language Models: A Survey

    Authors: Zhehao Zhang, Ryan A. Rossi, Branislav Kveton, Yijia Shao, Diyi Yang, Hamed Zamani, Franck Dernoncourt, Joe Barrow, Tong Yu, Sungchul Kim, Ruiyi Zhang, Jiuxiang Gu, Tyler Derr, Hongjie Chen, Junda Wu, Xiang Chen, Zichao Wang, Subrata Mitra, Nedim Lipka, Nesreen Ahmed, Yu Wang

    Abstract: Personalization of Large Language Models (LLMs) has recently become increasingly important with a wide range of applications. Despite the importance and recent progress, most existing works on personalized LLMs have focused either entirely on (a) personalized text generation or (b) leveraging LLMs for personalization-related downstream applications, such as recommendation systems. In this work, we… ▽ More

    Submitted 5 May, 2025; v1 submitted 29 October, 2024; originally announced November 2024.

  16. arXiv:2410.23703  [pdf, other

    cs.LG cs.CL

    OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models

    Authors: Junda Wu, Xintong Li, Ruoyu Wang, Yu Xia, Yuxin Xiong, Jianing Wang, Tong Yu, Xiang Chen, Branislav Kveton, Lina Yao, Jingbo Shang, Julian McAuley

    Abstract: Offline evaluation of LLMs is crucial in understanding their capacities, though current methods remain underexplored in existing research. In this work, we focus on the offline evaluation of the chain-of-thought capabilities and show how to optimize LLMs based on the proposed evaluation method. To enable offline feedback with rich knowledge and reasoning paths, we use knowledge graphs (e.g., Wikid… ▽ More

    Submitted 31 October, 2024; originally announced October 2024.

    Comments: 10 pages

  17. arXiv:2410.03919  [pdf, other

    cs.LG stat.ML

    Online Posterior Sampling with a Diffusion Prior

    Authors: Branislav Kveton, Boris Oreshkin, Youngsuk Park, Aniket Deshmukh, Rui Song

    Abstract: Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of appr… ▽ More

    Submitted 4 October, 2024; originally announced October 2024.

    Comments: Proceedings of the 38th Conference on Neural Information Processing Systems

  18. arXiv:2406.10030  [pdf, other

    cs.LG stat.ML

    Off-Policy Evaluation from Logged Human Feedback

    Authors: Aniruddha Bhargava, Lalit Jain, Branislav Kveton, Ge Liu, Subhojyoti Mukherjee

    Abstract: Learning from human feedback has been central to recent advances in artificial intelligence and machine learning. Since the collection of human feedback is costly, a natural question to ask is if the new feedback always needs to collected. Or could we evaluate a new model with the human feedback on responses of another model? This motivates us to study off-policy evaluation from logged human feedb… ▽ More

    Submitted 14 June, 2024; originally announced June 2024.

  19. arXiv:2405.15332  [pdf, other

    cs.LG

    Cross-Validated Off-Policy Evaluation

    Authors: Matej Cief, Branislav Kveton, Michal Kompan

    Abstract: We study estimator selection and hyper-parameter tuning in off-policy evaluation. Although cross-validation is the most popular method for model selection in supervised learning, off-policy evaluation relies mostly on theory, which provides only limited guidance to practitioners. We show how to use cross-validation for off-policy evaluation. This challenges a popular belief that cross-validation i… ▽ More

    Submitted 20 December, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 13 pages, 7 figures, to be published in AAAI 2025

  20. arXiv:2404.13895  [pdf, other

    cs.LG

    Optimal Design for Human Preference Elicitation

    Authors: Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton

    Abstract: Learning of preference models from human feedback has been central to recent advances in artificial intelligence. Motivated by the cost of obtaining high-quality human annotations, we study efficient human preference elicitation for learning preference models. The key idea in our work is to generalize optimal designs, a methodology for computing optimal information-gathering policies, to questions… ▽ More

    Submitted 4 November, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Advances in Neural Information Processing Systems 37

  21. arXiv:2404.08846  [pdf, other

    cs.LG cs.CL

    Experimental Design for Active Transductive Inference in Large Language Models

    Authors: Subhojyoti Mukherjee, Anusha Lalitha, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton

    Abstract: One emergent ability of large language models (LLMs) is that query-specific examples can be included in the prompt at inference time. In this work, we use active learning for adaptive prompt design and call it Active In-context Prompt Design (AIPD). We design the LLM prompt by adaptively choosing few-shot examples from a training set to optimize performance on a test set. The training examples are… ▽ More

    Submitted 30 May, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  22. arXiv:2401.08893  [pdf, other

    cs.LG math.OC

    MADA: Meta-Adaptive Optimizers through hyper-gradient Descent

    Authors: Kaan Ozkara, Can Karakus, Parameswaran Raman, Mingyi Hong, Shoham Sabach, Branislav Kveton, Volkan Cevher

    Abstract: Following the introduction of Adam, several novel adaptive optimizers for deep learning have been proposed. These optimizers typically excel in some tasks but may not outperform Adam uniformly across all tasks. In this work, we introduce Meta-Adaptive Optimizers (MADA), a unified optimizer framework that can generalize several known optimizers and dynamically learn the most suitable one during tra… ▽ More

    Submitted 17 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  23. arXiv:2312.14345  [pdf, other

    cs.AI cs.CL cs.HC

    Logic-Scaffolding: Personalized Aspect-Instructed Recommendation Explanation Generation using LLMs

    Authors: Behnam Rahdari, Hao Ding, Ziwei Fan, Yifei Ma, Zhuotong Chen, Anoop Deoras, Branislav Kveton

    Abstract: The unique capabilities of Large Language Models (LLMs), such as the natural language text generation ability, position them as strong candidates for providing explanation for recommendations. However, despite the size of the LLM, most existing models struggle to produce zero-shot explanations reliably. To address this issue, we propose a framework called Logic-Scaffolding, that combines the ideas… ▽ More

    Submitted 17 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: The 17th ACM International Conference on Web Search and Data Mining (WSDM 2024)

  24. arXiv:2310.19251  [pdf, other

    cs.IR cs.AI

    Pre-trained Recommender Systems: A Causal Debiasing Perspective

    Authors: Ziqian Lin, Hao Ding, Nghia Trong Hoang, Branislav Kveton, Anoop Deoras, Hao Wang

    Abstract: Recent studies on pre-trained vision/language models have demonstrated the practical benefit of a new, promising solution-building paradigm in AI where models can be pre-trained on broad data describing a generic task space and then adapted successfully to solve a wide range of downstream tasks, even when training data is severely limited (e.g., in zero- or few-shot learning scenarios). Inspired b… ▽ More

    Submitted 8 January, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

    Comments: 8 pages, WSDM 24

  25. arXiv:2310.18617  [pdf, other

    cs.LG stat.ML

    Pessimistic Off-Policy Multi-Objective Optimization

    Authors: Shima Alizadeh, Aniruddha Bhargava, Karthick Gopalswamy, Lalit Jain, Branislav Kveton, Ge Liu

    Abstract: Multi-objective optimization is a type of decision making problems where multiple conflicting objectives are optimized. We study offline optimization of multi-objective policies from data collected by an existing policy. We propose a pessimistic estimator for the multi-objective policy values that can be easily plugged into existing formulas for hypervolume computation and optimized. The estimator… ▽ More

    Submitted 28 October, 2023; originally announced October 2023.

  26. arXiv:2310.14751  [pdf, other

    cs.LG

    Efficient and Interpretable Bandit Algorithms

    Authors: Subhojyoti Mukherjee, Ruihao Zhu, Branislav Kveton

    Abstract: Motivated by the importance of explainability in modern machine learning, we design bandit algorithms that are efficient and interpretable. A bandit algorithm is interpretable if it explores with the objective of reducing uncertainty in the unknown model parameter. To quantify the interpretability, we introduce a novel metric of model error, which compares the rate reduction of the mean reward est… ▽ More

    Submitted 8 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  27. arXiv:2306.09136  [pdf, other

    cs.LG stat.ML

    Finite-Time Logarithmic Bayes Regret Upper Bounds

    Authors: Alexia Atsidakou, Branislav Kveton, Sumeet Katariya, Constantine Caramanis, Sujay Sanghavi

    Abstract: We derive the first finite-time logarithmic Bayes regret upper bounds for Bayesian bandits. In a multi-armed bandit, we obtain $O(c_Δ\log n)$ and $O(c_h \log^2 n)$ upper bounds for an upper confidence bound algorithm, where $c_h$ and $c_Δ$ are constants depending on the prior distribution and the gaps of bandit instances sampled from it, respectively. The latter bound asymptotically matches the lo… ▽ More

    Submitted 21 January, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

  28. arXiv:2306.07549  [pdf, other

    cs.LG stat.ML

    Fixed-Budget Best-Arm Identification with Heterogeneous Reward Variances

    Authors: Anusha Lalitha, Kousha Kalantari, Yifei Ma, Anoop Deoras, Branislav Kveton

    Abstract: We study the problem of best-arm identification (BAI) in the fixed-budget setting with heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this setting: SHVar for known reward variances and SHAdaVar for unknown reward variances. Our algorithms rely on non-uniform budget allocations among the arms where the arms with higher reward variances are pulled more often than… ▽ More

    Submitted 13 June, 2023; originally announced June 2023.

  29. arXiv:2303.09033  [pdf, other

    cs.LG

    Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling

    Authors: Aadirupa Saha, Branislav Kveton

    Abstract: Most bandit algorithms assume that the reward variances or their upper bounds are known, and that they are the same for all arms. This naturally leads to suboptimal performance and higher regret due to variance overestimation. On the other hand, underestimated reward variances may lead to linear regret due to committing early to a suboptimal arm. This motivated prior works on variance-adaptive fre… ▽ More

    Submitted 12 October, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

  30. arXiv:2302.01543  [pdf, other

    cs.LG

    Multiplier Bootstrap-based Exploration

    Authors: Runzhe Wan, Haoyu Wei, Branislav Kveton, Rui Song

    Abstract: Despite the great interest in the bandit problem, designing efficient algorithms for complex models remains challenging, as there is typically no analytical way to quantify uncertainty. In this paper, we propose Multiplier Bootstrap-based Exploration (MBE), a novel exploration strategy that is applicable to any reward model amenable to weighted loss minimization. We prove both instance-dependent a… ▽ More

    Submitted 2 February, 2023; originally announced February 2023.

  31. arXiv:2302.00284  [pdf, other

    cs.LG cs.AI

    Selective Uncertainty Propagation in Offline RL

    Authors: Sanath Kumar Krishnamurthy, Tanmay Gangwani, Sumeet Katariya, Branislav Kveton, Shrey Modi, Anshuka Rangi

    Abstract: We consider the finite-horizon offline reinforcement learning (RL) setting, and are motivated by the challenge of learning the policy at any step h in dynamic programming (DP) algorithms. To learn this, it is sufficient to evaluate the treatment effect of deviating from the behavioral policy at step h after having optimized the policy for all future steps. Since the policy at any step can affect n… ▽ More

    Submitted 19 January, 2025; v1 submitted 1 February, 2023; originally announced February 2023.

  32. arXiv:2301.05182  [pdf, other

    cs.LG cs.AI stat.ML

    Thompson Sampling with Diffusion Generative Prior

    Authors: Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton, Patrick Blöbaum

    Abstract: In this work, we initiate the idea of using denoising diffusion models to learn priors for online decision making problems. Our special focus is on the meta-learning for bandit framework, with the goal of learning a strategy that performs well across bandit tasks of a same class. To this end, we train a diffusion model that learns the underlying task distribution and combine Thompson sampling with… ▽ More

    Submitted 30 January, 2023; v1 submitted 12 January, 2023; originally announced January 2023.

  33. arXiv:2212.04720  [pdf, other

    cs.LG cs.AI

    Multi-Task Off-Policy Learning from Bandit Feedback

    Authors: Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

    Abstract: Many practical applications, such as recommender systems and learning to rank, involve solving multiple similar tasks. One example is learning of recommendation policies for users with similar movie preferences, where the users may still rank the individual movies slightly differently. Such tasks can be organized in a hierarchy, where similar tasks are related through a shared structure. In this w… ▽ More

    Submitted 9 December, 2022; originally announced December 2022.

    Comments: 14 pages, 3 figures

  34. arXiv:2211.08572  [pdf, other

    cs.LG stat.ML

    Bayesian Fixed-Budget Best-Arm Identification

    Authors: Alexia Atsidakou, Sumeet Katariya, Sujay Sanghavi, Branislav Kveton

    Abstract: Fixed-budget best-arm identification (BAI) is a bandit problem where the agent maximizes the probability of identifying the optimal arm within a fixed budget of observations. In this work, we study this problem in the Bayesian setting. We propose a Bayesian elimination algorithm and derive an upper bound on its probability of misidentifying the optimal arm. The bound reflects the quality of the pr… ▽ More

    Submitted 15 June, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

  35. arXiv:2210.14483  [pdf, other

    cs.LG stat.ML

    Robust Contextual Linear Bandits

    Authors: Rong Zhu, Branislav Kveton

    Abstract: Model misspecification is a major consideration in applications of statistical methods and machine learning. However, it is often neglected in contextual bandits. This paper studies a common form of misspecification, an inter-arm heterogeneity that is not captured by context. To address this issue, we assume that the heterogeneity arises due to arm-specific random variables, which can be learned.… ▽ More

    Submitted 26 October, 2022; originally announced October 2022.

  36. arXiv:2209.13426  [pdf, other

    cs.IR cs.HC cs.IT

    From Ranked Lists to Carousels: A Carousel Click Model

    Authors: Behnam Rahdari, Branislav Kveton, Peter Brusilovsky

    Abstract: Carousel-based recommendation interfaces allow users to explore recommended items in a structured, efficient, and visually-appealing way. This made them a de-facto standard approach to recommending items to end users in many real-life recommenders. In this work, we try to explain the efficiency of carousel recommenders using a \emph{carousel click model}, a generative model of user interaction wit… ▽ More

    Submitted 27 September, 2022; originally announced September 2022.

  37. arXiv:2206.04091  [pdf, other

    stat.ML cs.LG

    Uplifting Bandits

    Authors: Yu-Guan Hsieh, Shiva Prasad Kasiviswanathan, Branislav Kveton

    Abstract: We introduce a multi-armed bandit model where the reward is a sum of multiple random variables, and each action only alters the distributions of some of them. After each action, the agent observes the realizations of all the variables. This model is motivated by marketing campaigns and recommender systems, where the variables represent outcomes on individual customers, such as clicks. We propose U… ▽ More

    Submitted 8 June, 2022; originally announced June 2022.

  38. arXiv:2206.02593  [pdf, other

    cs.LG cs.AI cs.IR

    Pessimistic Off-Policy Optimization for Learning to Rank

    Authors: Matej Cief, Branislav Kveton, Michal Kompan

    Abstract: Off-policy learning is a framework for optimizing policies without deploying them, using data collected by another policy. In recommender systems, this is especially challenging due to the imbalance in logged data: some items are recommended and thus logged more frequently than others. This is further perpetuated when recommending a list of items, as the action space is combinatorial. To address t… ▽ More

    Submitted 23 August, 2024; v1 submitted 6 June, 2022; originally announced June 2022.

    Comments: 13 pages, 10 figures, to be published in ECAI 2024

  39. arXiv:2205.15124  [pdf, other

    cs.LG stat.ML

    Mixed-Effect Thompson Sampling

    Authors: Imad Aouali, Branislav Kveton, Sumeet Katariya

    Abstract: A contextual bandit is a popular framework for online learning to act under uncertainty. In practice, the number of actions is huge and their expected rewards are correlated. In this work, we introduce a general framework for capturing such correlations through a mixed-effect model where actions are related through multiple shared effect parameters. To explore efficiently using this structure, we… ▽ More

    Submitted 5 March, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

  40. arXiv:2202.13234  [pdf, other

    cs.LG

    Safe Exploration for Efficient Policy Evaluation and Comparison

    Authors: Runzhe Wan, Branislav Kveton, Rui Song

    Abstract: High-quality data plays a central role in ensuring the accuracy of policy evaluation. This paper initiates the study of efficient and safe data collection for bandit policy evaluation. We formulate the problem and investigate its several representative variants. For each variant, we analyze its statistical properties, derive the corresponding exploration policy, and design an efficient algorithm f… ▽ More

    Submitted 18 June, 2022; v1 submitted 26 February, 2022; originally announced February 2022.

  41. arXiv:2202.12888  [pdf, other

    cs.LG cs.AI stat.ML

    Meta-Learning for Simple Regret Minimization

    Authors: Mohammadjavad Azizi, Branislav Kveton, Mohammad Ghavamzadeh, Sumeet Katariya

    Abstract: We develop a meta-learning framework for simple regret minimization in bandits. In this framework, a learning agent interacts with a sequence of bandit tasks, which are sampled i.i.d.\ from an unknown prior distribution, and learns its meta-parameters to perform better on future tasks. We propose the first Bayesian and frequentist meta-learning algorithms for this setting. The Bayesian algorithm h… ▽ More

    Submitted 4 July, 2023; v1 submitted 25 February, 2022; originally announced February 2022.

  42. arXiv:2202.01454  [pdf, other

    cs.LG stat.ML

    Deep Hierarchy in Bandits

    Authors: Joey Hong, Branislav Kveton, Sumeet Katariya, Manzil Zaheer, Mohammad Ghavamzadeh

    Abstract: Mean rewards of actions are often correlated. The form of these correlations may be complex and unknown a priori, such as the preferences of a user for recommended products and their categories. To maximize statistical efficiency, it is important to leverage these correlations when learning. We formulate a bandit variant of this problem where the correlations of mean action rewards are represented… ▽ More

    Submitted 3 February, 2022; originally announced February 2022.

  43. arXiv:2201.09798  [pdf, other

    cs.LG cs.CE

    IMO$^3$: Interactive Multi-Objective Off-Policy Optimization

    Authors: Nan Wang, Hongning Wang, Maryam Karimzadehgan, Branislav Kveton, Craig Boutilier

    Abstract: Most real-world optimization problems have multiple objectives. A system designer needs to find a policy that trades off these objectives to reach a desired operating point. This problem has been studied extensively in the setting of known objective functions. We consider a more practical but challenging setting of unknown objective functions. In industry, this problem is mostly approached with on… ▽ More

    Submitted 24 January, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

  44. arXiv:2111.06929  [pdf, other

    cs.LG cs.AI

    Hierarchical Bayesian Bandits

    Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh

    Abstract: Meta-, multi-task, and federated learning can be all viewed as solving similar tasks, drawn from a distribution that reflects task similarities. We provide a unified view of all these problems, as learning to act in a hierarchical Bayesian bandit. We propose and analyze a natural hierarchical Thompson sampling algorithm (HierTS) for this class of problems. Our regret bounds hold for many variants… ▽ More

    Submitted 5 March, 2022; v1 submitted 12 November, 2021; originally announced November 2021.

    Comments: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics

  45. arXiv:2111.04835  [pdf, other

    cs.LG

    Safe Data Collection for Offline and Online Policy Learning

    Authors: Ruihao Zhu, Branislav Kveton

    Abstract: Motivated by practical needs of experimentation and policy learning in online platforms, we study the problem of safe data collection. Specifically, our goal is to develop a logging policy that efficiently explores different actions to elicit information while achieving competitive reward with a baseline production policy. We first show that a common practice of mixing the production policy with r… ▽ More

    Submitted 4 August, 2022; v1 submitted 8 November, 2021; originally announced November 2021.

  46. arXiv:2109.07743  [pdf, other

    cs.LG stat.ML

    Optimal Probing with Statistical Guarantees for Network Monitoring at Scale

    Authors: Muhammad Jehangir Amjad, Christophe Diot, Dimitris Konomis, Branislav Kveton, Augustin Soule, Xiaolong Yang

    Abstract: Cloud networks are difficult to monitor because they grow rapidly and the budgets for monitoring them are limited. We propose a framework for estimating network metrics, such as latency and packet loss, with guarantees on estimation errors for a fixed monitoring budget. Our proposed algorithms produce a distribution of probes across network paths, which we then monitor; and are based on A- and E-o… ▽ More

    Submitted 16 September, 2021; originally announced September 2021.

  47. arXiv:2107.06196  [pdf, other

    cs.LG

    No Regrets for Learning the Prior in Bandits

    Authors: Soumya Basu, Branislav Kveton, Manzil Zaheer, Csaba Szepesvári

    Abstract: We propose ${\tt AdaTS}$, a Thompson sampling algorithm that adapts sequentially to bandit tasks that it interacts with. The key idea in ${\tt AdaTS}$ is to adapt to an unknown task prior distribution by maintaining a distribution over its parameters. When solving a bandit task, that uncertainty is marginalized out and properly accounted for. ${\tt AdaTS}$ is a fully-Bayesian algorithm that can be… ▽ More

    Submitted 25 February, 2022; v1 submitted 13 July, 2021; originally announced July 2021.

  48. arXiv:2106.12200  [pdf, other

    cs.LG stat.ML

    Random Effect Bandits

    Authors: Rong Zhu, Branislav Kveton

    Abstract: This paper studies regret minimization in a multi-armed bandit. It is well known that side information, such as the prior distribution of arm means in Thompson sampling, can improve the statistical efficiency of the bandit algorithm. While the prior is a blessing when correctly specified, it is a curse when misspecified. To address this issue, we introduce the assumption of a random-effect model t… ▽ More

    Submitted 4 March, 2022; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics

  49. arXiv:2106.05608  [pdf, other

    cs.LG cs.AI stat.ML

    Thompson Sampling with a Mixture Prior

    Authors: Joey Hong, Branislav Kveton, Manzil Zaheer, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: We study Thompson sampling (TS) in online decision making, where the uncertain environment is sampled from a mixture distribution. This is relevant in multi-task learning, where a learning agent faces different classes of problems. We incorporate this structure in a natural way by initializing TS with a mixture prior, and call the resulting algorithm MixTS. To analyze MixTS, we develop a novel and… ▽ More

    Submitted 5 March, 2022; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: Proceedings of the 25th International Conference on Artificial Intelligence and Statistics

  50. arXiv:2106.04763  [pdf, other

    cs.LG

    Fixed-Budget Best-Arm Identification in Structured Bandits

    Authors: Mohammad Javad Azizi, Branislav Kveton, Mohammad Ghavamzadeh

    Abstract: Best-arm identification (BAI) in a fixed-budget setting is a bandit problem where the learning agent maximizes the probability of identifying the optimal (best) arm after a fixed number of observations. Most works on this topic study unstructured problems with a small number of arms, which limits their applicability. We propose a general tractable algorithm that incorporates the structure, by succ… ▽ More

    Submitted 4 July, 2023; v1 submitted 8 June, 2021; originally announced June 2021.