Showing 1–25 of 25 results for author: Kamalaruban, P

  1. arXiv:2503.10228  [pdf, other]

    cs.LG

    Policy Teaching via Data Poisoning in Learning from Human Preferences

    Authors: Andi Nika, Jonathan Nöther, Debmalya Mandal, Parameswaran Kamalaruban, Adish Singla, Goran Radanović

    Abstract: We study data poisoning attacks in learning from human preferences. More specifically, we consider the problem of teaching/enforcing a target policy $\pi^\dagger$ by synthesizing preference data. We seek to understand the susceptibility of different preference-based learning paradigms to poisoned preference data by analyzing the number of samples required by the attacker to enforce $\pi^\dagger$. We f…

    Submitted 13 March, 2025; originally announced March 2025.

    Comments: In AISTATS 2025

  2. arXiv:2503.05684  [pdf, other]

    cs.LG cs.CV

    Fairness-Aware Low-Rank Adaptation Under Demographic Privacy Constraints

    Authors: Parameswaran Kamalaruban, Mark Anderson, Stuart Burrell, Maeve Madigan, Piotr Skalski, David Sutton

    Abstract: Pre-trained foundation models can be adapted for specific tasks using Low-Rank Adaptation (LoRA). However, the fairness properties of these adapted classifiers remain underexplored. Existing fairness-aware fine-tuning methods rely on direct access to sensitive attributes or their predictors, but in practice, these sensitive attributes are often held under strict consumer privacy controls, and neit…

    Submitted 7 March, 2025; originally announced March 2025.

  3. arXiv:2409.04373  [pdf, other]

    cs.LG

    Evaluating Fairness in Transaction Fraud Models: Fairness Metrics, Bias Audits, and Challenges

    Authors: Parameswaran Kamalaruban, Yulu Pi, Stuart Burrell, Eleanor Drage, Piotr Skalski, Jason Wong, David Sutton

    Abstract: Ensuring fairness in transaction fraud detection models is vital due to the potential harms and legal implications of biased decision-making. Despite extensive research on algorithmic fairness, there is a notable gap in the study of bias in fraud detection models, mainly due to the field's unique challenges. These challenges include the need for fairness metrics that account for fraud data's imbal…

    Submitted 6 September, 2024; originally announced September 2024.

  4. arXiv:2407.18414  [pdf, other]

    cs.LG cs.AI

    Adversarially Robust Decision Transformer

    Authors: Xiaohang Tang, Afonso Marques, Parameswaran Kamalaruban, Ilija Bogunovic

    Abstract: Decision Transformer (DT), as one of the representative Reinforcement Learning via Supervised Learning (RvS) methods, has achieved strong performance in offline learning tasks by leveraging the powerful Transformer architecture for sequential decision-making. However, in adversarial environments, these methods can be non-robust, since the return is dependent on the strategies of both the decision-…

    Submitted 1 November, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted to NeurIPS 2024

  5. arXiv:2405.02481  [pdf, other]

    cs.LG cs.AI

    Proximal Curriculum with Task Correlations for Deep Reinforcement Learning

    Authors: Georgios Tzannetos, Parameswaran Kamalaruban, Adish Singla

    Abstract: Curriculum design for reinforcement learning (RL) can speed up an agent's learning process and help it learn to perform well on complex tasks. However, existing techniques typically require domain-specific hyperparameter tuning, involve expensive optimization procedures for task selection, or are suitable only for specific learning objectives. In this work, we consider curriculum design in context…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: IJCAI'24 paper (longer version)

  6. arXiv:2403.01857  [pdf, ps, other]

    cs.LG

    Reward Model Learning vs. Direct Policy Optimization: A Comparative Analysis of Learning from Human Preferences

    Authors: Andi Nika, Debmalya Mandal, Parameswaran Kamalaruban, Georgios Tzannetos, Goran Radanović, Adish Singla

    Abstract: In this paper, we take a step towards a deeper understanding of learning from human preferences by systematically comparing the paradigm of reinforcement learning from human feedback (RLHF) with the recently proposed paradigm of direct preference optimization (DPO). We focus our attention on the class of loglinear policy parametrization and linear reward functions. In order to compare the two para…

    Submitted 5 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  7. arXiv:2402.07019  [pdf, other]

    cs.LG

    Informativeness of Reward Functions in Reinforcement Learning

    Authors: Rati Devidze, Parameswaran Kamalaruban, Adish Singla

    Abstract: Reward functions are central in specifying the task we want a reinforcement learning agent to perform. Given a task and desired optimal behavior, we study the problem of designing informative reward functions so that the designed rewards speed up the agent's convergence. In particular, we consider expert-driven reward design settings where an expert or teacher seeks to provide informative and inte…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: Longer version of the AAMAS'24 paper

  8. arXiv:2402.06734  [pdf, ps, other]

    cs.LG cs.AI

    Corruption Robust Offline Reinforcement Learning with Human Feedback

    Authors: Debmalya Mandal, Andi Nika, Parameswaran Kamalaruban, Adish Singla, Goran Radanović

    Abstract: We study data corruption robustness for reinforcement learning with human feedback (RLHF) in an offline setting. Given an offline dataset of pairs of trajectories along with feedback about human preferences, an $\varepsilon$-fraction of the pairs is corrupted (e.g., feedback flipped or trajectory features manipulated), capturing an adversarial attack or noisy human preferences. We aim to design al…

    Submitted 9 February, 2024; originally announced February 2024.

  9. arXiv:2304.12877  [pdf, other]

    cs.LG

    Proximal Curriculum for Reinforcement Learning Agents

    Authors: Georgios Tzannetos, Bárbara Gomes Ribeiro, Parameswaran Kamalaruban, Adish Singla

    Abstract: We consider the problem of curriculum design for reinforcement learning (RL) agents in contextual multi-task settings. Existing techniques on automatic curriculum design typically require domain-specific hyperparameter tuning or have limited theoretical underpinnings. To tackle these limitations, we design our curriculum strategy, ProCuRL, inspired by the pedagogical concept of Zone of Proximal De…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: Published in Transactions on Machine Learning Research (TMLR) 2023

  10. arXiv:2304.06701  [pdf, other]

    cs.LG cs.AI cs.CY cs.HC

    Learning Personalized Decision Support Policies

    Authors: Umang Bhatt, Valerie Chen, Katherine M. Collins, Parameswaran Kamalaruban, Emma Kallina, Adrian Weller, Ameet Talwalkar

    Abstract: Individual human decision-makers may benefit from different forms of support to improve decision outcomes, but when each form of support will yield better outcomes? In this work, we posit that personalizing access to decision support tools can be an effective mechanism for instantiating the appropriate use of AI assistance. Specifically, we propose the general problem of learning a decision suppor…

    Submitted 23 January, 2025; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: AAAI 2025

  11. arXiv:2202.06003  [pdf, other]

    cs.LG

    Robust Learning from Observation with Model Misspecification

    Authors: Luca Viano, Yu-Ting Huang, Parameswaran Kamalaruban, Craig Innes, Subramanian Ramamoorthy, Adrian Weller

    Abstract: Imitation learning (IL) is a popular paradigm for training policies in robotic systems when specifying the reward function is difficult. However, despite the success of IL algorithms, they impose the somewhat unrealistic requirement that the expert demonstrations must come from the same domain in which a new imitator policy is to be learned. We consider a practical setting, where (i) state-only ex…

    Submitted 15 February, 2022; v1 submitted 12 February, 2022; originally announced February 2022.

    Comments: accepted to AAMAS 2022 (camera-ready version)

  12. arXiv:2106.04696  [pdf, other]

    cs.LG cs.AI

    Curriculum Design for Teaching via Demonstrations: Theory and Applications

    Authors: Gaurav Yengera, Rati Devidze, Parameswaran Kamalaruban, Adish Singla

    Abstract: We consider the problem of teaching via demonstrations in sequential decision-making settings. In particular, we study how to design a personalized curriculum over demonstrations to speed up the learner's convergence. We provide a unified curriculum strategy for two popular learner models: Maximum Causal Entropy Inverse Reinforcement Learning (MaxEnt-IRL) and Cross-Entropy Behavioral Cloning (Cros…

    Submitted 15 December, 2021; v1 submitted 8 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021

  13. arXiv:2007.01174  [pdf, other]

    cs.LG stat.ML

    Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch

    Authors: Luca Viano, Yu-Ting Huang, Parameswaran Kamalaruban, Adrian Weller, Volkan Cevher

    Abstract: We study the inverse reinforcement learning (IRL) problem under a transition dynamics mismatch between the expert and the learner. Specifically, we consider the Maximum Causal Entropy (MCE) IRL learner model and provide a tight upper bound on the learner's performance degradation based on the $\ell_1$-distance between the transition dynamics of the expert and the learner. Leveraging insights from…

    Submitted 30 November, 2021; v1 submitted 2 July, 2020; originally announced July 2020.

  14. arXiv:2007.00425  [pdf, other]

    cs.LG stat.ML

    Interaction-limited Inverse Reinforcement Learning

    Authors: Martin Troussard, Emmanuel Pignat, Parameswaran Kamalaruban, Sylvain Calinon, Volkan Cevher

    Abstract: This paper proposes an inverse reinforcement learning (IRL) framework to accelerate learning when the learner-teacher interaction is limited during training. Our setting is motivated by the realistic scenarios where a helpful teacher is not available or when the teacher cannot access the learning dynamics of the student. We present two different training strategies: Curriculum In…

    Submitted 1 July, 2020; originally announced July 2020.

  15. arXiv:2006.13160  [pdf, other]

    cs.LG stat.ML

    Environment Shaping in Reinforcement Learning using State Abstraction

    Authors: Parameswaran Kamalaruban, Rati Devidze, Volkan Cevher, Adish Singla

    Abstract: One of the central challenges faced by a reinforcement learning (RL) agent is to effectively learn a (near-)optimal policy in environments with large state spaces having sparse and noisy feedback signals. In real-world applications, an expert with additional domain knowledge can help in speeding up the learning process via shaping the environment, i.e., making the environment more learner-f…

    Submitted 23 June, 2020; originally announced June 2020.

  16. arXiv:2002.06063  [pdf, other]

    cs.LG stat.ML

    Robust Reinforcement Learning via Adversarial training with Langevin Dynamics

    Authors: Parameswaran Kamalaruban, Yu-Ting Huang, Ya-Ping Hsieh, Paul Rolland, Cheng Shi, Volkan Cevher

    Abstract: We introduce a sampling perspective to tackle the challenging task of training robust Reinforcement Learning (RL) agents. Leveraging the powerful Stochastic Gradient Langevin Dynamics, we present a novel, scalable two-player RL algorithm, which is a sampling variant of the two-player policy gradient method. Our algorithm consistently outperforms existing baselines, in terms of generalization acros…

    Submitted 5 November, 2020; v1 submitted 14 February, 2020; originally announced February 2020.

  17. arXiv:1912.00498  [pdf, other]

    cs.LG cs.AI cs.MA eess.SY

    Optimization for Reinforcement Learning: From Single Agent to Cooperative Agents

    Authors: Donghwan Lee, Niao He, Parameswaran Kamalaruban, Volkan Cevher

    Abstract: This article reviews recent advances in multi-agent reinforcement learning algorithms for large-scale control systems and communication networks, which learn to communicate and cooperate. We provide an overview of this emerging field, with an emphasis on the decentralized setting under different coordination protocols. We highlight the evolution of reinforcement learning algorithms from single-age…

    Submitted 1 December, 2019; originally announced December 2019.

  18. arXiv:1905.11867  [pdf, other]

    cs.LG cs.AI stat.ML

    Interactive Teaching Algorithms for Inverse Reinforcement Learning

    Authors: Parameswaran Kamalaruban, Rati Devidze, Volkan Cevher, Adish Singla

    Abstract: We study the problem of inverse reinforcement learning (IRL) with the added twist that the learner is assisted by a helpful teacher. More formally, we tackle the following algorithmic question: How could a teacher provide an informative sequence of demonstrations to an IRL learner to speed up the learning process? We present an interactive teaching framework where a teacher adaptively chooses the…

    Submitted 5 June, 2019; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: IJCAI'19 paper (extended version)

  19. arXiv:1811.03537  [pdf, other]

    cs.LG stat.ML

    Iterative Classroom Teaching

    Authors: Teresa Yeo, Parameswaran Kamalaruban, Adish Singla, Arpit Merchant, Thibault Asselborn, Louis Faucon, Pierre Dillenbourg, Volkan Cevher

    Abstract: We consider the machine teaching problem in a classroom-like setting wherein the teacher has to deliver the same examples to a diverse group of students. Their diversity stems from differences in their initial internal states as well as their learning rates. We prove that a teacher with full knowledge about the learning dynamics of the students can teach a target concept to the entire classroom us…

    Submitted 12 November, 2018; v1 submitted 8 November, 2018; originally announced November 2018.

    Comments: AAAI'19 (extended version)

  20. arXiv:1806.02389  [pdf, other]

    stat.ML cs.LG

    Not All Attributes are Created Equal: $d_{\mathcal{X}}$-Private Mechanisms for Linear Queries

    Authors: Parameswaran Kamalaruban, Victor Perrier, Hassan Jameel Asghar, Mohamed Ali Kaafar

    Abstract: Differential privacy provides strong privacy guarantees simultaneously enabling useful insights from sensitive datasets. However, it provides the same level of protection for all elements (individuals and attributes) in the data. There are practical scenarios where some data attributes need more/less protection than others. In this paper, we consider $d_{\mathcal{X}}$-privacy, an instantiation of…

    Submitted 28 August, 2019; v1 submitted 6 June, 2018; originally announced June 2018.

  21. arXiv:1805.08622  [pdf, other]

    cs.LG stat.ML

    Transitions, Losses, and Re-parameterizations: Elements of Prediction Games

    Authors: Parameswaran Kamalaruban

    Abstract: This thesis presents some geometric insights into three different types of two player prediction games -- namely general learning task, prediction with expert advice, and online convex optimization. These games differ in the nature of the opponent (stochastic, adversarial, or intermediate), the order of the players' move, and the utility function. The insights shed some light on the understanding…

    Submitted 20 May, 2018; originally announced May 2018.

    Comments: PhD thesis, The Australian National University, 2018. arXiv admin note: text overlap with arXiv:0901.0356 by other authors

  22. arXiv:1805.07737  [pdf, other]

    cs.LG stat.ML

    Exp-Concavity of Proper Composite Losses

    Authors: Parameswaran Kamalaruban, Robert C. Williamson, Xinhua Zhang

    Abstract: The goal of online prediction with expert advice is to find a decision strategy which will perform almost as well as the best expert in a given pool of experts, on any sequence of outcomes. This problem has been widely studied and $O(\sqrt{T})$ and $O(\log{T})$ regret bounds can be achieved for convex losses (\cite{zinkevich2003online}) and strictly convex losses with bounded first and second deri…

    Submitted 20 May, 2018; originally announced May 2018.

    Journal ref: PMLR 40:1035-1065, 2015

  23. arXiv:1805.07723  [pdf, other]

    cs.LG stat.ML

    Minimax Lower Bounds for Cost Sensitive Classification

    Authors: Parameswaran Kamalaruban, Robert C. Williamson

    Abstract: The cost-sensitive classification problem plays a crucial role in mission-critical machine learning applications, and differs with traditional classification by taking the misclassification costs into consideration. Although being studied extensively in the literature, the fundamental limits of this problem are still not well understood. We investigate the hardness of this problem by extending the…

    Submitted 20 May, 2018; originally announced May 2018.

  24. arXiv:1609.02383  [pdf, ps, other]

    cs.LG

    Improved Optimistic Mirror Descent for Sparsity and Curvature

    Authors: Parameswaran Kamalaruban

    Abstract: Online Convex Optimization plays a key role in large scale machine learning. Early approaches to this problem were conservative, in which the main focus was protection against the worst case scenario. But recently several algorithms have been developed for tightening the regret bounds in easy data instances such as sparsity, predictable sequences, and curved losses. In this work we unify some of t…

    Submitted 8 September, 2016; originally announced September 2016.

  25. arXiv:1607.00146  [pdf, ps, other]

    cs.LG stat.ML

    Efficient and Consistent Robust Time Series Analysis

    Authors: Kush Bhatia, Prateek Jain, Parameswaran Kamalaruban, Purushottam Kar

    Abstract: We study the problem of robust time series analysis under the standard auto-regressive (AR) time series model in the presence of arbitrary outliers. We devise an efficient hard thresholding based algorithm which can obtain a consistent estimate of the optimal AR model despite a large fraction of the time series points being corrupted. Our algorithm alternately estimates the corrupted set of points…

    Submitted 1 July, 2016; originally announced July 2016.