Showing 1–3 of 3 results for author: Chidambaram, K

  1. arXiv:2410.14001  [pdf, other]

    cs.LG cs.CL

    Personalized Adaptation via In-Context Preference Learning

    Authors: Allison Lau, Younwoo Choi, Vahid Balazadeh, Keertana Chidambaram, Vasilis Syrgkanis, Rahul G. Krishnan

    Abstract: Reinforcement Learning from Human Feedback (RLHF) is widely used to align Language Models (LMs) with human preferences. However, existing approaches often neglect individual user preferences, leading to suboptimal personalization. We present the Preference Pretrained Transformer (PPT), a novel approach for adaptive personalization using online user feedback. PPT leverages the in-context learning c…

    Submitted 17 October, 2024; originally announced October 2024.

  2. arXiv:2405.15065  [pdf, other]

    cs.LG

    Direct Preference Optimization With Unobserved Preference Heterogeneity

    Authors: Keertana Chidambaram, Karthik Vinay Seetharaman, Vasilis Syrgkanis

    Abstract: RLHF has emerged as a pivotal step in aligning language models with human objectives and values. It typically involves learning a reward model from human preference data and then using reinforcement learning to update the generative model accordingly. Conversely, Direct Preference Optimization (DPO) directly optimizes the generative model with preference data, skipping reinforcement learning. Howe…

    Submitted 23 May, 2024; originally announced May 2024.

  3. arXiv:2404.07266  [pdf, ps, other]

    cs.LG

    Sequential Decision Making with Expert Demonstrations under Unobserved Heterogeneity

    Authors: Vahid Balazadeh, Keertana Chidambaram, Viet Nguyen, Rahul G. Krishnan, Vasilis Syrgkanis

    Abstract: We study the problem of online sequential decision-making given auxiliary demonstrations from experts who made their decisions based on unobserved contextual information. These demonstrations can be viewed as solving related but slightly different problems than what the learner faces. This setting arises in many application domains, such as self-driving cars, healthcare, and finance, where expert…

    Submitted 15 June, 2025 (latest version); v1 submitted 10 April, 2024; originally announced April 2024.