
Showing 1–50 of 51 results for author: Bıyık, E

  1. arXiv:2505.20455  [pdf, ps, other]

    cs.RO

    HAND Me the Data: Fast Robot Adaptation via Hand Path Retrieval

    Authors: Matthew Hong, Anthony Liang, Kevin Kim, Harshitha Rajaprakash, Jesse Thomason, Erdem Bıyık, Jesse Zhang

    Abstract: We hand the community HAND, a simple and time-efficient method for teaching robots new manipulation tasks through human hand demonstrations. Instead of relying on task-specific robot demonstrations collected via teleoperation, HAND uses easy-to-provide hand demonstrations to retrieve relevant behaviors from task-agnostic robot play data. Using a visual tracking pipeline, HAND extracts the motion o…

    Submitted 1 June, 2025; v1 submitted 26 May, 2025; originally announced May 2025.

  2. arXiv:2505.10911  [pdf, ps, other]

    cs.RO

    ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations

    Authors: Jiahui Zhang, Yusen Luo, Abrar Anwar, Sumedh Anand Sontakke, Joseph J Lim, Jesse Thomason, Erdem Biyik, Jesse Zhang

    Abstract: We introduce ReWiND, a framework for learning robot manipulation tasks solely from language instructions without per-task demonstrations. Standard reinforcement learning (RL) and imitation learning methods require expert supervision through human-designed reward functions or demonstrations for every new task. In contrast, ReWiND starts from a small demonstration dataset to learn: (1) a data-effici…

    Submitted 16 May, 2025; originally announced May 2025.

  3. arXiv:2505.04999  [pdf, other]

    cs.RO cs.AI cs.LG

    CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations

    Authors: Anthony Liang, Pavel Czempin, Matthew Hong, Yutai Zhou, Erdem Biyik, Stephen Tu

    Abstract: Learning robot policies using imitation learning requires collecting large amounts of costly action-labeled expert demonstrations, which fundamentally limits the scale of training data. A promising approach to address this bottleneck is to harness the abundance of unlabeled observations (e.g., from video demonstrations) to learn latent action labels in an unsupervised way. However, we find that exis…

    Submitted 8 May, 2025; originally announced May 2025.

    Comments: Latent Action Models, Self-supervised Pretraining, Learning from Videos

  4. arXiv:2503.10110  [pdf, other]

    cs.RO cs.AI cs.LG

    IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models

    Authors: Yiyang Ling, Karan Owalekar, Oluwatobiloba Adesanya, Erdem Bıyık, Daniel Seita

    Abstract: Motion planning involves determining a sequence of robot configurations to reach a desired pose, subject to movement and safety constraints. Traditional motion planning finds collision-free paths, but this is overly restrictive in clutter, where it may not be possible for a robot to accomplish a task without contact. In addition, contacts range from relatively benign (e.g., brushing a soft pillow)…

    Submitted 13 March, 2025; originally announced March 2025.

  5. arXiv:2503.04679  [pdf, other]

    cs.MA cs.AI cs.LG cs.RO

    Multi-Agent Inverse Q-Learning from Demonstrations

    Authors: Nathaniel Haynam, Adam Khoja, Dhruv Kumar, Vivek Myers, Erdem Bıyık

    Abstract: When reward functions are hand-designed, deep reinforcement learning algorithms often suffer from reward misspecification, causing them to learn suboptimal policies in terms of the intended task objectives. In the single-agent case, inverse reinforcement learning (IRL) techniques attempt to address this issue by inferring the reward function from expert demonstrations. However, in multi-agent prob…

    Submitted 6 March, 2025; originally announced March 2025.

    Comments: 8 pages, 4 figures, 2 tables. Published at the International Conference on Robotics and Automation (ICRA) 2025

  6. arXiv:2503.02992  [pdf, other]

    cs.RO cs.AI

    RAILGUN: A Unified Convolutional Policy for Multi-Agent Path Finding Across Different Environments and Tasks

    Authors: Yimin Tang, Xiao Xiong, Jingyi Xi, Jiaoyang Li, Erdem Bıyık, Sven Koenig

    Abstract: Multi-Agent Path Finding (MAPF), which focuses on finding collision-free paths for multiple robots, is crucial for applications ranging from aerial swarms to warehouse automation. Solving MAPF is NP-hard so learning-based approaches for MAPF have gained attention, particularly those leveraging deep neural networks. Nonetheless, despite the community's continued efforts, all learning-based MAPF pla…

    Submitted 4 March, 2025; originally announced March 2025.

    Comments: 7 pages

  7. arXiv:2502.13519  [pdf, other]

    cs.RO cs.AI cs.LG

    MILE: Model-based Intervention Learning

    Authors: Yigit Korkmaz, Erdem Bıyık

    Abstract: Imitation learning techniques have been shown to be highly effective in real-world control scenarios, such as robotics. However, these approaches not only suffer from compounding error issues but also require human experts to provide complete trajectories. Although there exist interactive methods where an expert oversees the robot and intervenes if needed, these extensions usually only utilize the…

    Submitted 19 February, 2025; originally announced February 2025.

    Comments: International Conference on Robotics and Automation (ICRA)

  8. arXiv:2412.04453  [pdf, other]

    cs.RO cs.CV

    NaVILA: Legged Robot Vision-Language-Action Model for Navigation

    Authors: An-Chieh Cheng, Yandong Ji, Zhaojing Yang, Zaitian Gongye, Xueyan Zou, Jan Kautz, Erdem Bıyık, Hongxu Yin, Sifei Liu, Xiaolong Wang

    Abstract: This paper proposes to solve the problem of Vision-and-Language Navigation with legged robots, which not only provides a flexible way for humans to command but also allows the robot to navigate through more challenging and cluttered scenes. However, it is non-trivial to translate human language instructions all the way to low-level leg joint actions. We propose NaVILA, a 2-level framework that uni…

    Submitted 17 February, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

    Comments: Website: https://navila-bot.github.io/

  9. arXiv:2410.12217  [pdf, other]

    cs.CL

    Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree

    Authors: Harbani Jaggi, Kashyap Murali, Eve Fleisig, Erdem Bıyık

    Abstract: When annotators disagree, predicting the labels given by individual annotators can capture nuances overlooked by traditional label aggregation. We introduce three approaches to predicting individual annotator ratings on the toxicity of text by incorporating individual annotator-specific information: a neural collaborative filtering (NCF) approach, an in-context learning (ICL) approach, and an inte…

    Submitted 16 October, 2024; originally announced October 2024.

  10. arXiv:2410.11833  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions

    Authors: Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Bıyık, Joseph J. Lim

    Abstract: In reinforcement learning, off-policy actor-critic approaches like DDPG and TD3 are based on the deterministic policy gradient. Herein, the Q-function is trained from off-policy environment data and the actor (policy) is trained to maximize the Q-function via gradient ascent. We observe that in complex tasks like dexterous manipulation and restricted locomotion, the Q-value is a complex function o…

    Submitted 15 October, 2024; originally announced October 2024.

  11. arXiv:2410.06401  [pdf, other]

    cs.RO

    Trajectory Improvement and Reward Learning from Comparative Language Feedback

    Authors: Zhaojing Yang, Miru Jun, Jeremy Tien, Stuart J. Russell, Anca Dragan, Erdem Bıyık

    Abstract: Learning from human feedback has gained traction in fields like robotics and natural language processing in recent years. While prior works mostly rely on human feedback in the form of comparisons, language is a preferable modality that provides more informative insights into user preferences. In this work, we aim to incorporate comparative language feedback to iteratively improve robot trajectori…

    Submitted 8 October, 2024; originally announced October 2024.

    Comments: 8th Annual Conference of Robot Learning (2024)

  12. arXiv:2406.17768  [pdf, other]

    cs.RO cs.AI cs.LG

    EXTRACT: Efficient Policy Learning by Extracting Transferable Robot Skills from Offline Data

    Authors: Jesse Zhang, Minho Heo, Zuxin Liu, Erdem Biyik, Joseph J Lim, Yao Liu, Rasool Fakoor

    Abstract: Most reinforcement learning (RL) methods focus on learning optimal policies over low-level action spaces. While these methods can perform well in their training environments, they lack the flexibility to transfer to new tasks. Instead, RL agents that can act over useful, temporally extended skills rather than low-level actions can learn new tasks more easily. Prior work in skill-based RL either re…

    Submitted 18 September, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 25 pages, 16 figures

    Journal ref: CoRL 2024

  13. arXiv:2406.06714  [pdf, other]

    cs.LG cs.AI cs.HC

    Coprocessor Actor Critic: A Model-Based Reinforcement Learning Approach For Adaptive Brain Stimulation

    Authors: Michelle Pan, Mariah Schrum, Vivek Myers, Erdem Bıyık, Anca Dragan

    Abstract: Adaptive brain stimulation can treat neurological conditions such as Parkinson's disease and post-stroke motor deficits by influencing abnormal neural activity. Because of patient heterogeneity, each patient requires a unique stimulation policy to achieve optimal neural responses. Model-free reinforcement learning (MFRL) holds promise in learning effective policies for a variety of similar control…

    Submitted 7 October, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Proceedings of the 41st International Conference on Machine Learning (ICML 2024)

    Journal ref: International Conference on Machine Learning 2024

  14. arXiv:2403.10940  [pdf, other]

    cs.RO cs.LG

    ViSaRL: Visual Reinforcement Learning Guided by Human Saliency

    Authors: Anthony Liang, Jesse Thomason, Erdem Bıyık

    Abstract: Training robots to perform complex control tasks from high-dimensional pixel input using reinforcement learning (RL) is sample-inefficient, because image observations are comprised primarily of task-irrelevant information. By contrast, humans are able to visually attend to task-relevant objects and areas. Based on this insight, we introduce Visual Saliency-Guided Reinforcement Learning (ViSaRL). U…

    Submitted 20 October, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Journal ref: IEEE/RSJ International Conference on Intelligent Robots and Systems 2024

  15. arXiv:2403.06003  [pdf, other]

    cs.RO cs.AI cs.LG

    A Generalized Acquisition Function for Preference-based Reward Learning

    Authors: Evan Ellis, Gaurav R. Ghosal, Stuart J. Russell, Anca Dragan, Erdem Bıyık

    Abstract: Preference-based reward learning is a popular technique for teaching robots and autonomous systems how a human user wants them to perform a task. Previous works have shown that actively synthesizing preference queries to maximize information gain about the reward function parameters improves data efficiency. The information gain criterion focuses on precisely identifying all parameters of the rewa…

    Submitted 9 March, 2024; originally announced March 2024.

  16. arXiv:2402.15957  [pdf, other]

    cs.LG

    DynaMITE-RL: A Dynamic Model for Improved Temporal Meta-Reinforcement Learning

    Authors: Anthony Liang, Guy Tennenholtz, Chih-wei Hsu, Yinlam Chow, Erdem Bıyık, Craig Boutilier

    Abstract: We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates. We model episode sessions - parts of the episode where the latent state is fixed - and propose three key modifications to existing meta-RL methods: consistency of latent information within sessions, session masking, and prior latent co…

    Submitted 4 December, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

    Journal ref: Neural Information Processing Systems (NeurIPS) 2024

  17. arXiv:2402.15757  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Batch Active Learning of Reward Functions from Human Preferences

    Authors: Erdem Bıyık, Nima Anari, Dorsa Sadigh

    Abstract: Data generation and labeling are often expensive in robot learning. Preference-based learning is a concept that enables reliable labeling by querying users with preference questions. Active querying methods are commonly employed in preference-based learning to generate more informative data at the expense of parallelization and computation time. In this paper, we develop a set of novel algorithms,…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: To appear in ACM Transactions on Human-Robot Interaction (THRI). 27 pages, 12 figures, 2 tables. arXiv admin note: text overlap with arXiv:1810.04303

  18. arXiv:2402.03681  [pdf, other]

    cs.RO cs.AI cs.LG

    RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback

    Authors: Yufei Wang, Zhanyi Sun, Jesse Zhang, Zhou Xian, Erdem Biyik, David Held, Zackory Erickson

    Abstract: Reward engineering has long been a challenge in Reinforcement Learning (RL) research, as it often requires extensive human effort and iterative processes of trial-and-error to design effective reward functions. In this paper, we propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks, using only a text description of the task goal and the agent's visu…

    Submitted 14 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  19. arXiv:2311.02085  [pdf, other]

    cs.IR cs.AI

    Preference Elicitation with Soft Attributes in Interactive Recommendation

    Authors: Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-wei Hsu, Mohammad Ghavamzadeh, Craig Boutilier

    Abstract: Preference elicitation plays a central role in interactive recommender systems. Most preference elicitation approaches use either item queries that ask users to select preferred items from a slate, or attribute queries that ask them to express their preferences for item characteristics. Unfortunately, users often wish to describe their preferences using soft attributes for which no ground-truth se…

    Submitted 22 October, 2023; originally announced November 2023.

  20. arXiv:2310.07899  [pdf, other]

    cs.AI cs.RO

    RoboCLIP: One Demonstration is Enough to Learn Robot Policies

    Authors: Sumedh A Sontakke, Jesse Zhang, Sébastien M. R. Arnold, Karl Pertsch, Erdem Bıyık, Dorsa Sadigh, Chelsea Finn, Laurent Itti

    Abstract: Reward specification is a notoriously difficult problem in reinforcement learning, requiring extensive expert supervision to design robust reward functions. Imitation learning (IL) methods attempt to circumvent these problems by utilizing expert demonstrations but typically require a large number of in-domain expert demonstrations. Inspired by advances in the field of Video-and-Language Models (VL…

    Submitted 11 October, 2023; originally announced October 2023.

  21. arXiv:2307.15217  [pdf, other]

    cs.AI cs.CL cs.LG

    Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

    Authors: Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, et al. (7 additional authors not shown)

    Abstract: Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and rel…

    Submitted 11 September, 2023; v1 submitted 27 July, 2023; originally announced July 2023.

  22. Renewable energy management in smart home environment via forecast embedded scheduling based on Recurrent Trend Predictive Neural Network

    Authors: Mert Nakıp, Onur Çopur, Emrah Biyik, Cüneyt Güzeliş

    Abstract: Smart home energy management systems help the distribution grid operate more efficiently and reliably, and enable effective penetration of distributed renewable energy sources. These systems rely on robust forecasting, optimization, and control/scheduling algorithms that can handle the uncertain nature of demand and renewable generation. This paper proposes an advanced ML algorithm, called Recurre…

    Submitted 6 July, 2023; v1 submitted 4 July, 2023; originally announced July 2023.

    Journal ref: Nakıp, M., Çopur, O., Biyik, E., & Güzeliş, C. (2023). Renewable energy management in smart home environment via forecast embedded scheduling based on Recurrent Trend Predictive Neural Network. Applied Energy, 340, 121014

  23. arXiv:2302.13507  [pdf, other]

    cs.LG cs.AI cs.RO

    Active Reward Learning from Online Preferences

    Authors: Vivek Myers, Erdem Bıyık, Dorsa Sadigh

    Abstract: Robot policies need to adapt to human preferences and/or new environments. Human experts may have the domain knowledge required to help robots achieve this adaptation. However, existing works often require costly offline re-training on human feedback, and that feedback usually needs to be frequent and is too complex for humans to reliably provide. To avoid placing undue burden on human experts an…

    Submitted 26 February, 2023; originally announced February 2023.

    Comments: 11 pages, 8 figures, 1 table. Published in the 2023 IEEE International Conference on Robotics and Automation (ICRA)

  24. arXiv:2211.14003  [pdf, other]

    cs.AI cs.HC cs.RO

    Assistive Teaching of Motor Control Tasks to Humans

    Authors: Megha Srivastava, Erdem Biyik, Suvir Mirchandani, Noah Goodman, Dorsa Sadigh

    Abstract: Recent works on shared autonomy and assistive-AI technologies, such as assistive robot teleoperation, seek to model and help human users with limited ability in a fixed task. However, these approaches often fail to account for humans' ability to adapt and eventually learn how to execute a control task themselves. Furthermore, in applications where it may be desirable for a human to intervene, thes…

    Submitted 25 November, 2022; originally announced November 2022.

    Comments: 22 pages, 14 figures, NeurIPS 2022

  25. arXiv:2210.10899  [pdf, other]

    cs.RO cs.AI cs.LG stat.ML

    Learning Preferences for Interactive Autonomy

    Authors: Erdem Bıyık

    Abstract: When robots enter everyday human environments, they need to understand their tasks and how they should perform those tasks. To encode these, reward functions, which specify the objective of a robot, are employed. However, designing reward functions can be extremely challenging for complex tasks and environments. A promising approach is to learn reward functions from humans. Recently, several robot…

    Submitted 19 October, 2022; originally announced October 2022.

    Comments: Ph.D. Thesis (Stanford University), 198 pages

  26. arXiv:2205.11613  [pdf, other]

    cs.HC

    How do people incorporate advice from artificial agents when making physical judgments?

    Authors: Erik Brockbank, Haoliang Wang, Justin Yang, Suvir Mirchandani, Erdem Bıyık, Dorsa Sadigh, Judith E. Fan

    Abstract: How do people build up trust with artificial agents? Here, we study a key component of interpersonal trust: people's ability to evaluate the competence of another agent across repeated interactions. Prior work has largely focused on appraisal of simple, static skills; in contrast, we probe competence evaluations in a rich setting with agents that learn over time. Participants played a video game i…

    Submitted 16 May, 2022; originally announced May 2022.

  27. arXiv:2203.04421  [pdf, other]

    cs.LG cs.RO

    Leveraging Smooth Attention Prior for Multi-Agent Trajectory Prediction

    Authors: Zhangjie Cao, Erdem Bıyık, Guy Rosman, Dorsa Sadigh

    Abstract: Multi-agent interactions are important to model for forecasting other agents' behaviors and trajectories. At a certain time, to forecast a reasonable future trajectory, each agent needs to pay attention to the interactions with only a small group of most relevant agents instead of unnecessarily paying attention to all the other agents. However, existing attention modeling works ignore that human a…

    Submitted 19 March, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: 8 pages

    Journal ref: ICRA 2022

  28. arXiv:2110.00751  [pdf, other]

    cs.LG cs.AI cs.MA cs.RO stat.ML

    Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams

    Authors: Erdem Bıyık, Anusha Lalitha, Rajarshi Saha, Andrea Goldsmith, Dorsa Sadigh

    Abstract: When humans collaborate with each other, they often make decisions by observing others and considering the consequences that their actions may have on the entire team, instead of greedily doing what is best for just themselves. We would like our AI agents to effectively collaborate in a similar way by capturing a model of their partners. In this work, we propose and analyze a decentralized Multi-A…

    Submitted 16 December, 2021; v1 submitted 2 October, 2021; originally announced October 2021.

    Comments: 14 pages, 13 figures. To be presented at "Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI) 2022". Also presented at "Artificial Intelligence for Human-Robot Interaction (AI-HRI) at AAAI Fall Symposium Series 2021"

    Report number: AIHRI/2021/46

  29. arXiv:2110.00284  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning Reward Functions from Scale Feedback

    Authors: Nils Wilde, Erdem Bıyık, Dorsa Sadigh, Stephen L. Smith

    Abstract: Today's robots are increasingly interacting with people and need to efficiently learn inexperienced users' preferences. A common framework is to iteratively query the user about which of two presented robot trajectories they prefer. While this minimizes the user's effort, a strict choice does not yield any information on how much one trajectory is preferred. We propose scale feedback, where the use…

    Submitted 1 October, 2021; originally announced October 2021.

    Comments: 16 pages, 15 figures, 3 tables. Published at Conference on Robot Learning (CoRL) 2021

  30. arXiv:2109.12750  [pdf, other]

    cs.LG cs.AI cs.RO

    Learning Multimodal Rewards from Rankings

    Authors: Vivek Myers, Erdem Bıyık, Nima Anari, Dorsa Sadigh

    Abstract: Learning from human feedback has shown to be a useful approach in acquiring robot reward functions. However, expert feedback is often assumed to be drawn from an underlying unimodal reward function. This assumption does not always hold including in settings where multiple experts provide data or when a single expert provides data for different tasks -- we thus go beyond learning a unimodal reward…

    Submitted 18 October, 2021; v1 submitted 26 September, 2021; originally announced September 2021.

    Comments: 17 pages, 12 figures, 2 tables. Published at Conference on Robot Learning (CoRL) 2021

  31. arXiv:2108.07259  [pdf, other]

    cs.LG cs.AI cs.RO

    APReL: A Library for Active Preference-based Reward Learning Algorithms

    Authors: Erdem Bıyık, Aditi Talati, Dorsa Sadigh

    Abstract: Reward learning is a fundamental problem in human-robot interaction to have robots that operate in alignment with what their human user wants. Many preference-based learning algorithms and active querying techniques have been proposed as a solution to this problem. In this paper, we present APReL, a library for active preference-based reward learning algorithms, which enable researchers and practi…

    Submitted 4 January, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: 5 pages, 1 figure. Library is available at: https://github.com/Stanford-ILIAD/APReL

    Report number: AIHRI/2021/47

  32. arXiv:2106.04678  [pdf, other]

    cs.MA cs.AI cs.LG cs.RO

    Incentivizing Efficient Equilibria in Traffic Networks with Mixed Autonomy

    Authors: Erdem Bıyık, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh

    Abstract: Traffic congestion has large economic and social costs. The introduction of autonomous vehicles can potentially reduce this congestion by increasing road capacity via vehicle platooning and by creating an avenue for influencing people's choice of routes. We consider a network of parallel roads with two modes of transportation: (i) human drivers, who will choose the quickest route available to them…

    Submitted 5 May, 2021; originally announced June 2021.

    Comments: 12 pages, 7 figures, 2 tables. To appear at IEEE Transactions on Control of Network Systems (TCNS). arXiv admin note: substantial text overlap with arXiv:1904.02209

  33. arXiv:2105.06593  [pdf, other]

    cs.MA cs.AI cs.GT cs.LG

    Emergent Prosociality in Multi-Agent Games Through Gifting

    Authors: Woodrow Z. Wang, Mark Beliaev, Erdem Bıyık, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh

    Abstract: Coordination is often critical to forming prosocial behaviors -- behaviors that increase the overall sum of rewards received by all agents in a multi-agent game. However, state of the art reinforcement learning algorithms often suffer from converging to socially less desirable equilibria when multiple equilibria exist. Previous works address this challenge with explicit reward shaping, which requi…

    Submitted 13 May, 2021; originally announced May 2021.

    Comments: 9 pages, 6 figures, IJCAI 2021

  34. arXiv:2103.02727  [pdf, other]

    cs.RO cs.HC

    Preference-based Learning of Reward Function Features

    Authors: Sydney M. Katz, Amir Maleki, Erdem Bıyık, Mykel J. Kochenderfer

    Abstract: Preference-based learning of reward functions, where the reward function is learned using comparison data, has been well studied for complex robotic tasks such as autonomous driving. Existing algorithms have focused on learning reward functions that are linear in a set of trajectory features. The features are typically hand-coded, and preference-based learning is used to determine a particular use…

    Submitted 3 March, 2021; originally announced March 2021.

    Comments: 8 pages, 8 figures

  35. arXiv:2012.15749  [pdf, other]

    cs.SI cs.AI cs.LG eess.SY

    Incentivizing Routing Choices for Safe and Efficient Transportation in the Face of the COVID-19 Pandemic

    Authors: Mark Beliaev, Erdem Bıyık, Daniel A. Lazar, Woodrow Z. Wang, Dorsa Sadigh, Ramtin Pedarsani

    Abstract: The COVID-19 pandemic has severely affected many aspects of people's daily lives. While many countries are in a re-opening stage, some effects of the pandemic on people's behaviors are expected to last much longer, including how they choose between different transport options. Experts predict considerably delayed recovery of the public transport options, as people try to avoid crowded places. In t…

    Submitted 17 February, 2021; v1 submitted 28 December, 2020; originally announced December 2020.

    Comments: ICCPS 2021. 11 pages, 4 figures

  36. ROIAL: Region of Interest Active Learning for Characterizing Exoskeleton Gait Preference Landscapes

    Authors: Kejun Li, Maegan Tucker, Erdem Bıyık, Ellen Novoseller, Joel W. Burdick, Yanan Sui, Dorsa Sadigh, Yisong Yue, Aaron D. Ames

    Abstract: Characterizing what types of exoskeleton gaits are comfortable for users, and understanding the science of walking more generally, require recovering a user's utility landscape. Learning these landscapes is challenging, as walking trajectories are defined by numerous gait parameters, data collection from human trials is expensive, and user safety and comfort must be ensured. This work proposes the…

    Submitted 30 March, 2021; v1 submitted 9 November, 2020; originally announced November 2020.

    Comments: 6 pages + 1 page of references; 7 figures; To Appear at ICRA 2021

  37. arXiv:2008.04452  [pdf, other]

    cs.AI cs.LG cs.MA cs.RO

    Multi-Agent Safe Planning with Gaussian Processes

    Authors: Zheqing Zhu, Erdem Bıyık, Dorsa Sadigh

    Abstract: Multi-agent safe systems have become an increasingly important area of study as we can now easily have multiple AI-powered systems operating together. In such settings, we need to ensure the safety of not only each individual agent, but also the overall system. In this paper, we introduce a novel multi-agent safe learning algorithm that enables decentralized safe navigation when there are multiple…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: 9 pages, 5 figures. Published at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2020

  38. arXiv:2007.00178  [pdf, other]

    cs.LG cs.AI cs.RO eess.SY stat.ML

    Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving

    Authors: Zhangjie Cao, Erdem Bıyık, Woodrow Z. Wang, Allan Raventos, Adrien Gaidon, Guy Rosman, Dorsa Sadigh

    Abstract: Autonomous driving has achieved significant progress in recent years, but autonomous cars are still unable to tackle high-risk situations where a potential accident is likely. In such near-accident scenarios, even a minor change in the vehicle's actions may result in drastically different consequences. To avoid unsafe actions in near-accident scenarios, we need to fully explore the environment. Ho…

    Submitted 30 June, 2020; originally announced July 2020.

    Comments: 10 pages, 7 figures. Published at Robotics: Science and Systems (RSS) 2020

  39. arXiv:2006.14091  [pdf, other]

    cs.RO cs.AI cs.LG

    Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences

    Authors: Erdem Bıyık, Dylan P. Losey, Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh

    Abstract: Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations, (e.g., kinesthetic guidance), wh…

    Submitted 4 August, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: 20 pages, 17 figures. Accepted for publication by The International Journal of Robotics Research (IJRR)

  40. arXiv:2005.02575  [pdf, other]

    cs.RO cs.AI cs.LG

    Active Preference-Based Gaussian Process Regression for Reward Learning

    Authors: Erdem Bıyık, Nicolas Huynh, Mykel J. Kochenderfer, Dorsa Sadigh

    Abstract: Designing reward functions is a challenging problem in AI and robotics. Humans usually have a difficult time directly specifying all the desirable behaviors that a robot needs to optimize. One common approach is to learn reward functions from collected expert demonstrations. However, learning reward functions from demonstrations introduces many challenges: some methods require highly structured mo…

    Submitted 3 June, 2020; v1 submitted 5 May, 2020; originally announced May 2020.

    Comments: Proceedings of Robotics: Science and Systems (RSS), July 2020

  41. When Humans Aren't Optimal: Robots that Collaborate with Risk-Aware Humans

    Authors: Minae Kwon, Erdem Biyik, Aditi Talati, Karan Bhasin, Dylan P. Losey, Dorsa Sadigh

    Abstract: In order to collaborate safely and efficiently, robots need to anticipate how their human partners will behave. Some of today's robots model humans as if they were also robots, and assume users are always optimal. Other robots account for human limitations, and relax this assumption so that the human is noisily rational. Both of these models make sense when the human receives deterministic rewards…

    Submitted 13 January, 2020; originally announced January 2020.

    Comments: ACM/IEEE International Conference on Human-Robot Interaction

    ACM Class: I.2.9

  42. arXiv:1911.07380  [pdf, other]

    cs.HC cs.CY

    Developing a Scenario-Based Video Game Generation Framework: Preliminary Results

    Authors: Elif Surer, Mustafa Erkayaoğlu, Zeynep Nur Öztürk, Furkan Yücel, Emin Alp Bıyık, Burak Altan, Büşra Şenderin, Zeliha Oğuz, Servet Gürer, H. Şebnem Düzgün

    Abstract: Emergency training and planning provide structured curricula, rule-based action items, and interdisciplinary collaborative entities to imitate and teach real-life tasks. This rule-based structure enables the curricula to be transferred into other systematic learning platforms such as serious games: games that have additional purposes rather than only entertainment. Serious games aim to educate,…

    Submitted 17 November, 2019; originally announced November 2019.

  43. arXiv:1910.04365  [pdf, other]

    cs.RO cs.AI cs.LG

    Asking Easy Questions: A User-Friendly Approach to Active Reward Learning

    Authors: Erdem Bıyık, Malayandi Palan, Nicholas C. Landolfi, Dylan P. Losey, Dorsa Sadigh

    Abstract: Robots can learn the right reward function by querying a human expert. Existing approaches attempt to choose questions where the robot is most uncertain about the human's response; however, they do not consider how easy it will be for the human to answer! In this paper we explore an information gain formulation for optimally selecting questions that naturally account for the human's ability to ans…

    Submitted 10 October, 2019; originally announced October 2019.

    Comments: Proceedings of the 3rd Conference on Robot Learning (CoRL), October 2019

  44. arXiv:1909.03664  [pdf, other]

    math.OC cs.RO eess.SY

    Learning How to Dynamically Route Autonomous Vehicles on Shared Roads

    Authors: Daniel A. Lazar, Erdem Bıyık, Dorsa Sadigh, Ramtin Pedarsani

    Abstract: Road congestion induces significant costs across the world, and road network disturbances, such as traffic accidents, can cause highly congested traffic patterns. If a planner had control over the routing of all vehicles in the network, they could easily reverse this effect. In a more realistic scenario, we consider a planner that controls autonomous cars, which are a fraction of all present cars.…

    Submitted 3 June, 2021; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: Accepted to Transportation Research Part C

  45. arXiv:1906.07975  [pdf, other]

    cs.LG stat.ML

    Batch Active Learning Using Determinantal Point Processes

    Authors: Erdem Bıyık, Kenneth Wang, Nima Anari, Dorsa Sadigh

    Abstract: Data collection and labeling is one of the main challenges in employing machine learning algorithms in a variety of real-world applications with limited data. While active learning methods attempt to tackle this issue by labeling only the data samples that give high information, they generally suffer from large computational costs and are impractical in settings where data can be collected in para…

    Submitted 19 June, 2019; originally announced June 2019.

    Comments: Submitted to NeurIPS 2019

  46. arXiv:1904.02209  [pdf, other]

    math.OC cs.RO eess.SY

    The Green Choice: Learning and Influencing Human Decisions on Shared Roads

    Authors: Erdem Bıyık, Daniel A. Lazar, Dorsa Sadigh, Ramtin Pedarsani

    Abstract: Autonomous vehicles have the potential to increase the capacity of roads via platooning, even when human drivers and autonomous vehicles share roads. However, when users of a road network choose their routes selfishly, the resulting traffic configuration may be very inefficient. Because of this, we consider how to influence human decisions so as to decrease congestion on these roads. We consider a…

    Submitted 9 April, 2019; v1 submitted 3 April, 2019; originally announced April 2019.

    Comments: Submitted to CDC 2019

  47. arXiv:1904.01068  [pdf, other]

    cs.RO cs.AI cs.LG eess.SY

    Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models

    Authors: Erdem Bıyık, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh

    Abstract: We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models. Our algorithm guarantees safety by leveraging Lipschitz-continuity to ensure that no unsafe states are visited during exploration. Unlike many other existing techniques, the provided safety guarantee is deterministic. Our algorithm is optimized to reduce the number of actions needed…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: Proceedings of the American Control Conference (ACC), July 2019. The first two authors have equal contribution

  48. arXiv:1810.11978  [pdf, other]

    math.OC cs.RO

    Altruistic Autonomy: Beating Congestion on Shared Roads

    Authors: Erdem Bıyık, Daniel Lazar, Ramtin Pedarsani, Dorsa Sadigh

    Abstract: Traffic congestion has large economic and social costs. The introduction of autonomous vehicles can potentially reduce this congestion, both by increasing network throughput and by enabling a social planner to incentivize users of autonomous vehicles to take longer routes that can alleviate congestion on more direct roads. We formalize the effects of altruistic autonomy on roads shared between hum…

    Submitted 29 October, 2018; originally announced October 2018.

    Comments: Accepted to Workshop on the Algorithmic Foundations of Robotics (WAFR) 2018

  49. arXiv:1810.04303  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Batch Active Preference-Based Learning of Reward Functions

    Authors: Erdem Bıyık, Dorsa Sadigh

    Abstract: Data generation and labeling are usually an expensive part of learning for robotics. While active learning methods are commonly used to tackle the former problem, preference-based learning is a concept that attempts to solve the latter by querying users with preference questions. In this paper, we will develop a new algorithm, batch active preference-based learning, that enables efficient learning…

    Submitted 9 October, 2018; originally announced October 2018.

    Comments: Proceedings of the 2nd Conference on Robot Learning (CoRL), October 2018

  50. arXiv:1704.00096  [pdf, other]

    physics.med-ph

    Reconstruction by Calibration over Tensors for Multi-Coil Multi-Acquisition Balanced SSFP Imaging

    Authors: Erdem Biyik, Efe Ilicak, Tolga Çukur

    Abstract: Purpose: To develop a rapid imaging framework for balanced steady-state free precession (bSSFP) that jointly reconstructs undersampled data (by a factor of R) across multiple coils (D) and multiple acquisitions (N). To devise a multi-acquisition coil compression technique for improved computational efficiency. Methods: The bSSFP image for a given coil and acquisition is modeled to be modulated b…

    Submitted 6 September, 2017; v1 submitted 31 March, 2017; originally announced April 2017.

    Comments: To be published in Magnetic Resonance in Medicine. http://onlinelibrary.wiley.com/doi/10.1002/mrm.26902/abstract