Showing 1–8 of 8 results for author: Mark, M

Searching in archive cs.
  1. arXiv:2412.06685  [pdf, other]

    cs.LG cs.AI

    Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone

    Authors: Max Sobol Mark, Tian Gao, Georgia Gabriela Sampaio, Mohan Kumar Srirama, Archit Sharma, Chelsea Finn, Aviral Kumar

    Abstract: Recent advances in learning decision-making policies can largely be attributed to training expressive policy models, largely via imitation learning. While imitation learning discards non-expert data, reinforcement learning (RL) can still learn from suboptimal data. However, instantiating RL training of a new policy class often presents a different challenge: most deep RL machinery is co-developed…

    Submitted 9 December, 2024; originally announced December 2024.

  2. arXiv:2310.15145  [pdf, other]

    cs.RO cs.AI cs.LG

    Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning

    Authors: Jingyun Yang, Max Sobol Mark, Brandon Vu, Archit Sharma, Jeannette Bohg, Chelsea Finn

    Abstract: The pre-train and fine-tune paradigm in machine learning has had dramatic success in a wide range of domains because the use of existing data or pre-trained models on the internet enables quick and easy learning of new tasks. We aim to enable this paradigm in robotic reinforcement learning, allowing a robot to learn a new task with little human effort by leveraging data and models from the Interne…

    Submitted 23 October, 2023; originally announced October 2023.

  3. arXiv:2310.08558  [pdf, other]

    cs.LG cs.AI cs.RO

    Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias

    Authors: Max Sobol Mark, Archit Sharma, Fahim Tajwar, Rafael Rafailov, Sergey Levine, Chelsea Finn

    Abstract: It is desirable for policies to optimistically explore new states and behaviors during online reinforcement learning (RL) or fine-tuning, especially when prior offline data does not provide enough state coverage. However, exploration bonuses can bias the learned policy, and our experiments find that naive, yet standard use of such bonuses can fail to recover a performant policy. Concurrently, pess…

    Submitted 12 October, 2023; originally announced October 2023.

  4. arXiv:2303.10871  [pdf, other]

    cs.IR cs.AI

    NASA Science Mission Directorate Knowledge Graph Discovery

    Authors: Roelien C. Timmer, Fech Scen Khoo, Megan Mark, Marcella Scoczynski Ribeiro Martins, Anamaria Berea, Gregory Renard, Kaylin Bugbee

    Abstract: The size of the National Aeronautics and Space Administration (NASA) Science Mission Directorate (SMD) is growing exponentially, allowing researchers to make discoveries. However, making discoveries is challenging and time-consuming due to the size of the data catalogs, and as many concepts and data are indirectly connected. This paper proposes a pipeline to generate knowledge graphs (KGs) represe…

    Submitted 20 March, 2023; originally announced March 2023.

  5. arXiv:2303.05479  [pdf, other]

    cs.LG cs.AI

    Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

    Authors: Mitsuhiko Nakamoto, Yuexiang Zhai, Anikait Singh, Max Sobol Mark, Yi Ma, Chelsea Finn, Aviral Kumar, Sergey Levine

    Abstract: A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization from existing datasets followed by fast online fine-tuning with limited interaction. However, existing offline RL methods tend to behave poorly during fine-tuning. In this paper, we devise an approach for learning an effective initialization from offline data that also enables fast online fine-tuning…

    Submitted 19 January, 2024; v1 submitted 9 March, 2023; originally announced March 2023.

    Comments: NeurIPS 2023. project page: https://nakamotoo.github.io/Cal-QL

  6. arXiv:2102.03672  [pdf, other]

    cs.LG

    Emergency Department Optimization and Load Prediction in Hospitals

    Authors: Karthik K. Padthe, Vikas Kumar, Carly M. Eckert, Nicholas M. Mark, Anam Zahid, Muhammad Aurangzeb Ahmad, Ankur Teredesai

    Abstract: Over the past several years, across the globe, there has been an increase in people seeking care in emergency departments (EDs). ED resources, including nurse staffing, are strained by such increases in patient volume. Accurate forecasting of incoming patient volume in emergency departments (ED) is crucial for efficient utilization and allocation of ED resources. Working with a suburban ED in the…

    Submitted 6 February, 2021; originally announced February 2021.

    Comments: 7 pages, 3 figures, 4 tables

  7. arXiv:1905.11954  [pdf, other]

    cs.CV cs.AI cs.LG

    Unsupervised Learning from Video with Deep Neural Embeddings

    Authors: Chengxu Zhuang, Tianwei She, Alex Andonian, Max Sobol Mark, Daniel Yamins

    Abstract: Because of the rich dynamical structure of videos and their ubiquity in everyday life, it is a natural idea that video data could serve as a powerful unsupervised learning signal for training visual representations in deep neural networks. However, instantiating this idea, especially at large scale, has remained a significant artificial intelligence challenge. Here we present the Video Instance Em…

    Submitted 10 March, 2020; v1 submitted 28 May, 2019; originally announced May 2019.

    Comments: To appear in CVPR 2020

  8. arXiv:1802.04259  [pdf, other]

    cs.CR cs.AR

    Sphinx: A Secure Architecture Based on Binary Code Diversification and Execution Obfuscation

    Authors: Michel A. Kinsy, Donato Kava, Alan Ehret, Miguel Mark

    Abstract: Sphinx, a hardware-software co-design architecture for binary code and runtime obfuscation. The Sphinx architecture uses binary code diversification and self-reconfigurable processing elements to maintain application functionality while obfuscating the binary code and architecture states to attackers. This approach dramatically reduces an attacker's ability to exploit information gained from one d…

    Submitted 11 February, 2018; originally announced February 2018.

    Comments: Boston Area Architecture 2018 Workshop (BARC18)