-
AI Alignment with Changing and Influenceable Reward Functions
Authors:
Micah Carroll,
Davis Foote,
Anand Siththaranjan,
Stuart Russell,
Anca Dragan
Abstract:
Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves. To clarify the consequences of incorrectly assuming static preferences, we introduce Dynamic Reward Markov Decision Processes (DR-MDPs), which explicitly model preference changes and the AI's influence on them. We show that despite its convenience, the static-preference assumption may undermine the soundness of existing alignment techniques, leading them to implicitly reward AI systems for influencing user preferences in ways users may not truly want. We then explore potential solutions. First, we offer a unifying perspective on how an agent's optimization horizon may partially help reduce undesirable AI influence. Then, we formalize different notions of AI alignment that account for preference change from the outset. Comparing the strengths and limitations of 8 such notions of alignment, we find that they all either err towards causing undesirable AI influence, or are overly risk-averse, suggesting that a straightforward solution to the problems of changing preferences may not exist. As there is no avoiding grappling with changing preferences in real-world settings, this makes it all the more important to handle these issues with care, balancing risks and capabilities. We hope our work can provide conceptual clarity and constitute a first step towards AI alignment practices which explicitly account for (and contend with) the changing and influenceable nature of human preferences.
Submitted 27 May, 2024;
originally announced May 2024.
-
When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
Authors:
Leon Lang,
Davis Foote,
Stuart Russell,
Anca Dragan,
Erik Jenner,
Scott Emmons
Abstract:
Past analyses of reinforcement learning from human feedback (RLHF) assume that the human evaluators fully observe the environment. What happens when human feedback is based only on partial observations? We formally define two failure cases: deceptive inflation and overjustification. Modeling the human as Boltzmann-rational w.r.t. a belief over trajectories, we prove conditions under which RLHF is guaranteed to result in policies that deceptively inflate their performance, overjustify their behavior to make an impression, or both. Under the new assumption that the human's partial observability is known and accounted for, we then analyze how much information the feedback process provides about the return function. We show that sometimes, the human's feedback determines the return function uniquely up to an additive constant, but in other realistic cases, there is irreducible ambiguity. We propose exploratory research directions to help tackle these challenges, experimentally validate both the theoretical concerns and potential mitigations, and caution against blindly applying RLHF in partially observable settings.
Submitted 17 November, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Fast and Accurate Retrieval of Methane Concentration from Imaging Spectrometer Data Using Sparsity Prior
Authors:
Markus D. Foote,
Philip E. Dennison,
Andrew K. Thorpe,
David R. Thompson,
Siraput Jongaramrungruang,
Christian Frankenberg,
Sarang C. Joshi
Abstract:
The strong radiative forcing by atmospheric methane has stimulated interest in identifying natural and anthropogenic sources of this potent greenhouse gas. Point sources are important targets for quantification, and anthropogenic targets have potential for emissions reduction. Methane point source plume detection and concentration retrieval have been previously demonstrated using data from the Airborne Visible InfraRed Imaging Spectrometer Next Generation (AVIRIS-NG). Current quantitative methods have tradeoffs between computational requirements and retrieval accuracy, creating obstacles for processing real-time data or large datasets from flight campaigns. We present a new computationally efficient algorithm that applies sparsity and an albedo correction to matched filter retrieval of trace gas concentration-pathlength. The new algorithm was tested using AVIRIS-NG data acquired over several point source plumes in Ahmedabad, India. The algorithm was validated using simulated AVIRIS-NG data including synthetic plumes of known methane concentration. Sparsity and albedo correction together reduced the root mean squared error of retrieved methane concentration-pathlength enhancement by 60.7% compared with a previous robust matched filter method. Background noise was reduced by a factor of 2.64. The new algorithm was able to process the entire 300 flightline 2016 AVIRIS-NG India campaign in just over 8 hours on a desktop computer with GPU acceleration.
Submitted 5 March, 2020;
originally announced March 2020.
-
Learning Multiparametric Biomarkers for Assessing MR-Guided Focused Ultrasound Treatment of Malignant Tumors
Authors:
Blake E. Zimmerman,
Sara Johnson,
Henrik Odéen,
Jill Shea,
Markus D. Foote,
Nicole Winkler,
Sarang C. Joshi,
Allison Payne
Abstract:
Noninvasive MR-guided focused ultrasound (MRgFUS) treatments are promising alternatives to the surgical removal of malignant tumors. A significant challenge is assessing the viability of treated tissue during and immediately after MRgFUS procedures. Current clinical assessment uses the nonperfused volume (NPV) biomarker immediately after treatment from contrast-enhanced MRI. The NPV has variable accuracy, and the use of contrast agent prevents continuing MRgFUS treatment if tumor coverage is inadequate. This work presents a novel, noncontrast, learned multiparametric MR biomarker that can be used during treatment for intratreatment assessment, validated in a VX2 rabbit tumor model. A deep convolutional neural network was trained on noncontrast multiparametric MR images using the NPV biomarker from follow-up MR imaging (3-5 days after MRgFUS treatment) as the accurate label of nonviable tissue. A novel volume-conserving registration algorithm yielded a voxel-wise correlation between treatment and follow-up NPV, providing a rigorous validation of the biomarker. The learned noncontrast multiparametric MR biomarker predicted the follow-up NPV with an average DICE coefficient of 0.71, substantially outperforming the current clinical standard (DICE coefficient = 0.53). Noncontrast multiparametric MR imaging integrated with a deep convolutional neural network provides a more accurate prediction of MRgFUS treatment outcome than current contrast-based techniques.
Submitted 29 September, 2020; v1 submitted 23 October, 2019;
originally announced October 2019.
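The DICE coefficient quoted in the abstract above (0.71 versus 0.53) measures the overlap between a predicted and a reference binary mask: twice the size of the intersection divided by the sum of the two mask sizes. The following is a minimal sketch of that standard formula on flattened masks, not the authors' evaluation code; the function name `dice_coefficient` and the convention of returning 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice overlap between two flattened binary masks (illustrative sketch)."""
    pred = [bool(v) for v in pred]
    target = [bool(v) for v in target]
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")

    # Voxels marked nonviable in both masks.
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)

    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total
```

A value of 1.0 means the masks coincide exactly, and 0.0 means they share no voxels.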
-
Development and Validation of a Deep Learning Algorithm for Improving Gleason Scoring of Prostate Cancer
Authors:
Kunal Nagpal,
Davis Foote,
Yun Liu,
Po-Hsuan Chen,
Ellery Wulczyn,
Fraser Tan,
Niels Olson,
Jenny L. Smith,
Arash Mohtashamian,
James H. Wren,
Greg S. Corrado,
Robert MacDonald,
Lily H. Peng,
Mahul B. Amin,
Andrew J. Evans,
Ankur R. Sangoi,
Craig H. Mermel,
Jason D. Hipp,
Martin C. Stumpe
Abstract:
For prostate cancer patients, the Gleason score is one of the most important prognostic factors, potentially determining treatment independent of the stage. However, Gleason scoring is based on subjective microscopic examination of tumor morphology and suffers from poor reproducibility. Here we present a deep learning system (DLS) for Gleason scoring whole-slide images of prostatectomies. Our system was developed using 112 million pathologist-annotated image patches from 1,226 slides, and evaluated on an independent validation dataset of 331 slides, where the reference standard was established by genitourinary specialist pathologists. On the validation dataset, the mean accuracy among 29 general pathologists was 0.61. The DLS achieved a significantly higher diagnostic accuracy of 0.70 (p=0.002) and trended towards better patient risk stratification in correlations to clinical follow-up data. Our approach could improve the accuracy of Gleason scoring and subsequent therapy decisions, particularly where specialist expertise is unavailable. The DLS also goes beyond the current Gleason system to more finely characterize and quantitate tumor morphology, providing opportunities for refinement of the Gleason system itself.
Submitted 15 November, 2018;
originally announced November 2018.
-
Real-Time 2D-3D Deformable Registration with Deep Learning and Application to Lung Radiotherapy Targeting
Authors:
Markus D. Foote,
Blake E. Zimmerman,
Amit Sawant,
Sarang Joshi
Abstract:
Radiation therapy presents a need for dynamic tracking of a target tumor volume. Fiducial markers such as implanted gold seeds have been used to gate radiation delivery but the markers are invasive and gating significantly increases treatment time. Pretreatment acquisition of a respiratory correlated 4DCT allows for determination of accurate motion tracking which is useful in treatment planning. We design a patient-specific motion subspace and a deep convolutional neural network to recover anatomical positions from a single fluoroscopic projection in real-time. We use this deep network to approximate the nonlinear inverse of a diffeomorphic deformation composed with radiographic projection. This network recovers subspace coordinates to define the patient-specific deformation of the lungs from a baseline anatomic position. The geometric accuracy of the subspace deformations on real patient data is similar to accuracy attained by original image registration between individual respiratory-phase image volumes.
Submitted 25 September, 2019; v1 submitted 22 July, 2018;
originally announced July 2018.
-
#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning
Authors:
Haoran Tang,
Rein Houthooft,
Davis Foote,
Adam Stooke,
Xi Chen,
Yan Duan,
John Schulman,
Filip De Turck,
Pieter Abbeel
Abstract:
Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to deal with high-dimensional continuous state spaces through complex heuristics, often relying on optimism in the face of uncertainty or intrinsic motivation. In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks. States are mapped to hash codes, which allows their occurrences to be counted with a hash table. These counts are then used to compute a reward bonus according to the classic count-based exploration theory. We find that simple hash functions can achieve surprisingly good results on many challenging tasks. Furthermore, we show that a domain-dependent learned hash code may further improve these results. Detailed analysis reveals important aspects of a good hash function: 1) having appropriate granularity and 2) encoding information relevant to solving the MDP. This exploration strategy achieves near state-of-the-art performance on both continuous control tasks and Atari 2600 games, hence providing a simple yet powerful baseline for solving MDPs that require considerable exploration.
Submitted 5 December, 2017; v1 submitted 15 November, 2016;
originally announced November 2016.
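The mechanism described in the abstract above (hash each state, count hash codes in a table, turn the count into a reward bonus) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a SimHash-style code built from the signs of a random Gaussian projection and a bonus of the form beta / sqrt(n), and the class name `HashCountBonus` and its parameters `n_bits`, `beta`, and `seed` are hypothetical.

```python
import math
import random
from collections import defaultdict


class HashCountBonus:
    """Count-based exploration bonus over hashed states (illustrative sketch)."""

    def __init__(self, state_dim, n_bits=8, beta=1.0, seed=0):
        rng = random.Random(seed)
        # Assumed hash: one random Gaussian projection row per output bit.
        self.projection = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)]
            for _ in range(n_bits)
        ]
        self.beta = beta
        # Hash table mapping each code to its visit count.
        self.counts = defaultdict(int)

    def hash_code(self, state):
        # The sign of each projected coordinate contributes one bit of the code.
        return tuple(
            sum(w * x for w, x in zip(row, state)) >= 0.0
            for row in self.projection
        )

    def bonus(self, state):
        # Record the visit, then return a bonus that shrinks as the count grows.
        code = self.hash_code(state)
        self.counts[code] += 1
        return self.beta / math.sqrt(self.counts[code])
```

In this sketch `n_bits` controls granularity: fewer bits merge more states into one bucket, while more bits keep distinct states apart, which corresponds to the "appropriate granularity" point made in the abstract. The bonus would be added to the environment reward during training.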