-
Reward Model Interpretability via Optimal and Pessimal Tokens
Authors:
Brian Christian,
Hannah Rose Kirk,
Jessica A. F. Thompson,
Christopher Summerfield,
Tsvetomira Dumbalska
Abstract:
Reward modeling has emerged as a crucial component in aligning large language models with human values. Significant attention has focused on using reward models as a means for fine-tuning generative models. However, the reward models themselves -- which directly encode human value judgments by turning prompt-response pairs into scalar rewards -- remain relatively understudied. We present a novel approach to reward model interpretability through exhaustive analysis of their responses across their entire vocabulary space. By examining how different reward models score every possible single-token response to value-laden prompts, we uncover several striking findings: (i) substantial heterogeneity between models trained on similar objectives, (ii) systematic asymmetries in how models encode high- vs low-scoring tokens, (iii) significant sensitivity to prompt framing that mirrors human cognitive biases, and (iv) overvaluation of more frequent tokens. We demonstrate these effects across ten recent open-source reward models of varying parameter counts and architectures. Our results challenge assumptions about the interchangeability of reward models, as well as their suitability as proxies of complex and context-dependent human values. We find that these models can encode concerning biases toward certain identity groups, which may emerge as unintended consequences of harmlessness training -- distortions that risk propagating through the downstream large language models now deployed to millions.
Submitted 8 June, 2025;
originally announced June 2025.
-
Integration of a Graph-Based Path Planner and Mixed-Integer MPC for Robot Navigation in Cluttered Environments
Authors:
Joshua A. Robbins,
Stephen J. Harnett,
Andrew F. Thompson,
Sean Brennan,
Herschel C. Pangborn
Abstract:
The ability to update a path plan is a required capability for autonomous mobile robots navigating through uncertain environments. This paper proposes a re-planning strategy using a multilayer planning and control framework for cases where the robot's environment is partially known. A medial axis graph-based planner defines a global path plan based on known obstacles where each edge in the graph corresponds to a unique corridor. A mixed-integer model predictive control (MPC) method detects if a terminal constraint derived from the global plan is infeasible, subject to a non-convex description of the local environment. Infeasibility detection is used to trigger efficient global re-planning via medial axis graph edge deletion. The proposed re-planning strategy is demonstrated experimentally.
Submitted 17 April, 2025;
originally announced April 2025.
-
Automated Functional Decomposition for Hybrid Zonotope Over-approximations with Application to LSTM Networks
Authors:
Jonah J. Glunt,
Jacob A. Siefert,
Andrew F. Thompson,
Justin Ruths,
Herschel C. Pangborn
Abstract:
Functional decomposition is a powerful tool for systems analysis because it can reduce a function of arbitrary input dimensions to the sum and superposition of functions of a single variable, thereby mitigating (or potentially avoiding) the exponential scaling often associated with analyses over high-dimensional spaces. This paper presents automated methods for constructing functional decompositions used to form set-based over-approximations of nonlinear functions, with particular focus on the hybrid zonotope set representation. To demonstrate these methods, we construct a hybrid zonotope set that over-approximates the input-output graph of a long short-term memory neural network, and use functional decomposition to represent a discrete hybrid automaton via a hybrid zonotope.
Submitted 19 March, 2025;
originally announced March 2025.
-
Energy-Aware Predictive Motion Planning for Autonomous Vehicles Using a Hybrid Zonotope Constraint Representation
Authors:
Joshua A. Robbins,
Andrew F. Thompson,
Sean Brennan,
Herschel C. Pangborn
Abstract:
Uncrewed aerial systems have tightly coupled energy and motion dynamics which must be accounted for by onboard planning algorithms. This work proposes a strategy for coupled motion and energy planning using model predictive control (MPC). A reduced-order linear time-invariant model of coupled energy and motion dynamics is presented. Constrained zonotopes are used to represent state and input constraints, and hybrid zonotopes are used to represent non-convex constraints tied to a map of the environment. The structures of these constraint representations are exploited within a mixed-integer quadratic program solver tailored to MPC motion planning problems. Results apply the proposed methodology to coupled motion and energy utilization planning problems for 1) a hybrid-electric vehicle that must restrict engine usage when flying over regions with noise restrictions, and 2) an electric package delivery drone that must track waysets with both position and battery state of charge requirements. By leveraging the structure-exploiting solver, the proposed mixed-integer MPC formulations can be implemented in real time.
Submitted 15 November, 2024; v1 submitted 5 November, 2024;
originally announced November 2024.
-
Zero-shot counting with a dual-stream neural network model
Authors:
Jessica A. F. Thompson,
Hannah Sheahan,
Tsvetomira Dumbalska,
Julian Sandbrink,
Manuela Piazza,
Christopher Summerfield
Abstract:
Deep neural networks have provided a computational framework for understanding object recognition, grounded in the neurophysiology of the primate ventral stream, but fail to account for how we process relational aspects of a scene. For example, deep neural networks fail at problems that involve enumerating the number of elements in an array, a problem that in humans relies on parietal cortex. Here, we build a 'dual-stream' neural network model which, equipped with both dorsal and ventral streams, can generalise its counting ability to wholly novel items ('zero-shot' counting). In doing so, it forms spatial response fields and lognormal number codes that resemble those observed in macaque posterior parietal cortex. We use the dual-stream network to make successful predictions about behavioural studies of the human gaze during similar counting tasks.
Submitted 16 May, 2024;
originally announced May 2024.
-
Error Bounds for Compositions of Piecewise Affine Approximations
Authors:
Jonah J. Glunt,
Jacob A. Siefert,
Andrew F. Thompson,
Herschel C. Pangborn
Abstract:
Nonlinear expressions are often approximated by piecewise affine (PWA) functions to simplify analysis or reduce computational costs. To reduce computational complexity, multivariate functions can be represented as compositions of functions with one or two inputs, which can be approximated individually. This paper provides efficient methods to generate PWA approximations of nonlinear functions via functional decomposition. The key contributions focus on intelligent placement of breakpoints for PWA approximations without requiring optimization, and on bounding the error of PWA compositions as a function of the error tolerance for each component of that composition. The proposed methods are used to systematically construct a PWA approximation for a complicated function, either to within a desired error tolerance or to a given level of complexity.
Submitted 23 February, 2024;
originally announced February 2024.
-
Set-valued State Estimation for Nonlinear Systems Using Hybrid Zonotopes
Authors:
Jacob A. Siefert,
Andrew F. Thompson,
Jonah J. Glunt,
Herschel C. Pangborn
Abstract:
This paper proposes a method for set-valued state estimation of nonlinear, discrete-time systems. This is achieved by combining graphs of functions representing system dynamics and measurements with the hybrid zonotope set representation that can efficiently represent nonconvex and disjoint sets. Tight over-approximations of complex nonlinear functions are efficiently produced by leveraging special ordered sets and neural networks, which enable computation of set-valued state estimates that grow linearly in memory complexity with time. A numerical example demonstrates significant reduction of conservatism in the set-valued state estimates using the proposed method as compared to an idealized convex approach.
Submitted 16 September, 2023; v1 submitted 16 April, 2023;
originally announced April 2023.
-
Reachability Analysis Using Hybrid Zonotopes and Functional Decomposition
Authors:
Jacob A. Siefert,
Trevor J. Bird,
Andrew F. Thompson,
Jonah J. Glunt,
Justin P. Koeln,
Neera Jain,
Herschel C. Pangborn
Abstract:
This paper proposes methods for reachability analysis of nonlinear systems in both open loop and closed loop with advanced controllers. The methods combine hybrid zonotopes, a construct called a state-update set, functional decomposition, and special ordered set approximations to enable linear growth in reachable set memory complexity with time and linear scaling in computational complexity with the system dimension. Facilitating this combination are new identities for constructing nonconvex sets that contain nonlinear functions and for efficiently converting a collection of polytopes from vertex representation to hybrid zonotope representation. Benchmark numerical examples from the literature demonstrate the proposed methods and provide comparison to state-of-the-art techniques.
Submitted 22 February, 2024; v1 submitted 13 April, 2023;
originally announced April 2023.
-
Resolvent analysis of stratification effects on wall-bounded shear flows
Authors:
M. A. Ahmed,
H. J. Bae,
A. F. Thompson,
B. J. McKeon
Abstract:
The interaction between shear-driven turbulence and stratification is a key process in a wide array of geophysical flows with spatio-temporal scales that span many orders of magnitude. A quick numerical model prediction based on external parameters of stratified boundary layers could greatly benefit the understanding of the interaction between velocity and scalar flux at varying scales. For these reasons, we use the resolvent framework to investigate the effects of an active scalar on incompressible wall-bounded turbulence. We obtain the state of the flow system by applying the linear resolvent operator to the nonlinear terms in the governing Navier-Stokes equations with the Boussinesq approximation. This extends the formulation to include the scalar advection equation with the scalar component acting in the wall-normal direction in the momentum equations. We use the mean velocity profiles from a DNS of a stably-stratified turbulent channel flow at varying friction Richardson number. The results obtained from the resolvent analysis are compared to the premultiplied energy spectra, auto-correlation coefficient, and the energy budget terms obtained from the DNS. It is shown that, despite using only a very limited range of representative scales, the resolvent model reproduces the balance of energy budget terms, provides meaningful insight into coherent structures occurring in the flow, and is more cost-effective than performing full-scale simulations. This quick model can provide further understanding of stratified flows with only information about the mean profile and prior knowledge of energetic scales of motion in the neutrally-buoyant boundary layers.
Submitted 27 January, 2021;
originally announced January 2021.
-
A pole-to-equator ocean overturning circulation on Enceladus
Authors:
Ana H. Lobo,
Andrew F. Thompson,
Steven D. Vance,
Saikiran Tharimena
Abstract:
Enceladus is believed to have a saltwater global ocean with a mean depth of at least 30 km, heated from below at the ocean-core interface and cooled at the top, where the ocean loses heat to the icy lithosphere above. This scenario suggests an important role for vertical convection to influence the interior properties and circulation of Enceladus' ocean. Additionally, the ice shell that encompasses the ocean has dramatic meridional thickness variations that, in steady state, must be sustained against processes acting to remove these ice thickness anomalies. One mechanism that would maintain variations in the ice shell thickness involves spatially-separated regions of freezing and melting at the ocean-ice interface. Here, we use an idealized, dynamical ocean model forced by an observationally-guided density forcing at the ocean-ice interface to argue that Enceladus' interior ocean should support a meridional overturning circulation. This circulation establishes an interior density structure that is more complex than in studies that have focused only on convection, including a shallow freshwater lens in the polar regions. Spatially-separated sites of ice formation and melt enable Enceladus to sustain a significant vertical and horizontal stratification, which impacts interior heat transport, and is critical for understanding the relationship between a global ocean and the planetary energy budget. The presence of low salinity layers near the polar ocean-ice interface also influences whether samples measured from the plumes are representative of the global ocean.
Submitted 12 July, 2020;
originally announced July 2020.
-
The effect of task and training on intermediate representations in convolutional neural networks revealed with modified RV similarity analysis
Authors:
Jessica A. F. Thompson,
Yoshua Bengio,
Marc Schoenwiesner
Abstract:
Centered Kernel Alignment (CKA) was recently proposed as a similarity metric for comparing activation patterns in deep networks. Here we experiment with the modified RV-coefficient (RV2), which has properties very similar to those of CKA while being less sensitive to dataset size. We compare the representations of networks that received varying amounts of training on different layers: a standard trained network (all parameters updated at every step), a freeze trained network (layers gradually frozen during training), random networks (only some layers trained), and a completely untrained network. We found that RV2 was able to recover expected similarity patterns and provide interpretable similarity matrices that suggested hypotheses about how representations are affected by different training recipes. We propose that the superior performance achieved by freeze training can be attributed to representational differences in the penultimate layer. Our comparisons of random networks suggest that the inputs and targets serve as anchors on the representations in the lowest and highest layers.
Submitted 4 December, 2019;
originally announced December 2019.
-
How can deep learning advance computational modeling of sensory information processing?
Authors:
Jessica A. F. Thompson,
Yoshua Bengio,
Elia Formisano,
Marc Schönwiesner
Abstract:
Deep learning, computational neuroscience, and cognitive science have overlapping goals related to understanding intelligence such that perception and behaviour can be simulated in computational systems. In neuroimaging, machine learning methods have been used to test computational models of sensory information processing. Recently, these model comparison techniques have been used to evaluate deep neural networks (DNNs) as models of sensory information processing. However, the interpretation of such model evaluations is muddied by imprecise statistical conclusions. Here, we make explicit the types of conclusions that can be drawn from these existing model comparison techniques and how these conclusions change when the model in question is a DNN. We discuss how DNNs are amenable to new model comparison techniques that allow for stronger conclusions to be made about the computational mechanisms underlying sensory information processing.
Submitted 25 September, 2018;
originally announced October 2018.