Search | arXiv e-print repository

PLAICraft: Large-Scale Time-Aligned Vision-Speech-Action Dataset for Embodied AI

Authors: Yingchen He, Christian D. Weilbach, Martyna E. Wojciechowska, Yuxuan Zhang, Frank Wood

Abstract: Advances in deep generative modelling have made it increasingly plausible to train human-level embodied agents. Yet progress has been limited by the absence of large-scale, real-time, multi-modal, and socially interactive datasets that reflect the sensory-motor complexity of natural environments. To address this, we present PLAICraft, a novel data collection platform and dataset capturing multipla… ▽ More Advances in deep generative modelling have made it increasingly plausible to train human-level embodied agents. Yet progress has been limited by the absence of large-scale, real-time, multi-modal, and socially interactive datasets that reflect the sensory-motor complexity of natural environments. To address this, we present PLAICraft, a novel data collection platform and dataset capturing multiplayer Minecraft interactions across five time-aligned modalities: video, game output audio, microphone input audio, mouse, and keyboard actions. Each modality is logged with millisecond time precision, enabling the study of synchronous, embodied behaviour in a rich, open-ended world. The dataset comprises over 10,000 hours of gameplay from more than 10,000 global participants.\footnote{We have done a privacy review for the public release of an initial 200-hour subset of the dataset, with plans to release most of the dataset over time.} Alongside the dataset, we provide an evaluation suite for benchmarking model capabilities in object recognition, spatial awareness, language grounding, and long-term memory. PLAICraft opens a path toward training and evaluating agents that act fluently and purposefully in real time, paving the way for truly embodied artificial intelligence. △ Less

Submitted 19 May, 2025; originally announced May 2025.

Comments: 9 pages, 8 figures

arXiv:2503.11537 [pdf, other]

Basic stability tests of machine learning potentials for molecular simulations in computational drug discovery

Authors: Kavindri Ranasinghe, Adam L. Baskerville, Geoffrey P. F. Wood, Gerhard Koenig

Abstract: Neural network potentials trained on quantum-mechanical data can calculate molecular interactions with relatively high speed and accuracy. However, neural network potentials might exhibit instabilities, nonphysical behavior, or lack accuracy. To assess the reliability of neural network potentials, a series of tests is conducted during model training, in the gas phase, and in the condensed phase. T… ▽ More Neural network potentials trained on quantum-mechanical data can calculate molecular interactions with relatively high speed and accuracy. However, neural network potentials might exhibit instabilities, nonphysical behavior, or lack accuracy. To assess the reliability of neural network potentials, a series of tests is conducted during model training, in the gas phase, and in the condensed phase. The testing procedure is performed for eight in-house neural network potentials based on the ANI-2x dataset, using both the ANI-2x and MACE architectures. This allows an evaluation of the effect of the model architecture on its performance. We also perform stability tests of the publicly available neural network potentials ANI-2x, ANI-1ccx, MACE-OFF23, and AIMNet2. A normal mode analysis of 14 simple benchmark molecules revealed that the small MACE-OFF23 model shows large deviations from the reference quantum-mechanical energy surface. Also, some MACE models with a reduced number of parameters failed to produce stable molecular dynamics simulations in the gas phase, and all MACE models exhibit unfavorable behavior during steric clashes. The published ANI-2x and one of the in-house MACE models are not able to reproduce the structure of liquid water at ambient conditions, forming an amorphous solid phase instead. The ANI-1ccx model shows nonphysical additional energy minima in bond length and bond angle space, which caused a phase transition to an amorphous solid. Out of all 13 considered public and in-house models, only one in-house model based on the ANI-2x B97-3c dataset shows better agreement with the experimental radial distribution function of water than the simple molecular mechanics TIP3P model. This shows that great care must be taken during model training and when selecting a neural network potential for real-world applications. △ Less

Submitted 14 March, 2025; originally announced March 2025.

Comments: 30 pages, 5 figures

MSC Class: 82D03

arXiv:2502.20371 [pdf, other]

Constrained Generative Modeling with Manually Bridged Diffusion Models

Authors: Saeid Naderiparizi, Xiaoxuan Liang, Berend Zwartsenberg, Frank Wood

Abstract: In this paper we describe a novel framework for diffusion-based generative modeling on constrained spaces. In particular, we introduce manual bridges, a framework that expands the kinds of constraints that can be practically used to form so-called diffusion bridges. We develop a mechanism for combining multiple such constraints so that the resulting multiply-constrained model remains a manual brid… ▽ More In this paper we describe a novel framework for diffusion-based generative modeling on constrained spaces. In particular, we introduce manual bridges, a framework that expands the kinds of constraints that can be practically used to form so-called diffusion bridges. We develop a mechanism for combining multiple such constraints so that the resulting multiply-constrained model remains a manual bridge that respects all constraints. We also develop a mechanism for training a diffusion model that respects such multiple constraints while also adapting it to match a data distribution. We develop and extend theory demonstrating the mathematical validity of our mechanisms. Additionally, we demonstrate our mechanism in constrained generative modeling tasks, highlighting a particular high-value application in modeling trajectory initializations for path planning and control in autonomous vehicles. △ Less

Submitted 27 February, 2025; originally announced February 2025.

Comments: AAAI 2025

arXiv:2502.09587 [pdf, other]

Rolling Ahead Diffusion for Traffic Scene Simulation

Authors: Yunpeng Liu, Matthew Niedoba, William Harvey, Adam Scibior, Berend Zwartsenberg, Frank Wood

Abstract: Realistic driving simulation requires that NPCs not only mimic natural driving behaviors but also react to the behavior of other simulated agents. Recent developments in diffusion-based scenario generation focus on creating diverse and realistic traffic scenarios by jointly modelling the motion of all the agents in the scene. However, these traffic scenarios do not react when the motion of agents… ▽ More Realistic driving simulation requires that NPCs not only mimic natural driving behaviors but also react to the behavior of other simulated agents. Recent developments in diffusion-based scenario generation focus on creating diverse and realistic traffic scenarios by jointly modelling the motion of all the agents in the scene. However, these traffic scenarios do not react when the motion of agents deviates from their modelled trajectories. For example, the ego-agent can be controlled by a stand along motion planner. To produce reactive scenarios with joint scenario models, the model must regenerate the scenario at each timestep based on new observations in a Model Predictive Control (MPC) fashion. Although reactive, this method is time-consuming, as one complete possible future for all NPCs is generated per simulation step. Alternatively, one can utilize an autoregressive model (AR) to predict only the immediate next-step future for all NPCs. Although faster, this method lacks the capability for advanced planning. We present a rolling diffusion based traffic scene generation model which mixes the benefits of both methods by predicting the next step future and simultaneously predicting partially noised further future steps at the same time. We show that such model is efficient compared to diffusion model based AR, achieving a beneficial compromise between reactivity and computational efficiency. △ Less

Submitted 13 February, 2025; originally announced February 2025.

Comments: Accepted to Workshop on Machine Learning for Autonomous Driving at AAAI 2025

arXiv:2501.12408 [pdf, other]

Control-ITRA: Controlling the Behavior of a Driving Model

Authors: Vasileios Lioutas, Adam Scibior, Matthew Niedoba, Berend Zwartsenberg, Frank Wood

Abstract: Simulating realistic driving behavior is crucial for developing and testing autonomous systems in complex traffic environments. Equally important is the ability to control the behavior of simulated agents to tailor scenarios to specific research needs and safety considerations. This paper extends the general-purpose multi-agent driving behavior model ITRA (Scibior et al., 2021), by introducing a m… ▽ More Simulating realistic driving behavior is crucial for developing and testing autonomous systems in complex traffic environments. Equally important is the ability to control the behavior of simulated agents to tailor scenarios to specific research needs and safety considerations. This paper extends the general-purpose multi-agent driving behavior model ITRA (Scibior et al., 2021), by introducing a method called Control-ITRA to influence agent behavior through waypoint assignment and target speed modulation. By conditioning agents on these two aspects, we provide a mechanism for them to adhere to specific trajectories and indirectly adjust their aggressiveness. We compare different approaches for integrating these conditions during training and demonstrate that our method can generate controllable, infraction-free trajectories while preserving realism in both seen and unseen locations. △ Less

Submitted 16 January, 2025; originally announced January 2025.

Comments: 16 pages, 2 figures

arXiv:2411.19339 [pdf, other]

Towards a Mechanistic Explanation of Diffusion Model Generalization

Authors: Matthew Niedoba, Berend Zwartsenberg, Kevin Murphy, Frank Wood

Abstract: We propose a simple, training-free mechanism which explains the generalization behaviour of diffusion models. By comparing pre-trained diffusion models to their theoretically optimal empirical counterparts, we identify a shared local inductive bias across a variety of network architectures. From this observation, we hypothesize that network denoisers generalize through localized denoising operatio… ▽ More We propose a simple, training-free mechanism which explains the generalization behaviour of diffusion models. By comparing pre-trained diffusion models to their theoretically optimal empirical counterparts, we identify a shared local inductive bias across a variety of network architectures. From this observation, we hypothesize that network denoisers generalize through localized denoising operations, as these operations approximate the training objective well over much of the training distribution. To validate our hypothesis, we introduce novel denoising algorithms which aggregate local empirical denoisers to replicate network behaviour. Comparing these algorithms to network denoisers across forward and reverse diffusion processes, our approach exhibits consistent visual similarity to neural network outputs, with lower mean squared error than previously proposed methods. △ Less

Submitted 14 February, 2025; v1 submitted 28 November, 2024; originally announced November 2024.

Comments: 24 pages, 23 figures

arXiv:2410.16818 [pdf, other]

doi 10.1021/acs.jctc.4c01427

An evaluation of machine learning/molecular mechanics end-state corrections with mechanical embedding to calculate relative protein-ligand binding free energies

Authors: Johannes Karwounopoulos, Mateusz Bieniek, Zhiyi Wu, Adam L. Baskerville, Gerhard Koenig, Benjamin P. Cossins, Geoffrey P. F. Wood

Abstract: The development of machine-learning (ML) potentials offers significant accuracy improvements compared to molecular mechanics (MM) because of the inclusion of quantum-mechanical effects in molecular interactions. However, ML simulations are several times more computationally demanding than MM simulations, so there is a trade-off between speed and accuracy. One possible compromise are hybrid machine… ▽ More The development of machine-learning (ML) potentials offers significant accuracy improvements compared to molecular mechanics (MM) because of the inclusion of quantum-mechanical effects in molecular interactions. However, ML simulations are several times more computationally demanding than MM simulations, so there is a trade-off between speed and accuracy. One possible compromise are hybrid machine learning/molecular mechanics (ML/MM) approaches with mechanical embedding that treat the intramolecular interactions of the ligand at the ML level and the protein-ligand interactions at the MM level. Recent studies have reported improved protein-ligand binding free energy results based on ML/MM with mechanical embedding, arguing that intramolecular interactions like torsion potentials of the ligand are often the limiting factor for accuracy. This claim is evaluated based on 108 relative binding free energy calculations for four different benchmark systems. As an alternative strategy, we also tested a tool that fits the MM dihedral potentials to the ML level of theory. Overall, the relative binding free energy results from MM with Open Force Field 2.2.0, MM with ML-fitted torsion potentials, and the corresponding ML/MM end-state corrected simulations show no statistically significant differences in the mean absolute errors (between 0.8 and 0.9 kcal/mol). Therefore, a well-parameterized force field is on a par with simple mechanical embedding ML/MM simulations for protein-ligand binding. In terms of computational costs, the reparametrization of poor torsional potentials is preferable over employing computationally intensive ML/MM simulations of protein-ligand complexes with mechanical embedding. Also, the refitting strategy leads to lower variances of the protein-ligand binding free energy results than the ML/MM end-state corrections. △ Less

Submitted 22 October, 2024; originally announced October 2024.

Comments: 47 pages, 13 figures, 10 tables

MSC Class: 82D03

Journal ref: J. Chem. Theory Comput. 2025, 21(2), 967-977

arXiv:2407.05494 [pdf, other]

Prospective Messaging: Learning in Networks with Communication Delays

Authors: Ryan Fayyazi, Christian Weilbach, Frank Wood

Abstract: Inter-neuron communication delays are ubiquitous in physically realized neural networks such as biological neural circuits and neuromorphic hardware. These delays have significant and often disruptive consequences on network dynamics during training and inference. It is therefore essential that communication delays be accounted for, both in computational models of biological neural networks and in… ▽ More Inter-neuron communication delays are ubiquitous in physically realized neural networks such as biological neural circuits and neuromorphic hardware. These delays have significant and often disruptive consequences on network dynamics during training and inference. It is therefore essential that communication delays be accounted for, both in computational models of biological neural networks and in large-scale neuromorphic systems. Nonetheless, communication delays have yet to be comprehensively addressed in either domain. In this paper, we first show that delays prevent state-of-the-art continuous-time neural networks called Latent Equilibrium (LE) networks from learning even simple tasks despite significant overparameterization. We then propose to compensate for communication delays by predicting future signals based on currently available ones. This conceptually straightforward approach, which we call prospective messaging (PM), uses only neuron-local information, and is flexible in terms of memory and computation requirements. We demonstrate that incorporating PM into delayed LE networks prevents reaction lags, and facilitates successful learning on Fourier synthesis and autoregressive video prediction tasks. △ Less

Submitted 8 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

arXiv:2406.04814 [pdf, other]

Lifelong Learning of Video Diffusion Models From a Single Video Stream

Authors: Jason Yoo, Yingchen He, Saeid Naderiparizi, Dylan Green, Gido M. van de Ven, Geoff Pleiss, Frank Wood

Abstract: This work demonstrates that training autoregressive video diffusion models from a single, continuous video stream is not only possible but remarkably can also be competitive with standard offline training approaches given the same number of gradient steps. Our demonstration further reveals that this main result can be achieved using experience replay that only retains a subset of the preceding vid… ▽ More This work demonstrates that training autoregressive video diffusion models from a single, continuous video stream is not only possible but remarkably can also be competitive with standard offline training approaches given the same number of gradient steps. Our demonstration further reveals that this main result can be achieved using experience replay that only retains a subset of the preceding video stream. We also contribute three new single video generative modeling datasets suitable for evaluating lifelong video model learning: Lifelong Bouncing Balls, Lifelong 3D Maze, and Lifelong PLAICraft. Each dataset contains over a million consecutive frames from a synthetic environment of increasing complexity. △ Less

Submitted 28 November, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.04491 [pdf, other]

TorchDriveEnv: A Reinforcement Learning Benchmark for Autonomous Driving with Reactive, Realistic, and Diverse Non-Playable Characters

Authors: Jonathan Wilder Lavington, Ke Zhang, Vasileios Lioutas, Matthew Niedoba, Yunpeng Liu, Dylan Green, Saeid Naderiparizi, Xiaoxuan Liang, Setareh Dabiri, Adam Ścibior, Berend Zwartsenberg, Frank Wood

Abstract: The training, testing, and deployment, of autonomous vehicles requires realistic and efficient simulators. Moreover, because of the high variability between different problems presented in different autonomous systems, these simulators need to be easy to use, and easy to modify. To address these problems we introduce TorchDriveSim and its benchmark extension TorchDriveEnv. TorchDriveEnv is a light… ▽ More The training, testing, and deployment, of autonomous vehicles requires realistic and efficient simulators. Moreover, because of the high variability between different problems presented in different autonomous systems, these simulators need to be easy to use, and easy to modify. To address these problems we introduce TorchDriveSim and its benchmark extension TorchDriveEnv. TorchDriveEnv is a lightweight reinforcement learning benchmark programmed entirely in Python, which can be modified to test a number of different factors in learned vehicle behavior, including the effect of varying kinematic models, agent types, and traffic control patterns. Most importantly unlike many replay based simulation approaches, TorchDriveEnv is fully integrated with a state of the art behavioral simulation API. This allows users to train and evaluate driving models alongside data driven Non-Playable Characters (NPC) whose initializations and driving behavior are reactive, realistic, and diverse. We illustrate the efficiency and simplicity of TorchDriveEnv by evaluating common reinforcement learning baselines in both training and validation environments. Our experiments show that TorchDriveEnv is easy to use, but difficult to solve. △ Less

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.00251 [pdf, other]

Semantically Consistent Video Inpainting with Conditional Diffusion Models

Authors: Dylan Green, William Harvey, Saeid Naderiparizi, Matthew Niedoba, Yunpeng Liu, Xiaoxuan Liang, Jonathan Lavington, Ke Zhang, Vasileios Lioutas, Setareh Dabiri, Adam Scibior, Berend Zwartsenberg, Frank Wood

Abstract: Current state-of-the-art methods for video inpainting typically rely on optical flow or attention-based approaches to inpaint masked regions by propagating visual information across frames. While such approaches have led to significant progress on standard benchmarks, they struggle with tasks that require the synthesis of novel content that is not present in other frames. In this paper, we reframe… ▽ More Current state-of-the-art methods for video inpainting typically rely on optical flow or attention-based approaches to inpaint masked regions by propagating visual information across frames. While such approaches have led to significant progress on standard benchmarks, they struggle with tasks that require the synthesis of novel content that is not present in other frames. In this paper, we reframe video inpainting as a conditional generative modeling problem and present a framework for solving such problems with conditional video diffusion models. We introduce inpainting-specific sampling schemes which capture crucial long-range dependencies in the context, and devise a novel method for conditioning on the known pixels in incomplete frames. We highlight the advantages of using a generative approach for this task, showing that our method is capable of generating diverse, high-quality inpaintings and synthesizing new content that is spatially, temporally, and semantically consistent with the provided context. △ Less

Submitted 8 October, 2024; v1 submitted 30 April, 2024; originally announced May 2024.

arXiv:2404.09636 [pdf, other]

All-in-one simulation-based inference

Authors: Manuel Gloeckler, Michael Deistler, Christian Weilbach, Frank Wood, Jakob H. Macke

Abstract: Amortized Bayesian inference trains neural networks to solve stochastic inference problems using model simulations, thereby making it possible to rapidly perform Bayesian inference for any newly observed data. However, current simulation-based amortized inference methods are simulation-hungry and inflexible: They require the specification of a fixed parametric prior, simulator, and inference tasks… ▽ More Amortized Bayesian inference trains neural networks to solve stochastic inference problems using model simulations, thereby making it possible to rapidly perform Bayesian inference for any newly observed data. However, current simulation-based amortized inference methods are simulation-hungry and inflexible: They require the specification of a fixed parametric prior, simulator, and inference tasks ahead of time. Here, we present a new amortized inference method -- the Simformer -- which overcomes these limitations. By training a probabilistic diffusion model with transformer architectures, the Simformer outperforms current state-of-the-art amortized inference approaches on benchmark tasks and is substantially more flexible: It can be applied to models with function-valued parameters, it can handle inference scenarios with missing or unstructured data, and it can sample arbitrary conditionals of the joint distribution of parameters and data, including both posterior and likelihood. We showcase the performance and flexibility of the Simformer on simulators from ecology, epidemiology, and neuroscience, and demonstrate that it opens up new possibilities and application domains for amortized Bayesian inference on simulation-based models. △ Less

Submitted 15 July, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: To be published in the proceedings of the 41st International Conference on Machine Learning (ICML 2024), Vienna, Austria. PMLR 235, 2024

arXiv:2403.00025 [pdf, ps, other]

On the Challenges and Opportunities in Generative AI

Authors: Laura Manduchi, Kushagra Pandey, Clara Meister, Robert Bamler, Ryan Cotterell, Sina Däubener, Sophie Fellenz, Asja Fischer, Thomas Gärtner, Matthias Kirchler, Marius Kloft, Yingzhen Li, Christoph Lippert, Gerard de Melo, Eric Nalisnick, Björn Ommer, Rajesh Ranganath, Maja Rudolph, Karen Ullrich, Guy Van den Broeck, Julia E Vogt, Yixin Wang, Florian Wenzel, Frank Wood, Stephan Mandt , et al. (1 additional authors not shown)

Abstract: The field of deep generative modeling has grown rapidly in the last few years. With the availability of massive amounts of training data coupled with advances in scalable unsupervised learning paradigms, recent large-scale generative models show tremendous promise in synthesizing high-resolution images and text, as well as structured data such as videos and molecules. However, we argue that curren… ▽ More The field of deep generative modeling has grown rapidly in the last few years. With the availability of massive amounts of training data coupled with advances in scalable unsupervised learning paradigms, recent large-scale generative models show tremendous promise in synthesizing high-resolution images and text, as well as structured data such as videos and molecules. However, we argue that current large-scale generative AI models exhibit several fundamental shortcomings that hinder their widespread adoption across domains. In this work, our objective is to identify these issues and highlight key unresolved challenges in modern generative AI paradigms that should be addressed to further enhance their capabilities, versatility, and reliability. By identifying these challenges, we aim to provide researchers with insights for exploring fruitful research directions, thus fostering the development of more robust and accessible generative AI solutions. △ Less

Submitted 20 March, 2025; v1 submitted 28 February, 2024; originally announced March 2024.

arXiv:2402.09542 [pdf, other]

Layerwise Proximal Replay: A Proximal Point Method for Online Continual Learning

Authors: Jason Yoo, Yunpeng Liu, Frank Wood, Geoff Pleiss

Abstract: In online continual learning, a neural network incrementally learns from a non-i.i.d. data stream. Nearly all online continual learning methods employ experience replay to simultaneously prevent catastrophic forgetting and underfitting on past data. Our work demonstrates a limitation of this approach: neural networks trained with experience replay tend to have unstable optimization trajectories, i… ▽ More In online continual learning, a neural network incrementally learns from a non-i.i.d. data stream. Nearly all online continual learning methods employ experience replay to simultaneously prevent catastrophic forgetting and underfitting on past data. Our work demonstrates a limitation of this approach: neural networks trained with experience replay tend to have unstable optimization trajectories, impeding their overall accuracy. Surprisingly, these instabilities persist even when the replay buffer stores all previous training examples, suggesting that this issue is orthogonal to catastrophic forgetting. We minimize these instabilities through a simple modification of the optimization geometry. Our solution, Layerwise Proximal Replay (LPR), balances learning from new and replay data while only allowing for gradual changes in the hidden activation of past data. We demonstrate that LPR consistently improves replay-based online continual learning methods across multiple problem settings, regardless of the amount of available replay memory. △ Less

Submitted 18 July, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.08018 [pdf, other]

Nearest Neighbour Score Estimators for Diffusion Generative Models

Authors: Matthew Niedoba, Dylan Green, Saeid Naderiparizi, Vasileios Lioutas, Jonathan Wilder Lavington, Xiaoxuan Liang, Yunpeng Liu, Ke Zhang, Setareh Dabiri, Adam Ścibior, Berend Zwartsenberg, Frank Wood

Abstract: Score function estimation is the cornerstone of both training and sampling from diffusion generative models. Despite this fact, the most commonly used estimators are either biased neural network approximations or high variance Monte Carlo estimators based on the conditional score. We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set… ▽ More Score function estimation is the cornerstone of both training and sampling from diffusion generative models. Despite this fact, the most commonly used estimators are either biased neural network approximations or high variance Monte Carlo estimators based on the conditional score. We introduce a novel nearest neighbour score function estimator which utilizes multiple samples from the training set to dramatically decrease estimator variance. We leverage our low variance estimator in two compelling applications. Training consistency models with our estimator, we report a significant increase in both convergence speed and sample quality. In diffusion models, we show that our estimator can replace a learned network for probability-flow ODE integration, opening promising new avenues of future research. △ Less

Submitted 16 July, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: 25 pages, 9 figures. To be published in ICML 2024

arXiv:2309.12508 [pdf, other]

A Diffusion-Model of Joint Interactive Navigation

Authors: Matthew Niedoba, Jonathan Wilder Lavington, Yunpeng Liu, Vasileios Lioutas, Justice Sefas, Xiaoxuan Liang, Dylan Green, Setareh Dabiri, Berend Zwartsenberg, Adam Scibior, Frank Wood

Abstract: Simulation of autonomous vehicle systems requires that simulated traffic participants exhibit diverse and realistic behaviors. The use of prerecorded real-world traffic scenarios in simulation ensures realism but the rarity of safety critical events makes large scale collection of driving scenarios expensive. In this paper, we present DJINN - a diffusion based method of generating traffic scenario… ▽ More Simulation of autonomous vehicle systems requires that simulated traffic participants exhibit diverse and realistic behaviors. The use of prerecorded real-world traffic scenarios in simulation ensures realism but the rarity of safety critical events makes large scale collection of driving scenarios expensive. In this paper, we present DJINN - a diffusion based method of generating traffic scenarios. Our approach jointly diffuses the trajectories of all agents, conditioned on a flexible set of state observations from the past, present, or future. On popular trajectory forecasting datasets, we report state of the art performance on joint trajectory metrics. In addition, we demonstrate how DJINN flexibly enables direct test-time sampling from a variety of valuable conditional distributions including goal-based sampling, behavior-class sampling, and scenario editing. △ Less

Submitted 24 October, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 10 pages, 4 figures. Accepted to NeurIPS 2023

arXiv:2307.16463 [pdf, other]

Don't be so negative! Score-based Generative Modeling with Oracle-assisted Guidance

Authors: Saeid Naderiparizi, Xiaoxuan Liang, Berend Zwartsenberg, Frank Wood

Abstract: The maximum likelihood principle advocates parameter estimation via optimization of the data likelihood function. Models estimated in this way can exhibit a variety of generalization characteristics dictated by, e.g. architecture, parameterization, and optimization bias. This work addresses model learning in a setting where there further exists side-information in the form of an oracle that can la… ▽ More The maximum likelihood principle advocates parameter estimation via optimization of the data likelihood function. Models estimated in this way can exhibit a variety of generalization characteristics dictated by, e.g. architecture, parameterization, and optimization bias. This work addresses model learning in a setting where there further exists side-information in the form of an oracle that can label samples as being outside the support of the true data generating distribution. Specifically we develop a new denoising diffusion probabilistic modeling (DDPM) methodology, Gen-neG, that leverages this additional side-information. Our approach builds on generative adversarial networks (GANs) and discriminator guidance in diffusion models to guide the generation process towards the positive support region indicated by the oracle. We empirically establish the utility of Gen-neG in applications including collision avoidance in self-driving simulators and safety-guarded human motion generation. △ Less

Submitted 31 July, 2023; originally announced July 2023.

arXiv:2305.14621 [pdf, other]

Realistically distributing object placements in synthetic training data improves the performance of vision-based object detection models

Authors: Setareh Dabiri, Vasileios Lioutas, Berend Zwartsenberg, Yunpeng Liu, Matthew Niedoba, Xiaoxuan Liang, Dylan Green, Justice Sefas, Jonathan Wilder Lavington, Frank Wood, Adam Scibior

Abstract: When training object detection models on synthetic data, it is important to make the distribution of synthetic data as close as possible to the distribution of real data. We investigate specifically the impact of object placement distribution, keeping all other aspects of synthetic data fixed. Our experiment, training a 3D vehicle detection model in CARLA and testing on KITTI, demonstrates a subst… ▽ More When training object detection models on synthetic data, it is important to make the distribution of synthetic data as close as possible to the distribution of real data. We investigate specifically the impact of object placement distribution, keeping all other aspects of synthetic data fixed. Our experiment, training a 3D vehicle detection model in CARLA and testing on KITTI, demonstrates a substantial improvement resulting from improving the object placement distribution. △ Less

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.11856 [pdf, other]

Video Killed the HD-Map: Predicting Multi-Agent Behavior Directly From Aerial Images

Authors: Yunpeng Liu, Vasileios Lioutas, Jonathan Wilder Lavington, Matthew Niedoba, Justice Sefas, Setareh Dabiri, Dylan Green, Xiaoxuan Liang, Berend Zwartsenberg, Adam Ścibior, Frank Wood

Abstract: The development of algorithms that learn multi-agent behavioral models using human demonstrations has led to increasingly realistic simulations in the field of autonomous driving. In general, such models learn to jointly predict trajectories for all controlled agents by exploiting road context information such as drivable lanes obtained from manually annotated high-definition (HD) maps. Recent stu… ▽ More The development of algorithms that learn multi-agent behavioral models using human demonstrations has led to increasingly realistic simulations in the field of autonomous driving. In general, such models learn to jointly predict trajectories for all controlled agents by exploiting road context information such as drivable lanes obtained from manually annotated high-definition (HD) maps. Recent studies show that these models can greatly benefit from increasing the amount of human data available for training. However, the manual annotation of HD maps which is necessary for every new location puts a bottleneck on efficiently scaling up human traffic datasets. We propose an aerial image-based map (AIM) representation that requires minimal annotation and provides rich road context information for traffic agents like pedestrians and vehicles. We evaluate multi-agent trajectory prediction using the AIM by incorporating it into a differentiable driving simulator as an image-texture-based differentiable rendering module. Our results demonstrate competitive multi-agent trajectory prediction performance especially for pedestrians in the scene when using our AIM representation as compared to models trained with rasterized HD maps. △ Less

Submitted 19 September, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

ACM Class: I.2.9; I.4.9

Journal ref: 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC)

arXiv:2303.16187 [pdf, other]

Visual Chain-of-Thought Diffusion Models

Authors: William Harvey, Frank Wood

Abstract: Recent progress with conditional image diffusion models has been stunning, and this holds true whether we are speaking about models conditioned on a text description, a scene layout, or a sketch. Unconditional image diffusion models are also improving but lag behind, as do diffusion models which are conditioned on lower-dimensional features like class labels. We propose to close the gap between co… ▽ More Recent progress with conditional image diffusion models has been stunning, and this holds true whether we are speaking about models conditioned on a text description, a scene layout, or a sketch. Unconditional image diffusion models are also improving but lag behind, as do diffusion models which are conditioned on lower-dimensional features like class labels. We propose to close the gap between conditional and unconditional models using a two-stage sampling procedure. In the first stage we sample an embedding describing the semantic content of the image. In the second stage we sample the image conditioned on this embedding and then discard the embedding. Doing so lets us leverage the power of conditional diffusion models on the unconditional generation task, which we show improves FID by 25-50% compared to standard unconditional generation. △ Less

Submitted 20 June, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2210.12236 [pdf, other]

Uncertain Evidence in Probabilistic Models and Stochastic Simulators

Authors: Andreas Munk, Alexander Mead, Frank Wood

Abstract: We consider the problem of performing Bayesian inference in probabilistic models where observations are accompanied by uncertainty, referred to as "uncertain evidence." We explore how to interpret uncertain evidence, and by extension the importance of proper interpretation as it pertains to inference about latent variables. We consider a recently-proposed method "distributional evidence" as well a… ▽ More We consider the problem of performing Bayesian inference in probabilistic models where observations are accompanied by uncertainty, referred to as "uncertain evidence." We explore how to interpret uncertain evidence, and by extension the importance of proper interpretation as it pertains to inference about latent variables. We consider a recently-proposed method "distributional evidence" as well as revisit two older methods: Jeffrey's rule and virtual evidence. We devise guidelines on how to account for uncertain evidence and we provide new insights, particularly regarding consistency. To showcase the impact of different interpretations of the same uncertain evidence, we carry out experiments in which one interpretation is defined as "correct." We then compare inference results from each different interpretation illustrating the importance of careful consideration of uncertain evidence. △ Less

Submitted 26 January, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

arXiv:2210.11633 [pdf, other]

Graphically Structured Diffusion Models

Authors: Christian Weilbach, William Harvey, Frank Wood

Abstract: We introduce a framework for automatically defining and learning deep generative models with problem-specific structure. We tackle problem domains that are more traditionally solved by algorithms such as sorting, constraint satisfaction for Sudoku, and matrix factorization. Concretely, we train diffusion models with an architecture tailored to the problem specification. This problem specification… ▽ More We introduce a framework for automatically defining and learning deep generative models with problem-specific structure. We tackle problem domains that are more traditionally solved by algorithms such as sorting, constraint satisfaction for Sudoku, and matrix factorization. Concretely, we train diffusion models with an architecture tailored to the problem specification. This problem specification should contain a graphical model describing relationships between variables, and often benefits from explicit representation of subcomputations. Permutation invariances can also be exploited. Across a diverse set of experiments we improve the scaling relationship between problem dimension and our model's performance, in terms of both training time and final accuracy. Our code can be found at https://github.com/plai-group/gsdm. △ Less

Submitted 16 June, 2023; v1 submitted 20 October, 2022; originally announced October 2022.

ACM Class: G.3

arXiv:2208.04987 [pdf, other]

Vehicle Type Specific Waypoint Generation

Authors: Yunpeng Liu, Jonathan Wilder Lavington, Adam Scibior, Frank Wood

Abstract: We develop a generic mechanism for generating vehicle-type specific sequences of waypoints from a probabilistic foundation model of driving behavior. Many foundation behavior models are trained on data that does not include vehicle information, which limits their utility in downstream applications such as planning. Our novel methodology conditionally specializes such a behavior predictive model to… ▽ More We develop a generic mechanism for generating vehicle-type specific sequences of waypoints from a probabilistic foundation model of driving behavior. Many foundation behavior models are trained on data that does not include vehicle information, which limits their utility in downstream applications such as planning. Our novel methodology conditionally specializes such a behavior predictive model to a vehicle-type by utilizing byproducts of the reinforcement learning algorithms used to produce vehicle specific controllers. We show how to compose a vehicle specific value function estimate with a generic probabilistic behavior model to generate vehicle-type specific waypoint sequences that are more likely to be physically plausible then their vehicle-agnostic counterparts. △ Less

Submitted 9 August, 2022; originally announced August 2022.

Journal ref: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

arXiv:2206.09021 [pdf, other]

Conditional Permutation Invariant Flows

Authors: Berend Zwartsenberg, Adam Ścibior, Matthew Niedoba, Vasileios Lioutas, Yunpeng Liu, Justice Sefas, Setareh Dabiri, Jonathan Wilder Lavington, Trevor Campbell, Frank Wood

Abstract: We present a novel, conditional generative probabilistic model of set-valued data with a tractable log density. This model is a continuous normalizing flow governed by permutation equivariant dynamics. These dynamics are driven by a learnable per-set-element term and pairwise interactions, both parametrized by deep neural networks. We illustrate the utility of this model via applications including… ▽ More We present a novel, conditional generative probabilistic model of set-valued data with a tractable log density. This model is a continuous normalizing flow governed by permutation equivariant dynamics. These dynamics are driven by a learnable per-set-element term and pairwise interactions, both parametrized by deep neural networks. We illustrate the utility of this model via applications including (1) complex traffic scene generation conditioned on visually specified map information, and (2) object bounding box generation conditioned directly on images. We train our model by maximizing the expected likelihood of labeled conditional data under our flow, with the aid of a penalty that ensures the dynamics are smooth and hence efficiently solvable. Our method significantly outperforms non-permutation invariant baselines in terms of log likelihood and domain-specific metrics (offroad, collision, and combined infractions), yielding realistic samples that are difficult to distinguish from real data. △ Less

Submitted 17 June, 2022; originally announced June 2022.

Comments: 20 pages, 10 figures

ACM Class: I.2.0

arXiv:2205.15460 [pdf, other]

Critic Sequential Monte Carlo

Authors: Vasileios Lioutas, Jonathan Wilder Lavington, Justice Sefas, Matthew Niedoba, Yunpeng Liu, Berend Zwartsenberg, Setareh Dabiri, Frank Wood, Adam Scibior

Abstract: We introduce CriticSMC, a new algorithm for planning as inference built from a composition of sequential Monte Carlo with learned Soft-Q function heuristic factors. These heuristic factors, obtained from parametric approximations of the marginal likelihood ahead, more effectively guide SMC towards the desired target distribution, which is particularly helpful for planning in environments with hard… ▽ More We introduce CriticSMC, a new algorithm for planning as inference built from a composition of sequential Monte Carlo with learned Soft-Q function heuristic factors. These heuristic factors, obtained from parametric approximations of the marginal likelihood ahead, more effectively guide SMC towards the desired target distribution, which is particularly helpful for planning in environments with hard constraints placed sparsely in time. Compared with previous work, we modify the placement of such heuristic factors, which allows us to cheaply propose and evaluate large numbers of putative action particles, greatly increasing inference and planning efficiency. CriticSMC is compatible with informative priors, whose density function need not be known, and can be used as a model-free control algorithm. Our experiments on collision avoidance in a high-dimensional simulated driving task show that CriticSMC significantly reduces collision rates at a low computational cost while maintaining realism and diversity of driving behaviors across vehicles and environment scenarios. △ Less

Submitted 21 January, 2023; v1 submitted 30 May, 2022; originally announced May 2022.

Comments: ICLR 2023

arXiv:2205.11495 [pdf, other]

Flexible Diffusion Modeling of Long Videos

Authors: William Harvey, Saeid Naderiparizi, Vaden Masrani, Christian Weilbach, Frank Wood

Abstract: We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can at test-time sample any arbitrary subset of video frames conditioned on any other subset and present an architecture adapted for this purpose. Doing so allows us to efficiently comp… ▽ More We present a framework for video modeling based on denoising diffusion probabilistic models that produces long-duration video completions in a variety of realistic environments. We introduce a generative model that can at test-time sample any arbitrary subset of video frames conditioned on any other subset and present an architecture adapted for this purpose. Doing so allows us to efficiently compare and optimize a variety of schedules for the order in which frames in a long video are sampled and use selective sparse and long-range conditioning on previously sampled frames. We demonstrate improved video modeling over prior work on a number of datasets and sample temporally coherent videos over 25 minutes in length. We additionally release a new video modeling dataset and semantically meaningful metrics based on videos generated in the CARLA autonomous driving simulator. △ Less

Submitted 15 December, 2022; v1 submitted 23 May, 2022; originally announced May 2022.

arXiv:2205.09930 [pdf, other]

BayesPCN: A Continually Learnable Predictive Coding Associative Memory

Authors: Jason Yoo, Frank Wood

Abstract: Associative memory plays an important role in human intelligence and its mechanisms have been linked to attention in machine learning. While the machine learning community's interest in associative memories has recently been rekindled, most work has focused on memory recall ($read$) over memory learning ($write$). In this paper, we present BayesPCN, a hierarchical associative memory capable of per… ▽ More Associative memory plays an important role in human intelligence and its mechanisms have been linked to attention in machine learning. While the machine learning community's interest in associative memories has recently been rekindled, most work has focused on memory recall ($read$) over memory learning ($write$). In this paper, we present BayesPCN, a hierarchical associative memory capable of performing continual one-shot memory writes without meta-learning. Moreover, BayesPCN is able to gradually forget past observations ($forget$) to free its memory. Experiments show that BayesPCN can recall corrupted i.i.d. high-dimensional data observed hundreds to a thousand ``timesteps'' ago without a large drop in recall ability compared to the state-of-the-art offline-learned parametric memory models. △ Less

Submitted 11 November, 2022; v1 submitted 19 May, 2022; originally announced May 2022.

arXiv:2202.08587 [pdf, other]

Gradients without Backpropagation

Authors: Atılım Güneş Baydin, Barak A. Pearlmutter, Don Syme, Frank Wood, Philip Torr

Abstract: Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic differentiation algorithms that also includes the forward mode. We present a method to compute gradients based solely on the directional derivative that one can comp… ▽ More Using backpropagation to compute gradients of objective functions for optimization has remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation, is a special case within the general family of automatic differentiation algorithms that also includes the forward mode. We present a method to compute gradients based solely on the directional derivative that one can compute exactly and efficiently via the forward mode. We call this formulation the forward gradient, an unbiased estimate of the gradient that can be evaluated in a single forward run of the function, entirely eliminating the need for backpropagation in gradient descent. We demonstrate forward gradient descent in a range of problems, showing substantial savings in computation and enabling training up to twice as fast in some cases. △ Less

Submitted 17 February, 2022; originally announced February 2022.

Comments: 10 pages, 6 figures

MSC Class: 68T07 ACM Class: I.2.6; I.2.5

arXiv:2202.02693 [pdf, other]

Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning

Authors: Michael Teng, Michiel van de Panne, Frank Wood

Abstract: Distributional reinforcement learning (RL) aims to learn a value-network that predicts the full distribution of the returns for a given state, often modeled via a quantile-based critic. This approach has been successfully integrated into common RL methods for continuous control, giving rise to algorithms such as Distributional Soft Actor-Critic (DSAC). In this paper, we introduce multi-sample targ… ▽ More Distributional reinforcement learning (RL) aims to learn a value-network that predicts the full distribution of the returns for a given state, often modeled via a quantile-based critic. This approach has been successfully integrated into common RL methods for continuous control, giving rise to algorithms such as Distributional Soft Actor-Critic (DSAC). In this paper, we introduce multi-sample target values (MTV) for distributional RL, as a principled replacement for single-sample target value estimation, as commonly employed in current practice. The improved distributional estimates further lend themselves to UCB-based exploration. These two ideas are combined to yield our distributional RL algorithm, E2DC (Extra Exploration with Distributional Critics). We evaluate our approach on a range of continuous control tasks and demonstrate state-of-the-art model-free performance on difficult tasks such as Humanoid control. We provide further insight into the method via visualization and analysis of the learned distributions and their evolution during training. △ Less

Submitted 5 February, 2022; originally announced February 2022.

Comments: Submitted to ICML 2022

arXiv:2201.05151 [pdf, other]

Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning

Authors: Peyman Bateni, Jarred Barber, Raghav Goyal, Vaden Masrani, Jan-Willem van de Meent, Leonid Sigal, Frank Wood

Abstract: Modern deep learning requires large-scale extensively labelled datasets for training. Few-shot learning aims to alleviate this issue by learning effectively from few labelled examples. In previously proposed few-shot visual classifiers, it is assumed that the feature manifold, where classifier decisions are made, has uncorrelated feature dimensions and uniform feature variance. In this work, we fo… ▽ More Modern deep learning requires large-scale extensively labelled datasets for training. Few-shot learning aims to alleviate this issue by learning effectively from few labelled examples. In previously proposed few-shot visual classifiers, it is assumed that the feature manifold, where classifier decisions are made, has uncorrelated feature dimensions and uniform feature variance. In this work, we focus on addressing the limitations arising from this assumption by proposing a variance-sensitive class of models that operates in a low-label regime. The first method, Simple CNAPS, employs a hierarchically regularized Mahalanobis-distance based classifier combined with a state of the art neural adaptive feature extractor to achieve strong performance on Meta-Dataset, mini-ImageNet and tiered-ImageNet benchmarks. We further extend this approach to a transductive learning setting, proposing Transductive CNAPS. This transductive method combines a soft k-means parameter refinement procedure with a two-step task encoder to achieve improved test-time classification accuracy using unlabelled data. Transductive CNAPS achieves state of the art performance on Meta-Dataset. Finally, we explore the use of our methods (Simple and Transductive) for "out of the box" continual and active learning. Extensive experiments on large scale benchmarks illustrate robustness and versatility of this, relatively speaking, simple class of models. All trained model checkpoints and corresponding source codes have been made publicly available. △ Less

Submitted 12 December, 2022; v1 submitted 13 January, 2022; originally announced January 2022.

arXiv:2107.00745 [pdf, other]

q-Paths: Generalizing the Geometric Annealing Path using Power Means

Authors: Vaden Masrani, Rob Brekelmans, Thang Bui, Frank Nielsen, Aram Galstyan, Greg Ver Steeg, Frank Wood

Abstract: Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of… ▽ More Many common machine learning methods involve the geometric annealing path, a sequence of intermediate densities between two distributions of interest constructed using the geometric average. While alternatives such as the moment-averaging path have demonstrated performance gains in some settings, their practical applicability remains limited by exponential family endpoint assumptions and a lack of closed form energy function. In this work, we introduce $q$-paths, a family of paths which is derived from a generalized notion of the mean, includes the geometric and arithmetic mixtures as special cases, and admits a simple closed form involving the deformed logarithm function from nonextensive thermodynamics. Following previous analysis of the geometric path, we interpret our $q$-paths as corresponding to a $q$-exponential family of distributions, and provide a variational representation of intermediate densities as minimizing a mixture of $α$-divergences to the endpoints. We show that small deviations away from the geometric path yield empirical gains for Bayesian inference using Sequential Monte Carlo and generative model evaluation using Annealed Importance Sampling. △ Less

Submitted 1 July, 2021; originally announced July 2021.

Comments: arXiv admin note: text overlap with arXiv:2012.07823

arXiv:2106.10314 [pdf, other]

Differentiable Particle Filtering without Modifying the Forward Pass

Authors: Adam Ścibior, Frank Wood

Abstract: Particle filters are not compatible with automatic differentiation due to the presence of discrete resampling steps. While known estimators for the score function, based on Fisher's identity, can be computed using particle filters, up to this point they required manual implementation. In this paper we show that such estimators can be computed using automatic differentiation, after introducing a si… ▽ More Particle filters are not compatible with automatic differentiation due to the presence of discrete resampling steps. While known estimators for the score function, based on Fisher's identity, can be computed using particle filters, up to this point they required manual implementation. In this paper we show that such estimators can be computed using automatic differentiation, after introducing a simple correction to the particle weights. This correction utilizes the stop-gradient operator and does not modify the particle filter operation on the forward pass, while also being cheap and easy to compute. Surprisingly, with the same correction automatic differentiation also produces good estimators for gradients of expectations under the posterior. We can therefore regard our method as a general recipe for making particle filters differentiable. We additionally show that it produces desired estimators for second-order derivatives and how to extend it to further reduce variance at the expense of additional computation. △ Less

Submitted 19 October, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

Comments: 24 pages, 3 figures

arXiv:2104.11212 [pdf, other]

Imagining The Road Ahead: Multi-Agent Trajectory Prediction via Differentiable Simulation

Authors: Adam Scibior, Vasileios Lioutas, Daniele Reda, Peyman Bateni, Frank Wood

Abstract: We develop a deep generative model built on a fully differentiable simulator for multi-agent trajectory prediction. Agents are modeled with conditional recurrent variational neural networks (CVRNNs), which take as input an ego-centric birdview image representing the current state of the world and output an action, consisting of steering and acceleration, which is used to derive the subsequent agen… ▽ More We develop a deep generative model built on a fully differentiable simulator for multi-agent trajectory prediction. Agents are modeled with conditional recurrent variational neural networks (CVRNNs), which take as input an ego-centric birdview image representing the current state of the world and output an action, consisting of steering and acceleration, which is used to derive the subsequent agent state using a kinematic bicycle model. The full simulation state is then differentiably rendered for each agent, initiating the next time step. We achieve state-of-the-art results on the INTERACTION dataset, using standard neural architectures and a standard variational training objective, producing realistic multi-modal predictions without any ad-hoc diversity-inducing losses. We conduct ablation studies to examine individual components of the simulator, finding that both the kinematic bicycle model and the continuous feedback from the birdview image are crucial for achieving this level of performance. We name our model ITRA, for "Imagining the Road Ahead". △ Less

Submitted 22 April, 2021; originally announced April 2021.

Comments: 10 pages, 8 figures

arXiv:2102.12037 [pdf, other]

Conditional Image Generation by Conditioning Variational Auto-Encoders

Authors: William Harvey, Saeid Naderiparizi, Frank Wood

Abstract: We present a conditional variational auto-encoder (VAE) which, to avoid the substantial cost of training from scratch, uses an architecture and training objective capable of leveraging a foundation model in the form of a pretrained unconditional VAE. To train the conditional VAE, we only need to train an artifact to perform amortized inference over the unconditional VAE's latent variables given a… ▽ More We present a conditional variational auto-encoder (VAE) which, to avoid the substantial cost of training from scratch, uses an architecture and training objective capable of leveraging a foundation model in the form of a pretrained unconditional VAE. To train the conditional VAE, we only need to train an artifact to perform amortized inference over the unconditional VAE's latent variables given a conditioning input. We demonstrate our approach on tasks including image inpainting, for which it outperforms state-of-the-art GAN-based approaches at faithfully representing the inherent uncertainty. We conclude by describing a possible application of our inpainting model, in which it is used to perform Bayesian experimental design for the purpose of guiding a sensor. △ Less

Submitted 28 May, 2022; v1 submitted 23 February, 2021; originally announced February 2021.

Comments: 37 pages, 20 figures

arXiv:2012.15566 [pdf, other]

Robust Asymmetric Learning in POMDPs

Authors: Andrew Warrington, J. Wilder Lavington, Adam Ścibior, Mark Schmidt, Frank Wood

Abstract: Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and so may encourage actions that are sub-optimal, even unsafe, under partial infor… ▽ More Policies for partially observed Markov decision processes can be efficiently learned by imitating policies for the corresponding fully observed Markov decision processes. Unfortunately, existing approaches for this kind of imitation learning have a serious flaw: the expert does not know what the trainee cannot see, and so may encourage actions that are sub-optimal, even unsafe, under partial information. We derive an objective to instead train the expert to maximize the expected reward of the imitating agent policy, and use it to construct an efficient algorithm, adaptive asymmetric DAgger (A2D), that jointly trains the expert and the agent. We show that A2D produces an expert policy that the agent can safely imitate, in turn outperforming policies learned by imitating a fixed expert. △ Less

Submitted 1 July, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

Comments: ICML 2021

arXiv:2012.07823 [pdf, other]

Annealed Importance Sampling with q-Paths

Authors: Rob Brekelmans, Vaden Masrani, Thang Bui, Frank Wood, Aram Galstyan, Greg Ver Steeg, Frank Nielsen

Abstract: Annealed importance sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target. While AIS yields an unbiased estimator for any path, existing literature has been primarily limited to the geometric mixture or moment-averaged paths associated with… ▽ More Annealed importance sampling (AIS) is the gold standard for estimating partition functions or marginal likelihoods, corresponding to importance sampling over a path of distributions between a tractable base and an unnormalized target. While AIS yields an unbiased estimator for any path, existing literature has been primarily limited to the geometric mixture or moment-averaged paths associated with the exponential family and KL divergence. We explore AIS using $q$-paths, which include the geometric path as a special case and are related to the homogeneous power mean, deformed exponential family, and $α$-divergence. △ Less

Submitted 14 December, 2020; originally announced December 2020.

Comments: NeurIPS Workshop on Deep Learning through Information Geometry (Best Paper Award)

Journal ref: Published at UAI 2021 https://arxiv.boxedpaper.com/abs/2107.00745

arXiv:2012.05390 [pdf, other]

Ensemble Squared: A Meta AutoML System

Authors: Jason Yoo, Tony Joseph, Dylan Yung, S. Ali Nasseri, Frank Wood

Abstract: There are currently many barriers that prevent non-experts from exploiting machine learning solutions ranging from the lack of intuition on statistical learning techniques to the trickiness of hyperparameter tuning. Such barriers have led to an explosion of interest in automated machine learning (AutoML), whereby an off-the-shelf system can take care of many of the steps for end-users without the… ▽ More There are currently many barriers that prevent non-experts from exploiting machine learning solutions ranging from the lack of intuition on statistical learning techniques to the trickiness of hyperparameter tuning. Such barriers have led to an explosion of interest in automated machine learning (AutoML), whereby an off-the-shelf system can take care of many of the steps for end-users without the need for expertise in machine learning. This paper presents Ensemble Squared (Ensemble$^2$), an AutoML system that ensembles the results of state-of-the-art open-source AutoML systems. Ensemble$^2$ exploits the diversity of existing AutoML systems by leveraging the differences in their model search space and heuristics. Empirically, we show that diversity of each AutoML system is sufficient to justify ensembling at the AutoML system level. In demonstrating this, we also establish new state-of-the-art AutoML results on the OpenML tabular classification benchmark. △ Less

Submitted 19 June, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

arXiv:2010.15750 [pdf, other]

Gaussian Process Bandit Optimization of the Thermodynamic Variational Objective

Authors: Vu Nguyen, Vaden Masrani, Rob Brekelmans, Michael A. Osborne, Frank Wood

Abstract: Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational lower bound on the log evidence involving a one-dimensional Riemann integral approximation, requires choosing a "schedule" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not o… ▽ More Achieving the full promise of the Thermodynamic Variational Objective (TVO), a recently proposed variational lower bound on the log evidence involving a one-dimensional Riemann integral approximation, requires choosing a "schedule" of sorted discretization points. This paper introduces a bespoke Gaussian process bandit optimization method for automatically choosing these points. Our approach not only automates their one-time selection, but also dynamically adapts their positions over the course of optimization, leading to improved model learning and inference. We provide theoretical guarantees that our bandit optimization converges to the regret-minimizing choice of integration points. Empirical validation of our algorithm is provided in terms of improved learning and inference in Variational Autoencoders and Sigmoid Belief Networks. △ Less

Submitted 20 November, 2020; v1 submitted 29 October, 2020; originally announced October 2020.

Comments: NeurIPS 2020

arXiv:2010.03753 [pdf, other]

Uncertainty in Neural Processes

Authors: Saeid Naderiparizi, Kenny Chiu, Benjamin Bloem-Reddy, Frank Wood

Abstract: We explore the effects of architecture and training objective choice on amortized posterior predictive inference in probabilistic conditional generative models. We aim this work to be a counterpoint to a recent trend in the literature that stresses achieving good samples when the amount of conditioning data is large. We instead focus our attention on the case where the amount of conditioning data… ▽ More We explore the effects of architecture and training objective choice on amortized posterior predictive inference in probabilistic conditional generative models. We aim this work to be a counterpoint to a recent trend in the literature that stresses achieving good samples when the amount of conditioning data is large. We instead focus our attention on the case where the amount of conditioning data is small. We highlight specific architecture and objective choices that we find lead to qualitative and quantitative improvement to posterior inference in this low data regime. Specifically we explore the effects of choices of pooling operator and variational family on posterior quality in neural processes. Superior posterior predictive samples drawn from our novel neural process architectures are demonstrated via image completion/in-painting experiments. △ Less

Submitted 8 October, 2020; originally announced October 2020.

arXiv:2010.01274 [pdf, other]

Assisting the Adversary to Improve GAN Training

Authors: Andreas Munk, William Harvey, Frank Wood

Abstract: Some of the most popular methods for improving the stability and performance of GANs involve constraining or regularizing the discriminator. In this paper we consider a largely overlooked regularization technique which we refer to as the Adversary's Assistant (AdvAs). We motivate this using a different perspective to that of prior work. Specifically, we consider a common mismatch between theoretic… ▽ More Some of the most popular methods for improving the stability and performance of GANs involve constraining or regularizing the discriminator. In this paper we consider a largely overlooked regularization technique which we refer to as the Adversary's Assistant (AdvAs). We motivate this using a different perspective to that of prior work. Specifically, we consider a common mismatch between theoretical analysis and practice: analysis often assumes that the discriminator reaches its optimum on each iteration. In practice, this is essentially never true, often leading to poor gradient estimates for the generator. To address this, AdvAs is a theoretically motivated penalty imposed on the generator based on the norm of the gradients used to train the discriminator. This encourages the generator to move towards points where the discriminator is optimal. We demonstrate the effect of applying AdvAs to several GAN objectives, datasets and network architectures. The results indicate a reduction in the mismatch between theory and practice and that AdvAs can lead to improvement of GAN training, as measured by FID scores. △ Less

Submitted 8 December, 2020; v1 submitted 3 October, 2020; originally announced October 2020.

arXiv:2007.00642 [pdf, other]

All in the Exponential Family: Bregman Duality in Thermodynamic Variational Inference

Authors: Rob Brekelmans, Vaden Masrani, Frank Wood, Greg Ver Steeg, Aram Galstyan

Abstract: The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model lear… ▽ More The recently proposed Thermodynamic Variational Objective (TVO) leverages thermodynamic integration to provide a family of variational inference objectives, which both tighten and generalize the ubiquitous Evidence Lower Bound (ELBO). However, the tightness of TVO bounds was not previously known, an expensive grid search was used to choose a "schedule" of intermediate distributions, and model learning suffered with ostensibly tighter bounds. In this work, we propose an exponential family interpretation of the geometric mixture curve underlying the TVO and various path sampling methods, which allows us to characterize the gap in TVO likelihood bounds as a sum of KL divergences. We propose to choose intermediate distributions using equal spacing in the moment parameters of our exponential family, which matches grid search performance and allows the schedule to adaptively update over the course of training. Finally, we derive a doubly reparameterized gradient estimator which improves model learning and allows the TVO to benefit from more refined bounds. To further contextualize our contributions, we provide a unified framework for understanding thermodynamic integration and the TVO using Taylor series remainders. △ Less

Submitted 1 July, 2020; originally announced July 2020.

Comments: ICML 2020

arXiv:2007.00155 [pdf, other]

Semi-supervised Sequential Generative Models

Authors: Michael Teng, Tuan Anh Le, Adam Scibior, Frank Wood

Abstract: We introduce a novel objective for training deep generative time-series models with discrete latent variables for which supervision is only sparsely available. This instance of semi-supervised learning is challenging for existing methods, because the exponential number of possible discrete latent configurations results in high variance gradient estimators. We first overcome this problem by extendi… ▽ More We introduce a novel objective for training deep generative time-series models with discrete latent variables for which supervision is only sparsely available. This instance of semi-supervised learning is challenging for existing methods, because the exponential number of possible discrete latent configurations results in high variance gradient estimators. We first overcome this problem by extending the standard semi-supervised generative modeling objective with reweighted wake-sleep. However, we find that this approach still suffers when the frequency of available labels varies between training sequences. Finally, we introduce a unified objective inspired by teacher-forcing and show that this approach is robust to variable length supervision. We call the resulting method caffeinated wake-sleep (CWS) to emphasize its additional dependence on real data. We demonstrate its effectiveness with experiments on MNIST, handwriting, and fruit fly trajectory data. △ Less

Submitted 30 June, 2020; originally announced July 2020.

Comments: Accepted to Uncertainty in Artificial Intelligence 2020

arXiv:2006.12245 [pdf, other]

Enhancing Few-Shot Image Classification with Unlabelled Examples

Authors: Peyman Bateni, Jarred Barber, Jan-Willem van de Meent, Frank Wood

Abstract: We develop a transductive meta-learning method that uses unlabelled instances to improve few-shot image classification performance. Our approach combines a regularized Mahalanobis-distance-based soft k-means clustering procedure with a modified state of the art neural adaptive feature extractor to achieve improved test-time classification accuracy using unlabelled data. We evaluate our method on t… ▽ More We develop a transductive meta-learning method that uses unlabelled instances to improve few-shot image classification performance. Our approach combines a regularized Mahalanobis-distance-based soft k-means clustering procedure with a modified state of the art neural adaptive feature extractor to achieve improved test-time classification accuracy using unlabelled data. We evaluate our method on transductive few-shot learning tasks, in which the goal is to jointly predict labels for query (test) examples given a set of support (training) examples. We achieve state of the art performance on the Meta-Dataset, mini-ImageNet and tiered-ImageNet benchmarks. All trained models and code have been made publicly available at github.com/plai-group/simple-cnaps. △ Less

Submitted 21 October, 2021; v1 submitted 17 June, 2020; originally announced June 2020.

arXiv:2003.13221 [pdf, other]

doi 10.3389/frai.2021.550603

Planning as Inference in Epidemiological Models

Authors: Frank Wood, Andrew Warrington, Saeid Naderiparizi, Christian Weilbach, Vaden Masrani, William Harvey, Adam Scibior, Boyan Beronov, John Grefenstette, Duncan Campbell, Ali Nasseri

Abstract: In this work we demonstrate how to automate parts of the infectious disease-control policy-making process via performing inference in existing epidemiological models. The kind of inference tasks undertaken include computing the posterior distribution over controllable, via direct policy-making choices, simulation model parameters that give rise to acceptable disease progression outcomes. Among oth… ▽ More In this work we demonstrate how to automate parts of the infectious disease-control policy-making process via performing inference in existing epidemiological models. The kind of inference tasks undertaken include computing the posterior distribution over controllable, via direct policy-making choices, simulation model parameters that give rise to acceptable disease progression outcomes. Among other things, we illustrate the use of a probabilistic programming language that automates inference in existing simulators. Neither the full capabilities of this tool for automating inference nor its utility for planning is widely disseminated at the current time. Timely gains in understanding about how such simulation-based models and inference automation tools applied in support of policymaking could lead to less economically damaging policy prescriptions, particularly during the current COVID-19 pandemic. △ Less

Submitted 15 September, 2021; v1 submitted 30 March, 2020; originally announced March 2020.

Comments: Revisions

Journal ref: Front Artif Intell. 2021; 4: 550603

arXiv:2003.12908 [pdf, other]

Coping With Simulators That Don't Always Return

Authors: Andrew Warrington, Saeid Naderiparizi, Frank Wood

Abstract: Deterministic models are approximations of reality that are easy to interpret and often easier to build than stochastic alternatives. Unfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice. Observation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaini… ▽ More Deterministic models are approximations of reality that are easy to interpret and often easier to build than stochastic alternatives. Unfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice. Observation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaining and extrapolating from noisy data. We investigate and address computational inefficiencies that arise from adding process noise to deterministic simulators that fail to return for certain inputs; a property we describe as "brittle." We show how to train a conditional normalizing flow to propose perturbations such that the simulator succeeds with high probability, increasing computational efficiency. △ Less

Submitted 28 March, 2020; originally announced March 2020.

Comments: AISTATS 2020 camera ready, version 1.0

arXiv:1912.03432 [pdf, other]

Improved Few-Shot Visual Classification

Authors: Peyman Bateni, Raghav Goyal, Vaden Masrani, Frank Wood, Leonid Sigal

Abstract: Few-shot learning is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled data. Most few-shot learning approaches to date have focused on progressively more complex neural feature extractors and classifier adaptation strategies, as well as the refinement of the task definition itself. In this paper, we explore the hypothesis that a simple… ▽ More Few-shot learning is a fundamental task in computer vision that carries the promise of alleviating the need for exhaustively labeled data. Most few-shot learning approaches to date have focused on progressively more complex neural feature extractors and classifier adaptation strategies, as well as the refinement of the task definition itself. In this paper, we explore the hypothesis that a simple class-covariance-based distance metric, namely the Mahalanobis distance, adopted into a state of the art few-shot learning approach (CNAPS) can, in and of itself, lead to a significant performance improvement. We also discover that it is possible to learn adaptive feature extractors that allow useful estimation of the high dimensional feature covariances required by this metric from surprisingly few samples. The result of our work is a new "Simple CNAPS" architecture which has up to 9.2% fewer trainable parameters than CNAPS and performs up to 6.1% better than state of the art on the standard few-shot image classification benchmark dataset. △ Less

Submitted 11 June, 2020; v1 submitted 6 December, 2019; originally announced December 2019.

arXiv:1910.11961 [pdf, other]

Attention for Inference Compilation

Authors: William Harvey, Andreas Munk, Atılım Güneş Baydin, Alexander Bergholm, Frank Wood

Abstract: We present a new approach to automatic amortized inference in universal probabilistic programs which improves performance compared to current methods. Our approach is a variation of inference compilation (IC) which leverages deep neural networks to approximate a posterior distribution over latent variables in a probabilistic program. A challenge with existing IC network architectures is that they… ▽ More We present a new approach to automatic amortized inference in universal probabilistic programs which improves performance compared to current methods. Our approach is a variation of inference compilation (IC) which leverages deep neural networks to approximate a posterior distribution over latent variables in a probabilistic program. A challenge with existing IC network architectures is that they can fail to model long-range dependencies between latent variables. To address this, we introduce an attention mechanism that attends to the most salient variables previously sampled in the execution of a probabilistic program. We demonstrate that the addition of attention allows the proposal distributions to better match the true posterior, enhancing inference about latent variables in simulators. △ Less

Submitted 25 October, 2019; originally announced October 2019.

arXiv:1910.11950 [pdf, other]

Probabilistic Surrogate Networks for Simulators with Unbounded Randomness

Authors: Andreas Munk, Berend Zwartsenberg, Adam Ścibior, Atılım Güneş Baydin, Andrew Stewart, Goran Fernlund, Anoush Poursartip, Frank Wood

Abstract: We present a framework for automatically structuring and training fast, approximate, deep neural surrogates of stochastic simulators. Unlike traditional approaches to surrogate modeling, our surrogates retain the interpretable structure and control flow of the reference simulator. Our surrogates target stochastic simulators where the number of random variables itself can be stochastic and potentia… ▽ More We present a framework for automatically structuring and training fast, approximate, deep neural surrogates of stochastic simulators. Unlike traditional approaches to surrogate modeling, our surrogates retain the interpretable structure and control flow of the reference simulator. Our surrogates target stochastic simulators where the number of random variables itself can be stochastic and potentially unbounded. Our framework further enables an automatic replacement of the reference simulator with the surrogate when undertaking amortized inference. The fidelity and speed of our surrogates allow for both faster stochastic simulation and accurate and substantially faster posterior inference. Using an illustrative yet non-trivial example we show our surrogates' ability to accurately model a probabilistic program with an unbounded number of random variables. We then proceed with an example that shows our surrogates are able to accurately model a complex structure like an unbounded stack in a program synthesis example. We further demonstrate how our surrogate modeling technique makes amortized inference in complex black-box simulators an order of magnitude faster. Specifically, we do simulator-based materials quality testing, inferring safety-critical latent internal temperature profiles of composite materials undergoing curing. △ Less

Submitted 20 January, 2023; v1 submitted 25 October, 2019; originally announced October 2019.

arXiv:1910.09056 [pdf, other]

Amortized Rejection Sampling in Universal Probabilistic Programming

Authors: Saeid Naderiparizi, Adam Ścibior, Andreas Munk, Mehrdad Ghadiri, Atılım Güneş Baydin, Bradley Gram-Hansen, Christian Schroeder de Witt, Robert Zinkov, Philip H. S. Torr, Tom Rainforth, Yee Whye Teh, Frank Wood

Abstract: Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove fini… ▽ More Naive approaches to amortized inference in probabilistic programs with unbounded loops can produce estimators with infinite variance. This is particularly true of importance sampling inference in programs that explicitly include rejection sampling as part of the user-programmed generative procedure. In this paper we develop a new and efficient amortized importance sampling estimator. We prove finite variance of our estimator and empirically demonstrate our method's correctness and efficiency compared to existing alternatives on generative programs containing rejection sampling loops and discuss how to implement our method in a generic probabilistic programming framework. △ Less

Submitted 28 March, 2022; v1 submitted 20 October, 2019; originally announced October 2019.

Comments: AISTATS 2022 camera ready

arXiv:1909.09721

Safer End-to-End Autonomous Driving via Conditional Imitation Learning and Command Augmentation

Authors: Renhao Wang, Adam Scibior, Frank Wood

Abstract: Imitation learning is a promising approach to end-to-end training of autonomous vehicle controllers. Typically the driving process with such approaches is entirely automatic and black-box, although in practice it is desirable to control the vehicle through high-level commands, such as telling it which way to go at an intersection. In existing work this has been accomplished by the application of a… ▽ More Imitation learning is a promising approach to end-to-end training of autonomous vehicle controllers. Typically the driving process with such approaches is entirely automatic and black-box, although in practice it is desirable to control the vehicle through high-level commands, such as telling it which way to go at an intersection. In existing work this has been accomplished by the application of a branched neural architecture, since directly providing the command as an additional input to the controller often results in the command being ignored. In this work we overcome this limitation by learning a disentangled probabilistic latent variable model that generates the steering commands. We achieve faithful command-conditional generation without using a branched architecture and demonstrate improved stability of the controller, applying only a variational objective without any domain-specific adjustments. On top of that, we extend our model with an additional latent variable and augment the dataset to train a controller that is robust to unsafe commands, such as asking it to turn into a wall. The main contribution of this work is a recipe for building controllable imitation driving agents that improves upon multiple aspects of the current state of the art relating to robustness and interpretability. △ Less

Submitted 20 November, 2020; v1 submitted 20 September, 2019; originally announced September 2019.

Comments: Architecture fails to sufficiently disentangle representations and obey varied commands

Showing 1–50 of 104 results for author: Wood, F