-
Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models
Authors:
Francesco Giuliari,
Gianluca Scarpellini,
Stuart James,
Yiming Wang,
Alessio Del Bue
Abstract:
Positional reasoning is the process of ordering unsorted parts contained in a set into a consistent structure. We present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models to address positional reasoning. We use the forward process to map elements' positions in a set to random positions in a continuous space. Positional Diffusion learns to reverse the nois…
▽ More
Positional reasoning is the process of ordering unsorted parts contained in a set into a consistent structure. We present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models to address positional reasoning. We use the forward process to map elements' positions in a set to random positions in a continuous space. Positional Diffusion learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network. We conduct extensive experiments with benchmark datasets including two puzzle datasets, three sentence ordering datasets, and one visual storytelling dataset, demonstrating that our method outperforms long-lasting research on puzzle solving with up to +18% compared to the second-best deep learning method, and performs on par against the state-of-the-art methods on sentence ordering and visual storytelling. Our work highlights the suitability of diffusion models for ordering problems and proposes a novel formulation and method for solving various ordering tasks. Project website at https://iit-pavis.github.io/Positional_Diffusion/
△ Less
Submitted 20 March, 2023;
originally announced March 2023.
-
Hierarchical clustering with OWA-based linkages, the Lance-Williams formula, and dendrogram inversions
Authors:
Marek Gagolewski,
Anna Cena,
Simon James,
Gleb Beliakov
Abstract:
Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point similarities, amongst many others. We explore the relationships between the famous Lance-Williams update formu…
▽ More
Agglomerative hierarchical clustering based on Ordered Weighted Averaging (OWA) operators not only generalises the single, complete, and average linkages, but also includes intercluster distances based on a few nearest or farthest neighbours, trimmed and winsorised means of pairwise point similarities, amongst many others. We explore the relationships between the famous Lance-Williams update formula and the extended OWA-based linkages with weights generated via infinite coefficient sequences. Furthermore, we provide some conditions for the weight generators to guarantee the resulting dendrograms to be free from unaesthetic inversions.
△ Less
Submitted 25 October, 2023; v1 submitted 9 March, 2023;
originally announced March 2023.
-
Multi-View Masked World Models for Visual Robotic Manipulation
Authors:
Younggyo Seo,
Junsu Kim,
Stephen James,
Kimin Lee,
Jinwoo Shin,
Pieter Abbeel
Abstract:
Visual robotic manipulation research and applications often use multiple cameras, or views, to better perceive the world. How else can we utilize the richness of multi-view data? In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation. Specifically, we train a multi-view masked autoencoder which reconstructs pixels of ra…
▽ More
Visual robotic manipulation research and applications often use multiple cameras, or views, to better perceive the world. How else can we utilize the richness of multi-view data? In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation. Specifically, we train a multi-view masked autoencoder which reconstructs pixels of randomly masked viewpoints and then learn a world model operating on the representations from the autoencoder. We demonstrate the effectiveness of our method in a range of scenarios, including multi-view control and single-view control with auxiliary cameras for representation learning. We also show that the multi-view masked autoencoder trained with multiple randomized viewpoints enables training a policy with strong viewpoint randomization and transferring the policy to solve real-robot tasks without camera calibration and an adaptation procedure. Video demonstrations are available at: https://sites.google.com/view/mv-mwm.
△ Less
Submitted 31 May, 2023; v1 submitted 5 February, 2023;
originally announced February 2023.
-
Hierarchically Composing Level Generators for the Creation of Complex Structures
Authors:
Michael Beukman,
Manuel Fokam,
Marcel Kruger,
Guy Axelrod,
Muhammad Nasir,
Branden Ingram,
Benjamin Rosman,
Steven James
Abstract:
Procedural content generation (PCG) is a growing field, with numerous applications in the video game industry and great potential to help create better games at a fraction of the cost of manual creation. However, much of the work in PCG is focused on generating relatively straightforward levels in simple games, as it is challenging to design an optimisable objective function for complex settings.…
▽ More
Procedural content generation (PCG) is a growing field, with numerous applications in the video game industry and great potential to help create better games at a fraction of the cost of manual creation. However, much of the work in PCG is focused on generating relatively straightforward levels in simple games, as it is challenging to design an optimisable objective function for complex settings. This limits the applicability of PCG to more complex and modern titles, hindering its adoption in industry. Our work aims to address this limitation by introducing a compositional level generation method that recursively composes simple low-level generators to construct large and complex creations. This approach allows for easily-optimisable objectives and the ability to design a complex structure in an interpretable way by referencing lower-level components. We empirically demonstrate that our method outperforms a non-compositional baseline by more accurately satisfying a designer's functional requirements in several tasks. Finally, we provide a qualitative showcase (in Minecraft) illustrating the large and complex, but still coherent, structures that were generated using simple base generators.
△ Less
Submitted 19 July, 2023; v1 submitted 3 February, 2023;
originally announced February 2023.
-
Background Determination for the LUX-ZEPLIN (LZ) Dark Matter Experiment
Authors:
J. Aalbers,
D. S. Akerib,
A. K. Al Musalhi,
F. Alder,
S. K. Alsum,
C. S. Amarasinghe,
A. Ames,
T. J. Anderson,
N. Angelides,
H. M. Araújo,
J. E. Armstrong,
M. Arthurs,
A. Baker,
J. Bang,
J. W. Bargemann,
A. Baxter,
K. Beattie,
P. Beltrame,
E. P. Bernard,
A. Bhatti,
A. Biekert,
T. P. Biesiadzinski,
H. J. Birch,
G. M. Blockinger,
B. Boxer
, et al. (178 additional authors not shown)
Abstract:
The LUX-ZEPLIN experiment recently reported limits on WIMP-nucleus interactions from its initial science run, down to $9.2\times10^{-48}$ cm$^2$ for the spin-independent interaction of a 36 GeV/c$^2$ WIMP at 90% confidence level. In this paper, we present a comprehensive analysis of the backgrounds important for this result and for other upcoming physics analyses, including neutrinoless double-bet…
▽ More
The LUX-ZEPLIN experiment recently reported limits on WIMP-nucleus interactions from its initial science run, down to $9.2\times10^{-48}$ cm$^2$ for the spin-independent interaction of a 36 GeV/c$^2$ WIMP at 90% confidence level. In this paper, we present a comprehensive analysis of the backgrounds important for this result and for other upcoming physics analyses, including neutrinoless double-beta decay searches and effective field theory interpretations of LUX-ZEPLIN data. We confirm that the in-situ determinations of bulk and fixed radioactive backgrounds are consistent with expectations from the ex-situ assays. The observed background rate after WIMP search criteria were applied was $(6.3\pm0.5)\times10^{-5}$ events/keV$_{ee}$/kg/day in the low-energy region, approximately 60 times lower than the equivalent rate reported by the LUX experiment.
△ Less
Submitted 17 July, 2023; v1 submitted 30 November, 2022;
originally announced November 2022.
-
StereoPose: Category-Level 6D Transparent Object Pose Estimation from Stereo Images via Back-View NOCS
Authors:
Kai Chen,
Stephen James,
Congying Sui,
Yun-Hui Liu,
Pieter Abbeel,
Qi Dou
Abstract:
Most existing methods for category-level pose estimation rely on object point clouds. However, when considering transparent objects, depth cameras are usually not able to capture meaningful data, resulting in point clouds with severe artifacts. Without a high-quality point cloud, existing methods are not applicable to challenging transparent objects. To tackle this problem, we present StereoPose,…
▽ More
Most existing methods for category-level pose estimation rely on object point clouds. However, when considering transparent objects, depth cameras are usually not able to capture meaningful data, resulting in point clouds with severe artifacts. Without a high-quality point cloud, existing methods are not applicable to challenging transparent objects. To tackle this problem, we present StereoPose, a novel stereo image framework for category-level object pose estimation, ideally suited for transparent objects. For a robust estimation from pure stereo images, we develop a pipeline that decouples category-level pose estimation into object size estimation, initial pose estimation, and pose refinement. StereoPose then estimates object pose based on representation in the normalized object coordinate space~(NOCS). To address the issue of image content aliasing, we further define a back-view NOCS map for the transparent object. The back-view NOCS aims to reduce the network learning ambiguity caused by content aliasing, and leverage informative cues on the back of the transparent object for more accurate pose estimation. To further improve the performance of the stereo framework, StereoPose is equipped with a parallax attention module for stereo feature fusion and an epipolar loss for improving the stereo-view consistency of network predictions. Extensive experiments on the public TOD dataset demonstrate the superiority of the proposed StereoPose framework for category-level 6D transparent object pose estimation.
△ Less
Submitted 3 November, 2022;
originally announced November 2022.
-
Sim-to-Real via Sim-to-Seg: End-to-end Off-road Autonomous Driving Without Real Data
Authors:
John So,
Amber Xie,
Sunggoo Jung,
Jeffrey Edlund,
Rohan Thakker,
Ali Agha-mohammadi,
Pieter Abbeel,
Stephen James
Abstract:
Autonomous driving is complex, requiring sophisticated 3D scene understanding, localization, mapping, and control. Rather than explicitly modelling and fusing each of these components, we instead consider an end-to-end approach via reinforcement learning (RL). However, collecting exploration driving data in the real world is impractical and dangerous. While training in simulation and deploying vis…
▽ More
Autonomous driving is complex, requiring sophisticated 3D scene understanding, localization, mapping, and control. Rather than explicitly modelling and fusing each of these components, we instead consider an end-to-end approach via reinforcement learning (RL). However, collecting exploration driving data in the real world is impractical and dangerous. While training in simulation and deploying visual sim-to-real techniques has worked well for robot manipulation, deploying beyond controlled workspace viewpoints remains a challenge. In this paper, we address this challenge by presenting Sim2Seg, a re-imagining of RCAN that crosses the visual reality gap for off-road autonomous driving, without using any real-world data. This is done by learning to translate randomized simulation images into simulated segmentation and depth maps, subsequently enabling real-world images to also be translated. This allows us to train an end-to-end RL policy in simulation, and directly deploy in the real-world. Our approach, which can be trained in 48 hours on 1 GPU, can perform equally as well as a classical perception and control stack that took thousands of engineering hours over several months to build. We hope this work motivates future end-to-end autonomous driving research.
△ Less
Submitted 25 October, 2022;
originally announced October 2022.
-
Augmentative Topology Agents For Open-Ended Learning
Authors:
Muhammad Umair Nasir,
Michael Beukman,
Steven James,
Christopher Wesley Cleghorn
Abstract:
In this work, we tackle the problem of open-ended learning by introducing a method that simultaneously evolves agents and increasingly challenging environments. Unlike previous open-ended approaches that optimize agents using a fixed neural network topology, we hypothesize that generalization can be improved by allowing agents' controllers to become more complex as they encounter more difficult en…
▽ More
In this work, we tackle the problem of open-ended learning by introducing a method that simultaneously evolves agents and increasingly challenging environments. Unlike previous open-ended approaches that optimize agents using a fixed neural network topology, we hypothesize that generalization can be improved by allowing agents' controllers to become more complex as they encounter more difficult environments. Our method, Augmentative Topology EPOET (ATEP), extends the Enhanced Paired Open-Ended Trailblazer (EPOET) algorithm by allowing agents to evolve their own neural network structures over time, adding complexity and capacity as necessary. Empirical results demonstrate that ATEP results in general agents capable of solving more environments than a fixed-topology baseline. We also investigate mechanisms for transferring agents between environments and find that a species-based approach further improves the performance and generalization of agents.
△ Less
Submitted 11 October, 2023; v1 submitted 20 October, 2022;
originally announced October 2022.
-
Real-World Robot Learning with Masked Visual Pre-training
Authors:
Ilija Radosavovic,
Tete Xiao,
Stephen James,
Pieter Abbeel,
Jitendra Malik,
Trevor Darrell
Abstract:
In this work, we explore self-supervised visual pre-training on images from diverse, in-the-wild videos for real-world robotic tasks. Like prior work, our visual representations are pre-trained via a masked autoencoder (MAE), frozen, and then passed into a learnable control module. Unlike prior work, we show that the pre-trained representations are effective across a range of real-world robotic ta…
▽ More
In this work, we explore self-supervised visual pre-training on images from diverse, in-the-wild videos for real-world robotic tasks. Like prior work, our visual representations are pre-trained via a masked autoencoder (MAE), frozen, and then passed into a learnable control module. Unlike prior work, we show that the pre-trained representations are effective across a range of real-world robotic tasks and embodiments. We find that our encoder consistently outperforms CLIP (up to 75%), supervised ImageNet pre-training (up to 81%), and training from scratch (up to 81%). Finally, we train a 307M parameter vision transformer on a massive collection of 4.5M images from the Internet and egocentric videos, and demonstrate clearly the benefits of scaling visual pre-training for robot learning.
△ Less
Submitted 6 October, 2022;
originally announced October 2022.
-
Temporally Consistent Transformers for Video Generation
Authors:
Wilson Yan,
Danijar Hafner,
Stephen James,
Pieter Abbeel
Abstract:
To generate accurate videos, algorithms have to understand the spatial and temporal dependencies in the world. Current algorithms enable accurate predictions over short horizons but tend to suffer from temporal inconsistencies. When generated content goes out of view and is later revisited, the model invents different content instead. Despite this severe limitation, no established benchmarks on co…
▽ More
To generate accurate videos, algorithms have to understand the spatial and temporal dependencies in the world. Current algorithms enable accurate predictions over short horizons but tend to suffer from temporal inconsistencies. When generated content goes out of view and is later revisited, the model invents different content instead. Despite this severe limitation, no established benchmarks on complex data exist for rigorously evaluating video generation with long temporal dependencies. In this paper, we curate 3 challenging video datasets with long-range dependencies by rendering walks through 3D scenes of procedural mazes, Minecraft worlds, and indoor scans. We perform a comprehensive evaluation of current models and observe their limitations in temporal consistency. Moreover, we introduce the Temporally Consistent Transformer (TECO), a generative model that substantially improves long-term consistency while also reducing sampling time. By compressing its input sequence into fewer embeddings, applying a temporal transformer, and expanding back using a spatial MaskGit, TECO outperforms existing models across many metrics. Videos are available on the website: https://wilson1yan.github.io/teco
△ Less
Submitted 31 May, 2023; v1 submitted 5 October, 2022;
originally announced October 2022.
-
HARP: Autoregressive Latent Video Prediction with High-Fidelity Image Generator
Authors:
Younggyo Seo,
Kimin Lee,
Fangchen Liu,
Stephen James,
Pieter Abbeel
Abstract:
Video prediction is an important yet challenging problem; burdened with the tasks of generating future frames and learning environment dynamics. Recently, autoregressive latent video models have proved to be a powerful video prediction tool, by separating the video prediction into two sub-problems: pre-training an image generator model, followed by learning an autoregressive prediction model in th…
▽ More
Video prediction is an important yet challenging problem; burdened with the tasks of generating future frames and learning environment dynamics. Recently, autoregressive latent video models have proved to be a powerful video prediction tool, by separating the video prediction into two sub-problems: pre-training an image generator model, followed by learning an autoregressive prediction model in the latent space of the image generator. However, successfully generating high-fidelity and high-resolution videos has yet to be seen. In this work, we investigate how to train an autoregressive latent video prediction model capable of predicting high-fidelity future frames with minimal modification to existing models, and produce high-resolution (256x256) videos. Specifically, we scale up prior models by employing a high-fidelity image generator (VQ-GAN) with a causal transformer model, and introduce additional techniques of top-k sampling and data augmentation to further improve video prediction quality. Despite the simplicity, the proposed method achieves competitive performance to state-of-the-art approaches on standard video prediction benchmarks with fewer parameters, and enables high-resolution video prediction on complex and large-scale datasets. Videos are available at https://sites.google.com/view/harp-videos/home.
△ Less
Submitted 15 September, 2022;
originally announced September 2022.
-
Geolocation of Cultural Heritage using Multi-View Knowledge Graph Embedding
Authors:
Hebatallah A. Mohamed,
Sebastiano Vascon,
Feliks Hibraj,
Stuart James,
Diego Pilutti,
Alessio Del Bue,
Marcello Pelillo
Abstract:
Knowledge Graphs (KGs) have proven to be a reliable way of structuring data. They can provide a rich source of contextual information about cultural heritage collections. However, cultural heritage KGs are far from being complete. They are often missing important attributes such as geographical location, especially for sculptures and mobile or indoor entities such as paintings. In this paper, we f…
▽ More
Knowledge Graphs (KGs) have proven to be a reliable way of structuring data. They can provide a rich source of contextual information about cultural heritage collections. However, cultural heritage KGs are far from being complete. They are often missing important attributes such as geographical location, especially for sculptures and mobile or indoor entities such as paintings. In this paper, we first present a framework for ingesting knowledge about tangible cultural heritage entities from various data sources and their connected multi-hop knowledge into a geolocalized KG. Secondly, we propose a multi-view learning model for estimating the relative distance between a given pair of cultural heritage entities, based on the geographical as well as the knowledge connections of the entities.
△ Less
Submitted 8 September, 2022;
originally announced September 2022.
-
Combining Evolutionary Search with Behaviour Cloning for Procedurally Generated Content
Authors:
Nicholas Muir,
Steven James
Abstract:
In this work, we consider the problem of procedural content generation for video game levels. Prior approaches have relied on evolutionary search (ES) methods capable of generating diverse levels, but this generation procedure is slow, which is problematic in real-time settings. Reinforcement learning (RL) has also been proposed to tackle the same problem, and while level generation is fast, train…
▽ More
In this work, we consider the problem of procedural content generation for video game levels. Prior approaches have relied on evolutionary search (ES) methods capable of generating diverse levels, but this generation procedure is slow, which is problematic in real-time settings. Reinforcement learning (RL) has also been proposed to tackle the same problem, and while level generation is fast, training time can be prohibitively expensive. We propose a framework to tackle the procedural content generation problem that combines the best of ES and RL. In particular, our approach first uses ES to generate a sequence of levels evolved over time, and then uses behaviour cloning to distil these levels into a policy, which can then be queried to produce new levels quickly. We apply our approach to a maze game and Super Mario Bros, with our results indicating that our approach does in fact decrease the time required for level generation, especially when an increasing number of valid levels are required.
△ Less
Submitted 29 July, 2022;
originally announced July 2022.
-
PoserNet: Refining Relative Camera Poses Exploiting Object Detections
Authors:
Matteo Taiana,
Matteo Toso,
Stuart James,
Alessio Del Bue
Abstract:
The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions to guide the pose estimation problem rather than explicit semantic object detections. We propose Pose Refiner Network (PoserNet) a light-weight Graph Neural Network to refine the approximate pai…
▽ More
The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions to guide the pose estimation problem rather than explicit semantic object detections. We propose Pose Refiner Network (PoserNet) a light-weight Graph Neural Network to refine the approximate pair-wise relative camera poses. PoserNet exploits associations between the objectness regions - concisely expressed as bounding boxes - across multiple views to globally refine sparsely connected view graphs. We evaluate on the 7-Scenes dataset across varied sizes of graphs and show how this process can be beneficial to optimisation-based Motion Averaging algorithms improving the median error on the rotation by 62 degrees with respect to the initial estimates obtained based on bounding boxes. Code and data are available at https://github.com/IIT-PAVIS/PoserNet.
△ Less
Submitted 21 July, 2022; v1 submitted 19 July, 2022;
originally announced July 2022.
-
GANzzle: Reframing jigsaw puzzle solving as a retrieval task using a generative mental image
Authors:
Davide Talon,
Alessio Del Bue,
Stuart James
Abstract:
Puzzle solving is a combinatorial challenge due to the difficulty of matching adjacent pieces. Instead, we infer a mental image from all pieces, which a given piece can then be matched against avoiding the combinatorial explosion. Exploiting advancements in Generative Adversarial methods, we learn how to reconstruct the image given a set of unordered pieces, allowing the model to learn a joint emb…
▽ More
Puzzle solving is a combinatorial challenge due to the difficulty of matching adjacent pieces. Instead, we infer a mental image from all pieces, which a given piece can then be matched against avoiding the combinatorial explosion. Exploiting advancements in Generative Adversarial methods, we learn how to reconstruct the image given a set of unordered pieces, allowing the model to learn a joint embedding space to match an encoding of each piece to the cropped layer of the generator. Therefore we frame the problem as a R@1 retrieval task, and then solve the linear assignment using differentiable Hungarian attention, making the process end-to-end. In doing so our model is puzzle size agnostic, in contrast to prior deep learning methods which are single size. We evaluate on two new large-scale datasets, where our model is on par with deep learning methods, while generalizing to multiple puzzle sizes.
△ Less
Submitted 12 July, 2022;
originally announced July 2022.
-
First Dark Matter Search Results from the LUX-ZEPLIN (LZ) Experiment
Authors:
J. Aalbers,
D. S. Akerib,
C. W. Akerlof,
A. K. Al Musalhi,
F. Alder,
A. Alqahtani,
S. K. Alsum,
C. S. Amarasinghe,
A. Ames,
T. J. Anderson,
N. Angelides,
H. M. Araújo,
J. E. Armstrong,
M. Arthurs,
S. Azadi,
A. J. Bailey,
A. Baker,
J. Balajthy,
S. Balashov,
J. Bang,
J. W. Bargemann,
M. J. Barry,
J. Barthel,
D. Bauer,
A. Baxter
, et al. (322 additional authors not shown)
Abstract:
The LUX-ZEPLIN experiment is a dark matter detector centered on a dual-phase xenon time projection chamber operating at the Sanford Underground Research Facility in Lead, South Dakota, USA. This Letter reports results from LUX-ZEPLIN's first search for weakly interacting massive particles (WIMPs) with an exposure of 60~live days using a fiducial mass of 5.5 t. A profile-likelihood ratio analysis s…
▽ More
The LUX-ZEPLIN experiment is a dark matter detector centered on a dual-phase xenon time projection chamber operating at the Sanford Underground Research Facility in Lead, South Dakota, USA. This Letter reports results from LUX-ZEPLIN's first search for weakly interacting massive particles (WIMPs) with an exposure of 60~live days using a fiducial mass of 5.5 t. A profile-likelihood ratio analysis shows the data to be consistent with a background-only hypothesis, setting new limits on spin-independent WIMP-nucleon, spin-dependent WIMP-neutron, and spin-dependent WIMP-proton cross sections for WIMP masses above 9 GeV/c$^2$. The most stringent limit is set for spin-independent scattering at 36 GeV/c$^2$, rejecting cross sections above 9.2$\times 10^{-48}$ cm$^2$ at the 90% confidence level.
△ Less
Submitted 2 August, 2023; v1 submitted 8 July, 2022;
originally announced July 2022.
-
Masked World Models for Visual Control
Authors:
Younggyo Seo,
Danijar Hafner,
Hao Liu,
Fangchen Liu,
Stephen James,
Kimin Lee,
Pieter Abbeel
Abstract:
Visual model-based reinforcement learning (RL) has the potential to enable sample-efficient robot learning from visual observations. Yet the current approaches typically train a single model end-to-end for learning both visual representations and dynamics, making it difficult to accurately model the interaction between robots and small objects. In this work, we introduce a visual model-based RL fr…
▽ More
Visual model-based reinforcement learning (RL) has the potential to enable sample-efficient robot learning from visual observations. Yet the current approaches typically train a single model end-to-end for learning both visual representations and dynamics, making it difficult to accurately model the interaction between robots and small objects. In this work, we introduce a visual model-based RL framework that decouples visual representation learning and dynamics learning. Specifically, we train an autoencoder with convolutional layers and vision transformers (ViT) to reconstruct pixels given masked convolutional features, and learn a latent dynamics model that operates on the representations from the autoencoder. Moreover, to encode task-relevant information, we introduce an auxiliary reward prediction objective for the autoencoder. We continually update both autoencoder and dynamics model using online samples collected from environment interaction. We demonstrate that our decoupling approach achieves state-of-the-art performance on a variety of visual robotic tasks from Meta-world and RLBench, e.g., we achieve 81.7% success rate on 50 visual robotic manipulation tasks from Meta-world, while the baseline achieves 67.9%. Code is available on the project website: https://sites.google.com/view/mwm-rl.
△ Less
Submitted 27 May, 2023; v1 submitted 28 June, 2022;
originally announced June 2022.
-
World Value Functions: Knowledge Representation for Learning and Planning
Authors:
Geraud Nangue Tasse,
Benjamin Rosman,
Steven James
Abstract:
We propose world value functions (WVFs), a type of goal-oriented general value function that represents how to solve not just a given task, but any other goal-reaching task in an agent's environment. This is achieved by equipping an agent with an internal goal space defined as all the world states where it experiences a terminal transition. The agent can then modify the standard task rewards to de…
▽ More
We propose world value functions (WVFs), a type of goal-oriented general value function that represents how to solve not just a given task, but any other goal-reaching task in an agent's environment. This is achieved by equipping an agent with an internal goal space defined as all the world states where it experiences a terminal transition. The agent can then modify the standard task rewards to define its own reward function, which provably drives it to learn how to achieve all reachable internal goals, and the value of doing so in the current task. We demonstrate two key benefits of WVFs in the context of learning and planning. In particular, given a learned WVF, an agent can compute the optimal policy in a new task by simply estimating the task's reward function. Furthermore, we show that WVFs also implicitly encode the transition dynamics of the environment, and so can be used to perform planning. Experimental results show that WVFs can be learned faster than regular value functions, while their ability to infer the environment's dynamics can be used to integrate learning and planning methods to further improve sample efficiency.
△ Less
Submitted 23 June, 2022;
originally announced June 2022.
-
Patch-based Object-centric Transformers for Efficient Video Generation
Authors:
Wilson Yan,
Ryo Okumura,
Stephen James,
Pieter Abbeel
Abstract:
In this work, we present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos. We build upon prior work in video prediction via an autoregressive transformer over the discrete latent space of compressed videos, with an added modification to model object-cent…
▽ More
In this work, we present Patch-based Object-centric Video Transformer (POVT), a novel region-based video generation architecture that leverages object-centric information to efficiently model temporal dynamics in videos. We build upon prior work in video prediction via an autoregressive transformer over the discrete latent space of compressed videos, with an added modification to model object-centric information via bounding boxes. Due to better compressibility of object-centric representations, we can improve training efficiency by allowing the model to only access object information for longer horizon temporal information. When evaluated on various difficult object-centric datasets, our method achieves better or equal performance to other video generation models, while remaining computationally more efficient and scalable. In addition, we show that our method is able to perform object-centric controllability through bounding box manipulation, which may aid downstream tasks such as video editing, or visual planning. Samples are available at https://sites.google.com/view/povt-public
△ Less
Submitted 18 June, 2022; v1 submitted 8 June, 2022;
originally announced June 2022.
-
On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning
Authors:
Zhao Mandi,
Pieter Abbeel,
Stephen James
Abstract:
Intelligent agents should have the ability to leverage knowledge from previously learned tasks in order to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular solution to achieve this. However, meta-reinforcement learning (meta-RL) algorithms have thus far been restricted to simple environments with narrow task distributions. Moreover, the paradigm of pretrai…
▽ More
Intelligent agents should have the ability to leverage knowledge from previously learned tasks in order to learn new ones quickly and efficiently. Meta-learning approaches have emerged as a popular solution to achieve this. However, meta-reinforcement learning (meta-RL) algorithms have thus far been restricted to simple environments with narrow task distributions. Moreover, the paradigm of pretraining followed by fine-tuning to adapt to new tasks has emerged as a simple yet effective solution in supervised and self-supervised learning. This calls into question the benefits of meta-learning approaches also in reinforcement learning, which typically come at the cost of high complexity. We hence investigate meta-RL approaches in a variety of vision-based benchmarks, including Procgen, RLBench, and Atari, where evaluations are made on completely novel tasks. Our findings show that when meta-learning approaches are evaluated on different tasks (rather than different variations of the same task), multi-task pretraining with fine-tuning on new tasks performs equally as well, or better, than meta-pretraining with meta test-time adaptation. This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL. From these findings, we advocate for evaluating future meta-RL methods on more challenging tasks and including multi-task pretraining with fine-tuning as a simple, yet strong baseline.
△ Less
Submitted 16 February, 2023; v1 submitted 7 June, 2022;
originally announced June 2022.
-
Skill Machines: Temporal Logic Skill Composition in Reinforcement Learning
Authors:
Geraud Nangue Tasse,
Devon Jarvis,
Steven James,
Benjamin Rosman
Abstract:
It is desirable for an agent to be able to solve a rich variety of problems that can be specified through language in the same environment. A popular approach towards obtaining such agents is to reuse skills learned in prior tasks to generalise compositionally to new ones. However, this is a challenging problem due to the curse of dimensionality induced by the combinatorially large number of ways…
▽ More
It is desirable for an agent to be able to solve a rich variety of problems that can be specified through language in the same environment. A popular approach towards obtaining such agents is to reuse skills learned in prior tasks to generalise compositionally to new ones. However, this is a challenging problem due to the curse of dimensionality induced by the combinatorially large number of ways high-level goals can be combined both logically and temporally in language. To address this problem, we propose a framework where an agent first learns a sufficient set of skill primitives to achieve all high-level goals in its environment. The agent can then flexibly compose them both logically and temporally to provably achieve temporal logic specifications in any regular language, such as regular fragments of linear temporal logic. This provides the agent with the ability to map from complex temporal logic task specifications to near-optimal behaviours zero-shot. We demonstrate this experimentally in a tabular setting, as well as in a high-dimensional video game and continuous control environment. Finally, we also demonstrate that the performance of skill machines can be improved with regular off-policy reinforcement learning algorithms when optimal behaviours are desired.
△ Less
Submitted 16 March, 2024; v1 submitted 25 May, 2022;
originally announced May 2022.
-
World Value Functions: Knowledge Representation for Multitask Reinforcement Learning
Authors:
Geraud Nangue Tasse,
Steven James,
Benjamin Rosman
Abstract:
An open problem in artificial intelligence is how to learn and represent knowledge that is sufficient for a general agent that needs to solve multiple tasks in a given world. In this work we propose world value functions (WVFs), which are a type of general value function with mastery of the world - they represent not only how to solve a given task, but also how to solve any other goal-reaching tas…
▽ More
An open problem in artificial intelligence is how to learn and represent knowledge that is sufficient for a general agent that needs to solve multiple tasks in a given world. In this work we propose world value functions (WVFs), which are a type of general value function with mastery of the world - they represent not only how to solve a given task, but also how to solve any other goal-reaching task. To achieve this, we equip the agent with an internal goal space defined as all the world states where it experiences a terminal transition - a task outcome. The agent can then modify task rewards to define its own reward function, which provably drives it to learn how to achieve all achievable internal goals, and the value of doing so in the current task. We demonstrate a number of benefits of WVFs. When the agent's internal goal space is the entire state space, we demonstrate that the transition function can be inferred from the learned WVF, which allows the agent to plan using learned value functions. Additionally, we show that for tasks in the same world, a pretrained agent that has learned any WVF can then infer the policy and value function for any new task directly from its rewards. Finally, an important property for long-lived agents is the ability to reuse existing knowledge to solve new tasks. Using WVFs as the knowledge representation for learned tasks, we show that an agent is able to solve their logical combination zero-shot, resulting in a combinatorially increasing number of skills throughout their lifetime.
△ Less
Submitted 18 May, 2022;
originally announced May 2022.
-
Accounting for the Sequential Nature of States to Learn Features for Reinforcement Learning
Authors:
Nathan Michlo,
Devon Jarvis,
Richard Klein,
Steven James
Abstract:
In this work, we investigate the properties of data that cause popular representation learning approaches to fail. In particular, we find that in environments where states do not significantly overlap, variational autoencoders (VAEs) fail to learn useful features. We demonstrate this failure in a simple gridworld domain, and then provide a solution in the form of metric learning. However, metric l…
▽ More
In this work, we investigate the properties of data that cause popular representation learning approaches to fail. In particular, we find that in environments where states do not significantly overlap, variational autoencoders (VAEs) fail to learn useful features. We demonstrate this failure in a simple gridworld domain, and then provide a solution in the form of metric learning. However, metric learning requires supervision in the form of a distance function, which is absent in reinforcement learning. To overcome this, we leverage the sequential nature of states in a replay buffer to approximate a distance metric and provide a weak supervision signal, under the assumption that temporally close states are also semantically similar. We modify a VAE with triplet loss and demonstrate that this approach is able to learn useful features for downstream tasks, without additional supervision, in environments where standard VAEs fail.
△ Less
Submitted 12 May, 2022;
originally announced May 2022.
-
Learning Abstract and Transferable Representations for Planning
Authors:
Steven James,
Benjamin Rosman,
George Konidaris
Abstract:
We are concerned with the question of how an agent can acquire its own representations from sensory data. We restrict our focus to learning representations for long-term planning, a class of problems that state-of-the-art learning methods are unable to solve. We propose a framework for autonomously learning state abstractions of an agent's environment, given a set of skills. Importantly, these abs…
▽ More
We are concerned with the question of how an agent can acquire its own representations from sensory data. We restrict our focus to learning representations for long-term planning, a class of problems that state-of-the-art learning methods are unable to solve. We propose a framework for autonomously learning state abstractions of an agent's environment, given a set of skills. Importantly, these abstractions are task-independent, and so can be reused to solve new tasks. We demonstrate how an agent can use an existing set of options to acquire representations from ego- and object-centric observations. These abstractions can immediately be reused by the same agent in new environments. We show how to combine these portable representations with problem-specific ones to generate a sound description of a specific task that can be used for abstract planning. Finally, we show how to autonomously construct a multi-level hierarchy consisting of increasingly abstract representations. Since these hierarchies are transferable, higher-order concepts can be reused in new tasks, relieving the agent from relearning them and improving sample efficiency. Our results demonstrate that our approach allows an agent to transfer previous knowledge to new tasks, improving sample efficiency as the number of tasks increases.
△ Less
Submitted 4 May, 2022;
originally announced May 2022.
-
FlameNEST: Explicit Profile Likelihoods with the Noble Element Simulation Technique
Authors:
R. S. James,
J. Palmer,
A. Kaboth,
C. Ghag,
J. Aalbers
Abstract:
We present FlameNEST, a framework providing explicit likelihood evaluations in noble element particle detectors using data-driven models from the Noble Element Simulation Technique. FlameNEST provides a way to perform statistical analyses on real data with no dependence on large, computationally expensive Monte Carlo simulations by evaluating the likelihood on an event-by-event basis using analyti…
▽ More
We present FlameNEST, a framework providing explicit likelihood evaluations in noble element particle detectors using data-driven models from the Noble Element Simulation Technique. FlameNEST provides a way to perform statistical analyses on real data with no dependence on large, computationally expensive Monte Carlo simulations by evaluating the likelihood on an event-by-event basis using analytic probability elements convolved together in a single TensorFlow multiplication. Furthermore, this robust framework creates opportunities for simple inter-collaborative analyses which will be fundamental for the future of experimental dark matter physics.
△ Less
Submitted 28 April, 2022;
originally announced April 2022.
-
Coarse-to-fine Q-attention with Tree Expansion
Authors:
Stephen James,
Pieter Abbeel
Abstract:
Coarse-to-fine Q-attention enables sample-efficient robot manipulation by discretizing the translation space in a coarse-to-fine manner, where the resolution gradually increases at each layer in the hierarchy. Although effective, Q-attention suffers from "coarse ambiguity" - when voxelization is significantly coarse, it is not feasible to distinguish similar-looking objects without first inspectin…
▽ More
Coarse-to-fine Q-attention enables sample-efficient robot manipulation by discretizing the translation space in a coarse-to-fine manner, where the resolution gradually increases at each layer in the hierarchy. Although effective, Q-attention suffers from "coarse ambiguity" - when voxelization is significantly coarse, it is not feasible to distinguish similar-looking objects without first inspecting at a finer resolution. To combat this, we propose to envision Q-attention as a tree that can be expanded and used to accumulate value estimates across the top-k voxels at each Q-attention depth. When our extension, Q-attention with Tree Expansion (QTE), replaces standard Q-attention in the Attention-driven Robot Manipulation (ARM) system, we are able to accomplish a larger set of tasks; especially on those that suffer from "coarse ambiguity". In addition to evaluating our approach across 12 RLBench tasks, we also show that the improved performance is visible in a real-world task involving small objects.
△ Less
Submitted 2 May, 2022; v1 submitted 26 April, 2022;
originally announced April 2022.
-
Adaptive Online Value Function Approximation with Wavelets
Authors:
Michael Beukman,
Michael Mitchley,
Dean Wookey,
Steven James,
George Konidaris
Abstract:
Using function approximation to represent a value function is necessary for continuous and high-dimensional state spaces. Linear function approximation has desirable theoretical guarantees and often requires less compute and samples than neural networks, but most approaches suffer from an exponential growth in the number of functions as the dimensionality of the state space increases. In this work…
▽ More
Using function approximation to represent a value function is necessary for continuous and high-dimensional state spaces. Linear function approximation has desirable theoretical guarantees and often requires less compute and samples than neural networks, but most approaches suffer from an exponential growth in the number of functions as the dimensionality of the state space increases. In this work, we introduce the wavelet basis for reinforcement learning. Wavelets can effectively be used as a fixed basis and additionally provide the ability to adaptively refine the basis set as learning progresses, making it feasible to start with a minimal basis set. This adaptive method can either increase the granularity of the approximation at a point in state space, or add in interactions between different dimensions as necessary. We prove that wavelets are both necessary and sufficient if we wish to construct a function approximator that can be adaptively refined without loss of precision. We further demonstrate that a fixed wavelet basis set performs comparably against the high-performing Fourier basis on Mountain Car and Acrobot, and that the adaptive methods provide a convenient approach to addressing an oversized initial basis set, while demonstrating performance comparable to, or greater than, the fixed wavelet basis.
△ Less
Submitted 22 April, 2022;
originally announced April 2022.
-
Automatic Encoding and Repair of Reactive High-Level Tasks with Learned Abstract Representations
Authors:
Adam Pacheck,
Steven James,
George Konidaris,
Hadas Kress-Gazit
Abstract:
We present a framework that, given a set of skills a robot can perform, abstracts sensor data into symbols that we use to automatically encode the robot's capabilities in Linear Temporal Logic. We specify reactive high-level tasks based on these capabilities, for which a strategy is automatically synthesized and executed on the robot, if the task is feasible. If a task is not feasible given the ro…
▽ More
We present a framework that, given a set of skills a robot can perform, abstracts sensor data into symbols that we use to automatically encode the robot's capabilities in Linear Temporal Logic. We specify reactive high-level tasks based on these capabilities, for which a strategy is automatically synthesized and executed on the robot, if the task is feasible. If a task is not feasible given the robot's capabilities, we present two methods, one enumeration-based and one synthesis-based, for automatically suggesting additional skills for the robot or modifications to existing skills that would make the task feasible. We demonstrate our framework on a Baxter robot manipulating blocks on a table, a Baxter robot manipulating plates on a table, and a Kinova arm manipulating vials, with multiple sensor modalities, including raw images.
△ Less
Submitted 18 April, 2022;
originally announced April 2022.
-
Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin Picking
Authors:
Kai Chen,
Rui Cao,
Stephen James,
Yichuan Li,
Yun-Hui Liu,
Pieter Abbeel,
Qi Dou
Abstract:
In this paper, we propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. Given a bin-picking scenario, we establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predicti…
▽ More
In this paper, we propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. Given a bin-picking scenario, we establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data. With these predictions, we further design a comprehensive adaptive selection scheme to distinguish reliable results, and leverage them as pseudo labels to update a student model for pose estimation on real data. To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model. We evaluate our method on a public benchmark and our newly-released dataset, achieving an ADD(-S) improvement of 11.49% and 22.62% respectively. Our method is also able to improve robotic bin-picking success by 19.54%, demonstrating the potential of iterative sim-to-real solutions for robotic applications.
△ Less
Submitted 21 July, 2022; v1 submitted 14 April, 2022;
originally announced April 2022.
-
Procedural Content Generation using Neuroevolution and Novelty Search for Diverse Video Game Levels
Authors:
Michael Beukman,
Christopher W Cleghorn,
Steven James
Abstract:
Procedurally generated video game content has the potential to drastically reduce the content creation budget of game developers and large studios. However, adoption is hindered by limitations such as slow generation, as well as low quality and diversity of content. We introduce an evolutionary search-based approach for evolving level generators using novelty search to procedurally generate divers…
▽ More
Procedurally generated video game content has the potential to drastically reduce the content creation budget of game developers and large studios. However, adoption is hindered by limitations such as slow generation, as well as low quality and diversity of content. We introduce an evolutionary search-based approach for evolving level generators using novelty search to procedurally generate diverse levels in real time, without requiring training data or detailed domain-specific knowledge. We test our method on two domains, and our results show an order of magnitude speedup in generation time compared to existing methods while obtaining comparable metric scores. We further demonstrate the ability to generalise to arbitrary-sized levels without retraining.
△ Less
Submitted 14 April, 2022;
originally announced April 2022.
-
Mitigating Mismatch Compression in Differential Local Field Potentials
Authors:
Vineet Tiruvadi,
Sam James,
Bryan Howell,
Mosadoluwa Obatusin,
Andrea Crowell,
Patricio Riva-Posse,
Ki Sueng Choi,
Allison Waters,
Robert E. Gross,
Cameron C. McIntyre,
Helen S. Mayberg,
Robert Butera
Abstract:
Bidirectional deep brain stimulation (bdDBS) devices capable of recording differential local field potentials (dLFP) enable neural recordings alongside clinical therapy. Efforts to identify objective signals of various brain disorders, or disease readouts, are challenging in dLFP, especially during active DBS. In this report we identified, characterized, and mitigated a major source of distortion…
▽ More
Bidirectional deep brain stimulation (bdDBS) devices capable of recording differential local field potentials (dLFP) enable neural recordings alongside clinical therapy. Efforts to identify objective signals of various brain disorders, or disease readouts, are challenging in dLFP, especially during active DBS. In this report we identified, characterized, and mitigated a major source of distortion in dLFP that we introduce as mismatch compression (MC). MC occurs secondary to impedance mismatches across the dLFP channel resulting in incomplete rejection of artifacts and downstream amplifier gain compression. Using in silico and in vitro models we demonstrate that MC accounts for impedance-related distortions sensitive to DBS amplitude. We then use these models to develop and validate a mitigation strategy for MC that is provided as an opensource library for more reliable oscillatory disease readouts.
△ Less
Submitted 7 April, 2022;
originally announced April 2022.
-
Coarse-to-Fine Q-attention with Learned Path Ranking
Authors:
Stephen James,
Pieter Abbeel
Abstract:
We propose Learned Path Ranking (LPR), a method that accepts an end-effector goal pose, and learns to rank a set of goal-reaching paths generated from an array of path generating methods, including: path planning, Bezier curve sampling, and a learned policy. The core idea being that each of the path generation modules will be useful in different tasks, or at different stages in a task. When LPR is…
▽ More
We propose Learned Path Ranking (LPR), a method that accepts an end-effector goal pose, and learns to rank a set of goal-reaching paths generated from an array of path generating methods, including: path planning, Bezier curve sampling, and a learned policy. The core idea being that each of the path generation modules will be useful in different tasks, or at different stages in a task. When LPR is added as an extension to C2F-ARM, our new system, C2F-ARM+LPR, retains the sample efficiency of its predecessor, while also being able to accomplish a larger set of tasks; in particular, tasks that require very specific motions (e.g. opening toilet seat) that need to be inferred from both demonstrations and exploration data. In addition to benchmarking our approach across 16 RLBench tasks, we also learn real-world tasks, tabula rasa, in 10-15 minutes, with only 3 demonstrations.
△ Less
Submitted 4 April, 2022;
originally announced April 2022.
-
Reinforcement Learning with Action-Free Pre-Training from Videos
Authors:
Younggyo Seo,
Kimin Lee,
Stephen James,
Pieter Abbeel
Abstract:
Recent unsupervised pre-training methods have shown to be effective on language and vision domains by learning useful representations for multiple downstream tasks. In this paper, we investigate if such unsupervised pre-training methods can also be effective for vision-based reinforcement learning (RL). To this end, we introduce a framework that learns representations useful for understanding the…
▽ More
Recent unsupervised pre-training methods have shown to be effective on language and vision domains by learning useful representations for multiple downstream tasks. In this paper, we investigate if such unsupervised pre-training methods can also be effective for vision-based reinforcement learning (RL). To this end, we introduce a framework that learns representations useful for understanding the dynamics via generative pre-training on videos. Our framework consists of two phases: we pre-train an action-free latent video prediction model, and then utilize the pre-trained representations for efficiently learning action-conditional world models on unseen environments. To incorporate additional action inputs during fine-tuning, we introduce a new architecture that stacks an action-conditional latent prediction model on top of the pre-trained action-free prediction model. Moreover, for better exploration, we propose a video-based intrinsic bonus that leverages pre-trained representations. We demonstrate that our framework significantly improves both final performances and sample-efficiency of vision-based RL in a variety of manipulation and locomotion tasks. Code is available at https://github.com/younggyoseo/apv.
△ Less
Submitted 16 June, 2022; v1 submitted 25 March, 2022;
originally announced March 2022.
-
A Next-Generation Liquid Xenon Observatory for Dark Matter and Neutrino Physics
Authors:
J. Aalbers,
K. Abe,
V. Aerne,
F. Agostini,
S. Ahmed Maouloud,
D. S. Akerib,
D. Yu. Akimov,
J. Akshat,
A. K. Al Musalhi,
F. Alder,
S. K. Alsum,
L. Althueser,
C. S. Amarasinghe,
F. D. Amaro,
A. Ames,
T. J. Anderson,
B. Andrieu,
N. Angelides,
E. Angelino,
J. Angevaare,
V. C. Antochi,
D. Antón Martin,
B. Antunovic,
E. Aprile,
H. M. Araújo
, et al. (572 additional authors not shown)
Abstract:
The nature of dark matter and properties of neutrinos are among the most pressing issues in contemporary particle physics. The dual-phase xenon time-projection chamber is the leading technology to cover the available parameter space for Weakly Interacting Massive Particles (WIMPs), while featuring extensive sensitivity to many alternative dark matter candidates. These detectors can also study neut…
▽ More
The nature of dark matter and properties of neutrinos are among the most pressing issues in contemporary particle physics. The dual-phase xenon time-projection chamber is the leading technology to cover the available parameter space for Weakly Interacting Massive Particles (WIMPs), while featuring extensive sensitivity to many alternative dark matter candidates. These detectors can also study neutrinos through neutrinoless double-beta decay and through a variety of astrophysical sources. A next-generation xenon-based detector will therefore be a true multi-purpose observatory to significantly advance particle physics, nuclear physics, astrophysics, solar physics, and cosmology. This review article presents the science cases for such a detector.
△ Less
Submitted 4 March, 2022;
originally announced March 2022.
-
Overlooked Implications of the Reconstruction Loss for VAE Disentanglement
Authors:
Nathan Michlo,
Richard Klein,
Steven James
Abstract:
Learning disentangled representations with variational autoencoders (VAEs) is often attributed to the regularisation component of the loss. In this work, we highlight the interaction between data and the reconstruction term of the loss as the main contributor to disentanglement in VAEs. We show that standard benchmark datasets have unintended correlations between their subjective ground-truth fact…
▽ More
Learning disentangled representations with variational autoencoders (VAEs) is often attributed to the regularisation component of the loss. In this work, we highlight the interaction between data and the reconstruction term of the loss as the main contributor to disentanglement in VAEs. We show that standard benchmark datasets have unintended correlations between their subjective ground-truth factors and perceived axes in the data according to typical VAE reconstruction losses. Our work exploits this relationship to provide a theory for what constitutes an adversarial dataset under a given reconstruction loss. We verify this by constructing an example dataset that prevents disentanglement in state-of-the-art frameworks while maintaining human-intuitive ground-truth factors. Finally, we re-enable disentanglement by designing an example reconstruction loss that is once again able to perceive the ground-truth factors. Our findings demonstrate the subjective nature of disentanglement and the importance of considering the interaction between the ground-truth factors, data and notably, the reconstruction loss, which is under-recognised in the literature.
△ Less
Submitted 9 August, 2023; v1 submitted 27 February, 2022;
originally announced February 2022.
-
ReorientBot: Learning Object Reorientation for Specific-Posed Placement
Authors:
Kentaro Wada,
Stephen James,
Andrew J. Davison
Abstract:
Robots need the capability of placing objects in arbitrary, specific poses to rearrange the world and achieve various valuable tasks. Object reorientation plays a crucial role in this as objects may not initially be oriented such that the robot can grasp and then immediately place them in a specific goal pose. In this work, we present a vision-based manipulation system, ReorientBot, which consists…
▽ More
Robots need the capability of placing objects in arbitrary, specific poses to rearrange the world and achieve various valuable tasks. Object reorientation plays a crucial role in this as objects may not initially be oriented such that the robot can grasp and then immediately place them in a specific goal pose. In this work, we present a vision-based manipulation system, ReorientBot, which consists of 1) visual scene understanding with pose estimation and volumetric reconstruction using an onboard RGB-D camera; 2) learned waypoint selection for successful and efficient motion generation for reorientation; 3) traditional motion planning to generate a collision-free trajectory from the selected waypoints. We evaluate our method using the YCB objects in both simulation and the real world, achieving 93% overall success, 81% improvement in success rate, and 22% improvement in execution time compared to a heuristic approach. We demonstrate extended multi-object rearrangement showing the general capability of the system.
△ Less
Submitted 22 February, 2022;
originally announced February 2022.
-
SafePicking: Learning Safe Object Extraction via Object-Level Mapping
Authors:
Kentaro Wada,
Stephen James,
Andrew J. Davison
Abstract:
Robots need object-level scene understanding to manipulate objects while reasoning about contact, support, and occlusion among objects. Given a pile of objects, object recognition and reconstruction can identify the boundary of object instances, giving important cues as to how the objects form and support the pile. In this work, we present a system, SafePicking, that integrates object-level mappin…
▽ More
Robots need object-level scene understanding to manipulate objects while reasoning about contact, support, and occlusion among objects. Given a pile of objects, object recognition and reconstruction can identify the boundary of object instances, giving important cues as to how the objects form and support the pile. In this work, we present a system, SafePicking, that integrates object-level mapping and learning-based motion planning to generate a motion that safely extracts occluded target objects from a pile. Planning is done by learning a deep Q-network that receives observations of predicted poses and a depth-based heightmap to output a motion trajectory, trained to maximize a safety metric reward. Our results show that the observation fusion of poses and depth-sensing gives both better performance and robustness to the model. We evaluate our methods using the YCB objects in both simulation and the real world, achieving safe object extraction from piles.
△ Less
Submitted 1 March, 2022; v1 submitted 11 February, 2022;
originally announced February 2022.
-
Bingham Policy Parameterization for 3D Rotations in Reinforcement Learning
Authors:
Stephen James,
Pieter Abbeel
Abstract:
We propose a new policy parameterization for representing 3D rotations during reinforcement learning. Today in the continuous control reinforcement learning literature, many stochastic policy parameterizations are Gaussian. We argue that universally applying a Gaussian policy parameterization is not always desirable for all environments. One such case in particular where this is true are tasks tha…
▽ More
We propose a new policy parameterization for representing 3D rotations during reinforcement learning. Today in the continuous control reinforcement learning literature, many stochastic policy parameterizations are Gaussian. We argue that universally applying a Gaussian policy parameterization is not always desirable for all environments. One such case in particular where this is true are tasks that involve predicting a 3D rotation output, either in isolation, or coupled with translation as part of a full 6D pose output. Our proposed Bingham Policy Parameterization (BPP) models the Bingham distribution and allows for better rotation (quaternion) prediction over a Gaussian policy parameterization in a range of reinforcement learning tasks. We evaluate BPP on the rotation Wahba problem task, as well as a set of vision-based next-best pose robot manipulation tasks from RLBench. We hope that this paper encourages more research into developing other policy parameterization that are more suited for particular environments, rather than always assuming Gaussian.
△ Less
Submitted 8 February, 2022;
originally announced February 2022.
-
Auto-Lambda: Disentangling Dynamic Task Relationships
Authors:
Shikun Liu,
Stephen James,
Andrew J. Davison,
Edward Johns
Abstract:
Understanding the structure of multiple related tasks allows for multi-task learning to improve the generalisation ability of one or all of them. However, it usually requires training each pairwise combination of tasks together in order to capture task relationships, at an extremely high computational cost. In this work, we learn task relationships via an automated weighting framework, named Auto-…
▽ More
Understanding the structure of multiple related tasks allows for multi-task learning to improve the generalisation ability of one or all of them. However, it usually requires training each pairwise combination of tasks together in order to capture task relationships, at an extremely high computational cost. In this work, we learn task relationships via an automated weighting framework, named Auto-Lambda. Unlike previous methods where task relationships are assumed to be fixed, Auto-Lambda is a gradient-based meta learning framework which explores continuous, dynamic task relationships via task-specific weightings, and can optimise any choice of combination of tasks through the formulation of a meta-loss; where the validation loss automatically influences task weightings throughout training. We apply the proposed framework to both multi-task and auxiliary learning problems in computer vision and robotics, and show that Auto-Lambda achieves state-of-the-art performance, even when compared to optimisation strategies designed specifically for each problem and data domain. Finally, we observe that Auto-Lambda can discover interesting learning behaviors, leading to new insights in multi-task learning. Code is available at https://github.com/lorenmt/auto-lambda.
△ Less
Submitted 2 June, 2022; v1 submitted 7 February, 2022;
originally announced February 2022.
-
Investigating Transfer Learning in Graph Neural Networks
Authors:
Nishai Kooverjee,
Steven James,
Terence van Zyl
Abstract:
Graph neural networks (GNNs) build on the success of deep learning models by extending them for use in graph spaces. Transfer learning has proven extremely successful for traditional deep learning problems: resulting in faster training and improved performance. Despite the increasing interest in GNNs and their use cases, there is little research on their transferability. This research demonstrates…
▽ More
Graph neural networks (GNNs) build on the success of deep learning models by extending them for use in graph spaces. Transfer learning has proven extremely successful for traditional deep learning problems: resulting in faster training and improved performance. Despite the increasing interest in GNNs and their use cases, there is little research on their transferability. This research demonstrates that transfer learning is effective with GNNs, and describes how source tasks and the choice of GNN impact the ability to learn generalisable knowledge. We perform experiments using real-world and synthetic data within the contexts of node classification and graph classification. To this end, we also provide a general methodology for transfer learning experimentation and present a novel algorithm for generating synthetic graph classification tasks. We compare the performance of GCN, GraphSAGE and GIN across both the synthetic and real-world datasets. Our results demonstrate empirically that GNNs with inductive operations yield statistically significantly improved transfer. Further we show that similarity in community structure between source and target tasks support statistically significant improvements in transfer over and above the use of only the node attributes.
△ Less
Submitted 1 February, 2022;
originally announced February 2022.
-
Towards Objective Metrics for Procedurally Generated Video Game Levels
Authors:
Michael Beukman,
Steven James,
Christopher Cleghorn
Abstract:
With increasing interest in procedural content generation by academia and game developers alike, it is vital that different approaches can be compared fairly. However, evaluating procedurally generated video game levels is often difficult, due to the lack of standardised, game-independent metrics. In this paper, we introduce two simulation-based evaluation metrics that involve analysing the behavi…
▽ More
With increasing interest in procedural content generation by academia and game developers alike, it is vital that different approaches can be compared fairly. However, evaluating procedurally generated video game levels is often difficult, due to the lack of standardised, game-independent metrics. In this paper, we introduce two simulation-based evaluation metrics that involve analysing the behaviour of an A* agent to measure the diversity and difficulty of generated levels in a general, game-independent manner. Diversity is calculated by comparing action trajectories from different levels using the edit distance, and difficulty is measured as how much exploration and expansion of the A* search tree is necessary before the agent can solve the level. We demonstrate that our diversity metric is more robust to changes in level size and representation than current methods and additionally measures factors that directly affect playability, instead of focusing on visual information. The difficulty metric shows promise, as it correlates with existing estimates of difficulty in one of the tested domains, but it does face some challenges in the other domain. Finally, to promote reproducibility, we publicly release our evaluation framework.
△ Less
Submitted 9 March, 2022; v1 submitted 25 January, 2022;
originally announced January 2022.
-
Cosmogenic production of $^{37}$Ar in the context of the LUX-ZEPLIN experiment
Authors:
J. Aalbers,
D. S. Akerib,
A. K. Al Musalhi,
F. Alder,
S. K. Alsum,
C. S. Amarasinghe,
A. Ames,
T. J. Anderson,
N. Angelides,
H. M. Araújo,
J. E. Armstrong,
M. Arthurs,
X. Bai,
A. Baker,
J. Balajthy,
S. Balashov,
J. Bang,
J. W. Bargemann,
D. Bauer,
A. Baxter,
K. Beattie,
E. P. Bernard,
A. Bhatti,
A. Biekert,
T. P. Biesiadzinski
, et al. (183 additional authors not shown)
Abstract:
We estimate the amount of $^{37}$Ar produced in natural xenon via cosmic ray-induced spallation, an inevitable consequence of the transportation and storage of xenon on the Earth's surface. We then calculate the resulting $^{37}$Ar concentration in a 10-tonne payload~(similar to that of the LUX-ZEPLIN experiment) assuming a representative schedule of xenon purification, storage and delivery to the…
▽ More
We estimate the amount of $^{37}$Ar produced in natural xenon via cosmic ray-induced spallation, an inevitable consequence of the transportation and storage of xenon on the Earth's surface. We then calculate the resulting $^{37}$Ar concentration in a 10-tonne payload~(similar to that of the LUX-ZEPLIN experiment) assuming a representative schedule of xenon purification, storage and delivery to the underground facility. Using the spallation model by Silberberg and Tsao, the sea level production rate of $^{37}$Ar in natural xenon is estimated to be 0.024~atoms/kg/day. Assuming the xenon is successively purified to remove radioactive contaminants in 1-tonne batches at a rate of 1~tonne/month, the average $^{37}$Ar activity after 10~tonnes are purified and transported underground is 0.058--0.090~$μ$Bq/kg, depending on the degree of argon removal during above-ground purification. Such cosmogenic $^{37}$Ar will appear as a noticeable background in the early science data, while decaying with a 35~day half-life. This newly-noticed production mechanism of $^{37}$Ar should be considered when planning for future liquid xenon-based experiments.
△ Less
Submitted 22 March, 2022; v1 submitted 8 January, 2022;
originally announced January 2022.
-
Learning to Follow Language Instructions with Compositional Policies
Authors:
Vanya Cohen,
Geraud Nangue Tasse,
Nakul Gopalan,
Steven James,
Matthew Gombolay,
Benjamin Rosman
Abstract:
We propose a framework that learns to execute natural language instructions in an environment consisting of goal-reaching tasks that share components of their task descriptions. Our approach leverages the compositionality of both value functions and language, with the aim of reducing the sample complexity of learning novel tasks. First, we train a reinforcement learning agent to learn value functi…
▽ More
We propose a framework that learns to execute natural language instructions in an environment consisting of goal-reaching tasks that share components of their task descriptions. Our approach leverages the compositionality of both value functions and language, with the aim of reducing the sample complexity of learning novel tasks. First, we train a reinforcement learning agent to learn value functions that can be subsequently composed through a Boolean algebra to solve novel tasks. Second, we fine-tune a seq2seq model pretrained on web-scale corpora to map language to logical expressions that specify the required value function compositions. Evaluating our agent in the BabyAI domain, we observe a decrease of 86% in the number of training steps needed to learn a second task after mastering a single task. Results from ablation studies further indicate that it is the combination of compositional value functions and language representations that allows the agent to quickly generalize to new tasks.
△ Less
Submitted 9 October, 2021;
originally announced October 2021.
-
Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation
Authors:
Stephen James,
Kentaro Wada,
Tristan Laidlow,
Andrew J. Davison
Abstract:
We present a coarse-to-fine discretisation method that enables the use of discrete reinforcement learning approaches in place of unstable and data-inefficient actor-critic methods in continuous robotics domains. This approach builds on the recently released ARM algorithm, which replaces the continuous next-best pose agent with a discrete one, with coarse-to-fine Q-attention. Given a voxelised scen…
▽ More
We present a coarse-to-fine discretisation method that enables the use of discrete reinforcement learning approaches in place of unstable and data-inefficient actor-critic methods in continuous robotics domains. This approach builds on the recently released ARM algorithm, which replaces the continuous next-best pose agent with a discrete one, with coarse-to-fine Q-attention. Given a voxelised scene, coarse-to-fine Q-attention learns what part of the scene to 'zoom' into. When this 'zooming' behaviour is applied iteratively, it results in a near-lossless discretisation of the translation space, and allows the use of a discrete action, deep Q-learning method. We show that our new coarse-to-fine algorithm achieves state-of-the-art performance on several difficult sparsely rewarded RLBench vision-based robotics tasks, and can train real-world policies, tabula rasa, in a matter of minutes, with as little as 3 demonstrations.
△ Less
Submitted 14 March, 2022; v1 submitted 23 June, 2021;
originally announced June 2021.
-
Q-attention: Enabling Efficient Learning for Vision-based Robotic Manipulation
Authors:
Stephen James,
Andrew J. Davison
Abstract:
Despite the success of reinforcement learning methods, they have yet to have their breakthrough moment when applied to a broad range of robotic manipulation tasks. This is partly due to the fact that reinforcement learning algorithms are notoriously difficult and time consuming to train, which is exacerbated when training from images rather than full-state inputs. As humans perform manipulation ta…
▽ More
Despite the success of reinforcement learning methods, they have yet to have their breakthrough moment when applied to a broad range of robotic manipulation tasks. This is partly due to the fact that reinforcement learning algorithms are notoriously difficult and time consuming to train, which is exacerbated when training from images rather than full-state inputs. As humans perform manipulation tasks, our eyes closely monitor every step of the process with our gaze focusing sequentially on the objects being manipulated. With this in mind, we present our Attention-driven Robotic Manipulation (ARM) algorithm, which is a general manipulation algorithm that can be applied to a range of sparse-rewarded tasks, given only a small number of demonstrations. ARM splits the complex task of manipulation into a 3 stage pipeline: (1) a Q-attention agent extracts relevant pixel locations from RGB and point cloud inputs, (2) a next-best pose agent that accepts crops from the Q-attention agent and outputs poses, and (3) a control agent that takes the goal pose and outputs joint actions. We show that current learning algorithms fail on a range of RLBench tasks, whilst ARM is successful.
△ Less
Submitted 3 February, 2022; v1 submitted 31 May, 2021;
originally announced May 2021.
-
Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality
Authors:
Daniele Giunchi,
Alejandro Sztrajman,
Stuart James,
Anthony Steed
Abstract:
Sketch and speech are intuitive interaction methods that convey complementary information and have been independently used for 3D model retrieval in virtual environments. While sketch has been shown to be an effective retrieval method, not all collections are easily navigable using this modality alone. We design a new challenging database for sketch comprised of 3D chairs where each of the compone…
▽ More
Sketch and speech are intuitive interaction methods that convey complementary information and have been independently used for 3D model retrieval in virtual environments. While sketch has been shown to be an effective retrieval method, not all collections are easily navigable using this modality alone. We design a new challenging database for sketch comprised of 3D chairs where each of the components (arms, legs, seat, back) are independently colored. To overcome this, we implement a multimodal interface for querying 3D model databases within a virtual environment. We base the sketch on the state-of-the-art for 3D Sketch Retrieval, and use a Wizard-of-Oz style experiment to process the voice input. In this way, we avoid the complexities of natural language processing which frequently requires fine-tuning to be robust. We conduct two user studies and show that hybrid search strategies emerge from the combination of interactions, fostering the advantages provided by both modalities.
△ Less
Submitted 5 May, 2021;
originally announced May 2021.
-
Waypoint Planning Networks
Authors:
Alexandru-Iosif Toma,
Hussein Ali Jaafar,
Hao-Ya Hsueh,
Stephen James,
Daniel Lenton,
Ronald Clark,
Sajad Saeedi
Abstract:
With the recent advances in machine learning, path planning algorithms are also evolving; however, the learned path planning algorithms often have difficulty competing with success rates of classic algorithms. We propose waypoint planning networks (WPN), a hybrid algorithm based on LSTMs with a local kernel - a classic algorithm such as A*, and a global kernel using a learned algorithm. WPN produc…
▽ More
With the recent advances in machine learning, path planning algorithms are also evolving; however, the learned path planning algorithms often have difficulty competing with success rates of classic algorithms. We propose waypoint planning networks (WPN), a hybrid algorithm based on LSTMs with a local kernel - a classic algorithm such as A*, and a global kernel using a learned algorithm. WPN produces a more computationally efficient and robust solution. We compare WPN against A*, as well as related works including motion planning networks (MPNet) and value iteration networks (VIN). In this paper, the design and experiments have been conducted for 2D environments. Experimental results outline the benefits of WPN, both in efficiency and generalization. It is shown that WPN's search space is considerably less than A*, while being able to generate near optimal results. Additionally, WPN works on partial maps, unlike A* which needs the full map in advance. The code is available online.
△ Less
Submitted 1 May, 2021;
originally announced May 2021.
-
Projected sensitivity of the LUX-ZEPLIN (LZ) experiment to the two-neutrino and neutrinoless double beta decays of $^{134}$Xe
Authors:
The LUX-ZEPLIN,
Collaboration,
:,
D. S. Akerib,
A. K. Al Musalhi,
S. K. Alsum,
C. S. Amarasinghe,
A. Ames,
T. J. Anderson,
N. Angelides,
H. M. Araujo,
J. E. Armstrong,
M. Arthurs,
X. Bai,
J. Balajthy,
S. Balashov,
J. Bang,
J. W. Bargemann,
D. Bauer,
A. Baxter,
P. Beltrame,
E. P. Bernard,
A. Bernstein,
A. Bhatti,
A. Biekert
, et al. (172 additional authors not shown)
Abstract:
The projected sensitivity of the LUX-ZEPLIN (LZ) experiment to two-neutrino and neutrinoless double beta decay of $^{134}$Xe is presented. LZ is a 10-tonne xenon time projection chamber optimized for the detection of dark matter particles, that is expected to start operating in 2021 at Sanford Underground Research Facility, USA. Its large mass of natural xenon provides an exceptional opportunity t…
▽ More
The projected sensitivity of the LUX-ZEPLIN (LZ) experiment to two-neutrino and neutrinoless double beta decay of $^{134}$Xe is presented. LZ is a 10-tonne xenon time projection chamber optimized for the detection of dark matter particles, that is expected to start operating in 2021 at Sanford Underground Research Facility, USA. Its large mass of natural xenon provides an exceptional opportunity to search for the double beta decay of $^{134}$Xe, for which xenon detectors enriched in $^{136}$Xe are less effective. For the two-neutrino decay mode, LZ is predicted to exclude values of the half-life up to 1.7$\times$10$^{24}$ years at 90% confidence level (CL), and has a three-sigma observation potential of 8.7$\times$10$^{23}$ years, approaching the predictions of nuclear models. For the neutrinoless decay mode LZ, is projected to exclude values of the half-life up to 7.3$\times$10$^{24}$ years at 90% CL.
△ Less
Submitted 22 November, 2021; v1 submitted 26 April, 2021;
originally announced April 2021.
-
Finding Experts in Social Media Data using a Hybrid Approach
Authors:
Simon James,
Brady
Abstract:
Several approaches to the problem of expert finding have emerged in computer science research. In this work, three of these approaches - content analysis, social graph analysis and the use of Semantic Web technologies are examined. An integrated set of system requirements is then developed that uses all three approaches in one hybrid approach.
To show the practicality of this hybrid approach, a…
▽ More
Several approaches to the problem of expert finding have emerged in computer science research. In this work, three of these approaches - content analysis, social graph analysis and the use of Semantic Web technologies are examined. An integrated set of system requirements is then developed that uses all three approaches in one hybrid approach.
To show the practicality of this hybrid approach, a usable prototype expert finding system called ExpertQuest is developed using a modern functional programming language (Clojure) to query social media data and Linked Data. This system is evaluated and discussed. Finally, a discussion and conclusions are presented which describe the benefits and shortcomings of the hybrid approach and the technologies used in this work.
△ Less
Submitted 1 April, 2021;
originally announced April 2021.
-
SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks
Authors:
Zoe Landgraf,
Raluca Scona,
Tristan Laidlow,
Stephen James,
Stefan Leutenegger,
Andrew J. Davison
Abstract:
By estimating 3D shape and instances from a single view, we can capture information about an environment quickly, without the need for comprehensive scanning and multi-view fusion. Solving this task for composite scenes (such as object stacks) is challenging: occluded areas are not only ambiguous in shape but also in instance segmentation; multiple decompositions could be valid. We observe that ph…
▽ More
By estimating 3D shape and instances from a single view, we can capture information about an environment quickly, without the need for comprehensive scanning and multi-view fusion. Solving this task for composite scenes (such as object stacks) is challenging: occluded areas are not only ambiguous in shape but also in instance segmentation; multiple decompositions could be valid. We observe that physics constrains decomposition as well as shape in occluded regions and hypothesise that a latent space learned from scenes built under physics simulation can serve as a prior to better predict shape and instances in occluded regions. To this end we propose SIMstack, a depth-conditioned Variational Auto-Encoder (VAE), trained on a dataset of objects stacked under physics simulation. We formulate instance segmentation as a centre voting task which allows for class-agnostic detection and doesn't require setting the maximum number of objects in the scene. At test time, our model can generate 3D shape and instance segmentation from a single depth view, probabilistically sampling proposals for the occluded region from the learned latent space. Our method has practical applications in providing robots some of the ability humans have to make rapid intuitive inferences of partially observed scenes. We demonstrate an application for precise (non-disruptive) object grasping of unknown objects from a single depth view.
△ Less
Submitted 26 September, 2021; v1 submitted 30 March, 2021;
originally announced March 2021.