Search | arXiv e-print repository

Learning and Understanding a Disentangled Feature Representation for Hidden Parameters in Reinforcement Learning

Authors: Christopher Reale, Rebecca Russell

Abstract: Hidden parameters are latent variables in reinforcement learning (RL) environments that are constant over the course of a trajectory. Understanding what, if any, hidden parameters affect a particular environment can aid both the development and appropriate usage of RL systems. We present an unsupervised method to map RL trajectories into a feature space where distance represents the relative difference in system behavior due to hidden parameters. Our approach disentangles the effects of hidden parameters by leveraging a recurrent neural network (RNN) world model as used in model-based RL. First, we alter the standard world model training algorithm to isolate the hidden parameter information in the world model memory. Then, we use a metric learning approach to map the RNN memory into a space with a distance metric approximating a bisimulation metric with respect to the hidden parameters. The resulting disentangled feature space can be used to meaningfully relate trajectories to each other and analyze the hidden parameter. We demonstrate our approach on four hidden parameters across three RL environments. Finally we present two methods to help identify and understand the effects of hidden parameters on systems.

Submitted 29 November, 2022; originally announced November 2022.

Comments: Appears in Proceedings of AAAI FSS-22 Symposium "Lessons Learned for Autonomous Assessment of Machine Abilities (LLAAMA)"

arXiv:1910.14215 [pdf, other]

Multivariate Uncertainty in Deep Learning

Authors: Rebecca L. Russell, Christopher Reale

Abstract: Deep learning has the potential to dramatically impact navigation and tracking state estimation problems critical to autonomous vehicles and robotics. Measurement uncertainties in state estimation systems based on Kalman and other Bayes filters are typically assumed to be a fixed covariance matrix. This assumption is risky, particularly for "black box" deep learning models, in which uncertainty can vary dramatically and unexpectedly. Accurate quantification of multivariate uncertainty will allow for the full potential of deep learning to be used more safely and reliably in these applications. We show how to model multivariate uncertainty for regression problems with neural networks, incorporating both aleatoric and epistemic sources of heteroscedastic uncertainty. We train a deep uncertainty covariance matrix model in two ways: directly using a multivariate Gaussian density loss function, and indirectly using end-to-end training through a Kalman filter. We experimentally show in a visual tracking problem the large impact that accurate multivariate uncertainty quantification can have on Kalman filter performance for both in-domain and out-of-domain evaluation data. We additionally show in a challenging visual odometry problem how end-to-end filter training can allow uncertainty predictions to compensate for filter weaknesses.

Submitted 14 June, 2021; v1 submitted 30 October, 2019; originally announced October 2019.

Comments: To be published in IEEE Transactions on Neural Networks and Learning Systems

arXiv:1908.00449 [pdf, other]

Tree-Transformer: A Transformer-Based Method for Correction of Tree-Structured Data

Authors: Jacob Harer, Chris Reale, Peter Chin

Abstract: Many common sequential data sources, such as source code and natural language, have a natural tree-structured representation. These trees can be generated by fitting a sequence to a grammar, yielding a hierarchical ordering of the tokens in the sequence. This structure encodes a high degree of syntactic information, making it ideal for problems such as grammar correction. However, little work has been done to develop neural networks that can operate on and exploit tree-structured data. In this paper we present the Tree-Transformer, a novel neural network architecture designed to translate between arbitrary input and output trees. We applied this architecture to correction tasks in both the source code and natural language domains. On source code, our model achieved an improvement of 25% F0.5 over the best sequential method. On natural language, we achieved comparable results to the most complex state-of-the-art systems, obtaining a 10% improvement in recall on the CoNLL 2014 benchmark and the highest to date F0.5 score on the AESW benchmark of 50.43.

Submitted 1 August, 2019; originally announced August 2019.

arXiv:1805.07475 [pdf, other]

Learning to Repair Software Vulnerabilities with Generative Adversarial Networks

Authors: Jacob Harer, Onur Ozdemir, Tomo Lazovich, Christopher P. Reale, Rebecca L. Russell, Louis Y. Kim, Peter Chin

Abstract: Motivated by the problem of automated repair of software vulnerabilities, we propose an adversarial learning approach that maps from one discrete source domain to another target domain without requiring paired labeled examples or source and target domains to be bijections. We demonstrate that the proposed adversarial learning approach is an effective technique for repairing software vulnerabilities, performing close to seq2seq approaches that require labeled pairs. The proposed Generative Adversarial Network approach is application-agnostic in that it can be applied to other problems similar to code repair, such as grammar correction or sentiment translation.

Submitted 28 October, 2018; v1 submitted 18 May, 2018; originally announced May 2018.

Comments: Presented at the 32nd Conference on Neural Information Processing Systems (NIPS 2018), Montreal, Canada

arXiv:1805.01818 [pdf, other]

Object and Text-guided Semantics for CNN-based Activity Recognition

Authors: Sungmin Eum, Christopher Reale, Heesung Kwon, Claire Bonial, Clare Voss

Abstract: Many previous methods have demonstrated the importance of considering semantically relevant objects for carrying out video-based human activity recognition, yet none of the methods have harvested the power of large text corpora to relate the objects and the activities to be transferred into learning a unified deep convolutional neural network. We present a novel activity recognition CNN which co-learns the object recognition task in an end-to-end multitask learning scheme to improve upon the baseline activity recognition performance. We further improve upon the multitask learning approach by exploiting a text-guided semantic space to select the most relevant objects with respect to the target activities. To the best of our knowledge, we are the first to investigate this approach.

Submitted 4 May, 2018; originally announced May 2018.

Comments: Submitted to ICIP 2018

arXiv:1404.2306 [pdf, ps, other]

How much is convenient to defect? A method to estimate the cooperation probability in Prisoner's Dilemma and other games

Authors: Cesco Reale

Abstract: In many cases the Nash equilibria are not predictive of the experimental players' behaviour. For some games of Game Theory, a method is proposed here to estimate the probabilities with which the different options will actually be chosen by balanced players, i.e. players that are neither too competitive nor too cooperative. This makes it possible to measure the intrinsic degree of cooperativeness of a game as a function of its payoffs alone. The method is shaped on the Prisoner's Dilemma, then generalized to asymmetric tables, N players and N options. It is adapted to other conditions such as Chicken Game, Battle of the Sexes, Stag Hunt and Translators (a new name proposed for a particular condition). The method is then applied to other games such as Diner's Dilemma, Public Goods Game, Traveler's Dilemma and War of Attrition. These games are thus analyzed in a probabilistic way that is consistent with what we would expect intuitively, overcoming some known paradoxes of Game Theory.

Submitted 17 February, 2014; originally announced April 2014.

arXiv:1310.8088 [pdf, other]

doi 10.1016/j.disc.2015.04.013

An optimal bound on the number of moves for open Mancala

Authors: Alessandro Musesti, Maurizio Paolini, Cesco Reale

Abstract: We determine the optimal bound for the maximum number of moves required to reach a periodic configuration of open mancala (also called open owari), inspired by a popular African game. A mancala move can be interpreted as a map from the set of compositions of a given integer into itself, thus relating our result to the study of the corresponding finite dynamical system.

Submitted 21 April, 2015; v1 submitted 30 October, 2013; originally announced October 2013.

MSC Class: 91A50

Journal ref: Discrete Mathematics, 338, Issue 11 (2015), 1872-1844

arXiv:1205.1703 [pdf, other]

Zeroth-rank operation and non transitive numbers. Nulranga operacio kaj netransitivaj nombroj. Operazione di rango zero e numeri non transitivi

Authors: Cesco Reale

Abstract: Observing the existing relationships between the elementary operations of addition, multiplication (iteration of additions) and exponentiation (iteration of multiplications), a new operation (named incrementation) is defined, consistently with these laws and such that addition turns out to be an iteration of incrementations. Incrementation turns out to be consistent with Ackermann's function. After defining the inverse operation of incrementation (named decrementation), we observe that R is not closed under it. So a new set of numbers is defined (named E, Escherian numbers), such that decrementation is closed on it. After defining the concept of pseudoorder (analogous to the order, but not transitive), it is shown that Escherian numbers are not transitive. Then addition and multiplication on E are analysed, and a correspondence between E and C is found. Finally, incrementation is extended to C, in such a way that decrementation is closed on C too. English keywords: hyper-operations, incrementation, zeration, Ackermann function, intransitive order, not transitive order, intransitive numbers, non transitive numbers, not transitive numbers, new number sets.

Submitted 17 February, 2014; v1 submitted 7 May, 2012; originally announced May 2012.

Comments: in Italian

Showing 1–8 of 8 results for author: Reale, C