-
A Bayesian Hierarchical Time Series Model for Reconstructing Hydroclimate from Multiple Proxies
Authors:
Niamh Cahill,
Jacky Croke,
Micheline Campbell,
Kate Hughes,
John Vitkovsky,
Jack Eaton Kilgallen,
Andrew Parnell
Abstract:
We propose a Bayesian hierarchical model which produces probabilistic reconstructions of hydroclimatic variability in Queensland, Australia. The model provides a standardised approach to hydroclimate reconstruction using multiple palaeoclimate proxy records derived from natural archives such as speleothems, ice cores and tree rings. The method combines time-series modelling with inverse prediction to quantify the relationships between a given hydroclimate index and relevant proxies over an instrumental period and subsequently reconstruct the hydroclimate back through time. We present case studies for the Brisbane and Fitzroy catchments focusing on two hydroclimate indices, the Rainfall Index (RFI) and the Standardised Precipitation-Evapotranspiration Index (SPEI). The probabilistic nature of the reconstructions allows us to estimate the probability that a hydroclimate index in any reconstruction year was lower (higher) than the minimum (maximum) value observed over the instrumental period. In Brisbane, the RFI is unlikely (probabilities < 20%) to have exhibited extremes beyond the minimum/maximum values observed between 1889 and 2017. However, in Fitzroy there are several years during the reconstruction period where the RFI is likely (> 50% probability) to have exhibited behaviour beyond the minimum/maximum of what has been observed. For SPEI, the probability of such extremes before the start of the instrumental period in 1889 does not exceed 50% in any reconstruction year in Brisbane or Fitzroy.
Submitted 8 August, 2022; v1 submitted 18 February, 2022;
originally announced February 2022.
-
Overcoming Catastrophic Forgetting via Direction-Constrained Optimization
Authors:
Yunfei Teng,
Anna Choromanska,
Murray Campbell,
Songtao Lu,
Parikshit Ram,
Lior Horesh
Abstract:
This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of the individual tasks encountered so far during training. Based on this observation we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-the-art regularization-based continual learning methods.
Submitted 1 July, 2022; v1 submitted 25 November, 2020;
originally announced November 2020.
-
A Study of Compositional Generalization in Neural Models
Authors:
Tim Klinger,
Dhaval Adjodah,
Vincent Marois,
Josh Joseph,
Matthew Riemer,
Alex 'Sandy' Pentland,
Murray Campbell
Abstract:
Compositional and relational learning is a hallmark of human intelligence, but one which presents challenges for neural models. One difficulty in the development of such models is the lack of benchmarks with clear compositional and relational task structure on which to systematically evaluate them. In this paper, we introduce an environment called ConceptWorld, which enables the generation of images from compositional and relational concepts, defined using a logical domain-specific language. We use it to generate images for a variety of compositional structures: 2x2 squares, pentominoes, sequences, scenes involving these objects, and other more complex concepts. We perform experiments to test the ability of standard neural architectures to generalize on relations with compositional arguments as the compositional depth of those arguments increases and under substitution. We compare standard neural networks such as MLP, CNN and ResNet, as well as state-of-the-art relational networks including WReN and PrediNet in a multi-class image classification setting. For simple problems, all models generalize well to close concepts but struggle with longer compositional chains. For more complex tests involving substitutivity, all models struggle, even with short chains. In highlighting these difficulties and providing an environment for further experimentation, we hope to encourage the development of models which are able to generalize effectively in compositional, relational domains.
Submitted 8 July, 2020; v1 submitted 16 June, 2020;
originally announced June 2020.
-
Revisiting Meta-Learning as Supervised Learning
Authors:
Wei-Lun Chao,
Han-Jia Ye,
De-Chuan Zhan,
Mark Campbell,
Kilian Q. Weinberger
Abstract:
Recent years have witnessed an abundance of new publications and approaches on meta-learning. This community-wide enthusiasm has sparked great insights but has also created a plethora of seemingly different frameworks, which can be hard to compare and evaluate. In this paper, we aim to provide a principled, unifying framework by revisiting and strengthening the connection between meta-learning and traditional supervised learning. By treating pairs of task-specific data sets and target models as (feature, label) samples, we can reduce many meta-learning algorithms to instances of supervised learning. This view not only unifies meta-learning into an intuitive and practical framework but also allows us to transfer insights from supervised learning directly to improve meta-learning. For example, we obtain a better understanding of generalization properties, and we can readily transfer well-understood techniques, such as model ensemble, pre-training, joint training, data augmentation, and even nearest neighbor based methods. We provide an intuitive analogy of these methods in the context of meta-learning and show that they give rise to significant improvements in model performance on few-shot learning.
Submitted 3 February, 2020;
originally announced February 2020.
-
Teaching AI to Explain its Decisions Using Embeddings and Multi-Task Learning
Authors:
Noel C. F. Codella,
Michael Hind,
Karthikeyan Natesan Ramamurthy,
Murray Campbell,
Amit Dhurandhar,
Kush R. Varshney,
Dennis Wei,
Aleksandra Mojsilović
Abstract:
Using machine learning in high-stakes applications often requires predictions to be accompanied by explanations comprehensible to the domain user, who has ultimate responsibility for decisions and outcomes. Recently, a new framework called TED has been proposed to provide meaningful explanations for predictions. This framework augments training data to include explanations elicited from domain users, in addition to features and labels. This approach ensures that explanations for predictions are tailored to the complexity expectations and domain knowledge of the consumer. In this paper, we build on this foundational work by exploring more sophisticated instantiations of the TED framework and empirically evaluating their effectiveness in two diverse domains: chemical odor and skin cancer prediction. Results demonstrate that meaningful explanations can be reliably taught to machine learning algorithms and, in some cases, improve modeling accuracy.
Submitted 5 June, 2019;
originally announced June 2019.
-
Hybrid Reinforcement Learning with Expert State Sequences
Authors:
Xiaoxiao Guo,
Shiyu Chang,
Mo Yu,
Gerald Tesauro,
Murray Campbell
Abstract:
Existing imitation learning approaches often require that the complete demonstration data, including sequences of actions and states, are available. In this paper, we consider a more realistic and difficult scenario where a reinforcement learning agent only has access to the state sequences of an expert, while the expert actions are unobserved. We propose a novel tensor-based model to infer the unobserved actions of the expert state sequences. The policy of the agent is then optimized via a hybrid objective combining reinforcement learning and imitation learning. We evaluate our hybrid approach on an illustrative domain and Atari games. The empirical results show that (1) the agents are able to leverage expert state sequences to learn faster than pure reinforcement learning baselines, (2) our tensor-based action inference model is advantageous compared to standard deep neural networks in inferring expert actions, and (3) the hybrid policy optimization objective is robust against noise in expert state sequences.
Submitted 10 March, 2019;
originally announced March 2019.
-
Interpretable Multi-Objective Reinforcement Learning through Policy Orchestration
Authors:
Ritesh Noothigattu,
Djallel Bouneffouf,
Nicholas Mattei,
Rachita Chandra,
Piyush Madan,
Kush Varshney,
Murray Campbell,
Moninder Singh,
Francesca Rossi
Abstract:
Autonomous cyber-physical agents and systems play an increasingly large role in our lives. To ensure that agents behave in ways aligned with the values of the societies in which they operate, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. These constraints and norms can come from any number of sources including regulations, business process guidelines, laws, ethical principles, social norms, and moral values. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations of the task, and reinforcement learning to learn to maximize the environment rewards. More precisely, we assume that an agent can observe traces of behavior of members of the society but has no access to the explicit set of constraints that give rise to the observed behavior. Inverse reinforcement learning is used to learn such constraints, which are then combined with a possibly orthogonal value function through the use of a contextual bandit-based orchestrator that makes a contextually appropriate choice between the two policies (constraint-based and environment reward-based) when taking actions. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent about which policy is being employed at each time step. We test our algorithms using a Pac-Man domain and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.
Submitted 21 September, 2018;
originally announced September 2018.
-
Consistent Alignment of Word Embedding Models
Authors:
Cem Safak Sahin,
Rajmonda S. Caceres,
Brandon Oselio,
William M. Campbell
Abstract:
Word embedding models offer continuous vector representations that can capture rich contextual semantics based on their word co-occurrence patterns. While these word vectors can provide very effective features used in many NLP tasks such as clustering similar words and inferring learning relationships, many challenges and open research questions remain. In this paper, we propose a solution that aligns variations of the same model (or different models) in a joint low-dimensional latent space leveraging carefully generated synthetic data points. This generative process is inspired by the observation that a variety of linguistic relationships is captured by simple linear operations in embedded space. We demonstrate that our approach can lead to substantial improvements in recovering embeddings of local neighborhoods.
Submitted 24 February, 2017;
originally announced February 2017.
-
On Generalized Bayesian Data Fusion with Complex Models in Large Scale Networks
Authors:
Nisar Ahmed,
Tsung-Lin Yang,
Mark Campbell
Abstract:
Recent advances in communications, mobile computing, and artificial intelligence have greatly expanded the application space of intelligent distributed sensor networks. This in turn motivates the development of generalized Bayesian decentralized data fusion (DDF) algorithms for robust and efficient information sharing among autonomous agents using probabilistic belief models. However, DDF is significantly challenging to implement for general real-world applications requiring the use of dynamic/ad hoc network topologies and complex belief models, such as Gaussian mixtures or hybrid Bayesian networks. To tackle these issues, we first discuss some new key mathematical insights about exact DDF and conservative approximations to DDF. These insights are then used to develop novel generalized DDF algorithms for complex beliefs based on mixture pdfs and conditional factors. Numerical examples motivated by multi-robot target search demonstrate that our methods lead to significantly better fusion results, and thus have great potential to enhance distributed intelligent reasoning in sensor networks.
Submitted 13 August, 2013;
originally announced August 2013.
-
Maximum Likelihood Fusion of Stochastic Maps
Authors:
Brandon Jones,
Mark Campbell,
Lang Tong
Abstract:
The fusion of independently obtained stochastic maps by collaborating mobile agents is considered. The proposed approach includes two parts: matching of stochastic maps and maximum likelihood alignment. In particular, an affine invariant hypergraph is constructed for each stochastic map, and a bipartite matching via a linear program is used to establish landmark correspondence between stochastic maps. A maximum likelihood alignment procedure is proposed to determine rotation and translation between common landmarks in order to construct a global map within a common frame of reference. A main feature of the proposed approach is its scalability with respect to the number of landmarks: the matching step has polynomial complexity and the maximum likelihood alignment is obtained in closed form. Experimental validation of the proposed fusion approach is performed using the Victoria Park benchmark dataset.
Submitted 25 March, 2013;
originally announced March 2013.