-
Supporting Mitosis Detection AI Training with Inter-Observer Eye-Gaze Consistencies
Authors:
Hongyan Gu,
Zihan Yan,
Ayesha Alvi,
Brandon Day,
Chunxu Yang,
Zida Wu,
Shino Magaki,
Mohammad Haeri,
Xiang 'Anthony' Chen
Abstract:
The expansion of artificial intelligence (AI) in pathology tasks has intensified the demand for doctors' annotations in AI development. However, collecting high-quality annotations from doctors is costly and time-consuming, creating a bottleneck in AI progress. This study investigates eye-tracking as a cost-effective technology to collect doctors' behavioral data for AI training with a focus on the pathology task of mitosis detection. One major challenge in using eye-gaze data is the low signal-to-noise ratio, which hinders the extraction of meaningful information. We tackled this by leveraging the properties of inter-observer eye-gaze consistencies and creating eye-gaze labels from consistent eye-fixations shared by a group of observers. Our study involved 14 non-medical participants, from whom we collected eye-gaze data and generated eye-gaze labels based on varying group sizes. We assessed the efficacy of such eye-gaze labels by training Convolutional Neural Networks (CNNs) and comparing their performance to those trained with ground truth annotations and a heuristic-based baseline. Results indicated that CNNs trained with our eye-gaze labels closely followed the performance of ground-truth-based CNNs, and significantly outperformed the baseline. Although primarily focused on mitosis, we envision that insights from this study can be generalized to other medical imaging tasks.
Submitted 2 April, 2024;
originally announced April 2024.
-
Structure-aware generation of drug-like molecules
Authors:
Pavol Drotár,
Arian Rokkum Jamasb,
Ben Day,
Cătălina Cangea,
Pietro Liò
Abstract:
Structure-based drug design involves finding ligand molecules that exhibit structural and chemical complementarity to protein pockets. Deep generative methods have shown promise in proposing novel molecules from scratch (de-novo design), avoiding exhaustive virtual screening of chemical space. Most generative de-novo models fail to incorporate detailed ligand-protein interactions and 3D pocket structures. We propose a novel supervised model that generates molecular graphs jointly with 3D pose in a discretised molecular space. Molecules are built atom-by-atom inside pockets, guided by structural information from crystallographic data. We evaluate our model using a docking benchmark and find that guided generation improves predicted binding affinities by 8% and drug-likeness scores by 10% over the baseline. Furthermore, our model proposes molecules with binding scores exceeding some known ligands, which could be useful in future wet-lab studies.
Submitted 7 November, 2021;
originally announced November 2021.
-
Attentional Meta-learners for Few-shot Polythetic Classification
Authors:
Ben Day,
Ramon Viñas,
Nikola Simidjievski,
Pietro Liò
Abstract:
Polythetic classifications, based on shared patterns of features that need neither be universal nor constant among members of a class, are common in the natural world and greatly outnumber monothetic classifications over a set of features. We show that threshold meta-learners, such as Prototypical Networks, require an embedding dimension that is exponential in the number of task-relevant features to emulate these functions. In contrast, attentional classifiers, such as Matching Networks, are polythetic by default and able to solve these problems with a linear embedding dimension. However, we find that in the presence of task-irrelevant features, inherent to meta-learning problems, attentional models are susceptible to misclassification. To address this challenge, we propose a self-attention feature-selection mechanism that adaptively dilutes non-discriminative features. We demonstrate the effectiveness of our approach in meta-learning Boolean functions, and synthetic and real-world few-shot learning tasks.
Submitted 27 June, 2022; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Meta-learning using privileged information for dynamics
Authors:
Ben Day,
Alexander Norcliffe,
Jacob Moss,
Pietro Liò
Abstract:
Neural ODE Processes approach the problem of meta-learning for dynamics using a latent variable model, which permits a flexible aggregation of contextual information. This flexibility is inherited from the Neural Process framework and allows the model to aggregate sets of context observations of arbitrary size into a fixed-length representation. In the physical sciences, we often have access to structured knowledge in addition to raw observations of a system, such as the value of a conserved quantity or a description of an understood component. Taking advantage of the aggregation flexibility, we extend the Neural ODE Process model to use additional information within the Learning Using Privileged Information setting, and we validate our extension with experiments showing improved accuracy and calibration on simulated dynamics tasks.
Submitted 29 April, 2021;
originally announced April 2021.
-
Neural ODE Processes
Authors:
Alexander Norcliffe,
Cristian Bodnar,
Ben Day,
Jacob Moss,
Pietro Liò
Abstract:
Neural Ordinary Differential Equations (NODEs) use a neural network to model the instantaneous rate of change in the state of a system. However, despite their apparent suitability for dynamics-governed time-series, NODEs present a few disadvantages. First, they are unable to adapt to incoming data points, a fundamental requirement for real-time applications imposed by the natural direction of time. Second, time series are often composed of a sparse set of measurements that could be explained by many possible underlying dynamics. NODEs do not capture this uncertainty. In contrast, Neural Processes (NPs) are a family of models providing uncertainty estimation and fast data adaptation but lack an explicit treatment of the flow of time. To address these problems, we introduce Neural ODE Processes (NDPs), a new class of stochastic processes determined by a distribution over Neural ODEs. By maintaining an adaptive data-dependent distribution over the underlying ODE, we show that our model can successfully capture the dynamics of low-dimensional systems from just a few data points. At the same time, we demonstrate that NDPs scale up to challenging high-dimensional time-series with unknown latent dynamics such as rotating MNIST digits.
Submitted 17 August, 2021; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Utilising Graph Machine Learning within Drug Discovery and Development
Authors:
Thomas Gaudelet,
Ben Day,
Arian R. Jamasb,
Jyothish Soman,
Cristian Regep,
Gertrude Liu,
Jeremy B. R. Hayter,
Richard Vickers,
Charles Roberts,
Jian Tang,
David Roblin,
Tom L. Blundell,
Michael M. Bronstein,
Jake P. Taylor-King
Abstract:
Graph Machine Learning (GML) is receiving growing interest within the pharmaceutical and biotechnology industries for its ability to model biomolecular structures, the functional relationships between them, and integrate multi-omic datasets - amongst other data types. Herein, we present a multidisciplinary academic-industrial review of the topic within the context of drug discovery and development. After introducing key terms and modelling approaches, we move chronologically through the drug development pipeline to identify and summarise work incorporating: target identification, design of small molecules and biologics, and drug repurposing. Whilst the field is still emerging, key milestones, including repurposed drugs entering in vivo studies, suggest graph machine learning will become a modelling framework of choice within biomedical machine learning.
Submitted 10 February, 2021; v1 submitted 9 December, 2020;
originally announced December 2020.
-
The Role of Isomorphism Classes in Multi-Relational Datasets
Authors:
Vijja Wichitwechkarn,
Ben Day,
Cristian Bodnar,
Matthew Wales,
Pietro Liò
Abstract:
Multi-interaction systems abound in nature, from colloidal suspensions to gene regulatory circuits. These systems can produce complex dynamics and graph neural networks have been proposed as a method to extract underlying interactions and predict how systems will evolve. The current training and evaluation procedures for these models through the use of synthetic multi-relational datasets however are agnostic to interaction network isomorphism classes, which produce identical dynamics up to initial conditions. We extensively analyse how isomorphism class awareness affects these models, focusing on neural relational inference (NRI) models, which are unique in explicitly inferring interactions to predict dynamics in the unsupervised setting. Specifically, we demonstrate that isomorphism leakage overestimates performance in multi-relational inference and that sampling biases present in the multi-interaction network generation process can impair generalisation. To remedy this, we propose isomorphism-aware synthetic benchmarks for model evaluation. We use these benchmarks to test generalisation abilities and demonstrate the existence of a threshold sampling frequency of isomorphism classes for successful learning. In addition, we demonstrate that isomorphism classes can be utilised through a simple prioritisation scheme to improve model performance, stability during training and reduce training time.
Submitted 30 September, 2020;
originally announced September 2020.
-
Message Passing Neural Processes
Authors:
Ben Day,
Cătălina Cangea,
Arian R. Jamasb,
Pietro Liò
Abstract:
Neural Processes (NPs) are powerful and flexible models able to incorporate uncertainty when representing stochastic processes, while maintaining a linear time complexity. However, NPs produce a latent description by aggregating independent representations of context points and lack the ability to exploit relational information present in many datasets. This renders NPs ineffective in settings where the stochastic process is primarily governed by neighbourhood rules, such as cellular automata (CA), and limits performance for any task where relational information remains unused. We address this shortcoming by introducing Message Passing Neural Processes (MPNPs), the first class of NPs that explicitly makes use of relational structure within the model. Our evaluation shows that MPNPs thrive at lower sampling rates, on existing benchmarks and newly-proposed CA and Cora-Branched tasks. We further report strong generalisation over density-based CA rule-sets and significant gains in challenging arbitrary-labelling and few-shot learning setups.
Submitted 29 September, 2020;
originally announced September 2020.
-
Uncertainty in Neural Relational Inference Trajectory Reconstruction
Authors:
Vasileios Karavias,
Ben Day,
Pietro Liò
Abstract:
Neural networks used for multi-interaction trajectory reconstruction lack the ability to estimate the uncertainty in their outputs, which would be useful to better analyse and understand the systems they model. In this paper we extend the Factorised Neural Relational Inference model to output both a mean and a standard deviation for each component of the phase space vector, which together with an appropriate loss function, can account for uncertainty. A variety of loss functions are investigated including ideas from convexification and a Bayesian treatment of the problem. We show that the physical meaning of the variables is important when considering the uncertainty and demonstrate the existence of pathological local minima that are difficult to avoid during training.
Submitted 25 June, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
On Second Order Behaviour in Augmented Neural ODEs
Authors:
Alexander Norcliffe,
Cristian Bodnar,
Ben Day,
Nikola Simidjievski,
Pietro Liò
Abstract:
Neural Ordinary Differential Equations (NODEs) are a new class of models that transform data continuously through infinite-depth architectures. The continuous nature of NODEs has made them particularly suitable for learning the dynamics of complex physical systems. While previous work has mostly been focused on first order ODEs, the dynamics of many systems, especially in classical physics, are governed by second order laws. In this work, we consider Second Order Neural ODEs (SONODEs). We show how the adjoint sensitivity method can be extended to SONODEs and prove that the optimisation of a first order coupled ODE is equivalent and computationally more efficient. Furthermore, we extend the theoretical understanding of the broader class of Augmented NODEs (ANODEs) by showing they can also learn higher order dynamics with a minimal number of augmented dimensions, but at the cost of interpretability. This indicates that the advantages of ANODEs go beyond the extra space offered by the augmented dimensions, as originally thought. Finally, we compare SONODEs and ANODEs on synthetic and real dynamical systems and demonstrate that the inductive biases of the former generally result in faster training and better performance.
Submitted 21 October, 2020; v1 submitted 12 June, 2020;
originally announced June 2020.
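The abstract above relies on rewriting a second order ODE as a coupled first order system. The sketch below illustrates only that textbook reduction, not the paper's SONODE model: the learned acceleration network is replaced by a fixed hypothetical function `harmonic` (x'' = -x), and the function names, the RK4 integrator, and the step count are all assumptions chosen for illustration.

```python
import math


def coupled_rhs(state, acceleration):
    """Right-hand side of the first order system (x, v)' = (v, a(x, v))."""
    x, v = state
    return (v, acceleration(x, v))


def rk4_step(state, dt, acceleration):
    """Advance the coupled state by one classical Runge-Kutta (RK4) step."""
    def shifted(s, k, h):
        # State s moved along slope k by step h.
        return (s[0] + h * k[0], s[1] + h * k[1])

    k1 = coupled_rhs(state, acceleration)
    k2 = coupled_rhs(shifted(state, k1, dt / 2), acceleration)
    k3 = coupled_rhs(shifted(state, k2, dt / 2), acceleration)
    k4 = coupled_rhs(shifted(state, k3, dt), acceleration)
    return (
        state[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
        state[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
    )


def integrate(state, t_end, steps, acceleration):
    """Integrate from t = 0 to t_end using a fixed number of RK4 steps."""
    dt = t_end / steps
    for _ in range(steps):
        state = rk4_step(state, dt, acceleration)
    return state


def harmonic(x, v):
    """Hypothetical stand-in for a learned acceleration: x'' = -x."""
    return -x


# Start at x = 1, v = 0; the exact solution is x(t) = cos(t), v(t) = -sin(t).
x_final, v_final = integrate((1.0, 0.0), math.pi, 1000, harmonic)
```

In a second order neural ODE the role of `harmonic` would be played by a trainable network; the integrator only ever sees the first order coupled form, which is why the two formulations can be optimised equivalently.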
-
Proximal Distilled Evolutionary Reinforcement Learning
Authors:
Cristian Bodnar,
Ben Day,
Pietro Lió
Abstract:
Reinforcement Learning (RL) has achieved impressive performance in many complex environments due to the integration with Deep Neural Networks (DNNs). At the same time, Genetic Algorithms (GAs), often seen as a competing approach to RL, had limited success in scaling up to the DNNs required to solve challenging tasks. Contrary to this dichotomic view, in the physical world, evolution and learning are complementary processes that continuously interact. The recently proposed Evolutionary Reinforcement Learning (ERL) framework has demonstrated mutual benefits to performance when combining the two methods. However, ERL has not fully addressed the scalability problem of GAs. In this paper, we show that this problem is rooted in an unfortunate combination of a simple genetic encoding for DNNs and the use of traditional biologically-inspired variation operators. When applied to these encodings, the standard operators are destructive and cause catastrophic forgetting of the traits the networks acquired. We propose a novel algorithm called Proximal Distilled Evolutionary Reinforcement Learning (PDERL) that is characterised by a hierarchical integration between evolution and learning. The main innovation of PDERL is the use of learning-based variation operators that compensate for the simplicity of the genetic representation. Unlike traditional operators, our proposals meet the functional requirements of variation operators when applied on directly-encoded DNNs. We evaluate PDERL in five robot locomotion settings from the OpenAI gym. Our method outperforms ERL, as well as two state-of-the-art RL algorithms, PPO and TD3, in all tested environments.
Submitted 7 July, 2020; v1 submitted 24 June, 2019;
originally announced June 2019.
-
Factorised Neural Relational Inference for Multi-Interaction Systems
Authors:
Ezra Webb,
Ben Day,
Helena Andres-Terre,
Pietro Lió
Abstract:
Many complex natural and cultural phenomena are well modelled by systems of simple interactions between particles. A number of architectures have been developed to articulate this kind of structure, both implicitly and explicitly. We consider an unsupervised explicit model, the NRI model, and make a series of representational adaptations and physically motivated changes. Most notably we factorise the inferred latent interaction graph into a multiplex graph, allowing each layer to encode for a different interaction-type. This fNRI model is smaller in size and significantly outperforms the original in both edge and trajectory prediction, establishing a new state-of-the-art. We also present a simplified variant of our model, which demonstrates the NRI's formulation as a variational auto-encoder is not necessary for good performance, and make an adaptation to the NRI's training routine, significantly improving its ability to model complex physical dynamical systems.
Submitted 21 May, 2019;
originally announced May 2019.
-
On Graph Classification Networks, Datasets and Baselines
Authors:
Enxhell Luzhnica,
Ben Day,
Pietro Liò
Abstract:
Graph classification receives a great deal of attention from the non-Euclidean machine learning community. Recent advances in graph coarsening have enabled the training of deeper networks and produced new state-of-the-art results in many benchmark tasks. We examine how these architectures train and find that performance is highly-sensitive to initialisation and depends strongly on jumping-knowledge structures. We then show that, despite the great complexity of these models, competitive performance is achieved by the simplest of models -- structure-blind MLP, single-layer GCN and fixed-weight GCN -- and propose these be included as baselines in future.
Submitted 12 May, 2019;
originally announced May 2019.
-
Clique pooling for graph classification
Authors:
Enxhell Luzhnica,
Ben Day,
Pietro Lio'
Abstract:
We propose a novel graph pooling operation using cliques as the unit pool. As this approach is purely topological, rather than featural, it is more readily interpretable, a better analogue to image coarsening than filtering or pruning techniques, and entirely nonparametric. The operation is implemented within graph convolution network (GCN) and GraphSAGE architectures and tested against standard graph classification benchmarks. In addition, we explore the backwards compatibility of the pooling to regular graphs, demonstrating competitive performance when replacing two-by-two pooling in standard convolutional neural networks (CNNs) with our mechanism.
Submitted 9 April, 2019; v1 submitted 31 March, 2019;
originally announced April 2019.
-
Introducing Curvature to the Label Space
Authors:
Conor Sheehan,
Ben Day,
Pietro Liò
Abstract:
One-hot encoding is a labelling system that embeds classes as standard basis vectors in a label space. Despite seeing near-universal use in supervised categorical classification tasks, the scheme is problematic in its geometric implication that, as all classes are equally distant, all classes are equally different. This is inconsistent with most, if not all, real-world tasks due to the prevalence of ancestral and convergent relationships generating a varying degree of morphological similarity across classes. We address this issue by introducing curvature to the label-space using a metric tensor as a self-regulating method that better represents these relationships as a bolt-on, learning-algorithm agnostic solution. We propose both general constraints and specific statistical parameterizations of the metric and identify a direction for future research using autoencoder-based parameterizations.
Submitted 22 October, 2018;
originally announced October 2018.
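The geometric point made in the abstract above, that one-hot encoding places all classes at equal distance from one another, can be checked directly. The following is a minimal sketch using only the Python standard library; the helper names `one_hot` and `pairwise_distances` and the choice of four classes are assumptions made for illustration, and the code does not implement the paper's metric-tensor method.

```python
import itertools
import math


def one_hot(index, num_classes):
    """Return the standard basis vector for class `index`."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def pairwise_distances(vectors):
    """Euclidean distance between every unordered pair of vectors."""
    return {
        (i, j): math.dist(vectors[i], vectors[j])
        for i, j in itertools.combinations(range(len(vectors)), 2)
    }


# Four classes embedded as standard basis vectors of a 4-dimensional label space.
labels = [one_hot(k, 4) for k in range(4)]
distances = pairwise_distances(labels)

# Every pair of distinct classes is separated by the same distance, sqrt(2),
# so the encoding carries no information about which classes are more alike.
for pair, d in sorted(distances.items()):
    print(pair, round(d, 4))
```

Because every pair differs in exactly two coordinates, the distance is sqrt(2) regardless of which classes are compared; any notion of graded similarity between classes has to come from a different metric on the label space.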
-
Full-Duplex MIMO Relaying: Achievable Rates under Limited Dynamic Range
Authors:
Brian P. Day,
Adam R. Margetts,
Daniel W. Bliss,
Philip Schniter
Abstract:
In this paper we consider the problem of full-duplex multiple-input multiple-output (MIMO) relaying between multi-antenna source and destination nodes. The principal difficulty in implementing such a system is that, due to the limited attenuation between the relay's transmit and receive antenna arrays, the relay's outgoing signal may overwhelm its limited-dynamic-range input circuitry, making it difficult---if not impossible---to recover the desired incoming signal. While explicitly modeling transmitter/receiver dynamic-range limitations and channel estimation error, we derive tight upper and lower bounds on the end-to-end achievable rate of decode-and-forward-based full-duplex MIMO relay systems, and propose a transmission scheme based on maximization of the lower bound. The maximization requires us to (numerically) solve a nonconvex optimization problem, for which we detail a novel approach based on bisection search and gradient projection. To gain insights into system design tradeoffs, we also derive an analytic approximation to the achievable rate and numerically demonstrate its accuracy. We then study the behavior of the achievable rate as a function of signal-to-noise ratio, interference-to-noise ratio, transmitter/receiver dynamic range, number of antennas, and training length, using optimized half-duplex signaling as a baseline.
Submitted 17 May, 2012; v1 submitted 10 November, 2011;
originally announced November 2011.