Search | arXiv e-print repository

Supporting Mitosis Detection AI Training with Inter-Observer Eye-Gaze Consistencies

Authors: Hongyan Gu, Zihan Yan, Ayesha Alvi, Brandon Day, Chunxu Yang, Zida Wu, Shino Magaki, Mohammad Haeri, Xiang 'Anthony' Chen

Abstract: The expansion of artificial intelligence (AI) in pathology tasks has intensified the demand for doctors' annotations in AI development. However, collecting high-quality annotations from doctors is costly and time-consuming, creating a bottleneck in AI progress. This study investigates eye-tracking as a cost-effective technology to collect doctors' behavioral data for AI training with a focus on th… ▽ More The expansion of artificial intelligence (AI) in pathology tasks has intensified the demand for doctors' annotations in AI development. However, collecting high-quality annotations from doctors is costly and time-consuming, creating a bottleneck in AI progress. This study investigates eye-tracking as a cost-effective technology to collect doctors' behavioral data for AI training with a focus on the pathology task of mitosis detection. One major challenge in using eye-gaze data is the low signal-to-noise ratio, which hinders the extraction of meaningful information. We tackled this by levering the properties of inter-observer eye-gaze consistencies and creating eye-gaze labels from consistent eye-fixations shared by a group of observers. Our study involved 14 non-medical participants, from whom we collected eye-gaze data and generated eye-gaze labels based on varying group sizes. We assessed the efficacy of such eye-gaze labels by training Convolutional Neural Networks (CNNs) and comparing their performance to those trained with ground truth annotations and a heuristic-based baseline. Results indicated that CNNs trained with our eye-gaze labels closely followed the performance of ground-truth-based CNNs, and significantly outperformed the baseline. Although primarily focused on mitosis, we envision that insights from this study can be generalized to other medical imaging tasks. △ Less

Submitted 2 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE International Conference on Healthcare Informatics 2024

arXiv:2111.04107 [pdf, other]

Structure-aware generation of drug-like molecules

Authors: Pavol Drotár, Arian Rokkum Jamasb, Ben Day, Cătălina Cangea, Pietro Liò

Abstract: Structure-based drug design involves finding ligand molecules that exhibit structural and chemical complementarity to protein pockets. Deep generative methods have shown promise in proposing novel molecules from scratch (de-novo design), avoiding exhaustive virtual screening of chemical space. Most generative de-novo models fail to incorporate detailed ligand-protein interactions and 3D pocket str… ▽ More Structure-based drug design involves finding ligand molecules that exhibit structural and chemical complementarity to protein pockets. Deep generative methods have shown promise in proposing novel molecules from scratch (de-novo design), avoiding exhaustive virtual screening of chemical space. Most generative de-novo models fail to incorporate detailed ligand-protein interactions and 3D pocket structures. We propose a novel supervised model that generates molecular graphs jointly with 3D pose in a discretised molecular space. Molecules are built atom-by-atom inside pockets, guided by structural information from crystallographic data. We evaluate our model using a docking benchmark and find that guided generation improves predicted binding affinities by 8% and drug-likeness scores by 10% over the baseline. Furthermore, our model proposes molecules with binding scores exceeding some known ligands, which could be useful in future wet-lab studies. △ Less

Submitted 7 November, 2021; originally announced November 2021.

arXiv:2106.05317 [pdf, other]

Attentional Meta-learners for Few-shot Polythetic Classification

Authors: Ben Day, Ramon Viñas, Nikola Simidjievski, Pietro Liò

Abstract: Polythetic classifications, based on shared patterns of features that need neither be universal nor constant among members of a class, are common in the natural world and greatly outnumber monothetic classifications over a set of features. We show that threshold meta-learners, such as Prototypical Networks, require an embedding dimension that is exponential in the number of task-relevant features… ▽ More Polythetic classifications, based on shared patterns of features that need neither be universal nor constant among members of a class, are common in the natural world and greatly outnumber monothetic classifications over a set of features. We show that threshold meta-learners, such as Prototypical Networks, require an embedding dimension that is exponential in the number of task-relevant features to emulate these functions. In contrast, attentional classifiers, such as Matching Networks, are polythetic by default and able to solve these problems with a linear embedding dimension. However, we find that in the presence of task-irrelevant features, inherent to meta-learning problems, attentional models are susceptible to misclassification. To address this challenge, we propose a self-attention feature-selection mechanism that adaptively dilutes non-discriminative features. We demonstrate the effectiveness of our approach in meta-learning Boolean functions, and synthetic and real-world few-shot learning tasks. △ Less

Submitted 27 June, 2022; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: Accepted at ICML 2022

arXiv:2104.14290 [pdf, other]

Meta-learning using privileged information for dynamics

Authors: Ben Day, Alexander Norcliffe, Jacob Moss, Pietro Liò

Abstract: Neural ODE Processes approach the problem of meta-learning for dynamics using a latent variable model, which permits a flexible aggregation of contextual information. This flexibility is inherited from the Neural Process framework and allows the model to aggregate sets of context observations of arbitrary size into a fixed-length representation. In the physical sciences, we often have access to st… ▽ More Neural ODE Processes approach the problem of meta-learning for dynamics using a latent variable model, which permits a flexible aggregation of contextual information. This flexibility is inherited from the Neural Process framework and allows the model to aggregate sets of context observations of arbitrary size into a fixed-length representation. In the physical sciences, we often have access to structured knowledge in addition to raw observations of a system, such as the value of a conserved quantity or a description of an understood component. Taking advantage of the aggregation flexibility, we extend the Neural ODE Process model to use additional information within the Learning Using Privileged Information setting, and we validate our extension with experiments showing improved accuracy and calibration on simulated dynamics tasks. △ Less

Submitted 29 April, 2021; originally announced April 2021.

Comments: Published as a workshop paper at the Learning to Learn and SimDL workshops at ICLR 2021. 4 pages, 3 pages of appendices

arXiv:2103.12413 [pdf, other]

Neural ODE Processes

Authors: Alexander Norcliffe, Cristian Bodnar, Ben Day, Jacob Moss, Pietro Liò

Abstract: Neural Ordinary Differential Equations (NODEs) use a neural network to model the instantaneous rate of change in the state of a system. However, despite their apparent suitability for dynamics-governed time-series, NODEs present a few disadvantages. First, they are unable to adapt to incoming data points, a fundamental requirement for real-time applications imposed by the natural direction of time… ▽ More Neural Ordinary Differential Equations (NODEs) use a neural network to model the instantaneous rate of change in the state of a system. However, despite their apparent suitability for dynamics-governed time-series, NODEs present a few disadvantages. First, they are unable to adapt to incoming data points, a fundamental requirement for real-time applications imposed by the natural direction of time. Second, time series are often composed of a sparse set of measurements that could be explained by many possible underlying dynamics. NODEs do not capture this uncertainty. In contrast, Neural Processes (NPs) are a family of models providing uncertainty estimation and fast data adaptation but lack an explicit treatment of the flow of time. To address these problems, we introduce Neural ODE Processes (NDPs), a new class of stochastic processes determined by a distribution over Neural ODEs. By maintaining an adaptive data-dependent distribution over the underlying ODE, we show that our model can successfully capture the dynamics of low-dimensional systems from just a few data points. At the same time, we demonstrate that NDPs scale up to challenging high-dimensional time-series with unknown latent dynamics such as rotating MNIST digits. △ Less

Submitted 17 August, 2021; v1 submitted 23 March, 2021; originally announced March 2021.

Comments: ICLR 2021. 9 pages, 6 figures, 7 pages of appendices

arXiv:2012.05716 [pdf, other]

Utilising Graph Machine Learning within Drug Discovery and Development

Authors: Thomas Gaudelet, Ben Day, Arian R. Jamasb, Jyothish Soman, Cristian Regep, Gertrude Liu, Jeremy B. R. Hayter, Richard Vickers, Charles Roberts, Jian Tang, David Roblin, Tom L. Blundell, Michael M. Bronstein, Jake P. Taylor-King

Abstract: Graph Machine Learning (GML) is receiving growing interest within the pharmaceutical and biotechnology industries for its ability to model biomolecular structures, the functional relationships between them, and integrate multi-omic datasets - amongst other data types. Herein, we present a multidisciplinary academic-industrial review of the topic within the context of drug discovery and development… ▽ More Graph Machine Learning (GML) is receiving growing interest within the pharmaceutical and biotechnology industries for its ability to model biomolecular structures, the functional relationships between them, and integrate multi-omic datasets - amongst other data types. Herein, we present a multidisciplinary academic-industrial review of the topic within the context of drug discovery and development. After introducing key terms and modelling approaches, we move chronologically through the drug development pipeline to identify and summarise work incorporating: target identification, design of small molecules and biologics, and drug repurposing. Whilst the field is still emerging, key milestones including repurposed drugs entering in vivo studies, suggest graph machine learning will become a modelling framework of choice within biomedical machine learning. △ Less

Submitted 10 February, 2021; v1 submitted 9 December, 2020; originally announced December 2020.

Comments: 19 pages, 7 figures, 2 tables

arXiv:2009.14593 [pdf, other]

The Role of Isomorphism Classes in Multi-Relational Datasets

Authors: Vijja Wichitwechkarn, Ben Day, Cristian Bodnar, Matthew Wales, Pietro Liò

Abstract: Multi-interaction systems abound in nature, from colloidal suspensions to gene regulatory circuits. These systems can produce complex dynamics and graph neural networks have been proposed as a method to extract underlying interactions and predict how systems will evolve. The current training and evaluation procedures for these models through the use of synthetic multi-relational datasets however a… ▽ More Multi-interaction systems abound in nature, from colloidal suspensions to gene regulatory circuits. These systems can produce complex dynamics and graph neural networks have been proposed as a method to extract underlying interactions and predict how systems will evolve. The current training and evaluation procedures for these models through the use of synthetic multi-relational datasets however are agnostic to interaction network isomorphism classes, which produce identical dynamics up to initial conditions. We extensively analyse how isomorphism class awareness affects these models, focusing on neural relational inference (NRI) models, which are unique in explicitly inferring interactions to predict dynamics in the unsupervised setting. Specifically, we demonstrate that isomorphism leakage overestimates performance in multi-relational inference and that sampling biases present in the multi-interaction network generation process can impair generalisation. To remedy this, we propose isomorphism-aware synthetic benchmarks for model evaluation. We use these benchmarks to test generalisation abilities and demonstrate the existence of a threshold sampling frequency of isomorphism classes for successful learning. In addition, we demonstrate that isomorphism classes can be utilised through a simple prioritisation scheme to improve model performance, stability during training and reduce training time. △ Less

Submitted 30 September, 2020; originally announced September 2020.

Comments: 7 pages main text, 1 page of references and an ethics statement, 3 pages of appendices

arXiv:2009.13895 [pdf, other]

Message Passing Neural Processes

Authors: Ben Day, Cătălina Cangea, Arian R. Jamasb, Pietro Liò

Abstract: Neural Processes (NPs) are powerful and flexible models able to incorporate uncertainty when representing stochastic processes, while maintaining a linear time complexity. However, NPs produce a latent description by aggregating independent representations of context points and lack the ability to exploit relational information present in many datasets. This renders NPs ineffective in settings whe… ▽ More Neural Processes (NPs) are powerful and flexible models able to incorporate uncertainty when representing stochastic processes, while maintaining a linear time complexity. However, NPs produce a latent description by aggregating independent representations of context points and lack the ability to exploit relational information present in many datasets. This renders NPs ineffective in settings where the stochastic process is primarily governed by neighbourhood rules, such as cellular automata (CA), and limits performance for any task where relational information remains unused. We address this shortcoming by introducing Message Passing Neural Processes (MPNPs), the first class of NPs that explicitly makes use of relational structure within the model. Our evaluation shows that MPNPs thrive at lower sampling rates, on existing benchmarks and newly-proposed CA and Cora-Branched tasks. We further report strong generalisation over density-based CA rule-sets and significant gains in challenging arbitrary-labelling and few-shot learning setups. △ Less

Submitted 29 September, 2020; originally announced September 2020.

Comments: 18 pages, 6 figures. The first two authors contributed equally

arXiv:2006.13666 [pdf, other]

Uncertainty in Neural Relational Inference Trajectory Reconstruction

Authors: Vasileios Karavias, Ben Day, Pietro Liò

Abstract: Neural networks used for multi-interaction trajectory reconstruction lack the ability to estimate the uncertainty in their outputs, which would be useful to better analyse and understand the systems they model. In this paper we extend the Factorised Neural Relational Inference model to output both a mean and a standard deviation for each component of the phase space vector, which together with an… ▽ More Neural networks used for multi-interaction trajectory reconstruction lack the ability to estimate the uncertainty in their outputs, which would be useful to better analyse and understand the systems they model. In this paper we extend the Factorised Neural Relational Inference model to output both a mean and a standard deviation for each component of the phase space vector, which together with an appropriate loss function, can account for uncertainty. A variety of loss functions are investigated including ideas from convexification and a Bayesian treatment of the problem. We show that the physical meaning of the variables is important when considering the uncertainty and demonstrate the existence of pathological local minima that are difficult to avoid during training. △ Less

Submitted 25 June, 2020; v1 submitted 24 June, 2020; originally announced June 2020.

Comments: 4 main pages, 1 page of supplementary materials. Accepted for presentation at the Graph Representation Learning and Beyond workshop at ICML 2020

arXiv:2006.07220 [pdf, other]

On Second Order Behaviour in Augmented Neural ODEs

Authors: Alexander Norcliffe, Cristian Bodnar, Ben Day, Nikola Simidjievski, Pietro Liò

Abstract: Neural Ordinary Differential Equations (NODEs) are a new class of models that transform data continuously through infinite-depth architectures. The continuous nature of NODEs has made them particularly suitable for learning the dynamics of complex physical systems. While previous work has mostly been focused on first order ODEs, the dynamics of many systems, especially in classical physics, are go… ▽ More Neural Ordinary Differential Equations (NODEs) are a new class of models that transform data continuously through infinite-depth architectures. The continuous nature of NODEs has made them particularly suitable for learning the dynamics of complex physical systems. While previous work has mostly been focused on first order ODEs, the dynamics of many systems, especially in classical physics, are governed by second order laws. In this work, we consider Second Order Neural ODEs (SONODEs). We show how the adjoint sensitivity method can be extended to SONODEs and prove that the optimisation of a first order coupled ODE is equivalent and computationally more efficient. Furthermore, we extend the theoretical understanding of the broader class of Augmented NODEs (ANODEs) by showing they can also learn higher order dynamics with a minimal number of augmented dimensions, but at the cost of interpretability. This indicates that the advantages of ANODEs go beyond the extra space offered by the augmented dimensions, as originally thought. Finally, we compare SONODEs and ANODEs on synthetic and real dynamical systems and demonstrate that the inductive biases of the former generally result in faster training and better performance. △ Less

Submitted 21 October, 2020; v1 submitted 12 June, 2020; originally announced June 2020.

Comments: Camera-ready version for NeurIPS 2020. Contains 28 pages, 15 figures

arXiv:1906.09807 [pdf, other]

doi 10.1609/aaai.v34i04.5728

Proximal Distilled Evolutionary Reinforcement Learning

Authors: Cristian Bodnar, Ben Day, Pietro Lió

Abstract: Reinforcement Learning (RL) has achieved impressive performance in many complex environments due to the integration with Deep Neural Networks (DNNs). At the same time, Genetic Algorithms (GAs), often seen as a competing approach to RL, had limited success in scaling up to the DNNs required to solve challenging tasks. Contrary to this dichotomic view, in the physical world, evolution and learning a… ▽ More Reinforcement Learning (RL) has achieved impressive performance in many complex environments due to the integration with Deep Neural Networks (DNNs). At the same time, Genetic Algorithms (GAs), often seen as a competing approach to RL, had limited success in scaling up to the DNNs required to solve challenging tasks. Contrary to this dichotomic view, in the physical world, evolution and learning are complementary processes that continuously interact. The recently proposed Evolutionary Reinforcement Learning (ERL) framework has demonstrated mutual benefits to performance when combining the two methods. However, ERL has not fully addressed the scalability problem of GAs. In this paper, we show that this problem is rooted in an unfortunate combination of a simple genetic encoding for DNNs and the use of traditional biologically-inspired variation operators. When applied to these encodings, the standard operators are destructive and cause catastrophic forgetting of the traits the networks acquired. We propose a novel algorithm called Proximal Distilled Evolutionary Reinforcement Learning (PDERL) that is characterised by a hierarchical integration between evolution and learning. The main innovation of PDERL is the use of learning-based variation operators that compensate for the simplicity of the genetic representation. Unlike traditional operators, our proposals meet the functional requirements of variation operators when applied on directly-encoded DNNs. We evaluate PDERL in five robot locomotion settings from the OpenAI gym. Our method outperforms ERL, as well as two state-of-the-art RL algorithms, PPO and TD3, in all tested environments. △ Less

Submitted 7 July, 2020; v1 submitted 24 June, 2019; originally announced June 2019.

Comments: Camera-ready version for AAAI-20. Contains 10 pages, 11 figures

Journal ref: Vol 34 No 04: AAAI 2020 Technical Track on Machine Learning 3283-3290

arXiv:1905.08721 [pdf, other]

Factorised Neural Relational Inference for Multi-Interaction Systems

Authors: Ezra Webb, Ben Day, Helena Andres-Terre, Pietro Lió

Abstract: Many complex natural and cultural phenomena are well modelled by systems of simple interactions between particles. A number of architectures have been developed to articulate this kind of structure, both implicitly and explicitly. We consider an unsupervised explicit model, the NRI model, and make a series of representational adaptations and physically motivated changes. Most notably we factorise… ▽ More Many complex natural and cultural phenomena are well modelled by systems of simple interactions between particles. A number of architectures have been developed to articulate this kind of structure, both implicitly and explicitly. We consider an unsupervised explicit model, the NRI model, and make a series of representational adaptations and physically motivated changes. Most notably we factorise the inferred latent interaction graph into a multiplex graph, allowing each layer to encode for a different interaction-type. This fNRI model is smaller in size and significantly outperforms the original in both edge and trajectory prediction, establishing a new state-of-the-art. We also present a simplified variant of our model, which demonstrates the NRI's formulation as a variational auto-encoder is not necessary for good performance, and make an adaptation to the NRI's training routine, significantly improving its ability to model complex physical dynamical systems. △ Less

Submitted 21 May, 2019; originally announced May 2019.

Comments: 4 page workshop paper accepted for presentation at the ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Representations with 6 pages of supplementary materials and figures

arXiv:1905.04682 [pdf, other]

On Graph Classification Networks, Datasets and Baselines

Authors: Enxhell Luzhnica, Ben Day, Pietro Liò

Abstract: Graph classification receives a great deal of attention from the non-Euclidean machine learning community. Recent advances in graph coarsening have enabled the training of deeper networks and produced new state-of-the-art results in many benchmark tasks. We examine how these architectures train and find that performance is highly-sensitive to initialisation and depends strongly on jumping-knowledg… ▽ More Graph classification receives a great deal of attention from the non-Euclidean machine learning community. Recent advances in graph coarsening have enabled the training of deeper networks and produced new state-of-the-art results in many benchmark tasks. We examine how these architectures train and find that performance is highly-sensitive to initialisation and depends strongly on jumping-knowledge structures. We then show that, despite the great complexity of these models, competitive performance is achieved by the simplest of models -- structure-blind MLP, single-layer GCN and fixed-weight GCN -- and propose these be included as baselines in future. △ Less

Submitted 12 May, 2019; originally announced May 2019.

Comments: Submitted to the ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Data

arXiv:1904.00374 [pdf, ps, other]

Clique pooling for graph classification

Authors: Enxhell Luzhnica, Ben Day, Pietro Lio'

Abstract: We propose a novel graph pooling operation using cliques as the unit pool. As this approach is purely topological, rather than featural, it is more readily interpretable, a better analogue to image coarsening than filtering or pruning techniques, and entirely nonparametric. The operation is implemented within graph convolution network (GCN) and GraphSAGE architectures and tested against standard g… ▽ More We propose a novel graph pooling operation using cliques as the unit pool. As this approach is purely topological, rather than featural, it is more readily interpretable, a better analogue to image coarsening than filtering or pruning techniques, and entirely nonparametric. The operation is implemented within graph convolution network (GCN) and GraphSAGE architectures and tested against standard graph classification benchmarks. In addition, we explore the backwards compatibility of the pooling to regular graphs, demonstrating competitive performance when replacing two-by-two pooling in standard convolutional neural networks (CNNs) with our mechanism. △ Less

Submitted 9 April, 2019; v1 submitted 31 March, 2019; originally announced April 2019.

Comments: Under review as a workshop paper at RLGM 2019 @ ICML

arXiv:1810.09549 [pdf, ps, other]

Introducing Curvature to the Label Space

Authors: Conor Sheehan, Ben Day, Pietro Liò

Abstract: One-hot encoding is a labelling system that embeds classes as standard basis vectors in a label space. Despite seeing near-universal use in supervised categorical classification tasks, the scheme is problematic in its geometric implication that, as all classes are equally distant, all classes are equally different. This is inconsistent with most, if not all, real-world tasks due to the prevalence… ▽ More One-hot encoding is a labelling system that embeds classes as standard basis vectors in a label space. Despite seeing near-universal use in supervised categorical classification tasks, the scheme is problematic in its geometric implication that, as all classes are equally distant, all classes are equally different. This is inconsistent with most, if not all, real-world tasks due to the prevalence of ancestral and convergent relationships generating a varying degree of morphological similarity across classes. We address this issue by introducing curvature to the label-space using a metric tensor as a self-regulating method that better represents these relationships as a bolt-on, learning-algorithm agnostic solution. We propose both general constraints and specific statistical parameterizations of the metric and identify a direction for future research using autoencoder-based parameterizations. △ Less

Submitted 22 October, 2018; originally announced October 2018.

Comments: Under review as a workshop paper at MetaLearn, NIPS 2018, 4 pages, 2 figures

arXiv:1111.2618 [pdf, ps, other]

doi 10.1109/TSP.2012.2192925

Full-Duplex MIMO Relaying: Achievable Rates under Limited Dynamic Range

Authors: Brian P. Day, Adam R. Margetts, Daniel W. Bliss, Philip Schniter

Abstract: In this paper we consider the problem of full-duplex multiple-input multiple-output (MIMO) relaying between multi-antenna source and destination nodes. The principal difficulty in implementing such a system is that, due to the limited attenuation between the relay's transmit and receive antenna arrays, the relay's outgoing signal may overwhelm its limited-dynamic-range input circuitry, making it d… ▽ More In this paper we consider the problem of full-duplex multiple-input multiple-output (MIMO) relaying between multi-antenna source and destination nodes. The principal difficulty in implementing such a system is that, due to the limited attenuation between the relay's transmit and receive antenna arrays, the relay's outgoing signal may overwhelm its limited-dynamic-range input circuitry, making it difficult---if not impossible---to recover the desired incoming signal. While explicitly modeling transmitter/receiver dynamic-range limitations and channel estimation error, we derive tight upper and lower bounds on the end-to-end achievable rate of decode-and-forward-based full-duplex MIMO relay systems, and propose a transmission scheme based on maximization of the lower bound. The maximization requires us to (numerically) solve a nonconvex optimization problem, for which we detail a novel approach based on bisection search and gradient projection. To gain insights into system design tradeoffs, we also derive an analytic approximation to the achievable rate and numerically demonstrate its accuracy. We then study the behavior of the achievable rate as a function of signal-to-noise ratio, interference-to-noise ratio, transmitter/receiver dynamic range, number of antennas, and training length, using optimized half-duplex signaling as a baseline. △ Less

Submitted 17 May, 2012; v1 submitted 10 November, 2011; originally announced November 2011.

Showing 1–16 of 16 results for author: Day, B