-
Composition and Control with Distilled Energy Diffusion Models and Sequential Monte Carlo
Authors:
James Thornton,
Louis Bethune,
Ruixiang Zhang,
Arwen Bradley,
Preetum Nakkiran,
Shuangfei Zhai
Abstract:
Diffusion models may be formulated as a time-indexed sequence of energy-based models, where the score corresponds to the negative gradient of an energy function. As opposed to learning the score directly, an energy parameterization is attractive as the energy itself can be used to control generation via Monte Carlo samplers. Architectural constraints and training instability in energy parameterized models have so far yielded inferior performance compared to directly approximating the score or denoiser. We address these deficiencies by introducing a novel training regime for the energy function through distillation of pre-trained diffusion models, resembling a Helmholtz decomposition of the score vector field. We further showcase the synergies between energy and score by casting the diffusion sampling procedure as a Feynman Kac model where sampling is controlled using potentials from the learnt energy functions. The Feynman Kac model formalism enables composition and low temperature sampling through sequential Monte Carlo.
Submitted 18 February, 2025;
originally announced February 2025.
-
Mechanisms of Projective Composition of Diffusion Models
Authors:
Arwen Bradley,
Preetum Nakkiran,
David Berthelot,
James Thornton,
Joshua M. Susskind
Abstract:
We study the theoretical foundations of composition in diffusion models, with a particular focus on out-of-distribution extrapolation and length-generalization. Prior work has shown that composing distributions via linear score combination can achieve promising results, including length-generalization in some cases (Du et al., 2023; Liu et al., 2022). However, our theoretical understanding of how and why such compositions work remains incomplete. In fact, it is not even entirely clear what it means for composition to "work". This paper starts to address these fundamental gaps. We begin by precisely defining one possible desired result of composition, which we call projective composition. Then, we investigate: (1) when linear score combinations provably achieve projective composition, (2) whether reverse-diffusion sampling can generate the desired composition, and (3) the conditions under which composition fails. We connect our theoretical analysis to prior empirical observations where composition has either worked or failed, for reasons that were unclear at the time. Finally, we propose a simple heuristic to help predict the success or failure of new compositions.
Submitted 14 May, 2025; v1 submitted 6 February, 2025;
originally announced February 2025.
-
ASTRA: A Scene-aware TRAnsformer-based model for trajectory prediction
Authors:
Izzeddin Teeti,
Aniket Thomas,
Munish Monga,
Sachin Kumar,
Uddeshya Singh,
Andrew Bradley,
Biplab Banerjee,
Fabio Cuzzolin
Abstract:
We present ASTRA (A Scene-aware TRAnsformer-based model for trajectory prediction), a light-weight pedestrian trajectory forecasting model that integrates the scene context, spatial dynamics, social inter-agent interactions and temporal progressions for precise forecasting. We utilised a U-Net-based feature extractor, via its latent vector representation, to capture scene representations and a graph-aware transformer encoder for capturing social interactions. These components are integrated to learn an agent-scene aware embedding, enabling the model to learn spatial dynamics and forecast the future trajectory of pedestrians. The model is designed to produce both deterministic and stochastic outcomes, with the stochastic predictions being generated by incorporating a Conditional Variational Auto-Encoder (CVAE). ASTRA also proposes a simple yet effective weighted penalty loss function, which helps to yield predictions that outperform a wide array of state-of-the-art deterministic and generative models. ASTRA demonstrates an average improvement of 27%/10% in deterministic/stochastic settings on the ETH-UCY dataset, and 26% improvement on the PIE dataset, respectively, along with seven times fewer parameters than the existing state-of-the-art model (see Figure 1). Additionally, the model's versatility allows it to generalize across different perspectives, such as Bird's Eye View (BEV) and Ego-Vehicle View (EVV).
Submitted 16 January, 2025;
originally announced January 2025.
-
ROAD-Waymo: Action Awareness at Scale for Autonomous Driving
Authors:
Salman Khan,
Izzeddin Teeti,
Reza Javanmard Alitappeh,
Mihaela C. Stoian,
Eleonora Giunchiglia,
Gurkirt Singh,
Andrew Bradley,
Fabio Cuzzolin
Abstract:
Autonomous Vehicle (AV) perception systems require more than simply seeing, via e.g., object detection or scene segmentation. They need a holistic understanding of what is happening within the scene for safe interaction with other road users. Few datasets exist for the purpose of developing and training algorithms to comprehend the actions of other road users. This paper presents ROAD-Waymo, an extensive dataset for the development and benchmarking of techniques for agent, action, location and event detection in road scenes, provided as a layer upon the (US) Waymo Open dataset. Considerably larger and more challenging than any existing dataset (and encompassing multiple cities), it comes with 198k annotated video frames, 54k agent tubes, 3.9M bounding boxes and a total of 12.4M labels. The integrity of the dataset has been confirmed and enhanced via a novel annotation pipeline designed for automatically identifying violations of requirements specifically designed for this dataset. As ROAD-Waymo is compatible with the original (UK) ROAD dataset, it provides the opportunity to tackle domain adaptation between real-world road scenarios in different countries within a novel benchmark: ROAD++.
Submitted 8 November, 2024; v1 submitted 3 November, 2024;
originally announced November 2024.
-
Strategic management analysis: from data to strategy diagram by LLM
Authors:
Richard Brath,
Adam Bradley,
David Jonker
Abstract:
Strategy management analyses are created by business consultants with common analysis frameworks (i.e. comparative analyses) and associated diagrams. We show these can be largely constructed using LLMs, starting with the extraction of insights from data, organization of those insights according to a strategy management framework, and then depiction in the typical strategy management diagram for that framework (static textual visualizations). We discuss caveats and future directions to generalize for broader uses.
Submitted 10 September, 2024;
originally announced September 2024.
-
Classifier-Free Guidance is a Predictor-Corrector
Authors:
Arwen Bradley,
Preetum Nakkiran
Abstract:
We investigate the theoretical foundations of classifier-free guidance (CFG). CFG is the dominant method of conditional sampling for text-to-image diffusion models, yet unlike other aspects of diffusion, it remains on shaky theoretical footing. In this paper, we disprove common misconceptions by showing that CFG interacts differently with DDPM (Ho et al., 2020) and DDIM (Song et al., 2021), and that neither sampler with CFG generates the gamma-powered distribution $p(x|c)^{\gamma} p(x)^{1-\gamma}$. Then, we clarify the behavior of CFG by showing that it is a kind of predictor-corrector method (Song et al., 2020) that alternates between denoising and sharpening, which we call predictor-corrector guidance (PCG). We prove that in the SDE limit, CFG is actually equivalent to combining a DDIM predictor for the conditional distribution together with a Langevin dynamics corrector for a gamma-powered distribution (with a carefully chosen gamma). Our work thus provides a lens to theoretically understand CFG by embedding it in a broader design space of principled sampling methods.
Submitted 23 August, 2024; v1 submitted 16 August, 2024;
originally announced August 2024.
-
Step-by-Step Diffusion: An Elementary Tutorial
Authors:
Preetum Nakkiran,
Arwen Bradley,
Hattie Zhou,
Madhu Advani
Abstract:
We present an accessible first course on diffusion models and flow matching for machine learning, aimed at a technical audience with no diffusion experience. We try to simplify the mathematical details as much as possible (sometimes heuristically), while retaining enough precision to derive correct algorithms.
Submitted 23 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Vanishing Gradients in Reinforcement Finetuning of Language Models
Authors:
Noam Razin,
Hattie Zhou,
Omid Saremi,
Vimal Thilak,
Arwen Bradley,
Preetum Nakkiran,
Joshua Susskind,
Etai Littwin
Abstract:
Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms. This work identifies a fundamental optimization obstacle in RFT: we prove that the expected gradient for an input vanishes when its reward standard deviation under the model is small, even if the expected reward is far from optimal. Through experiments on an RFT benchmark and controlled environments, as well as a theoretical analysis, we then demonstrate that vanishing gradients due to small reward standard deviation are prevalent and detrimental, leading to extremely slow reward maximization. Lastly, we explore ways to overcome vanishing gradients in RFT. We find the common practice of an initial supervised finetuning (SFT) phase to be the most promising candidate, which sheds light on its importance in an RFT pipeline. Moreover, we show that a relatively small number of SFT optimization steps on as few as 1% of the input samples can suffice, indicating that the initial SFT phase need not be expensive in terms of compute and data labeling efforts. Overall, our results emphasize that being mindful of inputs whose expected gradient vanishes, as measured by the reward standard deviation, is crucial for successful execution of RFT.
Submitted 14 March, 2024; v1 submitted 31 October, 2023;
originally announced October 2023.
-
A Hybrid Graph Network for Complex Activity Detection in Video
Authors:
Salman Khan,
Izzeddin Teeti,
Andrew Bradley,
Mohamed Elhoseiny,
Fabio Cuzzolin
Abstract:
Interpretation and understanding of video presents a challenging computer vision task in numerous fields - e.g. autonomous driving and sports analytics. Existing approaches to interpreting the actions taking place within a video clip are based upon Temporal Action Localisation (TAL), which typically identifies short-term actions. The emerging field of Complex Activity Detection (CompAD) extends this analysis to long-term activities, with a deeper understanding obtained by modelling the internal structure of a complex activity taking place within the video. We address the CompAD problem using a hybrid graph neural network which combines attention applied to a graph encoding the local (short-term) dynamic scene with a temporal graph modelling the overall long-duration activity. Our approach is as follows: i) Firstly, we propose a novel feature extraction technique which, for each video snippet, generates spatiotemporal 'tubes' for the active elements ('agents') in the (local) scene by detecting individual objects, tracking them and then extracting 3D features from all the agent tubes as well as the overall scene. ii) Next, we construct a local scene graph where each node (representing either an agent tube or the scene) is connected to all other nodes. Attention is then applied to this graph to obtain an overall representation of the local dynamic scene. iii) Finally, all local scene graph representations are interconnected via a temporal graph, to estimate the complex activity class together with its start and end time. The proposed framework outperforms all previous state-of-the-art methods on all three datasets: ActivityNet-1.3, Thumos-14, and ROAD.
Submitted 30 October, 2023; v1 submitted 26 October, 2023;
originally announced October 2023.
-
What Algorithms can Transformers Learn? A Study in Length Generalization
Authors:
Hattie Zhou,
Arwen Bradley,
Etai Littwin,
Noam Razin,
Omid Saremi,
Josh Susskind,
Samy Bengio,
Preetum Nakkiran
Abstract:
Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a unifying framework to understand when and how Transformers can exhibit strong length generalization on a given task. Specifically, we leverage RASP (Weiss et al., 2021) -- a programming language designed for the computational model of a Transformer -- and introduce the RASP-Generalization Conjecture: Transformers tend to length generalize on a task if the task can be solved by a short RASP program which works for all input lengths. This simple conjecture remarkably captures most known instances of length generalization on algorithmic tasks. Moreover, we leverage our insights to drastically improve generalization performance on traditionally hard tasks (such as parity and addition). On the theoretical side, we give a simple example where the "min-degree-interpolator" model of learning from Abbe et al. (2023) does not correctly predict Transformers' out-of-distribution behavior, but our conjecture does. Overall, our work provides a novel perspective on the mechanisms of compositional generalization and the algorithmic capabilities of Transformers.
Submitted 24 October, 2023;
originally announced October 2023.
-
Temporal DINO: A Self-supervised Video Strategy to Enhance Action Prediction
Authors:
Izzeddin Teeti,
Rongali Sai Bhargav,
Vivek Singh,
Andrew Bradley,
Biplab Banerjee,
Fabio Cuzzolin
Abstract:
The emerging field of action prediction plays a vital role in various computer vision applications such as autonomous driving, activity analysis and human-computer interaction. Despite significant advancements, accurately predicting future actions remains a challenging problem due to high dimensionality, complex dynamics and uncertainties inherent in video data. Traditional supervised approaches require large amounts of labelled data, which is expensive and time-consuming to obtain. This paper introduces a novel self-supervised video strategy for enhancing action prediction inspired by DINO (self-distillation with no labels). The Temporal-DINO approach employs two models: a 'student' processing past frames, and a 'teacher' processing both past and future frames, enabling a broader temporal context. During training, the teacher guides the student to learn future context by only observing past frames. The strategy is evaluated on the ROAD dataset for the action prediction downstream task using 3D-ResNet, Transformer, and LSTM architectures. The experimental results showcase significant improvements in prediction performance across these architectures, with our method achieving an average enhancement of 9.9% Precision Points (PP), highlighting its effectiveness in enhancing the backbones' capabilities of capturing long-term dependencies. Furthermore, our approach demonstrates efficiency regarding the pretraining dataset size and the number of epochs required. This method overcomes limitations present in other approaches, including considering various backbone architectures, addressing multiple prediction horizons, reducing reliance on hand-crafted augmentations, and streamlining the pretraining process into a single stage. These findings highlight the potential of our approach in diverse video-based tasks such as activity recognition, motion planning, and scene understanding.
Submitted 20 August, 2023; v1 submitted 8 August, 2023;
originally announced August 2023.
-
A Scenario-Based Functional Testing Approach to Improving DNN Performance
Authors:
Hong Zhu,
Thi Minh Tam Tran,
Aduen Benjumea,
Andrew Bradley
Abstract:
This paper proposes a scenario-based functional testing approach for enhancing the performance of machine learning (ML) applications. The proposed method is an iterative process that starts with testing the ML model on various scenarios to identify areas of weakness. This is followed by further testing on the suspected weak scenarios and a statistical evaluation of the model's performance on those scenarios to confirm the diagnosis. Once the diagnosis of weak scenarios is confirmed by test results, the model is treated by retraining it using a transfer learning technique, with the original model as the base, on a set of training data specifically targeting the treated scenarios plus a subset of training data selected at random from the original training dataset to prevent the so-called catastrophic forgetting effect. Finally, after the treatment, the model is assessed and evaluated again by testing on the treated scenarios as well as other scenarios to check that the treatment is effective and that no side effects are caused. The paper reports a case study with a real ML deep neural network (DNN) model, which is the perception system of an autonomous racing car. It is demonstrated that the method is effective in the sense that the DNN model's performance can be improved. It provides an efficient method of enhancing an ML model's performance with much less human and compute resource than retraining from scratch.
Submitted 13 July, 2023;
originally announced July 2023.
-
Never mind the metrics -- what about the uncertainty? Visualising confusion matrix metric distributions
Authors:
David Lovell,
Dimity Miller,
Jaiden Capra,
Andrew Bradley
Abstract:
There are strong incentives to build models that demonstrate outstanding predictive performance on various datasets and benchmarks. We believe these incentives risk a narrow focus on models and on the performance metrics used to evaluate and compare them -- resulting in a growing body of literature to evaluate and compare metrics. This paper strives for a more balanced perspective on classifier performance metrics by highlighting their distributions under different models of uncertainty and showing how this uncertainty can easily eclipse differences in the empirical performance of classifiers. We begin by emphasising the fundamentally discrete nature of empirical confusion matrices and show how binary matrices can be meaningfully represented in a three-dimensional compositional lattice, whose cross-sections form the basis of the space of receiver operating characteristic (ROC) curves. We develop equations, animations and interactive visualisations of the contours of performance metrics within (and beyond) this ROC space, showing how some are affected by class imbalance. We provide interactive visualisations that show the discrete posterior predictive probability mass functions of true and false positive rates in ROC space, and how these relate to uncertainty in performance metrics such as Balanced Accuracy (BA) and the Matthews Correlation Coefficient (MCC). Our hope is that these insights and visualisations will raise greater awareness of the substantial uncertainty in performance metric estimates that can arise when classifiers are evaluated on empirical datasets and benchmarks, and that classification model performance claims should be tempered by this understanding.
Submitted 5 June, 2022;
originally announced June 2022.
-
Simulating Malicious Attacks on VANETs for Connected and Autonomous Vehicle Cybersecurity: A Machine Learning Dataset
Authors:
Safras Iqbal,
Peter Ball,
Muhammad H Kamarudin,
Andrew Bradley
Abstract:
Connected and Autonomous Vehicles (CAVs) rely on Vehicular Ad hoc Networks (VANETs) with wireless communication between vehicles and roadside infrastructure to support safe operation. However, cybersecurity attacks pose a threat to VANETs and the safe operation of CAVs. This study proposes the use of simulation for modelling typical communication scenarios which may be subject to malicious attacks. The Eclipse MOSAIC simulation framework is used to model two typical road scenarios, including messaging between the vehicles and infrastructure - and both replay and bogus information cybersecurity attacks are introduced. The model demonstrates the impact of these attacks, and provides an open dataset to inform the development of machine learning algorithms to provide anomaly detection and mitigation solutions for enhancing secure communications and safe deployment of CAVs on the road.
Submitted 15 February, 2022;
originally announced February 2022.
-
Vision in adverse weather: Augmentation using CycleGANs with various object detectors for robust perception in autonomous racing
Authors:
Izzeddin Teeti,
Valentina Musat,
Salman Khan,
Alexander Rast,
Fabio Cuzzolin,
Andrew Bradley
Abstract:
In an autonomous driving system, perception - identification of features and objects from the environment - is crucial. In autonomous racing, high speeds and small margins demand rapid and accurate detection systems. During the race, the weather can change abruptly, causing significant degradation in perception, resulting in ineffective manoeuvres. In order to improve detection in adverse weather, deep-learning-based models typically require extensive datasets captured in such conditions - the collection of which is a tedious, laborious, and costly process. However, recent developments in CycleGAN architectures allow the synthesis of highly realistic scenes in multiple weather conditions. To this end, we introduce an approach of using synthesised adverse condition datasets in autonomous racing (generated using CycleGAN) to improve the performance of four out of five state-of-the-art detectors by an average of 42.7 and 4.4 mAP percentage points in the presence of night-time conditions and droplets, respectively. Furthermore, we present a comparative analysis of five object detectors - identifying the optimal pairing of detector and training data for use during autonomous racing in challenging conditions.
Submitted 2 January, 2023; v1 submitted 10 January, 2022;
originally announced January 2022.
-
YOLO-Z: Improving small object detection in YOLOv5 for autonomous vehicles
Authors:
Aduen Benjumea,
Izzeddin Teeti,
Fabio Cuzzolin,
Andrew Bradley
Abstract:
As autonomous vehicles and autonomous racing rise in popularity, so does the need for faster and more accurate detectors. While our naked eyes are able to extract contextual information almost instantly, even from far away, image resolution and computational resource limitations make detecting smaller objects (that is, objects that occupy a small pixel area in the input image) a genuinely challenging task for machines and a wide-open research field. This study explores how the popular YOLOv5 object detector can be modified to improve its performance in detecting smaller objects, with a particular application in autonomous racing. To achieve this, we investigate how replacing certain structural elements of the model (as well as their connections and other parameters) can affect performance and inference time. In doing so, we propose a series of models at different scales, which we name 'YOLO-Z', and which display an improvement of up to 6.9% in mAP when detecting smaller objects at 50% IOU, at the cost of just a 3 ms increase in inference time compared to the original YOLOv5. Our objective is to inform future research on the potential of adjusting a popular detector such as YOLOv5 to address specific tasks and provide insights on how specific changes can impact small object detection. Such findings, applied to the broader context of autonomous vehicles, could increase the amount of contextual information available to such systems.
Submitted 3 January, 2023; v1 submitted 22 December, 2021;
originally announced December 2021.
-
Shift-Curvature, SGD, and Generalization
Authors:
Arwen V. Bradley,
Carlos Alberto Gomez-Uribe,
Manish Reddy Vuyyuru
Abstract:
A longstanding debate surrounds the related hypotheses that low-curvature minima generalize better, and that SGD discourages curvature. We offer a more complete and nuanced view in support of both. First, we show that curvature harms test performance through two new mechanisms, the shift-curvature and bias-curvature, in addition to a known parameter-covariance mechanism. The three curvature-mediated contributions to test performance are reparametrization-invariant although curvature is not. The shift in the shift-curvature is the line connecting train and test local minima, which differ due to dataset sampling or distribution shift. Although the shift is unknown at training time, the shift-curvature can still be mitigated by minimizing overall curvature. Second, we derive a new, explicit SGD steady-state distribution showing that SGD optimizes an effective potential related to but different from train loss, and that SGD noise mediates a trade-off between deep versus low-curvature regions of this effective potential. Third, combining our test performance analysis with the SGD steady state shows that for small SGD noise, the shift-curvature may be the most significant of the three mechanisms. Our experiments confirm the impact of shift-curvature on test loss, and further explore the relationship between SGD noise and curvature.
Submitted 27 July, 2022; v1 submitted 21 August, 2021;
originally announced August 2021.
-
Worsening Perception: Real-time Degradation of Autonomous Vehicle Perception Performance for Simulation of Adverse Weather Conditions
Authors:
Ivan Fursa,
Elias Fandi,
Valentina Musat,
Jacob Culley,
Enric Gil,
Izzeddin Teeti,
Louise Bilous,
Isaac Vander Sluis,
Alexander Rast,
Andrew Bradley
Abstract:
Autonomous vehicles rely heavily upon their perception subsystems to see the environment in which they operate. Unfortunately, the effect of variable weather conditions presents a significant challenge to object detection algorithms, and thus it is imperative to test the vehicle extensively in all conditions which it may experience. However, development of robust autonomous vehicle subsystems requires repeatable, controlled testing - while real weather is unpredictable and cannot be scheduled. Real-world testing in adverse conditions is an expensive and time-consuming task, often requiring access to specialist facilities. Simulation is commonly relied upon as a substitute, with increasingly visually realistic representations of the real world being developed. In the context of the complete autonomous vehicle control pipeline, subsystems downstream of perception need to be tested with accurate recreations of the perception system output, rather than focusing on subjective visual realism of the input - whether in simulation or the real world. This study develops the untapped potential of a lightweight weather augmentation method in an autonomous racing vehicle - focusing not on visual accuracy, but rather the effect upon perception subsystem performance in real time. With minimal adjustment, the prototype developed in this study can replicate the effects of water droplets on the camera lens, and fading light conditions. This approach introduces a latency of less than 8 ms using compute hardware well suited to being carried in the vehicle - rendering it ideal for real-time implementation that can be run during experiments in simulation, and augmented reality testing in the real world.
Submitted 7 July, 2021; v1 submitted 3 March, 2021;
originally announced March 2021.
-
ROAD: The ROad event Awareness Dataset for Autonomous Driving
Authors:
Gurkirt Singh,
Stephen Akrigg,
Manuele Di Maio,
Valentina Fontana,
Reza Javanmard Alitappeh,
Suman Saha,
Kossar Jeddisaravi,
Farzad Yousefi,
Jacob Culley,
Tom Nicholson,
Jordan Omokeowa,
Salman Khan,
Stanislao Grazioso,
Andrew Bradley,
Giuseppe Di Gironimo,
Fabio Cuzzolin
Abstract:
Humans drive in a holistic fashion which entails, in particular, understanding dynamic road events and their evolution. Injecting these capabilities into autonomous vehicles can thus take situational awareness and decision making closer to human-level performance. To this end, we introduce the ROad event Awareness Dataset (ROAD) for Autonomous Driving, to our knowledge the first of its kind. ROAD is designed to test an autonomous vehicle's ability to detect road events, defined as triplets composed of an active agent, the action(s) it performs and the corresponding scene locations. ROAD comprises videos originally from the Oxford RobotCar Dataset annotated with bounding boxes showing the location in the image plane of each road event. We benchmark various detection tasks, proposing as a baseline a new incremental algorithm for online road event awareness termed 3D-RetinaNet. We also report the performance on the ROAD tasks of Slowfast and YOLOv5 detectors, as well as that of the winners of the ICCV2021 ROAD challenge, which highlight the challenges faced by situation awareness in autonomous driving. ROAD is designed to allow scholars to investigate exciting tasks such as complex (road) activity detection, future event anticipation and continual learning. The dataset is available at https://github.com/gurkirt/road-dataset; the baseline can be found at https://github.com/gurkirt/3D-RetinaNet.
Submitted 1 April, 2022; v1 submitted 23 February, 2021;
originally announced February 2021.
-
Real-Time Optimal Trajectory Planning for Autonomous Vehicles and Lap Time Simulation Using Machine Learning
Authors:
Sam Garlick,
Andrew Bradley
Abstract:
Widespread development of driverless vehicles has led to the formation of autonomous racing, where technological development is accelerated by the high speeds and competitive environment of motorsport. A particular challenge for an autonomous vehicle is that of identifying a target trajectory - or, in the case of a competition vehicle, the racing line. Many existing approaches to finding the racing line are either not time-optimal solutions, or are computationally expensive - rendering them unsuitable for real-time application using on-board processing hardware. This study describes a machine learning approach to generating an accurate prediction of the racing line in real-time on desktop processing hardware. The proposed algorithm is a feed-forward neural network, trained using a dataset comprising racing lines for a large number of circuits calculated via traditional optimal control lap time simulation. The network predicts the racing line with a mean absolute error of +/-0.27m, and just +/-0.11m at corner apex - comparable to human drivers, and autonomous vehicle control subsystems. The approach generates predictions within 33ms, making it over 9,000 times faster than traditional methods of finding the optimal trajectory. Results suggest that for certain applications data-driven approaches to find near-optimal racing lines may be favourable to traditional computational methods.
Submitted 15 September, 2021; v1 submitted 3 February, 2021;
originally announced February 2021.
-
Cinematic-L1 Video Stabilization with a Log-Homography Model
Authors:
Arwen Bradley,
Jason Klivington,
Joseph Triscari,
Rudolph van der Merwe
Abstract:
We present a method for stabilizing handheld video that simulates the camera motions cinematographers achieve with equipment like tripods, dollies, and Steadicams. We formulate a constrained convex optimization problem minimizing the $\ell_1$-norm of the first three derivatives of the stabilized motion. Our approach extends the work of Grundmann et al. [9] by solving with full homographies (rather than affinities) in order to correct perspective, preserving linearity by working in log-homography space. We also construct crop constraints that preserve field-of-view; model the problem as a quadratic (rather than linear) program to allow for an $\ell_2$ term encouraging fidelity to the original trajectory; and add constraints and objectives to reduce distortion. Furthermore, we propose new methods for handling salient objects via both inclusion constraints and centering objectives. Finally, we describe a windowing strategy to approximate the solution in linear time and bounded memory. Our method is computationally efficient, running at 300fps on an iPhone XS, and yields high-quality results, as we demonstrate with a collection of stabilized videos, quantitative and qualitative comparisons to [9] and other methods, and an ablation study.
Submitted 20 November, 2020; v1 submitted 16 November, 2020;
originally announced November 2020.
-
Automatic lesion detection, segmentation and characterization via 3D multiscale morphological sifting in breast MRI
Authors:
Hang Min,
Darryl McClymont,
Shekhar S. Chandra,
Stuart Crozier,
Andrew P. Bradley
Abstract:
Previous studies on computer aided detection/diagnosis (CAD) in 4D breast magnetic resonance imaging (MRI) regard lesion detection, segmentation and characterization as separate tasks, and typically require users to manually select 2D MRI slices or regions of interest as the input. In this work, we present a breast MRI CAD system that can handle 4D multimodal breast MRI data, and integrate lesion detection, segmentation and characterization with no user intervention. The proposed CAD system consists of three major stages: region candidate generation, feature extraction and region candidate classification. Breast lesions are firstly extracted as region candidates using the novel 3D multiscale morphological sifting (MMS). The 3D MMS, which uses linear structuring elements to extract lesion-like patterns, can segment lesions from breast images accurately and efficiently. Analytical features are then extracted from all available 4D multimodal breast MRI sequences, including T1-, T2-weighted and DCE sequences, to represent the signal intensity, texture, morphological and enhancement kinetic characteristics of the region candidates. The region candidates are lastly classified as lesion or normal tissue by the random under-sampling boost (RUSboost), and as malignant or benign lesion by the random forest. Evaluated on a breast MRI dataset which contains a total of 117 cases with 95 malignant and 46 benign lesions, the proposed system achieves a true positive rate (TPR) of 0.90 at 3.19 false positives per patient (FPP) for lesion detection and a TPR of 0.91 at a FPP of 2.95 for identifying malignant lesions without any user intervention. The average dice similarity index (DSI) is 0.72 for lesion segmentation. Compared with previously proposed systems evaluated on the same breast MRI dataset, the proposed CAD system achieves a favourable performance in breast lesion detection and characterization.
Submitted 7 July, 2020;
originally announced July 2020.
-
Design by Immersion: A Transdisciplinary Approach to Problem-Driven Visualizations
Authors:
Kyle Wm. Hall,
Adam J. Bradley,
Uta Hinrichs,
Samuel Huron,
Jo Wood,
Christopher Collins,
Sheelagh Carpendale
Abstract:
While previous work exists on how to conduct and disseminate insights from problem-driven visualization projects and design studies, the literature does not address how to accomplish these goals in transdisciplinary teams in ways that advance all disciplines involved. In this paper we introduce and define a new methodological paradigm we call design by immersion, which provides an alternative perspective on problem-driven visualization work. Design by immersion embeds transdisciplinary experiences at the center of the visualization process by having visualization researchers participate in the work of the target domain (or domain experts participate in visualization research). Based on our own combined experiences of working on cross-disciplinary, problem-driven visualization projects, we present six case studies that expose the opportunities that design by immersion enables, including (1) exploring new domain-inspired visualization design spaces, (2) enriching domain understanding through personal experiences, and (3) building strong transdisciplinary relationships. Furthermore, we illustrate how the process of design by immersion opens up a diverse set of design activities that can be combined in different ways depending on the type of collaboration, project, and goals. Finally, we discuss the challenges and potential pitfalls of design by immersion.
Submitted 17 October, 2019; v1 submitted 1 August, 2019;
originally announced August 2019.
-
Fully automatic computer-aided mass detection and segmentation via pseudo-color mammograms and Mask R-CNN
Authors:
Hang Min,
Devin Wilson,
Yinhuang Huang,
Siyu Liu,
Stuart Crozier,
Andrew P Bradley,
Shekhar S. Chandra
Abstract:
Mammographic mass detection and segmentation are usually performed as serial and separate tasks, with segmentation often only performed on manually confirmed true positive detections in previous studies. We propose a fully-integrated computer-aided detection (CAD) system for simultaneous mammographic mass detection and segmentation without user intervention. The proposed CAD consists only of a pseudo-color image generation stage and a mass detection-segmentation stage based on Mask R-CNN. Grayscale mammograms are transformed into pseudo-color images based on multi-scale morphological sifting where mass-like patterns are enhanced to improve the performance of Mask R-CNN. Transfer learning with the Mask R-CNN is then adopted to simultaneously detect and segment masses on the pseudo-color images. Evaluated on the public dataset INbreast, the method outperforms the state-of-the-art methods by achieving an average true positive rate of 0.90 at 0.9 false positives per image and an average Dice similarity index of 0.88 for mass segmentation.
Submitted 19 October, 2019; v1 submitted 28 June, 2019;
originally announced June 2019.
-
Pre and Post-hoc Diagnosis and Interpretation of Malignancy from Breast DCE-MRI
Authors:
Gabriel Maicas,
Andrew P. Bradley,
Jacinto C. Nascimento,
Ian Reid,
Gustavo Carneiro
Abstract:
We propose a new method for breast cancer screening from DCE-MRI based on a post-hoc approach that is trained using weakly annotated data (i.e., labels are available only at the image level without any lesion delineation). Our proposed post-hoc method automatically diagnoses the whole volume and, for positive cases, it localizes the malignant lesions that led to such diagnosis. Conversely, traditional approaches follow a pre-hoc approach that initially localises suspicious areas that are subsequently classified to establish the breast malignancy -- this approach is trained using strongly annotated data (i.e., it needs a delineation and classification of all lesions in an image). Another goal of this paper is to establish the advantages and disadvantages of both approaches when applied to breast screening from DCE-MRI. Relying on experiments on a breast DCE-MRI dataset that contains scans of 117 patients, our results show that the post-hoc method is more accurate for diagnosing the whole volume per patient, achieving an AUC of 0.91, while the pre-hoc method achieves an AUC of 0.81. However, the performance for localising the malignant lesions remains challenging for the post-hoc method due to the weakly labelled dataset employed during training.
Submitted 3 February, 2019; v1 submitted 25 September, 2018;
originally announced September 2018.
-
Model Agnostic Saliency for Weakly Supervised Lesion Detection from Breast DCE-MRI
Authors:
Gabriel Maicas,
Gerard Snaauw,
Andrew P. Bradley,
Ian Reid,
Gustavo Carneiro
Abstract:
There is a heated debate on how to interpret the decisions provided by deep learning models (DLM), where the main approaches rely on the visualization of salient regions to interpret the DLM classification process. However, these approaches generally fail to satisfy three conditions for the problem of lesion detection from medical images: 1) for images with lesions, all salient regions should represent lesions, 2) for images containing no lesions, no salient region should be produced, and 3) lesions are generally small with relatively smooth borders. We propose a new model-agnostic paradigm to interpret DLM classification decisions supported by a novel definition of saliency that incorporates the conditions above. Our model-agnostic 1-class saliency detector (MASD) is tested on weakly supervised breast lesion detection from DCE-MRI, achieving state-of-the-art detection accuracy when compared to current visualization methods.
Submitted 4 February, 2019; v1 submitted 20 July, 2018;
originally announced July 2018.
-
Why rankings of biomedical image analysis competitions should be interpreted with care
Authors:
Lena Maier-Hein,
Matthias Eisenmann,
Annika Reinke,
Sinan Onogur,
Marko Stankovic,
Patrick Scholz,
Tal Arbel,
Hrvoje Bogunovic,
Andrew P. Bradley,
Aaron Carass,
Carolin Feldmann,
Alejandro F. Frangi,
Peter M. Full,
Bram van Ginneken,
Allan Hanbury,
Katrin Honauer,
Michal Kozubek,
Bennett A. Landman,
Keno März,
Oskar Maier,
Klaus Maier-Hein,
Bjoern H. Menze,
Henning Müller,
Peter F. Neher,
Wiro Niessen,
et al. (13 additional authors not shown)
Abstract:
International challenges have become the standard for validation of biomedical image analysis methods. Given their scientific impact, it is surprising that a critical analysis of common practices related to the organization of challenges has not yet been performed. In this paper, we present a comprehensive analysis of biomedical image analysis challenges conducted up to now. We demonstrate the importance of challenges and show that the lack of quality control has critical consequences. First, reproducibility and interpretation of the results are often hampered as only a fraction of relevant information is typically provided. Second, the rank of an algorithm is generally not robust to a number of variables such as the test data used for validation, the ranking scheme applied and the observers that make the reference annotations. To overcome these problems, we recommend best practice guidelines and define open research questions to be addressed in the future.
Submitted 18 September, 2019; v1 submitted 6 June, 2018;
originally announced June 2018.
-
Producing radiologist-quality reports for interpretable artificial intelligence
Authors:
William Gale,
Luke Oakden-Rayner,
Gustavo Carneiro,
Andrew P Bradley,
Lyle J Palmer
Abstract:
Current approaches to explaining the decisions of deep learning systems for medical tasks have focused on visualising the elements that have contributed to each decision. We argue that such approaches are not enough to "open the black box" of medical decision making systems because they are missing a key component that has been used as a standard communication tool between doctors for centuries: language. We propose a model-agnostic interpretability method that involves training a simple recurrent neural network model to produce descriptive sentences to clarify the decision of deep learning classifiers.
We test our method on the task of detecting hip fractures from frontal pelvic x-rays. This process requires minimal additional labelling despite producing text containing elements that the original deep learning classification model was not specifically trained to detect.
The experimental results show that: 1) the sentences produced by our method consistently contain the desired information, 2) the generated sentences are preferred by doctors compared to current tools that create saliency maps, and 3) the combination of visualisations and generated text is better than either alone.
Submitted 1 June, 2018;
originally announced June 2018.
-
Training Medical Image Analysis Systems like Radiologists
Authors:
Gabriel Maicas,
Andrew P. Bradley,
Jacinto C. Nascimento,
Ian Reid,
Gustavo Carneiro
Abstract:
The training of medical image analysis systems using machine learning approaches follows a common script: collect and annotate a large dataset, train the classifier on the training set, and test it on a hold-out test set. This process bears no direct resemblance to radiologist training, which is based on solving a series of tasks of increasing difficulty, where each task involves the use of significantly smaller datasets than those used in machine learning. In this paper, we propose a novel training approach inspired by how radiologists are trained. In particular, we explore the use of meta-training that models a classifier based on a series of tasks. Tasks are selected using teacher-student curriculum learning, where each task consists of simple classification problems containing small training sets. We hypothesize that our proposed meta-training approach can be used to pre-train medical image analysis models. This hypothesis is tested on the automatic breast screening classification from DCE-MRI trained with weakly labeled datasets. The classification performance achieved by our approach is shown to be the best in the field for that application, compared to state-of-the-art baseline approaches: DenseNet, multiple instance learning and multi-task learning.
Submitted 4 February, 2019; v1 submitted 28 May, 2018;
originally announced May 2018.
-
Detecting hip fractures with radiologist-level performance using deep neural networks
Authors:
William Gale,
Luke Oakden-Rayner,
Gustavo Carneiro,
Andrew P. Bradley,
Lyle J. Palmer
Abstract:
We developed an automated deep learning system to detect hip fractures from frontal pelvic x-rays, an important and common radiological task. Our system was trained on a decade of clinical x-rays (~53,000 studies) and can be applied to clinical data, automatically excluding inappropriate and technically unsatisfactory studies. We demonstrate diagnostic performance equivalent to a human radiologist and an area under the ROC curve of 0.994. Translated to clinical practice, such a system has the potential to increase the efficiency of diagnosis, reduce the need for expensive additional testing, expand access to expert level medical image interpretation, and improve overall patient outcomes.
Submitted 17 November, 2017;
originally announced November 2017.
-
Automated Detection of Individual Micro-calcifications from Mammograms using a Multi-stage Cascade Approach
Authors:
Zhi Lu,
Gustavo Carneiro,
Neeraj Dhungel,
Andrew P. Bradley
Abstract:
In mammography, the efficacy of computer-aided detection methods depends, in part, on the robust localisation of micro-calcifications ($μ$C). Currently, the most effective methods are based on three steps: 1) detection of individual $μ$C candidates, 2) clustering of individual $μ$C candidates, and 3) classification of $μ$C clusters. The second step is motivated both by the need to reduce the number of false positive detections from the first step and by the evidence that malignancy depends on a relatively large number of $μ$C detections within a certain area. In this paper, we propose a novel approach to $μ$C detection, consisting of the detection and classification of individual $μ$C candidates, based on shape and appearance features and a cascade of boosting classifiers. The final step in our approach then clusters the remaining individual $μ$C candidates. The main advantage of this approach lies in its ability to reject a significant number of false positive $μ$C candidates compared to previously proposed methods. Specifically, on the INbreast dataset, we show that our approach has a true positive rate (TPR) for individual $μ$Cs of 40% at one false positive per image (FPI) and a TPR of 80% at 10 FPI. These results are significantly more accurate than the current state of the art, which has a TPR of less than 1% at one FPI and a TPR of 10% at 10 FPI. Our results are competitive with the state of the art at the subsequent stage of detecting clusters of $μ$Cs.
Submitted 7 October, 2016;
originally announced October 2016.
-
Automated 5-year Mortality Prediction using Deep Learning and Radiomics Features from Chest Computed Tomography
Authors:
Gustavo Carneiro,
Luke Oakden-Rayner,
Andrew P. Bradley,
Jacinto Nascimento,
Lyle Palmer
Abstract:
We propose new methods for the prediction of 5-year mortality in elderly individuals using chest computed tomography (CT). The methods consist of a classifier that performs this prediction using a set of features extracted from the CT image and segmentation maps of multiple anatomic structures. We explore two approaches: 1) a unified framework based on deep learning, where features and classifier are automatically learned in a single optimisation process; and 2) a multi-stage framework based on the design and selection/extraction of hand-crafted radiomics features, followed by the classifier learning process. Experimental results, based on a dataset of 48 annotated chest CTs, show that the deep learning model produces a mean 5-year mortality prediction accuracy of 68.5%, while radiomics produces a mean accuracy that varies between 56% and 66% (depending on the feature selection/extraction method and classifier). The successful development of the proposed models has the potential to make a profound impact in preventive and personalised healthcare.
Submitted 1 July, 2016;
originally announced July 2016.
-
Deep Structured learning for mass segmentation from Mammograms
Authors:
Neeraj Dhungel,
Gustavo Carneiro,
Andrew P. Bradley
Abstract:
In this paper, we present a novel method for the segmentation of breast masses from mammograms exploring structured and deep learning. Specifically, using structured support vector machine (SSVM), we formulate a model that combines different types of potential functions, including one that classifies image regions using deep learning. Our main goal with this work is to show the accuracy and efficiency improvements that these relatively new techniques can provide for the segmentation of breast masses from mammograms. We also propose an easily reproducible quantitative analysis to assess the performance of breast mass segmentation methodologies based on widely accepted accuracy and running time measurements on public datasets, which will facilitate further comparisons for this segmentation problem. In particular, we use two publicly available datasets (DDSM-BCRP and INbreast) and propose the computation of the running time taken for the methodology to produce a mass segmentation given an input image and the use of the Dice index to quantitatively measure the segmentation accuracy. For both databases, we show that our proposed methodology produces competitive results in terms of accuracy and running time.
Submitted 4 December, 2014; v1 submitted 27 October, 2014;
originally announced October 2014.
-
k-Step Relative Inductive Generalization
Authors:
Aaron R. Bradley
Abstract:
We introduce a new form of SAT-based symbolic model checking. One common idea in SAT-based symbolic model checking is to generate new clauses from states that can lead to property violations. Our previous work suggests applying induction to generalize from such states. While effective on some benchmarks, the main problem with inductive generalization is that not all such states can be inductively generalized at a given time in the analysis, resulting in long searches for generalizable states on some benchmarks. This paper introduces the idea of inductively generalizing states relative to $k$-step over-approximations: a given state is inductively generalized relative to the latest $k$-step over-approximation relative to which the negation of the state is itself inductive. This idea motivates an algorithm that inductively generalizes a given state at the highest level $k$ so far examined, possibly by generating more than one mutually $k$-step relative inductive clause. We present experimental evidence that the algorithm is effective in practice.
△ Less
Submitted 18 March, 2010;
originally announced March 2010.