-
Brain Foundation Models with Hypergraph Dynamic Adapter for Brain Disease Analysis
Authors:
Zhongying Deng,
Haoyu Wang,
Ziyan Huang,
Lipei Zhang,
Angelica I. Aviles-Rivero,
Chaoyu Liu,
Junjun He,
Zoe Kourtzi,
Carola-Bibiane Schönlieb
Abstract:
Brain diseases, such as Alzheimer's disease and brain tumors, present profound challenges due to their complexity and societal impact. Recent advancements in brain foundation models have shown significant promise in addressing a range of brain-related tasks. However, current brain foundation models are limited by task and data homogeneity, restricted generalization beyond segmentation or classification, and inefficient adaptation to diverse clinical tasks. In this work, we propose SAM-Brain3D, a brain-specific foundation model trained on over 66,000 brain image-label pairs across 14 MRI sub-modalities, and Hypergraph Dynamic Adapter (HyDA), a lightweight adapter for efficient and effective downstream adaptation. SAM-Brain3D captures detailed brain-specific anatomical and modality priors for segmenting diverse brain targets and broader downstream tasks. HyDA leverages hypergraphs to fuse complementary multi-modal data and dynamically generate patient-specific convolutional kernels for multi-scale feature fusion and personalized patient-wise adaptation. Together, our framework excels across a broad spectrum of brain disease segmentation and classification tasks. Extensive experiments demonstrate that our method consistently outperforms existing state-of-the-art approaches, offering a new paradigm for brain disease analysis through multi-modal, multi-scale, and dynamic foundation modeling.
Submitted 1 May, 2025;
originally announced May 2025.
-
RSFR: A Coarse-to-Fine Reconstruction Framework for Diffusion Tensor Cardiac MRI with Semantic-Aware Refinement
Authors:
Jiahao Huang,
Fanwen Wang,
Pedro F. Ferreira,
Haosen Zhang,
Yinzhe Wu,
Zhifan Gao,
Lei Zhu,
Angelica I. Aviles-Rivero,
Carola-Bibiane Schonlieb,
Andrew D. Scott,
Zohya Khalique,
Maria Dwornik,
Ramyah Rajakulasingam,
Ranil De Silva,
Dudley J. Pennell,
Guang Yang,
Sonia Nielles-Vallespin
Abstract:
Cardiac diffusion tensor imaging (DTI) offers unique insights into cardiomyocyte arrangements, bridging the gap between microscopic and macroscopic cardiac function. However, its clinical utility is limited by technical challenges, including a low signal-to-noise ratio, aliasing artefacts, and the need for accurate quantitative fidelity. To address these limitations, we introduce RSFR (Reconstruction, Segmentation, Fusion & Refinement), a novel framework for cardiac diffusion-weighted image reconstruction. RSFR employs a coarse-to-fine strategy, leveraging zero-shot semantic priors via the Segment Anything Model and a robust Vision Mamba-based reconstruction backbone. Our framework integrates semantic features effectively to mitigate artefacts and enhance fidelity, achieving state-of-the-art reconstruction quality and accurate DT parameter estimation under high undersampling rates. Extensive experiments and ablation studies demonstrate the superior performance of RSFR compared to existing methods, highlighting its robustness, scalability, and potential for clinical translation in quantitative cardiac DTI.
Submitted 25 April, 2025;
originally announced April 2025.
-
Implicit U-KAN2.0: Dynamic, Efficient and Interpretable Medical Image Segmentation
Authors:
Chun-Wun Cheng,
Yining Zhao,
Yanqi Cheng,
Javier Montoya,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Image segmentation is a fundamental task in both image analysis and medical applications. State-of-the-art methods predominantly rely on encoder-decoder architectures with a U-shaped design, commonly referred to as U-Net. Recent advancements integrating transformers and MLPs improve performance but still face key limitations, such as poor interpretability, difficulty handling intrinsic noise, and constrained expressiveness due to discrete layer structures, often lacking a solid theoretical foundation. In this work, we introduce Implicit U-KAN 2.0, a novel U-Net variant that adopts a two-phase encoder-decoder structure. In the SONO phase, we use second-order neural ordinary differential equations (NODEs), in a component called the SONO block, for a more efficient, expressive, and theoretically grounded modeling approach. In the SONO-MultiKAN phase, we integrate the second-order NODEs and a MultiKAN layer as the core computational block to enhance interpretability and representation power. Our contributions are threefold. First, U-KAN 2.0 is an implicit deep neural network incorporating MultiKAN and second-order NODEs, improving interpretability and performance while reducing computational costs. Second, we provide a theoretical analysis demonstrating that the approximation ability of the MultiKAN block is independent of the input dimension. Third, we conduct extensive experiments on a variety of 2D datasets and a single 3D dataset, demonstrating that our model consistently outperforms existing segmentation networks.
Submitted 4 March, 2025;
originally announced March 2025.
-
A Sliding Layer Merging Method for Efficient Depth-Wise Pruning in LLMs
Authors:
Xuan Ding,
Rui Sun,
Yunjian Zhang,
Xiu Yan,
Yueqi Zhou,
Kaihao Huang,
Suzhong Fu,
Angelica I Aviles-Rivero,
Chuanlong Xie,
Yao Zhu
Abstract:
Compared to width-wise pruning, depth-wise pruning can significantly accelerate inference in resource-constrained scenarios. However, treating the entire Transformer layer as the minimum pruning unit may degrade model performance by indiscriminately discarding all of the layer's information. This paper reveals the "Patch-like" feature relationship between layers in large language models by analyzing the correlation of the outputs of different layers in the reproducing kernel Hilbert space. Building on this observation, we propose a sliding layer merging method that dynamically selects and fuses consecutive layers from top to bottom according to a pre-defined similarity threshold, thereby simplifying the model structure while maintaining its performance. Extensive experiments on LLMs with various architectures and different parameter scales show that our method outperforms existing pruning techniques in both zero-shot inference performance and retraining recovery quality after pruning. In particular, in the experiment with 35% pruning on the Vicuna-7B model, our method achieved a 1.654% improvement in average performance on zero-shot tasks compared to the existing method. Moreover, we further reveal the potential of combining depth pruning with width pruning to enhance the pruning effect. Our code is available at https://github.com/920927/SLM-a-sliding-layer-merging-method.
Submitted 15 May, 2025; v1 submitted 26 February, 2025;
originally announced February 2025.
-
ReFocus: Reinforcing Mid-Frequency and Key-Frequency Modeling for Multivariate Time Series Forecasting
Authors:
Guoqi Yu,
Yaoming Li,
Juncheng Wang,
Xiaoyu Guo,
Angelica I. Aviles-Rivero,
Tong Yang,
Shujun Wang
Abstract:
Recent advancements have progressively incorporated frequency-based techniques into deep learning models, leading to notable improvements in accuracy and efficiency for time series analysis tasks. However, the Mid-Frequency Spectrum Gap in real-world time series, where the energy is concentrated at the low-frequency region while the middle-frequency band is negligible, hinders the ability of existing deep learning models to extract the crucial frequency information. Additionally, the shared Key-Frequency in multivariate time series, where different time series share indistinguishable frequency patterns, is rarely exploited in the existing literature. This work introduces a novel module, Adaptive Mid-Frequency Energy Optimizer, based on convolution and residual learning, to emphasize the significance of mid-frequency bands. We also propose an Energy-based Key-Frequency Picking Block to capture shared Key-Frequency, which achieves superior inter-series modeling performance with fewer parameters. A novel Key-Frequency Enhanced Training strategy is employed to further enhance Key-Frequency modeling, where spectral information from other channels is randomly introduced into each channel. Our approach advances multivariate time series forecasting on the challenging Traffic, ECL, and Solar benchmarks, reducing MSE by 4%, 6%, and 5% compared to the previous SOTA iTransformer. Code is available at this GitHub Repository: https://github.com/Levi-Ackman/ReFocus.
Submitted 3 March, 2025; v1 submitted 24 February, 2025;
originally announced February 2025.
-
Cross-Modal Few-Shot Learning with Second-Order Neural Ordinary Differential Equations
Authors:
Yi Zhang,
Chun-Wun Cheng,
Junyi He,
Zhihai He,
Carola-Bibiane Schönlieb,
Yuyan Chen,
Angelica I Aviles-Rivero
Abstract:
We introduce SONO, a novel method leveraging Second-Order Neural Ordinary Differential Equations (Second-Order NODEs) to enhance cross-modal few-shot learning. By employing a simple yet effective architecture consisting of a Second-Order NODEs model paired with a cross-modal classifier, SONO addresses the significant challenge of overfitting, which is common in few-shot scenarios due to limited training examples. Our second-order approach can approximate a broader class of functions, enhancing the model's expressive power and feature generalization capabilities. We initialize our cross-modal classifier with text embeddings derived from class-relevant prompts, streamlining training efficiency by avoiding the need for frequent text encoder processing. Additionally, we utilize text-based image augmentation, exploiting CLIP's robust image-text correlation to enrich training data significantly. Extensive experiments across multiple datasets demonstrate that SONO outperforms existing state-of-the-art methods in few-shot learning performance.
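The core numerical idea behind a second-order ODE model can be illustrated with a minimal sketch: a second-order equation x'' = a(x, v, t) is rewritten as the first-order system x' = v, v' = a and then integrated step by step. Everything named below is an assumption for illustration and not taken from the paper: the function `integrate_second_order`, the semi-implicit Euler scheme, and the toy `spring` acceleration, which stands in for what would be a learned network in SONO.

```python
import numpy as np

def integrate_second_order(accel, x0, v0, t_end, n_steps):
    """Integrate x'' = accel(x, v, t) as the first-order system
    x' = v, v' = accel(x, v, t) using semi-implicit Euler steps.
    (Hypothetical helper; a learned model would replace `accel`.)"""
    x = np.asarray(x0, dtype=float)
    v = np.asarray(v0, dtype=float)
    dt = t_end / n_steps
    t = 0.0
    for _ in range(n_steps):
        v = v + dt * accel(x, v, t)  # update the velocity first
        x = x + dt * v               # then move the state with the new velocity
        t += dt
    return x, v

def spring(x, v, t):
    # Toy acceleration for a unit harmonic oscillator: x'' = -x.
    return -x

# Starting at x = 1, v = 0, the exact solution is x(t) = cos(t).
x_final, v_final = integrate_second_order(
    spring, x0=[1.0], v0=[0.0], t_end=np.pi, n_steps=10000
)
```

Carrying a velocity alongside the state is what distinguishes this from a first-order formulation, where the state alone determines its own rate of change.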
Submitted 20 December, 2024;
originally announced December 2024.
-
You KAN Do It in a Single Shot: Plug-and-Play Methods with Single-Instance Priors
Authors:
Yanqi Cheng,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
The use of Plug-and-Play (PnP) methods has become a central approach for solving inverse problems, with denoisers serving as regularising priors that guide optimisation towards a clean solution. In this work, we introduce KAN-PnP, an optimisation framework that incorporates Kolmogorov-Arnold Networks (KANs) as denoisers within the Plug-and-Play (PnP) paradigm. KAN-PnP is specifically designed to solve inverse problems with single-instance priors, where only a single noisy observation is available, eliminating the need for large datasets typically required by traditional denoising methods. We show that KANs, based on the Kolmogorov-Arnold representation theorem, serve effectively as priors in such settings, providing a robust approach to denoising. We prove that the KAN denoiser is Lipschitz continuous, ensuring stability and convergence in optimisation algorithms like PnP-ADMM, even in the context of single-shot learning. Additionally, we provide theoretical guarantees for KAN-PnP, demonstrating its convergence under key conditions: the convexity of the data fidelity term, Lipschitz continuity of the denoiser, and boundedness of the regularisation functional. These conditions are crucial for stable and reliable optimisation. Our experimental results show, on super-resolution and joint optimisation, that KAN-PnP outperforms existing methods, delivering superior performance in single-shot learning with minimal data. The method exhibits strong convergence properties, achieving high accuracy with fewer iterations.
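For readers unfamiliar with PnP-ADMM, the generic iteration can be sketched on a toy 1D denoising problem. This is a minimal illustration under stated assumptions, not the KAN-PnP implementation: the data fidelity term is taken to be 0.5 * ||x - y||^2, and a fixed three-tap smoothing filter (`smooth_denoiser`, a hypothetical stand-in) takes the place of the KAN denoiser. The names `pnp_admm`, `rho`, and `n_iters` are likewise illustrative.

```python
import numpy as np

def smooth_denoiser(v):
    # Stand-in denoiser (assumption): a fixed three-tap smoothing filter.
    # In KAN-PnP this role would be played by a KAN; that is not shown here.
    kernel = np.array([0.25, 0.5, 0.25])
    return np.convolve(v, kernel, mode="same")

def pnp_admm(y, denoiser, rho=1.0, n_iters=50):
    """Scaled-form PnP-ADMM for the data term 0.5 * ||x - y||^2."""
    x = y.copy()
    z = y.copy()
    u = np.zeros_like(y)
    for _ in range(n_iters):
        # x-update: closed-form proximal step of the quadratic data term.
        x = (y + rho * (z - u)) / (1.0 + rho)
        # z-update: the denoiser replaces the proximal step of the prior.
        z = denoiser(x + u)
        # Dual update on the constraint x = z.
        u = u + x - z
    return z

rng = np.random.default_rng(0)
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, 200))
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
restored = pnp_admm(noisy, smooth_denoiser)
```

The only place the prior enters is the z-update, which is why properties of the denoiser such as Lipschitz continuity matter for the convergence of the whole loop.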
Submitted 2 May, 2025; v1 submitted 8 December, 2024;
originally announced December 2024.
-
Where Do We Stand with Implicit Neural Representations? A Technical and Performance Survey
Authors:
Amer Essakine,
Yanqi Cheng,
Chun-Wun Cheng,
Lipei Zhang,
Zhongying Deng,
Lei Zhu,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Implicit Neural Representations (INRs) have emerged as a paradigm in knowledge representation, offering exceptional flexibility and performance across a diverse range of applications. INRs leverage multilayer perceptrons (MLPs) to model data as continuous implicit functions, providing critical advantages such as resolution independence, memory efficiency, and generalisation beyond discretised data structures. Their ability to solve complex inverse problems makes them particularly effective for tasks including audio reconstruction, image representation, 3D object reconstruction, and high-dimensional data synthesis. This survey provides a comprehensive review of state-of-the-art INR methods, introducing a clear taxonomy that categorises them into four key areas: activation functions, position encoding, combined strategies, and network structure optimisation. We rigorously analyse their critical properties, such as full differentiability, smoothness, compactness, and adaptability to varying resolutions while also examining their strengths and limitations in addressing locality biases and capturing fine details. Our experimental comparison offers new insights into the trade-offs between different approaches, showcasing the capabilities and challenges of the latest INR techniques across various tasks. In addition to identifying areas where current methods excel, we highlight key limitations and potential avenues for improvement, such as developing more expressive activation functions, enhancing positional encoding mechanisms, and improving scalability for complex, high-dimensional data. This survey serves as a roadmap for researchers, offering practical guidance for future exploration in the field of INRs. We aim to foster new methodologies by outlining promising research directions for INRs and applications.
Submitted 18 February, 2025; v1 submitted 6 November, 2024;
originally announced November 2024.
-
Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization
Authors:
Hongtao Wu,
Yijun Yang,
Angelica I Aviles-Rivero,
Jingjing Ren,
Sixiang Chen,
Haoyu Chen,
Lei Zhu
Abstract:
Snow degradations present formidable challenges to computer vision tasks by corrupting outdoor scenes. While current deep learning-based desnowing approaches achieve success on synthetic benchmark datasets, they struggle to restore out-of-distribution real-world snowy videos due to the lack of paired real-world training data. To address this bottleneck, we devise a new semi-supervised paradigm for video desnowing that involves unlabeled real data for generalizable snow removal. Specifically, we construct a real-world dataset with 85 snowy videos, and then present a Semi-supervised Video Desnowing Network (SemiVDN) equipped with a novel Distribution-driven Contrastive Regularization. The elaborated contrastive regularization mitigates the distribution gap between the synthetic and real data, and consequently maintains the desired snow-invariant background details. Furthermore, based on the atmospheric scattering model, we introduce a Prior-guided Temporal Decoupling Experts module to decompose the physical components that make up a snowy video in a frame-correlated manner. We evaluate our SemiVDN on benchmark datasets and the collected real snowy data. The experimental results demonstrate the superiority of our approach over state-of-the-art image- and video-level desnowing methods.
Submitted 10 October, 2024;
originally announced October 2024.
-
Mamba Neural Operator: Who Wins? Transformers vs. State-Space Models for PDEs
Authors:
Chun-Wun Cheng,
Jiahao Huang,
Yi Zhang,
Guang Yang,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Partial differential equations (PDEs) are widely used to model complex physical systems, but solving them efficiently remains a significant challenge. Recently, Transformers have emerged as the preferred architecture for PDEs due to their ability to capture intricate dependencies. However, they struggle with representing continuous dynamics and long-range interactions. To overcome these limitations, we introduce the Mamba Neural Operator (MNO), a novel framework that enhances neural operator-based techniques for solving PDEs. MNO establishes a formal theoretical connection between structured state-space models (SSMs) and neural operators, offering a unified structure that can adapt to diverse architectures, including Transformer-based models. By leveraging the structured design of SSMs, MNO captures long-range dependencies and continuous dynamics more effectively than traditional Transformers. Through extensive analysis, we show that MNO significantly boosts the expressive power and accuracy of neural operators, making it not just a complement but a superior framework for PDE-related tasks, bridging the gap between efficient representation and accurate solution approximation.
Submitted 9 April, 2025; v1 submitted 2 October, 2024;
originally announced October 2024.
-
Learning Task-Specific Sampling Strategy for Sparse-View CT Reconstruction
Authors:
Liutao Yang,
Jiahao Huang,
Yingying Fang,
Angelica I Aviles-Rivero,
Carola-Bibiane Schonlieb,
Daoqiang Zhang,
Guang Yang
Abstract:
Sparse-View Computed Tomography (SVCT) offers low-dose and fast imaging but suffers from severe artifacts. Optimizing the sampling strategy is an essential approach to improving the imaging quality of SVCT. However, current methods typically optimize a universal sampling strategy for all types of scans, overlooking the fact that the optimal strategy may vary depending on the specific scanning task, whether it involves particular body scans (e.g., chest CT scans) or downstream clinical applications (e.g., disease diagnosis). The optimal strategy for one scanning task may not perform as well when applied to other tasks. To address this problem, we propose a deep learning framework that learns task-specific sampling strategies with a multi-task approach to train a unified reconstruction network while tailoring optimal sampling strategies for each individual task. Thus, a task-specific sampling strategy can be applied for each type of scan to improve the quality of SVCT imaging and further improve performance in downstream clinical use. Extensive experiments across different scanning types validate the effectiveness of task-specific sampling strategies in enhancing imaging quality. Experiments involving downstream tasks verify the clinical value of learned sampling strategies, as evidenced by notable improvements in downstream task performance. Furthermore, the use of a multi-task framework with a shared reconstruction network facilitates deployment on current imaging devices with switchable task-specific modules, and allows new tasks to be integrated easily without retraining the entire model.
Submitted 2 September, 2024;
originally announced September 2024.
-
NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning
Authors:
Yi Zhang,
Chun-Wun Cheng,
Ke Yu,
Zhihai He,
Carola-Bibiane Schönlieb,
Angelica I. Aviles-Rivero
Abstract:
In this paper, we consider the problem of prototype-based vision-language reasoning. We observe that existing methods encounter three major challenges: 1) escalating resource demands and prolonged training times, 2) excessive learnable parameters, and 3) fine-tuning based only on a single modality. These challenges hinder their capability to adapt Vision-Language Models (VLMs) to downstream tasks. Motivated by this critical observation, we propose a novel method called NODE-Adapter, which utilizes Neural Ordinary Differential Equations for better vision-language reasoning. To fully leverage both visual and textual modalities and estimate class prototypes more effectively and accurately, we divide our method into two stages: cross-modal prototype construction and cross-modal prototype optimization using neural ordinary differential equations. Specifically, we exploit a VLM to encode hand-crafted prompts into textual features and few-shot support images into visual features. Then, we estimate the textual prototype and visual prototype by averaging the textual features and visual features, respectively, and adaptively combine the textual prototype and visual prototype to construct the cross-modal prototype. To alleviate the prototype bias, we then model the prototype optimization process as an initial value problem with Neural ODEs to estimate the continuous gradient flow. Our extensive experimental results, which cover few-shot classification, domain generalization, and visual reasoning on human-object interaction, demonstrate that the proposed method significantly outperforms existing state-of-the-art approaches.
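The first stage described above, cross-modal prototype construction, can be sketched as follows. This is only an illustrative reading of the abstract: the feature arrays are random placeholders rather than VLM outputs, the fixed mixing weight `alpha` is an assumption standing in for the adaptive combination the authors describe, and the function names are hypothetical. The second stage (Neural ODE refinement) is not shown.

```python
import numpy as np

def l2_normalize(a, axis=-1, eps=1e-12):
    # Scale vectors to unit length along the given axis.
    return a / (np.linalg.norm(a, axis=axis, keepdims=True) + eps)

def cross_modal_prototypes(text_feats, image_feats, alpha=0.5):
    """Build one prototype per class from two modalities.

    text_feats:  (num_classes, num_prompts, dim) prompt embeddings
    image_feats: (num_classes, num_shots, dim) support-image embeddings
    alpha:       fixed mixing weight (assumption; the paper combines
                 the two prototypes adaptively)
    """
    text_proto = l2_normalize(text_feats.mean(axis=1))     # average over prompts
    visual_proto = l2_normalize(image_feats.mean(axis=1))  # average over shots
    return l2_normalize(alpha * text_proto + (1.0 - alpha) * visual_proto)

# Placeholder features: 3 classes, 4 prompts, 5 support images, 8 dimensions.
rng = np.random.default_rng(0)
text_feats = rng.standard_normal((3, 4, 8))
image_feats = rng.standard_normal((3, 5, 8))
prototypes = cross_modal_prototypes(text_feats, image_feats)
```

A query embedding could then be classified by cosine similarity against the rows of `prototypes`, since each row has unit length.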
Submitted 11 July, 2024;
originally announced July 2024.
-
LGRNet: Local-Global Reciprocal Network for Uterine Fibroid Segmentation in Ultrasound Videos
Authors:
Huihui Xu,
Yijun Yang,
Angelica I Aviles-Rivero,
Guang Yang,
Jing Qin,
Lei Zhu
Abstract:
Regular screening and early discovery of uterine fibroids are crucial for preventing potential malignant transformations and ensuring timely, life-saving interventions. To this end, we collect and annotate the first ultrasound video dataset with 100 videos for uterine fibroid segmentation (UFUV). We also present Local-Global Reciprocal Network (LGRNet) to efficiently and effectively propagate the long-term temporal context, which is crucial for distinguishing between uninformative noisy surrounding tissues and target lesion regions. Specifically, the Cyclic Neighborhood Propagation (CNP) is introduced to propagate the inter-frame local temporal context in a cyclic manner. Moreover, to aggregate global temporal context, we first condense each frame into a set of frame bottleneck queries and devise Hilbert Selective Scan (HilbertSS) to both efficiently path connect each frame and preserve the locality bias. A distribute layer is then utilized to disseminate the global context back for reciprocal refinement. Extensive experiments on UFUV and three public Video Polyp Segmentation (VPS) datasets demonstrate consistent improvements compared to state-of-the-art segmentation methods, indicating the effectiveness and versatility of LGRNet. Code, checkpoints, and dataset are available at https://github.com/bio-mlhui/LGRNet
Submitted 8 July, 2024;
originally announced July 2024.
-
Optimised ProPainter for Video Diminished Reality Inpainting
Authors:
Pengze Li,
Lihao Liu,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
In this paper, part of the DREAMING Challenge - Diminished Reality for Emerging Applications in Medicine through Inpainting, we introduce a refined video inpainting technique optimised from the ProPainter method to meet the specialised demands of medical imaging, specifically in the context of oral and maxillofacial surgery. Our enhanced algorithm employs the zero-shot ProPainter, featuring optimised parameters and pre-processing, to manage the complex task of inpainting surgical video sequences without requiring any training process. It aims to produce temporally coherent and detail-rich reconstructions of occluded regions, facilitating clearer views of operative fields. The efficacy of our approach is evaluated using comprehensive metrics, positioning it as a significant advancement in the application of diminished reality for medical purposes.
Submitted 4 June, 2024;
originally announced June 2024.
-
Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba
Authors:
Jiahao Huang,
Liutao Yang,
Fanwen Wang,
Yang Nan,
Weiwen Wu,
Chengyan Wang,
Kuangyu Shi,
Angelica I. Aviles-Rivero,
Carola-Bibiane Schönlieb,
Daoqiang Zhang,
Guang Yang
Abstract:
Deep learning has been extensively applied in medical image reconstruction, where Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) represent the predominant paradigms, each possessing distinct advantages and inherent limitations: CNNs exhibit linear complexity with local sensitivity, whereas ViTs demonstrate quadratic complexity with global sensitivity. The emerging Mamba has shown superiority in learning visual representation, combining the advantages of linear scalability and global sensitivity. In this study, we introduce MambaMIR, an Arbitrary-Masked Mamba-based model with wavelet decomposition for joint medical image reconstruction and uncertainty estimation. A novel Arbitrary Scan Masking (ASM) mechanism "masks out" redundant information to introduce randomness for further uncertainty estimation. Compared to the commonly used Monte Carlo (MC) dropout, our proposed MC-ASM provides an uncertainty map without the need for hyperparameter tuning and mitigates the performance drop typically observed when applying dropout to low-level tasks. For further texture preservation and better perceptual quality, we incorporate the wavelet transformation into MambaMIR and explore its variant based on the Generative Adversarial Network, namely MambaMIR-GAN. Comprehensive experiments have been conducted on multiple representative medical image reconstruction tasks, demonstrating that the proposed MambaMIR and MambaMIR-GAN outperform other baseline and state-of-the-art methods in different reconstruction tasks, where MambaMIR achieves the best reconstruction fidelity and MambaMIR-GAN has the best perceptual quality. In addition, our MC-ASM provides uncertainty maps as an additional tool for clinicians, while mitigating the typical performance drop caused by the commonly used dropout.
Submitted 25 June, 2024; v1 submitted 27 May, 2024;
originally announced May 2024.
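The abstract above contrasts Monte Carlo (MC) dropout with a masking-based source of randomness. The general Monte Carlo idea behind both can be sketched as follows: run a stochastic model several times on the same input, then report the per-position mean as the estimate and the per-position variance as an uncertainty map. This is a minimal illustration only, not the authors' MC-ASM; `toy_masked_reconstruction`, `keep_prob`, and `n_passes` are hypothetical names chosen for this sketch.

```python
import random
from statistics import fmean, pvariance


def toy_masked_reconstruction(signal, rng, keep_prob=0.8):
    """Hypothetical stand-in for a stochastic model: randomly zero out samples
    and rescale the kept ones so the expected output equals the input."""
    return [x / keep_prob if rng.random() < keep_prob else 0.0 for x in signal]


def mc_uncertainty(signal, n_passes=50, seed=0):
    """Run repeated stochastic passes; return (per-position mean, per-position variance)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    passes = [toy_masked_reconstruction(signal, rng) for _ in range(n_passes)]
    per_position = list(zip(*passes))  # group the n_passes outputs by position
    mean = [fmean(values) for values in per_position]
    variance = [pvariance(values) for values in per_position]
    return mean, variance


if __name__ == "__main__":
    mean, variance = mc_uncertainty([1.0, 0.0, 2.0])
    print("mean:", mean)
    print("variance (uncertainty map):", variance)
```

Positions whose output changes across passes get a larger variance; a zero-valued input position is unaffected by masking and so gets zero variance in this toy setting.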
-
MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models
Authors:
Jiuming Liu,
Jinru Han,
Lihao Liu,
Angelica I. Aviles-Rivero,
Chaokang Jiang,
Zhe Liu,
Hesheng Wang
Abstract:
Point cloud videos can faithfully capture real-world spatial geometries and temporal dynamics, which are essential for enabling intelligent agents to understand the dynamically changing world. However, designing an effective 4D backbone remains challenging, mainly due to the irregular and unordered distribution of points and temporal inconsistencies across frames. Also, recent transformer-based 4D backbones commonly suffer from large computational costs due to their quadratic complexity, particularly for long video sequences. To address these challenges, we propose a novel point cloud video understanding backbone purely based on the State Space Models (SSMs). Specifically, we first disentangle space and time in 4D video sequences and then establish the spatio-temporal correlation with our designed Mamba blocks. The Intra-frame Spatial Mamba module is developed to encode locally similar geometric structures within a certain temporal stride. Subsequently, locally correlated tokens are delivered to the Inter-frame Temporal Mamba module, which integrates long-term point features across the entire video with linear complexity. Our proposed Mamba4d achieves competitive performance on the MSR-Action3D action recognition (+10.4% accuracy), HOI4D action segmentation (+0.7 F1 Score), and Synthia4D semantic segmentation (+0.19 mIoU) datasets. Especially, for long video sequences, our method has a significant efficiency improvement with 87.5% GPU memory reduction and 5.36 times speed-up. Codes will be released at https://github.com/IRMVLab/Mamba4D.
Submitted 26 February, 2025; v1 submitted 23 May, 2024;
originally announced May 2024.
-
Bilevel Hypergraph Networks for Multi-Modal Alzheimer's Diagnosis
Authors:
Angelica I. Aviles-Rivero,
Chun-Wun Cheng,
Zhongying Deng,
Zoe Kourtzi,
Carola-Bibiane Schönlieb
Abstract:
Early detection of Alzheimer's disease's precursor stages is imperative for significantly enhancing patient outcomes and quality of life. This challenge is tackled through a semi-supervised multi-modal diagnosis framework. In particular, we introduce a new hypergraph framework that enables higher-order relations between multi-modal data, while utilising minimal labels. We first introduce a bilevel hypergraph optimisation framework that jointly learns a graph augmentation policy and a semi-supervised classifier. This dual learning strategy is hypothesised to enhance the robustness and generalisation capabilities of the model by fostering new pathways for information propagation. Secondly, we introduce a novel strategy for generating pseudo-labels more effectively via a gradient-driven flow. Our experimental results demonstrate the superior performance of our framework over current techniques in diagnosing Alzheimer's disease.
Submitted 19 March, 2024;
originally announced March 2024.
-
Biophysics Informed Pathological Regularisation for Brain Tumour Segmentation
Authors:
Lipei Zhang,
Yanqi Cheng,
Lihao Liu,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Recent advances in deep learning have significantly improved brain tumour segmentation techniques; however, the results still lack confidence and robustness as they solely consider image data without biophysical priors or pathological information. Integrating biophysics-informed regularisation is one effective way to change this situation, as it provides a prior regularisation for automated end-to-end learning. In this paper, we propose a novel approach that designs brain tumour growth Partial Differential Equation (PDE) models as a regularisation with deep learning, operational with any network model. Our method introduces tumour growth PDE models directly into the segmentation process, improving accuracy and robustness, especially in data-scarce scenarios. This system estimates tumour cell density using a periodic activation function. By effectively integrating this estimation with biophysical models, we achieve better capture of tumour characteristics. This approach not only aligns the segmentation closer to actual biological behaviour but also strengthens the model's performance under limited data conditions. We demonstrate the effectiveness of our framework through extensive experiments on the BraTS 2023 dataset, showcasing significant improvements in both precision and reliability of tumour segmentation.
Submitted 8 October, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Genuine Knowledge from Practice: Diffusion Test-Time Adaptation for Video Adverse Weather Removal
Authors:
Yijun Yang,
Hongtao Wu,
Angelica I. Aviles-Rivero,
Yulun Zhang,
Jing Qin,
Lei Zhu
Abstract:
Real-world vision tasks frequently suffer from the appearance of unexpected adverse weather conditions, including rain, haze, snow, and raindrops. In the last decade, convolutional neural networks and vision transformers have yielded outstanding results in single-weather video removal. However, due to the absence of appropriate adaptation, most of them fail to generalize to other weather conditions. Although ViWS-Net is proposed to remove adverse weather conditions in videos with a single set of pre-trained weights, it is seriously blinded by seen weather at train-time and degenerates when coming to unseen weather during test-time. In this work, we introduce test-time adaptation into adverse weather removal in videos, and propose the first framework that integrates test-time adaptation into the iterative diffusion reverse process. Specifically, we devise a diffusion-based network with a novel temporal noise model to efficiently explore frame-correlated information in degraded video clips at the training stage. During the inference stage, we introduce a proxy task named Diffusion Tubelet Self-Calibration to learn the primer distribution of the test video stream and optimize the model by approximating the temporal noise model for online adaptation. Experimental results on benchmark datasets demonstrate that our Test-Time Adaptation method with Diffusion-based network (Diff-TTA) outperforms state-of-the-art methods in terms of restoring videos degraded by seen weather conditions. Its generalizable capability is also validated with unseen weather conditions in both synthesized and real-world videos.
Submitted 12 March, 2024;
originally announced March 2024.
-
MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation
Authors:
Jiahao Huang,
Liutao Yang,
Fanwen Wang,
Yang Nan,
Angelica I. Aviles-Rivero,
Carola-Bibiane Schönlieb,
Daoqiang Zhang,
Guang Yang
Abstract:
The recent Mamba model has shown remarkable adaptability for visual representation learning, including in medical imaging tasks. This study introduces MambaMIR, a Mamba-based model for medical image reconstruction, as well as its Generative Adversarial Network-based variant, MambaMIR-GAN. Our proposed MambaMIR inherits several advantages, such as linear complexity, global receptive fields, and dynamic weights, from the original Mamba model. The novel arbitrary-mask mechanism effectively adapts Mamba to our image reconstruction task, providing randomness for subsequent Monte Carlo-based uncertainty estimation. Experiments conducted on various medical image reconstruction tasks, including fast MRI and SVCT, which cover anatomical regions such as the knee, chest, and abdomen, have demonstrated that MambaMIR and MambaMIR-GAN achieve comparable or superior reconstruction results relative to state-of-the-art methods. Additionally, the estimated uncertainty maps offer further insights into the reliability of the reconstruction quality. The code is publicly available at https://github.com/ayanglab/MambaMIR.
Submitted 25 June, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling
Authors:
Guoqi Yu,
Jing Zou,
Xiaowei Hu,
Angelica I. Aviles-Rivero,
Jing Qin,
Shujun Wang
Abstract:
Predicting multivariate time series is crucial, demanding precise modeling of intricate patterns, including inter-series dependencies and intra-series variations. Distinctive trend characteristics in each time series pose challenges, and existing methods, relying on basic moving average kernels, may struggle with the non-linear structure and complex trends in real-world data. Given that, we introduce a learnable decomposition strategy to capture dynamic trend information more reasonably. Additionally, we propose a dual attention module tailored to capture inter-series dependencies and intra-series variations simultaneously for better time series forecasting, which is implemented by channel-wise self-attention and autoregressive self-attention. To evaluate the effectiveness of our method, we conducted experiments across eight open-source datasets and compared it with the state-of-the-art methods. In these comparisons, our Leddam (LEarnable Decomposition and Dual Attention Module) not only demonstrates significant advancements in predictive performance; the proposed decomposition strategy can also be plugged into other methods with a large performance boost, from 11.87% to 48.56% MSE error degradation.
Submitted 5 July, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
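For context on the "basic moving average kernels" that the abstract above says existing methods rely on, the following is a minimal sketch of that baseline trend/residual split. It is not the learnable decomposition proposed in the paper; the function name, the edge-replication padding, and the odd-kernel restriction are assumptions made for this illustration.

```python
def moving_average_decompose(series, kernel_size):
    """Split a series into (trend, residual) with a centred moving average.

    Edges are padded by repeating the first and last values so the trend
    has the same length as the input (an assumption of this sketch).
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    # Average each window of kernel_size consecutive padded values.
    trend = [sum(padded[i:i + kernel_size]) / kernel_size for i in range(len(series))]
    residual = [x - t for x, t in zip(series, trend)]
    return trend, residual


if __name__ == "__main__":
    trend, residual = moving_average_decompose([1.0, 2.0, 3.0, 4.0, 5.0], kernel_size=3)
    print("trend:", trend)
    print("residual:", residual)
```

A fixed averaging window treats every series and every time step the same way, which is the limitation the abstract points to when motivating a learnable alternative.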
-
Single-Shot Plug-and-Play Methods for Inverse Problems
Authors:
Yanqi Cheng,
Lipei Zhang,
Zhenda Shen,
Shujun Wang,
Lequan Yu,
Raymond H. Chan,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
The utilisation of Plug-and-Play (PnP) priors in inverse problems has become increasingly prominent in recent years. This preference is based on the mathematical equivalence between the general proximal operator and the regularised denoiser, facilitating the adaptation of various off-the-shelf denoiser priors to a wide range of inverse problems. However, existing PnP models predominantly rely on pre-trained denoisers using large datasets. In this work, we introduce Single-Shot PnP methods (SS-PnP), shifting the focus to solving inverse problems with minimal data. First, we integrate Single-Shot proximal denoisers into iterative methods, enabling training with single instances. Second, we propose implicit neural priors based on a novel function that preserves relevant frequencies to capture fine details while avoiding the issue of vanishing gradients. We demonstrate, through extensive numerical and visual experiments, that our method leads to better approximations.
Submitted 11 November, 2024; v1 submitted 22 November, 2023;
originally announced November 2023.
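The Plug-and-Play idea summarised in the abstract above, replacing the proximal step of an iterative solver with a denoiser, can be sketched on a toy 1-D inpainting problem. This is a generic PnP forward-backward iteration under stated assumptions, not the SS-PnP method: the 3-tap smoothing filter stands in for a learned denoiser, and all names (`smooth3`, `pnp_inpaint`, `step`) are hypothetical.

```python
def smooth3(x):
    """Toy denoiser (assumption): 3-tap average with edge replication."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3 for i in range(len(x))]


def pnp_inpaint(y, mask, n_iters=20, step=1.0):
    """PnP forward-backward iteration for f(x) = 0.5 * ||mask * (x - y)||^2.

    Each iteration takes a gradient step on the data term, then applies the
    denoiser where a proximal operator would normally be used.
    """
    x = list(y)
    for _ in range(n_iters):
        grad = [m * (xi - yi) for m, xi, yi in zip(mask, x, y)]
        x = smooth3([xi - step * gi for xi, gi in zip(x, grad)])
    return x


if __name__ == "__main__":
    y = [1.0, 1.0, 0.0, 1.0, 1.0]   # the middle sample is missing
    mask = [1, 1, 0, 1, 1]          # 1 = observed, 0 = unobserved
    print(pnp_inpaint(y, mask))
```

In this toy case the unobserved sample is pulled towards its observed neighbours, because the gradient step re-imposes the measurements and the denoiser spreads them into the gap.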
-
TRIDENT: The Nonlinear Trilogy for Implicit Neural Representations
Authors:
Zhenda Shen,
Yanqi Cheng,
Raymond H. Chan,
Pietro Liò,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Implicit neural representations (INRs) have garnered significant interest recently for their ability to model complex, high-dimensional data without explicit parameterisation. In this work, we introduce TRIDENT, a novel function for implicit neural representations characterised by a trilogy of nonlinearities. Firstly, it is designed to represent high-order features through order compactness. Secondly, TRIDENT efficiently captures frequency information, a feature called frequency compactness. Thirdly, it has the capability to represent signals or images such that most of its energy is concentrated in a limited spatial region, denoting spatial compactness. We demonstrated through extensive experiments on various inverse problems that our proposed function outperforms existing implicit neural representation functions.
Submitted 21 November, 2023;
originally announced November 2023.
-
Traffic Video Object Detection using Motion Prior
Authors:
Lihao Liu,
Yanqi Cheng,
Dongdong Chen,
Jing He,
Pietro Liò,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Traffic videos inherently differ from generic videos in their stationary camera setup, thus providing a strong motion prior where objects often move in a specific direction over a short time interval. Existing works predominantly employ generic video object detection frameworks for traffic video object detection, which yield certain advantages such as broad applicability and robustness to diverse scenarios. However, they fail to harness the strength of the motion prior to enhance detection accuracy. In this work, we propose two innovative methods to exploit the motion prior and boost the performance of both fully-supervised and semi-supervised traffic video object detection. Firstly, we introduce a new self-attention module that leverages the motion prior to guide temporal information integration in the fully-supervised setting. Secondly, we utilise the motion prior to develop a pseudo-labelling mechanism to eliminate noisy pseudo labels for the semi-supervised setting. Both of our motion-prior-centred methods consistently demonstrate superior performance, outperforming existing state-of-the-art approaches by a margin of 2% in terms of mAP.
Submitted 16 November, 2023;
originally announced November 2023.
-
The Missing U for Efficient Diffusion Models
Authors:
Sergio Calvo-Ordonez,
Chun-Wun Cheng,
Jiahao Huang,
Lipei Zhang,
Guang Yang,
Carola-Bibiane Schonlieb,
Angelica I Aviles-Rivero
Abstract:
Diffusion Probabilistic Models stand as a critical tool in generative modelling, enabling the generation of complex data distributions. This family of generative models yields record-breaking performance in tasks such as image synthesis, video generation, and molecule design. Despite their capabilities, their efficiency, especially in the reverse process, remains a challenge due to slow convergence rates and high computational costs. In this paper, we introduce an approach that leverages continuous dynamical systems to design a novel denoising network for diffusion models that is more parameter-efficient, exhibits faster convergence, and demonstrates increased noise robustness. Experimenting with Denoising Diffusion Probabilistic Models (DDPMs), our framework operates with approximately a quarter of the parameters, and $\sim$ 30\% of the Floating Point Operations (FLOPs) compared to standard U-Nets in DDPMs. Furthermore, our model is notably faster in inference than the baseline when measured in fair and equal conditions. We also provide a mathematical intuition as to why our proposed reverse process is faster as well as a mathematical discussion of the empirical tradeoffs in the denoising downstream task. Finally, we argue that our method is compatible with existing performance enhancement techniques, enabling further improvements in efficiency, quality, and speed.
Submitted 5 April, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Video Adverse-Weather-Component Suppression Network via Weather Messenger and Adversarial Backpropagation
Authors:
Yijun Yang,
Angelica I. Aviles-Rivero,
Huazhu Fu,
Ye Liu,
Weiming Wang,
Lei Zhu
Abstract:
Although convolutional neural networks (CNNs) have been proposed to remove adverse weather conditions in single images using a single set of pre-trained weights, they fail to restore weather videos due to the absence of temporal information. Furthermore, existing methods for removing adverse weather conditions (e.g., rain, fog, and snow) from videos can only handle one type of adverse weather. In this work, we propose the first framework for restoring videos from all adverse weather conditions by developing a video adverse-weather-component suppression network (ViWS-Net). To achieve this, we first devise a weather-agnostic video transformer encoder with multiple transformer stages. Moreover, we design a long short-term temporal modeling mechanism for weather messenger to early fuse input adjacent video frames and learn weather-specific information. We further introduce a weather discriminator with gradient reversion, to maintain the weather-invariant common information and suppress the weather-specific information in pixel features, by adversarially predicting weather types. Finally, we develop a messenger-driven video transformer decoder to retrieve the residual weather-specific feature, which is spatiotemporally aggregated with hierarchical pixel features and refined to predict the clean target frame of input videos. Experimental results, on benchmark datasets and real-world weather videos, demonstrate that our ViWS-Net outperforms current state-of-the-art methods in terms of restoring videos degraded by any weather condition.
Submitted 24 September, 2023;
originally announced September 2023.
-
MammoDG: Generalisable Deep Learning Breaks the Limits of Cross-Domain Multi-Center Breast Cancer Screening
Authors:
Yijun Yang,
Shujun Wang,
Lihao Liu,
Sarah Hickman,
Fiona J Gilbert,
Carola-Bibiane Schönlieb,
Angelica I. Aviles-Rivero
Abstract:
Breast cancer is a major cause of cancer death among women, emphasising the importance of early detection for improved treatment outcomes and quality of life. Mammography, the primary diagnostic imaging test, poses challenges due to the high variability and patterns in mammograms. Double reading of mammograms is recommended in many screening programs to improve diagnostic accuracy but increases radiologists' workload. Researchers explore Machine Learning models to support expert decision-making. Stand-alone models have shown comparable or superior performance to radiologists, but some studies note decreased sensitivity with multiple datasets, indicating the need for high generalisation and robustness models. This work devises MammoDG, a novel deep-learning framework for generalisable and reliable analysis of cross-domain multi-center mammography data. MammoDG leverages multi-view mammograms and a novel contrastive mechanism to enhance generalisation capabilities. Extensive validation demonstrates MammoDG's superiority, highlighting the critical importance of domain generalisation for trustworthy mammography analysis in imaging protocol variations.
Submitted 2 August, 2023;
originally announced August 2023.
-
Deep Learning-based Diffusion Tensor Cardiac Magnetic Resonance Reconstruction: A Comparison Study
Authors:
Jiahao Huang,
Pedro F. Ferreira,
Lichao Wang,
Yinzhe Wu,
Angelica I. Aviles-Rivero,
Carola-Bibiane Schonlieb,
Andrew D. Scott,
Zohya Khalique,
Maria Dwornik,
Ramyah Rajakulasingam,
Ranil De Silva,
Dudley J. Pennell,
Sonia Nielles-Vallespin,
Guang Yang
Abstract:
In vivo cardiac diffusion tensor imaging (cDTI) is a promising Magnetic Resonance Imaging (MRI) technique for evaluating the micro-structure of myocardial tissue in the living heart, providing insights into cardiac function and enabling the development of innovative therapeutic strategies. However, the integration of cDTI into routine clinical practice is challenging due to the technical obstacles involved in the acquisition, such as low signal-to-noise ratio and long scanning times. In this paper, we investigate and implement three different types of deep learning-based MRI reconstruction models for cDTI reconstruction. We evaluate the performance of these models based on reconstruction quality assessment and diffusion tensor parameter assessment. Our results indicate that the models we discussed in this study can be applied for clinical use at an acceleration factor (AF) of $\times 2$ and $\times 4$, with the D5C5 model showing superior fidelity for reconstruction and the SwinMR model providing higher perceptual scores. There is no statistical difference with the reference for all diffusion tensor parameters at AF $\times 2$ or most DT parameters at AF $\times 4$, and the quality of most diffusion tensor parameter maps is visually acceptable. SwinMR is recommended as the optimal approach for reconstruction at AF $\times 2$ and AF $\times 4$. However, we believe the models discussed in this study are not prepared for clinical use at a higher AF. At AF $\times 8$, the performance of all models discussed remains limited, with only half of the diffusion tensor parameters being recovered to a level with no statistical difference from the reference. Some diffusion tensor parameter maps even provide wrong and misleading information.
Submitted 4 April, 2023; v1 submitted 31 March, 2023;
originally announced April 2023.
-
DiffMIC: Dual-Guidance Diffusion Network for Medical Image Classification
Authors:
Yijun Yang,
Huazhu Fu,
Angelica I. Aviles-Rivero,
Carola-Bibiane Schönlieb,
Lei Zhu
Abstract:
Diffusion Probabilistic Models have recently shown remarkable performance in generative image modeling, attracting significant attention in the computer vision community. However, while a substantial amount of diffusion-based research has focused on generative tasks, few studies have applied diffusion models to general medical image classification. In this paper, we propose the first diffusion-based model (named DiffMIC) to address general medical image classification by eliminating unexpected noise and perturbations in medical images and robustly capturing semantic representation. To achieve this goal, we devise a dual conditional guidance strategy that conditions each diffusion step with multiple granularities to improve step-wise regional attention. Furthermore, we propose learning the mutual information in each granularity by enforcing Maximum-Mean Discrepancy regularization during the diffusion forward process. We evaluate the effectiveness of our DiffMIC on three medical classification tasks with different image modalities, including placental maturity grading on ultrasound images, skin lesion classification using dermatoscopic images, and diabetic retinopathy grading using fundus images. Our experimental results demonstrate that DiffMIC outperforms state-of-the-art methods by a significant margin, indicating the universality and effectiveness of the proposed model. Our code will be publicly available at https://github.com/scott-yjyang/DiffMIC.
Submitted 11 July, 2023; v1 submitted 19 March, 2023;
originally announced March 2023.
-
HGIB: Prognosis for Alzheimer's Disease via Hypergraph Information Bottleneck
Authors:
Shujun Wang,
Angelica I Aviles-Rivero,
Zoe Kourtzi,
Carola-Bibiane Schönlieb
Abstract:
Alzheimer's disease prognosis is critical for early Mild Cognitive Impairment patients for timely treatment to improve the patient's quality of life. Whilst existing prognosis techniques demonstrate potential results, they are highly limited in terms of using a single modality. Most importantly, they fail to consider a key element for prognosis: not all features extracted at the current moment may contribute to the prognosis prediction several years later. To address the current drawbacks of the literature, we propose a novel hypergraph framework based on an information bottleneck strategy (HGIB). Firstly, our framework seeks to discriminate irrelevant information, and therefore, solely focus on harmonising relevant information for future MCI conversion prediction (e.g., two years later). Secondly, our model simultaneously accounts for multi-modal data based on imaging and non-imaging modalities. HGIB uses a hypergraph structure to represent the multi-modality data and accounts for various data modality types. Thirdly, the key of our model is based on a new optimisation scheme. It is based on modelling the principle of information bottleneck into loss functions that can be integrated into our hypergraph neural network. We demonstrate, through extensive experiments on ADNI, that our proposed HGIB framework outperforms existing state-of-the-art hypergraph neural networks for Alzheimer's disease prognosis. We showcase our model even under fewer labels. Finally, we further support the robustness and generalisation capabilities of our framework under both topological and feature perturbations.
Submitted 18 March, 2023;
originally announced March 2023.
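The abstract above represents multi-modal data with a hypergraph, in which one hyperedge can connect any number of nodes rather than just two. A minimal sketch of the standard incidence-matrix encoding is given below; it does not reproduce how HGIB builds or uses its hypergraph, and the example grouping of subjects into hyperedges is invented purely for illustration.

```python
def incidence_matrix(n_nodes, hyperedges):
    """Return H with H[v][e] = 1 if node v belongs to hyperedge e, else 0."""
    H = [[0] * len(hyperedges) for _ in range(n_nodes)]
    for e, members in enumerate(hyperedges):
        for v in members:
            H[v][e] = 1
    return H


def node_degrees(H):
    """Number of hyperedges each node belongs to (row sums of H)."""
    return [sum(row) for row in H]


def edge_degrees(H):
    """Number of nodes in each hyperedge (column sums of H)."""
    return [sum(col) for col in zip(*H)]


if __name__ == "__main__":
    # Hypothetical example: 4 subjects; one hyperedge groups subjects that are
    # similar in an imaging modality, another groups subjects that are similar
    # in a non-imaging modality.
    hyperedges = [{0, 1, 2}, {2, 3}]
    H = incidence_matrix(4, hyperedges)
    print(H)
    print(node_degrees(H), edge_degrees(H))
```

Because each modality can contribute its own set of hyperedges over the same nodes, this encoding gives a simple way to combine several modalities in a single structure.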
-
Learning Homeomorphic Image Registration via Conformal-Invariant Hyperelastic Regularisation
Authors:
Jing Zou,
Noémie Debroux,
Lihao Liu,
Jing Qin,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Deformable image registration is a fundamental task in medical image analysis and plays a crucial role in a wide range of clinical applications. Recently, deep learning-based approaches have been widely studied for deformable medical image registration and achieved promising results. However, existing deep learning image registration techniques do not theoretically guarantee topology-preserving transformations. This is a key property to preserve anatomical structures and achieve plausible transformations that can be used in real clinical settings. We propose a novel framework for deformable image registration. Firstly, we introduce a novel regulariser based on conformal-invariant properties in a nonlinear elasticity setting. Our regulariser enforces the deformation field to be smooth, invertible and orientation-preserving. More importantly, we strictly guarantee topology preservation, yielding a clinically meaningful registration. Secondly, we boost the performance of our regulariser through coordinate MLPs, where one can view the to-be-registered images as continuously differentiable entities. We demonstrate, through numerical and visual experiments, that our framework is able to outperform current techniques for image registration.
Submitted 30 June, 2023; v1 submitted 14 March, 2023;
originally announced March 2023.
-
CoNIC Challenge: Pushing the Frontiers of Nuclear Detection, Segmentation, Classification and Counting
Authors:
Simon Graham,
Quoc Dang Vu,
Mostafa Jahanifar,
Martin Weigert,
Uwe Schmidt,
Wenhua Zhang,
Jun Zhang,
Sen Yang,
Jinxi Xiang,
Xiyue Wang,
Josef Lorenz Rumberger,
Elias Baumann,
Peter Hirsch,
Lihao Liu,
Chenyang Hong,
Angelica I. Aviles-Rivero,
Ayushi Jain,
Heeyoung Ahn,
Yiyu Hong,
Hussam Azzuni,
Min Xu,
Mohammad Yaqub,
Marie-Claire Blache,
Benoît Piégu,
Bertrand Vernay
, et al. (64 additional authors not shown)
Abstract:
Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we set up a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of reproducible algorithms for cellular recognition with real-time result inspection on public leaderboards. We conducted an extensive post-challenge analysis based on the top-performing models using 1,658 whole-slide images of colon tissue. With around 700 million detected nuclei per model, associated features were used for dysplasia grading and survival analysis, where we demonstrated that the challenge's improvement over the previous state-of-the-art led to significant boosts in downstream performance. Our findings also suggest that eosinophils and neutrophils play an important role in the tumour microenvironment. We release challenge models and WSI-level results to foster the development of further methods for biomarker discovery.
Submitted 14 March, 2023; v1 submitted 10 March, 2023;
originally announced March 2023.
-
Continuous U-Net: Faster, Greater and Noiseless
Authors:
Chun-Wun Cheng,
Christina Runkel,
Lihao Liu,
Raymond H Chan,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Image segmentation is a fundamental task in image analysis and clinical practice. The current state-of-the-art techniques are based on U-shape type encoder-decoder networks with skip connections, called U-Net. Despite the powerful performance reported by existing U-Net type networks, they suffer from several major limitations. Issues include the hard coding of the receptive field size, compromising the performance and computational cost, as well as the fact that they do not account for inherent noise in the data. They have problems associated with discrete layers, and do not offer any theoretical underpinning. In this work we introduce continuous U-Net, a novel family of networks for image segmentation. Firstly, continuous U-Net is a continuous deep neural network that introduces new dynamic blocks modelled by second order ordinary differential equations. Secondly, we provide theoretical guarantees for our network demonstrating faster convergence, higher robustness and less sensitivity to noise. Thirdly, we derive qualitative measures to tailor-made segmentation tasks. We demonstrate, through extensive numerical and visual results, that our model outperforms existing U-Net blocks for several medical image segmentation benchmarking datasets.
Submitted 1 February, 2023;
originally announced February 2023.
-
TrafficCAM: A Versatile Dataset for Traffic Flow Segmentation
Authors:
Zhongying Deng,
Yanqi Chen,
Lihao Liu,
Shujun Wang,
Rihuan Ke,
Carola-Bibiane Schonlieb,
Angelica I Aviles-Rivero
Abstract:
Traffic flow analysis is revolutionising traffic management. Qualifying traffic flow data, traffic control bureaus could provide drivers with real-time alerts, advising the fastest routes and therefore optimising transportation logistics and reducing congestion. The existing traffic flow datasets have two major limitations. They feature a limited number of classes, usually restricted to one type of vehicle, and they provide little unlabelled data. In this paper, we introduce a new benchmark traffic flow image dataset called TrafficCAM. Our dataset distinguishes itself by two major highlights. Firstly, TrafficCAM provides both pixel-level and instance-level semantic labelling along with a large range of types of vehicles and pedestrians. It is composed of a large and diverse set of video sequences recorded in streets from eight Indian cities with stationary cameras. Secondly, TrafficCAM aims to establish a new benchmark for developing fully-supervised tasks, and importantly, semi-supervised learning techniques. It is the first dataset that provides a vast amount of unlabelled data, helping to better capture traffic flow qualification under a low cost annotation requirement. More precisely, our dataset has 4,402 image frames with semantic and instance annotations along with 59,944 unlabelled image frames. We validate our new dataset through a large and comprehensive range of experiments on several state-of-the-art approaches under four different settings: fully-supervised semantic and instance segmentation, and semi-supervised semantic and instance segmentation tasks. Our benchmark dataset will be released.
Submitted 17 November, 2022;
originally announced November 2022.
-
NorMatch: Matching Normalizing Flows with Discriminative Classifiers for Semi-Supervised Learning
Authors:
Zhongying Deng,
Rihuan Ke,
Carola-Bibiane Schonlieb,
Angelica I Aviles-Rivero
Abstract:
Semi-Supervised Learning (SSL) aims to learn a model using a tiny labeled set and massive amounts of unlabeled data. To better exploit the unlabeled data, the latest SSL methods use pseudo-labels predicted from a single discriminative classifier. However, the generated pseudo-labels are inevitably linked to inherent confirmation bias and noise, which greatly affect the model performance. In this work we introduce a new framework for SSL named NorMatch. Firstly, we introduce a new uncertainty estimation scheme based on normalizing flows, as an auxiliary classifier, to enforce highly certain pseudo-labels yielding a boost of the discriminative classifiers. Secondly, we introduce a threshold-free sample weighting strategy to better exploit both high and low confidence pseudo-labels. Furthermore, we utilize normalizing flows to model, in an unsupervised fashion, the distribution of unlabeled data. This modelling assumption can further improve the performance of generative classifiers via unlabeled data, thus implicitly contributing to training a better discriminative classifier. We demonstrate, through numerical and visual results, that NorMatch achieves state-of-the-art performance on several datasets.
Submitted 16 February, 2024; v1 submitted 17 November, 2022;
originally announced November 2022.
-
SCOTCH and SODA: A Transformer Video Shadow Detection Framework
Authors:
Lihao Liu,
Jean Prost,
Lei Zhu,
Nicolas Papadakis,
Pietro Liò,
Carola-Bibiane Schönlieb,
Angelica I Aviles-Rivero
Abstract:
Shadows in videos are difficult to detect because of the large shadow deformation between frames. In this work, we argue that accounting for shadow deformation is essential when designing a video shadow detection method. To this end, we introduce the shadow deformation attention trajectory (SODA), a new type of video self-attention module, specially designed to handle the large shadow deformations in videos. Moreover, we present a new shadow contrastive learning mechanism (SCOTCH) which aims at guiding the network to learn a unified shadow representation from massive positive shadow pairs across different videos. We demonstrate empirically the effectiveness of our two contributions in an ablation study. Furthermore, we show that SCOTCH and SODA significantly outperform existing techniques for video shadow detection. Code is available at the project page: https://lihaoliu-cambridge.github.io/scotch_and_soda/
Submitted 26 March, 2023; v1 submitted 13 November, 2022;
originally announced November 2022.
-
Why Deep Surgical Models Fail?: Revisiting Surgical Action Triplet Recognition through the Lens of Robustness
Authors:
Yanqi Cheng,
Lihao Liu,
Shujun Wang,
Yueming Jin,
Carola-Bibiane Schönlieb,
Angelica I. Aviles-Rivero
Abstract:
Surgical action triplet recognition provides a better understanding of the surgical scene. This task is of high relevance as it provides the surgeon with context-aware support and safety. The current go-to strategy for improving performance is the development of new network mechanisms. However, the performance of current state-of-the-art techniques is substantially lower than on other surgical tasks. Why is this happening? This is the question that we address in this work. We present the first study to understand the failure of existing deep learning models through the lens of robustness and explainability. Firstly, we study current existing models under weak and strong $\delta$-perturbations via an adversarial optimisation scheme. We then analyse the failure modes via feature based explanations. Our study reveals that the key to improving performance and increasing reliability is in the core and spurious attributes. Our work opens the door to more trustworthy and reliable deep learning models in surgical data science.
Submitted 20 February, 2023; v1 submitted 18 September, 2022;
originally announced September 2022.
-
Multi-Modal Hypergraph Diffusion Network with Dual Prior for Alzheimer Classification
Authors:
Angelica I. Aviles-Rivero,
Christina Runkel,
Nicolas Papadakis,
Zoe Kourtzi,
Carola-Bibiane Schönlieb
Abstract:
The automatic early diagnosis of prodromal stages of Alzheimer's disease is of great relevance for patient treatment to improve quality of life. We address this problem as a multi-modal classification task. Multi-modal data provides richer and complementary information. However, existing techniques consider only lower-order relations between the data and only single- or multi-modal imaging data. In this work, we introduce a novel semi-supervised hypergraph learning framework for Alzheimer's disease diagnosis. Our framework allows for higher-order relations among multi-modal imaging and non-imaging data whilst requiring a tiny labelled set. Firstly, we introduce a dual embedding strategy for constructing a robust hypergraph that preserves the data semantics. We achieve this by enforcing perturbation invariance at the image and graph levels using a contrastive based mechanism. Secondly, we present a dynamically adjusted hypergraph diffusion model, via a semi-explicit flow, to improve the predictive uncertainty. We demonstrate, through our experiments, that our framework is able to outperform current techniques for Alzheimer's disease diagnosis.
Submitted 6 September, 2022; v1 submitted 4 April, 2022;
originally announced April 2022.
-
PC-SwinMorph: Patch Representation for Unsupervised Medical Image Registration and Segmentation
Authors:
Lihao Liu,
Zhening Huang,
Pietro Liò,
Carola-Bibiane Schönlieb,
Angelica I. Aviles-Rivero
Abstract:
Medical image registration and segmentation are critical tasks for several clinical procedures. Manual realisation of those tasks is time-consuming and the quality is highly dependent on the level of expertise of the physician. To mitigate that laborious task, automatic tools have been developed where the majority of solutions are supervised techniques. However, in the medical domain, the strong assumption of having a well-representative ground truth is far from being realistic. To overcome this challenge, unsupervised techniques have been investigated. However, they are still limited in performance and they fail to produce plausible results. In this work, we propose a novel unified unsupervised framework for image registration and segmentation that we call PC-SwinMorph. The core of our framework is two patch-based strategies, where we demonstrate that patch representation is key for performance gain. We first introduce a patch-based contrastive strategy that enforces locality conditions and richer feature representation. Secondly, we utilise a 3D window/shifted-window multi-head self-attention module as a patch stitching strategy to eliminate artifacts from the patch splitting. We demonstrate, through a set of numerical and visual results, that our technique outperforms current state-of-the-art unsupervised techniques.
Submitted 20 July, 2022; v1 submitted 10 March, 2022;
originally announced March 2022.
-
Simultaneous Semantic and Instance Segmentation for Colon Nuclei Identification and Counting
Authors:
Lihao Liu,
Chenyang Hong,
Angelica I. Aviles-Rivero,
Carola-Bibiane Schönlieb
Abstract:
We address the problem of automated nuclear segmentation, classification, and quantification from Haematoxylin and Eosin stained histology images, which is of great relevance for several downstream computational pathology applications. In this work, we present a solution framed as a simultaneous semantic and instance segmentation framework. Our solution is part of the Colon Nuclei Identification and Counting (CoNIC) Challenge. We first train a semantic and an instance segmentation model separately. Our framework uses HoverNet and Cascade Mask-RCNN models as backbones. We then ensemble the results with a custom Non-Maximum Suppression embedding (NMS). In our framework, the semantic model computes a class prediction for the cells whilst the instance model provides a refined segmentation. We demonstrate, through our experimental results, that our model outperforms the provided baselines by a large margin.
Submitted 15 April, 2022; v1 submitted 28 February, 2022;
originally announced March 2022.
-
Delving Into Deep Walkers: A Convergence Analysis of Random-Walk-Based Vertex Embeddings
Authors:
Dominik Kloepfer,
Angelica I. Aviles-Rivero,
Daniel Heydecker
Abstract:
Graph vertex embeddings based on random walks have become increasingly influential in recent years, showing good performance in several tasks as they efficiently transform a graph into a more computationally digestible format while preserving relevant information. However, the theoretical properties of such algorithms, in particular the influence of hyperparameters and of the graph structure on their convergence behaviour, have so far not been well understood. In this work, we provide a theoretical analysis for random-walk-based embedding techniques. Firstly, we prove that, under some weak assumptions, vertex embeddings derived from random walks do indeed converge both in the single limit of the number of random walks $N \to \infty$ and in the double limit of both $N$ and the length of each random walk $L\to\infty$. Secondly, we derive concentration bounds quantifying the convergence rate of the corpora for the single and double limits. Thirdly, we use these results to derive a heuristic for choosing the hyperparameters $N$ and $L$. We validate and illustrate the practical importance of our findings with a range of numerical and visual experiments on several graphs drawn from real-world applications.
Submitted 21 July, 2021;
originally announced July 2021.
-
LaplaceNet: A Hybrid Graph-Energy Neural Network for Deep Semi-Supervised Classification
Authors:
Philip Sellars,
Angelica I. Aviles-Rivero,
Carola-Bibiane Schönlieb
Abstract:
Semi-supervised learning has received a lot of recent attention as it alleviates the need for large amounts of labelled data, which can often be expensive, require expert knowledge and be time-consuming to collect. Recent developments in deep semi-supervised classification have reached unprecedented performance and the gap between supervised and semi-supervised learning is ever-decreasing. This improvement in performance has been based on the inclusion of numerous technical tricks, strong augmentation techniques and costly optimisation schemes with multi-term loss functions. We propose a new framework, LaplaceNet, for deep semi-supervised classification that has a greatly reduced model complexity. We utilise a hybrid approach where pseudo-labels are produced by minimising the Laplacian energy on a graph. These pseudo-labels are then used to iteratively train a neural-network backbone. Our model outperforms state-of-the-art methods for deep semi-supervised classification over several benchmark datasets. Furthermore, we consider the application of strong augmentations to neural networks theoretically and justify the use of a multi-sampling approach for semi-supervised learning. We demonstrate, through rigorous experimentation, that a multi-sampling augmentation approach improves generalisation and reduces the sensitivity of the network to augmentation.
Submitted 28 September, 2022; v1 submitted 8 June, 2021;
originally announced June 2021.
-
HERS Superpixels: Deep Affinity Learning for Hierarchical Entropy Rate Segmentation
Authors:
Hankui Peng,
Angelica I. Aviles-Rivero,
Carola-Bibiane Schönlieb
Abstract:
Superpixels serve as a powerful preprocessing tool in numerous computer vision tasks. By using superpixel representation, the number of image primitives can be reduced by orders of magnitude. With the rise of deep learning in recent years, a few works have attempted to feed deeply learned features / graphs into existing classical superpixel techniques. However, none of them are able to produce superpixels in near real-time, which is crucial to the applicability of superpixels in practice. In this work, we propose a two-stage graph-based framework for superpixel segmentation. In the first stage, we introduce an efficient Deep Affinity Learning (DAL) network that learns pairwise pixel affinities by aggregating multi-scale information. In the second stage, we propose a highly efficient superpixel method called Hierarchical Entropy Rate Segmentation (HERS). Using the learned affinities from the first stage, HERS builds a hierarchical tree structure that can produce any number of highly adaptive superpixels instantaneously. We demonstrate, through visual and numerical experiments, the effectiveness and efficiency of our method compared to various state-of-the-art superpixel methods.
Submitted 18 November, 2021; v1 submitted 7 June, 2021;
originally announced June 2021.
-
Beyond Fine-tuning: Classifying High Resolution Mammograms using Function-Preserving Transformations
Authors:
Tao Wei,
Angelica I Aviles-Rivero,
Shuo Wang,
Yuan Huang,
Fiona J Gilbert,
Carola-Bibiane Schönlieb,
Chang Wen Chen
Abstract:
The task of classifying mammograms is very challenging because the lesion is usually small in the high resolution image. The current state-of-the-art approaches for medical image classification rely on using the de-facto method for ConvNets - fine-tuning. However, there are fundamental differences between natural images and medical images, which, based on existing evidence from the literature, limit the overall performance gain when designed with algorithmic approaches. In this paper, we propose to go beyond fine-tuning by introducing a novel framework called MorphHR, in which we highlight a new transfer learning scheme. The idea behind the proposed framework is to integrate function-preserving transformations, for any continuous non-linear activation neurons, to internally regularise the network for improving mammogram classification. The proposed solution offers two major advantages over the existing techniques. Firstly, and unlike fine-tuning, the proposed approach allows for modifying not only the last few layers but also several of the first ones on a deep ConvNet. By doing this, we can design the network front to be suitable for learning domain specific features. Secondly, the proposed scheme is scalable to hardware. Therefore, one can fit high resolution images on standard GPU memory. We show that by using high resolution images, one prevents losing relevant information. We demonstrate, through numerical and visual experiments, that the proposed approach yields a significant improvement in the classification performance over state-of-the-art techniques, and is indeed on a par with radiology experts. Moreover, and for generalisation purposes, we show the effectiveness of the proposed learning scheme on another large dataset, the ChestX-ray14, surpassing current state-of-the-art techniques.
Submitted 19 January, 2021;
originally announced January 2021.
-
Contrastive Registration for Unsupervised Medical Image Segmentation
Authors:
Lihao Liu,
Angelica I Aviles-Rivero,
Carola-Bibiane Schönlieb
Abstract:
Medical image segmentation is a relevant task as it serves as the first step for several diagnosis processes, thus it is indispensable in clinical usage. Whilst major success has been reported using supervised techniques, they assume a large and well-representative labelled set. This is a strong assumption in the medical domain where annotations are expensive, time-consuming, and prone to human bias. To address this problem, unsupervised techniques have been proposed in the literature, yet it is still an open problem due to the difficulty of learning any transformation pattern. In this work, we present a novel optimisation model framed into a new CNN-based contrastive registration architecture for unsupervised medical image segmentation. The core of our approach is to exploit image-level registration and feature-level contrastive learning to perform registration-based segmentation. Firstly, we propose an architecture to capture the image-to-image transformation pattern via registration for unsupervised medical image segmentation. Secondly, we embed a contrastive learning mechanism into the registration architecture to enhance the discriminating capacity of the network at the feature level. We show that our proposed technique mitigates the major drawbacks of existing unsupervised techniques. We demonstrate, through numerical and visual experiments, that our technique substantially outperforms the current state-of-the-art unsupervised segmentation methods on two major medical image datasets.
Submitted 20 July, 2022; v1 submitted 17 November, 2020;
originally announced November 2020.
-
GraphXCOVID: Explainable Deep Graph Diffusion Pseudo-Labelling for Identifying COVID-19 on Chest X-rays
Authors:
Angelica I Aviles-Rivero,
Philip Sellars,
Carola-Bibiane Schönlieb,
Nicolas Papadakis
Abstract:
Can one learn to diagnose COVID-19 under extreme minimal supervision? Since the outbreak of the novel COVID-19 there has been a rush for developing Artificial Intelligence techniques for expert-level disease identification on Chest X-ray data. In particular, the use of deep supervised learning has become the go-to paradigm. However, the performance of such models is heavily dependent on the availability of a large and representative labelled dataset, the creation of which is an expensive and time-consuming task that poses a particular challenge for a novel disease. Semi-supervised learning has shown the ability to match the incredible performance of supervised models whilst requiring a small fraction of the labelled examples. This makes the semi-supervised paradigm an attractive option for identifying COVID-19. In this work, we introduce a graph based deep semi-supervised framework for classifying COVID-19 from chest X-rays. Our framework introduces an optimisation model for graph diffusion that reinforces the natural relation among the tiny labelled set and the vast unlabelled data. We then connect the diffusion prediction output as pseudo-labels that are used in an iterative scheme in a deep net. We demonstrate, through our experiments, that our model is able to outperform the current leading supervised model with a tiny fraction of the labelled examples. Finally, we provide attention maps to accommodate the radiologist's mental model, better fitting their perceptual and cognitive abilities. These visualisations aim to assist the radiologist in judging whether the diagnosis is correct or not, and in consequence to accelerate the decision.
Submitted 4 July, 2021; v1 submitted 30 September, 2020;
originally announced October 2020.
-
Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans
Authors:
Michael Roberts,
Derek Driggs,
Matthew Thorpe,
Julian Gilbey,
Michael Yeung,
Stephan Ursprung,
Angelica I. Aviles-Rivero,
Christian Etmann,
Cathal McCague,
Lucian Beer,
Jonathan R. Weir-McCall,
Zhongzhao Teng,
Effrossyni Gkrania-Klotsas,
James H. F. Rudd,
Evis Sala,
Carola-Bibiane Schönlieb
Abstract:
Machine learning methods offer great promise for fast and accurate detection and prognostication of COVID-19 from standard-of-care chest radiographs (CXR) and computed tomography (CT) images. Many articles have been published in 2020 describing new machine learning-based models for both of these tasks, but it is unclear which are of potential clinical utility. In this systematic review, we search EMBASE via OVID, MEDLINE via PubMed, bioRxiv, medRxiv and arXiv for published papers and preprints uploaded from January 1, 2020 to October 3, 2020 which describe new machine learning models for the diagnosis or prognosis of COVID-19 from CXR or CT images. Our search identified 2,212 studies, of which 415 were included after initial screening and, after quality screening, 61 studies were included in this systematic review. Our review finds that none of the models identified are of potential clinical use due to methodological flaws and/or underlying biases. This is a major weakness, given the urgency with which validated COVID-19 models are needed. To address this, we give many recommendations which, if followed, will solve these issues and lead to higher quality model development and well documented manuscripts.
Submitted 5 January, 2021; v1 submitted 14 August, 2020;
originally announced August 2020.
-
The GraphNet Zoo: An All-in-One Graph Based Deep Semi-Supervised Framework for Medical Image Classification
Authors:
Marianne de Vriendt,
Philip Sellars,
Angelica I Aviles-Rivero
Abstract:
We consider the problem of classifying a medical image dataset when we have a limited number of labels. This is a very common yet challenging setting, as labelled data is expensive, time-consuming to collect and may require expert knowledge. The current classification go-to of deep supervised learning is unable to cope with such a problem setup. However, using semi-supervised learning, one can produce accurate classifications using a significantly reduced amount of labelled data. Therefore, semi-supervised learning is perfectly suited for medical image classification. However, there has been almost no uptake of semi-supervised methods in the medical domain. In this work, we propose an all-in-one framework for deep semi-supervised classification focusing on graph-based approaches; to our knowledge, this is the first time that an approach with minimal labels has been shown at such an unprecedented scale with medical data. We introduce the concept of hybrid models by defining a classifier as a combination of an energy-based model and a deep net. Our energy functional is built on the Dirichlet energy based on the graph p-Laplacian. Our framework includes energies based on the $\ell_1$ and $\ell_2$ norms. We then connect this energy model to a deep net to generate a much richer feature space to construct a stronger graph. Our framework can be adapted to any complex dataset. We demonstrate, through extensive numerical comparisons, that our approach readily competes with fully-supervised state-of-the-art techniques for the applications of Malaria Cells, Mammograms and Chest X-ray classification whilst using only 20% of labels.
Submitted 26 June, 2020; v1 submitted 13 March, 2020;
originally announced March 2020.
-
Dim the Lights! -- Low-Rank Prior Temporal Data for Specular-Free Video Recovery
Authors:
Samar M. Alsaleh,
Angelica I. Aviles-Rivero,
Noemie Debroux,
James K. Hahn
Abstract:
The appearance of an object is significantly affected by the illumination conditions in the environment. This is more evident with strongly reflective objects as they suffer from more dominant specular reflections, causing information loss and discontinuity in the image domain. In this paper, we present a novel framework for specular-free video recovery with special emphasis on dealing with complex motions coming from objects or the camera. Our solution is a two-step approach that allows for both detection and restoration of the damaged regions in video data. We first propose a spatially adaptive detection term that searches for the damaged areas. We then introduce a variational solution for specular-free video recovery that allows exploiting spatio-temporal correlations by representing prior data in a low-rank form. We demonstrate that our solution prevents major drawbacks of existing approaches while improving the performance in both detection accuracy and inpainting quality. Finally, we show that our approach can be applied to other problems such as object removal.
Submitted 16 December, 2019;
originally announced December 2019.
-
Rethinking Medical Image Reconstruction via Shape Prior, Going Deeper and Faster: Deep Joint Indirect Registration and Reconstruction
Authors:
Jiulong Liu,
Angelica I. Aviles-Rivero,
Hui Ji,
Carola-Bibiane Schönlieb
Abstract:
Indirect image registration is a promising technique to improve image reconstruction quality by providing a shape prior for the reconstruction task. In this paper, we propose a novel hybrid method that seeks to reconstruct high-quality images from few measurements whilst requiring low computational cost. With this purpose, our framework intertwines indirect registration and reconstruction tasks in a single functional. It is based on two major novelties. Firstly, we introduce a model based on deep nets to solve the indirect registration problem, in which the inversion and registration mappings are recurrently connected through a fixed-point interaction based sparse optimisation. Secondly, we introduce specific inversion blocks, which use the explicit physical forward operator, to map the acquired measurements to the image reconstruction. We also introduce registration blocks based on deep nets to predict the registration parameters and warp transformation accurately and efficiently. We demonstrate, through extensive numerical and visual experiments, that our framework significantly outperforms classic reconstruction schemes and other bi-task methods in terms of both image quality and computational time. Finally, we show the generalisation capabilities of our approach by demonstrating its performance on fast Magnetic Resonance Imaging (MRI), sparse-view computed tomography (CT) and low-dose CT with measurements much below the Nyquist limit.
Submitted 16 December, 2019;
originally announced December 2019.