Probabilistic Interactive 3D Segmentation with Hierarchical Neural Processes

Authors: Jie Liu, Pan Zhou, Zehao Xiao, Jiayi Shen, Wenzhe Yin, Jan-Jakob Sonke, Efstratios Gavves

Abstract: Interactive 3D segmentation has emerged as a promising solution for generating accurate object masks in complex 3D scenes by incorporating user-provided clicks. However, two critical challenges remain underexplored: (1) effectively generalizing from sparse user clicks to produce accurate segmentation, and (2) quantifying predictive uncertainty to help users identify unreliable regions. In this work, we propose NPISeg3D, a novel probabilistic framework that builds upon Neural Processes (NPs) to address these challenges. Specifically, NPISeg3D introduces a hierarchical latent variable structure with scene-specific and object-specific latent variables to enhance few-shot generalization by capturing both global context and object-specific characteristics. Additionally, we design a probabilistic prototype modulator that adaptively modulates click prototypes with object-specific latent variables, improving the model's ability to capture object-aware context and quantify predictive uncertainty. Experiments on four 3D point cloud datasets demonstrate that NPISeg3D achieves superior segmentation performance with fewer clicks while providing reliable uncertainty estimations.

Submitted 26 May, 2025; v1 submitted 3 May, 2025; originally announced May 2025.

Comments: ICML 2025 Proceedings

arXiv:2502.17028 [pdf, other]

Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence

Authors: Wenzhe Yin, Zehao Xiao, Pan Zhou, Shujian Yu, Jiayi Shen, Jan-Jakob Sonke, Efstratios Gavves

Abstract: Multimodal alignment is crucial for various downstream tasks such as cross-modal generation and retrieval. Previous multimodal approaches like CLIP utilize InfoNCE to maximize mutual information, primarily aligning pairwise samples across modalities while overlooking distributional differences. In addition, InfoNCE has inherent conflict in terms of alignment and uniformity in multimodality, leading to suboptimal alignment with modality gaps. To overcome the limitations, we propose CS-Aligner, a novel framework that performs distributional vision-language alignment by integrating Cauchy-Schwarz (CS) divergence with mutual information. CS-Aligner captures both the global distribution information of each modality and the pairwise semantic relationships. We find that the CS divergence seamlessly addresses the InfoNCE's alignment-uniformity conflict and serves complementary roles with InfoNCE, yielding tighter and more precise alignment. Moreover, by introducing distributional alignment, CS-Aligner enables incorporating additional information from unpaired data and token-level representations, enhancing flexible and fine-grained alignment in practice. Experiments on text-to-image generation and cross-modality retrieval tasks demonstrate the effectiveness of our method on vision-language alignment.

Submitted 20 May, 2025; v1 submitted 24 February, 2025; originally announced February 2025.

arXiv:2502.02338 [pdf, other]

Geometric Neural Process Fields

Authors: Wenzhe Yin, Zehao Xiao, Jiayi Shen, Yunlu Chen, Cees G. M. Snoek, Jan-Jakob Sonke, Efstratios Gavves

Abstract: This paper addresses the challenge of Neural Field (NeF) generalization, where models must efficiently adapt to new signals given only a few observations. To tackle this, we propose Geometric Neural Process Fields (G-NPF), a probabilistic framework for neural radiance fields that explicitly captures uncertainty. We formulate NeF generalization as a probabilistic problem, enabling direct inference of NeF function distributions from limited context observations. To incorporate structural inductive biases, we introduce a set of geometric bases that encode spatial structure and facilitate the inference of NeF function distributions. Building on these bases, we design a hierarchical latent variable model, allowing G-NPF to integrate structural information across multiple spatial levels and effectively parameterize INR functions. This hierarchical approach improves generalization to novel scenes and unseen signals. Experiments on novel-view synthesis for 3D scenes, as well as 2D image and 1D signal regression, demonstrate the effectiveness of our method in capturing uncertainty and leveraging structural information for improved generalization.

Submitted 4 February, 2025; originally announced February 2025.

arXiv:2411.18249 [pdf, other]

Deep End-to-end Adaptive k-Space Sampling, Reconstruction, and Registration for Dynamic MRI

Authors: George Yiasemis, Jan-Jakob Sonke, Jonas Teuwen

Abstract: Dynamic MRI enables a range of clinical applications, including cardiac function assessment, organ motion tracking, and radiotherapy guidance. However, fully sampling the dynamic k-space data is often infeasible due to time constraints and physiological motion such as respiratory and cardiac motion. This necessitates undersampling, which degrades the quality of reconstructed images. Poor image quality not only hinders visualization but also impairs the estimation of deformation fields, crucial for registering dynamic (moving) images to a static reference image. This registration enables tasks such as motion correction, treatment planning, and quantitative analysis in applications like cardiac imaging and MR-guided radiotherapy. To overcome the challenges posed by undersampling and motion, we introduce an end-to-end deep learning (DL) framework that integrates adaptive dynamic k-space sampling, reconstruction, and registration. Our approach begins with a DL-based adaptive sampling strategy, optimizing dynamic k-space acquisition to capture the most relevant data for each specific case. This is followed by a DL-based reconstruction module that produces images optimized for accurate deformation field estimation from the undersampled moving data. Finally, a registration module estimates the deformation fields aligning the reconstructed dynamic images with a static reference. The proposed framework is independent of specific reconstruction and registration modules allowing for plug-and-play integration of these components.
The entire framework is jointly trained using a combination of supervised and unsupervised loss functions, enabling end-to-end optimization for improved performance across all components. Through controlled experiments and ablation studies, we validate each component, demonstrating that each choice contributes to robust motion estimation from undersampled dynamic data.

Submitted 21 March, 2025; v1 submitted 27 November, 2024; originally announced November 2024.

Comments: 48 pages, 23 figures, 8 tables

arXiv:2411.04679 [pdf, other]

CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation

Authors: Jie Liu, Pan Zhou, Yingjun Du, Ah-Hwee Tan, Cees G. M. Snoek, Jan-Jakob Sonke, Efstratios Gavves

Abstract: In this work, we address the cooperation problem among large language model (LLM) based embodied agents, where agents must cooperate to achieve a common goal. Previous methods often execute actions extemporaneously and incoherently, without long-term strategic and cooperative planning, leading to redundant steps, failures, and even serious repercussions in complex tasks like search-and-rescue missions where discussion and a cooperative plan are crucial. To solve this issue, we propose Cooperative Plan Optimization (CaPo) to enhance the cooperation efficiency of LLM-based embodied agents. Inspired by human cooperation schemes, CaPo improves cooperation efficiency with two phases: 1) meta-plan generation, and 2) progress-adaptive meta-plan and execution. In the first phase, all agents analyze the task, discuss, and cooperatively create a meta-plan that decomposes the task into subtasks with detailed steps, ensuring a long-term strategic and coherent plan for efficient coordination. In the second phase, agents execute tasks according to the meta-plan and dynamically adjust it based on their latest progress (e.g., discovering a target object) through multi-turn discussions. This progress-based adaptation eliminates redundant actions, improving the overall cooperation efficiency of agents. Experimental results on the ThreeDworld Multi-Agent Transport and Communicative Watch-And-Help tasks demonstrate that CaPo achieves a much higher task completion rate and efficiency compared with state-of-the-art methods. The code is released at https://github.com/jliu4ai/CaPo.

Submitted 1 March, 2025; v1 submitted 7 November, 2024; originally announced November 2024.

Comments: Accepted in ICLR2025

arXiv:2411.01291 [pdf, other]

Deep Multi-contrast Cardiac MRI Reconstruction via vSHARP with Auxiliary Refinement Network

Authors: George Yiasemis, Nikita Moriakov, Jan-Jakob Sonke, Jonas Teuwen

Abstract: Cardiac MRI (CMRI) is a cornerstone imaging modality that provides in-depth insights into cardiac structure and function. Multi-contrast CMRI (MCCMRI), which acquires sequences with varying contrast weightings, significantly enhances diagnostic capabilities by capturing a wide range of cardiac tissue characteristics. However, MCCMRI is often constrained by lengthy acquisition times and susceptibility to motion artifacts. To mitigate these challenges, accelerated imaging techniques that use k-space undersampling via different sampling schemes at acceleration factors have been developed to shorten scan durations. In this context, we propose a deep learning-based reconstruction method for 2D dynamic multi-contrast, multi-scheme, and multi-acceleration MRI. Our approach integrates the state-of-the-art vSHARP model, which utilizes half-quadratic variable splitting and ADMM optimization, with a Variational Network serving as an Auxiliary Refinement Network (ARN) to better adapt to the diverse nature of MCCMRI data. Specifically, the subsampled k-space data is fed into the ARN, which produces an initial prediction for the denoising step used by vSHARP. This, along with the subsampled k-space, is then used by vSHARP to generate high-quality 2D sequence predictions. Our method outperforms traditional reconstruction techniques and other vSHARP-based models.

Submitted 2 November, 2024; originally announced November 2024.

Comments: 11 pages, 1 figure, 3 tables, CMRxRecon Challenge 2024

arXiv:2406.06660 [pdf, other]

Space-Time Continuous PDE Forecasting using Equivariant Neural Fields

Authors: David M. Knigge, David R. Wessels, Riccardo Valperga, Samuele Papa, Jan-Jakob Sonke, Efstratios Gavves, Erik J. Bekkers

Abstract: Recently, Conditional Neural Fields (NeFs) have emerged as a powerful modelling paradigm for PDEs, by learning solutions as flows in the latent space of the Conditional NeF. Although benefiting from favourable properties of NeFs such as grid-agnosticity and space-time-continuous dynamics modelling, this approach limits the ability to impose known constraints of the PDE on the solutions -- e.g. symmetries or boundary conditions -- in favour of modelling flexibility. Instead, we propose a space-time continuous NeF-based solving framework that - by preserving geometric information in the latent space - respects known symmetries of the PDE. We show that modelling solutions as flows of pointclouds over the group of interest $G$ improves generalization and data-efficiency. We validated that our framework readily generalizes to unseen spatial and temporal locations, as well as geometric transformations of the initial conditions - where other NeF-based PDE forecasting methods fail - and improve over baselines in a number of challenging geometries.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2405.19978 [pdf, other]

Domain Adaptation with Cauchy-Schwarz Divergence

Authors: Wenzhe Yin, Shujian Yu, Yicong Lin, Jie Liu, Jan-Jakob Sonke, Efstratios Gavves

Abstract: Domain adaptation aims to use training data from one or multiple source domains to learn a hypothesis that can be generalized to a different, but related, target domain. As such, having a reliable measure for evaluating the discrepancy of both marginal and conditional distributions is crucial. We introduce Cauchy-Schwarz (CS) divergence to the problem of unsupervised domain adaptation (UDA). The CS divergence offers a theoretically tighter generalization error bound than the popular Kullback-Leibler divergence. This holds for the general case of supervised learning, including multi-class classification and regression. Furthermore, we illustrate that the CS divergence enables a simple estimator on the discrepancy of both marginal and conditional distributions between source and target domains in the representation space, without requiring any distributional assumptions. We provide multiple examples to illustrate how the CS divergence can be conveniently used in both distance metric- or adversarial training-based UDA frameworks, resulting in compelling performance.

Submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted by UAI-24
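
Note: the abstract above refers to a simple sample-based estimator of the Cauchy-Schwarz divergence. As a rough illustration only, the sketch below uses the standard kernel density form, D_CS(p, q) = log mean k(x, x') + log mean k(y, y') - 2 log mean k(x, y), with a Gaussian kernel. The function names, the bandwidth `sigma`, and the pure-Python layout are assumptions made here for illustration; this is not the authors' implementation and it covers only the marginal (not the conditional) discrepancy.

```python
import math


def gaussian_kernel(a, b, sigma):
    """Gaussian kernel between two points given as equal-length tuples."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(xs, ys, sigma):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def cs_divergence(xs, ys, sigma=1.0):
    """Kernel-based estimate of the Cauchy-Schwarz divergence between two samples.

    sigma is a hypothetical bandwidth choice; in practice it would need tuning.
    """
    k_xx = mean_kernel(xs, xs, sigma)
    k_yy = mean_kernel(ys, ys, sigma)
    k_xy = mean_kernel(xs, ys, sigma)
    return math.log(k_xx) + math.log(k_yy) - 2.0 * math.log(k_xy)


# Toy 1-D samples standing in for source and target features.
source = [(0.0,), (1.0,), (2.0,)]
target = [(5.0,), (6.0,), (7.0,)]
print(cs_divergence(source, source))  # identical samples
print(cs_divergence(source, target))  # shifted samples
```

The estimate is zero for identical samples and non-negative otherwise, which follows from the Cauchy-Schwarz inequality applied to the kernel mean embeddings.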

arXiv:2403.10346 [pdf, other]

End-to-end Adaptive Dynamic Subsampling and Reconstruction for Cardiac MRI

Authors: George Yiasemis, Jan-Jakob Sonke, Jonas Teuwen

Abstract: Background: Accelerating dynamic MRI is vital for advancing clinical applications and improving patient comfort. Commonly, deep learning (DL) methods for accelerated dynamic MRI reconstruction typically rely on uniformly applying non-adaptive predetermined or random subsampling patterns across all temporal frames of the dynamic acquisition. This approach fails to exploit temporal correlations or optimize subsampling on a case-by-case basis. Purpose: To develop an end-to-end approach for adaptive dynamic MRI subsampling and reconstruction, capable of generating customized sampling patterns maximizing at the same time reconstruction quality. Methods: We introduce the End-to-end Adaptive Dynamic Sampling and Reconstruction (E2E-ADS-Recon) for MRI framework, which integrates an adaptive dynamic sampler (ADS) that adapts the acquisition trajectory to each case for a given acceleration factor with a state-of-the-art dynamic reconstruction network, vSHARP, for reconstructing the adaptively sampled data into a dynamic image. The ADS can produce either frame-specific patterns or unified patterns applied to all temporal frames. E2E-ADS-Recon is evaluated under both frame-specific and unified 1D or 2D sampling settings, using dynamic cine cardiac MRI data and compared with vSHARP models employing standard subsampling trajectories, as well as pipelines where ADS was replaced by parameterized samplers optimized for dataset-specific schemes. Results: E2E-ADS-Recon exhibited superior reconstruction quality, especially at high accelerations, in terms of standard quantitative metrics (SSIM, pSNR, NMSE). Conclusion: The proposed framework improves reconstruction quality, highlighting the importance of case-specific subsampling optimization in dynamic MRI applications.

Submitted 21 March, 2025; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 38 pages, 26 figures, 2 tables

arXiv:2401.16051 [pdf, other]

Dynamic Prototype Adaptation with Distillation for Few-shot Point Cloud Segmentation

Authors: Jie Liu, Wenzhe Yin, Haochen Wang, Yunlu Chen, Jan-Jakob Sonke, Efstratios Gavves

Abstract: Few-shot point cloud segmentation seeks to generate per-point masks for previously unseen categories, using only a minimal set of annotated point clouds as reference. Existing prototype-based methods rely on support prototypes to guide the segmentation of query point clouds, but they encounter challenges when significant object variations exist between the support prototypes and query features. In this work, we present dynamic prototype adaptation (DPA), which explicitly learns task-specific prototypes for each query point cloud to tackle the object variation problem. DPA achieves the adaptation through prototype rectification, aligning vanilla prototypes from support with the query feature distribution, and prototype-to-query attention, extracting task-specific context from query point clouds. Furthermore, we introduce a prototype distillation regularization term, enabling knowledge transfer between early-stage prototypes and their deeper counterparts during adaption. By iteratively applying these adaptations, we generate task-specific prototypes for accurate mask predictions on query point clouds. Extensive experiments on two popular benchmarks show that DPA surpasses state-of-the-art methods by a significant margin, e.g., 7.43% and 6.39% under the 2-way 1-shot setting on S3DIS and ScanNet, respectively. Code is available at https://github.com/jliu4ai/DPA.

Submitted 29 January, 2024; originally announced January 2024.

Comments: Accepted in 3DV2024, code is available at https://github.com/jliu4ai/DPA

arXiv:2401.11256 [pdf, other]

Equivariant Multiscale Learned Invertible Reconstruction for Cone Beam CT

Authors: Nikita Moriakov, Jan-Jakob Sonke, Jonas Teuwen

Abstract: Cone Beam CT (CBCT) is an essential imaging modality nowadays, but the image quality of CBCT still lags behind the high quality standards established by the conventional Computed Tomography. We propose LIRE+, a learned iterative scheme for fast and memory-efficient CBCT reconstruction, which is a substantially faster and more parameter-efficient alternative to the recently proposed LIRE method. LIRE+ is a rotationally-equivariant multiscale learned invertible primal-dual iterative scheme for CBCT reconstruction. Memory usage is optimized by relying on simple reversible residual networks in primal/dual cells and patch-wise computations inside the cells during forward and backward passes, while increased inference speed is achieved by making the primal-dual scheme multiscale so that the reconstruction process starts at low resolution and with low resolution primal/dual latent vectors. A LIRE+ model was trained and validated on a set of 260 + 22 thorax CT scans and tested using a set of 142 thorax CT scans with additional evaluation with and without finetuning on an out-of-distribution set of 79 Head and Neck (HN) CT scans. Our method surpasses classical and deep learning baselines, including LIRE, on the thorax test set. For a similar inference time and with only 37 % of the parameter budget, LIRE+ achieves a +0.2 dB PSNR improvement over LIRE, while being able to match the performance of LIRE in 45 % less inference time and with 28 % of the parameter budget.
Rotational equivariance ensures robustness of LIRE+ to patient orientation, while LIRE and other deep learning baselines suffer from substantial performance degradation when patient orientation is unusual. On the HN dataset in the absence of finetuning, LIRE+ is generally comparable to LIRE in performance apart from a few outlier cases, whereas after identical finetuning LIRE+ demonstrates a +1.02 dB PSNR improvement over LIRE.

Submitted 20 January, 2024; originally announced January 2024.

arXiv:2312.10531 [pdf, other]

How to Train Neural Field Representations: A Comprehensive Study and Benchmark

Authors: Samuele Papa, Riccardo Valperga, David Knigge, Miltiadis Kofinas, Phillip Lippe, Jan-Jakob Sonke, Efstratios Gavves

Abstract: Neural fields (NeFs) have recently emerged as a versatile method for modeling signals of various modalities, including images, shapes, and scenes. Subsequently, a number of works have explored the use of NeFs as representations for downstream tasks, e.g. classifying an image based on the parameters of a NeF that has been fit to it. However, the impact of the NeF hyperparameters on their quality as downstream representation is scarcely understood and remains largely unexplored. This is in part caused by the large amount of time required to fit datasets of neural fields. In this work, we propose a JAX-based library that leverages parallelization to enable fast optimization of large-scale NeF datasets, resulting in a significant speed-up. With this library, we perform a comprehensive study that investigates the effects of different hyperparameters on fitting NeFs for downstream tasks. In particular, we explore the use of a shared initialization, the effects of overtraining, and the expressiveness of the network architectures used. Our study provides valuable insights on how to train NeFs and offers guidance for optimizing their effectiveness in downstream applications. Finally, based on the proposed library and our analysis, we propose Neural Field Arena, a benchmark consisting of neural field variants of popular vision datasets, including MNIST, CIFAR, variants of ImageNet, and ShapeNetv2. Our library and the Neural Field Arena will be open-sourced to introduce standardized benchmarking and promote further research on neural fields.

Submitted 5 June, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

arXiv:2311.15856 [pdf, other]

Joint Supervised and Self-supervised Learning for MRI Reconstruction

Authors: George Yiasemis, Nikita Moriakov, Clara I. Sánchez, Jan-Jakob Sonke, Jonas Teuwen

Abstract: Magnetic Resonance Imaging (MRI) represents an important diagnostic modality; however, its inherently slow acquisition process poses challenges in obtaining fully-sampled $k$-space data under motion. In the absence of fully-sampled acquisitions, serving as ground truths, training deep learning algorithms in a supervised manner to predict the underlying ground truth image becomes challenging. To address this limitation, self-supervised methods have emerged as a viable alternative, leveraging available subsampled $k$-space data to train deep neural networks for MRI reconstruction. Nevertheless, these approaches often fall short when compared to supervised methods. We propose Joint Supervised and Self-supervised Learning (JSSL), a novel training approach for deep learning-based MRI reconstruction algorithms aimed at enhancing reconstruction quality in cases where target datasets containing fully-sampled $k$-space measurements are unavailable. JSSL operates by simultaneously training a model in a self-supervised learning setting, using subsampled data from the target dataset(s), and in a supervised learning manner, utilizing datasets with fully-sampled $k$-space data, referred to as proxy datasets. We demonstrate JSSL's efficacy using subsampled prostate or cardiac MRI data as the target datasets, with fully-sampled brain and knee, or brain, knee and prostate $k$-space acquisitions, respectively, as proxy datasets. Our results showcase substantial improvements over conventional self-supervised methods, validated using common image quality metrics.
Furthermore, we provide theoretical motivations for JSSL and establish "rule-of-thumb" guidelines for training MRI reconstruction models. JSSL effectively enhances MRI reconstruction quality in scenarios where fully-sampled $k$-space data is not available, leveraging the strengths of supervised learning by incorporating proxy datasets.

Submitted 20 December, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: pages, 14 figures, 6 tables

arXiv:2311.11837 [pdf, other]

Kandinsky Conformal Prediction: Efficient Calibration of Image Segmentation Algorithms

Authors: Joren Brunekreef, Eric Marcus, Ray Sheombarsing, Jan-Jakob Sonke, Jonas Teuwen

Abstract: Image segmentation algorithms can be understood as a collection of pixel classifiers, for which the outcomes of nearby pixels are correlated. Classifier models can be calibrated using Inductive Conformal Prediction, but this requires holding back a sufficiently large calibration dataset for computing the distribution of non-conformity scores of the model's predictions. If one requires only marginal calibration on the image level, this calibration set consists of all individual pixels in the images available for calibration. However, if the goal is to attain proper calibration for each individual pixel classifier, the calibration set consists of individual images. In a scenario where data are scarce (such as the medical domain), it may not always be possible to set aside sufficiently many images for this pixel-level calibration. The method we propose, dubbed "Kandinsky calibration", makes use of the spatial structure present in the distribution of natural images to simultaneously calibrate the classifiers of "similar" pixels. This can be seen as an intermediate approach between marginal (imagewise) and conditional (pixelwise) calibration, where non-conformity scores are aggregated over similar image regions, thereby making more efficient use of the images available for calibration. We run experiments on segmentation algorithms trained and calibrated on subsets of the public MS-COCO and Medical Decathlon datasets, demonstrating that the Kandinsky calibration method can significantly improve the coverage.
When compared to both pixelwise and imagewise calibration on little data, the Kandinsky method achieves much lower coverage errors, indicating the data efficiency of the Kandinsky calibration.

Submitted 20 November, 2023; originally announced November 2023.

Comments: 15 pages, 11 figures
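
Note: the abstract above builds on Inductive Conformal Prediction. For orientation, here is a minimal sketch of the plain inductive (split) conformal baseline for a single classifier: non-conformity scores from a held-out calibration set give a threshold, and a prediction set keeps every label whose score does not exceed it. The score definition (one minus the predicted probability), the function names, and the toy numbers are assumptions chosen for illustration; the sketch does not implement the Kandinsky aggregation over similar pixels.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Threshold from calibration non-conformity scores at miscoverage level alpha.

    Uses the usual finite-sample rank k = ceil((n + 1) * (1 - alpha)).
    If k exceeds n, no finite threshold gives the guarantee, so return infinity.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")
    return sorted(cal_scores)[k - 1]


def prediction_set(probs, threshold):
    """Labels whose non-conformity score (1 - probability) is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= threshold]


# Hypothetical calibration scores: 1 - probability assigned to the true class.
cal_scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.4, 0.6, 0.8]
threshold = conformal_threshold(cal_scores, alpha=0.5)
print(threshold)
print(prediction_set([0.7, 0.2, 0.1], threshold))
```

In a segmentation setting, the choice of which scores enter `cal_scores` (all pixels of all images, one pixel position across images, or a group of similar pixels) is what separates the imagewise, pixelwise, and intermediate calibration schemes the abstract describes.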

arXiv:2310.06628 [pdf, other]

Deep Cardiac MRI Reconstruction with ADMM

Authors: George Yiasemis, Nikita Moriakov, Jan-Jakob Sonke, Jonas Teuwen

Abstract: Cardiac magnetic resonance imaging is a valuable non-invasive tool for identifying cardiovascular diseases. For instance, Cine MRI is the benchmark modality for assessing the cardiac function and anatomy. On the other hand, multi-contrast (T1 and T2) mapping has the potential to assess pathologies and abnormalities in the myocardium and interstitium. However, voluntary breath-holding and often arrhythmia, in combination with MRI's slow imaging speed, can lead to motion artifacts, hindering real-time acquisition image quality. Although performing accelerated acquisitions can facilitate dynamic imaging, it induces aliasing, causing low reconstructed image quality in Cine MRI and inaccurate T1 and T2 mapping estimation. In this work, inspired by related work in accelerated MRI reconstruction, we present a deep learning (DL)-based method for accelerated cine and multi-contrast reconstruction in the context of dynamic cardiac imaging. We formulate the reconstruction problem as a least squares regularized optimization task, and employ vSHARP, a state-of-the-art DL-based inverse problem solver, which incorporates half-quadratic variable splitting and the alternating direction method of multipliers with neural networks. We treat the problem in two setups: a 2D reconstruction and a 2D dynamic reconstruction task, and employ 2D and 3D deep learning networks, respectively. Our method optimizes in both the image and k-space domains, allowing for high reconstruction fidelity.
Although the target data is undersampled with a Cartesian equispaced scheme, we train our model using both Cartesian and simulated non-Cartesian undersampling schemes to enhance generalization of the model to unseen data. Furthermore, our model adopts a deep neural network to learn and refine the sensitivity maps of multi-coil k-space data. Lastly, our method is jointly trained on both undersampled cine and multi-contrast data.

Submitted 10 October, 2023; originally announced October 2023.

Comments: 12 pages, 3 figures, 2 tables. CMRxRecon Challenge, MICCAI 2023

arXiv:2309.09954 [pdf, other]

vSHARP: variable Splitting Half-quadratic Admm algorithm for Reconstruction of inverse-Problems

Authors: George Yiasemis, Nikita Moriakov, Jan-Jakob Sonke, Jonas Teuwen

Abstract: Medical Imaging (MI) tasks, such as accelerated parallel Magnetic Resonance Imaging (MRI), often involve reconstructing an image from noisy or incomplete measurements. This amounts to solving ill-posed inverse problems, where a satisfactory closed-form analytical solution is not available. Traditional methods such as Compressed Sensing (CS) in MRI reconstruction can be time-consuming or prone to producing low-fidelity images. Recently, a plethora of Deep Learning (DL) approaches have demonstrated superior performance in inverse-problem solving, surpassing conventional methods. In this study, we propose vSHARP (variable Splitting Half-quadratic ADMM algorithm for Reconstruction of inverse Problems), a novel DL-based method for solving ill-posed inverse problems arising in MI. vSHARP utilizes the Half-Quadratic Variable Splitting method and employs the Alternating Direction Method of Multipliers (ADMM) to unroll the optimization process. For data consistency, vSHARP unrolls a differentiable gradient descent process in the image domain, while a DL-based denoiser, such as a U-Net architecture, is applied to enhance image quality. vSHARP also employs a dilated-convolution DL-based model to predict the Lagrange multipliers for the ADMM initialization. We evaluate vSHARP on tasks of accelerated parallel MRI Reconstruction using two distinct datasets and on accelerated parallel dynamic MRI Reconstruction using another dataset. Our comparative analysis with state-of-the-art methods demonstrates the superior performance of vSHARP in these applications.

Submitted 30 July, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 22 pages, 9 figures, 5 tables
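
The variable-splitting idea named in the vSHARP abstract above can be illustrated with a textbook ADMM loop. The sketch below is not vSHARP: it assumes an identity forward operator and an l1 regularizer, so the data-consistency step has a closed form and the "denoiser" is a plain soft-threshold. According to the abstract, vSHARP instead unrolls gradient descent for data consistency and uses a learned network as the denoiser. The function names and parameters (`admm_denoise`, `lam`, `rho`) are hypothetical.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |.|: shrink value towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def admm_denoise(measurements, lam=1.0, rho=1.0, num_iters=100):
    """Textbook ADMM for min 0.5*||x - y||^2 + lam*||z||_1 subject to x = z.

    Assumption: identity forward operator, so every step is element-wise.
    """
    n = len(measurements)
    x = [0.0] * n  # primal variable tied to the data-fidelity term
    z = [0.0] * n  # auxiliary (split) variable tied to the regularizer
    u = [0.0] * n  # scaled Lagrange multipliers
    for _ in range(num_iters):
        # x-update: closed-form minimizer of the quadratic data-fidelity term.
        x = [(y + rho * (zi - ui)) / (1.0 + rho)
             for y, zi, ui in zip(measurements, z, u)]
        # z-update: proximal step on the regularizer (a learned denoiser
        # would take this place in an unrolled network).
        z = [soft_threshold(xi + ui, lam / rho) for xi, ui in zip(x, u)]
        # u-update: scaled dual ascent on the constraint x = z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z
```

For this toy problem the exact solution is the soft-thresholded input, which makes the loop easy to check: `admm_denoise([3.0, 0.5, -3.0])` should approach `[2.0, 0.0, -2.0]`.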

arXiv:2307.08351 [pdf, other]

Neural Modulation Fields for Conditional Cone Beam Neural Tomography

Authors: Samuele Papa, David M. Knigge, Riccardo Valperga, Nikita Moriakov, Miltos Kofinas, Jan-Jakob Sonke, Efstratios Gavves

Abstract: Conventional Computed Tomography (CT) methods require large numbers of noise-free projections for accurate density reconstructions, limiting their applicability to the more complex class of Cone Beam Geometry CT (CBCT) reconstruction. Recently, deep learning methods have been proposed to overcome these limitations, with methods based on neural fields (NF) showing strong performance, by approximating the reconstructed density through a continuous-in-space coordinate based neural network. Our focus is on improving such methods. However, unlike previous work, which requires training an NF from scratch for each new set of projections, we propose to leverage anatomical consistencies over different scans by training a single conditional NF on a dataset of projections. We propose a novel conditioning method where local modulations are modeled per patient as a field over the input domain through a Neural Modulation Field (NMF). The resulting Conditional Cone Beam Neural Tomography (CondCBNT) shows improved performance for both high and low numbers of available projections on noise-free and noisy data.

Submitted 17 July, 2023; originally announced July 2023.

arXiv:2302.04729 [pdf, other]

Constrained Empirical Risk Minimization: Theory and Practice

Authors: Eric Marcus, Ray Sheombarsing, Jan-Jakob Sonke, Jonas Teuwen

Abstract: Deep Neural Networks (DNNs) are widely used for their ability to effectively approximate large classes of functions. This flexibility, however, makes the strict enforcement of constraints on DNNs an open problem. Here we present a framework that, under mild assumptions, allows the exact enforcement of constraints on parameterized sets of functions such as DNNs. Instead of imposing "soft" constraints via additional terms in the loss, we restrict (a subset of) the DNN parameters to a submanifold on which the constraints are satisfied exactly throughout the entire training procedure. We focus on constraints that are outside the scope of equivariant networks used in Geometric Deep Learning. As a major example of the framework, we restrict filters of a Convolutional Neural Network (CNN) to be wavelets, and apply these wavelet networks to the task of contour prediction in the medical domain.

Submitted 9 February, 2023; originally announced February 2023.

Comments: 50 pages, 12 figures, 2 tables

arXiv:2301.10540 [pdf, other]

Modelling Long Range Dependencies in $N$D: From Task-Specific to a General Purpose CNN

Authors: David M. Knigge, David W. Romero, Albert Gu, Efstratios Gavves, Erik J. Bekkers, Jakub M. Tomczak, Mark Hoogendoorn, Jan-Jakob Sonke

Abstract: Performant Convolutional Neural Network (CNN) architectures must be tailored to specific tasks in order to consider the length, resolution, and dimensionality of the input data. In this work, we tackle the need for problem-specific CNN architectures. We present the Continuous Convolutional Neural Network (CCNN): a single CNN able to process data of arbitrary resolution, dimensionality and length without any structural changes. Its key components are its continuous convolutional kernels, which model long-range dependencies at every layer and thus remove the need of current CNN architectures for task-dependent downsampling and depths. We showcase the generality of our method by using the same architecture for tasks on sequential ($1{\rm D}$), visual ($2{\rm D}$) and point-cloud ($3{\rm D}$) data. Our CCNN matches and often outperforms the current state-of-the-art across all tasks considered.

Submitted 16 April, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

arXiv:2301.08365 [pdf, other]

On Retrospective k-space Subsampling schemes For Deep MRI Reconstruction

Authors: George Yiasemis, Clara I. Sánchez, Jan-Jakob Sonke, Jonas Teuwen

Abstract: Acquiring fully-sampled MRI $k$-space data is time-consuming, and collecting accelerated data can reduce the acquisition time. Employing 2D Cartesian-rectilinear subsampling schemes is a conventional approach for accelerated acquisitions; however, this often results in imprecise reconstructions, even with the use of Deep Learning (DL), especially at high acceleration factors. Non-rectilinear or non-Cartesian trajectories can be implemented in MRI scanners as alternative subsampling options. This work investigates the impact of the $k$-space subsampling scheme on the quality of reconstructed accelerated MRI measurements produced by trained DL models. The Recurrent Variational Network (RecurrentVarNet) was used as the DL-based MRI-reconstruction architecture. Cartesian, fully-sampled multi-coil $k$-space measurements from three datasets were retrospectively subsampled with different accelerations using eight distinct subsampling schemes: four Cartesian-rectilinear, two Cartesian non-rectilinear, and two non-Cartesian. Experiments were conducted in two frameworks: scheme-specific, where a distinct model was trained and evaluated for each dataset-subsampling scheme pair, and multi-scheme, where for each dataset a single model was trained on data randomly subsampled by any of the eight schemes and evaluated on data subsampled by all schemes. In both frameworks, RecurrentVarNets trained and evaluated on non-rectilinearly subsampled data demonstrated superior performance, particularly for high accelerations. In the multi-scheme setting, reconstruction performance on rectilinearly subsampled data improved when compared to the scheme-specific experiments. Our findings demonstrate the potential for using DL-based methods, trained on non-rectilinearly subsampled measurements, to optimize scan time and image quality.

Submitted 9 August, 2023; v1 submitted 19 January, 2023; originally announced January 2023.

Comments: 22 pages, 12 figures, 5 tables
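
Retrospective subsampling, as described in the abstract above, amounts to multiplying fully-sampled $k$-space by a binary mask. The sketch below shows one possible Cartesian-rectilinear (equispaced) mask. It is not taken from the paper: the function names, the fully-sampled central band, and the convention that low frequencies sit in the middle of the array are all assumptions made for illustration.

```python
def equispaced_mask(num_lines, acceleration, num_center_lines):
    """Return booleans marking which phase-encoding lines are kept.

    Assumptions: k-space is centered (low frequencies in the middle) and a
    band of `num_center_lines` lines is always fully sampled.
    """
    if acceleration < 1 or not 0 <= num_center_lines <= num_lines:
        raise ValueError("invalid acceleration or center size")
    # Keep every `acceleration`-th line across the whole k-space extent.
    mask = [index % acceleration == 0 for index in range(num_lines)]
    # Fully sample a centered low-frequency band.
    start = (num_lines - num_center_lines) // 2
    for index in range(start, start + num_center_lines):
        mask[index] = True
    return mask


def apply_mask(kspace_rows, mask):
    """Zero out the rows (phase-encoding lines) that were not acquired."""
    return [list(row) if keep else [0j] * len(row)
            for row, keep in zip(kspace_rows, mask)]
```

With 16 lines, a nominal acceleration of 4, and 4 central lines, 7 lines are kept, so the effective acceleration is 16/7 rather than 4. This gap between nominal and effective acceleration is worth tracking when schemes are compared.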

arXiv:2301.03194 [pdf, other]

Few-shot Semantic Segmentation with Support-induced Graph Convolutional Network

Authors: Jie Liu, Yanqi Bao, Wenzhe Yin, Haochen Wang, Yang Gao, Jan-Jakob Sonke, Efstratios Gavves

Abstract: Few-shot semantic segmentation (FSS) aims to segment novel objects with only a few annotated samples and has made great progress recently. Most existing FSS models focus on feature matching between support and query to tackle FSS. However, the appearance variations between objects from the same category can be extremely large, leading to unreliable feature matching and query mask prediction. To this end, we propose a Support-induced Graph Convolutional Network (SiGCN) to explicitly excavate latent context structure in query images. Specifically, we propose a Support-induced Graph Reasoning (SiGR) module to capture salient query object parts at different semantic levels with a Support-induced GCN. Furthermore, an instance association (IA) module is designed to capture high-order instance context from both support and query instances. By integrating the two proposed modules, SiGCN can learn rich query context representation and is thus more robust to appearance variations. Extensive experiments on PASCAL-5i and COCO-20i demonstrate that our SiGCN achieves state-of-the-art performance.

Submitted 15 March, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

Comments: Accepted in BMVC2022 as oral presentation

arXiv:2205.07358 [pdf, other]

doi 10.1002/mp.16779

End-to-end Memory-Efficient Reconstruction for Cone Beam CT

Authors: Nikita Moriakov, Jan-Jakob Sonke, Jonas Teuwen

Abstract: Cone Beam CT plays an important role in many medical fields nowadays, but the potential of this imaging modality is hampered by lower image quality compared to conventional CT. A lot of recent research has been directed towards reconstruction methods relying on deep learning. However, practical application of deep learning to CBCT reconstruction is complicated by several issues, such as exceedingly high memory costs of deep learning methods for fully 3D data. In this work, we address these limitations and propose LIRE: a learned invertible primal-dual iterative scheme for Cone Beam CT reconstruction. Memory requirements of the network are substantially reduced while preserving its expressive power, enabling us to train on data with isotropic 2mm voxel spacing, clinically-relevant projection count and detector panel resolution on current hardware with 24 GB VRAM. Two LIRE models for small and for large Field-of-View setting were trained and validated on a set of 260 + 22 thorax CT scans and tested using a set of 142 thorax CT scans plus an out-of-distribution dataset of 79 head & neck CT scans. For both settings, our method surpasses the classical methods and the deep learning baselines on both test sets. On the thorax CT set, our method achieves PSNR of 33.84 $\pm$ 2.28 for the small FoV setting and 35.14 $\pm$ 2.69 for the large FoV setting; U-Net baseline achieves PSNR of 33.08 $\pm$ 1.75 and 34.29 $\pm$ 2.71 respectively. On the head & neck CT set, our method achieves PSNR of 39.35 $\pm$ 1.75 for the small FoV setting and 41.21 $\pm$ 1.41 for the large FoV setting; U-Net baseline achieves PSNR of 33.08 $\pm$ 1.75 and 34.29 $\pm$ 2.71 respectively. Additionally, we demonstrate that LIRE can be finetuned to reconstruct high-resolution CBCT data with the same geometry but 1mm voxel spacing and higher detector panel resolution, where it outperforms the U-Net baseline as well.

Submitted 31 October, 2023; v1 submitted 15 May, 2022; originally announced May 2022.
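
The PSNR figures quoted in the abstract above follow the usual peak signal-to-noise ratio definition, PSNR = 10 log10(range^2 / MSE). The sketch below computes it for flattened voxel lists. The abstract does not state which data range the authors used, so taking the range of the reference volume by default is an assumption, and the function name `psnr` is hypothetical.

```python
import math


def psnr(reference, estimate, data_range=None):
    """Peak signal-to-noise ratio in dB between two equal-length sequences.

    Assumption: when `data_range` is not given, the dynamic range of the
    reference (max - min) is used as the peak value.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    if data_range is None:
        data_range = max(reference) - min(reference)
    # Mean squared error over all voxels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10.0 * math.log10(data_range ** 2 / mse)
```

A reported value such as "mean $\pm$ standard deviation" would then come from evaluating this per scan and aggregating over the test set.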

arXiv:2204.10638 [pdf, other]

Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation

Authors: Jie Liu, Yanqi Bao, Guo-Sen Xie, Huan Xiong, Jan-Jakob Sonke, Efstratios Gavves

Abstract: The key challenge for few-shot semantic segmentation (FSS) is how to tailor a desirable interaction among support and query features and/or their prototypes, under the episodic training scenario. Most existing FSS methods implement such support-query interactions by solely leveraging plain operations - e.g., cosine similarity and feature concatenation - for segmenting the query objects. However, these interaction approaches usually fail to capture the intrinsic object details in the query images that are widely encountered in FSS, e.g., if the query object to be segmented has holes and slots, inaccurate segmentation almost always happens. To this end, we propose a dynamic prototype convolution network (DPCN) to fully capture the aforementioned intrinsic details for accurate FSS. Specifically, in DPCN, a dynamic convolution module (DCM) is first proposed to generate dynamic kernels from support foreground, then information interaction is achieved by convolution operations over query features using these kernels. Moreover, we equip DPCN with a support activation module (SAM) and a feature filtering module (FFM) to generate pseudo mask and filter out background information for the query images, respectively. SAM and FFM together can mine enriched context information from the query features. Our DPCN is also flexible and efficient under the k-shot FSS setting. Extensive experiments on PASCAL-5i and COCO-20i show that DPCN yields superior performance under both 1-shot and 5-shot settings.

Submitted 22 April, 2022; originally announced April 2022.

Comments: Accepted in CVPR2022. Code will be available soon

arXiv:2111.09639 [pdf, other]

Recurrent Variational Network: A Deep Learning Inverse Problem Solver applied to the task of Accelerated MRI Reconstruction

Authors: George Yiasemis, Jan-Jakob Sonke, Clarisa Sánchez, Jonas Teuwen

Abstract: Magnetic Resonance Imaging can produce detailed images of the anatomy and physiology of the human body that can assist doctors in diagnosing and treating pathologies such as tumours. However, MRI suffers from very long acquisition times that make it susceptible to patient motion artifacts and limit its potential to deliver dynamic treatments. Conventional approaches such as Parallel Imaging and Compressed Sensing allow for an increase in MRI acquisition speed by reconstructing MR images from sub-sampled MRI data acquired using multiple receiver coils. Recent advancements in Deep Learning combined with Parallel Imaging and Compressed Sensing techniques have the potential to produce high-fidelity reconstructions from highly accelerated MRI data. In this work we present a novel Deep Learning-based Inverse Problem solver applied to the task of Accelerated MRI Reconstruction, called the Recurrent Variational Network (RecurrentVarNet), by exploiting the properties of Convolutional Recurrent Neural Networks and unrolled algorithms for solving Inverse Problems. The RecurrentVarNet consists of multiple recurrent blocks, each responsible for one iteration of the unrolled variational optimization scheme for solving the inverse problem of multi-coil Accelerated MRI Reconstruction. Contrary to traditional approaches, the optimization steps are performed in the observation domain ($k$-space) instead of the image domain. Each block of the RecurrentVarNet refines the observed $k$-space and comprises a data consistency term and a recurrent unit which takes as input a learned hidden state and the prediction of the previous block. Our proposed method achieves new state-of-the-art qualitative and quantitative reconstruction results on 5-fold and 10-fold accelerated data from a public multi-coil brain dataset, outperforming previous conventional and deep learning-based approaches.

Submitted 29 March, 2022; v1 submitted 18 November, 2021; originally announced November 2021.

Comments: 18 pages, 10 figures, 3 tables, CVPR 22

arXiv:2110.15233 [pdf, other]

Subpixel object segmentation using wavelets and multi resolution analysis

Authors: Ray Sheombarsing, Nikita Moriakov, Jan-Jakob Sonke, Jonas Teuwen

Abstract: We propose a novel deep learning framework for fast prediction of boundaries of two-dimensional simply connected domains using wavelets and Multi Resolution Analysis (MRA). The boundaries are modelled as (piecewise) smooth closed curves using wavelets and the so-called Pyramid Algorithm. Our network architecture is a hybrid analog of the U-Net, where the down-sampling path is a two-dimensional encoder with learnable filters, and the upsampling path is a one-dimensional decoder, which builds curves up from low to high resolution levels. Any wavelet basis induced by an MRA can be used. This flexibility allows for incorporation of priors on the smoothness of curves. The effectiveness of the proposed method is demonstrated by delineating boundaries of simply connected domains (organs) in medical images using Daubechies wavelets and comparing performance with a U-Net baseline. Our model demonstrates up to 5x faster inference speed compared to the U-Net, while maintaining similar performance in terms of Dice score and Hausdorff distance.

Submitted 28 October, 2021; originally announced October 2021.

Comments: 19 pages, 10 figures, 1 table
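
The Pyramid Algorithm mentioned in the abstract above splits a signal into a coarse approximation plus detail coefficients at each level, and rebuilds it from low to high resolution. The sketch below uses the Haar wavelet, the simplest member of the Daubechies family, on a plain 1D list. It is an illustration only: the paper's curve parameterization, choice of wavelet order, and learned components are not reproduced, and all function names are hypothetical.

```python
import math

_SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter coefficient


def haar_decompose(signal, levels):
    """Return (coarsest approximation, [details from fine to coarse])."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = [(approx[2 * i], approx[2 * i + 1]) for i in range(len(approx) // 2)]
        # Low-pass branch: scaled pairwise sums. High-pass: scaled differences.
        details.append([(a - b) * _SCALE for a, b in pairs])
        approx = [(a + b) * _SCALE for a, b in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, building the signal up from coarse to fine."""
    signal = list(approx)
    for detail in reversed(details):
        upsampled = []
        for a, d in zip(signal, detail):
            upsampled.append((a + d) * _SCALE)
            upsampled.append((a - d) * _SCALE)
        signal = upsampled
    return signal
```

Reconstruction is exact up to floating-point error. Dropping or shrinking the fine-level details before reconstruction yields a smoother signal, which is one way a smoothness prior can enter such a representation.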

arXiv:2108.07619 [pdf, other]

doi 10.1117/12.2609876

Deep MRI Reconstruction with Radial Subsampling

Authors: George Yiasemis, Chaoping Zhang, Clara I. Sánchez, Jan-Jakob Sonke, Jonas Teuwen

Abstract: In spite of its extensive adoption in almost every medical diagnostic and examination application, Magnetic Resonance Imaging (MRI) is still a slow imaging modality, which limits its use for dynamic imaging. In recent years, Parallel Imaging (PI) and Compressed Sensing (CS) have been utilised to accelerate the MRI acquisition. In clinical settings, subsampling the k-space measurements during scanning time using Cartesian trajectories, such as rectilinear sampling, is currently the most conventional CS approach applied which, however, is prone to producing aliased reconstructions. With the advent of Deep Learning (DL) in accelerating MRI, reconstructing faithful images from subsampled data has become increasingly promising. Retrospectively applying a subsampling mask onto the k-space data is a way of simulating the accelerated acquisition of k-space data in a real clinical setting. In this paper we compare and review the effect of applying either rectilinear or radial retrospective subsampling on the quality of the reconstructions produced by trained deep neural networks. With the same choice of hyper-parameters, we train and evaluate two distinct Recurrent Inference Machines (RIMs), one for each type of subsampling. The qualitative and quantitative results of our experiments indicate that the model trained on data with radial subsampling attains higher performance and learns to estimate reconstructions with higher fidelity, paving the way for other DL approaches to involve radial subsampling.

Submitted 12 January, 2022; v1 submitted 17 August, 2021; originally announced August 2021.

Comments: 9 pages, 7 figures, 1 table

Report number: Proc. SPIE 12031, Medical Imaging 2022: Physics of Medical Imaging, 1203136

arXiv:2012.07819 [pdf, other]

Reconstructing unseen modalities and pathology with an efficient Recurrent Inference Machine

Authors: Dimitrios Karkalousos, Kai Lønning, Hanneke E. Hulst, Serge O. Dumoulin, Jan-Jakob Sonke, Frans M. Vos, Matthan W. A. Caan

Abstract: Objective: To allow efficient learning using the Recurrent Inference Machine (RIM) for image reconstruction while not being strictly dependent on the training data distribution, so that unseen modalities and pathologies are still accurately recovered. Methods: Theoretically, the RIM learns to solve the inverse problem of accelerated-MRI reconstruction while being robust to variable imaging conditions. The efficiency and generalization capabilities with different training datasets were studied, as well as recurrent network units with decreasing complexity: the Gated Recurrent Unit (GRU), the Minimal Gated Unit (MGU), and the Independently Recurrent Neural Network (IndRNN), to reduce inference times. Validation was performed against Compressed Sensing (CS) and further assessed based on data unseen during training. A pathology study was conducted by reconstructing simulated white matter lesions and prospectively undersampled data of a Multiple Sclerosis patient. Results: Training on a single modality of 3T $T_1$-weighted brain data appeared sufficient to also reconstruct 7T $T_{2}^*$-weighted brain and 3T $T_2$-weighted knee data. The IndRNN is an efficient recurrent unit, reducing inference time by 68% compared to CS while maintaining performance. The RIM was able to reconstruct lesions unseen during training more accurately than CS when trained on $T_2$-weighted knee data. Training on $T_1$-weighted brain data and on combined data slightly enhanced the signal compared to CS. Conclusion: The RIM is efficient when decreasing its complexity, which reduces the inference time, while still being able to reconstruct data and pathology that was unseen during training.

Submitted 14 December, 2020; originally announced December 2020.

Comments: 20 pages, 8 figures

arXiv:1908.10715 [pdf, other]

Learned SIRT for Cone Beam Computed Tomography Reconstruction

Authors: Roeland J. Dilz, Lukas Schröder, Nikita Moriakov, Jan-Jakob Sonke, Jonas Teuwen

Abstract: We introduce the learned simultaneous iterative reconstruction technique (SIRT) for tomographic reconstruction. The learned SIRT algorithm is a deep learning based reconstruction method combining model knowledge with a learned component. The algorithm is trained by mapping raw measured data to the reconstruction results over several iterations. The learned SIRT algorithm is applied to a cone beam geometry on a circular orbit, a challenging problem for learned methods due to its 3D geometry and its inherent inability to completely capture the patient anatomy. A comparison of 2D reconstructions is shown, where the learned SIRT approach produces reconstructions with superior peak signal to noise ratio (PSNR) and structural similarity (SSIM), compared to FBP, SIRT and U-Net post-processing, and similar PSNR and SSIM compared to the learned primal dual algorithm. Similar results are shown for cone beam geometry reconstructions of a 3D Shepp-Logan phantom, where we obtain between 9.9 and 28.1 dB improvement over FBP with a substantial improvement in SSIM. Finally, we show that our algorithm scales to clinically relevant problems, and performs well when applied to measurements of a physical phantom.

Submitted 28 August, 2019; originally announced August 2019.
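
For context on the abstract above, the classical (non-learned) SIRT update is x <- x + C A^T R (b - A x), where R and C hold the inverse row and column sums of the system matrix A. The sketch below implements that baseline on a tiny dense matrix. It assumes non-negative entries with no all-zero row or column, and it contains no learned component: the abstract does not say where the learned part enters, so none is guessed here. The function name `sirt` is hypothetical.

```python
def sirt(matrix, measurements, num_iters=200):
    """Classical SIRT for A x = b with a small dense, non-negative matrix A.

    Assumption: every row and column of `matrix` has a positive sum.
    """
    num_rows, num_cols = len(matrix), len(matrix[0])
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(num_rows)) for j in range(num_cols)]
    x = [0.0] * num_cols
    for _ in range(num_iters):
        # Row-normalized residual: R (b - A x).
        residual = [
            (measurements[i] - sum(matrix[i][j] * x[j] for j in range(num_cols)))
            / row_sums[i]
            for i in range(num_rows)
        ]
        # Column-normalized backprojection: x + C A^T residual.
        x = [
            x[j] + sum(matrix[i][j] * residual[i] for i in range(num_rows)) / col_sums[j]
            for j in range(num_cols)
        ]
    return x
```

On a consistent, full-rank toy system such as A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]] with b = [3, 5, 4], the iterates converge to x = [1, 2, 3].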

arXiv:1201.2450 [pdf]

doi 10.1118/1.4745559

Four-dimensional Cone Beam CT Reconstruction and Enhancement using a Temporal Non-Local Means Method

Authors: Xun Jia, Zhen Tian, Yifei Lou, Jan-Jakob Sonke, Steve B. Jiang

Abstract: Four-dimensional Cone Beam Computed Tomography (4D-CBCT) has been developed to provide respiratory phase resolved volumetric imaging in image guided radiation therapy (IGRT). An inadequate number of projections in each phase bin results in low quality 4D-CBCT images with obvious streaking artifacts. In this work, we propose two novel 4D-CBCT algorithms: an iterative reconstruction algorithm and an enhancement algorithm, utilizing a temporal nonlocal means (TNLM) method. We define a TNLM energy term for a given set of 4D-CBCT images. Minimization of this term favors those 4D-CBCT images such that any anatomical features at one spatial point at one phase can be found in a nearby spatial point at neighboring phases. 4D-CBCT reconstruction is achieved by minimizing a total energy containing a data fidelity term and the TNLM energy term. As for the image enhancement, 4D-CBCT images generated by the FDK algorithm are enhanced by minimizing the TNLM function while keeping the enhanced images close to the FDK results. A forward-backward splitting algorithm and a Gauss-Jacobi iteration method are employed to solve the problems. The algorithms are implemented on GPU to achieve a high computational efficiency. The reconstruction algorithm and the enhancement algorithm generate visually similar 4D-CBCT images, both better than the FDK results. Quantitative evaluations indicate that, compared with the FDK results, our reconstruction method improves contrast-to-noise-ratio (CNR) by a factor of 2.56~3.13 and our enhancement method increases the CNR by 2.75~3.33 times. The enhancement method also removes over 80% of the streak artifacts from the FDK results. The total computation time is ~460 sec for the reconstruction algorithm and ~610 sec for the enhancement algorithm on an NVIDIA Tesla C1060 GPU card.

Submitted 11 January, 2012; originally announced January 2012.

Comments: 20 pages, 3 figures, 2 tables

Showing 1–29 of 29 results for author: Sonke, J