-
Synthetic multi-inversion time magnetic resonance images for visualization of subcortical structures
Authors:
Savannah P. Hays,
Lianrui Zuo,
Anqi Feng,
Yihao Liu,
Blake E. Dewey,
Jiachen Zhuo,
Ellen M. Mowry,
Scott D. Newsome,
Jerry L. Prince,
Aaron Carass
Abstract:
Purpose: Visualization of subcortical gray matter is essential in neuroscience and clinical practice, particularly for disease understanding and surgical planning. While multi-inversion time (multi-TI) T$_1$-weighted (T$_1$-w) magnetic resonance (MR) imaging improves visualization, it is rarely acquired in clinical settings. Approach: We present SyMTIC (Synthetic Multi-TI Contrasts), a deep learning method that generates synthetic multi-TI images using routinely acquired T$_1$-w, T$_2$-weighted (T$_2$-w), and FLAIR images. Our approach combines image translation via deep neural networks with imaging physics to estimate longitudinal relaxation time (T$_1$) and proton density (PD) maps. These maps are then used to compute multi-TI images with arbitrary inversion times. Results: SyMTIC was trained using paired MPRAGE and FGATIR images along with T$_2$-w and FLAIR images. It accurately synthesized multi-TI images from standard clinical inputs, achieving image quality comparable to that from explicitly acquired multi-TI data. The synthetic images, especially for TI values between 400 and 800 ms, enhanced visualization of subcortical structures and improved segmentation of thalamic nuclei. Conclusion: SyMTIC enables robust generation of high-quality multi-TI images from routine MR contrasts. It generalizes well to varied clinical datasets, including those with missing FLAIR images or unknown parameters, offering a practical solution for improving brain MR image visualization and analysis.
Submitted 4 June, 2025;
originally announced June 2025.
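The SyMTIC abstract says multi-TI images are computed from estimated T1 and PD maps at arbitrary inversion times. As a rough illustration only, the sketch below uses the textbook magnitude inversion-recovery model, assuming a perfect inversion and TR much longer than T1; SyMTIC's actual signal equation (for example, one tailored to MPRAGE/FGATIR readouts) is not given in the abstract and may differ. The function names and the two-voxel "maps" are hypothetical.

```python
import math

def ir_signal(pd, t1_ms, ti_ms):
    """Magnitude inversion-recovery signal for one voxel.

    Simplified model (assumption): perfect 180-degree inversion and TR >> T1,
    so S(TI) = PD * |1 - 2 * exp(-TI / T1)|.
    """
    return pd * abs(1.0 - 2.0 * math.exp(-ti_ms / t1_ms))

def synthesize_multi_ti(pd_map, t1_map, tis_ms):
    """Return one synthetic image (flat list of voxels) per inversion time."""
    return {ti: [ir_signal(pd, t1, ti) for pd, t1 in zip(pd_map, t1_map)]
            for ti in tis_ms}

# Hypothetical two-voxel PD and T1 maps (illustrative values, not measured data).
pd_map = [0.7, 0.8]
t1_map = [800.0, 1300.0]
images = synthesize_multi_ti(pd_map, t1_map, [400.0, 600.0, 800.0])

# Each tissue is nulled where 1 - 2*exp(-TI/T1) = 0, i.e. at TI = T1 * ln 2.
null_ti = t1_map[0] * math.log(2)
```

Under this simplified model, sweeping TI moves the null point from one T1 value to another, which gives one intuition for why some intermediate TIs can separate tissues that look similar in a single T1-w image.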
-
Beyond the LUMIR challenge: The pathway to foundational registration models
Authors:
Junyu Chen,
Shuwen Wei,
Joel Honkamaa,
Pekka Marttinen,
Hang Zhang,
Min Liu,
Yichao Zhou,
Zuopeng Tan,
Zhuoyuan Wang,
Yi Wang,
Hongchao Zhou,
Shunbo Hu,
Yi Zhang,
Qian Tao,
Lukas Förner,
Thomas Wendler,
Bailiang Jian,
Benedikt Wiestler,
Tim Hable,
Jin Kim,
Dan Ruan,
Frederic Madesta,
Thilo Sentker,
Wiebke Heyer,
Lianrui Zuo,
et al. (11 additional authors not shown)
Abstract:
Medical image challenges have played a transformative role in advancing the field, catalyzing algorithmic innovation and establishing new performance standards across diverse clinical applications. Image registration, a foundational task in neuroimaging pipelines, has similarly benefited from the Learn2Reg initiative. Building on this foundation, we introduce the Large-scale Unsupervised Brain MRI Image Registration (LUMIR) challenge, a next-generation benchmark designed to assess and advance unsupervised brain MRI registration. Distinct from prior challenges that leveraged anatomical label maps for supervision, LUMIR removes this dependency by providing over 4,000 preprocessed T1-weighted brain MRIs for training without any label maps, encouraging biologically plausible deformation modeling through self-supervision. In addition to evaluating performance on 590 held-out test subjects, LUMIR introduces a rigorous suite of zero-shot generalization tasks, spanning out-of-domain imaging modalities (e.g., FLAIR, T2-weighted, T2*-weighted), disease populations (e.g., Alzheimer's disease), acquisition protocols (e.g., 9.4T MRI), and species (e.g., macaque brains). A total of 1,158 subjects and over 4,000 image pairs were included for evaluation. Performance was assessed using both segmentation-based metrics (Dice coefficient, 95th percentile Hausdorff distance) and landmark-based registration accuracy (target registration error). Across both in-domain and zero-shot tasks, deep learning-based methods consistently achieved state-of-the-art accuracy while producing anatomically plausible deformation fields. The top-performing deep learning-based models demonstrated diffeomorphic properties and inverse consistency, outperforming several leading optimization-based methods, and showing strong robustness to most domain shifts, the exception being a drop in performance on out-of-domain contrasts.
Submitted 29 May, 2025;
originally announced May 2025.
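LUMIR reports segmentation-based metrics and landmark-based target registration error. The sketch below shows the standard definitions of the Dice coefficient for one label and of mean landmark distance, on flattened label maps and small point lists; the helper names and toy inputs are hypothetical, the convention of Dice = 1 when a label is absent from both maps is our assumption, and the 95th percentile Hausdorff distance is omitted.

```python
import math

def dice(seg_a, seg_b, label):
    """Dice overlap 2|A∩B| / (|A|+|B|) for one label between two flattened label maps."""
    a = [v == label for v in seg_a]
    b = [v == label for v in seg_b]
    inter = sum(x and y for x, y in zip(a, b))
    denom = sum(a) + sum(b)
    # Assumed convention: a label absent from both maps counts as perfect overlap.
    return 1.0 if denom == 0 else 2.0 * inter / denom

def target_registration_error(fixed_pts, warped_pts):
    """Mean Euclidean distance between corresponding landmarks (same units as input)."""
    dists = [math.dist(p, q) for p, q in zip(fixed_pts, warped_pts)]
    return sum(dists) / len(dists)

# Hypothetical toy inputs.
d = dice([0, 1, 1, 2], [0, 1, 2, 2], label=1)
tre = target_registration_error([(0, 0, 0), (1, 1, 1)], [(3, 4, 0), (1, 1, 1)])
```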
-
Brightness-Invariant Tracking Estimation in Tagged MRI
Authors:
Zhangxing Bian,
Shuwen Wei,
Xiao Liang,
Yuan-Chiao Lu,
Samuel W. Remedios,
Fangxu Xing,
Jonghye Woo,
Dzung L. Pham,
Aaron Carass,
Philip V. Bayly,
Jiachen Zhuo,
Ahmed Alshareef,
Jerry L. Prince
Abstract:
Magnetic resonance (MR) tagging is an imaging technique for noninvasively tracking tissue motion in vivo by creating a visible pattern of magnetization saturation (tags) that deforms with the tissue. Due to longitudinal relaxation and progression to steady state, the brightness of the tags and tissue changes over time, which makes tracking with optical flow methods error-prone. Although Fourier methods can alleviate these problems, they are also sensitive to brightness changes as well as spectral spreading due to motion. To address these problems, we introduce the brightness-invariant tracking estimation (BRITE) technique for tagged MRI. BRITE disentangles the anatomy from the tag pattern in the observed tagged image sequence and simultaneously estimates the Lagrangian motion. The inherent ill-posedness of this problem is addressed by leveraging the expressive power of denoising diffusion probabilistic models to represent the probabilistic distribution of the underlying anatomy and the flexibility of physics-informed neural networks to estimate biologically plausible motion. A set of tagged MR images of a gel phantom was acquired with various tag periods and imaging flip angles to demonstrate the impact of brightness variations and to validate our method. The results show that BRITE achieves more accurate motion and strain estimates than other state-of-the-art methods, while also being resistant to tag fading.
Submitted 23 May, 2025;
originally announced May 2025.
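To see why brightness constancy fails in tagged MRI, consider an idealized tag pattern along x under pure longitudinal (T1) relaxation. This is a textbook simplification, not part of BRITE: it assumes an initial pattern Mz(x, 0) = M0 cos(2πx/period) and ignores the imaging RF pulses whose flip angles, as the abstract notes, also change brightness. Function names and parameter values are hypothetical.

```python
import math

def tagged_mz(x_mm, t_ms, t1_ms, period_mm, m0=1.0):
    """Idealized longitudinal magnetization of tagged, static tissue at time t.

    Assumptions: Mz(x, 0) = M0*cos(2*pi*x/period) and pure T1 recovery toward M0
    (no imaging RF pulses, no motion).
    """
    decay = math.exp(-t_ms / t1_ms)
    return m0 * math.cos(2 * math.pi * x_mm / period_mm) * decay + m0 * (1 - decay)

def tag_contrast(t_ms, t1_ms, period_mm=8.0):
    """Half the peak-to-trough difference of the pattern; equals M0*exp(-t/T1)."""
    peak = tagged_mz(0.0, t_ms, t1_ms, period_mm)
    trough = tagged_mz(period_mm / 2, t_ms, t1_ms, period_mm)
    return (peak - trough) / 2
```

Even with no motion at all, a given point changes intensity over time (tag contrast shrinks as exp(-t/T1) while the baseline drifts toward M0), so an optical-flow model that assumes constant brightness would misattribute part of that change to motion.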
-
Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex
Authors:
Muquan Yu,
Mu Nan,
Hossein Adeli,
Jacob S. Prince,
John A. Pyles,
Leila Wehbe,
Margaret M. Henderson,
Michael J. Tarr,
Andrew F. Luo
Abstract:
Understanding functional representations within higher visual cortex is a fundamental question in computational neuroscience. While artificial neural networks pretrained on large-scale datasets exhibit striking representational alignment with human neural responses, learning image-computable models of visual cortex relies on individual-level, large-scale fMRI datasets. The necessity for expensive, time-intensive, and often impractical data acquisition limits the generalizability of encoders to new subjects and stimuli. We introduce BraInCoRL, which uses in-context learning to predict voxelwise neural responses from few-shot examples without any additional fine-tuning for novel subjects and stimuli. We leverage a transformer architecture that can flexibly condition on a variable number of in-context image stimuli, learning an inductive bias over multiple subjects. During training, we explicitly optimize the model for in-context learning. By jointly conditioning on image features and voxel activations, our model learns to directly generate better-performing voxelwise models of higher visual cortex. We demonstrate that BraInCoRL consistently outperforms existing voxelwise encoder designs in a low-data regime when evaluated on entirely novel images, while also exhibiting strong test-time scaling behavior. The model also generalizes to an entirely new visual fMRI dataset, which uses different subjects and fMRI data acquisition parameters. Further, BraInCoRL facilitates better interpretability of neural signals in higher visual cortex by attending to semantically relevant stimuli. Finally, we show that our framework enables interpretable mappings from natural language queries to voxel selectivity.
Submitted 21 May, 2025;
originally announced May 2025.
-
A Speech-to-Video Synthesis Approach Using Spatio-Temporal Diffusion for Vocal Tract MRI
Authors:
Paula Andrea Pérez-Toro,
Tomás Arias-Vergara,
Fangxu Xing,
Xiaofeng Liu,
Maureen Stone,
Jiachen Zhuo,
Juan Rafael Orozco-Arroyave,
Elmar Nöth,
Jana Hutter,
Jerry L. Prince,
Andreas Maier,
Jonghye Woo
Abstract:
Understanding the relationship between vocal tract motion during speech and the resulting acoustic signal is crucial for aided clinical assessment and developing personalized treatment and rehabilitation strategies. Toward this goal, we introduce an audio-to-video generation framework for creating Real Time/cine-Magnetic Resonance Imaging (RT-/cine-MRI) visuals of the vocal tract from speech signals. Our framework first preprocesses RT-/cine-MRI sequences and speech samples to achieve temporal alignment, ensuring synchronization between visual and audio data. We then employ a modified stable diffusion model, integrating structural and temporal blocks, to effectively capture movement characteristics and temporal dynamics in the synchronized data. This process enables the generation of MRI sequences from new speech inputs, improving the conversion of audio into visual data. We evaluated our framework on healthy controls and tongue cancer patients by analyzing and comparing the vocal tract movements in synthesized videos. Our framework demonstrated adaptability to new speech inputs and effective generalization. In addition, positive human evaluations confirmed its effectiveness, with realistic and accurate visualizations, suggesting its potential for outpatient therapy and personalized simulation of vocal tract visualizations.
Submitted 15 March, 2025;
originally announced March 2025.
-
ECLARE: Efficient cross-planar learning for anisotropic resolution enhancement
Authors:
Samuel W. Remedios,
Shuwen Wei,
Shuo Han,
Jinwei Zhang,
Aaron Carass,
Kurt G. Schilling,
Dzung L. Pham,
Jerry L. Prince,
Blake E. Dewey
Abstract:
In clinical imaging, magnetic resonance (MR) image volumes are often acquired as stacks of 2D slices, which offer shorter scan times, improved signal-to-noise ratio, and image contrasts unique to 2D MR pulse sequences. While this is sufficient for clinical evaluation, automated algorithms designed for 3D analysis perform poorly on multi-slice 2D MR volumes, especially those with thick slices and gaps between slices. Super-resolution (SR) methods aim to address this problem, but previous methods do not address all of the following: slice profile shape estimation, slice gap, domain shift, and non-integer or arbitrary upsampling factors. In this paper, we propose ECLARE (Efficient Cross-planar Learning for Anisotropic Resolution Enhancement), a self-SR method that addresses each of these factors. ECLARE uses a slice profile estimated from the multi-slice 2D MR volume, trains a network to learn the mapping from low-resolution to high-resolution in-plane patches from the same volume, and performs SR with anti-aliasing. We compared ECLARE to cubic B-spline interpolation, SMORE, and other contemporary SR methods. We used realistic and representative simulations so that quantitative performance against ground truth can be computed, and ECLARE outperformed all other methods in both signal recovery and downstream tasks. Importantly, as ECLARE does not use external training data, it cannot suffer from domain shift between training and testing. Our code is open-source and available at https://www.github.com/sremedios/eclare.
Submitted 21 May, 2025; v1 submitted 14 March, 2025;
originally announced March 2025.
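The ECLARE abstract lists slice gap and non-integer upsampling factors among the problems a self-SR method must handle. The arithmetic sketch below assumes the through-plane sample spacing is slice thickness plus gap and that the target spacing equals the in-plane resolution; the helper names and example numbers are hypothetical and are not taken from ECLARE's code.

```python
import math

def upsampling_factor(thickness_mm, gap_mm, inplane_mm):
    """Through-plane factor needed to match the in-plane spacing (assumed target)."""
    return (thickness_mm + gap_mm) / inplane_mm

def output_slice_positions(n_slices, thickness_mm, gap_mm, inplane_mm):
    """Positions (mm, relative to the first slice center) of the SR output grid."""
    spacing = thickness_mm + gap_mm
    extent = (n_slices - 1) * spacing
    # Small tolerance guards against floating-point round-off in extent / inplane_mm.
    n_out = math.floor(extent / inplane_mm + 1e-9) + 1
    return [i * inplane_mm for i in range(n_out)]

# Hypothetical protocol: 4 mm slices, 1.2 mm gap, 0.8 mm in-plane -> factor 6.5.
factor = upsampling_factor(4.0, 1.2, 0.8)
grid = output_slice_positions(3, 4.0, 1.2, 0.8)
```

With a factor of 6.5, the acquired slice at 5.2 mm does not fall on the 0.8 mm output grid, which illustrates why methods restricted to integer factors cannot be applied directly to such protocols.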
-
MedVAE: Efficient Automated Interpretation of Medical Images with Large-Scale Generalizable Autoencoders
Authors:
Maya Varma,
Ashwin Kumar,
Rogier van der Sluijs,
Sophie Ostmeier,
Louis Blankemeier,
Pierre Chambon,
Christian Bluethgen,
Jip Prince,
Curtis Langlotz,
Akshay Chaudhari
Abstract:
Medical images are acquired at high resolutions with large fields of view in order to capture fine-grained features necessary for clinical decision-making. Consequently, training deep learning models on medical images can incur large computational costs. In this work, we address the challenge of downsizing medical images in order to improve downstream computational efficiency while preserving clinically-relevant features. We introduce MedVAE, a family of six large-scale 2D and 3D autoencoders capable of encoding medical images as downsized latent representations and decoding latent representations back to high-resolution images. We train MedVAE autoencoders using a novel two-stage training approach with 1,052,730 medical images. Across diverse tasks obtained from 20 medical image datasets, we demonstrate that (1) utilizing MedVAE latent representations in place of high-resolution images when training downstream models can lead to efficiency benefits (up to 70x improvement in throughput) while simultaneously preserving clinically-relevant features and (2) MedVAE can decode latent representations back to high-resolution images with high fidelity. Our work demonstrates that large-scale, generalizable autoencoders can help address critical efficiency challenges in the medical domain. Our code is available at https://github.com/StanfordMIMI/MedVAE.
Submitted 2 June, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
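MedVAE replaces high-resolution images with downsized latent representations for downstream training. The arithmetic below shows how a per-axis spatial downsizing factor translates into fewer elements per input; the image shapes, factors, and latent channel counts are hypothetical, not MedVAE's configurations, and element count is only a proxy, since the reported throughput gains also depend on the downstream model and hardware.

```python
def latent_compression_ratio(image_shape, downsize, latent_channels):
    """Ratio of input elements to latent elements for a hypothetical autoencoder.

    image_shape: (channels, *spatial_dims); every spatial dim is divided by `downsize`.
    """
    channels, *spatial = image_shape
    n_in = channels
    n_lat = latent_channels
    for s in spatial:
        if s % downsize:
            raise ValueError("spatial size must be divisible by the downsizing factor")
        n_in *= s
        n_lat *= s // downsize
    return n_in / n_lat

# Hypothetical examples: a 2D and a 3D single-channel image, 4x downsizing per axis.
ratio_2d = latent_compression_ratio((1, 512, 512), 4, 1)
ratio_3d = latent_compression_ratio((1, 160, 160, 160), 4, 1)
```

The same per-axis factor yields a much larger reduction in 3D than in 2D because it compounds over every spatial dimension.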
-
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
Authors:
Thomas Fel,
Ekdeep Singh Lubana,
Jacob S. Prince,
Matthew Kowal,
Victor Boutin,
Isabel Papadimitriou,
Binxu Wang,
Martin Wattenberg,
Demba Ba,
Talia Konkle
Abstract:
Sparse Autoencoders (SAEs) have emerged as a powerful framework for machine learning interpretability, enabling the unsupervised decomposition of model representations into a dictionary of abstract, human-interpretable concepts. However, we reveal a fundamental limitation: existing SAEs exhibit severe instability, as identical models trained on similar datasets can produce sharply different dictionaries, undermining their reliability as an interpretability tool. To address this issue, we draw inspiration from the Archetypal Analysis framework introduced by Cutler & Breiman (1994) and present Archetypal SAEs (A-SAE), wherein dictionary atoms are constrained to the convex hull of data. This geometric anchoring significantly enhances the stability of inferred dictionaries, and their mildly relaxed variants (RA-SAEs) further match state-of-the-art reconstruction abilities. To rigorously assess dictionary quality learned by SAEs, we introduce two new benchmarks that test (i) plausibility, i.e., whether dictionaries recover "true" classification directions, and (ii) identifiability, i.e., whether dictionaries disentangle synthetic concept mixtures. Across all evaluations, RA-SAEs consistently yield more structured representations while uncovering novel, semantically meaningful concepts in large-scale vision models.
Submitted 23 May, 2025; v1 submitted 18 February, 2025;
originally announced February 2025.
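The Archetypal SAE abstract constrains dictionary atoms to the convex hull of the data. One common way to enforce such a constraint, sketched below as an assumption rather than the paper's exact parameterization, is to build each atom as a softmax-weighted (non-negative, sum-to-one) mixture of data points, with an optional small additive offset standing in for the relaxed RA-SAE variant. The logits, data points, and function names are hypothetical; which data points the paper mixes is not stated in the abstract.

```python
import math

def softmax(row):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def archetypal_atoms(logits, data, relax=None):
    """Dictionary atoms as convex combinations of data points.

    logits: one row per atom, one column per data point (hypothetical learnable weights).
    data:   list of data points, each a list of floats.
    relax:  optional per-atom offsets added after mixing (assumed RA-SAE-style
            relaxation; the caller is assumed to keep them small).
    """
    dim = len(data[0])
    atoms = []
    for k, row in enumerate(logits):
        w = softmax(row)  # convex weights, so the atom lies in the data's convex hull
        atom = [sum(wi * x[d] for wi, x in zip(w, data)) for d in range(dim)]
        if relax is not None:
            atom = [a + r for a, r in zip(atom, relax[k])]
        atoms.append(atom)
    return atoms

# Hypothetical 2D data forming a triangle, and two atoms.
data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
logits = [[0.0, 0.0, 0.0], [5.0, -5.0, 0.0]]
atoms = archetypal_atoms(logits, data)
relaxed = archetypal_atoms(logits, data, relax=[[0.01, 0.0], [0.0, 0.0]])
```

Because each unrelaxed atom is tied to actual data points rather than being a free vector, it cannot drift arbitrarily between training runs, which matches the stability argument in the abstract.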
-
Pitfalls of defacing whole-head MRI: re-identification risk with diffusion models and compromised research potential
Authors:
Chenyu Gao,
Kaiwen Xu,
Michael E. Kim,
Lianrui Zuo,
Zhiyuan Li,
Derek B. Archer,
Timothy J. Hohman,
Ann Zenobia Moore,
Luigi Ferrucci,
Lori L. Beason-Held,
Susan M. Resnick,
Christos Davatzikos,
Jerry L. Prince,
Bennett A. Landman
Abstract:
Defacing is often applied to head magnetic resonance image (MRI) datasets prior to public release to address privacy concerns. The alteration of facial and nearby voxels has provoked discussions about the true capability of these techniques to ensure privacy as well as their impact on downstream tasks. With advancements in deep generative models, the extent to which defacing can protect privacy is uncertain. Additionally, while the altered voxels are known to contain valuable anatomical information, their potential to support research beyond the anatomical regions directly affected by defacing remains uncertain. To evaluate these considerations, we develop a refacing pipeline that recovers faces in defaced head MRIs using cascaded diffusion probabilistic models (DPMs). The DPMs are trained on images from 180 subjects and tested on images from 484 unseen subjects, 469 of whom are from a different dataset. To assess whether the altered voxels in defacing contain universally useful information, we also predict computed tomography (CT)-derived skeletal muscle radiodensity from facial voxels in both defaced and original MRIs. The results show that DPMs can generate high-fidelity faces that resemble the original faces from defaced images, with surface distances to the original faces significantly smaller than those of a population average face (p < 0.05). This performance also generalizes well to previously unseen datasets. For skeletal muscle radiodensity predictions, using defaced images results in significantly weaker Spearman's rank correlation coefficients compared to using original images (p < 10⁻⁴). For shin muscle, the correlation is statistically significant (p < 0.05) when using original images but not statistically significant (p > 0.05) when any defacing method is applied, suggesting that defacing might not only fail to protect privacy but also eliminate valuable information.
Submitted 30 January, 2025;
originally announced January 2025.
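The defacing study compares Spearman's rank correlation coefficients between predicted and CT-derived muscle radiodensity. For reference, Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank; the minimal sketch below implements that definition on hypothetical toy data and does not compute p-values or reproduce the study's statistics.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Hypothetical toy data: any strictly increasing relationship gives rho = 1.
rho = spearman_rho([1, 2, 3, 4], [10, 20, 30, 100])
```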
-
Unique MS Lesion Identification from MRI
Authors:
Carlos A. Rivas,
Jinwei Zhang,
Shuwen Wei,
Samuel W. Remedios,
Aaron Carass,
Jerry L. Prince
Abstract:
Unique identification of multiple sclerosis (MS) white matter lesions (WMLs) is important to help characterize MS progression. WMLs are routinely identified from magnetic resonance images (MRIs), but the resultant total lesion load does not correlate well with the Expanded Disability Status Scale (EDSS), whereas mean unique lesion volume has been shown to correlate with EDSS. Our approach builds on prior work by incorporating Hessian matrix computation from lesion probability maps before using the random walker algorithm to estimate the volume of each unique lesion. Experiments on synthetic images demonstrate our ability to accurately count the number of lesions present. The takeaways are that: 1) our method correctly identifies all lesions, including many that are missed by previous methods; 2) we can better separate confluent lesions; and 3) we can accurately capture the total volume of WMLs in a given probability map. This work will allow new, more meaningful statistics to be computed from WMLs in brain MRIs.
Submitted 12 October, 2024;
originally announced October 2024.
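The unique-lesion abstract computes Hessians from lesion probability maps before running the random walker. One plausible reading, sketched here under our own assumptions, is that points where the Hessian is negative definite and the probability is a local maximum mark the centers of individual (possibly confluent) lesions, which could then seed a random walker. The threshold, 3×3 neighborhood, finite-difference scheme, and function names are hypothetical, the example works on a 2D grid rather than a 3D volume, and the random-walker step is omitted.

```python
import math

def hessian_2d(p, i, j):
    """Central finite-difference Hessian entries (dxx, dxy, dyy) at grid point (i, j)."""
    dxx = p[i][j + 1] - 2 * p[i][j] + p[i][j - 1]
    dyy = p[i + 1][j] - 2 * p[i][j] + p[i - 1][j]
    dxy = (p[i + 1][j + 1] - p[i + 1][j - 1] - p[i - 1][j + 1] + p[i - 1][j - 1]) / 4.0
    return dxx, dxy, dyy

def eigvals_sym2(a, b, c):
    """Eigenvalues (smaller, larger) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    rad = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean - rad, mean + rad

def lesion_seeds(p, thr=0.5):
    """Hypothetical seed finder: above-threshold local maxima with negative-definite Hessian."""
    seeds = []
    for i in range(1, len(p) - 1):
        for j in range(1, len(p[0]) - 1):
            if p[i][j] < thr:
                continue
            _, l_max = eigvals_sym2(*hessian_2d(p, i, j))
            if l_max >= 0:  # negative definite requires the larger eigenvalue < 0
                continue
            nbrs = [p[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)
                    if (di, dj) != (0, 0)]
            if p[i][j] >= max(nbrs):
                seeds.append((i, j))
    return seeds

# Hypothetical probability map: two Gaussian blobs on a 20 x 30 grid.
sigma = 2.0
grid = [[max(math.exp(-((i - 10) ** 2 + (j - 8) ** 2) / (2 * sigma ** 2)),
             math.exp(-((i - 10) ** 2 + (j - 20) ** 2) / (2 * sigma ** 2)))
         for j in range(30)] for i in range(20)]
seeds = lesion_seeds(grid)
```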
-
Bi-Directional MS Lesion Filling and Synthesis Using Denoising Diffusion Implicit Model-based Lesion Repainting
Authors:
Jinwei Zhang,
Lianrui Zuo,
Yihao Liu,
Samuel Remedios,
Bennett A. Landman,
Jerry L. Prince,
Aaron Carass
Abstract:
Automatic magnetic resonance (MR) image processing pipelines are widely used to study people with multiple sclerosis (PwMS), encompassing tasks such as lesion segmentation and brain parcellation. However, the presence of lesions often complicates these analyses, particularly brain parcellation. Lesion filling is commonly used to mitigate this issue, but existing lesion filling algorithms often fall short in accurately reconstructing realistic lesion-free images, which are vital for consistent downstream analysis. Additionally, the performance of lesion segmentation algorithms is often limited by insufficient data with lesion delineation as training labels. In this paper, we propose a novel approach leveraging Denoising Diffusion Implicit Models (DDIMs) for both MS lesion filling and synthesis based on image inpainting. Our modified DDIM architecture, once trained, enables both MS lesion filling and synthesis. Specifically, it can generate lesion-free T1-weighted or FLAIR images from those containing lesions, or it can add lesions to T1-weighted or FLAIR images of healthy subjects. The former is essential for downstream analyses that require lesion-free images, while the latter is valuable for augmenting training datasets for lesion segmentation tasks. We validate our approach through initial experiments and demonstrate promising results in both lesion filling and synthesis, paving the way for future work.
Submitted 7 October, 2024;
originally announced October 2024.
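The lesion repainting abstract builds on DDIMs and image inpainting. As a hedged sketch, the code below shows the standard deterministic DDIM update (η = 0) written in terms of the cumulative noise schedule ᾱ, plus a RePaint-style blend that keeps model output inside a mask (the region to fill or synthesize) and a forward-noised copy of the known image elsewhere. Whether the paper's modified DDIM blends regions exactly this way is our assumption; the function names and toy values are hypothetical, and images are flat lists of voxels.

```python
import math

def ddim_step(x_t, eps_pred, abar_t, abar_prev):
    """Deterministic DDIM update (eta = 0) for a flat list of voxels.

    abar_t, abar_prev: cumulative products of the noise schedule at steps t and t-1.
    """
    # Predict the clean image from the current sample and the predicted noise.
    x0_pred = [(x - math.sqrt(1 - abar_t) * e) / math.sqrt(abar_t)
               for x, e in zip(x_t, eps_pred)]
    # Re-noise the prediction to the previous (less noisy) step.
    return [math.sqrt(abar_prev) * x0 + math.sqrt(1 - abar_prev) * e
            for x0, e in zip(x0_pred, eps_pred)]

def repaint_blend(generated, known_image, noise, mask, abar_prev):
    """Assumed RePaint-style blend: model output where mask is True, forward-noised
    known image where mask is False."""
    known_noised = [math.sqrt(abar_prev) * k + math.sqrt(1 - abar_prev) * z
                    for k, z in zip(known_image, noise)]
    return [g if m else kn for g, kn, m in zip(generated, known_noised, mask)]

# Hypothetical check: if the predicted noise equals the true noise, one step
# lands exactly on the noised clean image at the previous step.
x0, eps = [0.5, -0.2], [0.1, 0.3]
x_t = [math.sqrt(0.5) * a + math.sqrt(0.5) * e for a, e in zip(x0, eps)]
x_prev = ddim_step(x_t, eps, abar_t=0.5, abar_prev=0.8)
blended = repaint_blend([9.0, 9.0], [1.0, 2.0], [0.0, 0.0], [True, False], abar_prev=0.64)
```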
-
RATNUS: Rapid, Automatic Thalamic Nuclei Segmentation using Multimodal MRI inputs
Authors:
Anqi Feng,
Zhangxing Bian,
Blake E. Dewey,
Alexa Gail Colinco,
Jiachen Zhuo,
Jerry L. Prince
Abstract:
Accurate segmentation of thalamic nuclei is important for better understanding brain function and improving disease treatment. Traditional segmentation methods often rely on a single T1-weighted image, which has limited contrast in the thalamus. In this work, we introduce RATNUS, which uses synthetic T1-weighted images with many inversion times along with diffusion-derived features to enhance the visibility of nuclei within the thalamus. Using these features, a convolutional neural network is used to segment 13 thalamic nuclei. For comparison with other methods, we introduce a unified nuclei labeling scheme. Our results demonstrate an 87.19% average true positive rate (TPR) against manual labeling. In comparison, FreeSurfer and THOMAS achieve TPRs of 64.25% and 57.64%, respectively, demonstrating the superiority of RATNUS in thalamic nuclei segmentation.
Submitted 10 September, 2024;
originally announced September 2024.
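RATNUS reports an average true positive rate (TPR) against manual labels. The sketch below computes the voxelwise TPR, TP / (TP + FN), per label and then an unweighted mean over labels; the abstract does not say how the average is taken (per nucleus, per subject, or pooled), so the unweighted mean is an assumption, and the toy label maps are hypothetical.

```python
def per_label_tpr(pred, truth, labels):
    """Voxelwise true positive rate TP / (TP + FN) for each label."""
    out = {}
    for lab in labels:
        tp = sum(p == lab and t == lab for p, t in zip(pred, truth))
        fn = sum(p != lab and t == lab for p, t in zip(pred, truth))
        out[lab] = tp / (tp + fn) if tp + fn else float("nan")
    return out

def mean_tpr(pred, truth, labels):
    """Assumed averaging: unweighted mean over labels present in the manual labels."""
    vals = [v for v in per_label_tpr(pred, truth, labels).values() if v == v]  # v == v drops NaN
    return sum(vals) / len(vals)

# Hypothetical flattened label maps (0 = background, 1 and 2 = two nuclei).
truth = [1, 1, 2, 2, 2, 0]
pred = [1, 2, 2, 2, 0, 0]
tprs = per_label_tpr(pred, truth, [1, 2])
avg = mean_tpr(pred, truth, [1, 2])
```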
-
Beyond MR Image Harmonization: Resolution Matters Too
Authors:
Savannah P. Hays,
Samuel W. Remedios,
Lianrui Zuo,
Ellen M. Mowry,
Scott D. Newsome,
Peter A. Calabresi,
Aaron Carass,
Blake E. Dewey,
Jerry L. Prince
Abstract:
Magnetic resonance (MR) imaging is commonly used in the clinical setting to non-invasively monitor the body. There exists a large variability in MR imaging due to differences in scanner hardware, software, and protocol design. Ideally, a processing algorithm should perform robustly to this variability, but that is not always the case in reality. This introduces a need for image harmonization to overcome issues of domain shift when performing downstream analysis such as segmentation. Most image harmonization models focus on acquisition parameters such as inversion time or repetition time, but they ignore an important aspect in MR imaging: resolution. In this paper, we evaluate the impact of image resolution on harmonization using a pretrained harmonization algorithm. We simulate 2D acquisitions of various slice thicknesses and gaps from 3D-acquired, 1 mm³ isotropic MR images and demonstrate how the performance of a state-of-the-art image harmonization algorithm varies as resolution changes. We discuss the ideal scenarios for image resolution, including acquisition orientation, when 3D imaging is not available, which is common for many clinical scanners. Our results show that harmonization on low-resolution images does not account for acquisition resolution and orientation variations. Super-resolution can be used to alleviate resolution variations, but it is not always used. Our methodology can generalize to help evaluate the impact of image acquisition resolution for multiple tasks. Determining the limits of a pretrained algorithm is important when considering preprocessing steps and trust in the results.
Submitted 30 August, 2024; v1 submitted 29 August, 2024;
originally announced August 2024.
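The harmonization study simulates 2D acquisitions of various slice thicknesses and gaps from 1 mm isotropic volumes. A minimal 1D sketch of that kind of simulation, applied to one through-plane line of voxels, is shown below; it assumes an ideal rectangular (boxcar) slice profile, thickness and gap that are whole multiples of the voxel size, and slices starting at the first voxel. Real slice profiles are not ideal, and the paper's exact simulation may differ; the function name is hypothetical.

```python
def simulate_2d_acquisition(column, thickness_mm, gap_mm, voxel_mm=1.0):
    """Resample one through-plane line of an isotropic volume into thick slices.

    Assumptions: boxcar slice profile, thickness and gap that are whole multiples
    of the voxel size, first slice starting at index 0.
    """
    thick = round(thickness_mm / voxel_mm)
    step = round((thickness_mm + gap_mm) / voxel_mm)
    slices = []
    start = 0
    while start + thick <= len(column):
        segment = column[start:start + thick]
        slices.append(sum(segment) / thick)  # average signal across the slice
        start += step  # skip the gap before the next slice
    return slices

# Hypothetical ramp signal along z at 1 mm spacing: 3 mm slices with a 1 mm gap.
thick_slices = simulate_2d_acquisition(list(range(20)), 3.0, 1.0)
```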
-
Vector Field Attention for Deformable Image Registration
Authors:
Yihao Liu,
Junyu Chen,
Lianrui Zuo,
Aaron Carass,
Jerry L. Prince
Abstract:
Deformable image registration establishes non-linear spatial correspondences between fixed and moving images. Deep learning-based deformable registration methods have been widely studied in recent years due to their speed advantage over traditional algorithms and their better accuracy. Most existing deep learning-based methods require neural networks to encode location information in their feature maps and predict displacement or deformation fields through convolutional or fully connected layers from these high-dimensional feature maps. In this work, we present Vector Field Attention (VFA), a novel framework that enhances the efficiency of the existing network design by enabling direct retrieval of location correspondences. VFA uses neural networks to extract multi-resolution feature maps from the fixed and moving images and then retrieves pixel-level correspondences based on feature similarity. The retrieval is achieved with a novel attention module without the need for learnable parameters. VFA is trained end-to-end in either a supervised or unsupervised manner. We evaluated VFA for intra- and inter-modality registration and for unsupervised and semi-supervised registration using public datasets, and we also evaluated it on the Learn2Reg challenge. Experimental results demonstrate the superior performance of VFA compared to existing methods. The source code of VFA is publicly available at https://github.com/yihao6/vfa/.
Submitted 14 July, 2024;
originally announced July 2024.
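The VFA abstract retrieves correspondences by feature similarity with an attention module that has no learnable parameters. A simplified 1D, single-resolution sketch of that idea, under our own assumptions: the query is the fixed-image feature at x, the keys are moving-image features in a small window, and the values are the fixed integer offsets of that window, so the output is a softmax-weighted expected displacement. The window radius, temperature, toy features, and function name are hypothetical; the released VFA code should be consulted for the actual design.

```python
import math

def vfa_displacement(fixed_feats, moving_feats, x, radius=3, temperature=0.1):
    """Parameter-free attention returning the expected displacement at position x.

    query  = fixed feature at x
    keys   = moving features at x + d for d in [-radius, radius]
    values = the offsets d themselves (fixed, not learned)
    """
    q = fixed_feats[x]
    offsets, scores = [], []
    for d in range(-radius, radius + 1):
        y = x + d
        if 0 <= y < len(moving_feats):
            k = moving_feats[y]
            scores.append(sum(a * b for a, b in zip(q, k)) / temperature)
            offsets.append(d)
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # stable softmax numerators
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, offsets)) / total

def feat(p):
    """Hypothetical unit-norm feature whose dot product peaks at matching positions."""
    return (math.cos(0.7 * p), math.sin(0.7 * p))

# The moving image contains the same structure shifted by +2 samples.
fixed = [feat(p) for p in range(12)]
moving = [feat(p - 2) for p in range(12)]
disp = vfa_displacement(fixed, moving, x=5, temperature=0.01)
```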
-
Revisiting registration-based synthesis: A focus on unsupervised MR image synthesis
Authors:
Savannah P. Hays,
Lianrui Zuo,
Yihao Liu,
Anqi Feng,
Jiachen Zhuo,
Jerry L. Prince,
Aaron Carass
Abstract:
Deep learning (DL) has led to significant improvements in medical image synthesis, enabling advanced image-to-image translation to generate synthetic images. However, DL methods face challenges such as domain shift and high demands for training data, limiting their generalizability and applicability. Historically, image synthesis was also carried out using deformable image registration (DIR), a method that warps moving images of a desired modality to match the anatomy of a fixed image. However, concerns about its speed and accuracy led to its decline in popularity. With the recent advances of DL-based DIR, we now revisit and reinvigorate this line of research. In this paper, we propose a fast and accurate synthesis method based on DIR. We use the task of synthesizing a rare magnetic resonance (MR) sequence, white matter nulled (WMn) T1-weighted (T1-w) images, to demonstrate the potential of our approach. During training, our method learns a DIR model based on the widely available MPRAGE sequence, which is a cerebrospinal fluid nulled (CSFn) T1-w inversion recovery gradient echo pulse sequence. During testing, the trained DIR model is first applied to estimate the deformation between moving and fixed CSFn images. Subsequently, this estimated deformation is applied to align the paired WMn counterpart of the moving CSFn image, yielding a synthetic WMn image for the fixed CSFn image. Our experiments demonstrate promising results for unsupervised image synthesis using DIR. These findings highlight the potential of our technique in contexts where supervised synthesis methods are constrained by limited training data.
Submitted 19 February, 2024;
originally announced February 2024.
-
Speech motion anomaly detection via cross-modal translation of 4D motion fields from tagged MRI
Authors:
Xiaofeng Liu,
Fangxu Xing,
Jiachen Zhuo,
Maureen Stone,
Jerry L. Prince,
Georges El Fakhri,
Jonghye Woo
Abstract:
Understanding the relationship between tongue motion patterns during speech and their resulting speech acoustic outcomes (i.e., the articulatory-acoustic relation) is of great importance in assessing speech quality and developing innovative treatment and rehabilitative strategies. This is especially important when evaluating and detecting abnormal articulatory features in patients with speech-related disorders. In this work, we aim to develop a framework for detecting speech motion anomalies in conjunction with their corresponding speech acoustics. This is achieved through the use of a deep cross-modal translator trained on data from healthy individuals only, which bridges the gap between 4D motion fields obtained from tagged MRI and 2D spectrograms derived from speech acoustic data. The trained translator is used as an anomaly detector by measuring the spectrogram reconstruction quality on healthy individuals or patients. In particular, the cross-modal translator is likely to generalize poorly to patient data, which contain unseen out-of-distribution patterns, and therefore to perform worse on patients than on healthy individuals. A one-class SVM is then used to distinguish the spectrograms of healthy individuals from those of patients. To validate our framework, we collected a total of 39 paired tagged MRI and speech waveforms, consisting of data from 36 healthy individuals and 3 tongue cancer patients. We used both 3D convolutional and transformer-based deep translation models, training them on the healthy training set and then applying them to both the healthy and patient testing sets. Our framework demonstrates a capability to detect abnormal patient data, thereby illustrating its potential in enhancing the understanding of the articulatory-acoustic relation for both healthy individuals and patients.
Submitted 10 February, 2024;
originally announced February 2024.
-
Is Registering Raw Tagged-MR Enough for Strain Estimation in the Era of Deep Learning?
Authors:
Zhangxing Bian,
Ahmed Alshareef,
Shuwen Wei,
Junyu Chen,
Yuli Wang,
Jonghye Woo,
Dzung L. Pham,
Jiachen Zhuo,
Aaron Carass,
Jerry L. Prince
Abstract:
Magnetic Resonance Imaging with tagging (tMRI) has long been utilized for quantifying tissue motion and strain during deformation. However, a phenomenon known as tag fading, a gradual decrease in tag visibility over time, often complicates post-processing. The first contribution of this study is to model tag fading by considering the interplay between $T_1$ relaxation and the repeated application of radio frequency (RF) pulses during serial imaging sequences. This is a factor that has been overlooked in prior research on tMRI post-processing. Further, we have observed an emerging trend of utilizing raw tagged MRI within a deep learning-based (DL) registration framework for motion estimation. In this work, we evaluate and analyze the impact of commonly used image similarity objectives in training DL registrations on raw tMRI. This is then compared with the Harmonic Phase-based approach, a traditional approach which is claimed to be robust to tag fading. Our findings, derived from both simulated images and an actual phantom scan, reveal the limitations of various similarity losses in raw tMRI and emphasize caution in registration tasks where image intensity changes over time.
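The interplay between $T_1$ relaxation and repeated RF pulses can be illustrated with a simplified 1D model. This is an assumption for illustration, not necessarily the paper's model: the tagging preparation is taken to leave $M_z(x) = M_0\cos(kx)$, each imaging pulse scales $M_z$ by $\cos\alpha$, and during each TR $M_z$ relaxes as $M_z E_1 + M_0(1 - E_1)$ with $E_1 = e^{-TR/T_1}$. The recovery term only feeds the unmodulated background, so the tag amplitude decays geometrically as $M_0(\cos\alpha\, E_1)^n$. The parameter values in the usage below are arbitrary.

```python
import numpy as np

def tag_amplitude(n, flip_deg, tr_ms, t1_ms, m0=1.0):
    """Closed-form tag modulation amplitude after n imaging pulses (simplified model)."""
    e1 = np.exp(-tr_ms / t1_ms)
    return m0 * (np.cos(np.deg2rad(flip_deg)) * e1) ** np.asarray(n)

def simulate_tag_fading(n_frames, flip_deg, tr_ms, t1_ms, m0=1.0, period=8, n_x=64):
    """Iterate a 1D longitudinal-magnetization tag profile frame by frame."""
    x = np.arange(n_x)
    mz = m0 * np.cos(2 * np.pi * x / period)   # assumed tagging preparation: M_z = m0 cos(kx)
    cos_a = np.cos(np.deg2rad(flip_deg))
    e1 = np.exp(-tr_ms / t1_ms)
    profiles = [mz.copy()]
    for _ in range(n_frames):
        mz = mz * cos_a                         # imaging RF pulse tips part of M_z away
        mz = mz * e1 + m0 * (1.0 - e1)          # T1 recovery over one TR (spatially uniform)
        profiles.append(mz.copy())
    return np.array(profiles)
```

Usage: `p = simulate_tag_fading(10, 10.0, 30.0, 850.0)`; the per-frame amplitude `(p.max(1) - p.min(1)) / 2` matches `tag_amplitude(np.arange(11), 10.0, 30.0, 850.0)` while the background level rises, which is why raw tagged intensities are not constant over time.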
Submitted 30 January, 2024;
originally announced January 2024.
-
Super-resolution multi-contrast unbiased eye atlases with deep probabilistic refinement
Authors:
Ho Hin Lee,
Adam M. Saunders,
Michael E. Kim,
Samuel W. Remedios,
Lucas W. Remedios,
Yucheng Tang,
Qi Yang,
Xin Yu,
Shunxing Bao,
Chloe Cho,
Louise A. Mawn,
Tonia S. Rex,
Kevin L. Schey,
Blake E. Dewey,
Jeffrey M. Spraggins,
Jerry L. Prince,
Yuankai Huo,
Bennett A. Landman
Abstract:
Purpose: Eye morphology varies significantly across the population, especially for the orbit and optic nerve. These variations limit the feasibility and robustness of generalizing population-wise features of eye organs to an unbiased spatial reference.
Approach: To tackle these limitations, we propose a process for creating high-resolution unbiased eye atlases. First, to restore spatial details from scans with a low through-plane resolution compared to a high in-plane resolution, we apply a deep learning-based super-resolution algorithm. Then, we generate an initial unbiased reference with an iterative metric-based registration using a small portion of subject scans. We register the remaining scans to this template and refine the template using an unsupervised deep probabilistic approach that generates a more expansive deformation field to enhance the organ boundary alignment. We demonstrate this framework using magnetic resonance images across four different tissue contrasts, generating four atlases in separate spatial alignments.
Results: For each tissue contrast, we find a significant improvement (Wilcoxon signed-rank test) in the average Dice score across four labeled regions compared to a standard registration framework consisting of rigid, affine, and deformable transformations. These results highlight the effective alignment of eye organs and boundaries using our proposed process.
Conclusions: By combining super-resolution preprocessing and deep probabilistic models, we address the challenge of generating an eye atlas to serve as a standardized reference across a largely variable population.
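The evaluation in the Results (average Dice over labeled regions, compared between methods with a Wilcoxon signed-rank test) can be sketched as follows. The helper names and the toy scores in the usage are hypothetical; only `scipy.stats.wilcoxon` is a real library call.

```python
import numpy as np
from scipy.stats import wilcoxon

def dice(a, b):
    """Dice overlap 2|A∩B| / (|A| + |B|) between two boolean masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    total = a.sum() + b.sum()
    return 1.0 if total == 0 else 2.0 * np.logical_and(a, b).sum() / total

def mean_dice(seg, ref, labels):
    """Average Dice over a set of labeled regions (e.g., four eye structures)."""
    return float(np.mean([dice(seg == lab, ref == lab) for lab in labels]))

def paired_test(scores_new, scores_baseline):
    """Two-sided Wilcoxon signed-rank test on paired per-subject scores."""
    return wilcoxon(scores_new, scores_baseline)
```

The signed-rank test is paired: each subject contributes one difference between the two pipelines, which suits registration comparisons where per-subject difficulty varies widely.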
Submitted 14 November, 2024; v1 submitted 5 January, 2024;
originally announced January 2024.
-
AniRes2D: Anisotropic Residual-enhanced Diffusion for 2D MR Super-Resolution
Authors:
Zejun Wu,
Samuel W. Remedios,
Blake E. Dewey,
Aaron Carass,
Jerry L. Prince
Abstract:
Anisotropic low-resolution (LR) magnetic resonance (MR) images are fast to obtain but hinder automated processing. We propose to use denoising diffusion probabilistic models (DDPMs) to super-resolve these 2D-acquired LR MR slices. This paper introduces AniRes2D, a novel approach combining DDPM with a residual prediction for 2D super-resolution (SR). Results demonstrate that AniRes2D outperforms several other DDPM-based models in quantitative metrics, visual quality, and out-of-domain evaluation. We use a trained AniRes2D to super-resolve 3D volumes slice by slice, achieving comparable quantitative results and reduced skull aliasing relative to a recent state-of-the-art self-supervised 3D super-resolution method. Furthermore, we explored the use of noise conditioning augmentation (NCA) as an alternative augmentation technique for DDPM-based SR models, but it was found to reduce performance. Our findings contribute valuable insights to the application of DDPMs for SR of anisotropic MR images.
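The two ideas of residual prediction and slice-by-slice application to a 3D volume can be sketched independently of the diffusion model. Assumptions: the volume is laid out as `(X, Y, Z)` with low resolution along the last axis, the baseline is linear interpolation, and `residual_model` is a hypothetical stand-in for a trained network (here any callable returning an array of the slice's shape).

```python
import numpy as np
from scipy.ndimage import zoom

def super_resolve_volume(lr_volume, factor, residual_model):
    """Super-resolve a 3D volume slice by slice with a 2D residual model.

    lr_volume: (X, Y, Z) array whose last axis has the low through-plane
    resolution (assumed layout). residual_model: hypothetical callable mapping an
    interpolated 2D slice to a predicted residual of the same shape.
    """
    sr_slices = []
    for lr_slice in lr_volume:                        # each slice is (Y, Z): one in-plane axis, one LR axis
        up = zoom(lr_slice, (1, factor), order=1)     # baseline: linear interpolation along the LR axis
        sr_slices.append(up + residual_model(up))     # model only predicts what interpolation misses
    return np.stack(sr_slices, axis=0)
```

Predicting a residual rather than the full slice lets the network spend its capacity on high-frequency detail, since the interpolated slice already carries the low-frequency content.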
Submitted 7 December, 2023;
originally announced December 2023.
-
Towards an accurate and generalizable multiple sclerosis lesion segmentation model using self-ensembled lesion fusion
Authors:
Jinwei Zhang,
Lianrui Zuo,
Blake E. Dewey,
Samuel W. Remedios,
Dzung L. Pham,
Aaron Carass,
Jerry L. Prince
Abstract:
Automatic multiple sclerosis (MS) lesion segmentation using multi-contrast magnetic resonance (MR) images provides improved efficiency and reproducibility compared to manual delineation. Current state-of-the-art automatic MS lesion segmentation methods utilize modified U-Net-like architectures. However, in the literature, dedicated architecture modifications were always required to maximize their performance. In addition, the best-performing methods have not proven to be generalizable to diverse test datasets with contrast variations and image artifacts. In this work, we developed an accurate and generalizable MS lesion segmentation model using the well-known U-Net architecture without further modification. We propose a novel test-time self-ensembled lesion fusion strategy that not only achieved the best performance on the ISBI 2015 MS segmentation challenge data but also demonstrated robustness across various self-ensemble parameter choices. Moreover, equipped with instance normalization rather than the batch normalization widely used in the literature, the model trained on the ISBI challenge data generalized well to clinical test datasets from different scanners.
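The normalization difference mentioned at the end of the abstract can be shown in a few lines of NumPy. This sketch uses the standard definitions without learnable scale and shift, and is not taken from the paper's code: instance normalization computes statistics per image and channel, so a global intensity offset in one test image (as with a different scanner) is removed, whereas batch normalization pools statistics over the batch.

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Normalize each (sample, channel) pair over its own spatial dims.
    x has shape (N, C, *spatial); no learnable affine parameters."""
    axes = tuple(range(2, x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm_train(x, eps=1e-5):
    """Batch normalization with training-mode statistics: each channel is
    normalized with mean/variance pooled over the batch and spatial dims."""
    axes = (0,) + tuple(range(2, x.ndim))
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

Shifting one sample by a constant leaves its instance-normalized output unchanged, while its batch-normalized output keeps a positive offset.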
Submitted 3 December, 2023;
originally announced December 2023.
-
Harmonization-enriched domain adaptation with light fine-tuning for multiple sclerosis lesion segmentation
Authors:
Jinwei Zhang,
Lianrui Zuo,
Blake E. Dewey,
Samuel W. Remedios,
Savannah P. Hays,
Dzung L. Pham,
Jerry L. Prince,
Aaron Carass
Abstract:
Deep learning algorithms utilizing magnetic resonance (MR) images have demonstrated cutting-edge proficiency in autonomously segmenting multiple sclerosis (MS) lesions. Despite their achievements, these algorithms may struggle to extend their performance across various sites or scanners, leading to domain generalization errors. While few-shot or one-shot domain adaptation emerges as a potential solution to mitigate generalization errors, its efficacy might be hindered by the scarcity of labeled data in the target domain. This paper seeks to tackle this challenge by integrating one-shot adaptation data with harmonized training data that incorporates labels. Our approach involves synthesizing new training data with a contrast akin to that of the test domain, a process we refer to as "contrast harmonization" in MRI. Our experiments illustrate that the amalgamation of one-shot adaptation data with harmonized training data surpasses the performance of utilizing either data source in isolation. Notably, domain adaptation using exclusively harmonized training data achieved comparable or even superior performance compared to one-shot adaptation. Moreover, all adaptations required only minimal fine-tuning, ranging from 2 to 5 epochs for convergence.
Submitted 31 October, 2023;
originally announced October 2023.
-
Speech Audio Synthesis from Tagged MRI and Non-Negative Matrix Factorization via Plastic Transformer
Authors:
Xiaofeng Liu,
Fangxu Xing,
Maureen Stone,
Jiachen Zhuo,
Sidney Fels,
Jerry L. Prince,
Georges El Fakhri,
Jonghye Woo
Abstract:
The tongue's intricate 3D structure, comprising localized functional units, plays a crucial role in the production of speech. When measured using tagged MRI, these functional units exhibit cohesive displacements and derived quantities that facilitate the complex process of speech production. Non-negative matrix factorization-based approaches have been shown to estimate the functional units through motion features, yielding a set of building blocks and a corresponding weighting map. Investigating the link between weighting maps and speech acoustics can offer significant insights into the intricate process of speech production. To this end, in this work, we utilize two-dimensional spectrograms as a proxy representation, and develop an end-to-end deep learning framework for translating weighting maps to their corresponding audio waveforms. Our proposed plastic light transformer (PLT) framework is based on directional product relative position bias and single-level spatial pyramid pooling, thus enabling flexible processing of weighting maps with variable size to fixed-size spectrograms, without input information loss or dimension expansion. Additionally, our PLT framework efficiently models the global correlation of wide matrix input. To improve the realism of our generated spectrograms with relatively limited training samples, we apply pair-wise utterance consistency with Maximum Mean Discrepancy constraint and adversarial training. Experimental results on a dataset of 29 subjects speaking two utterances demonstrated that our framework is able to synthesize speech audio waveforms from weighting maps, outperforming conventional convolution and transformer models.
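The Maximum Mean Discrepancy (MMD) constraint compares two sets of samples through kernel mean embeddings. Below is a minimal NumPy sketch of the standard biased squared-MMD estimate with a Gaussian kernel; how the paper pairs utterances, which features it compares, and its kernel bandwidth are not specified here, so `sigma` and the feature vectors are assumptions.

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def mmd2(x, y, sigma=1.0):
    """Biased estimate of squared MMD between samples x (n, d) and y (m, d)."""
    kxx = gaussian_kernel(x, x, sigma).mean()
    kyy = gaussian_kernel(y, y, sigma).mean()
    kxy = gaussian_kernel(x, y, sigma).mean()
    return kxx + kyy - 2.0 * kxy
```

Used as a loss, `mmd2(generated_features, real_features)` is small when the two feature distributions match, which is what helps with limited training samples: it constrains distributions rather than individual pixels.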
Submitted 25 September, 2023;
originally announced September 2023.
-
3D View Prediction Models of the Dorsal Visual Stream
Authors:
Gabriel Sarch,
Hsiao-Yu Fish Tung,
Aria Wang,
Jacob Prince,
Michael Tarr
Abstract:
Deep neural network representations align well with brain activity in the ventral visual stream. However, the primate visual system has a distinct dorsal processing stream with different functional properties. To test if a model trained to perceive 3D scene geometry aligns better with neural responses in dorsal visual areas, we trained a self-supervised geometry-aware recurrent neural network (GRNN) to predict novel camera views using a 3D feature memory. We compared GRNN to self-supervised baseline models that have been shown to align well with ventral regions using the large-scale fMRI Natural Scenes Dataset (NSD). We found that while the baseline models accounted better for ventral brain regions, GRNN accounted for a greater proportion of variance in dorsal brain regions. Our findings demonstrate the potential for using task-relevant models to probe representational differences across visual streams.
Submitted 4 September, 2023;
originally announced September 2023.
-
MomentaMorph: Unsupervised Spatial-Temporal Registration with Momenta, Shooting, and Correction
Authors:
Zhangxing Bian,
Shuwen Wei,
Yihao Liu,
Junyu Chen,
Jiachen Zhuo,
Fangxu Xing,
Jonghye Woo,
Aaron Carass,
Jerry L. Prince
Abstract:
Tagged magnetic resonance imaging (tMRI) has been employed for decades to measure the motion of tissue undergoing deformation. However, registration-based motion estimation from tMRI is difficult due to the periodic patterns in these images, particularly when the motion is large. With larger motion, the registration approach gets trapped in local optima, leading to motion estimation errors. We introduce a novel "momenta, shooting, and correction" framework for Lagrangian motion estimation in the presence of repetitive patterns and large motion. This framework, grounded in Lie algebra and Lie group principles, accumulates momenta in the tangent vector space and employs exponential mapping in the diffeomorphic space for rapid approximation towards true optima, circumventing local optima. A subsequent correction step ensures convergence to true optima. The results on a 2D synthetic dataset and a real 3D tMRI dataset demonstrate our method's efficiency in estimating accurate, dense, and diffeomorphic 2D/3D motion fields amidst large motion and repetitive patterns.
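A common numerical way to evaluate the exponential map from a (stationary) velocity field in the tangent space to a diffeomorphism is scaling and squaring. The sketch below shows that generic technique in 2D; it is an assumption for illustration, and the paper's momenta accumulation and shooting may compute the map differently.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def exp_velocity(v, n_steps=6):
    """Exponential map of a stationary 2D velocity field by scaling and squaring.

    v: (2, H, W) velocity in voxels. Returns the displacement u of
    phi = exp(v) = id + u, using exp(v) ≈ (id + v / 2**N) composed 2**N times.
    """
    u = v / (2 ** n_steps)                                 # small step: exp(w) ≈ id + w
    grid = np.mgrid[0:v.shape[1], 0:v.shape[2]].astype(float)
    for _ in range(n_steps):
        coords = grid + u
        # u evaluated at the displaced positions x + u(x)
        u_at_phi = np.stack([map_coordinates(c, coords, order=1, mode="nearest") for c in u])
        u = u + u_at_phi                                   # phi <- phi ∘ phi
    return u
```

For a spatially constant velocity the result is exactly that translation, and for smooth fields each squaring step composes two small, invertible warps, which is why the output stays diffeomorphic in practice.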
Submitted 5 August, 2023;
originally announced August 2023.
-
A survey on deep learning in medical image registration: new technologies, uncertainty, evaluation metrics, and beyond
Authors:
Junyu Chen,
Yihao Liu,
Shuwen Wei,
Zhangxing Bian,
Shalini Subramanian,
Aaron Carass,
Jerry L. Prince,
Yong Du
Abstract:
Deep learning technologies have dramatically reshaped the field of medical image registration over the past decade. The initial developments, such as regression-based and U-Net-based networks, established the foundation for deep learning in image registration. Subsequent progress has been made in various aspects of deep learning-based registration, including similarity measures, deformation regularizations, network architectures, and uncertainty estimation. These advancements have not only enriched the field of image registration but have also facilitated its application in a wide range of tasks, including atlas construction, multi-atlas segmentation, motion estimation, and 2D-3D registration. In this paper, we present a comprehensive overview of the most recent advancements in deep learning-based image registration. We begin with a concise introduction to the core concepts of deep learning-based image registration. Then, we delve into innovative network architectures, loss functions specific to registration, and methods for estimating registration uncertainty. Additionally, this paper explores appropriate evaluation metrics for assessing the performance of deep learning models in registration tasks. Finally, we highlight the practical applications of these novel techniques in medical imaging and discuss the future prospects of deep learning-based image registration.
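One widely used evaluation metric for deformation regularity is the Jacobian determinant of the transformation, with non-positive values marking folding. The sketch below computes it for a 2D displacement field with central differences (`np.gradient`); the function names are hypothetical and the survey may discuss other variants.

```python
import numpy as np

def jacobian_determinant_2d(u):
    """det(I + grad u) at each pixel for a displacement u of shape (2, H, W)
    (row and column components, in voxel units)."""
    du0_dr, du0_dc = np.gradient(u[0])
    du1_dr, du1_dc = np.gradient(u[1])
    return (1.0 + du0_dr) * (1.0 + du1_dc) - du0_dc * du1_dr

def folding_fraction(u):
    """Fraction of pixels where the transformation folds (det <= 0)."""
    return float((jacobian_determinant_2d(u) <= 0).mean())
```

For example, a uniform 1.5x expansion gives a determinant of 2.25 everywhere, while reflecting one axis with a scale of -1 gives -1 and a folding fraction of 1.0.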
Submitted 1 November, 2024; v1 submitted 28 July, 2023;
originally announced July 2023.
-
Transformers in Reinforcement Learning: A Survey
Authors:
Pranav Agarwal,
Aamer Abdul Rahman,
Pierre-Luc St-Charles,
Simon J. D. Prince,
Samira Ebrahimi Kahou
Abstract:
Transformers have significantly impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks. This survey explores how transformers are used in reinforcement learning (RL), where they are seen as a promising solution for addressing challenges such as unstable training, credit assignment, lack of interpretability, and partial observability. We begin by providing a brief domain overview of RL, followed by a discussion on the challenges of classical RL algorithms. Next, we delve into the properties of the transformer and its variants and discuss the characteristics that make them well-suited to address the challenges inherent in RL. We examine the application of transformers to various aspects of RL, including representation learning, transition and reward function modeling, and policy optimization. We also discuss recent research that aims to enhance the interpretability and efficiency of transformers in RL, using visualization techniques and efficient training strategies. Often, the transformer architecture must be tailored to the specific needs of a given application. We present a broad overview of how transformers have been adapted for several applications, including robotics, medicine, language modeling, cloud computing, and combinatorial optimization. We conclude by discussing the limitations of using transformers in RL and assessing their potential for catalyzing future breakthroughs in this field.
Submitted 12 July, 2023;
originally announced July 2023.
-
Deep Learning and Ethics
Authors:
Travis LaCroix,
Simon J. D. Prince
Abstract:
This article appears as chapter 21 of Prince (2023, Understanding Deep Learning); a complete draft of the textbook is available here: http://udlbook.com. This chapter considers potential harms arising from the design and use of AI systems. These include algorithmic bias, lack of explainability, data privacy violations, militarization, fraud, and environmental concerns. The aim is not to provide advice on being more ethical. Instead, the goal is to express ideas and start conversations in key areas that have received attention in philosophy, political science, and the broader social sciences.
Submitted 20 June, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Attentive Continuous Generative Self-training for Unsupervised Domain Adaptive Medical Image Translation
Authors:
Xiaofeng Liu,
Jerry L. Prince,
Fangxu Xing,
Jiachen Zhuo,
Reese Timothy,
Maureen Stone,
Georges El Fakhri,
Jonghye Woo
Abstract:
Self-training is an important class of unsupervised domain adaptation (UDA) approaches that are used to mitigate the problem of domain shift, when applying knowledge learned from a labeled source domain to unlabeled and heterogeneous target domains. While self-training-based UDA has shown considerable promise on discriminative tasks, including classification and segmentation, through reliable pseudo-label filtering based on the maximum softmax probability, there is a paucity of prior work on self-training-based UDA for generative tasks, including image modality translation. To fill this gap, in this work, we seek to develop a generative self-training (GST) framework for domain adaptive image translation with continuous value prediction and regression objectives. Specifically, we quantify both aleatoric and epistemic uncertainties within our GST using variational Bayes learning to measure the reliability of synthesized data. We also introduce a self-attention scheme that de-emphasizes the background region to prevent it from dominating the training process. The adaptation is then carried out by an alternating optimization scheme with target domain supervision that focuses attention on the regions with reliable pseudo-labels. We evaluated our framework on two cross-scanner/center, inter-subject translation tasks, including tagged-to-cine magnetic resonance (MR) image translation and T1-weighted MR-to-fractional anisotropy translation. Extensive validations with unpaired target domain data showed that our GST yielded superior synthesis performance in comparison to adversarial training UDA methods.
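A common way to separate aleatoric and epistemic uncertainty for a regression output is the law of total variance over stochastic forward passes (e.g., weight samples drawn from a variational posterior). The sketch below shows that generic decomposition plus a threshold-based reliability mask; the exact formulation in GST may differ, and `reliability_mask` and its `threshold` are hypothetical.

```python
import numpy as np

def decompose_uncertainty(mu_samples, var_samples):
    """mu_samples, var_samples: (T, ...) predicted means and predicted (aleatoric)
    variances from T stochastic forward passes.
    Returns the predictive mean, aleatoric variance, and epistemic variance maps."""
    mean = mu_samples.mean(axis=0)
    aleatoric = var_samples.mean(axis=0)    # average predicted data noise
    epistemic = mu_samples.var(axis=0)      # disagreement between weight samples
    return mean, aleatoric, epistemic

def reliability_mask(aleatoric, epistemic, threshold):
    """Hypothetical pseudo-label filter: keep pixels with low total predictive variance."""
    return (aleatoric + epistemic) < threshold
```

Only pixels passing the mask would then contribute to the target-domain loss, mirroring the abstract's focus on regions with reliable pseudo-labels.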
Submitted 23 May, 2023;
originally announced May 2023.
-
Rapid Brain Meninges Surface Reconstruction with Layer Topology Guarantee
Authors:
Peiyu Duan,
Yuan Xue,
Shuo Han,
Lianrui Zuo,
Aaron Carass,
Caitlyn Bernhard,
Savannah Hays,
Peter A. Calabresi,
Susan M. Resnick,
James S. Duncan,
Jerry L. Prince
Abstract:
The meninges, located between the skull and brain, are composed of three membrane layers: the pia, the arachnoid, and the dura. Reconstruction of these layers can aid in studying volume differences between patients with neurodegenerative diseases and normal aging subjects. In this work, we use convolutional neural networks (CNNs) to reconstruct surfaces representing meningeal layer boundaries from magnetic resonance (MR) images. We first use the CNNs to predict the signed distance functions (SDFs) representing these surfaces while preserving their anatomical ordering. The marching cubes algorithm is then used to generate continuous surface representations; both the subarachnoid space (SAS) and the intracranial volume (ICV) are computed from these surfaces. The proposed method is compared to a state-of-the-art deformable model-based reconstruction method, and we show that our method can reconstruct smoother and more accurate surfaces using less computation time. Finally, we conduct experiments with volumetric analysis on both subjects with multiple sclerosis and healthy controls. For healthy and MS subjects, ICVs and SAS volumes are found to be significantly correlated with sex (p<0.01) and age (p<0.03), respectively.
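Two properties used above can be illustrated with synthetic SDFs: nested surfaces imply an ordering of their SDFs (for an inner surface enclosed by an outer one, `inner_sdf >= outer_sdf` everywhere), and enclosed volumes can be read off the sign of the SDF. This is a voxel-counting approximation under the assumption of negative-inside SDFs, not the paper's surface-based computation; a mesh would normally be extracted with a marching cubes implementation such as `skimage.measure.marching_cubes`.

```python
import numpy as np

def sphere_sdf(shape, center, radius):
    """Signed distance to a sphere on a voxel grid: negative inside, positive outside."""
    grid = np.indices(shape, dtype=float)
    dist = np.sqrt(sum((g - c) ** 2 for g, c in zip(grid, center)))
    return dist - radius

def enclosed_volume(sdf, voxel_volume=1.0):
    """Volume inside the zero level set, approximated by counting voxels with sdf < 0."""
    return float((sdf < 0).sum()) * voxel_volume

def ordering_violations(inner_sdf, outer_sdf):
    """Count voxels breaking the nesting constraint inner_sdf >= outer_sdf."""
    return int((inner_sdf < outer_sdf).sum())
```

The volume between two nested surfaces (the kind of quantity an SAS measurement needs) is then `enclosed_volume(outer) - enclosed_volume(inner)`.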
Submitted 12 April, 2023;
originally announced April 2023.
-
Optimal operating MR contrast for brain ventricle parcellation
Authors:
Savannah P. Hays,
Lianrui Zuo,
Yuli Wang,
Mark G. Luciano,
Aaron Carass,
Jerry L. Prince
Abstract:
Development of MR harmonization has enabled MR images of different contrasts to be synthesized while preserving the underlying anatomy. In this paper, we use image harmonization to explore the impact of different T1-w MR contrasts on VParNet, a state-of-the-art ventricle parcellation algorithm. We identify an optimal operating contrast (OOC) for ventricle parcellation and show that the performance of a pretrained VParNet can be boosted by adjusting the input contrast to the OOC.
Submitted 4 April, 2023;
originally announced April 2023.
-
DrDisco: Deep Registration for Distortion Correction of Diffusion MRI with single phase-encoding
Authors:
Zhangxing Bian,
Muhan Shao,
Aaron Carass,
Jerry L. Prince
Abstract:
Diffusion-weighted magnetic resonance imaging (DW-MRI) is a non-invasive way of imaging white matter tracts in the human brain. DW-MRIs are usually acquired using echo-planar imaging (EPI) with high gradient fields, which could introduce severe geometric distortions that interfere with further analyses. Most tools for correcting distortion require two minimally weighted DW-MRI images (B0) acquired with different phase-encoding directions, and they can take hours to process per subject. Since much diffusion data is acquired with only a single phase-encoding direction, the applicability of existing approaches is limited. We propose a deep learning-based registration approach to correct distortion using only the B0 acquired from a single phase-encoding direction. Specifically, we register undistorted T1-weighted images and distorted B0 to remove the distortion through a deep learning model. We apply a differentiable mutual information loss during training to improve inter-modality alignment. Experiments on the Human Connectome Project dataset show the proposed method outperforms SyN and VoxelMorph on several metrics, and only takes a few seconds to process one subject.
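A mutual information loss becomes differentiable when the joint histogram is built with soft (Parzen-window) bin assignments instead of hard counts. The NumPy sketch below shows that common construction for readability; porting it to an autograd framework would give a trainable loss (e.g., `-soft_mutual_information(...)`). The bin count and the kernel width of one bin spacing are assumptions, not values from the paper.

```python
import numpy as np

def soft_mutual_information(a, b, n_bins=32, eps=1e-12):
    """Mutual information (nats) between two intensity arrays using Gaussian soft binning."""
    a = np.ravel(a).astype(float)
    b = np.ravel(b).astype(float)
    a = (a - a.min()) / (a.max() - a.min() + eps)     # rescale intensities to [0, 1]
    b = (b - b.min()) / (b.max() - b.min() + eps)
    centers = np.linspace(0.0, 1.0, n_bins)
    sigma = 1.0 / (n_bins - 1)                         # kernel width = one bin spacing (a choice)
    wa = np.exp(-0.5 * ((a[:, None] - centers) / sigma) ** 2)   # (N, n_bins) soft assignments
    wb = np.exp(-0.5 * ((b[:, None] - centers) / sigma) ** 2)
    wa /= wa.sum(axis=1, keepdims=True)               # each sample contributes total weight 1
    wb /= wb.sum(axis=1, keepdims=True)
    p_ab = wa.T @ wb / a.size                          # joint distribution, sums to 1
    p_a = p_ab.sum(axis=1, keepdims=True)              # marginals
    p_b = p_ab.sum(axis=0, keepdims=True)
    return float(np.sum(p_ab * np.log((p_ab + eps) / (p_a * p_b + eps))))
```

Because MI only measures statistical dependence, an image and its intensity-inverted copy score as highly as the image with itself, which is why MI suits aligning T1-weighted images to B0 images with different contrast.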
Submitted 1 April, 2023;
originally announced April 2023.
-
Label Propagation via Random Walk for Training Robust Thalamus Nuclei Parcellation Model from Noisy Annotations
Authors:
Anqi Feng,
Yuan Xue,
Yuli Wang,
Chang Yan,
Zhangxing Bian,
Muhan Shao,
Jiachen Zhuo,
Rao P. Gullapalli,
Aaron Carass,
Jerry L. Prince
Abstract:
Data-driven thalamic nuclei parcellation depends on high-quality manual annotations. However, the small size of thalamic nuclei and the low contrast between them yield annotations that are often incomplete, noisy, or ambiguously labelled. To train a robust thalamic nuclei parcellation model with noisy annotations, we propose a label propagation algorithm based on the random walker to refine the annotations before model training. A two-step model was trained to generate first the whole thalamus and then the nuclei masks. We conducted experiments on a mild traumatic brain injury (mTBI) dataset with noisy thalamic nuclei annotations. Our model outperforms current state-of-the-art thalamic nuclei parcellations by a clear margin. We believe our method can also facilitate the training of other parcellation models with noisy labels.
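Random walker label propagation treats trusted voxels as seeds and assigns every other voxel the label whose seeds a random walk is most likely to reach first; this reduces to a linear solve with the graph Laplacian. Below is a minimal dense 2D version of the classic random walker of Grady, for small images only. How the paper selects seeds from noisy annotations is not specified here, so the seed map and `beta` are assumptions; `skimage.segmentation.random_walker` is a production implementation.

```python
import numpy as np

def random_walker_2d(image, seeds, beta=90.0):
    """Random walker on a 4-connected 2D grid (dense matrices, small images only).

    seeds: int array, 0 = unlabeled, 1..K = trusted labels.
    Returns (labels, per-label probabilities).
    """
    h, w = image.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    img = image.astype(float).ravel()
    span = img.max() - img.min()
    if span > 0:
        img = (img - img.min()) / span                  # intensities in [0, 1]
    W = np.zeros((n, n))
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        a, b = a.ravel(), b.ravel()
        wgt = np.exp(-beta * (img[a] - img[b]) ** 2)    # strong edges between similar intensities
        W[a, b] = wgt
        W[b, a] = wgt
    L = np.diag(W.sum(axis=1)) - W                      # graph Laplacian
    s = seeds.ravel()
    marked = s > 0
    labels = np.unique(s[marked])
    M = (s[marked][:, None] == labels[None, :]).astype(float)   # one-hot seed labels
    L_u = L[np.ix_(~marked, ~marked)]
    B = L[np.ix_(~marked, marked)]
    X = np.linalg.solve(L_u, -B @ M)                    # Dirichlet problem: L_u X = -B M
    probs = np.zeros((n, labels.size))
    probs[~marked] = X
    probs[marked] = M
    return labels[probs.argmax(axis=1)].reshape(h, w), probs.reshape(h, w, -1)
```

On a two-region image with one seed per region, each region takes its seed's label, and the per-voxel probabilities sum to one.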
Submitted 30 March, 2023;
originally announced March 2023.
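The abstract names a random-walker-based label refinement without further detail. For orientation, here is a minimal sketch of the classic random walker formulation (Grady's) on a generic weighted graph: seed nodes keep their labels, and every other node receives, for each class, the probability that a random walk starting there first reaches a seed of that class, obtained by solving a Laplacian linear system. How the authors build the graph from image voxels and which annotations they treat as seeds are not stated here; `W`, `seeds`, and the function name are assumptions.

```python
import numpy as np

def random_walker_labels(W, seeds):
    """Propagate seed labels over a weighted graph (random walker formulation).

    W     : (n, n) symmetric non-negative affinity matrix (assumed connected).
    seeds : dict {node_index: label} for trusted nodes.
    Returns (labels, probs); probs[i, k] is the probability that a walk from
    node i first reaches a seed of class k.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian
    classes = sorted(set(seeds.values()))
    marked = np.array(sorted(seeds))
    unmarked = np.array([i for i in range(n) if i not in seeds], dtype=int)
    # Indicator matrix of seed memberships: (num_marked, num_classes).
    M = np.array([[1.0 if seeds[i] == c else 0.0 for c in classes] for i in marked])
    probs = np.zeros((n, len(classes)))
    probs[marked] = M
    if unmarked.size:
        L_u = L[np.ix_(unmarked, unmarked)]
        B = L[np.ix_(unmarked, marked)]
        # Dirichlet problem L_u X = -B M, one column per class.
        probs[unmarked] = np.linalg.solve(L_u, -B @ M)
    labels = np.array(classes)[probs.argmax(axis=1)]
    return labels, probs
```

On a chain graph with a weak edge in the middle, the weak edge acts as a boundary: nodes on each side take the label of the seed on their side.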
-
Automated Ventricle Parcellation and Evan's Ratio Computation in Pre- and Post-Surgical Ventriculomegaly
Authors:
Yuli Wang,
Anqi Feng,
Yuan Xue,
Lianrui Zuo,
Yihao Liu,
Ari M. Blitz,
Mark G. Luciano,
Aaron Carass,
Jerry L. Prince
Abstract:
Normal pressure hydrocephalus (NPH) is a brain disorder associated with enlarged ventricles and multiple cognitive and motor symptoms. The degree of ventricular enlargement can be measured using magnetic resonance images (MRIs) and characterized quantitatively using the Evan's ratio (ER). Automatic computation of ER is desired to avoid the extra time and variations associated with manual measurements on MRI. Because shunt surgery is often used to treat NPH, this process must be robust to image artifacts caused by the shunt and related implants. In this paper, we propose a 3D regions-of-interest aware (ROI-aware) network for segmenting the ventricles. The method achieves state-of-the-art performance on both pre-surgery MRIs and post-surgery MRIs with artifacts. We also describe an automated approach to compute ER from the resulting segmentations. Experimental results on multiple datasets demonstrate the potential of the proposed method to assist clinicians in the diagnosis and management of NPH.
Submitted 6 March, 2023; v1 submitted 3 March, 2023;
originally announced March 2023.
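ER is conventionally the maximum width of the frontal horns of the lateral ventricles divided by the maximum inner diameter of the skull, with values above 0.3 commonly taken to indicate ventriculomegaly. The abstract does not describe how the automated ER is computed from the segmentation, so the sketch below is a simplified stand-in, assuming a binary frontal-horn mask and an intracranial mask on the same grid, with the left-right direction along one array axis. Both widths are measured in voxels along the same axis, so the voxel size cancels. Function names are hypothetical.

```python
import numpy as np

def max_lr_extent(mask, lr_axis=0):
    """Largest left-right extent (in voxels) of a binary mask, measured on
    every line parallel to lr_axis (first to last occupied voxel)."""
    m = np.moveaxis(np.asarray(mask, dtype=bool), lr_axis, -1)
    idx = np.arange(m.shape[-1])
    any_on = m.any(axis=-1)
    if not any_on.any():
        return 0
    # First and last occupied index on every line; empty lines are ignored.
    first = np.where(m, idx, m.shape[-1]).min(axis=-1)
    last = np.where(m, idx, -1).max(axis=-1)
    return int((last - first + 1)[any_on].max())

def evans_ratio(frontal_horn_mask, intracranial_mask, lr_axis=0):
    """Max frontal-horn width divided by max inner skull width."""
    return max_lr_extent(frontal_horn_mask, lr_axis) / max_lr_extent(intracranial_mask, lr_axis)
```

Because the extent runs from the first to the last occupied voxel on a line, a line crossing both horns measures from the lateral wall of one horn to the lateral wall of the other, which matches the usual definition.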
-
FastCod: Fast Brain Connectivity in Diffusion Imaging
Authors:
Zhangxing Bian,
Muhan Shao,
Jiachen Zhuo,
Rao P. Gullapalli,
Aaron Carass,
Jerry L. Prince
Abstract:
Connectivity information derived from diffusion-weighted magnetic resonance images (DW-MRIs) plays an important role in studying human subcortical gray matter structures. However, due to the $O(N^2)$ complexity of computing the connectivity of each voxel to every other voxel (or multiple ROIs), the current practice of extracting connectivity information is highly inefficient. This makes the processing of high-resolution images and population-level analyses very computationally demanding. To address this issue, we propose a more efficient way to extract connectivity information: we consider two regions/voxels to be connected if a white matter fiber streamline passes through both of them, no matter where the streamline originates. We use the thalamus parcellation task for demonstration purposes; our experiments show that our approach brings a 30 to 120 times speedup over traditional approaches with comparable qualitative parcellation results. We also demonstrate that high-resolution connectivity features can be super-resolved from low-resolution DW-MRI in our framework. Together, these two innovations enable higher-resolution connectivity analysis from DW-MRI. Our source code is available at jasonbian97.github.io/fastcod.
Submitted 18 February, 2023;
originally announced February 2023.
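The "passes through both, regardless of origin" rule can be sketched as a single pass over a precomputed set of streamlines: for each streamline, collect the ROIs it touches and credit every voxel it visits with those ROIs. The cost then scales with total streamline length rather than with the number of voxel pairs. This is a minimal sketch, not the released FastCod code; the data layout (streamlines as arrays of integer voxel coordinates, an integer ROI label volume with 0 as background) and the function name are assumptions.

```python
import numpy as np

def streamline_connectivity(streamlines, roi_labels, n_rois):
    """Voxel-to-ROI connectivity counts under a 'passes-through' rule.

    streamlines : iterable of (k, 3) integer arrays of visited voxel coordinates.
    roi_labels  : 3D int array, 0 = background, 1..n_rois = ROI id.
    Returns {voxel_tuple: int array of length n_rois}.
    """
    counts = {}
    for sl in streamlines:
        # Each streamline counts at most once per voxel and per ROI.
        vox = {tuple(int(c) for c in p) for p in np.asarray(sl)}
        rois = {int(roi_labels[v]) for v in vox} - {0}
        if not rois:
            continue
        hit = np.zeros(n_rois, dtype=np.int64)
        hit[[r - 1 for r in rois]] = 1
        for v in vox:
            counts[v] = counts.get(v, np.zeros(n_rois, dtype=np.int64)) + hit
    return counts
```

Normalizing each voxel's count vector would give per-voxel connectivity profiles that a thalamus parcellation could cluster; that normalization step is left out here.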
-
Synthesizing audio from tongue motion during speech using tagged MRI via transformer
Authors:
Xiaofeng Liu,
Fangxu Xing,
Jerry L. Prince,
Maureen Stone,
Georges El Fakhri,
Jonghye Woo
Abstract:
Investigating the relationship between intelligible speech and the internal tissue point motion of the tongue and oropharyngeal muscle deformation measured from tagged MRI can aid in advancing speech motor control theories and developing novel treatment methods for speech-related disorders. However, elucidating the relationship between these two sources of information is challenging, due in part to the disparity in data structure between spatiotemporal motion fields (i.e., 4D motion fields) and one-dimensional audio waveforms. In this work, we present an efficient encoder-decoder translation network for exploring the predictive information inherent in 4D motion fields via 2D spectrograms as a surrogate of the audio data. Specifically, our encoder is based on 3D convolutional spatial modeling and transformer-based temporal modeling. The extracted features are processed by an asymmetric 2D convolution decoder to generate spectrograms that correspond to 4D motion fields. Furthermore, we incorporate a generative adversarial training approach into our framework to further improve the synthesis quality of our generated spectrograms. We experiment on 63 paired motion field sequences and speech waveforms, demonstrating that our framework enables the generation of clear audio waveforms from a sequence of motion fields. Thus, our framework has the potential to improve our understanding of the relationship between these two modalities and inform the development of treatments for speech disorders.
Submitted 14 February, 2023;
originally announced February 2023.
-
New starting point registration method for tagged MRI tongue motion estimation
Authors:
Jinglun Yu,
Muhan Shao,
Zhangxing Bian,
Xiao Liang,
Jiachen Zhuo,
Maureen Stone,
Jerry L. Prince
Abstract:
Accurate tongue motion estimation is essential for tongue function evaluation. The harmonic phase processing (HARP) method and the phase vector incompressible registration algorithm (PVIRA) based on HARP can generate motion estimates from tagged MRI images, but they suffer from tag jumping due to large motions. This paper proposes a new registration method that combines the stationary velocity fields produced by PVIRA between successive time frames into a new initialization for the final registration stage, in order to avoid tag jumping. Experimental results demonstrate that the proposed method can avoid tag jumping and outperforms existing methods in tongue motion estimation.
Submitted 8 February, 2023;
originally announced February 2023.
-
A latent space for unsupervised MR image quality control via artifact assessment
Authors:
Lianrui Zuo,
Yuan Xue,
Blake E. Dewey,
Yihao Liu,
Jerry L. Prince,
Aaron Carass
Abstract:
Image quality control (IQC) can be used in automated magnetic resonance (MR) image analysis to exclude erroneous results caused by poorly acquired or artifact-laden images. Existing IQC methods for MR imaging generally require human effort to craft meaningful features or label large datasets for supervised training. The involvement of human labor can be burdensome and biased, as labeling MR images based on their quality is a subjective task. In this paper, we propose an automatic IQC method that evaluates the extent of artifacts in MR images without supervision. In particular, we design an artifact encoding network that learns representations of artifacts based on contrastive learning. We then use a normalizing flow to estimate the density of learned representations for unsupervised classification. Our experiments on large-scale multi-cohort MR datasets show that the proposed method accurately detects images with high levels of artifacts, which can inform downstream analysis tasks about potentially flawed data.
Submitted 1 February, 2023;
originally announced February 2023.
-
DRIMET: Deep Registration for 3D Incompressible Motion Estimation in Tagged-MRI with Application to the Tongue
Authors:
Zhangxing Bian,
Fangxu Xing,
Jinglun Yu,
Muhan Shao,
Yihao Liu,
Aaron Carass,
Jiachen Zhuo,
Jonghye Woo,
Jerry L. Prince
Abstract:
Tagged magnetic resonance imaging (MRI) has been used for decades to observe and quantify the detailed motion of deforming tissue. However, this technique faces several challenges, such as tag fading, large motion, long computation times, and difficulty in obtaining diffeomorphic incompressible flow fields. To address these issues, this paper presents a novel unsupervised phase-based 3D motion estimation technique for tagged MRI. We introduce two key innovations. First, we apply a sinusoidal transformation to the harmonic phase input, which enables end-to-end training and avoids the need for phase interpolation. Second, we propose a Jacobian determinant-based learning objective to encourage incompressible flow fields for deforming biological tissues. Our method efficiently estimates 3D motion fields that are accurate, dense, and approximately diffeomorphic and incompressible. The efficacy of the method is assessed using human tongue motion during speech in both healthy controls and patients who have undergone glossectomy. We show that the method outperforms existing approaches and also improves speed and robustness to tag fading and large tongue motion. The code is available at https://github.com/jasonbian97/DRIMET-tagged-MRI
Submitted 30 April, 2023; v1 submitted 17 January, 2023;
originally announced January 2023.
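The two ingredients named in the DRIMET abstract can be illustrated in a few lines of NumPy. Wrapped harmonic phase jumps by 2π at wrap points, so it cannot be interpolated or warped directly; its sine and cosine are continuous across those wraps. An incompressibility objective can penalize how far the Jacobian determinant of x ↦ x + u(x) deviates from 1. This is a hedged sketch, not the code in the repository linked above: the squared-deviation penalty, central differences via `np.gradient`, unit voxel spacing, and all function names are assumptions.

```python
import numpy as np

def sinusoidal_phase(phase):
    """Map wrapped phase to (sin, cos) channels, continuous across 2*pi wraps."""
    return np.stack([np.sin(phase), np.cos(phase)], axis=0)

def jacobian_determinant(disp):
    """|J| of x -> x + u(x) for a displacement field of shape (3, X, Y, Z),
    using central differences with unit voxel spacing."""
    # grads[i][j] = d u_i / d x_j, each of shape (X, Y, Z)
    grads = [np.gradient(disp[i]) for i in range(3)]
    J = np.empty(disp.shape[1:] + (3, 3))
    for i in range(3):
        for j in range(3):
            J[..., i, j] = grads[i][j] + (1.0 if i == j else 0.0)
    return np.linalg.det(J)

def incompressibility_penalty(disp):
    """Mean squared deviation of |J| from 1 (1 means volume is preserved)."""
    return float(np.mean((jacobian_determinant(disp) - 1.0) ** 2))
```

A pure shear leaves volume unchanged and is not penalized, whereas a uniform 10% expansion gives |J| = 1.1³ everywhere and is penalized.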
-
Segmenting thalamic nuclei from manifold projections of multi-contrast MRI
Authors:
Chang Yan,
Muhan Shao,
Zhangxing Bian,
Anqi Feng,
Yuan Xue,
Jiachen Zhuo,
Rao P. Gullapalli,
Aaron Carass,
Jerry L. Prince
Abstract:
The thalamus is a subcortical gray matter structure that plays a key role in relaying sensory and motor signals within the brain. Its nuclei can atrophy or otherwise be affected by neurological diseases and injuries, including mild traumatic brain injury. Segmenting both the thalamus and its nuclei is challenging because of the relatively low contrast within and around the thalamus in conventional magnetic resonance (MR) images. This paper explores imaging features to determine key tissue signatures that naturally cluster, from which we can parcellate thalamic nuclei. Tissue contrasts include T1-weighted and T2-weighted images, MR diffusion measurements including fractional anisotropy (FA), mean diffusivity, Knutsson coefficients that represent fiber orientation, and synthetic multi-TI images derived from FGATIR and T1-weighted images. After registration of these contrasts and isolation of the thalamus, we use the uniform manifold approximation and projection (UMAP) method for dimensionality reduction to produce a low-dimensional representation of the data within the thalamus. Manual labeling of the thalamus provides labels for our UMAP embedding, from which k-nearest neighbors can be used to label new, unseen voxels in that same UMAP embedding. N-fold cross-validation of the method reveals performance comparable to state-of-the-art methods for thalamic parcellation.
Submitted 31 January, 2023; v1 submitted 15 January, 2023;
originally announced January 2023.
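The labeling step described above is a k-nearest-neighbor vote in the embedding space. A minimal NumPy sketch follows, assuming the low-dimensional coordinates of labeled and unseen voxels are already available (for example, from the `umap-learn` package's `UMAP.transform` for new voxels). The choice of k, Euclidean distance, and the tie-breaking rule are assumptions, not details from the paper.

```python
import numpy as np

def knn_label(train_emb, train_labels, query_emb, k=5):
    """Label each query point by majority vote of its k nearest labeled points
    (Euclidean distance in the embedding)."""
    train_emb = np.asarray(train_emb, dtype=float)
    query_emb = np.asarray(query_emb, dtype=float)
    train_labels = np.asarray(train_labels)
    # Pairwise squared distances, shape (num_queries, num_train).
    d2 = ((query_emb[:, None, :] - train_emb[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d2, axis=1)[:, :k]
    out = []
    for row in train_labels[nn]:
        vals, cnt = np.unique(row, return_counts=True)
        out.append(vals[np.argmax(cnt)])  # ties go to the smallest label
    return np.array(out)
```

The brute-force distance matrix is fine for a thalamus-sized set of voxels; a KD-tree would be the usual substitute for much larger sets.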
-
HACA3: A Unified Approach for Multi-site MR Image Harmonization
Authors:
Lianrui Zuo,
Yihao Liu,
Yuan Xue,
Blake E. Dewey,
Samuel W. Remedios,
Savannah P. Hays,
Murat Bilgel,
Ellen M. Mowry,
Scott D. Newsome,
Peter A. Calabresi,
Susan M. Resnick,
Jerry L. Prince,
Aaron Carass
Abstract:
The lack of standardization is a prominent issue in magnetic resonance (MR) imaging. This often causes undesired contrast variations in the acquired images due to differences in hardware and acquisition parameters. In recent years, image synthesis-based MR harmonization with disentanglement has been proposed to compensate for the undesired contrast variations. Despite the success of existing methods, we argue that three major improvements can be made. First, most existing methods are built upon the assumption that multi-contrast MR images of the same subject share the same anatomy. This assumption is questionable, since different MR contrasts are specialized to highlight different anatomical features. Second, these methods often require a fixed set of MR contrasts for training (e.g., both T1-weighted and T2-weighted images), limiting their applicability. Lastly, existing methods are generally sensitive to imaging artifacts. In this paper, we present Harmonization with Attention-based Contrast, Anatomy, and Artifact Awareness (HACA3), a novel approach to address these three issues. HACA3 incorporates an anatomy fusion module that accounts for the inherent anatomical differences between MR contrasts. Furthermore, HACA3 is also robust to imaging artifacts and can be trained and applied to any set of MR contrasts. HACA3 is developed and evaluated on diverse MR datasets acquired from 21 sites with varying field strengths, scanner platforms, and acquisition protocols. Experiments show that HACA3 achieves state-of-the-art performance under multiple image quality metrics. We also demonstrate the applicability and versatility of HACA3 on downstream tasks including white matter lesion segmentation and longitudinal volumetric analyses.
Submitted 25 April, 2023; v1 submitted 12 December, 2022;
originally announced December 2022.
-
On Finite Difference Jacobian Computation in Deformable Image Registration
Authors:
Yihao Liu,
Junyu Chen,
Shuwen Wei,
Aaron Carass,
Jerry Prince
Abstract:
Producing spatial transformations that are diffeomorphic is a key goal in deformable image registration. As a diffeomorphic transformation should have a positive Jacobian determinant |J| everywhere, the number of voxels with |J|<0 has been used to test for diffeomorphism and also to measure the irregularity of the transformation. For digital transformations, |J| is commonly approximated using a central difference, but this strategy can yield positive |J|'s for transformations that are clearly not diffeomorphic, even at the voxel resolution level. To show this, we first investigate the geometric meaning of different finite difference approximations of |J|. We show that, to determine whether a deformation is diffeomorphic for digital images, the use of any individual finite difference approximation of |J| is insufficient. We further demonstrate that for a 2D transformation, four unique finite difference approximations of |J| must be positive to ensure that the entire domain is invertible and free of folding at the pixel level. For a 3D transformation, ten unique finite difference approximations of |J| are required to be positive. Our proposed digital diffeomorphism criteria solve several errors inherent in the central difference approximation of |J| and accurately detect non-diffeomorphic digital transformations. The source code of this work is available at https://github.com/yihao6/digital_diffeomorphism.
Submitted 28 May, 2023; v1 submitted 12 December, 2022;
originally announced December 2022.
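A small 2D sketch makes the failure of the central difference concrete. It assumes, as a hypothesis, that the four 2D approximations are the combinations of forward (+) and backward (−) differences along x and y, with both partials along an axis using the same direction. The example transformation maps x = 1 past x = 2 (a fold): every central-difference |J| is positive, yet the forward difference exposes the negative one. Function names are hypothetical; the authors' implementation is in the repository linked above.

```python
import numpy as np

def _dx(f, kind):
    """Difference along axis 0 on the interior [1:-1, 1:-1]."""
    if kind == "+":
        return f[2:, 1:-1] - f[1:-1, 1:-1]
    if kind == "-":
        return f[1:-1, 1:-1] - f[:-2, 1:-1]
    return (f[2:, 1:-1] - f[:-2, 1:-1]) / 2.0  # central

def _dy(f, kind):
    """Difference along axis 1 on the interior [1:-1, 1:-1]."""
    if kind == "+":
        return f[1:-1, 2:] - f[1:-1, 1:-1]
    if kind == "-":
        return f[1:-1, 1:-1] - f[1:-1, :-2]
    return (f[1:-1, 2:] - f[1:-1, :-2]) / 2.0  # central

def jacobian_dets_2d(phi):
    """phi: (2, H, W) absolute coordinates (x-component, y-component).
    Returns {kind: |J| on the interior} for the four one-sided combinations
    and the central difference ('00')."""
    out = {}
    for kx, ky in [("+", "+"), ("+", "-"), ("-", "+"), ("-", "-"), ("0", "0")]:
        out[kx + ky] = _dx(phi[0], kx) * _dy(phi[1], ky) - _dy(phi[0], ky) * _dx(phi[1], kx)
    return out

def is_digitally_diffeomorphic_2d(phi):
    """True only if all four one-sided |J| approximations are positive."""
    d = jacobian_dets_2d(phi)
    return all(bool((d[k] > 0).all()) for k in ("++", "+-", "-+", "--"))

# A fold along x: positions 1 and 2 swap order, y is the identity.
vals = np.array([0.0, 2.0, 1.0, 3.0, 4.0])
phi = np.stack([np.repeat(vals[:, None], 3, axis=1), np.tile(np.arange(3.0), (5, 1))])
print(jacobian_dets_2d(phi)["00"].ravel())   # central |J| at x = 1, 2, 3
print(is_digitally_diffeomorphic_2d(phi))
```

The first print shows the positive central-difference values 0.5, 0.5, and 1.5, while the second prints `False` because the forward difference at x = 1 is −1.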
-
Thermochromic Metal Halide Perovskite Windows with Ideal Transition Temperatures
Authors:
Bryan A. Rosales,
Janghyun Kim,
Vincent M. Wheeler,
Laura E. Crowe,
Kevin J. Prince,
Mirzo Mirzokarimov,
Tom Daligault,
Adam Duell,
Colin A. Wolden,
Laura T. Schelhas,
Lance M. Wheeler
Abstract:
Urban centers across the globe are responsible for a significant fraction of energy consumption and CO2 emissions. As urban centers continue to grow, the popularity of glass as a cladding material in urban buildings is an alarming trend. Dynamic windows reduce heating and cooling loads in buildings by enabling passive heating in cold seasons and mitigating solar heat gain in hot seasons. In this work, we develop a mesoscopic building energy model that demonstrates reduced building energy consumption when thermochromic windows are employed. Savings are realized across eight disparate climate zones of the United States. We use the model to determine the ideal critical transition temperature range of 20 to 27.5 °C for thermochromic windows based on metal halide perovskite materials. Ideal transition temperatures are realized experimentally in composite metal halide perovskite films composed of perovskite crystals and an adjacent reservoir phase. The transition temperature is controlled by co-intercalating methanol, instead of water, with methylammonium iodide and tailoring the hydrogen-bonding chemistry of the reservoir phase. Thermochromic windows based on metal halide perovskites represent a clear opportunity to mitigate the effects of energy-hungry buildings.
Submitted 6 November, 2022;
originally announced November 2022.
-
Deep filter bank regression for super-resolution of anisotropic MR brain images
Authors:
Samuel W. Remedios,
Shuo Han,
Yuan Xue,
Aaron Carass,
Trac D. Tran,
Dzung L. Pham,
Jerry L. Prince
Abstract:
In 2D multi-slice magnetic resonance (MR) acquisition, the through-plane signals are typically of lower resolution than the in-plane signals. While contemporary super-resolution (SR) methods aim to recover the underlying high-resolution volume, the estimated high-frequency information is learned implicitly through end-to-end data-driven training rather than being explicitly stated and sought. To address this, we reframe the SR problem statement in terms of perfect reconstruction filter banks, enabling us to identify and directly estimate the missing information. In this work, we propose a two-stage approach to approximate the completion of a perfect reconstruction filter bank corresponding to the anisotropic acquisition of a particular scan. In stage 1, we estimate the missing filters using gradient descent, and in stage 2, we use deep networks to learn the mapping from coarse coefficients to detail coefficients. In addition, the proposed formulation does not rely on external training data, circumventing the need for domain shift correction. Under our approach, SR performance is improved particularly in "slice gap" scenarios, likely due to the constrained solution space imposed by the framework.
Submitted 6 September, 2022;
originally announced September 2022.
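To make the filter-bank framing concrete, here is the simplest two-channel perfect reconstruction filter bank, the Haar bank in polyphase form, applied to a 1D through-plane signal. The analysis step splits the signal into coarse and detail coefficients at half the sampling rate, and the synthesis step recovers the signal exactly. In the framing above, the coarse branch stands in for the acquired low-resolution data and the detail branch is what must be estimated. The Haar filters are an illustrative assumption; the paper instead estimates filters matched to the particular scan.

```python
import numpy as np

def haar_analysis(x):
    """Split an even-length 1D signal into coarse and detail coefficients."""
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]
    coarse = (even + odd) / np.sqrt(2.0)  # low-pass branch, downsampled by 2
    detail = (even - odd) / np.sqrt(2.0)  # high-pass branch, downsampled by 2
    return coarse, detail

def haar_synthesis(coarse, detail):
    """Invert haar_analysis exactly (perfect reconstruction)."""
    x = np.empty(2 * coarse.size)
    x[0::2] = (coarse + detail) / np.sqrt(2.0)
    x[1::2] = (coarse - detail) / np.sqrt(2.0)
    return x
```

Synthesizing with the detail coefficients set to zero gives only a piecewise-constant upsampling of the coarse signal. In the two-stage approach described above, a network would instead predict `detail` from `coarse` before synthesis.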
-
Tagged-MRI Sequence to Audio Synthesis via Self Residual Attention Guided Heterogeneous Translator
Authors:
Xiaofeng Liu,
Fangxu Xing,
Jerry L. Prince,
Jiachen Zhuo,
Maureen Stone,
Georges El Fakhri,
Jonghye Woo
Abstract:
Understanding the underlying relationship between tongue and oropharyngeal muscle deformation seen in tagged-MRI and intelligible speech plays an important role in advancing speech motor control theories and treatment of speech-related disorders. Because of their heterogeneous representations, however, direct mapping between the two modalities (i.e., a two-dimensional (mid-sagittal slice) plus time tagged-MRI sequence and its corresponding one-dimensional waveform) is not straightforward. Instead, we resort to two-dimensional spectrograms as an intermediate representation, which contain both pitch and resonance, from which to develop an end-to-end deep learning framework to translate from a sequence of tagged-MRI to its corresponding audio waveform with a limited dataset size. Our framework is based on a novel fully convolutional asymmetry translator with guidance of a self residual attention strategy to specifically exploit the moving muscular structures during speech. In addition, we leverage a pairwise correlation of the samples with the same utterances with a latent space representation disentanglement strategy. Furthermore, we incorporate an adversarial training approach with generative adversarial networks to offer improved realism of our generated spectrograms. Our experimental results, carried out with a total of 63 tagged-MRI sequences alongside speech acoustics, showed that our framework enabled the generation of clear audio waveforms from a sequence of tagged-MRI, surpassing competing methods. Thus, our framework has great potential to help better understand the relationship between the two modalities.
Submitted 25 September, 2022; v1 submitted 5 June, 2022;
originally announced June 2022.
-
Disentangling A Single MR Modality
Authors:
Lianrui Zuo,
Yihao Liu,
Yuan Xue,
Shuo Han,
Murat Bilgel,
Susan M. Resnick,
Jerry L. Prince,
Aaron Carass
Abstract:
Disentangling anatomical and contrast information from medical images has gained attention recently, demonstrating benefits for various image analysis tasks. Current methods learn disentangled representations using either paired multi-modal images with the same underlying anatomy or auxiliary labels (e.g., manual delineations) to provide inductive bias for disentanglement. However, these requirements could significantly increase the time and cost of data collection and limit the applicability of these methods when such data are not available. Moreover, these methods generally do not guarantee disentanglement. In this paper, we present a novel framework that learns theoretically and practically superior disentanglement from single-modality magnetic resonance images. We also propose a new information-based metric to quantitatively evaluate disentanglement. Comparisons with existing disentangling methods demonstrate that the proposed method achieves superior performance in both disentanglement and cross-domain image-to-image translation tasks.
Submitted 10 May, 2022;
originally announced May 2022.
-
Coordinate Translator for Learning Deformable Medical Image Registration
Authors:
Yihao Liu,
Lianrui Zuo,
Shuo Han,
Yuan Xue,
Jerry L. Prince,
Aaron Carass
Abstract:
The majority of deep learning (DL)-based deformable image registration methods use convolutional neural networks (CNNs) to estimate displacement fields from pairs of moving and fixed images. This, however, requires the convolutional kernels in the CNN to not only extract intensity features from the inputs but also understand image coordinate systems. We argue that the latter task is challenging for traditional CNNs, limiting their performance in registration tasks. To tackle this problem, we first introduce Coordinate Translator, a differentiable module that identifies matched features between the fixed and moving images and outputs their coordinate correspondences without the need for training. It relieves CNNs of the burden of understanding image coordinate systems, allowing them to focus on feature extraction. We then propose a novel deformable registration network, im2grid, that uses multiple Coordinate Translators with the hierarchical features extracted from a CNN encoder and outputs a deformation field in a coarse-to-fine fashion. We compared im2grid with state-of-the-art DL and non-DL methods for unsupervised 3D magnetic resonance image registration. Our experiments show that im2grid outperforms these methods both qualitatively and quantitatively.
Submitted 31 July, 2022; v1 submitted 5 March, 2022;
originally announced March 2022.
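The abstract describes Coordinate Translator only at a high level: a training-free, differentiable module that turns feature matches into coordinates. One generic way to do that is a soft argmax. For every fixed-image feature, take a softmax over its similarities to all moving-image features and return the similarity-weighted mean of the moving coordinates. The sketch below illustrates that idea under this assumption. It should not be read as the im2grid module itself; the dot-product similarity, the `temperature` parameter, and the function name are all hypothetical.

```python
import numpy as np

def soft_correspondence(fixed_feat, moving_feat, moving_coords, temperature=1.0):
    """Soft-argmax matching with no learned parameters.

    fixed_feat    : (N, C) features at N fixed-image locations
    moving_feat   : (M, C) features at M moving-image locations
    moving_coords : (M, D) coordinates of those moving-image locations
    Returns (N, D) matched coordinates in the moving image.
    """
    scores = fixed_feat @ moving_feat.T / temperature  # (N, M) similarities
    scores -= scores.max(axis=1, keepdims=True)         # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)                   # softmax over moving locations
    return w @ moving_coords                            # expected matched coordinate
```

Because every step is a matrix product or a softmax, gradients flow back to whatever network produced the features. The output minus the fixed-image coordinates would then give a displacement, which is how a coordinate output relates to the deformation field mentioned above.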
-
Structure-aware Unsupervised Tagged-to-Cine MRI Synthesis with Self Disentanglement
Authors:
Xiaofeng Liu,
Fangxu Xing,
Jerry L. Prince,
Maureen Stone,
Georges El Fakhri,
Jonghye Woo
Abstract:
Cycle reconstruction regularized adversarial training -- e.g., CycleGAN, DiscoGAN, and DualGAN -- has been widely used for image style transfer with unpaired training data. Several recent works, however, have shown that local distortions are frequent, and structural consistency cannot be guaranteed. Targeting this issue, prior works usually relied on additional segmentation or consistent feature extraction steps that are task-specific. To counter this, this work aims to learn a general add-on structural feature extractor, by explicitly enforcing the structural alignment between an input and its synthesized image. Specifically, we propose a novel input-output image patches self-training scheme to achieve a disentanglement of underlying anatomical structures and imaging modalities. The translator and structure encoder are updated, following an alternating training protocol. In addition, the information w.r.t. imaging modality can be eliminated with an asymmetric adversarial game. We train, validate, and test our network on 1,768, 416, and 1,560 unpaired subject-independent slices of tagged and cine magnetic resonance imaging from a total of twenty healthy subjects, respectively, demonstrating superior performance over competing methods.
Submitted 24 February, 2022;
originally announced February 2022.
-
Generative Self-training for Cross-domain Unsupervised Tagged-to-Cine MRI Synthesis
Authors:
Xiaofeng Liu,
Fangxu Xing,
Maureen Stone,
Jiachen Zhuo,
Reese Timothy,
Jerry L. Prince,
Georges El Fakhri,
Jonghye Woo
Abstract:
Self-training-based unsupervised domain adaptation (UDA) has shown great potential to address the problem of domain shift when a deep learning model trained in a source domain is applied to unlabeled target domains. However, while self-training UDA has demonstrated its effectiveness on discriminative tasks, such as classification and segmentation, via reliable pseudo-label selection based on the softmax discrete histogram, self-training UDA for generative tasks, such as image synthesis, has not been fully investigated. In this work, we propose a novel generative self-training (GST) UDA framework with continuous value prediction and a regression objective for cross-domain image synthesis. Specifically, we propose to filter the pseudo-labels with an uncertainty mask and to quantify the predictive confidence of generated images with practical variational Bayes learning. Fast test-time adaptation is achieved by a round-based alternating optimization scheme. We validated our framework on the tagged-to-cine magnetic resonance imaging (MRI) synthesis problem, where datasets in the source and target domains were acquired from different scanners or centers. Extensive validations were carried out to verify our framework against popular adversarial training UDA methods. Results show that our GST, with tagged MRI of test subjects in new target domains, improved the synthesis quality by a large margin compared with the adversarial training UDA methods.
Submitted 23 June, 2021;
originally announced June 2021.
-
MR Slice Profile Estimation by Learning to Match Internal Patch Distributions
Authors:
Shuo Han,
Samuel Remedios,
Aaron Carass,
Michael Schär,
Jerry L. Prince
Abstract:
To super-resolve the through-plane direction of a multi-slice 2D magnetic resonance (MR) image, its slice selection profile can be used as the degradation model from high resolution (HR) to low resolution (LR) to create paired data when training a supervised algorithm. Existing super-resolution algorithms make assumptions about the slice selection profile, since it is not readily known for a given image. In this work, we estimate the slice selection profile of a specific image by learning to match its internal patch distributions. Specifically, we assume that after applying the correct slice selection profile, the image patch distribution along the HR in-plane directions should match the distribution along the LR through-plane direction. Therefore, we incorporate the estimation of a slice selection profile as part of learning a generator in a generative adversarial network (GAN). In this way, the slice selection profile can be learned without any external data. Our algorithm was tested using simulations from isotropic MR images, incorporated into a through-plane super-resolution algorithm to demonstrate its benefits, and also used as a tool to measure image resolution. Our code is at https://github.com/shuohan/espreso2.
Submitted 31 March, 2021;
originally announced April 2021.
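The HR-to-LR degradation model mentioned in the slice profile abstract can be sketched along a single line of voxels: convolve the HR signal with the slice selection profile, then keep every `factor`-th sample to mimic the coarser slice spacing. This is a minimal sketch under stated assumptions: the profile is given as an odd-length kernel sampled on the HR grid, it is normalized to sum to one, boundaries are handled by edge replication, and the downsampling factor is an integer. The function name and these choices are hypothetical; the authors' actual code is in the repository linked above.

```python
def degrade_through_plane(hr_line, profile, factor):
    """Simulate an LR line from an HR line with a slice selection profile.

    hr_line: intensities sampled on the HR grid along one axis.
    profile: slice profile sampled on the same grid (odd length, hypothetical).
    factor:  integer ratio of LR spacing to HR spacing (hypothetical).
    """
    if len(profile) % 2 == 0:
        raise ValueError("profile length must be odd so it has a center tap")
    total = sum(profile)
    # Normalize so that uniform regions keep their intensity after blurring.
    weights = [w / total for w in profile]
    half = len(weights) // 2
    n = len(hr_line)
    blurred = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(weights):
            j = i - k + half            # discrete convolution index
            j = min(max(j, 0), n - 1)   # replicate edge values at the borders
            acc += w * hr_line[j]
        blurred.append(acc)
    # Keep one sample per LR slice.
    return blurred[::factor]


print(degrade_through_plane([0, 0, 0, 4, 4, 4, 4, 0, 0], [1, 2, 1], 2))
# [0.0, 1.0, 4.0, 3.0, 0.0]
```

Pairs of (HR line, degraded line) produced this way are the kind of training data a supervised through-plane super-resolution network could use; the abstract's contribution is estimating `profile` itself from the image rather than assuming it.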
-
Information-based Disentangled Representation Learning for Unsupervised MR Harmonization
Authors:
Lianrui Zuo,
Blake E. Dewey,
Aaron Carass,
Yihao Liu,
Yufan He,
Peter A. Calabresi,
Jerry L. Prince
Abstract:
Accuracy and consistency are two key factors in computer-assisted magnetic resonance (MR) image analysis. However, contrast variation from site to site, caused by a lack of standardization in MR acquisition, impedes consistent measurements. In recent years, image harmonization approaches have been proposed to compensate for contrast variation in MR images. Current harmonization approaches either require cross-site traveling subjects for supervised training or rely heavily on site-specific harmonization models to achieve harmonization accuracy. These requirements potentially limit the application of current harmonization methods in large-scale multi-site studies. In this work, we propose an unsupervised MR harmonization framework, CALAMITI (Contrast Anatomy Learning and Analysis for MR Intensity Translation and Integration), based on information bottleneck theory. CALAMITI learns a disentangled latent space using a unified structure for multi-site harmonization without the need for traveling subjects. Our model is also able to adapt itself to harmonize MR images from a new site with fine-tuning solely on images from the new site. Both qualitative and quantitative results show that the proposed method achieves superior performance compared with other unsupervised harmonization approaches.
Submitted 24 March, 2021;
originally announced March 2021.