Search | arXiv e-print repository

Text2CT: Towards 3D CT Volume Generation from Free-text Descriptions Using Diffusion Model

Authors: Pengfei Guo, Can Zhao, Dong Yang, Yufan He, Vishwesh Nath, Ziyue Xu, Pedro R. A. S. Bassi, Zongwei Zhou, Benjamin D. Simon, Stephanie Anne Harmon, Baris Turkbey, Daguang Xu

Abstract: Generating 3D CT volumes from descriptive free-text inputs presents a transformative opportunity in diagnostics and research. In this paper, we introduce Text2CT, a novel approach for synthesizing 3D CT volumes from textual descriptions using the diffusion model. Unlike previous methods that rely on fixed-format text input, Text2CT employs a novel prompt formulation that enables generation from diverse, free-text descriptions. The proposed framework encodes medical text into latent representations and decodes them into high-resolution 3D CT scans, effectively bridging the gap between semantic text inputs and detailed volumetric representations in a unified 3D framework. Our method demonstrates superior performance in preserving anatomical fidelity and capturing intricate structures as described in the input text. Extensive evaluations show that our approach achieves state-of-the-art results, offering promising potential applications in diagnostics and data augmentation.

Submitted 7 May, 2025; originally announced May 2025.

arXiv:2504.16578 [pdf, other]

Spontaneous symmetry breaking induced by curvature : Analysis via non-perturbative 2PI Hartree approximation

Authors: Vishal Nath, Kinsuk Roy, Sourav Bhattacharya

Abstract: In this work we investigate the spontaneous symmetry breaking (SSB) induced by a classical background spacetime's curvature, via the 2 particle irreducible (2PI) non-perturbative effective action formalism. We use the standard Schwinger-DeWitt local expansion of the Feynman propagator, appropriate to probe the effect of spacetime curvature on the local or short scale physics. Recently it was shown using perturbative computations that such SSB is possible with a scalar with a quartic self interaction, positive rest mass squared and positive non-minimal coupling. Here we confirm in the two loop Hartree approximation that curvature can indeed induce SSB for such a theory. SSB for such a model is not possible in a flat spacetime. The 2PI technique does not only resum the self energy resulting in mass generation, but also resums, as we have discussed, curvature terms through such mass generation. We have explicitly discussed our results in the context of the de Sitter spacetime, although our calculations are valid for any non-singular curved spacetime. We show that, in contrast to the perturbative results, SSB is possible with a vanishing non-minimal coupling. These results are further extended to the case of an $O(N)$ symmetric scalar field theory. Restoration of the broken symmetry in the thermal case is also briefly discussed.

Submitted 23 April, 2025; originally announced April 2025.

Comments: v1, 31 pages, 27 figures

arXiv:2503.07480 [pdf, ps, other]

Trapping and Transport of Inertial Particles in a Taylor-Green Vortex: Effects of Added Mass and History Force

Authors: Prabhash Kumar, Anu V. S. Nath, Mahesh Panchagnula, Anubhab Roy

Abstract: We investigate the dynamics of small inertial particles in a two-dimensional, steady Taylor-Green vortex flow. A classic study by Taylor (2022) showed that heavy inertial point particles (having density parameter R = 1) are trapped by the flow separatrices when the particle Stokes number St, which measures the particle's inertia, is less than 1/4. Here, we consider finitely dense particles, incorporating the previously neglected effects of added mass and the Boussinesq-Basset history force. Using linear stability analysis near stagnation points, we determine the critical parametric conditions in the St-R plane that lead to particle trapping within vortex cells. We identify additional stagnation points perceived by inertial particles, beyond the traditional ones at vortex cell corners, when the added mass effect is included, and we analyze their stability. Numerical analysis of the full nonlinear system confirms the existence of distinct particle behaviours--trapped, diffusive, and ballistic--depending on initial conditions, consistent with Nath et al. (2024), with modifications due to added mass effect. We delineate the regions in the St-R plane where these behaviours dominate based on the prominent particle dynamics. However, when both the history force and added mass effect are included, all particles exhibit ballistic motion regardless of St and R.

Submitted 10 March, 2025; originally announced March 2025.

Comments: 21 pages, 10 figures

arXiv:2501.01290 [pdf, other]

ToolComp: A Multi-Tool Reasoning & Process Supervision Benchmark

Authors: Vaskar Nath, Pranav Raja, Claire Yoon, Sean Hendryx

Abstract: Despite recent advances in AI, the development of systems capable of executing complex, multi-step reasoning tasks involving multiple tools remains a significant challenge. Current benchmarks fall short in capturing the real-world complexity of tool-use reasoning, where verifying the correctness of not only the final answer but also the intermediate steps is important for evaluation, development, and identifying failures during inference time. To bridge this gap, we introduce ToolComp, a comprehensive benchmark designed to evaluate multi-step tool-use reasoning. ToolComp is developed through a collaboration between models and human annotators, featuring human-edited/verified prompts, final answers, and process supervision labels, allowing for the evaluation of both final outcomes and intermediate reasoning. Evaluation across six different model families demonstrates the challenging nature of our dataset, with the majority of models achieving less than 50% accuracy. Additionally, we generate synthetic training data to compare the performance of outcome-supervised reward models (ORMs) with process-supervised reward models (PRMs) to assess their ability to improve complex tool-use reasoning as evaluated by ToolComp. Our results show that PRMs generalize significantly better than ORMs, achieving a 19% and 11% improvement in rank@1 accuracy for ranking base and fine-tuned model trajectories, respectively. These findings highlight the critical role of process supervision in both the evaluation and training of AI models, paving the way for more robust and capable systems in complex, multi-step tool-use tasks.

Submitted 2 January, 2025; originally announced January 2025.

arXiv:2412.04468 [pdf, other]

NVILA: Efficient Frontier Visual Language Models

Authors: Zhijian Liu, Ligeng Zhu, Baifeng Shi, Zhuoyang Zhang, Yuming Lou, Shang Yang, Haocheng Xi, Shiyi Cao, Yuxian Gu, Dacheng Li, Xiuyu Li, Yunhao Fang, Yukang Chen, Cheng-Yu Hsieh, De-An Huang, An-Chieh Cheng, Vishwesh Nath, Jinyi Hu, Sifei Liu, Ranjay Krishna, Daguang Xu, Xiaolong Wang, Pavlo Molchanov, Jan Kautz, Hongxu Yin, et al. (2 additional authors not shown)

Abstract: Visual language models (VLMs) have made significant advances in accuracy in recent years. However, their efficiency has received much less attention. This paper introduces NVILA, a family of open VLMs designed to optimize both efficiency and accuracy. Building on top of VILA, we improve its model architecture by first scaling up the spatial and temporal resolutions, and then compressing visual tokens. This "scale-then-compress" approach enables NVILA to efficiently process high-resolution images and long videos. We also conduct a systematic investigation to enhance the efficiency of NVILA throughout its entire lifecycle, from training and fine-tuning to deployment. NVILA matches or surpasses the accuracy of many leading open and proprietary VLMs across a wide range of image and video benchmarks. At the same time, it reduces training costs by 4.5X, fine-tuning memory usage by 3.4X, pre-filling latency by 1.6-2.2X, and decoding latency by 1.2-2.8X. We will soon make our code and models available to facilitate reproducibility.

Submitted 5 March, 2025; v1 submitted 5 December, 2024; originally announced December 2024.

arXiv:2411.12915 [pdf, other]

VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge

Authors: Vishwesh Nath, Wenqi Li, Dong Yang, Andriy Myronenko, Mingxin Zheng, Yao Lu, Zhijian Liu, Hongxu Yin, Yucheng Tang, Pengfei Guo, Can Zhao, Ziyue Xu, Yufan He, Greg Heinrich, Yee Man Law, Benjamin Simon, Stephanie Harmon, Stephen Aylward, Marc Edgar, Michael Zephyr, Song Han, Pavlo Molchanov, Baris Turkbey, Holger Roth, Daguang Xu

Abstract: Generalist vision language models (VLMs) have made significant strides in computer vision, but they fall short in specialized fields like healthcare, where expert knowledge is essential. In traditional computer vision tasks, creative or approximate answers may be acceptable, but in healthcare, precision is paramount. Current large multimodal models like Gemini and GPT-4o are insufficient for medical tasks due to their reliance on memorized internet knowledge rather than the nuanced expertise required in healthcare. VLMs are usually trained in three stages: vision pre-training, vision-language pre-training, and instruction fine-tuning (IFT). IFT has been typically applied using a mixture of generic and healthcare data. In contrast, we propose that for medical VLMs, a fourth stage of specialized IFT is necessary, which focuses on medical data and includes information from domain expert models. Domain expert models developed for medical use are crucial because they are specifically trained for certain clinical tasks, e.g. to detect tumors and classify abnormalities through segmentation and classification, which learn fine-grained features of medical data$-$features that are often too intricate for a VLM to capture effectively especially in radiology. This paper introduces a new framework, VILA-M3, for medical VLMs that utilizes domain knowledge via expert models. Through our experiments, we show an improved state-of-the-art (SOTA) performance with an average improvement of ~9% over the prior SOTA model Med-Gemini and ~6% over models trained on the specific tasks. Our approach emphasizes the importance of domain expertise in creating precise, reliable VLMs for medical applications.

Submitted 4 March, 2025; v1 submitted 19 November, 2024; originally announced November 2024.

arXiv:2411.09618 [pdf, other]

doi 10.59275/j.melba.2024-9c68

MICCAI-CDMRI 2023 QuantConn Challenge Findings on Achieving Robust Quantitative Connectivity through Harmonized Preprocessing of Diffusion MRI

Authors: Nancy R. Newlin, Kurt Schilling, Serge Koudoro, Bramsh Qamar Chandio, Praitayini Kanakaraj, Daniel Moyer, Claire E. Kelly, Sila Genc, Jian Chen, Joseph Yuan-Mou Yang, Ye Wu, Yifei He, Jiawei Zhang, Qingrun Zeng, Fan Zhang, Nagesh Adluru, Vishwesh Nath, Sudhir Pathak, Walter Schneider, Anurag Gade, Yogesh Rathi, Tom Hendriks, Anna Vilanova, Maxime Chamberland, Tomasz Pieciak, et al. (11 additional authors not shown)

Abstract: White matter alterations are increasingly implicated in neurological diseases and their progression. International-scale studies use diffusion-weighted magnetic resonance imaging (DW-MRI) to qualitatively identify changes in white matter microstructure and connectivity. Yet, quantitative analysis of DW-MRI data is hindered by inconsistencies stemming from varying acquisition protocols. There is a pressing need to harmonize the preprocessing of DW-MRI datasets to ensure the derivation of robust quantitative diffusion metrics across acquisitions. In the MICCAI-CDMRI 2023 QuantConn challenge, participants were provided raw data from the same individuals collected on the same scanner but with two different acquisitions and tasked with preprocessing the DW-MRI to minimize acquisition differences while retaining biological variation. Submissions are evaluated on the reproducibility and comparability of cross-acquisition bundle-wise microstructure measures, bundle shape features, and connectomics. The key innovations of the QuantConn challenge are that (1) we assess bundles and tractography in the context of harmonization for the first time, (2) we assess connectomics in the context of harmonization for the first time, and (3) we have 10x additional subjects over prior harmonization challenge, MUSHAC and 100x over SuperMUDI. We find that bundle surface area, fractional anisotropy, connectome assortativity, betweenness centrality, edge count, modularity, nodal strength, and participation coefficient measures are most biased by acquisition and that machine learning voxel-wise correction, RISH mapping, and NeSH methods effectively reduce these biases. In addition, microstructure measures AD, MD, RD, bundle length, connectome density, efficiency, and path length are least biased by these acquisition differences.

Submitted 14 November, 2024; originally announced November 2024.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024/019

Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)

arXiv:2410.06563 [pdf, other]

Self interacting scalar field theory in general curved spacetimes at zero and finite temperature revisited

Authors: Vishal Nath, Sourav Bhattacharya

Abstract: We revisit the problem of spontaneous symmetry breaking (SSB), its restoration, and phase transition for a self interacting quantum scalar field in a general curved background, at zero and finite temperature. To the best of our knowledge, most of the earlier computations in this context have been done in the linear order in curvature, which may not be very suitable for the Ricci flat spacetimes. One of our objectives is to see whether the higher order terms can bring in qualitatively new physical effects, and thereby attempting to fill in this gap in the literature. We use Bunch and Parker's local momentum space representation of the Schwinger-DeWitt expansion of the Feynman propagator. Such expansion, being based upon the local Lorentz symmetry of spacetime, essentially probes the leading curvature correction to short scale, ultraviolet quantum processes. We compute the renormalised, background spacetime curvature (up to quadratic order) and temperature dependent one loop effective potential for $φ^4$ plus $φ^3$ self interaction. In particular for the de Sitter spacetime, we have shown for the $φ^4$-theory that we can have SSB even with a positive rest mass squared and positive non-minimal coupling, at zero temperature. This cannot be achieved by the linear curvature term alone and the result remains valid for a very large range of renormalisation scale. Such SSB will generate a field mass that depends upon the spacetime curvature as well as the non-minimal coupling. For a phase transition, we have computed the leading curvature correction to the critical temperature. At finite temperature, symmetry restoration is demonstrated. We also extend some of the above results to two loop level. The symmetry breaking in de Sitter at two loop remains present. We have further motivated the necessity of treating this problem non-perturbatively in some instances.

Submitted 20 March, 2025; v1 submitted 9 October, 2024; originally announced October 2024.

Comments: v2; 37pp, 16 figs; added references, discussions and many clarifications; improved presentation; accepted in PRD

arXiv:2410.03717 [pdf, other]

Revisiting the Superficial Alignment Hypothesis

Authors: Mohit Raghavendra, Vaskar Nath, Sean Hendryx

Abstract: The Superficial Alignment Hypothesis posits that almost all of a language model's abilities and knowledge are learned during pre-training, while post-training is about giving a model the right style and format. We re-examine these claims by empirically studying the scaling behavior of post-training with increasing finetuning examples and evaluating them using objective task-specific standardized benchmarks. Through experiments with the Llama-3, Mistral, and Llama-2 model families of multiple sizes, we observe that, similar to the pre-training scaling laws, post-training task performance scales as a power law against the number of finetuning examples. This power law relationship holds across a broad array of capabilities, including mathematical reasoning, coding, instruction following, and multihop-reasoning. In addition, for tasks like math and multihop reasoning, we observe that a handful of examples merely align the model stylistically but do not saturate performance on the benchmarks. Model performance is instead correlated with its reasoning ability and it improves significantly with more examples, illustrating the need for holistic evaluation programs leveraging objective benchmarks in addition to measurement of alignment to human preferences. We also observe that language models are not necessarily limited to using knowledge learned during pre-training. With appropriate post-training, a model's ability to integrate new knowledge greatly improves on downstream tasks like multihop question-answering. Taken together, these results shed new light on the Superficial Alignment Hypothesis, suggesting that it is, at best, an over-simplification.

Submitted 27 September, 2024; originally announced October 2024.
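
The abstract of the entry above reports that post-training task performance scales as a power law in the number of finetuning examples. As a rough illustration only, the sketch below fits a relation of the assumed form y = a * n^b by ordinary least squares in log-log space. The functional form, the function name `fit_power_law`, and the data points are assumptions made for this example; the abstract does not state the exact parameterization or fitting procedure the authors used.

```python
import math


def fit_power_law(ns, ys):
    """Fit y = a * n**b by least squares on (log n, log y).

    Hypothetical helper for illustration; all inputs must be positive.
    Returns the pair (a, b).
    """
    if len(ns) != len(ys) or len(ns) < 2:
        raise ValueError("need at least two (n, y) pairs of equal length")
    if any(n <= 0 for n in ns) or any(y <= 0 for y in ys):
        raise ValueError("power-law fit in log space needs positive values")

    log_n = [math.log(n) for n in ns]
    log_y = [math.log(y) for y in ys]
    mean_n = sum(log_n) / len(log_n)
    mean_y = sum(log_y) / len(log_y)

    # Slope of the regression line in log-log space is the exponent b.
    sxx = sum((u - mean_n) ** 2 for u in log_n)
    if sxx == 0:
        raise ValueError("need at least two distinct n values")
    sxy = sum((u - mean_n) * (v - mean_y) for u, v in zip(log_n, log_y))
    b = sxy / sxx

    # Intercept in log space gives log(a).
    a = math.exp(mean_y - b * mean_n)
    return a, b


if __name__ == "__main__":
    # Synthetic points generated from y = 2 * n**0.5, not data from the paper.
    examples = [10, 100, 1000]
    scores = [2 * n ** 0.5 for n in examples]
    a, b = fit_power_law(examples, scores)
    print(f"a = {a:.3f}, b = {b:.3f}")
```

A straight line on a log-log plot of benchmark score against example count is the usual visual check for this kind of relation; the fitted exponent b is the slope of that line.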

arXiv:2409.11169 [pdf, other]

MAISI: Medical AI for Synthetic Imaging

Authors: Pengfei Guo, Can Zhao, Dong Yang, Ziyue Xu, Vishwesh Nath, Yucheng Tang, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu

Abstract: Medical imaging analysis faces challenges such as data scarcity, high annotation costs, and privacy concerns. This paper introduces the Medical AI for Synthetic Imaging (MAISI), an innovative approach using the diffusion model to generate synthetic 3D computed tomography (CT) images to address those challenges. MAISI leverages the foundation volume compression network and the latent diffusion model to produce high-resolution CT images (up to a landmark volume dimension of 512 x 512 x 768) with flexible volume dimensions and voxel spacing. By incorporating ControlNet, MAISI can process organ segmentation, including 127 anatomical structures, as additional conditions and enables the generation of accurately annotated synthetic images that can be used for various downstream tasks. Our experiment results show that MAISI's capabilities in generating realistic, anatomically accurate images for diverse regions and conditions reveal its promising potential to mitigate challenges using synthetic data.

Submitted 29 October, 2024; v1 submitted 13 September, 2024; originally announced September 2024.

Comments: WACV25 accepted. https://monai.io/research/maisi

arXiv:2409.03733 [pdf, other]

Planning In Natural Language Improves LLM Search For Code Generation

Authors: Evan Wang, Federico Cassano, Catherine Wu, Yunfeng Bai, Will Song, Vaskar Nath, Ziwen Han, Sean Hendryx, Summer Yue, Hugh Zhang

Abstract: While scaling training compute has led to remarkable improvements in large language models (LLMs), scaling inference compute has not yet yielded analogous gains. We hypothesize that a core missing component is a lack of diverse LLM outputs, leading to inefficient search due to models repeatedly sampling highly similar, yet incorrect generations. We empirically demonstrate that this lack of diversity can be mitigated by searching over candidate plans for solving a problem in natural language. Based on this insight, we propose PlanSearch, a novel search algorithm which shows strong results across HumanEval+, MBPP+, and LiveCodeBench (a contamination-free benchmark for competitive coding). PlanSearch generates a diverse set of observations about the problem and then uses these observations to construct plans for solving the problem. By searching over plans in natural language rather than directly over code solutions, PlanSearch explores a significantly more diverse range of potential solutions compared to baseline search methods. Using PlanSearch on top of Claude 3.5 Sonnet achieves a state-of-the-art pass@200 of 77.0% on LiveCodeBench, outperforming both the best score achieved without search (pass@1 = 41.4%) and using standard repeated sampling (pass@200 = 60.6%). Finally, we show that, across all models, search algorithms, and benchmarks analyzed, we can accurately predict performance gains due to search as a direct function of the diversity over generated ideas. Code can be found at https://github.com/scaleapi/plansearch.

Submitted 18 October, 2024; v1 submitted 5 September, 2024; originally announced September 2024.
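
The PlanSearch abstract above quotes pass@1 and pass@200 scores. For readers unfamiliar with the metric, the sketch below computes the widely used unbiased pass@k estimate for a single problem: given n sampled solutions of which c are correct, it returns the probability that at least one of k samples drawn without replacement is correct, 1 - C(n-c, k) / C(n, k). Whether the authors used exactly this estimator is an assumption; the abstract does not say, and the function name `pass_at_k` is chosen here for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples with c correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the chance that a
    size-k subset of the n samples contains no correct solution.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Illustrative counts only, not results from the paper.
    print(pass_at_k(n=4, c=1, k=2))
```

A benchmark-level score is then typically the mean of this per-problem value over all problems.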

arXiv:2408.11210 [pdf, other]

A Short Review and Evaluation of SAM2's Performance in 3D CT Image Segmentation

Authors: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Daguang Xu, Wenqi Li

Abstract: Since the release of Segment Anything 2 (SAM2), the medical imaging community has been actively evaluating its performance for 3D medical image segmentation. However, different studies have employed varying evaluation pipelines, resulting in conflicting outcomes that obscure a clear understanding of SAM2's capabilities and potential applications. We briefly review existing benchmarks and point out that the SAM2 paper clearly outlines a zero-shot evaluation pipeline, which simulates user clicks iteratively for up to eight iterations. We reproduced this interactive annotation simulation on 3D CT datasets and provided the results and code at https://github.com/Project-MONAI/VISTA. Our findings reveal that directly applying SAM2 on 3D medical imaging in a zero-shot manner is far from satisfactory. It is prone to generating false positives when foreground objects disappear, and annotating more slices cannot fully offset this tendency. For smaller single-connected objects like kidney and aorta, SAM2 performs reasonably well but for most organs it is still far behind state-of-the-art 3D annotation methods. More research and innovation are needed for the 3D medical imaging community to use SAM2 correctly.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2407.13887 [pdf, other]

Learning Goal-Conditioned Representations for Language Reward Models

Authors: Vaskar Nath, Dylan Slack, Jeff Da, Yuntao Ma, Hugh Zhang, Spencer Whitehead, Sean Hendryx

Abstract: Techniques that learn improved representations via offline data or self-supervised objectives have shown impressive results in traditional reinforcement learning (RL). Nevertheless, it is unclear how improved representation learning can benefit reinforcement learning from human feedback (RLHF) on language models (LMs). In this work, we propose training reward models (RMs) in a contrastive, $\textit{goal-conditioned}$ fashion by increasing the representation similarity of future states along sampled preferred trajectories and decreasing the similarity along randomly sampled dispreferred trajectories. This objective significantly improves RM performance by up to 0.09 AUROC across challenging benchmarks, such as MATH and GSM8k. These findings extend to general alignment as well -- on the Helpful-Harmless dataset, we observe $2.3\%$ increase in accuracy. Beyond improving reward model performance, we show this way of training RM representations enables improved $\textit{steerability}$ because it allows us to evaluate the likelihood of an action achieving a particular goal-state (e.g., whether a solution is correct or helpful). Leveraging this insight, we find that we can filter up to $55\%$ of generated tokens during majority voting by discarding trajectories likely to end up in an "incorrect" state, which leads to significant cost savings. We additionally find that these representations can perform fine-grained control by conditioning on desired future goal-states. For example, we show that steering a Llama 3 model towards helpful generations with our approach improves helpfulness by $9.6\%$ over a supervised-fine-tuning trained baseline. Similarly, steering the model towards complex generations improves complexity by $21.6\%$ over the baseline. Overall, we find that training RMs in this contrastive, goal-conditioned fashion significantly improves performance and enables model steerability.

Submitted 23 October, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.03307 [pdf, other]

HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization

Authors: Yucheng Tang, Yufan He, Vishwesh Nath, Pengfei Guo, Ruining Deng, Tianyuan Yao, Quan Liu, Can Cui, Mengmeng Yin, Ziyue Xu, Holger Roth, Daguang Xu, Haichun Yang, Yuankai Huo

Abstract: In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this paper, we propose the holistic histopathology (HoloHisto) segmentation method to achieve end-to-end segmentation on gigapixel WSIs, whose maximum resolution is above 80,000$\times$70,000 pixels. HoloHisto fundamentally shifts the paradigm of WSI segmentation to an end-to-end learning fashion with 1) a large (4K) resolution base patch for elevated visual information inclusion and efficient processing, and 2) a novel sequential tokenization mechanism to properly model the contextual relationships and efficiently model the rich information from the 4K input. To our best knowledge, HoloHisto presents the first holistic approach for gigapixel resolution WSI segmentation, supporting direct I/O of complete WSI and their corresponding gigapixel masks. Under the HoloHisto platform, we unveil a random 4K sampler that transcends ultra-high resolution, delivering 31 and 10 times more pixels than standard 2D and 3D patches, respectively, for advancing computational capabilities. To facilitate efficient 4K resolution dense prediction, we leverage sequential tokenization, utilizing a pre-trained image tokenizer to group image features into a discrete token grid. To assess the performance, our team curated a new kidney pathology image segmentation (KPIs) dataset with WSI-level glomeruli segmentation from whole mouse kidneys. From the results, HoloHisto-4K delivers remarkable performance gains over previous state-of-the-art models.

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.02604 [pdf, other]

D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions

Authors: Hareem Nisar, Syed Muhammad Anwar, Zhifan Jiang, Abhijeet Parida, Ramon Sanchez-Jacob, Vishwesh Nath, Holger R. Roth, Marius George Linguraru

Abstract: Large vision language models (VLMs) have progressed incredibly from research to applicability for general-purpose use cases. LLaVA-Med, a pioneering large language and vision assistant for biomedicine, can perform multi-modal biomedical image and data analysis to provide a natural language interface for radiologists. While it is highly generalizable and works with multi-modal data, it is currently limited by well-known challenges that exist in the large language model space. Hallucinations and imprecision in responses can lead to misdiagnosis, which currently hinders the clinical adaptability of VLMs. To create precise, user-friendly models in healthcare, we propose D-Rax -- a domain-specific, conversational, radiologic assistance tool that can be used to gain insights about a particular radiologic image. In this study, we enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting, offering comprehensive insights from medical imaging and aiding in the formulation of accurate diagnosis. D-Rax is achieved by fine-tuning the LLaVA-Med architecture on our curated enhanced instruction-following data, comprising images, instructions, as well as disease diagnosis and demographic predictions derived from MIMIC-CXR imaging data, CXR-related visual question answer (VQA) pairs, and predictive outcomes from multiple expert AI models. We observe statistically significant improvement in responses when evaluated for both open and close-ended conversations. Leveraging the power of state-of-the-art diagnostic models combined with VLMs, D-Rax empowers clinicians to interact with medical images using natural language, which could potentially streamline their decision-making process, enhance diagnostic accuracy, and conserve their time.

Submitted 2 August, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: accepted to the MICCAI 2024 Second International Workshop on Foundation Models for General Medical AI

arXiv:2406.05285 [pdf, other]

VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging

Authors: Yufan He, Pengfei Guo, Yucheng Tang, Andriy Myronenko, Vishwesh Nath, Ziyue Xu, Dong Yang, Can Zhao, Benjamin Simon, Mason Belue, Stephanie Harmon, Baris Turkbey, Daguang Xu, Wenqi Li

Abstract: Foundation models for interactive segmentation in 2D natural images and videos have sparked significant interest in building 3D foundation models for medical imaging. However, the domain gaps and clinical use cases for 3D medical imaging require a dedicated model that diverges from existing 2D solutions. Specifically, such foundation models should support a full workflow that can actually reduce human effort. Treating 3D medical images as sequences of 2D slices and reusing interactive 2D foundation models seems straightforward, but 2D annotation is too time-consuming for 3D tasks. Moreover, for large cohort analysis, it's the highly accurate automatic segmentation models that reduce the most human effort. However, these models lack support for interactive corrections and lack zero-shot ability for novel structures, which is a key feature of "foundation". While reusing pre-trained 2D backbones in 3D enhances zero-shot potential, their performance on complex 3D structures still lags behind leading 3D models. To address these issues, we present VISTA3D, the Versatile Imaging SegmenTation and Annotation model, which aims to address all these challenges and requirements with one unified foundation model. VISTA3D is built on top of the well-established 3D segmentation pipeline, and it is the first model to achieve state-of-the-art performance in both 3D automatic (supporting 127 classes) and 3D interactive segmentation, even when compared with top 3D expert models on large and diverse benchmarks. Additionally, VISTA3D's 3D interactive design allows efficient human correction, and a novel 3D supervoxel method that distills 2D pretrained backbones grants VISTA3D top 3D zero-shot performance. We believe the model, recipe, and insights represent a promising step towards a clinically useful 3D foundation model. Code and weights are publicly available at https://github.com/Project-MONAI/VISTA.

Submitted 21 November, 2024; v1 submitted 7 June, 2024; originally announced June 2024.

arXiv:2405.17824 [pdf, other]

mTREE: Multi-Level Text-Guided Representation End-to-End Learning for Whole Slide Image Analysis

Authors: Quan Liu, Ruining Deng, Can Cui, Tianyuan Yao, Vishwesh Nath, Yucheng Tang, Yuankai Huo

Abstract: Multi-modal learning adeptly integrates visual and textual data, but its application to histopathology image and text analysis remains challenging, particularly with large, high-resolution images like gigapixel Whole Slide Images (WSIs). Current methods typically rely on manual region labeling or multi-stage learning to assemble local representations (e.g., patch-level) into global features (e.g., slide-level). However, there is no effective way to integrate multi-scale image representations with text data in a seamless end-to-end process. In this study, we introduce Multi-Level Text-Guided Representation End-to-End Learning (mTREE). This novel text-guided approach effectively captures multi-scale WSI representations by utilizing information from accompanying textual pathology information. mTREE combines the localization of key areas (global-to-local) and the development of a WSI-level image-text representation (local-to-global) into a unified, end-to-end learning framework. In this model, textual information serves a dual purpose: firstly, functioning as an attention map to accurately identify key areas, and secondly, acting as a conduit for integrating textual features into the comprehensive representation of the image. Our study demonstrates the effectiveness of mTREE through quantitative analyses in two image-related tasks: classification and survival prediction, showcasing its remarkable superiority over baselines.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.05539 [pdf, other]

Instability of a dusty shear flow

Authors: Anu V. S. Nath, Anubhab Roy, M. Houssem Kasbaoui

Abstract: We study the instability of a dusty simple shear flow where the dust particles are distributed non-uniformly. A simple shear flow is modally stable to infinitesimal perturbations. Also, a band of particles remains unaffected in the absence of any background flow. However, we demonstrate that the combined scenario -- comprising a simple shear flow with a localised band of particles -- can exhibit destabilisation due to their two-way interaction. The instability originates solely from the momentum feedback from the particle phase to the fluid phase. Eulerian-Lagrangian simulations are employed to illustrate the existence of this instability. Furthermore, the results are compared with a linear stability analysis of the system using an Eulerian-Eulerian model. Our findings indicate that the instability has an inviscid origin and is characterised by a critical wavelength below which it is not persistent. We have observed that increasing particle inertia dampens the unstable modes, whereas the strength of the instability increases with the strength of the coupling between the fluid and particle phases.

Submitted 9 May, 2024; originally announced May 2024.

Comments: 37 pages, 13 figures

arXiv:2403.10011 [pdf, other]

Clustering and chaotic motion of heavy inertial particles in an isolated non-axisymmetric vortex

Authors: Anu V. S. Nath, Anubhab Roy

Abstract: We investigate the dynamics of heavy inertial particles in a flow field due to an isolated, non-axisymmetric vortex. For our study, we consider a canonical elliptical vortex - the Kirchhoff vortex and its strained variant, the Kida vortex. Contrary to the anticipated centrifugal dispersion of inertial particles, which is typical in open vortical flows, we observe the clustering of particles around co-rotating attractors near the Kirchhoff vortex due to its non-axisymmetric nature. We analyze the inertia-modified stability characteristics of the fixed points, highlighting how some of the fixed points migrate in physical space, collide and then annihilate with increasing particle inertia. The introduction of external straining, the Kida vortex being an example, introduces chaotic tracer transport. Using a Melnikov analysis, we show that particle inertia and external straining can compete, where chaotic transport can be suppressed beyond a critical value of particle inertia.

Submitted 18 September, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

Comments: 45 pages, 24 figures

arXiv:2307.16896 [pdf, other]

Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training

Authors: Jeya Maria Jose Valanarasu, Yucheng Tang, Dong Yang, Ziyue Xu, Can Zhao, Wenqi Li, Vishal M. Patel, Bennett Landman, Daguang Xu, Yufan He, Vishwesh Nath

Abstract: Harnessing the power of pre-training on large-scale datasets like ImageNet forms a fundamental building block for the progress of representation learning-driven solutions in computer vision. Medical images are inherently different from natural images as they are acquired in the form of many modalities (CT, MR, PET, Ultrasound etc.) and contain granulated information like tissue, lesion, organs etc. These characteristics of medical images require special attention towards learning features representative of local context. In this work, we focus on designing an effective pre-training framework for 3D radiology images. First, we propose a new masking strategy called local masking where the masking is performed across channel embeddings instead of tokens to improve the learning of local feature representations. We combine this with classical low-level perturbations like adding noise and downsampling to further enable low-level representation learning. To this end, we introduce Disruptive Autoencoders, a pre-training framework that attempts to reconstruct the original image from disruptions created by a combination of local masking and low-level perturbations. Additionally, we also devise a cross-modal contrastive loss (CMCL) to accommodate the pre-training of multiple modalities in a single framework. We curate a large-scale dataset to enable pre-training of 3D medical radiology images (MRI and CT). The proposed pre-training framework is tested across multiple downstream tasks and achieves state-of-the-art performance. Notably, our proposed method tops the public test leaderboard of BTCV multi-organ segmentation challenge.

Submitted 31 July, 2023; originally announced July 2023.

Comments: Preprint

arXiv:2307.12004 [pdf, other]

COLosSAL: A Benchmark for Cold-start Active Learning for 3D Medical Image Segmentation

Authors: Han Liu, Hao Li, Xing Yao, Yubo Fan, Dewei Hu, Benoit Dawant, Vishwesh Nath, Zhoubing Xu, Ipek Oguz

Abstract: Medical image segmentation is a critical task in medical image analysis. In recent years, deep learning based approaches have shown exceptional performance when trained on a fully-annotated dataset. However, data annotation is often a significant bottleneck, especially for 3D medical images. Active learning (AL) is a promising solution for efficient annotation but requires an initial set of labeled samples to start active selection. When the entire data pool is unlabeled, how do we select the samples to annotate as our initial set? This is also known as the cold-start AL, which permits only one chance to request annotations from experts without access to previously annotated data. Cold-start AL is highly relevant in many practical scenarios but has been under-explored, especially for 3D medical segmentation tasks requiring substantial annotation effort. In this paper, we present a benchmark named COLosSAL by evaluating six cold-start AL strategies on five 3D medical image segmentation tasks from the public Medical Segmentation Decathlon collection. We perform a thorough performance analysis and explore important open questions for cold-start AL, such as the impact of budget on different strategies. Our results show that cold-start AL is still an unsolved problem for 3D segmentation tasks but some important trends have been observed. The code repository, data partitions, and baseline results for the complete benchmark are publicly available at https://github.com/MedICL-VU/COLosSAL.

Submitted 22 July, 2023; originally announced July 2023.

Comments: Accepted by MICCAI 2023

arXiv:2306.02900 [pdf, other]

Robust Fiber Orientation Distribution Function Estimation Using Deep Constrained Spherical Deconvolution for Diffusion MRI

Authors: Tianyuan Yao, Francois Rheault, Leon Y Cai, Vishwesh Nath, Zuhayr Asad, Nancy Newlin, Can Cui, Ruining Deng, Karthik Ramadass, Andrea Shafer, Susan Resnick, Kurt Schilling, Bennett A. Landman, Yuankai Huo

Abstract: Diffusion-weighted magnetic resonance imaging (DW-MRI) is a critical imaging method for capturing and modeling tissue microarchitecture at a millimeter scale. A common practice to model the measured DW-MRI signal is via fiber orientation distribution function (fODF). This function is the essential first step for the downstream tractography and connectivity analyses. With recent advances in data sharing, large-scale multi-site DW-MRI datasets are being made available for multi-site studies. However, measurement variabilities (e.g., inter- and intra-site variability, hardware performance, and sequence design) are inevitable during the acquisition of DW-MRI. Most existing model-based methods (e.g., constrained spherical deconvolution (CSD)) and learning based methods (e.g., deep learning (DL)) do not explicitly consider such variabilities in fODF modeling, which consequently leads to inferior performance on multi-site and/or longitudinal diffusion studies. In this paper, we propose a novel data-driven deep constrained spherical deconvolution method to explicitly constrain the scan-rescan variabilities for a more reproducible and robust estimation of brain microstructure from repeated DW-MRI scans. Specifically, the proposed method introduces a new 3D volumetric scanner-invariant regularization scheme during the fODF estimation. We study the Human Connectome Project (HCP) young adults test-retest group as well as the MASiVar dataset (with inter- and intra-site scan/rescan data). The Baltimore Longitudinal Study of Aging (BLSA) dataset is employed for external validation. From the experimental results, the proposed data-driven framework outperforms the existing benchmarks in repeated fODF estimation. The proposed method is also assessed on downstream connectivity analysis and shows increased performance in distinguishing subjects with different biomarkers.

Submitted 3 December, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

Comments: 33 pages, 7 figures

arXiv:2305.10655 [pdf, other]

doi 10.1007/978-3-031-17027-0_2

DeepEdit: Deep Editable Learning for Interactive Segmentation of 3D Medical Images

Authors: Andres Diaz-Pinto, Pritesh Mehta, Sachidanand Alle, Muhammad Asad, Richard Brown, Vishwesh Nath, Alvin Ihsani, Michela Antonelli, Daniel Palkovics, Csaba Pinter, Ron Alkalay, Steve Pieper, Holger R. Roth, Daguang Xu, Prerna Dogra, Tom Vercauteren, Andrew Feng, Abood Quraini, Sebastien Ourselin, M. Jorge Cardoso

Abstract: Automatic segmentation of medical images is a key step for diagnostic and interventional tasks. However, achieving this requires large amounts of annotated volumes, which can be a tedious and time-consuming task for expert annotators. In this paper, we introduce DeepEdit, a deep learning-based method for volumetric medical image annotation that allows automatic and semi-automatic segmentation, and click-based refinement. DeepEdit combines the power of two methods: a non-interactive (i.e. automatic segmentation using nnU-Net, UNET or UNETR) and an interactive segmentation method (i.e. DeepGrow), into a single deep learning model. It allows easy integration of uncertainty-based ranking strategies (i.e. aleatoric and epistemic uncertainty computation) and active learning. We propose and implement a method for training DeepEdit by using standard training combined with user interaction simulation. Once trained, DeepEdit allows clinicians to quickly segment their datasets by using the algorithm in auto segmentation mode or by providing clicks via a user interface (i.e. 3D Slicer, OHIF). We show the value of DeepEdit through evaluation on the PROSTATEx dataset for prostate/prostatic lesions and the Multi-Atlas Labeling Beyond the Cranial Vault (BTCV) dataset for abdominal CT segmentation, using state-of-the-art network architectures as baseline for comparison. DeepEdit could reduce the time and effort annotating 3D medical images compared to DeepGrow alone. Source code is available at https://github.com/Project-MONAI/MONAILabel

Submitted 17 May, 2023; originally announced May 2023.

arXiv:2304.09804 [pdf, ps, other]

Irregular dependence on Stokes number and non-ergodic transport of heavy inertial particles in steady laminar flows

Authors: Anu V. S. Nath, Anubhab Roy, S. Ravichandran, Rama Govindarajan

Abstract: Small heavy particles in a fluid flow respond to the flow on a time-scale proportional to their inertia, or Stokes number St. Their behaviour is thought to be gradually modified as St increases. We show, in the steady spatially-periodic laminar Taylor-Green flow, that particle dynamics, and their effective diffusivity, actually change in an irregular, non-monotonic and sometimes discontinuous manner, with increasing St. At Stokes of order one, we show chaotic particle motion, contrasting earlier conclusions for heavy particles in the same flow (Wang et al. 1992). Particles may display trapped orbits, or unbounded diffusive or ballistic dispersion, with the vortices behaving like scatterers in a soft Lorentz gas (Klages et al. 2019). The dynamics is non-ergodic. We discuss the possible consequences of our findings for particulate turbulent flows.

Submitted 19 April, 2023; originally announced April 2023.
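
Note: the abstract above does not state the equations of motion. As a rough illustration only, the sketch below assumes the commonly used linear Stokes-drag model for a heavy particle, dx/dt = v and dv/dt = (u(x) - v)/St, in the 2D Taylor-Green flow u = (sin x cos y, -cos x sin y), integrated with a classical fourth-order Runge-Kutta step. The function names, the non-dimensionalisation, and the step size are assumptions made for this example and are not taken from the paper.

```python
import math


def taylor_green(x, y):
    """Steady, spatially periodic, divergence-free Taylor-Green velocity field."""
    return math.sin(x) * math.cos(y), -math.cos(x) * math.sin(y)


def rhs(state, st):
    """Assumed model: Stokes drag relaxes particle velocity to fluid velocity over time St."""
    x, y, vx, vy = state
    ux, uy = taylor_green(x, y)
    return (vx, vy, (ux - vx) / st, (uy - vy) / st)


def rk4_step(state, st, dt):
    """Advance (x, y, vx, vy) by one classical RK4 step of size dt."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = rhs(state, st)
    k2 = rhs(shifted(state, k1, dt / 2), st)
    k3 = rhs(shifted(state, k2, dt / 2), st)
    k4 = rhs(shifted(state, k3, dt), st)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def integrate(state, st, dt, n_steps):
    """Integrate a single particle trajectory for n_steps steps."""
    for _ in range(n_steps):
        state = rk4_step(state, st, dt)
    return state


if __name__ == "__main__":
    # Hypothetical initial condition: particle released at rest, St of order one.
    print(integrate((1.0, 0.5, 0.0, 0.0), st=1.0, dt=0.01, n_steps=1000))
```

Sweeping `st` over a range of values and recording the long-time displacement of many particles is one way to probe the St dependence that the abstract describes; the sketch makes no claim about reproducing the reported results.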

arXiv:2303.16520 [pdf, other]

Fair Federated Medical Image Segmentation via Client Contribution Estimation

Authors: Meirui Jiang, Holger R Roth, Wenqi Li, Dong Yang, Can Zhao, Vishwesh Nath, Daguang Xu, Qi Dou, Ziyue Xu

Abstract: How to ensure fairness is an important topic in federated learning (FL). Recent studies have investigated how to reward clients based on their contribution (collaboration fairness), and how to achieve uniformity of performance across clients (performance fairness). Despite achieving progress on either one, we argue that it is critical to consider them together, in order to engage and motivate more diverse clients joining FL to derive a high-quality global model. In this work, we propose a novel method to optimize both types of fairness simultaneously. Specifically, we propose to estimate client contribution in gradient and data space. In gradient space, we monitor the gradient direction differences of each client with respect to others. And in data space, we measure the prediction error on client data using an auxiliary model. Based on this contribution estimation, we propose a FL method, federated training via contribution estimation (FedCE), i.e., using estimation as global model aggregation weights. We have theoretically analyzed our method and empirically evaluated it on two real-world medical datasets. The effectiveness of our approach has been validated with significant performance improvements, better collaboration fairness, better performance fairness, and comprehensive analytical studies.

Submitted 29 March, 2023; originally announced March 2023.

Comments: Accepted at CVPR 2023
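
Note: the abstract above says only that estimated client contributions are used as global model aggregation weights. The sketch below illustrates that single idea as a generic weighted average of client parameters. How FedCE actually computes the contributions (gradient-direction differences and auxiliary-model prediction error) is not reproduced here, so the scores are treated as given inputs, and the function names and the dictionary-of-lists parameter layout are assumptions made for this example.

```python
from typing import Dict, List


def normalize(scores: List[float]) -> List[float]:
    """Scale non-negative contribution scores so that they sum to one."""
    if any(s < 0 for s in scores):
        raise ValueError("contribution scores must be non-negative")
    total = sum(scores)
    if total <= 0:
        raise ValueError("at least one contribution score must be positive")
    return [s / total for s in scores]


def aggregate(
    client_params: List[Dict[str, List[float]]],
    scores: List[float],
) -> Dict[str, List[float]]:
    """Weighted average of client parameters, weighted by normalized scores."""
    if len(client_params) != len(scores):
        raise ValueError("need exactly one score per client")
    weights = normalize(scores)
    merged: Dict[str, List[float]] = {}
    for name in client_params[0]:
        size = len(client_params[0][name])
        merged[name] = [
            sum(w * params[name][i] for w, params in zip(weights, client_params))
            for i in range(size)
        ]
    return merged


if __name__ == "__main__":
    # Hypothetical two-client example with placeholder contribution scores.
    clients = [{"w": [0.0, 2.0]}, {"w": [4.0, 6.0]}]
    print(aggregate(clients, scores=[1.0, 3.0]))
```

With equal scores this reduces to a plain unweighted average of the client models, which makes the effect of the contribution estimates easy to isolate in an experiment.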

arXiv:2303.16376 [pdf, other]

A Unified Learning Model for Estimating Fiber Orientation Distribution Functions on Heterogeneous Multi-shell Diffusion-weighted MRI

Authors: Tianyuan Yao, Nancy Newlin, Praitayini Kanakaraj, Vishwesh Nath, Leon Y Cai, Karthik Ramadass, Kurt Schilling, Bennett A. Landman, Yuankai Huo

Abstract: Diffusion-weighted (DW) MRI measures the direction and scale of the local diffusion process in every voxel through its spectrum in q-space, typically acquired in one or more shells. Recent developments in micro-structure imaging and multi-tissue decomposition have sparked renewed attention to the radial b-value dependence of the signal. Applications in tissue classification and micro-architecture estimation, therefore, require a signal representation that extends over the radial as well as angular domain. Multiple approaches have been proposed that can model the non-linear relationship between the DW-MRI signal and biological microstructure. In the past few years, many deep learning-based methods have been developed towards faster inference speed and higher inter-scan consistency compared with traditional model-based methods (e.g., multi-shell multi-tissue constrained spherical deconvolution). However, a multi-stage learning strategy is typically required since the learning process relies on various middle representations, such as simple harmonic oscillator reconstruction (SHORE) representation. In this work, we present a unified dynamic network with a single-stage spherical convolutional neural network, which allows efficient fiber orientation distribution function (fODF) estimation through heterogeneous multi-shell diffusion MRI sequences. We study the Human Connectome Project (HCP) young adults with test-retest scans. From the experimental results, the proposed single-stage method outperforms prior multi-stage approaches in repeated fODF estimation with shell dropoff and single-shell DW-MRI sequences.

Submitted 29 January, 2024; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.16270 [pdf, other]

Communication-Efficient Vertical Federated Learning with Limited Overlapping Samples

Authors: Jingwei Sun, Ziyue Xu, Dong Yang, Vishwesh Nath, Wenqi Li, Can Zhao, Daguang Xu, Yiran Chen, Holger R. Roth

Abstract: Federated learning is a popular collaborative learning approach that enables clients to train a global model without sharing their local data. Vertical federated learning (VFL) deals with scenarios in which the data on clients have different feature spaces but share some overlapping samples. Existing VFL approaches suffer from high communication costs and cannot deal efficiently with limited overlapping samples commonly seen in the real world. We propose a practical vertical federated learning (VFL) framework called one-shot VFL that can solve the communication bottleneck and the problem of limited overlapping samples simultaneously based on semi-supervised learning. We also propose few-shot VFL to improve the accuracy further with just one more communication round between the server and the clients. In our proposed framework, the clients only need to communicate with the server once or only a few times. We evaluate the proposed VFL framework on both image and tabular datasets. Our methods can improve the accuracy by more than 46.5% and reduce the communication cost by more than 330$\times$ compared with state-of-the-art VFL methods when evaluated on CIFAR-10. Our code will be made publicly available at https://nvidia.github.io/NVFlare/research/one-shot-vfl.

Submitted 29 March, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

arXiv:2211.02701 [pdf, other]

MONAI: An open-source framework for deep learning in healthcare

Authors: M. Jorge Cardoso, Wenqi Li, Richard Brown, Nic Ma, Eric Kerfoot, Yiheng Wang, Benjamin Murrey, Andriy Myronenko, Can Zhao, Dong Yang, Vishwesh Nath, Yufan He, Ziyue Xu, Ali Hatamizadeh, Andriy Myronenko, Wentao Zhu, Yun Liu, Mingxin Zheng, Yucheng Tang, Isaac Yang, Michael Zephyr, Behrooz Hashemian, Sachidanand Alle, Mohammad Zalbagi Darestani, Charlie Budd , et al. (32 additional authors not shown)

Abstract: Artificial Intelligence (AI) is having a tremendous impact across most areas of science. Applications of AI in healthcare have the potential to improve our ability to detect, diagnose, prognose, and intervene on human disease. For AI models to be used clinically, they need to be made safe, reproducible and robust, and the underlying software framework must be aware of the particularities (e.g. geometry, physiology, physics) of medical data being processed. This work introduces MONAI, a freely available, community-supported, and consortium-led PyTorch-based framework for deep learning in healthcare. MONAI extends PyTorch to support medical data, with a particular focus on imaging, and provides purpose-specific AI model architectures, transformations and utilities that streamline the development and deployment of medical AI models. MONAI follows best practices for software-development, providing an easy-to-use, robust, well-documented, and well-tested software framework. MONAI preserves the simple, additive, and compositional approach of its underlying PyTorch libraries. MONAI is being used by and receiving contributions from research, clinical and industrial teams from around the world, who are pursuing applications spanning nearly every aspect of healthcare.

Submitted 4 November, 2022; originally announced November 2022.

Comments: www.monai.io

arXiv:2209.06285 [pdf, other]

Warm Start Active Learning with Proxy Labels & Selection via Semi-Supervised Fine-Tuning

Authors: Vishwesh Nath, Dong Yang, Holger R. Roth, Daguang Xu

Abstract: Which volume to annotate next is a challenging problem in building medical imaging datasets for deep learning. One of the promising methods to approach this question is active learning (AL). However, AL has been a hard nut to crack in terms of which AL algorithm and acquisition functions are most useful for which datasets. Also, the problem is exacerbated with which volumes to label first when there is zero labeled data to start with. This is known as the cold start problem in AL. We propose two novel strategies for AL specifically for 3D image segmentation. First, we tackle the cold start problem by proposing a proxy task and then utilizing uncertainty generated from the proxy task to rank the unlabeled data to be annotated. Second, we craft a two-stage learning framework for each active iteration where the unlabeled data is also used in the second stage as a semi-supervised fine-tuning strategy. We show the promise of our approach on two well-known large public datasets from medical segmentation decathlon. The results indicate that the initial selection of data and semi-supervised framework both showed significant improvement for several AL strategies.

Submitted 13 September, 2022; originally announced September 2022.

Comments: 12 pages, 5 figures
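
The uncertainty-based ranking described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say which uncertainty measure is used, so mean voxel-wise binary entropy is assumed here, and the volume names, probabilities, and function names are all hypothetical.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a Bernoulli probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def volume_uncertainty(probabilities):
    """Assumed score: mean voxel-wise entropy of a proxy model's foreground probabilities."""
    return sum(binary_entropy(p) for p in probabilities) / len(probabilities)

def rank_for_annotation(unlabeled):
    """Return volume names ordered from most to least uncertain."""
    scores = {name: volume_uncertainty(probs) for name, probs in unlabeled.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical proxy-model outputs, flattened to a few voxels per volume.
unlabeled = {
    "vol_a": [0.99, 0.01, 0.98, 0.02],  # confident predictions
    "vol_b": [0.50, 0.55, 0.45, 0.60],  # highly uncertain predictions
    "vol_c": [0.90, 0.10, 0.80, 0.30],  # in between
}
ranking = rank_for_annotation(unlabeled)
print(ranking)
```

In this sketch the most uncertain volumes come first, so an annotator would label them before the confidently predicted ones.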

arXiv:2203.12362 [pdf, other]

doi 10.1016/j.media.2024.103207

MONAI Label: A framework for AI-assisted Interactive Labeling of 3D Medical Images

Authors: Andres Diaz-Pinto, Sachidanand Alle, Vishwesh Nath, Yucheng Tang, Alvin Ihsani, Muhammad Asad, Fernando Pérez-García, Pritesh Mehta, Wenqi Li, Mona Flores, Holger R. Roth, Tom Vercauteren, Daguang Xu, Prerna Dogra, Sebastien Ourselin, Andrew Feng, M. Jorge Cardoso

Abstract: The lack of annotated datasets is a major bottleneck for training new task-specific supervised machine learning models, considering that manual annotation is extremely expensive and time-consuming. To address this problem, we present MONAI Label, a free and open-source framework that facilitates the development of applications based on artificial intelligence (AI) models that aim at reducing the time required to annotate radiology datasets. Through MONAI Label, researchers can develop AI annotation applications focusing on their domain of expertise. It allows researchers to readily deploy their apps as services, which can be made available to clinicians via their preferred user interface. Currently, MONAI Label readily supports locally installed (3D Slicer) and web-based (OHIF) frontends and offers two active learning strategies to facilitate and speed up the training of segmentation algorithms. MONAI Label allows researchers to make incremental improvements to their AI-based annotation application by making them available to other researchers and clinicians alike. Additionally, MONAI Label provides sample AI-based interactive and non-interactive labeling applications that can be used directly off the shelf, as plug-and-play to any given dataset. Significantly reduced annotation times using the interactive model can be observed on two public datasets.

Submitted 28 April, 2023; v1 submitted 23 March, 2022; originally announced March 2022.

arXiv:2201.01266 [pdf, other]

Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images

Authors: Ali Hatamizadeh, Vishwesh Nath, Yucheng Tang, Dong Yang, Holger Roth, Daguang Xu

Abstract: Semantic segmentation of brain tumors is a fundamental medical image analysis task involving multiple MRI imaging modalities that can assist clinicians in diagnosing the patient and successively studying the progression of the malignant entity. In recent years, Fully Convolutional Neural Networks (FCNNs) approaches have become the de facto standard for 3D medical image segmentation. The popular "U-shaped" network architecture has achieved state-of-the-art performance benchmarks on different 2D and 3D semantic segmentation tasks and across various imaging modalities. However, due to the limited kernel size of convolution layers in FCNNs, their performance of modeling long-range information is sub-optimal, and this can lead to deficiencies in the segmentation of tumors with variable sizes. On the other hand, transformer models have demonstrated excellent capabilities in capturing such long-range information in multiple domains, including natural language processing and computer vision. Inspired by the success of vision transformers and their variants, we propose a novel segmentation model termed Swin UNEt TRansformers (Swin UNETR). Specifically, the task of 3D brain tumor semantic segmentation is reformulated as a sequence to sequence prediction problem wherein multi-modal input data is projected into a 1D sequence of embeddings and used as an input to a hierarchical Swin transformer as the encoder. The Swin transformer encoder extracts features at five different resolutions by utilizing shifted windows for computing self-attention and is connected to an FCNN-based decoder at each resolution via skip connections. We have participated in BraTS 2021 segmentation challenge, and our proposed model ranks among the top-performing approaches in the validation phase. Code: https://monai.io/research/swin-unetr

Submitted 4 January, 2022; originally announced January 2022.

Comments: 13 pages, 3 figures

arXiv:2112.10652 [pdf, other]

HyperSegNAS: Bridging One-Shot Neural Architecture Search with 3D Medical Image Segmentation using HyperNet

Authors: Cheng Peng, Andriy Myronenko, Ali Hatamizadeh, Vish Nath, Md Mahfuzur Rahman Siddiquee, Yufan He, Daguang Xu, Rama Chellappa, Dong Yang

Abstract: Semantic segmentation of 3D medical images is a challenging task due to the high variability of the shape and pattern of objects (such as organs or tumors). Given the recent success of deep learning in medical image segmentation, Neural Architecture Search (NAS) has been introduced to find high-performance 3D segmentation network architectures. However, because of the massive computational requirements of 3D data and the discrete optimization nature of architecture search, previous NAS methods require a long search time or necessary continuous relaxation, and commonly lead to sub-optimal network architectures. While one-shot NAS can potentially address these disadvantages, its application in the segmentation domain has not been well studied in the expansive multi-scale multi-path search space. To enable one-shot NAS for medical image segmentation, our method, named HyperSegNAS, introduces a HyperNet to assist super-net training by incorporating architecture topology information. Such a HyperNet can be removed once the super-net is trained and introduces no overhead during architecture search. We show that HyperSegNAS yields better performing and more intuitive architectures compared to the previous state-of-the-art (SOTA) segmentation networks; furthermore, it can quickly and accurately find good architecture candidates under different computing constraints. Our method is evaluated on public datasets from the Medical Segmentation Decathlon (MSD) challenge, and achieves SOTA performances.

Submitted 24 March, 2022; v1 submitted 20 December, 2021; originally announced December 2021.

arXiv:2111.14791 [pdf, other]

Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis

Authors: Yucheng Tang, Dong Yang, Wenqi Li, Holger Roth, Bennett Landman, Daguang Xu, Vishwesh Nath, Ali Hatamizadeh

Abstract: Vision Transformers (ViT)s have shown great performance in self-supervised learning of global and local representations that can be transferred to downstream applications. Inspired by these results, we introduce a novel self-supervised learning framework with tailored proxy tasks for medical image analysis. Specifically, we propose: (i) a new 3D transformer-based model, dubbed Swin UNEt TRansformers (Swin UNETR), with a hierarchical encoder for self-supervised pre-training; (ii) tailored proxy tasks for learning the underlying pattern of human anatomy. We demonstrate successful pre-training of the proposed model on 5,050 publicly available computed tomography (CT) images from various body organs. The effectiveness of our approach is validated by fine-tuning the pre-trained models on the Beyond the Cranial Vault (BTCV) Segmentation Challenge with 13 abdominal organs and segmentation tasks from the Medical Segmentation Decathlon (MSD) dataset. Our model is currently the state-of-the-art (i.e. ranked 1st) on the public test leaderboards of both MSD and BTCV datasets. Code: https://monai.io/research/swin-unetr

Submitted 28 March, 2022; v1 submitted 29 November, 2021; originally announced November 2021.

Comments: CVPR'22 Accepted Paper

arXiv:2111.04102 [pdf, ps, other]

doi 10.1103/PhysRevE.105.035101

Transport of condensing droplets in Taylor-Green vortex flow in the presence of thermal noise

Authors: Anu V. S. Nath, Anubhab Roy, Rama Govindarajan, S. Ravichandran

Abstract: We study the role of phase change and thermal noise in particle transport in turbulent flows. We employ a toy model to extract the main physics: condensing droplets are modelled as heavy particles which grow in size, the ambient flow is modelled as a two-dimensional Taylor-Green (TG) flow consisting of an array of vortices delineated by separatrices, and thermal noise is modelled as uncorrelated Gaussian white noise. In general, heavy inertial particles are centrifuged out of regions of high vorticity and into regions of high strain. In cellular flows, we find, in agreement with earlier results, that droplets with Stokes numbers smaller than a critical value, $St < St_{\rm{cr}}$, remain trapped in the vortices in which they are initialised, while larger droplets move ballistically away from their initial positions by crossing separatrices. We independently vary the Péclet number $Pe$ characterising the amplitude of thermal noise and the condensation rate $\Pi$ to study their effects on the critical Stokes number for droplet trapping, as well as on the final states of motion of the droplets. We find that the imposition of thermal noise, or of a finite condensation rate, allows droplets of $St < St_{\rm{cr}}$ to leave their initial vortices. We find that the effects of thermal noise become negligible for growing droplets, and that growing droplets achieve ballistic motion when their Stokes numbers become $\mathcal{O}(1)$. We also find an intermediate regime prior to attaining the ballistic state, in which droplets move diffusively away from their initial vortices in the presence of thermal noise.

Submitted 7 November, 2021; originally announced November 2021.

Comments: 14 pages, 11 figures

arXiv:2107.05471 [pdf, other]

The Power of Proxy Data and Proxy Networks for Hyper-Parameter Optimization in Medical Image Segmentation

Authors: Vishwesh Nath, Dong Yang, Ali Hatamizadeh, Anas A. Abidin, Andriy Myronenko, Holger Roth, Daguang Xu

Abstract: Deep learning models for medical image segmentation are primarily data-driven. Models trained with more data lead to improved performance and generalizability. However, training is a computationally expensive process because multiple hyper-parameters need to be tested to find the optimal setting for best performance. In this work, we focus on accelerating the estimation of hyper-parameters by proposing two novel methodologies: proxy data and proxy networks. Both can be useful for estimating hyper-parameters more efficiently. We test the proposed techniques on CT and MR imaging modalities using well-known public datasets. In both cases, one dataset is used for building proxy data and another data source for external evaluation. For CT, the approach is tested on spleen segmentation with two datasets. The first dataset is from the medical segmentation decathlon (MSD), where the proxy data is constructed; the secondary dataset is utilized as an external validation dataset. Similarly, for MR, the approach is evaluated on prostate segmentation where the first dataset is from MSD and the second dataset is PROSTATEx. First, we show higher correlation to using full data for training when testing on the external validation set using smaller proxy data than a random selection of the proxy data. Second, we show that a high correlation exists for proxy networks when compared with the full network on validation Dice score. Third, we show that the proposed approach of utilizing a proxy network can speed up an AutoML framework for hyper-parameter search by 3.3x, and by 4.4x if proxy data and proxy network are utilized together.

Submitted 12 July, 2021; originally announced July 2021.

arXiv:2103.10504 [pdf, other]

UNETR: Transformers for 3D Medical Image Segmentation

Authors: Ali Hatamizadeh, Yucheng Tang, Vishwesh Nath, Dong Yang, Andriy Myronenko, Bennett Landman, Holger Roth, Daguang Xu

Abstract: Fully Convolutional Neural Networks (FCNNs) with contracting and expanding paths have shown prominence for the majority of medical image segmentation applications over the past decade. In FCNNs, the encoder plays an integral role by learning both global and local features and contextual representations which can be utilized for semantic output prediction by the decoder. Despite their success, the locality of convolutional layers in FCNNs limits the capability of learning long-range spatial dependencies. Inspired by the recent success of transformers for Natural Language Processing (NLP) in long-range sequence learning, we reformulate the task of volumetric (3D) medical image segmentation as a sequence-to-sequence prediction problem. We introduce a novel architecture, dubbed as UNEt TRansformers (UNETR), that utilizes a transformer as the encoder to learn sequence representations of the input volume and effectively capture the global multi-scale information, while also following the successful "U-shaped" network design for the encoder and decoder. The transformer encoder is directly connected to a decoder via skip connections at different resolutions to compute the final semantic segmentation output. We have validated the performance of our method on the Multi Atlas Labeling Beyond The Cranial Vault (BTCV) dataset for multi-organ segmentation and the Medical Segmentation Decathlon (MSD) dataset for brain tumor and spleen segmentation tasks. Our benchmarks demonstrate new state-of-the-art performance on the BTCV leaderboard. Code: https://monai.io/research/unetr

Submitted 9 October, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

Comments: Accepted to IEEE Winter Conference on Applications of Computer Vision (WACV) 2022

arXiv:2101.02323 [pdf]

doi 10.1109/TMI.2020.3048055

Diminishing Uncertainty within the Training Pool: Active Learning for Medical Image Segmentation

Authors: Vishwesh Nath, Dong Yang, Bennett A. Landman, Daguang Xu, Holger R. Roth

Abstract: Active learning is a unique abstraction of machine learning techniques where the model/algorithm could guide users for annotation of a set of data points that would be beneficial to the model, unlike passive machine learning. The primary advantage is that active learning frameworks select data points that can accelerate the learning process of a model and can reduce the amount of data needed to achieve full accuracy as compared to a model trained on a randomly acquired data set. Multiple frameworks for active learning combined with deep learning have been proposed, and the majority of them are dedicated to classification tasks. Herein, we explore active learning for the task of segmentation of medical imaging data sets. We investigate our proposed framework using two datasets: 1.) MRI scans of the hippocampus, 2.) CT scans of pancreas and tumors. This work presents a query-by-committee approach for active learning where a joint optimizer is used for the committee. At the same time, we propose three new strategies for active learning: 1.) increasing frequency of uncertain data to bias the training data set; 2.) using mutual information among the input images as a regularizer for acquisition to ensure diversity in the training dataset; 3.) adaptation of Dice log-likelihood for Stein variational gradient descent (SVGD). The results indicate an improvement in terms of data reduction by achieving full accuracy while only using 22.69% and 48.85% of the available data for each dataset, respectively.

Submitted 6 January, 2021; originally announced January 2021.

Comments: 19 pages, 13 figures, Transactions of Medical Imaging

Journal ref: IEEE Transactions on Medical Imaging, 2020

arXiv:2003.07921 [pdf, other]

Semi-supervised Contrastive Learning Using Partial Label Information

Authors: Colin B. Hansen, Vishwesh Nath, Diego A. Mesa, Yuankai Huo, Bennett A. Landman, Thomas A. Lasko

Abstract: In semi-supervised learning, information from unlabeled examples is used to improve the model learned from labeled examples. In some learning problems, partial label information can be inferred from otherwise unlabeled examples and used to further improve the model. In particular, partial label information exists when subsets of training examples are known to have the same label, even though the label itself is missing. By encouraging the model to give the same label to all such examples through contrastive learning objectives, we can potentially improve its performance. We call this encouragement Nullspace Tuning because the difference vector between any pair of examples with the same label should lie in the nullspace of a linear model. In this paper, we investigate the benefit of using partial label information using a careful comparison framework over well-characterized public datasets. We show that the additional information provided by partial labels reduces test error over good semi-supervised methods usually by a factor of 2, up to a factor of 5.5 in the best case. We also show that adding Nullspace Tuning to the newer and state-of-the-art MixMatch method decreases its test error by up to a factor of 1.8.

Submitted 3 June, 2024; v1 submitted 17 March, 2020; originally announced March 2020.
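
The nullspace idea in the abstract above can be made concrete with a toy linear model. This is a minimal sketch under stated assumptions, not the paper's training objective: the matrix `W`, the example vectors, and the squared-norm penalty are all hypothetical choices made for illustration.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def nullspace_penalty(W, x_a, x_b):
    """Squared norm of W @ (x_a - x_b); zero exactly when the difference lies in the nullspace of W."""
    diff = [a - b for a, b in zip(x_a, x_b)]
    return sum(v * v for v in matvec(W, diff))

# Hypothetical linear model whose nullspace is spanned by (1, 1, 1).
W = [[1, 0, -1],
     [0, 1, -1]]

x1 = [2, 3, 1]
x2 = [4, 5, 3]  # differs from x1 by a multiple of (1, 1, 1): treated as "same label"
x3 = [2, 3, 4]  # differs from x1 outside the nullspace

same_pair_penalty = nullspace_penalty(W, x1, x2)
other_pair_penalty = nullspace_penalty(W, x1, x3)
print(same_pair_penalty, other_pair_penalty)
```

When the penalty is zero, the model gives both examples the same output, which is the behaviour the abstract describes for pairs known to share a label.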

arXiv:2002.08820 [pdf]

Deep Learning Estimation of Multi-Tissue Constrained Spherical Deconvolution with Limited Single Shell DW-MRI

Authors: Vishwesh Nath, Sudhir K. Pathak, Kurt G. Schilling, Walt Schneider, Bennett A. Landman

Abstract: Diffusion-weighted magnetic resonance imaging (DW-MRI) is the only non-invasive approach for estimation of intra-voxel tissue microarchitecture and reconstruction of in vivo neural pathways for the human brain. With improvement in accelerated MRI acquisition technologies, DW-MRI protocols that make use of multiple levels of diffusion sensitization have gained popularity. A well-known advanced method for reconstruction of white matter microstructure that uses multi-shell data is multi-tissue constrained spherical deconvolution (MT-CSD). MT-CSD substantially improves the resolution of intra-voxel structure over the traditional single shell version, constrained spherical deconvolution (CSD). Herein, we explore the possibility of using deep learning on single shell data (using the b=1000 s/mm2 from the Human Connectome Project (HCP)) to estimate the information content captured by 8th order MT-CSD using the full three shell data (b=1000, 2000, and 3000 s/mm2 from HCP). Briefly, we examine two network architectures: 1.) Sequential network of fully connected dense layers with a residual block in the middle (ResDNN), 2.) Patch based convolutional neural network with a residual block (ResCNN). For both networks an additional output block for estimation of voxel fraction was used with a modified loss function. Each approach was compared against the baseline of using MT-CSD on all data on 15 subjects from the HCP divided into 5 training, 2 validation, and 8 testing subjects with a total of 6.7 million voxels. The fiber orientation distribution function (fODF) can be recovered with high correlation (0.77 vs 0.74 and 0.65) as compared to the ground truth of MT-CSD, which was derived from the multi-shell DW-MRI acquisitions. Source code and models have been made publicly available.

Submitted 20 February, 2020; originally announced February 2020.

Comments: 10 pages, 7 figures

arXiv:1911.07927 [pdf]

Deep Learning Captures More Accurate Diffusion Fiber Orientations Distributions than Constrained Spherical Deconvolution

Authors: Vishwesh Nath, Kurt G. Schilling, Colin B. Hansen, Prasanna Parvathaneni, Allison E. Hainline, Camilo Bermudez, Andrew J. Plassard, Vaibhav Janve, Yurui Gao, Justin A. Blaber, Iwona Stępniewska, Adam W. Anderson, Bennett A. Landman

Abstract: Confocal histology provides an opportunity to establish intra-voxel fiber orientation distributions that can be used to quantitatively assess the biological relevance of diffusion weighted MRI models, e.g., constrained spherical deconvolution (CSD). Here, we apply deep learning to investigate the potential of single shell diffusion weighted MRI to explain histologically observed fiber orientation distributions (FOD) and compare the derived deep learning model with a leading CSD approach. This study (1) demonstrates that there exists additional information in the diffusion signal that is not currently exploited by CSD, and (2) provides an illustrative data-driven model that makes use of this information.

Submitted 13 November, 2019; originally announced November 2019.

Comments: 2 pages, 4 figures. This work was accepted and published as an abstract at ISMRM 2018 held in Paris, France

arXiv:1907.06319 [pdf]

Enabling Multi-Shell b-Value Generalizability of Data-Driven Diffusion Models with Deep SHORE

Authors: Vishwesh Nath, Ilwoo Lyu, Kurt G. Schilling, Prasanna Parvathaneni, Colin B. Hansen, Yucheng Tang, Yuankai Huo, Vaibhav A. Janve, Yurui Gao, Iwona Stepniewska, Adam W. Anderson, Bennett A. Landman

Abstract: Intra-voxel models of the diffusion signal are essential for interpreting organization of the tissue environment at micrometer level with data at millimeter resolution. Recent advances in data driven methods have enabled direct comparison and optimization of methods for in-vivo data with externally validated histological sections with both 2-D and 3-D histology. Yet, all existing methods make limiting assumptions of either (1) model-based linkages between b-values or (2) limited associations with single shell data. We generalize prior deep learning models that used single shell spherical harmonic transforms to integrate the recently developed simple harmonic oscillator reconstruction (SHORE) basis. To enable learning on the SHORE manifold, we present an alternative formulation of the fiber orientation distribution (FOD) object using the SHORE basis while representing the observed diffusion weighted data in the SHORE basis. To ensure consistency of hyper-parameter optimization for SHORE, we present our Deep SHORE approach to learn on a data-optimized manifold. Deep SHORE is evaluated with eight-fold cross-validation of a preclinical MRI-histology data with four b-values. Generalizability of in-vivo human data is evaluated on two separate 3T MRI scanners. Specificity in terms of angular correlation (ACC) with the preclinical data improved on single shell: 0.78 relative to 0.73 and 0.73, multi-shell: 0.80 relative to 0.74 (p < 0.001). In the in-vivo human data, Deep SHORE was more consistent across scanners with 0.63 relative to other multi-shell methods 0.39, 0.52 and 0.57 in terms of ACC. In conclusion, Deep SHORE is a promising method to enable data driven learning with DW-MRI under conditions with varying b-values, number of diffusion shells, and gradient directions per shell.

Submitted 22 February, 2020; v1 submitted 14 July, 2019; originally announced July 2019.

arXiv:1907.05395 [pdf, other]

Cortical Surface Parcellation using Spherical Convolutional Neural Networks

Authors: Prasanna Parvathaneni, Shunxing Bao, Vishwesh Nath, Neil D. Woodward, Daniel O. Claassen, Carissa J. Cascio, David H. Zald, Yuankai Huo, Bennett A. Landman, Ilwoo Lyu

Abstract: We present cortical surface parcellation using spherical deep convolutional neural networks. Traditional multi-atlas cortical surface parcellation requires inter-subject surface registration using geometric features with high processing time on a single subject (2-3 hours). Moreover, even optimal surface registration does not necessarily produce optimal cortical parcellation as parcel boundaries are not fully matched to the geometric features. In this context, a choice of training features is important for accurate cortical parcellation. To utilize the networks efficiently, we propose cortical parcellation-specific input data from an irregular and complicated structure of cortical surfaces. To this end, we align ground-truth cortical parcel boundaries and use their resulting deformation fields to generate new pairs of deformed geometric features and parcellation maps. To extend the capability of the networks, we then smoothly morph cortical geometric features and parcellation maps using the intermediate deformation fields. We validate our method on 427 adult brains for 49 labels. The experimental results show that our method outperforms traditional multi-atlas and naive spherical U-Net approaches, while achieving full cortical parcellation in less than a minute.

Submitted 11 July, 2019; originally announced July 2019.

arXiv:1903.04207 [pdf, other]

Distributed deep learning for robust multi-site segmentation of CT imaging after traumatic brain injury

Authors: Samuel Remedios, Snehashis Roy, Justin Blaber, Camilo Bermudez, Vishwesh Nath, Mayur B. Patel, John A. Butman, Bennett A. Landman, Dzung L. Pham

Abstract: Machine learning models are becoming commonplace in the domain of medical imaging, and with these methods comes an ever-increasing need for more data. However, to preserve patient anonymity it is frequently impractical or prohibited to transfer protected health information (PHI) between institutions. Additionally, due to the nature of some studies, there may not be a large public dataset available on which to train models. To address this conundrum, we analyze the efficacy of transferring the model itself in lieu of data between different sites. By doing so we accomplish two goals: 1) the model gains access to training on a larger dataset that it could not normally obtain and 2) the model better generalizes, having trained on data from separate locations. In this paper, we implement multi-site learning with disparate datasets from the National Institutes of Health (NIH) and Vanderbilt University Medical Center (VUMC) without compromising PHI. Three neural networks are trained to convergence on a computed tomography (CT) brain hematoma segmentation task: one only with NIH data, one only with VUMC data, and one multi-site model alternating between NIH and VUMC data. Resultant lesion masks with the multi-site model attain an average Dice similarity coefficient of 0.64 and the automatically segmented hematoma volumes correlate to those done manually with a Pearson correlation coefficient of 0.87, corresponding to an 8% and 5% improvement, respectively, over the single-site model counterparts.

Submitted 11 March, 2019; originally announced March 2019.

arXiv:1811.04289 [pdf]

Coronary Calcium Detection using 3D Attention Identical Dual Deep Network Based on Weakly Supervised Learning

Authors: Yuankai Huo, James G. Terry, Jiachen Wang, Vishwesh Nath, Camilo Bermudez, Shunxing Bao, Prasanna Parvathaneni, J. Jeffery Carr, Bennett A. Landman

Abstract: Coronary artery calcium (CAC) is a biomarker of advanced subclinical coronary artery disease and predicts myocardial infarction and death prior to age 60 years. Slice-wise manual delineation has been regarded as the gold standard of coronary calcium detection. However, manual efforts are time- and resource-consuming and even impracticable to apply to large-scale cohorts. In this paper, we propose the attention identical dual network (AID-Net) to perform CAC detection using scan-rescan longitudinal non-contrast CT scans with weakly supervised attention, using only per-scan-level labels. To improve performance, 3D attention mechanisms were integrated into the AID-Net to provide complementary information for classification tasks. Moreover, 3D Gradient-weighted Class Activation Mapping (Grad-CAM) was also proposed at the testing stage to interpret the behaviors of the deep neural network. 5075 non-contrast chest CT scans were used as training, validation and testing datasets. Baseline performance was assessed on the same cohort. From the results, the proposed AID-Net achieved superior performance in classification accuracy (0.9272) and AUC (0.9627).

Submitted 10 November, 2018; originally announced November 2018.

Comments: Accepted by SPIE medical imaging 2019

arXiv:1810.04260 [pdf]

Inter-Scanner Harmonization of High Angular Resolution DW-MRI using Null Space Deep Learning

Authors: Vishwesh Nath, Prasanna Parvathaneni, Colin B. Hansen, Allison E. Hainline, Camilo Bermudez, Samuel Remedios, Justin A. Blaber, Kurt G. Schilling, Ilwoo Lyu, Vaibhav Janve, Yurui Gao, Iwona Stepniewska, Baxter P. Rogers, Allen T. Newton, L. Taylor Davis, Jeff Luci, Adam W. Anderson, Bennett A. Landman

Abstract: Diffusion-weighted magnetic resonance imaging (DW-MRI) allows for non-invasive imaging of the local fiber architecture of the human brain at a millimetric scale. Multiple classical approaches have been proposed to detect both single (e.g., tensors) and multiple (e.g., constrained spherical deconvolution, CSD) fiber population orientations per voxel. However, existing techniques generally exhibit low reproducibility across MRI scanners. Herein, we propose a data-driven technique using a neural network design which exploits two categories of data. First, training data were acquired on three squirrel monkey brains using ex-vivo DW-MRI and histology of the brain. Second, repeated scans of human subjects were acquired on two different scanners to augment the learning of the network proposed. To use these data, we propose a new network architecture, the null space deep network (NSDN), to simultaneously learn on traditional observed/truth pairs (e.g., MRI-histology voxels) along with repeated observations without a known truth (e.g., scan-rescan MRI). The NSDN was tested on twenty percent of the histology voxels that were kept completely blind to the network. NSDN significantly improved absolute performance relative to histology by 3.87% over CSD and 1.42% over a recently proposed deep neural network approach. Moreover, it improved reproducibility on the paired data by 21.19% over CSD and 10.09% over a recently proposed deep approach. Finally, NSDN improved generalizability of the model to a third in vivo human scanner (which was not used in training) by 16.08% over CSD and 10.41% over a recently proposed deep learning approach. This work suggests that data-driven approaches for local fiber reconstruction are more reproducible, informative and precise and offers a novel, practical method for determining these models.

Submitted 9 October, 2018; originally announced October 2018.

Comments: 10 pages, 5 figures

Showing 1–45 of 45 results for author: nath, V