
DIsoN: Decentralized Isolation Networks for Out-of-Distribution Detection in Medical Imaging

Authors: Felix Wagner, Pramit Saha, Harry Anthony, J. Alison Noble, Konstantinos Kamnitsas

Abstract: Safe deployment of machine learning (ML) models in safety-critical domains such as medical imaging requires detecting inputs with characteristics not seen during training, known as out-of-distribution (OOD) detection, to prevent unreliable predictions. Effective OOD detection after deployment could benefit from access to the training data, enabling direct comparison between test samples and the training data distribution to identify differences. State-of-the-art OOD detection methods, however, either discard training data after deployment or assume that test samples and training data are centrally stored together, an assumption that rarely holds in real-world settings. This is because shipping training data with the deployed model is usually impossible due to the size of training databases, as well as proprietary or privacy constraints. We introduce the Isolation Network, an OOD detection framework that quantifies the difficulty of separating a target test sample from the training data by solving a binary classification task. We then propose Decentralized Isolation Networks (DIsoN), which enables the comparison of training and test data when data-sharing is impossible, by exchanging only model parameters between the remote computational nodes of training and deployment. We further extend DIsoN with class-conditioning, comparing a target sample solely with training data of its predicted class. We evaluate DIsoN on four medical imaging datasets (dermatology, chest X-ray, breast ultrasound, histopathology) across 12 OOD detection tasks. DIsoN performs favorably against existing methods while respecting data-privacy. This decentralized OOD detection framework opens the way for a new type of service that ML developers could provide along with their models: providing remote, secure utilization of their training data for OOD detection services. Code will be available upon acceptance at: *****

Submitted 10 June, 2025; originally announced June 2025.

ACM Class: I.2.11; I.4.9; I.4.9; J.3; I.2.0

arXiv:2506.06544 [pdf, ps, other]

Reasoning about External Calls

Authors: Sophia Drossopoulou, Julian Mackay, Susan Eisenbach, James Noble

Abstract: In today's complex software, internal trusted code is tightly intertwined with external untrusted code. To reason about internal code, programmers must reason about the potential effects of calls to external code, even though that code is not trusted and may not even be available. The effects of external calls can be limited, if internal code is programmed defensively, limiting potential effects by limiting access to the capabilities necessary to cause those effects. This paper addresses the specification and verification of internal code that relies on encapsulation and object capabilities to limit the effects of external calls. We propose new assertions for access to capabilities, new specifications for limiting effects, and a Hoare logic to verify that a module satisfies its specification, even while making external calls. We illustrate the approach through a running example with mechanised proofs, and prove soundness of the Hoare logic.

Submitted 6 June, 2025; originally announced June 2025.

Comments: 86 pages, 25 main paper, and 58 pages of appendices, many diagrams and figures

arXiv:2505.18381 [pdf, other]

Monocular Marker-free Patient-to-Image Intraoperative Registration for Cochlear Implant Surgery

Authors: Yike Zhang, Eduardo Davalos Anaya, Jack H. Noble

Abstract: This paper presents a novel method for monocular patient-to-image intraoperative registration, specifically designed to operate without any external hardware tracking equipment or fiducial point markers. Leveraging a synthetic microscopy surgical scene dataset with a wide range of transformations, our approach directly maps preoperative CT scans to 2D intraoperative surgical frames through a lightweight neural network for real-time cochlear implant surgery guidance via a zero-shot learning approach. Unlike traditional methods, our framework seamlessly integrates with monocular surgical microscopes, making it highly practical for clinical use without additional hardware dependencies and requirements. Our method estimates camera poses, which include a rotation matrix and a translation vector, by learning from the synthetic dataset, enabling accurate and efficient intraoperative registration. The proposed framework was evaluated on nine clinical cases using a patient-specific and cross-patient validation strategy. Our results suggest that our approach achieves clinically relevant accuracy in predicting 6D camera poses for registering 3D preoperative CT scans to 2D surgical scenes with an angular error within 10 degrees in most cases, while also addressing limitations of traditional methods, such as reliance on external tracking systems or fiducial markers.

Submitted 23 May, 2025; originally announced May 2025.
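The abstract above reports an angular error within 10 degrees for the predicted camera poses. As a minimal sketch of how such an error can be measured, the snippet below computes the geodesic angle between a predicted and a reference rotation matrix. This is the standard rotation-distance formula, not necessarily the exact metric used by the authors; the function names `rotation_angle_deg` and `rot_z` are hypothetical helpers introduced only for illustration.

```python
import numpy as np


def rotation_angle_deg(r_pred: np.ndarray, r_true: np.ndarray) -> float:
    """Geodesic angle (degrees) between two 3x3 rotation matrices.

    Assumption: this is the common trace-based rotation distance,
    not a metric confirmed by the abstract.
    """
    r_rel = r_pred @ r_true.T                  # relative rotation
    cos_theta = (np.trace(r_rel) - 1.0) / 2.0  # trace(R) = 1 + 2*cos(theta)
    cos_theta = np.clip(cos_theta, -1.0, 1.0)  # guard against rounding error
    return float(np.degrees(np.arccos(cos_theta)))


def rot_z(deg: float) -> np.ndarray:
    """Rotation about the z-axis by `deg` degrees (illustrative helper)."""
    t = np.radians(deg)
    return np.array([
        [np.cos(t), -np.sin(t), 0.0],
        [np.sin(t),  np.cos(t), 0.0],
        [0.0,        0.0,       1.0],
    ])


if __name__ == "__main__":
    # Two rotations about the same axis that differ by 8 degrees.
    print(rotation_angle_deg(rot_z(30.0), rot_z(22.0)))
```

Under this definition, a prediction would count as "within 10 degrees" when the returned angle is at most 10; the translation part of the pose would need a separate distance measure.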

arXiv:2505.18368 [pdf, other]

Weakly-supervised Mamba-Based Mastoidectomy Shape Prediction for Cochlear Implant Surgery Using 3D T-Distribution Loss

Authors: Yike Zhang, Jack H. Noble

Abstract: Cochlear implant surgery is a treatment for individuals with severe hearing loss. It involves inserting an array of electrodes inside the cochlea to electrically stimulate the auditory nerve and restore hearing sensation. A crucial step in this procedure is mastoidectomy, a surgical intervention that removes part of the mastoid region of the temporal bone, providing a critical pathway to the cochlea for electrode placement. Accurate prediction of the mastoidectomy region from preoperative imaging assists presurgical planning, reduces surgical risks, and improves surgical outcomes. In previous work, a self-supervised network was introduced to predict the mastoidectomy region using only preoperative CT scans. While promising, the method suffered from suboptimal robustness, limiting its practical application. To address this limitation, we propose a novel weakly-supervised Mamba-based framework to predict accurate mastoidectomy regions directly from preoperative CT scans. Our approach utilizes a 3D T-Distribution loss function inspired by the Student-t distribution, which effectively handles the complex geometric variability inherent in mastoidectomy shapes. Weak supervision is achieved using the segmentation results from the prior self-supervised network to eliminate the need for manual data cleaning or labeling throughout the training process. The proposed method is extensively evaluated against state-of-the-art approaches, demonstrating superior performance in predicting accurate and clinically relevant mastoidectomy regions. Our findings highlight the robustness and efficiency of the weakly-supervised learning framework with the proposed novel 3D T-Distribution loss.

Submitted 23 May, 2025; originally announced May 2025.

arXiv:2504.15329 [pdf, other]

Vision6D: 3D-to-2D Interactive Visualization and Annotation Tool for 6D Pose Estimation

Authors: Yike Zhang, Eduardo Davalos, Jack Noble

Abstract: Accurate 6D pose estimation has gained more attention over the years for robotics-assisted tasks that require precise interaction with physical objects. This paper presents an interactive 3D-to-2D visualization and annotation tool to support the 6D pose estimation research community. To the best of our knowledge, the proposed work is the first tool that allows users to visualize and manipulate 3D objects interactively on a 2D real-world scene, along with a comprehensive user study. This system supports robust 6D camera pose annotation by providing both visual cues and spatial relationships to determine object position and orientation in various environments. The annotation feature in Vision6D is particularly helpful in scenarios where the transformation matrix between the camera and world objects is unknown, as it enables accurate annotation of these objects' poses using only the camera intrinsic matrix. This capability serves as a foundational step in developing and training advanced pose estimation models across various domains. We evaluate Vision6D's effectiveness on the widely used open-source pose estimation datasets Linemod and HANDAL by comparing the default ground-truth camera poses with manual annotations. A user study was performed to show that Vision6D generates accurate pose annotations via visual cues in an intuitive 3D user interface. This approach aims to bridge the gap between 2D scene projections and 3D scenes, offering an effective way for researchers and developers to solve 6D pose annotation related problems. The software is open-source and publicly available at https://github.com/InteractiveGL/vision6D.

Submitted 21 April, 2025; originally announced April 2025.

arXiv:2504.13089 [pdf, other]

Absorption of Fermionic Dark Matter in the PICO-60 C$_{3}$F$_{8}$ Bubble Chamber

Authors: E. Adams, B. Ali, R. Anderson-Dornan, I. J. Arnquist, M. Bai, D. Baxter, E. Behnke, B. Broerman, C. J. Chen, K. Clark, J. I. Collar, P. S. Cooper, D. Cranshaw, C. Cripe, M. Crisler, C. E. Dahl, M. Das, S. Das, S. Fallows, J. Farine, R. Filgas, A. García-Viltres, G. Giroux, O. Harris, H. Hawley-Herrera, et al. (36 additional authors not shown)

Abstract: Fermionic dark matter absorption on nuclear targets via neutral current interactions is explored using a non-relativistic effective field theory framework. An analysis of data from the PICO-60 C$_{3}$F$_{8}$ bubble chamber sets leading constraints on spin-independent absorption for dark matter masses below 23 MeV/$\textit{c}^2$ and establishes the first limits on spin-dependent absorptive interactions. These results demonstrate the sensitivity of bubble chambers to low-mass dark matter and underscore the importance of absorption searches in expanding the parameter space of direct detection experiments.

Submitted 17 April, 2025; originally announced April 2025.

arXiv:2504.06088 [pdf, other]

MCAT: Visual Query-Based Localization of Standard Anatomical Clips in Fetal Ultrasound Videos Using Multi-Tier Class-Aware Token Transformer

Authors: Divyanshu Mishra, Pramit Saha, He Zhao, Netzahualcoyotl Hernandez-Cruz, Olga Patey, Aris Papageorghiou, J. Alison Noble

Abstract: Accurate standard plane acquisition in fetal ultrasound (US) videos is crucial for fetal growth assessment, anomaly detection, and adherence to clinical guidelines. However, manually selecting standard frames is time-consuming and prone to intra- and inter-sonographer variability. Existing methods primarily rely on image-based approaches that capture standard frames and then classify the input frames across different anatomies. This ignores the dynamic nature of video acquisition and its interpretation. To address these challenges, we introduce Multi-Tier Class-Aware Token Transformer (MCAT), a visual query-based video clip localization (VQ-VCL) method, to assist sonographers by enabling them to capture a quick US sweep. By then providing a visual query of the anatomy they wish to analyze, MCAT returns the video clip containing the standard frames for that anatomy, facilitating thorough screening for potential anomalies. We evaluate MCAT on two ultrasound video datasets and a natural image VQ-VCL dataset based on Ego4D. Our model outperforms state-of-the-art methods by 10% and 13% mIoU on the ultrasound datasets and by 5.35% mIoU on the Ego4D dataset, using 96% fewer tokens. MCAT's efficiency and accuracy have significant potential implications for public health, especially in low- and middle-income countries (LMICs), where it may enhance prenatal care by streamlining standard plane acquisition, simplifying US-based screening and diagnosis, and allowing sonographers to examine more patients.

Submitted 8 April, 2025; originally announced April 2025.

Comments: Accepted in AAAI 2025
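The MCAT abstract above reports its gains in mIoU for video clip localization. As a hedged illustration of what that metric measures, the sketch below computes the temporal intersection-over-union between a predicted and a ground-truth clip, each given as a (start, end) interval, and averages it over several queries. The interval representation and the function names `temporal_iou` and `mean_iou` are assumptions made for this example; the paper's exact evaluation protocol is not stated in the abstract.

```python
from typing import Sequence, Tuple

Clip = Tuple[float, float]  # (start, end), in frames or seconds


def temporal_iou(pred: Clip, gt: Clip) -> float:
    """Intersection-over-union of two temporal intervals."""
    (ps, pe), (gs, ge) = pred, gt
    inter = max(0.0, min(pe, ge) - max(ps, gs))  # overlap length, 0 if disjoint
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds: Sequence[Clip], gts: Sequence[Clip]) -> float:
    """Average temporal IoU over paired predictions and ground truths."""
    scores = [temporal_iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Predicted clip covers frames 10-30, ground truth covers 20-40.
    print(temporal_iou((10, 30), (20, 40)))
    # One perfect localization and one complete miss.
    print(mean_iou([(0, 10), (0, 5)], [(0, 10), (10, 20)]))
```

A prediction that overlaps the true clip only partially is penalized both for the missed frames and for the extra frames it includes, which is why IoU is a stricter measure than recall alone.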

arXiv:2504.04911 [pdf, other]

IterMask3D: Unsupervised Anomaly Detection and Segmentation with Test-Time Iterative Mask Refinement in 3D Brain MR

Authors: Ziyun Liang, Xiaoqing Guo, Wentian Xu, Yasin Ibrahim, Natalie Voets, Pieter M Pretorius, J. Alison Noble, Konstantinos Kamnitsas

Abstract: Unsupervised anomaly detection and segmentation methods train a model to learn the training distribution as 'normal'. In the testing phase, they identify patterns that deviate from this normal distribution as 'anomalies'. To learn the 'normal' distribution, prevailing methods corrupt the images and train a model to reconstruct them. During testing, the model attempts to reconstruct corrupted inputs based on the learned 'normal' distribution. Deviations from this distribution lead to high reconstruction errors, which indicate potential anomalies. However, corrupting an input image inevitably causes information loss even in normal regions, leading to suboptimal reconstruction and an increased risk of false positives. To alleviate this, we propose IterMask3D, an iterative spatial mask-refining strategy designed for 3D brain MRI. We iteratively spatially mask areas of the image as corruption and reconstruct them, then shrink the mask based on reconstruction error. This process iteratively reveals 'normal' areas to the model, whose information further guides accurate reconstruction of 'normal' patterns under the mask, reducing false positives. In addition, to achieve better reconstruction performance, we also propose using high-frequency image content as additional structural information to guide the reconstruction of the masked area. Extensive experiments on the detection of both synthetic and real-world imaging artifacts, as well as segmentation of various pathological lesions across multiple MRI sequences, consistently demonstrate the effectiveness of our proposed method.

Submitted 7 April, 2025; originally announced April 2025.
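The IterMask3D abstract above describes a loop: mask, reconstruct, then shrink the mask wherever the reconstruction error is low. The toy sketch below illustrates only that control flow on a small 3D array. It is not the authors' method: the learned reconstruction network is replaced by a hypothetical stand-in, `toy_reconstruct`, that fills masked voxels with the mean of the visible ones, the high-frequency guidance is omitted, and the threshold and stopping rule are assumptions chosen for the example.

```python
import numpy as np


def toy_reconstruct(volume: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Stand-in for a learned model: fill masked voxels with the visible mean."""
    recon = volume.copy()
    visible = volume[~mask]
    fill = visible.mean() if visible.size else 0.0  # nothing visible on pass 1
    recon[mask] = fill
    return recon


def iterative_mask_refinement(volume: np.ndarray, threshold: float,
                              max_iters: int = 10) -> np.ndarray:
    """Shrink an initially full mask until only high-error voxels remain."""
    mask = np.ones(volume.shape, dtype=bool)  # start with everything masked
    for _ in range(max_iters):
        recon = toy_reconstruct(volume, mask)
        error = np.abs(recon - volume)
        # Voxels reconstructed well are treated as 'normal' and unmasked.
        new_mask = mask & (error > threshold)
        if np.array_equal(new_mask, mask):    # converged: mask stopped shrinking
            break
        mask = new_mask
    return mask                               # remaining mask = anomaly candidates


if __name__ == "__main__":
    vol = np.zeros((4, 4, 4))
    vol[1, 1, 1] = 5.0                        # a single synthetic 'anomaly'
    final_mask = iterative_mask_refinement(vol, threshold=0.5)
    print(int(final_mask.sum()), bool(final_mask[1, 1, 1]))
```

In this toy setting the mask can only shrink, so voxels unmasked in an early pass stay visible and inform later reconstructions, which mirrors the intuition in the abstract that revealing 'normal' regions helps reconstruct what remains under the mask.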

arXiv:2503.24309 [pdf, other]

doi 10.1051/0004-6361/202452966

The edge-on disk Tau042021: icy grains at high altitudes and a wind containing astronomical PAHs

Authors: E. Dartois, J. A. Noble, M. K. McClure, J. A. Sturm, T. L. Beck, N. Arulanantham, M. N. Drozdovskaya, C. C. Espaillat, D. Harsono, M. -E. Palumbo, Y. J. Pendleton, K. M. Pontoppidan

Abstract: Spectra of the nearly edge-on protoplanetary disks observed with the JWST have shown ice absorption bands of varying optical depths and peculiar profiles, challenging radiative transfer modelling and our understanding of dust and ice in disks. We build models including dust grain size, shape, and composition to reproduce JWST IFU spectroscopy of the large edge-on disk Tau042021. We explore radiative transfer models using different dust grain size distributions, including grains of effective radii a_eff = 0.005-3000 microns. Scattering properties of distributions of triaxial ellipsoidal grains are calculated. We consider compositions with silicates, amorphous carbon, and mixtures of H2O, CO2, and CO. We use RADMC-3D Monte Carlo radiative transfer models of Tau042021 to simulate the spectral cubes observed with JWST-NIRSpec and MIRI. We compare the results to observations, including H2O at 3.05 microns, CO at 4.67 microns, and CO2 at 4.27 microns and to archival JWST-NIRCam and ALMA continuum images. The observed near- to mid-infrared imply dust distributions with grain sizes up to several tens of microns. The intensity distribution perpendicular to the disk exhibits emission profile wings extending into the upper disk atmosphere at altitudes exceeding the classical scale height expected in the isothermal hydrostatic limit. We produce ice map images demonstrating the presence of icy dust grains up to altitudes high above the disk midplane, more than three hydrostatic equilibrium scale heights. We demonstrate the presence of a wind containing the carriers of astronomical PAH bands. The wind appears as an X-shaped emission at 3.3, 6.2, 7.7 and 11.3 microns, characteristic wavelengths of the infrared astronomical PAH bands. We associate the spatial distribution of this component with carriers of astronomical PAH bands that form a layer of emission at the interface with the H2 wind.

Submitted 31 March, 2025; originally announced March 2025.

Comments: Accepted for publication in Astronomy and Astrophysics

Journal ref: A&A 698, A8 (2025)

arXiv:2503.15414 [pdf, other]

Federated Continual 3D Segmentation With Single-round Communication

Authors: Can Peng, Qianhui Men, Pramit Saha, Qianye Yang, Cheng Ouyang, J. Alison Noble

Abstract: Federated learning seeks to foster collaboration among distributed clients while preserving the privacy of their local data. Traditionally, federated learning methods assume a fixed setting in which client data and learning objectives remain constant. However, in real-world scenarios, new clients may join, and existing clients may expand the segmentation label set as task requirements evolve. In such a dynamic federated analysis setup, the conventional federated communication strategy of model aggregation per communication round is suboptimal. As new clients join, this strategy requires retraining, linearly increasing communication and computation overhead. It also imposes requirements for synchronized communication, which is difficult to achieve among distributed clients. In this paper, we propose a federated continual learning strategy that employs a one-time model aggregation at the server through multi-model distillation. This approach builds and updates the global model while eliminating the need for frequent server communication. When integrating new data streams or onboarding new clients, this approach efficiently reuses previous client models, avoiding the need to retrain the global model across the entire federation. By minimizing communication load and bypassing the need to put unchanged clients online, our approach relaxes synchronization requirements among clients, providing an efficient and scalable federated analysis framework suited for real-world applications. Using multi-class 3D abdominal CT segmentation as an application task, we demonstrate the effectiveness of the proposed approach.

Submitted 19 March, 2025; originally announced March 2025.

arXiv:2503.10383 [pdf, other]

Position Reconstruction in the DEAP-3600 Dark Matter Search Experiment

Authors: The DEAP Collaboration, P. Adhikari, R. Ajaj, M. Alpízar-Venegas, P. -A. Amaudruz, J. Anstey, G. R. Araujo, D. J. Auty, M. Baldwin, M. Batygov, B. Beltran, H. Benmansour, M. A. Bigentini, C. E. Bina, J. Bonatt, W. M. Bonivento, M. G. Boulay, B. Broerman, J. F. Bueno, P. M. Burghardt, A. Butcher, M. Cadeddu, B. Cai, M. Cárdenas-Montes, S. Cavuoti, et al. (139 additional authors not shown)

Abstract: In the DEAP-3600 dark matter search experiment, precise reconstruction of the positions of scattering events in liquid argon is key for background rejection and for defining a fiducial volume that enhances the identification of dark matter candidate events. This paper describes three distinct position reconstruction algorithms employed by DEAP-3600, leveraging the spatial and temporal information provided by photomultipliers surrounding a spherical liquid argon vessel. Two of these methods are maximum-likelihood algorithms: the first uses the spatial distribution of detected photoelectrons, while the second incorporates timing information from the detected scintillation light. Additionally, a machine learning approach based on the pattern of photoelectron counts across the photomultipliers is explored.

Submitted 13 March, 2025; originally announced March 2025.

Comments: 24 pages, 16 figures

arXiv:2503.07799 [pdf, other]

Self-supervised Normality Learning and Divergence Vector-guided Model Merging for Zero-shot Congenital Heart Disease Detection in Fetal Ultrasound Videos

Authors: Pramit Saha, Divyanshu Mishra, Netzahualcoyotl Hernandez-Cruz, Olga Patey, Aris Papageorghiou, Yuki M. Asano, J. Alison Noble

Abstract: Congenital Heart Disease (CHD) is one of the leading causes of fetal mortality, yet the scarcity of labeled CHD data and strict privacy regulations surrounding fetal ultrasound (US) imaging present significant challenges for the development of deep learning-based models for CHD detection. Centralised collection of large real-world datasets for rare conditions, such as CHD, from large populations requires significant co-ordination and resource. In addition, data governance rules increasingly prevent data sharing between sites. To address these challenges, we introduce, for the first time, a novel privacy-preserving, zero-shot CHD detection framework that formulates CHD detection as a normality modeling problem integrated with model merging. In our framework, dubbed Sparse Tube Ultrasound Distillation (STUD), each hospital site first trains a sparse video tube-based self-supervised video anomaly detection (VAD) model on normal fetal heart US clips with self-distillation loss. This enables site-specific models to independently learn the distribution of healthy cases. To aggregate knowledge across the decentralized models while maintaining privacy, we propose a Divergence Vector-Guided Model Merging approach, DivMerge, that combines site-specific models into a single VAD model without data exchange. Our approach preserves domain-agnostic rich spatio-temporal representations, ensuring generalization to unseen CHD cases. We evaluated our approach on real-world fetal US data collected from 5 hospital sites. Our merged model outperformed site-specific models by 23.77% and 30.13% in accuracy and F1-score respectively on external test sets.

Submitted 10 March, 2025; originally announced March 2025.

arXiv:2502.10123 [pdf, other]

doi 10.1051/0004-6361/202452389

Modelling methanol and hydride formation in the JWST Ice Age era

Authors: Izaskun Jiménez-Serra, Andrés Megías, Joseph Salaris, Herma Cuppen, Angèle Taillard, Miwha Jin, Valentine Wakelam, Anton I. Vasyunin, Paola Caselli, Yvonne J. Pendleton, Emmanuel Dartois, Jennifer A. Noble, Serena Viti, Katerina Borshcheva, Robin T. Garrod, Thanja Lamberts, Helen Fraser, Gary Melnick, Melissa McClure, Will Rocha, Maria N. Drozdovskaya, Dariusz C. Lis

Abstract: (Abridged) JWST observations have measured the ice composition toward two highly-extinguished field stars in the Chamaeleon I cloud. The observed extinction excess on the long-wavelength side of the H2O ice band at 3 micron has been attributed to a mixture of CH3OH with ammonia hydrates, which suggests that CH3OH ice could have formed in a water-rich environment with little CO depletion. Laboratory experiments and quantum chemical calculations suggest that CH3OH could form via the grain surface reactions CH3+OH and/or C+H2O in water-rich ices. However, no dedicated chemical modelling has been carried out thus far to test their efficiency and dependence on the astrochemical code employed. We model the ice chemistry in the Chamaeleon I cloud using a set of astrochemical codes (MAGICKAL, MONACO, Nautilus, UCLCHEM, and KMC simulations) to test the effects of the different code architectures and of the assumed ice chemistry. Our models show that the JWST ice observations are better reproduced for gas densities >1e5 cm-3 and collapse times >1e5 yr. CH3OH ice forms predominantly (>99%) via CO hydrogenation. The contribution of reactions CH3+OH and C+H2O is negligible. The CO2 ice may form either via CO+OH or CO+O depending on the code. However, KMC simulations reveal that both mechanisms are efficient despite the low rate constant of the CO+O surface reaction. CH4 is largely underproduced for all codes except for UCLCHEM, for which a higher amount of atomic C is available during the initial translucent cloud phase. Large differences in the ice abundances are found at Tdust<12 K between diffusive and non-diffusive chemistry codes. This is due to the fact that non-diffusive chemistry takes over diffusive chemistry at such low Tdust. This could explain the rather constant ice chemical composition found in Chamaeleon I and other dense cores despite the different visual extinctions probed.

Submitted 14 February, 2025; originally announced February 2025.

Comments: Accepted in A&A

Journal ref: A&A 695, A247 (2025)

arXiv:2502.09384 [pdf, other]

Predicting the detectability of sulphur-bearing molecules in the solid phase with simulated spectra of JWST instruments

Authors: A. Taillard, R. Martín-Doménech, H. Carrascosa, J. A. Noble, G. M. Muñoz Caro, E. Dartois, D. Navarro-Almaida, B. Escribano, A. Sanchez-Monge, A. Fuente

Abstract: To date, gas phase observations of sulphur in dense interstellar environments have only constrained the molecular carriers of 1% of its predicted cosmic abundance. An additional 5% is known to be locked up in molecular solids in dense clouds, leaving the main reservoir of depleted sulphur in the solid phase unknown. The spectral resolution and sensitivity of the JWST could make a substantial difference in detecting part of this missing sulphur, with its wavelength coverage that includes vibrational absorption features of the S-carriers H2S, OCS, SO2, CS2, SO, CS, and S8. The aim of this study is to determine whether these molecules may be viable candidates for detection. We carried out new laboratory measurements of the IR absorption spectra of CS2 and S8 to update the IR band strength of the most intense CS2 absorption feature at 6.8 μm, as well as to determine that of S8 at 20.3 μm for the first time. These data, along with values previously reported in the literature, allow us to evaluate which S-bearing species could be potentially detected with JWST in interstellar ices. Taking the literature abundances of the major ice species determined by previous IR observations towards starless cores, LYSOs and MYSOs, we generated simulated IR spectra using the characteristics of the instruments on the JWST. Thus, we have been able to establish a case study for three stages of the star formation process. We conclude that the detection of S-bearing molecules remains challenging. Despite these obstacles, the detection of H2S and potentially SO2 should be possible in regions with favourable physical and chemical conditions. In contrast, S8 would remain undetected. Although the sensitivity of JWST is insufficient to determine the sulphur budget in the solid state, the detection of an additional icy sulphur compound (H2S, SO2) would enable us to elevate our knowledge of sulphur chemistry.

Submitted 13 February, 2025; originally announced February 2025.

arXiv:2502.05710 [pdf, other]

SSDD-GAN: Single-Step Denoising Diffusion GAN for Cochlear Implant Surgical Scene Completion

Authors: Yike Zhang, Eduardo Davalos, Jack Noble

Abstract: Recent deep learning-based image completion methods, including both inpainting and outpainting, have demonstrated promising results in restoring corrupted images by effectively filling various missing regions. Among these, Generative Adversarial Networks (GANs) and Denoising Diffusion Probabilistic Models (DDPMs) have been employed as key generative image completion approaches, excelling in the field of generating high-quality restorations with reduced artifacts and improved fine details. In previous work, we developed a method aimed at synthesizing views from novel microscope positions for mastoidectomy surgeries; however, that approach did not have the ability to restore the surrounding surgical scene environment. In this paper, we propose an efficient method to complete the surgical scene of the synthetic postmastoidectomy dataset. Our approach leverages self-supervised learning on real surgical datasets to train a Single-Step Denoising Diffusion-GAN (SSDD-GAN), combining the advantages of diffusion models with the adversarial optimization of GANs for improved Structural Similarity results of 6%. The trained model is then directly applied to the synthetic postmastoidectomy dataset using a zero-shot approach, enabling the generation of realistic and complete surgical scenes without the need for explicit ground-truth labels from the synthetic postmastoidectomy dataset. This method addresses key limitations in previous work, offering a novel pathway for full surgical microscopy scene completion and enhancing the usability of the synthetic postmastoidectomy dataset in surgical preoperative planning and intraoperative navigation.

Submitted 8 February, 2025; originally announced February 2025.

arXiv:2412.14424 [pdf, other]

FedPIA -- Permuting and Integrating Adapters leveraging Wasserstein Barycenters for Finetuning Foundation Models in Multi-Modal Federated Learning

Authors: Pramit Saha, Divyanshu Mishra, Felix Wagner, Konstantinos Kamnitsas, J. Alison Noble

Abstract: Large Vision-Language Models typically require large text and image datasets for effective fine-tuning. However, collecting data from various sites, especially in healthcare, is challenging due to strict privacy regulations. An alternative is to fine-tune these models on end-user devices, such as in medical clinics, without sending data to a server. These local clients typically have limited computing power and small datasets, which are not enough for fully fine-tuning large VLMs on their own. A naive solution to these scenarios is to leverage parameter-efficient fine-tuning (PEFT) strategies and apply federated learning (FL) algorithms to combine the learned adapter weights, thereby respecting the resource limitations and data privacy. However, this approach does not fully leverage the knowledge from multiple adapters trained on diverse data distributions and for diverse tasks. The adapters are adversely impacted by data heterogeneity and task heterogeneity across clients resulting in suboptimal convergence. To this end, we propose a novel framework called FedPIA that improves upon the naive combinations of FL and PEFT by introducing Permutation and Integration of the local Adapters in the server and global Adapters in the clients exploiting Wasserstein barycenters for improved blending of client-specific and client-agnostic knowledge. This layerwise permutation helps to bridge the gap in the parameter space of local and global adapters before integration. We conduct over 2000 client-level experiments utilizing 48 medical image datasets across five different medical vision-language FL task settings encompassing visual question answering as well as image and report-based multi-label disease detection. Our experiments involving diverse client settings, ten different modalities, and two VLM backbones demonstrate that FedPIA consistently outperforms the state-of-the-art PEFT-FL baselines.

Submitted 18 December, 2024; originally announced December 2024.

Comments: Accepted for publication in AAAI 2025 (Main Track)

arXiv:2411.19651 [pdf, other]

doi 10.1051/0004-6361/202451505

Ice inventory towards the protostar Ced 110 IRS4 observed with the James Webb Space Telescope. Results from the ERS Ice Age program

Authors: W. R. M. Rocha, M. K. McClure, J. A. Sturm, T. L. Beck, Z. L. Smith, H. Dickinson, F. Sun, E. Egami, A. C. A. Boogert, H. J. Fraser, E. Dartois, I. Jimenez-Serra, J. A. Noble, J. Bergner, P. Caselli, S. B. Charnley, J. Chiar, L. Chu, I. Cooke, N. Crouzet, E. F. van Dishoeck, M. N. Drozdovskaya, R. Garrod, D. Harsono, S. Ioppolo, et al. (15 additional authors not shown)

Abstract: This work focuses on the ice features toward the binary protostellar system Ced 110 IRS 4A and 4B, observed with JWST as part of the Early Release Science Ice Age collaboration. We aim to explore the JWST observations of the binary protostellar system Ced 110 IRS4A and IRS4B to unveil and quantify the ice inventories toward these sources. We compare the ice abundances with those found for the same molecular cloud. The analysis is performed by fitting or comparing laboratory infrared spectra of ices to the observations. Spectral fits are carried out with the ENIIGMA fitting tool that searches for the best fit. For Ced 110 IRS4B, we detected the major ice species H$_2$O, CO, CO$_2$ and NH$_3$. All species are found in a mixture except for CO and CO$_2$, which have both mixed and pure ice components. In the case of Ced 110 IRS4A, we detected the same major species as in Ced 110 IRS4B, as well as the following minor species: CH$_4$, SO$_2$, CH$_3$OH, OCN$^-$, NH$_4^+$ and HCOOH. Tentative detections of N$_2$O ice (7.75 $μ$m), forsterite dust (11.2 $μ$m) and CH$_3^+$ gas emission (7.18 $μ$m) in the primary source are also presented. Compared with the two lines of sight toward background stars in the Chamaeleon I molecular cloud, the protostar has similar ice abundances, except in the case of the ions, which are higher in IRS4A. The clearest differences are the absence of the 7.2 and 7.4 $μ$m absorption features due to HCOO$^-$ and icy complex organic molecules in IRS4A and evidence of thermal processing in both IRS4A and IRS4B as probed by the CO$_2$ ice features. We conclude that the binary protostellar system Ced 110 IRS4A and IRS4B has a large inventory of icy species. The similar ice abundances in comparison to the starless regions in the same molecular cloud suggest that the chemical conditions of the protostar were set at earlier stages in the molecular cloud.

Submitted 29 November, 2024; originally announced November 2024.

Comments: 33 pages, 19 Figures. Accepted for publication in Astronomy & Astrophysics

Journal ref: A&A 693, A288 (2025)

arXiv:2411.16453 [pdf]

doi 10.1063/5.0223957

Observation of a core-excited dipole-bound state $\sim$1 eV above the electron detachment threshold in cryogenically cooled acetylacetonate

Authors: Rafael A Jara-Toro, Martín Taccone, Jordan Dezalay, Jennifer Noble, Gert von Helden, Gustavo Pino

Abstract: Dipole-bound states in anions exist when a polar neutral core binds an electron in a diffuse orbital through charge--dipole interaction. Electronically excited polar neutral cores can also bind an electron in a diffuse orbital to form Core-Excited Dipole-Bound States (CE-DBSs), which are difficult to observe because they usually lie above the electron detachment threshold, leading to very short lifetimes and, thus, unstructured transitions. We report here the photodetachment spectroscopy of cryogenically cooled acetylacetonate anion (C$_5$H$_7$O$_2^-$) recorded by detecting the neutral radical produced upon photodetachment and the infrared spectroscopy in He-nanodroplets. Two DBSs were identified in this anion. One of them lies close to the electron detachment threshold ($\sim$2.74 eV) and is associated with the ground state of the radical (D0-DBS). Surprisingly, the other DBS appears as resonant transitions at 3.69 eV and is assigned to the CE-DBS associated with the first excited state of the radical (D1-DBS). It is proposed that the resonant transitions of the D1-DBS are observed $\sim$1 eV above the detachment threshold because its lifetime is determined by the internal conversion to the D0-DBS, after which the fast electron detachment takes place.

Submitted 25 November, 2024; originally announced November 2024.

Journal ref: The Journal of Chemical Physics, 2024, 161 (8)

arXiv:2411.11912 [pdf, other]

F$^3$OCUS -- Federated Finetuning of Vision-Language Foundation Models with Optimal Client Layer Updating Strategy via Multi-objective Meta-Heuristics

Authors: Pramit Saha, Felix Wagner, Divyanshu Mishra, Can Peng, Anshul Thakur, David Clifton, Konstantinos Kamnitsas, J. Alison Noble

Abstract: Effective training of large Vision-Language Models (VLMs) on resource-constrained client devices in Federated Learning (FL) requires the usage of parameter-efficient fine-tuning (PEFT) strategies. To this end, we demonstrate the impact of two factors, viz., a client-specific layer importance score that selects the most important VLM layers for fine-tuning and an inter-client layer diversity score that encourages diverse layer selection across clients for optimal VLM layer selection. We first theoretically motivate and leverage the principal eigenvalue magnitude of layerwise Neural Tangent Kernels and show its effectiveness as a client-specific layer importance score. Next, we propose a novel layer updating strategy dubbed F$^3$OCUS that jointly optimizes the layer importance and diversity factors by employing a data-free, multi-objective, meta-heuristic optimization on the server. We explore 5 different meta-heuristic algorithms and compare their effectiveness for selecting model layers and adapter layers towards PEFT-FL. Furthermore, we release a new MedVQA-FL dataset involving overall 707,962 VQA triplets and 9 modality-specific clients and utilize it to train and evaluate our method. Overall, we conduct more than 10,000 client-level experiments on 6 Vision-Language FL task settings involving 58 medical image datasets and 4 different VLM architectures of varying sizes to demonstrate the effectiveness of the proposed method.

Submitted 30 March, 2025; v1 submitted 17 November, 2024; originally announced November 2024.

Comments: Accepted in CVPR 2025

arXiv:2410.18366 [pdf]

Cochlear Implantation of Slim Pre-curved Arrays using Automatic Pre-operative Insertion Plans

Authors: Kareem O. Tawfik, Mohammad M. R. Khan, Ankita Patro, Miriam R. Smetak, David Haynes, Robert F. Labadie, René H. Gifford, Jack H. Noble

Abstract: Hypothesis: Pre-operative cochlear implant (CI) electrode array (EL) insertion plans created by automated image analysis methods can improve positioning of slim pre-curved EL. Background: This study represents the first evaluation of a system for patient-customized EL insertion planning for a slim pre-curved EL. Methods: Twenty-one temporal bone specimens were divided into experimental and control groups and underwent cochlear implantation. For the control group, the surgeon performed a traditional insertion without an insertion plan. For the experimental group, customized insertion plans guided entry site, trajectory, curl direction, and base insertion depth. An additional 35 clinical insertions from the same surgeon were analyzed, 7 of which were conducted using the insertion plans. EL positioning was analyzed using post-operative imaging auto-segmentation techniques, allowing measurement of angular insertion depth (AID), mean modiolar distance (MMD), and scalar position. Results: In the cadaveric temporal bones, 3 scalar translocations, including 2 foldovers, occurred in 14 control group insertions. In the clinical insertions, translocations occurred in 2 of 28 control cases. No translocations or folds occurred in the 7 experimental temporal bone insertions or the 7 experimental clinical insertions. Among the non-translocated cases, overall AID and MMD were 401(41) degrees and 0.34(0.13) mm for the control insertions. AID and MMD for the experimental insertions were 424(43) degrees and 0.34(0.09) mm overall and were 432(19) degrees and 0.30(0.07) mm for cases where the planned insertion depth was achieved. Conclusions: Trends toward improved EL positioning within scala tympani were observed when EL insertion plans were used. Variability in MMD was significantly reduced (0.07 mm vs 0.13 mm, p=0.039) when the planned depth was achieved.

Submitted 23 October, 2024; originally announced October 2024.

Comments: First two listed authors are co-first authors

arXiv:2410.14169 [pdf, other]

DaRePlane: Direction-aware Representations for Dynamic Scene Reconstruction

Authors: Ange Lou, Benjamin Planche, Zhongpai Gao, Yamin Li, Tianyu Luan, Hao Ding, Meng Zheng, Terrence Chen, Ziyan Wu, Jack Noble

Abstract: Numerous recent approaches to modeling and re-rendering dynamic scenes leverage plane-based explicit representations, addressing slow training times associated with models like neural radiance fields (NeRF) and Gaussian splatting (GS). However, merely decomposing 4D dynamic scenes into multiple 2D plane-based representations is insufficient for high-fidelity re-rendering of scenes with complex motions. In response, we present DaRePlane, a novel direction-aware representation approach that captures scene dynamics from six different directions. This learned representation undergoes an inverse dual-tree complex wavelet transformation (DTCWT) to recover plane-based information. Within NeRF pipelines, DaRePlane computes features for each space-time point by fusing vectors from these recovered planes, which are then passed to a tiny MLP for color regression. When applied to Gaussian splatting, DaRePlane computes the features of Gaussian points, followed by a tiny multi-head MLP for spatial-time deformation prediction. Notably, to address redundancy introduced by the six real and six imaginary direction-aware wavelet coefficients, we introduce a trainable masking approach, mitigating storage issues without significant performance decline. To demonstrate the generality and efficiency of DaRePlane, we test it on both regular and surgical dynamic scenes, for both NeRF and GS systems. Extensive experiments show that DaRePlane yields state-of-the-art performance in novel view synthesis for various complex dynamic scenes.

Submitted 18 October, 2024; originally announced October 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2403.02265

arXiv:2410.07434 [pdf]

Surgical Depth Anything: Depth Estimation for Surgical Scenes using Foundation Models

Authors: Ange Lou, Yamin Li, Yike Zhang, Jack Noble

Abstract: Monocular depth estimation is crucial for tracking and reconstruction algorithms, particularly in the context of surgical videos. However, the inherent challenges in directly obtaining ground truth depth maps during surgery render supervised learning approaches impractical. While many self-supervised methods based on Structure from Motion (SfM) have shown promising results, they rely heavily on high-quality camera motion and require optimization on a per-patient basis. These limitations can be mitigated by leveraging the current state-of-the-art foundational model for depth estimation, Depth Anything. However, when directly applied to surgical scenes, Depth Anything struggles with issues such as blurring, bleeding, and reflections, resulting in suboptimal performance. This paper presents a fine-tuning of the Depth Anything model specifically for the surgical domain, aiming to deliver more accurate pixel-wise depth maps tailored to the unique requirements and challenges of surgical environments. Our fine-tuning approach significantly improves the model's performance in surgical scenes, reducing errors related to blurring and reflections, and achieving a more reliable and precise depth estimation.

Submitted 9 October, 2024; originally announced October 2024.

arXiv:2409.08117 [pdf, other]

JWST ice band profiles reveal mixed ice compositions in the HH 48 NE disk

Authors: Jennifer B. Bergner, J. A. Sturm, Elettra L. Piacentino, M. K. McClure, Karin I. Oberg, A. C. A. Boogert, E. Dartois, M. N. Drozdovskaya, H. J. Fraser, Daniel Harsono, Sergio Ioppolo, Charles J. Law, Dariusz C. Lis, Brett A. McGuire, Gary J. Melnick, Jennifer A. Noble, M. E. Palumbo, Yvonne J. Pendleton, Giulia Perotti, Danna Qasim, W. R. M. Rocha, E. F. van Dishoeck

Abstract: Planet formation is strongly influenced by the composition and distribution of volatiles within protoplanetary disks. With JWST, it is now possible to obtain direct observational constraints on disk ices, as recently demonstrated by the detection of ice absorption features towards the edge-on HH 48 NE disk as part of the Ice Age Early Release Science program. Here, we introduce a new radiative transfer modeling framework designed to retrieve the composition and mixing status of disk ices using their band profiles, and apply it to interpret the H2O, CO2, and CO ice bands observed towards the HH 48 NE disk. We show that the ices are largely present as mixtures, with strong evidence for CO trapping in both H2O and CO2 ice. The HH 48 NE disk ice composition (pure vs. polar vs. apolar fractions) is markedly different from earlier protostellar stages, implying thermal and/or chemical reprocessing during the formation or evolution of the disk. We infer low ice-phase C/O ratios around 0.1 throughout the disk, and also demonstrate that the mixing and entrapment of disk ices can dramatically affect the radial dependence of the C/O ratio. It is therefore imperative that realistic disk ice compositions are considered when comparing planetary compositions with potential formation scenarios, which will fortunately be possible for an increasing number of disks with JWST.

Submitted 12 September, 2024; originally announced September 2024.

Comments: Accepted to ApJ. 24 pages, 15 figures

arXiv:2409.03190 [pdf, other]

Post-mastoidectomy Surface Multi-View Synthesis from a Single Microscopy Image

Authors: Yike Zhang, Jack Noble

Abstract: Cochlear Implant (CI) procedures involve performing an invasive mastoidectomy to insert an electrode array into the cochlea. In this paper, we introduce a novel pipeline that is capable of generating synthetic multi-view videos from a single CI microscope image. In our approach, we use a patient's pre-operative CT scan to predict the post-mastoidectomy surface using a method designed for this purpose. We manually align the surface with a selected microscope frame to obtain an accurate initial pose of the reconstructed CT mesh relative to the microscope. We then perform UV projection to transfer the colors from the frame to surface textures. Novel views of the textured surface can be used to generate a large dataset of synthetic frames with ground truth poses. We evaluated the quality of synthetic views rendered using PyTorch3D and PyVista. We found that both rendering engines lead to similarly high-quality synthetic novel-view frames compared to ground truth, with a structural similarity index for both methods averaging about 0.86. A large dataset of novel views with known poses is critical for ongoing training of a method to automatically estimate microscope pose for 2D to 3D registration with the pre-operative CT to facilitate augmented reality surgery. This dataset will empower various downstream tasks, such as integrating Augmented Reality (AR) in the OR, tracking surgical tools, and supporting other video analysis studies.

Submitted 28 February, 2025; v1 submitted 31 August, 2024; originally announced September 2024.

Comments: Submitted to Medical Imaging 2025: Image-Guided Procedures, Robotic Interventions, and Modeling

arXiv:2408.09931 [pdf, other]

Pose-GuideNet: Automatic Scanning Guidance for Fetal Head Ultrasound from Pose Estimation

Authors: Qianhui Men, Xiaoqing Guo, Aris T. Papageorghiou, J. Alison Noble

Abstract: 3D pose estimation from a 2D cross-sectional view enables healthcare professionals to navigate through the 3D space, and such techniques initiate automatic guidance in many image-guided radiology applications. In this work, we investigate how estimating 3D fetal pose from freehand 2D ultrasound scanning can guide a sonographer to locate a head standard plane. Fetal head pose is estimated by the proposed Pose-GuideNet, a novel 2D/3D registration approach to align freehand 2D ultrasound to a 3D anatomical atlas without the acquisition of 3D ultrasound. To facilitate the 2D to 3D cross-dimensional projection, we exploit the prior knowledge in the atlas to align the standard plane frame in a freehand scan. A semantic-aware contrastive-based approach is further proposed to align the frames that are off standard planes based on their anatomical similarity. In the experiment, we enhance the existing assessment of freehand image localization by comparing the transformation of its estimated pose towards standard plane with the corresponding probe motion, which reflects the actual view change in 3D anatomy. Extensive results on two clinical head biometry tasks show that Pose-GuideNet not only accurately predicts pose but also successfully predicts the direction of the fetal head. Evaluations with probe motions further demonstrate the feasibility of adopting Pose-GuideNet for freehand ultrasound-assisted navigation in a sensor-free environment.

Submitted 19 August, 2024; originally announced August 2024.

Comments: Accepted by MICCAI2024

arXiv:2408.08652 [pdf, other]

TextCAVs: Debugging vision models using text

Authors: Angus Nicolson, Yarin Gal, J. Alison Noble

Abstract: Concept-based interpretability methods are a popular form of explanation for deep learning models which provide explanations in the form of high-level human interpretable concepts. These methods typically find concept activation vectors (CAVs) using a probe dataset of concept examples. This requires labelled data for these concepts -- an expensive task in the medical domain. We introduce TextCAVs: a novel method which creates CAVs using vision-language models such as CLIP, allowing for explanations to be created solely using text descriptions of the concept, as opposed to image exemplars. This reduced cost in testing concepts allows for many concepts to be tested and for users to interact with the model, testing new ideas as they are thought of, rather than a delay caused by image collection and annotation. In early experimental results, we demonstrate that TextCAVs produces reasonable explanations for a chest x-ray dataset (MIMIC-CXR) and natural images (ImageNet), and that these explanations can be used to debug deep learning-based models.

Submitted 16 August, 2024; originally announced August 2024.

Comments: 11 pages, 2 figures. Accepted at iMIMIC Workshop at MICCAI 2024

ACM Class: I.2.1; I.2.6

arXiv:2408.03761 [pdf, other]

MMSummary: Multimodal Summary Generation for Fetal Ultrasound Video

Authors: Xiaoqing Guo, Qianhui Men, J. Alison Noble

Abstract: We present the first automated multimodal summary generation system, MMSummary, for medical imaging video, particularly with a focus on fetal ultrasound analysis. Imitating the examination process performed by a human sonographer, MMSummary is designed as a three-stage pipeline, progressing from keyframe detection to keyframe captioning and finally anatomy segmentation and measurement. In the keyframe detection stage, an innovative automated workflow is proposed to progressively select a concise set of keyframes, preserving sufficient video information without redundancy. Subsequently, we adapt a large language model to generate meaningful captions for fetal ultrasound keyframes in the keyframe captioning stage. If a keyframe is captioned as fetal biometry, the segmentation and measurement stage estimates biometric parameters by segmenting the region of interest according to the textual prior. The MMSummary system provides comprehensive summaries for fetal ultrasound examinations and based on reported experiments is estimated to reduce scanning time by approximately 31.5%, thereby suggesting the potential to enhance clinical workflow efficiency.

Submitted 30 October, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

Comments: MICCAI 2024

arXiv:2408.01648 [pdf]

Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2

Authors: Ange Lou, Yamin Li, Yike Zhang, Robert F. Labadie, Jack Noble

Abstract: The Segment Anything Model 2 (SAM 2) is the latest generation foundation model for image and video segmentation. Trained on the expansive Segment Anything Video (SA-V) dataset, which comprises 35.5 million masks across 50.9K videos, SAM 2 advances its predecessor's capabilities by supporting zero-shot segmentation through various prompts (e.g., points, boxes, and masks). Its robust zero-shot performance and efficient memory usage make SAM 2 particularly appealing for surgical tool segmentation in videos, especially given the scarcity of labeled data and the diversity of surgical procedures. In this study, we evaluate the zero-shot video segmentation performance of the SAM 2 model across different types of surgeries, including endoscopy and microscopy. We also assess its performance on videos featuring single and multiple tools of varying lengths to demonstrate SAM 2's applicability and effectiveness in the surgical domain. We found that: 1) SAM 2 demonstrates a strong capability for segmenting various surgical videos; 2) When new tools enter the scene, additional prompts are necessary to maintain segmentation accuracy; and 3) Specific challenges inherent to surgical videos can impact the robustness of SAM 2.

Submitted 2 August, 2024; originally announced August 2024.

Comments: The first work evaluates the performance of SAM 2 in surgical videos

arXiv:2407.15787 [pdf, other]

Self-supervised Mamba-based Mastoidectomy Shape Prediction for Cochlear Implant Surgery

Authors: Yike Zhang, Eduardo Davalos, Dingjie Su, Ange Lou, Jack H. Noble

Abstract: Cochlear Implant (CI) procedures require the insertion of an electrode array into the cochlea within the inner ear. To achieve this, mastoidectomy, a surgical procedure involving the removal of part of the mastoid region of the temporal bone using a high-speed drill, provides safe access to the cochlea through the middle and inner ear. In this paper, we propose a novel Mamba-based method to synthesize the mastoidectomy volume using only preoperative Computed Tomography (CT) scans, where the mastoid remains intact. Our approach introduces a self-supervised learning framework designed to predict the mastoidectomy shape and reconstruct a 3D post-mastoidectomy surface directly from preoperative CT scans. This reconstruction aligns with intraoperative microscope views, enabling various downstream surgical applications. For training, we leverage postoperative CT scans to bypass manual data cleaning and labeling, even when the region removed during mastoidectomy is affected by challenges such as metal artifacts, low signal-to-noise ratio, or electrode wiring. Our method achieves a mean Dice score of 0.70 in estimating mastoidectomy regions, demonstrating its effectiveness for accurate and efficient surgical preoperative planning.

Submitted 28 February, 2025; v1 submitted 22 July, 2024; originally announced July 2024.
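
For readers unfamiliar with the Dice score quoted in the abstract above, the following is a minimal sketch of how the overlap between a predicted and a reference binary mask is typically computed. The function name `dice_score`, the flat-list mask representation, and the empty-mask convention are illustrative assumptions, not taken from the paper.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Masks are flat sequences of 0/1 voxels (an assumption made to keep
    the sketch dependency-free; real pipelines usually use 3D arrays).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a common
        # convention, assumed here rather than taken from the paper).
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted voxels, 3 reference voxels, 2 shared.
pred = [1, 1, 1, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0]
score = dice_score(pred, target)
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they share no voxels, so a mean of 0.70 indicates substantial but imperfect overlap.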

arXiv:2407.09627 [pdf, other]

doi 10.1051/0004-6361/202450865

A JWST/MIRI analysis of the ice distribution and PAH emission in the protoplanetary disk HH 48 NE

Authors: J. A. Sturm, M. K. McClure, D. Harsono, J. B. Bergner, E. Dartois, A. C. A. Boogert, M. A. Cordiner, M. N. Drozdovskaya, S. Ioppolo, C. J. Law, D. C. Lis, B. A. McGuire, G. J. Melnick, J. A. Noble, K. I. Öberg, M. E. Palumbo, Y. J. Pendleton, G. Perotti, W. R. M. Rocha, R. G. Urso, E. F. van Dishoeck

Abstract: Ice-coated dust grains provide the main reservoir of volatiles that play an important role in planet formation processes and may become incorporated into planetary atmospheres. However, due to observational challenges, the ice abundance distribution in protoplanetary disks is not well constrained. We present JWST/MIRI observations of the edge-on disk HH 48 NE carried out as part of the IRS program Ice Age. We detect CO$_2$, NH$_3$, H$_2$O and tentatively CH$_4$ and NH$_4^+$. Radiative transfer models suggest that ice absorption features are produced predominantly in the 50-100 au region of the disk. The CO$_2$ feature at 15 micron probes a region closer to the midplane (z/r = 0.1-0.15) than the corresponding feature at 4.3 micron (z/r = 0.2-0.6), but all observations trace regions significantly above the midplane reservoirs where we expect the bulk of the ice mass to be located. Ices must reach a high scale height (z/r ~ 0.6; corresponding to modeled dust extinction Av ~ 0.1), in order to be consistent with the observed vertical distribution of the peak ice optical depths. The weakness of the CO$_2$ feature at 15 micron relative to the 4.3 micron feature and the red emission wing of the 4.3 micron CO$_2$ feature are both consistent with ices being located at high elevation in the disk. The retrieved NH$_3$ abundance and the upper limit on the CH$_3$OH abundance relative to H$_2$O are significantly lower than those in the interstellar medium (ISM), but consistent with cometary observations.
Full wavelength coverage is required to properly study the abundance distribution of ices in disks. To explain the presence of ices at high disk altitudes, we propose two possible scenarios: a disk wind that entrains sufficient amounts of dust, thus blocking part of the stellar UV radiation, or vertical mixing that cycles enough ices into the upper disk layers to balance ice photodesorption.

Submitted 12 July, 2024; originally announced July 2024.

Comments: 16 pages, 11 figures, accepted in A&A

Journal ref: A&A 689, A92 (2024)

arXiv:2406.17596 [pdf, other]

doi 10.1039/D4CP01177E

Flux and fluence effects on the Vacuum-UV photodesorption and photoprocessing of CO$_2$ ices

Authors: Antoine B. Hacquard, Daniela Torres-Diaz, Romain Basalgète, Delfina Toulouse, Géraldine Féraud, Samuel Del Fré, Jennifer A. Noble, Laurent Philippe, Xavier Michaut, Jean-Hugues Fillion, Anne Lafosse, Lionel Amiaud, Mathieu Bertin

Abstract: CO$_2$ is a major component of the icy mantles surrounding dust grains in planet and star formation regions. Understanding its photodesorption is crucial for explaining gas phase abundances in the coldest environments of the interstellar medium irradiated by vacuum-UV (VUV) photons. Photodesorption yields determined experimentally from CO$_2$ samples grown at low temperatures (T=15~K) have been found to be very sensitive to experimental methods and conditions. Several mechanisms have been suggested for explaining the desorption of CO$_2$, O$_2$ and CO from CO$_2$ ices. In the present study, the cross sections characterizing the dynamics of photodesorption as a function of photon fluence (determined from released molecules in the gas phase) and of ice composition modification (determined in situ in the solid phase) are compared for the first time for different photon flux conditions (from 7.3$\times 10^{12}$~photon/s/cm$^2$ to 2.2$\times 10^{14}$~photon/s/cm$^2$) using monochromatic synchrotron radiation in the VUV range (on the DESIRS beamline at SOLEIL). This approach reveals that CO and O$_2$ desorption are decorrelated from that of CO$_2$. CO and O$_2$ photodesorption yields depend on photon flux conditions and can be linked to surface chemistry. By contrast, the photodesorption yield of CO$_2$ is independent of the photon flux conditions and can be linked to bulk ice chemical modification, consistent with an indirect desorption induced by electronic transition (DIET) process.

Submitted 25 June, 2024; originally announced June 2024.

Comments: 11 pages, 7 figures

Journal ref: Phys. Chem. Chem. Phys., 2024

arXiv:2406.11636 [pdf, other]

Feasibility of Federated Learning from Client Databases with Different Brain Diseases and MRI Modalities

Authors: Felix Wagner, Wentian Xu, Pramit Saha, Ziyun Liang, Daniel Whitehouse, David Menon, Virginia Newcombe, Natalie Voets, J. Alison Noble, Konstantinos Kamnitsas

Abstract: Segmentation models for brain lesions in MRI are typically developed for a specific disease and trained on data with a predefined set of MRI modalities. Such models cannot segment the disease using data with a different set of MRI modalities, nor can they segment other types of diseases. Moreover, this training paradigm prevents a model from using the advantages of learning from heterogeneous databases that may contain scans and segmentation labels for different brain pathologies and diverse sets of MRI modalities. Additionally, the confidentiality of patient data often prevents central data aggregation, necessitating a decentralized approach. Is it feasible to use Federated Learning (FL) to train a single model on client databases that contain scans and labels of different brain pathologies and diverse sets of MRI modalities? We demonstrate promising results by combining appropriate, simple, and practical modifications to the model and training strategy: Designing a model with input channels that cover the whole set of modalities available across clients, training with random modality drop, and exploring the effects of feature normalization methods. Evaluation on 7 brain MRI databases with 5 different diseases shows that this FL framework can train a single model achieving very promising results in segmenting all disease types seen during training. Importantly, it can segment these diseases in new databases that contain sets of modalities different from those in training clients.
These results demonstrate, for the first time, the feasibility and effectiveness of using FL to train a single 3D segmentation model on decentralised data with diverse brain diseases and MRI modalities, a necessary step towards leveraging heterogeneous real-world databases. Code: https://github.com/FelixWag/FedUniBrain

Submitted 19 November, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted as a conference paper at WACV 2025

ACM Class: I.4.9; I.4.6; I.2.11; I.4.0
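
The "random modality drop" mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' implementation (see the linked FedUniBrain repository for that); it only assumes that each client fills a fixed set of input channels covering every modality seen across clients, and that dropped or missing modalities are zero-filled. The names `random_modality_drop`, `drop_prob`, and the modality list are hypothetical.

```python
import random


def random_modality_drop(channels, available, rng, drop_prob=0.5):
    """Zero out a random subset of a client's modalities, keeping at least one.

    channels:  one entry per modality in the union across clients; each entry
               is a flat list of floats standing in for an image volume.
    available: booleans, True where this client actually has the modality.
    rng:       a random.Random instance, passed in for reproducibility.
    """
    present = [i for i, has in enumerate(available) if has]
    if not present:
        raise ValueError("client has no modalities")
    # Keep each present modality with probability 1 - drop_prob.
    keep = {i for i in present if rng.random() >= drop_prob}
    if not keep:
        # Never drop everything: fall back to one randomly chosen modality.
        keep = {rng.choice(present)}
    return [
        list(ch) if i in keep else [0.0] * len(ch)
        for i, ch in enumerate(channels)
    ]


# Hypothetical union of modalities across clients: T1, T2, FLAIR, DWI.
client_scan = [[0.2, 0.4], [0.0, 0.0], [0.9, 0.1], [0.0, 0.0]]
client_has = [True, False, True, False]  # this client only has T1 and FLAIR
dropped = random_modality_drop(client_scan, client_has, random.Random(0))
```

Training on inputs like `dropped` exposes the model to many modality subsets, which is one plausible reading of why the abstract reports generalisation to databases with unseen modality combinations.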

arXiv:2406.02422 [pdf, other]

IterMask2: Iterative Unsupervised Anomaly Segmentation via Spatial and Frequency Masking for Brain Lesions in MRI

Authors: Ziyun Liang, Xiaoqing Guo, J. Alison Noble, Konstantinos Kamnitsas

Abstract: Unsupervised anomaly segmentation approaches to pathology segmentation train a model on images of healthy subjects, that they define as the 'normal' data distribution. At inference, they aim to segment any pathologies in new images as 'anomalies', as they exhibit patterns that deviate from those in 'normal' training data. Prevailing methods follow the 'corrupt-and-reconstruct' paradigm. They intentionally corrupt an input image, reconstruct it to follow the learned 'normal' distribution, and subsequently segment anomalies based on reconstruction error. Corrupting an input image, however, inevitably leads to suboptimal reconstruction even of normal regions, causing false positives. To alleviate this, we propose a novel iterative spatial mask-refining strategy IterMask2. We iteratively mask areas of the image, reconstruct them, and update the mask based on reconstruction error. This iterative process progressively adds information about areas that are confidently normal as per the model. The increasing content guides reconstruction of nearby masked areas, improving reconstruction of normal tissue under these areas, reducing false positives. We also use high-frequency image content as an auxiliary input to provide additional structural information for masked areas. This further improves reconstruction error of normal in comparison to anomalous areas, facilitating segmentation of the latter. We conduct experiments on several brain lesion datasets and demonstrate effectiveness of our method. Code is available at: https://github.com/ZiyunLiang/IterMask2

Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2404.03713 [pdf, other]

Explaining Explainability: Recommendations for Effective Use of Concept Activation Vectors

Authors: Angus Nicolson, Lisa Schut, J. Alison Noble, Yarin Gal

Abstract: Concept-based explanations translate the internal representations of deep learning models into a language that humans are familiar with: concepts. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars. In this work, we investigate three properties of CAVs: (1) inconsistency across layers, (2) entanglement with other concepts, and (3) spatial dependency. Each property provides both challenges and opportunities in interpreting models. We introduce tools designed to detect the presence of these properties, provide insight into how each property can lead to misleading explanations, and provide recommendations to mitigate their impact. To demonstrate practical applications, we apply our recommendations to a melanoma classification task, showing how entanglement can lead to uninterpretable results and that the choice of negative probe set can have a substantial impact on the meaning of a CAV. Further, we show that understanding these properties can be used to our advantage. For example, we introduce spatially dependent CAVs to test if a model is translation invariant with respect to a specific concept and class. Our experiments are performed on natural images (ImageNet), skin lesions (ISIC 2019), and a new synthetic dataset, Elements. Elements is designed to capture a known ground truth relationship between concepts and classes. We release this dataset to facilitate further research in understanding and evaluating interpretability methods.

Submitted 13 February, 2025; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: Accepted by Transactions on Machine Learning Research (02/2025)

ACM Class: I.2.6

arXiv:2403.07219 [pdf, other]

Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery

Authors: Yike Zhang, Eduardo Davalos, Dingjie Su, Ange Lou, Jack H. Noble

Abstract: For those experiencing severe-to-profound sensorineural hearing loss, the cochlear implant (CI) is the preferred treatment. Augmented reality (AR) aided surgery can potentially improve CI procedures and hearing outcomes. Typically, AR solutions for image-guided surgery rely on optical tracking systems to register pre-operative planning information to the display so that hidden anatomy or other important information can be overlaid and co-registered with the view of the surgical scene. In this paper, our goal is to develop a method that permits direct 2D-to-3D registration of the microscope video to the pre-operative Computed Tomography (CT) scan without the need for external tracking equipment. Our proposed solution involves using surface mapping of a portion of the incus in surgical recordings and determining the pose of this structure relative to the surgical microscope by performing pose estimation via the perspective-n-point (PnP) algorithm. This registration can then be applied to pre-operative segmentations of other anatomy-of-interest, as well as the planned electrode insertion trajectory to co-register this information for the AR display. Our results demonstrate the accuracy with an average rotation error of less than 25 degrees and a translation error of less than 2 mm, 3 mm, and 0.55% for the x, y, and z axes, respectively. Our proposed method has the potential to be applicable and generalized to other surgical procedures while only needing a monocular microscope during intra-operation.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.02265 [pdf, other]

DaReNeRF: Direction-aware Representation for Dynamic Scenes

Authors: Ange Lou, Benjamin Planche, Zhongpai Gao, Yamin Li, Tianyu Luan, Hao Ding, Terrence Chen, Jack Noble, Ziyan Wu

Abstract: Addressing the intricate challenge of modeling and re-rendering dynamic scenes, most recent approaches have sought to simplify these complexities using plane-based explicit representations, overcoming the slow training time issues associated with methods like Neural Radiance Fields (NeRF) and implicit representations. However, the straightforward decomposition of 4D dynamic scenes into multiple 2D plane-based representations proves insufficient for re-rendering high-fidelity scenes with complex motions. In response, we present a novel direction-aware representation (DaRe) approach that captures scene dynamics from six different directions. This learned representation undergoes an inverse dual-tree complex wavelet transformation (DTCWT) to recover plane-based information. DaReNeRF computes features for each space-time point by fusing vectors from these recovered planes. Combining DaReNeRF with a tiny MLP for color regression and leveraging volume rendering in training yield state-of-the-art performance in novel view synthesis for complex dynamic scenes. Notably, to address redundancy introduced by the six real and six imaginary direction-aware wavelet coefficients, we introduce a trainable masking approach, mitigating storage issues without significant performance decline. Moreover, DaReNeRF maintains a 2x reduction in training time compared to prior art while delivering superior performance.

Submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted at CVPR 2024. Paper + supplementary material

arXiv:2402.12256 [pdf, other]

JWST MIRI MRS Images Disk Winds, Water, and CO in an Edge-On Protoplanetary Disk

Authors: Nicole Arulanantham, M. K. McClure, Klaus Pontoppidan, Tracy L. Beck, J. A. Sturm, D. Harsono, A. C. A. Boogert, M. Cordiner, E. Dartois, M. N. Drozdovskaya, C. Espaillat, G. J. Melnick, J. A. Noble, M. E. Palumbo, Y. J. Pendleton, H. Terada, E. F. van Dishoeck

Abstract: We present JWST MIRI MRS observations of the edge-on protoplanetary disk around the young sub-solar mass star Tau 042021, acquired as part of the Cycle 1 GO program "Mapping Inclined Disk Astrochemical Signatures (MIDAS)." These data resolve the mid-IR spatial distributions of H$_2$, revealing X-shaped emission extending to ~200 au above the disk midplane with a semi-opening angle of $35 \pm 5$ degrees. We do not velocity-resolve the gas in the spectral images, but the measured semi-opening angle of the H$_2$ is consistent with an MHD wind origin. A collimated, bipolar jet is seen in forbidden emission lines from [Ne II], [Ne III], [Ni II], [Fe II], [Ar II], and [S III]. Extended H$_2$O and CO emission lines are also detected, reaching diameters between ~90 and 190 au, respectively. Hot molecular emission is not expected at such radii, and we interpret its extended spatial distribution as scattering of inner disk molecular emission by dust grains in the outer disk surface. H I recombination lines, characteristic of inner disk accretion shocks, are similarly extended, and are likely also scattered light from the innermost star-disk interface. Finally, we detect extended PAH emission at 11.3 microns co-spatial with the scattered light continuum, making this the first low-mass T Tauri star around which extended PAHs have been confirmed, to our knowledge.
MIRI MRS line images of edge-on disks provide an unprecedented window into the outflow, accretion, and scattering processes within protoplanetary disks, allowing us to constrain the disk lifetimes and accretion and mass loss mechanisms.

Submitted 20 March, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Accepted for publication in ApJL on March 13th, 2024

arXiv:2402.10728 [pdf, other]

Semi-weakly-supervised neural network training for medical image registration

Authors: Yiwen Li, Yunguan Fu, Iani J. M. B. Gayo, Qianye Yang, Zhe Min, Shaheer U. Saeed, Wen Yan, Yipei Wang, J. Alison Noble, Mark Emberton, Matthew J. Clarkson, Dean C. Barratt, Victor A. Prisacariu, Yipeng Hu

Abstract: For training registration networks, weak supervision from segmented corresponding regions-of-interest (ROIs) has been proven effective for (a) supplementing unsupervised methods, and (b) being used independently in registration tasks in which unsupervised losses are unavailable or ineffective. This correspondence-informing supervision entails cost in annotation that requires significant specialised effort. This paper describes a semi-weakly-supervised registration pipeline that improves the model performance, when only a small corresponding-ROI-labelled dataset is available, by exploiting unlabelled image pairs. We examine two types of augmentation methods by perturbation on network weights and image resampling, such that consistency-based unsupervised losses can be applied on unlabelled data. The novel WarpDDF and RegCut approaches are proposed to allow commutative perturbation between an image pair and the predicted spatial transformation (i.e. respective input and output of registration networks), distinct from existing perturbation methods for classification or segmentation. Experiments using 589 male pelvic MR images, labelled with eight anatomical ROIs, show the improvement in registration performance and the ablated contributions from the individual strategies.
Furthermore, this study attempts to construct one of the first computational atlases for pelvic structures, enabled by registering inter-subject MRs, and quantifies the significant differences due to the proposed semi-weak supervision with a discussion on the potential clinical use of example atlas-derived statistics.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10425 [pdf]

DABS-LS: Deep Atlas-Based Segmentation Using Regional Level Set Self-Supervision

Authors: Hannah G. Mason, Jack H. Noble

Abstract: Cochlear implants (CIs) are neural prosthetics used to treat patients with severe-to-profound hearing loss. Patient-specific modeling of CI stimulation of the auditory nerve fibers (ANFs) can help audiologists improve the CI programming. These models require localization of the ANFs relative to surrounding anatomy and the CI. Localization is challenging because the ANFs are so small they are not directly visible in clinical imaging. In this work, we hypothesize the position of the ANFs can be accurately inferred from the location of the internal auditory canal (IAC), which has high contrast in CT, since the ANFs pass through this canal between the cochlea and the brain. Inspired by VoxelMorph, in this paper we propose a deep atlas-based IAC segmentation network. We create a single atlas in which the IAC and ANFs are pre-localized. Our network is trained to produce deformation fields (DFs) mapping coordinates from the atlas to new target volumes and that accurately segment the IAC. We hypothesize that DFs that accurately segment the IAC in target images will also facilitate accurate atlas-based localization of the ANFs. As opposed to VoxelMorph, which aims to produce DFs that accurately register the entire volume, our novel contribution is an entirely self-supervised training scheme that aims to produce DFs that accurately segment the target structure. This self-supervision is facilitated using a regional level set (LS) inspired loss function. We call our method Deep Atlas Based Segmentation using Level Sets (DABS-LS).
Results show that DABS-LS outperforms VoxelMorph for IAC segmentation. Tests with publicly available datasets for trachea and kidney segmentation also show significant improvement in segmentation accuracy, demonstrating the generalizability of the method.

Submitted 15 February, 2024; originally announced February 2024.

arXiv:2402.05294 [pdf, other]

Examining Modality Incongruity in Multimodal Federated Learning for Medical Vision and Language-based Disease Detection

Authors: Pramit Saha, Divyanshu Mishra, Felix Wagner, Konstantinos Kamnitsas, J. Alison Noble

Abstract: Multimodal Federated Learning (MMFL) utilizes multiple modalities in each client to build a more powerful Federated Learning (FL) model than its unimodal counterpart. However, the impact of missing modality in different clients, also called modality incongruity, has been greatly overlooked. This paper, for the first time, analyses the impact of modality incongruity and reveals its connection with data heterogeneity across participating clients. We particularly inspect whether incongruent MMFL with unimodal and multimodal clients is more beneficial than unimodal FL. Furthermore, we examine three potential routes of addressing this issue. Firstly, we study the effectiveness of various self-attention mechanisms towards incongruity-agnostic information fusion in MMFL. Secondly, we introduce a modality imputation network (MIN) pre-trained in a multimodal client for modality translation in unimodal clients and investigate its potential towards mitigating the missing modality problem. Thirdly, we assess the capability of client-level and server-level regularization techniques towards mitigating modality incongruity effects. Experiments are conducted under several MMFL settings on two publicly available real-world datasets, MIMIC-CXR and Open-I, with Chest X-Ray and radiology reports.

Submitted 7 February, 2024; originally announced February 2024.

Comments: 42 pages

arXiv:2402.00247 [pdf, other]

doi 10.1145/3643763

Towards AI-Assisted Synthesis of Verified Dafny Methods

Authors: Md Rakib Hossain Misu, Cristina V. Lopes, Iris Ma, James Noble

Abstract: Large language models show great promise in many domains, including programming. A promise is easy to make but hard to keep, and language models often fail to keep their promises, generating erroneous code. A promising avenue to keep models honest is to incorporate formal verification: generating programs' specifications as well as code so that the code can be proved correct with respect to the specifications. Unfortunately, existing large language models show a severe lack of proficiency in verified programming. In this paper, we demonstrate how to improve two pretrained models' proficiency in the Dafny verification-aware language. Using 178 problems from the MBPP dataset, we prompt two contemporary models (GPT-4 and PaLM-2) to synthesize Dafny methods. We use three different types of prompts: a direct Contextless prompt; a Signature prompt that includes a method signature and test cases; and a Chain of Thought (CoT) prompt that decomposes the problem into steps and includes retrieval augmentation generated example problems and solutions. Our results show that GPT-4 performs better than PaLM-2 on these tasks and that both models perform best with the retrieval augmentation generated CoT prompt. GPT-4 was able to generate verified, human-evaluated, Dafny methods for 58% of the problems; however, GPT-4 managed only 19% of the problems with the Contextless prompt, and even fewer (10%) for the Signature prompt. We are thus able to contribute 153 verified Dafny solutions to MBPP problems, 50 that we wrote manually, and 103 synthesized by GPT-4.
Our results demonstrate that the benefits of formal program verification are now within reach of code generating large language models...

Submitted 10 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: This is an author provided preprint. The final version will be published at Proc. ACM Softw. Eng; FSE 2024, in July 2024

arXiv:2311.00469 [pdf, other]

Dual Conditioned Diffusion Models for Out-Of-Distribution Detection: Application to Fetal Ultrasound Videos

Authors: Divyanshu Mishra, He Zhao, Pramit Saha, Aris T. Papageorghiou, J. Alison Noble

Abstract: Out-of-distribution (OOD) detection is essential to improve the reliability of machine learning models by detecting samples that do not belong to the training distribution. Detecting OOD samples effectively in certain tasks can pose a challenge because of the substantial heterogeneity within the in-distribution (ID), and the high structural similarity between ID and OOD classes. For instance, when detecting heart views in fetal ultrasound videos there is a high structural similarity between the heart and other anatomies such as the abdomen, and large in-distribution variance as a heart has 5 distinct views and structural variations within each view. To detect OOD samples in this context, the resulting model should generalise to the intra-anatomy variations while rejecting similar OOD samples. In this paper, we introduce dual-conditioned diffusion models (DCDM) where we condition the model on in-distribution class information and latent features of the input image for reconstruction-based OOD detection. This constrains the generative manifold of the model to generate images structurally and semantically similar to those within the in-distribution. The proposed model outperforms reference methods with a 12% improvement in accuracy, 22% higher precision, and an 8% better F1 score.

Submitted 1 November, 2023; originally announced November 2023.

Comments: Published in MICCAI 2023

arXiv:2310.18815 [pdf, other]

Rethinking Semi-Supervised Federated Learning: How to co-train fully-labeled and fully-unlabeled client imaging data

Authors: Pramit Saha, Divyanshu Mishra, J. Alison Noble

Abstract: The most challenging, yet practical, setting of semi-supervised federated learning (SSFL) is where a few clients have fully labeled data whereas the other clients have fully unlabeled data. This is particularly common in healthcare settings where collaborating partners (typically hospitals) may have images but not annotations. The bottleneck in this setting is the joint training of labeled and unlabeled clients as the objective function for each client varies based on the availability of labels. This paper investigates an alternative way for effective training with labeled and unlabeled clients in a federated setting. We propose a novel learning scheme specifically designed for SSFL which we call Isolated Federated Learning (IsoFed) that circumvents the problem by avoiding simple averaging of supervised and semi-supervised models together. In particular, our training approach consists of two parts - (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of isolated global models in all clients. We evaluate our model performance on medical image datasets of four different modalities publicly available within the biomedical image classification benchmark MedMNIST. We further vary the proportion of labeled clients and the degree of heterogeneity to demonstrate the effectiveness of the proposed method under varied experimental settings.

Submitted 28 October, 2023; originally announced October 2023.

Comments: Published in MICCAI 2023 with early acceptance and selected as 1 of the top 20 poster highlights under the category: Which work has the potential to impact other applications of AI and CV

arXiv:2310.16477 [pdf, other]

Show from Tell: Audio-Visual Modelling in Clinical Settings

Authors: Jianbo Jiao, Mohammad Alsharid, Lior Drukker, Aris T. Papageorghiou, Andrew Zisserman, J. Alison Noble

Abstract: Auditory and visual signals usually present together and correlate with each other, not only in natural environments but also in clinical settings. However, the audio-visual modelling in the latter case can be more challenging, due to the different sources of audio/video signals and the noise (both signal-level and semantic-level) in auditory signals -- usually speech. In this paper, we consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations that benefit various clinical tasks, without human expert annotation. A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose. The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference. Experimental evaluations on a large-scale clinical multi-modal ultrasound video dataset show that the proposed self-supervised method learns good transferable anatomical representations that boost the performance of automated downstream clinical tasks, even outperforming fully-supervised solutions.

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2309.07817 [pdf, other]

A JWST inventory of protoplanetary disk ices: The edge-on protoplanetary disk HH 48 NE, seen with the Ice Age ERS program

Authors: J. A. Sturm, M. K. McClure, T. L. Beck, D. Harsono, J. B. Bergner, E. Dartois, A. C. A. Boogert, J. E. Chiar, M. A. Cordiner, M. N. Drozdovskaya, S. Ioppolo, C. J. Law, H. Linnartz, D. C. Lis, G. J. Melnick, B. A. McGuire, J. A. Noble, K. I. Öberg, M. E. Palumbo, Y. J. Pendleton, G. Perotti, K. M. Pontoppidan, D. Qasim, W. R. M. Rocha, H. Terada, et al. (2 additional authors not shown)

Abstract: Ices are the main carriers of volatiles in protoplanetary disks and are crucial to our understanding of the chemistry that ultimately sets the organic composition of planets. The ERS program Ice Age on the JWST follows the ice evolution through all stages of star and planet formation. JWST/NIRSpec observations of the edge-on Class II protoplanetary disk HH 48 NE reveal spatially resolved absorption features of the major ice components H$_2$O, CO$_2$, CO, and multiple weaker signatures from less abundant ices NH$_3$, OCN$^-$, and OCS. Isotopologue $^{13}$CO$_2$ ice has been detected for the first time in a protoplanetary disk. Since multiple complex light paths contribute to the observed flux, the ice absorption features are filled in by ice-free scattered light. The $^{12}$CO$_2$/$^{13}$CO$_2$ ratio of 14 implies that the $^{12}$CO$_2$ feature is saturated, without the flux approaching 0, indicative of a very high CO$_2$ column density on the line of sight, and a corresponding abundance with respect to hydrogen that is higher than ISM values by a factor of at least a few. Observations of rare isotopologues are crucial, as we show that the $^{13}$CO$_2$ observation allows us to determine the column density of CO$_2$ to be an order of magnitude higher than the lower limit directly inferred from the observed optical depth. Radial variations in ice abundance, e.g., snowlines, are significantly modified since all observed photons have passed through the full radial extent of the disk. CO ice is observed at perplexing heights in the disk, extending to the top of the CO-emitting gas layer. We argue that the most likely interpretation is that we observe some CO ice at high temperatures, trapped in less volatile ices like H$_2$O and CO$_2$. Future radiative transfer models will be required to constrain the implications on our current understanding of disk physics and chemistry.

Submitted 14 September, 2023; originally announced September 2023.

Comments: 16 pages, 8 figures, accepted for publication in A&A

arXiv:2309.02983 [pdf, other]

doi 10.1145/3622846

Reference Capabilities for Flexible Memory Management: Extended Version

Authors: Ellen Arvidsson, Elias Castegren, Sylvan Clebsch, Sophia Drossopoulou, James Noble, Matthew J. Parkinson, Tobias Wrigstad

Abstract: Verona is a concurrent object-oriented programming language that organises all the objects in a program into a forest of isolated regions. Memory is managed locally for each region, so programmers can control a program's memory use by adjusting objects' partition into regions, and by setting each region's memory management strategy. A thread can only mutate (allocate, deallocate) objects within one active region -- its "window of mutability". Memory management costs are localised to the active region, ensuring overheads can be predicted and controlled. Moving the mutability window between regions is explicit, so code can be executed wherever it is required, yet programs remain in control of memory use. An ownership type system based on reference capabilities enforces region isolation, controlling aliasing within and between regions, yet supporting objects moving between regions and threads. Data accesses never need expensive atomic operations, and are always thread-safe.

Submitted 6 September, 2023; originally announced September 2023.

Comments: 87 pages, 10 figures, 5 listings, 4 tables. Extended version of paper to be published at OOPSLA 2023

arXiv:2308.11776 [pdf]

WS-SfMLearner: Self-supervised Monocular Depth and Ego-motion Estimation on Surgical Videos with Unknown Camera Parameters

Authors: Ange Lou, Jack Noble

Abstract: Depth estimation in surgical video plays a crucial role in many image-guided surgery procedures. However, it is difficult and time consuming to create depth map ground truth datasets in surgical videos due in part to inconsistent brightness and noise in the surgical scene. Therefore, building an accurate and robust self-supervised depth and camera ego-motion estimation system is gaining more attention from the computer vision community. Although several self-supervision methods alleviate the need for ground truth depth maps and poses, they still need known camera intrinsic parameters, which are often missing or not recorded. Moreover, the camera intrinsic prediction methods in existing works depend heavily on the quality of datasets. In this work, we aimed to build a self-supervised depth and ego-motion estimation system which can predict not only accurate depth maps and camera pose, but also camera intrinsic parameters. We proposed a cost-volume-based supervision manner to give the system auxiliary supervision for camera parameters prediction. The experimental results showed that the proposed method improved the accuracy of estimated camera parameters, ego-motion, and depth estimation.

Submitted 5 February, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted by SPIE 2024

arXiv:2308.11774 [pdf]

SAMSNeRF: Segment Anything Model (SAM) Guides Dynamic Surgical Scene Reconstruction by Neural Radiance Field (NeRF)

Authors: Ange Lou, Yamin Li, Xing Yao, Yike Zhang, Jack Noble

Abstract: The accurate reconstruction of surgical scenes from surgical videos is critical for various applications, including intraoperative navigation and image-guided robotic surgery automation. However, previous approaches, mainly relying on depth estimation, have limited effectiveness in reconstructing surgical scenes with moving surgical tools. To address this limitation and provide accurate 3D position prediction for surgical tools in all frames, we propose a novel approach called SAMSNeRF that combines Segment Anything Model (SAM) and Neural Radiance Field (NeRF) techniques. Our approach generates accurate segmentation masks of surgical tools using SAM, which guides the refinement of the dynamic surgical scene reconstruction by NeRF. Our experimental results on public endoscopy surgical videos demonstrate that our approach successfully reconstructs high-fidelity dynamic surgical scenes and accurately reflects the spatial information of surgical tools. Our proposed approach can significantly enhance surgical navigation and automation by providing surgeons with accurate 3D position information of surgical tools during surgery. The source code will be released soon.

Submitted 5 February, 2024; v1 submitted 22 August, 2023; originally announced August 2023.

Comments: Accepted by SPIE 2024

arXiv:2306.08346 [pdf, other]

doi 10.1051/0004-6361/202346188

Astrochemical models of interstellar ices: History matters

Authors: A. Clément, A. Taillard, V. Wakelam, P. Gratier, J.-C. Loison, E. Dartois, F. Dulieu, J. A. Noble, M. Chabot

Abstract: Ice is ubiquitous in the interstellar medium. We model the formation of the main constituents of interstellar ices, including H2O, CO2, CO, and CH3OH. We strive to understand what physical or chemical parameters influence the final composition of the ice and how they benchmark to what has already been observed, with the aim of applying these models to the preparation and analysis of JWST observations. We used the Nautilus gas-grain model, which computes the gas and ice composition as a function of time for a set of physical conditions, starting from an initial gas phase composition. All important processes (gas-phase reactions, gas-grain interactions, and grain surface processes) are included and solved with the rate equation approximation. We first ran an astrochemical code for fixed conditions of temperature and density mapped in the cold core L429-C to benchmark the chemistry. One key parameter was revealed to be the dust temperature. When the dust temperature is higher than 12 K, CO2 will form efficiently at the expense of H2O, while at temperatures below 12 K, it will not form. Whatever hypothesis we assumed for the chemistry (within realistic conditions), the static simulations failed to reproduce the observed trends of interstellar ices in our target core. In a second step, we simulated the chemical evolution of parcels of gas undergoing different physical and chemical situations throughout the molecular cloud evolution and starting a few 1e7 yr prior to the core formation (dynamical simulations). Our dynamical simulations satisfactorily reproduce the main trends already observed for interstellar ices. Moreover, we predict that the apparent constant ratio of CO2/H2O observed to date is probably not true for regions of low AV, and that the history of the evolution of clouds plays an essential role, even prior to their formation.

Submitted 14 June, 2023; originally announced June 2023.

Comments: Accepted for publication in A&A

arXiv:2305.07152 [pdf, other]

Intuitive Surgical SurgToolLoc Challenge Results: 2022-2023

Authors: Aneeq Zia, Max Berniker, Rogerio Garcia Nespolo, Conor Perreault, Kiran Bhattacharyya, Xi Liu, Ziheng Wang, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Bo Liu, David Austin, Yiheng Wang, Michal Futrega, Jean-Francois Puget, Zhenqiang Li, Yoichi Sato, Ryo Fujii, Ryo Hachiuma, Mana Masuda, Hideo Saito, An Wang, Mengya Xu, Mobarakol Islam, Long Bai, et al. (69 additional authors not shown)

Abstract: Robotic assisted (RA) surgery promises to transform surgical intervention. Intuitive Surgical is committed to fostering these changes and the machine learning models and algorithms that will enable them. With these goals in mind we have invited the surgical data science community to participate in a yearly competition hosted through the Medical Imaging Computing and Computer Assisted Interventions (MICCAI) conference. With varying changes from year to year, we have challenged the community to solve difficult machine learning problems in the context of advanced RA applications. Here we document the results of these challenges, focusing on surgical tool localization (SurgToolLoc). The publicly released dataset that accompanies these challenges is detailed in a separate paper arXiv:2501.09209 [1].

Submitted 28 February, 2025; v1 submitted 11 May, 2023; originally announced May 2023.

Showing 1–50 of 192 results for author: Noble, J