-
UMA: A Family of Universal Models for Atoms
Authors:
Brandon M. Wood,
Misko Dzamba,
Xiang Fu,
Meng Gao,
Muhammed Shuaibi,
Luis Barroso-Luque,
Kareem Abdelmaqsoud,
Vahe Gharakhanyan,
John R. Kitchin,
Daniel S. Levine,
Kyle Michel,
Anuroop Sriram,
Taco Cohen,
Abhishek Das,
Ammar Rizvi,
Sushree Jagriti Sahoo,
Zachary W. Ulissi,
C. Lawrence Zitnick
Abstract:
The ability to quickly and accurately compute properties from atomic simulations is critical for advancing a large number of applications in chemistry and materials science including drug discovery, energy storage, and semiconductor manufacturing. To address this need, Meta FAIR presents a family of Universal Models for Atoms (UMA), designed to push the frontier of speed, accuracy, and generalization. UMA models are trained on half a billion unique 3D atomic structures (the largest training runs to date) by compiling data across multiple chemical domains, e.g. molecules, materials, and catalysts. We develop empirical scaling laws to help understand how to increase model capacity alongside dataset size to achieve the best accuracy. The UMA small and medium models utilize a novel architectural design we refer to as mixture of linear experts that enables increasing model capacity without sacrificing speed. For example, UMA-medium has 1.4B parameters but only ~50M active parameters per atomic structure. We evaluate UMA models on a diverse set of applications across multiple domains and find that, remarkably, a single model without any fine-tuning can perform similarly or better than specialized models. We are releasing the UMA code, weights, and associated data to accelerate computational workflows and enable the community to continue to build increasingly capable AI models.
Submitted 30 June, 2025;
originally announced June 2025.
-
OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
Authors:
Hanchen Wang,
Yixuan Wu,
Yinan Feng,
Peng Jin,
Shihang Feng,
Yiming Mao,
James Wiskin,
Baris Turkbey,
Peter A. Pinto,
Bradford J. Wood,
Songting Luo,
Yinpeng Chen,
Emad Boctor,
Youzuo Lin
Abstract:
Prostate cancer is one of the most common and lethal cancers among men, making its early detection critically important. Although ultrasound imaging offers greater accessibility and cost-effectiveness compared to MRI, traditional transrectal ultrasound methods suffer from low sensitivity, especially in detecting anteriorly located tumors. Ultrasound computed tomography provides quantitative tissue characterization, but its clinical implementation faces significant challenges, particularly under anatomically constrained limited-angle acquisition conditions specific to prostate imaging. To address these unmet needs, we introduce OpenPros, the first large-scale benchmark dataset explicitly developed for limited-view prostate USCT. Our dataset includes over 280,000 paired samples of realistic 2D speed-of-sound (SOS) phantoms and corresponding ultrasound full-waveform data, generated from anatomically accurate 3D digital prostate models derived from real clinical MRI/CT scans and ex vivo ultrasound measurements, annotated by medical experts. Simulations are conducted under clinically realistic configurations using advanced finite-difference time-domain and Runge-Kutta acoustic wave solvers, both provided as open-source components. Through comprehensive baseline experiments, we demonstrate that state-of-the-art deep learning methods surpass traditional physics-based approaches in both inference efficiency and reconstruction accuracy. Nevertheless, current deep learning models still fall short of delivering clinically acceptable high-resolution images with sufficient accuracy. By publicly releasing OpenPros, we aim to encourage the development of advanced machine learning algorithms capable of bridging this performance gap and producing clinically usable, high-resolution, and highly accurate prostate ultrasound images. The dataset is publicly accessible at https://open-pros.github.io/.
Submitted 18 May, 2025;
originally announced May 2025.
-
Adjoint Sampling: Highly Scalable Diffusion Samplers via Adjoint Matching
Authors:
Aaron Havens,
Benjamin Kurt Miller,
Bing Yan,
Carles Domingo-Enrich,
Anuroop Sriram,
Brandon Wood,
Daniel Levine,
Bin Hu,
Brandon Amos,
Brian Karrer,
Xiang Fu,
Guan-Horng Liu,
Ricky T. Q. Chen
Abstract:
We introduce Adjoint Sampling, a highly scalable and efficient algorithm for learning diffusion processes that sample from unnormalized densities, or energy functions. It is the first on-policy approach that allows significantly more gradient updates than the number of energy evaluations and model samples, allowing us to scale to much larger problem settings than previously explored by similar methods. Our framework is theoretically grounded in stochastic optimal control and shares the same theoretical guarantees as Adjoint Matching, being able to train without the need for corrective measures that push samples towards the target distribution. We show how to incorporate key symmetries, as well as periodic boundary conditions, for modeling molecules in both Cartesian and torsional coordinates. We demonstrate the effectiveness of our approach through extensive experiments on classical energy functions, and further scale up to neural network-based energy models where we perform amortized conformer generation across many molecular systems. To encourage further research in developing highly scalable sampling methods, we plan to open source these challenging benchmarks, where successful methods can directly impact progress in computational chemistry.
Submitted 28 May, 2025; v1 submitted 15 April, 2025;
originally announced April 2025.
-
A practical guide to machine learning interatomic potentials -- Status and future
Authors:
Ryan Jacobs,
Dane Morgan,
Siamak Attarian,
Jun Meng,
Chen Shen,
Zhenghao Wu,
Clare Yijia Xie,
Julia H. Yang,
Nongnuch Artrith,
Ben Blaiszik,
Gerbrand Ceder,
Kamal Choudhary,
Gabor Csanyi,
Ekin Dogus Cubuk,
Bowen Deng,
Ralf Drautz,
Xiang Fu,
Jonathan Godwin,
Vasant Honavar,
Olexandr Isayev,
Anders Johansson,
Boris Kozinsky,
Stefano Martiniani,
Shyue Ping Ong,
Igor Poltavsky,
et al. (5 additional authors not shown)
Abstract:
The rapid development and large body of literature on machine learning interatomic potentials (MLIPs) can make it difficult to know how to proceed for researchers who are not experts but wish to use these tools. The spirit of this review is to help such researchers by serving as a practical, accessible guide to the state-of-the-art in MLIPs. This review paper covers a broad range of topics related to MLIPs, including (i) central aspects of how and why MLIPs are enablers of many exciting advancements in molecular modeling, (ii) the main underpinnings of different types of MLIPs, including their basic structure and formalism, (iii) the potentially transformative impact of universal MLIPs for both organic and inorganic systems, including an overview of the most recent advances, capabilities, downsides, and potential applications of this nascent class of MLIPs, (iv) a practical guide for estimating and understanding the execution speed of MLIPs, including guidance for users based on hardware availability, type of MLIP used, and prospective simulation size and time, (v) a manual for what MLIP a user should choose for a given application by considering hardware resources, speed requirements, energy and force accuracy requirements, as well as guidance for choosing pre-trained potentials or fitting a new potential from scratch, (vi) discussion around MLIP infrastructure, including sources of training data, pre-trained potentials, and hardware resources for training, (vii) summary of some key limitations of present MLIPs and current approaches to mitigate such limitations, including methods of including long-range interactions, handling magnetic systems, and treatment of excited states, and finally (viii) we finish with some more speculative thoughts on what the future holds for the development and application of MLIPs over the next 3-10+ years.
Submitted 12 March, 2025;
originally announced March 2025.
-
Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction
Authors:
Xiang Fu,
Brandon M. Wood,
Luis Barroso-Luque,
Daniel S. Levine,
Meng Gao,
Misko Dzamba,
C. Lawrence Zitnick
Abstract:
Machine learning interatomic potentials (MLIPs) have become increasingly effective at approximating quantum mechanical calculations at a fraction of the computational cost. However, lower errors on held-out test sets do not always translate to improved results on downstream physical property prediction tasks. In this paper, we propose testing MLIPs on their practical ability to conserve energy during molecular dynamics simulations. If passed, improved correlations are found between test errors and their performance on physical property prediction tasks. We identify choices which may lead to models failing this test, and use these observations to improve upon highly-expressive models. The resulting model, eSEN, provides state-of-the-art results on a range of physical property prediction tasks, including materials stability prediction, thermal conductivity prediction, and phonon calculations.
Submitted 23 April, 2025; v1 submitted 17 February, 2025;
originally announced February 2025.
-
FlowLLM: Flow Matching for Material Generation with Large Language Models as Base Distributions
Authors:
Anuroop Sriram,
Benjamin Kurt Miller,
Ricky T. Q. Chen,
Brandon M. Wood
Abstract:
Material discovery is a critical area of research with the potential to revolutionize various fields, including carbon capture, renewable energy, and electronics. However, the immense scale of the chemical space makes it challenging to explore all possible materials experimentally. In this paper, we introduce FlowLLM, a novel generative model that combines large language models (LLMs) and Riemannian flow matching (RFM) to design novel crystalline materials. FlowLLM first fine-tunes an LLM to learn an effective base distribution of meta-stable crystals in a text representation. After converting to a graph representation, the RFM model takes samples from the LLM and iteratively refines the coordinates and lattice parameters. Our approach significantly outperforms state-of-the-art methods, increasing the generation rate of stable materials by over three times and increasing the rate for stable, unique, and novel crystals by $\sim50\%$ - a huge improvement on a difficult problem. Additionally, the crystals generated by FlowLLM are much closer to their relaxed state when compared with another leading model, significantly reducing post-hoc computational cost.
Submitted 30 October, 2024;
originally announced October 2024.
-
Fast Deep Hedging with Second-Order Optimization
Authors:
Konrad Mueller,
Amira Akkari,
Lukas Gonon,
Ben Wood
Abstract:
Hedging exotic options in the presence of market frictions is an important risk management task. Deep hedging can solve such hedging problems by training neural network policies in realistic simulated markets. Training these neural networks may be delicate and suffer from slow convergence, particularly for options with long maturities and complex sensitivities to market parameters. To address this, we propose a second-order optimization scheme for deep hedging. We leverage pathwise differentiability to construct a curvature matrix, which we approximate as block-diagonal and Kronecker-factored to efficiently precondition gradients. We evaluate our method on a challenging and practically important problem: hedging a cliquet option on a stock with stochastic volatility by trading in the spot and vanilla options. We find that our second-order scheme can optimize the policy in 1/4 of the number of steps that standard adaptive moment-based optimization takes.
Submitted 29 October, 2024;
originally announced October 2024.
-
Estimating the Causal Effects of T Cell Receptors
Authors:
Eli N. Weinstein,
Elizabeth B. Wood,
David M. Blei
Abstract:
A central question in human immunology is how a patient's repertoire of T cells impacts disease. Here, we introduce a method to infer the causal effects of T cell receptor (TCR) sequences on patient outcomes using observational TCR repertoire sequencing data and clinical outcomes data. Our approach corrects for unobserved confounders, such as a patient's environment and life history, by using the patient's immature, pre-selection TCR repertoire. The pre-selection repertoire can be estimated from nonproductive TCR data, which is widely available. It is generated by a randomized mutational process, V(D)J recombination, which provides a natural experiment. We show formally how to use the pre-selection repertoire to draw causal inferences, and develop a scalable neural-network estimator for our identification formula. Our method produces an estimate of the effect of interventions that add a specific TCR sequence to patient repertoires. As a demonstration, we use it to analyze the effects of TCRs on COVID-19 severity, uncovering potentially therapeutic TCRs that are (1) observed in patients, (2) bind SARS-CoV-2 antigens in vitro and (3) have strong positive effects on clinical outcomes.
Submitted 17 October, 2024;
originally announced October 2024.
-
Open Materials 2024 (OMat24) Inorganic Materials Dataset and Models
Authors:
Luis Barroso-Luque,
Muhammed Shuaibi,
Xiang Fu,
Brandon M. Wood,
Misko Dzamba,
Meng Gao,
Ammar Rizvi,
C. Lawrence Zitnick,
Zachary W. Ulissi
Abstract:
The ability to discover new materials with desirable properties is critical for numerous applications from helping mitigate climate change to advances in next generation computing hardware. AI has the potential to accelerate materials discovery and design by more effectively exploring the chemical space compared to other computational methods or by trial-and-error. While substantial progress has been made on AI for materials data, benchmarks, and models, a barrier that has emerged is the lack of publicly available training data and open pre-trained models. To address this, we present a Meta FAIR release of the Open Materials 2024 (OMat24) large-scale open dataset and an accompanying set of pre-trained models. OMat24 contains over 110 million density functional theory (DFT) calculations focused on structural and compositional diversity. Our EquiformerV2 models achieve state-of-the-art performance on the Matbench Discovery leaderboard and are capable of predicting ground-state stability and formation energies to an F1 score above 0.9 and an accuracy of 20 meV/atom, respectively. We explore the impact of model size, auxiliary denoising objectives, and fine-tuning on performance across a range of datasets including OMat24, MPtraj, and Alexandria. The open release of the OMat24 dataset and models enables the research community to build upon our efforts and drive further advancements in AI-assisted materials science.
Submitted 16 October, 2024;
originally announced October 2024.
-
The doctor will polygraph you now: ethical concerns with AI for fact-checking patients
Authors:
James Anibal,
Jasmine Gunkel,
Shaheen Awan,
Hannah Huth,
Hang Nguyen,
Tram Le,
Jean-Christophe Bélisle-Pipon,
Micah Boyer,
Lindsey Hazen,
Bridge2AI Voice Consortium,
Yael Bensoussan,
David Clifton,
Bradford Wood
Abstract:
Artificial intelligence (AI) methods have been proposed for the prediction of social behaviors which could be reasonably understood from patient-reported information. This raises novel ethical concerns about respect, privacy, and control over patient data. Ethical concerns surrounding clinical AI systems for social behavior verification can be divided into two main categories: (1) the potential for inaccuracies/biases within such systems, and (2) the impact on trust in patient-provider relationships with the introduction of automated AI systems for fact-checking, particularly in cases where the data/models may contradict the patient. Additionally, this report simulated the misuse of a verification system using patient voice samples and identified a potential LLM bias against patient-reported information in favor of multi-dimensional data and the outputs of other AI methods (i.e., AI self-trust). Finally, recommendations were presented for mitigating the risk that AI verification methods will cause harm to patients or undermine the purpose of the healthcare system.
Submitted 11 November, 2024; v1 submitted 14 August, 2024;
originally announced August 2024.
-
Toward Automated Detection of Biased Social Signals from the Content of Clinical Conversations
Authors:
Feng Chen,
Manas Satish Bedmutha,
Ray-Yuan Chung,
Janice Sabin,
Wanda Pratt,
Brian R. Wood,
Nadir Weibel,
Andrea L. Hartzler,
Trevor Cohen
Abstract:
Implicit bias can impede patient-provider interactions and lead to inequities in care. Raising awareness is key to reducing such bias, but its manifestations in the social dynamics of patient-provider communication are difficult to detect. In this study, we used automated speech recognition (ASR) and natural language processing (NLP) to identify social signals in patient-provider interactions. We built an automated pipeline to predict social signals from audio recordings of 782 primary care visits that achieved 90.1% average accuracy across codes, and exhibited fairness in its predictions for white and non-white patients. Applying this pipeline, we identified statistically significant differences in provider communication behavior toward white versus non-white patients. In particular, providers expressed more patient-centered behaviors towards white patients including more warmth, engagement, and attentiveness. Our study underscores the potential of automated tools in identifying subtle communication signals that may be linked with bias and impact healthcare quality and equity.
Submitted 30 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Location-based Radiology Report-Guided Semi-supervised Learning for Prostate Cancer Detection
Authors:
Alex Chen,
Nathan Lay,
Stephanie Harmon,
Kutsev Ozyoruk,
Enis Yilmaz,
Brad J. Wood,
Peter A. Pinto,
Peter L. Choyke,
Baris Turkbey
Abstract:
Prostate cancer is one of the most prevalent malignancies in the world. While deep learning has potential to further improve computer-aided prostate cancer detection on MRI, its efficacy hinges on the exhaustive curation of manually annotated images. We propose a novel methodology of semisupervised learning (SSL) guided by automatically extracted clinical information, specifically the lesion locations in radiology reports, allowing for use of unannotated images to reduce the annotation burden. By leveraging lesion locations, we refined pseudo labels, which were then used to train our location-based SSL model. We show that our SSL method can improve prostate lesion detection by utilizing unannotated images, with more substantial impacts being observed when larger proportions of unannotated images are used.
Submitted 17 June, 2024;
originally announced June 2024.
-
FlowMM: Generating Materials with Riemannian Flow Matching
Authors:
Benjamin Kurt Miller,
Ricky T. Q. Chen,
Anuroop Sriram,
Brandon M Wood
Abstract:
Crystalline materials are a fundamental component in next-generation technologies, yet modeling their distribution presents unique computational challenges. Of the plausible arrangements of atoms in a periodic lattice only a vanishingly small percentage are thermodynamically stable, which is a key indicator of the materials that can be experimentally realized. Two fundamental tasks in this area are to (a) predict the stable crystal structure of a known composition of elements and (b) propose novel compositions along with their stable structures. We present FlowMM, a pair of generative models that achieve state-of-the-art performance on both tasks while being more efficient and more flexible than competing methods. We generalize Riemannian Flow Matching to suit the symmetries inherent to crystals: translation, rotation, permutation, and periodic boundary conditions. Our framework enables the freedom to choose the flow base distributions, drastically simplifying the problem of learning crystal structures compared with diffusion models. In addition to standard benchmarks, we validate FlowMM's generated structures with quantum chemistry calculations, demonstrating that it is about 3x more efficient, in terms of integration steps, at finding stable materials compared to previous open methods.
Submitted 7 June, 2024;
originally announced June 2024.
-
Voice EHR: Introducing Multimodal Audio Data for Health
Authors:
James Anibal,
Hannah Huth,
Ming Li,
Lindsey Hazen,
Veronica Daoud,
Dominique Ebedes,
Yen Minh Lam,
Hang Nguyen,
Phuc Hong,
Michael Kleinman,
Shelley Ost,
Christopher Jackson,
Laura Sprabery,
Cheran Elangovan,
Balaji Krishnaiah,
Lee Akst,
Ioan Lina,
Iqbal Elyazar,
Lenny Ekwati,
Stefan Jansen,
Richard Nduwayezu,
Charisse Garcia,
Jeffrey Plum,
Jacqueline Brenner,
Miranda Song,
et al. (5 additional authors not shown)
Abstract:
Artificial intelligence (AI) models trained on audio data may have the potential to rapidly perform clinical tasks, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets collected with expensive recording equipment in high-income countries, which challenges deployment in resource-constrained, high-volume settings where audio data may have a profound impact on health equity. This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application. The app facilitates the collection of an audio electronic health record (Voice EHR) which may contain complex biomarkers of health from conventional voice/respiratory features, speech patterns, and spoken language with semantic meaning and longitudinal context, potentially compensating for the typical limitations of unimodal clinical datasets. This report presents the application used for data collection, initial experiments on data quality, and case studies which demonstrate the potential of voice EHR to advance the scalability/diversity of audio AI.
Submitted 9 November, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Towards Enhanced Human Activity Recognition through Natural Language Generation and Pose Estimation
Authors:
Nikhil Kashyap,
Manas Satish Bedmutha,
Prerit Chaudhary,
Brian Wood,
Wanda Pratt,
Janice Sabin,
Andrea Hartzler,
Nadir Weibel
Abstract:
Vision-based human activity recognition (HAR) has made substantial progress in recognizing predefined gestures but lacks adaptability for emerging activities. This paper introduces a paradigm shift by harnessing generative modeling and large language models (LLMs) to enhance vision-based HAR. We propose utilizing LLMs to generate descriptive textual representations of activities using pose keypoints as an intermediate representation. Incorporating pose keypoints adds contextual depth to the recognition process, allowing for sequences of vectors resembling text chunks, compatible with LLMs. This innovative fusion of computer vision and natural language processing holds significant potential for revolutionizing activity recognition. A proof of concept study on a Kinetics700 dataset subset validates the approach's efficacy, highlighting improved accuracy and interpretability. Future implications encompass enhanced accuracy, novel research avenues, model generalization, and ethical considerations for transparency. This framework has real-world applications, including personalized gym workout feedback and nuanced sports training insights. By connecting visual cues to interpretable textual descriptions, the proposed framework advances HAR accuracy and applicability, shaping the landscape of pervasive computing and activity recognition research. As this approach evolves, it promises a more insightful understanding of human activities across diverse contexts, marking a significant step towards a better world.
Submitted 11 December, 2023;
originally announced December 2023.
-
From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction
Authors:
Nima Shoghi,
Adeesh Kolluru,
John R. Kitchin,
Zachary W. Ulissi,
C. Lawrence Zitnick,
Brandon M. Wood
Abstract:
Foundation models have been transformational in machine learning fields such as natural language processing and computer vision. Similar success in atomic property prediction has been limited due to the challenges of training effective models across multiple chemical domains. To address this, we introduce Joint Multi-domain Pre-training (JMP), a supervised pre-training strategy that simultaneously trains on multiple datasets from different chemical domains, treating each dataset as a unique pre-training task within a multi-task framework. Our combined training dataset consists of $\sim$120M systems from OC20, OC22, ANI-1x, and Transition-1x. We evaluate performance and generalization by fine-tuning over a diverse set of downstream tasks and datasets including: QM9, rMD17, MatBench, QMOF, SPICE, and MD22. JMP demonstrates an average improvement of 59% over training from scratch, and matches or sets state-of-the-art on 34 out of 40 tasks. Our work highlights the potential of pre-training strategies that utilize diverse data to advance property prediction across chemical domains, especially for low-data tasks. Please visit https://nima.sh/jmp for further information.
Submitted 6 May, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
C-DARL: Contrastive diffusion adversarial representation learning for label-free blood vessel segmentation
Authors:
Boah Kim,
Yujin Oh,
Bradford J. Wood,
Ronald M. Summers,
Jong Chul Ye
Abstract:
Blood vessel segmentation in medical imaging is one of the essential steps for vascular disease diagnosis and interventional planning in a broad spectrum of clinical scenarios in image-based medicine and interventional medicine. Unfortunately, manual annotation of the vessel masks is challenging and resource-intensive due to subtle branches and complex structures. To overcome this issue, this paper presents a self-supervised vessel segmentation method, dubbed the contrastive diffusion adversarial representation learning (C-DARL) model. Our model is composed of a diffusion module and a generation module that learns the distribution of multi-domain blood vessel data by generating synthetic vessel images from diffusion latent. Moreover, we employ contrastive learning through a mask-based contrastive loss so that the model can learn more realistic vessel representations. To validate the efficacy, C-DARL is trained using various vessel datasets, including coronary angiograms, abdominal digital subtraction angiograms, and retinal imaging. Experimental results confirm that our model achieves performance improvement over baseline methods with noise robustness, suggesting the effectiveness of C-DARL for vessel segmentation.
Submitted 31 July, 2023;
originally announced August 2023.
-
EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations
Authors:
Yi-Lun Liao,
Brandon Wood,
Abhishek Das,
Tess Smidt
Abstract:
Equivariant Transformers such as Equiformer have demonstrated the efficacy of applying Transformers to the domain of 3D atomistic systems. However, they are limited to small degrees of equivariant representations due to their computational complexity. In this paper, we investigate whether these architectures can scale well to higher degrees. Starting from Equiformer, we first replace $SO(3)$ convolutions with eSCN convolutions to efficiently incorporate higher-degree tensors. Then, to better leverage the power of higher degrees, we propose three architectural improvements -- attention re-normalization, separable $S^2$ activation and separable layer normalization. Putting this all together, we propose EquiformerV2, which outperforms previous state-of-the-art methods on large-scale OC20 dataset by up to $9\%$ on forces, $4\%$ on energies, offers better speed-accuracy trade-offs, and $2\times$ reduction in DFT calculations needed for computing adsorption energies. Additionally, EquiformerV2 trained on only OC22 dataset outperforms GemNet-OC trained on both OC20 and OC22 datasets, achieving much better data efficiency. Finally, we compare EquiformerV2 with Equiformer on QM9 and OC20 S2EF-2M datasets to better understand the performance gain brought by higher degrees.
Submitted 6 March, 2024; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Quantum Deep Hedging
Authors:
El Amine Cherrat,
Snehal Raj,
Iordanis Kerenidis,
Abhishek Shekhar,
Ben Wood,
Jon Dee,
Shouvanik Chakrabarti,
Richard Chen,
Dylan Herman,
Shaohan Hu,
Pierre Minssen,
Ruslan Shaydulin,
Yue Sun,
Romina Yalovetzky,
Marco Pistoia
Abstract:
Quantum machine learning has the potential for a transformative impact across industry sectors and in particular in finance. In our work we look at the problem of hedging where deep reinforcement learning offers a powerful framework for real markets. We develop quantum reinforcement learning methods based on policy-search and distributional actor-critic algorithms that use quantum neural network architectures with orthogonal and compound layers for the policy and value functions. We prove that the quantum neural networks we use are trainable, and we perform extensive simulations that show that quantum models can reduce the number of trainable parameters while achieving comparable performance and that the distributional approach obtains better performance than other standard approaches, both classical and quantum. We successfully implement the proposed models on a trapped-ion quantum processor, utilizing circuits with up to $16$ qubits, and observe performance that agrees well with noiseless simulation. Our quantum techniques are general and can be applied to other reinforcement learning problems beyond hedging.
Submitted 26 November, 2023; v1 submitted 29 March, 2023;
originally announced March 2023.
-
Heterogeneous robot teams with unified perception and autonomy: How Team CSIRO Data61 tied for the top score at the DARPA Subterranean Challenge
Authors:
Navinda Kottege,
Jason Williams,
Brendan Tidd,
Fletcher Talbot,
Ryan Steindl,
Mark Cox,
Dennis Frousheger,
Thomas Hines,
Alex Pitt,
Benjamin Tam,
Brett Wood,
Lauren Hanson,
Katrina Lo Surdo,
Thomas Molnar,
Matt Wildie,
Kazys Stepanas,
Gavin Catt,
Lachlan Tychsen-Smith,
Dean Penfold,
Leslie Overs,
Milad Ramezani,
Kasra Khosoussi,
Farid Kendoul,
Glenn Wagner,
Duncan Palmer
, et al. (5 additional authors not shown)
Abstract:
The DARPA Subterranean Challenge was designed for competitors to develop and deploy teams of autonomous robots to explore difficult unknown underground environments. Categorised into human-made tunnels, underground urban infrastructure and natural caves, each of these subdomains had many challenging elements for robot perception, locomotion, navigation and autonomy. These included degraded wireless communication, poor visibility due to smoke, narrow passages and doorways, clutter, uneven ground, slippery and loose terrain, stairs, ledges, overhangs, dripping water, and dynamic obstacles that move to block paths, among others. In the Final Event of this challenge, held in September 2021, the course consisted of all three subdomains. The task was for the robot team to perform a scavenger hunt for a number of pre-defined artefacts within a limited time frame. Only one human supervisor was allowed to communicate with the robots once they were in the course. Points were scored when accurate detections and their locations were communicated back to the scoring server. A total of 8 teams competed in the finals held at the Mega Cavern in Louisville, KY, USA. This article describes the systems deployed by Team CSIRO Data61 that tied for the top score and won second place at the event.
Submitted 25 February, 2023;
originally announced February 2023.
-
Human-Robot Team Performance Compared to Full Robot Autonomy in 16 Real-World Search and Rescue Missions: Adaptation of the DARPA Subterranean Challenge
Authors:
Nicole Robinson,
Jason Williams,
David Howard,
Brendan Tidd,
Fletcher Talbot,
Brett Wood,
Alex Pitt,
Navinda Kottege,
Dana Kulić
Abstract:
Human operators in human-robot teams are commonly perceived to be critical for mission success. To explore the direct and perceived impact of operator input on task success and team performance, 16 real-world missions (10 hrs) were conducted based on the DARPA Subterranean Challenge. These missions were to deploy a heterogeneous team of robots for a search task to locate and identify artifacts such as climbing rope, drills and mannequins representing human survivors. Two conditions were evaluated: human operators that could control the robot team with state-of-the-art autonomy (Human-Robot Team) compared to autonomous missions without human operator input (Robot-Autonomy). Human-Robot Teams were often in directed autonomy mode (70% of mission time), found more items, traversed more distance, covered more unique ground, and had a higher time between safety-related events. Human-Robot Teams were faster at finding the first artifact, but slower to respond to information from the robot team. In routine conditions, scores were comparable for artifacts, distance, and coverage. Reasons for intervention included creating waypoints to prioritise high-yield areas, and to navigate through error-prone spaces. After observing robot autonomy, operators reported increases in robot competency and trust, but that robot behaviour was not always transparent and understandable, even after high mission performance.
Submitted 11 December, 2022;
originally announced December 2022.
-
AdsorbML: A Leap in Efficiency for Adsorption Energy Calculations using Generalizable Machine Learning Potentials
Authors:
Janice Lan,
Aini Palizhati,
Muhammed Shuaibi,
Brandon M. Wood,
Brook Wander,
Abhishek Das,
Matt Uyttendaele,
C. Lawrence Zitnick,
Zachary W. Ulissi
Abstract:
Computational catalysis is playing an increasingly significant role in the design of catalysts across a wide range of applications. A common task for many computational methods is the need to accurately compute the adsorption energy for an adsorbate and a catalyst surface of interest. Traditionally, the identification of low energy adsorbate-surface configurations relies on heuristic methods and researcher intuition. As the desire to perform high-throughput screening increases, it becomes challenging to use heuristics and intuition alone. In this paper, we demonstrate machine learning potentials can be leveraged to identify low energy adsorbate-surface configurations more accurately and efficiently. Our algorithm provides a spectrum of trade-offs between accuracy and efficiency, with one balanced option finding the lowest energy configuration 87.36% of the time, while achieving a 2000x speedup in computation. To standardize benchmarking, we introduce the Open Catalyst Dense dataset containing nearly 1,000 diverse surfaces and 100,000 unique configurations.
Submitted 15 September, 2023; v1 submitted 29 November, 2022;
originally announced November 2022.
-
Distance Map Supervised Landmark Localization for MR-TRUS Registration
Authors:
Xinrui Song,
Xuanang Xu,
Sheng Xu,
Baris Turkbey,
Bradford J. Wood,
Thomas Sanford,
Pingkun Yan
Abstract:
In this work, we propose to explicitly use the landmarks of the prostate to guide MR-TRUS image registration. We first train a deep neural network to automatically localize a set of meaningful landmarks, and then directly generate the affine registration matrix from the location of these landmarks. For landmark localization, instead of directly training a network to predict the landmark coordinates, we propose to regress a full-resolution distance map of the landmark, which is shown to be effective in avoiding the statistical bias that leads to unsatisfactory performance, and thus improves performance. We then use the predicted landmarks to generate the affine transformation matrix, which outperforms the clinicians' manual rigid registration by a significant margin in terms of TRE.
Submitted 11 October, 2022;
originally announced October 2022.
-
Spherical Channels for Modeling Atomic Interactions
Authors:
C. Lawrence Zitnick,
Abhishek Das,
Adeesh Kolluru,
Janice Lan,
Muhammed Shuaibi,
Anuroop Sriram,
Zachary Ulissi,
Brandon Wood
Abstract:
Modeling the energy and forces of atomic systems is a fundamental problem in computational chemistry with the potential to help address many of the world's most pressing problems, including those related to energy scarcity and climate change. These calculations are traditionally performed using Density Functional Theory, which is computationally very expensive. Machine learning has the potential to dramatically improve the efficiency of these calculations from days or hours to seconds. We propose the Spherical Channel Network (SCN) to model atomic energies and forces. The SCN is a graph neural network where nodes represent atoms and edges their neighboring atoms. The atom embeddings are a set of spherical functions, called spherical channels, represented using spherical harmonics. We demonstrate that, by rotating the embeddings based on the 3D edge orientation, more information may be utilized while maintaining the rotational equivariance of the messages. While equivariance is a desirable property, we find that by relaxing this constraint in both message passing and aggregation, improved accuracy may be achieved. We demonstrate state-of-the-art results on the large-scale Open Catalyst dataset in both energy and force prediction for numerous tasks and metrics.
Submitted 13 October, 2022; v1 submitted 28 June, 2022;
originally announced June 2022.
-
The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysts
Authors:
Richard Tran,
Janice Lan,
Muhammed Shuaibi,
Brandon M. Wood,
Siddharth Goyal,
Abhishek Das,
Javier Heras-Domingo,
Adeesh Kolluru,
Ammar Rizvi,
Nima Shoghi,
Anuroop Sriram,
Felix Therrien,
Jehad Abed,
Oleksandr Voznyy,
Edward H. Sargent,
Zachary Ulissi,
C. Lawrence Zitnick
Abstract:
The development of machine learning models for electrocatalysts requires a broad set of training data to enable their use across a wide variety of materials. One class of materials that currently lacks sufficient training data is oxides, which are critical for the development of OER catalysts. To address this, we developed the OC22 dataset, consisting of 62,331 DFT relaxations (~9,854,504 single point calculations) across a range of oxide materials, coverages, and adsorbates. We define generalized total energy tasks that enable property prediction beyond adsorption energies; we test baseline performance of several graph neural networks; and we provide pre-defined dataset splits to establish clear benchmarks for future efforts. In the most general task, GemNet-OC sees a ~36% improvement in energy predictions when combining the chemically dissimilar OC20 and OC22 datasets via fine-tuning. Similarly, we achieved a ~19% improvement in total energy predictions on OC20 and a ~9% improvement in force predictions in OC22 when using joint training. We demonstrate the practical utility of a top performing model by capturing literature adsorption energies and important OER scaling relationships. We expect OC22 to provide an important benchmark for models seeking to incorporate intricate long-range electrostatic and magnetic interactions in oxide surfaces. Dataset and baseline models are open sourced, and a public leaderboard is available to encourage continued community developments on the total energy tasks and data.
Submitted 7 March, 2023; v1 submitted 17 June, 2022;
originally announced June 2022.
-
Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations
Authors:
Anuroop Sriram,
Abhishek Das,
Brandon M. Wood,
Siddharth Goyal,
C. Lawrence Zitnick
Abstract:
Recent progress in Graph Neural Networks (GNNs) for modeling atomic simulations has the potential to revolutionize catalyst discovery, which is a key step in making progress towards the energy breakthroughs needed to combat climate change. However, the GNNs that have proven most effective for this task are memory intensive as they model higher-order interactions in the graphs such as those between triplets or quadruplets of atoms, making it challenging to scale these models. In this paper, we introduce Graph Parallelism, a method to distribute input graphs across multiple GPUs, enabling us to train very large GNNs with hundreds of millions or billions of parameters. We empirically evaluate our method by scaling up the number of parameters of the recently proposed DimeNet++ and GemNet models by over an order of magnitude. On the large-scale Open Catalyst 2020 (OC20) dataset, these graph-parallelized models lead to relative improvements of 1) 15% on the force MAE metric for the S2EF task and 2) 21% on the AFbT metric for the IS2RS task, establishing new state-of-the-art results.
Submitted 17 March, 2022;
originally announced March 2022.
-
Auto-FedRL: Federated Hyperparameter Optimization for Multi-institutional Medical Image Segmentation
Authors:
Pengfei Guo,
Dong Yang,
Ali Hatamizadeh,
An Xu,
Ziyue Xu,
Wenqi Li,
Can Zhao,
Daguang Xu,
Stephanie Harmon,
Evrim Turkbey,
Baris Turkbey,
Bradford Wood,
Francesca Patella,
Elvira Stellato,
Gianpaolo Carrafiello,
Vishal M. Patel,
Holger R. Roth
Abstract:
Federated learning (FL) is a distributed machine learning technique that enables collaborative model training while avoiding explicit data sharing. The inherent privacy-preserving property of FL algorithms makes them especially attractive to the medical field. However, in case of heterogeneous client data distributions, standard FL methods are unstable and require intensive hyperparameter tuning to achieve optimal performance. Conventional hyperparameter optimization algorithms are impractical in real-world FL applications as they involve numerous training trials, which are often not affordable with limited compute budgets. In this work, we propose an efficient reinforcement learning (RL)-based federated hyperparameter optimization algorithm, termed Auto-FedRL, in which an online RL agent can dynamically adjust hyperparameters of each client based on the current training progress. Extensive experiments are conducted to investigate different search strategies and RL agents. The effectiveness of the proposed method is validated on a heterogeneous data split of the CIFAR-10 dataset as well as two real-world medical image segmentation datasets for COVID-19 lesion segmentation in chest CT and pancreas segmentation in abdominal CT.
Submitted 31 August, 2022; v1 submitted 11 March, 2022;
originally announced March 2022.
-
Small Cohort of Epilepsy Patients Showed Increased Activity on Facebook before Sudden Unexpected Death
Authors:
Ian B. Wood,
Rion Brattig Correia,
Wendy R. Miller,
Luis M. Rocha
Abstract:
Sudden Unexpected Death in Epilepsy (SUDEP) remains a leading cause of death in people with epilepsy. Despite the constant risk for patients and bereavement to family members, to date the physiological mechanisms of SUDEP remain unknown. Here we explore the potential to identify putative predictive signals of SUDEP from online digital behavioral data using text and sentiment analysis. Specifically, we analyze Facebook timelines of six epilepsy patients deceased due to SUDEP, donated by surviving family members. We find preliminary evidence for behavioral changes detectable by text and sentiment analysis tools. Namely, in the months preceding their SUDEP event patient social media timelines show: i) increase in verbosity; ii) increased use of functional words; and iii) sentiment shifts as measured by different sentiment analysis tools. Combined, these results suggest that social media engagement, as well as its sentiment, may serve as possible early-warning signals for SUDEP in people with epilepsy. While the small sample of patient timelines analyzed in this study prevents generalization, our preliminary investigation demonstrates the potential of social media data as complementary data in larger studies of SUDEP and epilepsy.
Submitted 19 January, 2022;
originally announced January 2022.
-
Multi-Asset Spot and Option Market Simulation
Authors:
Magnus Wiese,
Ben Wood,
Alexandre Pachoud,
Ralf Korn,
Hans Buehler,
Phillip Murray,
Lianjun Bai
Abstract:
We construct realistic spot and equity option market simulators for a single underlying on the basis of normalizing flows. We address the high-dimensionality of market observed call prices through an arbitrage-free autoencoder that approximates efficient low-dimensional representations of the prices while maintaining no static arbitrage in the reconstructed surface. Given a multi-asset universe, we leverage the conditional invertibility property of normalizing flows and introduce a scalable method to calibrate the joint distribution of a set of independent simulators while preserving the dynamics of each simulator. Empirical results highlight the goodness of the calibrated simulators and their fidelity.
Submitted 13 December, 2021;
originally announced December 2021.
-
Efficient, Interpretable Graph Neural Network Representation for Angle-dependent Properties and its Application to Optical Spectroscopy
Authors:
Tim Hsu,
Tuan Anh Pham,
Nathan Keilbart,
Stephen Weitzner,
James Chapman,
Penghao Xiao,
S. Roger Qiu,
Xiao Chen,
Brandon C. Wood
Abstract:
Graph neural networks are attractive for learning properties of atomic structures thanks to their intuitive graph encoding of atoms and bonds. However, conventional encoding does not include angular information, which is critical for describing atomic arrangements in disordered systems. In this work, we extend the recently proposed ALIGNN encoding, which incorporates bond angles, to also include dihedral angles (ALIGNN-d). This simple extension leads to a memory-efficient graph representation that captures the complete geometry of atomic structures. ALIGNN-d is applied to predict the infrared optical response of dynamically disordered Cu(II) aqua complexes, leveraging the intrinsic interpretability to elucidate the relative contributions of individual structural components. Bond and dihedral angles are found to be critical contributors to the fine structure of the absorption response, with distortions representing transitions between more common geometries exhibiting the strongest absorption intensity. Future directions for further development of ALIGNN-d are discussed.
Submitted 15 February, 2022; v1 submitted 23 September, 2021;
originally announced September 2021.
-
End-to-end Ultrasound Frame to Volume Registration
Authors:
Hengtao Guo,
Xuanang Xu,
Sheng Xu,
Bradford J. Wood,
Pingkun Yan
Abstract:
Fusing an intra-operative 2D transrectal ultrasound (TRUS) image with a pre-operative 3D magnetic resonance (MR) volume to guide prostate biopsy can significantly increase the yield. However, such a multimodal 2D/3D registration problem is a very challenging task. In this paper, we propose an end-to-end frame-to-volume registration network (FVR-Net), which can efficiently bridge the previous research gaps by aligning a 2D TRUS frame with a 3D TRUS volume without requiring hardware tracking. The proposed FVR-Net utilizes a dual-branch feature extraction module to extract the information from the TRUS frame and volume to estimate transformation parameters. We also introduce a differentiable 2D slice sampling module which allows gradients to backpropagate from an unsupervised image similarity loss for content correspondence learning. Our model shows superior efficiency for real-time interventional guidance with highly competitive registration accuracy.
Submitted 13 July, 2021;
originally announced July 2021.
-
Cross-modal Attention for MRI and Ultrasound Volume Registration
Authors:
Xinrui Song,
Hengtao Guo,
Xuanang Xu,
Hanqing Chao,
Sheng Xu,
Baris Turkbey,
Bradford J. Wood,
Ge Wang,
Pingkun Yan
Abstract:
Prostate cancer biopsy benefits from accurate fusion of transrectal ultrasound (TRUS) and magnetic resonance (MR) images. In the past few years, convolutional neural networks (CNNs) have proved powerful in extracting image features crucial for image registration. However, challenging applications and recent advances in computer vision suggest that CNNs are quite limited in their ability to understand spatial correspondence between features, a task in which the self-attention mechanism excels. This paper aims to develop a self-attention mechanism specifically for cross-modal image registration. Our proposed cross-modal attention block effectively maps each of the features in one volume to all features in the corresponding volume. Our experimental results demonstrate that a CNN network designed with the cross-modal attention block embedded outperforms an advanced CNN network 10 times its size. We also incorporated visualization techniques to improve the interpretability of our network. The source code of our work is available at https://github.com/DIAL-RPI/Attention-Reg .
Submitted 11 July, 2021; v1 submitted 9 July, 2021;
originally announced July 2021.
-
Deploying COTS Legged Robot Platforms into a Heterogeneous Robot Team
Authors:
Benjamin Tam,
Thomas Molnar,
Fletcher Talbot,
Brett Wood,
Ryan Steindl
Abstract:
The recent availability of commercial-off-the-shelf (COTS) legged robot platforms has opened up new opportunities in deploying legged systems into different scenarios. While the main advantage of legged robots is their ability to traverse unstructured terrain, there are still large gaps between what robot platforms can achieve and their animal counterparts. Therefore, when deploying as part of a heterogeneous robot team of different platforms, it is beneficial to understand the different scenarios where a legged platform would perform better than a wheeled, tracked or aerial platform. Two COTS quadruped robots, Ghost Robotics' Vision 60 and Boston Dynamics' Spot, were deployed into a heterogeneous team. A description of some of the challenges faced while integrating the platforms, as well as some experiments in traversing different terrains, is provided to give insight into the real-world deployment of legged robots.
Submitted 14 June, 2021;
originally announced June 2021.
-
Auto-FedAvg: Learnable Federated Averaging for Multi-Institutional Medical Image Segmentation
Authors:
Yingda Xia,
Dong Yang,
Wenqi Li,
Andriy Myronenko,
Daguang Xu,
Hirofumi Obinata,
Hitoshi Mori,
Peng An,
Stephanie Harmon,
Evrim Turkbey,
Baris Turkbey,
Bradford Wood,
Francesca Patella,
Elvira Stellato,
Gianpaolo Carrafiello,
Anna Ierardi,
Alan Yuille,
Holger Roth
Abstract:
Federated learning (FL) enables collaborative model training while preserving each participant's privacy, which is particularly beneficial to the medical field. FedAvg is a standard algorithm that uses fixed weights, often originating from the dataset sizes at each client, to aggregate the distributed learned models on a server during the FL process. However, non-identical data distribution across clients, known as the non-i.i.d. problem in FL, could make this assumption for setting fixed aggregation weights sub-optimal. In this work, we design a new data-driven approach, namely Auto-FedAvg, where aggregation weights are dynamically adjusted, depending on data distributions across data silos and the current training progress of the models. We disentangle the parameter set into two parts, local model parameters and global aggregation parameters, and update them iteratively with a communication-efficient algorithm. We first show the validity of our approach by outperforming state-of-the-art FL methods for image recognition on a heterogeneous data split of CIFAR-10. Furthermore, we demonstrate our algorithm's effectiveness on two multi-institutional medical image analysis tasks, i.e., COVID-19 lesion segmentation in chest CT and pancreas segmentation in abdominal CT.
Submitted 20 April, 2021;
originally announced April 2021.
-
Heterogeneous Ground and Air Platforms, Homogeneous Sensing: Team CSIRO Data61's Approach to the DARPA Subterranean Challenge
Authors:
Nicolas Hudson,
Fletcher Talbot,
Mark Cox,
Jason Williams,
Thomas Hines,
Alex Pitt,
Brett Wood,
Dennis Frousheger,
Katrina Lo Surdo,
Thomas Molnar,
Ryan Steindl,
Matt Wildie,
Inkyu Sa,
Navinda Kottege,
Kazys Stepanas,
Emili Hernandez,
Gavin Catt,
William Docherty,
Brendan Tidd,
Benjamin Tam,
Simon Murrell,
Mitchell Bessell,
Lauren Hanson,
Lachlan Tychsen-Smith,
Hajime Suzuki
, et al. (9 additional authors not shown)
Abstract:
Heterogeneous teams of robots, leveraging a balance between autonomy and human interaction, bring powerful capabilities to the problem of exploring dangerous, unstructured subterranean environments. Here we describe the solution developed by Team CSIRO Data61, consisting of CSIRO, Emesent and Georgia Tech, during the DARPA Subterranean Challenge. These presented systems were fielded in the Tunnel Circuit in August 2019, the Urban Circuit in February 2020, and in our own Cave event, conducted in September 2020. A unique capability of the fielded team is the homogeneous sensing of the platforms utilised, which is leveraged to obtain a decentralised multi-agent SLAM solution on each platform (both ground agents and UAVs) using peer-to-peer communications. This enabled a shift in focus from constructing a pervasive communications network to relying on multi-agent autonomy, motivated by experiences in early circuit events. These experiences also showed the surprising capability of rugged tracked platforms for challenging terrain, which in turn led to the heterogeneous team structure based on a BIA5 OzBot Titan ground robot and an Emesent Hovermap UAV, supplemented by smaller tracked or legged ground robots. The ground agents use a common CatPack perception module, which allowed reuse of the perception and autonomy stack across all ground agents with minimal adaptation.
Submitted 19 April, 2021;
originally announced April 2021.
-
Federated Semi-Supervised Learning for COVID Region Segmentation in Chest CT using Multi-National Data from China, Italy, Japan
Authors:
Dong Yang,
Ziyue Xu,
Wenqi Li,
Andriy Myronenko,
Holger R. Roth,
Stephanie Harmon,
Sheng Xu,
Baris Turkbey,
Evrim Turkbey,
Xiaosong Wang,
Wentao Zhu,
Gianpaolo Carrafiello,
Francesca Patella,
Maurizio Cariati,
Hirofumi Obinata,
Hitoshi Mori,
Kaku Tamura,
Peng An,
Bradford J. Wood,
Daguang Xu
Abstract:
The recent outbreak of COVID-19 has led to urgent needs for reliable diagnosis and management of SARS-CoV-2 infection. As a complementary tool, chest CT has been shown to be able to reveal visual patterns characteristic of COVID-19, which has definite value at several stages during the disease course. To facilitate CT analysis, recent efforts have focused on computer-aided characterization and diagnosis, which have shown promising results. However, domain shift of data across clinical data centers poses a serious challenge when deploying learning-based models. In this work, we attempt to find a solution for this challenge via federated and semi-supervised learning. A multi-national database consisting of 1704 scans from three countries is adopted to study the performance gap, when training a model with one dataset and applying it to another. Expert radiologists manually delineated 945 scans for COVID-19 findings. In handling the variability in both the data and annotations, a novel federated semi-supervised learning technique is proposed to fully utilize all available data (with or without annotations). Federated learning avoids the need for sensitive data-sharing, which makes it favorable for institutions and nations with strict regulatory policy on data privacy. Moreover, semi-supervision potentially reduces the annotation burden under a distributed setting. The proposed framework is shown to be effective compared to fully supervised scenarios with conventional data sharing instead of model weight sharing.
Submitted 23 November, 2020;
originally announced November 2020.
-
Transducer Adaptive Ultrasound Volume Reconstruction
Authors:
Hengtao Guo,
Sheng Xu,
Bradford J. Wood,
Pingkun Yan
Abstract:
Reconstructed 3D ultrasound volume provides more context information compared to a sequence of 2D scanning frames, which is desirable for various clinical applications such as ultrasound-guided prostate biopsy. Nevertheless, 3D volume reconstruction from freehand 2D scans is a very challenging problem, especially without the use of external tracking devices. Recent deep learning based methods demonstrate the potential of directly estimating inter-frame motion between consecutive ultrasound frames. However, such algorithms are specific to particular transducers and scanning trajectories associated with the training data, which may not be generalized to other image acquisition settings. In this paper, we tackle the data acquisition difference as a domain shift problem and propose a novel domain adaptation strategy to adapt deep learning algorithms to data acquired with different transducers. Specifically, feature extractors that generate transducer-invariant features from different datasets are trained by minimizing the discrepancy between deep features of paired samples in a latent space. Our results show that the proposed domain adaptation method can successfully align different feature distributions while preserving the transducer-specific information for universal freehand ultrasound volume reconstruction.
Submitted 16 November, 2020;
originally announced November 2020.
-
The Open Catalyst 2020 (OC20) Dataset and Community Challenges
Authors:
Lowik Chanussot,
Abhishek Das,
Siddharth Goyal,
Thibaut Lavril,
Muhammed Shuaibi,
Morgane Riviere,
Kevin Tran,
Javier Heras-Domingo,
Caleb Ho,
Weihua Hu,
Aini Palizhati,
Anuroop Sriram,
Brandon Wood,
Junwoong Yoon,
Devi Parikh,
C. Lawrence Zitnick,
Zachary Ulissi
Abstract:
Catalyst discovery and optimization is key to solving many societal and energy challenges including solar fuels synthesis, long-term energy storage, and renewable fertilizer production. Despite considerable effort by the catalysis community to apply machine learning models to the computational catalyst discovery process, it remains an open challenge to build models that can generalize across both elemental compositions of surfaces and adsorbate identity/configurations, perhaps because datasets have been smaller in catalysis than related fields. To address this we developed the OC20 dataset, consisting of 1,281,040 Density Functional Theory (DFT) relaxations (~264,890,000 single point evaluations) across a wide swath of materials, surfaces, and adsorbates (nitrogen, carbon, and oxygen chemistries). We supplemented this dataset with randomly perturbed structures, short timescale molecular dynamics, and electronic structure analyses. The dataset comprises three central tasks indicative of day-to-day catalyst modeling and comes with pre-defined train/validation/test splits to facilitate direct comparisons with future model development efforts. We applied three state-of-the-art graph neural network models (CGCNN, SchNet, DimeNet++) to each of these tasks as baseline demonstrations for the community to build on. In almost every task, no upper limit on model size was identified, suggesting that even larger models are likely to improve on initial results. The dataset and baseline models are both provided as open resources, as well as a public leaderboard to encourage community contributions to solve these important tasks.
Submitted 24 September, 2021; v1 submitted 19 October, 2020;
originally announced October 2020.
-
An Introduction to Electrocatalyst Design using Machine Learning for Renewable Energy Storage
Authors:
C. Lawrence Zitnick,
Lowik Chanussot,
Abhishek Das,
Siddharth Goyal,
Javier Heras-Domingo,
Caleb Ho,
Weihua Hu,
Thibaut Lavril,
Aini Palizhati,
Morgane Riviere,
Muhammed Shuaibi,
Anuroop Sriram,
Kevin Tran,
Brandon Wood,
Junwoong Yoon,
Devi Parikh,
Zachary Ulissi
Abstract:
Scalable and cost-effective solutions to renewable energy storage are essential to addressing the world's rising energy needs while reducing climate change. As we increase our reliance on renewable energy sources such as wind and solar, which produce intermittent power, storage is needed to transfer power from times of peak generation to peak demand. This may require the storage of power for hours, days, or months. One solution that offers the potential of scaling to nation-sized grids is the conversion of renewable energy to other fuels, such as hydrogen or methane. To be widely adopted, this process requires cost-effective solutions to running electrochemical reactions. An open challenge is finding low-cost electrocatalysts to drive these reactions at high rates. Through the use of quantum mechanical simulations (density functional theory), new catalyst structures can be tested and evaluated. Unfortunately, the high computational cost of these simulations limits the number of structures that may be tested. The use of machine learning may provide a method to efficiently approximate these calculations, leading to new approaches in finding effective electrocatalysts. In this paper, we provide an introduction to the challenges in finding suitable electrocatalysts, how machine learning may be applied to the problem, and the use of the Open Catalyst Project OC20 dataset for model training.
Submitted 14 October, 2020;
originally announced October 2020.
-
Multi-Domain Image Completion for Random Missing Input Data
Authors:
Liyue Shen,
Wentao Zhu,
Xiaosong Wang,
Lei Xing,
John M. Pauly,
Baris Turkbey,
Stephanie Anne Harmon,
Thomas Hogue Sanford,
Sherif Mehralivand,
Peter Choyke,
Bradford Wood,
Daguang Xu
Abstract:
Multi-domain data are widely leveraged in vision applications taking advantage of complementary information from different modalities, e.g., brain tumor segmentation from multi-parametric magnetic resonance imaging (MRI). However, due to possible data corruption and different imaging protocols, the availability of images for each domain could vary amongst multiple data sources in practice, which makes it challenging to build a universal model with a varied set of input data. To tackle this problem, we propose a general approach to complete the random missing domain(s) data in real applications. Specifically, we develop a novel multi-domain image completion method that utilizes a generative adversarial network (GAN) with a representational disentanglement scheme to extract shared skeleton encoding and separate flesh encoding across multiple domains. We further illustrate that the learned representation in multi-domain image completion could be leveraged for high-level tasks, e.g., segmentation, by introducing a unified framework consisting of image completion and segmentation with a shared content encoder. The experiments demonstrate consistent performance improvement on three datasets for brain tumor segmentation, prostate segmentation, and facial expression image completion respectively.
Submitted 10 July, 2020;
originally announced July 2020.
-
A Data-driven Market Simulator for Small Data Environments
Authors:
Hans Bühler,
Blanka Horvath,
Terry Lyons,
Imanol Perez Arribas,
Ben Wood
Abstract:
Neural network based data-driven market simulation unveils a new and flexible way of modelling financial time series without imposing assumptions on the underlying stochastic dynamics. Though in this sense generative market simulation is model-free, the concrete modelling choices are nevertheless decisive for the features of the simulated paths. We give a brief overview of currently used generative modelling approaches and performance evaluation metrics for financial time series, and address some of the challenges to achieve good results in the latter. We also contrast some classical approaches of market simulation with simulation based on generative modelling and highlight some advantages and pitfalls of the new approach. While most generative models tend to rely on large amounts of training data, we present here a generative model that works reliably in environments where the amount of available training data is notoriously small. Furthermore, we show how a rough paths perspective combined with a parsimonious Variational Autoencoder framework provides a powerful way for encoding and evaluating financial time series in such environments where available training data is scarce. Finally, we also propose a suitable performance evaluation metric for financial time series and discuss some connections of our Market Generator to deep hedging.
Submitted 21 June, 2020;
originally announced June 2020.
-
Sensorless Freehand 3D Ultrasound Reconstruction via Deep Contextual Learning
Authors:
Hengtao Guo,
Sheng Xu,
Bradford Wood,
Pingkun Yan
Abstract:
Transrectal ultrasound (US) is the most commonly used imaging modality to guide prostate biopsy and its 3D volume provides even richer context information. Current methods for 3D volume reconstruction from freehand US scans require external tracking devices to provide spatial position for every frame. In this paper, we propose a deep contextual learning network (DCL-Net), which can efficiently exploit the image feature relationship between US frames and reconstruct 3D US volumes without any tracking device. The proposed DCL-Net utilizes 3D convolutions over a US video segment for feature extraction. An embedded self-attention module makes the network focus on the speckle-rich areas for better spatial movement prediction. We also propose a novel case-wise correlation loss to stabilize the training process for improved accuracy. Highly promising results have been obtained by using the developed method. The experiments with ablation studies demonstrate superior performance of the proposed method by comparing against other state-of-the-art methods. Source code of this work is publicly available at https://github.com/DIAL-RPI/FreehandUSRecon.
Submitted 13 June, 2020;
originally announced June 2020.
-
Mining social media data for biomedical signals and health-related behavior
Authors:
Rion Brattig Correia,
Ian B. Wood,
Johan Bollen,
Luis M. Rocha
Abstract:
Social media data has been increasingly used to study biomedical and health-related phenomena. From cohort level discussions of a condition to planetary level analyses of sentiment, social media has provided scientists with unprecedented amounts of data to study human behavior and response associated with a variety of health conditions and medical treatments. Here we review recent work in mining social media for biomedical, epidemiological, and social phenomena information relevant to the multilevel complexity of human health. We pay particular attention to topics where social media data analysis has shown the most progress, including pharmacovigilance, sentiment analysis especially for mental health, and other areas. We also discuss a variety of innovative uses of social media data for health-related applications and important limitations in social media data access and use.
Submitted 28 January, 2020;
originally announced January 2020.
-
Deep Hedging: Learning to Simulate Equity Option Markets
Authors:
Magnus Wiese,
Lianjun Bai,
Ben Wood,
Hans Buehler
Abstract:
We construct realistic equity option market simulators based on generative adversarial networks (GANs). We consider recurrent and temporal convolutional architectures, and assess the impact of state compression. Option market simulators are highly relevant because they allow us to extend the limited real-world data sets available for the training and evaluation of option trading strategies. We show that network-based generators outperform classical methods on a range of benchmark metrics, and adversarial training achieves the best performance. Our work demonstrates for the first time that GANs can be successfully applied to the task of generating multivariate financial time series.
Submitted 5 November, 2019;
originally announced November 2019.
-
Unified Multi-scale Feature Abstraction for Medical Image Segmentation
Authors:
Xi Fang,
Bo Du,
Sheng Xu,
Bradford J. Wood,
Pingkun Yan
Abstract:
Automatic medical image segmentation, an essential component of medical image analysis, plays an important role in computer-aided diagnosis. For example, locating and segmenting the liver can be very helpful in liver cancer diagnosis and treatment. The state-of-the-art models in medical image segmentation are variants of the encoder-decoder architecture such as fully convolutional network (FCN) and U-Net [1]. A major focus of the FCN based segmentation methods has been on network structure engineering by incorporating the latest CNN structures such as ResNet [2] and DenseNet [3]. In addition to exploring new network structures for efficiently abstracting high level features, incorporating structures for multi-scale image feature extraction in FCN has helped to improve performance in segmentation tasks. In this paper, we design a new multi-scale network architecture, which takes multi-scale inputs with dedicated convolutional paths to efficiently combine features from different scales to better utilize the hierarchical information.
Submitted 24 October, 2019;
originally announced October 2019.
-
Learning Deep Similarity Metric for 3D MR-TRUS Registration
Authors:
Grant Haskins,
Jochen Kruecker,
Uwe Kruger,
Sheng Xu,
Peter A. Pinto,
Brad J. Wood,
Pingkun Yan
Abstract:
Purpose: The fusion of transrectal ultrasound (TRUS) and magnetic resonance (MR) images for guiding targeted prostate biopsy has significantly improved the biopsy yield of aggressive cancers. A key component of MR-TRUS fusion is image registration. However, it is very challenging to obtain a robust automatic MR-TRUS registration due to the large appearance difference between the two imaging modalities. The work presented in this paper aims to tackle this problem by addressing two challenges: (i) the definition of a suitable similarity metric and (ii) the determination of a suitable optimization strategy.
Methods: This work proposes the use of a deep convolutional neural network to learn a similarity metric for MR-TRUS registration. We also use a composite optimization strategy that explores the solution space in order to search for a suitable initialization for the second-order optimization of the learned metric. Further, a multi-pass approach is used in order to smooth the metric for optimization.
Results: The learned similarity metric outperforms the classical mutual information and also the state-of-the-art MIND feature based methods. The results indicate that the overall registration framework has a large capture range. The proposed deep similarity metric based approach obtained a mean TRE of 3.86 mm (with an initial TRE of 16 mm) for this challenging problem.
Conclusion: A similarity metric that is learned using a deep neural network can be used to assess the quality of any given image registration and can be used in conjunction with the aforementioned optimization framework to perform automatic registration that is robust to poor initialization.
Submitted 15 October, 2018; v1 submitted 12 June, 2018;
originally announced June 2018.
-
Adversarial Image Registration with Application for MR and TRUS Image Fusion
Authors:
Pingkun Yan,
Sheng Xu,
Ardeshir R. Rastinehad,
Brad J. Wood
Abstract:
Robust and accurate alignment of multimodal medical images is a very challenging task, which however is very useful for many clinical applications. For example, magnetic resonance (MR) and transrectal ultrasound (TRUS) image registration is a critical component in MR-TRUS fusion guided prostate interventions. However, due to the huge difference between the image appearances and the large variation in image correspondence, MR-TRUS image registration is a very challenging problem. In this paper, an adversarial image registration (AIR) framework is proposed. By training two deep neural networks simultaneously, one being a generator and the other being a discriminator, we can obtain not only a network for image registration, but also a metric network which can help evaluate the quality of image registration. The developed AIR-net is then evaluated using clinical datasets acquired through image-fusion guided prostate biopsy procedures and promising results are demonstrated.
Submitted 1 October, 2018; v1 submitted 29 April, 2018;
originally announced April 2018.
-
A Collaborative Computer Aided Diagnosis (C-CAD) System with Eye-Tracking, Sparse Attentional Model, and Deep Learning
Authors:
Naji Khosravan,
Haydar Celik,
Baris Turkbey,
Elizabeth Jones,
Bradford Wood,
Ulas Bagci
Abstract:
There are at least two categories of errors in radiology screening that can lead to suboptimal diagnostic decisions and interventions: (i) human fallibility and (ii) complexity of visual search. Computer aided diagnostic (CAD) tools are developed to help radiologists to compensate for some of these errors. However, despite their significant improvements over conventional screening strategies, most CAD systems do not go beyond their use as second opinion tools due to producing a high number of false positives, which human interpreters need to correct. In parallel with efforts in computerized analysis of radiology scans, several researchers have examined behaviors of radiologists while screening medical images to better understand how and why they miss tumors, how they interact with the information in an image, and how they search for unknown pathology in the images. Eye-tracking tools have been instrumental in exploring answers to these fundamental questions. In this paper, we aim to develop a paradigm-shifting CAD system, called collaborative CAD (C-CAD), that unifies both of the above-mentioned research lines: CAD and eye-tracking. We design an eye-tracking interface providing radiologists with a real radiology reading room experience. Then, we propose a novel algorithm that unifies eye-tracking data and a CAD system. Specifically, we present a new graph based clustering and sparsification algorithm to transform eye-tracking data (gaze) into a signal model to interpret gaze patterns quantitatively and qualitatively. The proposed C-CAD collaborates with radiologists via eye-tracking technology and helps them to improve diagnostic decisions. The C-CAD learns radiologists' search efficiency by processing their gaze patterns. To do this, the C-CAD uses a deep learning algorithm in a newly designed multi-task learning platform to segment and diagnose cancers simultaneously.
Submitted 28 April, 2018; v1 submitted 17 February, 2018;
originally announced February 2018.
-
Human Sexual Cycles are Driven by Culture and Match Collective Moods
Authors:
Ian B. Wood,
Pedro Leal Varela,
Johan Bollen,
Luis M. Rocha,
Joana Gonçalves-Sá
Abstract:
It is a long-standing question whether human sexual and reproductive cycles are affected predominantly by biology or culture. The literature is mixed with respect to whether biological or cultural factors best explain the reproduction cycle phenomenon, with biological explanations dominating the argument. The biological hypothesis proposes that human reproductive cycles are an adaptation to the seasonal cycles caused by hemisphere positioning, while the cultural hypothesis proposes that conception dates vary mostly due to cultural factors, such as vacation schedule or religious holidays. However, for many countries, common records used to investigate these hypotheses are incomplete or unavailable, biasing existing analysis towards primarily Christian countries in the Northern Hemisphere. Here we show that interest in sex peaks sharply online during major cultural and religious celebrations, regardless of hemisphere location. This online interest, when shifted by nine months, corresponds to documented human birth cycles, even after adjusting for numerous factors such as language, season, and amount of free time due to holidays. We further show that mood, measured independently on Twitter, contains distinct collective emotions associated with those cultural celebrations, and these collective moods correlate with sex search volume outside of these holidays as well. Our results provide converging evidence that the cyclic sexual and reproductive behavior of human populations is mostly driven by culture and that this interest in sex is associated with specific emotions, characteristic of, but not limited to, major cultural and religious celebrations.
Submitted 27 October, 2017; v1 submitted 12 July, 2017;
originally announced July 2017.
-
Element-centric clustering comparison unifies overlaps and hierarchy
Authors:
Alexander J. Gates,
Ian B. Wood,
William P. Hetrick,
Yong-Yeol Ahn
Abstract:
Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases which undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests far-reaching impact of our framework throughout all areas of science.
Submitted 12 June, 2019; v1 submitted 19 June, 2017;
originally announced June 2017.