Search | arXiv e-print repository

arXiv:2411.07322 [pdf]

Artificial Intelligence-Informed Handheld Breast Ultrasound for Screening: A Systematic Review of Diagnostic Test Accuracy

Authors: Arianna Bunnell, Dustin Valdez, Fredrik Strand, Yannik Glaser, Peter Sadowski, John A. Shepherd

Abstract: Background. Breast cancer screening programs using mammography have led to significant mortality reduction in high-income countries. However, many low- and middle-income countries lack resources for mammographic screening. Handheld breast ultrasound (BUS) is a low-cost alternative but requires substantial training. Artificial intelligence (AI) enabled BUS may aid in both the detection (perception) and classification (interpretation) of breast cancer. Materials and Methods. This review (CRD42023493053) is reported in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) and SWiM (Synthesis Without Meta-analysis) guidelines. PubMed and Google Scholar were searched from January 1, 2016 to December 12, 2023. A meta-analysis was not attempted. Studies are grouped according to their AI task type, application time, and AI task. Study quality is assessed using the QUality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. Results. Of 763 candidate studies, 314 total full texts were reviewed. 34 studies are included. The AI tasks of included studies are as follows: 1 frame selection, 6 detection, 11 segmentation, and 16 classification. In total, 5.7 million BUS images from over 185,000 patients were used for AI training or validation. A single study included a prospective testing set. 79% of studies were at high or unclear risk of bias. Conclusion. There has been encouraging development of AI for BUS. Despite studies demonstrating high performance across all identified tasks, the evidence supporting AI-enhanced BUS generally lacks robustness. High-quality model validation will be key to realizing the potential for AI-enhanced BUS in increasing access to screening in resource-limited environments.

Submitted 11 November, 2024; originally announced November 2024.

arXiv:2411.00891 [pdf]

Deep Learning Predicts Mammographic Breast Density in Clinical Breast Ultrasound Images

Authors: Arianna Bunnell, Dustin Valdez, Thomas K. Wolfgruber, Brandon Quon, Kailee Hung, Brenda Y. Hernandez, Todd B. Seto, Jeffrey Killeen, Marshall Miyoshi, Peter Sadowski, John A. Shepherd

Abstract: Background: Breast density, as derived from mammographic images and defined by the American College of Radiology's Breast Imaging Reporting and Data System (BI-RADS), is one of the strongest risk factors for breast cancer. Breast ultrasound (BUS) is an alternative breast cancer screening modality, particularly useful for early detection in low-resource, rural contexts. The purpose of this study was to explore an artificial intelligence (AI) model to predict BI-RADS mammographic breast density category from clinical, handheld BUS imaging. Methods: All data are sourced from the Hawaii and Pacific Islands Mammography Registry. We compared deep learning methods from BUS imaging, as well as machine learning models from image statistics alone. The use of AI-derived BUS density as a risk factor for breast cancer was then compared to clinical BI-RADS breast density while adjusting for age. The BUS data were split by individual into 70/20/10% groups for training, validation, and testing. Results: 405,120 clinical BUS images from 14,066 women were selected for inclusion in this study, resulting in 9,846 women for training (302,574 images), 2,813 for validation (11,223 images), and 1,406 for testing (4,042 images). On the held-out testing set, the strongest AI model achieves AUROC 0.854 predicting BI-RADS mammographic breast density from BUS imaging and outperforms all shallow machine learning methods based on image statistics. In cancer risk prediction, age-adjusted AI BUS breast density predicted 5-year breast cancer risk with 0.633 AUROC, as compared to 0.637 AUROC from age-adjusted clinical breast density. Conclusions: BI-RADS mammographic breast density can be estimated from BUS imaging with high accuracy using a deep learning model. Furthermore, we demonstrate that AI-derived BUS breast density is predictive of 5-year breast cancer risk in our population.

Submitted 7 November, 2024; v1 submitted 31 October, 2024; originally announced November 2024.

arXiv:2407.11316 [pdf]

BUSClean: Open-source software for breast ultrasound image pre-processing and knowledge extraction for medical AI

Authors: Arianna Bunnell, Kailee Hung, John A. Shepherd, Peter Sadowski

Abstract: Development of artificial intelligence (AI) for medical imaging demands curation and cleaning of large-scale clinical datasets comprising hundreds of thousands of images. Some modalities, such as mammography, contain highly standardized imaging. In contrast, breast ultrasound imaging (BUS) can contain many irregularities not indicated by scan metadata, such as enhanced scan modes, sonographer annotations, or additional views. We present an open-source software solution for automatically processing clinical BUS datasets. The algorithm performs BUS scan filtering (flagging of invalid and non-B-mode scans), cleaning (dual-view scan detection, scan area cropping, and caliper detection), and knowledge extraction (BI-RADS Labeling and Measurement fields) from sonographer annotations. Its modular design enables users to adapt it to new settings. Experiments on an internal testing dataset of 430 clinical BUS images achieve >95% sensitivity and >98% specificity in detecting every type of text annotation, >98% sensitivity and specificity in detecting scans with blood flow highlighting, alternative scan modes, or invalid scans. A case study on a completely external, public dataset of BUS scans found that BUSClean identified text annotations and scans with blood flow highlighting with 88.6% and 90.9% sensitivity and 98.3% and 99.9% specificity, respectively. Adaptation of the lesion caliper detection method to account for a type of caliper specific to the case study demonstrates the intended use of BUSClean in new data distributions and improved performance in lesion caliper detection from 43.3% and 93.3% out-of-the-box to 92.1% and 92.3% sensitivity and specificity, respectively. Source code, example notebooks, and sample data are available at https://github.com/hawaii-ai/bus-cleaning.

Submitted 29 October, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.00267 [pdf]

Learning a Clinically-Relevant Concept Bottleneck for Lesion Detection in Breast Ultrasound

Authors: Arianna Bunnell, Yannik Glaser, Dustin Valdez, Thomas Wolfgruber, Aleen Altamirano, Carol Zamora González, Brenda Y. Hernandez, Peter Sadowski, John A. Shepherd

Abstract: Detecting and classifying lesions in breast ultrasound images is a promising application of artificial intelligence (AI) for reducing the burden of cancer in regions with limited access to mammography. Such AI systems are more likely to be useful in a clinical setting if their predictions can be explained to a radiologist. This work proposes an explainable AI model that provides interpretable predictions using a standard lexicon from the American College of Radiology's Breast Imaging Reporting and Data System (BI-RADS). The model is a deep neural network featuring a concept bottleneck layer in which known BI-RADS features are predicted before making a final cancer classification. This enables radiologists to easily review the predictions of the AI system and potentially fix errors in real time by modifying the concept predictions. In experiments, a model is developed on 8,854 images from 994 women with expert annotations and histological cancer labels. The model outperforms state-of-the-art lesion detection frameworks with 48.9 average precision on the held-out testing set, and for cancer classification, concept intervention is shown to increase performance from 0.876 to 0.885 area under the receiver operating characteristic curve. Training and evaluation code is available at https://github.com/hawaii-ai/bus-cbm.

Submitted 28 June, 2024; originally announced July 2024.

Comments: Submitted version of manuscript accepted at MICCAI 2024. This preprint has not undergone peer review or any post-submission improvements or corrections

arXiv:2206.06663 [pdf, ps, other]

Quantitative Imaging Principles Improves Medical Image Learning

Authors: Lambert T. Leong, Michael C. Wong, Yannik Glaser, Thomas Wolfgruber, Steven B. Heymsfield, Peter Sadowski, John A. Shepherd

Abstract: Fundamental differences between natural and medical images have recently favored the use of self-supervised learning (SSL) over ImageNet transfer learning for medical image applications. Differences between image types are primarily due to the imaging modality: medical images utilize a wide range of physics-based techniques, while natural images are captured using only visible light. While many have demonstrated that SSL on medical images has resulted in better downstream task performance, our work suggests that more performance can be gained. The scientific principles which are used to acquire medical images are not often considered when constructing learning problems. For this reason, we propose incorporating quantitative imaging principles during generative SSL to improve image quality and quantitative biological accuracy. We show that this training schema results in better starting states for downstream supervised training on limited data. Our model also generates images that validate on clinical quantitative analysis software.

Submitted 11 July, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

arXiv:2110.04157 [pdf, other]

Velocity Level Approximation of Pressure Field Contact Patches

Authors: Joseph Masterjohn, Damrong Guoy, John Shepherd, Alejandro Castro

Abstract: Pressure Field Contact (PFC) was recently introduced as a method for detailed modeling of contact interface regions at rates much faster than elasticity-theory models, while at the same time predicting essential trends and capturing rich contact behavior. The PFC model was designed to work in conjunction with error-controlled integration at the acceleration level. Therefore a vast majority of existent multibody codes using solvers at the velocity level cannot incorporate PFC in its original form. In this work we introduce a discrete in time approximation of PFC making it suitable for use with existent velocity-level time steppers and enabling execution at real-time rates. We evaluate the accuracy and performance gains of our approach and demonstrate its effectiveness in simulating relevant manipulation tasks. The method is available in open source as part of Drake's Hydroelastic Contact model.

Submitted 9 June, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

Comments: 8 pages, 10 figures. Supplementary video can be found at https://youtu.be/AdCnTyqqQP0

arXiv:2109.12186 [pdf]

Aristotle Cloud Federation: Container Runtimes Technical Report

Authors: Peter Z. Vaillancourt, Bennett Wineholt, Tristan J. Shepherd, Sara C. Pryor, Jeffrey Lantz, Richard Knepper, Rich Wolski, Christopher R. Myers, Ben Trumbore, Resa Reynolds, Jodie Sprouse, David Lifka

Abstract: A National Science Foundation-sponsored container runtimes investigation was conducted by the Aristotle Cloud Federation to better understand the challenges of selecting and using Docker, Singularity, and X-Containers. The main goal of this investigation was to identify the "pain points" experienced by users when selecting and using containers for scientific research and to share lessons learned. Application performance characteristics are included in this report as well as user experiences with Kubernetes and container orchestration on cloud and HPC platforms. Scientists, research computing practitioners, and educators may find value in this report when considering the use and/or deployment of containers or when preparing students to meet the unique challenges of using containers in scientific research.

Submitted 24 September, 2021; originally announced September 2021.

Comments: Lead author contact: Peter Z. Vaillancourt at [email protected]; Edited by: P. Redfern; 48 pages

arXiv:2103.10164 [pdf]

PySTACHIO: Python Single-molecule TrAcking stoiCHiometry Intensity and simulatiOn, a flexible, extensible, beginner-friendly and optimized program for analysis of single-molecule microscopy

Authors: Jack W Shepherd, Ed J Higgins, Adam J M Wollman, Mark C Leake

Abstract: As camera pixel arrays have grown larger and faster, and optical microscopy techniques ever more refined, there has been an explosion in the quantity of data acquired during routine light microscopy. At the single-molecule level, analysis involves multiple steps and can rapidly become computationally expensive, in some cases intractable on office workstations. Complex bespoke software can present high activation barriers to entry for new users. Here, we redevelop our quantitative single-molecule analysis routines into an optimized and extensible Python program, with GUI and command-line implementations to facilitate use on local machines and remote clusters, by beginners and advanced users alike. We demonstrate that its performance is on par with previous MATLAB implementations but runs an order of magnitude faster. We tested it against challenge data and demonstrate its performance is comparable to state-of-the-art analysis platforms. We show the code can extract fluorescence intensity values for single reporter dye molecules and, using these, estimate molecular stoichiometries and cellular copy numbers of fluorescently-labeled biomolecules. It can evaluate 2D diffusion coefficients for the characteristically short single-particle tracking data. To facilitate benchmarking we include data simulation routines to compare different analysis programs. Finally, we show that it works with 2-color data and enables colocalization analysis based on overlap integration, to infer interactions between differently labelled biomolecules. By making this freely available we aim to make complex light microscopy single-molecule analysis more democratized.

Submitted 5 July, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

arXiv:1504.07999 [pdf, other]

doi 10.1103/PhysRevLett.117.080501

Average-case complexity versus approximate simulation of commuting quantum computations

Authors: Michael J. Bremner, Ashley Montanaro, Dan J. Shepherd

Abstract: We use the class of commuting quantum computations known as IQP (Instantaneous Quantum Polynomial time) to strengthen the conjecture that quantum computers are hard to simulate classically. We show that, if either of two plausible average-case hardness conjectures holds, then IQP computations are hard to simulate classically up to constant additive error. One conjecture relates to the hardness of estimating the complex-temperature partition function for random instances of the Ising model; the other concerns approximating the number of zeroes of random low-degree polynomials. We observe that both conjectures can be shown to be valid in the setting of worst-case complexity. We arrive at these conjectures by deriving spin-based generalisations of the Boson Sampling problem that avoid the so-called permanent anticoncentration conjecture.

Submitted 23 September, 2015; v1 submitted 29 April, 2015; originally announced April 2015.

Comments: This version is arguably easier to read than v1. Trust us, we argued about it. 4+1+5 pages, RevTex 4.1

Journal ref: Phys. Rev. Lett. 117, 080501 (2016)

arXiv:1407.5407 [pdf, other]

doi 10.5334/jors.bw

Open-source development experiences in scientific software: the HANDE quantum Monte Carlo project

Authors: J. S. Spencer, N. S. Blunt, W. A. Vigor, F. D. Malone, W. M. C. Foulkes, James J. Shepherd, A. J. W. Thom

Abstract: The HANDE quantum Monte Carlo project offers accessible stochastic algorithms for general use for scientists in the field of quantum chemistry. HANDE is an ambitious and general high-performance code developed by a geographically-dispersed team with a variety of backgrounds in computational science. In the course of preparing a public, open-source release, we have taken this opportunity to step back and look at what we have done and what we hope to do in the future. We pay particular attention to development processes, the approach taken to train students joining the project, and how a flat hierarchical structure aids communication.

Submitted 14 November, 2015; v1 submitted 21 July, 2014; originally announced July 2014.

Comments: 6 pages. Submission to WSSSPE2

Journal ref: Journal of Open Research Software, 3, e9, 2015

arXiv:1005.1425 [pdf, other]

Quantum Complexity: restrictions on algorithms and architectures

Authors: Daniel James Shepherd

Abstract: A dissertation submitted to the University of Bristol in accordance with the requirements of the degree of Doctor of Philosophy (PhD) in the Faculty of Engineering, Department of Computer Science, July 2009.

Submitted 9 May, 2010; originally announced May 2010.

Comments: 137 pages, 10 figs.

arXiv:cmp-lg/9706013 [pdf, ps, other]

A Corpus-Based Approach for Building Semantic Lexicons

Authors: Ellen Riloff, Jessica Shepherd

Abstract: Semantic knowledge can be a great asset to natural language processing systems, but it is usually hand-coded for each application. Although some semantic information is available in general-purpose knowledge bases such as WordNet and Cyc, many applications require domain-specific lexicons that represent words and categories for a particular topic. In this paper, we present a corpus-based method that can be used to build semantic lexicons for specific categories. The input to the system is a small set of seed words for a category and a representative text corpus. The output is a ranked list of words that are associated with the category. A user then reviews the top-ranked words and decides which ones should be entered in the semantic lexicon. In experiments with five categories, users typically found about 60 words per category in 10-15 minutes to build a core semantic lexicon.

Submitted 10 June, 1997; originally announced June 1997.

Comments: 8 pages - to appear in Proceedings of EMNLP-2

Showing 1–12 of 12 results for author: Shepherd, J