-
Advancing the Scientific Method with Large Language Models: From Hypothesis to Discovery
Authors:
Yanbo Zhang,
Sumeer A. Khan,
Adnan Mahmud,
Huck Yang,
Alexander Lavin,
Michael Levin,
Jeremy Frey,
Jared Dunnmon,
James Evans,
Alan Bundy,
Saso Dzeroski,
Jesper Tegner,
Hector Zenil
Abstract:
With recent Nobel Prizes recognising AI contributions to science, Large Language Models (LLMs) are transforming scientific research by enhancing productivity and reshaping the scientific method. LLMs are now involved in experimental design, data analysis, and workflows, particularly in chemistry and biology. However, challenges such as hallucinations and reliability persist. In this contribution,…
▽ More
With recent Nobel Prizes recognising AI contributions to science, Large Language Models (LLMs) are transforming scientific research by enhancing productivity and reshaping the scientific method. LLMs are now involved in experimental design, data analysis, and workflows, particularly in chemistry and biology. However, challenges such as hallucinations and reliability persist. In this contribution, we review how Large Language Models (LLMs) are redefining the scientific method and explore their potential applications across different stages of the scientific cycle, from hypothesis testing to discovery. We conclude that, for LLMs to serve as relevant and effective creative engines and productivity enhancers, their deep integration into all steps of the scientific process should be pursued in collaboration and alignment with human scientific goals, with clear evaluation metrics. The transition to AI-driven science raises ethical questions about creativity, oversight, and responsibility. With careful guidance, LLMs could evolve into creative engines, driving transformative breakthroughs across scientific disciplines responsibly and effectively. However, the scientific community must also decide how much it leaves to LLMs to drive science, even when associations with 'reasoning', mostly currently undeserved, are made in exchange for the potential to explore hypothesis and solution regions that might otherwise remain unexplored by human exploration alone.
△ Less
Submitted 22 May, 2025;
originally announced May 2025.
-
Modeling Multivariate Biosignals With Graph Neural Networks and Structured State Space Models
Authors:
Siyi Tang,
Jared A. Dunnmon,
Liangqiong Qu,
Khaled K. Saab,
Tina Baykaner,
Christopher Lee-Messer,
Daniel L. Rubin
Abstract:
Multivariate biosignals are prevalent in many medical domains, such as electroencephalography, polysomnography, and electrocardiography. Modeling spatiotemporal dependencies in multivariate biosignals is challenging due to (1) long-range temporal dependencies and (2) complex spatial correlations between the electrodes. To address these challenges, we propose representing multivariate biosignals as…
▽ More
Multivariate biosignals are prevalent in many medical domains, such as electroencephalography, polysomnography, and electrocardiography. Modeling spatiotemporal dependencies in multivariate biosignals is challenging due to (1) long-range temporal dependencies and (2) complex spatial correlations between the electrodes. To address these challenges, we propose representing multivariate biosignals as time-dependent graphs and introduce GraphS4mer, a general graph neural network (GNN) architecture that improves performance on biosignal classification tasks by modeling spatiotemporal dependencies in biosignals. Specifically, (1) we leverage the Structured State Space architecture, a state-of-the-art deep sequence model, to capture long-range temporal dependencies in biosignals and (2) we propose a graph structure learning layer in GraphS4mer to learn dynamically evolving graph structures in the data. We evaluate our proposed model on three distinct biosignal classification tasks and show that GraphS4mer consistently improves over existing models, including (1) seizure detection from electroencephalographic signals, outperforming a previous GNN with self-supervised pre-training by 3.1 points in AUROC; (2) sleep staging from polysomnographic signals, a 4.1 points improvement in macro-F1 score compared to existing sleep staging models; and (3) 12-lead electrocardiogram classification, outperforming previous state-of-the-art models by 2.7 points in macro-F1 score.
△ Less
Submitted 29 April, 2023; v1 submitted 20 November, 2022;
originally announced November 2022.
-
xView3-SAR: Detecting Dark Fishing Activity Using Synthetic Aperture Radar Imagery
Authors:
Fernando Paolo,
Tsu-ting Tim Lin,
Ritwik Gupta,
Bryce Goodman,
Nirav Patel,
Daniel Kuster,
David Kroodsma,
Jared Dunnmon
Abstract:
Unsustainable fishing practices worldwide pose a major threat to marine resources and ecosystems. Identifying vessels that do not show up in conventional monitoring systems -- known as ``dark vessels'' -- is key to managing and securing the health of marine environments. With the rise of satellite-based synthetic aperture radar (SAR) imaging and modern machine learning (ML), it is now possible to…
▽ More
Unsustainable fishing practices worldwide pose a major threat to marine resources and ecosystems. Identifying vessels that do not show up in conventional monitoring systems -- known as ``dark vessels'' -- is key to managing and securing the health of marine environments. With the rise of satellite-based synthetic aperture radar (SAR) imaging and modern machine learning (ML), it is now possible to automate detection of dark vessels day or night, under all-weather conditions. SAR images, however, require a domain-specific treatment and are not widely accessible to the ML community. Maritime objects (vessels and offshore infrastructure) are relatively small and sparse, challenging traditional computer vision approaches. We present the largest labeled dataset for training ML models to detect and characterize vessels and ocean structures in SAR imagery. xView3-SAR consists of nearly 1,000 analysis-ready SAR images from the Sentinel-1 mission that are, on average, 29,400-by-24,400 pixels each. The images are annotated using a combination of automated and manual analysis. Co-located bathymetry and wind state rasters accompany every SAR image. We also provide an overview of the xView3 Computer Vision Challenge, an international competition using xView3-SAR for ship detection and characterization at large scale. We release the data (\href{https://iuu.xview.us/}{https://iuu.xview.us/}) and code (\href{https://github.com/DIUx-xView}{https://github.com/DIUx-xView}) to support ongoing development and evaluation of ML approaches for this important application.
△ Less
Submitted 5 November, 2022; v1 submitted 2 June, 2022;
originally announced June 2022.
-
Multimodal spatiotemporal graph neural networks for improved prediction of 30-day all-cause hospital readmission
Authors:
Siyi Tang,
Amara Tariq,
Jared Dunnmon,
Umesh Sharma,
Praneetha Elugunti,
Daniel Rubin,
Bhavik N. Patel,
Imon Banerjee
Abstract:
Measures to predict 30-day readmission are considered an important quality factor for hospitals as accurate predictions can reduce the overall cost of care by identifying high risk patients before they are discharged. While recent deep learning-based studies have shown promising empirical results on readmission prediction, several limitations exist that may hinder widespread clinical utility, such…
▽ More
Measures to predict 30-day readmission are considered an important quality factor for hospitals as accurate predictions can reduce the overall cost of care by identifying high risk patients before they are discharged. While recent deep learning-based studies have shown promising empirical results on readmission prediction, several limitations exist that may hinder widespread clinical utility, such as (a) only patients with certain conditions are considered, (b) existing approaches do not leverage data temporality, (c) individual admissions are assumed independent of each other, which is unrealistic, (d) prior studies are usually limited to single source of data and single center data. To address these limitations, we propose a multimodal, modality-agnostic spatiotemporal graph neural network (MM-STGNN) for prediction of 30-day all-cause hospital readmission that fuses multimodal in-patient longitudinal data. By training and evaluating our methods using longitudinal chest radiographs and electronic health records from two independent centers, we demonstrate that MM-STGNN achieves AUROC of 0.79 on both primary and external datasets. Furthermore, MM-STGNN significantly outperforms the current clinical reference standard, LACE+ score (AUROC=0.61), on the primary dataset. For subset populations of patients with heart and vascular disease, our model also outperforms baselines on predicting 30-day readmission (e.g., 3.7 point improvement in AUROC in patients with heart disease). Lastly, qualitative model interpretability analysis indicates that while patients' primary diagnoses were not explicitly used to train the model, node features crucial for model prediction directly reflect patients' primary diagnoses. Importantly, our MM-STGNN is agnostic to node feature modalities and could be utilized to integrate multimodal data for triaging patients in various downstream resource allocation tasks.
△ Less
Submitted 14 April, 2022;
originally announced April 2022.
-
Domino: Discovering Systematic Errors with Cross-Modal Embeddings
Authors:
Sabri Eyuboglu,
Maya Varma,
Khaled Saab,
Jean-Benoit Delbrouck,
Christopher Lee-Messer,
Jared Dunnmon,
James Zou,
Christopher Ré
Abstract:
Machine learning models that achieve high overall accuracy often make systematic errors on important subsets (or slices) of data. Identifying underperforming slices is particularly challenging when working with high-dimensional inputs (e.g. images, audio), where important slices are often unlabeled. In order to address this issue, recent studies have proposed automated slice discovery methods (SDM…
▽ More
Machine learning models that achieve high overall accuracy often make systematic errors on important subsets (or slices) of data. Identifying underperforming slices is particularly challenging when working with high-dimensional inputs (e.g. images, audio), where important slices are often unlabeled. In order to address this issue, recent studies have proposed automated slice discovery methods (SDMs), which leverage learned model representations to mine input data for slices on which a model performs poorly. To be useful to a practitioner, these methods must identify slices that are both underperforming and coherent (i.e. united by a human-understandable concept). However, no quantitative evaluation framework currently exists for rigorously assessing SDMs with respect to these criteria. Additionally, prior qualitative evaluations have shown that SDMs often identify slices that are incoherent. In this work, we address these challenges by first designing a principled evaluation framework that enables a quantitative comparison of SDMs across 1,235 slice discovery settings in three input domains (natural images, medical images, and time-series data). Then, motivated by the recent development of powerful cross-modal representation learning approaches, we present Domino, an SDM that leverages cross-modal embeddings and a novel error-aware mixture model to discover and describe coherent slices. We find that Domino accurately identifies 36% of the 1,235 slices in our framework - a 12 percentage point improvement over prior methods. Further, Domino is the first SDM that can provide natural language descriptions of identified slices, correctly generating the exact name of the slice in 35% of settings.
△ Less
Submitted 21 May, 2022; v1 submitted 24 March, 2022;
originally announced March 2022.
-
OncoNet: Weakly Supervised Siamese Network to automate cancer treatment response assessment between longitudinal FDG PET/CT examinations
Authors:
Anirudh Joshi,
Sabri Eyuboglu,
Shih-Cheng Huang,
Jared Dunnmon,
Arjun Soin,
Guido Davidzon,
Akshay Chaudhari,
Matthew P Lungren
Abstract:
FDG PET/CT imaging is a resource intensive examination critical for managing malignant disease and is particularly important for longitudinal assessment during therapy. Approaches to automate longtudinal analysis present many challenges including lack of available longitudinal datasets, managing complex large multimodal imaging examinations, and need for detailed annotations for traditional superv…
▽ More
FDG PET/CT imaging is a resource intensive examination critical for managing malignant disease and is particularly important for longitudinal assessment during therapy. Approaches to automate longtudinal analysis present many challenges including lack of available longitudinal datasets, managing complex large multimodal imaging examinations, and need for detailed annotations for traditional supervised machine learning. In this work we develop OncoNet, novel machine learning algorithm that assesses treatment response from a 1,954 pairs of sequential FDG PET/CT exams through weak supervision using the standard uptake values (SUVmax) in associated radiology reports. OncoNet demonstrates an AUROC of 0.86 and 0.84 on internal and external institution test sets respectively for determination of change between scans while also showing strong agreement to clinical scoring systems with a kappa score of 0.8. We also curated a dataset of 1,954 paired FDG PET/CT exams designed for response assessment for the broader machine learning in healthcare research community. Automated assessment of radiographic response from FDG PET/CT with OncoNet could provide clinicians with a valuable tool to rapidly and consistently interpret change over time in longitudinal multi-modal imaging exams.
△ Less
Submitted 3 August, 2021;
originally announced August 2021.
-
Self-Supervised Graph Neural Networks for Improved Electroencephalographic Seizure Analysis
Authors:
Siyi Tang,
Jared A. Dunnmon,
Khaled Saab,
Xuan Zhang,
Qianying Huang,
Florian Dubost,
Daniel L. Rubin,
Christopher Lee-Messer
Abstract:
Automated seizure detection and classification from electroencephalography (EEG) can greatly improve seizure diagnosis and treatment. However, several modeling challenges remain unaddressed in prior automated seizure detection and classification studies: (1) representing non-Euclidean data structure in EEGs, (2) accurately classifying rare seizure types, and (3) lacking a quantitative interpretabi…
▽ More
Automated seizure detection and classification from electroencephalography (EEG) can greatly improve seizure diagnosis and treatment. However, several modeling challenges remain unaddressed in prior automated seizure detection and classification studies: (1) representing non-Euclidean data structure in EEGs, (2) accurately classifying rare seizure types, and (3) lacking a quantitative interpretability approach to measure model ability to localize seizures. In this study, we address these challenges by (1) representing the spatiotemporal dependencies in EEGs using a graph neural network (GNN) and proposing two EEG graph structures that capture the electrode geometry or dynamic brain connectivity, (2) proposing a self-supervised pre-training method that predicts preprocessed signals for the next time period to further improve model performance, particularly on rare seizure types, and (3) proposing a quantitative model interpretability approach to assess a model's ability to localize seizures within EEGs. When evaluating our approach on seizure detection and classification on a large public dataset, we find that our GNN with self-supervised pre-training achieves 0.875 Area Under the Receiver Operating Characteristic Curve on seizure detection and 0.749 weighted F1-score on seizure classification, outperforming previous methods for both seizure detection and classification. Moreover, our self-supervised pre-training strategy significantly improves classification of rare seizure types. Furthermore, quantitative interpretability analysis shows that our GNN with self-supervised pre-training precisely localizes 25.4% focal seizures, a 21.9 point improvement over existing CNNs. Finally, by superimposing the identified seizure locations on both raw EEG signals and EEG graphs, our approach could provide clinicians with an intuitive visualization of localized seizure regions.
△ Less
Submitted 13 March, 2022; v1 submitted 16 April, 2021;
originally announced April 2021.
-
No Subclass Left Behind: Fine-Grained Robustness in Coarse-Grained Classification Problems
Authors:
Nimit S. Sohoni,
Jared A. Dunnmon,
Geoffrey Angus,
Albert Gu,
Christopher Ré
Abstract:
In real-world classification tasks, each class often comprises multiple finer-grained "subclasses." As the subclass labels are frequently unavailable, models trained using only the coarser-grained class labels often exhibit highly variable performance across different subclasses. This phenomenon, known as hidden stratification, has important consequences for models deployed in safety-critical appl…
▽ More
In real-world classification tasks, each class often comprises multiple finer-grained "subclasses." As the subclass labels are frequently unavailable, models trained using only the coarser-grained class labels often exhibit highly variable performance across different subclasses. This phenomenon, known as hidden stratification, has important consequences for models deployed in safety-critical applications such as medicine. We propose GEORGE, a method to both measure and mitigate hidden stratification even when subclass labels are unknown. We first observe that unlabeled subclasses are often separable in the feature space of deep neural networks, and exploit this fact to estimate subclass labels for the training data via clustering techniques. We then use these approximate subclass labels as a form of noisy supervision in a distributionally robust optimization objective. We theoretically characterize the performance of GEORGE in terms of the worst-case generalization error across any subclass. We empirically validate GEORGE on a mix of real-world and benchmark image classification datasets, and show that our approach boosts worst-case subclass accuracy by up to 22 percentage points compared to standard training techniques, without requiring any prior information about the subclasses.
△ Less
Submitted 10 April, 2022; v1 submitted 25 November, 2020;
originally announced November 2020.
-
Data Valuation for Medical Imaging Using Shapley Value: Application on A Large-scale Chest X-ray Dataset
Authors:
Siyi Tang,
Amirata Ghorbani,
Rikiya Yamashita,
Sameer Rehman,
Jared A. Dunnmon,
James Zou,
Daniel L. Rubin
Abstract:
The reliability of machine learning models can be compromised when trained on low quality data. Many large-scale medical imaging datasets contain low quality labels extracted from sources such as medical reports. Moreover, images within a dataset may have heterogeneous quality due to artifacts and biases arising from equipment or measurement errors. Therefore, algorithms that can automatically ide…
▽ More
The reliability of machine learning models can be compromised when trained on low quality data. Many large-scale medical imaging datasets contain low quality labels extracted from sources such as medical reports. Moreover, images within a dataset may have heterogeneous quality due to artifacts and biases arising from equipment or measurement errors. Therefore, algorithms that can automatically identify low quality data are highly desired. In this study, we used data Shapley, a data valuation metric, to quantify the value of training data to the performance of a pneumonia detection algorithm in a large chest X-ray dataset. We characterized the effectiveness of data Shapley in identifying low quality versus valuable data for pneumonia detection. We found that removing training data with high Shapley values decreased the pneumonia detection performance, whereas removing data with low Shapley values improved the model performance. Furthermore, there were more mislabeled examples in low Shapley value data and more true pneumonia cases in high Shapley value data. Our results suggest that low Shapley value indicates mislabeled or poor quality images, whereas high Shapley value indicates data that are valuable for pneumonia detection. Our method can serve as a framework for using data Shapley to denoise large-scale medical imaging datasets.
△ Less
Submitted 15 October, 2020;
originally announced October 2020.
-
Ivy: Instrumental Variable Synthesis for Causal Inference
Authors:
Zhaobin Kuang,
Frederic Sala,
Nimit Sohoni,
Sen Wu,
Aldo Córdova-Palomera,
Jared Dunnmon,
James Priest,
Christopher Ré
Abstract:
A popular way to estimate the causal effect of a variable x on y from observational data is to use an instrumental variable (IV): a third variable z that affects y only through x. The more strongly z is associated with x, the more reliable the estimate is, but such strong IVs are difficult to find. Instead, practitioners combine more commonly available IV candidates---which are not necessarily str…
▽ More
A popular way to estimate the causal effect of a variable x on y from observational data is to use an instrumental variable (IV): a third variable z that affects y only through x. The more strongly z is associated with x, the more reliable the estimate is, but such strong IVs are difficult to find. Instead, practitioners combine more commonly available IV candidates---which are not necessarily strong, or even valid, IVs---into a single "summary" that is plugged into causal effect estimators in place of an IV. In genetic epidemiology, such approaches are known as allele scores. Allele scores require strong assumptions---independence and validity of all IV candidates---for the resulting estimate to be reliable. To relax these assumptions, we propose Ivy, a new method to combine IV candidates that can handle correlated and invalid IV candidates in a robust manner. Theoretically, we characterize this robustness, its limits, and its impact on the resulting causal estimates. Empirically, Ivy can correctly identify the directionality of known relationships and is robust against false discovery (median effect size <= 0.025) on three real-world datasets with no causal effects, while allele scores return more biased estimates (median effect size >= 0.118).
△ Less
Submitted 11 April, 2020;
originally announced April 2020.
-
Assessing Robustness to Noise: Low-Cost Head CT Triage
Authors:
Sarah M. Hooper,
Jared A. Dunnmon,
Matthew P. Lungren,
Sanjiv Sam Gambhir,
Christopher Ré,
Adam S. Wang,
Bhavik N. Patel
Abstract:
Automated medical image classification with convolutional neural networks (CNNs) has great potential to impact healthcare, particularly in resource-constrained healthcare systems where fewer trained radiologists are available. However, little is known about how well a trained CNN can perform on images with the increased noise levels, different acquisition protocols, or additional artifacts that ma…
▽ More
Automated medical image classification with convolutional neural networks (CNNs) has great potential to impact healthcare, particularly in resource-constrained healthcare systems where fewer trained radiologists are available. However, little is known about how well a trained CNN can perform on images with the increased noise levels, different acquisition protocols, or additional artifacts that may arise when using low-cost scanners, which can be underrepresented in datasets collected from well-funded hospitals. In this work, we investigate how a model trained to triage head computed tomography (CT) scans performs on images acquired with reduced x-ray tube current, fewer projections per gantry rotation, and limited angle scans. These changes can reduce the cost of the scanner and demands on electrical power but come at the expense of increased image noise and artifacts. We first develop a model to triage head CTs and report an area under the receiver operating characteristic curve (AUROC) of 0.77. We then show that the trained model is robust to reduced tube current and fewer projections, with the AUROC dropping only 0.65% for images acquired with a 16x reduction in tube current and 0.22% for images acquired with 8x fewer projections. Finally, for significantly degraded images acquired by a limited angle scan, we show that a model trained specifically to classify such images can overcome the technological limitations to reconstruction and maintain an AUROC within 0.09% of the original model.
△ Less
Submitted 28 March, 2020; v1 submitted 17 March, 2020;
originally announced March 2020.
-
Hidden Stratification Causes Clinically Meaningful Failures in Machine Learning for Medical Imaging
Authors:
Luke Oakden-Rayner,
Jared Dunnmon,
Gustavo Carneiro,
Christopher Ré
Abstract:
Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing. For example, overall performance of a cancer detection model may be high, but the model still consistently misses a rare but aggressive cancer subtype. We refer to this problem as hidden stratification, and observe that it re…
▽ More
Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing. For example, overall performance of a cancer detection model may be high, but the model still consistently misses a rare but aggressive cancer subtype. We refer to this problem as hidden stratification, and observe that it results from incompletely describing the meaningful variation in a dataset. While hidden stratification can substantially reduce the clinical efficacy of machine learning models, its effects remain difficult to measure. In this work, we assess the utility of several possible techniques for measuring and describing hidden stratification effects, and characterize these effects on multiple medical imaging datasets. We find evidence that hidden stratification can occur in unidentified imaging subsets with low prevalence, low label quality, subtle distinguishing features, or spurious correlates, and that it can result in relative performance differences of over 20% on clinically important subsets. Finally, we explore the clinical implications of our findings, and suggest that evaluation of hidden stratification should be a critical component of any machine learning deployment in medical imaging.
△ Less
Submitted 15 November, 2019; v1 submitted 26 September, 2019;
originally announced September 2019.
-
Cross-Modal Data Programming Enables Rapid Medical Machine Learning
Authors:
Jared Dunnmon,
Alexander Ratner,
Nishith Khandwala,
Khaled Saab,
Matthew Markert,
Hersh Sagreiya,
Roger Goldman,
Christopher Lee-Messer,
Matthew Lungren,
Daniel Rubin,
Christopher Ré
Abstract:
Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically-grounded way that enables si…
▽ More
Labeling training datasets has become a key barrier to building medical machine learning models. One strategy is to generate training labels programmatically, for example by applying natural language processing pipelines to text reports associated with imaging studies. We propose cross-modal data programming, which generalizes this intuitive strategy in a theoretically-grounded way that enables simpler, clinician-driven input, reduces required labeling time, and improves with additional unlabeled data. In this approach, clinicians generate training labels for models defined over a target modality (e.g. images or time series) by writing rules over an auxiliary modality (e.g. text reports). The resulting technical challenge consists of estimating the accuracies and correlations of these rules; we extend a recent unsupervised generative modeling technique to handle this cross-modal setting in a provably consistent way. Across four applications in radiography, computed tomography, and electroencephalography, and using only several hours of clinician time, our approach matches or exceeds the efficacy of physician-months of hand-labeling with statistical significance, demonstrating a fundamentally faster and more flexible way of building machine learning models in medicine.
△ Less
Submitted 26 March, 2019;
originally announced March 2019.
-
Predicting US State-Level Agricultural Sentiment as a Measure of Food Security with Tweets from Farming Communities
Authors:
Jared Dunnmon,
Swetava Ganguli,
Darren Hau,
Brooke Husic
Abstract:
The ability to obtain accurate food security metrics in developing areas where relevant data can be sparse is critically important for policy makers tasked with implementing food aid programs. As a result, a great deal of work has been dedicated to predicting important food security metrics such as annual crop yields using a variety of methods including simulation, remote sensing, weather models,…
▽ More
The ability to obtain accurate food security metrics in developing areas where relevant data can be sparse is critically important for policy makers tasked with implementing food aid programs. As a result, a great deal of work has been dedicated to predicting important food security metrics such as annual crop yields using a variety of methods including simulation, remote sensing, weather models, and human expert input. As a complement to existing techniques in crop yield prediction, this work develops neural network models for predicting the sentiment of Twitter feeds from farming communities. Specifically, we investigate the potential of both direct learning on a small dataset of agriculturally-relevant tweets and transfer learning from larger, well-labeled sentiment datasets from other domains (e.g.~politics) to accurately predict agricultural sentiment, which we hope would ultimately serve as a useful crop yield predictor. We find that direct learning from small, relevant datasets outperforms transfer learning from large, fully-labeled datasets, that convolutional neural networks broadly outperform recurrent neural networks on Twitter sentiment classification, and that these models perform substantially less well on ternary sentiment problems characteristic of practical settings than on binary problems often found in the literature.
△ Less
Submitted 25 April, 2019; v1 submitted 13 February, 2019;
originally announced February 2019.
-
Predicting Food Security Outcomes Using Convolutional Neural Networks (CNNs) for Satellite Tasking
Authors:
Swetava Ganguli,
Jared Dunnmon,
Darren Hau
Abstract:
Obtaining reliable data describing local Food Security Metrics (FSM) at a granularity that is informative to policy-makers requires expensive and logistically difficult surveys, particularly in the developing world. We train a CNN on publicly available satellite data describing land cover classification and use both transfer learning and direct training to build a model for FSM prediction purely f…
▽ More
Obtaining reliable data describing local Food Security Metrics (FSM) at a granularity that is informative to policy-makers requires expensive and logistically difficult surveys, particularly in the developing world. We train a CNN on publicly available satellite data describing land cover classification and use both transfer learning and direct training to build a model for FSM prediction purely from satellite imagery data. We then propose efficient tasking algorithms for high resolution satellite assets via transfer learning, Markovian search algorithms, and Bayesian networks.
△ Less
Submitted 25 April, 2019; v1 submitted 12 February, 2019;
originally announced February 2019.
-
Training Complex Models with Multi-Task Weak Supervision
Authors:
Alexander Ratner,
Braden Hancock,
Jared Dunnmon,
Frederic Sala,
Shreyash Pandey,
Christopher Ré
Abstract:
As machine learning models continue to increase in complexity, collecting large hand-labeled training sets has become one of the biggest roadblocks in practice. Instead, weaker forms of supervision that provide noisier but cheaper labels are often used. However, these weak supervision sources have diverse and unknown accuracies, may output correlated labels, and may label different tasks or apply…
▽ More
As machine learning models continue to increase in complexity, collecting large hand-labeled training sets has become one of the biggest roadblocks in practice. Instead, weaker forms of supervision that provide noisier but cheaper labels are often used. However, these weak supervision sources have diverse and unknown accuracies, may output correlated labels, and may label different tasks or apply at different levels of granularity. We propose a framework for integrating and modeling such weak supervision sources by viewing them as labeling different related sub-tasks of a problem, which we refer to as the multi-task weak supervision setting. We show that by solving a matrix completion-style problem, we can recover the accuracies of these multi-task sources given their dependency structure, but without any labeled data, leading to higher-quality supervision for training an end model. Theoretically, we show that the generalization error of models trained with this approach improves with the number of unlabeled data points, and characterize the scaling with respect to the task and dependency structures. On three fine-grained classification problems, we show that our approach leads to average gains of 20.2 points in accuracy over a traditional supervised approach, 6.8 points over a majority vote baseline, and 4.1 points over a previously proposed weak supervision method that models tasks separately.
△ Less
Submitted 7 December, 2018; v1 submitted 5 October, 2018;
originally announced October 2018.
-
Learning to Compose Domain-Specific Transformations for Data Augmentation
Authors:
Alexander J. Ratner,
Henry R. Ehrenberg,
Zeshan Hussain,
Jared Dunnmon,
Christopher Ré
Abstract:
Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual transformations, constructing and tuning the more sophisticated compositions typically needed to achieve state-of-the-art results is a time-consuming manual task in p…
▽ More
Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual transformations, constructing and tuning the more sophisticated compositions typically needed to achieve state-of-the-art results is a time-consuming manual task in practice. We propose a method for automating this process by learning a generative sequence model over user-specified transformation functions using a generative adversarial approach. Our method can make use of arbitrary, non-deterministic transformation functions, is robust to misspecified user input, and is trained on unlabeled data. The learned transformation model can then be used to perform data augmentation for any end discriminative model. In our experiments, we show the efficacy of our approach on both image and text datasets, achieving improvements of 4.0 accuracy points on CIFAR-10, 1.4 F1 points on the ACE relation extraction task, and 3.4 accuracy points when using domain-specific transformation operations on a medical imaging dataset as compared to standard heuristic augmentation approaches.
△ Less
Submitted 30 September, 2017; v1 submitted 5 September, 2017;
originally announced September 2017.
-
Optimizing and Visualizing Deep Learning for Benign/Malignant Classification in Breast Tumors
Authors:
Darvin Yi,
Rebecca Lynn Sawyer,
David Cohn III,
Jared Dunnmon,
Carson Lam,
Xuerong Xiao,
Daniel Rubin
Abstract:
Breast cancer has the highest incidence and second highest mortality rate for women in the US. Our study aims to utilize deep learning for benign/malignant classification of mammogram tumors using a subset of cases from the Digital Database of Screening Mammography (DDSM). Though it was a small dataset from the view of Deep Learning (about 1000 patients), we show that currently state of the art ar…
▽ More
Breast cancer has the highest incidence and second highest mortality rate for women in the US. Our study aims to utilize deep learning for benign/malignant classification of mammogram tumors using a subset of cases from the Digital Database of Screening Mammography (DDSM). Though it was a small dataset from the view of Deep Learning (about 1000 patients), we show that currently state of the art architectures of deep learning can find a robust signal, even when trained from scratch. Using convolutional neural networks (CNNs), we are able to achieve an accuracy of 85% and an ROC AUC of 0.91, while leading hand-crafted feature based methods are only able to achieve an accuracy of 71%. We investigate an amalgamation of architectures to show that our best result is reached with an ensemble of the lightweight GoogLe Nets tasked with interpreting both the coronal caudal view and the mediolateral oblique view, simply averaging the probability scores of both views to make the final prediction. In addition, we have created a novel method to visualize what features the neural network detects for the benign/malignant classification, and have correlated those features with well known radiological features, such as spiculation. Our algorithm significantly improves existing classification methods for mammography lesions and identifies features that correlate with established clinical markers.
△ Less
Submitted 17 May, 2017;
originally announced May 2017.
-
Machine Learning for Better Models for Predicting Bond Prices
Authors:
Swetava Ganguli,
Jared Dunnmon
Abstract:
Bond prices are a reflection of extremely complex market interactions and policies, making prediction of future prices difficult. This task becomes even more challenging due to the dearth of relevant information, and accuracy is not the only consideration--in trading situations, time is of the essence. Thus, machine learning in the context of bond price predictions should be both fast and accurate…
▽ More
Bond prices are a reflection of extremely complex market interactions and policies, making prediction of future prices difficult. This task becomes even more challenging due to the dearth of relevant information, and accuracy is not the only consideration--in trading situations, time is of the essence. Thus, machine learning in the context of bond price predictions should be both fast and accurate. In this course project, we use a dataset describing the previous 10 trades of a large number of bonds among other relevant descriptive metrics to predict future bond prices. Each of 762,678 bonds in the dataset is described by a total of 61 attributes, including a ground truth trade price. We evaluate the performance of various supervised learning algorithms for regression followed by ensemble methods, with feature and model selection considerations being treated in detail. We further evaluate all methods on both accuracy and speed. Finally, we propose a novel hybrid time-series aided machine learning method that could be applied to such datasets in future work.
△ Less
Submitted 31 March, 2017;
originally announced May 2017.