-
Leveraging Neural Networks to Profile Health Care Providers with Application to Medicare Claims
Authors:
Wenbo Wu,
Fan Li,
Richard Liu,
Yiting Li,
Mara McAdams-DeMarco,
Krzysztof J. Geras,
Douglas E. Schaubel,
Iván Díaz
Abstract:
Encompassing numerous nationwide, statewide, and institutional initiatives in the United States, provider profiling has evolved into a major health care undertaking with ubiquitous applications, profound implications, and high-stakes consequences. In line with such a significant profile, the literature has accumulated a number of developments dedicated to enhancing the statistical paradigm of provider profiling. Tackling wide-ranging profiling issues, these methods typically adjust for risk factors using linear predictors. While this approach is simple, it can be too restrictive to characterize complex and dynamic factor-outcome associations in certain contexts. One such example arises from evaluating dialysis facilities treating Medicare beneficiaries with end-stage renal disease. It is of primary interest to consider how the coronavirus disease (COVID-19) affected 30-day unplanned readmissions in 2020. The impact of COVID-19 on the risk of readmission varied dramatically across pandemic phases. To efficiently capture the variation while profiling facilities, we develop a generalized partially linear model (GPLM) that incorporates a neural network. Considering provider-level clustering, we implement the GPLM as a stratified sampling-based stochastic optimization algorithm that features accelerated convergence. Furthermore, an exact test is designed to identify under- and over-performing facilities, with an accompanying funnel plot to visualize profiles. The advantages of the proposed methods are demonstrated through simulation experiments and profiling dialysis facilities using 2020 Medicare claims from the United States Renal Data System.
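The abstract does not spell out how the exact test is constructed. As an illustration only, here is one standard way to test a single facility exactly. It assumes that each patient's readmission probability p_i under the null (no facility effect) is already available from a fitted risk model, and it compares the observed readmission count with the resulting Poisson-binomial distribution. The helper names are hypothetical, and this is not necessarily the authors' construction.

```python
from typing import List, Sequence


def poisson_binomial_pmf(probs: Sequence[float]) -> List[float]:
    """P(total events = k) for independent Bernoulli(p_i) outcomes, via dynamic programming."""
    pmf = [1.0]  # with zero patients, the count is 0 with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)  # this patient is not readmitted
            nxt[k + 1] += mass * p      # this patient is readmitted
        pmf = nxt
    return pmf


def exact_two_sided_pvalue(observed: int, probs: Sequence[float]) -> float:
    """Doubled smaller tail probability of the observed count (hypothetical helper)."""
    pmf = poisson_binomial_pmf(probs)
    lower = sum(pmf[: observed + 1])  # P(count <= observed)
    upper = sum(pmf[observed:])       # P(count >= observed)
    return min(1.0, 2.0 * min(lower, upper))


def flag_facility(observed: int, probs: Sequence[float], alpha: float = 0.05) -> str:
    """Label a facility relative to its expected readmission count sum(probs)."""
    if exact_two_sided_pvalue(observed, probs) >= alpha:
        return "as expected"
    # More readmissions than expected indicates worse performance.
    return "under-performing" if observed > sum(probs) else "over-performing"
```

The dynamic program costs O(n^2) per facility with n patients. That is cheap at typical facility sizes and avoids relying on a normal approximation for small facilities.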
Submitted 20 January, 2024; v1 submitted 26 September, 2023;
originally announced September 2023.
-
Generative multitask learning mitigates target-causing confounding
Authors:
Taro Makino,
Krzysztof J. Geras,
Kyunghyun Cho
Abstract:
We propose generative multitask learning (GMTL), a simple and scalable approach to causal representation learning for multitask learning. Our approach makes a minor change to the conventional multitask inference objective, and improves robustness to target shift. Since GMTL only modifies the inference objective, it can be used with existing multitask learning methods without requiring additional training. The improvement in robustness comes from mitigating unobserved confounders that cause the targets, but not the input. We refer to them as \emph{target-causing confounders}. These confounders induce spurious dependencies between the input and targets. This poses a problem for conventional multitask learning, due to its assumption that the targets are conditionally independent given the input. GMTL mitigates target-causing confounding at inference time, by removing the influence of the joint target distribution, and predicting all targets jointly. This removes the spurious dependencies between the input and targets, where the degree of removal is adjustable via a single hyperparameter. This flexibility is useful for managing the trade-off between in- and out-of-distribution generalization. Our results on the Attributes of People and Taskonomy datasets reflect an improved robustness to target shift across four multitask learning methods.
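The following is a rough sketch of the inference-time change described above. It assumes a multitask model that outputs per-task conditionals p(y_t | x), combined under conditional independence, together with an estimate of the joint target distribution p(y) taken from training labels. It predicts all targets jointly while discounting the joint prior by a single hyperparameter `alpha`. The name `gmtl_predict` is hypothetical, and the paper's exact objective may differ in detail. With `alpha = 0` the sketch reduces to the conventional per-task prediction.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def gmtl_predict(task_probs: Sequence[Sequence[float]],
                 joint_prior: Dict[Tuple[int, ...], float],
                 alpha: float) -> Tuple[int, ...]:
    """Jointly choose all targets, maximizing log p(y|x) - alpha * log p(y)."""
    best: Tuple[int, ...] = ()
    best_score = -math.inf
    # Enumerate every combination of target values across tasks.
    for combo in itertools.product(*(range(len(p)) for p in task_probs)):
        # Conventional multitask models treat targets as conditionally independent given x.
        log_cond = sum(math.log(p[y]) for p, y in zip(task_probs, combo))
        # Subtracting the (scaled) joint prior removes the influence of target co-occurrence.
        score = log_cond - alpha * math.log(joint_prior[combo])
        if score > best_score:
            best, best_score = combo, score
    return best


probs: List[List[float]] = [[0.4, 0.6], [0.3, 0.7]]  # two binary tasks, toy numbers
prior = {(0, 0): 0.1, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.7}
print(gmtl_predict(probs, prior, alpha=0.0), gmtl_predict(probs, prior, alpha=1.0))
```

Exhaustive enumeration is only practical for a handful of tasks. The toy prior shows how a strongly co-occurring target pair can be outvoted once `alpha` is raised.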
Submitted 22 October, 2022; v1 submitted 8 February, 2022;
originally announced February 2022.
-
Understanding the robustness of deep neural network classifiers for breast cancer screening
Authors:
Witold Oleszkiewicz,
Taro Makino,
Stanisław Jastrzębski,
Tomasz Trzciński,
Linda Moy,
Kyunghyun Cho,
Laura Heacock,
Krzysztof J. Geras
Abstract:
Deep neural networks (DNNs) show promise in breast cancer screening, but their robustness to input perturbations must be better understood before they can be clinically implemented. There exists extensive literature on this subject in the context of natural images that can potentially be built upon. However, it cannot be assumed that conclusions about robustness will transfer from natural images to mammogram images, due to significant differences between the two image modalities. In order to determine whether conclusions will transfer, we measure the sensitivity of a radiologist-level screening mammogram image classifier to four commonly studied input perturbations that natural image classifiers are sensitive to. We find that mammogram image classifiers are also sensitive to these perturbations, which suggests that we can build on the existing literature. We also perform a detailed analysis of the effects of low-pass filtering and find that it degrades the visibility of clinically meaningful features called microcalcifications. Since low-pass filtering removes semantically meaningful information that is predictive of breast cancer, we argue that it is undesirable for mammogram image classifiers to be invariant to it. This is in contrast to natural images, where we do not want DNNs to be sensitive to low-pass filtering due to its tendency to remove information that is human-incomprehensible.
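The abstract does not specify which filter the paper uses. A minimal sketch, assuming an ideal (hard-cutoff) low-pass filter applied in the 2-D Fourier domain with NumPy, shows why tiny bright structures such as microcalcifications suffer: they consist mostly of high spatial frequencies, which the mask discards. `low_pass_filter` and its `cutoff` parameter (in cycles per pixel) are illustrative assumptions.

```python
import numpy as np


def low_pass_filter(image: np.ndarray, cutoff: float) -> np.ndarray:
    """Zero out spatial frequencies whose radius exceeds `cutoff` cycles/pixel."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]  # vertical frequencies
    fx = np.fft.fftfreq(image.shape[1])[None, :]  # horizontal frequencies
    mask = np.sqrt(fx**2 + fy**2) <= cutoff       # keep only the low-frequency disc
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))


# A single bright pixel (a stand-in for a microcalcification) on a flat background.
img = np.zeros((32, 32))
img[16, 16] = 1.0
blurred = low_pass_filter(img, cutoff=0.1)
print(img.max(), round(float(blurred.max()), 3))
```

The total intensity is preserved, because the zero-frequency term is kept. However, the spike's energy is spread over a neighborhood, and its peak drops far below the original value.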
Submitted 22 March, 2020;
originally announced March 2020.
-
An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization
Authors:
Yiqiu Shen,
Nan Wu,
Jason Phang,
Jungkyu Park,
Kangning Liu,
Sudarshini Tyagi,
Laura Heacock,
S. Gene Kim,
Linda Moy,
Kyunghyun Cho,
Krzysztof J. Geras
Abstract:
Medical images differ from natural images in having significantly higher resolutions and smaller regions of interest. Because of these differences, neural network architectures that work well for natural images might not be applicable to medical image analysis. In this work, we extend the globally-aware multiple instance classifier, a framework we proposed to address these unique properties of medical images. This model first uses a low-capacity, yet memory-efficient, network on the whole image to identify the most informative regions. It then applies another, higher-capacity network to collect details from the chosen regions. Finally, it employs a fusion module that aggregates global and local information to make a final prediction. While existing methods often require lesion segmentation during training, our model is trained with only image-level labels and can generate pixel-level saliency maps indicating possible malignant findings. We apply the model to screening mammography interpretation: predicting the presence or absence of benign and malignant lesions. On the NYU Breast Cancer Screening Dataset, consisting of more than one million images, our model achieves an AUC of 0.93 in classifying breasts with malignant findings, outperforming ResNet-34 and Faster R-CNN. Compared to ResNet-34, our model is 4.1x faster for inference while using 78.4% less GPU memory. Furthermore, we demonstrate in a reader study that our model surpasses radiologist-level AUC by a margin of 0.11. The proposed model is available online: https://github.com/nyukat/GMIC.
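The abstract describes a global network that identifies informative regions and a local network that is applied to the chosen regions. Below is a simplified sketch of the region-retrieval step. It assumes a non-negative 2-D saliency map and a greedy rule: take the window with the largest total saliency, suppress it, and repeat. `select_patches` is a hypothetical helper, and the released code at https://github.com/nyukat/GMIC may implement selection differently.

```python
from typing import List, Tuple

import numpy as np


def select_patches(saliency: np.ndarray, patch: int, k: int) -> List[Tuple[int, int]]:
    """Greedily pick top-left corners of k patch x patch windows with the highest saliency."""
    s = np.asarray(saliency, dtype=float).copy()
    corners: List[Tuple[int, int]] = []
    for _ in range(k):
        # Integral image: integral[a, b] = sum of s[:a, :b].
        integral = np.pad(s.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
        # Sum of every patch x patch window, indexed by its top-left corner.
        window = (integral[patch:, patch:] - integral[:-patch, patch:]
                  - integral[patch:, :-patch] + integral[:-patch, :-patch])
        i, j = np.unravel_index(np.argmax(window), window.shape)
        corners.append((int(i), int(j)))
        s[i:i + patch, j:j + patch] = 0.0  # suppress the chosen region before the next pick
    return corners


sal = np.zeros((6, 6))
sal[0, 0], sal[4, 4] = 1.0, 2.0
print(select_patches(sal, patch=2, k=2))
```

Suppressing each chosen window encourages the next pick to cover a different finding, rather than a shifted copy of the same one.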
Submitted 13 February, 2020;
originally announced February 2020.
-
Improving localization-based approaches for breast cancer screening exam classification
Authors:
Thibault Févry,
Jason Phang,
Nan Wu,
S. Gene Kim,
Linda Moy,
Kyunghyun Cho,
Krzysztof J. Geras
Abstract:
We trained and evaluated a localization-based deep CNN for breast cancer screening exam classification on over 200,000 exams (over 1,000,000 images). Our model achieves an AUC of 0.919 in predicting malignancy in patients undergoing breast cancer screening, reducing the error rate of the baseline (Wu et al., 2019a) by 23%. In addition, the model generates bounding boxes for benign and malignant findings, providing interpretable predictions.
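The 23% figure can be checked with a short calculation under two assumptions that the abstract does not state. The first is that "error rate" here means 1 − AUC. The second is that the baseline AUC is 0.895, the value reported in the abstract of "Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening", which is also listed on this page.

```python
def relative_error_reduction(baseline_auc: float, new_auc: float) -> float:
    """Relative drop in (1 - AUC) when moving from the baseline to the new model."""
    baseline_err = 1.0 - baseline_auc  # about 0.105 for AUC 0.895
    new_err = 1.0 - new_auc            # about 0.081 for AUC 0.919
    return (baseline_err - new_err) / baseline_err


reduction = relative_error_reduction(0.895, 0.919)
print(f"{reduction:.1%}")
```

Here (0.105 − 0.081) / 0.105 ≈ 0.229, which is consistent with the reported 23% under these assumptions.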
Submitted 1 August, 2019;
originally announced August 2019.
-
Screening Mammogram Classification with Prior Exams
Authors:
Jungkyu Park,
Jason Phang,
Yiqiu Shen,
Nan Wu,
S. Gene Kim,
Linda Moy,
Kyunghyun Cho,
Krzysztof J. Geras
Abstract:
Radiologists typically compare a patient's most recent breast cancer screening exam to their previous ones in making informed diagnoses. To reflect this practice, we propose new neural network models that compare pairs of screening mammograms from the same patient. We train and evaluate our proposed models on over 665,000 pairs of images (over 166,000 pairs of exams). Our best model achieves an AUC of 0.866 in predicting malignancy in patients who underwent breast cancer screening, reducing the error rate of the corresponding baseline.
Submitted 30 July, 2019;
originally announced July 2019.
-
Globally-Aware Multiple Instance Classifier for Breast Cancer Screening
Authors:
Yiqiu Shen,
Nan Wu,
Jason Phang,
Jungkyu Park,
Gene Kim,
Linda Moy,
Kyunghyun Cho,
Krzysztof J. Geras
Abstract:
Deep learning models designed for visual classification tasks on natural images have become prevalent in medical image analysis. However, medical images differ from typical natural images in many ways, such as significantly higher resolutions and smaller regions of interest. Moreover, both the global structure and local details play important roles in medical image analysis tasks. To address these unique properties of medical images, we propose a neural network that is able to classify breast cancer lesions utilizing information from both a global saliency map and multiple local patches. The proposed model outperforms the ResNet-based baseline and achieves radiologist-level performance in the interpretation of screening mammography. Although our model is trained only with image-level labels, it is able to generate pixel-level saliency maps that provide localization of possible malignant findings.
Submitted 19 August, 2019; v1 submitted 6 June, 2019;
originally announced June 2019.
-
Deep Neural Networks Improve Radiologists' Performance in Breast Cancer Screening
Authors:
Nan Wu,
Jason Phang,
Jungkyu Park,
Yiqiu Shen,
Zhe Huang,
Masha Zorin,
Stanisław Jastrzębski,
Thibault Févry,
Joe Katsnelson,
Eric Kim,
Stacey Wolfson,
Ujas Parikh,
Sushma Gaddam,
Leng Leng Young Lin,
Kara Ho,
Joshua D. Weinstein,
Beatriu Reig,
Yiming Gao,
Hildegard Toth,
Kristine Pysarenko,
Alana Lewin,
Jiyon Lee,
Krystal Airola,
Eralda Mema,
Stephanie Chung,
et al. (7 additional authors not shown)
Abstract:
We present a deep convolutional neural network for breast cancer screening exam classification, trained and evaluated on over 200,000 exams (over 1,000,000 images). Our network achieves an AUC of 0.895 in predicting whether there is a cancer in the breast, when tested on the screening population. We attribute the high accuracy of our model to a two-stage training procedure, which allows us to use a very high-capacity patch-level network to learn from pixel-level labels alongside a network learning from macroscopic breast-level labels. To validate our model, we conducted a reader study with 14 readers, each reading 720 screening mammogram exams, and found our model to be as accurate as experienced radiologists when presented with the same data. Finally, we show that a hybrid model, averaging the probability of malignancy predicted by a radiologist with the prediction of our neural network, is more accurate than either of the two separately. To better understand our results, we conduct a thorough analysis of our network's performance on different subpopulations of the screening population, model design, training procedure, errors, and properties of its internal representations.
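The hybrid model is described as an average of the radiologist's and the network's malignancy probabilities. Below is a minimal sketch that assumes equal weights and evaluates with a pairwise (Mann–Whitney) AUC estimate. The function names and toy scores are illustrative and are not data from the study.

```python
from typing import List, Sequence


def hybrid_prediction(radiologist: Sequence[float], model: Sequence[float],
                      weight: float = 0.5) -> List[float]:
    """Weighted average of two probability vectors; weight=0.5 is a plain average."""
    return [weight * r + (1.0 - weight) * m for r, m in zip(radiologist, model)]


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
combined = hybrid_prediction([0.1, 0.6, 0.5, 0.9], [0.5, 0.2, 0.6, 0.7])
print(combined, auc(combined, labels))
```

Averaging probabilities requires that both sources be on a comparable probability scale. If they are not, a calibration step would typically precede the averaging.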
Submitted 19 March, 2019;
originally announced March 2019.
-
fastMRI: An Open Dataset and Benchmarks for Accelerated MRI
Authors:
Jure Zbontar,
Florian Knoll,
Anuroop Sriram,
Tullie Murrell,
Zhengnan Huang,
Matthew J. Muckley,
Aaron Defazio,
Ruben Stern,
Patricia Johnson,
Mary Bruno,
Marc Parente,
Krzysztof J. Geras,
Joe Katsnelson,
Hersh Chandarana,
Zizhao Zhang,
Michal Drozdzal,
Adriana Romero,
Michael Rabbat,
Pascal Vincent,
Nafissa Yakubova,
James Pinkerton,
Duo Wang,
Erich Owens,
C. Lawrence Zitnick,
Michael P. Recht,
et al. (2 additional authors not shown)
Abstract:
Accelerating Magnetic Resonance Imaging (MRI) by taking fewer measurements has the potential to reduce medical costs, minimize stress to patients and make MRI possible in applications where it is currently prohibitively slow or expensive. We introduce the fastMRI dataset, a large-scale collection of both raw MR measurements and clinical MR images, that can be used for training and evaluation of machine-learning approaches to MR image reconstruction. By introducing standardized evaluation criteria and a freely-accessible dataset, our goal is to help the community make rapid advances in the state of the art for MR image reconstruction. We also provide a self-contained introduction to MRI for machine learning researchers with no medical imaging background.
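To make "taking fewer measurements" concrete: a common setup in accelerated-MRI work keeps only a subset of the phase-encoding columns of k-space, always including a fully sampled low-frequency band, and reconstructs naively by zero-filling the missing columns. The NumPy sketch below works on a single-coil magnitude image and is only an illustration under stated assumptions. The function names and the `center_fraction` and `acceleration` parameters are hypothetical here and are not claimed to match the fastMRI repository's API.

```python
import numpy as np


def column_mask(num_cols: int, center_fraction: float, acceleration: float,
                rng: np.random.Generator) -> np.ndarray:
    """Boolean mask over k-space columns: a dense center band plus random outer columns."""
    num_low = int(round(num_cols * center_fraction))
    # Outer sampling probability chosen so that about num_cols / acceleration columns are kept.
    prob = (num_cols / acceleration - num_low) / (num_cols - num_low)
    mask = rng.random(num_cols) < prob
    start = (num_cols - num_low) // 2
    mask[start:start + num_low] = True  # low frequencies sit at the center after fftshift
    return mask


def zero_filled_reconstruction(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Simulate undersampled acquisition of `image`, then invert with missing columns set to 0."""
    kspace = np.fft.fftshift(np.fft.fft2(image))  # centered k-space
    kspace = kspace * mask[None, :]                # drop the unmeasured columns
    return np.abs(np.fft.ifft2(np.fft.ifftshift(kspace)))


rng = np.random.default_rng(0)
image = rng.random((64, 64))
mask = column_mask(64, center_fraction=0.08, acceleration=4, rng=rng)
recon = zero_filled_reconstruction(image, mask)
print(int(mask.sum()), float(np.abs(recon - image).mean()))
```

Zero-filling is only a naive baseline. Learned reconstruction methods, which the dataset is meant to benchmark, aim to recover what the missing columns would have contributed.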
Submitted 11 December, 2019; v1 submitted 21 November, 2018;
originally announced November 2018.
-
Classifier-agnostic saliency map extraction
Authors:
Konrad Zolna,
Krzysztof J. Geras,
Kyunghyun Cho
Abstract:
Currently available methods for extracting saliency maps identify the parts of the input that are most important to a specific, fixed classifier. We show that this strong dependence on a given classifier hinders their performance. To address this problem, we propose classifier-agnostic saliency map extraction, which finds all parts of the image that any classifier could use, not just one given in advance. We observe that the proposed approach extracts higher-quality saliency maps than prior work while being conceptually simple and easy to implement. The method sets a new state-of-the-art result for the localization task on ImageNet, outperforming all existing weakly-supervised localization techniques, despite not using the ground-truth labels at inference time. Code reproducing the results is available at https://github.com/kondiz/casme .
The final version of this manuscript is published in Computer Vision and Image Understanding and is available online at https://doi.org/10.1016/j.cviu.2020.102969 .
Submitted 19 July, 2020; v1 submitted 21 May, 2018;
originally announced May 2018.
-
Breast density classification with deep convolutional neural networks
Authors:
Nan Wu,
Krzysztof J. Geras,
Yiqiu Shen,
Jingyi Su,
S. Gene Kim,
Eric Kim,
Stacey Wolfson,
Linda Moy,
Kyunghyun Cho
Abstract:
Breast density classification is an essential part of breast cancer screening. Although much prior work has treated this problem as a task for learning algorithms, to our knowledge all of it used small, clinically unrealistic datasets for both training and evaluation. In this work, we explore the limits of this task with a dataset drawn from over 200,000 breast cancer screening exams. We use this data to train and evaluate a strong convolutional neural network classifier. In a reader study, we find that our model can perform this task comparably to a human expert.
Submitted 9 November, 2017;
originally announced November 2017.
-
High-Resolution Breast Cancer Screening with Multi-View Deep Convolutional Neural Networks
Authors:
Krzysztof J. Geras,
Stacey Wolfson,
Yiqiu Shen,
Nan Wu,
S. Gene Kim,
Eric Kim,
Laura Heacock,
Ujas Parikh,
Linda Moy,
Kyunghyun Cho
Abstract:
Advances in deep learning for natural images have prompted a surge of interest in applying similar techniques to medical images. The majority of the initial attempts focused on replacing the input of a deep convolutional neural network with a medical image, which does not take into consideration the fundamental differences between these two types of images. Specifically, fine details are necessary for detection in medical images, unlike in natural images where coarse structures matter most. This difference makes it inadequate to use the existing network architectures developed for natural images, because they work on heavily downscaled images to reduce the memory requirements. This hides details necessary to make accurate predictions. Additionally, a single exam in medical imaging often comes with a set of views which must be fused in order to reach a correct conclusion. In our work, we propose to use a multi-view deep convolutional neural network that handles a set of high-resolution medical images. We evaluate it on large-scale mammography-based breast cancer screening (BI-RADS prediction) using 886,000 images. We focus on investigating the impact of the training set size and image size on the prediction accuracy. Our results highlight that performance increases with the size of training set, and that the best performance can only be achieved using the original resolution. In the reader study, performed on a random subset of the test set, we confirmed the efficacy of our model, which achieved performance comparable to a committee of radiologists when presented with the same data.
Submitted 27 June, 2018; v1 submitted 21 March, 2017;
originally announced March 2017.
-
Do Deep Convolutional Nets Really Need to be Deep and Convolutional?
Authors:
Gregor Urban,
Krzysztof J. Geras,
Samira Ebrahimi Kahou,
Ozlem Aslan,
Shengjie Wang,
Rich Caruana,
Abdelrahman Mohamed,
Matthai Philipose,
Matt Richardson
Abstract:
Yes, they do. This paper provides the first empirical demonstration that deep convolutional models really need to be both deep and convolutional, even when trained with methods such as distillation that allow small or shallow models of high accuracy to be trained. Although previous research showed that shallow feed-forward nets can sometimes learn the complex functions previously learned by deep nets while using the same number of parameters as the deep models they mimic, in this paper we demonstrate that the same methods cannot be used to train accurate models on CIFAR-10 unless the student models contain multiple layers of convolution. Although the student models do not have to be as deep as the teacher model they mimic, the students need multiple convolutional layers to learn functions with accuracy comparable to that of the deep convolutional teacher.
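The abstract refers to training small or shallow students to mimic a deep teacher. Below is a minimal sketch of one common mimic objective. It assumes that students regress the teacher's pre-softmax logits with a squared-error loss. The exact loss and any temperature settings used in the paper are not given in this abstract, and `logit_mimic_loss` is a hypothetical name.

```python
import numpy as np


def logit_mimic_loss(student_logits: np.ndarray, teacher_logits: np.ndarray) -> float:
    """Mean over examples of the squared L2 distance between student and teacher logits."""
    diff = np.asarray(student_logits, dtype=float) - np.asarray(teacher_logits, dtype=float)
    return float(np.mean(np.sum(diff**2, axis=1)))


def logit_mimic_grad(student_logits: np.ndarray, teacher_logits: np.ndarray) -> np.ndarray:
    """Gradient of logit_mimic_loss with respect to the student logits."""
    diff = np.asarray(student_logits, dtype=float) - np.asarray(teacher_logits, dtype=float)
    return 2.0 * diff / diff.shape[0]


teacher = np.array([[2.0, -1.0, 0.5], [0.0, 3.0, -2.0]])  # toy teacher logits, 2 examples
student = np.zeros_like(teacher)
print(logit_mimic_loss(student, teacher))
```

Regressing logits rather than softmax probabilities is one way to keep the teacher's relative scores for non-argmax classes, which a sharp softmax would largely flatten.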
Submitted 3 March, 2017; v1 submitted 17 March, 2016;
originally announced March 2016.
-
Scheduled denoising autoencoders
Authors:
Krzysztof J. Geras,
Charles Sutton
Abstract:
We present a representation learning method that learns features at multiple different levels of scale. Working within the unsupervised framework of denoising autoencoders, we observe that when the input is heavily corrupted during training, the network tends to learn coarse-grained features, whereas when the input is only slightly corrupted, the network tends to learn fine-grained features. This motivates the scheduled denoising autoencoder, which starts with a high level of noise that lowers as training progresses. We find that the resulting representation yields a significant boost on a later supervised task compared to the original input, or to a standard denoising autoencoder trained at a single noise level. After supervised fine-tuning our best model achieves the lowest ever reported error on the CIFAR-10 data set among permutation-invariant methods.
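Below is a minimal sketch of the noise schedule. It assumes masking noise, in which each input unit is zeroed with probability equal to the current noise level, and a noise level that decreases in equal steps between training stages. The specific levels, the corruption type, and the helper names are illustrative assumptions rather than the paper's settings. The autoencoder itself is omitted.

```python
import numpy as np


def noise_schedule(start: float, end: float, num_stages: int) -> np.ndarray:
    """Noise levels decreasing linearly from `start` to `end` across training stages."""
    return np.linspace(start, end, num_stages)


def mask_corrupt(x: np.ndarray, level: float, rng: np.random.Generator) -> np.ndarray:
    """Zero each entry of x independently with probability `level` (masking noise)."""
    keep = rng.random(x.shape) >= level
    return x * keep


rng = np.random.default_rng(0)
batch = rng.random((128, 3072))  # e.g. flattened 32x32x3 images
for level in noise_schedule(0.7, 0.1, num_stages=4):
    corrupted = mask_corrupt(batch, level, rng)
    # A denoising autoencoder would be trained here to reconstruct `batch` from `corrupted`.
    print(f"noise={level:.1f}  fraction zeroed={np.mean(corrupted == 0):.2f}")
```

Training at high noise first, then at progressively lower noise, follows the abstract's observation: heavy corruption favors coarse-grained features, and light corruption favors fine-grained ones.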
Submitted 10 April, 2015; v1 submitted 12 June, 2014;
originally announced June 2014.