Search | arXiv e-print repository

Fact-Controlled Diagnosis of Hallucinations in Medical Text Summarization

Authors: Suhas BN, Han-Chin Shing, Lei Xu, Mitch Strong, Jon Burnsky, Jessica Ofor, Jordan R. Mason, Susan Chen, Sundararajan Srinivasan, Chaitanya Shivade, Jack Moriarty, Joseph Paul Cohen

Abstract: Hallucinations in large language models (LLMs) during summarization of patient-clinician dialogues pose significant risks to patient care and clinical decision-making. However, the phenomenon remains understudied in the clinical domain, with uncertainty surrounding the applicability of general-domain hallucination detectors. The rarity and randomness of hallucinations further complicate their investigation. In this paper, we conduct an evaluation of hallucination detection methods in the medical domain, and construct two datasets for the purpose: A fact-controlled Leave-N-out dataset -- generated by systematically removing facts from source dialogues to induce hallucinated content in summaries; and a natural hallucination dataset -- arising organically during LLM-based medical summarization. We show that general-domain detectors struggle to detect clinical hallucinations, and that performance on fact-controlled hallucinations does not reliably predict effectiveness on natural hallucinations. We then develop fact-based approaches that count hallucinations, offering explainability not available with existing methods. Notably, our LLM-based detectors, which we developed using fact-controlled hallucinations, generalize well to detecting real-world clinical hallucinations. This research contributes a suite of specialized metrics supported by expert-annotated datasets to advance faithful clinical summarization systems.

Submitted 31 May, 2025; originally announced June 2025.

Comments: https://github.com/amazon-science/acibench-hallucination-annotations

arXiv:2502.07156 [pdf, other]

Explaining 3D Computed Tomography Classifiers with Counterfactuals

Authors: Joseph Paul Cohen, Louis Blankemeier, Akshay Chaudhari

Abstract: Counterfactual explanations enhance the interpretability of deep learning models in medical imaging, yet adapting them to 3D CT scans poses challenges due to volumetric complexity and resource demands. We extend the Latent Shift counterfactual generation method from 2D applications to explain 3D computed tomography (CT) scan classifiers. We address the challenges associated with 3D classifiers, such as limited training samples and high memory demands, by implementing a slice-based autoencoder and gradient blocking except for specific chunks of slices. This method leverages a 2D encoder trained on CT slices, which are subsequently combined to maintain 3D context. We demonstrate this technique on two models for clinical phenotype prediction and lung segmentation. Our approach is both memory-efficient and effective for generating interpretable counterfactuals in high-resolution 3D medical imaging.

Submitted 2 April, 2025; v1 submitted 10 February, 2025; originally announced February 2025.

Comments: Code and models: https://github.com/ieee8023/ct-counterfactuals

arXiv:2406.06512 [pdf, other]

Merlin: A Vision Language Foundation Model for 3D Computed Tomography

Authors: Louis Blankemeier, Joseph Paul Cohen, Ashwin Kumar, Dave Van Veen, Syed Jamal Safdar Gardezi, Magdalini Paschali, Zhihong Chen, Jean-Benoit Delbrouck, Eduardo Reis, Cesar Truyts, Christian Bluethgen, Malte Engmann Kjeldskov Jensen, Sophie Ostmeier, Maya Varma, Jeya Maria Jose Valanarasu, Zhongnan Fang, Zepeng Huo, Zaid Nabulsi, Diego Ardila, Wei-Hung Weng, Edson Amaro Junior, Neera Ahuja, Jason Fries, Nigam H. Shah, Andrew Johnston, et al. (6 additional authors not shown)

Abstract: Over 85 million computed tomography (CT) scans are performed annually in the US, of which approximately one quarter focus on the abdomen. Given the current radiologist shortage, there is a large impetus to use artificial intelligence to alleviate the burden of interpreting these complex imaging studies. Prior state-of-the-art approaches for automated medical image interpretation leverage vision language models (VLMs). However, current medical VLMs are generally limited to 2D images and short reports, and do not leverage electronic health record (EHR) data for supervision. We introduce Merlin - a 3D VLM that we train using paired CT scans (6+ million images from 15,331 CTs), EHR diagnosis codes (1.8+ million codes), and radiology reports (6+ million tokens). We evaluate Merlin on 6 task types and 752 individual tasks. The non-adapted (off-the-shelf) tasks include zero-shot findings classification (31 findings), phenotype classification (692 phenotypes), and zero-shot cross-modal retrieval (image to findings and image to impressions), while model-adapted tasks include 5-year disease prediction (6 diseases), radiology report generation, and 3D semantic segmentation (20 organs). We perform internal validation on a test set of 5,137 CTs, and external validation on 7,000 clinical CTs and on two public CT datasets (VerSe, TotalSegmentator). Beyond these clinically relevant evaluations, we assess the efficacy of various network architectures and training strategies to show that Merlin performs favorably compared with existing task-specific baselines. We derive data scaling laws to empirically assess training data needs for requisite downstream task performance. Furthermore, unlike conventional VLMs that require hundreds of GPUs for training, we perform all training on a single GPU.

Submitted 10 June, 2024; originally announced June 2024.

Comments: 18 pages, 7 figures

arXiv:2403.05651 [pdf, other]

High-energy polarized electron beams from the ionization of isolated spin polarized hydrogen atoms

Authors: Dimitris Sofikitis, Lars Reichwein, Marios G. Stamatakis, Christos Zois, Dimitrios G. Papazoglou, Samuel Cohen, Markus Büscher, Alexander Pukhov, T. Peter Rakitzis

Abstract: We propose a laser-based method for the preparation of high-energy polarized electrons, from the ionization of isolated spin-polarized hydrogen (SPH) atoms. The SPH atoms are prepared from the photodissociation of HCl, using two consecutive UV pulses of ps duration. By appropriately timing and focusing the pulses, we can spatially separate the highly polarized SPH from other unwanted photoproducts, which then act as the target for the acceleration lasers. We show how elastic collisions define number density $n$ and polarization $P$ regimes ($10^{16} \leq n \leq 10^{18}$ cm$^{-3}$, $0.99 \geq P \geq 0.40$) for the pre-polarized targets, and use particle-in-cell simulations to demonstrate the method's feasibility.

Submitted 2 April, 2025; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2402.04609 [pdf, other]

Improving Cross-Domain Low-Resource Text Generation through LLM Post-Editing: A Programmer-Interpreter Approach

Authors: Zhuang Li, Levon Haroutunian, Raj Tumuluri, Philip Cohen, Gholamreza Haffari

Abstract: Post-editing has proven effective in improving the quality of text generated by large language models (LLMs) such as GPT-3.5 or GPT-4, particularly when direct updating of their parameters to enhance text quality is infeasible or expensive. However, relying solely on smaller language models for post-editing can limit the LLMs' ability to generalize across domains. Moreover, the editing strategies in these methods are not optimally designed for text-generation tasks. To address these limitations, we propose a neural programmer-interpreter approach that preserves the domain generalization ability of LLMs when editing their output. The editing actions in this framework are specifically devised for text generation. Extensive experiments demonstrate that the programmer-interpreter significantly enhances GPT-3.5's performance in logical form-to-text conversion and low-resource machine translation, surpassing other state-of-the-art (SOTA) LLM post-editing methods in cross-domain settings.

Submitted 7 February, 2024; originally announced February 2024.

Comments: EACL 2024 (findings), short paper, 5 pages

arXiv:2401.12208 [pdf, other]

A Vision-Language Foundation Model to Enhance Efficiency of Chest X-ray Interpretation

Authors: Zhihong Chen, Maya Varma, Justin Xu, Magdalini Paschali, Dave Van Veen, Andrew Johnston, Alaa Youssef, Louis Blankemeier, Christian Bluethgen, Stephan Altmayer, Jeya Maria Jose Valanarasu, Mohamed Siddig Eltayeb Muneer, Eduardo Pontes Reis, Joseph Paul Cohen, Cameron Olsen, Tanishq Mathew Abraham, Emily B. Tsai, Christopher F. Beaulieu, Jenia Jitsev, Sergios Gatidis, Jean-Benoit Delbrouck, Akshay S. Chaudhari, Curtis P. Langlotz

Abstract: Over 1.4 billion chest X-rays (CXRs) are performed annually due to their cost-effectiveness as an initial diagnostic test. This scale of radiological studies provides a significant opportunity to streamline CXR interpretation and documentation. While foundation models are a promising solution, the lack of publicly available large-scale datasets and benchmarks inhibits their iterative development and real-world evaluation. To overcome these challenges, we constructed a large-scale dataset (CheXinstruct), which we utilized to train a vision-language foundation model (CheXagent). We systematically demonstrated competitive performance across eight distinct task types on our novel evaluation benchmark (CheXbench). Beyond technical validation, we assessed the real-world utility of CheXagent in directly drafting radiology reports. Our clinical assessment with eight radiologists revealed a 36% time saving for residents using CheXagent-drafted reports, while attending radiologists showed no significant time difference editing resident-drafted or CheXagent-drafted reports. The CheXagent-drafted reports improved the writing efficiency of both radiology residents and attending radiologists in 81% and 61% of cases, respectively, without loss of quality. Overall, we demonstrate that CheXagent can effectively perform a variety of CXR interpretation tasks and holds potential to assist radiologists in routine clinical workflows.

Submitted 18 December, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: 26 pages, 8 figures

arXiv:2312.02186 [pdf, other]

Identifying Spurious Correlations using Counterfactual Alignment

Authors: Joseph Paul Cohen, Louis Blankemeier, Akshay Chaudhari

Abstract: Models driven by spurious correlations often yield poor generalization performance. We propose the counterfactual (CF) alignment method to detect and quantify spurious correlations of black box classifiers. Our methodology is based on counterfactual images generated with respect to one classifier being input into other classifiers to see if they also induce changes in the outputs of these classifiers. The relationship between these responses can be quantified and used to identify specific instances where a spurious correlation exists. This is validated by observing intuitive trends in face-attribute and waterbird classifiers, as well as by fabricating spurious correlations and detecting their presence, both visually and quantitatively. Furthermore, utilizing the CF alignment method, we demonstrate that we can evaluate robust optimization methods (GroupDRO, JTT, and FLAC) by detecting a reduction in spurious correlations.

Submitted 15 January, 2025; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: Accepted to Transactions on Machine Learning Research (TMLR), Code: https://github.com/ieee8023/latentshift

arXiv:2309.12294 [pdf, other]

Reranking for Natural Language Generation from Logical Forms: A Study based on Large Language Models

Authors: Levon Haroutunian, Zhuang Li, Lucian Galescu, Philip Cohen, Raj Tumuluri, Gholamreza Haffari

Abstract: Large language models (LLMs) have demonstrated impressive capabilities in natural language generation. However, their output quality can be inconsistent, posing challenges for generating natural language from logical forms (LFs). This task requires the generated outputs to embody the exact semantics of LFs, without missing any LF semantics or creating any hallucinations. In this work, we tackle this issue by proposing a novel generate-and-rerank approach. Our approach involves initially generating a set of candidate outputs by prompting an LLM and subsequently reranking them using a task-specific reranker model. In addition, we curate a manually collected dataset to evaluate the alignment between different ranking metrics and human judgements. The chosen ranking metrics are utilized to enhance the training and evaluation of the reranker model. By conducting extensive experiments on three diverse datasets, we demonstrate that the candidates selected by our reranker outperform those selected by baseline methods in terms of semantic consistency and fluency, as measured by three comprehensive metrics. Our findings provide strong evidence for the effectiveness of our approach in improving the quality of generated outputs.
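Illustrative sketch (not from the paper): the generate-and-rerank idea in the abstract above can be pictured as scoring a fixed set of candidate outputs and sorting them. The scorer here is a hypothetical term-coverage heuristic standing in for the paper's task-specific reranker model, and the candidate strings and `required` terms are invented for the example.

```python
from typing import Callable, List, Sequence


def coverage_score(required: Sequence[str], text: str) -> float:
    """Hypothetical scorer: fraction of required terms present in the text."""
    words = set(text.lower().split())
    return sum(1 for term in required if term in words) / len(required)


def rerank(candidates: List[str], scorer: Callable[[str], float]) -> List[str]:
    """Return candidates ordered from highest to lowest score."""
    return sorted(candidates, key=scorer, reverse=True)


# Assumed stand-ins: terms a logical form requires, and LLM-generated candidates.
required = ["book", "flight", "boston", "monday"]
candidates = [
    "book a flight",
    "book a flight to boston on monday",
    "cancel the flight to boston",
]

ranked = rerank(candidates, lambda text: coverage_score(required, text))
best = ranked[0]
```

In a real system the lambda would be replaced by a learned reranker; the surrounding generate-then-sort structure is the only part this sketch is meant to show.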

Submitted 21 September, 2023; originally announced September 2023.

Comments: IJCNLP-AACL 2023

arXiv:2305.12737 [pdf, other]

The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning

Authors: Zhuang Li, Lizhen Qu, Philip R. Cohen, Raj V. Tumuluri, Gholamreza Haffari

Abstract: Multilingual semantic parsing aims to leverage the knowledge from the high-resource languages to improve low-resource semantic parsing, yet commonly suffers from the data imbalance problem. Prior works propose to utilize the translations by either humans or machines to alleviate such issues. However, human translations are expensive, while machine translations are cheap but prone to error and bias. In this work, we propose an active learning approach that exploits the strengths of both human and machine translations by iteratively adding small batches of human translations into the machine-translated training set. Besides, we propose novel aggregated acquisition criteria that help our active learning method select utterances to be manually translated. Our experiments demonstrate that an ideal utterance selection can significantly reduce the error and bias in the translated data, resulting in higher parser accuracies than the parsers merely trained on the machine-translated data.

Submitted 22 May, 2023; originally announced May 2023.

Comments: ACL 2023

arXiv:2304.00487 [pdf, other]

The Effect of Counterfactuals on Reading Chest X-rays

Authors: Joseph Paul Cohen, Rupert Brooks, Sovann En, Evan Zucker, Anuj Pareek, Matthew Lungren, Akshay Chaudhari

Abstract: This study evaluates the effect of counterfactual explanations on the interpretation of chest X-rays. We conduct a reader study with two radiologists assessing 240 chest X-ray predictions to rate their confidence that the model's prediction is correct using a 5-point scale. Half of the predictions are false positives. Each prediction is explained twice, once using traditional attribution methods and once with a counterfactual explanation. The overall results indicate that counterfactual explanations allow a radiologist to have more confidence in true positive predictions compared to traditional approaches (0.15$\pm$0.95 with p=0.01) with only a small increase in false positive predictions (0.04$\pm$1.06 with p=0.57). We observe the specific prediction tasks of Mass and Atelectasis appear to benefit the most compared to other tasks.

Submitted 2 April, 2023; originally announced April 2023.

Comments: Abstract submitted to CVPR XAI4CV 2023 based on longer version: arXiv:2102.09475

arXiv:2302.09646 [pdf]

An Explainable Collaborative Dialogue System using a Theory of Mind

Authors: Philip R. Cohen, Lucian Galescu, Maayan Shvo

Abstract: Eva is a neuro-symbolic domain-independent multimodal collaborative dialogue system that takes seriously that the purpose of task-oriented dialogue is to assist the user. To do this, the system collaborates by inferring their intentions and plans, detects obstacles to success, finds plans to overcome them or to achieve higher-level goals, and plans its actions, including speech acts, to help users accomplish those goals. In doing so, the system maintains and reasons with its own declaratively-specified beliefs, goals and intentions, and explicitly reasons about those of its user. Because Eva can track different users' mental states, it can engage multiple agents in multi-party dialogues. Reasoning is accomplished with a modal Horn-clause meta-interpreter that enables computable inference within the subset of logic implemented. The system employs both hierarchical and backward-chaining planning, operating over a rich modal logic-based knowledge and action representation. The planning and reasoning subsystems obey the principles of persistent goals and intentions including: 1) The formation and decomposition of intentions to perform complex actions, 2) the conditions under which persistent goals and intentions can be given up, and 3) persistent goal and intention revision using the relativizing formulas that are created during the planning process. The system treats its speech acts just like its other actions. This general approach enables Eva to plan a variety of speech acts, including requests, informs, questions, confirmations, offers, acceptances, and emotive expressions. Because the dialogue engine is a planner, as the dialogue proceeds, the system can flexibly generate, execute, and potentially repair its plans using physical, digital, and speech actions. Importantly, Eva can explain its utterances because it has created a plan that caused it to utter them.

Submitted 20 June, 2024; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: 46 pages, 7 figures, 2 appendices

ACM Class: I.2.7; I.2.8; I.2.4; I.2.3; I.2.11

arXiv:2211.14830 [pdf, other]

Medical Image Segmentation Review: The success of U-Net

Authors: Reza Azad, Ehsan Khodapanah Aghdam, Amelie Rauland, Yiwei Jia, Atlas Haddadi Avval, Afshin Bozorgpour, Sanaz Karimijafarbigloo, Joseph Paul Cohen, Ehsan Adeli, Dorit Merhof

Abstract: Automatic medical image segmentation is a crucial topic in the medical domain and successively a critical counterpart in the computer-aided diagnosis paradigm. U-Net is the most widespread image segmentation architecture due to its flexibility, optimized modular design, and success in all medical image modalities. Over the years, the U-Net model achieved tremendous attention from academic and industrial researchers. Several extensions of this network have been proposed to address the scale and complexity created by medical tasks. Addressing the deficiency of the naive U-Net model is the foremost step for vendors to utilize the proper U-Net variant model for their business. Having a compendium of different variants in one place makes it easier for builders to identify the relevant research. Also, for ML researchers it will help them understand the challenges of the biological tasks that challenge the model. To address this, we discuss the practical aspects of the U-Net model and suggest a taxonomy to categorize each network variant. Moreover, to measure the performance of these strategies in a clinical application, we propose fair evaluations of some unique and famous designs on well-known datasets. We provide a comprehensive implementation library with trained models for future research. In addition, for ease of future studies, we created an online list of U-Net papers with their possible official implementation. All information is gathered in the https://github.com/NITR098/Awesome-U-Net repository.

Submitted 27 November, 2022; originally announced November 2022.

Comments: Submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence Journal

arXiv:2208.02625 [pdf, ps, other]

On the moments of one-level densities in families of holomorphic cusp forms in the level aspect

Authors: Peter Cohen, Justine Dell, Oscar E. González, Simran Khunger, Chung-Hang Kwan, Steven J. Miller, Alexander Shashkov, Alicia Smith Reina, Carsten Sprunger, Nicholas Triantafillou, Nhi Truong, Roger Van Peski, Stephen Willis

Abstract: We study the $n^{\rm th}$ centered moments of the $1$-level density for the low-lying zeros of $L$-functions attached to holomorphic cuspidal newforms of large prime level and fixed weight. Assuming the Generalized Riemann Hypotheses, we compute this statistic for any $n\ge 1$ and for all test functions whose Fourier transforms are supported in $\left(-2/n, \, 2/n\right)$. This is believed to be the natural limit of the current technology. Our work significantly extends beyond the trivial range $(-1/n, \, 1/n)$ and surpasses the previous record of $(-1/(n-1),\, 1/(n-1))$ whenever $n>2$. The Katz-Sarnak philosophy predicts that the aforementioned statistic can be modeled by the corresponding statistic for the eigenvalues of random orthogonal matrices. We prove that this is the case for test functions with Fourier support contained in $(-2/n,\, 2/n)$. The main technical innovation is a tractable vantage to evaluate the combinatorial zoo of terms, similar to the work of Conrey-Snaith and Mason-Snaith. As an application, our work provides better bounds on the order of vanishing at the central point for the $L$-functions in our family.

Submitted 28 March, 2025; v1 submitted 27 July, 2022; originally announced August 2022.

Comments: 58 pages. Revised version, to appear in Algebra & Number Theory

arXiv:2203.09016 [pdf, other]

Natural Language Communication with a Teachable Agent

Authors: Rachel Love, Edith Law, Philip R. Cohen, Dana Kulić

Abstract: Conversational teachable agents offer a promising platform to support learning, both in the classroom and in remote settings. In this context, the agent takes the role of the novice, while the student takes on the role of teacher. This framing is significant for its ability to elicit the Protégé effect in the student-teacher, a pedagogical phenomenon known to increase engagement in the teaching task, and also improve cognitive outcomes. In prior work, teachable agents often take a passive role in the learning interaction, and there are few studies in which the agent and student engage in natural language dialogue during the teaching task. This work investigates the effect of teaching modality when interacting with a virtual agent, via the web-based teaching platform, the Curiosity Notebook. A method of teaching the agent by selecting sentences from source material is compared to a method of paraphrasing the source material and typing text input to teach. A user study has been conducted to measure the effect of teaching modality on the learning outcomes and engagement of the participants. The results indicate that teaching via paraphrasing and text input has a positive effect on learning outcomes for the material covered, and also on aspects of affective engagement. Furthermore, increased paraphrasing effort, as measured by the similarity between the source material and the material the teacher conveyed to the robot, improves learning outcomes for participants.

Submitted 16 March, 2022; originally announced March 2022.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2203.05596 [pdf, other]

doi 10.1093/mnras/stac723

The Lick Observatory Supernova Search follow-up program: photometry data release of 70 stripped-envelope supernovae

Authors: WeiKang Zheng, Benjamin E. Stahl, Thomas de Jaeger, Alexei V. Filippenko, Shan-Qin Wang, Wen-Pei Gan, Thomas G. Brink, Ivan Altunin, Raphael Baer-Way, Andrew Bigley, Kyle Blanchard, Peter K. Blanchard, James Bradley, Samantha K. Cargill, Chadwick Casper, Teagan Chapman, Vidhi Chander, Sanyum Channa, Byung Yun Choi, Nick Choksi, Matthew Chu, Kelsey I. Clubb, Daniel P. Cohen, Paul A. Dalba, Asia deGraw, et al. (63 additional authors not shown)

Abstract: We present BVRI and unfiltered Clear light curves of 70 stripped-envelope supernovae (SESNe), observed between 2003 and 2020, from the Lick Observatory Supernova Search (LOSS) follow-up program. Our SESN sample consists of 19 spectroscopically normal SNe Ib, two peculiar SNe Ib, six SNe Ibn, 14 normal SNe Ic, one peculiar SN Ic, ten SNe Ic-BL, 15 SNe IIb, one ambiguous SN IIb/Ib/c, and two superluminous SNe. Our follow-up photometry has (on a per-SN basis) a mean coverage of 81 photometric points (median of 58 points) and a mean cadence of 3.6d (median of 1.2d). From our full sample, a subset of 38 SNe have pre-maximum coverage in at least one passband, allowing for the peak brightness of each SN in this subset to be quantitatively determined. We describe our data collection and processing techniques, with emphasis toward our automated photometry pipeline, from which we derive publicly available data products to enable and encourage further study by the community. Using these data products, we derive host-galaxy extinction values through the empirical colour evolution relationship and, for the first time, produce accurate rise-time measurements for a large sample of SESNe in both optical and infrared passbands. By modeling multiband light curves, we find that SNe Ic tend to have lower ejecta masses and lower ejecta velocities than SNe Ib and IIb, but higher $^{56}$Ni masses.

Submitted 10 March, 2022; originally announced March 2022.

Comments: Accepted by MNRAS

arXiv:2202.02833 [pdf, other]

CheXstray: Real-time Multi-Modal Data Concordance for Drift Detection in Medical Imaging AI

Authors: Arjun Soin, Jameson Merkow, Jin Long, Joseph Paul Cohen, Smitha Saligrama, Stephen Kaiser, Steven Borg, Ivan Tarapov, Matthew P Lungren

Abstract: Clinical Artificial Intelligence (AI) applications are rapidly expanding worldwide, and have the potential to impact all areas of medical practice. Medical imaging applications constitute a vast majority of approved clinical AI applications. Though healthcare systems are eager to adopt AI solutions, a fundamental question remains: what happens after the AI model goes into production? We use the CheXpert and PadChest public datasets to build and test a medical imaging AI drift monitoring workflow to track data and model drift without contemporaneous ground truth. We simulate drift in multiple experiments to compare model performance with our novel multi-modal drift metric, which uses DICOM metadata, image appearance representation from a variational autoencoder (VAE), and model output probabilities as input. Through experimentation, we demonstrate a strong proxy for ground truth performance using unsupervised distributional shifts in relevant metadata, predicted probabilities, and VAE latent representation. Our key contributions include (1) proof-of-concept for medical imaging drift detection that includes the use of VAE and domain specific statistical methods, (2) a multi-modal methodology to measure and unify drift metrics, (3) new insights into the challenges and solutions to observe deployed medical imaging AI, and (4) creation of open-source tools that enable others to easily run their own workflows and scenarios. This work has important implications. It addresses the concerning translation gap found in continuous medical imaging AI model monitoring common in dynamic healthcare environments.

Submitted 17 March, 2022; v1 submitted 6 February, 2022; originally announced February 2022.

Comments: Added code url
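
Illustrative note on the entry above: the abstract describes tracking unsupervised distributional shifts in predicted probabilities, but it does not name a specific statistic. The sketch below assumes a two-sample Kolmogorov-Smirnov statistic as one plausible choice; the function name and the sample values are hypothetical and are not taken from the paper or its code.

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of a reference window and a current window."""
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    # The empirical CDFs only change at observed values, so check those.
    for value in sorted(set(ref) | set(cur)):
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_cur = bisect_right(cur, value) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap

# Hypothetical model output probabilities from two time windows.
baseline_probs = [0.10, 0.15, 0.20, 0.22, 0.30]
production_probs = [0.55, 0.60, 0.62, 0.70, 0.80]
drift_score = ks_statistic(baseline_probs, production_probs)
```

A score near 0 suggests the two windows look alike, while a score near 1 suggests the output distribution has moved; any alerting threshold would have to be chosen per deployment.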

arXiv:2112.13734 [pdf, ps, other]

Multi-Domain Balanced Sampling Improves Out-of-Distribution Generalization of Chest X-ray Pathology Prediction Models

Authors: Enoch Tetteh, Joseph Viviano, Yoshua Bengio, David Krueger, Joseph Paul Cohen

Abstract: Learning models that generalize under different distribution shifts in medical imaging has been a long-standing research challenge. There have been several proposals for efficient and robust visual representation learning among vision research practitioners, especially in the sensitive and critical biomedical domain. In this paper, we propose an idea for out-of-distribution generalization of chest X-ray pathologies that uses a simple balanced batch sampling technique. We observed that balanced sampling between the multiple training datasets improves the performance over baseline models trained without balancing.

Submitted 27 December, 2021; v1 submitted 27 December, 2021; originally announced December 2021.

Comments: MED-NEURIPS 2021
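
Illustrative note on the entry above: the abstract mentions balanced batch sampling across multiple training datasets without giving the exact scheme. The sketch below assumes the simplest reading, in which every batch draws the same number of samples from each dataset; the function name, the sampling-with-replacement choice, and the dataset names are assumptions made for illustration only.

```python
import random

def balanced_batches(datasets, per_dataset, num_batches, seed=0):
    """Yield batches that contain `per_dataset` samples from every dataset.

    `datasets` maps a dataset name to a list of sample identifiers.
    """
    rng = random.Random(seed)
    for _ in range(num_batches):
        batch = []
        for name, items in datasets.items():
            # Sample with replacement so small datasets are never exhausted.
            picks = rng.choices(items, k=per_dataset)
            batch.extend((name, item) for item in picks)
        rng.shuffle(batch)  # Avoid a fixed per-dataset ordering inside the batch.
        yield batch

# Hypothetical datasets of very different sizes.
toy_datasets = {
    "site_a": list(range(1000)),
    "site_b": list(range(50)),
    "site_c": list(range(5)),
}
batches = list(balanced_batches(toy_datasets, per_dataset=4, num_batches=3))
```

Each batch here has 12 samples, 4 per dataset, regardless of how large each dataset is.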

arXiv:2112.02064 [pdf, ps, other]

Moments of discrete classical $q$-orthogonal polynomial ensembles

Authors: Philip Cohen

Abstract: We consider some discrete $q$-analogues of the classical continuous orthogonal polynomial ensembles. Building on results due to Morozov, Popolitov and Shakirov, we find representations for the moments of the discrete $q$-Hermite and discrete $q$-Laguerre ensembles in terms of basic hypergeometric series. We find that when the number of particles is suitably randomised, the moments may be represented as basic hypergeometric orthogonal polynomials, with corresponding three-term recurrences in $k$, the order of the moments.

Submitted 3 December, 2021; originally announced December 2021.

Comments: 19 pages, 0 figures

MSC Class: 60B20 (Primary) 33D45 (Secondary)

arXiv:2111.00595 [pdf, other]

TorchXRayVision: A library of chest X-ray datasets and models

Authors: Joseph Paul Cohen, Joseph D. Viviano, Paul Bertin, Paul Morrison, Parsa Torabian, Matteo Guarrera, Matthew P Lungren, Akshay Chaudhari, Rupert Brooks, Mohammad Hashir, Hadrien Bertrand

Abstract: TorchXRayVision is an open source software library for working with chest X-ray datasets and deep learning models. It provides a common interface and common pre-processing chain for a wide set of publicly available chest X-ray datasets. In addition, a number of classification and representation learning models with different architectures, trained on different data combinations, are available through the library to serve as baselines or feature extractors.

Submitted 31 October, 2021; originally announced November 2021.

Comments: Library source code: https://github.com/mlmed/torchxrayvision

arXiv:2110.12050 [pdf]

doi 10.1111/jors.12591

The Impact of the Coronavirus Pandemic on New York City Real Estate: First Evidence

Authors: Jeffrey P. Cohen, Felix L. Friedt, Jackson P. Lautier

Abstract: We investigate whether pandemic-induced contagion disamenities and income effects arising due to COVID-related unemployment adversely affected real estate prices of one- or two-family owner-occupied properties across New York City (NYC). First, OLS hedonic results indicate that greater COVID case numbers are concentrated in neighborhoods with lower-valued properties. Second, we use a repeat-sales approach for the period 2003 to 2020, and we find that both the possibility of contagion and pandemic-induced income effects adversely impacted home sale prices. Estimates suggest sale prices fell by roughly $60,000 or around 8% in response to both of the following: 1,000 additional infections per 100,000 residents; and a 10-percentage point increase in unemployment in a given Modified Zip Code Tabulation Area (MODZCTA). These price effects were more pronounced during the second wave of infections. Based on cumulative MODZCTA infection rates through 2020, the estimated COVID-19 price discount ranged from approximately 1% to 50% in the most affected neighborhoods, and averaged 14%. The contagion effect intensified in the more affluent, but less densely populated NYC neighborhoods, while the income effect was more pronounced in the most densely populated neighborhoods with more rental properties and greater population shares of foreign-born residents. This disparity implies the pandemic may have been correlated with a wider gap in housing wealth in NYC between homeowners in lower-priced and higher-priced neighborhoods.

Submitted 27 January, 2022; v1 submitted 22 October, 2021; originally announced October 2021.

Comments: 38 pages, 5 tables, 3 figures, Revised 1/27/2022

arXiv:2103.02641 [pdf, other]

doi 10.1093/mnras/stab643

Observing the influence of the youngest super star clusters in NGC 1569: Keck Brackett $α$ spectroscopy

Authors: Daniel P. Cohen, Jean L. Turner, Sara C. Beck, S. Michelle Consiglio

Abstract: We report Keck-NIRSPEC observations of the Brackett $α$ 4.05 $μ$m recombination line across the two candidate embedded super star clusters (SSCs) in NGC 1569. These SSCs power a bright HII region and have been previously detected as radio and mid-infrared sources. Supplemented with high resolution VLA mapping of the radio continuum along with IRTF-TEXES spectroscopy of the [SIV] 10.5 $μ$m line, the Brackett $α$ spectra data provide new insight into the dynamical state of gas ionized by these forming massive clusters. NIR sources detected in 2 $μ$m images from the Slit-viewing Camera are matched with GAIA sources to obtain accurate celestial coordinates and slit positions to within $\sim 0.1''$. Br$α$ is detected as a strong emission peak powered by the less luminous infrared source, MIR1 ($L_{\rm IR}\sim 2\times10^7~L_\odot$). The second candidate SSC MIR2 is more luminous ($L_{\rm IR}\gtrsim 4\times10^8~L_\odot$) but exhibits weak radio continuum and Br$α$ emission, suggesting the ionized gas is extremely dense ($n_e\gtrsim 10^5$ cm$^{-3}$), corresponding to hypercompact HII regions around newborn massive stars. The Br$α$ and [SIV] lines across the region are both remarkably symmetric and extremely narrow, with observed line widths $Δv \simeq 40$ km s$^{-1}$, FWHM. This result is the first clear evidence that feedback from NGC 1569's youngest giant clusters is currently incapable of rapid gas dispersal, consistent with the emerging theoretical paradigm in the formation of giant star clusters.

Submitted 3 March, 2021; originally announced March 2021.

Comments: Accepted for publication in MNRAS 2021 Feb 26; 9 pages, 7 figures

arXiv:2102.09582 [pdf, other]

Benefits of Linear Conditioning with Metadata for Image Segmentation

Authors: Andreanne Lemay, Charley Gros, Olivier Vincent, Yaou Liu, Joseph Paul Cohen, Julien Cohen-Adad

Abstract: Medical images are often accompanied by metadata describing the image (vendor, acquisition parameters) and the patient (disease type or severity, demographics, genomics). This metadata is usually disregarded by image segmentation methods. In this work, we adapt a linear conditioning method called FiLM (Feature-wise Linear Modulation) for image segmentation tasks. This FiLM adaptation enables integrating metadata into segmentation models for better performance. We observed an average Dice score increase of 5.1% on spinal cord tumor segmentation when incorporating the tumor type with FiLM. The metadata modulates the segmentation process through low-cost affine transformations applied on feature maps which can be included in any neural network's architecture. Additionally, we assess the relevance of segmentation FiLM layers for tackling common challenges in medical imaging: multi-class training with missing segmentations, model adaptation to multiple tasks, and training with a limited or unbalanced number of annotated data. Our results demonstrated the following benefits of FiLM for segmentation: FiLMed U-Net was robust to missing labels and reached higher Dice scores with few labels (up to 16.7%) compared to single-task U-Net. The code is open-source and available at www.ivadomed.org.

Submitted 26 April, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: Accepted at MIDL 2021
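
Illustrative note on the entry above: the abstract describes metadata modulating feature maps through low-cost affine transformations. The sketch below shows that idea in plain Python, with a per-channel scale (gamma) and shift (beta) produced by a linear map from a one-hot metadata vector. It is not the ivadomed implementation; the function names, weight values, and toy feature maps are all hypothetical.

```python
def film_parameters(metadata, gamma_weights, beta_weights):
    """Map a metadata vector to one (gamma, beta) pair per channel.

    Each row of the weight matrices belongs to one feature-map channel.
    """
    gamma = [sum(w * m for w, m in zip(row, metadata)) for row in gamma_weights]
    beta = [sum(w * m for w, m in zip(row, metadata)) for row in beta_weights]
    return gamma, beta

def film(feature_maps, gamma, beta):
    """Apply the feature-wise affine transform gamma * x + beta per channel."""
    return [
        [g * value + b for value in channel]
        for channel, g, b in zip(feature_maps, gamma, beta)
    ]

# Hypothetical example: two channels, and a one-hot encoding of two tumor types.
tumor_type = [1.0, 0.0]
gamma_w = [[2.0, 1.0], [0.5, 1.0]]
beta_w = [[1.0, 0.0], [-1.0, 0.0]]
gamma, beta = film_parameters(tumor_type, gamma_w, beta_w)
modulated = film([[1.0, 2.0], [3.0, 4.0]], gamma, beta)
```

Because the transform is only a scale and a shift per channel, it adds very few parameters and can sit after any convolutional block.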

arXiv:2102.09475 [pdf, other]

Gifsplanation via Latent Shift: A Simple Autoencoder Approach to Counterfactual Generation for Chest X-rays

Authors: Joseph Paul Cohen, Rupert Brooks, Sovann En, Evan Zucker, Anuj Pareek, Matthew P. Lungren, Akshay Chaudhari

Abstract: Motivation: Traditional image attribution methods struggle to satisfactorily explain predictions of neural networks. Prediction explanation is important, especially in medical imaging, for avoiding the unintended consequences of deploying AI systems when false positive predictions can impact patient care. Thus, there is a pressing need to develop improved models for model explainability and introspection. Specific problem: A new approach is to transform input images to increase or decrease features which cause the prediction. However, current approaches are difficult to implement as they are monolithic or rely on GANs. These hurdles prevent wide adoption. Our approach: Given an arbitrary classifier, we propose a simple autoencoder and gradient update (Latent Shift) that can transform the latent representation of a specific input image to exaggerate or curtail the features used for prediction. We use this method to study chest X-ray classifiers and evaluate their performance. We conduct a reader study with two radiologists assessing 240 chest X-ray predictions to identify which ones are false positives (half are) using traditional attribution maps or our proposed method. Results: We found low overlap with ground truth pathology masks for models with reasonably high accuracy. However, the results from our reader study indicate that these models are generally looking at the correct features. We also found that the Latent Shift explanation allows a user to have more confidence in true positive predictions compared to traditional approaches (0.15$\pm$0.95 in a 5 point scale with p=0.01) with only a small increase in false positive predictions (0.04$\pm$1.06 with p=0.57). Accompanying webpage: https://mlmed.org/gifsplanation Source code: https://github.com/mlmed/gifsplanation

Submitted 24 April, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: Full paper at MIDL2021
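
Illustrative note on the entry above: the abstract describes a gradient update that moves the latent representation of an input to exaggerate or curtail the features used for prediction. The sketch below assumes the update takes the form z + lambda * d f(D(z)) / dz, where D is the decoder and f the classifier, and uses a toy linear decoder, a toy sigmoid classifier, and finite-difference gradients. None of these stand-ins come from the gifsplanation code; they only make the update concrete.

```python
import math

def decode(z):
    # Hypothetical linear "decoder": maps a 2-D latent to a 3-pixel image.
    return [z[0] + z[1], z[0] - z[1], 2.0 * z[1]]

def classify(image):
    # Hypothetical classifier: sigmoid of a fixed weighted sum of pixels.
    weights = [0.5, -0.25, 1.0]
    score = sum(w * p for w, p in zip(weights, image))
    return 1.0 / (1.0 + math.exp(-score))

def prediction_gradient(z, eps=1e-5):
    # Central finite differences of classify(decode(z)) per latent coordinate.
    grad = []
    for i in range(len(z)):
        up = list(z)
        down = list(z)
        up[i] += eps
        down[i] -= eps
        grad.append((classify(decode(up)) - classify(decode(down))) / (2 * eps))
    return grad

def latent_shift(z, lam):
    # Assumed update: step along the prediction gradient in latent space.
    grad = prediction_gradient(z)
    return [zi + lam * gi for zi, gi in zip(z, grad)]

z0 = [0.2, -0.1]
base = classify(decode(z0))
exaggerated = classify(decode(latent_shift(z0, 5.0)))
curtailed = classify(decode(latent_shift(z0, -5.0)))
```

Decoding a range of lambda values between the negative and positive extremes would give a sequence of images, which is the kind of animation the title alludes to.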

arXiv:2101.11506 [pdf, ps, other]

doi 10.1016/j.jqsrt.2021.107822

Multi-Group Discontinuous Asymptotic $P_1$ Approximation in Radiative Marshak Waves Experiments

Authors: Avner P. Cohen, Shay I. Heizler

Abstract: We study the propagation of radiative heat (Marshak) waves, using modified $P_1$-approximation equations. In relatively optically-thin media the heat propagation is supersonic, i.e., hydrodynamic motion is negligible, and thus can be described by the radiative transfer Boltzmann equation, coupled with the material energy equation. However, the exact thermal radiative transfer problem is still difficult to solve and requires massive simulation capabilities. Hence, there still exists a need for adequate approximations that are comparatively easy to carry out. Classic approximations, such as the classic diffusion and classic $P_1$, fail to describe the correct heat wave velocity, when the optical depth is not sufficiently high. Therefore, we use the recently developed discontinuous asymptotic $P_1$ approximation, which is a time-dependent analogy for the adjustment of the discontinuous asymptotic diffusion for two different zones. This approximation was tested via several benchmarks, showing better results than other common approximations, and has also demonstrated a good agreement with a main Marshak wave experiment and its Monte-Carlo gray simulation. Here we derive energy expansion of the discontinuous asymptotic $P_1$ approximation in slab geometry, and test it with numerous experimental results for propagating Marshak waves inside low density foams. The new approximation describes the heat wave propagation with good agreement. Furthermore, a comparison of the simulations to exact implicit Monte-Carlo slab-geometry multi-group simulations, in this wide range of experimental conditions, demonstrates the superiority of this approximation to others.

Submitted 10 February, 2021; v1 submitted 27 January, 2021; originally announced January 2021.

Comments: 17 pages, 7 figures

Journal ref: Journal of Quantitative Spectroscopy and Radiative Transfer, 272, 107822 (2021)

arXiv:2012.09246 [pdf, ps, other]

No-harm calibration for generalized Oaxaca-Blinder estimators

Authors: Peter L. Cohen, Colin B. Fogarty

Abstract: In randomized experiments, adjusting for observed features when estimating treatment effects has been proposed as a way to improve asymptotic efficiency. However, only linear regression has been proven to form an estimate of the average treatment effect that is asymptotically no less efficient than the treated-minus-control difference in means regardless of the true data generating process. Randomized treatment assignment provides this "do-no-harm" property, with neither truth of a linear model nor a generative model for the outcomes being required. We present a general calibration method which confers the same no-harm property onto estimators leveraging a broad class of nonlinear models. This recovers the usual regression-adjusted estimator when ordinary least squares is used, and further provides non-inferior treatment effect estimators using methods such as logistic and Poisson regression. The resulting estimators are non-inferior to both the difference in means estimator and to treatment effect estimators that have not undergone calibration. We show that our estimator is asymptotically equivalent to an inverse probability weighted estimator using a logit link with predicted potential outcomes as covariates. In a simulation study, we demonstrate that common nonlinear estimators without our calibration procedure may perform markedly worse than both the calibrated estimator and the unadjusted difference in means.

Submitted 12 April, 2022; v1 submitted 16 December, 2020; originally announced December 2020.

MSC Class: 62G99

arXiv:2010.09984 [pdf, other]

ivadomed: A Medical Imaging Deep Learning Toolbox

Authors: Charley Gros, Andreanne Lemay, Olivier Vincent, Lucas Rouhier, Anthime Bucquet, Joseph Paul Cohen, Julien Cohen-Adad

Abstract: ivadomed is an open-source Python package for designing, end-to-end training, and evaluating deep learning models applied to medical imaging data. The package includes APIs, command-line tools, documentation, and tutorials. ivadomed also includes pre-trained models such as spinal tumor segmentation and vertebral labeling. Original features of ivadomed include a data loader that can parse image metadata (e.g., acquisition parameters, image contrast, resolution) and subject metadata (e.g., pathology, age, sex) for custom data splitting or extra information during training and evaluation. Any dataset following the Brain Imaging Data Structure (BIDS) convention will be compatible with ivadomed without the need to manually organize the data, which is typically a tedious task. Beyond the traditional deep learning methods, ivadomed features cutting-edge architectures, such as FiLM and HeMis, as well as various uncertainty estimation methods (aleatoric and epistemic), and losses adapted to imbalanced classes and non-binary predictions. Each step is conveniently configurable via a single file. At the same time, the code is highly modular to allow addition/modification of an architecture or pre/post-processing steps. Example applications of ivadomed include MRI object detection, segmentation, and labeling of anatomical and pathological structures. Overall, ivadomed enables easy and quick exploration of the latest advances in deep learning for medical imaging applications. ivadomed's main project page is available at https://ivadomed.org.

Submitted 19 October, 2020; originally announced October 2020.

arXiv:2009.08348 [pdf, other]

S2SD: Simultaneous Similarity-based Self-Distillation for Deep Metric Learning

Authors: Karsten Roth, Timo Milbich, Björn Ommer, Joseph Paul Cohen, Marzyeh Ghassemi

Abstract: Deep Metric Learning (DML) provides a crucial tool for visual similarity and zero-shot applications by learning generalizing embedding spaces, although recent work in DML has shown strong performance saturation across training objectives. However, generalization capacity is known to scale with the embedding space dimensionality. Unfortunately, high dimensional embeddings also create higher retrieval cost for downstream applications. To remedy this, we propose \emph{Simultaneous Similarity-based Self-distillation} (S2SD). S2SD extends DML with knowledge distillation from auxiliary, high-dimensional embedding and feature spaces to leverage complementary context during training while retaining test-time cost and with negligible changes to the training time. Experiments and ablations across different objectives and standard benchmarks show S2SD offers notable improvements of up to 7% in Recall@1, while also setting a new state-of-the-art. Code available at https://github.com/MLforHealth/S2SD.

Submitted 4 June, 2021; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: Accepted to ICML2021

arXiv:2007.13224 [pdf, other]

Uniformizing Techniques to Process CT scans with 3D CNNs for Tuberculosis Prediction

Authors: Hasib Zunair, Aimon Rahman, Nabeel Mohammed, Joseph Paul Cohen

Abstract: A common approach to medical image analysis on volumetric data uses deep 2D convolutional neural networks (CNNs). This is largely attributed to the challenges imposed by the nature of the 3D data: variable volume size, GPU exhaustion during optimization. However, dealing with the individual slices independently in 2D CNNs deliberately discards the depth information which results in poor performance for the intended task. Therefore, it is important to develop methods that not only overcome the heavy memory and computation requirements but also leverage the 3D information. To this end, we evaluate a set of volume uniformizing methods to address the aforementioned issues. The first method involves sampling information evenly from a subset of the volume. Another method exploits the full geometry of the 3D volume by interpolating over the z-axis. We demonstrate performance improvements using controlled ablation studies as well as put this approach to the test on the ImageCLEF Tuberculosis Severity Assessment 2019 benchmark. We report 73% area under curve (AUC) and binary classification accuracy (ACC) of 67.5% on the test set, beating all methods which leveraged only image information (without using clinical meta-data) and achieving 5th position overall. All code and models are made available at https://github.com/hasibzunair/uniformizing-3D.

Submitted 26 July, 2020; originally announced July 2020.

Comments: Accepted for publication at the MICCAI 2020 International Workshop on PRedictive Intelligence In MEdicine (PRIME)

arXiv:2007.04250 [pdf, other]

A Benchmark of Medical Out of Distribution Detection

Authors: Tianshi Cao, Chin-Wei Huang, David Yu-Tung Hui, Joseph Paul Cohen

Abstract: Motivation: Deep learning models deployed for use on medical tasks can be equipped with Out-of-Distribution Detection (OoDD) methods in order to avoid erroneous predictions. However, it is unclear which OoDD method should be used in practice. Specific Problem: Systems trained for one particular domain of images cannot be expected to perform accurately on images of a different domain. These images should be flagged by an OoDD method prior to diagnosis. Our approach: This paper defines 3 categories of OoD examples and benchmarks popular OoDD methods in three domains of medical imaging: chest X-ray, fundus imaging, and histology slides. Results: Our experiments show that despite methods yielding good results on some categories of out-of-distribution samples, they fail to recognize images close to the training distribution. Conclusion: We find a simple binary classifier on the feature representation has the best accuracy and AUPRC on average. Users of diagnostic tools which employ these OoDD methods should still remain vigilant that images very close to the training distribution yet not in it could yield unexpected results.

Submitted 4 August, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

Comments: Submitted to Machine Learning for Biomedical Imaging Journal (MELBA)

arXiv:2006.11988 [pdf, other]

doi 10.59275/j.melba.2020-48g7

COVID-19 Image Data Collection: Prospective Predictions Are the Future

Authors: Joseph Paul Cohen, Paul Morrison, Lan Dao, Karsten Roth, Tim Q Duong, Marzyeh Ghassemi

Abstract: Across the world's coronavirus disease 2019 (COVID-19) hot spots, the need to streamline patient diagnosis and management has become more pressing than ever. As one of the main imaging tools, chest X-rays (CXRs) are common, fast, non-invasive, relatively cheap, and potentially bedside to monitor the progression of the disease. This paper describes the first public COVID-19 image data collection as well as a preliminary exploration of possible use cases for the data. This dataset currently contains hundreds of frontal view X-rays and is the largest public resource for COVID-19 image and prognostic data, making it a necessary resource to develop and evaluate tools to aid in the treatment of COVID-19. It was manually aggregated from publication figures as well as various web based repositories into a machine learning (ML) friendly format with accompanying dataloader code. We collected frontal and lateral view imagery and metadata such as the time since first symptoms, intensive care unit (ICU) status, survival status, intubation status, or hospital location. We present multiple possible use cases for the data such as predicting the need for the ICU, predicting patient survival, and understanding a patient's trajectory during treatment. Data can be accessed here: https://github.com/ieee8023/covid-chestxray-dataset

Submitted 14 December, 2020; v1 submitted 21 June, 2020; originally announced June 2020.

Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org. Code for baseline experiments can be found here: https://github.com/mlmed/covid-baselines

arXiv:2005.13009 [pdf, ps, other]

A Kuratowski closure-complement variant whose solution is independent of ZF

Authors: Michael P. Cohen, Todd Johnson, Adam Kral, Aaron Li, Justin Soll

Abstract: We pose the following new variant of the Kuratowski closure-complement problem: How many distinct sets may be obtained by starting with a set $A$ of a Polish space $X$, and applying only closure, complementation, and the $d$ operator, as often as desired, in any order? The set operator $d$ was studied by Kuratowski in his foundational text \textit{Topology: Volume I}; it assigns to $A$ the collection $dA$ of all points of second category for $A$. We show that in ZFC set theory, the answer to this variant problem is $22$. In a distinct system equiconsistent with ZFC, namely ZF+DC+PB, the answer is only $18$.

Submitted 26 May, 2020; originally announced May 2020.

Comments: 9 pages, 3 figures

arXiv:2005.11856 [pdf, other]

Predicting COVID-19 Pneumonia Severity on Chest X-ray with Deep Learning

Authors: Joseph Paul Cohen, Lan Dao, Paul Morrison, Karsten Roth, Yoshua Bengio, Beiyi Shen, Almas Abbasi, Mahsa Hoshmand-Kochi, Marzyeh Ghassemi, Haifang Li, Tim Q Duong

Abstract: Purpose: The need to streamline patient management for COVID-19 has become more pressing than ever. Chest X-rays provide a non-invasive (potentially bedside) tool to monitor the progression of the disease. In this study, we present a severity score prediction model for COVID-19 pneumonia for frontal chest X-ray images. Such a tool can gauge severity of COVID-19 lung infections (and pneumonia in general) that can be used for escalation or de-escalation of care as well as monitoring treatment efficacy, especially in the ICU. Methods: Images from a public COVID-19 database were scored retrospectively by three blinded experts in terms of the extent of lung involvement as well as the degree of opacity. A neural network model that was pre-trained on large (non-COVID-19) chest X-ray datasets is used to construct features for COVID-19 images which are predictive for our task. Results: This study finds that training a regression model on a subset of the outputs from this pre-trained chest X-ray model predicts our geographic extent score (range 0-8) with 1.14 mean absolute error (MAE) and our lung opacity score (range 0-6) with 0.78 MAE. Conclusions: These results indicate that our model's ability to gauge severity of COVID-19 lung infections could be used for escalation or de-escalation of care as well as monitoring treatment efficacy, especially in the intensive care unit (ICU). A proper clinical trial is needed to evaluate efficacy. To enable this we make our code, labels, and data available online at https://github.com/mlmed/torchxrayvision/tree/master/scripts/covid-severity and https://github.com/ieee8023/covid-chestxray-dataset

Submitted 30 June, 2020; v1 submitted 24 May, 2020; originally announced May 2020.

arXiv:2004.13458

DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning

Authors: Timo Milbich, Karsten Roth, Homanga Bharadhwaj, Samarth Sinha, Yoshua Bengio, Björn Ommer, Joseph Paul Cohen

Abstract: Visual Similarity plays an important role in many computer vision applications. Deep metric learning (DML) is a powerful framework for learning such similarities which not only generalize from training data to identically distributed test distributions, but in particular also translate to unknown test classes. However, its prevailing learning paradigm is class-discriminative supervised training, which typically results in representations specialized in separating training classes. For effective generalization, however, such an image representation needs to capture a diverse range of data characteristics. To this end, we propose and study multiple complementary learning tasks, targeting conceptually different data relationships by only resorting to the available training samples and labels of a standard DML setting. Through simultaneous optimization of our tasks we learn a single model to aggregate their training signals, resulting in strong generalization and state-of-the-art performance on multiple established DML benchmark datasets.

Submitted 10 September, 2020; v1 submitted 28 April, 2020; originally announced April 2020.

Comments: published at ECCV 2020

arXiv:2003.11597

COVID-19 Image Data Collection

Authors: Joseph Paul Cohen, Paul Morrison, Lan Dao

Abstract: This paper describes the initial COVID-19 open image data collection. It was created by assembling medical images from websites and publications and currently contains 123 frontal view X-rays.

Submitted 25 March, 2020; originally announced March 2020.

Comments: Dataset available here: https://github.com/ieee8023/covid-chestxray-dataset

arXiv:2003.08783

Redistribution Systems and PRAM

Authors: Paul Cohen, Tomasz Loboda

Abstract: Redistribution systems iteratively redistribute mass between groups under the control of rules. PRAM is a framework for building redistribution systems. We discuss the relationships between redistribution systems, agent-based systems, compartmental models and Bayesian models. PRAM puts agent-based models on a sound probabilistic footing by reformulating them as redistribution systems. This provides a basis for integrating agent-based and probabilistic models. PRAM extends the themes of probabilistic relational models and lifted inference to incorporate dynamical models and simulation. We illustrate PRAM with an epidemiological example.

Submitted 19 March, 2020; v1 submitted 17 March, 2020; originally announced March 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:1902.05677

arXiv:2003.04387

Spine intervertebral disc labeling using a fully convolutional redundant counting model

Authors: Lucas Rouhier, Francisco Perdigon Romero, Joseph Paul Cohen, Julien Cohen-Adad

Abstract: Labeling intervertebral discs is relevant as it notably enables clinicians to understand the relationship between a patient's symptoms (pain, paralysis) and the exact level of spinal cord injury. However, manually labeling those discs is a tedious and user-biased task which would benefit from automated methods. While some automated methods already exist for MRI and CT-scan, they are either not publicly available, or fail to generalize across various imaging contrasts. In this paper we combine a Fully Convolutional Network (FCN) with inception modules to localize and label intervertebral discs. We demonstrate a proof-of-concept application in a publicly-available multi-center and multi-contrast MRI database (n=235 subjects). The code is publicly available at https://github.com/neuropoly/vertebral-labeling-deep-learning.

Submitted 11 March, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: MIDL 2020

arXiv:2003.04377

Automatic segmentation of spinal multiple sclerosis lesions: How to generalize across MRI contrasts?

Authors: Olivier Vincent, Charley Gros, Joseph Paul Cohen, Julien Cohen-Adad

Abstract: Despite recent improvements in medical image segmentation, the ability to generalize across imaging contrasts remains an open issue. To tackle this challenge, we implement Feature-wise Linear Modulation (FiLM) to leverage physics knowledge within the segmentation model and learn the characteristics of each contrast. Interestingly, a well-optimised U-Net reached the same performance as our FiLMed-Unet on a multi-contrast dataset (0.72 of Dice score), which suggests that there is a bottleneck in spinal MS lesion segmentation different from the generalization across varying contrasts. This bottleneck likely stems from inter-rater variability, which is estimated at 0.61 of Dice score in our dataset.

Submitted 3 June, 2020; v1 submitted 9 March, 2020; originally announced March 2020.

Comments: Presented at OHBM 2020 (v2-3 : corrected typos)

arXiv:2002.08473

Revisiting Training Strategies and Generalization Performance in Deep Metric Learning

Authors: Karsten Roth, Timo Milbich, Samarth Sinha, Prateek Gupta, Björn Ommer, Joseph Paul Cohen

Abstract: Deep Metric Learning (DML) is arguably one of the most influential lines of research for learning visual similarities with many proposed approaches every year. Although the field benefits from the rapid progress, the divergence in training protocols, architectures, and parameter choices make an unbiased comparison difficult. To provide a consistent reference point, we revisit the most widely used DML objective functions and conduct a study of the crucial parameter choices as well as the commonly neglected mini-batch sampling process. Under consistent comparison, DML objectives show much higher saturation than indicated by literature. Further based on our analysis, we uncover a correlation between the embedding space density and compression to the generalization performance of DML models. Exploiting these insights, we propose a simple, yet effective, training regularization to reliably boost the performance of ranking-based DML models on various standard benchmark datasets. Code and a publicly accessible WandB-repo are available at https://github.com/Confusezius/Revisiting_Deep_Metric_Learning_PyTorch.

Submitted 1 August, 2020; v1 submitted 19 February, 2020; originally announced February 2020.

Comments: ICML 2020. Main paper 8.25 pages, 26 pages total

arXiv:2002.06654

doi 10.1111/rssb.12439

Gaussian Prepivoting for Finite Population Causal Inference

Authors: Peter L. Cohen, Colin B. Fogarty

Abstract: In finite population causal inference exact randomization tests can be constructed for sharp null hypotheses, i.e. hypotheses which fully impute the missing potential outcomes. Oftentimes inference is instead desired for the weak null that the sample average of the treatment effects takes on a particular value while leaving the subject-specific treatment effects unspecified. Without proper care, tests valid for sharp null hypotheses may be anti-conservative should only the weak null hold, creating the risk of misinterpretation when randomization tests are deployed in practice. We develop a general framework for unifying modes of inference for sharp and weak nulls, wherein a single procedure simultaneously delivers exact inference for sharp nulls and asymptotically valid inference for weak nulls. To do this, we employ randomization tests based upon prepivoted test statistics, wherein a test statistic is first transformed by a suitably constructed cumulative distribution function and its randomization distribution assuming the sharp null is then enumerated. For a large class of commonly employed test statistics, we show that prepivoting may be accomplished by employing the push-forward of a sample-based Gaussian measure based upon a suitably constructed covariance estimator. In essence, the approach enumerates the randomization distribution (assuming the sharp null) of a P-value for a large-sample test known to be valid under the weak null, and uses the resulting randomization distribution to perform inference.
The versatility of the method is demonstrated through a host of examples, including rerandomized designs and regression-adjusted estimators in completely randomized designs.

Submitted 13 June, 2021; v1 submitted 16 February, 2020; originally announced February 2020.

arXiv:2002.02582

Quantifying the Value of Lateral Views in Deep Learning for Chest X-rays

Authors: Mohammad Hashir, Hadrien Bertrand, Joseph Paul Cohen

Abstract: Most deep learning models in chest X-ray prediction utilize the posteroanterior (PA) view due to the lack of other views available. PadChest is a large-scale chest X-ray dataset that has almost 200 labels and multiple views available. In this work, we use PadChest to explore multiple approaches to merging the PA and lateral views for predicting the radiological labels associated with the X-ray image. We find that different methods of merging the model utilize the lateral view differently. We also find that including the lateral view increases performance for 32 labels in the dataset, while being neutral for the others. The increase in overall performance is comparable to the one obtained by using only the PA view with twice the amount of patients in the training set.

Submitted 6 February, 2020; originally announced February 2020.

Comments: Under review at MIDL 2020

arXiv:2002.02497

On the limits of cross-domain generalization in automated X-ray prediction

Authors: Joseph Paul Cohen, Mohammad Hashir, Rupert Brooks, Hadrien Bertrand

Abstract: This large scale study focuses on quantifying which X-ray diagnostic prediction tasks generalize well across multiple different datasets. We present evidence that the issue of generalization is not due to a shift in the images but instead a shift in the labels. We study the cross-domain performance, agreement between models, and model representations. We find interesting discrepancies between performance and agreement where models which both achieve good performance disagree in their predictions as well as models which agree yet achieve poor performance. We also test for concept similarity by regularizing a network to group tasks across multiple datasets together and observe variation across the tasks. All code is made available online and data is publicly available: https://github.com/mlmed/torchxrayvision

Submitted 24 May, 2020; v1 submitted 6 February, 2020; originally announced February 2020.

Comments: Full paper at MIDL2020

arXiv:2001.11002

doi 10.1093/mnras/staa292

Unveiling Kinematic Structure in the Starburst Heart of NGC 253

Authors: Daniel P. Cohen, Jean L. Turner, S. Michelle Consiglio

Abstract: We investigate the kinematics of ionized gas within the nuclear starburst of NGC 253 with observations of the Brackett $α$ recombination line at 4.05 $μ$m. The goal is to distinguish motions driven by star-formation feedback from gravitational motions induced by the central mass structure. Using NIRSPEC on Keck II, we obtained 30 spectra through a $0''.5$ slit stepped across the central $\sim$5$''\times 25''$ (85 $\times$ 425 pc) region to produce a spectral cube. The Br$α$ emission resolves into four nuclear sources: S1 at the infrared core (IRC), N1 at the radio core near nonthermal source TH2, and the fainter sources N2 and N3 in the northeast. The line profile is characterized by a primary component with $Δv_{\mathrm{primary}}$$\sim$90-130 km s$^{-1}$ (FWHM) on top of a broad blue wing with $Δv_{\mathrm{broad}}$$\sim$300-350 km s$^{-1}$, and an additional redshifted narrow component in the west. The velocity field generated from our cube reveals several distinct patterns. A mean NE-SW velocity gradient of +10 km s$^{-1}$ arcsec$^{-1}$ along the major axis traces the solid-body rotation curve of the nuclear disk. At the radio core, isovelocity contours become S-shaped, indicating the presence of a secondary nuclear bar of total extent $\sim$5$''$ (90 pc). The symmetry of the bar places the galactic center near the radio peak TH2 of the galaxy rather than the IRC, and makes this the most likely location of a SMBH. A third kinematic substructure is formed by blueshifted gas on the southeast side of the IRC.
This feature provides evidence for a $\sim$100-250 km s$^{-1}$ starburst-driven outflow potentially responsible for powering the kpc-scale galactic wind of NGC 253.

Submitted 29 January, 2020; originally announced January 2020.

Comments: Accepted for publication in MNRAS on Jan 24, 2020 ; 12 pages, 7 figures

arXiv:1911.07093

doi 10.1103/PhysRevResearch.2.023007

Key to understanding supersonic radiative Marshak waves using simple models and advanced simulations

Authors: Avner P. Cohen, Guy Malamud, Shay I. Heizler

Abstract: This article studies the propagation of supersonic radiative Marshak waves. These waves are radiation dominated, and play an important role in inertial confinement fusion and in astrophysical and laboratory systems. For that reason, this phenomenon has attracted considerable experimental attention in recent decades in several different facilities. The present study integrates the various experimental results published in the literature, demonstrating a common physical base. A new simple semi-analytic model is derived and presented along with advanced radiative hydrodynamic implicit Monte Carlo direct numerical simulations, which explain the experimental results. This study identifies the main physical effects dominating the experiments, notwithstanding their different apparatuses and different physical regimes.

Submitted 22 March, 2020; v1 submitted 16 November, 2019; originally announced November 2019.

Comments: 33 pages, 17 figures

Journal ref: Phys. Rev. Research 2, 023007 (2020)

arXiv:1910.13249

Navigation Agents for the Visually Impaired: A Sidewalk Simulator and Experiments

Authors: Martin Weiss, Simon Chamorro, Roger Girgis, Margaux Luck, Samira E. Kahou, Joseph P. Cohen, Derek Nowrouzezahrai, Doina Precup, Florian Golemo, Chris Pal

Abstract: Millions of blind and visually-impaired (BVI) people navigate urban environments every day, using smartphones for high-level path-planning and white canes or guide dogs for local information. However, many BVI people still struggle to travel to new places. In our endeavor to create a navigation assistant for the BVI, we found that existing Reinforcement Learning (RL) environments were unsuitable for the task. This work introduces SEVN, a sidewalk simulation environment and a neural network-based approach to creating a navigation agent. SEVN contains panoramic images with labels for house numbers, doors, and street name signs, and formulations for several navigation tasks. We study the performance of an RL algorithm (PPO) in this setting. Our policy model fuses multi-modal observations in the form of variable resolution images, visible text, and simulated GPS data to navigate to a goal door. We hope that this dataset, simulator, and experimental results will provide a foundation for further research into the creation of agents that can assist members of the BVI community with outdoor navigation.

Submitted 29 October, 2019; originally announced October 2019.

Comments: Accepted at CoRL2019. Code & video available at https://mweiss17.github.io/SEVN/

arXiv:1910.09600

Is graph-based feature selection of genes better than random?

Authors: Mohammad Hashir, Paul Bertin, Martin Weiss, Vincent Frappier, Theodore J. Perkins, Geneviève Boucher, Joseph Paul Cohen

Abstract: Gene interaction graphs aim to capture various relationships between genes and represent decades of biology research. When trying to make predictions from genomic data, those graphs could be used to overcome the curse of dimensionality by making machine learning models sparser and more consistent with biological common knowledge. In this work, we focus on assessing whether those graphs capture dependencies seen in gene expression data better than random. We formulate a condition that graphs should satisfy to provide a good prior knowledge and propose to test it using a 'Single Gene Inference' (SGI) task. We compare random graphs with seven major gene interaction graphs published by different research groups, aiming to measure the true benefit of using biologically relevant graphs in this context. Our analysis finds that dependencies can be captured almost as well at random which suggests that, in terms of gene expression levels, the relevant information about the state of the cell is spread across many genes.

Submitted 27 December, 2019; v1 submitted 21 October, 2019; originally announced October 2019.

Comments: Accepted to the Machine Learning in Computational Biology (MLCB) meeting 2019. 7 pages. 4 figures. arXiv admin note: substantial text overlap with arXiv:1905.02295

arXiv:1910.09570

Icentia11K: An Unsupervised Representation Learning Dataset for Arrhythmia Subtype Discovery

Authors: Shawn Tan, Guillaume Androz, Ahmad Chamseddine, Pierre Fecteau, Aaron Courville, Yoshua Bengio, Joseph Paul Cohen

Abstract: We release the largest public ECG dataset of continuous raw signals for representation learning containing 11 thousand patients and 2 billion labelled beats. Our goal is to enable semi-supervised ECG models to be made as well as to discover unknown subtypes of arrhythmia and anomalous ECG signal events. To this end, we propose an unsupervised representation learning task, evaluated in a semi-supervised fashion. We provide a set of baselines for different feature extractors that can be built upon. Additionally, we perform qualitative evaluations on results from PCA embeddings, where we identify some clustering of known subtypes indicating the potential for representation learning in arrhythmia sub-type discovery.

Submitted 21 October, 2019; originally announced October 2019.

Comments: Under Review

arXiv:1910.08636

The TCGA Meta-Dataset Clinical Benchmark

Authors: Mandana Samiei, Tobias Würfl, Tristan Deleu, Martin Weiss, Francis Dutil, Thomas Fevens, Geneviève Boucher, Sebastien Lemieux, Joseph Paul Cohen

Abstract: Machine learning is bringing a paradigm shift to healthcare by changing the process of disease diagnosis and prognosis in clinics and hospitals. This development equips doctors and medical staff with tools to evaluate their hypotheses and hence make more precise decisions. Although most current research in the literature seeks to develop techniques and methods for predicting one particular clinical outcome, this approach is far from the reality of clinical decision making in which you have to consider several factors simultaneously. In addition, it is difficult to follow the recent progress concretely as there is a lack of consistency in benchmark datasets and task definitions in the field of Genomics. To address the aforementioned issues, we provide a clinical Meta-Dataset derived from the publicly available data hub called The Cancer Genome Atlas Program (TCGA) that contains 174 tasks. We believe those tasks could be good proxy tasks to develop methods which can work on a few samples of gene expression data. Also, learning to predict multiple clinical variables using gene-expression data is an important task due to the variety of phenotypes in clinical problems and lack of samples for some of the rare variables. The defined tasks cover a wide range of clinical problems including predicting tumor tissue site, white cell count, histological type, family history of cancer, gender, and many others which we explain later in the paper. Each task represents an independent dataset.
We use regression and neural network baselines for all the tasks using only 150 samples and compare their performance.

Submitted 18 October, 2019; originally announced October 2019.

Comments: 5 Pages, Submitted to MLCB 2019

arXiv:1910.07655

Deep Semantic Segmentation of Natural and Medical Images: A Review

Authors: Saeid Asgari Taghanaki, Kumar Abhishek, Joseph Paul Cohen, Julien Cohen-Adad, Ghassan Hamarneh

Abstract: The semantic image segmentation task consists of classifying each pixel of an image into an instance, where each instance corresponds to a class. This task is a part of the concept of scene understanding or better explaining the global context of an image. In the medical image analysis domain, image segmentation can be used for image-guided interventions, radiotherapy, or improved radiological diagnostics. In this review, we categorize the leading deep learning-based medical and non-medical image segmentation solutions into six main groups of deep architectural, data synthesis-based, loss function-based, sequenced models, weakly supervised, and multi-task methods and provide a comprehensive review of the contributions in each of these groups. Further, for each group, we analyze each variant of these groups and discuss the limitations of the current approaches and present potential future research directions for semantic image segmentation.

Submitted 30 March, 2024; v1 submitted 16 October, 2019; originally announced October 2019.

Comments: 45 pages, 16 figures. Accepted for publication in Springer Artificial Intelligence Review

arXiv:1910.00199

Saliency is a Possible Red Herring When Diagnosing Poor Generalization

Authors: Joseph D. Viviano, Becks Simpson, Francis Dutil, Yoshua Bengio, Joseph Paul Cohen

Abstract: Poor generalization is one symptom of models that learn to predict target variables using spuriously-correlated image features present only in the training distribution instead of the true image features that denote a class. It is often thought that this can be diagnosed visually using attribution (aka saliency) maps. We study if this assumption is correct. In some prediction tasks, such as for medical images, one may have some images with masks drawn by a human expert, indicating a region of the image containing relevant information to make the prediction. We study multiple methods that take advantage of such auxiliary labels, by training networks to ignore distracting features which may be found outside of the region of interest. This mask information is only used during training and has an impact on generalization accuracy depending on the severity of the shift between the training and test distributions. Surprisingly, while these methods improve generalization performance in the presence of a covariate shift, there is no strong correspondence between the correction of attribution towards the features a human expert has labelled as important and generalization performance. These results suggest that the root cause of poor generalization may not always be spatially defined, and raise questions about the utility of masks as "attribution priors" as well as saliency maps for explainable predictions.

Submitted 10 February, 2021; v1 submitted 1 October, 2019; originally announced October 2019.

Comments: 25 pages, 27 figures, 5 tables, code in paper (https://github.com/josephdviviano/saliency-red-herring). Published at International Conference on Learning Representations (ICLR) 2021. Previously titled "Underwhelming Generalization Improvements from Controlling Feature Attribution"

arXiv:1909.11140

doi 10.1093/mnras/stz2742

Lick Observatory Supernova Search Follow-Up Program: Photometry Data Release of 93 Type Ia Supernovae

Authors: Benjamin E. Stahl, WeiKang Zheng, Thomas de Jaeger, Alexei V. Filippenko, Andrew Bigley, Kyle Blanchard, Peter K. Blanchard, Thomas G. Brink, Samantha K. Cargill, Chadwick Casper, Sanyum Channa, Byung Yun Choi, Nick Choksi, Jason Chu, Kelsey I. Clubb, Daniel P. Cohen, Michael Ellison, Edward Falcon, Pegah Fazeli, Kiera Fuller, Mohan Ganeshalingam, Elinor L. Gates, Carolina Gould, Goni Halevi, Kevin T. Hayakawa, et al. (30 additional authors not shown)

Abstract: We present BVRI and unfiltered light curves of 93 Type Ia supernovae (SNe Ia) from the Lick Observatory Supernova Search (LOSS) follow-up program conducted between 2005 and 2018. Our sample consists of 78 spectroscopically normal SNe Ia, with the remainder divided between distinct subclasses (three SN 1991bg-like, three SN 1991T-like, four SNe Iax, two peculiar, and three super-Chandrasekhar events), and has a median redshift of 0.0192. The SNe in our sample have a median coverage of 16 photometric epochs at a cadence of 5.4 days, and the median first observed epoch is ~4.6 days before maximum B-band light. We describe how the SNe in our sample are discovered, observed, and processed, and we compare the results from our newly developed automated photometry pipeline to those from the previous processing pipeline used by LOSS. After investigating potential biases, we derive a final systematic uncertainty of 0.03 mag in BVRI for our dataset. We perform an analysis of our light curves with particular focus on using template fitting to measure the parameters that are useful in standardising SNe Ia as distance indicators. All of the data are available to the community, and we encourage future studies to incorporate our light curves in their analyses.

Submitted 24 September, 2019; originally announced September 2019.

Comments: 29 pages, 13 figures, accepted for publication in MNRAS

Showing 1–50 of 106 results for author: Cohen, P