
Self Distillation via Iterative Constructive Perturbations

Authors: Maheak Dave, Aniket Kumar Singh, Aryan Pareek, Harshita Jha, Debasis Chaudhuri, Manish Pratap Singh

Abstract: Deep neural networks have achieved remarkable results across various domains; however, balancing performance and generalization remains a challenge when training these networks. In this paper, we propose a novel framework that uses a cyclic optimization strategy to concurrently optimize the model and its input data for better training, rethinking the traditional training paradigm. Central to our approach is Iterative Constructive Perturbation (ICP), which leverages the model's loss to iteratively perturb the input, progressively constructing an enhanced representation over some refinement steps. This ICP input is then fed back into the model to produce improved intermediate features, which serve as a target in a self-distillation framework against the original features. By alternately adapting the model's parameters to the data and the data to the model, our method effectively addresses the gap between fitting and generalization, leading to enhanced performance. Extensive experiments demonstrate that our approach not only mitigates common performance bottlenecks in neural networks but also yields significant improvements across training variations.

Submitted 20 May, 2025; originally announced May 2025.
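
The perturbation step described in the abstract above (using the model's loss to iteratively refine the input) can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed linear model, the squared-error loss, the step size, and the number of steps are all assumptions chosen so the example runs without any dependencies, and the self-distillation stage on intermediate features is not shown.

```python
# Illustrative sketch only: a fixed linear model with squared-error loss.
# The model, step size, and step count are assumptions, not the paper's setup.

def loss(w, x, y):
    """Squared error of the linear prediction w.x against the target y."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return (pred - y) ** 2

def loss_grad_wrt_input(w, x, y):
    """Gradient of the squared error with respect to the input x."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return [2.0 * (pred - y) * wi for wi in w]

def constructive_perturbation(w, x, y, steps=5, step_size=0.05):
    """Move x against the loss gradient for a few steps; the model stays fixed."""
    x = list(x)
    for _ in range(steps):
        grad = loss_grad_wrt_input(w, x, y)
        x = [xi - step_size * gi for xi, gi in zip(x, grad)]
    return x

w = [0.5, -1.0, 2.0]    # hypothetical model weights
x0 = [1.0, 2.0, 0.5]    # hypothetical original input
y = 1.0                 # hypothetical target
x_icp = constructive_perturbation(w, x0, y)
print("loss before:", loss(w, x0, y))
print("loss after: ", loss(w, x_icp, y))
```

The perturbed input lowers the loss of the unchanged model, which is the sense in which the perturbation is "constructive" rather than adversarial.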

arXiv:2505.00325 [pdf, other]

doi 10.1145/3534678.3539179

CognitionNet: A Collaborative Neural Network for Play Style Discovery in Online Skill Gaming Platform

Authors: Rukma Talwadker, Surajit Chakrabarty, Aditya Pareek, Tridib Mukherjee, Deepak Saini

Abstract: Games are one of the safest sources of realizing self-esteem and relaxation at the same time. An online gaming platform typically has massive data coming in, e.g., in-game actions, player moves, clickstreams, transactions, etc. It is rather interesting, as something as simple as data on gaming moves can help create a psychological imprint of the user at that moment, based on her impulsive reactions and response to a situation in the game. Mining this knowledge can: (a) immediately help better explain observed and predicted player behavior; and (b) consequently propel deeper understanding towards players' experience, growth and protection. To this effect, we focus on discovery of the "game behaviours" as micro-patterns formed by continuous sequences of games and the persistent "play styles" of the players as a sequence of such sequences on an online skill gaming platform for Rummy. We propose a two-stage deep neural network, CognitionNet. The first stage focuses on mining game behaviours as cluster representations in a latent space while the second aggregates over these micro-patterns to discover play styles via a supervised classification objective around player engagement. The dual objective allows CognitionNet to reveal several player-psychology-inspired decision-making patterns and tactics. To our knowledge, this is the first and one-of-its-kind research to fully automate the discovery of: (i) player psychology and game tactics from telemetry data; and (ii) relevant diagnostic explanations to players' engagement predictions.
The collaborative training of the two networks with differential input dimensions is enabled using a novel formulation of "bridge loss". The network plays a pivotal role in obtaining homogeneous and consistent play style definitions and significantly outperforms the SOTA baselines wherever applicable.

Submitted 1 May, 2025; originally announced May 2025.

arXiv:2504.21383 [pdf, other]

doi 10.1145/3701716.3715224

FAST-Q: Fast-track Exploration with Adversarially Balanced State Representations for Counterfactual Action Estimation in Offline Reinforcement Learning

Authors: Pulkit Agrawal, Rukma Talwadker, Aditya Pareek, Tridib Mukherjee

Abstract: Recent advancements in state-of-the-art (SOTA) offline reinforcement learning (RL) have primarily focused on addressing function approximation errors, which contribute to the overestimation of Q-values for out-of-distribution actions, a challenge that static datasets exacerbate. However, high-stakes applications such as recommendation systems in online gaming introduce further complexities due to player's psychology (intent) driven by gameplay experiences and the inherent volatility on the platform. These factors create highly sparse, partially overlapping state spaces across policies, further influenced by the experiment path selection logic which biases state spaces towards specific policies. Current SOTA methods constrain learning from such offline data by clipping known counterfactual actions as out-of-distribution due to poor generalization across unobserved states, further aggravating conservative Q-learning and necessitating more online exploration. FAST-Q introduces a novel approach that (1) leverages Gradient Reversal Learning to construct balanced state representations, regularizing the policy-specific bias between the player's state and action thereby enabling counterfactual estimation; (2) supports offline counterfactual exploration in parallel with static data exploitation; and (3) proposes a Q-value decomposition strategy for multi-objective optimization, facilitating explainable recommendations over short and long-term objectives.
These innovations demonstrate the superiority of FAST-Q over prior SOTA approaches, with at least a 0.15 percent increase in player returns, a 2 percent improvement in lifetime value (LTV), a 0.4 percent enhancement in recommendation-driven engagement, a 2 percent improvement in the player's platform dwell time, and an impressive 10 percent reduction in the costs associated with the recommendation, on our volatile gaming platform.

Submitted 30 April, 2025; originally announced April 2025.

arXiv:2504.03777 [pdf, other]

doi 10.1145/3637528.3671657

Explainable and Interpretable Forecasts on Non-Smooth Multivariate Time Series for Responsible Gameplay

Authors: Hussain Jagirdar, Rukma Talwadker, Aditya Pareek, Pulkit Agrawal, Tridib Mukherjee

Abstract: Multi-variate Time Series (MTS) forecasting has made large strides (with very negligible errors) through recent advancements in neural networks, e.g., Transformers. However, in critical situations like predicting gaming overindulgence that affects one's mental well-being, an accurate forecast without contributing evidence (explanation) is irrelevant. Hence, it becomes important that the forecasts are Interpretable - intermediate representation of the forecasted trajectory is comprehensible; as well as Explainable - attentive input features and events are accessible for a personalized and timely intervention of players at risk. While the contributing state of the art research on interpretability primarily focuses on temporally-smooth single-process driven time series data, our online multi-player gameplay data demonstrates intractable temporal randomness due to intrinsic orthogonality between player's game outcome and their intent to engage further. We introduce a novel deep Actionable Forecasting Network (AFN), which addresses the inter-dependent challenges associated with three exclusive objectives - 1) forecasting accuracy; 2) smooth comprehensible trajectory and 3) explanations via multi-dimensional input features while tackling the challenges introduced by our non-smooth temporal data, together in one single solution.
AFN establishes a new benchmark via: (i) achieving 25% improvement on the MSE of the forecasts on player data in comparison to the SOM-VAE based SOTA networks; (ii) attributing unfavourable progression of a player's time series to a specific future time step(s), with the premise of eliminating near-future overindulgent player volume by over 18% with player-specific actionable input feature(s) and (iii) proactively detecting over 23% (100% jump from SOTA) of the to-be overindulgent players, on average 4 weeks in advance.

Submitted 3 April, 2025; originally announced April 2025.

arXiv:2411.18602 [pdf, other]

Evaluating and Improving the Effectiveness of Synthetic Chest X-Rays for Medical Image Analysis

Authors: Eva Prakash, Jeya Maria Jose Valanarasu, Zhihong Chen, Eduardo Pontes Reis, Andrew Johnston, Anuj Pareek, Christian Bluethgen, Sergios Gatidis, Cameron Olsen, Akshay Chaudhari, Andrew Ng, Curtis Langlotz

Abstract: Purpose: To explore best-practice approaches for generating synthetic chest X-ray images and augmenting medical imaging datasets to optimize the performance of deep learning models in downstream tasks like classification and segmentation. Materials and Methods: We utilized a latent diffusion model to condition the generation of synthetic chest X-rays on text prompts and/or segmentation masks. We explored methods like using a proxy model and using radiologist feedback to improve the quality of synthetic data. These synthetic images were then generated from relevant disease information or geometrically transformed segmentation masks and added to ground truth training set images from the CheXpert, CANDID-PTX, SIIM, and RSNA Pneumonia datasets to measure improvements in classification and segmentation model performance on the test sets. F1 and Dice scores were used to evaluate classification and segmentation respectively. One-tailed t-tests with Bonferroni correction assessed the statistical significance of performance improvements with synthetic data. Results: Across all experiments, the synthetic data we generated resulted in a maximum mean classification F1 score improvement of 0.150453 (CI: 0.099108-0.201798; P=0.0031) compared to using only real data. For segmentation, the maximum Dice score improvement was 0.14575 (CI: 0.108267-0.183233; P=0.0064).
Conclusion: Best practices for generating synthetic chest X-ray images for downstream tasks include conditioning on single-disease labels or geometrically transformed segmentation masks, as well as potentially using proxy modeling for fine-tuning such generations.

Submitted 27 November, 2024; originally announced November 2024.
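
For readers unfamiliar with the two metrics named in the abstract above, the following sketch shows the standard definitions of the binary F1 score and the Dice coefficient. It is only an illustration: the abstract does not say how the authors aggregate over labels or images, and the handling of empty inputs (returning 0.0 or 1.0) is an assumption made here.

```python
def f1_score(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN) for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Assumption: report 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0

def dice_score(mask_a, mask_b):
    """Dice = 2*|A intersect B| / (|A| + |B|) for flattened 0/1 masks."""
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a == 1 and b == 1)
    total = sum(mask_a) + sum(mask_b)
    # Assumption: two empty masks count as a perfect match.
    return 2 * intersection / total if total else 1.0

# Hypothetical labels and masks, for illustration only.
labels_true = [1, 0, 1, 1, 0]
labels_pred = [1, 1, 1, 0, 0]
mask_true = [1, 1, 0, 0]
mask_pred = [1, 0, 0, 0]
print(f1_score(labels_true, labels_pred))
print(dice_score(mask_true, mask_pred))
```

On binary inputs the two formulas coincide (Dice over pixels equals F1 over pixel-level predictions); the names differ mainly by convention between classification and segmentation.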

arXiv:2407.03666 [pdf, ps, other]

Greedy on Preorder is Linear for Preorder Initial Tree

Authors: Akash Pareek

Abstract: The (preorder) traversal conjecture states that starting with an initial tree, the cost to search a sequence $S=(s_1,s_2,\dots,s_n) \in [n]^n$ in a binary search tree (BST) algorithm is $O(n)$, where $S$ is obtained by a preorder traversal of some BST. The sequence $S$ is called a preorder sequence. For Splay trees (candidate for dynamic optimality conjecture), the preorder traversal conjecture holds only when the initial tree is empty (Levy and Tarjan, WADS 2019). For GREEDY (candidate for dynamic optimality conjecture), the cost on preorder sequences was known to be $n2^{\alpha(n)^{O(1)}}$ (Chalermsook et al., FOCS 2015), which was recently improved to $O(n2^{\alpha(n)})$ (Chalermsook et al., SODA 2023), where $\alpha(n)$ is the inverse Ackermann function of $n$. For a special case when the initial tree is flat, GREEDY is known to satisfy the traversal conjecture, i.e., $O(n)$ (Chalermsook et al., FOCS 2015). In this paper, we show that for every preorder sequence $S$, there exists an initial tree called the preorder initial tree for which GREEDY satisfies the preorder traversal conjecture.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 10 pages, 5 figures
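
As a small illustration of the term "preorder sequence" used in the abstract above, the sketch below builds an ordinary (non-self-adjusting) BST and reads off its preorder traversal. It does not implement GREEDY or Splay. The helper names are hypothetical, and distinct keys are assumed. The check relies on the fact that inserting the keys of a preorder sequence one by one into an empty BST rebuilds the tree that produced it.

```python
class Node:
    """Plain BST node; no rebalancing or self-adjustment."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Standard BST insertion (assumes distinct keys)."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def preorder(root):
    """Visit the root, then the left subtree, then the right subtree."""
    if root is None:
        return []
    return [root.key] + preorder(root.left) + preorder(root.right)

def is_preorder_sequence(seq):
    """True if seq is the preorder traversal of some BST on its keys."""
    root = None
    for key in seq:
        root = insert(root, key)
    return preorder(root) == list(seq)

print(is_preorder_sequence([4, 2, 1, 3, 6, 5, 7]))
print(is_preorder_sequence([2, 3, 1]))
```

The second sequence fails the check because, with root 2, the key 1 belongs to the left subtree and would have to appear before 3 in any preorder traversal.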

arXiv:2312.15426 [pdf, other]

The Group Access Bounds for Binary Search Trees

Authors: Parinya Chalermsook, Manoj Gupta, Wanchote Jiamjitrak, Akash Pareek, Sorrachai Yingchareonthawornchai

Abstract: The access lemma (Sleator and Tarjan, JACM 1985) is a property of binary search trees that implies interesting consequences such as static optimality, static finger, and working set property. However, there are known corollaries of the dynamic optimality that cannot be derived via the access lemma, such as the dynamic finger, and any $o(\log n)$-competitive ratio to the optimal BST where $n$ is the number of keys. In this paper, we introduce the group access bound that can be defined with respect to a reference group access tree. Group access bounds generalize the access lemma and imply properties that are far stronger than those implied by the access lemma. For each of the following results, there is a group access tree whose group access bound: (1) is $O(\sqrt{\log n})$-competitive to the optimal BST; (2) achieves the $k$-finger bound with an additive term of $O(m \log k \log \log n)$ (randomized) when the reference tree is an almost complete binary tree; (3) satisfies the unified bound with an additive term of $O(m \log \log n)$; (4) matches the unified bound with a time window $k$ with an additive term of $O(m \log k \log \log n)$ (randomized). Furthermore, we prove a simulation theorem: For every group access tree, there is an online BST algorithm that is $O(1)$-competitive with its group access bound. In particular, any new group access bound will automatically imply a new BST algorithm achieving the same bound.
Thereby, we obtain an improved $k$-finger bound (when the reference tree is an almost complete binary tree) and an improved unified bound with a time window $k$, and we match the best-known bound for the unified bound in the BST model. Since any dynamically optimal BST must achieve the group access bounds, we believe our results provide a new direction towards proving $o(\log n)$-competitiveness of Splay trees and Greedy.

Submitted 24 December, 2023; originally announced December 2023.

arXiv:2309.07430 [pdf, other]

doi 10.1038/s41591-024-02855-5

Adapted Large Language Models Can Outperform Medical Experts in Clinical Text Summarization

Authors: Dave Van Veen, Cara Van Uden, Louis Blankemeier, Jean-Benoit Delbrouck, Asad Aali, Christian Bluethgen, Anuj Pareek, Malgorzata Polacin, Eduardo Pontes Reis, Anna Seehofnerova, Nidhi Rohatgi, Poonam Hosamani, William Collins, Neera Ahuja, Curtis P. Langlotz, Jason Hom, Sergios Gatidis, John Pauly, Akshay S. Chaudhari

Abstract: Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP), their effectiveness on a diverse range of clinical summarization tasks remains unproven. In this study, we apply adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes, and doctor-patient dialogue. Quantitative assessments with syntactic, semantic, and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with ten physicians evaluates summary completeness, correctness, and conciseness; in a majority of cases, summaries from our best adapted LLMs are either equivalent (45%) or superior (36%) compared to summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care.

Submitted 11 April, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Comments: 27 pages, 19 figures

Journal ref: Nature Medicine, 2024

arXiv:2305.01146 [pdf, other]

RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models

Authors: Dave Van Veen, Cara Van Uden, Maayane Attias, Anuj Pareek, Christian Bluethgen, Malgorzata Polacin, Wah Chiu, Jean-Benoit Delbrouck, Juan Manuel Zambrano Chaves, Curtis P. Langlotz, Akshay S. Chaudhari, John Pauly

Abstract: We systematically investigate lightweight strategies to adapt large language models (LLMs) for the task of radiology report summarization (RRS). Specifically, we focus on domain adaptation via pretraining (on natural language, biomedical text, or clinical text) and via discrete prompting or parameter-efficient fine-tuning. Our results consistently achieve best performance by maximally adapting to the task via pretraining on clinical text and fine-tuning on RRS examples. Importantly, this method fine-tunes a mere 0.32% of parameters throughout the model, in contrast to end-to-end fine-tuning (100% of parameters). Additionally, we study the effect of in-context examples and out-of-distribution (OOD) training before concluding with a radiologist reader study and qualitative analysis. Our findings highlight the importance of domain adaptation in RRS and provide valuable insights toward developing effective natural language processing solutions for clinical tasks.

Submitted 20 July, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: 12 pages, 10 figures. Published in ACL BioNLP. Compared to v1, v2 includes minor edits and one additional figure in the appendix. Compared to v2, v3 includes a link to the project's GitHub repository

arXiv:2304.00487 [pdf, other]

The Effect of Counterfactuals on Reading Chest X-rays

Authors: Joseph Paul Cohen, Rupert Brooks, Sovann En, Evan Zucker, Anuj Pareek, Matthew Lungren, Akshay Chaudhari

Abstract: This study evaluates the effect of counterfactual explanations on the interpretation of chest X-rays. We conduct a reader study with two radiologists assessing 240 chest X-ray predictions to rate their confidence that the model's prediction is correct using a 5 point scale. Half of the predictions are false positives. Each prediction is explained twice, once using traditional attribution methods and once with a counterfactual explanation. The overall results indicate that counterfactual explanations allow a radiologist to have more confidence in true positive predictions compared to traditional approaches (0.15$\pm$0.95 with p=0.01) with only a small increase in false positive predictions (0.04$\pm$1.06 with p=0.57). We observe the specific prediction tasks of Mass and Atelectasis appear to benefit the most compared to other tasks.

Submitted 2 April, 2023; originally announced April 2023.

Comments: Abstract submitted to CVPR XAI4CV 2023 based on longer version: arXiv:2102.09475

arXiv:2211.04112 [pdf, other]

Improved Pattern-Avoidance Bounds for Greedy BSTs via Matrix Decomposition

Authors: Parinya Chalermsook, Manoj Gupta, Wanchote Jiamjitrak, Nidia Obscura Acosta, Akash Pareek, Sorrachai Yingchareonthawornchai

Abstract: Greedy BST (or simply Greedy) is an online self-adjusting binary search tree defined in the geometric view (Lucas, 1988; Munro, 2000; Demaine, Harmon, Iacono, Kane, Patrascu, SODA 2009). Along with Splay trees (Sleator, Tarjan 1985), Greedy is considered the most promising candidate for being dynamically optimal, i.e., starting with any initial tree, their access cost on any sequence is conjectured to be within an $O(1)$ factor of the offline optimal. However, in the past four decades, the question has remained elusive even for highly restricted input. In this paper, we prove new bounds on the cost of Greedy in the "pattern avoidance" regime. Our new results include: (1) The (preorder) traversal conjecture for Greedy holds up to a factor of $O(2^{\alpha(n)})$, improving upon the bound of $2^{\alpha(n)^{O(1)}}$ in (Chalermsook et al., FOCS 2015). This is the best known bound obtained by any online BST. (2) We settle the postorder traversal conjecture for Greedy. (3) The deque conjecture for Greedy holds up to a factor of $O(\alpha(n))$, improving upon the bound $2^{O(\alpha(n))}$ in (Chalermsook et al., WADS 2015). (4) The split conjecture holds for Greedy up to a factor of $O(2^{\alpha(n)})$. Key to all these results is to partition (based on the input structures) the execution log of Greedy into several simpler-to-analyze subsets for which classical forbidden submatrix bounds can be leveraged. Finally, we show the applicability of this technique to handle a class of increasingly complex pattern-avoiding input sequences, called $k$-increasing sequences.
As a bonus, we discover a new class of permutation matrices whose extremal bounds are polynomially bounded. This gives partial progress on an open question by Jacob Fox (2013).

Submitted 8 November, 2022; originally announced November 2022.

Comments: Accepted to SODA 2023

arXiv:2102.09475 [pdf, other]

Gifsplanation via Latent Shift: A Simple Autoencoder Approach to Counterfactual Generation for Chest X-rays

Authors: Joseph Paul Cohen, Rupert Brooks, Sovann En, Evan Zucker, Anuj Pareek, Matthew P. Lungren, Akshay Chaudhari

Abstract: Motivation: Traditional image attribution methods struggle to satisfactorily explain predictions of neural networks. Prediction explanation is important, especially in medical imaging, for avoiding the unintended consequences of deploying AI systems when false positive predictions can impact patient care. Thus, there is a pressing need to develop improved models for model explainability and introspection. Specific problem: A new approach is to transform input images to increase or decrease features which cause the prediction. However, current approaches are difficult to implement as they are monolithic or rely on GANs. These hurdles prevent wide adoption. Our approach: Given an arbitrary classifier, we propose a simple autoencoder and gradient update (Latent Shift) that can transform the latent representation of a specific input image to exaggerate or curtail the features used for prediction. We use this method to study chest X-ray classifiers and evaluate their performance. We conduct a reader study with two radiologists assessing 240 chest X-ray predictions to identify which ones are false positives (half are) using traditional attribution maps or our proposed method. Results: We found low overlap with ground truth pathology masks for models with reasonably high accuracy. However, the results from our reader study indicate that these models are generally looking at the correct features.
We also found that the Latent Shift explanation allows a user to have more confidence in true positive predictions compared to traditional approaches (0.15$\pm$0.95 in a 5 point scale with p=0.01) with only a small increase in false positive predictions (0.04$\pm$1.06 with p=0.57). Accompanying webpage: https://mlmed.org/gifsplanation Source code: https://github.com/mlmed/gifsplanation

Submitted 24 April, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: Full paper at MIDL2021

arXiv:2102.08660 [pdf, other]

doi 10.1145/3450439.3451876

CheXternal: Generalization of Deep Learning Models for Chest X-ray Interpretation to Photos of Chest X-rays and External Clinical Settings

Authors: Pranav Rajpurkar, Anirudh Joshi, Anuj Pareek, Andrew Y. Ng, Matthew P. Lungren

Abstract: Recent advances in training deep learning models have demonstrated the potential to provide accurate chest X-ray interpretation and increase access to radiology expertise. However, poor generalization due to data distribution shifts in clinical settings is a key barrier to implementation. In this study, we measured the diagnostic performance for 8 different chest X-ray models when applied to (1) smartphone photos of chest X-rays and (2) external datasets without any finetuning. All models were developed by different groups and submitted to the CheXpert challenge, and re-applied to test datasets without further tuning. We found that (1) on photos of chest X-rays, all 8 models experienced a statistically significant drop in task performance, but only 3 performed significantly worse than radiologists on average, and (2) on the external set, none of the models performed statistically significantly worse than radiologists, and five models performed statistically significantly better than radiologists. Our results demonstrate that some chest X-ray models, under clinically relevant distribution shifts, were comparable to radiologists while other models were not. Future work should investigate aspects of model training procedures and dataset collection that influence generalization in the presence of data distribution shifts.

Submitted 20 February, 2021; v1 submitted 17 February, 2021; originally announced February 2021.

Comments: Accepted to ACM Conference on Health, Inference, and Learning (ACM-CHIL) 2021. arXiv admin note: substantial text overlap with arXiv:2011.06129

arXiv:2011.06129 [pdf, other]

CheXphotogenic: Generalization of Deep Learning Models for Chest X-ray Interpretation to Photos of Chest X-rays

Authors: Pranav Rajpurkar, Anirudh Joshi, Anuj Pareek, Jeremy Irvin, Andrew Y. Ng, Matthew Lungren

Abstract: The use of smartphones to take photographs of chest x-rays represents an appealing solution for scaled deployment of deep learning models for chest x-ray interpretation. However, the performance of chest x-ray algorithms on photos of chest x-rays has not been thoroughly investigated. In this study, we measured the diagnostic performance for 8 different chest x-ray models when applied to photos of chest x-rays. All models were developed by different groups and submitted to the CheXpert challenge, and re-applied to smartphone photos of x-rays in the CheXphoto dataset without further tuning. We found that several models had a drop in performance when applied to photos of chest x-rays, but even with this drop, some models still performed comparably to radiologists. Further investigation could be directed towards understanding how different model training procedures may affect model generalization to photos of chest x-rays.

Submitted 11 November, 2020; originally announced November 2020.

Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract

arXiv:2007.06199 [pdf, other]

CheXphoto: 10,000+ Photos and Transformations of Chest X-rays for Benchmarking Deep Learning Robustness

Authors: Nick A. Phillips, Pranav Rajpurkar, Mark Sabini, Rayan Krishnan, Sharon Zhou, Anuj Pareek, Nguyet Minh Phu, Chris Wang, Mudit Jain, Nguyen Duong Du, Steven QH Truong, Andrew Y. Ng, Matthew P. Lungren

Abstract: Clinical deployment of deep learning algorithms for chest x-ray interpretation requires a solution that can integrate into the vast spectrum of clinical workflows across the world. An appealing approach to scaled deployment is to leverage the ubiquity of smartphones by capturing photos of x-rays to share with clinicians using messaging services like WhatsApp. However, the application of chest x-ray algorithms to photos of chest x-rays requires reliable classification in the presence of artifacts not typically encountered in digital x-rays used to train machine learning models. We introduce CheXphoto, a dataset of smartphone photos and synthetic photographic transformations of chest x-rays sampled from the CheXpert dataset. To generate CheXphoto we (1) automatically and manually captured photos of digital x-rays under different settings, and (2) generated synthetic transformations of digital x-rays targeted to make them look like photos of digital x-rays and x-ray films. We release this dataset as a resource for testing and improving the robustness of deep learning algorithms for automated chest x-ray interpretation on smartphone photos of chest x-rays.

Submitted 11 December, 2020; v1 submitted 13 July, 2020; originally announced July 2020.

arXiv:2004.09167

CheXbert: Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

Authors: Akshay Smit, Saahil Jain, Pranav Rajpurkar, Anuj Pareek, Andrew Y. Ng, Matthew P. Lungren

Abstract: The extraction of labels from radiology text reports enables large-scale training of medical imaging models. Existing approaches to report labeling typically rely either on sophisticated feature engineering based on medical domain knowledge or manual annotations by experts. In this work, we introduce a BERT-based approach to medical image report labeling that exploits both the scale of available rule-based systems and the quality of expert annotations. We demonstrate superior performance of a biomedically pretrained BERT model first trained on annotations of a rule-based labeler and then finetuned on a small set of expert annotations augmented with automated backtranslation. We find that our final model, CheXbert, is able to outperform the previous best rules-based labeler with statistical significance, setting a new SOTA for report labeling on one of the largest datasets of chest x-rays.

Submitted 18 October, 2020; v1 submitted 20 April, 2020; originally announced April 2020.

Comments: Accepted to EMNLP 2020

arXiv:2002.11379

CheXpedition: Investigating Generalization Challenges for Translation of Chest X-Ray Algorithms to the Clinical Setting

Authors: Pranav Rajpurkar, Anirudh Joshi, Anuj Pareek, Phil Chen, Amirhossein Kiani, Jeremy Irvin, Andrew Y. Ng, Matthew P. Lungren

Abstract: Although there have been several recent advances in the application of deep learning algorithms to chest x-ray interpretation, we identify three major challenges for the translation of chest x-ray algorithms to the clinical setting. We examine the performance of the top 10 performing models on the CheXpert challenge leaderboard on three tasks: (1) TB detection, (2) pathology detection on photos of chest x-rays, and (3) pathology detection on data from an external institution. First, we find that the top 10 chest x-ray models on the CheXpert competition achieve an average AUC of 0.851 on the task of detecting TB on two public TB datasets without fine-tuning or including the TB labels in training data. Second, we find that the average performance of the models on photos of x-rays (AUC = 0.916) is similar to their performance on the original chest x-ray images (AUC = 0.924). Third, we find that the models tested on an external dataset either perform comparably to or exceed the average performance of radiologists. We believe that our investigation will inform rapid translation of deep learning algorithms to safe and effective clinical decision support tools that can be validated prospectively with large impact studies and clinical trials.

Submitted 11 March, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

Comments: Accepted as workshop paper at ACM Conference on Health, Inference, and Learning (CHIL) 2020

arXiv:1906.07337

Measuring Bias in Contextualized Word Representations

Authors: Keita Kurita, Nidhi Vyas, Ayush Pareek, Alan W Black, Yulia Tsvetkov

Abstract: Contextual word embeddings such as BERT have achieved state-of-the-art performance in numerous NLP tasks. Since they are optimized to capture the statistical properties of training data, they tend to pick up on and amplify social stereotypes present in the data as well. In this study, we (1) propose a template-based method to quantify bias in BERT; (2) show that this method obtains more consistent results in capturing social biases than the traditional cosine-based method; and (3) conduct a case study, evaluating gender bias in a downstream task of Gender Pronoun Resolution. Although our case study focuses on gender bias, the proposed technique is generalizable to unveiling other biases, including in multiclass settings, such as racial and religious biases.

Submitted 17 June, 2019; originally announced June 2019.

Comments: 1st ACL Workshop on Gender Bias for Natural Language Processing 2019

arXiv:1706.06681

Graph-based Neural Multi-Document Summarization

Authors: Michihiro Yasunaga, Rui Zhang, Kshitijh Meelu, Ayush Pareek, Krishnan Srinivasan, Dragomir Radev

Abstract: We propose a neural multi-document summarization (MDS) system that incorporates sentence relation graphs. We employ a Graph Convolutional Network (GCN) on the relation graphs, with sentence embeddings obtained from Recurrent Neural Networks as input node features. Through multiple layer-wise propagation, the GCN generates high-level hidden sentence features for salience estimation. We then use a greedy heuristic to extract salient sentences while avoiding redundancy. In our experiments on DUC 2004, we consider three types of sentence relation graphs and demonstrate the advantage of combining sentence relations in graphs with the representation power of deep neural networks. Our model improves upon traditional graph-based extractive approaches and the vanilla GRU sequence model with no graph, and it achieves competitive results against other state-of-the-art multi-document summarization systems.

Submitted 23 August, 2017; v1 submitted 20 June, 2017; originally announced June 2017.

Comments: In CoNLL 2017

arXiv:1208.1723

RMR-Efficient Randomized Abortable Mutual Exclusion

Authors: Abhijeet Pareek, Philipp Woelfel

Abstract: Recent research on mutual exclusion for shared-memory systems has focused on "local spin" algorithms. Performance is measured using the "remote memory references" (RMRs) metric. As is common in recent literature, we consider a standard asynchronous shared memory model with N processes, which allows atomic read, write and compare-and-swap (short: CAS) operations. In such a model, the asymptotically tight upper and lower bound on the number of RMRs per passage through the Critical Section is Theta(log N) for the optimal deterministic algorithms (see Yang and Anderson, 1995, and Attiya, Hendler and Woelfel, 2008). Recently, several randomized algorithms have been devised that break the Omega(log N) barrier and need only o(log N) RMRs per passage in expectation (see Hendler and Woelfel, 2010, Hendler and Woelfel, 2011, and Bender and Gilbert, 2011). In this paper we present the first randomized "abortable" mutual exclusion algorithm that achieves a sub-logarithmic expected RMR complexity. More precisely, against a weak adversary (which can make scheduling decisions based on the entire past history, but not the latest coin-flips of each process) every process needs an expected number of O(log N / log log N) RMRs to enter and exit the critical section. If a process receives an abort-signal, it can abort an attempt to enter the critical section within a finite number of its own steps and by incurring O(log N / log log N) RMRs.

Submitted 8 August, 2012; originally announced August 2012.

Comments: Extended abstract will appear at DISC 2012

Showing 1–20 of 20 results for author: Pareek, A