-
A Simulator Dataset to Support the Study of Impaired Driving
Authors:
John Gideon,
Kimimasa Tamura,
Emily Sumner,
Laporsha Dees,
Patricio Reyes Gomez,
Bassamul Haq,
Todd Rowell,
Avinash Balachandran,
Simon Stent,
Guy Rosman
Abstract:
Despite recent advances in automated driving technology, impaired driving continues to incur a high cost to society. In this paper, we present a driving dataset designed to support the study of two common forms of driver impairment: alcohol intoxication and cognitive distraction. Our dataset spans 23.7 hours of simulated urban driving, with 52 human subjects under normal and impaired conditions, a…
▽ More
Despite recent advances in automated driving technology, impaired driving continues to incur a high cost to society. In this paper, we present a driving dataset designed to support the study of two common forms of driver impairment: alcohol intoxication and cognitive distraction. Our dataset spans 23.7 hours of simulated urban driving, with 52 human subjects under normal and impaired conditions, and includes both vehicle data (ground truth perception, vehicle pose, controls) and driver-facing data (gaze, audio, surveys). It supports analysis of changes in driver behavior due to alcohol intoxication (0.10\% blood alcohol content), two forms of cognitive distraction (audio n-back and sentence parsing tasks), and combinations thereof, as well as responses to a set of eight controlled road hazards, such as vehicle cut-ins. The dataset will be made available at https://toyotaresearchinstitute.github.io/IDD/.
△ Less
Submitted 15 April, 2025;
originally announced July 2025.
-
PuzzleWorld: A Benchmark for Multimodal, Open-Ended Reasoning in Puzzlehunts
Authors:
Hengzhi Li,
Brendon Jiang,
Alexander Naehu,
Regan Song,
Justin Zhang,
Megan Tjandrasuwita,
Chanakya Ekbote,
Steven-Shine Chen,
Adithya Balachandran,
Wei Dai,
Rebecca Chang,
Paul Pu Liang
Abstract:
Puzzlehunts are a genre of complex, multi-step puzzles lacking well-defined problem definitions. In contrast to conventional reasoning benchmarks consisting of tasks with clear instructions, puzzlehunts require models to discover the underlying problem structure from multimodal evidence and iterative reasoning, mirroring real-world domains such as scientific discovery, exploratory data analysis, o…
▽ More
Puzzlehunts are a genre of complex, multi-step puzzles lacking well-defined problem definitions. In contrast to conventional reasoning benchmarks consisting of tasks with clear instructions, puzzlehunts require models to discover the underlying problem structure from multimodal evidence and iterative reasoning, mirroring real-world domains such as scientific discovery, exploratory data analysis, or investigative problem-solving. Despite recent progress in foundation models, their performance on such open-ended settings remains largely untested. In this paper, we introduce PuzzleWorld, a large-scale benchmark of 667 puzzlehunt-style problems designed to assess step-by-step, open-ended, and creative multimodal reasoning. Each puzzle is annotated with the final solution, detailed reasoning traces, and cognitive skill labels, enabling holistic benchmarking and fine-grained diagnostic analysis. Most state-of-the-art models achieve only 1-2% final answer accuracy, with the best model solving only 14% of puzzles and reaching 40% stepwise accuracy. To demonstrate the value of our reasoning annotations, we show that fine-tuning a small model on reasoning traces improves stepwise reasoning from 4% to 11%, while training on final answers alone degrades performance to near zero. Our error analysis reveals that current models exhibit myopic reasoning, are bottlenecked by the limitations of language-based inference, and lack sketching capabilities crucial for visual and spatial reasoning. We release PuzzleWorld at https://github.com/MIT-MI/PuzzleWorld to support future work on building more general, open-ended, and creative reasoning systems.
△ Less
Submitted 6 June, 2025;
originally announced June 2025.
-
TruthLens: Explainable DeepFake Detection for Face Manipulated and Fully Synthetic Data
Authors:
Rohit Kundu,
Athula Balachandran,
Amit K. Roy-Chowdhury
Abstract:
Detecting DeepFakes has become a crucial research area as the widespread use of AI image generators enables the effortless creation of face-manipulated and fully synthetic content, yet existing methods are often limited to binary classification (real vs. fake) and lack interpretability. To address these challenges, we propose TruthLens, a novel and highly generalizable framework for DeepFake detec…
▽ More
Detecting DeepFakes has become a crucial research area as the widespread use of AI image generators enables the effortless creation of face-manipulated and fully synthetic content, yet existing methods are often limited to binary classification (real vs. fake) and lack interpretability. To address these challenges, we propose TruthLens, a novel and highly generalizable framework for DeepFake detection that not only determines whether an image is real or fake but also provides detailed textual reasoning for its predictions. Unlike traditional methods, TruthLens effectively handles both face-manipulated DeepFakes and fully AI-generated content while addressing fine-grained queries such as "Does the eyes/nose/mouth look real or fake?"
The architecture of TruthLens combines the global contextual understanding of multimodal large language models like PaliGemma2 with the localized feature extraction capabilities of vision-only models like DINOv2. This hybrid design leverages the complementary strengths of both models, enabling robust detection of subtle manipulations while maintaining interpretability. Extensive experiments on diverse datasets demonstrate that TruthLens outperforms state-of-the-art methods in detection accuracy (by 2-14%) and explainability, in both in-domain and cross-data settings, generalizing effectively across traditional and emerging manipulation techniques.
△ Less
Submitted 20 March, 2025;
originally announced March 2025.
-
Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content
Authors:
Rohit Kundu,
Hao Xiong,
Vishal Mohanty,
Athula Balachandran,
Amit K. Roy-Chowdhury
Abstract:
Existing DeepFake detection techniques primarily focus on facial manipulations, such as face-swapping or lip-syncing. However, advancements in text-to-video (T2V) and image-to-video (I2V) generative models now allow fully AI-generated synthetic content and seamless background alterations, challenging face-centric detection methods and demanding more versatile approaches.
To address this, we intr…
▽ More
Existing DeepFake detection techniques primarily focus on facial manipulations, such as face-swapping or lip-syncing. However, advancements in text-to-video (T2V) and image-to-video (I2V) generative models now allow fully AI-generated synthetic content and seamless background alterations, challenging face-centric detection methods and demanding more versatile approaches.
To address this, we introduce the \underline{U}niversal \underline{N}etwork for \underline{I}dentifying \underline{T}ampered and synth\underline{E}tic videos (\texttt{UNITE}) model, which, unlike traditional detectors, captures full-frame manipulations. \texttt{UNITE} extends detection capabilities to scenarios without faces, non-human subjects, and complex background modifications. It leverages a transformer-based architecture that processes domain-agnostic features extracted from videos via the SigLIP-So400M foundation model. Given limited datasets encompassing both facial/background alterations and T2V/I2V content, we integrate task-irrelevant data alongside standard DeepFake datasets in training. We further mitigate the model's tendency to over-focus on faces by incorporating an attention-diversity (AD) loss, which promotes diverse spatial attention across video frames. Combining AD loss with cross-entropy improves detection performance across varied contexts. Comparative evaluations demonstrate that \texttt{UNITE} outperforms state-of-the-art detectors on datasets (in cross-data settings) featuring face/background manipulations and fully synthetic T2V/I2V videos, showcasing its adaptability and generalizable detection capabilities.
△ Less
Submitted 16 December, 2024;
originally announced December 2024.
-
Towards an Autonomous Test Driver: High-Performance Driver Modeling via Reinforcement Learning
Authors:
John Subosits,
Jenna Lee,
Shawn Manuel,
Paul Tylkin,
Avinash Balachandran
Abstract:
Success in racing requires a unique combination of vehicle setup, understanding of the racetrack, and human expertise. Since building and testing many different vehicle configurations in the real world is prohibitively expensive, high-fidelity simulation is a critical part of racecar development. However, testing different vehicle configurations still requires expert human input in order to evalua…
▽ More
Success in racing requires a unique combination of vehicle setup, understanding of the racetrack, and human expertise. Since building and testing many different vehicle configurations in the real world is prohibitively expensive, high-fidelity simulation is a critical part of racecar development. However, testing different vehicle configurations still requires expert human input in order to evaluate their performance on different racetracks. In this work, we present the first steps towards an autonomous test driver, trained using deep reinforcement learning, capable of evaluating changes in vehicle setup on racing performance while driving at the level of the best human drivers. In addition, the autonomous driver model can be tuned to exhibit more human-like behavioral patterns by incorporating imitation learning into the RL training process. This extension permits the possibility of driver-specific vehicle setup optimization.
△ Less
Submitted 4 December, 2024;
originally announced December 2024.
-
Can DeepFake Speech be Reliably Detected?
Authors:
Hongbin Liu,
Youzheng Chen,
Arun Narayanan,
Athula Balachandran,
Pedro J. Moreno,
Lun Wang
Abstract:
Recent advances in text-to-speech (TTS) systems, particularly those with voice cloning capabilities, have made voice impersonation readily accessible, raising ethical and legal concerns due to potential misuse for malicious activities like misinformation campaigns and fraud. While synthetic speech detectors (SSDs) exist to combat this, they are vulnerable to ``test domain shift", exhibiting decrea…
▽ More
Recent advances in text-to-speech (TTS) systems, particularly those with voice cloning capabilities, have made voice impersonation readily accessible, raising ethical and legal concerns due to potential misuse for malicious activities like misinformation campaigns and fraud. While synthetic speech detectors (SSDs) exist to combat this, they are vulnerable to ``test domain shift", exhibiting decreased performance when audio is altered through transcoding, playback, or background noise. This vulnerability is further exacerbated by deliberate manipulation of synthetic speech aimed at deceiving detectors. This work presents the first systematic study of such active malicious attacks against state-of-the-art open-source SSDs. White-box attacks, black-box attacks, and their transferability are studied from both attack effectiveness and stealthiness, using both hardcoded metrics and human ratings. The results highlight the urgent need for more robust detection methods in the face of evolving adversarial threats.
△ Less
Submitted 9 October, 2024;
originally announced October 2024.
-
Computational Teaching for Driving via Multi-Task Imitation Learning
Authors:
Deepak Gopinath,
Xiongyi Cui,
Jonathan DeCastro,
Emily Sumner,
Jean Costa,
Hiroshi Yasuda,
Allison Morgan,
Laporsha Dees,
Sheryl Chau,
John Leonard,
Tiffany Chen,
Guy Rosman,
Avinash Balachandran
Abstract:
Learning motor skills for sports or performance driving is often done with professional instruction from expert human teachers, whose availability is limited. Our goal is to enable automated teaching via a learned model that interacts with the student similar to a human teacher. However, training such automated teaching systems is limited by the availability of high-quality annotated datasets of e…
▽ More
Learning motor skills for sports or performance driving is often done with professional instruction from expert human teachers, whose availability is limited. Our goal is to enable automated teaching via a learned model that interacts with the student similar to a human teacher. However, training such automated teaching systems is limited by the availability of high-quality annotated datasets of expert teacher and student interactions that are difficult to collect at scale. To address this data scarcity problem, we propose an approach for training a coaching system for complex motor tasks such as high performance driving via a Multi-Task Imitation Learning (MTIL) paradigm. MTIL allows our model to learn robust representations by utilizing self-supervised training signals from more readily available non-interactive datasets of humans performing the task of interest. We validate our approach with (1) a semi-synthetic dataset created from real human driving trajectories, (2) a professional track driving instruction dataset, (3) a track-racing driving simulator human-subject study, and (4) a system demonstration on an instrumented car at a race track. Our experiments show that the right set of auxiliary machine learning tasks improves performance in predicting teaching instructions. Moreover, in the human subjects study, students exposed to the instructions from our teaching system improve their ability to stay within track limits, and show favorable perception of the model's interaction with them, in terms of usefulness and satisfaction.
△ Less
Submitted 2 October, 2024;
originally announced October 2024.
-
Swarm Algorithms for Dynamic Task Allocation in Unknown Environments
Authors:
Adithya Balachandran,
Noble Harasha,
Nancy Lynch
Abstract:
Robot swarms, systems of many robots that operate in a distributed fashion, have many applications in areas such as search-and-rescue, natural disaster response, and self-assembly. Several of these applications can be abstracted to the general problem of task allocation in an environment, in which robots must assign themselves to and complete tasks. While several algorithms for task allocation hav…
▽ More
Robot swarms, systems of many robots that operate in a distributed fashion, have many applications in areas such as search-and-rescue, natural disaster response, and self-assembly. Several of these applications can be abstracted to the general problem of task allocation in an environment, in which robots must assign themselves to and complete tasks. While several algorithms for task allocation have been proposed, most of them assume either prior knowledge of task locations or a static set of tasks. Operating under a discrete general model where tasks dynamically appear in unknown locations, we present three new swarm algorithms for task allocation. We demonstrate that when tasks appear slowly, our variant of a distributed algorithm based on propagating task information completes tasks more efficiently than a Levy random walk algorithm, which is a strategy used by many organisms in nature to efficiently search an environment. We also propose a division of labor algorithm where some agents are using our algorithm based on propagating task information while the remaining agents are using the Levy random walk algorithm. Finally, we introduce a hybrid algorithm where each agent dynamically switches between using propagated task information and following a Levy random walk. We show that our division of labor and hybrid algorithms can perform better than both our algorithm based on propagated task information and the Levy walk algorithm, especially at low and medium task rates. When tasks appear fast, we observe the Levy random walk strategy performs as well or better when compared to these novel approaches. Our work demonstrates the relative performance of these algorithms on a variety of task rates and also provide insight into optimizing our algorithms based on environment parameters.
△ Less
Submitted 14 September, 2024;
originally announced September 2024.
-
Personalizing Driver Safety Interfaces via Driver Cognitive Factors Inference
Authors:
Emily S Sumner,
Jonathan DeCastro,
Jean Costa,
Deepak E Gopinath,
Everlyne Kimani,
Shabnam Hakimi,
Allison Morgan,
Andrew Best,
Hieu Nguyen,
Daniel J Brooks,
Bassam ul Haq,
Andrew Patrikalakis,
Hiroshi Yasuda,
Kate Sieck,
Avinash Balachandran,
Tiffany Chen,
Guy Rosman
Abstract:
Recent advances in AI and intelligent vehicle technology hold promise to revolutionize mobility and transportation, in the form of advanced driving assistance (ADAS) interfaces. Although it is widely recognized that certain cognitive factors, such as impulsivity and inhibitory control, are related to risky driving behavior, play a significant role in on-road risk-taking, existing systems fail to l…
▽ More
Recent advances in AI and intelligent vehicle technology hold promise to revolutionize mobility and transportation, in the form of advanced driving assistance (ADAS) interfaces. Although it is widely recognized that certain cognitive factors, such as impulsivity and inhibitory control, are related to risky driving behavior, play a significant role in on-road risk-taking, existing systems fail to leverage such factors. Varying levels of these cognitive factors could influence the effectiveness and acceptance of driver safety interfaces.
We demonstrate an approach for personalizing driver interaction via driver safety interfaces that are triggered based on a learned recurrent neural network. The network is trained from a population of human drivers to infer impulsivity and inhibitory control from recent driving behavior. Using a high-fidelity vehicle motion simulator, we demonstrate the ability to deduce these factors from driver behavior. We then use these inferred factors to make instantaneous determinations on whether or not to engage a driver safety interface. This interface aims to decrease a driver's speed during yellow lights and reduce their inclination to run through them.
△ Less
Submitted 8 February, 2024;
originally announced February 2024.
-
Can ChatGPT Play the Role of a Teaching Assistant in an Introductory Programming Course?
Authors:
Anishka,
Atharva Mehta,
Nipun Gupta,
Aarav Balachandran,
Dhruv Kumar,
Pankaj Jalote
Abstract:
The emergence of Large language models (LLMs) is expected to have a major impact on education. This paper explores the potential of using ChatGPT, an LLM, as a virtual Teaching Assistant (TA) in an Introductory Programming Course. We evaluate ChatGPT's capabilities by comparing its performance with that of human TAs in some of the important TA functions. The TA functions which we focus on include…
▽ More
The emergence of Large language models (LLMs) is expected to have a major impact on education. This paper explores the potential of using ChatGPT, an LLM, as a virtual Teaching Assistant (TA) in an Introductory Programming Course. We evaluate ChatGPT's capabilities by comparing its performance with that of human TAs in some of the important TA functions. The TA functions which we focus on include (1) grading student code submissions, and (2) providing feedback to undergraduate students in an introductory programming course. Firstly, we assess ChatGPT's proficiency in grading student code submissions using a given grading rubric and compare its performance with the grades assigned by human TAs. Secondly, we analyze the quality and relevance of the feedback provided by ChatGPT. This evaluation considers how well ChatGPT addresses mistakes and offers suggestions for improvement in student solutions from both code correctness and code quality perspectives. We conclude with a discussion on the implications of integrating ChatGPT into computing education for automated grading, personalized learning experiences, and instructional support.
△ Less
Submitted 22 January, 2024; v1 submitted 12 December, 2023;
originally announced December 2023.
-
Tamil-Llama: A New Tamil Language Model Based on Llama 2
Authors:
Abhinand Balachandran
Abstract:
Language modeling has witnessed remarkable advancements in recent years, with Large Language Models (LLMs) like ChatGPT setting unparalleled benchmarks in human-like text generation. However, a prevailing limitation is the underrepresentation of languages like Tamil in these cutting-edge models, leading to suboptimal performance in diverse linguistic contexts. This paper addresses this lacuna, enh…
▽ More
Language modeling has witnessed remarkable advancements in recent years, with Large Language Models (LLMs) like ChatGPT setting unparalleled benchmarks in human-like text generation. However, a prevailing limitation is the underrepresentation of languages like Tamil in these cutting-edge models, leading to suboptimal performance in diverse linguistic contexts. This paper addresses this lacuna, enhancing the open-source LLaMA model with an addition of 16,000 Tamil tokens, aiming to achieve superior text generation and comprehension in the Tamil language. We strategically employ the LoRA methodology for efficient model training on a comprehensive Tamil corpus, ensuring computational feasibility and model robustness. Moreover, we introduce a Tamil-translated version of the Alpaca dataset and a subset of the OpenOrca dataset tailored for instruction fine-tuning. Our results showcase significant performance improvements in Tamil text generation, with potential implications for the broader landscape of LLMs in Indian languages. We further underscore our commitment to open research by making our models, datasets, and code publicly accessible, fostering further innovations in language modeling.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
Hierarchical Grammar-Induced Geometry for Data-Efficient Molecular Property Prediction
Authors:
Minghao Guo,
Veronika Thost,
Samuel W Song,
Adithya Balachandran,
Payel Das,
Jie Chen,
Wojciech Matusik
Abstract:
The prediction of molecular properties is a crucial task in the field of material and drug discovery. The potential benefits of using deep learning techniques are reflected in the wealth of recent literature. Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation. In this work, w…
▽ More
The prediction of molecular properties is a crucial task in the field of material and drug discovery. The potential benefits of using deep learning techniques are reflected in the wealth of recent literature. Still, these techniques are faced with a common challenge in practice: Labeled data are limited by the cost of manual extraction from literature and laborious experimentation. In this work, we propose a data-efficient property predictor by utilizing a learnable hierarchical molecular grammar that can generate molecules from grammar production rules. Such a grammar induces an explicit geometry of the space of molecular graphs, which provides an informative prior on molecular structural similarity. The property prediction is performed using graph neural diffusion over the grammar-induced geometry. On both small and large datasets, our evaluation shows that this approach outperforms a wide spectrum of baselines, including supervised and pre-trained graph neural networks. We include a detailed ablation study and further analysis of our solution, showing its effectiveness in cases with extremely limited data. Code is available at https://github.com/gmh14/Geo-DEG.
△ Less
Submitted 4 September, 2023;
originally announced September 2023.
-
Autonomous Drifting with 3 Minutes of Data via Learned Tire Models
Authors:
Franck Djeumou,
Jonathan Y. M. Goh,
Ufuk Topcu,
Avinash Balachandran
Abstract:
Near the limits of adhesion, the forces generated by a tire are nonlinear and intricately coupled. Efficient and accurate modelling in this region could improve safety, especially in emergency situations where high forces are required. To this end, we propose a novel family of tire force models based on neural ordinary differential equations and a neural-ExpTanh parameterization. These models are…
▽ More
Near the limits of adhesion, the forces generated by a tire are nonlinear and intricately coupled. Efficient and accurate modelling in this region could improve safety, especially in emergency situations where high forces are required. To this end, we propose a novel family of tire force models based on neural ordinary differential equations and a neural-ExpTanh parameterization. These models are designed to satisfy physically insightful assumptions while also having sufficient fidelity to capture higher-order effects directly from vehicle state measurements. They are used as drop-in replacements for an analytical brush tire model in an existing nonlinear model predictive control framework. Experiments with a customized Toyota Supra show that scarce amounts of driving data -- less than three minutes -- is sufficient to achieve high-performance autonomous drifting on various trajectories with speeds up to 45mph. Comparisons with the benchmark model show a $4 \times$ improvement in tracking performance, smoother control inputs, and faster and more consistent computation time.
△ Less
Submitted 16 October, 2023; v1 submitted 9 June, 2023;
originally announced June 2023.
-
NashFormer: Leveraging Local Nash Equilibria for Semantically Diverse Trajectory Prediction
Authors:
Justin Lidard,
Oswin So,
Yanxia Zhang,
Jonathan DeCastro,
Xiongyi Cui,
Xin Huang,
Yen-Ling Kuo,
John Leonard,
Avinash Balachandran,
Naomi Leonard,
Guy Rosman
Abstract:
Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose NashFormer, a framework for trajectory prediction that leverages game-…
▽ More
Interactions between road agents present a significant challenge in trajectory prediction, especially in cases involving multiple agents. Because existing diversity-aware predictors do not account for the interactive nature of multi-agent predictions, they may miss these important interaction outcomes. In this paper, we propose NashFormer, a framework for trajectory prediction that leverages game-theoretic inverse reinforcement learning to improve coverage of multi-modal predictions. We use a training-time game-theoretic analysis as an auxiliary loss resulting in improved coverage and accuracy without presuming a taxonomy of actions for the agents. We demonstrate our approach on the interactive split of the Waymo Open Motion Dataset, including four subsets involving scenarios with high interaction complexity. Experiment results show that our predictor produces accurate predictions while covering $33\%$ more potential interactions versus a baseline model.
△ Less
Submitted 11 November, 2023; v1 submitted 27 May, 2023;
originally announced May 2023.
-
Automated detection and quantification of COVID-19 airspace disease on chest radiographs: A novel approach achieving radiologist-level performance using a CNN trained on digital reconstructed radiographs (DRRs) from CT-based ground-truth
Authors:
Eduardo Mortani Barbosa Jr.,
Warren B. Gefter,
Rochelle Yang,
Florin C. Ghesu,
Siqi Liu,
Boris Mailhe,
Awais Mansoor,
Sasa Grbic,
Sebastian Piat,
Guillaume Chabin,
Vishwanath R S.,
Abishek Balachandran,
Sebastian Vogt,
Valentin Ziebandt,
Steffen Kappler,
Dorin Comaniciu
Abstract:
Purpose: To leverage volumetric quantification of airspace disease (AD) derived from a superior modality (CT) serving as ground truth, projected onto digitally reconstructed radiographs (DRRs) to: 1) train a convolutional neural network to quantify airspace disease on paired CXRs; and 2) compare the DRR-trained CNN to expert human readers in the CXR evaluation of patients with confirmed COVID-19.…
▽ More
Purpose: To leverage volumetric quantification of airspace disease (AD) derived from a superior modality (CT) serving as ground truth, projected onto digitally reconstructed radiographs (DRRs) to: 1) train a convolutional neural network to quantify airspace disease on paired CXRs; and 2) compare the DRR-trained CNN to expert human readers in the CXR evaluation of patients with confirmed COVID-19.
Materials and Methods: We retrospectively selected a cohort of 86 COVID-19 patients (with positive RT-PCR), from March-May 2020 at a tertiary hospital in the northeastern USA, who underwent chest CT and CXR within 48 hrs. The ground truth volumetric percentage of COVID-19 related AD (POv) was established by manual AD segmentation on CT. The resulting 3D masks were projected into 2D anterior-posterior digitally reconstructed radiographs (DRR) to compute area-based AD percentage (POa). A convolutional neural network (CNN) was trained with DRR images generated from a larger-scale CT dataset of COVID-19 and non-COVID-19 patients, automatically segmenting lungs, AD and quantifying POa on CXR. CNN POa results were compared to POa quantified on CXR by two expert readers and to the POv ground-truth, by computing correlations and mean absolute errors.
Results: Bootstrap mean absolute error (MAE) and correlations between POa and POv were 11.98% [11.05%-12.47%] and 0.77 [0.70-0.82] for average of expert readers, and 9.56%-9.78% [8.83%-10.22%] and 0.78-0.81 [0.73-0.85] for the CNN, respectively.
Conclusion: Our CNN trained with DRR using CT-derived airspace quantification achieved expert radiologist level of accuracy in the quantification of airspace disease on CXR, in patients with positive RT-PCR for COVID-19.
△ Less
Submitted 13 August, 2020;
originally announced August 2020.
-
Quantifying and Leveraging Predictive Uncertainty for Medical Image Assessment
Authors:
Florin C. Ghesu,
Bogdan Georgescu,
Awais Mansoor,
Youngjin Yoo,
Eli Gibson,
R. S. Vishwanath,
Abishek Balachandran,
James M. Balter,
Yue Cao,
Ramandeep Singh,
Subba R. Digumarthy,
Mannudeep K. Kalra,
Sasa Grbic,
Dorin Comaniciu
Abstract:
The interpretation of medical images is a challenging task, often complicated by the presence of artifacts, occlusions, limited contrast and more. Most notable is the case of chest radiography, where there is a high inter-rater variability in the detection and classification of abnormalities. This is largely due to inconclusive evidence in the data or subjective definitions of disease appearance.…
▽ More
The interpretation of medical images is a challenging task, often complicated by the presence of artifacts, occlusions, limited contrast and more. Most notable is the case of chest radiography, where there is a high inter-rater variability in the detection and classification of abnormalities. This is largely due to inconclusive evidence in the data or subjective definitions of disease appearance. An additional example is the classification of anatomical views based on 2D Ultrasound images. Often, the anatomical context captured in a frame is not sufficient to recognize the underlying anatomy. Current machine learning solutions for these problems are typically limited to providing probabilistic predictions, relying on the capacity of underlying models to adapt to limited information and the high degree of label noise. In practice, however, this leads to overconfident systems with poor generalization on unseen data. To account for this, we propose a system that learns not only the probabilistic estimate for classification, but also an explicit uncertainty measure which captures the confidence of the system in the predicted output. We argue that this approach is essential to account for the inherent ambiguity characteristic of medical images from different radiologic exams including computed radiography, ultrasonography and magnetic resonance imaging. In our experiments we demonstrate that sample rejection based on the predicted uncertainty can significantly improve the ROC-AUC for various tasks, e.g., by 8% to 0.91 with an expected rejection rate of under 25% for the classification of different abnormalities in chest radiographs. In addition, we show that using uncertainty-driven bootstrapping to filter the training data, one can achieve a significant increase in robustness and accuracy.
△ Less
Submitted 8 July, 2020;
originally announced July 2020.
-
3D Tomographic Pattern Synthesis for Enhancing the Quantification of COVID-19
Authors:
Siqi Liu,
Bogdan Georgescu,
Zhoubing Xu,
Youngjin Yoo,
Guillaume Chabin,
Shikha Chaganti,
Sasa Grbic,
Sebastian Piat,
Brian Teixeira,
Abishek Balachandran,
Vishwanath RS,
Thomas Re,
Dorin Comaniciu
Abstract:
The Coronavirus Disease (COVID-19) has affected 1.8 million people and resulted in more than 110,000 deaths as of April 12, 2020. Several studies have shown that tomographic patterns seen on chest Computed Tomography (CT), such as ground-glass opacities, consolidations, and crazy paving pattern, are correlated with the disease severity and progression. CT imaging can thus emerge as an important mo…
▽ More
The Coronavirus Disease (COVID-19) has affected 1.8 million people and resulted in more than 110,000 deaths as of April 12, 2020. Several studies have shown that tomographic patterns seen on chest Computed Tomography (CT), such as ground-glass opacities, consolidations, and crazy paving pattern, are correlated with the disease severity and progression. CT imaging can thus emerge as an important modality for the management of COVID-19 patients. AI-based solutions can be used to support CT based quantitative reporting and make reading efficient and reproducible if quantitative biomarkers, such as the Percentage of Opacity (PO), can be automatically computed. However, COVID-19 has posed unique challenges to the development of AI, specifically concerning the availability of appropriate image data and annotations at scale. In this paper, we propose to use synthetic datasets to augment an existing COVID-19 database to tackle these challenges. We train a Generative Adversarial Network (GAN) to inpaint COVID-19 related tomographic patterns on chest CTs from patients without infectious diseases. Additionally, we leverage location priors derived from manually labeled COVID-19 chest CTs patients to generate appropriate abnormality distributions. Synthetic data are used to improve both lung segmentation and segmentation of COVID-19 patterns by adding 20% of synthetic data to the real COVID-19 training data. We collected 2143 chest CTs, containing 327 COVID-19 positive cases, acquired from 12 sites across 7 countries. By testing on 100 COVID-19 positive and 100 control cases, we show that synthetic data can help improve both lung segmentation (+6.02% lesion inclusion rate) and abnormality segmentation (+2.78% dice coefficient), leading to an overall more accurate PO computation (+2.82% Pearson coefficient).
△ Less
Submitted 4 May, 2020;
originally announced May 2020.
-
Automated Quantification of CT Patterns Associated with COVID-19 from Chest CT
Authors:
Shikha Chaganti,
Abishek Balachandran,
Guillaume Chabin,
Stuart Cohen,
Thomas Flohr,
Bogdan Georgescu,
Philippe Grenier,
Sasa Grbic,
Siqi Liu,
François Mellot,
Nicolas Murray,
Savvas Nicolaou,
William Parker,
Thomas Re,
Pina Sanelli,
Alexander W. Sauter,
Zhoubing Xu,
Youngjin Yoo,
Valentin Ziebandt,
Dorin Comaniciu
Abstract:
Purpose: To present a method that automatically segments and quantifies abnormal CT patterns commonly present in coronavirus disease 2019 (COVID-19), namely ground glass opacities and consolidations. Materials and Methods: In this retrospective study, the proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions, based on a dataset of 9…
▽ More
Purpose: To present a method that automatically segments and quantifies abnormal CT patterns commonly present in coronavirus disease 2019 (COVID-19), namely ground glass opacities and consolidations. Materials and Methods: In this retrospective study, the proposed method takes as input a non-contrasted chest CT and segments the lesions, lungs, and lobes in three dimensions, based on a dataset of 9749 chest CT volumes. The method outputs two combined measures of the severity of lung and lobe involvement, quantifying both the extent of COVID-19 abnormalities and presence of high opacities, based on deep learning and deep reinforcement learning. The first measure of (PO, PHO) is global, while the second of (LSS, LHOS) is lobewise. Evaluation of the algorithm is reported on CTs of 200 participants (100 COVID-19 confirmed patients and 100 healthy controls) from institutions from Canada, Europe and the United States collected between 2002-Present (April, 2020). Ground truth is established by manual annotations of lesions, lungs, and lobes. Correlation and regression analyses were performed to compare the prediction to the ground truth. Results: Pearson correlation coefficient between method prediction and ground truth for COVID-19 cases was calculated as 0.92 for PO (P < .001), 0.97 for PHO(P < .001), 0.91 for LSS (P < .001), 0.90 for LHOS (P < .001). 98 of 100 healthy controls had a predicted PO of less than 1%, 2 had between 1-2%. Automated processing time to compute the severity scores was 10 seconds per case compared to 30 minutes required for manual annotations. Conclusion: A new method segments regions of CT abnormalities associated with COVID-19 and computes (PO, PHO), as well as (LSS, LHOS) severity scores.
△ Less
Submitted 18 November, 2020; v1 submitted 2 April, 2020;
originally announced April 2020.
-
CARE: Content Aware Redundancy Elimination for Disaster Communications on Damaged Networks
Authors:
Udi Weinsberg,
Athula Balachandran,
Nina Taft,
Gianluca Iannaccone,
Vyas Sekar,
Srinivasan Seshan
Abstract:
During a disaster scenario, situational awareness information, such as location, physical status and images of the surrounding area, is essential for minimizing loss of life, injury, and property damage. Today's handhelds make it easy for people to gather data from within the disaster area in many formats, including text, images and video. Studies show that the extreme anxiety induced by disasters…
▽ More
During a disaster scenario, situational awareness information, such as location, physical status and images of the surrounding area, is essential for minimizing loss of life, injury, and property damage. Today's handhelds make it easy for people to gather data from within the disaster area in many formats, including text, images and video. Studies show that the extreme anxiety induced by disasters causes humans to create a substantial amount of repetitive and redundant content. Transporting this content outside the disaster zone can be problematic when the network infrastructure is disrupted by the disaster.
This paper presents the design of a novel architecture called CARE (Content-Aware Redundancy Elimination) for better utilizing network resources in disaster-affected regions. Motivated by measurement-driven insights on redundancy patterns found in real-world disaster area photos, we demonstrate that CARE can detect the semantic similarity between photos in the networking layer, thus reducing redundant transfers and improving buffer utilization. Using DTN simulations, we explore the boundaries of the usefulness of deploying CARE on a damaged network, and show that CARE can reduce packet delivery times and drops, and enables 20-40% more unique information to reach the rescue teams outside the disaster area than when CARE is not deployed.
△ Less
Submitted 8 June, 2012;
originally announced June 2012.