-
Building reliable sim driving agents by scaling self-play
Authors:
Daphne Cornelisse,
Aarav Pandya,
Kevin Joseph,
Joseph Suárez,
Eugene Vinitsky
Abstract:
Simulation agents are essential for designing and testing systems that interact with humans, such as autonomous vehicles (AVs). These agents serve various purposes, from benchmarking AV performance to stress-testing system limits, but all applications share one key requirement: reliability. To enable sound experimentation, a simulation agent must behave as intended. It should minimize actions that…
▽ More
Simulation agents are essential for designing and testing systems that interact with humans, such as autonomous vehicles (AVs). These agents serve various purposes, from benchmarking AV performance to stress-testing system limits, but all applications share one key requirement: reliability. To enable sound experimentation, a simulation agent must behave as intended. It should minimize actions that may lead to undesired outcomes, such as collisions, which can distort the signal-to-noise ratio in analyses. As a foundation for reliable sim agents, we propose scaling self-play to thousands of scenarios on the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Training from scratch on a single GPU, our agents solve almost the full training set within a day. They generalize to unseen test scenes, achieving a 99.8% goal completion rate with less than 0.8% combined collision and off-road incidents across 10,000 held-out scenarios. Beyond in-distribution generalization, our agents show partial robustness to out-of-distribution scenes and can be fine-tuned in minutes to reach near-perfect performance in such cases. We open-source the pre-trained agents and integrate them with a batched multi-agent simulator. Demonstrations of agent behaviors can be viewed at https://sites.google.com/view/reliable-sim-agents, and we open-source our agents at https://github.com/Emerge-Lab/gpudrive.
△ Less
Submitted 19 May, 2025; v1 submitted 20 February, 2025;
originally announced February 2025.
-
KinDEL: DNA-Encoded Library Dataset for Kinase Inhibitors
Authors:
Benson Chen,
Tomasz Danel,
Patrick J. McEnaney,
Nikhil Jain,
Kirill Novikov,
Spurti Umesh Akki,
Joshua L. Turnbull,
Virja Atul Pandya,
Boris P. Belotserkovskii,
Jared Bryce Weaver,
Ankita Biswas,
Dat Nguyen,
Gabriel H. S. Dreiman,
Mohammad Sultan,
Nathaniel Stanley,
Daniel M Whalen,
Divya Kanichar,
Christoph Klein,
Emily Fox,
R. Edward Watts
Abstract:
DNA-Encoded Libraries (DEL) are combinatorial small molecule libraries that offer an efficient way to characterize diverse chemical spaces. Selection experiments using DELs are pivotal to drug discovery efforts, enabling high-throughput screens for hit finding. However, limited availability of public DEL datasets hinders the advancement of computational techniques designed to process such data. To…
▽ More
DNA-Encoded Libraries (DEL) are combinatorial small molecule libraries that offer an efficient way to characterize diverse chemical spaces. Selection experiments using DELs are pivotal to drug discovery efforts, enabling high-throughput screens for hit finding. However, limited availability of public DEL datasets hinders the advancement of computational techniques designed to process such data. To bridge this gap, we present KinDEL, one of the first large, publicly available DEL datasets on two kinases: Mitogen-Activated Protein Kinase 14 (MAPK14) and Discoidin Domain Receptor Tyrosine Kinase 1 (DDR1). Interest in this data modality is growing due to its ability to generate extensive supervised chemical data that densely samples around select molecular structures. Demonstrating one such application of the data, we benchmark different machine learning techniques to develop predictive models for hit identification; in particular, we highlight recent structure-based probabilistic approaches. Finally, we provide biophysical assay data, both on- and off-DNA, to validate our models on a smaller subset of molecules. Data and code for our benchmarks can be found at: https://github.com/insitro/kindel.
△ Less
Submitted 11 October, 2024;
originally announced October 2024.
-
Estimating Body Volume and Height Using 3D Data
Authors:
Vivek Ganesh Sonar,
Muhammad Tanveer Jan,
Mike Wells,
Abhijit Pandya,
Gabriela Engstrom,
Richard Shih,
Borko Furht
Abstract:
Accurate body weight estimation is critical in emergency medicine for proper dosing of weight-based medications, yet direct measurement is often impractical in urgent situations. This paper presents a non-invasive method for estimating body weight by calculating total body volume and height using 3D imaging technology. A RealSense D415 camera is employed to capture high-resolution depth maps of th…
▽ More
Accurate body weight estimation is critical in emergency medicine for proper dosing of weight-based medications, yet direct measurement is often impractical in urgent situations. This paper presents a non-invasive method for estimating body weight by calculating total body volume and height using 3D imaging technology. A RealSense D415 camera is employed to capture high-resolution depth maps of the patient, from which 3D models are generated. The Convex Hull Algorithm is then applied to calculate the total body volume, with enhanced accuracy achieved by segmenting the point cloud data into multiple sections and summing their individual volumes. The height is derived from the 3D model by identifying the distance between key points on the body. This combined approach provides an accurate estimate of body weight, improving the reliability of medical interventions where precise weight data is unavailable. The proposed method demonstrates significant potential to enhance patient safety and treatment outcomes in emergency settings.
△ Less
Submitted 18 September, 2024;
originally announced October 2024.
-
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS
Authors:
Saman Kazemkhani,
Aarav Pandya,
Daphne Cornelisse,
Brennan Shacklett,
Eugene Vinitsky
Abstract:
Multi-agent learning algorithms have been successful at generating superhuman planning in various games but have had limited impact on the design of deployed multi-agent planners. A key bottleneck in applying these techniques to multi-agent planning is that they require billions of steps of experience. To enable the study of multi-agent planning at scale, we present GPUDrive. GPUDrive is a GPU-acc…
▽ More
Multi-agent learning algorithms have been successful at generating superhuman planning in various games but have had limited impact on the design of deployed multi-agent planners. A key bottleneck in applying these techniques to multi-agent planning is that they require billions of steps of experience. To enable the study of multi-agent planning at scale, we present GPUDrive. GPUDrive is a GPU-accelerated, multi-agent simulator built on top of the Madrona Game Engine capable of generating over a million simulation steps per second. Observation, reward, and dynamics functions are written directly in C++, allowing users to define complex, heterogeneous agent behaviors that are lowered to high-performance CUDA. Despite these low-level optimizations, GPUDrive is fully accessible through Python, offering a seamless and efficient workflow for multi-agent, closed-loop simulation. Using GPUDrive, we train reinforcement learning agents on the Waymo Open Motion Dataset, achieving efficient goal-reaching in minutes and scaling to thousands of scenarios in hours. We open-source the code and pre-trained agents at https://github.com/Emerge-Lab/gpudrive.
△ Less
Submitted 18 February, 2025; v1 submitted 2 August, 2024;
originally announced August 2024.
-
Standing on FURM ground -- A framework for evaluating Fair, Useful, and Reliable AI Models in healthcare systems
Authors:
Alison Callahan,
Duncan McElfresh,
Juan M. Banda,
Gabrielle Bunney,
Danton Char,
Jonathan Chen,
Conor K. Corbin,
Debadutta Dash,
Norman L. Downing,
Sneha S. Jain,
Nikesh Kotecha,
Jonathan Masterson,
Michelle M. Mello,
Keith Morse,
Srikar Nallan,
Abby Pandya,
Anurang Revri,
Aditya Sharma,
Christopher Sharp,
Rahul Thapa,
Michael Wornow,
Alaa Youssef,
Michael A. Pfeffer,
Nigam H. Shah
Abstract:
The impact of using artificial intelligence (AI) to guide patient care or operational processes is an interplay of the AI model's output, the decision-making protocol based on that output, and the capacity of the stakeholders involved to take the necessary subsequent action. Estimating the effects of this interplay before deployment, and studying it in real time afterwards, are essential to bridge…
▽ More
The impact of using artificial intelligence (AI) to guide patient care or operational processes is an interplay of the AI model's output, the decision-making protocol based on that output, and the capacity of the stakeholders involved to take the necessary subsequent action. Estimating the effects of this interplay before deployment, and studying it in real time afterwards, are essential to bridge the chasm between AI model development and achievable benefit. To accomplish this, the Data Science team at Stanford Health Care has developed a Testing and Evaluation (T&E) mechanism to identify fair, useful and reliable AI models (FURM) by conducting an ethical review to identify potential value mismatches, simulations to estimate usefulness, financial projections to assess sustainability, as well as analyses to determine IT feasibility, design a deployment strategy, and recommend a prospective monitoring and evaluation plan. We report on FURM assessments done to evaluate six AI guided solutions for potential adoption, spanning clinical and operational settings, each with the potential to impact from several dozen to tens of thousands of patients each year. We describe the assessment process, summarize the six assessments, and share our framework to enable others to conduct similar assessments. Of the six solutions we assessed, two have moved into a planning and implementation phase. Our novel contributions - usefulness estimates by simulation, financial projections to quantify sustainability, and a process to do ethical assessments - as well as their underlying methods and open source tools, are available for other healthcare systems to conduct actionable evaluations of candidate AI solutions.
△ Less
Submitted 14 March, 2024; v1 submitted 26 February, 2024;
originally announced March 2024.
-
Opinion Mining Using Population-tuned Generative Language Models
Authors:
Allmin Susaiyah,
Abhinay Pandya,
Aki Härmä
Abstract:
We present a novel method for mining opinions from text collections using generative language models trained on data collected from different populations. We describe the basic definitions, methodology and a generic algorithm for opinion insight mining. We demonstrate the performance of our method in an experiment where a pre-trained generative model is fine-tuned using specifically tailored conte…
▽ More
We present a novel method for mining opinions from text collections using generative language models trained on data collected from different populations. We describe the basic definitions, methodology and a generic algorithm for opinion insight mining. We demonstrate the performance of our method in an experiment where a pre-trained generative model is fine-tuned using specifically tailored content with unnatural and fully annotated opinions. We show that our approach can learn and transfer the opinions to the semantic classes while maintaining the proportion of polarisation. Finally, we demonstrate the usage of an insight mining system to scale up the discovery of opinion insights from a real text corpus.
△ Less
Submitted 24 July, 2023;
originally announced July 2023.
-
nuReality: A VR environment for research of pedestrian and autonomous vehicle interactions
Authors:
Paul Schmitt,
Nicholas Britten,
JiHyun Jeong,
Amelia Coffey,
Kevin Clark,
Shweta Sunil Kothawade,
Elena Corina Grigore,
Adam Khaw,
Christopher Konopka,
Linh Pham,
Kim Ryan,
Christopher Schmitt,
Aryaman Pandya,
Emilio Frazzoli
Abstract:
We present nuReality, a virtual reality 'VR' environment designed to test the efficacy of vehicular behaviors to communicate intent during interactions between autonomous vehicles 'AVs' and pedestrians at urban intersections. In this project we focus on expressive behaviors as a means for pedestrians to readily recognize the underlying intent of the AV's movements. VR is an ideal tool to use to te…
▽ More
We present nuReality, a virtual reality 'VR' environment designed to test the efficacy of vehicular behaviors to communicate intent during interactions between autonomous vehicles 'AVs' and pedestrians at urban intersections. In this project we focus on expressive behaviors as a means for pedestrians to readily recognize the underlying intent of the AV's movements. VR is an ideal tool to use to test these situations as it can be immersive and place subjects into these potentially dangerous scenarios without risk. nuReality provides a novel and immersive virtual reality environment that includes numerous visual details (road and building texturing, parked cars, swaying tree limbs) as well as auditory details (birds chirping, cars honking in the distance, people talking). In these files we present the nuReality environment, its 10 unique vehicle behavior scenarios, and the Unreal Engine and Autodesk Maya source files for each scenario. The files are publicly released as open source at www.nuReality.org, to support the academic community studying the critical AV-pedestrian interaction.
△ Less
Submitted 12 January, 2022;
originally announced January 2022.
-
Cascading Adaptors to Leverage English Data to Improve Performance of Question Answering for Low-Resource Languages
Authors:
Hariom A. Pandya,
Bhavik Ardeshna,
Brijesh S. Bhatt
Abstract:
Transformer based architectures have shown notable results on many down streaming tasks including question answering. The availability of data, on the other hand, impedes obtaining legitimate performance for low-resource languages. In this paper, we investigate the applicability of pre-trained multilingual models to improve the performance of question answering in low-resource languages. We tested…
▽ More
Transformer based architectures have shown notable results on many down streaming tasks including question answering. The availability of data, on the other hand, impedes obtaining legitimate performance for low-resource languages. In this paper, we investigate the applicability of pre-trained multilingual models to improve the performance of question answering in low-resource languages. We tested four combinations of language and task adapters using multilingual transformer architectures on seven languages similar to MLQA dataset. Additionally, we have also proposed zero-shot transfer learning of low-resource question answering using language and task adapters. We observed that stacking the language and the task adapters improves the multilingual transformer models' performance significantly for low-resource languages.
△ Less
Submitted 18 December, 2021;
originally announced December 2021.
-
Question Answering Survey: Directions, Challenges, Datasets, Evaluation Matrices
Authors:
Hariom A. Pandya,
Brijesh S. Bhatt
Abstract:
The usage and amount of information available on the internet increase over the past decade. This digitization leads to the need for automated answering system to extract fruitful information from redundant and transitional knowledge sources. Such systems are designed to cater the most prominent answer from this giant knowledge source to the user query using natural language understanding (NLU) an…
▽ More
The usage and amount of information available on the internet increase over the past decade. This digitization leads to the need for automated answering system to extract fruitful information from redundant and transitional knowledge sources. Such systems are designed to cater the most prominent answer from this giant knowledge source to the user query using natural language understanding (NLU) and thus eminently depends on the Question-answering(QA) field.
Question answering involves but not limited to the steps like mapping of user question to pertinent query, retrieval of relevant information, finding the best suitable answer from the retrieved information etc. The current improvement of deep learning models evince compelling performance improvement in all these tasks.
In this review work, the research directions of QA field are analyzed based on the type of question, answer type, source of evidence-answer, and modeling approach. This detailing followed by open challenges of the field like automatic question generation, similarity detection and, low resource availability for a language. In the end, a survey of available datasets and evaluation measures is presented.
△ Less
Submitted 7 December, 2021;
originally announced December 2021.
-
Multi-channel MRI Embedding: An EffectiveStrategy for Enhancement of Human Brain WholeTumor Segmentation
Authors:
Apurva Pandya,
Catherine Samuel,
Nisargkumar Patel,
Vaibhavkumar Patel,
Thangarajah Akilan
Abstract:
One of the most important tasks in medical image processing is the brain's whole tumor segmentation. It assists in quicker clinical assessment and early detection of brain tumors, which is crucial for lifesaving treatment procedures of patients. Because, brain tumors often can be malignant or benign, if they are detected at an early stage. A brain tumor is a collection or a mass of abnormal cells…
▽ More
One of the most important tasks in medical image processing is the brain's whole tumor segmentation. It assists in quicker clinical assessment and early detection of brain tumors, which is crucial for lifesaving treatment procedures of patients. Because, brain tumors often can be malignant or benign, if they are detected at an early stage. A brain tumor is a collection or a mass of abnormal cells in the brain. The human skull encloses the brain very rigidly and any growth inside this restricted place can cause severe health issues. The detection of brain tumors requires careful and intricate analysis for surgical planning and treatment. Most physicians employ Magnetic Resonance Imaging (MRI) to diagnose such tumors. A manual diagnosis of the tumors using MRI is known to be time-consuming; approximately, it takes up to eighteen hours per sample. Thus, the automatic segmentation of tumors has become an optimal solution for this problem. Studies have shown that this technique provides better accuracy and it is faster than manual analysis resulting in patients receiving the treatment at the right time. Our research introduces an efficient strategy called Multi-channel MRI embedding to improve the result of deep learning-based tumor segmentation. The experimental analysis on the Brats-2019 dataset wrt the U-Net encoder-decoder (EnDec) model shows significant improvement. The embedding strategy surmounts the state-of-the-art approaches with an improvement of 2% without any timing overheads.
△ Less
Submitted 13 September, 2020;
originally announced September 2020.
-
Machine Learning Approach for Skill Evaluation in Robotic-Assisted Surgery
Authors:
Mahtab J. Fard,
Sattar Ameri,
Ratna B. Chinnam,
Abhilash K. Pandya,
Michael D. Klein,
R. Darin Ellis
Abstract:
Evaluating surgeon skill has predominantly been a subjective task. Development of objective methods for surgical skill assessment are of increased interest. Recently, with technological advances such as robotic-assisted minimally invasive surgery (RMIS), new opportunities for objective and automated assessment frameworks have arisen. In this paper, we applied machine learning methods to automatica…
▽ More
Evaluating surgeon skill has predominantly been a subjective task. Development of objective methods for surgical skill assessment are of increased interest. Recently, with technological advances such as robotic-assisted minimally invasive surgery (RMIS), new opportunities for objective and automated assessment frameworks have arisen. In this paper, we applied machine learning methods to automatically evaluate performance of the surgeon in RMIS. Six important movement features were used in the evaluation including completion time, path length, depth perception, speed, smoothness and curvature. Different classification methods applied to discriminate expert and novice surgeons. We test our method on real surgical data for suturing task and compare the classification result with the ground truth data (obtained by manual labeling). The experimental results show that the proposed framework can classify surgical skill level with relatively high accuracy of 85.7%. This study demonstrates the ability of machine learning methods to automatically classify expert and novice surgeons using movement features for different RMIS tasks. Due to the simplicity and generalizability of the introduced classification method, it is easy to implement in existing trainers.
△ Less
Submitted 15 November, 2016;
originally announced November 2016.