Search | arXiv e-print repository

Practical Equivalence Testing and Its Application in Synthetic Pre-Crash Scenario Validation

Authors: Jian Wu, Ulrich Sander, Carol Flannagan, Minxiang Zhao, Jonas Bärgman

Abstract: The use of representative pre-crash scenarios is critical for assessing the safety impact of driving automation systems through simulation. However, a gap remains in the robust evaluation of the similarity between synthetic and real-world pre-crash scenarios and their crash characteristics. Without proper validation, it cannot be ensured that the synthetic test scenarios adequately represent real-world driving behaviors and crash characteristics. One reason for this validation gap is the lack of focus on methods to confirm that the synthetic test scenarios are practically equivalent to real-world ones, given the assessment scope. Traditional statistical methods, like significance testing, focus on detecting differences rather than establishing equivalence; since failure to detect a difference does not imply equivalence, they are of limited applicability for validating synthetic pre-crash scenarios and crash characteristics. This study addresses this gap by proposing an equivalence testing method based on the Bayesian Region of Practical Equivalence (ROPE) framework. This method is designed to assess the practical equivalence of scenario characteristics that are most relevant for the intended assessment, making it particularly appropriate for the domain of virtual safety assessments. We first review existing equivalence testing methods. Then we propose and demonstrate the Bayesian ROPE-based method by testing the equivalence of two rear-end pre-crash datasets. Our approach focuses on the most relevant scenario characteristics. Our analysis provides insights into the practicalities and effectiveness of equivalence testing in synthetic test scenario validation and demonstrates the importance of testing for improving the credibility of synthetic data for automated vehicle safety assessment, as well as the credibility of subsequent safety impact assessments.

Submitted 20 May, 2025; v1 submitted 19 May, 2025; originally announced May 2025.
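
The ROPE idea named in the abstract above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not give their posterior model, ROPE limits, or decision rule. The sketch assumes the commonly used rule that compares a 95% highest-density interval (HDI) of posterior samples with the ROPE, and the function names `hdi` and `rope_decision` are hypothetical.

```python
def hdi(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    k = max(1, int(round(mass * n)))  # number of samples inside the interval
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


def rope_decision(samples, rope, mass=0.95):
    """Compare the HDI of posterior samples with the ROPE (assumed rule)."""
    lo, hi = hdi(samples, mass)
    if rope[0] <= lo and hi <= rope[1]:
        return "equivalent"        # HDI lies entirely inside the ROPE
    if hi < rope[0] or lo > rope[1]:
        return "not equivalent"    # HDI lies entirely outside the ROPE
    return "undecided"             # HDI and ROPE partially overlap
```

Here `samples` would be posterior draws of the difference between a synthetic and a real-world characteristic, and `rope` the (assumed) interval of differences judged practically negligible for the assessment at hand.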

arXiv:2503.00818 [pdf]

Strategic decision points in experiments: A predictive Bayesian optional stopping method

Authors: Xiaomi Yang, Carol Flannagan, Jonas Bärgman

Abstract: Sample size determination is crucial in experimental design, especially in traffic and transport research. Frequentist statistics require a fixed sample size determined by power analysis, which cannot be adjusted once the experiment starts. Bayesian sample size determination, with proper priors, offers an alternative. Bayesian optional stopping (BOS) allows experiments to stop when statistical targets are met. We introduce predictive Bayesian optional stopping (pBOS), combining BOS with Bayesian rehearsal simulations to predict future data and stop experiments if targets are unlikely to be met within resource constraints. We identified and corrected a bias in predictions using multiple linear regression. pBOS shows up to 118% better cost benefit than traditional BOS and is more efficient than frequentist methods. pBOS allows researchers to, under certain conditions, stop experiments when resources are insufficient or when enough data is collected, optimizing resource use and cost savings.

Submitted 2 March, 2025; originally announced March 2025.

arXiv:2503.00815 [pdf, other]

Evaluation of adaptive sampling methods in scenario generation for virtual safety impact assessment of pre-crash safety systems

Authors: Xiaomi Yang, Henrik Imberg, Carol Flannagan, Jonas Bärgman

Abstract: Virtual safety assessment plays a vital role in evaluating the safety impact of pre-crash safety systems such as advanced driver assistance systems (ADAS) and automated driving systems (ADS). However, as the number of parameters in simulation-based scenario generation increases, the number of crash scenarios to simulate grows exponentially, making complete enumeration computationally infeasible. Efficient sampling methods, such as importance sampling and active sampling, have been proposed to address this challenge. However, a comprehensive evaluation of how domain knowledge, stratification, and batch sampling affect their efficiency remains limited. This study evaluates the performance of importance sampling and active sampling in scenario generation, incorporating two domain-knowledge-driven features: adaptive sample space reduction (ASSR) and stratification. Additionally, we assess the effects of a third feature, batch sampling, on computational efficiency in terms of both CPU and wall-clock time. Based on our findings, we provide practical recommendations for applying ASSR, stratification, and batch sampling to optimize sampling performance. Our results demonstrate that ASSR substantially improves sampling efficiency for both importance sampling and active sampling. When integrated into active sampling, ASSR reduces the root mean squared estimation error (RMSE) of the estimates by up to 90%. Stratification further improves sampling performance for both methods, regardless of ASSR implementation. When ASSR and/or stratification are applied, importance sampling performs on par with active sampling, whereas when neither feature is used, active sampling is more efficient. Larger batch sizes reduce wall-clock time but increase the number of simulations required to achieve the same estimation accuracy.

Submitted 2 March, 2025; originally announced March 2025.

arXiv:2408.07758 [pdf]

doi 10.1080/15389588.2024.2435620

RAVE Checklist: Recommendations for Overcoming Challenges in Retrospective Safety Studies of Automated Driving Systems

Authors: John M. Scanlon, Eric R. Teoh, David G. Kidd, Kristofer D. Kusano, Jonas Bärgman, Geoffrey Chi-Johnston, Luigi Di Lillo, Francesca Favaro, Carol Flannagan, Henrik Liers, Bonnie Lin, Magdalena Lindman, Shane McLaughlin, Miguel Perez, Trent Victor

Abstract: The public, regulators, and domain experts alike seek to understand the effect of deployed SAE level 4 automated driving system (ADS) technologies on safety. The recent expansion of ADS technology deployments is paving the way for early stage safety impact evaluations, whereby the observational data from both an ADS and a representative benchmark fleet are compared to quantify safety performance. In January 2024, a working group of experts across academia, insurance, and industry came together in Washington, DC to discuss the current and future challenges in performing such evaluations. A subset of this working group then met, virtually, on multiple occasions to produce this paper. This paper presents the RAVE (Retrospective Automated Vehicle Evaluation) checklist, a set of fifteen recommendations for performing and evaluating retrospective ADS performance comparisons. The recommendations are centered around the concepts of (1) quality and validity, (2) transparency, and (3) interpretation. Over time, it is anticipated there will be a large and varied body of work evaluating the observed performance of these ADS fleets. Establishing and promoting good scientific practices benefits the work of stakeholders, many of whom may not be subject matter experts. This working group's intentions are to: i) strengthen individual research studies and ii) make the at-large community more informed on how to evaluate this collective body of work.

Submitted 14 August, 2024; originally announced August 2024.

arXiv:2406.15538 [pdf, other]

Model-based generation of representative rear-end crash scenarios across the full severity range using pre-crash data

Authors: Jian Wu, Carol Flannagan, Ulrich Sander, Jonas Bärgman

Abstract: Generating representative rear-end crash scenarios is crucial for safety assessments of Advanced Driver Assistance Systems (ADAS) and Automated Driving Systems (ADS). However, existing methods for scenario generation face challenges such as limited and biased in-depth crash data and difficulties in validation. This study sought to overcome these challenges by combining naturalistic driving data and pre-crash kinematics data from rear-end crashes. The combined dataset was weighted to create a representative dataset of rear-end crash characteristics across the full severity range in the United States. Multivariate distribution models were built for the combined dataset, and a driver behavior model for the following vehicle was created by combining two existing models. Simulations were conducted to generate a set of synthetic rear-end crash scenarios, which were then weighted to create a representative synthetic rear-end crash dataset. Finally, the synthetic dataset was validated by comparing the distributions of parameters and the outcomes (Delta-v, the total change in vehicle velocity over the duration of the crash event) of the generated crashes with those in the original combined dataset. The synthetic crash dataset can be used for the safety assessments of ADAS and ADS and as a benchmark when evaluating the representativeness of scenarios generated through other methods.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2401.01501 [pdf, other]

Evaluation of automated driving system safety metrics with logged vehicle trajectory data

Authors: Xintao Yan, Shuo Feng, David J. LeBlanc, Carol Flannagan, Henry X. Liu

Abstract: Real-time safety metrics are important for the automated driving system (ADS) to assess the risk of driving situations and to assist decision-making. Although a number of real-time safety metrics have been proposed in the literature, systematic performance evaluation of these safety metrics has been lacking. As different behavioral assumptions are adopted in different safety metrics, it is difficult to compare the safety metrics and evaluate their performance. To overcome this challenge, in this study, we propose an evaluation framework utilizing logged vehicle trajectory data, in which vehicle trajectories for both the subject vehicle (SV) and background vehicles (BVs) are obtained, so that the prediction errors caused by behavioral assumptions can be eliminated. Specifically, we examine whether the SV is in a collision-unavoidable situation at each moment, given all near-future trajectories of BVs. In this way, we level the ground for a fair comparison of different safety metrics, as a good safety metric should always raise an alarm in advance of the collision-unavoidable moment. When trajectory data from a large number of trips are available, we can systematically evaluate and compare different metrics' statistical performance. In the case study, three representative real-time safety metrics, including the time-to-collision (TTC), the PEGASUS Criticality Metric (PCM), and the Model Predictive Instantaneous Safety Metric (MPrISM), are evaluated using a large-scale simulated trajectory dataset. The proposed evaluation framework is important for researchers, practitioners, and regulators to characterize different metrics, and to select appropriate metrics for different applications. Moreover, by conducting failure analysis on moments when a safety metric failed, we can identify its potential weaknesses, which is valuable for its refinement and improvement.

Submitted 2 January, 2024; originally announced January 2024.
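
Of the three metrics named in the abstract above, time-to-collision (TTC) is the simplest to state. The following is a minimal sketch of its standard textbook form for a car-following situation, assuming both vehicles hold their current speeds; it is not taken from the paper, and the function name `time_to_collision` and its parameters are hypothetical.

```python
import math


def time_to_collision(gap_m, follower_speed_mps, lead_speed_mps):
    """Seconds until contact if both vehicles keep their current speeds.

    gap_m is the bumper-to-bumper distance in meters; speeds are in m/s.
    Returns math.inf when the follower is not closing in on the lead vehicle.
    """
    closing_speed = follower_speed_mps - lead_speed_mps
    if closing_speed <= 0:
        return math.inf  # gap is constant or growing, so no collision is predicted
    return gap_m / closing_speed
```

For example, a 20 m gap with the follower at 15 m/s and the lead vehicle at 10 m/s closes at 5 m/s, giving a TTC of 4 s. The constant-speed assumption is exactly the kind of behavioral assumption the abstract says makes metrics hard to compare.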

arXiv:2310.08453 [pdf, other]

doi 10.1109/TITS.2024.3369097

Modeling Lead-vehicle Kinematics For Rear-end Crash Scenario Generation

Authors: Jian Wu, Carol Flannagan, Ulrich Sander, Jonas Bärgman

Abstract: The use of virtual safety assessment as the primary method for evaluating vehicle safety technologies has emphasized the importance of crash scenario generation. One of the most common crash types is the rear-end crash, which involves a lead vehicle and a following vehicle. Most studies have focused on the following vehicle, assuming that the lead vehicle maintains a constant acceleration/deceleration before the crash. However, there is no evidence for this premise in the literature. This study aims to address this knowledge gap by thoroughly analyzing and modeling the lead vehicle's behavior as a first step in generating rear-end crash scenarios. Accordingly, the study employed a piecewise linear model to parameterize the speed profiles of lead vehicles, utilizing two rear-end pre-crash/near-crash datasets. These datasets were merged and categorized into multiple sub-datasets; for each one, a multivariate distribution was constructed to represent the corresponding parameters. Subsequently, a synthetic dataset was generated using these distribution models and validated by comparison with the original combined dataset. The results highlight diverse lead-vehicle speed patterns, indicating that a more accurate model, such as the proposed piecewise linear model, is required instead of the conventional constant acceleration/deceleration model. Crashes generated with the proposed models accurately match crash data across the full severity range, surpassing existing lead-vehicle kinematics models in both severity range and accuracy. By providing more realistic speed profiles for the lead vehicle, the model developed in the study contributes to creating realistic rear-end crash scenarios and reconstructing real-life crashes.

Submitted 13 October, 2023; v1 submitted 22 September, 2023; originally announced October 2023.

Journal ref: IEEE Trans. Intell. Transp. Syst. 25 (2024) 3176-3186

arXiv:2212.10024 [pdf, other]

doi 10.1080/00401706.2024.2374554

Active sampling: A machine-learning-assisted framework for finite population inference with optimal subsamples

Authors: Henrik Imberg, Xiaomi Yang, Carol Flannagan, Jonas Bärgman

Abstract: Data subsampling has become widely recognized as a tool to overcome computational and economic bottlenecks in analyzing massive datasets. We contribute to the development of adaptive design for estimation of finite population characteristics, using active learning and adaptive importance sampling. We propose an active sampling strategy that iterates between estimation and data collection with optimal subsamples, guided by machine learning predictions on yet unseen data. The method is illustrated on virtual simulation-based safety assessment of advanced driver assistance systems. Substantial performance improvements are demonstrated compared to traditional sampling methods.

Submitted 3 July, 2024; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: Accepted for Technometrics

MSC Class: 62D99; 62K05; 62L05; 62P30

ACM Class: G.3

arXiv:2204.03215 [pdf, other]

Robust Model-based Inference for Non-Probability Samples

Authors: Ali Rafei, Michael R. Elliott, Carol A. C. Flannagan

Abstract: With the ubiquitous availability of unstructured data, growing attention is paid to how to adjust for selection bias in such non-probability samples. The majority of the robust estimators proposed by prior literature are either fully or partially design-based, which may lead to inefficient estimates if outlying (pseudo-)weights are present. In addition, correctly reflecting the uncertainty of the adjusted estimator remains a challenge when the available reference survey is complex in the sample design. This article proposes a fully model-based method for inference using non-probability samples where the goal is to predict the outcome variable for the entire population units. We employ a Bayesian bootstrap method with Rubin's combining rules to derive the adjusted point and interval estimates. Using Gaussian process regression, our method allows for kernel matching between the non-probability sample units and population units based on the estimated selection propensities when the outcome model is misspecified. The repeated sampling properties of our method are evaluated through two Monte Carlo simulation studies. Finally, we examine it on a real-world non-probability sample with the aim to estimate crash-attributed injury rates in different body regions in the United States.

Submitted 7 April, 2022; originally announced April 2022.

arXiv:2203.14355 [pdf, other]

Robust and Efficient Bayesian Inference for Non-Probability Samples

Authors: Ali Rafei, Michael R. Elliott, Carol A. C. Flannagan

Abstract: The declining response rates in probability surveys, along with the widespread availability of unstructured data, have led to growing research into non-probability samples. Existing robust approaches are not well-developed for non-Gaussian outcomes and may perform poorly in the presence of influential pseudo-weights. Furthermore, their variance estimators lack a unified framework and often rely on asymptotic theory. To address these gaps, we propose an alternative Bayesian approach using a partially linear Gaussian process regression that utilizes a prediction model with a flexible function of the pseudo-inclusion probabilities to impute the outcome variable for the reference survey. By efficiency, we mean not only computational scalability but also superiority with respect to variance. We also show that Gaussian process regression behaves as a kernel matching technique based on the estimated propensity scores, which yields double robustness and lowers sensitivity to influential pseudo-weights. Using the simulated posterior predictive distribution, one can directly quantify the uncertainty of the proposed estimator and derive associated 95% credible intervals. We assess the repeated sampling properties of our method in two simulation studies. The application of this study deals with modeling count data with varying exposures under a non-probability sample setting.

Submitted 27 March, 2022; originally announced March 2022.

arXiv:2101.07456 [pdf, other]

doi 10.48550/arXiv.2101.0745

Robust Bayesian Inference for Big Data: Combining Sensor-based Records with Traditional Survey Data

Authors: Ali Rafei, Carol A. C. Flannagan, Brady T. West, Michael R. Elliott

Abstract: Big Data often presents as massive non-probability samples. Not only is the selection mechanism often unknown, but larger data volume amplifies the relative contribution of selection bias to total error. Existing bias adjustment approaches assume that the conditional mean structures have been correctly specified for the selection indicator or key substantive measures. In the presence of a reference probability sample, these methods rely on a pseudo-likelihood method to account for the sampling weights of the reference sample, which is parametric in nature. Under a Bayesian framework, handling the sampling weights is an even bigger hurdle. To further protect against model misspecification, we expand the idea of double robustness such that more flexible non-parametric methods, as well as Bayesian models, can be used for prediction. In particular, we employ Bayesian additive regression trees, which not only capture non-linear associations automatically but permit direct quantification of the uncertainty of point estimates through its posterior predictive draws. We apply our method to sensor-based naturalistic driving data from the second Strategic Highway Research Program using the 2017 National Household Travel Survey as a benchmark.

Submitted 26 March, 2022; v1 submitted 18 January, 2021; originally announced January 2021.

arXiv:2011.14922 [pdf, other]

doi 10.1007/s42421-022-00053-8

Driver Behavior Extraction from Videos in Naturalistic Driving Datasets with 3D ConvNets

Authors: Hanwen Miao, Shengan Zhang, Carol Flannagan

Abstract: Naturalistic driving data (NDD) is an important source of information to understand crash causation and human factors and to further develop crash avoidance countermeasures. Videos recorded while driving are often included in such datasets. While there is often a large amount of video data in NDD, only a small portion of it can be annotated by human coders and used for research, which leaves the video data underused. In this paper, we explored a computer vision method to automatically extract the information we need from videos. More specifically, we developed a 3D ConvNet algorithm to automatically extract cell-phone-related behaviors from videos. The experiments show that our method can extract chunks from videos, most of which (~79%) contain the automatically labeled cell phone behaviors. In conjunction with human review of the extracted chunks, this approach can find cell-phone-related driver behaviors much more efficiently than simply viewing video.

Submitted 30 November, 2020; originally announced November 2020.

arXiv:2002.00386 [pdf]

doi 10.1016/j.aap.2020.105455

Multitasking additional-to-driving: Prevalence, structure, and associated risk in SHRP2 naturalistic driving data

Authors: András Bálint, Carol A. C. Flannagan, Andrew Leslie, Sheila Klauer, Feng Guo, Marco Dozza

Abstract: This paper 1) analyzes the extent to which drivers engage in multitasking additional-to-driving (MAD) under various conditions, 2) specifies odds ratios (ORs) of crashing associated with MAD compared to no task engagement, and 3) explores the structure of MAD, based on data from the Second Strategic Highway Research Program Naturalistic Driving Study (SHRP2 NDS). Sensitivity analysis in which secondary tasks were re-defined by grouping similar tasks was performed to investigate the extent to which ORs are affected by the specific task definitions in SHRP2. A novel visual representation of multitasking was developed to show which secondary tasks co-occur frequently and which ones do not. MAD occurs in 11% of control driving segments, 22% of crashes and near-crashes (CNC), 26% of Level 1-3 crashes and 39% of rear-end striking crashes, and 9%, 16%, 17% and 28% respectively for the same event types if MAD is defined in terms of general task groups. The most common co-occurrences of secondary tasks vary substantially among event types; for example, 'Passenger in adjacent seat - interaction' and 'Other non-specific internal eye glance' tend to co-occur in CNC but tend not to co-occur in control driving segments. The odds ratios of MAD compared to driving without any secondary task and the corresponding 95% confidence intervals are 2.38 (2.17-2.61) for CNC, 3.72 (3.11-4.45) for Level 1-3 crashes and 8.48 (5.11-14.07) for rear-end striking crashes. The corresponding ORs using general task groups to define MAD are slightly lower at 2.00 (1.80-2.21) for CNC, 3.03 (2.48-3.69) for Level 1-3 crashes and 6.94 (4.04-11.94) for rear-end striking crashes. The results confirm that independently of whether secondary tasks are defined according to SHRP2 or general task groups, the reduction of driving performance from MAD observed in simulator studies is manifested in real-world crashes as well.

Submitted 2 February, 2020; originally announced February 2020.

Comments: Accepted manuscript, to appear in Accident Analysis and Prevention. 21 pages, 11 figures

Journal ref: Accident Analysis & Prevention 137, March 2020, 105455
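
For readers unfamiliar with the "OR (lower-upper)" figures quoted in the abstract above, the following minimal sketch shows one common way to obtain an odds ratio and a 95% Wald confidence interval from a 2x2 table. The abstract does not say how the paper's intervals were computed, so this is an assumption for illustration only; the function name `odds_ratio_ci` and the counts in the usage note are hypothetical and are not data from the study.

```python
import math


def odds_ratio_ci(exposed_events, exposed_controls,
                  unexposed_events, unexposed_controls, z=1.96):
    """Odds ratio with a Wald confidence interval (z=1.96 gives about 95%).

    All four cell counts must be positive.
    """
    a, b, c, d = (exposed_events, exposed_controls,
                  unexposed_events, unexposed_controls)
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With hypothetical counts of 20 events and 80 controls under multitasking, against 10 events and 90 controls without, `odds_ratio_ci(20, 80, 10, 90)` gives an odds ratio of 2.25 with an interval that just includes 1, so such a small sample would not establish an elevated risk.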

arXiv:1812.08855 [pdf, ps, other]

Accounting for selection bias due to death in estimating the effect of wealth shock on cognition for the Health and Retirement Study

Authors: Yaoyuan Vincent Tan, Carol A. C. Flannagan, Lindsay R. Pool, Michael R. Elliott

Abstract: The Health and Retirement Study is a longitudinal study of US adults enrolled at age 50 and older. We were interested in investigating the effect of a sudden large decline in wealth on the cognitive score of subjects. Our analysis was complicated by the lack of randomization, confounding by indication, and the fact that a substantial fraction of the sample and population will die during follow-up, leading to some of our outcomes being censored. Common methods to handle these problems, for example marginal structural models, may not be appropriate because they upweight subjects who are more likely to die to obtain a population that over time resembles the one that would have been obtained in the absence of death. We propose a refined approach by comparing the treatment effect among subjects who would survive under both sets of treatment regimes being considered. We do so by viewing this as a large missing data problem and imputing the survival status and outcomes of the counterfactual. To improve the robustness of our imputation, we used a modified version of the penalized spline of propensity methods in treatment comparisons approach. We found that our proposed method worked well in various simulation scenarios and our data analysis.

Submitted 20 December, 2018; originally announced December 2018.

Comments: 43 pages, 8 Tables

arXiv:1801.03147 [pdf, ps, other]

"Robust-squared" Imputation Models Using BART

Authors: Yaoyuan V. Tan, Carol A. C. Flannagan, Michael R. Elliott

Abstract: Examples of "doubly robust" estimators for missing data include augmented inverse probability weighting (AIPWT) models (Robins et al., 1994) and penalized splines of propensity prediction (PSPP) models (Zhang and Little, 2009). Doubly-robust estimators have the property that, if either the response propensity or the mean is modeled correctly, a consistent estimator of the population mean is obtained. However, doubly-robust estimators can perform poorly when modest misspecification is present in both models (Kang and Schafer, 2007). Here we consider extensions of the AIPWT and PSPP models that use Bayesian Additive Regression Trees (BART; Chipman et al., 2010) to provide highly robust propensity and mean model estimation. We term these "robust-squared" in the sense that the propensity score, the means, or both can be estimated with minimal model misspecification, and applied to the doubly-robust estimator. We consider their behavior via simulations where propensities and/or mean models are misspecified. We apply our proposed method to impute missing instantaneous velocity (delta-v) values from the 2014 National Automotive Sampling System Crashworthiness Data System dataset and missing Blood Alcohol Concentration values from the 2015 Fatality Analysis Reporting System dataset. We found that BART applied to PSPP and AIPWT provides a more robust and efficient estimate compared to PSPP and AIPWT, with the BART-estimated propensity score combined with PSPP providing the most efficient estimator with close to nominal coverage.

Submitted 9 January, 2018; originally announced January 2018.

arXiv:1609.07464 [pdf, other]

Predicting human-driving behavior to help driverless vehicles drive: random intercept Bayesian Additive Regression Trees

Authors: Yaoyuan Vincent Tan, Carol A. C. Flannagan, Michael R. Elliott

Abstract: The development of driverless vehicles has spurred the need to predict human driving behavior to facilitate interaction between driverless and human-driven vehicles. Predicting human driving movements can be challenging, and poor prediction models can lead to accidents between the driverless and human-driven vehicles. We used the vehicle speed obtained from a naturalistic driving dataset to predict whether a human-driven vehicle would stop before executing a left turn. In a preliminary analysis, we found that BART produced less variable and higher AUC values compared to a variety of other state-of-the-art binary predictor methods. However, BART assumes independent observations, whereas our dataset consists of multiple observations clustered by driver. Although methods extending BART to clustered or longitudinal data are available, they lack readily available software and can only be applied to clustered continuous outcomes. We extended BART to handle correlated binary observations by adding a random intercept and used a simulation study to determine bias, root mean squared error, 95% coverage, and average length of 95% credible interval in a correlated data setting. We then successfully applied our random intercept BART model to our clustered dataset and found substantial improvements in prediction performance compared to BART and random intercept linear logistic regression.

Submitted 1 May, 2017; v1 submitted 23 September, 2016; originally announced September 2016.

Comments: 6 figures, 1 Table

Showing 1–16 of 16 results for author: Flannagan, C