Optimal Data Split Methodology for Model Validation

Morrison, Rebecca; Bryant, Corey; Terejanu, Gabriel; Miki, Kenji; Prudhomme, Serge

arXiv:1108.6043 (physics)

[Submitted on 30 Aug 2011]

Abstract: The decision to incorporate cross-validation into the validation of mathematical models raises an immediate question: how should one partition the data into calibration and validation sets? We answer this question systematically by presenting an algorithm that finds the optimal partition of the data subject to certain constraints. In doing so, we address two critical issues: 1) that the model be evaluated with respect to both its predictions of a given quantity of interest and its ability to reproduce the data, and 2) that the model be highly challenged by the validation set, assuming it is properly informed by the calibration set. The framework also relies on interaction among the experimentalist and/or modeler, who understand the physical system and the limitations of the model; the decision-maker, who understands and can quantify the cost of model failure; and the computational scientists, who strive to determine whether the model satisfies both the modeler's and the decision-maker's requirements. The framework is quite general and may be applied to a wide range of problems. Here, we illustrate it through a specific example involving a data reduction model for an ICCD camera from a shock-tube experiment located at the NASA Ames Research Center (ARC).
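
The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea it describes, not the authors' method. It assumes a brute-force search over all calibration subsets of a fixed size, a straight-line model fitted by least squares, and root-mean-square error as the misfit measure. The names `fit_line`, `rmse`, `most_challenging_split`, `n_cal`, and `cal_tol` are all hypothetical. A split is kept only if the model reproduces its calibration data to within `cal_tol` (a stand-in for "properly informed"), and among those splits the one with the largest validation error (a stand-in for "highly challenged") is returned.

```python
from itertools import combinations
from math import sqrt


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def rmse(a, b, xs, ys):
    """Root-mean-square error of the line y = a + b*x on the given points."""
    return sqrt(sum((a + b * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


def most_challenging_split(xs, ys, n_cal, cal_tol):
    """Search every calibration subset of size n_cal (assumed strategy).

    Returns (validation_rmse, calibration_indices, validation_indices)
    for the admissible split with the largest validation error, or None
    if no split meets the calibration tolerance.
    """
    best = None
    indices = range(len(xs))
    for cal in combinations(indices, n_cal):
        val = tuple(i for i in indices if i not in cal)
        cx, cy = [xs[i] for i in cal], [ys[i] for i in cal]
        vx, vy = [xs[i] for i in val], [ys[i] for i in val]
        if len(set(cx)) < 2:
            continue  # a line cannot be fitted to a single distinct x value
        a, b = fit_line(cx, cy)
        if rmse(a, b, cx, cy) > cal_tol:
            continue  # model does not reproduce its own calibration data
        score = rmse(a, b, vx, vy)
        if best is None or score > best[0]:
            best = (score, cal, val)
    return best


# Synthetic data (made up for illustration): four points on y = x plus
# one point that a straight line calibrated on the others cannot predict.
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 2, 3, 10]
result = most_challenging_split(xs, ys, n_cal=3, cal_tol=1e-9)
print(result)
```

Exhaustive enumeration grows combinatorially with the number of data points, so a sketch like this is only practical for small data sets. The constraints and metrics used in the paper itself may differ from these stand-ins.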

Comments:	Submitted to International Conference on Modeling, Simulation and Control 2011 (ICMSC'11), San Francisco, USA, 19-21 October, 2011
Subjects:	Data Analysis, Statistics and Probability (physics.data-an); Probability (math.PR); Methodology (stat.ME)
Cite as:	arXiv:1108.6043 [physics.data-an]
	(or arXiv:1108.6043v1 [physics.data-an] for this version)
	https://doi.org/10.48550/arXiv.1108.6043

Submission history

From: Gabriel Terejanu
[v1] Tue, 30 Aug 2011 19:24:12 UTC (882 KB)
