Designing a Data Science simulation with MERITS: A Primer

Elliott, Corrine F; Duncan, James PC; Tang, Tiffany M; Behr, Merle; Kumbier, Karl; Yu, Bin

arXiv:2403.08971 (stat)

[Submitted on 13 Mar 2024 (v1), last revised 15 May 2025 (this version, v2)]

Abstract: Simulations play a crucial role in the modern scientific process. Yet despite (or due to) this ubiquity, the Data Science community shares neither a comprehensive definition for a "high-quality" study nor a consolidated guide to designing one. Inspired by the Predictability-Computability-Stability (PCS) framework for 'veridical' Data Science, we propose six MERITS that a simulation study should satisfy. (Modularity and Efficiency support the computability of a study, encouraging clean and flexible implementation. Realism and Stability address the conceptualization of the research problem: How well does a study predict reality, such that its conclusions generalize to new data/contexts? Finally, Intuitiveness and Transparency encourage good communication and trustworthiness of study design and results.) Drawing an analogy between simulation and cooking, we moreover offer (a) a conceptual framework for thinking about the anatomy of a simulation 'recipe'; (b) a baker's dozen in guidelines to aid the Data Science practitioner in designing one; and (c) a case study demonstrating the practical utility of our framework by using it to autopsy a preexisting simulation study. With this "PCS primer" for high-quality Data Science simulation, we seek to distill and enrich the best practices of simulation across disciplines into a cohesive recipe for trustworthy, veridical Data Science.

Comments:	31 pages (main text); 1 figure; 2 tables; James PC Duncan, Tiffany M Tang: Authors contributed equally to this manuscript; Merle Behr, Karl Kumbier: Authors contributed equally to this manuscript
Subjects:	Computation (stat.CO)
Cite as:	arXiv:2403.08971 [stat.CO]
	(or arXiv:2403.08971v2 [stat.CO] for this version)
	https://doi.org/10.48550/arXiv.2403.08971

Submission history

From: Corrine Elliott PhD
[v1] Wed, 13 Mar 2024 21:32:14 UTC (595 KB)
[v2] Thu, 15 May 2025 05:16:07 UTC (598 KB)
