The External Validity of Combinatorial Samples and Populations

Ribeiro, Andre F.

arXiv:2108.04376v3 (stat)

COVID-19 e-print

Important: e-prints posted on arXiv are not peer-reviewed by arXiv; they should not be relied upon without context to guide clinical practice or health-related behavior and should not be reported in news media as established information without consulting multiple experts in the field.

[Submitted on 9 Aug 2021 (v1), revised 25 Feb 2023 (this version, v3), latest version 2 Jun 2024 (v6)]

Title: The External Validity of Combinatorial Samples and Populations

Authors: Andre F. Ribeiro

Abstract: The widely used 'Counterfactual' definition of Causal Effects was derived for unbiasedness and accuracy, not for generalizability. We propose a simple definition for the External Validity (EV) of interventions and counterfactuals. The definition leads to EV statistics for individual counterfactuals, and to non-parametric effect estimators for sets of counterfactuals (i.e., for samples). We use this definition to discuss several issues that have baffled the original counterfactual formulation: out-of-sample validity, reliance on independence assumptions or estimation, concurrent estimation of multiple effects and full models, bias-variance tradeoffs, statistical power, omitted variables, and connections to current predictive and explaining techniques.
Methodologically, the definition also allows us to replace the parametric, and generally ill-posed, estimation problems that followed the counterfactual definition with combinatorial enumeration problems in non-experimental samples. We use this framework to generalize popular supervised, explaining, and causal-effect estimators, improving their performance across three dimensions (External Validity, Unconfoundness and Accuracy), and enabling their use in non-i.i.d. samples. We demonstrate gains in out-of-sample prediction, intervention effect prediction, and causal effect estimation tasks. The COVID-19 pandemic highlighted the need for learning solutions that provide general predictions in small samples, often with missing variables. We also demonstrate applications to this pressing problem.

Subjects:	Methodology (stat.ME); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2108.04376 [stat.ME]
	(or arXiv:2108.04376v3 [stat.ME] for this version)
	https://doi.org/10.48550/arXiv.2108.04376

Submission history

From: Andre Ribeiro
[v1] Mon, 9 Aug 2021 22:17:29 UTC (17,722 KB)
[v2] Fri, 23 Dec 2022 18:20:29 UTC (37,601 KB)
[v3] Sat, 25 Feb 2023 23:40:59 UTC (5,063 KB)
[v4] Tue, 5 Mar 2024 18:28:09 UTC (5,679 KB)
[v5] Tue, 14 May 2024 17:50:27 UTC (5,688 KB)
[v6] Sun, 2 Jun 2024 12:32:37 UTC (5,688 KB)
