Learning Counterfactual Distributions via Kernel Nearest Neighbors

Choi, Kyuseong; Feitelberg, Jacob; Chin, Caleb; Agarwal, Anish; Dwivedi, Raaz

arXiv:2410.13381 (stat)

[Submitted on 17 Oct 2024 (v1), last revised 2 Dec 2024 (this version, v2)]

Abstract: Consider a setting with multiple units (e.g., individuals, cohorts, geographic locations) and outcomes (e.g., treatments, times, items), where the goal is to learn a multivariate distribution for each unit-outcome entry, such as the distribution of a user's weekly spend and engagement under a specific mobile app version. A common challenge is the prevalence of missing-not-at-random data: observations are available only for certain unit-outcome combinations, and their availability can be correlated with properties of the distributions themselves, i.e., there is unobserved confounding. An additional challenge is that, for any observed unit-outcome entry, we have only a finite number of samples from the underlying distribution. We tackle these two challenges by casting the problem into a novel distributional matrix completion framework and introducing a kernel-based distributional generalization of nearest neighbors to estimate the underlying distributions. By leveraging maximum mean discrepancies and a suitable factor model on the kernel mean embeddings of the underlying distributions, we establish consistent recovery of the underlying distributions even when data is missing not at random and positivity constraints are violated. Furthermore, we demonstrate that our nearest neighbors approach is robust to heteroscedastic noise, provided we have access to two or more measurements for the observed unit-outcome entries, a robustness not present in prior works on nearest neighbors with single measurements.
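
The nearest-neighbors idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The Gaussian kernel, the `bandwidth` and `eta` parameters, the dictionary layout of `samples`, and all function names are assumptions made for illustration. For a missing entry (i, j), the sketch compares unit i with every unit k observed at outcome j by averaging a U-statistic estimate of the squared maximum mean discrepancy over the other outcomes both units share, and keeps the units whose average distance is at most `eta`. The U-statistic form is why each observed entry needs two or more samples.

```python
import numpy as np


def gaussian_gram(x, y, bandwidth=1.0):
    """Gaussian kernel matrix between sample sets x (n, d) and y (m, d)."""
    sq = (np.sum(x**2, axis=1)[:, None]
          + np.sum(y**2, axis=1)[None, :]
          - 2.0 * x @ y.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """U-statistic estimate of squared MMD; needs >= 2 samples per set."""
    n, m = len(x), len(y)
    kxx = gaussian_gram(x, x, bandwidth)
    kyy = gaussian_gram(y, y, bandwidth)
    kxy = gaussian_gram(x, y, bandwidth)
    # Exclude diagonal terms from the within-set averages.
    term_x = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * kxy.mean()


def kernel_nn_neighbors(samples, i, j, eta, bandwidth=1.0):
    """Return the sample arrays at outcome j of units close to unit i.

    samples: dict mapping (unit, outcome) -> array of shape (n, d),
             containing observed entries only (hypothetical layout).
    The estimate for entry (i, j) is the equal-weight mixture of the
    empirical distributions of the returned arrays.
    """
    units = sorted({u for (u, _) in samples})
    outcomes = sorted({o for (_, o) in samples})
    neighbors = []
    for k in units:
        if (k, j) not in samples:
            continue  # unit k cannot contribute to outcome j
        shared = [l for l in outcomes
                  if l != j and (i, l) in samples and (k, l) in samples]
        if not shared:
            continue  # no common outcomes to compare units i and k
        dist = np.mean([mmd2_unbiased(samples[(i, l)], samples[(k, l)],
                                      bandwidth) for l in shared])
        if dist <= eta:
            neighbors.append(samples[(k, j)])
    return neighbors
```

An empty list means no unit passed the threshold, in which case this sketch produces no estimate for the entry. How `eta` and `bandwidth` should be chosen is outside the scope of the sketch.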

Comments:	39 pages, 8 figures
Subjects:	Machine Learning (stat.ML); Machine Learning (cs.LG)
Cite as:	arXiv:2410.13381 [stat.ML]
	(or arXiv:2410.13381v2 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.2410.13381

Submission history

From: Kyuseong Choi
[v1] Thu, 17 Oct 2024 09:36:01 UTC (57 KB)
[v2] Mon, 2 Dec 2024 07:18:46 UTC (130 KB)
